In [19]:
# Imports
import pandas as pd
import pathlib

# Load raw data submitted by operators
### Notes on formatting:
- Operators added their own QC indicators, thus not all columns are uniform across reports
- Cells left blank in the Excel file are read by pandas as "nan" on import
- Naming convention for dataframes: operator_stage
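
The blank-cell behaviour noted above can be checked on a small in-memory table. This is a minimal sketch using `pd.read_csv` on made-up data (the column names and values are hypothetical, not taken from any operator report); with default settings, `pd.read_excel` treats empty cells the same way.

```python
import io
import pandas as pd

# Hypothetical two-row report; the second row has a blank QC cell
raw = io.StringIO("overpass_id,qc\n1,Y\n2,\n")
demo = pd.read_csv(raw)

# The blank cell is read as NaN, which isna() detects
n_blank = int(demo["qc"].isna().sum())
print(n_blank)
```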

### Carbon Mapper Stage 1 and 2 data

#### Submission details
- Stage 1 submitted on 2023-01-03
- Stage 2 submitted on 2023-02-13

#### QC Indicator:
Column: "Good Quality (Y/N)"
- Y = good quality, quantification included for this stage
- N = not good quality, quantification estimate included for potential use in a later stage, but not included in this stage
- nan = left blank by Carbon Mapper
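
A sketch of how this indicator might be used once the data are loaded. The dataframe below is a hypothetical stand-in for the "Survey Summary" sheet: only the "Good Quality (Y/N)" column name comes from the report, and `emission_estimate` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Carbon Mapper "Survey Summary" sheet
cm_demo = pd.DataFrame({
    "Good Quality (Y/N)": ["Y", "N", np.nan, "Y"],
    "emission_estimate": [10.0, 5.0, np.nan, 7.5],  # made-up values
})

# Count each indicator value, keeping blank (NaN) entries visible
qc_counts = cm_demo["Good Quality (Y/N)"].value_counts(dropna=False)

# Keep only the rows flagged as good quality
cm_good = cm_demo[cm_demo["Good Quality (Y/N)"] == "Y"]
```

Passing `dropna=False` keeps the blank entries in the tally, so rows that Carbon Mapper left unflagged are not silently dropped from the count.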


In [11]:
# Carbon Mapper Stage 1
cm_1_path = pathlib.PurePath('00_raw_data', 'CM_Stage1_submitted-2023-01-03.xlsx')
cm_1 = pd.read_excel(cm_1_path, sheet_name='Survey Summary')

# Carbon Mapper Stage 2
cm_2_path = pathlib.PurePath('00_raw_data', 'CM_Stage2_submitted-2023-02-13.xlsx')
cm_2 = pd.read_excel(cm_2_path, sheet_name='Survey Summary')


### GHGSat Stage 1 and 2 data

#### Submission details:
- Stage 1 data submitted on 2022-11-21
- Stage 2 data submitted on 2022-12-23
- Stage 3 data are the same as Stage 2, submitted on 2023-02-17

#### QC Indicator:

Column: "QC Flag"
- 1 = Good conditions
- 2 = Emissions detected and quantified, but suboptimal conditions may affect SR
- 3 = Emissions detected, but not quantified due to suboptimal conditions
- 4 = Diffuse emission visible over site (presumably from previous release, due to low wind)
- 5 = Discarded (Bad weather/conditions, including clouds, cloud shadow, highly irregular aircraft trajectory, etc.)
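
The flag codes above could be attached to the data as readable labels. This is a minimal sketch on a hypothetical dataframe: only the "QC Flag" column name and the code meanings come from the report, the labels are abbreviated, and `GHG_QC_LABELS` is an illustrative name. It assumes that rows with flag 1, like those with flag 2, carry a quantification.

```python
import pandas as pd

# Flag meanings as listed above (abbreviated)
GHG_QC_LABELS = {
    1: "good conditions",
    2: "quantified, suboptimal conditions",
    3: "detected, not quantified",
    4: "diffuse emission over site",
    5: "discarded",
}

# Hypothetical stand-in for a GHGSat report
ghg_demo = pd.DataFrame({"QC Flag": [1, 2, 3, 5, 1]})
ghg_demo["qc_label"] = ghg_demo["QC Flag"].map(GHG_QC_LABELS)

# Assumption: flags 1 and 2 are the rows that include a quantification
ghg_quantified = ghg_demo[ghg_demo["QC Flag"].isin([1, 2])]
```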

#### Data processing notes
Unable to open the .xlsx file provided by GHGSat in Python, possibly because of read-only restrictions. I saved the relevant data sheets as csv files and load those instead; the original GHGSat submissions are included in 00_raw_data as .xlsx files.

In [22]:
# GHGSat Stage 1
ghg_1_path = pathlib.PurePath('00_raw_data', 'GHG_Stage1_submitted-2022-11-21.csv')
ghg_1 = pd.read_csv(ghg_1_path)

# GHGSat Stage 2
ghg_2_path = pathlib.PurePath('00_raw_data', 'GHG_Stage2_submitted-2022-12-23.csv')
ghg_2 = pd.read_csv(ghg_2_path)

### Kairos Stage 1 and 2 data

#### Submission details
- Stage 1 submitted on 2022-11-17
- Stage 2 submitted on 2022-12-20
- Kairos submitted data for two pods, LS23 and LS25. They analyzed the data independently, but did not report this until after testing was complete.

#### QC Indicator:
(I ran the UNIQUE function in Excel to identify values in their original report)
- "Plane deviated from flightline"
- "PARTIAL DETECTION"
- "Cutoff - low confidence quantification"
- "Excessive methane pooling near site"
- "Excessive methane pooling over site" (appears twice - possible extra space at end?)
- "Plane deviation from flightpath"
- "Glare"
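
The suspected trailing space can be checked in pandas by comparing the unique values before and after stripping whitespace. A minimal sketch on a hypothetical series of QC notes (the actual column name in the Kairos report is not assumed here):

```python
import pandas as pd

# Hypothetical QC notes; the second entry has a trailing space
notes = pd.Series([
    "Excessive methane pooling over site",
    "Excessive methane pooling over site ",
    "Glare",
])

# Unique values as reported, then after removing surrounding whitespace
raw_unique = notes.unique()
clean_unique = notes.str.strip().unique()
```

If `clean_unique` is shorter than `raw_unique`, at least one apparent duplicate differed only by leading or trailing whitespace.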

#### Data processing notes
Unable to open the .xlsx file provided by Kairos in Python, possibly because of read-only restrictions. I saved the relevant data sheets as csv files and load those instead; the original Kairos submissions are included in 00_raw_data as .xlsx files.


In [25]:
# Kairos Stage 1
kairos_ls23_1_path = pathlib.PurePath('00_raw_data', 'Kairos_Stage1_podLS23_submitted-2022-11-17.csv')
kairos_ls25_1_path = pathlib.PurePath('00_raw_data', 'Kairos_Stage1_podLS25_submitted-2022-11-17.csv')

kairos_ls23_1 = pd.read_csv(kairos_ls23_1_path)
kairos_ls25_1 = pd.read_csv(kairos_ls25_1_path)

# Kairos Stage 2
kairos_ls23_2_path = pathlib.PurePath('00_raw_data', 'Kairos_Stage2_podLS23_submitted-2022-12-20.csv')
kairos_ls25_2_path = pathlib.PurePath('00_raw_data', 'Kairos_Stage2_podLS25_submitted-2022-12-20.csv')
kairos_ls23_2 = pd.read_csv(kairos_ls23_2_path)
kairos_ls25_2 = pd.read_csv(kairos_ls25_2_path)
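
Because each file is loaded from a hand-typed path, a quick check that no two dataframes point at the same file can catch copy-paste slips. This is a minimal sketch with hypothetical file names; `find_shared_paths` is an illustrative helper, not part of the original analysis.

```python
import pathlib
from collections import defaultdict


def find_shared_paths(named_paths):
    """Return {path: [names]} for every path used by more than one name."""
    by_path = defaultdict(list)
    for name, path in named_paths.items():
        by_path[pathlib.PurePath(path)].append(name)
    return {p: names for p, names in by_path.items() if len(names) > 1}


# Hypothetical example: two names accidentally share one file
demo_paths = {
    "pod_a_stage2": pathlib.PurePath("00_raw_data", "example_podB.csv"),
    "pod_b_stage2": pathlib.PurePath("00_raw_data", "example_podB.csv"),
    "pod_a_stage1": pathlib.PurePath("00_raw_data", "example_podA.csv"),
}
shared = find_shared_paths(demo_paths)
```

An empty result means every name maps to its own file; any entry in `shared` lists the names that would load identical data.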