# Summarize datasets for all operator results
Code author: Sahar H. El Abbadi
Date started: 2023-03-13
Date last edited: 2023-03-17

## Generate summary tables

Generate overpass summary tables for each operator. Overpass summary tables extract all relevant information from operator reports and metering data that is needed for downstream analysis. These tables are saved as CSV files in 03_results > overpass_summary, using the format operator_stage_overpasses.csv.
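A minimal sketch of the file naming convention. It assumes a lowercase operator abbreviation such as `sciav`, which is the only abbreviation confirmed in this notebook, and the relative directory `03_results/overpass_summary`. The helper name `overpass_summary_path` is hypothetical and is not part of `methods_source`. The real files may also encode the other arguments used below (`strict_discard`, `time_ave`, `gas_comp_source`) in their paths; this sketch does not attempt that.

```python
from pathlib import Path


# Hypothetical helper: builds the expected location of an overpass summary CSV,
# following the operator_stage_overpasses.csv pattern described above.
def overpass_summary_path(operator_abbrev: str, stage: int,
                          root: str = "03_results") -> Path:
    filename = f"{operator_abbrev}_{stage}_overpasses.csv"
    return Path(root) / "overpass_summary" / filename


print(overpass_summary_path("sciav", 1).as_posix())
# 03_results/overpass_summary/sciav_1_overpasses.csv
```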

Columns in the overpass summary tables are:
- overpass_id: this is the ID for the specific aircraft overpass, and matches PerformerExperimentID overpass number in the raw operator data
- overpass_datetime: date and time of overpass, in UTC. This can be generated using the Flightradar GPS timestamp (timekeeper = 'flightradar'), Stanford's on-the-ground estimate of overhead time (timekeeper = 'stanford'), or the timestamp in the operator report (timekeeper = 'team'). All analysis in the paper uses the Flightradar timestamp
- zero_release: True if the release by Stanford is 0 kgh, False if greater than 0 kgh
- non_zero_release: True if release by Stanford is greater than 0 kgh, False if equal to 0 kgh
- operator_kept: True if this overpass passed the operator QC criteria
- stanford_kept: True if this overpass passed Stanford's QC criteria
- phase_iii: True if we provided this overpass to the operator during Phase III of unblinding
- pass_all_qc: True if passed both operator and Stanford QC
- fail_all_qc: True if this overpass failed both operator and Stanford QC
- operator_detected: True if operator detected a release, False if they did not
- operator_quantification: operator's quantification estimate as reported in operator report
- operator_lower: lower bound on operator's quantification estimate
- operator_upper: upper bound on operator's quantification estimate
- qc_summary: summarizes results of both operator and Stanford QC. Must be one of the following: 'pass_all', 'fail_stanford', 'fail_operator', 'fail_all'
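Several of these columns are derived from one another. The sketch below shows how the boolean QC columns and `qc_summary` could be computed for a single overpass. It assumes `release_rate_kgh` is the metered release (a column used later in this notebook) and that `operator_kept` and `stanford_kept` are already known. The function name `derive_qc_flags` is hypothetical, and the actual logic in `methods_source` may differ.

```python
def derive_qc_flags(release_rate_kgh: float, operator_kept: bool,
                    stanford_kept: bool) -> dict:
    """Hypothetical re-derivation of the release and QC summary columns."""
    if operator_kept and stanford_kept:
        qc_summary = "pass_all"
    elif operator_kept:            # only Stanford QC failed
        qc_summary = "fail_stanford"
    elif stanford_kept:            # only operator QC failed
        qc_summary = "fail_operator"
    else:                          # both QC checks failed
        qc_summary = "fail_all"
    return {
        "zero_release": release_rate_kgh == 0,
        "non_zero_release": release_rate_kgh > 0,
        "pass_all_qc": operator_kept and stanford_kept,
        "fail_all_qc": not operator_kept and not stanford_kept,
        "qc_summary": qc_summary,
    }
```

For example, a row with `operator_kept` True and `stanford_kept` False maps to `'fail_stanford'`, as for overpass 3 in the MethaneAIR mIME output below.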

Note: due to the small number of data points for Scientific Aviation and the fact that they did not submit using a template, sciav_1_overpasses.csv was manually constructed by Sahar El Abbadi on 2023-03-16.

In [1]:
# Imports
from methods_source import generate_all_overpass_reports

In [2]:
# Generate all overpass reports
generate_all_overpass_reports(strict_discard=False, timekeeper='flightradar', gas_comp_source='km', time_ave=60)
generate_all_overpass_reports(strict_discard=True, timekeeper='flightradar', gas_comp_source='km', time_ave=60)

Generating operator summary file for Carbon Mapper Stage 1
Generating operator summary file for Carbon Mapper Stage 2
Generating operator summary file for Carbon Mapper Stage 3
Generating operator summary file for GHGSat Stage 1
Generating operator summary file for GHGSat Stage 2
Generating operator summary file for GHGSat Stage 3
Generating operator summary file for Kairos Stage 1
Generating operator summary file for Kairos Stage 2
Generating operator summary file for Kairos Stage 3
Generating operator summary file for Kairos LS23 Stage 1
Generating operator summary file for Kairos LS23 Stage 2
Generating operator summary file for Kairos LS23 Stage 3
Generating operator summary file for Kairos LS25 Stage 1
Generating operator summary file for Kairos LS25 Stage 2
Generating operator summary file for Kairos LS25 Stage 3
Generating operator summary file for MethaneAIR Stage 1
Generating operator summary file for MethaneAIR mIME Stage 1
Generating operator summary file for MethaneAIR DI S

In [3]:
# Generate MAIR mIME and DI reports
from methods_source import generate_overpass_summary

generate_overpass_summary(operator='MethaneAIR mIME', stage=1, timekeeper='flightradar', strict_discard=False, gas_comp_source='km', time_ave=60)
generate_overpass_summary(operator='MethaneAIR DI', stage=1, timekeeper='flightradar', strict_discard=False, gas_comp_source='km', time_ave=60)

Unnamed: 0,overpass_id,overpass_datetime,zero_release,non_zero_release,operator_kept,stanford_kept,phase_iii,pass_all_qc,altitude_feet,fail_all_qc,...,upper_95CI,lower_95CI,sigma_flow_variability,sigma_meter_reading,sigma_gas_composition,operator_detected,operator_quantification,operator_lower,operator_upper,qc_summary
0,1.0,2022-10-25 17:15:08,False,True,True,True,0,True,41623.97797,False,...,206.548104,203.919646,0.348049,0.276716,0.002407,True,196.1,141.2,251.0,pass_all
1,2.0,2022-10-25 17:32:52,False,True,True,True,0,True,41630.46886,False,...,97.662102,96.418108,0.165219,0.130839,0.002407,True,56.6,13.4,99.8,pass_all
2,3.0,2022-10-25 17:50:42,False,True,True,False,0,False,42633.14464,False,...,1038.134963,936.133216,27.299022,1.32447,0.001366,True,957.4,681.8,1233.0,fail_stanford
3,4.0,2022-10-25 18:09:11,False,True,True,True,0,True,42635.03948,False,...,1036.518753,928.474944,28.925115,1.318248,0.001366,True,889.9,648.3,1131.5,pass_all
4,5.0,2022-10-25 18:27:57,True,False,True,False,0,False,43640.08684,False,...,0.0,0.0,0.0,0.0,0.0,False,-13.4,-34.8,40.0,fail_stanford
5,6.0,2022-10-25 18:47:09,False,True,True,True,0,True,43643.14464,False,...,472.846408,463.821605,2.229167,0.628379,0.001366,True,769.4,510.9,1027.8,pass_all
6,7.0,2022-10-25 19:07:10,False,True,True,True,0,True,43626.23549,False,...,24.533066,24.307216,0.035178,0.032765,0.001366,False,-74.0,-123.2,-24.8,pass_all
7,8.0,2022-10-25 19:28:04,False,True,True,True,0,True,43632.51964,False,...,643.676728,625.459249,4.717392,0.851421,0.001366,True,530.7,364.1,697.3,pass_all
8,9.0,2022-10-25 20:10:31,False,True,True,True,0,True,44646.26964,False,...,142.7484,141.521687,0.160786,0.190707,0.001366,True,350.3,231.0,469.7,pass_all
9,10.0,2022-10-25 20:29:58,False,True,True,True,0,True,44667.51964,False,...,74.993598,74.315879,0.101548,0.100167,0.001366,False,-133.2,-183.1,-83.4,pass_all


## Sanity checks on data

Basic checks to confirm that the summary files are complete and internally consistent.

##### Number of overpasses

Check that the length of each dataframe matches its highest overpass ID. A mismatch flags potentially missing or duplicate rows in the summary file. We run this check because we previously observed duplicate rows in the meter summary file.
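Matching the row count to the largest `overpass_id` catches most problems, but a duplicated row combined with a missing ID would still pass. A slightly stricter sketch compares the IDs against the full range `1..max`, assuming IDs are consecutive integers starting at 1 (as they appear to be in the outputs below). `audit_overpass_ids` is a hypothetical helper, not the `check_overpass_number` function from `methods_source`.

```python
from collections import Counter


def audit_overpass_ids(ids) -> dict:
    """Report duplicated and missing overpass IDs (hypothetical helper).

    Assumes a non-empty input and IDs that should run consecutively from 1
    to max(ids). Float IDs such as 3.0, as stored in the summary CSVs, are
    converted to int.
    """
    int_ids = [int(i) for i in ids]
    counts = Counter(int_ids)
    expected = set(range(1, max(int_ids) + 1))
    return {
        "length_matches_max": len(int_ids) == max(int_ids),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        "missing": sorted(expected - set(counts)),
    }


# Row count equals the max ID, yet ID 2 is duplicated and ID 3 is missing
print(audit_overpass_ids([1.0, 2.0, 2.0, 4.0]))
# {'length_matches_max': True, 'duplicates': [2], 'missing': [3]}
```

With a loaded summary this could be called as `audit_overpass_ids(cm_overpasses['overpass_id'].dropna().tolist())`.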

In [4]:
# Check Carbon Mapper

from methods_source import load_overpass_summary, check_overpass_number

cm_overpasses = load_overpass_summary(operator='Carbon Mapper', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
cm_max_id = cm_overpasses['overpass_id'].max()
cm_overpasses_length = len(cm_overpasses)
check_overpass_number(operator='Carbon Mapper', max_overpass_id=cm_max_id, overpasses_length=cm_overpasses_length)

The length of the Carbon Mapper overpasses dataframe (121) matches the highest value for overpass_id (121).


In [5]:
# Check GHGSat data length

from methods_source import load_overpass_summary, check_overpass_number

ghg_overpasses = load_overpass_summary(operator='GHGSat', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
ghg_max_id = ghg_overpasses['overpass_id'].max()
ghg_overpasses_length = len(ghg_overpasses)
check_overpass_number(operator='GHGSat', max_overpass_id=ghg_max_id, overpasses_length=ghg_overpasses_length)

The length of the GHGSat overpasses dataframe (192) matches the highest value for overpass_id (192).


In [6]:
# Check Kairos data length

from methods_source import load_overpass_summary, check_overpass_number

kairos_overpasses = load_overpass_summary(operator='Kairos', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
kairos_max_id = kairos_overpasses['overpass_id'].max()
kairos_overpasses_length = len(kairos_overpasses)
check_overpass_number(operator='Kairos', max_overpass_id=kairos_max_id, overpasses_length=kairos_overpasses_length)

The length of the Kairos overpasses dataframe (349) matches the highest value for overpass_id (349).


In [7]:
# Check Scientific Aviation data length

from methods_source import load_overpass_summary, check_overpass_number

sciav_overpasses = load_overpass_summary(operator='Scientific Aviation', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
sciav_max_id = sciav_overpasses['overpass_id'].max()
sciav_overpasses_length = len(sciav_overpasses)
check_overpass_number(operator='Scientific Aviation', max_overpass_id=sciav_max_id, overpasses_length=sciav_overpasses_length)

The length of the Scientific Aviation overpasses dataframe (18) matches the highest value for overpass_id (18).


In [8]:
# Check Methane Air

from methods_source import load_overpass_summary, check_overpass_number

mair_overpasses = load_overpass_summary(operator='Methane Air', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
mair_max_id = mair_overpasses['overpass_id'].max()
mair_overpasses_length = len(mair_overpasses)
check_overpass_number(operator='Methane Air', max_overpass_id=mair_max_id, overpasses_length=mair_overpasses_length)

The length of the Methane Air overpasses dataframe (24) matches the highest value for overpass_id (24).


##### Calculate average flight altitude across all reported overpasses

Use "altitude_feet" column in overpass summary files

In [1]:
from methods_source import load_overpass_summary, feet_per_meter

cm_overpasses = load_overpass_summary(operator='Carbon Mapper', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
ghg_overpasses = load_overpass_summary(operator='GHGSat', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
kairos_overpasses = load_overpass_summary(operator='Kairos', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
mair_overpasses = load_overpass_summary(operator='Methane Air', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
cm_altitude = cm_overpasses.altitude_feet.mean()
ghg_altitude = ghg_overpasses.altitude_feet.mean()
kairos_altitude = kairos_overpasses.altitude_feet.mean()
mair_altitude = mair_overpasses.altitude_feet.mean()
print('Average overpass altitude (feet) across all reported overpasses:\n')
print(f'Carbon Mapper: {cm_altitude:,.1f} ft')
print(f'GHGSat: {ghg_altitude:,.1f} ft')
print(f'Kairos: {kairos_altitude:,.1f} ft')
print(f'Methane Air: {mair_altitude:,.1f} ft')

# Use a separate name so the imported feet_per_meter function is not overwritten
ft_per_m = feet_per_meter()
print('\nAverage overpass altitude (meters) across all reported overpasses:\n')
op_alt = {
    'Carbon Mapper': cm_altitude,
    'GHGSat': ghg_altitude,
    'Kairos': kairos_altitude,
    'MethaneAIR': mair_altitude,
}

operators = ['Carbon Mapper', 'GHGSat', 'Kairos', 'MethaneAIR']

for operator in operators:
    alt_meters = op_alt[operator] / ft_per_m
    print(f'{operator}: {alt_meters:,.1f} meters')

Average overpass altitude (feet) across all reported overpasses:

Carbon Mapper: 10,341.6 ft
GHGSat: 6,594.9 ft
Kairos: 1,351.4 ft
Methane Air: 43,591.6 ft

Average overpass altitude (meters) across all reported overpasses:

Carbon Mapper: 3,152.1 meters
GHGSat: 2,010.1 meters
Kairos: 411.9 meters
MethaneAIR: 13,286.7 meters
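The meter values above divide the mean altitude in feet by `feet_per_meter()`. Because the international foot is defined as exactly 0.3048 m, the same conversion can be sketched without the helper. The constant and function names below are illustrative, and we assume `feet_per_meter()` returns approximately 1 / 0.3048 ≈ 3.2808, which is consistent with the printed values.

```python
# Assumption: feet_per_meter() in methods_source is equivalent to 1 / 0.3048
METERS_PER_FOOT = 0.3048  # exact, by definition of the international foot


def feet_to_meters(feet: float) -> float:
    """Convert an altitude in feet to meters."""
    return feet * METERS_PER_FOOT


# Mean altitudes (feet) copied from the output above
mean_altitudes_ft = {"Carbon Mapper": 10341.6, "MethaneAIR": 43591.6}
for operator, feet in mean_altitudes_ft.items():
    print(f"{operator}: {feet_to_meters(feet):,.1f} meters")
# Carbon Mapper: 3,152.1 meters
# MethaneAIR: 13,286.7 meters
```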


##### Calculate average flight altitude for each day of testing
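`calc_daily_altitude` returns one mean altitude per test day, but its internals are not shown in this notebook. A plausible sketch, assuming it groups reported overpasses by the calendar date of `overpass_datetime` and averages `altitude_feet`, is below. Because `overpass_datetime` is in UTC, the days in this sketch are UTC days. The function name `daily_mean_altitude` and the example altitudes are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_mean_altitude(rows):
    """Average altitude per calendar day (hypothetical helper).

    rows: iterable of (overpass_datetime string, altitude_feet) pairs,
    with timestamps formatted like '2022-10-25 17:15:08'.
    """
    by_day = defaultdict(list)
    for timestamp, altitude in rows:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
        by_day[day].append(altitude)
    # Return days in chronological order
    return {day: mean(alts) for day, alts in sorted(by_day.items())}


# Invented altitudes, for illustration only
example = [
    ("2022-10-25 17:15:08", 41000.0),
    ("2022-10-25 17:32:52", 42000.0),
    ("2022-10-29 18:00:00", 43500.0),
]
for day, alt in daily_mean_altitude(example).items():
    print(f"{day} {alt:,.1f} feet")
# 2022-10-25 41,500.0 feet
# 2022-10-29 43,500.0 feet
```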


In [10]:
from methods_source import calc_daily_altitude
operator = 'Carbon Mapper'
cm_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(cm_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))

Carbon Mapper average daily altitude:

           altitude_feet
2022-10-10 10,372.2 feet
2022-10-11 10,327.0 feet
2022-10-12 10,218.8 feet
2022-10-28 10,182.4 feet
2022-10-29 10,373.5 feet
2022-10-31 10,519.8 feet


In [11]:
operator = 'GHGSat'
ghg_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(ghg_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))

GHGSat average daily altitude:

           altitude_feet
2022-10-31  6,565.1 feet
2022-11-02  6,561.0 feet
2022-11-04  6,575.5 feet
2022-11-07  6,694.1 feet


In [12]:
operator = 'Kairos'
kairos_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(kairos_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))

Kairos average daily altitude:

           altitude_feet
2022-10-24  1,334.4 feet
2022-10-25  1,317.9 feet
2022-10-26  1,328.9 feet
2022-10-27  1,389.3 feet
2022-10-28  1,385.6 feet


In [13]:
operator = 'Methane Air'
mair_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(mair_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))

Methane Air average daily altitude:

           altitude_feet
2022-10-25 43,367.8 feet
2022-10-29 43,780.9 feet


### Overpasses that fail operator QC

Carbon Mapper and Scientific Aviation both provided quantification estimates in a column in their reports; Kairos and GHGSat did not. Because points that were detected but not quantified may be included in the probability of detection assessment, I examine these points individually here. Specifically, I look at points at or below 30 kgh that pass Stanford QC but fail operator QC.

Operator QC for Carbon Mapper and SciAv refers to quantification; Kairos and GHGSat do not specify what their QC covers. MethaneAIR QC indicates confidence in detection only, not in quantification.
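The filter used in the cells below can be sketched in plain Python. It keeps overpasses whose `qc_summary` is `'fail_operator'` (so they passed Stanford QC) and whose metered release is at or below a threshold. The threshold is a parameter because the cells below use 50 kgh for the Carbon Mapper Stage 1 check and 30 kgh elsewhere. The function name and record format are hypothetical; the first two records use values from the Carbon Mapper output below, and the third is invented for illustration.

```python
def fail_operator_below(records, max_release_kgh=30.0):
    """Return IDs of overpasses that fail operator QC only, at or below a release threshold."""
    return [
        r["overpass_id"]
        for r in records
        if r["qc_summary"] == "fail_operator"
        and r["release_rate_kgh"] <= max_release_kgh
    ]


records = [
    {"overpass_id": 3, "release_rate_kgh": 8.64, "qc_summary": "fail_operator"},
    {"overpass_id": 8, "release_rate_kgh": 35.0, "qc_summary": "fail_operator"},
    {"overpass_id": 100, "release_rate_kgh": 12.0, "qc_summary": "pass_all"},  # invented
]
print(fail_operator_below(records))        # [3]
print(fail_operator_below(records, 50.0))  # [3, 8]
```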

In [14]:
# Carbon Mapper points that fail QC - points currently excluded from probability of detection
from methods_source import load_overpass_summary
operator = 'Carbon Mapper'
overpasses = load_overpass_summary(operator, stage=1)
# Fail CM QC but pass stanford:
# Which overpasses would be included in a probability of detection plot:
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
# Note: the threshold here is 50 kgh; the discussion below considers only the points under 30 kgh
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 50)
# detection_prob = (overpasses['qc_summary'] =='fail_operator')
print(overpasses.loc[detection_prob][columns])



    overpass_id  release_rate_kgh  operator_detected
2           3.0          8.638673               True
7           8.0         35.006595               True
22         23.0         26.001371               True
36         37.0         42.147015               True
40         41.0         29.358277               True
46         47.0         14.418687               True


The four overpasses under 30 kgh (3, 23, 41, and 47) were all marked as detected by Carbon Mapper but flagged as poor quality for quantification estimates. Exclude them from the analysis in the main text but include them in the SI. We were not aiming to flesh out a probability of detection for Carbon Mapper and thus are only including points that. All four of these points are also excluded in the Stage 2 analysis, but overpasses 3 and 41 are re-added in Stage 3 (see below).

In [15]:
# Compare with Stage 3
operator = 'Carbon Mapper'
overpasses = load_overpass_summary(operator, stage=3)
# Fail CM QC but pass stanford:
# Which overpasses would be included in a probability of detection plot:
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
# detection_prob = (overpasses['qc_summary'] =='fail_operator')
print(overpasses.loc[detection_prob][columns])

    overpass_id  release_rate_kgh  operator_detected
22         23.0         26.001371               True
46         47.0         14.418687               True


In [16]:
# GHGSat points that fail QC - points currently excluded from probability of detection
from methods_source import load_overpass_summary
operator = 'GHGSat'
overpasses = load_overpass_summary(operator, stage=1)
# Fail GHGSat QC but pass Stanford QC:
# Which overpasses would be included in a probability of detection plot:
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
print(overpasses.loc[detection_prob][columns])

    overpass_id  release_rate_kgh  operator_detected
82         83.0         25.209784              False


GHGSat discarded overpass 83 because of bad weather, so it should not be included in a probability of detection plot.

In [17]:
# Kairos points that fail QC - points currently excluded from probability of detection
from methods_source import load_overpass_summary
operator = 'Kairos'
overpasses = load_overpass_summary(operator, stage=1)
# Fail Kairos QC but pass Stanford QC:
# Which overpasses would be included in a probability of detection plot:
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
print(overpasses.loc[detection_prob][columns])

     overpass_id  release_rate_kgh  operator_detected
14          15.0          3.414754              False
60          61.0         14.098147              False
61          62.0         14.046313              False
312        313.0          5.507972              False


Overpass IDs:
- 15: flagged by Kairos as 'Partial Detection' (KA-2)
- 61: flagged by Kairos as 'Excessive Methane Pooling Over Site' (KA-4)
- 62: flagged by Kairos as 'Excessive Methane Pooling Over Site' (KA-4)
- 313: flagged by Kairos as 'Excessive Methane Pooling Over Site' (KA-4)

Based on this assessment, removing all four overpasses from the probability of detection chart is consistent with the QC flags. Additionally, Kairos indicates a non-detect by submitting 0 kgh as its quantification estimate.
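The Kairos conventions above can be captured in a small lookup. This is a sketch: the flag codes, their descriptions, and the flagged overpass IDs come from the list and table above, and treating exactly 0 kgh as a non-detect follows the sentence above. Counting every nonzero estimate as a detection is our assumption. The names `KAIROS_QC_FLAGS` and `kairos_detected` are hypothetical.

```python
# Kairos QC flag codes noted for the overpasses above
KAIROS_QC_FLAGS = {
    "KA-2": "Partial Detection",
    "KA-4": "Excessive Methane Pooling Over Site",
}


def kairos_detected(quantification_kgh: float) -> bool:
    """Kairos reports a non-detect as 0 kgh.

    Assumption: any nonzero estimate is treated as a detection.
    """
    return quantification_kgh != 0


flagged = {15: "KA-2", 61: "KA-4", 62: "KA-4", 313: "KA-4"}
for overpass_id, code in flagged.items():
    print(f"{overpass_id}: {KAIROS_QC_FLAGS[code]} ({code})")
print(kairos_detected(0.0), kairos_detected(12.5))  # False True
```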

In [18]:
# Scientific Aviation: list every overpass at or below 30 kgh, regardless of QC outcome
from methods_source import load_overpass_summary
operator = 'Scientific Aviation'
overpasses = load_overpass_summary(operator, stage=1)
# detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
detection_prob = (overpasses['release_rate_kgh'] <= 30)
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
print(overpasses.loc[detection_prob][columns])

    overpass_id  release_rate_kgh  operator_detected
2             3          7.095313               True
7             8          0.000000              False
10           11          4.062655               True
12           13          3.773422               True
13           14         12.467597              False
14           15         27.088342               True
15           16          0.000000              False
