# Summarize datasets for all operator results
Code author: Sahar H. El Abbadi
Date started: 2023-03-13
Date last edited: 2023-03-17

## Generate summary tables

Generate overpass summary tables for each operator. Overpass summary tables extract all relevant information from operator reports and metering data that is needed for downstream analysis. These tables are saved as CSV files in 03_results > overpass_summary, using the format operator_stage_overpasses.csv.

Columns in the overpass summary tables are:
- overpass_id: this is the ID for the specific aircraft overpass, and matches PerformerExperimentID overpass number in the raw operator data
- overpass_datetime: date and time of overpass, in UTC. This can be generated using the Flightradar GPS timestamp (timekeeper = 'flightradar'), Stanford's on-the-ground estimate of overhead time (timekeeper = 'stanford'), or the timestamp in the operator report (timekeeper = 'team'). The value used for all analysis in the paper is Flightradar
- zero_release: True if the release by Stanford is 0 kgh, False if greater than 0 kgh
- non_zero_release: True if release by Stanford is greater than 0 kgh, False if equal to 0 kgh
- operator_kept: True if this overpass passed the operator QC criteria
- stanford_kept: True if this overpass passed Stanford's QC criteria
- phase_iii: True if we provided this overpass to the operator during Phase III of unblinding
- pass_all_qc: True if passed both operator and Stanford QC
- fail_all_qc: True if this overpass failed both operator and Stanford QC
- operator_detected: True if operator detected a release. False if they did not
- operator_quantification: operator's quantification estimate as reported in operator report
- operator_lower: lower bound on operator's quantification estimate
- operator_upper: upper bound on operator's quantification estimate
- qc_summary: summarizes results of both operator and Stanford QC. Must be one of the following: 'pass_all', 'fail_stanford', 'fail_operator', 'fail_all'
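As an illustration of how the QC columns above relate to each other, the sketch below maps operator_kept and stanford_kept to a qc_summary value. The function name `summarize_qc` is hypothetical and is not part of methods_source. It assumes that 'fail_operator' means the overpass failed only operator QC and 'fail_stanford' means it failed only Stanford QC.

```python
def summarize_qc(operator_kept: bool, stanford_kept: bool) -> str:
    """Hypothetical helper: combine the two QC flags into one qc_summary label."""
    if operator_kept and stanford_kept:
        return 'pass_all'       # corresponds to pass_all_qc == True
    if operator_kept:
        return 'fail_stanford'  # assumed: passed operator QC, failed Stanford QC
    if stanford_kept:
        return 'fail_operator'  # assumed: passed Stanford QC, failed operator QC
    return 'fail_all'           # corresponds to fail_all_qc == True


# Enumerate all four combinations of the two flags
labels = {
    (op, su): summarize_qc(op, su)
    for op in (True, False)
    for su in (True, False)
}
print(labels)
```

Under these assumptions, every overpass falls into exactly one of the four categories.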

Note: due to the small number of data points for Scientific Aviation and the fact that they did not submit using a template, sciav_1_overpasses.csv was manually constructed by Sahar El Abbadi on 2023-03-16.

In [None]:
# Imports
from methods_source import generate_all_overpass_reports

In [None]:
# Generate all overpass reports
generate_all_overpass_reports(strict_discard=False, timekeeper='flightradar', gas_comp_source='km', time_ave='60')
generate_all_overpass_reports(strict_discard=True, timekeeper='flightradar', gas_comp_source='km', time_ave='60')

In [None]:
# Generate MAIR mIME and DI reports
from methods_source import generate_overpass_summary

generate_overpass_summary(operator='MethaneAIR mIME', stage=1, timekeeper='flightradar', strict_discard=False, gas_comp_source='km', time_ave=60)
generate_overpass_summary(operator='MethaneAIR DI', stage=1, timekeeper='flightradar', strict_discard=False, gas_comp_source='km', time_ave=60)

## Sanity checks on data

Basic double checks on the data to make sure everything looks good

##### Number of overpasses

Check that the length of the dataframe matches the highest overpass ID. A mismatch flags potentially missing or duplicate data in the summary file. This check is included because we previously observed duplicate rows in the meter summary file.
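A minimal sketch of the idea behind this check, not the actual implementation of check_overpass_number. The helper name `check_overpass_ids` is hypothetical, and it assumes overpass IDs are consecutive integers starting at 1. It also shows why the length comparison alone can miss a problem: one duplicate plus one missing ID leaves the length equal to the maximum ID.

```python
from collections import Counter


def check_overpass_ids(overpass_ids):
    """Hypothetical helper: compare row count to max ID and list problem IDs.

    Assumes IDs are consecutive integers starting at 1.
    """
    counts = Counter(overpass_ids)
    max_id = max(counts)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    missing = sorted(set(range(1, max_id + 1)) - set(counts))
    return {
        'length_matches_max': len(overpass_ids) == max_id,
        'duplicates': duplicates,
        'missing': missing,
    }


# Toy IDs: overpass 2 is duplicated and overpass 3 is missing
result = check_overpass_ids([1, 2, 2, 4])
print(result)
```

In this toy case the length equals the maximum ID even though the data has both a duplicate and a gap, so listing duplicates and missing IDs explicitly is a useful complement to the length comparison.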

In [None]:
# Check Carbon Mapper

from methods_source import load_overpass_summary, check_overpass_number

cm_overpasses = load_overpass_summary(operator='Carbon Mapper', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
cm_max_id = cm_overpasses['overpass_id'].max()  # take the max of the ID column only
cm_overpasses_length = len(cm_overpasses)
check_overpass_number(operator='Carbon Mapper', max_overpass_id=cm_max_id, overpasses_length=cm_overpasses_length)

In [None]:
# Check GHGSat data length

from methods_source import load_overpass_summary, check_overpass_number

ghg_overpasses = load_overpass_summary(operator='GHGSat', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
ghg_max_id = ghg_overpasses['overpass_id'].max()  # take the max of the ID column only
ghg_overpasses_length = len(ghg_overpasses)
check_overpass_number(operator='GHGSat', max_overpass_id=ghg_max_id, overpasses_length=ghg_overpasses_length)

In [None]:
# Check Kairos data length

from methods_source import load_overpass_summary, check_overpass_number

kairos_overpasses = load_overpass_summary(operator='Kairos', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
kairos_max_id = kairos_overpasses['overpass_id'].max()  # take the max of the ID column only
kairos_overpasses_length = len(kairos_overpasses)
check_overpass_number(operator='Kairos', max_overpass_id=kairos_max_id, overpasses_length=kairos_overpasses_length)

In [None]:
# Check Scientific Aviation data length

from methods_source import load_overpass_summary, check_overpass_number

sciav_overpasses = load_overpass_summary(operator='Scientific Aviation', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
sciav_max_id = sciav_overpasses['overpass_id'].max()  # take the max of the ID column only
sciav_overpasses_length = len(sciav_overpasses)
check_overpass_number(operator='Scientific Aviation', max_overpass_id=sciav_max_id, overpasses_length=sciav_overpasses_length)

In [None]:
# Check Methane Air

from methods_source import load_overpass_summary, check_overpass_number

mair_overpasses = load_overpass_summary(operator='Methane Air', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
mair_max_id = mair_overpasses['overpass_id'].max()  # take the max of the ID column only
mair_overpasses_length = len(mair_overpasses)
check_overpass_number(operator='Methane Air', max_overpass_id=mair_max_id, overpasses_length=mair_overpasses_length)

##### Calculate average flight altitude across all reported overpasses

Use "altitude_feet" column in overpass summary files

In [None]:
from methods_source import load_overpass_summary

cm_overpasses = load_overpass_summary(operator='Carbon Mapper', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
ghg_overpasses = load_overpass_summary(operator='GHGSat', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
kairos_overpasses = load_overpass_summary(operator='Kairos', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
mair_overpasses = load_overpass_summary(operator='Methane Air', stage=1, strict_discard=False, time_ave=60, gas_comp_source='km')
cm_altitude = cm_overpasses.altitude_feet.mean()
ghg_altitude = ghg_overpasses.altitude_feet.mean()
kairos_altitude = kairos_overpasses.altitude_feet.mean()
mair_altitude = mair_overpasses.altitude_feet.mean()
print('Average overpass altitude (feet) across all reported overpasses:\n')
print(f'Carbon Mapper: {cm_altitude:,.1f} ft')
print(f'GHGSat: {ghg_altitude:,.1f} ft')
print(f'Kairos: {kairos_altitude:,.1f} ft')
print(f'Methane Air: {mair_altitude:,.1f} ft')

##### Calculate average flight altitude for each day of testing


In [None]:
from methods_source import calc_daily_altitude
operator = 'Carbon Mapper'
cm_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(cm_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))

In [None]:
operator = 'GHGSat'
ghg_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(ghg_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))

In [None]:
operator = 'Kairos'
kairos_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(kairos_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))

In [None]:
operator = 'Methane Air'
mair_alt = calc_daily_altitude(operator)
print(f'{operator} average daily altitude:\n')
print(mair_alt.to_string(formatters={'altitude_feet':'{:,.1f} feet'.format}))
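For reference, a daily average like the ones above can be computed with a pandas groupby on the calendar date of each overpass. This is a sketch on made-up toy data and is not the implementation of calc_daily_altitude. It assumes the summary table has the overpass_datetime and altitude_feet columns described earlier.

```python
import pandas as pd

# Toy data: dates and altitudes are invented for illustration only
overpasses = pd.DataFrame({
    'overpass_datetime': pd.to_datetime([
        '2000-01-01 17:05:00',
        '2000-01-01 17:20:00',
        '2000-01-02 18:00:00',
    ]),
    'altitude_feet': [10000.0, 10400.0, 9800.0],
})

# Group by the calendar date (UTC) of each overpass and average the altitude
daily_altitude = (
    overpasses
    .groupby(overpasses['overpass_datetime'].dt.date)['altitude_feet']
    .mean()
)
print(daily_altitude.to_string(float_format='{:,.1f} feet'.format))
```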

##### Summarize distribution of overpasses

Total number of overpasses, missing data, max and min given to each team, number that passed Stanford QC

Missing overpasses: documented as an overpass by Stanford, but not included in Operator Report file
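The definition of a missing overpass amounts to a set difference between the overpass IDs documented by Stanford and the IDs in the operator report. The sketch below uses toy IDs; the variable names are illustrative and do not come from writing_analysis.

```python
# Toy IDs for illustration only
stanford_ids = {1, 2, 3, 4, 5}   # overpasses documented by Stanford
operator_ids = {1, 2, 4}         # overpasses included in the operator report

# Documented by Stanford but absent from the operator report
missing_ids = sorted(stanford_ids - operator_ids)

# Total releases conducted = reported overpasses + missing overpasses
total_releases = len(operator_ids) + len(missing_ids)

print(f'Missing overpasses: {missing_ids}')
print(f'Total releases conducted: {total_releases}')
```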

In [1]:
from writing_analysis import operator_releases_summary_stats
# Carbon Mapper
operator = 'Carbon Mapper'
operator_releases_summary_stats(operator)

Carbon Mapper: 121 flightlines reported to SU
8 overpasses that fail SU QC
113 overpasses that pass SU QC
82 overpasses quantified by Carbon Mapper
31 overpasses removed by Carbon Mapper

3 overpasses documented by Stanford but not reported by Carbon Mapper

Total releases conducted by Stanford (including missing overpasses not reported by Carbon Mapper: 124
Number of zero releases to Carbon Mapper: 8

Largest volume overpass for Carbon Mapper:
Release Rate: 1443.09917328525 kg CH4 / hr
[1366.7685660986533, 1519.4297804718465, 95% CI]
(sigma from gas flow: 40.877526)
(sigma from meter: 1.936967)
(sigma from gas composition: 0.001362)
(combined total sigma: 38.944187)

Smallest non-zero volume overpass for Carbon Mapper:
Release Rate: 4.4451804176775 kg CH4 / hr
[4.301638947741193, 4.588721887613808, 95% CI]
(sigma from gas flow: 0.067039)
(sigma from meter: 0.039443)
(sigma from gas composition: 0.001617)
(combined total sigma: 0.073235)

No false positives detected, all zero releases 
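The 95% confidence intervals printed above are numerically consistent with the release rate plus or minus 1.96 times the combined total sigma. The sketch below reproduces the Carbon Mapper bounds from the printed values. The function name `ci_95` is hypothetical, and this is a consistency check on the output rather than the code that produced it.

```python
Z_95 = 1.96  # two-sided 95% multiplier for a normal distribution


def ci_95(release_rate, combined_sigma):
    """Hypothetical helper: symmetric 95% interval around a release rate."""
    half_width = Z_95 * combined_sigma
    return release_rate - half_width, release_rate + half_width


# Values copied from the Carbon Mapper output above (kg CH4 / hr)
largest_lower, largest_upper = ci_95(1443.09917328525, 38.944187)
smallest_lower, smallest_upper = ci_95(4.4451804176775, 0.073235)

print(f'Largest release: [{largest_lower:.2f}, {largest_upper:.2f}]')
print(f'Smallest non-zero release: [{smallest_lower:.3f}, {smallest_upper:.3f}]')
```

The small differences in the later decimal places arise because the printed sigma values are rounded to six decimals.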

In [1]:
from writing_analysis import operator_releases_summary_stats
# GHGSat
operator = 'GHGSat'
operator_releases_summary_stats(operator)

GHGSat: 192 flightlines reported to SU
57 overpasses that fail SU QC
135 overpasses that pass SU QC
134 overpasses quantified by GHGSat
1 overpasses removed by GHGSat

2 overpasses documented by Stanford but not reported by GHGSat

Total releases conducted by Stanford (including missing overpasses not reported by GHGSat: 194

Largest volume overpass for GHGSat:
Release Rate: 1144.809472515 kg CH4 / hr
[1112.419998540914, 1177.1989464890858, 95% CI]
(sigma from gas flow: 16.629319)
(sigma from meter: 1.530164)
(sigma from gas composition: 0.003645)
(combined total sigma: 16.525242)

Smallest non-zero volume overpass for GHGSat:
Release Rate: 1.049607814791 kg CH4 / hr
[1.0192273822188926, 1.0799882473631073, 95% CI]
(sigma from gas flow: 0.014922)
(sigma from meter: 0.005792)
(sigma from gas composition: 0.002693)
(combined total sigma: 0.015500)

Number of zero releases to GHGSat: 4

No false positives detected, all zero releases were correctly categorized

False positives detected: 9


In [14]:
# Check GHGSat's internal QC for points they quantified but flagged as sub-optimal
from methods_source import load_operator_report_dictionary, load_overpass_summary
operator = 'GHGSat'
ghg_report = load_operator_report_dictionary()['ghg_1']

# QC flag of GH-2 means that emissions were quantified despite sub-optimal conditions.
# QC flag of GH-4 means diffuse emissions visible over site

ghg_report_quantified = ghg_report.loc[ghg_report.QuantifiedPlume == True]
qc_mask = (ghg_report_quantified['QCFlag'] == 'GH-2') | (ghg_report_quantified['QCFlag'] == 'GH-4')
ghg_report_poor_conditions = ghg_report_quantified.loc[qc_mask]
# list of overpass ID's with poor conditions but quantified
poor_condition_overpasses = ghg_report_poor_conditions['overpass_id']
overpasses = load_overpass_summary(operator, stage=1)

# Find the overpasses in the overpass summary, to see which ones of them pass Stanford QC
poor_condition_overpass_summary = overpasses[overpasses['overpass_id'].isin(poor_condition_overpasses)]

# Select overpasses that pass Stanford QC
poor_condition_overpasses_pass_SU = poor_condition_overpass_summary.loc[poor_condition_overpass_summary.stanford_kept == True]

print(f'Number of overpasses that pass SU quality control but are flagged by GHGSat as sub-optimal conditions for quantification: {len(poor_condition_overpasses_pass_SU)}')

Number of overpasses that pass SU quality control but are flagged by GHGSat as sub-optimal conditions for quantification: 9


In [2]:
# Further investigate GHGSat false negatives, as they detected small releases but also had some false negatives at higher release rates
from methods_source import load_overpass_summary, classify_confusion_categories
operator = 'GHGSat'
overpasses = load_overpass_summary(operator, stage=1)
pass_su_qc = overpasses.loc[overpasses.stanford_kept == True]
true_positives, false_positives, true_negatives, false_negatives = classify_confusion_categories(pass_su_qc)

# How many of these releases were less than 5 kgh?
#
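One way to answer the question of how many false negatives were below 5 kgh is sketched here on toy data. It assumes a false negative is a non-zero release that the operator did not detect, and it uses the release_rate_kgh and operator_detected columns that appear elsewhere in this notebook. It does not rely on the return type of classify_confusion_categories, which is not shown here.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'overpass_id': [1, 2, 3, 4, 5],
    'release_rate_kgh': [0.0, 2.5, 4.0, 12.0, 300.0],
    'operator_detected': [False, False, True, False, True],
})

# Assumed definition: gas was released but the operator reported no detection
released = toy['release_rate_kgh'] > 0
false_negatives = toy.loc[released & ~toy['operator_detected']]

# Restrict to releases below 5 kgh
small_false_negatives = false_negatives.loc[false_negatives['release_rate_kgh'] < 5]

print(f'False negatives: {len(false_negatives)}')
print(f'False negatives below 5 kgh: {len(small_false_negatives)}')
```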

### Overpasses that fail operator QC

Carbon Mapper and Scientific Aviation both provided quantification estimates in a column in their report. Kairos and GHGSat did not. Because points that were detected but not quantified may be included in the probability of detection assessment, I examine these points individually here. Specifically, I look at points that are lower than 30 kgh and pass Stanford QC but fail operator QC.

Operator QC for Carbon Mapper and SciAv referred to quantification, and this is not specified for Kairos or GHGSat. MAIR QC only indicates confidence in detection, but not quantification.

In [None]:
# Carbon Mapper points that fail QC - points currently excluded from probability of detection
from methods_source import load_overpass_summary
operator = 'Carbon Mapper'
overpasses = load_overpass_summary(operator, stage=1)
# Fail CM QC but pass stanford:
# Which overpasses would be included in a probability of detection plot:
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 50)
# detection_prob = (overpasses['qc_summary'] =='fail_operator')
print(overpasses.loc[detection_prob][columns])



All four overpasses under 30 kgh were marked as detected by Carbon Mapper but flagged as poor quality for quantification. Exclude them from the analysis in the main text but include them in the SI. We were not aiming to flesh out a probability of detection for Carbon Mapper and thus are only including points that pass Carbon Mapper's QC. All four of these points are also excluded in the Stage 2 analysis, but overpasses 3 and 41 are re-added in Stage 3 (see below).

In [None]:
# Compare with Stage 3
operator = 'Carbon Mapper'
overpasses = load_overpass_summary(operator, stage=3)
# Fail CM QC but pass stanford:
# Which overpasses would be included in a probability of detection plot:
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
# detection_prob = (overpasses['qc_summary'] =='fail_operator')
print(overpasses.loc[detection_prob][columns])

In [None]:
# GHGSat points that fail QC - points currently excluded from probability of detection
from methods_source import load_overpass_summary
operator = 'GHGSat'
overpasses = load_overpass_summary(operator, stage=1)
# Fail GHGSat QC but pass Stanford:
# Which overpasses would be included in a probability of detection plot:
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
print(overpasses.loc[detection_prob][columns])

Overpass 83 was discarded by GHGSat due to bad weather. This indicates it should not be included in a probability of detection plot.

In [None]:
# Kairos points that fail QC - points currently excluded from probability of detection
from methods_source import load_overpass_summary
operator = 'Kairos'
overpasses = load_overpass_summary(operator, stage=1)
# Fail Kairos QC but pass Stanford:
# Which overpasses would be included in a probability of detection plot:
detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
print(overpasses.loc[detection_prob][columns])

Overpass ID:
- 15: flagged by Kairos as 'Partial Detection' (KA-2)
- 61: flagged by Kairos as 'Excessive Methane Pooling Over Site' (KA-4)
- 62: flagged by Kairos as 'Excessive Methane Pooling Over Site' (KA-4)
- 312: flagged by Kairos as 'Excessive Methane Pooling Over Site' (KA-4)

Based on this assessment, excluding all four overpasses from the probability of detection chart seems consistent with the QC flags. Additionally, Kairos indicates a non-detect by submitting 0 kgh as the quantification estimate.

In [None]:
from methods_source import load_overpass_summary
operator = 'Scientific Aviation'
overpasses = load_overpass_summary(operator, stage=1)
# detection_prob = (overpasses['qc_summary'] =='fail_operator') & (overpasses['release_rate_kgh'] <= 30)
detection_prob = (overpasses['release_rate_kgh'] <= 30)
columns = ['overpass_id','release_rate_kgh', 'operator_detected']
print(overpasses.loc[detection_prob][columns])