# EPA GHGRP demo of Data in Data Commons

The purpose of this notebook is to provide a visual representation of the availability and connectivity of some of the data in the Data Commons. The data are:
* EPA_GHGRP: Asset-level data (physical plants and operations in the US that emit >= 25 kt CO2e and others covered by the GHG Reporting Program)
* GLEIF: Legal Entity Identifiers for parent companies that own assets (and identification of parent companies that don't have LEIs)
* Type of emissions from parent companies: Direct Emitters, LDC Emissions, On-Shore Refining, Gathering and Boosting, Transmission Pipelines, SF6 from Electrical Equipment
* Types of operations by SIC/NAICS codes (Steel, Cement, Electricity Generation, Pulp and Paper Manufacturing, etc.)
* Sectors (Manufacturing, Transportation Communications and Utilities, Service Industries, Mining, etc)
* SEC 10-K reports: Revenue Data (can be compared/contrasted with EPA CO2e emissions data)

The data developed in this notebook can be visualized by running the notebook https://github.com/os-climate/data-platform-demo/blob/master/notebooks/Sankey.ipynb

Then visualize the data in Superset here: https://superset-secure-odh-superset.apps.odh-cl2.apps.os-climate.org/superset/explore/

This data is incomplete from a number of perspectives:
* Major non-emitting power plants (hydroelectric dams, solar arrays, wind turbines, and nuclear power plants) may be missing as assets
* There are no metrics for energy generation or consumption, nor for other units of production (such as tons of steel produced)
* All emissions are essentially Scope 1 emissions; there are no Scope 2 attributions for major energy consumers (such as Steel manufacturing)
* There are also no Scope 3 metrics
* The data is exclusively US-based

Despite these shortcomings, this illustration/demonstration shows how additional data can be linked in to provide a more complete picture:
* WRI Power Plant data, providing a global perspective on power plants, including emissions, generation, capacity, fuel type, etc.
* RMI Utility Transition Hub data, providing fine-grained, up-to-date information about US power plants, including emissions targets
* SEC data at Business Segment level (to separate Berkshire Hathaway's \\$65B energy business from their overall \\$265B enterprise, for example)
* SPGI sustainability reports (require NLP analysis to yield quantitative metrics)
* etc.

# Begin with Credentials and Connection to Trino

In [None]:
import os
import pathlib
from dotenv import load_dotenv

# Load some standard environment variables from a dot-env file, if it exists.
# If no such file can be found, does not fail, and so allows these environment vars to
# be populated in some other way
dotenv_dir = os.environ.get('CREDENTIAL_DOTENV_DIR', os.environ.get('PWD', '/opt/app-root/src'))
dotenv_path = pathlib.Path(dotenv_dir) / 'credentials.env'
if os.path.exists(dotenv_path):
    load_dotenv(dotenv_path=dotenv_path,override=True)

Set the default catalog and schema on the connection to make query terms much more compact
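
As a minimal sketch of the connection setup, assuming the three `TRINO_*` environment variables read in this notebook, the hypothetical helper `build_trino_url` below (not part of `trino` or `sqlalchemy`) reports missing variables with a readable message before the URL is built. The host and user in the example are made up.

```python
import os

REQUIRED_VARS = ("TRINO_USER", "TRINO_HOST", "TRINO_PORT")

def build_trino_url(env=os.environ):
    """Hypothetical helper: build the SQLAlchemy URL string used in this notebook."""
    # Collect every variable that is absent or empty so the error names all of them
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return "trino://{user}@{host}:{port}/".format(
        user=env["TRINO_USER"], host=env["TRINO_HOST"], port=env["TRINO_PORT"]
    )

# Made-up values for illustration only (not real credentials)
example_env = {"TRINO_USER": "demo", "TRINO_HOST": "trino.example.org", "TRINO_PORT": "443"}
print(build_trino_url(example_env))
```

The catalog and schema are still passed separately through `connect_args`, as in the cell that follows.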

In [None]:
import trino
from sqlalchemy.engine import create_engine

sqlstring = 'trino://{user}@{host}:{port}/'.format(
    user = os.environ['TRINO_USER'],
    host = os.environ['TRINO_HOST'],
    port = os.environ['TRINO_PORT']
)

ingest_catalog = 'osc_datacommons_dev'
ingest_schema = 'sandbox'
epa_table_prefix = 'epa_'
dera_table_prefix = 'dera_'

sqlargs = {
    'auth': trino.auth.JWTAuthentication(os.environ['TRINO_PASSWD']),
    'http_scheme': 'https',
    'catalog': ingest_catalog,
    'schema': ingest_schema,
}
engine = create_engine(sqlstring, connect_args = sqlargs)
connection = engine.connect()

import pandas as pd
import osc_ingest_trino as osc

In [None]:
cleanup = False

if cleanup:
    qres = engine.execute(f"show tables in {ingest_schema}")
    l = qres.fetchall()

    for schema in [ ingest_schema ]:
        print(schema)
        qres = engine.execute(f'show tables in {schema}')
        l = qres.fetchall()

        for table in l:
            qres = engine.execute(f'drop table {schema}.{table[0]}')
            display(qres.fetchall())

        qres = engine.execute(f'show tables in {schema}')
        display(qres.fetchall())

        qres = engine.execute(f'drop schema {schema}')
        display(qres.fetchall())


    qres = engine.execute('show schemas')
    qres.fetchall()

# Introduction to EPA GHG Reporting Program data (EPA_GHGRP)

The EPA's GHG Reporting Program (GHGRP) seems to be a gold standard in terms of creating a bottom-up list that's good enough to play a major role in top-down estimates.

`Direct_Emitters` are the lion's share of CO2 _emissions_.  `Suppliers` tracks fuels and products which, when used as intended, will create GHG emissions (by direct emitters or others).

In [None]:
qres = engine.execute(f"describe {epa_table_prefix}direct_emitters")
display(qres.fetchall())

In [None]:
qres = engine.execute(f"""
select format('%tY', year), format('%,.2f', sum(total_reported_emissions)/1e9) || ' Gt CO2e'
from {epa_table_prefix}direct_emitters group by year order by year desc""")
display(qres.fetchall())

Here's a look at how they stack up from a database perspective (we should also look at this in Superset).
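
When ranking sectors by emissions, the sort key should be the numeric total rather than its formatted label, because formatted strings compare character by character. The sketch below uses made-up totals (not GHGRP figures) to show how the two orderings can diverge.

```python
# Made-up sector totals in metric tons of CO2e (illustrative values only)
totals = {
    "Power Plants": 1_500_000_000.0,
    "Minerals": 110_000_000.0,
    "Waste": 95_000_000.0,
}

def label(tons):
    """Format a total the way the notebook's queries do: millions of tons, 2 decimals."""
    return f"{tons / 1e6:,.2f} Mt CO2e"

# Sorting on the text label compares characters, so '95.00' outranks '1,500.00'
by_label = sorted(totals, key=lambda s: label(totals[s]), reverse=True)
# Sorting on the number gives the intended ranking
by_value = sorted(totals, key=lambda s: totals[s], reverse=True)

print(by_label)
print(by_value)
```

In SQL the equivalent choice is to order by the aggregate expression itself and format only in the select list.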

In [None]:
qres = engine.execute(f"""
select count (*), latest_reported_industry_type_sectors,
       format('%,.2f', sum(total_reported_emissions)/1e6) || ' Mt CO2e' as MtCO2e
from {epa_table_prefix}direct_emitters
where year>=DATE('2020-01-01') and year<DATE('2021-01-01')
group by latest_reported_industry_type_sectors
order by sum(total_reported_emissions) desc
""")
display(qres.fetchall())

This looks at the `Minerals` industry (which includes cement).  We see that the top emitters have multiple facility locations.

In [None]:
qres = engine.execute(f"""
select count (*), parent_company_name, format('%5.2f', sum(total_reported_emissions)/1e6) || ' Mt CO2e' as MtCO2e
from {epa_table_prefix}direct_emitters, {epa_table_prefix}parent_company
where year>=DATE('2020-01-01') and year<DATE('2021-01-01') and year=reporting_year
      and latest_reported_industry_type_sectors='Minerals'
      and {epa_table_prefix}direct_emitters.facility_id={epa_table_prefix}parent_company.ghgrp_facility_id
group by parent_company_name
order by sum(total_reported_emissions) desc
limit 20
""")
display(qres.fetchall())

`Suppliers` are those who buy and sell GHG-emitting products, but they do not, themselves, cause the emissions.  They merely enable others to emit.

In [None]:
qres = engine.execute(f"describe {epa_table_prefix}suppliers")
display(qres.fetchall())

A quick summary of how many rows of data we have in `epa_ghgrp`.

68k rows in `direct_emitters`: lots of facilities  
103k rows in `parent_company`: lots of facility/owner relationships

In [None]:
qres = engine.execute(f'show tables in {ingest_schema}')
l = qres.fetchall()

# Keep only the EPA tables, then count the rows in each one
l = [t for t in l if t[0].startswith(epa_table_prefix)]
totalrows = 0
for e in l:
    s = f'select count (*) from {e[0]}'
    qres = engine.execute(s)
    rowcount = qres.fetchall()[0][0]
    totalrows += rowcount
    print(f"{rowcount:>6} <- {s}")

print(f'{totalrows} <- total rows')

# Reshaping tables to make them easier to chart

The key metric is total emissions (in metric tons of CO2e). Every table reports it in the column `total_reported_emissions` (paired with `total_reported_emissions_units`), so we give it a name specific to the source/process to keep the columns distinct when the tables are merged.

We also know that when building our final summary table, the sums feeding into it are all only one row per year.  We use `iat[0,1]` to access the 0th row and the 1st column (which will be named specifically to the source/process).  By using `iat`, we get a scalar value we can sum, instead of a Series object we'd have to `squeeze`.
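
The per-year summation can be sketched in plain Python, without pandas, to make the logic explicit: for each year reported by `direct_emitters`, add the single yearly total from every table that has a value for that year. The table names come from this notebook; the numbers are made up.

```python
# Per-table yearly totals: {table: {year: total_co2e}} (made-up numbers)
yearly = {
    "direct_emitters": {2019: 100.0, 2020: 90.0},
    "gathering_boosting": {2019: 10.0, 2020: 8.0},
    "sf6_from_elec_equip": {2020: 1.0},  # no 2019 value in this toy data
}

# One scalar per table and year, skipping tables that did not report that year
grand_total = {
    year: sum(table[year] for table in yearly.values() if year in table)
    for year in yearly["direct_emitters"]
}

print(grand_total)
```

Tables that lack a given year are skipped rather than treated as an error, which mirrors the `if year in q_dict[t].year.values` guard used in the pandas version.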

In [None]:
import pandas as pd

emission_tables = ['direct_emitters', 'onshore_oil_gas_prod', 'gathering_boosting',
                   'transmission_pipelines', 'ldc_direct_emissions', 'sf6_from_elec_equip']
tot_em_columns = []

q_dict = {}

for t in emission_tables:
    qres = engine.execute(f"describe {epa_table_prefix}{t}")
    tr = qres.fetchall()
    # Each table's total reported emissions are in column total_reported_emissions
    # (paired with total_reported_emissions_units).  We need to make unique names for this merge
    total_emission_cname = t+"_em"
    tot_em_columns.append(total_emission_cname)
    qres = engine.execute(f"""
select year, sum(total_reported_emissions), total_reported_emissions_units
from {epa_table_prefix}{t}
group by year, total_reported_emissions_units
""")
    q_dict[t] = pd.DataFrame(qres.fetchall(), columns=['year', total_emission_cname, total_emission_cname+"_units"])

In [None]:
# A function that excludes terms using SQL to say "and X!=Y"
def excl_text(excl):
    return ' and '.join([f"latest_reported_industry_type_sectors!='{e}'" for e in excl])

# A function that includes text that matches; SQL that says "or X like '%Y%'"
def incl_text(incl):
    return ' or '.join([f"latest_reported_industry_type_sectors like '%{e}%'" for e in incl])

t = 'direct_emitters'
qres = engine.execute(f"describe {epa_table_prefix}{t}")
t_cols = qres.fetchall()
total_emission_cname = t+"_em"

incl = [ 'Power', 'Petroleum']
qres = engine.execute(f"""
select year, sum(total_reported_emissions),  total_reported_emissions_units
from {epa_table_prefix}{t}
where {incl_text(incl)}
group by year, total_reported_emissions_units
""")

q_dict[t + f" (incl {','.join(incl)})"] = pd.DataFrame(qres.fetchall(),
                                                       columns=['year',
                                                                total_emission_cname + f" (matching {','.join(incl)})",
                                                                total_emission_cname + f" (matching {','.join(incl)})" + "_units"])

excl = [ 'Minerals', 'Other', 'Waste', 'Chemicals', 'Pulp and Paper,Waste',
        'Metals,Waste', 'Pulp and Paper']
qres = engine.execute(f"""
select year, sum(total_reported_emissions),  total_reported_emissions_units
from {epa_table_prefix}{t}
where {excl_text(excl)}
group by year, total_reported_emissions_units
""")
q_dict[t + f" (excl {','.join(excl)})"] = pd.DataFrame(qres.fetchall(),
                                                       columns=['year',
                                                                total_emission_cname + f" (excl {','.join(excl)})",
                                                                total_emission_cname + f" (excl {','.join(excl)})" + "_units"])

In [None]:
for t in emission_tables:
    qres = engine.execute(f"describe {epa_table_prefix}{t}")
    tr = qres.fetchall()
    total_emission_cname = t+"_em"
    qres = engine.execute(f"""
select year, sum(total_reported_emissions),  total_reported_emissions_units
from {epa_table_prefix}{t}
group by year, total_reported_emissions_units
""")
    q_dict[t] = pd.DataFrame(qres.fetchall(), columns=['year',
                                                       total_emission_cname,
                                                       total_emission_cname + "_units"])

grand_total = {}

for year in q_dict['direct_emitters'].year:
    grand_total[year] = sum([q_dict[t][q_dict[t].year==year].iat[0,1]
                             for t in emission_tables if year in q_dict[t].year.values])

df = pd.DataFrame.from_dict(grand_total, orient='index', columns=['total_co2e']).reset_index()
df.rename(columns={'index':'year'}, inplace=True)
# Grab the common unit that all these reports share
df['total_co2e_units'] = q_dict['direct_emitters'].iat[0,2]
q_dict['grand_total'] = df

This gem comes from https://stackoverflow.com/questions/44327999/python-pandas-merge-multiple-dataframes
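
To show what the `reduce` fold does, here is a minimal pure-Python sketch of the same idea: fold a list of per-year tables into one, keeping every year (an outer merge) and filling gaps with 0. The helper `outer_merge` is hypothetical and the numbers are made up; the notebook itself relies on `pd.merge`.

```python
from functools import reduce

def outer_merge(left, right):
    """Merge two {year: {column: value}} tables on year, keeping all years."""
    merged = {}
    for year in sorted(set(left) | set(right)):
        merged[year] = {**left.get(year, {}), **right.get(year, {})}
    return merged

tables = [
    {2019: {"direct_emitters_em": 100.0}, 2020: {"direct_emitters_em": 90.0}},
    {2020: {"sf6_from_elec_equip_em": 1.0}},
    {2019: {"gathering_boosting_em": 10.0}},
]

# reduce applies outer_merge pairwise: ((t0 merge t1) merge t2)
merged = reduce(outer_merge, tables)

# Fill missing cells with 0, as fillna(0) does after the pandas merge
columns = sorted({col for row in merged.values() for col in row})
filled = {year: {col: row.get(col, 0) for col in columns} for year, row in merged.items()}

print(filled)
```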

In [None]:
from functools import reduce

df_merged = reduce(lambda left,right: pd.merge(left,right,on=['year'], how='outer'), q_dict.values()).fillna(0)
df_merged.sort_values(by='year', ascending=False, inplace=True)
df_merged.index = pd.RangeIndex(len(df_merged.index))

In [None]:
df_merged

A summary table consolidating the totals from the GHGRP, plus three additional columns:
1. direct emitters that match "Power" or "Petroleum"
2. direct emitters that are not the top other industries
3. total co2e

In [None]:
df_merged.rename(columns={v:v.replace('_', ' ') for v in df_merged.columns.values})

# Cross-check with ESSD top-down dataset

A quick look at *just* CO2.  We'll look at CO2e in the next set of cells.

In [None]:
qres = engine.execute("""
select format('%tY', year), sector_title, format('%,.2f', sum(value)/1e9) || ' Gt CO2' as GtCO2 from essd_ghg_data
where sector_title='Energy systems' and gas='CO2' and year>=DATE('2010-01-01') and year<DATE('2021-01-01') and ISO='USA'
group by year, sector_title, gas order by year desc""")
qres.fetchall()

In [None]:
qres = engine.execute('describe essd_ghg_data')
qres.fetchall()

In [None]:
qres = engine.execute('describe essd_gwp100_data')
qres.fetchall()

A look at CO2e (presuming that's what GHG gives us from the GWP100 table) for the category `Energy systems`.
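
For reference, CO2e is obtained by weighting the mass of each gas by its 100-year global warming potential and summing. The sketch below assumes the IPCC AR5 GWP100 factors for CH4 and N2O; the ESSD table may be built with factors from a different assessment report, so treat the numbers as illustrative only.

```python
# GWP100 factors from IPCC AR5 (assumption: the dataset may use another report's values)
GWP100_AR5 = {"CO2": 1, "CH4": 28, "N2O": 265}

def co2e(masses_t):
    """Convert {gas: metric tons} into metric tons of CO2e."""
    return sum(GWP100_AR5[gas] * mass for gas, mass in masses_t.items())

# Made-up emissions for a single source, in metric tons of each gas
example = {"CO2": 1000.0, "CH4": 10.0, "N2O": 1.0}
print(co2e(example))  # 1000*1 + 10*28 + 1*265 = 1545.0
```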

In [None]:
qres = engine.execute("""
select format('%tY', year), sector_title, format('%,.2f', sum(GHG)/1e9) || ' Gt CO2e' as GtCO2e from essd_gwp100_data
where sector_title='Energy systems' and year>=DATE('2010-01-01') and year<DATE('2021-01-01') and ISO='USA'
group by year, sector_title order by year desc""")
qres.fetchall()

# Connect with economic data provided by US CENSUS All-sector Survey (2017)

In [None]:
qres = engine.execute("describe census_all_sector_survey_2017")
display(qres.fetchall())
qres = engine.execute("select * from census_all_sector_survey_2017 where naics2012='221112'")
display(qres.fetchall())


Exercise the connection to NAICS and sector information provided by US Department of Commerce (US_CENSUS)

In [None]:
# Show how many facilities are tagged with what primary NAICS codes

qres = engine.execute(f"""
select count (*), format('%tY', {epa_table_prefix}direct_emitters.year), primary_naics_code, naics2012_label
from {epa_table_prefix}direct_emitters, census_all_sector_survey_2017
where primary_naics_code=naics2012
      and census_all_sector_survey_2017.year='2017' and {epa_table_prefix}direct_emitters.year=DATE('2017-01-01')
group by {epa_table_prefix}direct_emitters.year, primary_naics_code, naics2012_label
order by count (*) desc limit 20
""")
display(qres.fetchall())

# More table reshaping: attribution estimation

In [None]:
df = pd.read_sql(f"""
select facility_id, year, latitude, longitude, latest_reported_industry_type_sectors, total_reported_emissions, total_reported_emissions_units
from {epa_table_prefix}direct_emitters""", engine)
df.facility_id = df.facility_id.astype('int64')
df.year = df.year.astype('datetime64[ns, UTC]')
df.total_reported_emissions = df.total_reported_emissions.astype('float64')
df.total_reported_emissions_units = df.total_reported_emissions_units.astype('string')
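# Assign the result back instead of filling in place on an attribute access
df['latest_reported_industry_type_sectors'] = df['latest_reported_industry_type_sectors'].fillna('Other')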

df['sector_groupings'] = pd.Series([f"{s[0]} ({len(s)+1})" if len(s)>1 else s[0] for s in df.latest_reported_industry_type_sectors.str.split(',')])

In [None]:
for sl in df.latest_reported_industry_type_sectors.str.split(','):
    # Ensure all primary (and if listed, secondary) sectors are represented
    if f's_{sl[0]}' not in df.columns:
        df[f's_{sl[0]}'] = 0.0
    if len(sl)>1 and f's_{sl[1]}' not in df.columns:
        df[f's_{sl[1]}'] = 0.0

In [None]:
attribution_vector = [ pd.Series([1.0]),
                       pd.Series([2.0/3.0, 1.0/3.0]),
                       pd.Series([0.5, 0.3, 0.2]),
                       pd.Series([0.4, 0.3, 0.2, 0.1]),
                       pd.Series([0.30, 0.25, 0.20, 0.15, 0.10]),
                       pd.Series([0.30, 0.24, 0.19, 0.14, 0.09, 0.04])]

def apply_attribution(x):
    sl = x.latest_reported_industry_type_sectors.split(',')
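    # Tertiary sectors not previously mentioned are silently converted to Other, keeping our attribution columns from exploding
    # dict.fromkeys removes duplicates while preserving order, so the primary sector keeps the largest share
    appropriate_columns = list(dict.fromkeys(f's_{s}' if f's_{s}' in x else 's_Other' for s in sl))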
    x[ appropriate_columns ] = x.total_reported_emissions * attribution_vector[len(appropriate_columns)-1].values
    return x

df_emitters = df.apply(apply_attribution, axis=1)

In [None]:
df_emitters[df_emitters.latest_reported_industry_type_sectors.str.contains(',')]

In [None]:
df_emitters[df_emitters.latest_reported_industry_type_sectors.str.count(',')>1]
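
The attribution rule above can be sketched in plain Python, assuming the first three weight vectors from `attribution_vector`: split the comma-separated sector list, map sectors without a known column to `Other`, drop duplicates while keeping the listed order, and give each remaining sector its share of the facility total. The function `attribute` and the sector names in the example are hypothetical.

```python
# First three weight vectors from attribution_vector (primary sector listed first)
WEIGHTS = [[1.0], [2.0 / 3.0, 1.0 / 3.0], [0.5, 0.3, 0.2]]

def attribute(total, sector_list, known_sectors):
    """Split a facility total across its listed sectors, primary sector first."""
    names = [s if s in known_sectors else "Other" for s in sector_list.split(",")]
    # dict.fromkeys de-duplicates and preserves the listed order
    ordered = list(dict.fromkeys(names))
    weights = WEIGHTS[len(ordered) - 1]
    return {name: total * w for name, w in zip(ordered, weights)}

known = {"Metals", "Waste", "Other"}
two_sectors = attribute(300.0, "Metals,Waste", known)
# Both unknown sectors collapse into a single "Other" share
collapsed = attribute(100.0, "Metals,Foo,Bar", known)

print(two_sectors)
print(collapsed)
```

Keeping the order matters because the weights are positional: the first listed sector receives the largest fraction.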

# Working with Materialized Views

For now, we use Database Tables, because the specific Trino connector we are using does not support Materialized Views.

Here's an example of a facility with many owners...

In [None]:
qres = engine.execute(f"""
select ghgrp_facility_id,frs_id_facility,lei,format('%tY', reporting_year),facility_name,
       facility_city,facility_state,parent_company_name,facility_naics_code
from {epa_table_prefix}parent_company where YEAR(reporting_year)=2020 and ghgrp_facility_id=1005071 order by lei""")
qres.fetchall()

...meaning 10 rows of data that fall outside our easy-to-aggregate data

In [None]:
qres = engine.execute(f"""
select facility_id,facility_name,total_reported_emissions,
       city,state,latitude,longitude,primary_naics_code,
       latest_reported_industry_type_subparts,latest_reported_industry_type_sectors,format('%tY', year)
from {epa_table_prefix}direct_emitters where facility_id=1005071 order by year""")
qres.fetchall()

Create actual materialized data from a large concatenation operation
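
As a minimal sketch of the ownership-weighted attribution, the example below uses Python's built-in `sqlite3` as a stand-in for Trino (an assumption made so the example runs anywhere; Trino's types and functions differ). Table and column names follow this notebook without the `epa_` prefix, and all rows are made up: one facility owned 60/40 by two parents, reporting in two emission tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table parent_company (ghgrp_facility_id int, reporting_year int, lei text,
                             parent_co_percent_ownership real);
create table direct_emitters (facility_id int, year int, total_reported_emissions real);
create table gathering_boosting (facility_id int, year int, total_reported_emissions real);
insert into parent_company values (1, 2020, 'LEI_A', 60.0), (1, 2020, 'LEI_B', 40.0);
insert into direct_emitters values (1, 2020, 1000.0);
insert into gathering_boosting values (1, 2020, 50.0);
""")

# One select per emission table, each scaling emissions by the ownership fraction
selects = [f"""
select ghgrp_facility_id, reporting_year, lei, '{tbl}' as table_source,
       parent_co_percent_ownership * 0.01 * total_reported_emissions as fractional_emissions
from parent_company as PC join {tbl} as ET
     on PC.ghgrp_facility_id = ET.facility_id and PC.reporting_year = ET.year
""" for tbl in ("direct_emitters", "gathering_boosting")]

# Concatenate the selects and materialize them as a regular table
con.execute("create table parent_attribution as " + " union all ".join(selects))

rows = con.execute("""
select lei, sum(fractional_emissions) from parent_attribution group by lei order by lei
""").fetchall()
print(rows)
```

Because the ownership percentages of a facility sum to 100, the fractional emissions across parents add back up to the facility total.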

In [None]:
import osc_ingest_trino as osc
import itertools

engine.execute(f"create schema if not exists {ingest_schema}")

# display([(x, y) for x, y in zip(emission_tables,tot_em_columns)])

emission_selects = [ f"""
select ghgrp_facility_id, reporting_year, lei, '{e_tbl}' as table_source,
         primary_naics_code, parent_co_percent_ownership * 0.01 * {e_col} as fractional_emissions,
         facility_naics_code, parent_company_name
    from {epa_table_prefix}parent_company as PC join {epa_table_prefix}{e_tbl} as ET on PC.ghgrp_facility_id=ET.facility_id and PC.reporting_year=ET.year
""" for e_tbl, e_col in zip(emission_tables,itertools.repeat('total_reported_emissions')) ]

qres = engine.execute(f"drop table if exists {epa_table_prefix}parent_attribution")
print(qres.fetchall())

sql = f"""
create table {epa_table_prefix}parent_attribution as {' union all '.join(emission_selects)}
"""

print(sql)

qres = engine.execute(sql)
print(qres.fetchall())

In [None]:
qres = engine.execute(f"describe {epa_table_prefix}parent_attribution")
display(qres.fetchall())

qres = engine.execute(f"""
select ghgrp_facility_id, YEAR(reporting_year), lei, table_source, format('%,.2f', fractional_emissions) || ' t CO2e' as metric
from {epa_table_prefix}parent_attribution""")
qres.fetchall()[::2000]

How many **_facilities owned by public companies_** match to corporate reports we can see using the SEC's DERA dataset?

See how many `PARENT_COMPANY` records have LEIs we know.  Note that there are about 8400 total facilities, so roughly 4 facilities are not covered by an LEI for each one that is.
There are 3K-4K distinctly named entities, so the average entity owns (at least partially) approximately 2-3 facilities.  It also means we know the LEIs of approximately half of the parent companies.

In [None]:
qres = engine.execute(f"""select count (*), DATE(reporting_year)
from (select lei, reporting_year from {epa_table_prefix}parent_company where LEI is not null group by lei, reporting_year)
group by reporting_year order by reporting_year desc""")
qres.fetchall()

In [None]:
qres = engine.execute(f"describe {dera_table_prefix}sub")
qres.fetchall()

In [None]:
qres = engine.execute(f"describe {dera_table_prefix}num")
qres.fetchall()

In [None]:
qres = engine.execute(f"""select count (*), DATE(reporting_year)
from {epa_table_prefix}parent_attribution PA join {dera_table_prefix}sub S on PA.lei=S.lei
where (form='10-K' or form='20-F' or form='40-F')
and YEAR(reporting_year)=2020 and YEAR(fy)=2020
and PA.lei is not null
group by PA.reporting_year
order by PA.reporting_year
""")
qres.fetchall()

We can tie these companies to ticker symbols...

In [None]:
qres = engine.execute(f"""select * from ticker limit 10""")
qres.fetchall()

How many distinct companies own these facilities (and what are their ticker symbols)?

In [None]:
qres = engine.execute(f"""
with leis as (select DISTINCT(S.lei), name, if(tname IS NULL, '<private>', tname) as ticker
              from {epa_table_prefix}parent_attribution PA join {dera_table_prefix}sub S on PA.lei=S.lei
                   join ticker on S.cik=ticker.cik
              where (form='10-K' or form='20-F' or form='40-F')
              and period>=DATE('2020-01-01')
              and period<DATE('2021-01-01'))
select count (*), ticker, leis.lei, name, format('%tY', reporting_year)
from {epa_table_prefix}parent_attribution PA join leis on PA.lei=leis.lei
where YEAR(reporting_year)=2020
group by leis.ticker, leis.lei, name, reporting_year
order by count(*) desc
-- limit 10
""")
ticker_list = qres.fetchall()
print(len(ticker_list))

Note that some companies have more than one ticker symbol!

In [None]:
ticker_list[0:50]

We can try to add up all the facilities for all the tickers, but that leads to counting duplicates for companies that have multiple ticker symbols (the total should be 2651, not 5746).
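
One way to avoid the double count is to key the facility counts by LEI instead of by ticker, so a company with several ticker symbols contributes once. The sketch below uses a simplified row shape `(facility_count, ticker, lei)` with made-up values; the real `ticker_list` rows carry additional fields.

```python
# Simplified rows in the spirit of ticker_list: (facility_count, ticker, lei)
ticker_rows = [
    (12, "AAA", "LEI_A"),
    (12, "AAA-PA", "LEI_A"),  # same company listed under a second ticker symbol
    (5, "BBB", "LEI_B"),
]

# Naive sum counts LEI_A's facilities twice
naive_total = sum(count for count, _, _ in ticker_rows)

# Keying by LEI keeps one count per company
per_lei = {lei: count for count, _, lei in ticker_rows}
deduped_total = sum(per_lei.values())

print(naive_total, deduped_total)
```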

In [None]:
sum([te[0] for te in ticker_list])

Sample data to cross-check LEI, Facility ID and EDGAR submission data

In [None]:
qres = engine.execute(f"""
select DISTINCT(S.lei), ghgrp_facility_id, adsh
              from {epa_table_prefix}parent_attribution PA join {dera_table_prefix}sub S on PA.lei=S.lei
              where YEAR(reporting_year)=2020
              and (form='10-K' or form='20-F' or form='40-F')
              and YEAR(period)=2020
              order by S.lei desc""")
l = qres.fetchall()
print(len(l))
display(l[::100])

Compute intensity in metric tons of CO2e per million dollars
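
As a worked example of the unit conversion, intensity in t CO2e per million dollars is emissions in metric tons divided by revenue in dollars, scaled by 1e6. The function name and the figures below are hypothetical.

```python
def intensity_t_per_musd(emissions_t_co2e, revenue_usd):
    """Emissions intensity in metric tons of CO2e per million dollars of revenue."""
    if revenue_usd <= 0:
        raise ValueError("revenue must be positive")
    return 1e6 * emissions_t_co2e / revenue_usd

# Made-up company: 2.5 Mt CO2e of attributed emissions on $10B of revenue
value = intensity_t_per_musd(2_500_000.0, 10_000_000_000.0)
print(f"{value:7.2f} t CO2e/$M")
```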

In [None]:
qres = engine.execute(f"""
select PA.lei, sic, floor(sic/100) as sic_2digit, format('%1$tY-%1$tm-%1$td', reporting_year),
       name, sum(fractional_emissions) as tot_co2e,
       uom || ' $M', round(max(value)/1e6,3) as tot_revenue,
       format('%7.2f', 1e6*sum(fractional_emissions)/sum(value)) || ' t CO2e/$M' as intensity
from {epa_table_prefix}parent_attribution as PA join {dera_table_prefix}sub as S on PA.lei=S.lei
     join {dera_table_prefix}num as N on S.adsh=N.adsh
where YEAR(reporting_year)=2020
and (form='10-K' or form='20-F' or form='40-F')
and YEAR(period)=2020
and YEAR(ddate)=2020
and coreg is NULL
and (N.tag='Revenues'
     or N.tag='RevenueFromContractWithCustomerIncludingAssessedTax'
     or N.tag='RevenueFromContractWithCustomerExcludingAssessedTax'
     or N.tag='RevenuesNetOfInterestExpense'
     or N.tag='RegulatedAndUnregulatedOperatingRevenue'
     or N.tag='RegulatedOperatingRevenuePipelines')
and N.qtrs=4
group by PA.lei, PA.reporting_year, sic, name, uom
order by sum(fractional_emissions)/sum(value) desc
-- limit 100
""")
rows = qres.fetchall()
print(len(rows))
display(rows[::5])

# A Deep Dive into outlier data

In [None]:
qreg=engine.execute(f"""
select DISTINCT(S.lei), ghgrp_facility_id, name, adsh
from {epa_table_prefix}parent_attribution PA join {dera_table_prefix}sub S on PA.lei=S.lei
where YEAR(reporting_year)=2020 and S.lei='549300O4B5CVWMKUES27'
and (form='10-K' or form='20-F' or form='40-F')
and YEAR(period)=2020""")
qreg.fetchall()

In [None]:
qreg=engine.execute(f"""
select DATE(reporting_year), format ('%,10.2f', sum(fractional_emissions)) || ' t CO2e' as metric
from {epa_table_prefix}parent_attribution
where lei='549300O4B5CVWMKUES27'
group by DATE(reporting_year)
""")
l = qreg.fetchall()
l

# GHGRP Direct Emitters include Cement and Steel Plants (which we can connect to SFI data)

In [None]:
qres = engine.execute("describe sfi_cement")
display(qres.fetchall())

In [None]:
qres = engine.execute("select count (*) from sfi_cement")
display(qres.fetchall())
qres = engine.execute("select count (*) from sfi_steel")
display(qres.fetchall())

# There are 105 US-located cement plants listed in the SFI report with parent LEIs
qres = engine.execute("select count (*), iso3 from sfi_cement where iso3='USA' group by iso3")
display(qres.fetchall())

qres = engine.execute(f"""
select owner_name, parent_name, lei, parent_lei, facility_id
from sfi_cement, {epa_table_prefix}direct_emitters, {epa_table_prefix}parent_company
where ghgrp_facility_id=facility_id
and reporting_year={epa_table_prefix}direct_emitters.year
and YEAR(reporting_year)=2020
and sfi_cement.iso3='USA'
and abs(sfi_cement.latitude-{epa_table_prefix}direct_emitters.latitude)<0.01
and abs(sfi_cement.longitude-{epa_table_prefix}direct_emitters.longitude)<0.01
""")
l = qres.fetchall()
print(f"{len(l)}: facilities/parent relationships matched in USA using lat/lon")

In [None]:
l[3::2]
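
The lat/lon match used above treats two records as the same site when both coordinates differ by less than 0.01 degrees, which is roughly 1.1 km in latitude. A minimal sketch of that test, with a hypothetical helper `same_site` and made-up coordinates:

```python
def same_site(a, b, tol_deg=0.01):
    """True when two (latitude, longitude) pairs agree within tol_deg on both axes."""
    return abs(a[0] - b[0]) < tol_deg and abs(a[1] - b[1]) < tol_deg

# Made-up coordinates for illustration
sfi_plant = (40.000, -75.000)
ghgrp_near = (40.005, -75.004)   # within tolerance on both axes
ghgrp_far = (40.020, -75.000)    # latitude differs by 0.02 degrees

print(same_site(sfi_plant, ghgrp_near))
print(same_site(sfi_plant, ghgrp_far))
```

A box test like this is simple to express in SQL, but the east-west distance covered by 0.01 degrees of longitude shrinks with latitude, so the effective tolerance is not uniform.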