# National generation and fuel consumption
The data in this notebook are generation and fuel consumption by fuel type for the entire US (EIA's state-level series, which are also summed to national totals). These values are larger than what would be calculated by summing facility-level data. Note that the fuel types are somewhat aggregated (coal rather than BIT, SUB, LIG, etc.), so multiplying fuel consumption by a single emission factor per fuel type introduces some error.

The code assumes that you have already downloaded an `ELEC.txt` file from [EIA's bulk download website](https://www.eia.gov/opendata/bulkfiles.php).

In [15]:
import json
import pandas as pd
import os
from os.path import join
import numpy as np
import sys

cwd = os.getcwd()
data_path = join(cwd, '..', 'Data storage')
idx = pd.IndexSlice

### Date string for filenames
This date is inserted into every filename that is read or written.

In [16]:
file_date = '2019-02-26'

In [17]:
%load_ext watermark



In [18]:
%watermark -iv -v

json        2.0.9
pandas      0.22.0
numpy       1.14.5
CPython 3.6.5
IPython 6.4.0


In [19]:
# Load the "autoreload" extension
%load_ext autoreload

# always reload modules marked with "%aimport"
%autoreload 1

# add the 'src' directory as one where we can import modules
src_dir = join(os.getcwd(), os.pardir, 'src')
sys.path.append(src_dir)



In [20]:
%aimport Analysis.index
from Analysis.index import add_datetime, add_quarter

## Read ELEC.txt file

In [21]:
cwd = os.getcwd()
path = join(data_path, 'Raw EIA bulk', '{} ELEC.txt'.format(file_date))
with open(path, 'r') as f:
    raw_txt = f.readlines()
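`readlines()` loads the whole bulk file into memory at once. A lower-memory alternative, assuming (as the `json.loads` calls below imply) that each line of `ELEC.txt` is one JSON object, is to stream the file and parse only the lines that pass a cheap text filter. `iter_matching_series` and `is_state_monthly_gen` are hypothetical helpers, and the two toy lines are made up.

```python
import io
import json


def iter_matching_series(lines, keep_line):
    """Yield parsed series dicts for raw lines where keep_line(line) is True.

    The cheap substring test runs before json.loads, so lines that are
    filtered out are never parsed.
    """
    for line in lines:
        if keep_line(line):
            yield json.loads(line)


def is_state_monthly_gen(line):
    # Same substring rules the notebook uses for gen_rows
    return ('ELEC.GEN' in line and 'series_id' in line
            and '-99.M' in line and 'ALL' not in line)


# Toy stand-in for an open ELEC.txt handle: two made-up one-line series
fake_file = io.StringIO(
    '{"series_id": "ELEC.GEN.SUN-OR-99.M", "data": [["201811", 35.3]]}\n'
    '{"series_id": "ELEC.GEN.ALL-OR-99.M", "data": [["201811", 5000.0]]}\n'
)

kept = list(iter_matching_series(fake_file, is_state_monthly_gen))
print([s['series_id'] for s in kept])  # ['ELEC.GEN.SUN-OR-99.M']
```

With the real file, pass the handle from `open(path)` in place of `fake_file`.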

In [70]:
type(raw_txt)

list

## Filter lines to only include total generation and fuel use
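Keep only monthly, all-sector series for each US state:
- series IDs ending in `-99.M` (sector 99 = all sectors, `M` = monthly)
- the measures `ELEC.GEN`, `ELEC.CONS_TOT_BTU`, and `ELEC.CONS_EG_BTU`
- not the `ALL` (all fuels) totals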

Fuel codes:
- WWW, wood and wood derived fuels
- WND, wind
- STH, solar thermal
- WAS, other biomass
- TSN, all solar
- SUN, utility-scale solar
- NUC, nuclear
- NG, natural gas
- PEL, petroleum liquids
- SPV, utility-scale solar photovoltaic
- PC, petroleum coke
- OTH, other
- COW, coal
- DPV, distributed photovoltaic
- OOG, other gases
- HPS, hydro pumped storage
- HYC, conventional hydroelectric
- GEO, geothermal
- AOR, other renewables (total)

In [73]:
exception_list = []  # series that line_to_df could not parse

def line_to_df(line):
    """
    Takes in a line (dictionary parsed from one JSON line of ELEC.txt) and
    returns a long-format dataframe with one row per monthly value.
    Series that can't be parsed are appended to `exception_list` and None
    is returned (pd.concat silently drops None entries).
    """
    for key in ['latlon', 'source', 'copyright', 'description',
                'geoset_id', 'iso3166', 'name', 'state']:
        line.pop(key, None)

    # Split the series_id up to extract information
    # Example: ELEC.GEN.SUN-OR-99.M -> type SUN, state OR, sector 99
    series_id = line['series_id']
    series_id_list = series_id.split('.')
    # Use the second to last item in list rather than third
    fuel_state_sector = series_id_list[-2].split('-')
    line['type'] = fuel_state_sector[0]
    line['state'] = fuel_state_sector[1]
    line['sector'] = fuel_state_sector[2]

    try:
        temp_df = pd.DataFrame(line)
        # Each 'data' entry is a [period, value] pair, with period as 'YYYYMM'
        periods = temp_df['data'].apply(lambda x: x[0])
        temp_df['year'] = periods.str[:4].astype(int)
        temp_df['month'] = periods.str[-2:].astype(int)
        temp_df['value'] = temp_df['data'].apply(lambda x: x[1])
        temp_df.drop('data', axis=1, inplace=True)
        return temp_df
    except Exception:  # keep unparseable series for later inspection
        exception_list.append(line)
        return None
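To make the expected input concrete, here is a simplified, hypothetical counterpart to `line_to_df` run on a made-up series. It assumes the layout the function relies on: `data` is a list of `[YYYYMM, value]` pairs, and the second-to-last dot-separated part of `series_id` is `<fuel>-<state>-<sector>`.

```python
import pandas as pd


def series_to_long(series):
    """Hypothetical sketch: one row per [period, value] pair,
    with the period split into integer year and month."""
    periods = [p for p, _ in series['data']]
    values = [v for _, v in series['data']]
    fuel, state, sector = series['series_id'].split('.')[-2].split('-')
    return pd.DataFrame({
        'series_id': series['series_id'],  # scalars are broadcast to every row
        'type': fuel,
        'state': state,
        'sector': sector,
        'year': [int(p[:4]) for p in periods],
        'month': [int(p[-2:]) for p in periods],
        'value': values,
    })


# Made-up series in the assumed ELEC.txt layout
toy = {'series_id': 'ELEC.GEN.SUN-OR-99.M',
       'data': [['201811', 35.32182], ['201810', None]]}
df = series_to_long(toy)
print(df[['type', 'state', 'year', 'month', 'value']])
```

Missing values (`None` in `data`) come through as nulls, which is why the notebook later inspects and drops NaN rows.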

In [23]:
states = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE",
          "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS",
          "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS",
          "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY",
          "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

In [24]:
state_geos = ['USA-{}'.format(state) for state in states]

In [25]:
type(json.loads(raw_txt[0]))

dict

In [26]:
json.loads(raw_txt[0])['geography']

'USA-MN'

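This cell was run out of order: it uses `gen_rows`, which is created in the filtering cell that follows.

In [76]: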
line_to_df(json.loads(gen_rows[0])).head()

| | end | f | geography | last_updated | sector | series_id | start | state | type | units | year | month | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | OR | SUN | thousand megawatthours | 2018 | 11 | 35.32182 |
| 1 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | OR | SUN | thousand megawatthours | 2018 | 10 | 48.17575 |
| 2 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | OR | SUN | thousand megawatthours | 2018 | 9 | 63.08021 |
| 3 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | OR | SUN | thousand megawatthours | 2018 | 8 | 68.24651 |
| 4 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | OR | SUN | thousand megawatthours | 2018 | 7 | 67.11798 |


In [27]:
# Keep monthly (.M), all-sector (-99) series and skip the all-fuel ('ALL') totals
gen_rows = [row for row in raw_txt if 'ELEC.GEN' in row
            and 'series_id' in row
            and '-99.M' in row
            and 'ALL' not in row]

# Also skip the national ('US-99.M') series; the match is case-sensitive,
# so the suffix must be an uppercase 'M'
total_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_TOT_BTU' in row
                   and 'series_id' in row
                   and '-99.M' in row
                   and 'ALL' not in row
                   and 'US-99.M' not in row]

eg_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_EG_BTU' in row
                and 'series_id' in row
                and '-99.M' in row
                and 'ALL' not in row
                and 'US-99.M' not in row]
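The substring tests above scan the entire JSON text of each line, so `'ALL' not in row` would also drop a series if "ALL" appeared anywhere else in its metadata. A stricter variant, sketched here as a hypothetical helper, applies the same rules to the `series_id` alone. It assumes IDs follow the `ELEC.<measure>.<fuel>-<geography>-<sector>.<frequency>` pattern seen in the outputs (e.g. `ELEC.GEN.SUN-OR-99.M`).

```python
import re

# Assumed layout: ELEC.<measure>.<fuel>-<geography>-<sector>.<frequency>
SERIES_ID_RE = re.compile(
    r'^ELEC\.(?P<measure>[A-Z_]+)\.(?P<fuel>[A-Z]+)-(?P<geo>[A-Z0-9]+)'
    r'-(?P<sector>[0-9]+)\.(?P<freq>[A-Z])$'
)


def keep_series(series_id, measure):
    """Hypothetical helper: apply the notebook's filter rules to a series_id."""
    m = SERIES_ID_RE.match(series_id)
    if m is None:
        return False  # e.g. plant-level IDs such as ELEC.PLANT.GEN.388-WAT-ALL.M
    return (m.group('measure') == measure
            and m.group('sector') == '99'    # all sectors
            and m.group('freq') == 'M'       # monthly
            and m.group('fuel') != 'ALL'     # skip all-fuel totals
            and m.group('geo') != 'US')      # skip national series


print(keep_series('ELEC.GEN.SUN-OR-99.M', 'GEN'))  # True
print(keep_series('ELEC.GEN.ALL-OR-99.M', 'GEN'))  # False
```

Used on parsed dicts, this would look like `[d for d in dicts if keep_series(d['series_id'], 'GEN')]`.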

## All generation by fuel

In [28]:
gen_dicts = [json.loads(row) for row in gen_rows]

In [29]:
gen_df = pd.concat([line_to_df(x) for x in gen_dicts
                    if x['geography'] in state_geos])

In [30]:
#drop
gen_df.head()

| | end | f | geography | last_updated | sector | series_id | start | type | units | year | month | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | SUN | thousand megawatthours | 2018 | 11 | 35.32182 |
| 1 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | SUN | thousand megawatthours | 2018 | 10 | 48.17575 |
| 2 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | SUN | thousand megawatthours | 2018 | 9 | 63.08021 |
| 3 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | SUN | thousand megawatthours | 2018 | 8 | 68.24651 |
| 4 | 201811 | M | USA-OR | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-OR-99.M | 201101 | SUN | thousand megawatthours | 2018 | 7 | 67.11798 |


In [31]:
gen_df['geography'].unique()

array(['USA-OR', 'USA-NC', 'USA-NV', 'USA-NY', 'USA-RI', 'USA-KY',
       'USA-MO', 'USA-MA', 'USA-NM', 'USA-ME', 'USA-NE', 'USA-KS',
       'USA-MI', 'USA-MN', 'USA-MS', 'USA-MT', 'USA-CO', 'USA-IL',
       'USA-CA', 'USA-AR', 'USA-CT', 'USA-AK', 'USA-IN', 'USA-MD',
       'USA-NH', 'USA-AL', 'USA-WV', 'USA-WY', 'USA-UT', 'USA-SD',
       'USA-TN', 'USA-PA', 'USA-VT', 'USA-WI', 'USA-SC', 'USA-TX',
       'USA-OH', 'USA-ND', 'USA-OK', 'USA-WA', 'USA-LA', 'USA-ID',
       'USA-HI', 'USA-IA', 'USA-VA', 'USA-FL', 'USA-AZ', 'USA-GA',
       'USA-NJ', 'USA-DE'], dtype=object)

Generation is reported in thousand MWh. Multiply the values by 1000 and change the units to MWh.

In [32]:
gen_df.loc[:,'value'] *= 1000
gen_df.loc[:,'units'] = 'megawatthours'

In [33]:
gen_df.rename(columns={'value': 'generation (MWh)'}, inplace=True)


In [34]:
gen_df.loc[gen_df.isnull().any(axis=1)]

| | end | f | geography | last_updated | sector | series_id | start | type | units | year | month | generation (MWh) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 201811 | M | USA-RI | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-RI-99.M | 201305 | SUN | megawatthours | 2018 | 11 | NaN |
| 2 | 201811 | M | USA-RI | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-RI-99.M | 201305 | SUN | megawatthours | 2018 | 9 | NaN |
| 6 | 201811 | M | USA-RI | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-RI-99.M | 201305 | SUN | megawatthours | 2018 | 5 | NaN |
| 7 | 201811 | M | USA-RI | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-RI-99.M | 201305 | SUN | megawatthours | 2018 | 4 | NaN |
| 8 | 201811 | M | USA-RI | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-RI-99.M | 201305 | SUN | megawatthours | 2018 | 3 | NaN |
| 9 | 201811 | M | USA-RI | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-RI-99.M | 201305 | SUN | megawatthours | 2018 | 2 | NaN |
| 10 | 201811 | M | USA-RI | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-RI-99.M | 201305 | SUN | megawatthours | 2018 | 1 | NaN |
| 0 | 201811 | M | USA-ME | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-ME-99.M | 201701 | SUN | megawatthours | 2018 | 11 | NaN |
| 1 | 201811 | M | USA-ME | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-ME-99.M | 201701 | SUN | megawatthours | 2018 | 10 | NaN |
| 2 | 201811 | M | USA-ME | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-ME-99.M | 201701 | SUN | megawatthours | 2018 | 9 | NaN |


In [35]:
gen_df.dropna(inplace=True)

In [36]:
gen_df.set_index(['type', 'year', 'month', 'geography'], inplace=True)

## Total fuel consumption

In [37]:
total_fuel_dict = [json.loads(row) for row in total_fuel_rows]

In [38]:
total_fuel_df = pd.concat([line_to_df(x) for x in total_fuel_dict
                           if x['geography'] in state_geos])

Fuel consumption is reported in million MMBtu. Multiply the values by 1,000,000 and change the units to MMBtu.

In [39]:
total_fuel_df.loc[:,'value'] *= 1E6
total_fuel_df.loc[:,'units'] = 'mmbtu'

In [40]:
total_fuel_df.rename(columns={'value': 'total fuel (mmbtu)'}, inplace=True)


In [41]:
total_fuel_df.set_index(['type', 'year', 'month', 'geography'], inplace=True)

Inspect, then drop, rows with missing (NaN) values

In [42]:
total_fuel_df.loc[total_fuel_df.isnull().any(axis=1)]

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | total fuel (mmbtu) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| COW | 2018 | 11 | USA-NH | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.COW-NH-99.M | 200101 | mmbtu | NaN |
| NG | 2018 | 11 | USA-NE | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-NE-99.M | 200101 | mmbtu | NaN |
| NG | 2018 | 9 | USA-NE | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-NE-99.M | 200101 | mmbtu | NaN |
| NG | 2018 | 4 | USA-NE | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-NE-99.M | 200101 | mmbtu | NaN |
| NG | 2018 | 9 | USA-MT | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-MT-99.M | 200101 | mmbtu | NaN |
| NG | 2018 | 4 | USA-MT | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-MT-99.M | 200101 | mmbtu | NaN |
| NG | 2010 | 11 | USA-MT | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-MT-99.M | 200101 | mmbtu | NaN |
| NG | 2010 | 7 | USA-MT | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-MT-99.M | 200101 | mmbtu | NaN |
| NG | 2010 | 5 | USA-MT | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-MT-99.M | 200101 | mmbtu | NaN |
| NG | 2010 | 3 | USA-MT | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.NG-MT-99.M | 200101 | mmbtu | NaN |


In [43]:
total_fuel_df = total_fuel_df.dropna()

## Electric generation fuel consumption

In [44]:
eg_fuel_dict = [json.loads(row) for row in eg_fuel_rows]

In [45]:
eg_fuel_df = pd.concat([line_to_df(x) for x in eg_fuel_dict
                        if x['geography'] in state_geos])

Fuel consumption is reported in million MMBtu. Multiply the values by 1,000,000 and change the units to MMBtu.

In [46]:
eg_fuel_df.loc[:,'value'] *= 1E6
eg_fuel_df.loc[:,'units'] = 'mmbtu'

In [47]:
eg_fuel_df.rename(columns={'value': 'elec fuel (mmbtu)'}, inplace=True)


In [48]:
eg_fuel_df.set_index(['type', 'year', 'month', 'geography'], inplace=True)

In [49]:
#drop
eg_fuel_df.head()

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | elec fuel (mmbtu) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PEL | 2018 | 11 | USA-CO | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-CO-99.M | 200101 | mmbtu | 28090.0 |
| PEL | 2018 | 10 | USA-CO | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-CO-99.M | 200101 | mmbtu | 7940.0 |
| PEL | 2018 | 9 | USA-CO | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-CO-99.M | 200101 | mmbtu | NaN |
| PEL | 2018 | 8 | USA-CO | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-CO-99.M | 200101 | mmbtu | 6770.0 |
| PEL | 2018 | 7 | USA-CO | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-CO-99.M | 200101 | mmbtu | 12040.0 |


I verified on [EIA's website](https://www.eia.gov/opendata/qb.php?category=400&sdid=ELEC.CONS_EG_BTU.PEL-MN-99.M) that the negative values below match what EIA publishes.

In [50]:
#drop
# Negative values only (NaN < 0 is False, so missing values are excluded)
eg_fuel_df.loc[eg_fuel_df['elec fuel (mmbtu)'] < 0]

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | elec fuel (mmbtu) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PEL | 2002 | 12 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -43000.0 |
| PEL | 2002 | 11 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -32000.0 |
| PEL | 2002 | 10 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -15000.0 |
| PEL | 2002 | 8 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -16000.0 |
| PEL | 2002 | 7 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -1000.0 |
| PEL | 2002 | 4 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -6000.0 |
| PEL | 2002 | 3 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -10000.0 |
| PEL | 2002 | 2 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -30000.0 |
| PEL | 2002 | 1 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_EG_BTU.PEL-MN-99.M | 200101 | mmbtu | -34000.0 |
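Selecting negative values with `~(x >= 0) & ~x.isnull()` gives the same rows as a plain `x < 0`, because comparisons with NaN evaluate to False in pandas. This toy Series (made-up values, including a NaN and a zero) shows the two masks agree.

```python
import numpy as np
import pandas as pd

# Made-up values, including a NaN and a zero
s = pd.Series([28090.0, np.nan, -43000.0, 0.0])

mask_long = ~(s >= 0) & ~s.isnull()  # "not non-negative" and "not missing"
mask_short = s < 0                    # NaN < 0 is False, so NaN is excluded anyway

print(mask_long.equals(mask_short))  # True
```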


In [51]:
eg_fuel_df.dropna(inplace=True)

## Combine three datasets
Need to estimate fuel use for OOG, because EIA doesn't include any (this is only ~2% of OOG fuel for electricity in 2015).

In [52]:
fuel_df = pd.concat([total_fuel_df, eg_fuel_df['elec fuel (mmbtu)']], axis=1)
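`pd.concat(..., axis=1)` aligns its inputs on the index (an outer join by default), so a row that exists in only one input gets NaN in the other's columns. A toy example with made-up values and the same four index levels:

```python
import pandas as pd

names = ['type', 'year', 'month', 'geography']
total = pd.DataFrame(
    {'total fuel (mmbtu)': [100.0, 50.0]},
    index=pd.MultiIndex.from_tuples(
        [('COW', 2018, 11, 'USA-AL'), ('NG', 2018, 11, 'USA-AL')], names=names))
elec = pd.Series(
    [90.0],
    index=pd.MultiIndex.from_tuples([('COW', 2018, 11, 'USA-AL')], names=names),
    name='elec fuel (mmbtu)')

# Rows are matched on the full index; the Series name becomes the new column
combined = pd.concat([total, elec], axis=1)
print(combined)
```

Here the `NG` row has no electricity-only value, so its `elec fuel (mmbtu)` is NaN.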

It isn't clear how this happens in EIA's data, but some fuel consumption values for electricity generation are negative.

In [53]:
#drop
fuel_df.loc[~(fuel_df['elec fuel (mmbtu)']>=0)]

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | total fuel (mmbtu) | elec fuel (mmbtu) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PEL | 2002 | 1 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 51000.0 | -34000.0 |
| PEL | 2002 | 2 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 62000.0 | -30000.0 |
| PEL | 2002 | 3 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 99000.0 | -10000.0 |
| PEL | 2002 | 4 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 84000.0 | -6000.0 |
| PEL | 2002 | 7 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 93000.0 | -1000.0 |
| PEL | 2002 | 8 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 66000.0 | -16000.0 |
| PEL | 2002 | 10 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 64000.0 | -15000.0 |
| PEL | 2002 | 11 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 49000.0 | -32000.0 |
| PEL | 2002 | 12 | USA-MN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.PEL-MN-99.M | 200101 | mmbtu | 50000.0 | -43000.0 |


In [54]:
#drop
fuel_df.loc[~(fuel_df['total fuel (mmbtu)']>=0)]

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | total fuel (mmbtu) | elec fuel (mmbtu) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|


### Add CO<sub>2</sub> emissions

The difficulty here is that EIA combines all types of coal fuel consumption together in the bulk download and API. Fortunately, the emission factors for different coal types are similar on an energy basis (BIT is 93.3 kg/mmbtu, SUB is 97.2 kg/mmbtu). I'm going to average the BIT and SUB factors rather than trying to do something more complicated. In 2015 BIT represented 45% of coal energy for electricity and SUB represented 48%.

The same issue applies to petroleum liquids. I use the average of the DFO and RFO factors, since those were the two largest shares of petroleum liquids.
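As a quick check on the simple coal average, the numbers quoted above can be compared with a share-weighted average. The weighting renormalizes over BIT and SUB only (an assumption; the remaining ~7% of coal energy is left out). The simple average, 95.25, is the `COW` value in `fuel_factors` further down.

```python
# Emission factors quoted in the text (kg CO2 per MMBtu)
bit, sub = 93.3, 97.2
# 2015 shares of coal energy for electricity quoted in the text
share_bit, share_sub = 0.45, 0.48

simple_avg = (bit + sub) / 2
# Weighted by energy share, renormalized over BIT + SUB only
weighted_avg = (share_bit * bit + share_sub * sub) / (share_bit + share_sub)

print(round(simple_avg, 3))    # 95.25
print(round(weighted_avg, 3))  # 95.313
print(round(100 * (weighted_avg - simple_avg) / simple_avg, 2))  # 0.07 (percent)
```

The two approaches differ by well under 0.1%, so the simple average is a reasonable shortcut here.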

In [55]:
path = join(data_path, 'Final emission factors.csv')
ef = pd.read_csv(path, index_col=0)

### Match general types with specific fuel codes


In [56]:
fuel_factors = pd.Series({'NG': ef.loc['NG', 'Fossil Factor'],
                          'PEL': ef.loc[['DFO', 'RFO'], 'Fossil Factor'].mean(),
                          'PC': ef.loc['PC', 'Fossil Factor'],
                          'COW': ef.loc[['BIT', 'SUB'], 'Fossil Factor'].mean(),
                          'OOG': ef.loc['OG', 'Fossil Factor']},
                         name='type')

In [57]:
#drop
fuel_factors

COW     95.250
NG      53.070
OOG     59.000
PC     102.100
PEL     75.975
Name: type, dtype: float64

In [58]:
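# Multiply fuel use (MMBtu) by each fuel type's factor (kg CO2/MMBtu),
# broadcasting across the 'type' index level. With fill_value=0, a value
# missing on one side is treated as 0, so fuel types without an entry in
# fuel_factors end up with zero CO2.
fuel_df['all fuel CO2 (kg)'] = (fuel_df['total fuel (mmbtu)']
                                .multiply(fuel_factors, level='type',
                                          fill_value=0))
fuel_df['elec fuel CO2 (kg)'] = (fuel_df['elec fuel (mmbtu)']
                                 .multiply(fuel_factors, level='type',
                                           fill_value=0))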
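`fill_value=0` matters here. Fuel types with no entry in `fuel_factors` (anything other than NG, PEL, PC, COW, and OOG) get a factor of 0 rather than NaN, so their CO2 comes out as zero. This toy version uses made-up fuel values and shows the factor being broadcast over the `type` level.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('COW', 2018, 'USA-AL'), ('NG', 2018, 'USA-AL'), ('WWW', 2018, 'USA-AL')],
    names=['type', 'year', 'geography'])
fuel = pd.Series([100.0, 200.0, 300.0], index=idx)  # made-up MMBtu values
factors = pd.Series({'COW': 95.25, 'NG': 53.07})     # kg CO2/MMBtu; no entry for WWW

# Each row is multiplied by the factor matching its 'type' label;
# the missing WWW factor is filled with 0 before multiplying
co2 = fuel.multiply(factors, level='type', fill_value=0)
print(co2)
```

Without `fill_value`, the `WWW` row would be NaN instead of 0.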

In [59]:
fuel_df.head()

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | total fuel (mmbtu) | elec fuel (mmbtu) | all fuel CO2 (kg) | elec fuel CO2 (kg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| COW | 2001 | 1 | USA-AK | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.COW-AK-99.M | 200101 | mmbtu | 1120000.0 | 872000.0 | 106680000.0 | 83058000.0 |
| COW | 2001 | 1 | USA-AL | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.COW-AL-99.M | 200101 | mmbtu | 67999000.0 | 66582000.0 | 6476905000.0 | 6341935000.0 |
| COW | 2001 | 1 | USA-AR | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.COW-AR-99.M | 200101 | mmbtu | 23099000.0 | 22700000.0 | 2200180000.0 | 2162175000.0 |
| COW | 2001 | 1 | USA-AZ | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.COW-AZ-99.M | 200101 | mmbtu | 35873000.0 | 35483000.0 | 3416903000.0 | 3379756000.0 |
| COW | 2001 | 1 | USA-CA | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.CONS_TOT_BTU.COW-CA-99.M | 200101 | mmbtu | 3652000.0 | 2008000.0 | 347853000.0 | 191262000.0 |


In [60]:
fuel_cols = ['total fuel (mmbtu)', 'elec fuel (mmbtu)',
              'all fuel CO2 (kg)', 'elec fuel CO2 (kg)']
gen_fuel_df = pd.concat([gen_df, fuel_df[fuel_cols]], axis=1)

Add datetime and quarter columns

In [61]:
add_quarter(gen_fuel_df)
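`add_quarter` comes from the project's own `Analysis.index` module, which is not shown here. Judging from the output, it adds a `datetime` column (the first day of each month) and a calendar `quarter` column. The hypothetical stand-in below produces columns like these from the `year` and `month` index levels. It is a sketch, not the module's actual code.

```python
import numpy as np
import pandas as pd


def add_quarter_sketch(df):
    """Hypothetical stand-in for Analysis.index.add_quarter (not its real code).

    Reads the 'year' and 'month' index levels and adds, in place:
    - 'datetime': the first day of each month
    - 'quarter': the calendar quarter (1-4)
    """
    years = np.asarray(df.index.get_level_values('year'))
    months = np.asarray(df.index.get_level_values('month'))
    dates = pd.to_datetime(pd.DataFrame({'year': years, 'month': months, 'day': 1}))
    df['datetime'] = dates.values  # assign positionally, not by index alignment
    df['quarter'] = (months - 1) // 3 + 1


idx = pd.MultiIndex.from_tuples(
    [('COW', 2001, 1, 'USA-AL'), ('COW', 2018, 11, 'USA-AL')],
    names=['type', 'year', 'month', 'geography'])
toy = pd.DataFrame({'generation (MWh)': [100.0, 200.0]}, index=idx)  # made-up values
add_quarter_sketch(toy)
print(toy[['datetime', 'quarter']])
```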

In [62]:
gen_fuel_df.head()

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | generation (MWh) | total fuel (mmbtu) | elec fuel (mmbtu) | all fuel CO2 (kg) | elec fuel CO2 (kg) | datetime | quarter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AOR | 2001 | 1 | USA-AK | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.AOR-AK-99.M | 200101 | megawatthours | 87.0 | NaN | NaN | NaN | NaN | 2001-01-01 | 1 |
| AOR | 2001 | 1 | USA-AL | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.AOR-AL-99.M | 200101 | megawatthours | 401167.59 | NaN | NaN | NaN | NaN | 2001-01-01 | 1 |
| AOR | 2001 | 1 | USA-AR | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.AOR-AR-99.M | 200101 | megawatthours | 136530.37 | NaN | NaN | NaN | NaN | 2001-01-01 | 1 |
| AOR | 2001 | 1 | USA-AZ | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.AOR-AZ-99.M | 200101 | megawatthours | 453.0 | NaN | NaN | NaN | NaN | 2001-01-01 | 1 |
| AOR | 2001 | 1 | USA-CA | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.AOR-CA-99.M | 200101 | megawatthours | 1717398.41 | NaN | NaN | NaN | NaN | 2001-01-01 | 1 |


There are no records with positive fuel use but no generation, so missing generation values can be set to zero.
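The claim above can be checked before filling missing generation with zero. Count rows whose generation is missing while total fuel use is positive. This sketch uses a made-up frame with the notebook's column names. On the real `gen_fuel_df` the count should be zero. The toy frame includes one offending row so the check has something to find.

```python
import numpy as np
import pandas as pd

# Made-up frame using the notebook's column names; the last row is a deliberate offender
toy = pd.DataFrame({
    'generation (MWh)': [1000.0, np.nan, np.nan, np.nan],
    'total fuel (mmbtu)': [9000.0, np.nan, 0.0, 500.0],
})

# Missing generation while total fuel use is positive (NaN > 0 evaluates to False)
suspect = toy['generation (MWh)'].isnull() & (toy['total fuel (mmbtu)'] > 0)
print(int(suspect.sum()))  # 1
print(toy.loc[suspect])
```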

In [63]:
gen_fuel_df['generation (MWh)'] = gen_fuel_df['generation (MWh)'].fillna(0)

In [64]:
gen_fuel_df.loc['COW',:].head()

| year | month | geography | end | f | last_updated | sector | series_id | start | units | generation (MWh) | total fuel (mmbtu) | elec fuel (mmbtu) | all fuel CO2 (kg) | elec fuel CO2 (kg) | datetime | quarter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | 1 | USA-AK | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.COW-AK-99.M | 200101 | megawatthours | 46903.0 | 1120000.0 | 872000.0 | 106680000.0 | 83058000.0 | 2001-01-01 | 1 |
| 2001 | 1 | USA-AL | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.COW-AL-99.M | 200101 | megawatthours | 6557913.0 | 67999000.0 | 66582000.0 | 6476905000.0 | 6341935000.0 | 2001-01-01 | 1 |
| 2001 | 1 | USA-AR | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.COW-AR-99.M | 200101 | megawatthours | 2149808.0 | 23099000.0 | 22700000.0 | 2200180000.0 | 2162175000.0 | 2001-01-01 | 1 |
| 2001 | 1 | USA-AZ | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.COW-AZ-99.M | 200101 | megawatthours | 3418454.0 | 35873000.0 | 35483000.0 | 3416903000.0 | 3379756000.0 | 2001-01-01 | 1 |
| 2001 | 1 | USA-CA | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.COW-CA-99.M | 200101 | megawatthours | 199857.0 | 3652000.0 | 2008000.0 | 347853000.0 | 191262000.0 | 2001-01-01 | 1 |



In [77]:
gen_fuel_df.sample(10)

| type | year | month | geography | end | f | last_updated | sector | series_id | start | units | generation (MWh) | total fuel (mmbtu) | elec fuel (mmbtu) | all fuel CO2 (kg) | elec fuel CO2 (kg) | datetime | quarter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WAS | 2017 | 4 | USA-OK | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.WAS-OK-99.M | 200101 | megawatthours | 4824.1 | NaN | NaN | NaN | NaN | 2017-04-01 | 2 |
| AOR | 2001 | 11 | USA-TN | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.AOR-TN-99.M | 200101 | megawatthours | 71638.0 | NaN | NaN | NaN | NaN | 2001-11-01 | 4 |
| OTH | 2004 | 12 | USA-IA | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.OTH-IA-99.M | 200101 | megawatthours | 997.2 | NaN | NaN | NaN | NaN | 2004-12-01 | 4 |
| HYC | 2001 | 5 | USA-ME | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.HYC-ME-99.M | 200101 | megawatthours | 288948.0 | NaN | NaN | NaN | NaN | 2001-05-01 | 2 |
| WWW | 2016 | 3 | USA-NC | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.WWW-NC-99.M | 200101 | megawatthours | 145610.75 | NaN | NaN | NaN | NaN | 2016-03-01 | 1 |
| SUN | 2011 | 12 | USA-TX | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.SUN-TX-99.M | 200101 | megawatthours | 1035.09 | NaN | NaN | NaN | NaN | 2011-12-01 | 4 |
| WAS | 2002 | 4 | USA-AZ | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.WAS-AZ-99.M | 200101 | megawatthours | 3672.37 | NaN | NaN | NaN | NaN | 2002-04-01 | 2 |
| OTH | 2007 | 5 | USA-OH | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.OTH-OH-99.M | 200201 | megawatthours | 80.55 | NaN | NaN | NaN | NaN | 2007-05-01 | 2 |
| WAS | 2012 | 1 | USA-CT | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.WAS-CT-99.M | 200101 | megawatthours | 53386.67 | NaN | NaN | NaN | NaN | 2012-01-01 | 1 |
| NUC | 2006 | 8 | USA-IA | 201811 | M | 2019-01-25T18:54:34-05:00 | 99 | ELEC.GEN.NUC-IA-99.M | 200101 | megawatthours | 430874.0 | NaN | NaN | NaN | NaN | 2006-08-01 | 3 |


### Export data

State-level

In [65]:
path = join(data_path, 'EIA state-level gen fuel CO2 {}.csv'.format(file_date))
gen_fuel_df.to_csv(path)

National totals

In [66]:
nat_gen_fuel = gen_fuel_df.groupby(['type', 'year', 'month']).sum()
add_quarter(nat_gen_fuel)
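The `groupby(...).sum()` collapses the `geography` level, summing every state's values for each fuel type and month. The pandas version pinned above silently drops text columns such as `units` from the sum. Newer pandas releases no longer drop non-numeric columns by default and may try to sum them or raise an error, so selecting the numeric columns explicitly is the safer pattern. This toy frame uses made-up values.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('COW', 2018, 11, 'USA-AL'), ('COW', 2018, 11, 'USA-AR'), ('NG', 2018, 11, 'USA-AL')],
    names=['type', 'year', 'month', 'geography'])
state = pd.DataFrame({
    'generation (MWh)': [100.0, 50.0, 30.0],
    'total fuel (mmbtu)': [1000.0, 500.0, np.nan],
    'units': ['megawatthours'] * 3,  # text column that should not be summed
}, index=idx)

num_cols = ['generation (MWh)', 'total fuel (mmbtu)']
# min_count=1 keeps an all-NaN group as NaN instead of 0
national = state.groupby(level=['type', 'year', 'month'])[num_cols].sum(min_count=1)
print(national)
```

With the default `sum()`, an all-NaN group sums to 0. That is why the `WWW` rows in the national output below show 0.0 for fuel and CO2.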

In [67]:
nat_gen_fuel.tail()

| type | year | month | generation (MWh) | total fuel (mmbtu) | elec fuel (mmbtu) | all fuel CO2 (kg) | elec fuel CO2 (kg) | quarter | datetime |
|---|---|---|---|---|---|---|---|---|---|
| WWW | 2018 | 7 | 3708842.07 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 2018-07-01 |
| WWW | 2018 | 8 | 3542657.09 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 2018-08-01 |
| WWW | 2018 | 9 | 3264233.97 | 0.0 | 0.0 | 0.0 | 0.0 | 3 | 2018-09-01 |
| WWW | 2018 | 10 | 3250832.52 | 0.0 | 0.0 | 0.0 | 0.0 | 4 | 2018-10-01 |
| WWW | 2018 | 11 | 3254939.17 | 0.0 | 0.0 | 0.0 | 0.0 | 4 | 2018-11-01 |


In [68]:
path = join(data_path, 'Derived data',
            'EIA state-level gen fuel CO2 {}.csv'.format(file_date))
gen_fuel_df.to_csv(path)

In [69]:
path = join(data_path, 'Derived data',
            'EIA country-wide gen fuel CO2 {}.csv'.format(file_date))
nat_gen_fuel.to_csv(path)