## Oil and Gas production data

As mentioned earlier, the oil and gas production data has been obtained for three states: Texas, North Dakota and Oklahoma. These states are among the top oil and gas producers in the country, and Texas alone produces more oil than all US offshore fields combined.

Here, we read data from each of the states' regulatory bodies. The data are stored in the OilGasProduction folder.

In this notebook, the Texas data is processed.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from os import listdir

### Texas - Railroad Commission of Texas (RRC) data
The data were provided as a bulk data dump by Enigma's help desk and are stored in the RRCRawData folder. The files use the '.dsv' extension; they can be read like regular CSV files, but with '}' as the field separator.
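A minimal sketch of what reading a '}'-separated file looks like, using a small in-memory sample. The column names and rows are copied from the county table shown further down; the sample string itself is illustrative and not an excerpt of the raw file.

```python
import io

import pandas as pd

# Illustrative two-row sample in the '.dsv' layout: '}' separates the fields
sample = io.StringIO(
    "COUNTY_NO}COUNTY_NAME}DISTRICT_NAME\n"
    "363}PALO PINTO}7B\n"
    "367}PARKER}7B\n"
)

# A single-character separator is passed straight to sep, as in the cells below
df = pd.read_csv(sample, sep="}")
print(df)
```

Because the separator is a single character, pandas can use its default C parser; no regular-expression handling is involved.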

A number of files from the RRC are available, each aggregating the data in a different way (only the relevant files will be put on GitHub).

In [2]:
listdir('./OilGasProduction/RRCRawData')

['.ipynb_checkpoints',
 'ERSCountyTypology2015Edition.xls',
 'GP_COUNTY_DATA_TABLE.dsv',
 'GP_DATE_RANGE_CYCLE_DATA_TABLE.dsv',
 'GP_DISTRICT_DATA_TABLE.dsv',
 'Key_Economic_Indicators.csv',
 'OG_COUNTY_CYCLE_DATA_TABLE.dsv',
 'OG_COUNTY_LEASE_CYCLE_DATA_TABLE.dsv',
 'OG_DISTRICT_CYCLE_DATA_TABLE.dsv',
 'OG_FIELD_CYCLE_DATA_TABLE.dsv',
 'OG_FIELD_DW_DATA_TABLE.dsv',
 'OG_LEASE_CYCLE_DATA_TABLE.dsv',
 'OG_LEASE_CYCLE_DISP_DATA_TABLE.dsv',
 'OG_OPERATOR_CYCLE_DATA_TABLE.dsv',
 'OG_OPERATOR_DW_DATA_TABLE.dsv',
 'OG_REGULATORY_LEASE_DW_DATA_TABLE.dsv',
 'OG_SUMMARY_MASTER_LARGE_DATA_TABLE.dsv',
 'OG_SUMMARY_ONSHORE_LEASE_DATA_TABLE.dsv',
 'OG_WELL_COMPLETION_DATA_TABLE.dsv',
 'OG_WELL_CYCLE_DATA_TABLE.dsv',
 'operators.csv',
 'pdq-dump-user-manual_final_ada_1-3-2018.pdf',
 'pdqoperatorsproductiondisposition.doc',
 'Untitled.ipynb']

In [3]:
# The county data file contains the details of each county. The FIPS code in particular
# will help in combining the production data with the unemployment data table.
county_data = pd.read_csv('./OilGasProduction/RRCRawData/GP_COUNTY_DATA_TABLE.dsv',sep='}')

# The county_cycle data table contains historical production aggregated by county
county_cycle = pd.read_csv('./OilGasProduction/RRCRawData/OG_COUNTY_CYCLE_DATA_TABLE.dsv',sep='}')


In [4]:
county_data.head()

Unnamed: 0,COUNTY_NO,COUNTY_FIPS_CODE,COUNTY_NAME,DISTRICT_NO,DISTRICT_NAME,ON_SHORE_FLAG,ONSHORE_ASSC_CNTY_FLAG
0,363,363,PALO PINTO,8,7B,Y,N
1,367,367,PARKER,8,7B,Y,N
2,411,411,SAN SABA,8,7B,Y,N
3,417,417,SHACKELFORD,8,7B,Y,N
4,425,425,SOMERVELL,8,7B,Y,N


The county data is modified to prepend the Texas state FIPS code (48) to the three-digit county code, which makes each code unique across all US counties.

In [5]:
# Prepend the Texas state code (48) arithmetically so that county 1 becomes 48001, not 481
county_data['COUNTY_FIPS_CODE'] = 48000 + county_data['COUNTY_FIPS_CODE'].astype(int)

In [6]:
county_data.head()

Unnamed: 0,COUNTY_NO,COUNTY_FIPS_CODE,COUNTY_NAME,DISTRICT_NO,DISTRICT_NAME,ON_SHORE_FLAG,ONSHORE_ASSC_CNTY_FLAG
0,363,48363,PALO PINTO,8,7B,Y,N
1,367,48367,PARKER,8,7B,Y,N
2,411,48411,SAN SABA,8,7B,Y,N
3,417,48417,SHACKELFORD,8,7B,Y,N
4,425,48425,SOMERVELL,8,7B,Y,N
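Prepending the state code by string concatenation only produces a valid five-digit code when the county number already has three digits; county 1 (Anderson) would become 481 instead of 48001. The rows shown above all have three-digit codes, so the problem does not surface there. A short sketch contrasting the two approaches on a few sample county numbers (41 is just an example value):

```python
import pandas as pd

# Sample county numbers of different widths
county_no = pd.Series([1, 41, 363])

# String concatenation loses the zero padding for one- and two-digit codes
naive = ("48" + county_no.astype(str)).astype(int)

# Arithmetic prepending always yields a five-digit state + county code
fips = 48 * 1000 + county_no

print(naive.tolist())  # [481, 4841, 48363]
print(fips.tolist())   # [48001, 48041, 48363]
```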


In [7]:
area_codes = pd.read_csv('./UnemploymentData/BLS_AreaCodes.txt',sep='\t',index_col=False)
county_codes = area_codes[area_codes['area_type_code'] == 'F']
county_codes = county_codes.reset_index(drop=True)

In [8]:
county_codes['FIPS code'] = county_codes['area_code'].str[2:7]
county_codes.head()

Unnamed: 0,area_type_code,area_code,area_text,display_level,selectable,sort_sequence,FIPS code
0,F,CN0100100000000,"Autauga County, AL",0,T,31,1001
1,F,CN0100300000000,"Baldwin County, AL",0,T,32,1003
2,F,CN0100500000000,"Barbour County, AL",0,T,33,1005
3,F,CN0100700000000,"Bibb County, AL",0,T,34,1007
4,F,CN0100900000000,"Blount County, AL",0,T,35,1009
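In the BLS area codes, the characters after the 'CN' prefix hold the two-digit state and three-digit county FIPS code. Slicing them out keeps the code as a string, which matters for the lookup built next: the Texas codes generated later from the RRC county numbers are strings as well ('48001'), so the dictionary keys and the lookup values have the same type. A sketch on two area codes copied from the table above:

```python
import pandas as pd

# Two rows copied from the BLS area-code table shown above
codes = pd.DataFrame({
    "area_code": ["CN0100100000000", "CN0100300000000"],
    "area_text": ["Autauga County, AL", "Baldwin County, AL"],
})

# Characters 2-6 are the state + county FIPS code; .str slices every row at once
codes["FIPS code"] = codes["area_code"].str[2:7]

# String keys keep leading zeros such as the '0' in '01001'
fips_to_name = dict(zip(codes["FIPS code"], codes["area_text"]))
print(fips_to_name)
```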


In [9]:
county_FIPS_names = dict(zip(county_codes['FIPS code'],county_codes['area_text']))

In [10]:
county_cycle.head()

Unnamed: 0,COUNTY_NO,DISTRICT_NO,CYCLE_YEAR,CYCLE_MONTH,CYCLE_YEAR_MONTH,CNTY_OIL_PROD_VOL,CNTY_OIL_ALLOW,CNTY_OIL_ENDING_BAL,CNTY_GAS_PROD_VOL,CNTY_GAS_ALLOW,...,CNTY_CSGD_PROD_VOL,CNTY_CSGD_LIMIT,CNTY_CSGD_GAS_LIFT,CNTY_OIL_TOT_DISP,CNTY_GAS_TOT_DISP,CNTY_COND_TOT_DISP,CNTY_CSGD_TOT_DISP,COUNTY_NAME,DISTRICT_NAME,OIL_GAS_CODE
0,1,5,1993,1,199301,7355,,,0,,...,6347,,,,,,,ANDERSON,5,O
1,1,5,1993,2,199302,6312,,,0,,...,4919,,,,,,,ANDERSON,5,O
2,1,5,1993,3,199303,6222,,,0,,...,4973,,,,,,,ANDERSON,5,O
3,1,5,1993,4,199304,6139,,,0,,...,4410,,,,,,,ANDERSON,5,O
4,1,5,1993,5,199305,5785,,,0,,...,5961,,,,,,,ANDERSON,5,O


In [11]:
county_cycle['COUNTY_FIPS_CODE'] = ["48%03d" % x for x in county_cycle.COUNTY_NO]
county_cycle.head()

Unnamed: 0,COUNTY_NO,DISTRICT_NO,CYCLE_YEAR,CYCLE_MONTH,CYCLE_YEAR_MONTH,CNTY_OIL_PROD_VOL,CNTY_OIL_ALLOW,CNTY_OIL_ENDING_BAL,CNTY_GAS_PROD_VOL,CNTY_GAS_ALLOW,...,CNTY_CSGD_LIMIT,CNTY_CSGD_GAS_LIFT,CNTY_OIL_TOT_DISP,CNTY_GAS_TOT_DISP,CNTY_COND_TOT_DISP,CNTY_CSGD_TOT_DISP,COUNTY_NAME,DISTRICT_NAME,OIL_GAS_CODE,COUNTY_FIPS_CODE
0,1,5,1993,1,199301,7355,,,0,,...,,,,,,,ANDERSON,5,O,48001
1,1,5,1993,2,199302,6312,,,0,,...,,,,,,,ANDERSON,5,O,48001
2,1,5,1993,3,199303,6222,,,0,,...,,,,,,,ANDERSON,5,O,48001
3,1,5,1993,4,199304,6139,,,0,,...,,,,,,,ANDERSON,5,O,48001
4,1,5,1993,5,199305,5785,,,0,,...,,,,,,,ANDERSON,5,O,48001


In [12]:
# map the county FIPS codes to the county names using the dictionary county_FIPS_names
county_cycle['Mapped_Name'] = county_cycle['COUNTY_FIPS_CODE'].map(county_FIPS_names)
county_cycle['TIME'] = pd.to_datetime(county_cycle.CYCLE_YEAR_MONTH,format='%Y%m')

county_cycle.head()

Unnamed: 0,COUNTY_NO,DISTRICT_NO,CYCLE_YEAR,CYCLE_MONTH,CYCLE_YEAR_MONTH,CNTY_OIL_PROD_VOL,CNTY_OIL_ALLOW,CNTY_OIL_ENDING_BAL,CNTY_GAS_PROD_VOL,CNTY_GAS_ALLOW,...,CNTY_OIL_TOT_DISP,CNTY_GAS_TOT_DISP,CNTY_COND_TOT_DISP,CNTY_CSGD_TOT_DISP,COUNTY_NAME,DISTRICT_NAME,OIL_GAS_CODE,COUNTY_FIPS_CODE,Mapped_Name,TIME
0,1,5,1993,1,199301,7355,,,0,,...,,,,,ANDERSON,5,O,48001,"Anderson County, TX",1993-01-01
1,1,5,1993,2,199302,6312,,,0,,...,,,,,ANDERSON,5,O,48001,"Anderson County, TX",1993-02-01
2,1,5,1993,3,199303,6222,,,0,,...,,,,,ANDERSON,5,O,48001,"Anderson County, TX",1993-03-01
3,1,5,1993,4,199304,6139,,,0,,...,,,,,ANDERSON,5,O,48001,"Anderson County, TX",1993-04-01
4,1,5,1993,5,199305,5785,,,0,,...,,,,,ANDERSON,5,O,48001,"Anderson County, TX",1993-05-01
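Mapping through a dictionary silently yields NaN for any code without an entry, so counting the unmatched rows is a cheap check that every county received a name. A sketch on a toy lookup; the code 999 is invented to show what a miss looks like:

```python
import pandas as pd

# Toy version of the county_FIPS_names lookup
county_FIPS_names = {"48001": "Anderson County, TX", "48003": "Andrews County, TX"}

# County numbers as stored in the cycle table; 999 is a made-up code with no match
county_no = pd.Series([1, 3, 999])

# Zero-pad to three digits and prepend the Texas state code, giving keys like '48001'
fips = county_no.map(lambda n: "48%03d" % n)

# Series.map with a dict returns NaN wherever the key is missing
names = fips.map(county_FIPS_names)
print(names.isna().sum())  # 1
```

On the notebook's own table, the same check would be `county_cycle['Mapped_Name'].isna().sum()`.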


In [13]:
county_cycle.COUNTY_NO.nunique() # number of distinct counties in the table, to compare with the 254 counties in Texas

253
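The count alone does not say which county is absent. A set difference against the full list of expected codes pins down the gap. A sketch, assuming (as the county table above suggests) that `COUNTY_NO` equals the three-digit county FIPS code, and that the Texas county codes are the odd numbers 1 to 507; the observed series here is a toy stand-in with two codes removed:

```python
import pandas as pd

# Assumption: Texas county FIPS codes are the odd numbers 1, 3, ..., 507
expected = set(range(1, 508, 2))

# Toy stand-in for county_cycle.COUNTY_NO with two codes deliberately removed
observed = pd.Series(sorted(expected - {5, 301}))

# Codes that should exist but never appear in the data
missing = sorted(expected - set(observed.unique()))
print(len(expected), observed.nunique(), missing)  # 254 252 [5, 301]
```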

In [14]:
# Isolating just the oil production data
county_cycle_oil = county_cycle[['TIME','Mapped_Name','CNTY_OIL_PROD_VOL']]
county_cycle_oil.tail()

Unnamed: 0,TIME,Mapped_Name,CNTY_OIL_PROD_VOL
179013,2017-09-01,"Zavala County, TX",0
179014,2017-10-01,"Zavala County, TX",0
179015,2017-10-01,"Zavala County, TX",0
179016,2017-11-01,"Zavala County, TX",0
179017,2017-11-01,"Zavala County, TX",0


As seen in the above output, some counties have more than one oil production record for the same month (Zavala County, for example, has two rows each for 10/2017 and 11/2017). In that case, we keep the maximum reported oil production.
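The rule is easiest to see on a toy long-format table. The county name is reused from the output above; the volumes are invented for illustration:

```python
import pandas as pd

# Toy table with two rows for the same county and month
df = pd.DataFrame({
    "County_Name": ["Zavala County, TX"] * 3,
    "Date": ["10/2017", "10/2017", "11/2017"],
    "Oil_Production": [0, 1200, 950],
})

# Keep the largest reported volume for each (county, month) pair
dedup = df.groupby(["County_Name", "Date"], as_index=False)["Oil_Production"].max()
print(dedup)
```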

In [15]:
county_cycle_oil = county_cycle_oil.rename(columns={'TIME':'Date','Mapped_Name':'County_Name','CNTY_OIL_PROD_VOL':'Oil_Production'})

In [16]:
county_cycle_oil.Date = pd.to_datetime(county_cycle_oil.Date)

In [17]:
county_cycle_oil.Date = county_cycle_oil.Date.dt.strftime('%m/%Y')

In [18]:
county_cycle_oil = county_cycle_oil.groupby(['County_Name','Date']).agg({'Oil_Production':'max'})
county_cycle_oil.tail()

,,Oil_Production
County_Name,Date,
"Zavala County, TX",12/2012,390030
"Zavala County, TX",12/2013,544949
"Zavala County, TX",12/2014,817685
"Zavala County, TX",12/2015,871432
"Zavala County, TX",12/2016,612218


In [19]:
county_cycle_oil.reset_index(inplace=True)

In [20]:
county_cycle_oil = county_cycle_oil.pivot(index='Date',columns='County_Name',values='Oil_Production')

In [21]:
county_cycle_oil.head()

County_Name,"Anderson County, TX","Andrews County, TX","Angelina County, TX","Aransas County, TX","Archer County, TX","Armstrong County, TX","Atascosa County, TX","Austin County, TX","Bandera County, TX","Bastrop County, TX",...,"Willacy County, TX","Williamson County, TX","Wilson County, TX","Winkler County, TX","Wise County, TX","Wood County, TX","Yoakum County, TX","Young County, TX","Zapata County, TX","Zavala County, TX"
Date,,,,,,,,,,,,,,,,,,,,,
01/1993,121640.0,2866720.0,132.0,30998.0,229457.0,,74280.0,48319.0,,19685.0,...,72844.0,825.0,133441.0,414290.0,82683.0,848809.0,2728824.0,212978.0,7072.0,256497.0
01/1994,110684.0,2846724.0,148.0,33803.0,214768.0,,66752.0,45153.0,,18881.0,...,68147.0,1423.0,90654.0,391482.0,86178.0,804362.0,2643341.0,188740.0,5587.0,146641.0
01/1995,111589.0,2688326.0,133.0,28209.0,187377.0,,60447.0,31324.0,0.0,18120.0,...,57598.0,917.0,76739.0,398144.0,78929.0,707875.0,2576306.0,186605.0,5426.0,92445.0
01/1996,104492.0,2672855.0,193.0,18207.0,177123.0,,60092.0,41908.0,295.0,15495.0,...,54366.0,1239.0,62634.0,394219.0,64957.0,631908.0,2546692.0,169412.0,6703.0,68672.0
01/1997,97139.0,2544895.0,23488.0,17315.0,163035.0,,58672.0,28653.0,191.0,13201.0,...,54684.0,1215.0,43190.0,446820.0,55618.0,561135.0,2556121.0,162501.0,5147.0,48230.0
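Because `Date` is stored as an '%m/%Y' string, the pivoted index is sorted as text, month first: every January (01/1993, 01/1994, ...) comes before every February, as the head above shows. One way to keep the rows in chronological order is to pivot on real datetimes. A sketch on toy values (county name and volumes are made up):

```python
import pandas as pd

# Toy long table with '%m/%Y' strings, as produced by strftime('%m/%Y')
long_df = pd.DataFrame({
    "Date": ["01/1994", "02/1993", "01/1993"],
    "County_Name": ["Example County, TX"] * 3,
    "Oil_Production": [30, 20, 10],
})

# Pivoting on the strings sorts them as text, so both Januaries come first
as_text = long_df.pivot(index="Date", columns="County_Name", values="Oil_Production")
print(as_text.index.tolist())  # ['01/1993', '01/1994', '02/1993']

# Parsing with an explicit format gives datetimes, which sort chronologically
as_dates = long_df.assign(Date=pd.to_datetime(long_df["Date"], format="%m/%Y"))
wide = as_dates.pivot(index="Date", columns="County_Name", values="Oil_Production")
print(wide.index.strftime("%m/%Y").tolist())  # ['01/1993', '02/1993', '01/1994']
```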


In [22]:
# County-months with no reported production are treated as zero production
county_cycle_oil = county_cycle_oil.fillna(0)

In [35]:
trial = county_cycle_oil.reset_index()

In [40]:
trial['Date'] = pd.to_datetime(trial['Date'], format='%m/%Y')

In [43]:
trial.drop(trial[trial['Date'] > pd.to_datetime('12/2016')].index)

County_Name,Date,"Anderson County, TX","Andrews County, TX","Angelina County, TX","Aransas County, TX","Archer County, TX","Armstrong County, TX","Atascosa County, TX","Austin County, TX","Bandera County, TX",...,"Willacy County, TX","Williamson County, TX","Wilson County, TX","Winkler County, TX","Wise County, TX","Wood County, TX","Yoakum County, TX","Young County, TX","Zapata County, TX","Zavala County, TX"
0,1993-01-01,121640.0,2866720.0,132.0,30998.0,229457.0,0.0,74280.0,48319.0,0.0,...,72844.0,825.0,133441.0,414290.0,82683.0,848809.0,2728824.0,212978.0,7072.0,256497.0
1,1994-01-01,110684.0,2846724.0,148.0,33803.0,214768.0,0.0,66752.0,45153.0,0.0,...,68147.0,1423.0,90654.0,391482.0,86178.0,804362.0,2643341.0,188740.0,5587.0,146641.0
2,1995-01-01,111589.0,2688326.0,133.0,28209.0,187377.0,0.0,60447.0,31324.0,0.0,...,57598.0,917.0,76739.0,398144.0,78929.0,707875.0,2576306.0,186605.0,5426.0,92445.0
3,1996-01-01,104492.0,2672855.0,193.0,18207.0,177123.0,0.0,60092.0,41908.0,295.0,...,54366.0,1239.0,62634.0,394219.0,64957.0,631908.0,2546692.0,169412.0,6703.0,68672.0
4,1997-01-01,97139.0,2544895.0,23488.0,17315.0,163035.0,0.0,58672.0,28653.0,191.0,...,54684.0,1215.0,43190.0,446820.0,55618.0,561135.0,2556121.0,162501.0,5147.0,48230.0
5,1998-01-01,93245.0,2608477.0,8799.0,17131.0,147178.0,0.0,64108.0,24172.0,69.0,...,59170.0,918.0,57324.0,436871.0,52870.0,533404.0,2532826.0,168784.0,4063.0,33497.0
6,1999-01-01,84549.0,2388341.0,1764.0,11536.0,113330.0,0.0,55168.0,22535.0,0.0,...,53878.0,925.0,50444.0,399637.0,44716.0,486937.0,2413909.0,144890.0,2630.0,29569.0
7,2000-01-01,84004.0,2263432.0,1122.0,9862.0,127074.0,0.0,63252.0,22643.0,37.0,...,44777.0,685.0,38643.0,382455.0,42987.0,600177.0,2222715.0,149397.0,3777.0,46040.0
8,2001-01-01,72330.0,2247935.0,596.0,8488.0,115550.0,0.0,59051.0,19719.0,388.0,...,41346.0,494.0,32991.0,355853.0,36173.0,598115.0,2189659.0,126458.0,4497.0,35257.0
9,2002-01-01,70899.0,2216383.0,1078.0,6900.0,116560.0,0.0,54293.0,49283.0,111.0,...,42933.0,805.0,29084.0,348468.0,37263.0,526822.0,2109512.0,134019.0,3767.0,28279.0
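`DataFrame.drop` returns a new frame, so the cell above (In [43]) only displays the filtered table; `trial` itself still contains the months after 12/2016. If the cut-off is meant to persist, the result has to be reassigned. A sketch with a boolean mask on toy data (the column name and values are made up):

```python
import pandas as pd

# Toy wide table with one row per month
trial = pd.DataFrame({
    "Date": pd.to_datetime(["2016-11-01", "2016-12-01", "2017-01-01"]),
    "Example County, TX": [100.0, 110.0, 120.0],
})

cutoff = pd.Timestamp("2016-12-01")

# drop() without reassignment leaves trial unchanged
trial.drop(trial[trial["Date"] > cutoff].index)
print(len(trial))  # 3

# Reassigning a boolean-mask selection keeps rows up to and including the cut-off
trial = trial[trial["Date"] <= cutoff].reset_index(drop=True)
print(len(trial))  # 2
```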


In [24]:
county_cycle_oil.to_csv('./OilGasProduction/Texas/TexasOilProdCounty.csv')