## North Dakota Production Data

In this notebook, I wrangle the North Dakota oil production data. Over the past decade North Dakota has been one of the most prolific oil-producing states, second only to Texas. It was also the original subject of the Enigma Labs project that inspired this one.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Enigma provides the data as one large file with a row per well per month; it is not aggregated by county. That aggregation is done here.

In [2]:
nd_oil_prod = pd.read_csv('EnigmaProductionData_Curated.csv')

In [3]:
nd_oil_prod.head()

Unnamed: 0,api_number,month,oil,gas,water,serialid
0,33025010130000,10-2015,1320,1858,253,2459818
1,33025010140000,10-2015,1784,1743,826,2459819
2,33025010160000,10-2015,3178,2542,1090,2459820
3,33025010170000,10-2015,4288,3431,1754,2459821
4,33025010180000,10-2015,1591,1750,213,2459822


In [4]:
nd_oil_prod.describe()

Unnamed: 0,api_number,oil,gas,water,serialid
count,2741562.0,2741562.0,2741562.0,2741562.0,2741562.0
mean,33044850000000.0,1415.911,1770.973,2111.38,1370782.0
std,33972570000.0,2740.403,6809.53,4741.633,791420.9
min,33001000000000.0,-8.0,0.0,-64.0,1.0
25%,33011010000000.0,145.0,0.0,54.0,685391.2
50%,33053000000000.0,512.0,176.0,641.0,1370782.0
75%,33061020000000.0,1553.0,1375.0,2240.0,2056172.0
max,33105040000000.0,445723.0,812411.0,246834.0,2741562.0


For North Dakota, I did not have access to county information for each well. However, the well API numbers encode it: the 3rd through 5th characters of the API number are the county code.

In [5]:
# Characters 3-5 of the API number are the county code (kept as a zero-padded string)
nd_oil_prod['County_Code'] = nd_oil_prod.api_number.astype(str).str[2:5]
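
To make the slicing concrete, here is a small sketch on two API numbers. The first comes from the table above; the second is a made-up value. It assumes the layout described above, in which characters 1-2 are the API state code (33 for North Dakota) and characters 3-5 are the county code. The helper name `county_code_from_api` is hypothetical.

```python
import pandas as pd

def county_code_from_api(api_number):
    """Return the 3-character county code embedded in a well API number."""
    api = str(api_number)
    # Characters 1-2 are the API state code; characters 3-5 are the county code.
    return api[2:5]

# First value appears in the notebook output; the second is made up for illustration.
apis = pd.Series([33025010130000, 33105000000000])

# Vectorised equivalent of the helper, keeping the leading zero as part of a string.
codes = apis.astype(str).str[2:5]

# The county FIPS code is the FIPS state code for North Dakota (38) plus the county code.
fips = '38' + codes
```

Keeping the code as a string matters: converting `'025'` to an integer would drop the leading zero and break the later FIPS lookup.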

In [6]:
nd_oil_prod.head()

Unnamed: 0,api_number,month,oil,gas,water,serialid,County_Code
0,33025010130000,10-2015,1320,1858,253,2459818,025
1,33025010140000,10-2015,1784,1743,826,2459819,025
2,33025010160000,10-2015,3178,2542,1090,2459820,025
3,33025010170000,10-2015,4288,3431,1754,2459821,025
4,33025010180000,10-2015,1591,1750,213,2459822,025


In [7]:
nd_oil_prod.County_Code.nunique()

19

In [8]:
# Tab-separated BLS area codes; area_type_code 'F' marks counties and county equivalents
area_codes = pd.read_csv('../../Unemployment/BLS_AreaCodes.txt', sep='\t', index_col=False)
county_codes = area_codes[area_codes['area_type_code'] == 'F'].reset_index(drop=True)

In [9]:
# Characters 3-7 of the BLS area code are the 5-digit county FIPS code
county_codes['FIPS code'] = county_codes.area_code.str[2:7]
county_FIPS_names = dict(zip(county_codes['FIPS code'], county_codes['area_text']))

In [10]:
# 38 is the FIPS state code for North Dakota (the API state code is 33)
nd_oil_prod['County_FIPS_Code'] = "38" + nd_oil_prod.County_Code

In [11]:
nd_oil_prod['County_Name'] = nd_oil_prod['County_FIPS_Code'].map(county_FIPS_names)

In [12]:
nd_oil_prod.County_Name.nunique()

19
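
`Series.map` returns `NaN` for any key that is absent from the dictionary. The matching counts (19 codes, 19 names) indicate that every code was found. A more direct check, sketched here on made-up data with a deliberately unknown code, lists the codes that failed to map.

```python
import pandas as pd

# Hypothetical two-entry lookup; the notebook builds the full one from the BLS file.
fips_names = {'38025': 'Dunn County, ND', '38105': 'Williams County, ND'}

# '38999' is a made-up code that is not in the lookup.
wells = pd.DataFrame({'County_FIPS_Code': ['38025', '38105', '38999']})
wells['County_Name'] = wells['County_FIPS_Code'].map(fips_names)

# Codes whose name came back as NaN were not found in the lookup.
unmatched = wells.loc[wells['County_Name'].isna(), 'County_FIPS_Code'].unique()
```

An empty `unmatched` array would confirm that no rows are silently dropped by the later `groupby`, which excludes `NaN` keys by default.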

In [13]:
nd_oil_prod.County_Name.unique()

array(['Dunn County, ND', 'Golden Valley County, ND',
       'McHenry County, ND', 'McKenzie County, ND', 'McLean County, ND',
       'Mountrail County, ND', 'Renville County, ND', 'Slope County, ND',
       'Stark County, ND', 'Ward County, ND', 'Williams County, ND',
       'Billings County, ND', 'Bottineau County, ND', 'Bowman County, ND',
       'Divide County, ND', 'Burke County, ND', 'Hettinger County, ND',
       'Mercer County, ND', 'Adams County, ND'], dtype=object)

In [14]:
nd_oil_prod.head()

Unnamed: 0,api_number,month,oil,gas,water,serialid,County_Code,County_FIPS_Code,County_Name
0,33025010130000,10-2015,1320,1858,253,2459818,025,38025,"Dunn County, ND"
1,33025010140000,10-2015,1784,1743,826,2459819,025,38025,"Dunn County, ND"
2,33025010160000,10-2015,3178,2542,1090,2459820,025,38025,"Dunn County, ND"
3,33025010170000,10-2015,4288,3431,1754,2459821,025,38025,"Dunn County, ND"
4,33025010180000,10-2015,1591,1750,213,2459822,025,38025,"Dunn County, ND"


In [16]:
# Sum oil production over all wells in each county for each month
nd_oil_month = nd_oil_prod.groupby(['County_Name','month'])[['oil']].sum()

In [17]:
nd_oil_month = nd_oil_month.reset_index()

In [18]:
# The month strings look like '10-2015', so state the format explicitly
nd_oil_month.month = pd.to_datetime(nd_oil_month.month, format='%m-%Y')

In [19]:
nd_oil_month.head()

Unnamed: 0,County_Name,month,oil
0,"Adams County, ND",1995-01-01,20
1,"Adams County, ND",1996-01-01,0
2,"Adams County, ND",1997-01-01,0
3,"Adams County, ND",1995-02-01,0
4,"Adams County, ND",1996-02-01,0


In [20]:
nd_oil_month.columns = ['County_Name','Date','Oil_Production']

In [21]:
nd_oil_month.Date = nd_oil_month.Date.dt.strftime('%m/%Y')

In [22]:
nd_oil_month.head()

Unnamed: 0,County_Name,Date,Oil_Production
0,"Adams County, ND",01/1995,20
1,"Adams County, ND",01/1996,0
2,"Adams County, ND",01/1997,0
3,"Adams County, ND",02/1995,0
4,"Adams County, ND",02/1996,0


In [23]:
nd_oil_month.County_Name.unique()

array(['Adams County, ND', 'Billings County, ND', 'Bottineau County, ND',
       'Bowman County, ND', 'Burke County, ND', 'Divide County, ND',
       'Dunn County, ND', 'Golden Valley County, ND',
       'Hettinger County, ND', 'McHenry County, ND',
       'McKenzie County, ND', 'McLean County, ND', 'Mercer County, ND',
       'Mountrail County, ND', 'Renville County, ND', 'Slope County, ND',
       'Stark County, ND', 'Ward County, ND', 'Williams County, ND'],
      dtype=object)

In [24]:
nd_oil_month = nd_oil_month.groupby(['County_Name','Date']).agg({'Oil_Production':'max'})

In [25]:
nd_oil_month = nd_oil_month.reset_index()

In [26]:
nd_oil_month = nd_oil_month.pivot(index='Date',columns='County_Name',values='Oil_Production')
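
For comparison, the aggregation and pivot steps above can be collapsed into a single `pivot_table` call. This is only a sketch on made-up numbers, and it assumes that every well-month row should be summed. It also keeps the dates as real timestamps rather than strings, which differs from the notebook.

```python
import pandas as pd

# Made-up well-level rows: two wells in one county-month, one well in another.
wells = pd.DataFrame({
    'County_Name': ['Dunn County, ND', 'Dunn County, ND', 'Williams County, ND'],
    'month': ['10-2015', '10-2015', '11-2015'],
    'oil': [100, 50, 70],
})
wells['Date'] = pd.to_datetime(wells['month'], format='%m-%Y')

# One row per month, one column per county, summing wells and filling gaps with 0.
by_county = wells.pivot_table(
    index='Date', columns='County_Name', values='oil',
    aggfunc='sum', fill_value=0,
)
```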

In [27]:
nd_oil_month.head()

Date,"Adams County, ND","Billings County, ND","Bottineau County, ND","Bowman County, ND","Burke County, ND","Divide County, ND","Dunn County, ND","Golden Valley County, ND","Hettinger County, ND","McHenry County, ND","McKenzie County, ND","McLean County, ND","Mercer County, ND","Mountrail County, ND","Renville County, ND","Slope County, ND","Stark County, ND","Ward County, ND","Williams County, ND"
01/1952,,,,,,,,,,,,,,,,,,,7166.0
01/1953,,,293.0,,,,,,,,2683.0,,,48367.0,,,,,405895.0
01/1954,,9001.0,1635.0,,1255.0,,,,,,20670.0,,,84835.0,,,,,335948.0
01/1955,,21178.0,12933.0,,7099.0,,,,,,64412.0,,,220109.0,,,,,669576.0
01/1956,,18302.0,32257.0,,12525.0,,,,,,145492.0,,,261804.0,,,365.0,,862845.0


In [28]:
nd_oil_month = nd_oil_month.fillna(0)

### Add in missing counties
Only 19 of North Dakota's 53 counties have reported oil production. I assume the remaining counties have no oil production and fill in zeros for them.

In [29]:
nd_oil_month.columns

Index(['Adams County, ND', 'Billings County, ND', 'Bottineau County, ND',
       'Bowman County, ND', 'Burke County, ND', 'Divide County, ND',
       'Dunn County, ND', 'Golden Valley County, ND', 'Hettinger County, ND',
       'McHenry County, ND', 'McKenzie County, ND', 'McLean County, ND',
       'Mercer County, ND', 'Mountrail County, ND', 'Renville County, ND',
       'Slope County, ND', 'Stark County, ND', 'Ward County, ND',
       'Williams County, ND'],
      dtype='object', name='County_Name')

In [30]:
kept_names = nd_oil_month.columns
# North Dakota counties are the entries whose FIPS code starts with the state code 38
nd_county_names = [name for fips, name in county_FIPS_names.items() if fips.startswith('38')]

In [31]:
missing_counties = list(set(nd_county_names)-set(kept_names))

In [32]:
for x in missing_counties:
    # A scalar is broadcast to every row, giving a column of zeros
    nd_oil_month[x] = 0.0
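
An alternative to the loop is `DataFrame.reindex`, which adds the absent columns, fills them, and orders the columns in one call. The sketch below uses a toy table with made-up values; `all_counties` is a hypothetical stand-in for `nd_county_names`.

```python
import pandas as pd

# Toy pivot table with made-up values for two producing counties.
toy = pd.DataFrame(
    {'Dunn County, ND': [5.0, 7.0], 'Williams County, ND': [9.0, 11.0]},
    index=pd.Index(['01/2016', '02/2016'], name='Date'),
)

# Hypothetical full county list; in the notebook this would be nd_county_names.
all_counties = ['Williams County, ND', 'Adams County, ND', 'Dunn County, ND']

# Columns not present in toy are created and filled with 0.0; existing values are kept.
full = toy.reindex(columns=sorted(all_counties), fill_value=0.0)
```

One caveat: `reindex` also drops any existing column that is not in the target list, so the list must contain every county already in the table.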

In [33]:
nd_oil_month.sort_index(axis=1,inplace=True)
nd_oil_month.head()

Date,"Adams County, ND","Barnes County, ND","Benson County, ND","Billings County, ND","Bottineau County, ND","Bowman County, ND","Burke County, ND","Burleigh County, ND","Cass County, ND","Cavalier County, ND",...,"Slope County, ND","Stark County, ND","Steele County, ND","Stutsman County, ND","Towner County, ND","Traill County, ND","Walsh County, ND","Ward County, ND","Wells County, ND","Williams County, ND"
01/1952,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7166.0
01/1953,0.0,0.0,0.0,0.0,293.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,405895.0
01/1954,0.0,0.0,0.0,9001.0,1635.0,0.0,1255.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,335948.0
01/1955,0.0,0.0,0.0,21178.0,12933.0,0.0,7099.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,669576.0
01/1956,0.0,0.0,0.0,18302.0,32257.0,0.0,12525.0,0.0,0.0,0.0,...,0.0,365.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,862845.0


In [35]:
max(nd_oil_month.index)

'12/2016'
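
Because the index holds `MM/YYYY` strings, `max` and the row order compare text rather than dates, which is why the rows above run `01/1952`, `01/1953`, and so on. A minimal sketch on made-up values shows the difference and one way to recover chronological order by parsing the labels.

```python
import pandas as pd

# Toy frame with made-up values, indexed by 'MM/YYYY' strings as in the notebook.
toy = pd.DataFrame(
    {'Oil_Production': [10.0, 20.0, 30.0]},
    index=['12/2015', '01/2016', '02/2016'],
)

# String comparison ranks '12/2015' highest even though it is the earliest month.
lexical_max = max(toy.index)

# Parsing the labels into timestamps restores chronological comparison.
parsed = pd.to_datetime(toy.index, format='%m/%Y')
chronological_max = parsed.max().strftime('%m/%Y')

# Re-indexing by the parsed dates lets sort_index order the rows by time.
toy_sorted = toy.set_index(parsed).sort_index()
```

Here `lexical_max` is `'12/2015'` while `chronological_max` is `'02/2016'`. The string maximum matches the latest month only when the final year in the data includes a December.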

In [36]:
nd_oil_month.to_csv('NDOilProdCounty.csv')