# Montana Board of Oil and Gas Conservation

The purpose of this notebook is to outline how to collect oil and gas data from the state of Montana, transform it into a form useful for analysis, and perform some initial exploratory data analysis (EDA).

# Data Collection
Start by downloading the zip archive from the URL below and extracting the three data files we need.

In [1]:
# importing libraries
import os
import shutil
import zipfile
from urllib.request import urlopen

import pandas as pd

url = 'http://bogc.dnrc.mt.gov/production/historical.zip'
file_name = 'historical.zip'

# downloading the zip file from the URL
with urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

# extracting the required files only after the download is fully written and closed
with zipfile.ZipFile(file_name) as zf:
    zf.extract('histLeaseProd.tab')
    zf.extract('histprodwell.tab')
    zf.extract('histWellData.tab')

# deleting the zip file from the directory
os.remove(file_name)

# loading each tab-delimited file; low_memory=False parses each file in one pass,
# so pandas infers a single dtype per column instead of warning about mixed types
lease_prod_df = pd.read_csv('histLeaseProd.tab', sep='\t', low_memory=False)
well_prod_df = pd.read_csv('histprodwell.tab', sep='\t', low_memory=False)
well_data_df = pd.read_csv('histWellData.tab', sep='\t', low_memory=False)
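If the archive layout ever changes, `zf.extract` fails on the first missing name, possibly after some files were already written. A minimal sketch of a hypothetical helper, `extract_members` (not part of the original notebook), that checks every expected member before extracting any of them:

```python
import os
import tempfile
import zipfile


def extract_members(zip_path, members, dest_dir="."):
    """Extract the named members from a zip archive.

    Hypothetical helper: every name is checked against the archive listing
    before anything is written, so a partial extraction never happens.
    """
    with zipfile.ZipFile(zip_path) as zf:
        available = set(zf.namelist())
        missing = [name for name in members if name not in available]
        if missing:
            raise KeyError(f"not found in {zip_path}: {missing}")
        return [zf.extract(name, path=dest_dir) for name in members]


# Usage with a small stand-in archive; in the notebook, zip_path would be
# 'historical.zip' and members the three .tab file names.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("histLeaseProd.tab", "Lease_Unit\tOil_Prod\n2\t0.0\n")
    paths = extract_members(zip_path, ["histLeaseProd.tab"], dest_dir=tmp)
    extracted_ok = os.path.exists(paths[0])
    print(extracted_ok)
```

Checking `namelist()` up front trades one extra listing call for an all-or-nothing extraction.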


# View Data
Let's start by looking at the three flat files extracted from the .zip archive. `lease_prod_df` contains production reported on a lease basis, `well_prod_df` contains monthly production per well, and `well_data_df` contains well information (producing status, field, drill profile such as horizontal or vertical, etc.).
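The three files share keys, but the headers below spell them inconsistently: the well identifier is `API_WELLNO` in the well production file and `API_WellNo` in the well data file, and the operator number is `OPNO` versus `OpNo`. A minimal sketch of one optional normalization, lower-casing every column name so shared keys line up (the helper name and toy values are illustrative assumptions, and renaming changes the names any later cell would use):

```python
import pandas as pd


def normalize_columns(df):
    """Return a copy of df with lower-cased column names."""
    return df.rename(columns=str.lower)


# Toy frames that reuse column names from the file headers
# (values are illustrative, not taken from the real files).
well_prod = pd.DataFrame({"API_WELLNO": [25035061990000], "OPNO": [1770], "Lease_Unit": [2593.0]})
well_data = pd.DataFrame({"API_WellNo": [25035061990000], "OpNo": [1770], "Slant": ["V"]})

wp, wd = normalize_columns(well_prod), normalize_columns(well_data)
shared = sorted(set(wp.columns) & set(wd.columns))
print(shared)  # ['api_wellno', 'opno']
```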

In [2]:
lease_prod_df.head()

| | Lease_Unit | Rpt_Date | Dt_Receive | Del_Rpt | Amnd_Rpt | OpNo | CoName | StartIvn_OilCd | Oil_Prod | Gas_Prod | ... | WtrInj | WtrTo_Pit | Other_Oil | Other_Gas | Other_Wtr | Dt_Amend | Lease_Update | No_ProdWells | No_SIWells | Dt_Mod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 01/31/2001 | 03/15/2001 | False | False | 5385 | EnCana Oil & Gas (USA) Inc. | 0.0 | 0.0 | 353.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | False | 1.0 | 0.0 | 05/18/2001 |
| 1 | 3 | 01/31/2001 | 03/15/2001 | False | False | 5385 | EnCana Oil & Gas (USA) Inc. | 0.0 | 0.0 | 69.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | False | 1.0 | 0.0 | 05/18/2001 |
| 2 | 4 | 01/31/2001 | 03/05/2001 | False | False | 6681 | Samedan Oil Corporation | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | False | 0.0 | 1.0 | 03/06/2001 |
| 3 | 5 | 01/31/2001 | 03/05/2001 | False | False | 6681 | Samedan Oil Corporation | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | False | 0.0 | 1.0 | 03/06/2001 |
| 4 | 6 | 01/31/2001 | 03/05/2001 | False | False | 6681 | Samedan Oil Corporation | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | False | 0.0 | 1.0 | 03/06/2001 |


In [5]:
lease_prod_df.describe()

| | Lease_Unit | OpNo | StartIvn_OilCd | Oil_Prod | Gas_Prod | Wtr_Prod | Oil_Sold | Gas_Sold | OilSpill | WtrSpill | ... | UseGas | OilInj | GasInj | WtrInj | WtrTo_Pit | Other_Oil | Other_Gas | Other_Wtr | No_ProdWells | No_SIWells |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1298191.0 | 1298191.0 | 1283575.0 | 1298181.0 | 1297562.0 | 1297373.0 | 1297370.0 | 1298186.0 | 1297238.0 | 78134.0 | ... | 1298188.0 | 1297233.0 | 1298187.0 | 1279484.0 | 1279527.0 | 1286605.0 | 1298187.0 | 1279363.0 | 1298170.0 | 1288983.0 |
| mean | 13705.04 | 1717.545 | 99.98504 | 404.7499 | 1476.859 | 2764.494 | 404.7213 | 1132.646 | 0.003033368 | 0.0209 | ... | 47.78551 | 0.00347586 | 235.2886 | 1616.128 | 320.1118 | 0.1094641 | 3.267775 | 221.453 | 1.815966 | 0.7251414 |
| std | 37186.79 | 2344.972 | 207.8206 | 3377.169 | 32729.34 | 36040.75 | 3378.501 | 9589.917 | 0.6614214 | 5.485964 | ... | 443.7782 | 1.120417 | 31145.22 | 31604.98 | 9859.811 | 14.68871 | 146.5689 | 8156.987 | 10.24098 | 4.598063 |
| min | 2.0 | 4.0 | -815.0 | -402.0 | 0.0 | -515.0 | -230.0 | 0.0 | 0.0 | 0.0 | ... | -819.0 | -48.0 | 0.0 | -93.0 | -50.0 | -8580.0 | -554.0 | -70550.0 | 0.0 | 0.0 |
| 25% | 2666.0 | 321.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 50% | 5251.0 | 536.0 | 0.0 | 0.0 | 138.0 | 0.0 | 0.0 | 69.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 75% | 7387.0 | 1720.0 | 146.0 | 117.0 | 710.0 | 125.0 | 122.0 | 598.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| max | 990015.0 | 12306.0 | 8493.0 | 231897.0 | 5934068.0 | 6684198.0 | 231986.0 | 678867.0 | 248.0 | 1530.0 | ... | 73514.0 | 989.0 | 5934068.0 | 2096867.0 | 6684198.0 | 3000.0 | 24731.0 | 8252006.0 | 1154.0 | 1200.0 |
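The `min` row shows negative values in several volume columns (for example `Oil_Prod` at -402, `Wtr_Prod` at -515 and `Other_Wtr` at -70550). Whether these are amended reports, reporting corrections, or data-entry errors is an assumption to check with the data provider. A minimal sketch that counts negative entries per numeric column, run here on toy values:

```python
import pandas as pd


def negative_counts(df):
    """Count negative entries in each numeric column (NaNs count as not negative)."""
    numeric = df.select_dtypes(include="number")
    counts = (numeric < 0).sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy frame with a few column names from lease_prod_df (values are illustrative).
toy = pd.DataFrame({
    "Oil_Prod": [0.0, -402.0, 117.0],
    "Gas_Prod": [353.0, 69.0, 0.0],
    "Wtr_Prod": [-515.0, -1.0, None],
    "CoName": ["A", "B", "C"],
})
print(negative_counts(toy))
```

On the real data, `negative_counts(lease_prod_df)` would list every column needing a decision before volumes are summed.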


In [3]:
well_prod_df.head()

| | rpt_date | API_WELLNO | ST_FMTN_CD | Name_ | Lease_Unit | OPNO | CoName | BBLS_OIL_COND | MCF_GAS | BBLS_WTR | DAYS_PROD | AMND_RPT | STATUS | dt_mod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12/30/1899 | 25035061990000 | CB | Cut Bank | 2593.0 | 1770.0 | Cut Bank Gas Company | 0.0 | 0.0 | 0 | 0.0 | False | | 04/13/2020 |
| 1 | 12/30/1899 | 25035062030000 | CB | Cut Bank | 2593.0 | 1770.0 | Cut Bank Gas Company | 0.0 | 0.0 | 0 | 0.0 | False | | 04/13/2020 |
| 2 | 12/30/1899 | 25035062220000 | CB | Cut Bank | 2593.0 | 1770.0 | Cut Bank Gas Company | 0.0 | 0.0 | 0 | 0.0 | False | | 04/13/2020 |
| 3 | 12/30/1899 | 25035062380000 | CB | Cut Bank | 2593.0 | 1770.0 | Cut Bank Gas Company | 0.0 | 0.0 | 0 | 0.0 | False | | 04/13/2020 |
| 4 | 12/30/1899 | 25035062860000 | CB | Cut Bank | 2203.0 | 1770.0 | Cut Bank Gas Company | 0.0 | 0.0 | 0 | 0.0 | False | | 04/10/2020 |
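The first rows all have `rpt_date` 12/30/1899 and zero volumes. That date is the zero date of the OLE Automation date type used by Microsoft Access, so we assume it marks a missing report date rather than a real month. A minimal sketch that parses `rpt_date` and turns that placeholder (and anything unparseable) into `NaT`; the helper name is hypothetical:

```python
import pandas as pd

# Assumption: 1899-12-30 is a placeholder for a missing report date.
PLACEHOLDER = pd.Timestamp("1899-12-30")


def parse_report_dates(values):
    """Parse MM/DD/YYYY strings; unparseable values and the placeholder become NaT."""
    dates = pd.to_datetime(values, format="%m/%d/%Y", errors="coerce")
    # mask() replaces entries where the condition is True with NaT
    return dates.mask(dates == PLACEHOLDER)


sample = pd.Series(["12/30/1899", "01/31/2001", "not a date"])
print(parse_report_dates(sample))
```

Applied as `well_prod_df['rpt_date'] = parse_report_dates(well_prod_df['rpt_date'])`, the placeholder rows can then be inspected or dropped with `dropna(subset=['rpt_date'])`.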


In [4]:
well_data_df.head()

| | API_WellNo | OpNo | CoName | Well_Nm | Well_Typ | Type | Wl_Status | Status | Wh_Sec | Wh_Twpn | ... | Wh_EW | Slant | Reg_Field_No | Reg_Field | Stat_Field_No | Stat_Field | Dt_APD | Dt_Cmp | Elev_KB | DTD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25101100140000 | 7120 | Somont Oil Company, Inc. | Remington-Warner 7 | OIL | Oil | PR | Producing | 32 | 35 | ... | W | V | 4820 | Kevin-Sunburst | 4820 | Kevin-Sunburst | | 1927-07-05 00:00:00 | | 1500.0 |
| 1 | 25025225200000 | 664 | Denbury Onshore, LLC | Little Beaver East 23-22H | OIL | Oil | SI | Shut In | 22 | 5 | ... | W | H | 5420 | Little Beaver, East | 5420 | Little Beaver, East | 2005-04-12 00:00:00 | 2006-01-27 00:00:00 | 2989.0 | 11250.0 |
| 2 | 25073210450000 | 5130 | Mont Mil Operating Company | TMCBSU 10-10 | EOR | Injection, EOR | AX | P&A - Approved | 10 | 31 | ... | W | V | 2400 | Cut Bank | 2400 | Cut Bank | 1968-11-25 00:00:00 | 1969-07-15 00:00:00 | | 3422.0 |
| 3 | 25065055920000 | 645 | Kelly Oil and Gas LLC | Smith M #3 | OIL | Oil | SI | Shut In | 12 | 10 | ... | E | V | 4700 | Keg Coulee | 4700 | Keg Coulee | 1966-10-20 00:00:00 | 1966-12-17 00:00:00 | | 4855.0 |
| 4 | 25101226430000 | 4070 | Kipling Energy Incorporated | Allen 4 | OIL | Oil | AX | P&A - Approved | 11 | 34 | ... | E | V | 4820 | Kevin-Sunburst | 4820 | Kevin-Sunburst | 1984-07-20 00:00:00 | 1984-08-16 00:00:00 | | 1588.0 |
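The drill profile mentioned earlier lives in the `Slant` column (`V` or `H` in the rows above) and the producing status in `Status`. A minimal sketch that cross-tabulates the two, using the five rows shown above as toy input; reading `V` as vertical and `H` as horizontal is an assumption (the `H` row is also the only well whose name ends in "H"):

```python
import pandas as pd

# Toy rows copying API_WellNo, Slant and Status from the head() output above.
wells = pd.DataFrame({
    "API_WellNo": [25101100140000, 25025225200000, 25073210450000,
                   25065055920000, 25101226430000],
    "Slant": ["V", "H", "V", "V", "V"],
    "Status": ["Producing", "Shut In", "P&A - Approved", "Shut In", "P&A - Approved"],
})

# Assumption: V = vertical, H = horizontal; any other code maps to NaN,
# and crosstab drops NaN values.
wells["Profile"] = wells["Slant"].map({"V": "Vertical", "H": "Horizontal"})
profile_by_status = pd.crosstab(wells["Status"], wells["Profile"])
print(profile_by_status)
```

On the full file, running `well_data_df['Slant'].value_counts(dropna=False)` first would reveal any codes beyond `V` and `H`.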


# Clean and Merge 

There are two ways to look at the data: through the lens of leases, each of which contains one or more wells, or on a per-well basis.
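Both views can start from the well production file. A minimal sketch on toy rows (column names follow `histprodwell.tab` and `histWellData.tab`; the IDs and volumes are made up): the lease view sums wells into one row per lease per report date, and the per-well view attaches well attributes by API number. This is one possible approach, not necessarily how the rest of this notebook proceeds; `lease_prod_df` already reports lease totals, so the roll-up is also a way to cross-check it.

```python
import pandas as pd

# Toy rows; IDs and volumes are illustrative only.
well_prod = pd.DataFrame({
    "rpt_date": ["01/31/2001", "01/31/2001", "01/31/2001", "02/28/2001"],
    "API_WELLNO": [101, 102, 103, 101],
    "Lease_Unit": [2593.0, 2593.0, 2203.0, 2593.0],
    "BBLS_OIL_COND": [10.0, 5.0, 0.0, 8.0],
    "MCF_GAS": [100.0, 50.0, 20.0, 90.0],
    "BBLS_WTR": [1, 2, 3, 4],
})
well_data = pd.DataFrame({"API_WellNo": [101, 102, 103], "Slant": ["V", "H", "V"]})

volumes = ["BBLS_OIL_COND", "MCF_GAS", "BBLS_WTR"]

# Lease lens: one row per lease per report date, summing its wells.
# (With the real data, parse rpt_date to datetimes first so months sort correctly.)
lease_monthly = well_prod.groupby(["Lease_Unit", "rpt_date"], as_index=False)[volumes].sum()

# Well lens: attach well attributes to each production row. The key is spelled
# API_WELLNO in one file and API_WellNo in the other; validate= raises if a
# well appears more than once in well_data.
per_well = well_prod.merge(
    well_data,
    left_on="API_WELLNO",
    right_on="API_WellNo",
    how="left",
    validate="many_to_one",
)

print(lease_monthly)
print(per_well[["API_WELLNO", "rpt_date", "BBLS_OIL_COND", "Slant"]])
```

A left merge keeps every production row even if a well is missing from the well data file; those rows simply get `NaN` attributes.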