# Oil spill data

Downloaded from the [Railroad Commission of Texas](http://www.rrc.state.tx.us/oil-gas/compliance-enforcement/h-8) (RRC).

The H-8 Loss Reports for crude oil, gas well liquids or associated products are available for download as an Excel compressed zip file by year. If you have questions or need assistance, contact Field Operations at 512-463-6830 or OGFOPS@rrc.texas.gov.


We are using these files to put the Magellan Midstream Partners spill of July 13 in Bastrop County in context. The spill has been reported as 87,528 gallons, which is 2,084 barrels. The RRC told Andy Sevilla that the data is recorded in barrels.
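
The gallon and barrel figures can be reconciled with a quick conversion. This is a minimal sketch, assuming the standard 42 U.S. gallons per oil barrel; the function name is only illustrative.

```python
# One U.S. oil barrel holds 42 U.S. gallons.
GALLONS_PER_BARREL = 42


def gallons_to_barrels(gallons):
    """Convert U.S. gallons to oil barrels."""
    return gallons / GALLONS_PER_BARREL


print(gallons_to_barrels(87528))  # 2084.0
```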

#### Our coverage
- http://www.statesman.com/news/local/new-data-shows-bastrop-county-oil-spill-leaked-000-gallons/Y8YK4xM4uPE37actcJi1HP/
- http://www.statesman.com/news/local/rolled-over-wheeler-spills-oil-onto-bastrop-roadway/PMhcfaqCXQ9U4V6FCX36WL/
- http://www.mystatesman.com/news/opinion/commentary-what-the-bastrop-oil-spill-says-about-effective-regulation/8wkZRUj2ts2mXR8BZhDkQM/


## Data prep

Importing the yearly files, setting data types, standardizing column names and stacking everything into one table.

In [1]:
import agate
import agateexcel
import re
import warnings
warnings.filterwarnings('ignore')

In [2]:
# sets column types where needed
specified_type = {
    'MasterYear': agate.Text(),
    'Water': agate.Text(),
    'WaterName': agate.Text()
}
# importing each file as a table
# should be refactored as a loop
h82009 = agate.Table.from_xlsx('raw-data/h8s-2009.xlsx', column_types=specified_type)
h82010 = agate.Table.from_xlsx('raw-data/h8s-2010.xlsx', column_types=specified_type)
h82011 = agate.Table.from_xlsx('raw-data/h8s-2011.xlsx', column_types=specified_type)
h82012 = agate.Table.from_xlsx('raw-data/h8s-2012.xlsx', column_types=specified_type)
h82013 = agate.Table.from_xlsx('raw-data/h8s-2013.xlsx', column_types=specified_type)
h82014 = agate.Table.from_xlsx('raw-data/h8s-2014.xlsx', column_types=specified_type)
h82015 = agate.Table.from_xlsx('raw-data/h8s-2015.xlsx', column_types=specified_type)
h82016 = agate.Table.from_xlsx('raw-data/h8s-2016.xlsx', column_types=specified_type)
h82017 = agate.Table.from_xlsx('raw-data/h8s-2017.xlsx', column_types=specified_type)

In [3]:
# renaming columns in some files for consistency.
# this could/should be refactored as a loop

column_names = {
    'Form Number': 'FormNumber',
    'Lease No': 'LeaseNumber',
    'Gas ID': 'Gasident'
}
h82009r = h82009.rename(column_names)
h82010r = h82010.rename(column_names)
h82011r = h82011.rename(column_names)
h82012r = h82012.rename(column_names)
h82013r = h82013.rename(column_names)
h82014r = h82014.rename(column_names)
h82015r = h82015.rename(column_names)
h82016r = h82016.rename(column_names)
h82017r = h82017.rename(column_names)

# combining into a single table
spills = agate.Table.merge([
        h82009r,
        h82010r,
        h82011r,
        h82012r,
        h82013r,
        h82014r,
        h82015r,
        h82016r,
        h82017r
    ])
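
The comments in the two cells above suggest a loop. One possible sketch follows, assuming the same file naming and rename map; `h8_path` and `load_spills` are hypothetical helper names, not part of agate.

```python
# Years covered by the downloaded H-8 files.
YEARS = range(2009, 2018)

# Same rename map as in the cell above.
COLUMN_NAMES = {
    'Form Number': 'FormNumber',
    'Lease No': 'LeaseNumber',
    'Gas ID': 'Gasident',
}


def h8_path(year):
    """Build the path to one year's file, following the naming used above."""
    return 'raw-data/h8s-{}.xlsx'.format(year)


def load_spills(years=YEARS, column_types=None):
    """Hypothetical helper: import, rename and stack every year's table."""
    # Imported here so h8_path can be used without agate installed.
    import agate
    import agateexcel  # noqa: F401 (adds Table.from_xlsx to agate)

    tables = [
        agate.Table.from_xlsx(h8_path(year), column_types=column_types)
             .rename(COLUMN_NAMES)
        for year in years
    ]
    return agate.Table.merge(tables)
```

With this in place, the combined table would come from `load_spills(column_types=specified_type)`, and adding a new year only means changing `YEARS`.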



In [4]:
# send to csv to use elsewhere
spills.to_csv('exports/spills.csv')

In [5]:
# look at column names
print(spills)

| column           | data_type |
| ---------------- | --------- |
| MasterYear       | Text      |
| FormNumber       | Number    |
| District         | Text      |
| OperatorNumber   | Text      |
| DateofLoss       | Date      |
| Type_Company_Cd  | Text      |
| County           | Text      |
| LeaseNumber      | Number    |
| Gasident         | Number    |
| CommNbr          | Number    |
| TypeLiquid       | Text      |
| GrossLoss        | Number    |
| Recovered        | Number    |
| NetLoss          | Number    |
| Facility         | Text      |
| CauseofLoss      | Text      |
| Remedy           | Text      |
| Corrected_Report | Boolean   |
| Water            | Text      |
| WaterName        | Text      |
| Location         | Text      |
| Remarks          | Text      |
| LeaseName        | Text      |
| FieldNumber      | Text      |
| Into_Water       | Boolean   |
| Unique_ID        | Number    |



## Overall spills by GrossLoss
These are the largest spills, ranked by gross barrels lost.

In [6]:
# checking number of records for spills
print('Total number of records:\n{}'.format(
    spills.aggregate(agate.Count())
))


Total number of records:
8903


In [7]:
# orders the spills table
spills_ordered = spills.order_by(lambda row: row['GrossLoss'], reverse=True)
# prints just to show the top in the list
spills_ordered.select(['DateofLoss', 'County', 'GrossLoss']).limit(20).print_table()

| DateofLoss | County       | GrossLoss |
| ---------- | ------------ | --------- |
| 2012-10-01 | HARRISON     |    31,000 |
| 2010-02-09 | SAN PATRICIO |    25,000 |
| 2010-08-08 | BRAZOS       |    15,000 |
| 2010-10-11 | HOCKLEY      |    10,000 |
| 2016-09-10 | NOLAN        |     8,600 |
| 2011-01-27 | GRIMES       |     6,911 |
| 2014-09-15 | HOUSTON      |     5,700 |
| 2013-12-17 | ECTOR        |     5,000 |
| 2017-01-30 | COLLIN       |     5,000 |
| 2015-05-20 | STEPHENS     |     3,900 |
| 2013-05-20 | STARR        |     3,500 |
| 2009-06-17 | SCURRY       |     3,416 |
| 2015-06-20 | FREESTONE    |     3,300 |
| 2015-04-07 | TERRY        |     3,300 |
| 2016-04-06 | LOVING       |     3,200 |
| 2012-06-28 | DIMMIT       |     3,000 |
| 2011-04-01 | HOWARD       |     2,583 |
| 2012-01-30 | GAINES       |     2,572 |
| 2015-05-11 | STEPHENS     |     2,500 |
| 2009-01-29 | NUECES       |     2,400 |


In [8]:
# count the number of spills greater than 2,084 barrels (GrossLoss is in barrels)
spills_bigger = spills_ordered.where(lambda row: row['GrossLoss'] > 2084)

print('Looking at all spills by GrossLoss, the Bastrop spill ranks {} in list of worst spills.'.format(
        spills_bigger.aggregate(agate.Count()) + 1
    ))

Looking at all spills by GrossLoss, the Bastrop spill ranks 24 in list of worst spills.


## Overall spills by NetLoss
Reordering all spills by NetLoss to account for barrels recovered. I'm not sure if there was oil recovered from the Bastrop spill, though.


In [9]:
spills_netloss = spills.order_by(lambda row: row['NetLoss'], reverse=True)
spills_netloss.select(['DateofLoss', 'County', 'NetLoss']).print_table()

| DateofLoss | County    | NetLoss |
| ---------- | --------- | ------- |
| 2012-10-01 | HARRISON  |  31,000 |
| 2016-09-10 | NOLAN     |   6,725 |
| 2014-09-15 | HOUSTON   |   5,698 |
| 2013-12-17 | ECTOR     |   5,000 |
| 2013-05-20 | STARR     |   3,500 |
| 2011-01-27 | GRIMES    |   3,304 |
| 2012-06-28 | DIMMIT    |   3,000 |
| 2011-04-01 | HOWARD    |   2,583 |
| 2015-06-15 | KENT      |   2,063 |
| 2013-10-28 | REAGAN    |   2,000 |
| 2009-06-17 | SCURRY    |   1,835 |
| 2014-09-27 | CULBERSON |   1,700 |
| 2010-08-09 | REAGAN    |   1,600 |
| 2012-01-30 | GAINES    |   1,472 |
| 2015-02-01 | FORT BEND |   1,452 |
| 2013-12-14 | ECTOR     |   1,430 |
| 2013-06-30 | UPTON     |   1,352 |
| 2015-11-23 | ATASCOSA  |   1,213 |
| 2015-01-07 | WEBB      |   1,130 |
| 2014-11-04 | KLEBERG   |   1,100 |
|        ... | ...       |     ... |


## Spills by 'Facility'

In [10]:
spills_group_facility = spills.group_by('Facility')
spills_facility_table = spills_group_facility.aggregate([
        ('spill_cnt', agate.Count()),
        ('spill_sum', agate.Sum('GrossLoss')),
        ('spill_avg', agate.Mean('GrossLoss')),
        ('spill_med', agate.Median('GrossLoss'))
    ]
)
spills_facility_table.print_table()

| Facility             | spill_cnt | spill_sum | spill_avg | spill_med |
| -------------------- | --------- | --------- | --------- | --------- |
| Tank Battery         |     6,275 |   571,558 |   91.085… |      33.0 |
| Flow Line            |       919 |    62,397 |   67.897… |      15.0 |
| Other                |       362 |    29,764 |   82.221… |      30.0 |
| Valve                |       354 |    20,163 |   56.958… |      15.0 |
| Pipe Line            |       673 |    96,116 |  142.817… |      20.0 |
| Pump Station or T... |        87 |    10,729 |  123.322… |      25.0 |
| Stuffing Box         |       104 |     2,901 |   27.894… |      13.5 |
| Separator-Gun Bar... |       108 |     6,595 |   61.065… |      21.0 |
| Vehicle (Truck-Ba... |        21 |     1,105 |   52.619… |      20.0 |


## Pipeline spills

In [11]:
# filter to pipeline spills
spills_pipelines = spills_ordered.where(lambda row: row['Facility'] == 'Pipe Line')

# count result
print('Number in spills_pipelines:\n{}'.format(
    spills_pipelines.aggregate(agate.Count())
    ))


Number in spills_pipelines:
673


In [12]:
print('Top pipeline spills in gross loss barrels:\n')

# selecting and printing top records
spills_pipelines.limit(10).select(['DateofLoss', 'County', 'GrossLoss']).print_table()

print('\nAt 2,084 barrels (87,528 gallons) the Bastrop spill ranks 7th worst pipeline spill since 2009.')

Top pipeline spills in gross loss barrels:

| DateofLoss | County    | GrossLoss |
| ---------- | --------- | --------- |
| 2010-08-08 | BRAZOS    |    15,000 |
| 2016-09-10 | NOLAN     |     8,600 |
| 2011-01-27 | GRIMES    |     6,911 |
| 2017-01-30 | COLLIN    |     5,000 |
| 2012-01-30 | GAINES    |     2,572 |
| 2011-11-15 | GALVESTON |     2,290 |
| 2015-11-23 | ATASCOSA  |     2,000 |
| 2009-07-23 | MIDLAND   |     1,912 |
| 2010-08-09 | REAGAN    |     1,600 |
| 2014-09-30 | JACK      |     1,034 |

At 2,084 barrels (87,528 gallons) the Bastrop spill ranks 7th worst pipeline spill since 2009.
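
The pipeline ranking can be cross-checked by counting the listed spills that are larger than the Bastrop figure. This sketch hard-codes the ten GrossLoss values from the table above; `rank_among` is an illustrative helper, not an agate function. Because the table is sorted in descending order, rows beyond the tenth are too small to change the result.

```python
# GrossLoss (barrels) for the ten pipeline spills printed above.
pipeline_gross_loss = [15000, 8600, 6911, 5000, 2572, 2290, 2000, 1912, 1600, 1034]

# Reported size of the Bastrop spill, in barrels.
BASTROP_BARRELS = 2084


def rank_among(value, others):
    """Rank `value` would hold if added to `others`, with 1 as the largest."""
    return sum(1 for other in others if other > value) + 1


print(rank_among(BASTROP_BARRELS, pipeline_gross_loss))
```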


## Bastrop spills listed

In [13]:
bastrop = spills_ordered.where(lambda row: row['County'] == 'BASTROP')

columns = [
    'Unique_ID',
    'DateofLoss',
    'OperatorNumber',
    'Facility',
    'GrossLoss'
]
bastrop.select(columns).print_table(max_column_width=None)

| Unique_ID | DateofLoss | OperatorNumber         | Facility     | GrossLoss |
| --------- | ---------- | ---------------------- | ------------ | --------- |
|    60,943 | 2012-10-17 | CHALKER OPERATING INC. | Tank Battery |        40 |
|    60,944 | 2012-09-02 | PLAINS PIPELINE L.P.   | Pipe Line    |         8 |


## Spills per year

In [14]:
spills_group_year = spills.group_by('MasterYear')
spills_year_table = spills_group_year.aggregate([
        ('spill_cnt', agate.Count()),
        ('spill_sum', agate.Sum('GrossLoss')),
        ('spill_avg', agate.Mean('GrossLoss')),
        ('spill_med', agate.Median('GrossLoss'))
    ]
)
spills_year_table.print_table()

| MasterYear | spill_cnt | spill_sum | spill_avg | spill_med |
| ---------- | --------- | --------- | --------- | --------- |
| 2009       |       675 |    55,996 |   82.957… |        30 |
| 2010       |       796 |   109,203 |  137.190… |        32 |
| 2011       |       869 |    74,933 |   86.229… |        32 |
| 2012       |     1,236 |   129,664 |  104.906… |        30 |
| 2013       |     1,354 |   118,136 |   87.250… |        30 |
| 2014       |     1,405 |   105,388 |   75.009… |        21 |
| 2015       |     1,485 |   120,266 |   80.987… |        25 |
| 2016       |     1,014 |    78,412 |   77.329… |        25 |
| 2017       |        69 |     9,330 |  135.217… |        23 |


### Number of spills

In [15]:
spills_year_table.print_bars('MasterYear', 'spill_cnt', width=60)

MasterYear spill_cnt
2009             675 ▓░░░░░░░░░░░░░                         
2010             796 ▓░░░░░░░░░░░░░░░                       
2011             869 ▓░░░░░░░░░░░░░░░░░                     
2012           1,236 ▓░░░░░░░░░░░░░░░░░░░░░░░               
2013           1,354 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░            
2014           1,405 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░           
2015           1,485 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░          
2016           1,014 ▓░░░░░░░░░░░░░░░░░░░                   
2017              69 ▓░                                     
                     +---------+--------+--------+---------+
                     0        500     1,000    1,500   2,000


### Barrels per year spilled

In [16]:
spills_year_table.print_bars('MasterYear', 'spill_sum', width=60)

MasterYear spill_sum
2009          55,996 ▓░░░░░░░░░░░                           
2010         109,203 ▓░░░░░░░░░░░░░░░░░░░░░                 
2011          74,933 ▓░░░░░░░░░░░░░░                        
2012         129,664 ▓░░░░░░░░░░░░░░░░░░░░░░░░░             
2013         118,136 ▓░░░░░░░░░░░░░░░░░░░░░░                
2014         105,388 ▓░░░░░░░░░░░░░░░░░░░░                  
2015         120,266 ▓░░░░░░░░░░░░░░░░░░░░░░░               
2016          78,412 ▓░░░░░░░░░░░░░░░                       
2017           9,330 ▓░░                                    
                     +---------+--------+------------------+
                     0      50,000   100,000         200,000


## Searching for the company
Looking for Magellan Midstream Partners, or anything like it, in any column.

In [31]:
# this function allows me to input a column name
# and get distinct matching results
def search_column(column_name):
    result = spills.where(lambda row: re.search(
        'MAGELLAN|MIDSTREAM', str(row[column_name])
    ))
    
    print('Searching the \'{}\' column for MAGELLAN or MIDSTREAM \
    and found these distinct names. (If blank, there are none):'.format(
        column_name
        ))
    
    return result.columns[column_name].values_distinct()

In [32]:
search_column('OperatorNumber')

Searching the 'OperatorNumber' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


('ENLINK MIDSTREAM SERVICES, LLC',
 'DCP MIDSTREAM MARKETING, LLC',
 'NAVARRO MIDSTREAM SERVICES, LLC',
 'GAVILON MIDSTREAM ENERGY, LLC',
 'TRINITY RIVER MIDSTREAM, LLC',
 'EFS MIDSTREAM LLC',
 'DCP MIDSTREAM, LP',
 'TARGA MIDSTREAM SERVICES LP',
 'VITOL MIDSTREAM PIPELINE LLC',
 'TARGA SOUTHTEX MIDSTREAM CO LP',
 'CCS MIDSTREAM SERVICES, LLC',
 'CHESAPEAKE MIDSTREAM PARTNERSLLC',
 'DCP MIDSTREAM MARKETING, LP',
 'TALCO MIDSTREAM ASSETS, LTD.',
 'VITOL MIDSTREAM LLC',
 'PARADIGM MIDSTREAM SVCS-ST, LLC',
 'TARGA MIDSTREAM SERVICES LLC',
 'ENABLE MIDSTREAM PARTNERS, LP')

In [19]:
search_column('LeaseName')

Searching the 'LeaseName' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


()

In [20]:
search_column('FieldNumber')

Searching the 'FieldNumber' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


()

In [21]:
search_column('Location')

Searching the 'Location' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


('ROUND TOP BOOSTER IS A NATURAL GAS, MIDSTREAM COMPRESSOR STATION',)

In [22]:
search_column('Remarks')

Searching the 'Remarks' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


()
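
The search above is case-sensitive and converts empty cells to the string `'None'` before matching. A possible variant is sketched below in plain Python; `distinct_matches` is a hypothetical helper, and the sample list mixes operator names from the output above with a made-up mixed-case value.

```python
import re

# Same search terms as search_column, but ignoring case.
PATTERN = re.compile(r'MAGELLAN|MIDSTREAM', re.IGNORECASE)


def distinct_matches(values, pattern=PATTERN):
    """Return the distinct matching values, in first-seen order."""
    seen = []
    for value in values:
        # Skip empty cells rather than searching the text 'None'.
        if value is None:
            continue
        text = str(value)
        if pattern.search(text) and text not in seen:
            seen.append(text)
    return seen


# 'Magellan Pipeline' is an invented value for illustration only.
sample = ['EFS MIDSTREAM LLC', None, 'Magellan Pipeline',
          'PLAINS PIPELINE L.P.', 'EFS MIDSTREAM LLC']
print(distinct_matches(sample))  # ['EFS MIDSTREAM LLC', 'Magellan Pipeline']
```

With the real table, the values for one column would come from `spills.columns[column_name].values()`, so every column of interest could be checked in a single loop.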

### Bulletproofing

Various checks to make sure the data is sound.

In [23]:
# count how often each Unique_ID appears (only the first five rows are shown)
pivot_UniqueID = spills.pivot('Unique_ID')
pivot_UniqueID.limit(5).print_table()

| Unique_ID | Count |
| --------- | ----- |
|   117,569 |     1 |
|   117,565 |     1 |
|   117,622 |     1 |
|   118,015 |     1 |
|   117,634 |     1 |
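
Printing five rows of the pivot only spot-checks the IDs. A fuller check would list any value that appears more than once, as in this sketch; `duplicated` is an illustrative helper and the sample IDs are made up.

```python
from collections import Counter


def duplicated(values):
    """Return the values that appear more than once."""
    return [value for value, count in Counter(values).items() if count > 1]


# Made-up IDs for illustration. With the real table the input would be
# spills.columns['Unique_ID'].values()
print(duplicated([60943, 60944, 60944]))  # [60944]
```

An empty list would mean every `Unique_ID` is distinct.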


In [24]:
# checks the 'Corrected_Report' column, which is only in some files, to see if any values are True.
# None are: the column is False where it exists and empty in the other years
spills_corrected_pivot = spills.pivot('Corrected_Report', 'MasterYear')
spills_corrected_pivot.print_table(max_columns=None)

| Corrected_Report | 2009 | 2010 | 2011 |  2014 |  2012 |  2013 |  2015 |  2016 | 2017 |
| ---------------- | ---- | ---- | ---- | ----- | ----- | ----- | ----- | ----- | ---- |
|            False |  675 |  796 |  869 | 1,405 |     0 |     0 |     0 |     0 |    0 |
|                  |    0 |    0 |    0 |     0 | 1,236 | 1,354 | 1,485 | 1,014 |   69 |
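
One more consistency check: the facility and year groupings earlier in this notebook should account for the same records and the same barrels. The sketch below copies the `spill_cnt` and `spill_sum` figures from those two tables by hand (truncated facility names are kept as printed), so it is only as reliable as that transcription.

```python
# (spill_cnt, spill_sum) per facility, copied from the facility table.
by_facility = {
    'Tank Battery': (6275, 571558),
    'Flow Line': (919, 62397),
    'Other': (362, 29764),
    'Valve': (354, 20163),
    'Pipe Line': (673, 96116),
    'Pump Station or T...': (87, 10729),
    'Stuffing Box': (104, 2901),
    'Separator-Gun Bar...': (108, 6595),
    'Vehicle (Truck-Ba...': (21, 1105),
}

# (spill_cnt, spill_sum) per year, copied from the year table.
by_year = {
    '2009': (675, 55996),
    '2010': (796, 109203),
    '2011': (869, 74933),
    '2012': (1236, 129664),
    '2013': (1354, 118136),
    '2014': (1405, 105388),
    '2015': (1485, 120266),
    '2016': (1014, 78412),
    '2017': (69, 9330),
}

# Total printed by the record-count cell.
TOTAL_RECORDS = 8903

facility_count = sum(count for count, _ in by_facility.values())
year_count = sum(count for count, _ in by_year.values())
facility_barrels = sum(barrels for _, barrels in by_facility.values())
year_barrels = sum(barrels for _, barrels in by_year.values())

print(facility_count, year_count, TOTAL_RECORDS)
print(facility_barrels, year_barrels)
```

If the three counts agree and the two barrel totals agree, no records were dropped or double-counted by either grouping.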
