# Oil spill data

Downloaded from the [Railroad Commission of Texas](http://www.rrc.state.tx.us/oil-gas/compliance-enforcement/h-8).

The H-8 Loss Reports for crude oil, gas well liquids or associated products are available for download as an Excel compressed zip file by year. If you have questions or need assistance, contact Field Operations at 512-463-6830 or OGFOPS@rrc.texas.gov.


These files are used to put the Magellan Midstream Partners spill of July 13 in Bastrop County in context. The spill has been reported as 87,000 gallons. The RRC tells Andy Sevilla that the data uses barrels, not gallons, and that a barrel is 42 gallons. So 87,000 gallons is about 2,071 barrels.
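The conversion is simple enough to check by hand, but a small helper keeps the unit explicit wherever the reported figure is compared with RRC data. This is a minimal sketch; `gallons_to_barrels` and `GALLONS_PER_BARREL` are names made up for illustration and are not part of the notebook.

```python
GALLONS_PER_BARREL = 42  # conversion factor given by the RRC

def gallons_to_barrels(gallons):
    """Convert a volume in gallons to oil barrels."""
    return gallons / GALLONS_PER_BARREL

bastrop_barrels = gallons_to_barrels(87000)
print(round(bastrop_barrels))  # 2071
```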


## Data prep

Doing the import, setting data types, column names and stacking.

In [19]:
import agate
import agateexcel

In [20]:
# sets column types where needed
specified_type = {
    'MasterYear': agate.Text(),
    'Water': agate.Text(),
    'WaterName': agate.Text()
}
# importing each file as a table
# should be refactored as a loop
h82009 = agate.Table.from_xlsx('raw-data/h8s-2009.xlsx', column_types=specified_type)
h82010 = agate.Table.from_xlsx('raw-data/h8s-2010.xlsx', column_types=specified_type)
h82011 = agate.Table.from_xlsx('raw-data/h8s-2011.xlsx', column_types=specified_type)
h82012 = agate.Table.from_xlsx('raw-data/h8s-2012.xlsx', column_types=specified_type)
h82013 = agate.Table.from_xlsx('raw-data/h8s-2013.xlsx', column_types=specified_type)
h82014 = agate.Table.from_xlsx('raw-data/h8s-2014.xlsx', column_types=specified_type)
h82015 = agate.Table.from_xlsx('raw-data/h8s-2015.xlsx', column_types=specified_type)
h82016 = agate.Table.from_xlsx('raw-data/h8s-2016.xlsx', column_types=specified_type)
h82017 = agate.Table.from_xlsx('raw-data/h8s-2017.xlsx', column_types=specified_type)
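The comment in the cell above notes that the imports should be refactored as a loop. One possible shape is sketched below. `load_yearly_tables` is a hypothetical helper, and the loader is passed in as an argument so the pattern can be shown without reading any files.

```python
def load_yearly_tables(years, loader, path_template='raw-data/h8s-{}.xlsx'):
    """Return a list with one loaded table per year, in the order given."""
    return [loader(path_template.format(year)) for year in years]

# In this notebook the loader would wrap the call used in the cell above, e.g.:
#   loader = lambda path: agate.Table.from_xlsx(path, column_types=specified_type)
#   tables = load_yearly_tables(range(2009, 2018), loader)

# Stand-in loader that just echoes the path, to show which files would be read
paths = load_yearly_tables(range(2009, 2018), loader=lambda path: path)
print(paths[0], paths[-1])
```

The same list could then be renamed in one pass with a list comprehension and handed to `agate.Table.merge`, replacing the per-year variables.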

In [21]:
# renaming columns in some files for consistency.
# this could/should be refactored as a loop

column_names = {
    'Form Number': 'FormNumber',
    'Lease No': 'LeaseNumber',
    'Gas ID': 'Gasident'
}
h82009r = h82009.rename(column_names)
h82010r = h82010.rename(column_names)
h82011r = h82011.rename(column_names)
h82012r = h82012.rename(column_names)
h82013r = h82013.rename(column_names)
h82014r = h82014.rename(column_names)
h82015r = h82015.rename(column_names)
h82016r = h82016.rename(column_names)
h82017r = h82017.rename(column_names)

In [22]:
# combining into a single table
spills = agate.Table.merge([
        h82009r,
        h82010r,
        h82011r,
        h82012r,
        h82013r,
        h82014r,
        h82015r,
        h82016r,
        h82017r
    ])

# send to csv to use elsewhere
spills.to_csv('exports/spills.csv')

In [23]:
# look at column names
print(spills)

| column           | data_type |
| ---------------- | --------- |
| MasterYear       | Text      |
| FormNumber       | Number    |
| District         | Text      |
| OperatorNumber   | Text      |
| DateofLoss       | Date      |
| Type_Company_Cd  | Text      |
| County           | Text      |
| LeaseNumber      | Number    |
| Gasident         | Number    |
| CommNbr          | Number    |
| TypeLiquid       | Text      |
| GrossLoss        | Number    |
| Recovered        | Number    |
| NetLoss          | Number    |
| Facility         | Text      |
| CauseofLoss      | Text      |
| Remedy           | Text      |
| Corrected_Report | Boolean   |
| Water            | Text      |
| WaterName        | Text      |
| Location         | Text      |
| Remarks          | Text      |
| LeaseName        | Text      |
| FieldNumber      | Text      |
| Into_Water       | Boolean   |
| Unique_ID        | Number    |



## Overall spills by GrossLoss

In [24]:
# orders the spills table
spills_ordered = spills.order_by(lambda row: row['GrossLoss'], reverse=True)
# prints just to show the top in the list
spills_ordered.select(['DateofLoss', 'County', 'GrossLoss']).limit(10).print_table()

| DateofLoss | County       | GrossLoss |
| ---------- | ------------ | --------- |
| 2012-10-01 | HARRISON     |    31,000 |
| 2010-02-09 | SAN PATRICIO |    25,000 |
| 2010-08-08 | BRAZOS       |    15,000 |
| 2010-10-11 | HOCKLEY      |    10,000 |
| 2016-09-10 | NOLAN        |     8,600 |
| 2011-01-27 | GRIMES       |     6,911 |
| 2014-09-15 | HOUSTON      |     5,700 |
| 2013-12-17 | ECTOR        |     5,000 |
| 2017-01-30 | COLLIN       |     5,000 |
| 2015-05-20 | STEPHENS     |     3,900 |


Looking at the table above, the Bastrop spill at about 2,071 barrels would fall outside the top 10 gross losses since 2009 in the RRC data; the 10th-largest loss listed is 3,900 barrels. It would, however, be the worst spill of any type in Bastrop County.

## Spills by 'Facility'

In [25]:
# counting records by facility
pivot_facility = spills.pivot('Facility')
pivot_facility.print_table(max_column_width=None)

| Facility                     | Count |
| ---------------------------- | ----- |
| Tank Battery                 | 6,275 |
| Flow Line                    |   919 |
| Other                        |   362 |
| Valve                        |   354 |
| Pipe Line                    |   673 |
| Pump Station or Terminal     |    87 |
| Stuffing Box                 |   104 |
| Separator-Gun Barrel-Treater |   108 |
| Vehicle (Truck-Barge, Etc)   |    21 |


## Pipeline spills

In [26]:
# filter to pipeline spills
spills_pipelines = spills_ordered.where(lambda row: row['Facility'] == 'Pipe Line')

# count result
print('Number in spills_pipelines:\n{}'.format(
    spills_pipelines.aggregate(agate.Count())
))


Number in spills_pipelines:
673


In [27]:
print('Top pipeline spills in gross loss barrels:\n')

# selecting and printing top records
spills_pipelines.limit(10).select(['DateofLoss', 'County', 'GrossLoss']).print_table()

print('\nAt 2,071 barrels (87,000 gallons) the Bastrop spill would rank 7th worst among pipeline spills since 2009.')

Top pipeline spills in gross loss barrels:

| DateofLoss | County    | GrossLoss |
| ---------- | --------- | --------- |
| 2010-08-08 | BRAZOS    |    15,000 |
| 2016-09-10 | NOLAN     |     8,600 |
| 2011-01-27 | GRIMES    |     6,911 |
| 2017-01-30 | COLLIN    |     5,000 |
| 2012-01-30 | GAINES    |     2,572 |
| 2011-11-15 | GALVESTON |     2,290 |
| 2015-11-23 | ATASCOSA  |     2,000 |
| 2009-07-23 | MIDLAND   |     1,912 |
| 2010-08-09 | REAGAN    |     1,600 |
| 2014-09-30 | JACK      |     1,034 |

At 2,071 barrels (87,000 gallons) the Bastrop spill would rank 7th worst among pipeline spills since 2009.
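As a cross-check, the rank can be computed rather than read off the table. The sketch below counts how many of the listed gross losses exceed the Bastrop figure. `rank_among` is a hypothetical helper, and the values are copied by hand from the top-10 pipeline table above rather than pulled from `spills_pipelines`.

```python
def rank_among(value, existing):
    """Return the 1-based rank `value` would hold among `existing`, largest first."""
    return 1 + sum(1 for other in existing if other > value)

# GrossLoss values (barrels) copied from the top-10 pipeline table above
top_pipeline_losses = [15000, 8600, 6911, 5000, 2572, 2290, 2000, 1912, 1600, 1034]

bastrop_barrels = 87000 / 42  # about 2,071 barrels
print(rank_among(bastrop_barrels, top_pipeline_losses))  # 7
```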


## Bastrop spills listed

In [35]:
bastrop = spills_ordered.where(lambda row: row['County'] == 'BASTROP')

columns = [
    'Unique_ID',
    'DateofLoss',
    'OperatorNumber',
    'Facility',
    'GrossLoss'
]
bastrop.select(columns).print_table(max_column_width=None)

| Unique_ID | DateofLoss | OperatorNumber         | Facility     | GrossLoss |
| --------- | ---------- | ---------------------- | ------------ | --------- |
|    60,943 | 2012-10-17 | CHALKER OPERATING INC. | Tank Battery |        40 |
|    60,944 | 2012-09-02 | PLAINS PIPELINE L.P.   | Pipe Line    |         8 |


## Bulletproofing

Various checks to make sure the data is sound.

In [10]:
# checking number of records for spills
print('Number of records:\n{}'.format(
    spills.aggregate(agate.Count())
))


Number of records:
8903


In [11]:
# check to make sure Unique_ID is unique
pivot_UniqueID = spills.pivot('Unique_ID')
pivot_UniqueID.limit(5).print_table()

| Unique_ID | Count |
| --------- | ----- |
|   117,569 |     1 |
|   117,565 |     1 |
|   117,622 |     1 |
|   118,015 |     1 |
|   117,634 |     1 |
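The pivot above prints only the first five IDs, so on its own it cannot confirm that every `Unique_ID` appears once. A stricter check looks for any value seen more than once. This is a sketch using only the standard library. `find_duplicates` is a hypothetical helper, and the sample IDs are the five shown above, not the full column.

```python
from collections import Counter

def find_duplicates(values):
    """Return a dict of value -> count for every value seen more than once."""
    return {value: n for value, n in Counter(values).items() if n > 1}

# In the notebook, the IDs could be passed as spills.columns['Unique_ID'].values()
sample_ids = [117569, 117565, 117622, 118015, 117634]
print(find_duplicates(sample_ids))          # {}
print(find_duplicates([1, 2, 2, 3, 3, 3]))  # {2: 2, 3: 3}
```

An empty result for the full column would mean every ID is unique.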
