# Oil spill data

Downloaded from the [Railroad Commission of Texas](http://www.rrc.state.tx.us/oil-gas/compliance-enforcement/h-8).

We are looking at this data to put the Magellan Midstream Partners spill of July 13 in Bastrop County in context.

In [1]:
import agate
import agateexcel

# data comes from http://www.rrc.state.tx.us/oil-gas/compliance-enforcement/h-8
# `Magellan Midstream Partners`
# Spill July 13


In [2]:
# importing the data

# sets column types where needed
specified_type = {
    'MasterYear': agate.Text(),
    'Water': agate.Text(),
    'WaterName': agate.Text()
}
# importing each file as a table
h82009 = agate.Table.from_xlsx('raw-data/h8s-2009.xlsx', column_types=specified_type)
h82010 = agate.Table.from_xlsx('raw-data/h8s-2010.xlsx', column_types=specified_type)
h82011 = agate.Table.from_xlsx('raw-data/h8s-2011.xlsx', column_types=specified_type)
h82012 = agate.Table.from_xlsx('raw-data/h8s-2012.xlsx', column_types=specified_type)
h82013 = agate.Table.from_xlsx('raw-data/h8s-2013.xlsx', column_types=specified_type)
h82014 = agate.Table.from_xlsx('raw-data/h8s-2014.xlsx', column_types=specified_type)
h82015 = agate.Table.from_xlsx('raw-data/h8s-2015.xlsx', column_types=specified_type)
h82016 = agate.Table.from_xlsx('raw-data/h8s-2016.xlsx', column_types=specified_type)
h82017 = agate.Table.from_xlsx('raw-data/h8s-2017.xlsx', column_types=specified_type)
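
The nine near-identical import lines above could also be written as a loop. The following is only a sketch, not the notebook's actual code: `load_yearly_tables`, its `loader` parameter and `path_template` are hypothetical names introduced for illustration. The loader is passed in so the helper itself does not depend on agate.

```python
def load_yearly_tables(years, loader, path_template='raw-data/h8s-{}.xlsx'):
    """Return a dict mapping each year to whatever `loader` returns for its file.

    `loader` is any callable that takes a file path and returns a table.
    """
    return {year: loader(path_template.format(year)) for year in years}


# In this notebook the call might look like this (assumption, not executed here):
# tables = load_yearly_tables(
#     range(2009, 2018),
#     lambda path: agate.Table.from_xlsx(path, column_types=specified_type),
# )
```

A dict keyed by year keeps each table addressable (for example `tables[2016]`), which matters later because the 2016 and 2017 files need a different rename mapping.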

In [3]:
# renaming columns for consistency. The files use two header variations:
# 2009-2015 need one rename, 2016-2017 need two

column_names_one = {
    'Form Number': 'FormNumber',
}
column_names_two = {
    'Form Number': 'FormNumber',
    'Lease No': 'LeaseNumber',
}
h82009r = h82009.rename(column_names_one)
h82010r = h82010.rename(column_names_one)
h82011r = h82011.rename(column_names_one)
h82012r = h82012.rename(column_names_one)
h82013r = h82013.rename(column_names_one)
h82014r = h82014.rename(column_names_one)
h82015r = h82015.rename(column_names_one)
h82016r = h82016.rename(column_names_two)
h82017r = h82017.rename(column_names_two)

In [4]:
# combining into a single table
spills = agate.Table.merge([
        h82009r,
        h82010r,
        h82011r,
        h82012r,
        h82013r,
        h82014r,
        h82015r,
        h82016r,
        h82017r
    ])

# send to csv to use elsewhere
spills.to_csv('exports/spills.csv')

In [6]:
# look at column names
print(spills)

| column           | data_type |
| ---------------- | --------- |
| MasterYear       | Text      |
| FormNumber       | Number    |
| District         | Text      |
| OperatorNumber   | Text      |
| DateofLoss       | Date      |
| Type_Company_Cd  | Text      |
| County           | Text      |
| LeaseNumber      | Number    |
| Gasident         | Number    |
| CommNbr          | Number    |
| TypeLiquid       | Text      |
| GrossLoss        | Number    |
| Recovered        | Number    |
| NetLoss          | Number    |
| Facility         | Text      |
| CauseofLoss      | Text      |
| Remedy           | Text      |
| Corrected_Report | Boolean   |
| Water            | Text      |
| WaterName        | Text      |
| Location         | Text      |
| Remarks          | Text      |
| LeaseName        | Text      |
| FieldNumber      | Text      |
| Into_Water       | Boolean   |
| Unique_ID        | Number    |
| Gas ID           | Number    |



In [7]:
# counting records by facility
pivot_facility = spills.pivot('Facility')
pivot_facility.print_table()

| Facility             | Count |
| -------------------- | ----- |
| Tank Battery         | 6,275 |
| Flow Line            |   919 |
| Other                |   362 |
| Valve                |   354 |
| Pipe Line            |   673 |
| Pump Station or T... |    87 |
| Stuffing Box         |   104 |
| Separator-Gun Bar... |   108 |
| Vehicle (Truck-Ba... |    21 |


### Bulletproofing

A few checks to make sure the merged data is sound.

In [5]:
# checking number of records for spills
print('Number of records:\n{}'.format(
    spills.aggregate(agate.Count())
))


Number of records:
8903
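
One more cross-check that can be done with the numbers already printed in this notebook: the per-facility counts from the pivot should add up to the total record count. The sketch below simply copies those printed values into a dict (facility names are kept truncated as they appear in the output); `pivot_total_matches` is a hypothetical helper name.

```python
def pivot_total_matches(counts, total):
    """Return True if the pivot counts add up to the expected record total."""
    return sum(counts.values()) == total


# Values copied from the Facility pivot output and the record count above
facility_counts = {
    'Tank Battery': 6275,
    'Flow Line': 919,
    'Other': 362,
    'Valve': 354,
    'Pipe Line': 673,
    'Pump Station or T...': 87,
    'Stuffing Box': 104,
    'Separator-Gun Bar...': 108,
    'Vehicle (Truck-Ba...': 21,
}
total_records = 8903

print(pivot_total_matches(facility_counts, total_records))  # True
```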


In [8]:
# check that Unique_ID is unique (only the first five pivot rows are printed)
pivot_UniqueID = spills.pivot('Unique_ID')
pivot_UniqueID.limit(5).print_table()

| Unique_ID | Count |
| --------- | ----- |
|   117,569 |     1 |
|   117,565 |     1 |
|   117,622 |     1 |
|   118,015 |     1 |
|   117,634 |     1 |
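
Printing five pivot rows shows that those five IDs are unique, but it does not rule out duplicates further down the table. A stricter check, sketched here in plain Python, counts every value and reports any that occur more than once. `find_duplicates` is a hypothetical helper and the sample IDs are made up for illustration; in the notebook the input would presumably be the values of the `Unique_ID` column.

```python
from collections import Counter


def find_duplicates(values):
    """Return {value: count} for every value that appears more than once."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample with one repeated ID, for illustration only
sample_ids = [117569, 117565, 117622, 117565]
print(find_duplicates(sample_ids))  # {117565: 2}
```

An empty dict means every value is unique, so the check can be turned into an assertion such as `assert not find_duplicates(ids)`.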
