# Oil spill data

Downloaded from the [Railroad Commission of Texas](http://www.rrc.state.tx.us/oil-gas/compliance-enforcement/h-8) (RRC).

The H-8 Loss Reports for crude oil, gas well liquids or associated products are available for download as an Excel compressed zip file by year. If you have questions or need assistance, contact Field Operations at 512-463-6830 or OGFOPS@rrc.texas.gov.


These files are used to put the Magellan Midstream Partners spill of July 13 in Bastrop County in context. That spill has been reported as 87,528 gallons, or 2,084 barrels. The RRC told Andy Sevilla that the data is recorded in barrels.
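
The two reported figures agree if one barrel is taken as 42 U.S. gallons, which is the same factor this notebook uses later to compute `GrossLossGal`. A quick check of the arithmetic (the function name here is only for illustration):

```python
GALLONS_PER_BARREL = 42  # one oil barrel is 42 U.S. gallons


def barrels_to_gallons(barrels):
    """Convert oil barrels to U.S. gallons."""
    return barrels * GALLONS_PER_BARREL


print(barrels_to_gallons(2084))  # 87528
```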

#### Our coverage
- http://www.statesman.com/news/local/new-data-shows-bastrop-county-oil-spill-leaked-000-gallons/Y8YK4xM4uPE37actcJi1HP/
- http://www.statesman.com/news/local/rolled-over-wheeler-spills-oil-onto-bastrop-roadway/PMhcfaqCXQ9U4V6FCX36WL/
- http://www.mystatesman.com/news/opinion/commentary-what-the-bastrop-oil-spill-says-about-effective-regulation/8wkZRUj2ts2mXR8BZhDkQM/


## Data prep

Importing the yearly files, setting data types, renaming columns and stacking everything into a single table.

In [1]:
import agate
import agateexcel
import re
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Bastrop spill size as reported, kept as numbers so they can be compared
bastrop_barrels = 2084
bastrop_gallons = 87528

In [3]:
# sets column types where needed
specified_type = {
    'MasterYear': agate.Text(),
    'Water': agate.Text(),
    'WaterName': agate.Text()
}
# importing each file as a table
# should be refactored as a loop
h82009 = agate.Table.from_xlsx('raw-data/h8s-2009.xlsx', column_types=specified_type)
h82010 = agate.Table.from_xlsx('raw-data/h8s-2010.xlsx', column_types=specified_type)
h82011 = agate.Table.from_xlsx('raw-data/h8s-2011.xlsx', column_types=specified_type)
h82012 = agate.Table.from_xlsx('raw-data/h8s-2012.xlsx', column_types=specified_type)
h82013 = agate.Table.from_xlsx('raw-data/h8s-2013.xlsx', column_types=specified_type)
h82014 = agate.Table.from_xlsx('raw-data/h8s-2014.xlsx', column_types=specified_type)
h82015 = agate.Table.from_xlsx('raw-data/h8s-2015.xlsx', column_types=specified_type)
h82016 = agate.Table.from_xlsx('raw-data/h8s-2016.xlsx', column_types=specified_type)
h82017 = agate.Table.from_xlsx('raw-data/h8s-2017.xlsx', column_types=specified_type)

In [4]:
# renaming columns in some files for consistency.
# this could/should be refactored as a loop

column_names = {
    'Form Number': 'FormNumber',
    'Lease No': 'LeaseNumber',
    'Gas ID': 'Gasident'
}
h82009r = h82009.rename(column_names)
h82010r = h82010.rename(column_names)
h82011r = h82011.rename(column_names)
h82012r = h82012.rename(column_names)
h82013r = h82013.rename(column_names)
h82014r = h82014.rename(column_names)
h82015r = h82015.rename(column_names)
h82016r = h82016.rename(column_names)
h82017r = h82017.rename(column_names)

# combining into a single table
spills = agate.Table.merge([
        h82009r,
        h82010r,
        h82011r,
        h82012r,
        h82013r,
        h82014r,
        h82015r,
        h82016r,
        h82017r
    ])
# add a GrossLossGal column that converts barrels to gallons (42 gallons per barrel)
spills = spills.compute([
    ('GrossLossGal', agate.Formula(agate.Number(), lambda r: r['GrossLoss'] * 42))
])
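
The comments above note that the import and rename steps should be a loop. One possible shape for that refactor is sketched below. It assumes the files keep the `raw-data/h8s-YYYY.xlsx` naming used above; `h8_paths` and `load_spills` are hypothetical helper names, not part of agate.

```python
# Rename map copied from the cell above.
COLUMN_NAMES = {
    'Form Number': 'FormNumber',
    'Lease No': 'LeaseNumber',
    'Gas ID': 'Gasident',
}


def h8_paths(first_year=2009, last_year=2017, folder='raw-data'):
    """Build the list of yearly H-8 file paths (hypothetical helper)."""
    return ['{}/h8s-{}.xlsx'.format(folder, year)
            for year in range(first_year, last_year + 1)]


def load_spills(paths, column_types):
    """Load each file, rename its columns and stack the tables (hypothetical helper)."""
    # Imported here so the path helper can be used without agate installed.
    import agate
    import agateexcel  # registers Table.from_xlsx on import

    tables = [
        agate.Table.from_xlsx(path, column_types=column_types).rename(COLUMN_NAMES)
        for path in paths
    ]
    return agate.Table.merge(tables)


print(h8_paths()[0])  # raw-data/h8s-2009.xlsx
```

With this in place, `spills = load_spills(h8_paths(), specified_type)` would stand in for the nine import lines and nine rename lines, and the `compute` step for `GrossLossGal` would follow unchanged.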

In [5]:
# send to csv to use elsewhere
spills.to_csv('exports/spills.csv')

In [6]:
# look at column names
print(spills)

| column           | data_type |
| ---------------- | --------- |
| MasterYear       | Text      |
| FormNumber       | Number    |
| District         | Text      |
| OperatorNumber   | Text      |
| DateofLoss       | Date      |
| Type_Company_Cd  | Text      |
| County           | Text      |
| LeaseNumber      | Number    |
| Gasident         | Number    |
| CommNbr          | Number    |
| TypeLiquid       | Text      |
| GrossLoss        | Number    |
| Recovered        | Number    |
| NetLoss          | Number    |
| Facility         | Text      |
| CauseofLoss      | Text      |
| Remedy           | Text      |
| Corrected_Report | Boolean   |
| Water            | Text      |
| WaterName        | Text      |
| Location         | Text      |
| Remarks          | Text      |
| LeaseName        | Text      |
| FieldNumber      | Text      |
| Into_Water       | Boolean   |
| Unique_ID        | Number    |
| GrossLossGal     | Number    |



## Overall spills by GrossLoss
These are the largest spills by gross loss, measured in barrels.

In [7]:
# checking number of records for spills
print('Total number of records:\n{}'.format(
    spills.aggregate(agate.Count())
))


Total number of records:
8903


In [8]:
# orders the spills table
spills_ordered = spills.order_by(lambda row: row['GrossLoss'], reverse=True)
# prints just to show the top in the list
spills_ordered.select(['DateofLoss', 'County', 'GrossLoss']).limit(20).print_table()

| DateofLoss | County       | GrossLoss |
| ---------- | ------------ | --------- |
| 2012-10-01 | HARRISON     |    31,000 |
| 2010-02-09 | SAN PATRICIO |    25,000 |
| 2010-08-08 | BRAZOS       |    15,000 |
| 2010-10-11 | HOCKLEY      |    10,000 |
| 2016-09-10 | NOLAN        |     8,600 |
| 2011-01-27 | GRIMES       |     6,911 |
| 2014-09-15 | HOUSTON      |     5,700 |
| 2013-12-17 | ECTOR        |     5,000 |
| 2017-01-30 | COLLIN       |     5,000 |
| 2015-05-20 | STEPHENS     |     3,900 |
| 2013-05-20 | STARR        |     3,500 |
| 2009-06-17 | SCURRY       |     3,416 |
| 2015-06-20 | FREESTONE    |     3,300 |
| 2015-04-07 | TERRY        |     3,300 |
| 2016-04-06 | LOVING       |     3,200 |
| 2012-06-28 | DIMMIT       |     3,000 |
| 2011-04-01 | HOWARD       |     2,583 |
| 2012-01-30 | GAINES       |     2,572 |
| 2015-05-11 | STEPHENS     |     2,500 |
| 2009-01-29 | NUECES       |     2,400 |


In [9]:
# count the number of spills greater than 2,084 barrels (the Bastrop spill)
spills_bigger = spills_ordered.where(lambda row: row['GrossLoss'] > bastrop_barrels)

print('Looking at all spills by GrossLoss, the Bastrop spill ranks {} in list of worst spills.'.format(
        spills_bigger.aggregate(agate.Count()) + 1
    ))

Looking at all spills by GrossLoss, the Bastrop spill ranks 24 in list of worst spills.


## Overall spills by NetLoss
Reordering all spills by NetLoss to account for barrels recovered. I'm not sure if there was oil recovered from the Bastrop spill, though.


In [10]:
spills_netloss = spills.order_by(lambda row: row['NetLoss'], reverse=True)
spills_netloss.select(['DateofLoss', 'County', 'NetLoss']).print_table()

| DateofLoss | County    | NetLoss |
| ---------- | --------- | ------- |
| 2012-10-01 | HARRISON  |  31,000 |
| 2016-09-10 | NOLAN     |   6,725 |
| 2014-09-15 | HOUSTON   |   5,698 |
| 2013-12-17 | ECTOR     |   5,000 |
| 2013-05-20 | STARR     |   3,500 |
| 2011-01-27 | GRIMES    |   3,304 |
| 2012-06-28 | DIMMIT    |   3,000 |
| 2011-04-01 | HOWARD    |   2,583 |
| 2015-06-15 | KENT      |   2,063 |
| 2013-10-28 | REAGAN    |   2,000 |
| 2009-06-17 | SCURRY    |   1,835 |
| 2014-09-27 | CULBERSON |   1,700 |
| 2010-08-09 | REAGAN    |   1,600 |
| 2012-01-30 | GAINES    |   1,472 |
| 2015-02-01 | FORT BEND |   1,452 |
| 2013-12-14 | ECTOR     |   1,430 |
| 2013-06-30 | UPTON     |   1,352 |
| 2015-11-23 | ATASCOSA  |   1,213 |
| 2015-01-07 | WEBB      |   1,130 |
| 2014-11-04 | KLEBERG   |   1,100 |
|        ... | ...       |     ... |
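
Comparing this table with the GrossLoss table gives a rough idea of how much was recovered from the biggest spills. The sketch below assumes that NetLoss is GrossLoss minus Recovered (the column names suggest this, but the RRC files are not checked for it here) and that rows sharing a date and county in the two tables are the same record.

```python
# (DateofLoss, County) -> barrels, copied from the GrossLoss and NetLoss tables
gross_loss = {
    ('2012-10-01', 'HARRISON'): 31000,
    ('2016-09-10', 'NOLAN'): 8600,
    ('2011-01-27', 'GRIMES'): 6911,
    ('2014-09-15', 'HOUSTON'): 5700,
}
net_loss = {
    ('2012-10-01', 'HARRISON'): 31000,
    ('2016-09-10', 'NOLAN'): 6725,
    ('2011-01-27', 'GRIMES'): 3304,
    ('2014-09-15', 'HOUSTON'): 5698,
}

# Assumption: recovered = gross - net
implied_recovered = {key: gross_loss[key] - net_loss[key] for key in gross_loss}

for (date, county), barrels in sorted(implied_recovered.items()):
    print(date, county, barrels)
```

Under those assumptions the Harrison County spill shows nothing recovered, while roughly half of the Grimes County spill was.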


## Spills by 'Facility'

In [11]:
spills_group_facility = spills.group_by('Facility')
spills_facility_table = spills_group_facility.aggregate([
        ('spill_cnt', agate.Count()),
        ('spill_sum', agate.Sum('GrossLoss')),
        ('spill_sum_gal', agate.Sum('GrossLossGal')),
        ('spill_avg', agate.Mean('GrossLoss')),
        ('spill_med', agate.Median('GrossLoss'))
    ]
)
spills_facility_table.print_table()

| Facility             | spill_cnt | spill_sum | spill_sum_gal | spill_avg | spill_med |
| -------------------- | --------- | --------- | ------------- | --------- | --------- |
| Tank Battery         |     6,275 |   571,558 |    24,005,436 |   91.085… |      33.0 |
| Flow Line            |       919 |    62,397 |     2,620,674 |   67.897… |      15.0 |
| Other                |       362 |    29,764 |     1,250,088 |   82.221… |      30.0 |
| Valve                |       354 |    20,163 |       846,846 |   56.958… |      15.0 |
| Pipe Line            |       673 |    96,116 |     4,036,872 |  142.817… |      20.0 |
| Pump Station or T... |        87 |    10,729 |       450,618 |  123.322… |      25.0 |
| Stuffing Box         |       104 |     2,901 |       121,842 |   27.894… |      13.5 |
| Separator-Gun Bar... |       108 |     6,595 |       276,990 |   61.065… |      21.0 |
| Vehicle (Truck-Ba... |        21 |     1,105 |        46,410 |   52.619… |      20.0 |


## Pipeline spills

In [12]:
# filter to pipeline spills
spills_pipelines = spills_ordered.where(lambda row: row['Facility'] == 'Pipe Line')

pipeline_spill_count = spills_pipelines.aggregate(agate.Count())

# count result
print('Number in spills_pipelines:\n{}'.format(
    pipeline_spill_count
))



Number in spills_pipelines:
673


### Pipeline spills over time

In [96]:
spills_pipeline_group_year = spills_pipelines.group_by('MasterYear')
spills_pipeline_year_table = spills_pipeline_group_year.aggregate([
        ('spill_cnt', agate.Count()),
        ('spill_sum', agate.Sum('GrossLoss')),
        ('spill_sum_gal', agate.Sum('GrossLossGal')),
        ('spill_avg', agate.Mean('GrossLoss')),
        ('spill_med', agate.Median('GrossLoss'))
    ]
)
# send to file for later
spills_pipeline_year_table.order_by('MasterYear').to_csv('exports/spills_pipeline_year_table.csv')
# print table
spills_pipeline_year_table.order_by('MasterYear').print_table()

| MasterYear | spill_cnt | spill_sum | spill_sum_gal |  spill_avg | spill_med |
| ---------- | --------- | --------- | ------------- | ---------- | --------- |
| 2009       |        66 |     6,036 |       253,512 |    91.455… |        15 |
| 2010       |        77 |    21,362 |       897,204 |   277.429… |        23 |
| 2011       |        70 |    13,823 |       580,566 |   197.471… |        20 |
| 2012       |       109 |    11,137 |       467,754 |   102.174… |        24 |
| 2013       |       107 |    11,674 |       490,308 |   109.103… |        28 |
| 2014       |       114 |     7,982 |       335,244 |    70.018… |        20 |
| 2015       |        80 |     7,106 |       298,452 |    88.825… |        17 |
| 2016       |        46 |    11,964 |       502,488 |   260.087… |        25 |
| 2017       |         4 |     5,032 |       211,344 | 1,258.000… |        14 |
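
The gap between `spill_avg` and `spill_med` shows how much a few large spills pull the average up. The sketch below recomputes the mean from the count and sum in the table and compares it with the median. The 2017 row stands out because it covers only four spills, and the top pipeline spills list includes a 5,000-barrel spill dated 2017-01-30, which likely accounts for most of that year's total.

```python
# MasterYear -> (spill_cnt, spill_sum, spill_med), copied from the table above
pipeline_years = {
    '2009': (66, 6036, 15),
    '2010': (77, 21362, 23),
    '2011': (70, 13823, 20),
    '2012': (109, 11137, 24),
    '2013': (107, 11674, 28),
    '2014': (114, 7982, 20),
    '2015': (80, 7106, 17),
    '2016': (46, 11964, 25),
    '2017': (4, 5032, 14),
}


def mean_to_median(year):
    """Ratio of the mean spill size to the median spill size for a year."""
    count, total, median = pipeline_years[year]
    return (total / count) / median


for year in sorted(pipeline_years):
    print('{} mean is {:.1f}x the median'.format(year, mean_to_median(year)))

most_skewed = max(pipeline_years, key=mean_to_median)
print(most_skewed)
```

For comparisons across years, the median is the safer figure to quote.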


### Pipeline spills by year

In [14]:
spills_pipeline_year_table.order_by('MasterYear').print_bars('MasterYear', 'spill_cnt', width=60)

MasterYear spill_cnt
2009              66 ▓░░░░░░░░░░░░░                         
2010              77 ▓░░░░░░░░░░░░░░░                       
2011              70 ▓░░░░░░░░░░░░░                         
2012             109 ▓░░░░░░░░░░░░░░░░░░░░░                 
2013             107 ▓░░░░░░░░░░░░░░░░░░░░                  
2014             114 ▓░░░░░░░░░░░░░░░░░░░░░░                
2015              80 ▓░░░░░░░░░░░░░░░                       
2016              46 ▓░░░░░░░░░                             
2017               4 ▓░                                     
                     +---------+--------+--------+---------+
                     0        50       100      150      200


### Pipeline barrels spilled by year

In [37]:
print('Barrels spilled from pipelines per year:\n')
spills_pipeline_year_table.order_by('MasterYear').print_bars('MasterYear', 'spill_sum', width=60)

Barrels spilled from pipelines per year:

MasterYear spill_sum
2009           6,036 ▓░░░░░░░░                              
2010          21,362 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░           
2011          13,823 ▓░░░░░░░░░░░░░░░░░░                    
2012          11,137 ▓░░░░░░░░░░░░░░                        
2013          11,674 ▓░░░░░░░░░░░░░░░                       
2014           7,982 ▓░░░░░░░░░░                            
2015           7,106 ▓░░░░░░░░░                             
2016          11,964 ▓░░░░░░░░░░░░░░░                       
2017           5,032 ▓░░░░░░                                
                     +---------+--------+--------+---------+
                     0       7,500   15,000   22,500  30,000


In [36]:
print('Gallons spilled from pipelines per year:\n')
spills_pipeline_year_table.order_by('MasterYear').print_bars('MasterYear', 'spill_sum_gal', width=60)


Gallons spilled from pipelines per year:

MasterYear spill_sum_gal
2009             253,512 ▓░░░░░░░░░░                        
2010             897,204 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
2011             580,566 ▓░░░░░░░░░░░░░░░░░░░░░░            
2012             467,754 ▓░░░░░░░░░░░░░░░░░░                
2013             490,308 ▓░░░░░░░░░░░░░░░░░░░               
2014             335,244 ▓░░░░░░░░░░░░░                     
2015             298,452 ▓░░░░░░░░░░░                       
2016             502,488 ▓░░░░░░░░░░░░░░░░░░░               
2017             211,344 ▓░░░░░░░░                          
                         +-------+--------+----------------+
                         0    225,000  450,000       900,000


### Top pipeline spills

In [16]:
print('Top pipeline spills in gross loss barrels:\n')

# selecting and printing top records
spills_pipelines.limit(10).select(['DateofLoss', 'County', 'GrossLoss']).print_table()

# count the number of pipeline spills greater than 2,084 barrels (the Bastrop spill)
pipeline_spills_bigger = spills_pipelines.where(lambda row: row['GrossLoss'] > bastrop_barrels)

print('\nLooking at all PIPELINE spills by GrossLoss, the Bastrop '
      'spill ranks {} out of {} in list of worst spills.'.format(
        pipeline_spills_bigger.aggregate(agate.Count()) + 1,
        pipeline_spill_count
    ))

Top pipeline spills in gross loss barrels:

| DateofLoss | County    | GrossLoss |
| ---------- | --------- | --------- |
| 2010-08-08 | BRAZOS    |    15,000 |
| 2016-09-10 | NOLAN     |     8,600 |
| 2011-01-27 | GRIMES    |     6,911 |
| 2017-01-30 | COLLIN    |     5,000 |
| 2012-01-30 | GAINES    |     2,572 |
| 2011-11-15 | GALVESTON |     2,290 |
| 2015-11-23 | ATASCOSA  |     2,000 |
| 2009-07-23 | MIDLAND   |     1,912 |
| 2010-08-09 | REAGAN    |     1,600 |
| 2014-09-30 | JACK      |     1,034 |

Looking at all PIPELINE spills by GrossLoss, the Bastrop spill ranks 7 out of 673 in list of worst spills.
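
The rank is the number of larger spills plus one. Because the table is sorted and its seventh row is already below 2,084 barrels, the ten values shown are enough to reproduce it by hand:

```python
bastrop_barrels = 2084

# GrossLoss values copied from the top pipeline spills table above
top_pipeline_gross_loss = [15000, 8600, 6911, 5000, 2572, 2290, 2000, 1912, 1600, 1034]

bigger = [barrels for barrels in top_pipeline_gross_loss if barrels > bastrop_barrels]
rank = len(bigger) + 1

print(rank)  # 7
```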


### Severity of pipeline spills

In [93]:
# histogram bins by GrossLossGal (gallons): ten bins from 0 to 10,000
spills_pipeline_bins = spills_pipelines.bins('GrossLossGal', 10, 0, 10000)
spills_pipeline_bins.print_bars('GrossLossGal', 'Count', width=60)
# export bins for graphic
spills_pipeline_bins.to_csv('exports/spills_pipeline_bins.csv')

GrossLossGal      Count
[0 - 1,000)         357 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░    
[1,000 - 2,000)     106 ▓░░░░░░░░░                          
[2,000 - 3,000)      56 ▓░░░░░                              
[3,000 - 4,000)      28 ▓░░                                 
[4,000 - 5,000)      17 ▓░                                  
[5,000 - 6,000)      14 ▓░                                  
[6,000 - 7,000)      17 ▓░                                  
[7,000 - 8,000)       3 ▓                                   
[8,000 - 9,000)       8 ▓░                                  
[9,000 - 700,000]    67 ▓░░░░░░                             
                        +--------+--------+-------+--------+
                        0       100      200     300     400
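
The bins are 1,000 gallons wide, and the last label (`[9,000 - 700,000]`) suggests that anything above the requested range is folded into the final bin. The plain-Python sketch below mimics that behavior to show where a given spill would land. It is an illustration of the idea, not agate's implementation, and `bin_index` is a hypothetical helper.

```python
def bin_index(value, start=0, end=10000, count=10):
    """Return the 0-based bin for value; values at or above `end` go in the last bin."""
    width = (end - start) / count
    index = int((value - start) // width)
    return min(max(index, 0), count - 1)


bastrop_gallons = 87528

print(bin_index(999))              # 0, the [0 - 1,000) bin
print(bin_index(1000))             # 1, the [1,000 - 2,000) bin
print(bin_index(bastrop_gallons))  # 9, the last bin
```

So the Bastrop spill sits in the catch-all bin alongside 67 recorded pipeline spills, while more than half of all pipeline spills fall under 1,000 gallons (about 24 barrels).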


## Looking by county
### Bastrop spills

In [18]:
# list of columns to print when looking at county results.
county_columns = [
    'County',
    'DateofLoss',
    'OperatorNumber',
    'Facility',
    'GrossLoss',
    'Unique_ID'
]

bastrop = spills_ordered.where(lambda row: row['County'] == 'BASTROP')

bastrop.select(county_columns).print_table(max_column_width=None)

| County  | DateofLoss | OperatorNumber         | Facility     | GrossLoss | Unique_ID |
| ------- | ---------- | ---------------------- | ------------ | --------- | --------- |
| BASTROP | 2012-10-17 | CHALKER OPERATING INC. | Tank Battery |        40 |    60,943 |
| BASTROP | 2012-09-02 | PLAINS PIPELINE L.P.   | Pipe Line    |         8 |    60,944 |


### Travis spills

In [19]:
travis = spills_ordered.where(lambda row: row['County'] == 'TRAVIS')

travis.select(county_columns).print_table(max_column_width=None)

| County | DateofLoss | OperatorNumber | Facility | GrossLoss | Unique_ID |
| ------ | ---------- | -------------- | -------- | --------- | --------- |


### Galveston spills

In [20]:
galveston = spills_ordered.where(lambda row: row['County'] == 'GALVESTON')

galveston.select(county_columns).order_by('DateofLoss').print_table(max_column_width=None)

| County    | DateofLoss | OperatorNumber                   | Facility     | GrossLoss | Unique_ID |
| --------- | ---------- | -------------------------------- | ------------ | --------- | --------- |
| GALVESTON | 2009-05-21 | HITCHCOCK SWD CORPORATION        | Other        |        10 |   117,708 |
| GALVESTON | 2009-07-23 | ERG RESOURCES, L.L.C.            | Tank Battery |       600 |   117,813 |
| GALVESTON | 2010-12-17 | SANDRIDGE ONSHORE, LLC           | Pipe Line    |         1 |   118,760 |
| GALVESTON | 2010-12-26 | HAWK FIELD SERVICES, LLC         | Pipe Line    |         6 |   118,761 |
| GALVESTON | 2011-10-27 | ENTERPRISE CRUDE PIPELINE LLC    | Tank Battery |         6 |   119,631 |
| GALVESTON | 2011-10-30 | MARATHON PETROLEUM COMPANY       | Pipe Line    |        50 |   119,819 |
| GALVESTON | 2011-11-15 | ENTERPRISE CRUDE PIPELINE LLC    | Pipe Line    |     2,290 |   119,818 |
| GALVESTON | 2012-03-21 | DUNE OPERATING COMPANY           | Tank Battery |     2,300 |   

### Harris spills

In [71]:
harris = spills_ordered.where(lambda row: row['County'] == 'HARRIS')
harris.select(county_columns).order_by('DateofLoss').print_table(max_column_width=None)

| County | DateofLoss | OperatorNumber                   | Facility     | GrossLoss | Unique_ID |
| ------ | ---------- | -------------------------------- | ------------ | --------- | --------- |
| HARRIS | 2009-02-27 | CLAY, JERRY H.                   | Pipe Line    |        14 |   117,621 |
| HARRIS | 2009-04-10 | BOMCO USA, INC.                  | Tank Battery |       100 |   117,803 |
| HARRIS | 2010-01-02 | GAITHER PETROLEUM CORPORATION    | Tank Battery |        25 |   118,245 |
| HARRIS | 2010-05-17 | ETOCO, L.P.                      | Tank Battery |        30 |   118,517 |
| HARRIS | 2010-06-14 | PLAINS MARKETING, L.P.           | Tank Battery |       100 |   118,512 |
| HARRIS | 2012-01-05 | HOLLIMON OIL CORPORATION         | Tank Battery |        36 |    59,782 |
| HARRIS | 2012-04-03 | NOXXE OIL AND GAS, LLC           | Tank Battery |        12 |    60,013 |
| HARRIS | 2012-04-13 | TANDEM ENERGY CORPORATION        | Pipe Line    |         1 |    60,016 |
| HARRIS | 2012-09-2

### Feb 2011 spills

In [78]:
# spills_2011 = spills.where(lambda row: row['DateofLoss'].year == 2011)
# spills_2011_02 = spills_2011.where(lambda row: row['DateofLoss'].month == 2)
# spills_2011_02.order_by('DateofLoss').select(county_columns).print_table(max_rows=None)

## Spills per year

In [94]:
spills_group_year = spills.group_by('MasterYear')
spills_year_table = spills_group_year.aggregate([
        ('spill_cnt', agate.Count()),
        ('spill_sum', agate.Sum('GrossLoss')),
        ('spill_avg', agate.Mean('GrossLoss')),
        ('spill_med', agate.Median('GrossLoss')),
        ('spill_sum_gal', agate.Sum('GrossLossGal'))
    ]
)
spills_year_table.print_table()

| MasterYear | spill_cnt | spill_sum | spill_avg | spill_med | spill_sum_gal |
| ---------- | --------- | --------- | --------- | --------- | ------------- |
| 2009       |       675 |    55,996 |   82.957… |        30 |     2,351,832 |
| 2010       |       796 |   109,203 |  137.190… |        32 |     4,586,526 |
| 2011       |       869 |    74,933 |   86.229… |        32 |     3,147,186 |
| 2012       |     1,236 |   129,664 |  104.906… |        30 |     5,445,888 |
| 2013       |     1,354 |   118,136 |   87.250… |        30 |     4,961,712 |
| 2014       |     1,405 |   105,388 |   75.009… |        21 |     4,426,296 |
| 2015       |     1,485 |   120,266 |   80.987… |        25 |     5,051,172 |
| 2016       |     1,014 |    78,412 |   77.329… |        25 |     3,293,304 |
| 2017       |        69 |     9,330 |  135.217… |        23 |       391,860 |


### Number of spills

In [24]:
spills_year_table.print_bars('MasterYear', 'spill_cnt', width=60)

MasterYear spill_cnt
2009             675 ▓░░░░░░░░░░░░░                         
2010             796 ▓░░░░░░░░░░░░░░░                       
2011             869 ▓░░░░░░░░░░░░░░░░░                     
2012           1,236 ▓░░░░░░░░░░░░░░░░░░░░░░░               
2013           1,354 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░            
2014           1,405 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░           
2015           1,485 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░          
2016           1,014 ▓░░░░░░░░░░░░░░░░░░░                   
2017              69 ▓░                                     
                     +---------+--------+--------+---------+
                     0        500     1,000    1,500   2,000


### Barrels per year spilled

In [25]:
spills_year_table.print_bars('MasterYear', 'spill_sum', width=60)

MasterYear spill_sum
2009          55,996 ▓░░░░░░░░░░░                           
2010         109,203 ▓░░░░░░░░░░░░░░░░░░░░░                 
2011          74,933 ▓░░░░░░░░░░░░░░                        
2012         129,664 ▓░░░░░░░░░░░░░░░░░░░░░░░░░             
2013         118,136 ▓░░░░░░░░░░░░░░░░░░░░░░                
2014         105,388 ▓░░░░░░░░░░░░░░░░░░░░                  
2015         120,266 ▓░░░░░░░░░░░░░░░░░░░░░░░               
2016          78,412 ▓░░░░░░░░░░░░░░░                       
2017           9,330 ▓░░                                    
                     +---------+--------+------------------+
                     0      50,000   100,000         200,000


## Searching for the company
Looking for Magellan Midstream Partners, or anything like it, in any of the text columns.

In [26]:
# this function allows me to input a column name
# and get distinct matching results
def search_column(column_name):
    result = spills.where(lambda row: re.search(
        'MAGELLAN|MIDSTREAM', str(row[column_name])
    ))
    
    print('Searching the \'{}\' column for MAGELLAN or MIDSTREAM \
    and found these distinct names. (If blank, there are none):'.format(
        column_name
        ))
    
    return result.columns[column_name].values_distinct()

In [27]:
search_column('OperatorNumber')

Searching the 'OperatorNumber' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


('DCP MIDSTREAM, LP',
 'CHESAPEAKE MIDSTREAM PARTNERSLLC',
 'VITOL MIDSTREAM PIPELINE LLC',
 'GAVILON MIDSTREAM ENERGY, LLC',
 'VITOL MIDSTREAM LLC',
 'ENLINK MIDSTREAM SERVICES, LLC',
 'TARGA MIDSTREAM SERVICES LP',
 'TRINITY RIVER MIDSTREAM, LLC',
 'EFS MIDSTREAM LLC',
 'DCP MIDSTREAM MARKETING, LP',
 'DCP MIDSTREAM MARKETING, LLC',
 'TARGA MIDSTREAM SERVICES LLC',
 'TALCO MIDSTREAM ASSETS, LTD.',
 'PARADIGM MIDSTREAM SVCS-ST, LLC',
 'NAVARRO MIDSTREAM SERVICES, LLC',
 'ENABLE MIDSTREAM PARTNERS, LP',
 'TARGA SOUTHTEX MIDSTREAM CO LP',
 'CCS MIDSTREAM SERVICES, LLC')

In [28]:
search_column('LeaseName')

Searching the 'LeaseName' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


()

In [29]:
search_column('FieldNumber')

Searching the 'FieldNumber' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


()

In [30]:
search_column('Location')

Searching the 'Location' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


('ROUND TOP BOOSTER IS A NATURAL GAS, MIDSTREAM COMPRESSOR STATION',)

In [31]:
search_column('Remarks')

Searching the 'Remarks' column for MAGELLAN or MIDSTREAM     and found these distinct names. (If blank, there are none):


()
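
Because the pattern matches either word, every hit above comes from `MIDSTREAM` alone: none of the returned names contains `MAGELLAN`. A narrower, case-insensitive pattern makes that explicit. The sketch below runs both patterns over a few of the operator names returned above.

```python
import re

# A few operator names copied from the search results above
operators = [
    'DCP MIDSTREAM, LP',
    'VITOL MIDSTREAM PIPELINE LLC',
    'ENABLE MIDSTREAM PARTNERS, LP',
    'CCS MIDSTREAM SERVICES, LLC',
]

either_word = re.compile('MAGELLAN|MIDSTREAM')
magellan_only = re.compile('MAGELLAN', re.IGNORECASE)

either_hits = [name for name in operators if either_word.search(name)]
magellan_hits = [name for name in operators if magellan_only.search(name)]

print(len(either_hits))  # 4
print(magellan_hits)     # []
```

Going by these searches, Magellan does not appear by name in any of the searched columns.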

### Bulletproofing

Various checks to make sure the data is sound.

In [32]:
# check to make sure Unique_ID is unique (only the first five rows of the pivot are shown)
pivot_UniqueID = spills.pivot('Unique_ID')
pivot_UniqueID.limit(5).print_table()

| Unique_ID | Count |
| --------- | ----- |
|   117,569 |     1 |
|   117,565 |     1 |
|   117,622 |     1 |
|   118,015 |     1 |
|   117,634 |     1 |
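
Looking at five rows of the pivot does not prove that every ID is unique. A fuller check counts each value and lists any that occur more than once. The sketch below does this in plain Python on the five IDs shown above; `duplicated` is a hypothetical helper.

```python
from collections import Counter


def duplicated(values):
    """Return the values that occur more than once, in sorted order."""
    counts = Counter(values)
    return sorted(value for value, n in counts.items() if n > 1)


sample_ids = [117569, 117565, 117622, 118015, 117634]  # from the pivot above

print(duplicated(sample_ids))             # []
print(duplicated(sample_ids + [117569]))  # [117569]
```

On the agate side, filtering the pivot with something like `pivot_UniqueID.where(lambda row: row['Count'] > 1)` and checking that it is empty should give the same answer for the full table.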


In [33]:
# checks this 'Corrected_Report' column, which is only in some files, to see if any are true.
# None are
spills_corrected_pivot = spills.pivot('Corrected_Report', 'MasterYear')
spills_corrected_pivot.print_table(max_columns=None)

| Corrected_Report | 2009 | 2010 | 2011 |  2014 |  2012 |  2013 |  2015 |  2016 | 2017 |
| ---------------- | ---- | ---- | ---- | ----- | ----- | ----- | ----- | ----- | ---- |
|            False |  675 |  796 |  869 | 1,405 |     0 |     0 |     0 |     0 |    0 |
|                  |    0 |    0 |    0 |     0 | 1,236 | 1,354 | 1,485 | 1,014 |   69 |
