# Data normalization assignment
In this assignment, you must take a file from the Nebraska Department of Environmental Quality and make it useful. I want to know how many leaking underground storage tanks there are in each city in Nebraska. 

Yes, the acronym for leaking underground storage tanks is LUST.

To do this, you will need to:
1. [Get the file from the DEQ](http://www.deq.state.ne.us/lustsurf.nsf/pages/sssi). The file you want is called spillfac.csv, but keep this page handy because it has some filter conditions you're going to need.
2. The file that comes from the state is not UTF-8. Follow the walkthrough and use Excel and csvkit to zap the non-UTF-8 characters.
3. Normalize the data using OpenRefine. Specifically, normalize the owner company (OWNCO) and the city the tank is in (SPCITY).
4. Export your newly cleaned data to a new CSV file.
5. Import the cleaned data into agate.
6. Filter out any tanks that aren't actually leaking (see the documentation on the page where you downloaded the file).
7. Group the rows by OWNCO and count them.
8. Sort the counts.
9. Print the top 20 to the screen.
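
If you want to check the encoding step outside Excel, here is a minimal Python sketch of the same idea. It is not the walkthrough's method. It decodes the file as UTF-8 and silently drops any bytes that aren't valid UTF-8. The function names are hypothetical.

```python
def drop_non_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently dropping any invalid byte sequences."""
    # errors='replace' would insert U+FFFD instead, if you'd rather see where bytes were lost.
    return raw.decode('utf-8', errors='ignore')


def strip_non_utf8(src_path: str, dst_path: str) -> int:
    """Write a UTF-8-only copy of src_path to dst_path; return the number of bytes dropped."""
    with open(src_path, 'rb') as src:
        raw = src.read()
    text = drop_non_utf8(raw)
    # newline='' writes line endings exactly as they appeared in the source.
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
    # Valid UTF-8 re-encodes to identical bytes, so the difference is what was dropped.
    return len(raw) - len(text.encode('utf-8'))
```

For example, `strip_non_utf8('spillfac.csv', 'spillfac-utf8.csv')` writes a cleaned copy to a hypothetical output file and returns how many bytes it threw away. A result of 0 means the file was already valid UTF-8.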

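OpenRefine's clustering tool finds variant spellings of the same owner or city by reducing each value to a key and grouping values whose keys match. By default it uses key collision with the "fingerprint" keying function. The sketch below approximates that key in plain Python, so you can see why `Smith Oil Co.` and `SMITH OIL CO` end up in the same cluster. It is an approximation, not OpenRefine's own code, and `fingerprint` and `cluster` are hypothetical helper names.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine's fingerprint key for one cell value."""
    # Trim surrounding whitespace and ignore case.
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (é -> e).
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove punctuation such as periods and commas.
    value = re.sub(r'[^\w\s]', '', value)
    # Split into words, drop duplicates, sort, and rejoin so word order doesn't matter.
    return ' '.join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys shared by two or more values."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(vs) > 1}
```

Each cluster that comes back is a set of spellings you would merge into one value in OpenRefine.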

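What follows is a small suggestion for something you should do AFTER you have used csvkit to eliminate the non-UTF-8 characters and BEFORE you put the data into OpenRefine. It filters out the tanks that aren't leaking first, which cuts the file from 18,545 rows to 1,226 before you start normalizing.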

```python
import agate

# Load the UTF-8-cleaned file; agate guesses each column's type.
lust = agate.Table.from_csv('../../Data/lust-csv.csv')

# Printing a table shows its column names and inferred types.
print(lust)
```

```
|--------------------------------------+---------------|
|  column_names                        | column_types  |
|--------------------------------------+---------------|
|  SPILLNO-------                      | Text          |
|  S                                   | Text          |
|  OWNCO--------------------           | Text          |
|  OWNSTREET-----------                | Text          |
|  OWNCITY-------------                | Text          |
|  OS                                  | Text          |
|  OZIP                                | Text          |
|  TY                                  | Number        |
|  DIDATE----                          | Date          |
|  SPLOC     ------------------------- | Text          |
|  SPCITY-------------------           | Text          |
|  SPCOUN-------------------           | Text          |
|  MATERIAL----------------------      | Text          |
|  SFM_ID--                            | Text          |
|  FAC_NAME-----------------
```

```python
print(len(lust.rows))
```

```
18545
```


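```python
# Drop one status code at a time and watch the row count fall.
# The filter conditions are on the DEQ download page.
lust1 = lust.where(lambda row: row['S'] != '5')
print(len(lust1.rows))
```

```
2379
```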


```python
lust2 = lust1.where(lambda row: row['S'] != '6')
print(len(lust2.rows))
```

```
1783
```


```python
lust3 = lust2.where(lambda row: row['S'] != '7')
print(len(lust3.rows))
```

```
1719
```


```python
lust4 = lust3.where(lambda row: row['S'] != '8')
print(len(lust4.rows))
```

```
1447
```


```python
lust5 = lust4.where(lambda row: row['S'] != 'R')
print(len(lust5.rows))
```

```
1264
```


```python
lust6 = lust5.where(lambda row: row['S'] != 'V')
print(len(lust6.rows))
```

```
1226
```


```python
# Save the filtered rows to a new file to clean up in OpenRefine.
lust6.to_csv('filteredlust.csv')
```