# Cleanup and Exploration of 2014 AQ Data

As this is the smallest data file, it is used to develop the cleaning and integration functions that will be applied to the rest of the dataset.  Some details of the JSON processing may need to be revisited later, as malformed JSON entries did not start showing up until later in the process.

In [1]:
import pandas as pd
import json

In [2]:
df = pd.read_csv('2014 AQ.csv')

In [3]:
df.head()

Unnamed: 0,date,parameter,location,value,unit,city,attribution,averagingperiod,coordinates,country,sourcename,sourcetype,mobile
0,"{utc=2014-07-17T01:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False
1,"{utc=2014-07-17T02:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False
2,"{utc=2014-07-17T00:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False
3,"{utc=2014-07-16T22:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False
4,"{utc=2014-07-16T16:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False


In [4]:
# Check whether the date entries load directly as JSON into a dict or need preprocessing first.
test_dict = json.loads(df['date'][0])

JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

The entry is not valid JSON: keys and values are unquoted and joined with `=` rather than `:`.  The field will need processing code.
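
To make the failure concrete, the following minimal sketch compares a string in the file's `key=value` style with a valid JSON counterpart. The two sample strings are hypothetical stand-ins modeled on the `coordinates` column, not rows read from the file.

```python
import json

# Hypothetical sample modeled on the coordinates column: keys and values
# are unquoted and separated by '=' instead of ':'.
raw = "{latitude=37.132, longitude=-86.148}"

# The same content written as valid JSON.
valid = '{"latitude": 37.132, "longitude": -86.148}'

try:
    json.loads(raw)
    raw_is_valid = True
except json.JSONDecodeError as err:
    raw_is_valid = False
    print(f"raw string rejected: {err.msg}")

# The valid string parses into an ordinary dict.
parsed = json.loads(valid)
print(raw_is_valid, parsed["latitude"])
```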

In [5]:
def validate_json_fields(field: str) -> bool:
    """Check whether a string parses as JSON.  Returns True if it does, False otherwise."""
    try:
        json.loads(field)
        return True
    except ValueError:
        return False

In [6]:
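def validate_json_column(series: pd.Series) -> pd.DataFrame:
    """Return a DataFrame pairing each entry of the series with its JSON validity flag."""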
    temp_df = pd.DataFrame()
    temp_df['values'] = series
    temp_df['validity'] = series.apply(validate_json_fields)
    return temp_df

## Validation of Composite Fields as JSON

In [7]:
output = validate_json_column(df['date'])

In [8]:
output.head()

Unnamed: 0,values,validity
0,"{utc=2014-07-17T01:00:00.000Z, local=2014-07-1...",False
1,"{utc=2014-07-17T02:00:00.000Z, local=2014-07-1...",False
2,"{utc=2014-07-17T00:00:00.000Z, local=2014-07-1...",False
3,"{utc=2014-07-16T22:00:00.000Z, local=2014-07-1...",False
4,"{utc=2014-07-16T16:00:00.000Z, local=2014-07-1...",False


In [9]:
output[output['validity'] == True].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


In [10]:
output = validate_json_column(df['attribution'])
output.head()

Unnamed: 0,values,validity
0,"[{name=SPARTAN Network, url=http://www.spartan...",False
1,"[{name=SPARTAN Network, url=http://www.spartan...",False
2,"[{name=SPARTAN Network, url=http://www.spartan...",False
3,"[{name=SPARTAN Network, url=http://www.spartan...",False
4,"[{name=SPARTAN Network, url=http://www.spartan...",False


In [11]:
output[output['validity'] == True].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


In [12]:
output = validate_json_column(df['averagingperiod'])
output.head()

Unnamed: 0,values,validity
0,"{unit=hours, value=1.0}",False
1,"{unit=hours, value=1.0}",False
2,"{unit=hours, value=1.0}",False
3,"{unit=hours, value=1.0}",False
4,"{unit=hours, value=1.0}",False


In [13]:
output[output['validity'] == True].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


In [14]:
output = validate_json_column(df['coordinates'])
output.head()

Unnamed: 0,values,validity
0,"{latitude=37.132, longitude=-86.148}",False
1,"{latitude=37.132, longitude=-86.148}",False
2,"{latitude=37.132, longitude=-86.148}",False
3,"{latitude=37.132, longitude=-86.148}",False
4,"{latitude=37.132, longitude=-86.148}",False


In [15]:
output[output['validity'] == True].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


All of the composite fields are in a JSON-like format and need to be converted to valid JSON.
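
The column checks above repeat the same two steps, so they could be condensed into a single loop. This is a sketch under assumptions: `is_valid_json` is a hypothetical stand-in for `validate_json_fields`, and the small frame holds made-up sample rows in the file's format rather than the real CSV.

```python
import json
import pandas as pd

def is_valid_json(field: str) -> bool:
    """Return True if the string parses as JSON (stand-in for validate_json_fields)."""
    try:
        json.loads(field)
        return True
    except ValueError:
        return False

# Made-up sample rows in the same key=value style as the source file.
sample = pd.DataFrame({
    "date": ["{utc=2014-07-17T01:00:00.000Z, local=2014-07-16T21:00:00-04:00}"],
    "averagingperiod": ["{unit=hours, value=1.0}"],
    "coordinates": ["{latitude=37.132, longitude=-86.148}"],
})

# Count the valid entries per composite column in one pass.
valid_counts = {
    column: int(sample[column].apply(is_valid_json).sum())
    for column in sample.columns
}
print(valid_counts)
```

A count of zero for every column matches the empty `describe()` results shown above.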

## Reprocessing the Multivalue Fields into Valid JSON

In [16]:
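def correct_field_to_json(arg: str) -> str:
    """Convert a '{key=value, key=value}' string into a JSON object string."""
    # Strip the outer braces; rstrip also removes the trailing ']' of list-style entries.
    # A leading '[' is not removed, so it ends up inside the first key.
    string = arg.lstrip("{")
    string = string.rstrip("]}")
    strings = string.split(",")
    new_strings = []
    for entry in strings:
        # Split on the first '=' and quote both sides; all values become strings.
        # The space after each comma is kept, so later keys start with a space.
        index = entry.find('=')
        entry = '"' + entry[:index] + '"' + ':' + '"' + entry[index+1:] + '"'
        new_strings.append(entry)
    output_string = ','.join(new_strings)
    output_string = '{' + output_string + '}'
    return output_string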

In [17]:
def correct_json_column(series: pd.Series, column_name: str) -> pd.DataFrame:
    """Apply correct_field_to_json to every entry and return the result as a one-column DataFrame."""
    temp_df = pd.DataFrame()
    temp_df[column_name] = series.apply(correct_field_to_json)
    return temp_df
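
The converted values shown later in this notebook carry two side effects of the string handling: keys after the first keep the space that followed the comma (for example `" local"` and `" url"`), and the attribution key keeps its `[{` prefix. The sketch below is one possible alternative, not the method used in this notebook. The function name `key_value_to_json` is hypothetical, the closing `}]` of the sample attribution string is an assumption, and the approach assumes a single flat object whose values contain no commas.

```python
import json

def key_value_to_json(field: str) -> str:
    """Convert a flat '{key=value, key=value}' string to a JSON object string.

    Hypothetical alternative to correct_field_to_json. Assumes a single,
    non-nested object whose values contain no commas.
    """
    # Remove surrounding brackets and braces, including the '[' and ']'
    # that wrap the attribution entries.
    inner = field.strip().strip("[]{}")
    pairs = {}
    for entry in inner.split(","):
        # Split on the first '=' only, so '=' inside a value survives.
        key, _, value = entry.partition("=")
        pairs[key.strip()] = value.strip()
    # json.dumps takes care of quoting and escaping keys and values.
    return json.dumps(pairs)

# Sample string with an assumed closing '}]'.
converted = key_value_to_json("[{name=SPARTAN Network, url=http://www.spartan-network.org/}]")
print(converted)
```

Building a dict and serializing it with `json.dumps` avoids assembling quotes by hand, which also keeps values that contain a double quote from breaking the output.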

In [18]:
test = correct_json_column(df['date'], 'test date conversion')
results = validate_json_column(test['test date conversion'])
results[results['validity'] == False].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


In [19]:
df['cleaneddate'] = test

In [20]:
test = correct_json_column(df['attribution'], 'test attribute conversion')
results = validate_json_column(test['test attribute conversion'])
results[results['validity'] == False].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


In [21]:
results.head()

Unnamed: 0,values,validity
0,"{""[{name"":""SPARTAN Network"","" url"":""http://www...",True
1,"{""[{name"":""SPARTAN Network"","" url"":""http://www...",True
2,"{""[{name"":""SPARTAN Network"","" url"":""http://www...",True
3,"{""[{name"":""SPARTAN Network"","" url"":""http://www...",True
4,"{""[{name"":""SPARTAN Network"","" url"":""http://www...",True


In [22]:
df['cleanedattribution'] = test

In [23]:
test = correct_json_column(df['averagingperiod'], 'test averaging period conversion')
results = validate_json_column(test['test averaging period conversion'])
results[results['validity'] == False].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


In [24]:
df['cleanedaveragingperiod'] = test

In [25]:
test = correct_json_column(df['coordinates'], 'test coordinate conversion')
results = validate_json_column(test['test coordinate conversion'])
results[results['validity'] == False].describe()

Unnamed: 0,values,validity
count,0.0,0.0
unique,0.0,0.0
top,,
freq,,


In [26]:
df['cleanedcoordinates'] = test

In [27]:
df.head()

Unnamed: 0,date,parameter,location,value,unit,city,attribution,averagingperiod,coordinates,country,sourcename,sourcetype,mobile,cleaneddate,cleanedattribution,cleanedaveragingperiod,cleanedcoordinates
0,"{utc=2014-07-17T01:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-17T01:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}"
1,"{utc=2014-07-17T02:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-17T02:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}"
2,"{utc=2014-07-17T00:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-17T00:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}"
3,"{utc=2014-07-16T22:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-16T22:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}"
4,"{utc=2014-07-16T16:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-16T16:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}"


In [28]:
# Work on a copy so the original frame stays untouched.
df_trans = df.copy()

In [29]:
df_trans = pd.concat([df_trans, df_trans.cleanedcoordinates.apply(json.loads).apply(pd.Series)], axis = 1)
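
Chaining `.apply(json.loads).apply(pd.Series)` builds one `Series` per row, which can be slow on the larger yearly files. A possible alternative, sketched here on made-up sample data, parses the strings once and passes the resulting list of dicts to the `DataFrame` constructor. The `sample` frame and its values are illustrative assumptions.

```python
import json
import pandas as pd

# Made-up rows that mimic the cleaned coordinates column.
sample = pd.DataFrame({
    "value": [5.6, 5.4],
    "cleanedcoordinates": [
        '{"latitude": "37.132", "longitude": "-86.148"}',
        '{"latitude": "37.132", "longitude": "-86.148"}',
    ],
})

# Parse every JSON string once, then build the new columns in one step.
parsed = sample["cleanedcoordinates"].apply(json.loads)
expanded = pd.DataFrame(parsed.tolist(), index=sample.index)

# Attach the new columns next to the original ones.
result = pd.concat([sample, expanded], axis=1)
print(result.columns.tolist())
```

Passing `index=sample.index` keeps the expanded rows aligned with the original frame when its index is not the default range.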

In [30]:
df_trans.head()

Unnamed: 0,date,parameter,location,value,unit,city,attribution,averagingperiod,coordinates,country,sourcename,sourcetype,mobile,cleaneddate,cleanedattribution,cleanedaveragingperiod,cleanedcoordinates,latitude,longitude
0,"{utc=2014-07-17T01:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-17T01:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148
1,"{utc=2014-07-17T02:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-17T02:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148
2,"{utc=2014-07-17T00:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-17T00:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148
3,"{utc=2014-07-16T22:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-16T22:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148
4,"{utc=2014-07-16T16:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,research,False,"{""utc"":""2014-07-16T16:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148


In [31]:
df_trans = pd.concat([df_trans, df_trans.cleaneddate.apply(json.loads).apply(pd.Series)], axis = 1)

In [32]:
df_trans.head()

Unnamed: 0,date,parameter,location,value,unit,city,attribution,averagingperiod,coordinates,country,...,sourcetype,mobile,cleaneddate,cleanedattribution,cleanedaveragingperiod,cleanedcoordinates,latitude,longitude,utc,local
0,"{utc=2014-07-17T01:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,research,False,"{""utc"":""2014-07-17T01:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00
1,"{utc=2014-07-17T02:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,research,False,"{""utc"":""2014-07-17T02:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00
2,"{utc=2014-07-17T00:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,research,False,"{""utc"":""2014-07-17T00:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00
3,"{utc=2014-07-16T22:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,research,False,"{""utc"":""2014-07-16T22:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00
4,"{utc=2014-07-16T16:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,research,False,"{""utc"":""2014-07-16T16:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00


In [33]:
df_trans = pd.concat([df_trans, df_trans.cleanedattribution.apply(json.loads).apply(pd.Series)], axis = 1)

In [34]:
df_trans.head()

Unnamed: 0,date,parameter,location,value,unit,city,attribution,averagingperiod,coordinates,country,...,cleaneddate,cleanedattribution,cleanedaveragingperiod,cleanedcoordinates,latitude,longitude,utc,local,[{name,url
0,"{utc=2014-07-17T01:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""utc"":""2014-07-17T01:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/
1,"{utc=2014-07-17T02:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""utc"":""2014-07-17T02:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/
2,"{utc=2014-07-17T00:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""utc"":""2014-07-17T00:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/
3,"{utc=2014-07-16T22:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""utc"":""2014-07-16T22:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/
4,"{utc=2014-07-16T16:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""utc"":""2014-07-16T16:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...","{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/


In [35]:
df_trans = pd.concat([df_trans, df_trans.cleanedaveragingperiod.apply(json.loads).apply(pd.Series)], axis = 1)

In [36]:
df_trans.head()

Unnamed: 0,date,parameter,location,value,unit,city,attribution,averagingperiod,coordinates,country,...,cleanedaveragingperiod,cleanedcoordinates,latitude,longitude,utc,local,[{name,url,unit.1,value.1
0,"{utc=2014-07-17T01:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
1,"{utc=2014-07-17T02:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
2,"{utc=2014-07-17T00:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
3,"{utc=2014-07-16T22:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
4,"{utc=2014-07-16T16:00:00.000Z, local=2014-07-1...",pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0


In [40]:
df_trans.drop('date', axis=1, inplace=True)

In [41]:
df_trans.head()

Unnamed: 0,parameter,location,value,unit,city,attribution,averagingperiod,coordinates,country,sourcename,...,cleanedaveragingperiod,cleanedcoordinates,latitude,longitude,utc,local,[{name,url,unit.1,value.1
0,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
1,pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
2,pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
3,pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
4,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,"[{name=SPARTAN Network, url=http://www.spartan...","{unit=hours, value=1.0}","{latitude=37.132, longitude=-86.148}",US,Spartan,...,"{""unit"":""hours"","" value"":""1.0""}","{""latitude"":""37.132"","" longitude"":""-86.148""}",37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0


In [42]:
df_trans.drop(['attribution','averagingperiod','coordinates','cleanedaveragingperiod','cleanedcoordinates'], axis = 1, inplace = True)

In [43]:
df_trans.head()

Unnamed: 0,parameter,location,value,unit,city,country,sourcename,sourcetype,mobile,cleaneddate,cleanedattribution,latitude,longitude,utc,local,[{name,url,unit.1,value.1
0,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,"{""utc"":""2014-07-17T01:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...",37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
1,pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,US,Spartan,research,False,"{""utc"":""2014-07-17T02:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...",37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
2,pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,"{""utc"":""2014-07-17T00:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...",37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
3,pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,"{""utc"":""2014-07-16T22:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...",37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
4,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,"{""utc"":""2014-07-16T16:00:00.000Z"","" local"":""20...","{""[{name"":""SPARTAN Network"","" url"":""http://www...",37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0


In [44]:
df_trans.drop(['cleaneddate','cleanedattribution'], axis = 1, inplace = True)

In [45]:
df_trans.head()

Unnamed: 0,parameter,location,value,unit,city,country,sourcename,sourcetype,mobile,latitude,longitude,utc,local,[{name,url,unit.1,value.1
0,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
1,pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
2,pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
3,pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
4,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0


In [46]:
# Two columns share the label 'unit', so this rename changes both of them (see the output below).
df_trans.rename(columns={'unit': 'concentration unit'}, inplace=True)

In [48]:
df_trans.head()

Unnamed: 0,parameter,location,value,concentration unit,city,country,sourcename,sourcetype,mobile,latitude,longitude,utc,local,[{name,url,concentration unit.1,value.1
0,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
1,pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
2,pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
3,pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
4,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
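
The table above shows that both `unit` columns were renamed, because `rename` matches by label and the frame holds two columns with the same label. One way to rename a single column by position is to rebuild the list of labels and assign it back. The frame below is made-up sample data and the target names are illustrative.

```python
import pandas as pd

# Made-up frame with a duplicated 'unit' label, as produced by the concat steps.
sample = pd.DataFrame(
    [["µg/m³", 5.6, "hours", "1.0"]],
    columns=["unit", "value", "unit", "averaging time"],
)

# Label-based rename touches every column that carries the label.
by_label = sample.rename(columns={"unit": "concentration unit"})

# Position-based rename: copy the labels, change single entries, assign them back.
labels = sample.columns.tolist()
labels[0] = "concentration unit"
labels[2] = "averaging time unit"
by_position = sample.copy()
by_position.columns = labels

print(by_label.columns.tolist())
print(by_position.columns.tolist())
```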


In [70]:
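# The label is looked up by position, but rename still matches by label, so every column
# sharing that label is renamed, including the concentration unit column.
df_trans.rename(columns={df_trans.columns[15]: 'averaging time unit'}, inplace=True)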

In [61]:
df_trans.rename(columns={df_trans.columns[13]: 'source'}, inplace=True)

In [62]:
df_trans.head()

Unnamed: 0,parameter,location,value,concentration unit,city,country,sourcename,sourcetype,mobile,latitude,longitude,utc,local,source,url,concentration unit.1,value.1
0,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
1,pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
2,pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
3,pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
4,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0


In [68]:
df_trans.rename(columns={df_trans.columns[16]: 'averaging time'},inplace=True)

In [71]:
df_trans.head()

Unnamed: 0,parameter,location,value,averaging time unit,city,country,sourcename,sourcetype,mobile,latitude,longitude,utc,local,source,url,averaging time unit.1,averaging time
0,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T01:00:00.000Z,2014-07-16T21:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
1,pm25,SPARTAN - Mammoth Cave,5.4,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T02:00:00.000Z,2014-07-16T22:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
2,pm25,SPARTAN - Mammoth Cave,5.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-17T00:00:00.000Z,2014-07-16T20:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
3,pm25,SPARTAN - Mammoth Cave,4.1,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T22:00:00.000Z,2014-07-16T18:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0
4,pm25,SPARTAN - Mammoth Cave,5.6,µg/m³,Mammoth Cave NP,US,Spartan,research,False,37.132,-86.148,2014-07-16T16:00:00.000Z,2014-07-16T12:00:00-04:00,SPARTAN Network,http://www.spartan-network.org/,hours,1.0


In [72]:
df_trans.to_csv('2014 AQ Clean.csv', index=False)