## Data Preprocessing II - Coordinates

Due to the fact, that the data provided by the city of new york only provides information about the streetname and the house number, we have to send a request for each tuple to the google API. The API will return a json-file which includes the coordinate of the ticket. 
We will need those coordinates for later visualization in the "gmaps_Squad"-notebook.

Because the tickets are recorded only with a housenumber and streetname, we have a small derivation in our coordinates.

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

apikey = "AIzaSyApj3xGPGx1naRs2DZiUlJ6moRftzWzTJU"

precinct = ''
squad = 'D'
date = '09272016' #MMDDYYYY

datadirIn = '../../data/nyc_parking_tickets/squad_route/'
datadirOut = '../../data/nyc_parking_tickets/squad_route/'
fileNameIn = 'squad_route_time_' + squad + precinct + '_' + date + '_Parking_Violations_17'
fileNameOut = 'geo_' + fileNameIn
fileFormatIn = '.fth'
fileFormatOut = '.fth'
pathIn = datadirIn + fileNameIn + fileFormatIn
pathOut = datadirOut + fileNameOut + fileFormatOut
print('In ' + pathIn)
print('Out' + pathOut)

In ../../data/nyc_parking_tickets/squad_route/squad_route_time_D_09272016_Parking_Violations_17.fth
Out../../data/nyc_parking_tickets/squad_route/geo_squad_route_time_D_09272016_Parking_Violations_17.fth


Before we can start processing the data to eventually get the coordinates, we have to read a dataset already processed by *DataPreprocessing_SquadRoute*.

Make sure you set the correct variables for *squad, precinct* and *date* in the cell above.

In [2]:
#preprocessing for squad_data already selected by date
import feather as fth

data = fth.read_dataframe(pathIn)
data = data[['Street Name', 'House Number']]
data = np.array(data)
print('Done!')

Done!


### Google Maps API
We use the googlemaps module to pass our geocode request to the Google Maps API. As input we use the *Street Name* and the *House Number* from our dataset and in addition we add *"New York" and "USA"* to make our request more accurate and eliminate false answers.

After the requests are finished, the coordinates are added to our dataset as new columns and then saved as *feather* files for further processing and visualization

In [27]:
#convert location to coordinates and add it to input feather -> write to output feather
import googlemaps
import json
from ipywidgets import FloatProgress
from IPython.display import display

#geodata = data[['Street Name','House Number']]

datasetCount = len(data)
f = FloatProgress(min=0, max=datasetCount) #Successful

gmapsAPI = googlemaps.Client(key=apikey)

def geocode( x ):
    geocode = gmapsAPI.geocode(str(x) + ', New York City' + ', USA')
    geocode = np.array(geocode)

    global e
    try:
        x[0] = geocode[0].get('geometry').get('location').get('lat')
        x[1] = geocode[0].get('geometry').get('location').get('lng')
    except IndexError:
        x[0] = 0
        x[1] = 0
        e+=1
    f.value += 1
    print('Successful: ' + str(f.value) + '/' + str(datasetCount) + ' Errors: ' + str(e), end='\r')
    return x

f.value = 0
display(f)
e = 0
print('Successful: ' + str(f.value) + '/' + str(datasetCount) + ' Errors: ' + str(e), end='\r')

data = [geocode(x) for x in data]
data = pd.DataFrame(data)

datafth = fth.read_dataframe(pathIn)

datafth['lat'] = data[0]
datafth['lng'] = data[1]

fth.write_dataframe(datafth, pathOut)
print('Saved file as ' + pathOut)
#data[lat] = geodata[]
#data[lng] = geodata[]
print(' Finished!')

Saved file as ../../data/nyc_parking_tickets/squad_route/geo_squad_route_time_D_09272016_Parking_Violations_17.fth
 Finished!


### Old Code

In [7]:
#OLD
# import squad_data selection by date
import feather as fth

precinct = '13'
squad = 'M'

#data = fth.read_dataframe(path + 'squad_route_' + precinct + squad + '_' + file + '.fth')
data = fth.read_dataframe(pathIn)
data = np.array(data)

dateformat = '09/09/2016'#date you want to extract

count = 0
delTuple = []#array of not needed tuples
for x in data:
    #print(x[0])
    if dateformat not in x[0]: 
        delTuple.append(count)
        
    count+=1
    continue
    
print('Tuples deleted: ' + str(len(delTuple)))

#print(delTuple)#print Tuples to be deleted

dataTime = np.delete(data, (delTuple), axis=0)#delete not needed tuples

print(dataTime)#print new array

#data = np.delete(dataTime, ([2]), axis=1)#delete date row

data = [[x[3], x[4]] for x in dataTime] #prepare data for maps
#print(len(data))

Tuples deleted: 67
[]
