# Getting Apartment Coordinates

To collect the apartment addresses for this project, I searched Zillow for apartment listings that met the following criteria:

- Price Range: $1.2-2k/month  
- Beds/Bath: 1+/1+

I gathered about 150 listings that met these criteria. Each listing has an address, but I also need each listing's coordinates (latitude and longitude) for later calculations.
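These filters were applied in Zillow's own search. If the raw listings had instead been exported with price and bed/bath columns, the same criteria could be applied in pandas. A minimal sketch, assuming hypothetical columns `Price`, `Beds`, and `Baths` and made-up rows (none of these are in the original data):

```python
import pandas as pd

# Hypothetical listings; the Price, Beds, and Baths columns are assumptions
listings = pd.DataFrame({
    "Apartment": ["A", "B", "C", "D"],
    "Price": [1100, 1500, 1950, 2400],
    "Beds": [1, 2, 0, 1],
    "Baths": [1, 1, 1, 2],
})

# Criteria from the list above: $1,200-2,000/month, 1+ beds, 1+ baths
# (Series.between is inclusive on both ends by default)
mask = (
    listings["Price"].between(1200, 2000)
    & (listings["Beds"] >= 1)
    & (listings["Baths"] >= 1)
)
filtered = listings[mask]
print(filtered["Apartment"].tolist())  # ['B']
```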

To get the coordinates for each address, I used geopy's Nominatim geocoder, following [this tutorial](https://github.com/tiffsea/python-code/blob/master/demos/geopy_query_addresses.ipynb).

In [None]:
#Import Libraries
from geopy.geocoders import Nominatim
import pandas as pd
import numpy as np

First, I import the data, which contains each apartment's name and address.

In [32]:
aptDf = pd.read_csv('Houston Apartment Locations.csv')

In [33]:
aptDf

|     | Apartment | Address |
|-----|-----------|---------|
| 0 | Carrington Park at Gulf Pointe | 11666 Gulf Pointe Dr, Houston, TX |
| 1 | The Augusta | 2660 Augusta Dr, Houston, TX |
| 2 | Lanesborough Apartments | 1819 Braeswood Blvd, Houston, TX |
| 3 | Tuscany Court Apartments | 1901 Augusta Dr, Houston, TX |
| 4 | The Trails At Dominion Park | 200 Dominion Park Dr, Houston, TX |
| ... | ... | ... |
| 136 | The Carlton | 3805 W Alabama St, Houston, TX |
| 137 | Circuit | 2424 Capitol St, Houston, TX |
| 138 | Avana Eldridge | 1415 Eldridge Pkwy, Houston, TX |
| 139 | Jackson Hill | 320 Jackson Hill St, Houston, TX |


Next, I remove any listings with duplicate addresses, keeping the first occurrence.

In [34]:
#Remove Duplicates
aptDf.drop_duplicates(subset='Address', keep='first', inplace=True)
aptDf

|     | Apartment | Address |
|-----|-----------|---------|
| 0 | Carrington Park at Gulf Pointe | 11666 Gulf Pointe Dr, Houston, TX |
| 1 | The Augusta | 2660 Augusta Dr, Houston, TX |
| 2 | Lanesborough Apartments | 1819 Braeswood Blvd, Houston, TX |
| 3 | Tuscany Court Apartments | 1901 Augusta Dr, Houston, TX |
| 4 | The Trails At Dominion Park | 200 Dominion Park Dr, Houston, TX |
| ... | ... | ... |
| 136 | The Carlton | 3805 W Alabama St, Houston, TX |
| 137 | Circuit | 2424 Capitol St, Houston, TX |
| 138 | Avana Eldridge | 1415 Eldridge Pkwy, Houston, TX |
| 139 | Jackson Hill | 320 Jackson Hill St, Houston, TX |
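The truncated display looks the same before and after deduplication; only the reset index further down (ending at row 123) shows that 124 of the 140 rows remain. A small sketch on made-up data (not the real listings) shows what `drop_duplicates(subset='Address', keep='first')` does, how to count the removed rows directly, and the gap it leaves in the index:

```python
import pandas as pd

# Made-up listings; the second row repeats the first row's address
apts = pd.DataFrame({
    "Apartment": ["Alpha", "Alpha (relisted)", "Beta"],
    "Address": ["1 Main St, Houston, TX", "1 Main St, Houston, TX", "2 Oak St, Houston, TX"],
})

# keep="first" keeps the earliest row for each address and drops later repeats
deduped = apts.drop_duplicates(subset="Address", keep="first")

removed = len(apts) - len(deduped)
print(removed)                        # 1
print(deduped["Apartment"].tolist())  # ['Alpha', 'Beta']
print(deduped.index.tolist())         # [0, 2] (the gap that reset_index closes)
```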


Dropping duplicates leaves gaps in the row index, so I also reset the DataFrame's index.

In [35]:
#Reset the index; the old index is kept as a new 'index' column
aptDf.reset_index(inplace=True)
aptDf

|     | index | Apartment | Address |
|-----|-------|-----------|---------|
| 0 | 0 | Carrington Park at Gulf Pointe | 11666 Gulf Pointe Dr, Houston, TX |
| 1 | 1 | The Augusta | 2660 Augusta Dr, Houston, TX |
| 2 | 2 | Lanesborough Apartments | 1819 Braeswood Blvd, Houston, TX |
| 3 | 3 | Tuscany Court Apartments | 1901 Augusta Dr, Houston, TX |
| 4 | 4 | The Trails At Dominion Park | 200 Dominion Park Dr, Houston, TX |
| ... | ... | ... | ... |
| 120 | 136 | The Carlton | 3805 W Alabama St, Houston, TX |
| 121 | 137 | Circuit | 2424 Capitol St, Houston, TX |
| 122 | 138 | Avana Eldridge | 1415 Eldridge Pkwy, Houston, TX |
| 123 | 139 | Jackson Hill | 320 Jackson Hill St, Houston, TX |
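The extra `index` column above, and the `level_0` column that appears after the second reset later on, come from the default behavior of `reset_index()`: it stores the old index as a column. A small sketch on made-up data shows this, and that passing `drop=True` discards the old index instead:

```python
import pandas as pd

# Made-up frame with a gap in its index, as left behind by drop_duplicates
df = pd.DataFrame({"Apartment": ["Alpha", "Beta"]}, index=[0, 2])

# Default: the old index becomes a new column named 'index'
kept = df.reset_index()
print(kept.columns.tolist())   # ['index', 'Apartment']

# Resetting again: 'index' is already taken, so pandas names the new column 'level_0'
again = kept.reset_index()
print(again.columns.tolist())  # ['level_0', 'index', 'Apartment']

# drop=True discards the old index instead of storing it
clean = df.reset_index(drop=True)
print(clean.columns.tolist())  # ['Apartment']
print(clean.index.tolist())    # [0, 1]
```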


In [36]:
#Create three columns to gather latitude, longitude, and address
aptDf['location_lat'] = ""
aptDf['location_long'] = ""
aptDf['location_address'] = ""

In [39]:
#Using Nominatim to get the coordinates for each address
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent="myApp")
#Nominatim's usage policy allows at most one request per second
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

for i in aptDf.index:
    #Look up the address; geocode returns None if there is no match
    #(RateLimiter also returns None if the request still fails after retrying)
    location = geocode(aptDf.loc[i, 'Address'])

    if location is None:
        #No match: leave the blank placeholders so the row can be dropped later
        continue

    #Store the coordinates and the address Nominatim matched
    aptDf.loc[i, 'location_lat'] = location.latitude
    aptDf.loc[i, 'location_long'] = location.longitude
    aptDf.loc[i, 'location_address'] = location.address
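The geocoding loop can also be factored into a small function that accepts any geocoding callable, which makes it testable without calling Nominatim. A minimal sketch: `geocode_addresses`, `FakeLocation`, and `fake_geocode` are hypothetical names, and the stub only mimics the `.latitude`, `.longitude`, and `.address` attributes of geopy's `Location`. In the notebook, the callable would be `geolocator.geocode` (optionally wrapped in geopy's `RateLimiter`).

```python
from collections import namedtuple

# Hypothetical stand-in for geopy's Location object
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude", "address"])


def geocode_addresses(addresses, geocode):
    """Return one (lat, long, matched_address) tuple per address.

    `geocode` is any callable that takes an address string and returns an
    object with latitude/longitude/address attributes, or None on no match.
    """
    results = []
    for address in addresses:
        location = geocode(address)
        if location is None:
            # No match: record missing values rather than empty strings
            results.append((None, None, None))
        else:
            results.append((location.latitude, location.longitude, location.address))
    return results


# Hypothetical offline stub; the coordinates are copied from the output below
def fake_geocode(address):
    known = {
        "2660 Augusta Dr, Houston, TX": FakeLocation(29.739297, -95.482587, "2660, Augusta Drive, Houston"),
    }
    return known.get(address)


rows = geocode_addresses(
    ["2660 Augusta Dr, Houston, TX", "11666 Gulf Pointe Dr, Houston, TX"], fake_geocode
)
print(rows)
```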

Next, I check for blanks in the latitude, longitude, and location address columns and find the indices of the rows that contain them. In hindsight, I should have filled failed lookups with null values (NaN) from the start and then simply dropped those rows.

Since there are blanks, I convert them to null and then drop the rows that contain nulls.

In [40]:
#Display the DataFrame (Jupyter shows only the first and last rows)
aptDf

|     | index | Apartment | Address | location_lat | location_long | location_address |
|-----|-------|-----------|---------|--------------|---------------|------------------|
| 0 | 0 | Carrington Park at Gulf Pointe | 11666 Gulf Pointe Dr, Houston, TX | | | |
| 1 | 1 | The Augusta | 2660 Augusta Dr, Houston, TX | 29.739297 | -95.482587 | 2660, Augusta Drive, Houston, Harris County, T... |
| 2 | 2 | Lanesborough Apartments | 1819 Braeswood Blvd, Houston, TX | 29.706742 | -95.394089 | Braeswood Boulevard, Texas Medical Center, Hou... |
| 3 | 3 | Tuscany Court Apartments | 1901 Augusta Dr, Houston, TX | 29.747442 | -95.482511 | 1901, Augusta Drive, Houston, Harris County, T... |
| 4 | 4 | The Trails At Dominion Park | 200 Dominion Park Dr, Houston, TX | 29.975664 | -95.424768 | 200, Dominion Park Drive, North Houston Distri... |
| ... | ... | ... | ... | ... | ... | ... |
| 120 | 136 | The Carlton | 3805 W Alabama St, Houston, TX | 29.738168 | -95.438803 | 3805, West Alabama Street, Highland Village, H... |
| 121 | 137 | Circuit | 2424 Capitol St, Houston, TX | 29.751732 | -95.350933 | 2424, Capitol Street, Houston, Harris County, ... |
| 122 | 138 | Avana Eldridge | 1415 Eldridge Pkwy, Houston, TX | 29.758541 | -95.625225 | 1415, Eldridge Parkway, Houston, Harris County... |
| 123 | 139 | Jackson Hill | 320 Jackson Hill St, Houston, TX | 29.764461 | -95.401687 | 320, Jackson Hill Street, Houston, Harris Coun... |


In [49]:
#Cleaning the data
#Finding the location of the blanks
aptDf[aptDf['location_lat'] == ''].index.tolist()

[0, 6, 22, 27, 30, 33, 53, 62, 70, 76, 77, 81, 82, 84, 88, 93, 105]

In [55]:
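#Replace empty strings with NaN, make the column numeric, then drop rows with no latitude
#(assign the result back; calling replace(..., inplace=True) on a single column may not update aptDf)
aptDf['location_lat'] = aptDf['location_lat'].replace('', np.nan).astype(float)
aptDf.dropna(subset=['location_lat'], inplace=True)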
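The hindsight approach mentioned earlier can be sketched on a two-row toy DataFrame: initialize the coordinate columns with `np.nan` instead of empty strings, so failed lookups are already null and a single `dropna` removes them, with no string replacement or type conversion step. This is an illustrative sketch, not the code that was actually run; the "successful" lookup is simulated with coordinates copied from the output above.

```python
import numpy as np
import pandas as pd

apts = pd.DataFrame({
    "Apartment": ["Carrington Park at Gulf Pointe", "The Augusta"],
    "Address": ["11666 Gulf Pointe Dr, Houston, TX", "2660 Augusta Dr, Houston, TX"],
})

# Start the coordinate columns as NaN instead of "" so they are float64 from the outset
apts["location_lat"] = np.nan
apts["location_long"] = np.nan

# Simulate: only the second lookup succeeded
apts.loc[1, ["location_lat", "location_long"]] = [29.739297, -95.482587]

# Failed lookups are already NaN, so one dropna removes them;
# drop=True avoids creating an extra 'index' column
found = apts.dropna(subset=["location_lat"]).reset_index(drop=True)
print(found[["Apartment", "location_lat", "location_long"]])
```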

I reset the index again, then drop the extra index columns I no longer need.

In [59]:
#Reset index, will drop extra columns later
aptDf.reset_index(inplace=True)
aptDf

|     | level_0 | index | Apartment | Address | location_lat | location_long | location_address |
|-----|---------|-------|-----------|---------|--------------|---------------|------------------|
| 0 | 1 | 1 | The Augusta | 2660 Augusta Dr, Houston, TX | 29.739297 | -95.482587 | 2660, Augusta Drive, Houston, Harris County, T... |
| 1 | 2 | 2 | Lanesborough Apartments | 1819 Braeswood Blvd, Houston, TX | 29.706742 | -95.394089 | Braeswood Boulevard, Texas Medical Center, Hou... |
| 2 | 3 | 3 | Tuscany Court Apartments | 1901 Augusta Dr, Houston, TX | 29.747442 | -95.482511 | 1901, Augusta Drive, Houston, Harris County, T... |
| 3 | 4 | 4 | The Trails At Dominion Park | 200 Dominion Park Dr, Houston, TX | 29.975664 | -95.424768 | 200, Dominion Park Drive, North Houston Distri... |
| 4 | 5 | 5 | 5401 Chimney Rock | 5401 Chimney Rock Rd, Houston, TX | 29.723425 | -95.475772 | 5401, Chimney Rock Road, Houston, Harris Count... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 103 | 120 | 136 | The Carlton | 3805 W Alabama St, Houston, TX | 29.738168 | -95.438803 | 3805, West Alabama Street, Highland Village, H... |
| 104 | 121 | 137 | Circuit | 2424 Capitol St, Houston, TX | 29.751732 | -95.350933 | 2424, Capitol Street, Houston, Harris County, ... |
| 105 | 122 | 138 | Avana Eldridge | 1415 Eldridge Pkwy, Houston, TX | 29.758541 | -95.625225 | 1415, Eldridge Parkway, Houston, Harris County... |
| 106 | 123 | 139 | Jackson Hill | 320 Jackson Hill St, Houston, TX | 29.764461 | -95.401687 | 320, Jackson Hill Street, Houston, Harris Coun... |


In [81]:
#Drop the leftover index columns created by reset_index
aptDf.drop(['level_0', 'index'], axis=1, inplace=True)

In [71]:
#Split location address by Street Number, Street, City, County, State, ZIP, Country
#aptDf[['Street_Number', 'Street', 'City', 'County', 'State', 'Zip_Code', 'Country', 'Temp1', 'Temp2', 'Temp3']] = aptDf['location_address'].str.split(',', expand=True) 


The location_long column has the object dtype, so I convert it to float.

In [83]:
aptDf.dtypes

Apartment            object
Address              object
location_lat        float64
location_long        object
location_address     object
dtype: object

In [92]:
#Convert location_long to float (float64)
aptDf['location_long'] = aptDf['location_long'].astype(float)

In [93]:
aptDf.dtypes

Apartment            object
Address              object
location_lat        float64
location_long       float64
location_address     object
dtype: object
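`astype(float)` works here only because the blank rows were already dropped; it raises a `ValueError` on an empty string. A more forgiving alternative is `pd.to_numeric` with `errors="coerce"`, which turns any unparseable value into NaN. A small sketch on made-up values shaped like the `location_long` column before conversion:

```python
import pandas as pd

# An object column mixing floats and a blank placeholder
raw = pd.Series([-95.482587, "", -95.394089], dtype=object)

# errors="coerce" converts anything unparseable (including "") to NaN instead of raising
coords = pd.to_numeric(raw, errors="coerce")
print(coords.dtype)            # float64
print(coords.isna().tolist())  # [False, True, False]
```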

Finally, I export the DataFrame to a CSV file for the next step: finding the k closest Torchy's to each apartment location.

In [94]:
#Export to CSV
aptDf.to_csv('Houston_Apt_Coordinates.csv')
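By default, `to_csv` also writes the DataFrame's index, which reappears as an `Unnamed: 0` column when the file is read back in the next step. Passing `index=False` when writing (or `index_col=0` when reading) avoids that. A small in-memory sketch:

```python
import io

import pandas as pd

df = pd.DataFrame({"Apartment": ["The Augusta"], "location_lat": [29.739297]})

# Default to_csv writes the index as an unnamed first column...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())     # ['Unnamed: 0', 'Apartment', 'location_lat']

# ...while index=False leaves it out, so the file round-trips cleanly
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(without_index.columns.tolist())  # ['Apartment', 'location_lat']
```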