
# Geocoding

### How to convert physical addresses to geographic locations (latitude and longitude)

The process of converting addresses to geographic coordinates (latitude and longitude) so that they can be placed on a map is called **Geocoding**.

---

This notebook closely follows [Geocode with Python](https://towardsdatascience.com/geocode-with-python-161ec1e62b89).

In [1]:
!pip install geopandas

Collecting geopandas
  Downloading https://files.pythonhosted.org/packages/5b/0c/e6c99e561b03482220f00443f610ccf4dce9b50f4b1093d735f93c6fc8c6/geopandas-0.6.2-py2.py3-none-any.whl (919kB)
Collecting fiona
  Downloading https://files.pythonhosted.org/packages/50/f7/9899f8a9a2e38601472fe1079ce5088f58833221c8b8507d8b5eafd5404a/Fiona-1.8.13-cp36-cp36m-manylinux1_x86_64.whl (11.8MB)
Collecting pyproj
  Downloading https://files.pythonhosted.org/packages/d6/70/eedc98cd52b86de24a1589c762612a98bea26cde649ffdd60c1db396cce8/pyproj-2.4.2.post1-cp36-cp36m-manylinux2010_x86_64.whl (10.1MB)
Collecting cligj>=0.5
  Downloading https://files.pythonhosted.org/packages/e4/be/30a58b4b0733850280d01f8bd132591b4668ed5c7046761098d665ac2174/cligj-0.5.0-py3-none-any.whl
Collecting munch
  Downloading https:

In [0]:
import geopandas as gpd
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import matplotlib.pyplot as plt
import folium
from folium.plugins import FastMarkerCluster
import pandas as pd

## Geocoding Single Address
We create a `locator` object that wraps the geocoding service, Nominatim. We can then call the locator's `geocode` method on any address.

In [0]:
locator = Nominatim(user_agent="myGeocoder")
location = locator.geocode("8111 Reseda Blvd, Los Angeles, CA")

In [4]:
print(location.address)
print(f'Latitude = {location.latitude}, Longitude = {location.longitude}')

Reseda Boulevard, Reseda, Los Angeles, Los Angeles County, California, 91335, United States of America
Latitude = 34.1897551, Longitude = -118.5360339
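The cell above assumes the lookup succeeds. geopy's `geocode` returns `None` when the service finds no match, so reading `.address` on the result would then raise an `AttributeError`. Below is a minimal sketch of a guard; `describe_location` is a hypothetical helper, and a `SimpleNamespace` stands in for a real geopy result so the example runs without network access.

```python
from types import SimpleNamespace


def describe_location(location):
    """Return a printable summary, or a notice if the lookup found nothing."""
    if location is None:
        return "No match found"
    return (f"{location.address} "
            f"(lat={location.latitude}, lon={location.longitude})")


# Stand-in for a successful geocoding result (hypothetical object, values from the output above).
found = SimpleNamespace(
    address="Reseda Boulevard, Reseda, Los Angeles",
    latitude=34.1897551,
    longitude=-118.5360339,
)

print(describe_location(found))
print(describe_location(None))
```

In the notebook, the same function could be called as `describe_location(locator.geocode("8111 Reseda Blvd, Los Angeles, CA"))`.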


## Mounting Google Drive in Google Colab

In [6]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## Geocoding with Pandas DataFrame
Read the CSV file from the mounted Google Drive into a pandas DataFrame.

In [7]:
df = pd.read_csv('/content/gdrive/My Drive/Data/apartments_rent_LosAngeles_2_1_2020.csv')

df.head()

Unnamed: 0.1,Unnamed: 0,name,description,address,city,bedrooms,sqft,price
0,0,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,Studio,450 sqft,"$1,480"
1,1,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,1 Bedroom,687 sqft,"$1,595"
2,2,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,2 Bedrooms,953 sqft,"$1,945"
3,3,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,1 Bedroom,836 sqft,"$2,102"
4,4,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,2 Bedrooms,1201 sqft,"$2,378"


In [8]:
df = df.drop('Unnamed: 0', axis=1)
df.columns

Index(['name', 'description', 'address', 'city', 'bedrooms', 'sqft', 'price'], dtype='object')

In [0]:
df = df.drop_duplicates()

### Concatenate address columns into one that is appropriate for geocoding.

In [10]:
df['full address'] = df['address'].astype(str) + ',' + df['city'] + ',' + 'CA' + ',' + 'USA'
df.head()

Unnamed: 0,name,description,address,city,bedrooms,sqft,price,full address
0,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,Studio,450 sqft,"$1,480","8111 Reseda Blvd,Los Angeles,CA,USA"
1,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,1 Bedroom,687 sqft,"$1,595","8111 Reseda Blvd,Los Angeles,CA,USA"
2,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,2 Bedrooms,953 sqft,"$1,945","8111 Reseda Blvd,Los Angeles,CA,USA"
3,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,1 Bedroom,836 sqft,"$2,102","6250 Canoga Ave,Los Angeles,CA,USA"
4,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,2 Bedrooms,1201 sqft,"$2,378","6250 Canoga Ave,Los Angeles,CA,USA"
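The concatenation above yields a missing value for any row where `city` is missing, because adding a string to `NaN` gives `NaN`. As a sketch under that assumption, the hypothetical helper below skips empty or missing parts. It also joins with `", "` rather than `","`, which is a stylistic choice and not what the notebook does.

```python
def build_full_address(address, city, state="CA", country="USA"):
    """Join the non-empty address parts into one geocodable string."""
    parts = [address, city, state, country]
    cleaned = []
    for part in parts:
        # Skip None and NaN (NaN is the only value that is not equal to itself).
        if part is None or part != part:
            continue
        text = str(part).strip()
        if text:
            cleaned.append(text)
    return ", ".join(cleaned)


print(build_full_address("8111 Reseda Blvd", "Los Angeles"))
print(build_full_address(" 6250 Canoga Ave ", float("nan")))
```

With the notebook's DataFrame, one way to use it would be `df.apply(lambda row: build_full_address(row['address'], row['city']), axis=1)`.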


### Once we create the address column, we can start geocoding
1. First, we delay geocoding by 1 second between addresses. This matters when geocoding a large number of physical addresses, because the geocoding service provider can deny access to clients that send requests too quickly.
2. Second, we create a df['location'] column by applying the rate-limited geocode function to each address.
3. Third, we create latitude, longitude, and altitude as a single tuple column.
4. Finally, we split that tuple column into three separate latitude, longitude, and altitude columns.
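To illustrate the idea behind step 1, here is a simplified sketch of a wrapper that enforces a minimum delay between calls. It is not geopy's `RateLimiter` implementation; `rate_limited` and its `clock` and `sleep` parameters are hypothetical names introduced for this example.

```python
import time


def rate_limited(func, min_delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so that consecutive calls are at least min_delay_seconds apart."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            # Time still owed before the next call is allowed.
            wait = min_delay_seconds - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        return func(*args, **kwargs)

    return wrapper


# Demo with a short delay and a trivial function instead of a real geocoder.
slow_upper = rate_limited(str.upper, min_delay_seconds=0.05)
results = [slow_upper(s) for s in ["a", "b", "c"]]
print(results)
```

The first call goes through immediately; each later call sleeps only for whatever part of the delay has not already elapsed.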

In [0]:
from geopy.extra.rate_limiter import RateLimiter

# 1 - convenient function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

# 2 - create location column
df['location'] = df['full address'].apply(geocode)

# 3 - create latitude, longitude and altitude from location column (returns tuple)
df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)
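The cell above sends one request per row, even though many rows share the same address (each apartment complex appears once per floor plan). A possible refinement, sketched here with a hypothetical `geocode_unique` helper and a fake geocoder in place of the network call, is to look up each distinct address once and reuse the result.

```python
def geocode_unique(addresses, geocode_fn):
    """Call geocode_fn once per distinct address and return {address: result}."""
    cache = {}
    for address in addresses:
        if address not in cache:
            cache[address] = geocode_fn(address)
    return cache


calls = []


def fake_geocode(address):
    """Stand-in for the rate-limited geocoder; records each lookup."""
    calls.append(address)
    return f"location for {address}"


addresses = (
    ["8111 Reseda Blvd,Los Angeles,CA,USA"] * 3
    + ["6250 Canoga Ave,Los Angeles,CA,USA"] * 2
)
cache = geocode_unique(addresses, fake_geocode)
locations = [cache[a] for a in addresses]
print(len(calls), "lookups for", len(addresses), "rows")
```

With the notebook's objects this could look like `cache = geocode_unique(df['full address'], geocode)` followed by `df['location'] = df['full address'].map(cache)`, which should cut the run time roughly in proportion to the number of repeated addresses.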

In [12]:
df.head()

Unnamed: 0,name,description,address,city,bedrooms,sqft,price,full address,location,point
0,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,Studio,450 sqft,"$1,480","8111 Reseda Blvd,Los Angeles,CA,USA","(Reseda Boulevard, Reseda, Los Angeles, Los An...","(34.1897551, -118.5360339, 0.0)"
1,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,1 Bedroom,687 sqft,"$1,595","8111 Reseda Blvd,Los Angeles,CA,USA","(Reseda Boulevard, Reseda, Los Angeles, Los An...","(34.1897551, -118.5360339, 0.0)"
2,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,2 Bedrooms,953 sqft,"$1,945","8111 Reseda Blvd,Los Angeles,CA,USA","(Reseda Boulevard, Reseda, Los Angeles, Los An...","(34.1897551, -118.5360339, 0.0)"
3,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,1 Bedroom,836 sqft,"$2,102","6250 Canoga Ave,Los Angeles,CA,USA","(Canoga Avenue, Canoga Park, Los Angeles, Los ...","(34.1976208, -118.5977464, 0.0)"
4,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,2 Bedrooms,1201 sqft,"$2,378","6250 Canoga Ave,Los Angeles,CA,USA","(Canoga Avenue, Canoga Park, Los Angeles, Los ...","(34.1976208, -118.5977464, 0.0)"


In [13]:
df['point'][0][0]

34.1897551

In [0]:
# 4 - split point column into latitude, longitude and altitude columns
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)

The above code produces a DataFrame with latitude, longitude, and altitude columns.
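Some rows have no `point` because the geocoder found no match. Whether a list that mixes tuples and `None` converts cleanly to a DataFrame may depend on the pandas version, so a defensive variant is to expand every entry to a three-element tuple first. The helper name `split_point` is an assumption made for this sketch.

```python
def split_point(point):
    """Expand a (lat, lon, alt) tuple, or None, into a fixed three-element tuple."""
    if point is None:
        return (None, None, None)
    latitude, longitude, altitude = point
    return (latitude, longitude, altitude)


points = [(34.1897551, -118.5360339, 0.0), None]
rows = [split_point(p) for p in points]
print(rows)
```

In the notebook this would feed the same assignment, for example `pd.DataFrame([split_point(p) for p in df['point']], index=df.index, columns=['latitude', 'longitude', 'altitude'])`.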


In [15]:

df.head()

Unnamed: 0,name,description,address,city,bedrooms,sqft,price,full address,location,point,latitude,longitude,altitude
0,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,Studio,450 sqft,"$1,480","8111 Reseda Blvd,Los Angeles,CA,USA","(Reseda Boulevard, Reseda, Los Angeles, Los An...","(34.1897551, -118.5360339, 0.0)",34.189755,-118.536034,0.0
1,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,1 Bedroom,687 sqft,"$1,595","8111 Reseda Blvd,Los Angeles,CA,USA","(Reseda Boulevard, Reseda, Los Angeles, Los An...","(34.1897551, -118.5360339, 0.0)",34.189755,-118.536034,0.0
2,Northview-Southview Apartments,"Studio, one- and two-bedroom apartments in pet...",8111 Reseda Blvd,Los Angeles,2 Bedrooms,953 sqft,"$1,945","8111 Reseda Blvd,Los Angeles,CA,USA","(Reseda Boulevard, Reseda, Los Angeles, Los An...","(34.1897551, -118.5360339, 0.0)",34.189755,-118.536034,0.0
3,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,1 Bedroom,836 sqft,"$2,102","6250 Canoga Ave,Los Angeles,CA,USA","(Canoga Avenue, Canoga Park, Los Angeles, Los ...","(34.1976208, -118.5977464, 0.0)",34.197621,-118.597746,0.0
4,Triana,This funky apartment complex sits above shops ...,6250 Canoga Ave,Los Angeles,2 Bedrooms,1201 sqft,"$2,378","6250 Canoga Ave,Los Angeles,CA,USA","(Canoga Avenue, Canoga Park, Los Angeles, Los ...","(34.1976208, -118.5977464, 0.0)",34.197621,-118.597746,0.0


In [16]:
df['latitude'][0]

34.1897551

### Clean out the unwanted columns.

In [17]:
df.columns

Index(['name', 'description', 'address', 'city', 'bedrooms', 'sqft', 'price',
       'full address', 'location', 'point', 'latitude', 'longitude',
       'altitude'],
      dtype='object')

In [0]:
df_new = df.drop(['name', 'description','address', 'city', 'full address', 'location', 'point'], axis=1)

In [19]:
df_new.head()

Unnamed: 0,bedrooms,sqft,price,latitude,longitude,altitude
0,Studio,450 sqft,"$1,480",34.189755,-118.536034,0.0
1,1 Bedroom,687 sqft,"$1,595",34.189755,-118.536034,0.0
2,2 Bedrooms,953 sqft,"$1,945",34.189755,-118.536034,0.0
3,1 Bedroom,836 sqft,"$2,102",34.197621,-118.597746,0.0
4,2 Bedrooms,1201 sqft,"$2,378",34.197621,-118.597746,0.0


In [20]:
df_new.latitude.isnull().sum()

50

In [21]:
df_new.longitude.isnull().sum()

50

In [22]:
list(zip(df_new[df_new['latitude'].isnull()].index.tolist(), df_new[df_new['longitude'].isnull()].index.tolist()))


[(50, 50),
 (51, 51),
 (52, 52),
 (85, 85),
 (86, 86),
 (87, 87),
 (102, 102),
 (103, 103),
 (104, 104),
 (264, 264),
 (265, 265),
 (266, 266),
 (274, 274),
 (398, 398),
 (399, 399),
 (400, 400),
 (451, 451),
 (452, 452),
 (453, 453),
 (465, 465),
 (466, 466),
 (467, 467),
 (500, 500),
 (529, 529),
 (530, 530),
 (534, 534),
 (535, 535),
 (536, 536),
 (578, 578),
 (579, 579),
 (592, 592),
 (593, 593),
 (601, 601),
 (602, 602),
 (789, 789),
 (1009, 1009),
 (1113, 1113),
 (1150, 1150),
 (1216, 1216),
 (1253, 1253),
 (1381, 1381),
 (1505, 1505),
 (1510, 1510),
 (1734, 1734),
 (1740, 1740),
 (1755, 1755),
 (1969, 1969),
 (2100, 2100),
 (2130, 2130),
 (2192, 2192)]

In [128]:
df.iloc[601:603]['latitude']

601   NaN
602   NaN
Name: latitude, dtype: float64

In [131]:
df.iloc[601:603]['full address']

601    14067 W Astoria St,Los Angeles,CA,USA
602    14067 W Astoria St,Los Angeles,CA,USA
Name: full address, dtype: object

In [132]:
# Assign through .loc on each frame; chained indexing (df.iloc[...][col] = ...) may write to a copy.
rows = df.index[601:603]
for frame in (df, df_new):
    frame.loc[rows, ['latitude', 'longitude']] = [34.310954, -118.440208]


In [133]:
df_new.iloc[601:603]

Unnamed: 0,bedrooms,sqft,price,latitude,longitude,altitude
601,1 Bedroom,808 sqft,"$1,696",34.310954,-118.440208,
602,3 Bedrooms,1183 sqft,"$2,619",34.310954,-118.440208,


In [134]:
df.iloc[601:603]['latitude']

601    34.310954
602    34.310954
Name: latitude, dtype: float64
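When several addresses need manual coordinates, keying the corrections by address rather than by row position may be easier to maintain. The sketch below uses plain dictionaries in place of DataFrame rows, with `None` standing in for a missing value; `MANUAL_COORDINATES` and `fill_missing` are hypothetical names, and the only entry is the address corrected above.

```python
MANUAL_COORDINATES = {
    # Coordinates looked up by hand for an address the geocoder could not resolve.
    "14067 W Astoria St,Los Angeles,CA,USA": (34.310954, -118.440208),
}


def fill_missing(record, overrides=MANUAL_COORDINATES):
    """Return a copy of record with coordinates filled from overrides if missing."""
    filled = dict(record)
    if filled.get("latitude") is not None:
        return filled  # already geocoded, leave untouched
    coords = overrides.get(filled["full address"])
    if coords is not None:
        filled["latitude"], filled["longitude"] = coords
    return filled


record = {
    "full address": "14067 W Astoria St,Los Angeles,CA,USA",
    "latitude": None,
    "longitude": None,
}
fixed = fill_missing(record)
print(fixed["latitude"], fixed["longitude"])
```

Addresses without an override are returned unchanged, so they can still be dropped afterwards.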

In [135]:
df_new.latitude.isnull().sum(), df_new.longitude.isnull().sum()

(21, 21)

In [0]:
df_new = df_new.dropna(subset=["latitude", "longitude"])

In [137]:
df_new.latitude.isnull().sum(), df_new.longitude.isnull().sum()

(0, 0)

### Save and Copy the csv file to Google Drive

In [0]:
df.to_csv('apt_geo_LosAngeles_2_1_2020.csv')
!cp apt_geo_LosAngeles_2_1_2020.csv "/content/gdrive/My Drive/"

In [0]:
df_new.to_csv('apt_geo_LosAngeles_2_1_2020_cl.csv')
!cp apt_geo_LosAngeles_2_1_2020_cl.csv "/content/gdrive/My Drive/"

In [144]:
df_new.head()

Unnamed: 0,bedrooms,sqft,price,latitude,longitude,altitude
0,Studio,450 sqft,"$1,480",34.189755,-118.536034,0.0
1,1 Bedroom,687 sqft,"$1,595",34.189755,-118.536034,0.0
2,2 Bedrooms,953 sqft,"$1,945",34.189755,-118.536034,0.0
3,1 Bedroom,836 sqft,"$2,102",34.197621,-118.597746,0.0
4,2 Bedrooms,1201 sqft,"$2,378",34.197621,-118.597746,0.0


## Mapping with a geographic visualisation tool
Use **Folium** to map out the points we created.

First, we display the locations as a circle map with **Folium**.

In [0]:
map = folium.Map(
    location=[34.049749, -118.268948],
    tiles='OpenStreetMap',
    zoom_start=12,
    control_scale=True,
    prefer_canvas=True,
)
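The map centre above is hard-coded. As an alternative sketch, the centre can be derived from the data by averaging the available coordinates; `map_center` is a hypothetical helper, and the sample points are made up for illustration.

```python
def map_center(points):
    """Return [mean latitude, mean longitude] over points with both values present."""
    valid = [
        (lat, lon)
        for lat, lon in points
        if lat is not None and lon is not None
    ]
    if not valid:
        raise ValueError("no valid coordinates to average")
    mean_lat = sum(lat for lat, _ in valid) / len(valid)
    mean_lon = sum(lon for _, lon in valid) / len(valid)
    return [mean_lat, mean_lon]


center = map_center([(34.0, -118.0), (34.2, -118.4), (None, None)])
print(center)
```

With the cleaned DataFrame, the result of `map_center(zip(df_new['latitude'], df_new['longitude']))` could be passed as the `location` argument of `folium.Map`. A plain average is adequate for a single city but is not a general solution for points spread across the globe.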

In [215]:
# Add one circle marker per listing; pass tooltip=row['price'] to CircleMarker to show the price on hover.
for _, row in df_new.iterrows():
    folium.CircleMarker(location=[row["latitude"], row["longitude"]], radius=5).add_to(map)
map

In [218]:
df_new

Unnamed: 0,bedrooms,sqft,price,latitude,longitude,altitude
0,Studio,450 sqft,"$1,480",34.189755,-118.536034,0.0
1,1 Bedroom,687 sqft,"$1,595",34.189755,-118.536034,0.0
2,2 Bedrooms,953 sqft,"$1,945",34.189755,-118.536034,0.0
3,1 Bedroom,836 sqft,"$2,102",34.197621,-118.597746,0.0
4,2 Bedrooms,1201 sqft,"$2,378",34.197621,-118.597746,0.0
...,...,...,...,...,...,...
2382,4 Bedrooms,2200 sqft,"$3,995",34.277692,-118.475827,0.0
2383,2 Bedrooms,1250 sqft,"$5,300",34.112698,-118.321657,0.0
2384,Studio,1100 sqft,"$3,000",34.126639,-118.218803,0.0
2385,2 Bedrooms,1129 sqft,"$2,350",34.226241,-118.395172,0.0


In [0]:
map.save("map.html")
