#### Merging Rent & Soccer Activity with City Boundary Data

*Adding rows to rent_and_activity_data (from the previous notebook) for the Southern California cities that appear in the city boundaries shapefile below but are missing from the rent data

---

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd

---

In [2]:
rent_and_activity_df = pd.read_csv('rent_and_activity_data.csv')

In [3]:
rent_and_activity_df.head(2)

Unnamed: 0,City,Avg Rent - Office,Avg Rent - Industrial,Avg Rent - Retail,Soccer_Activity
0,Aliso Viejo,0.0,0.0,0.0,0
1,Anaheim,20.55,11.91,21.82,2


In [4]:
rent_and_activity_df.shape[0]

113

---

In [5]:
# Source: http://gisdata-scag.opendata.arcgis.com/datasets/27b134459761486991f0b72f8a9a67c5_0
cities_shp = 'City_Boundaries_SCAG_Region.shp'

In [6]:
city_boundaries_geodf = gpd.read_file(cities_shp)

In [7]:
city_boundaries_geodf.shape[0]

197

In [8]:
city_boundaries_geodf.head(1)

Unnamed: 0,OBJECTID,CITY,CITY_ID,PERIMETER,ACRES,COUNTY,COUNTY_ID,YEAR,ANNEX_DATE,ANNEX_NOTE,Shapearea,Shapelen,geometry
0,192,Big Bear Lake,6434,46945.672906,4116.109802,San Bernardino,71,2016,2015-10-08T00:00:00.000Z,Reorganization,16657310.0,46945.673047,(POLYGON ((-116.8661844092499 34.2647571598981...


---

##### Checking & Dealing w/ Missing Cities:
(Cities in rent_and_activity_df that are not found in city_boundaries_geodf)

In [9]:
for city in rent_and_activity_df.City:
    if city not in city_boundaries_geodf.CITY.values:
        print(city)

Palos Verdes
Woodland Hills
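
The same membership check can be written without an explicit loop. The sketch below is a minimal illustration on toy data, not the notebook's actual frames: `rent_cities` and `boundary_cities` are hypothetical stand-ins for `rent_and_activity_df.City` and `city_boundaries_geodf.CITY`.

```python
import pandas as pd

# Hypothetical stand-ins for the two city-name columns.
rent_cities = pd.Series(['Anaheim', 'Palos Verdes', 'Woodland Hills'], name='City')
boundary_cities = pd.Series(
    ['Anaheim', 'Palos Verdes Estates', 'Rancho Palos Verdes'], name='CITY'
)

# isin() returns True where a rent city has an exact match in the boundaries;
# negating the mask keeps only the unmatched names.
unmatched = rent_cities[~rent_cities.isin(boundary_cities)].tolist()
print(unmatched)
```

Like the loop, this only finds exact string matches, so spelling variants still have to be checked by hand.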


---

- Check whether other variations of 'Palos Verdes' or 'Woodland Hills' exist in the boundary data; if so, include them
- Remove 'Palos Verdes' & 'Woodland Hills' after adding variations

---

In [10]:
city_boundaries_geodf[city_boundaries_geodf.CITY.str.contains('Palos')]

Unnamed: 0,OBJECTID,CITY,CITY_ID,PERIMETER,ACRES,COUNTY,COUNTY_ID,YEAR,ANNEX_DATE,ANNEX_NOTE,Shapearea,Shapelen,geometry
51,243,Palos Verdes Estates,55380,21674.807023,3069.210751,Los Angeles,37,2016,1964-11-09T00:00:00.000Z,Annexation,12420660.0,21674.806933,POLYGON ((-118.3617095860261 33.80417735318261...
63,255,Rancho Palos Verdes,59514,62100.312364,8656.425459,Los Angeles,37,2016,2016-04-14T00:00:00.000Z,,35031310.0,62100.313702,POLYGON ((-118.3775786458657 33.79477814760457...


In [11]:
city_boundaries_geodf[city_boundaries_geodf.CITY.str.contains('Woodland')].values

array([], shape=(0, 13), dtype=object)
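
A variant search like the two above could be wrapped in a small helper that ignores case and tolerates missing names. This is a sketch under assumptions: `find_variants` is a hypothetical helper, and `boundaries` is toy data standing in for `city_boundaries_geodf`.

```python
import pandas as pd

# Toy stand-in for the boundary attribute table, including one missing name.
boundaries = pd.DataFrame(
    {'CITY': ['Palos Verdes Estates', 'Rancho Palos Verdes', 'Anaheim', None]}
)

def find_variants(df, fragment, column='CITY'):
    """Return rows whose name contains `fragment` (hypothetical helper)."""
    # case=False ignores capitalization; na=False treats missing names as
    # non-matches; regex=False searches for the literal text.
    mask = df[column].str.contains(fragment, case=False, na=False, regex=False)
    return df[mask]

palos = find_variants(boundaries, 'palos')
woodland = find_variants(boundaries, 'woodland')
print(palos['CITY'].tolist(), woodland.empty)
```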

---

Renaming 'Palos Verdes' to 'Palos Verdes Estates' & removing 'Woodland Hills' (no match in the boundary data)

In [12]:
rent_and_activity_df[rent_and_activity_df.City == 'Palos Verdes']

Unnamed: 0,City,Avg Rent - Office,Avg Rent - Industrial,Avg Rent - Retail,Soccer_Activity
86,Palos Verdes,0.0,0.0,0.0,0


In [12]:
rent_and_activity_df.loc[rent_and_activity_df.City == 'Palos Verdes', 'City'] = 'Palos Verdes Estates'

In [13]:
rent_and_activity_df[rent_and_activity_df.City.str.contains('Palos')]

Unnamed: 0,City,Avg Rent - Office,Avg Rent - Industrial,Avg Rent - Retail,Soccer_Activity
86,Palos Verdes Estates,0.0,0.0,0.0,0


In [14]:
rent_and_activity_df = rent_and_activity_df[rent_and_activity_df.City != 'Woodland Hills']

---

##### Adding Rows for Missing Cities:
*These cities aren't the focus of the analysis, but they must be included in order to join the shapefile with the additional statistical data: each shapefile record must have a corresponding record in the joined statistical data file for the join to succeed in QGIS

In [15]:
missing = []
for city in city_boundaries_geodf.CITY:
    if city not in rent_and_activity_df.City.values:
        missing.append(city)

In [16]:
missing_cities_df = pd.DataFrame({'City':missing})

In [17]:
rent_and_activity_df.columns

Index(['City', 'Avg Rent - Office', 'Avg Rent - Industrial',
       'Avg Rent - Retail', 'Soccer_Activity'],
      dtype='object')

*Assigning a value of 0 to every rent and activity column in the rows for missing cities

In [18]:
for col in rent_and_activity_df.columns[1:]:
    missing_cities_df[col] = 0  # scalar is broadcast to every row
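
The zero-filled rows for missing cities could also be built in one step. The sketch below uses toy data: `rent` and `boundary_cities` are hypothetical stand-ins for `rent_and_activity_df` and `city_boundaries_geodf.CITY`. It keeps one row per boundary record, so repeated names such as 'Unincorporated' are preserved, as in the loop-based version.

```python
import pandas as pd

# Hypothetical stand-ins for the rent table and the boundary city names.
rent = pd.DataFrame({
    'City': ['Anaheim'],
    'Avg Rent - Office': [20.55],
    'Soccer_Activity': [2],
})
boundary_cities = pd.Series(['Anaheim', 'Banning', 'Unincorporated', 'Unincorporated'])

# Boundary names with no match in the rent table, duplicates included.
missing = boundary_cities[~boundary_cities.isin(rent['City'])]

# Build an all-zero frame with the rent table's value columns,
# then put the city names in front so the column order matches.
value_cols = rent.columns.drop('City')
missing_rows = pd.DataFrame(0, index=range(len(missing)), columns=value_cols)
missing_rows.insert(0, 'City', missing.to_numpy())
print(missing_rows)
```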

In [19]:
missing_cities_df.tail()

Unnamed: 0,City,Avg Rent - Office,Avg Rent - Industrial,Avg Rent - Retail,Soccer_Activity
80,Unincorporated,0,0,0,0
81,Palm Springs,0,0,0,0
82,Unincorporated,0,0,0,0
83,Banning,0,0,0,0
84,Jurupa Valley,0,0,0,0


In [20]:
all_cities_rent_activity = pd.concat([rent_and_activity_df, missing_cities_df]).reset_index(drop=True)

In [21]:
all_cities_rent_activity.tail()

Unnamed: 0,City,Avg Rent - Office,Avg Rent - Industrial,Avg Rent - Retail,Soccer_Activity
192,Unincorporated,0.0,0.0,0.0,0
193,Palm Springs,0.0,0.0,0.0,0
194,Unincorporated,0.0,0.0,0.0,0
195,Banning,0.0,0.0,0.0,0
196,Jurupa Valley,0.0,0.0,0.0,0
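
Before saving, it may be worth confirming that every boundary city has a counterpart in the combined table, since that is the condition the join relies on. The sketch below is one way to do this with a left merge and `indicator=True`; `unmatched_boundary_cities` is a hypothetical helper and the frames are toy data, not the notebook's actual tables.

```python
import pandas as pd

def unmatched_boundary_cities(boundaries, stats):
    """List boundary city names with no row in the stats table (hypothetical helper)."""
    # Compare unique names only, so repeated names do not multiply rows.
    left = boundaries[['CITY']].drop_duplicates()
    right = stats[['City']].drop_duplicates()
    check = left.merge(right, left_on='CITY', right_on='City',
                       how='left', indicator=True)
    # 'left_only' marks boundary names that found no partner in stats.
    return check.loc[check['_merge'] == 'left_only', 'CITY'].tolist()

# Toy stand-ins for the boundary table and the combined statistics table.
boundaries = pd.DataFrame({'CITY': ['Anaheim', 'Banning', 'Unincorporated', 'Unincorporated']})
complete = pd.DataFrame({'City': ['Anaheim', 'Banning', 'Unincorporated', 'Unincorporated']})
incomplete = pd.DataFrame({'City': ['Anaheim']})

print(unmatched_boundary_cities(boundaries, complete))
print(unmatched_boundary_cities(boundaries, incomplete))
```

Comparing `len(boundaries)` with `len(complete)` is a cheap additional check that the row counts agree (197 in both tables in this notebook).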


In [22]:
all_cities_rent_activity.to_csv('all_cities_rent_activity.csv', index=False)

*Columns containing census demographic data will be added to this DataFrame in the 5th notebook

---