# Wrangling - Looking Into How Parks in Paradise Compare to Cities of Similar Population Density

### By Kavish Harjai

**I explore how the number of mobile home lots in Paradise compares to that of other cities with a similar population density in 2018 (before the Camp Fire).**

I chose to use population density in lieu of population because I am comparing smaller geographic areas (Census places). According to the [Census](https://www.census.gov/newsroom/blogs/random-samplings/2015/03/understanding-population-density.html):

"When comparing population density values for different geographic areas, then, it is helpful to keep in mind that the values are most useful for small areas, such as neighborhoods." 

To answer the above question, I will use two notebooks. This one is focused on **wrangling and merging**. The notebook called 'MH_paradise_analysis' is focused on **data analysis**. 

There are three datasets I use in this notebook:
* List of mobile home parks permitted by the California Housing and Community Development Department. [Source.](https://casas.hcd.ca.gov/casas/cmirMp/onlineQuery)
* ACS 2018 5-year population estimates by place in California.
* California place shapefile that I've exported as a .csv through ArcGIS. [Source.](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2018.html)

The list of mobile home parks includes the number of lots in each park. I can calculate the population density by place using the ACS population estimates and the land area of each place in California. Then I merge the three datasets together before moving to the analysis notebook.

I will start by importing the necessary libraries and creating folder paths for the raw datasets I import and the cleaned datasets I export.

In [1]:
import pandas as pd 
import numpy as np
import os
import requests
from pprint import pprint

In [4]:
data_dir = os.environ["DATA_DIR"]
raw_data = data_dir + "/raw/"
processed_data = data_dir + '/processed/'

#### Cleaning MH dataset

In [5]:
list_of_parks = pd.read_csv(raw_data + 'MH_parks.csv')
list_of_parks.head()

Unnamed: 0,Park Name,County,Park Identifier,Park Address,Fire Authority,Jurisdiction,MH Spaces,RV Lots W/Drains,RV Lots W/O Drains,Operated by,ADDRESS
0,SPANISH RANCH II,ALAMEDA,01-0001-MP,"121 RANCHERO WAY, HAYWARD, CA 94544, (510) 886...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",187.0,0.0,0.0,"HOMETOWN SPANISH RANCH, LLC, A DELAWARE LIMITE...","121 RANCHERO WAY, HAYWARD, CA 94544, (510) 886..."
1,GEORGIAN MANOR MHP,ALAMEDA,01-0003-MP,"1419 BUCKINGHAM WAY, HAYWARD, CA 94544, (510) ...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",265.0,0.0,0.0,"GEORGIAN MANOR MOBILEHOME PARK L P, 321 HARTZ ...","1419 BUCKINGHAM WAY, HAYWARD, CA 94544, (510) ..."
2,BAL TRAILER CT,ALAMEDA,01-0008-MP,"14831 BANCROFT AVE, SAN LEANDRO, CA 94578, (51...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",31.0,0.0,0.0,"PUEBLO SPRINGS MHP LTD, 10351 SANTA MONICA BLV...","14831 BANCROFT AVE, SAN LEANDRO, CA 94578, (51..."
3,VASCO MHP,ALAMEDA,01-0010-MP,"6539 S FRONT RD, LIVERMORE, CA 94550, (925) 44...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",50.0,21.0,0.0,"SOLOMON/MANDEL, EDWARD/GEORGE, PO BOX 406, ALA...","6539 S FRONT RD, LIVERMORE, CA 94550, (925) 44..."
4,AVALON MOBILE HOME PARK,ALAMEDA,01-0013-MP,"3970 CASTRO VALLEY BLVD, CASTRO VALLEY, CA 945...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",49.0,0.0,0.0,"AVALON MHP INC, 4061 EAST CASTRO VALLEY BLVD S...","3970 CASTRO VALLEY BLVD, CASTRO VALLEY, CA 945..."


**I will start by standardizing column names so there aren't spaces between words.**

In [7]:
list_of_parks = list_of_parks.rename(columns = {'Park Name': 'park_name',
                                                'County':'county',
                                              'Park Identifier': 'park_identifier', 
                                              'Park Address': 'park_address', 
                                              'MH Spaces': 'mh_spaces', 
                                              'Fire Authority': 'fire_authority', 
                                              'Jurisdiction': 'jurisdiction',
                                             'RV Lots W/Drains': 'rv_lots_drains',
                                             'RV Lots W/O Drains': 'rv_lots_no_drains',
                                             'Operated by': 'operated_by',
                                             'ADDRESS': 'operater_address'})

**In addition to mobile homes, this dataset contains information on RV Parks.**

**Since my analysis is focused on mobile homes, I will remove all observations where the value of 'mh_spaces' is 0.**

In [8]:
list_of_parks = list_of_parks[list_of_parks.mh_spaces != 0]

In [9]:
len(list_of_parks)

4668

**After filtering my dataframe, I wanted to check out what kind of data types there are and whether there are columns that are missing a bunch of values.** 

In [10]:
list_of_parks.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4668 entries, 0 to 5247
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   park_name          4668 non-null   object 
 1   county             4668 non-null   object 
 2   park_identifier    4668 non-null   object 
 3   park_address       4668 non-null   object 
 4   fire_authority     4668 non-null   object 
 5   jurisdiction       4668 non-null   object 
 6   mh_spaces          4667 non-null   float64
 7   rv_lots_drains     4667 non-null   float64
 8   rv_lots_no_drains  4667 non-null   float64
 9   operated_by        4664 non-null   object 
 10  operater_address   4668 non-null   object 
dtypes: float64(3), object(8)
memory usage: 437.6+ KB


**It looks pretty good, but since I'm interested in the number of spaces in each park, I want to get rid of the one observation where the value of 'mh_spaces' is null.**

In [11]:
list_of_parks[list_of_parks['mh_spaces'].isnull()]

Unnamed: 0,park_name,county,park_identifier,park_address,fire_authority,jurisdiction,mh_spaces,rv_lots_drains,rv_lots_no_drains,operated_by,operater_address
2982,8456 BRADSHAW ROAD (S.O.P. W/OUT P.T.O.),SACRAMENTO,34-13586-MP,"8456 BRADSHAW ROAD, ELK GROVE, CA 95624","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",,,,,"8456 BRADSHAW ROAD, ELK GROVE, CA 95624"


In [12]:
list_of_parks = list_of_parks.drop(labels=2982, axis=0)
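
The null row survived the earlier zero filter because a missing value compared with `!= 0` evaluates to True. As a minimal sketch on a toy frame (the park names and counts below are made up, not taken from the HCD data), `dropna` with `subset` removes such rows without naming an index label by hand:

```python
import numpy as np
import pandas as pd

# Toy stand-in for list_of_parks; names and counts are hypothetical.
parks = pd.DataFrame(
    {
        "park_name": ["PARK A", "PARK B", "PARK C", "PARK D"],
        "mh_spaces": [187.0, 0.0, np.nan, 31.0],
    }
)

# NaN != 0 is True, so the zero filter alone keeps the null row.
non_zero = parks[parks["mh_spaces"] != 0]

# Drop rows where mh_spaces is missing, whatever their index label is.
cleaned = non_zero.dropna(subset=["mh_spaces"])
print(cleaned)
```

This is an alternative to dropping label 2982 directly; it keeps working if the source file is refreshed and the row moves.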

**The 'park_address' column annoyingly contains the park phone number. Let me split that column into its separate components, and then rejoin the address components without the phone number. Ultimately, I will only be interested in the city column.**

In [13]:
list_of_parks['park_address'] = list_of_parks.park_address.str.title()

In [14]:
list_of_parks.head(10)

Unnamed: 0,park_name,county,park_identifier,park_address,fire_authority,jurisdiction,mh_spaces,rv_lots_drains,rv_lots_no_drains,operated_by,operater_address
0,SPANISH RANCH II,ALAMEDA,01-0001-MP,"121 Ranchero Way, Hayward, Ca 94544, (510) 886...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",187.0,0.0,0.0,"HOMETOWN SPANISH RANCH, LLC, A DELAWARE LIMITE...","121 RANCHERO WAY, HAYWARD, CA 94544, (510) 886..."
1,GEORGIAN MANOR MHP,ALAMEDA,01-0003-MP,"1419 Buckingham Way, Hayward, Ca 94544, (510) ...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",265.0,0.0,0.0,"GEORGIAN MANOR MOBILEHOME PARK L P, 321 HARTZ ...","1419 BUCKINGHAM WAY, HAYWARD, CA 94544, (510) ..."
2,BAL TRAILER CT,ALAMEDA,01-0008-MP,"14831 Bancroft Ave, San Leandro, Ca 94578, (51...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",31.0,0.0,0.0,"PUEBLO SPRINGS MHP LTD, 10351 SANTA MONICA BLV...","14831 BANCROFT AVE, SAN LEANDRO, CA 94578, (51..."
3,VASCO MHP,ALAMEDA,01-0010-MP,"6539 S Front Rd, Livermore, Ca 94550, (925) 44...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",50.0,21.0,0.0,"SOLOMON/MANDEL, EDWARD/GEORGE, PO BOX 406, ALA...","6539 S FRONT RD, LIVERMORE, CA 94550, (925) 44..."
4,AVALON MOBILE HOME PARK,ALAMEDA,01-0013-MP,"3970 Castro Valley Blvd, Castro Valley, Ca 945...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",49.0,0.0,0.0,"AVALON MHP INC, 4061 EAST CASTRO VALLEY BLVD S...","3970 CASTRO VALLEY BLVD, CASTRO VALLEY, CA 945..."
5,NEW ENGLAND VILLAGE,ALAMEDA,01-0015-MP,"940 New England Village Dr, Hayward, Ca 94544,...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",415.0,0.0,0.0,"BRANDENBURG STAEDLER & MOORE, 1122 WILLOW ST S...","940 NEW ENGLAND VILLAGE DR, HAYWARD, CA 94544,..."
6,DEL RIO MOBILE HOME,ALAMEDA,01-0016-MP,"1550 162Nd Ave, San Leandro, Ca 94578, (510) 2...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",52.0,0.0,0.0,"CALIFANO FAMILY TRUST, 1550 162ND AVENUE SPACE...","1550 162ND AVE, SAN LEANDRO, CA 94578, (510) 2..."
7,DEL VALLE MHP,ALAMEDA,01-0017-MP,"1148 Arroyo Rd, Livermore, Ca 94550, (510) 331...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",29.0,0.0,0.0,"ABBOUD FAMILY TRUST, PO BOX 3975, HAYWARD, CA ...","1148 ARROYO RD, LIVERMORE, CA 94550, (510) 331..."
8,BAYSHORE COMMONS,ALAMEDA,01-0023-MP,"1468 Grand Ave, San Leandro, Ca 94577, (510) 3...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",40.0,0.0,0.0,"SAN LEANDRO TRAILER PARK LLC, 6653 EMBARCADERO...","1468 GRAND AVE, SAN LEANDRO, CA 94577, (510) 3..."
9,FAIRVIEW TP,ALAMEDA,01-0024-MP,"785 Rose Ave, Pleasanton, Ca 94566, (925) 846-...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",14.0,8.0,0.0,"GIL, EDMUND, 2606 DERBY DR, SAN RAMON, CA 94583","785 ROSE AVE, PLEASANTON, CA 94566, (925) 846-..."


In [15]:
list_of_parks[['num_st', 'city', 'state', 'phone_num']] = list_of_parks['park_address'].str.split(',', n = 3, expand=True)

In [16]:
list_of_parks.head(10)

Unnamed: 0,park_name,county,park_identifier,park_address,fire_authority,jurisdiction,mh_spaces,rv_lots_drains,rv_lots_no_drains,operated_by,operater_address,num_st,city,state,phone_num
0,SPANISH RANCH II,ALAMEDA,01-0001-MP,"121 Ranchero Way, Hayward, Ca 94544, (510) 886...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",187.0,0.0,0.0,"HOMETOWN SPANISH RANCH, LLC, A DELAWARE LIMITE...","121 RANCHERO WAY, HAYWARD, CA 94544, (510) 886...",121 Ranchero Way,Hayward,Ca 94544,(510) 886-4646
1,GEORGIAN MANOR MHP,ALAMEDA,01-0003-MP,"1419 Buckingham Way, Hayward, Ca 94544, (510) ...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",265.0,0.0,0.0,"GEORGIAN MANOR MOBILEHOME PARK L P, 321 HARTZ ...","1419 BUCKINGHAM WAY, HAYWARD, CA 94544, (510) ...",1419 Buckingham Way,Hayward,Ca 94544,(510) 785-2212
2,BAL TRAILER CT,ALAMEDA,01-0008-MP,"14831 Bancroft Ave, San Leandro, Ca 94578, (51...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",31.0,0.0,0.0,"PUEBLO SPRINGS MHP LTD, 10351 SANTA MONICA BLV...","14831 BANCROFT AVE, SAN LEANDRO, CA 94578, (51...",14831 Bancroft Ave,San Leandro,Ca 94578,(510) 352-8152
3,VASCO MHP,ALAMEDA,01-0010-MP,"6539 S Front Rd, Livermore, Ca 94550, (925) 44...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",50.0,21.0,0.0,"SOLOMON/MANDEL, EDWARD/GEORGE, PO BOX 406, ALA...","6539 S FRONT RD, LIVERMORE, CA 94550, (925) 44...",6539 S Front Rd,Livermore,Ca 94550,(925) 447-0758
4,AVALON MOBILE HOME PARK,ALAMEDA,01-0013-MP,"3970 Castro Valley Blvd, Castro Valley, Ca 945...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",49.0,0.0,0.0,"AVALON MHP INC, 4061 EAST CASTRO VALLEY BLVD S...","3970 CASTRO VALLEY BLVD, CASTRO VALLEY, CA 945...",3970 Castro Valley Blvd,Castro Valley,Ca 94546,(510) 537-4815
5,NEW ENGLAND VILLAGE,ALAMEDA,01-0015-MP,"940 New England Village Dr, Hayward, Ca 94544,...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",415.0,0.0,0.0,"BRANDENBURG STAEDLER & MOORE, 1122 WILLOW ST S...","940 NEW ENGLAND VILLAGE DR, HAYWARD, CA 94544,...",940 New England Village Dr,Hayward,Ca 94544,(510) 785-4511
6,DEL RIO MOBILE HOME,ALAMEDA,01-0016-MP,"1550 162Nd Ave, San Leandro, Ca 94578, (510) 2...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",52.0,0.0,0.0,"CALIFANO FAMILY TRUST, 1550 162ND AVENUE SPACE...","1550 162ND AVE, SAN LEANDRO, CA 94578, (510) 2...",1550 162Nd Ave,San Leandro,Ca 94578,(510) 278-8810
7,DEL VALLE MHP,ALAMEDA,01-0017-MP,"1148 Arroyo Rd, Livermore, Ca 94550, (510) 331...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",29.0,0.0,0.0,"ABBOUD FAMILY TRUST, PO BOX 3975, HAYWARD, CA ...","1148 ARROYO RD, LIVERMORE, CA 94550, (510) 331...",1148 Arroyo Rd,Livermore,Ca 94550,(510) 331-2000
8,BAYSHORE COMMONS,ALAMEDA,01-0023-MP,"1468 Grand Ave, San Leandro, Ca 94577, (510) 3...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",40.0,0.0,0.0,"SAN LEANDRO TRAILER PARK LLC, 6653 EMBARCADERO...","1468 GRAND AVE, SAN LEANDRO, CA 94577, (510) 3...",1468 Grand Ave,San Leandro,Ca 94577,(510) 351-5950
9,FAIRVIEW TP,ALAMEDA,01-0024-MP,"785 Rose Ave, Pleasanton, Ca 94566, (925) 846-...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...","HCD - NORTHERN AREA OFFICE, 9342 TECH CENTER D...",14.0,8.0,0.0,"GIL, EDMUND, 2606 DERBY DR, SAN RAMON, CA 94583","785 ROSE AVE, PLEASANTON, CA 94566, (925) 846-...",785 Rose Ave,Pleasanton,Ca 94566,(925) 846-5074


In [17]:
list_of_parks['city'] = list_of_parks['city'].str.lstrip(' ')
list_of_parks[['state_only', 'zip']] = list_of_parks['state'].str.split(expand=True)
list_of_parks['state_only'] = list_of_parks.state_only.str.upper()
list_of_parks["updated_address"] = list_of_parks["num_st"] +","+ list_of_parks["city"] + "," + list_of_parks["state"]
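
To make the splitting steps concrete, here is a small sketch that runs the same idea on a single address from the first row of the table above. The variable names `addresses` and `parts` are only for illustration, and stripping whitespace from every piece is a choice made for this sketch rather than something the cells above do.

```python
import pandas as pd

# One address in the "street, city, state zip, phone" layout of park_address.
addresses = pd.Series(["121 Ranchero Way, Hayward, Ca 94544, (510) 886-4646"])

# n=3 caps the split at three commas, so there are at most four pieces.
parts = addresses.str.split(",", n=3, expand=True)
parts.columns = ["num_st", "city", "state", "phone_num"]

# Every piece after the first begins with a space, so strip whitespace.
parts = parts.apply(lambda col: col.str.strip())

# Separate "Ca 94544" into a state abbreviation and a ZIP code.
parts[["state_only", "zip"]] = parts["state"].str.split(expand=True)
parts["state_only"] = parts["state_only"].str.upper()
print(parts.iloc[0].to_dict())
```

An address with fewer than three commas, such as one with no phone number, would leave the trailing pieces empty rather than raising an error.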

**Next, I will explore some of the data fields. First off, it looks like there are 57 counties. California has 58, and San Francisco is the one that does not appear in this dataset. I will go through them and ensure there are no duplicates.**

In [18]:
list_of_parks["county"] = list_of_parks["county"].str.lower()

In [19]:
list_of_parks.county.nunique()

57

In [20]:
list_of_parks.county.unique()

array(['alameda', 'alpine', 'amador', 'butte', 'calaveras', 'colusa',
       'contra costa', 'del norte', 'el dorado', 'fresno', 'glenn',
       'humboldt', 'imperial', 'los angeles', 'riverside', 'inyo', 'kern',
       'kings', 'lake', 'lassen', 'madera', 'marin', 'mariposa',
       'tuolumne', 'merced', 'mendocino', 'modoc', 'mono', 'monterey',
       'santa cruz', 'napa', 'nevada', 'placer', 'orange', 'plumas',
       'sacramento', 'solano', 'san benito', 'san bernardino',
       'san diego', 'san joaquin', 'san luis obispo', 'san mateo',
       'santa barbara', 'santa clara', 'shasta', 'siskiyou', 'sierra',
       'sonoma', 'stanislaus', 'sutter', 'yolo', 'tehama', 'trinity',
       'tulare', 'ventura', 'yuba'], dtype=object)

**Next, I'll filter this dataframe down so it only contains columns necessary to my analysis.**

In [22]:
list_of_parks_filtered = list_of_parks[['park_name',
                                        'mh_spaces',
                                        'city']]

**The last thing I'll do is apply a groupby to the dataframe so there is one row per city with the total number of MH spaces in that city.**

In [23]:
list_of_parks_analysis = list_of_parks_filtered.groupby('city').mh_spaces.sum().reset_index().sort_values('mh_spaces')

In [24]:
list_of_parks_analysis

Unnamed: 0,city,mh_spaces
128,Canyon Lake,1.0
33,Aromas,1.0
448,Magalia,1.0
594,Point Arena,1.0
420,Loch Lomond,1.0
...,...,...
653,San Diego,4747.0
232,El Cajon,5359.0
644,Sacramento,6729.0
322,Hemet,6987.0
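
As a quick illustration of what the groupby does, the sketch below sums spaces per city for three parks taken from the first rows of the table earlier in this notebook. Because it uses only three parks, the totals are not the real city totals.

```python
import pandas as pd

# Three parks from the head of the dataset; a subset, for illustration only.
parks = pd.DataFrame(
    {
        "park_name": ["SPANISH RANCH II", "GEORGIAN MANOR MHP", "VASCO MHP"],
        "city": ["Hayward", "Hayward", "Livermore"],
        "mh_spaces": [187.0, 265.0, 50.0],
    }
)

# One row per city, with mh_spaces summed across that city's parks.
per_city = (
    parks.groupby("city", as_index=False)["mh_spaces"]
    .sum()
    .sort_values("mh_spaces")
)
print(per_city)
```

Passing `as_index=False` keeps 'city' as a regular column, which has the same effect as calling `reset_index()` after the sum.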


**Export dataframe as CSV.**

In [27]:
mh_analysis_path = os.path.join(processed_data, 'mh_parks_analysis.csv')
list_of_parks_analysis.to_csv(mh_analysis_path, index=False)

#### Acquiring 2018 ACS pop estimates

**I'll start by using a function I wrote for a previous project to get the specific data I want (the ACS 5-year estimate from table DP05, where variable DP05_0001E is the total population estimate). I'll run the function and assign the resulting JSON to a variable.**

In [28]:
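def get_ACS_name(year):
    # Build the request URL for the ACS 5-year data profile endpoint.
    dsource = 'acs'
    dname = 'acs5'
    base_url = f'https://api.census.gov/data/{year}/{dsource}/{dname}/profile'
    chart = 'DP05_0001E'  # total population estimate
    state = '06'  # California's state FIPS code
    api_key = os.environ["CENSUS_API_KEY"]
    data_url = f'{base_url}?get=NAME,{chart}&for=place:*&in=state:{state}&key={api_key}'
    response = requests.get(data_url)
    response.raise_for_status()  # stop here if the API returned an error status
    return response.json()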

In [29]:
population_by_place_name = get_ACS_name(2018)

**I'll convert the json to a pandas dataframe and rename columns for readability.**

In [32]:
# The first argument supplies the data rows; `columns` takes the header row.
population_by_place_name = (
    pd.DataFrame(population_by_place_name[1:], columns=population_by_place_name[0])
    .rename(columns={'NAME': 'city', 'DP05_0001E': 'pop_2018_est'})
    .drop(columns=['state'])
)
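
The conversion relies on the response being a list of lists whose first element is the header row. A minimal sketch of that shape follows; the place names, codes, and population figures are placeholders rather than real ACS values.

```python
import pandas as pd

# Same layout as the API response: header row first, then one row per place.
# All values below are hypothetical placeholders.
sample_response = [
    ["NAME", "DP05_0001E", "state", "place"],
    ["Example city, California", "1000", "06", "00001"],
    ["Sample CDP, California", "250", "06", "00002"],
]

# Unpack the header from the data rows before building the frame.
header, *rows = sample_response
df = (
    pd.DataFrame(rows, columns=header)
    .rename(columns={"NAME": "city", "DP05_0001E": "pop_2018_est"})
    .drop(columns=["state"])
)
print(df.dtypes)
```

Every value arrives as a string, which is why the population column has to be cast to an integer later on.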


In [33]:
population_by_place_name.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1521 entries, 0 to 1520
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   city          1521 non-null   object
 1   pop_2018_est  1521 non-null   object
 2   place         1521 non-null   object
dtypes: object(3)
memory usage: 35.8+ KB


**I'll strip the suffixes that follow the name of each place (' city', ' CDP', ' town', and ', California') by replacing them with an empty string.**

In [34]:
population_by_place_name['city'] = population_by_place_name['city'].str.replace(' city','')
population_by_place_name['city'] = population_by_place_name['city'].str.replace(' CDP','')
population_by_place_name['city'] = population_by_place_name['city'].str.replace(' town','')
population_by_place_name['city'] = population_by_place_name['city'].str.replace(', California','')
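
The four replacements above match the text anywhere in the name. One possible variation, sketched here on three illustrative names and not used by the notebook itself, is a single regular expression anchored to the end of the string so that only the trailing suffix can be removed:

```python
import pandas as pd

# Illustrative names in the "<place> <type>, California" layout.
names = pd.Series(
    [
        "King City city, California",
        "Yosemite Valley CDP, California",
        "Paradise town, California",
    ]
)

# "$" anchors the match at the end, so "City" inside a name is left alone.
cleaned = names.str.replace(r" (city|CDP|town), California$", "", regex=True)
print(cleaned.tolist())
```

Passing `regex=True` explicitly keeps the behavior the same across pandas versions, since the default for that parameter has changed over time.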

**Export as CSV.**

In [35]:
population_by_place_path = os.path.join(processed_data, 'ca_cities_2018_ACS.csv')
population_by_place_name.to_csv(population_by_place_path, index=False)

#### Cleaning CA place shapefile

**Before merging, I will retain only the columns that are necessary. Note: I keep the PLACEFP column because that's what I'll use to merge with the population dataset. I'll also keep the NAMELSAD column because, if there are any duplicated place names, this column will tell me whether each one is a CDP, town, or city.**

In [36]:
places_geo = pd.read_csv(raw_data + 'ca_place_geo_2018.csv', dtype=str)

In [37]:
places_geo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1521 entries, 0 to 1520
Data columns (total 16 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   STATEFP   1521 non-null   object
 1   PLACEFP   1521 non-null   object
 2   PLACENS   1521 non-null   object
 3   GEOID     1521 non-null   object
 4   NAME      1521 non-null   object
 5   NAMELSAD  1521 non-null   object
 6   LSAD      1521 non-null   object
 7   CLASSFP   1521 non-null   object
 8   PCICBSA   1521 non-null   object
 9   PCINECTA  1521 non-null   object
 10  MTFCC     1521 non-null   object
 11  FUNCSTAT  1521 non-null   object
 12  ALAND     1521 non-null   object
 13  AWATER    1521 non-null   object
 14  INTPTLAT  1521 non-null   object
 15  INTPTLON  1521 non-null   object
dtypes: object(16)
memory usage: 190.2+ KB


In [38]:
places_geo = places_geo[['NAME', 'ALAND', 'PLACEFP', 'NAMELSAD']]

In [39]:
places_geo.rename(columns={'NAME':'city',
                          'ALAND':'area_land',
                          'PLACEFP': 'place',
                          'NAMELSAD': 'place_type'}, inplace=True)

In [48]:
places_geo

Unnamed: 0,city,area_land,place,place_type
0,King City,10090834,38520,King City city
1,Woodlake,6305660,86300,Woodlake city
2,Twentynine Palms,152179753,80994,Twentynine Palms city
3,Upland,40432020,81344,Upland city
4,Yucaipa,73279143,87042,Yucaipa city
...,...,...,...,...
1516,Yosemite Lakes,54136049,86878,Yosemite Lakes CDP
1517,Yosemite Valley,5333712,86912,Yosemite Valley CDP
1518,Zayante,7058064,87090,Zayante CDP
1519,Sugarloaf Mountain Park,273752,75588,Sugarloaf Mountain Park CDP


### Merging

**In the following steps, I'll do a full outer join of the ACS estimates and the dataframe with land area on the common column 'place'.**

In [41]:
merged = pd.merge(population_by_place_name, places_geo, on='place', how='outer')

In [42]:
merged.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1521 entries, 0 to 1520
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   city_x        1521 non-null   object
 1   pop_2018_est  1521 non-null   object
 2   place         1521 non-null   object
 3   city_y        1521 non-null   object
 4   area_land     1521 non-null   object
 5   place_type    1521 non-null   object
dtypes: object(6)
memory usage: 83.2+ KB


In [43]:
merged = merged.rename(columns={'city_x':'city'})

**Merge worked as planned, as there are no null values.**
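
A row count and a null check are one way to confirm the join. As a hedged sketch of a stricter check, `pd.merge` also accepts `validate` and `indicator` arguments. The place codes and land areas below are the first two rows of the shapefile table above; the population values are placeholders.

```python
import pandas as pd

# Population values are hypothetical; place codes and areas come from above.
pop = pd.DataFrame({"place": ["38520", "86300"], "pop_2018_est": ["100", "200"]})
geo = pd.DataFrame({"place": ["38520", "86300"], "area_land": ["10090834", "6305660"]})

# validate raises an error if 'place' repeats on either side;
# indicator adds a "_merge" column saying where each row came from.
checked = pd.merge(
    pop, geo, on="place", how="outer", validate="one_to_one", indicator=True
)

# Rows present in only one of the two inputs.
unmatched = checked[checked["_merge"] != "both"]
print(len(unmatched))
```

An empty `unmatched` frame means every place code appears in both inputs.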

**According to [the Census](https://www.census.gov/quickfacts/fact/note/US/LND110210#:~:text=Land%20area%20%2D%20an%20area%20measurement,MAF%2FTIGER%20Database%22), land area is recorded in square meters. To convert to square miles, I will first convert 'pop_2018_est' and 'area_land' to integers, and then divide every value in 'area_land' by 2,589,988, the approximate number of square meters in a square mile.**

In [45]:
merged['pop_2018_est'] = merged['pop_2018_est'].astype(int)

In [46]:
merged['area_land'] = merged['area_land'].astype(int)

In [47]:
merged['area_land'] = (merged['area_land'] / 2589988).round(2)

**To find the population density, I'll create a new column. The values in that column will be population / area (in sq miles).**

In [49]:
merged['pop_density'] = (merged['pop_2018_est'] / merged['area_land']).round(2)
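
Because 'area_land' was rounded to two decimals before this division, the density of very small places can shift noticeably. The sketch below shows one alternative: keep the area unrounded and round only the final density. The land areas are taken from the shapefile table above, while the population figures are placeholders and not ACS estimates.

```python
import pandas as pd

SQ_M_PER_SQ_MILE = 2_589_988  # same conversion constant as above

df = pd.DataFrame(
    {
        "city": ["King City", "Sugarloaf Mountain Park"],
        "pop_2018_est": [14000, 50],  # hypothetical populations
        "area_land_sq_m": [10090834, 273752],  # from the shapefile table
    }
)

# Convert to square miles without rounding the intermediate value.
df["area_sq_mi"] = df["area_land_sq_m"] / SQ_M_PER_SQ_MILE

# Round only the final result.
df["pop_density"] = (df["pop_2018_est"] / df["area_sq_mi"]).round(2)
print(df[["city", "area_sq_mi", "pop_density"]])
```

For the smaller place, rounding the area to 0.11 square miles first would give a lower density than the unrounded calculation.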

**Next, I'll create a new dataframe by merging the above with my mobile home data on 'city'.**

In [51]:
mh_merge = pd.merge(merged, list_of_parks_analysis, on='city', how='left')

In [52]:
mh_merge.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1521 entries, 0 to 1520
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   city          1521 non-null   object 
 1   pop_2018_est  1521 non-null   int64  
 2   place         1521 non-null   object 
 3   city_y        1521 non-null   object 
 4   area_land     1521 non-null   float64
 5   place_type    1521 non-null   object 
 6   pop_density   1521 non-null   float64
 7   mh_spaces     678 non-null    float64
dtypes: float64(3), int64(1), object(4)
memory usage: 106.9+ KB


In [53]:
merged_path = os.path.join(processed_data, 'merged.csv')
merged.to_csv(merged_path, index=False)

mh_merge_path = os.path.join(processed_data, 'mh_merge.csv')
mh_merge.to_csv(mh_merge_path, index=False)