# Analysis: How Mobile Home Parks in Paradise Compare to Cities of Similar Population Density


### By Kavish Harjai

In this notebook, I analyze the number of mobile home lots in cities that had a similar population density to Paradise in 2018.

To find the comparison cities, I first determined the percentile at which Paradise's population density falls. Then I found the density values five percentile points above and below it. I used those two values to filter my dataset down to 60+ cities with comparable population densities.



In [1]:
import pandas as pd 
import numpy as np
import os
import requests
from pprint import pprint

In [2]:
data_dir = os.environ["DATA_DIR"]
raw_data = data_dir + "/raw/"
processed_data = data_dir + '/processed/'

In [4]:
mh_merge = pd.read_csv(processed_data + 'mh_merge.csv')
mh_merge.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1521 entries, 0 to 1520
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   city          1521 non-null   object 
 1   pop_2018_est  1521 non-null   int64  
 2   place         1521 non-null   int64  
 3   city_y        1521 non-null   object 
 4   area_land     1521 non-null   float64
 5   place_type    1521 non-null   object 
 6   pop_density   1521 non-null   float64
 7   mh_spaces     678 non-null    float64
dtypes: float64(3), int64(2), object(3)
memory usage: 95.2+ KB


In [5]:
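# Only mh_spaces has missing values (678 of 1521 non-null), so dropping
# rows with NaN keeps just the places that have mobile home data.
mh_merge_filtered = mh_merge.dropna()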

In [6]:
mh_merge_filtered['pop_2018_est'].describe().apply(lambda x: format(x, 'f'))

count        678.000000
mean       43852.662242
std       175206.005129
min            0.000000
25%         1387.000000
50%         8810.500000
75%        46604.000000
max      3959657.000000
Name: pop_2018_est, dtype: object

In [7]:
mh_merge_filtered[mh_merge_filtered.city == 'Paradise']

| | city | pop_2018_est | place | city_y | area_land | place_type | pop_density | mh_spaces |
|---|---|---|---|---|---|---|---|---|
| 183 | Paradise | 26543 | 55520 | Paradise | 18.32 | Paradise town | 1448.85 | 1586.0 |
| 874 | Paradise | 186 | 55528 | Paradise | 4.35 | Paradise CDP | 42.76 | 1586.0 |


**There are two Paradise matches: the smaller one is a census-designated place (CDP), and the larger one is the town of Paradise.**
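Because both rows share the same `city` value, filtering on `city` alone cannot single out the town. A minimal sketch of one way to do that, using only the two rows printed above and assuming `place_type` is the column that tells them apart (the names `paradise`, `town`, and `paradise_density` are illustrative, not part of the notebook):

```python
import pandas as pd

# The two Paradise rows shown above, re-typed by hand for illustration.
paradise = pd.DataFrame({
    "city": ["Paradise", "Paradise"],
    "pop_2018_est": [26543, 186],
    "place_type": ["Paradise town", "Paradise CDP"],
    "pop_density": [1448.85, 42.76],
})

# Keep only the incorporated town, not the census-designated place.
town = paradise[paradise["place_type"] == "Paradise town"]

# Pull out the single density value used in the comparison.
paradise_density = town["pop_density"].iloc[0]
print(paradise_density)
```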

**To find the range of cities to compare to Paradise, I'll find the percentile at which the town of Paradise's population density falls.**

In [8]:
# Deciles of population density
mh_merge_filtered['pop_density'].quantile(
    q=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

0.1       65.400
0.2      181.572
0.3      489.355
0.4      940.922
0.5     1737.390
0.6     2710.674
0.7     3450.765
0.8     4322.878
0.9     6754.560
1.0    20352.540
Name: pop_density, dtype: float64

**With a density of 1448.85, Paradise falls somewhere between the 40th percentile (940.922) and the 50th percentile (1737.390). I'll drill down further.**
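The drill-down can also be done in one pass, without typing out each quantile. A rough sketch on made-up densities (not the real `mh_merge` data), assuming whole-number percentiles are fine-grained enough; `density`, `target`, `cutoffs`, and `upper_pct` are hypothetical names:

```python
import numpy as np
import pandas as pd

# Toy densities for illustration only; not the real mh_merge data.
density = pd.Series([50.0, 200.0, 900.0, 1400.0, 1500.0, 2700.0, 4300.0, 20000.0])
target = 1448.85  # the town of Paradise's density, from the table above

# Whole-number percentiles from the 1st to the 100th.
pcts = np.arange(1, 101) / 100
cutoffs = density.quantile(pcts)

# First percentile whose cutoff is at or above the target density.
upper_pct = cutoffs[cutoffs >= target].index[0]
print(upper_pct)
```

The target then sits between the percentile just below `upper_pct` and `upper_pct` itself.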

In [9]:
# Percentiles 40 through 52, one point at a time
mh_merge_filtered['pop_density'].quantile(
    q=[0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46,
       0.47, 0.48, 0.49, 0.5, 0.51, 0.52])

0.40     940.9220
0.41     967.0455
0.42    1151.2548
0.43    1221.7383
0.44    1286.7516
0.45    1318.8820
0.46    1388.6062
0.47    1484.7854
0.48    1530.8668
0.49    1660.7012
0.50    1737.3900
0.51    1862.2144
0.52    1942.4144
Name: pop_density, dtype: float64

**Paradise's density of 1448.85 falls between the 46th percentile (1388.6062) and the 47th percentile (1484.7854), so I treat it as the 47th percentile. To create my range, I will use the 42nd percentile as my lower limit and the 52nd as my upper limit. That's 1151.2548 and 1942.4144.**
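The two limits can also be derived from the data rather than typed in by hand. A minimal sketch on toy densities (not the real `mh_merge` file); the names `center`, `half_width`, `lower`, `upper`, and `comparison` are illustrative assumptions:

```python
import pandas as pd

# Toy data for illustration only; the column name follows mh_merge.
df = pd.DataFrame({
    "city": list("ABCDEFGHIJK"),
    "pop_density": [100.0, 300.0, 500.0, 700.0, 900.0, 1100.0,
                    1300.0, 1500.0, 1700.0, 1900.0, 2100.0],
})

center = 0.47      # percentile chosen for Paradise in the text above
half_width = 0.05  # five percentile points on either side

# Density values at the lower and upper percentile limits.
lower = df["pop_density"].quantile(center - half_width)
upper = df["pop_density"].quantile(center + half_width)

# between() is inclusive on both ends by default, matching >= and <=.
comparison = df[df["pop_density"].between(lower, upper)]
print(lower, upper, list(comparison["city"]))
```

Computing the limits this way keeps full floating-point precision, so a place sitting exactly on a boundary could be included or excluded differently than with the rounded values.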

In [10]:
comparison_range_popdens_percentile = mh_merge_filtered[
    (mh_merge_filtered['pop_density'] >= 1151.2548) &
    (mh_merge_filtered['pop_density'] <= 1942.4144)]


In [12]:
comparison_range_popdens_percentile.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 68 entries, 14 to 1454
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   city          68 non-null     object 
 1   pop_2018_est  68 non-null     int64  
 2   place         68 non-null     int64  
 3   city_y        68 non-null     object 
 4   area_land     68 non-null     float64
 5   place_type    68 non-null     object 
 6   pop_density   68 non-null     float64
 7   mh_spaces     68 non-null     float64
dtypes: float64(3), int64(2), object(3)
memory usage: 4.8+ KB


In [13]:
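# Sort by number of mobile home spaces, largest first. reset_index() keeps
# the original row labels in a new "index" column.
sort_compared_popdens_percentile = comparison_range_popdens_percentile.sort_values('mh_spaces', ascending=False).reset_index()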
sort_compared_popdens_percentile

| | index | city | pop_2018_est | place | city_y | area_land | place_type | pop_density | mh_spaces |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 556 | Yucaipa | 53264 | 87042 | Yucaipa | 28.29 | Yucaipa city | 1882.79 | 4557.0 |
| 1 | 574 | Lancaster | 159662 | 40130 | Lancaster | 94.28 | Lancaster city | 1693.49 | 4177.0 |
| 2 | 1169 | Redding | 91327 | 59920 | Redding | 59.65 | Redding city | 1531.05 | 2569.0 |
| 3 | 14 | Palmdale | 156904 | 55156 | Palmdale | 106.08 | Palmdale city | 1479.11 | 2098.0 |
| 4 | 980 | San Jacinto | 47474 | 67112 | San Jacinto | 25.71 | San Jacinto city | 1846.52 | 1846.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63 | 916 | San Ardo | 554 | 64476 | San Ardo | 0.45 | San Ardo CDP | 1231.11 | 14.0 |
| 64 | 1454 | Ferndale | 1365 | 23910 | Ferndale | 1.03 | Ferndale city | 1325.24 | 13.0 |
| 65 | 132 | Chowchilla | 18533 | 13294 | Chowchilla | 11.09 | Chowchilla city | 1671.15 | 9.0 |
| 66 | 746 | Fort Jones | 832 | 25128 | Fort Jones | 0.60 | Fort Jones city | 1386.67 | 5.0 |


### Conclusions