# Natural Environment Conditions
### By Shin Young Kang 

Natural Environment Conditions is a 2010 land and population census dataset describing land use in individual neighborhoods. It records the proportion of each neighborhood's land that is park or greenspace, and how prone the land is to different natural disasters. Using this data, we can judge which neighborhoods are more desirable and livable by our metrics.

The census records how the land in each neighborhood has been developed, as well as what percentage of that land is susceptible to flooding or landslides, which should make a neighborhood less desirable. While the monetary value of land comes from the economic potential it represents, it is ultimately the natural environment around the land that influences its subjective value.


In [11]:
import pandas as pd
file = pd.read_csv("tough_times_.csv", index_col="_id")
file.head(5)

| _id | Neighborhood | Sector # | Population (2010) | Land Area (acres) | Landslide Prone (% land area) | Undermined (% land area) | Flood Plain (% land area) | # Street Trees | Park Space (acres) | Park Space (% of land area) | Park Space (acres/1000 pers.) | Greenway (% of land area) | Woodland (% of land area) | Cemetery (% of land area) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Allegheny Center | 3 | 933 | 134.4 | 0.0% | 0.0% | 0.2% | 22 | 55.3 | 41.1% | 59.2 | 0.0% | 0.0% | 0.0% |
| 2 | Allegheny West | 3 | 462 | 90.2 | 9.3% | 0.0% | 2.3% | 229 | 7.2 | 8.0% | 15.5 | 0.0% | 4.2% | 0.0% |
| 3 | Allentown | 6 | 2500 | 188.8 | 27.1% | 90.4% | 0.0% | 87 | 39.4 | 20.9% | 15.8 | 0.0% | 12.3% | 0.0% |
| 4 | Arlington | 7 | 1869 | 300.8 | 41.4% | 57.3% | 1.5% | 79 | 6.7 | 2.2% | 3.6 | 0.0% | 29.4% | 7.3% |
| 5 | Arlington Heights | 7 | 244 | 84.5 | 39.9% | 61.2% | 0.0% | 3 | 0.0 | 0.0% | 0.0 | 0.0% | 41.8% | 0.0% |


Not all of the information in the dataset will be used for this project, so we filter it down to the columns we need.

In [3]:
#Filtered data
filter1 = file[["Neighborhood", "Population (2010)", "Land Area (acres)", "Landslide Prone (% land area)", "Undermined (% land area)", "Flood Plain (% land area)","Park Space (acres)", "Park Space (% of land area)", "Greenway (% of land area)"]]
filter1.head(5)

| _id | Neighborhood | Population (2010) | Land Area (acres) | Landslide Prone (% land area) | Undermined (% land area) | Flood Plain (% land area) | Park Space (acres) | Park Space (% of land area) | Greenway (% of land area) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Allegheny Center | 933 | 134.4 | 0.0% | 0.0% | 0.2% | 55.3 | 41.1% | 0.0% |
| 2 | Allegheny West | 462 | 90.2 | 9.3% | 0.0% | 2.3% | 7.2 | 8.0% | 0.0% |
| 3 | Allentown | 2500 | 188.8 | 27.1% | 90.4% | 0.0% | 39.4 | 20.9% | 0.0% |
| 4 | Arlington | 1869 | 300.8 | 41.4% | 57.3% | 1.5% | 6.7 | 2.2% | 0.0% |
| 5 | Arlington Heights | 244 | 84.5 | 39.9% | 61.2% | 0.0% | 0.0 | 0.0% | 0.0% |


### Initial Analysis
As a first pass, suppose we simply call the neighborhood with the most park space the "best" and the most landslide-prone neighborhood the "worst".

In [12]:
filter2 = file[['Neighborhood', 'Population (2010)', 'Park Space (acres)']]
filter2[filter2['Park Space (acres)'] == filter2['Park Space (acres)'].max()]

| _id | Neighborhood | Population (2010) | Park Space (acres) |
|---|---|---|---|
| 77 | Squirrel Hill South | 15110 | 684.6 |


Squirrel Hill South has the most park space, at 684.6 acres. This is likely because Frick Park is located in this neighborhood.

In [13]:
file['Landslide Prone (% land area)'] = file['Landslide Prone (% land area)'].str.rstrip('%').astype(float)
file['Park Space (% of land area)'] = file['Park Space (% of land area)'].str.rstrip('%').astype(float)
file['Undermined (% land area)'] = file['Undermined (% land area)'].str.rstrip('%').astype(float)
file['Flood Plain (% land area)'] = file['Flood Plain (% land area)'].str.rstrip('%').astype(float)
file['Greenway (% of land area)'] = file['Greenway (% of land area)'].str.rstrip('%').astype(float)
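
The conversion above repeats the same expression for each percentage column, and it raises an error if the cell is run a second time because the columns are no longer strings. A minimal sketch of a reusable alternative follows, assuming the percentage values are stored as strings such as `"9.3%"`. The helper name `percent_columns_to_float` and the small `sample` frame (rows copied from the `head()` output earlier) are introduced here for illustration only.

```python
import pandas as pd

# Sample rows copied from the head() output earlier in this notebook.
sample = pd.DataFrame(
    {
        "Neighborhood": ["Allegheny Center", "Allegheny West", "Allentown"],
        "Landslide Prone (% land area)": ["0.0%", "9.3%", "27.1%"],
        "Flood Plain (% land area)": ["0.2%", "2.3%", "0.0%"],
    }
)


def percent_columns_to_float(df, columns):
    """Return a copy of df with 'NN.N%' strings in `columns` parsed as floats."""
    out = df.copy()
    for col in columns:
        # Skip columns that are already numeric so re-running is harmless.
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].str.rstrip("%").astype(float)
    return out


# Hypothetical selection rule: every column whose name contains a "%" sign.
pct_cols = [c for c in sample.columns if "%" in c]
converted = percent_columns_to_float(sample, pct_cols)
print(converted.dtypes)
```

Returning a copy rather than modifying the frame in place keeps the original strings available for comparison, at the cost of a little extra memory.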

In [14]:

filter2 = file[['Neighborhood', 'Population (2010)', 'Landslide Prone (% land area)']]
filter2[filter2['Landslide Prone (% land area)'] == filter2['Landslide Prone (% land area)'].max()]

| _id | Neighborhood | Population (2010) | Landslide Prone (% land area) |
|---|---|---|---|
| 33 | Fineview | 1285 | 92.0 |


Fineview is the most landslide-prone neighborhood, with 92.0% of its land at risk. This could pose a serious threat to the properties and people in the area.

However, there are other metrics to look at. The raw amount of park space also says little on its own, because neighborhoods differ in size. For a more holistic view, we need to consider multiple metrics and compare the proportion of each neighborhood's land that is developed into public greenspace or prone to danger.
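
To make the idea of a proportion concrete, the park share of a neighborhood is its park acreage divided by its land area, times 100. The sketch below recomputes that share for the five rows shown in the first table and compares it with the published percentage column. The names `recomputed_share` and `max_gap` are introduced for illustration, and the check covers only these five rows, not the full dataset.

```python
import pandas as pd

# Rows copied from the head() output shown earlier in this notebook.
sample = pd.DataFrame(
    {
        "Neighborhood": [
            "Allegheny Center",
            "Allegheny West",
            "Allentown",
            "Arlington",
            "Arlington Heights",
        ],
        "Land Area (acres)": [134.4, 90.2, 188.8, 300.8, 84.5],
        "Park Space (acres)": [55.3, 7.2, 39.4, 6.7, 0.0],
        "Park Space (% of land area)": [41.1, 8.0, 20.9, 2.2, 0.0],
    }
)

# Share of land that is park space, as a percentage rounded to one decimal.
sample["recomputed_share"] = (
    100 * sample["Park Space (acres)"] / sample["Land Area (acres)"]
).round(1)

# Largest absolute gap between the published and recomputed percentages.
max_gap = (
    sample["recomputed_share"] - sample["Park Space (% of land area)"]
).abs().max()

print(sample[["Neighborhood", "Park Space (% of land area)", "recomputed_share"]])
print("largest gap:", max_gap)
```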

### Further Analysis

In [15]:

filter2 = file[['Neighborhood', 'Population (2010)', 'Park Space (% of land area)']]
filter2[filter2['Park Space (% of land area)'] == filter2['Park Space (% of land area)'].max()]

| _id | Neighborhood | Population (2010) | Park Space (% of land area) |
|---|---|---|---|
| 66 | Regent Square | 928 | 51.6 |


In [16]:

filter3 = file[['Neighborhood', 'Population (2010)', 'Undermined (% land area)']]
filter3[filter3['Undermined (% land area)'] == filter3['Undermined (% land area)'].max()]

| _id | Neighborhood | Population (2010) | Undermined (% land area) |
|---|---|---|---|
| 44 | Knoxville | 3747 | 99.9 |


In [18]:

filter4 = file[['Neighborhood', 'Population (2010)', 'Flood Plain (% land area)']]
filter4[filter4['Flood Plain (% land area)'] == filter4['Flood Plain (% land area)'].max()]

| _id | Neighborhood | Population (2010) | Flood Plain (% land area) |
|---|---|---|---|
| 57 | North Shore | 303 | 82.0 |


In [19]:

filter5 = file[['Neighborhood', 'Population (2010)', 'Greenway (% of land area)']]
filter5[filter5['Greenway (% of land area)'] == filter5['Greenway (% of land area)'].max()]

| _id | Neighborhood | Population (2010) | Greenway (% of land area) |
|---|---|---|---|
| 25 | Duquesne Heights | 2425 | 16.6 |


In summary, **Regent Square** has the highest percentage of park space, **Fineview** is the most landslide-prone, **Knoxville** has the most undermined land, **North Shore** has the largest share of flood plain, and **Duquesne Heights** has the most greenway.
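
The per-column lookups above can also be collected in one pass. The sketch below uses `DataFrame.idxmax` on a small frame built from the five rows shown in the first table, so the leaders it prints are the leaders among those five rows only and differ from the full-dataset results above. The name `leaders` is introduced for illustration. Note that `idxmax` returns only the first row when several rows tie for the maximum, whereas the `== max()` filter used in this notebook returns every tied row.

```python
import pandas as pd

# Rows copied from the head() output earlier, with percentages as floats.
sample = pd.DataFrame(
    {
        "Neighborhood": [
            "Allegheny Center",
            "Allegheny West",
            "Allentown",
            "Arlington",
            "Arlington Heights",
        ],
        "Landslide Prone (% land area)": [0.0, 9.3, 27.1, 41.4, 39.9],
        "Undermined (% land area)": [0.0, 0.0, 90.4, 57.3, 61.2],
        "Flood Plain (% land area)": [0.2, 2.3, 0.0, 1.5, 0.0],
        "Park Space (% of land area)": [41.1, 8.0, 20.9, 2.2, 0.0],
    }
).set_index("Neighborhood")

# One row per metric: which neighborhood leads it, and with what value.
leaders = pd.DataFrame(
    {"Neighborhood": sample.idxmax(), "Value": sample.max()}
)
print(leaders)
```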

Undermined land is land where the ground underneath has been weakened or hollowed out, for example by past underground mining. This makes it dangerous for structures built on top of it, so undermined land can be even more dangerous than landslide-prone land.

Greenway essentially means walking trails on unpaved land. While it serves a similar function to park space, greenway is also a form of transportation infrastructure, which is a real benefit for residents.

To combine the data, I add all of the greenspace columns into one score and all of the geographic dangers into another.

In [27]:
# Total greenspace share: park space plus greenway, both as % of land area
file['greenspace'] = file['Park Space (% of land area)'] + file['Greenway (% of land area)']

filter3 = file[["Neighborhood", 'Park Space (% of land area)', 'Greenway (% of land area)', 'greenspace']]

# Keep the row(s) with the highest combined greenspace share
neighborhood_max_green_space = filter3[filter3['greenspace'] == filter3['greenspace'].max()]

neighborhood_max_green_space


| _id | Neighborhood | Park Space (% of land area) | Greenway (% of land area) | greenspace |
|---|---|---|---|---|
| 66 | Regent Square | 51.6 | 0.0 | 51.6 |


**Regent Square** has the most total greenspace, with 51.6% of its land being park space. While it has no greenway, it makes up for this by dedicating more than half of its land to parks.

In [28]:
# Combined hazard score: sum of the three hazard shares (% of land area)
file['dangers'] = file['Landslide Prone (% land area)'] + file['Undermined (% land area)'] + file['Flood Plain (% land area)']

filter3 = file[["Neighborhood", 'Landslide Prone (% land area)', 'Undermined (% land area)', 'Flood Plain (% land area)', 'dangers']]

# Keep the row(s) with the highest combined hazard score
neighborhood_max_dangers = filter3[filter3['dangers'] == filter3['dangers'].max()]

neighborhood_max_dangers

| _id | Neighborhood | Landslide Prone (% land area) | Undermined (% land area) | Flood Plain (% land area) | dangers |
|---|---|---|---|---|---|
| 85 | Upper Hill | 38.5 | 83.4 | 0.0 | 121.9 |


**Upper Hill** ended up being the neighborhood with the most potential land dangers: 38.5% of its land is landslide-prone and 83.4% is undermined. The combined score of 121.9 exceeds 100 because the same land can fall into more than one hazard category. This creates a hazardous environment for the residents and their property.
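
A summed score such as 121.9 is best read as a ranking score rather than a share of land, since hazard areas may overlap. As a hedged sketch, the share of land affected by at least one hazard can be bracketed: it is at least the largest single hazard share and at most the sum capped at 100. The example below applies this to the five rows shown in the first table and lists the top three by summed score. The column names `affected_min` and `affected_max` are introduced for illustration and are not part of the dataset.

```python
import pandas as pd

hazard_cols = [
    "Landslide Prone (% land area)",
    "Undermined (% land area)",
    "Flood Plain (% land area)",
]

# Rows copied from the head() output earlier, with percentages as floats.
sample = pd.DataFrame(
    {
        "Neighborhood": [
            "Allegheny Center",
            "Allegheny West",
            "Allentown",
            "Arlington",
            "Arlington Heights",
        ],
        "Landslide Prone (% land area)": [0.0, 9.3, 27.1, 41.4, 39.9],
        "Undermined (% land area)": [0.0, 0.0, 90.4, 57.3, 61.2],
        "Flood Plain (% land area)": [0.2, 2.3, 0.0, 1.5, 0.0],
    }
)

# Same summed score as the notebook's 'dangers' column.
sample["dangers"] = sample[hazard_cols].sum(axis=1)

# Lower bound on affected land: the largest single hazard share.
sample["affected_min"] = sample[hazard_cols].max(axis=1)

# Upper bound on affected land: the sum, capped at 100% of the land.
sample["affected_max"] = sample["dangers"].clip(upper=100)

# Rank by the summed score and keep the three highest rows.
top3 = sample.nlargest(3, "dangers")
print(top3[["Neighborhood", "dangers", "affected_min", "affected_max"]])
```

Looking at the top few rows instead of only the maximum also shows how close the runners-up are, which a single-row result hides.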

# Conclusion

Geographic factors are only one of the many aspects of how 'nice' a neighborhood is. The dataset does not take into consideration the type of housing in each area or other non-geographic threats. However, it does provide some insight into the type of land that makes up each neighborhood. The data also fails to account for how well each park is maintained or what recreational facilities it has, which are important for judging how well the greenspace serves residents.