# Final Report | The Best Neighborhood in New York City
Florence Pernia Del Rosario

## 1. Introduction


#### 1.1 Background Information

Moving to a new city or country is always intimidating, and people may be unsure which neighborhood to live in. There are many decision variables to consider when choosing the right neighborhood, such as places of interest, food options, crime rates and housing prices.

The purpose of this report is to help people make smarter, more informed decisions when selecting the neighborhood that best fits their interests, budgets and concerns.

With the databases available on the internet, we can extract information relevant to our places of interest. In this project, we make use of data sources such as Foursquare and other relevant open-data websites. By the end of this report, we should be able to confidently recommend the best neighborhoods in New York City.



#### 1.2 Location of Choice
In this report, our focal point is The Big Apple, New York City. New York City is one of the most ethnically diverse, commercially driven and attractive urban centres in the country, so it is no surprise that people wish to move to this bustling city. The main purpose of this project is to suggest the best neighborhoods for someone moving to the city.


#### 1.3 Target Audience:
The target audience of this project is people moving to New York City who are unsure which neighborhood to buy a property in. These are mainly people who wish to save as much as they can, but still want to enjoy a good lifestyle and feel secure in their new homes.

## 2. Data

This project makes use of data from four different sources to support our analysis. 

#### 2.1 Neighborhood Coordinates:
In order to segment and explore the neighborhoods, we can extract the data of New York City neighborhoods from the [NYU Spatial Data Repository](https://geo.nyu.edu/catalog/nyu_2451_34572). Here, we can obtain:

1. Borough
2. Neighborhood
3. Latitude
4. Longitude

#### 2.2 Venues and Places of Interest:
One important decision variable is the places of interest around the neighborhood. These can be obtained from the Foursquare API. For each neighborhood, we chose a search radius of 500 meters. We obtain:

1. Neighborhood
2. Neighborhood Latitude
3. Neighborhood Longitude
4. Venue
5. Venue Latitude
6. Venue Longitude
7. Venue Category

#### 2.3 Subsidized Housing Prices:
Property prices are a major factor in one's decision to move into a neighborhood. Thus, I extracted this data from [CoreData.nyc](http://app.coredata.nyc/?mlb=true&ntii=&ntr=&mz=14&vtl=https%3A%2F%2Fthefurmancenter.carto.com%2Fu%2Fnyufc%2Fapi%2Fv2%2Fviz%2F98d1f16e-95fd-4e52-a2b1-b7abaf634828%2Fviz.json&mln=false&mlp=true&mlat=40.718&ptsb=&nty=&mb=roadmap&pf=%7B%7D&md=table&mlv=false&mlng=-73.996&btl=Borough&atp=properties) to support our analysis. We do not require all columns from this dataset, so it is cleaned to keep only:

1. boro_name
2. accessed_value
3. res_unit

#### 2.4 Crime Data:
Another major deciding factor for many residents is the crime rate within the area. Thus, I retrieved the NYPD complaints dataset from [NYC OpenData](https://data.cityofnewyork.us/Public-Safety/Crime-Map-/5jvd-shfj). This is a large dataset of 108058 rows, so we reduced it to make the data easier to work with.

This gives us a basic overview of crime in each borough. We do not require all columns from this dataset either, so it is also cleaned. We only need:

1. BORO_NM
2. Latitude
3. Longitude
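
A minimal sketch of this cleaning step, assuming the complaints data has been downloaded as a CSV file containing the three columns listed above. The function names are hypothetical, and the standard library is used only to keep the example self-contained; it is not necessarily how the original notebook performed the cleaning.

```python
import csv
from collections import Counter

# Columns kept from the NYPD complaints dataset (as listed in this section).
KEEP = ("BORO_NM", "Latitude", "Longitude")


def clean_rows(rows):
    """Keep only the needed columns and drop rows missing any of them."""
    cleaned = []
    for row in rows:
        subset = {col: (row.get(col) or "").strip() for col in KEEP}
        if all(subset.values()):
            cleaned.append(subset)
    return cleaned


def crimes_per_borough(rows):
    """Count the number of complaint rows recorded for each borough."""
    return Counter(row["BORO_NM"] for row in rows)


def load_complaints(path):
    """Read a CSV export of the dataset (the path is supplied by the caller)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return clean_rows(csv.DictReader(handle))
```

Calling `crimes_per_borough(load_complaints(path))` on a local export would give the per-borough counts that feed the crime bar chart.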

## 3. Methodology

#### 3.1 Geocoders:
This project required the latitude and longitude coordinates of New York City and its boroughs for our data analysis and visualizations. The Geocoders Nominatim function allowed easy and accurate retrieval of these coordinates.

![Geocoder](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Screenshot%202020-06-08%20at%203.46.26%20PM.png?raw=true)
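
A minimal sketch of a coordinate lookup, assuming the `geopy` package is installed and network access is available. The helper names and the `user_agent` string are hypothetical; the geocoder is passed in as a function so the lookup logic can be tried without calling the live service.

```python
def make_nominatim_geocode(user_agent="nyc_neighborhood_explorer"):
    """Return geopy's Nominatim geocode method (needs geopy and network access)."""
    # Imported lazily because geopy is a third-party dependency.
    from geopy.geocoders import Nominatim

    return Nominatim(user_agent=user_agent).geocode


def get_coordinates(address, geocode):
    """Look up an address and return (latitude, longitude), or None if not found."""
    location = geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude


# Example usage (requires network access):
# geocode = make_nominatim_geocode()
# print(get_coordinates("New York City, NY", geocode))
```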

#### 3.2 Folium:
All clustering and bubble map visualizations were done with Folium, which generates maps based on OpenStreetMap. With the circle markers function, I was able to represent clusters in different colours, and use different circle radii to represent the magnitude of the data.
 
 **Different Radius**
 ![Folium](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Screenshot%202020-06-08%20at%203.46.54%20PM.png?raw=true)
 
 **Different Colours**
  ![Folium](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Screenshot%202020-06-08%20at%203.47.07%20PM.png?raw=true)
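
A minimal sketch of how circle markers with varying radius and colour could be drawn, assuming the `folium` package is installed. The palette, the radius limits and the function names are illustrative assumptions, not the values used in the original notebook.

```python
# Hypothetical colour palette, one colour per cluster label.
PALETTE = ["red", "blue", "green", "purple", "orange"]


def scale_radius(value, max_value, min_radius=3.0, max_radius=20.0):
    """Map a non-negative value to a marker radius proportional to its size."""
    if max_value <= 0:
        return min_radius
    fraction = max(0.0, min(1.0, value / max_value))
    return min_radius + fraction * (max_radius - min_radius)


def cluster_colour(label):
    """Pick a colour for a cluster label, cycling through the palette."""
    return PALETTE[label % len(PALETTE)]


def build_map(center, points):
    """Draw one circle per point; points are (lat, lng, name, cluster, value)."""
    import folium  # third-party dependency, imported lazily

    points = list(points)
    max_value = max((p[4] for p in points), default=0)
    fmap = folium.Map(location=list(center), zoom_start=11)
    for lat, lng, name, label, value in points:
        colour = cluster_colour(label)
        folium.CircleMarker(
            location=[lat, lng],
            radius=scale_radius(value, max_value),
            popup=name,
            color=colour,
            fill=True,
            fill_color=colour,
            fill_opacity=0.7,
        ).add_to(fmap)
    return fmap
```

Keeping the radius and colour logic in small helper functions makes it easy to reuse the same scaling for the bubble maps and the cluster maps.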

#### 3.3 Bar Chart Sub-Plot:
We made use of two side-by-side bar chart sub-plots to compare our crime data and housing data. This allowed us to see both datasets clearly in one figure, which aids the comparison we needed to conduct.

![Folium](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/subplot.png?raw=true)
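
A minimal sketch of a two-panel bar chart comparison, assuming `matplotlib` is installed and that both inputs are dictionaries keyed by borough name. The function names are hypothetical, and no real figures from the datasets are reproduced here.

```python
def lowest_n(values_by_borough, n=3):
    """Return the n boroughs with the smallest values, smallest first."""
    return sorted(values_by_borough, key=values_by_borough.get)[:n]


def shared_lowest(crime_counts, housing_prices, n=3):
    """Boroughs that rank among the n lowest for both crime and price."""
    return sorted(set(lowest_n(crime_counts, n)) & set(lowest_n(housing_prices, n)))


def plot_comparison(crime_counts, housing_prices):
    """Draw crime numbers and housing prices as two bar charts in one figure."""
    import matplotlib.pyplot as plt  # third-party dependency, imported lazily

    fig, (ax_crime, ax_price) = plt.subplots(1, 2, figsize=(12, 4))
    panels = (
        (ax_crime, crime_counts, "Crime numbers by borough"),
        (ax_price, housing_prices, "Subsidized housing prices by borough"),
    )
    for ax, data, title in panels:
        boroughs = sorted(data, key=data.get)  # ascending, lowest on the left
        ax.bar(boroughs, [data[b] for b in boroughs])
        ax.set_title(title)
        ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return fig
```

Sorting each panel in ascending order puts the lowest-ranking boroughs on the left of both charts, which makes a shared pattern easier to spot.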

#### 3.4 Foursquare API:
One major decision variable in choosing the best neighborhood was the venue categories in its vicinity. The Foursquare API enabled us to retrieve this data for the neighborhoods we already had.

![Folium](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/foursquare%20api.png?raw=true)
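
A minimal sketch of the venue lookup, assuming the legacy Foursquare v2 `venues/explore` endpoint and its nested JSON response layout. The endpoint, the default version date and the function names are assumptions, and newer versions of the Foursquare API may differ. The output rows use the seven fields listed in section 2.2.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint (assumption; check the current Foursquare documentation).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=500, limit=100):
    """Build the request URL for venues within `radius` meters of a point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,  # API version date, an illustrative default
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(neighborhood, lat, lng, payload):
    """Flatten a decoded JSON response into one row per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            rows.append({
                "Neighborhood": neighborhood,
                "Neighborhood Latitude": lat,
                "Neighborhood Longitude": lng,
                "Venue": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                "Venue Category": categories[0].get("name"),
            })
    return rows
```

The URL would be fetched with any HTTP client and the decoded JSON passed to `parse_venues`; keeping the two steps separate lets the parsing be checked without credentials.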

#### 3.5 One-Hot Encoding:
One-hot encoding converts categorical data into dummy variables, each assigned a 1 or 0. This was a necessary step for listing the top 5 most common venues in each neighborhood, as well as for our clustering process.

![Folium](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/one-hot%20coding.png?raw=true)
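
A minimal sketch of this step in plain Python, assuming the venue data is a list of `(neighborhood, category)` pairs. The function names are hypothetical. In a pandas workflow, `pandas.get_dummies` followed by a group-by mean would typically play the same role.

```python
from collections import defaultdict


def one_hot(venues):
    """Encode (neighborhood, category) pairs as 0/1 vectors, one slot per category."""
    categories = sorted({category for _, category in venues})
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for neighborhood, category in venues:
        vector = [0] * len(categories)
        vector[index[category]] = 1
        rows.append((neighborhood, vector))
    return categories, rows


def category_frequencies(categories, rows):
    """Average the one-hot rows of each neighborhood into category frequencies."""
    sums = defaultdict(lambda: [0] * len(categories))
    counts = defaultdict(int)
    for neighborhood, vector in rows:
        counts[neighborhood] += 1
        sums[neighborhood] = [s + v for s, v in zip(sums[neighborhood], vector)]
    return {n: [s / counts[n] for s in sums[n]] for n in sums}


def top_categories(categories, frequencies, n=5):
    """List up to n most frequent venue categories for each neighborhood."""
    result = {}
    for neighborhood, vector in frequencies.items():
        ranked = sorted(zip(categories, vector), key=lambda cv: (-cv[1], cv[0]))
        result[neighborhood] = [cat for cat, freq in ranked[:n] if freq > 0]
    return result
```

The frequency vectors produced by `category_frequencies` are also the kind of numeric features a clustering algorithm can consume.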

#### 3.6 K-Means Clustering:
We needed to find similarities within the neighborhood dataset, rather than base our recommendation only on the largest number of venues in the vicinity. Thus, to form these clusters, we trained and used the K-Means algorithm.

![Folium](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Kmeans%20Clustering.png?raw=true)
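
A minimal sketch of the K-Means idea (Lloyd's algorithm) in plain Python, assuming each neighborhood is represented by a numeric feature vector such as its venue category frequencies. A library implementation such as scikit-learn's `KMeans` would normally be used in practice; this version only illustrates the assign-and-update loop, and the function names and defaults are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Neighborhoods that end up with the same label have similar venue profiles, which is what allows clusters to be compared by their typical categories rather than by venue counts alone.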

#### 3.7 Top Venue Categories in each Cluster:
The variety and category of venues is also an important aspect of our analysis in choosing the best neighborhood. Thus, we analysed the characteristics of these clusters according to the kinds of venue categories they contain.

![Folium](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Top%20VenueCat.png?raw=true)

## 4. Results

#### Crime Numbers in all 5 Boroughs:
![Crime numbers in all 5 boroughs](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Screenshot%202020-06-08%20at%203.26.24%20PM.png?raw=true)
<center> Figure 4.1</center>


#### Subsidized Housing Prices in all 5 Boroughs:
![Subsidized housing prices in all 5 boroughs](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Housing%20Prices.png?raw=true)
<center> Figure 4.2</center>


#### Comparison of the Crime Numbers and Housing Prices:
![Comparison of crime numbers and housing prices](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/barplotsubplot.png?raw=true)
<center> Figure 4.3</center>



#### Staten Island Clusters:
![Staten Island clusters](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Statencluster.png?raw=true)
<center> Figure 4.4</center>

![Cluster 0](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/staten%200.png?raw=true)
<center> Cluster 0</center>

![Cluster 3](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Staten%203.png?raw=true)
<center> Cluster 3</center>


#### Bronx Clusters:
![Bronx clusters](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Statencluster.png?raw=true)
<center> Figure 4.5</center>

![Cluster 1](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/staten%200.png?raw=true)
<center> Cluster 1</center>

![Cluster 2](https://github.com/florencedelros/IBM-Capstone-Project/blob/master/Final%20Project/Images/Staten%203.png?raw=true)
<center> Cluster 2</center>

## 5. Discussion

#### Narrowing Down the Boroughs Using the Crime and Housing Price Results:

The results above show the workflow we followed to arrive at the best neighborhoods in New York City. We narrowed down the boroughs by looking at those that returned the lowest values for crime numbers (Figure 4.1) and housing prices (Figure 4.2). Our target audience would likely prefer to live in a neighborhood with the lowest crime rates and housing prices.

As seen in the bar graph sub-plot (Figure 4.3), the two charts share a common pattern, and the 3 lowest-ranking boroughs are the same in both: **Staten Island, Queens and Bronx**. Although this does not narrow our choices to a single borough, it affirms that these are the top 3 boroughs in terms of safety and housing prices.

However, to dive deeper into our analysis, we should consider the different priorities of different groups of migrants. Some may place more importance on lower prices, and others on greater safety. Thus, we divide our recommendations between 2 groups of people:

* **Staten Island**: Safety Conscious
* **Bronx**: Price Conscious

#### The last criterion we need to look at is the venues within the clusters of neighborhoods:

We cannot know the preferences of the individual people or families migrating, such as whether they wish to have more parks or food places in the vicinity. Therefore, we judge the neighborhoods first by the number of venues within the clusters, and then by the variety of venue categories in the chosen clusters. This ensures that migrants will have a relatively good lifestyle around the neighborhood, catering to a variety of their needs.

**For Staten Island:**

In Figure 4.4, we can observe that Cluster 0 is mainly a food district, while Cluster 3, although it also has a lot of food places, has a greater variety of venues such as grocery stores, pharmacies and banks. This will be very convenient when moving into a new area. Thus, the best neighborhoods in Staten Island are from Cluster 3, which are:
1. West Brighton
2. Eltingville
3. New Springville

**For Bronx:**

In Figure 4.5, it is clear that the variety of venues in both clusters is extremely similar, and both will provide a relatively good lifestyle for the migrants. Thus, we judged the best neighborhoods by the number of venue categories within the vicinity, which favours Cluster 2. The top 3 neighborhoods are:
1. Belmont
2. Fordham
3. Kingsbridge

## 6. Conclusion

The neighborhoods West Brighton, Eltingville and New Springville of **Staten Island**, as well as Belmont, Fordham and Kingsbridge of the **Bronx**, are the best neighborhoods in New York City.

These boroughs cater to a variety of people and their concerns, whether they are money conscious or concerned about their safety. The neighborhoods further cater to people's interests and preferences in terms of venue categories. This will give them greater flexibility in the new area they wish to live in.

Overall, I feel very proud that I was able to apply and reinforce the knowledge and skills I have learnt throughout the 9 courses of the IBM Data Science course. This project has shown me not only the practical application of data science tools to analyse and resolve problems, but also how personal, social and financial data affect our analysis and recommendations.