# The Battle of the Neighborhoods - Report

## Introduction

<u>***Obesity***</u> is a medical condition in which excess body fat has accumulated to an extent that may have a negative effect on health. Obesity is widespread: in the US, about 35% of the population (approximately 82 million people) is obese. Obesity contributes to various diseases and conditions, often for life, and is one of the leading preventable causes of death worldwide. It is linked to cardiovascular disease, type 2 diabetes, certain types of cancer, high blood pressure, sleep apnea and other conditions. The World Health Organization (WHO) predicts that overweight and obesity may soon replace more traditional public health concerns, such as undernutrition and infectious diseases, as the most significant cause of poor health. Obesity has also taken a toll on health care costs across the country, with direct and indirect costs estimated at between 187 billion and 265 billion dollars as of 2019.

![obesity-rates-2030.png](attachment:obesity-rates-2030.png)

The main treatment for obesity is weight loss through calorie-restricted dieting. However, sustaining a diet over a long period of time is demanding, which makes physical exercise equally important. As someone who used to be overweight, I have decided to address this issue by helping to facilitate the construction of gyms and/or yoga classes.

## The Problem

I have chosen the city of Chicago, Illinois, which is home to more than 2.6 million residents. Obesity is a major problem in Chicago, where 36.2% of the city's high school students and 61.2% of adults in the metropolitan area are overweight or obese. These rates are among the highest in the entire country! The aim of this project is to identify suitable locations for new gyms, fitness centers and/or yoga classes in Chicago, ideally in areas that currently have no access to gyms or yoga classes.

## Target Audience

This project will be useful for the Chicago city commission, health officials, entrepreneurs and NGOs who want insight into the city's neighborhoods in order to build gyms or yoga classes.

## Data Description

For this project, the following datasets were collected:

1) ***Chicago Neighborhood Data*** - From https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago. This page lists all Chicago neighborhoods along with their community areas. I scraped the neighborhoods table using the BeautifulSoup library.

![Chicago%20Neighborhoods.JPG](attachment:Chicago%20Neighborhoods.JPG)
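A minimal sketch of the scraping step, assuming the neighborhoods table is the first table with the `wikitable` class and holds the neighborhood in its first column and the community area in its second (assumptions about the page layout, which can change). `fetch_html` and `parse_neighborhood_table` are hypothetical helper names; the parser is demonstrated on a small hand-written sample so it runs offline.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

WIKI_URL = "https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago"


def fetch_html(url=WIKI_URL):
    """Download the page (needs network access; not called in the example below)."""
    request = Request(url, headers={"User-Agent": "chicago-gym-finder-example"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def parse_neighborhood_table(html):
    """Return (neighborhood, community_area) pairs from the first "wikitable".

    Assumes the neighborhood is in the first column and the community area
    in the second; header rows (made of <th> cells only) are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:  # header or malformed row
            continue
        rows.append((cells[0].get_text(strip=True), cells[1].get_text(strip=True)))
    return rows


# Offline example: a tiny hand-written table in the assumed layout
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Neighborhood</th><th>Community area</th></tr>
  <tr><td><a href="#">Albany Park</a></td><td>Albany Park</td></tr>
  <tr><td>Andersonville</td><td>Edgewater</td></tr>
</table>
"""
print(parse_neighborhood_table(SAMPLE_HTML))
```

In real use, `parse_neighborhood_table(fetch_html())` would produce the rows that are then loaded into a pandas dataframe.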

2) ***Neighborhood Location data*** - The <u>Geocoders</u> package was used to retrieve the latitude and longitude of each neighborhood.

![Location%20Data.JPG](attachment:Location%20Data.JPG)
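A hedged sketch of the geocoding step, assuming geopy's `Nominatim` geocoder (the report only names "Geocoders") and queries of the form `"<name>, Chicago, IL"`. The geocoder is passed in as a function so the logic can run offline with a fake one. The bounding box is an approximate, hypothetical stand-in for the manual "drop locations outside Chicago" step described in the methodology; it is not the official city boundary.

```python
from types import SimpleNamespace

# Approximate bounding box of the City of Chicago (an assumption, not official limits)
CHICAGO_BBOX = {"lat_min": 41.64, "lat_max": 42.03, "lng_min": -87.95, "lng_max": -87.52}


def make_nominatim_geocode(user_agent="chicago-gym-finder-example"):
    """Return a rate-limited Nominatim geocode function (geopy is imported lazily)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's usage policy asks for at most one request per second
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def geocode_neighborhoods(names, geocode):
    """Map each name to (lat, lng), or to None when the geocoder finds nothing."""
    coords = {}
    for name in names:
        location = geocode(f"{name}, Chicago, IL")
        coords[name] = None if location is None else (location.latitude, location.longitude)
    return coords


def inside_chicago(lat, lng, bbox=CHICAGO_BBOX):
    """True if the point falls inside the approximate bounding box."""
    return (bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["lng_min"] <= lng <= bbox["lng_max"])


def keep_inside_chicago(coords):
    """Drop neighborhoods that failed to geocode or landed outside the box."""
    return {name: c for name, c in coords.items() if c is not None and inside_chicago(*c)}


# Offline example: a fake geocoder standing in for Nominatim
def fake_geocode(query):
    known = {
        "Hyde Park, Chicago, IL": SimpleNamespace(latitude=41.79, longitude=-87.59),
        # Simulates a query resolved to the wrong city
        "Springfield, Chicago, IL": SimpleNamespace(latitude=39.78, longitude=-89.65),
    }
    return known.get(query)


coords = geocode_neighborhoods(["Hyde Park", "Springfield", "Atlantis"], fake_geocode)
print(keep_inside_chicago(coords))
```

For real lookups, `geocode_neighborhoods(names, make_nominatim_geocode())` would query Nominatim once per neighborhood.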

3) ***Neighborhood Venues data*** - <u>Using the Foursquare API</u>, we gathered a list of venues within a 500-meter radius of each neighborhood.

![Venues.JPG](attachment:Venues.JPG)
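A sketch of the venue request and response parsing, assuming the legacy Foursquare v2 `venues/explore` endpoint (which may no longer be available for new accounts; the newer Places API uses a different interface). The credentials, the version date and the helper names are placeholders, and the response is shown trimmed to only the fields read here. No request is sent; the parser is exercised on a hand-built payload.

```python
import json
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng, radius=500, limit=100, version="20180605"):
    """Build a request URL for the legacy v2 explore endpoint (all values are placeholders)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,  # API version date; placeholder value
        "ll": f"{lat},{lng}",
        "radius": radius,  # meters
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_explore_response(payload, neighborhood):
    """Flatten one explore response into rows of venue name, position and category."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    rows = []
    for item in data["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": "Uncategorized"}]
        rows.append({
            "Neighborhood": neighborhood,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0]["name"],  # primary category only
        })
    return rows


# Offline example: a trimmed-down response of the assumed shape
sample = json.dumps({"response": {"groups": [{"items": [
    {"venue": {"name": "Example Gym", "location": {"lat": 41.79, "lng": -87.59},
               "categories": [{"name": "Gym"}]}}
]}]}})
print(build_explore_url("YOUR_ID", "YOUR_SECRET", 41.79, -87.59))
print(parse_explore_response(sample, "Hyde Park"))
```

Collecting the rows from every neighborhood into one list and passing it to `pandas.DataFrame` gives a venues table with one row per venue.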

Other important libraries:

1) **Pandas** - For working with dataframes of neighborhoods, latitudes and longitudes, venues, etc.

2) **JSON** - Used to parse JSON data, such as the responses returned by the Foursquare API.

3) **Folium** - A Python mapping library, used to visualize the clusters of neighborhoods.

4) **Scikit-learn** - Provides the K-means clustering algorithm.

## Methodology

After importing all the required libraries, the first step was to collect the neighborhood and location data and merge them into a single dataframe. Plotting the neighborhoods on a map revealed a few locations that lie outside Chicago; these entries were dropped. Using the Foursquare API, we then retrieved the nearby venues for each neighborhood, with a radius of 1000 meters and a limit of 200 venues, and grouped the venues by their corresponding neighborhood. This yielded 414 unique venue categories. Of these, we are interested in "Gym", "Gym/Fitness Center" and "Yoga Studios". After filtering the dataframe down to these three categories, we prepared the data for the K-Means clustering algorithm.
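A minimal pandas sketch of the grouping and filtering step, assuming a venues dataframe with `Neighborhood` and `Venue Category` columns (hypothetical names) and using each category's share of a neighborhood's venues as the feature; the original analysis may have used raw counts instead. The category labels are copied from the text, and the toy data is made up.

```python
import pandas as pd

# Category labels copied from the text above; check them against the labels
# that actually appear in your Foursquare results.
TARGET_CATEGORIES = ["Gym", "Gym/Fitness Center", "Yoga Studios"]


def target_category_frequencies(venues):
    """One-hot encode venue categories, average per neighborhood, keep the targets.

    Each value is the share of a neighborhood's venues in that category.
    """
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    # Group by the Series instead of adding a column, in case some venue
    # category is itself named "Neighborhood"
    freq = onehot.groupby(venues["Neighborhood"]).mean()
    # Target categories that never occur become columns of zeros
    return freq.reindex(columns=TARGET_CATEGORIES, fill_value=0.0)


# Toy data (made up) with four venues in two neighborhoods
venues = pd.DataFrame({
    "Neighborhood": ["Loop", "Loop", "Loop", "Pullman"],
    "Venue Category": ["Gym", "Pizza Place", "Yoga Studios", "Park"],
})
print("unique categories:", venues["Venue Category"].nunique())
print(target_category_frequencies(venues))
```

On the real data, `venues["Venue Category"].nunique()` is the count of unique categories reported above.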

To locate a suitable place to build a gym and/or yoga studio, we perform exploratory data analysis on each neighborhood, segment the neighborhoods, and group them into clusters. To do so, we use the k-means clustering algorithm, an unsupervised machine learning algorithm.
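For reference, the standard k-means formulation groups feature vectors $\mathbf{x}_1,\dots,\mathbf{x}_n$ (here, one per neighborhood, for example its share of venues in each target category) into $K$ clusters $C_1,\dots,C_K$ by minimizing the within-cluster sum of squared distances to each cluster's mean:

$$
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\boldsymbol{\mu}_k = \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i .
$$

The usual (Lloyd's) procedure alternates two steps until the assignments stop changing: assign each $\mathbf{x}_i$ to the cluster whose mean $\boldsymbol{\mu}_k$ is nearest, then recompute each $\boldsymbol{\mu}_k$ as the mean of its assigned points. Neither step can increase the objective, so the procedure converges, but only to a local minimum, which is why implementations typically restart from several initializations.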

We set the number of clusters to 10 and run the algorithm. Each resulting cluster groups neighborhoods with similar properties. The 10 clusters are visualized on a map, and each cluster is then analyzed to draw conclusions that help us reach our goal.
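A minimal scikit-learn sketch of this step, assuming a dataframe with one row per neighborhood and one numeric column per target category. `cluster_neighborhoods` is a hypothetical helper; `n_init=10` and `random_state=0` are illustrative choices that make the result repeatable, not settings taken from the original analysis. The toy feature table is made up.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighborhoods(features, k=10, seed=0):
    """Fit k-means on a neighborhoods-by-features table; return one label per row."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(features.to_numpy())
    return pd.Series(labels, index=features.index, name="Cluster Labels")


# Toy feature table: 30 made-up neighborhoods, 3 category-share columns
toy = pd.DataFrame(
    [[(i % 5) / 5, (i * 7 % 11) / 11, (i * 3 % 13) / 13] for i in range(30)],
    index=[f"N{i}" for i in range(30)],
    columns=["Gym", "Gym/Fitness Center", "Yoga Studios"],
)
print(cluster_neighborhoods(toy).value_counts().sort_index())
```

Because many neighborhoods may have no gyms or yoga studios at all, their feature rows can be identical; in that case scikit-learn may warn that it found fewer distinct clusters than `n_clusters`, which is worth checking before interpreting all 10 clusters.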

## Results

After running K-means clustering, the resulting 10 clusters are visualized on a map.

![Cluster%20map-2.JPG](attachment:Cluster%20map-2.JPG)
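A sketch of how such a cluster map can be drawn with Folium, assuming the clustered data is available as `(name, lat, lng, cluster_label)` tuples (a hypothetical layout). The color scheme and marker styling are illustrative choices, not necessarily those used for the map above. Folium is imported inside `plot_clusters`, so the color helper can run without it.

```python
import colorsys


def cluster_colors(k):
    """Return k hex colors with evenly spaced hues, one per cluster label."""
    colors = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        colors.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return colors


def plot_clusters(rows, k=10, center=(41.8781, -87.6298)):
    """Build a folium map with one circle per neighborhood, colored by cluster.

    `rows` holds (name, lat, lng, cluster_label) tuples (an assumed layout);
    `center` defaults to downtown Chicago.
    """
    import folium

    colors = cluster_colors(k)
    cluster_map = folium.Map(location=list(center), zoom_start=11)
    for name, lat, lng, label in rows:
        folium.CircleMarker(
            location=[lat, lng],
            radius=5,
            popup=f"{name}: cluster {label}",
            color=colors[label],
            fill=True,
            fill_color=colors[label],
            fill_opacity=0.7,
        ).add_to(cluster_map)
    return cluster_map


print(cluster_colors(10))
```

In a notebook, leaving `plot_clusters(rows)` as the last expression in a cell renders the map inline; `plot_clusters(rows).save("clusters.html")` writes it to a standalone file.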

#### Cluster 0

![0-2.JPG](attachment:0-2.JPG)

#### Cluster 1

![1-2.JPG](attachment:1-2.JPG)

#### Cluster 2

![2-2.JPG](attachment:2-2.JPG)

#### Cluster 3

![3-2.JPG](attachment:3-2.JPG)

#### Cluster 4

![4-2.JPG](attachment:4-2.JPG)

#### Cluster 5

![5-2.JPG](attachment:5-2.JPG)

#### Cluster 6

![6-2.JPG](attachment:6-2.JPG)

#### Cluster 7

![7-2.JPG](attachment:7-2.JPG)

#### Cluster 8

![8-2.JPG](attachment:8-2.JPG)

#### Cluster 9

![9-2.JPG](attachment:9-2.JPG)
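The per-cluster views above can also be summarized numerically. The sketch below is a hedged example, assuming one row per neighborhood with the three category columns plus a `Cluster Labels` column (hypothetical names). For each cluster, it reports how many neighborhoods it contains and whether any gym or yoga venue was found. The toy data is made up.

```python
import pandas as pd

GYM_COLUMNS = ["Gym", "Gym/Fitness Center"]
YOGA_COLUMNS = ["Yoga Studios"]


def summarize_clusters(df):
    """Per cluster: number of neighborhoods and whether any gym or yoga venue occurs."""
    flags = pd.DataFrame({
        "Cluster Labels": df["Cluster Labels"],
        "has_gym": df[GYM_COLUMNS].sum(axis=1) > 0,
        "has_yoga": df[YOGA_COLUMNS].sum(axis=1) > 0,
    })
    grouped = flags.groupby("Cluster Labels")
    summary = pd.DataFrame({
        "neighborhoods": grouped.size(),
        "any_gym": grouped["has_gym"].any(),
        "any_yoga": grouped["has_yoga"].any(),
    })
    # Clusters with neither venue type are candidate areas for a new gym or studio
    summary["no_gym_or_yoga"] = ~summary["any_gym"] & ~summary["any_yoga"]
    return summary


# Toy data (made up): four neighborhoods in three clusters
toy = pd.DataFrame({
    "Gym": [0.10, 0.00, 0.00, 0.00],
    "Gym/Fitness Center": [0.00, 0.00, 0.05, 0.00],
    "Yoga Studios": [0.00, 0.20, 0.00, 0.00],
    "Cluster Labels": [0, 1, 0, 2],
})
print(summarize_clusters(toy))
```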

## Discussions

After analysing the map, a few observations are made:

1) Cluster 1 (purple) represents the best opportunity to build a gym or yoga studio, since it does not contain any gyms, yoga studios or fitness centers. Southern Chicago is the prime location I'd choose, given how scarce gyms are there.

2) Clusters 0, 2, 3 and 8 have gyms/fitness centers but no yoga studios. Gyms are usually more lucrative than yoga studios, and some gyms have built-in yoga studios. In such cases, building an independent yoga studio may be risky.

3) Cluster 6 has yoga studios but no gyms/fitness centers. As gyms are more popular, I'd choose to build a gym here.

4) In general, southern and northwestern Chicago have fewer gyms and yoga studios than other parts of the city. This makes sense, as these suburban areas have smaller populations than the city center. It would be more prudent to build a gym or yoga studio at the midpoint of 3-4 suburban neighborhoods, so that it can draw on a large enough population to operate profitably.
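One way to compute the midpoint of a few neighborhoods suggested in observation 4 is to average their coordinates. For 3-4 neighborhoods only a few kilometers apart, a plain average of latitudes and longitudes is already very close; the hedged sketch below uses the slightly more general spherical form (averaging 3-D unit vectors). `geographic_midpoint` is a hypothetical helper, and the example points are made up.

```python
import math


def geographic_midpoint(points):
    """Midpoint of (lat, lng) pairs in degrees, via the mean of 3-D unit vectors."""
    if not points:
        raise ValueError("need at least one point")
    x = y = z = 0.0
    for lat, lng in points:
        phi, lam = math.radians(lat), math.radians(lng)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Convert the averaged vector back to latitude and longitude
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


# Two made-up points on the same meridian: the midpoint lies halfway between them
print(geographic_midpoint([(41.70, -87.60), (41.90, -87.60)]))
```

This midpoint weights every neighborhood equally; with population data, as suggested in the future work below, each unit vector could be scaled by the neighborhood's population before averaging.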

## Conclusion

This assignment involved scraping the required data, converting it into a usable form and then running the K-means clustering algorithm to find suitable places to build a gym and/or yoga studio. I ran k-means to divide the neighborhoods into 10 clusters. My recommendation is to open gyms in clusters 0 and 3 and yoga studios in clusters 2, 6 and 7 (with caution).

<u>Future</u>: In the future, this project could be refined further by collecting population data for each neighborhood. Chicago is also known for its rate of gun violence, which is among the highest in the country, and some neighborhoods are more crime-ridden than others; incorporating crime statistics for each neighborhood would refine the analysis even more. Another idea is to look for neighborhoods with a high number of restaurants and food stalls. These areas draw many people, so building a gym nearby could be a lucrative option.

### Thank you for reading! I hope you enjoyed this assignment as much as I did!