# Capstone Project - The Battle of Neighborhoods (Week 1)

## 1. Introduction Section

### Business Problem:

My client has been renting an apartment in West San Jose (ZIP code 95129) for more than ten years. The rent for a two-bedroom apartment has risen dramatically since the economic bubble burst in 2008: he now pays around $4,500 per month for his small townhouse, compared with $2,200 ten years ago. Lately, my client has been pushing me to search for a house to buy in Silicon Valley, provided the environment is good and the price is within his budget. I am excited to use this opportunity to practice what I have learned so far in this course. My client's key question is: how can I help him find a convenient and enjoyable place similar to the area where he lives now? I plan to use the **FourSquare API** that we learned in this course, a real estate API (such as **ZillowAPI**), and **Silicon Valley Real Estate websites**. My client's requirements are listed below:

- The amenities in the selected neighborhood should be similar to those around his current apartment
- The price is around $1.5M
- The house must have at least 3 bedrooms, 2 bathrooms, and a 1-car garage, and be around 1,800 to 2,100 square feet in size
- Near a park (within 0.5 mile)
- Near a library (within 1 mile)
- Near a school (within 0.5 mile)
- The schools in the area should be highly rated (ranking of 8 or higher)
- Not close to the railroad (at least 3 miles away)
- Near a supermarket (within a 0.5 mile radius)
- Near a shopping mall (within a 3 mile radius)
- Close (within 1 mile) to venues such as restaurants (Asian and Mexican food, etc.), parks and coffee shops
- The neighborhood/community should be safe and have a low crime rate
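
The distance requirements above can be checked mechanically once each house and amenity has a latitude and longitude. The following is a minimal sketch, not a final implementation: it assumes straight-line (great-circle) distance is an acceptable stand-in for "within N miles", and the names `haversine_miles`, `DISTANCE_RULES`, `nearest_miles` and `check_distance_rules` are hypothetical helpers introduced only for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Distance rules taken from the requirement list:
# (amenity kind, limit in miles, "max" = must be within, "min" = must be at least).
DISTANCE_RULES = [
    ("park", 0.5, "max"),
    ("library", 1.0, "max"),
    ("school", 0.5, "max"),
    ("supermarket", 0.5, "max"),
    ("shopping_mall", 3.0, "max"),
    ("railroad", 3.0, "min"),
]


def nearest_miles(house, places):
    """Distance from the house to the closest place; infinity if the list is empty."""
    return min(
        (haversine_miles(house[0], house[1], lat, lon) for lat, lon in places),
        default=math.inf,
    )


def check_distance_rules(house, amenities):
    """Return {kind: bool} per rule; `amenities` maps kind -> [(lat, lon), ...]."""
    results = {}
    for kind, limit, mode in DISTANCE_RULES:
        d = nearest_miles(house, amenities.get(kind, []))
        results[kind] = d <= limit if mode == "max" else d >= limit
    return results
```

For example, `check_distance_rules((lat, lon), {"park": [...], "railroad": [...]})` returns one pass/fail flag per rule. A missing amenity list fails a "within" rule and passes the "at least" rule, which may or may not be the desired behavior for incomplete data.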

Based on the requirements listed above, I define the business problem as:

**How to buy a dream house in Silicon Valley that meets the requirements for price, features, safety, location and venues?**



### The audience who would be interested in this project:
This case also applies to anyone interested in exploring ways to search and analyze location and real estate data in order to find a suitable house to buy in Silicon Valley.

## 2. Data Section

#### The following data is required to answer the questions of the business problem:
- List of **public schools** in Santa Clara County with their location data [https://data.sccgov.org/Education/SchoolsPublic/q83h-ht3q]
- List of **parks** in Santa Clara County with their location data [https://data.sccgov.org/Environment/ParkPoints/3t3k-gian]
- List of **railroad stations** in Santa Clara County with their location data [https://data.sccgov.org/Transportation/RailroadStations/9wv8-3ekq]
- List of **public libraries** in Santa Clara County with their location data [https://data.sccgov.org/Government/Libraries/xxrb-pj5j]
- List of **crime reports** in Santa Clara County with their location data [https://data.sccgov.org/Public-Safety/Crime-Reports/n9u6-aijz/data]
- Santa Clara County, California **school rating** data [https://school-ratings.com/counties/Santa_Clara.html]
- **House price trend** data for the cities in Silicon Valley [https://julianalee.com/trends.htm]
- **Recently sold houses** dataset by ZIP code [https://julianalee.com/zip-code/95124/95124-home-sales.htm]
- Data from the **Zillow API** for comparable house sales analysis [https://github.com/asclepiusaka/zillowAPI]
- Most current list of **houses for sale** in each neighborhood from a real estate website, with their addresses, prices and selling information [https://julianalee.com/real-estate/property-organizer-view-saved-search/10279348/]
- Data from the **Foursquare API** [https://developer.foursquare.com/docs]

#### The datasets and features will be used as follows:
1. Use **geopy Nominatim** to find the latitude and longitude of my client's current apartment.
2. Use **FourSquare** to find 10 venues around the current residence in San Jose.
3. Map the current residence together with the 10 venues.
4. **Schools Dataset**: [features to be extracted: **'ZIP', 'CITY', 'PLACENAME', 'ADDRESS', 'LATITUDE', 'LONGITUDE'**]. **School Rating Dataset** (web scraping with **BeautifulSoup**): [features to be extracted: **'SCHOOL', 'RANK'**]. School ranking (8 or higher) is a major factor in my client's buying criteria, so I will download the public school dataset from the Santa Clara County open data portal. The goal is to show the schools with their rankings on the map to support house selection.
5. **Parks dataset**: [features to be extracted: **'PLACENAME', 'LATITUDE', 'LONGITUDE'**]. My client wants the candidate neighborhood to be close to a park (within 0.5 mile), so I will download the parks dataset from Santa Clara County's open data repository. The parks will be shown on the map to support the buying decision.
6. **Railroad stations dataset**: [features to be extracted: **'PLACENAME', 'PLACETYPE', 'LATITUDE', 'LONGITUDE'**]. My client is concerned about train noise, so candidate houses should be at least 3 miles away from railroad stations. The railroad stations will be shown on the map to support the buying decision.
7. **Public Libraries dataset**: [features to be extracted: **'ZIP', 'CITY', 'PLACENAME', 'ADDRESS', 'LATITUDE', 'LONGITUDE'**]. My client enjoys going to libraries, so candidate houses must have convenient access to one (within 1 mile). The libraries will be shown on the map to support the decision.
8. **Crime Reports dataset**: [features to be extracted: **'city', 'incident_type_primary', 'parent_incident_type', 'latitude', 'longitude'**]. Safety is the most important factor when buying a house. My client emphasizes that he will not buy a house in an area with a higher crime rate even if the price is lower. The county provides a large dataset with more than 150,000 records, which has to be cleaned so that only the records that are essential and suitable for visualization are kept.
9. **House price trends data** (web scraping with **BeautifulSoup**): [features to be extracted: **'City', 'Median Price', '2018 Change', '2017 Change', '2016 Change'**]. My client is aiming for an affordable price within $1.5M, so a chart of median prices for every city/ZIP code can give him a better direction for house hunting.
10. Process the ZIP codes and cities: Because the exact boundaries of each city are not available, I will use the schools dataset, which contains location data for every school. The areas where schools are concentrated can usually stand in for the city when querying Foursquare.
11. Explore cities in Santa Clara County: [features to be extracted: **'Neighborhood', 'Venue Category', 'venue', 'freq'**]. I will use **Foursquare** to explore all cities/ZIP codes. The goal is to find a city with a context similar to my client's current city. The context includes food, restaurants, coffee shops, gyms, points of interest, etc. I would like to give my client a map visualization in which similar cities/ZIP codes are grouped together.
12. Analyze each neighborhood/city: Group rows by neighborhood and take the mean of the frequency of occurrence of each venue category. I will describe each neighborhood by its top 5 most common venues.
13. Cluster cities/ZIP codes: Use the **K-Means or DBSCAN algorithm** for clustering analysis.
14. **Recent Sold House Price Data** (web scraping with **BeautifulSoup**): [features to be extracted: **'ZIP', 'ADDRESS', 'ORGLD', 'ORIG LSPRC', 'LIST PRICE', 'SALE PRICE', 'SQFT', 'LOTSZ', 'COE', 'DOM'**]. I will use the geopy Nominatim geocoder to get the latitude and longitude of each address. The goal is to show these houses with their features and sold prices on the map to support the decision.
15. House **Comparable Sales Analysis**: [features to be extracted: **'zpid', 'zipcode', 'city', 'street', 'year_built', 'lot_size', 'finished_size', 'bedrooms', 'bathrooms', 'zestimate', 'last_sold_price', 'last_sold_date', 'latitude', 'longitude', 'home_details', 'similar_sales'**]. Use the **Zillow API** to get house features from the Zillow real estate database.
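
Steps 12 and 13 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the venue rows are made-up placeholders (`hood_a`, `hood_b`, `hood_c` are not real neighborhoods), the helper names are hypothetical, and the hand-written k-means stands in for a library implementation such as scikit-learn's `KMeans`, which the actual analysis would more likely use. The per-neighborhood share computed here equals the mean of one-hot encoded category columns grouped by neighborhood.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """venues: list of (neighborhood, category) -> {neighborhood: {category: share}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }


def top_venues(freqs, n=5):
    """Top-n categories per neighborhood, highest share first (ties broken by name)."""
    return {
        hood: sorted(shares, key=lambda cat: (-shares[cat], cat))[:n]
        for hood, shares in freqs.items()
    }


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic).

    Assumes k <= len(points).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up sample rows, for illustration only.
venues = [
    ("hood_a", "Coffee Shop"), ("hood_a", "Korean Restaurant"),
    ("hood_a", "Park"), ("hood_a", "Coffee Shop"),
    ("hood_b", "Farm"), ("hood_b", "Winery"), ("hood_b", "Farm"),
    ("hood_c", "Coffee Shop"), ("hood_c", "Chinese Restaurant"),
    ("hood_c", "Park"), ("hood_c", "Coffee Shop"),
]

freqs = category_frequencies(venues)
categories = sorted({cat for shares in freqs.values() for cat in shares})
hoods = sorted(freqs)
# One frequency vector per neighborhood, in a fixed category order.
vectors = [[freqs[h].get(c, 0.0) for c in categories] for h in hoods]
labels = dict(zip(hoods, kmeans(vectors, k=2)))
```

With this toy input, `hood_a` and `hood_c` share a cluster because their venue mixes are close, while `hood_b` ends up on its own. Seeding the centroids with the first `k` points keeps the sketch reproducible; a real run would normally use a randomized or k-means++ initialization.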

#### Processing these data will help answer the following key questions for the buying decision:

- Which cities/ZIP codes are worth considering based on **price** (less than $1.5M)?
- Which cities/ZIP codes are worth considering based on **schools** (within 0.5 mile and ranking >= 8)?
- Which cities/ZIP codes are worth considering based on **parks** (within 0.5 mile)?
- Which cities/ZIP codes are worth considering based on **libraries** (within 1 mile)?
- Which cities/ZIP codes are worth considering based on the **railroad** (at least 3 miles away)?
- Which cities/ZIP codes are worth considering based on **supermarkets** (within a 0.5 mile radius)?
- Which cities/ZIP codes are worth considering based on **shopping malls** (within a 3 mile radius)?
- Which cities/ZIP codes are worth considering based on **food, restaurants (Asian and Mexican food, etc.) and coffee shops**?
- Which cities/ZIP codes are worth considering based on **safety** (low crime rate)?
- Which city/ZIP code has the **best housing prices**?
- Are there **tradeoffs** between size, price, location and safety?
- What venues surround the **two best houses** to buy?
- Are there any other **interesting statistical findings** in the **Foursquare** and **ZillowAPI** real estate data?
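
As a rough illustration of how the price questions above might be answered from recently sold houses, the sketch below summarizes sales per ZIP code. It is an assumption-laden example, not the final method: the record keys (`zip`, `sale_price`, `sqft`), the function `summarize_by_zip`, and the sample rows are all hypothetical, and the sample numbers are made up.

```python
from statistics import median

BUDGET = 1_500_000         # "around 1.5M" from the requirements
SQFT_RANGE = (1800, 2100)  # target size from the requirements


def summarize_by_zip(sales):
    """sales: list of dicts with 'zip', 'sale_price', 'sqft' -> per-ZIP statistics."""
    by_zip = {}
    for s in sales:
        by_zip.setdefault(s["zip"], []).append(s)
    summary = {}
    for zip_code, rows in by_zip.items():
        # Houses whose size falls inside the client's target range.
        in_range = [r for r in rows if SQFT_RANGE[0] <= r["sqft"] <= SQFT_RANGE[1]]
        summary[zip_code] = {
            "median_price": median(r["sale_price"] for r in rows),
            "median_price_per_sqft": median(r["sale_price"] / r["sqft"] for r in rows),
            # Count of right-sized houses that sold at or under the budget.
            "within_budget": sum(1 for r in in_range if r["sale_price"] <= BUDGET),
        }
    return summary


# Made-up sample rows, for illustration only.
sales = [
    {"zip": "zip_a", "sale_price": 1_400_000, "sqft": 1900},
    {"zip": "zip_a", "sale_price": 1_600_000, "sqft": 2000},
    {"zip": "zip_a", "sale_price": 1_450_000, "sqft": 1500},
    {"zip": "zip_b", "sale_price": 2_100_000, "sqft": 2000},
]
summary = summarize_by_zip(sales)
```

Median price per square foot gives one way to compare "best housing prices" across ZIP codes of different house sizes, while `within_budget` shows how many right-sized houses actually sold at or below the budget.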

## 3. Methodology Section

## 4. Results Section

## 5. Discussion Section

## 6. Conclusion Section