# 1. Introduction
## 1.1 Background
The San Francisco Bay Area is a populous region in Northern California, with nearly 7.8 million people living in its nine counties. It is a major job hub for high-tech workers, and its population has grown by over 600,000 since 2010, according to a report by KQED News [1]. Newcomers face many challenges when they first arrive, and one of the most important and frequent questions is: where should I live? Within a commute of 1-2 hours there are many cities to choose from. In this project, I use publicly available geographic data and the tools I have learned in this online course to help answer that question.
## 1.2 Data source
First, I get a list of Bay Area cities from a Wikipedia page [2]. It lists about 100 cities, from which I select 25 candidate cities based on land area and population per square mile. Then I select a list of criteria for ranking each city, including schools, housing costs, neighborhoods, and crime rates. Some of the data will come from Foursquare, as used in previous course projects; the rest will come from other online resources.




## References
1. "MAP: The Bay Area Leads California in Population Growth": https://www.kqed.org/news/11741275/map-the-bay-area-leads-california-in-population-growth
2. "List of cities and towns in the San Francisco Bay Area": https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_the_San_Francisco_Bay_Area

# Step 1: Select candidate cities in the Bay Area
Requirements: 

*	To limit the scope, only the larger half of the cities by land area is considered: the ~100 cities are sorted by area, and those below the median area are discarded.
*	From the remaining cities, I pick the 25 with the highest population per square mile and use them as the candidate cities.
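
The two rules above can be sketched on a small made-up table before applying them to the real data. This is only an illustration: the city names and numbers are hypothetical, and `select_candidates` is a helper name introduced here, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data; the real table is scraped from Wikipedia in the next step.
cities = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "population": [1000, 50000, 20000, 90000, 3000, 40000],
    "sq mi": [1.0, 10.0, 2.0, 30.0, 12.0, 8.0],
})

def select_candidates(df, top_n):
    """Keep cities at or above the median area, then take the top_n densest."""
    out = df.copy()
    out["population_per_sq_mi"] = out["population"] / out["sq mi"]
    median_area = out["sq mi"].median()
    large = out[out["sq mi"] >= median_area]
    return (large.sort_values("population_per_sq_mi", ascending=False)
                 .head(top_n)
                 .reset_index(drop=True))

candidates = select_candidates(cities, top_n=2)
print(candidates)
```

With this toy data the median area is 9 square miles, so only B, D and E pass the area filter, and the two densest of those are kept.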


In [91]:
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

## 1.1 Scrape from Wikipedia using pandas `read_html`
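
A table with a two-row header comes back from `read_html` with MultiIndex columns, and the labels can carry footnote markers such as `[8][9]`. Those markers may change when the page is edited, so renaming a column by its exact label can be fragile. The following is a minimal sketch of one way to flatten and clean the labels, run on a hand-built one-row frame rather than the live page; `strip_footnotes` is a hypothetical helper, not something the notebook defines.

```python
import re
import pandas as pd

# Hand-built stand-in for the two-row header returned by read_html
columns = pd.MultiIndex.from_tuples([
    ("Name", "Name"),
    ("Population (2010)[8][9]", "Population (2010)[8][9]"),
    ("Land area[8]", "sq mi"),
    ("Land area[8]", "km2"),
])
raw = pd.DataFrame([["Alameda", 73812, 10.61, 27.5]], columns=columns)

def strip_footnotes(label):
    """Remove bracketed footnote markers like [8] and trim whitespace."""
    return re.sub(r"\[\d+\]", "", str(label)).strip()

# Drop the upper header level, then clean each remaining label
flat = raw.droplevel(0, axis=1)
flat.columns = [strip_footnotes(c) for c in flat.columns]
print(list(flat.columns))
```

After this cleanup the population column could be renamed from `'Population (2010)'` regardless of which footnote numbers the page currently shows.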


In [92]:
wiki_url = "https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_the_San_Francisco_Bay_Area"
# read_html returns every table on the page; the city list is the second one
wiki_data = pd.read_html(wiki_url)[1]
print(f"{wiki_data.shape}")
display(wiki_data)

# The table has a two-row header; keep only the lower level
wiki_data = wiki_data.droplevel(0, axis=1)
wiki_data = wiki_data.rename(columns={'Population (2010)[8][9]': 'population'})
display(wiki_data['population'])

# Population density, used below to rank the cities
wiki_data['population_per_sq_mi'] = wiki_data['population'] / wiki_data['sq mi']

# Median land area; cities smaller than this are discarded
threshold_area = wiki_data['sq mi'].median()
print(threshold_area)
target_cities = wiki_data[wiki_data['sq mi'] >= threshold_area]

# Keep the 25 most densely populated of the remaining cities
target_cities = target_cities.sort_values(by='population_per_sq_mi', ascending=False).reset_index(drop=True)[:25]
display(target_cities)

(101, 7)


Unnamed: 0_level_0,Name,Type,County,Population (2010)[8][9],Land area[8],Land area[8],Incorporated[7]
Unnamed: 0_level_1,Name,Type,County,Population (2010)[8][9],sq mi,km2,Incorporated[7]
0,Alameda,City,Alameda,73812,10.61,27.5,"April 19, 1854"
1,Albany,City,Alameda,18539,1.79,4.6,"September 22, 1908"
2,American Canyon,City,Napa,19454,4.84,12.5,"January 1, 1992"
3,Antioch,City,Contra Costa,102372,28.35,73.4,"February 6, 1872"
4,Atherton,Town,San Mateo,6914,5.02,13.0,"September 12, 1923"
...,...,...,...,...,...,...,...
96,Vallejo,City,Solano,115942,30.67,79.4,"March 30, 1868"
97,Walnut Creek,City,Contra Costa,64173,19.76,51.2,"October 21, 1914"
98,Windsor,Town,Sonoma,26801,7.27,18.8,"July 1, 1992"
99,Woodside,Town,San Mateo,5287,11.73,30.4,"November 16, 1956"


0       73812
1       18539
2       19454
3      102372
4        6914
        ...  
96     115942
97      64173
98      26801
99       5287
100      2933
Name: population, Length: 101, dtype: int64

9.14


Unnamed: 0,Name,Type,County,population,sq mi,km2,Incorporated[7],population_per_sq_mi
0,San Francisco,City and county,San Francisco,805235,46.87,121.4,"April 16, 1850[10]",17180.179219
1,Berkeley,City,Alameda,112580,10.47,27.1,"April 4, 1878",10752.626552
2,San Mateo,City,San Mateo,97207,12.13,31.4,"September 4, 1894",8013.767519
3,Oakland,City,Alameda,390724,55.79,144.5,"May 4, 1852",7003.477326
4,South San Francisco,City,San Mateo,63632,9.14,23.7,"September 19, 1908",6961.925602
5,Alameda,City,Alameda,73812,10.61,27.5,"April 19, 1854",6956.833176
6,Sunnyvale,City,Santa Clara,140081,21.99,57.0,"December 24, 1912",6370.213734
7,San Leandro,City,Alameda,84950,13.34,34.6,"March 21, 1872",6368.065967
8,Santa Clara,City,Santa Clara,116468,18.41,47.7,"July 5, 1852",6326.344378
9,Mountain View,City,Santa Clara,74066,12.0,31.1,"November 7, 1902",6172.166667
