# A Guide to Living in Shanghai
### Coursera Applied Data Science Specialization "The Battle of Neighborhoods" Capstone Project Notebook

#### Part 2: Data

The project uses two datasets, namely shanghai_demo and shanghai_data.

shanghai_demo contains demographic and economic information for all 16 districts (区) in Shanghai. It has six variables: District, Population Density, Salary, Home Price, GDP and GDP Per Capita (shown as GDP PP in the table).

In the shanghai_demo dataset, District is a character column listing Shanghai's 16 districts. Population Density (persons per square kilometer) is the 2017 population density (data source: Shanghai Statistical Yearbook 2018). Salary (RMB/month) is the personal monthly salary in 2019 (data source: Sohu). Home Price (RMB/square meter) comes from the same source (Sohu). GDP (billion RMB) is the district-level GDP in 2018 (data source: Sohu). Finally, GDP Per Capita (RMB) uses 2017 data sourced from 好金贵财经.

Note that the variables come from different sources and different years, because it is extremely hard to find a single source that provides all of the relevant district-level data. This is the most serious limitation of this project. It should not affect the results greatly, however, provided the source data are accurate, since district-level figures are unlikely to change much within one to two years.


In [32]:
# Import library
import pandas as pd
import numpy as np

In [3]:
# Read shanghai_demo dataset
shanghai_demo=pd.read_csv("https://raw.githubusercontent.com/ctzhou86/Coursera-Applied-Data-Science-Specialization/master/Applied%20Data%20Science%20Capstone/Week%204%20and%205/Shanghai%20District.csv")

# Print all rows of shanghai_demo
shanghai_demo

Unnamed: 0,District,Population Density,Salary,Home Price,GDP,GDP PP
0,Pudong,4567,8170,48713,1046.009,175448
1,Huangpu,32004,7160,81375,227.03,320701
2,Xuhui,19874,7640,71064,167.0,144983
3,Changning,18112,8030,68491,142.8,191305
4,Jing'an,28910,8380,66228,184.7,159550
5,Putuo,23431,7720,55738,100.17,72796
6,Hongkou,34058,7970,58927,83.801,96955
7,Yangpu,21627,7220,59443,184.77,130074
8,Minhang,6836,8030,47381,201.36,88089
9,Baoshan,7494,7910,38860,139.206,56506
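The printed table carries an extra `Unnamed: 0` column (presumably the index saved with the CSV) and the abbreviated header `GDP PP`, whereas the text calls that variable GDP Per Capita. A minimal sketch of tidying the frame is shown here. The `tidy_demo` helper and the three-row `sample` frame are illustrative assumptions, not part of the original notebook; the sample values are copied from the output above.

```python
import pandas as pd

# Three rows copied from the printed output, standing in for the full CSV.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "District": ["Pudong", "Huangpu", "Xuhui"],
    "Population Density": [4567, 32004, 19874],
    "Salary": [8170, 7160, 7640],
    "Home Price": [48713, 81375, 71064],
    "GDP": [1046.009, 227.03, 167.0],
    "GDP PP": [175448, 320701, 144983],
})


def tidy_demo(df):
    """Drop the saved index column and give GDP PP its descriptive name.

    Hypothetical helper: returns a new frame and leaves the input untouched.
    """
    return (
        df.drop(columns=["Unnamed: 0"], errors="ignore")
          .rename(columns={"GDP PP": "GDP Per Capita"})
    )


demo_clean = tidy_demo(sample)
print(list(demo_clean.columns))
```

Under the same assumptions, `tidy_demo(shanghai_demo)` could be applied to the full dataset.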




shanghai_data lists prominent neighborhoods in Shanghai (in both English and Chinese) as well as the districts they belong to. It has three variables: District, Neighborhood and Neighborhood Chinese Name.

It is worth highlighting that Shanghai has no direct equivalent of a "neighborhood" (社区). "Neighborhood" is essentially a Western concept and does not mean the same thing in China. There, a "neighborhood" is closer to a "residential community" (小区) that contains only residential buildings, rather than a larger area with shopping malls, stores, restaurants and attractions (and, of course, residential communities). The closest equivalent in Shanghai is in fact the "subdistrict" (街道).

A "subdistrict," however, is still not entirely the same as a "neighborhood" in the Western sense. For instance, Wujiaochang (五角场) in Yangpu District is a subdistrict ("五角场街道") in Shanghai's township-level division hierarchy and can be treated as a neighborhood to an extent. In contrast, the famous Xintiandi (新天地), an area in Huangpu District known for its fine food, art and fashion, is not a subdistrict but can be thought of as a neighborhood.

Nevertheless, the report adopts the Western convention and focuses on a total of 47 neighborhoods (subdistricts/towns) across Shanghai's 16 districts. The author built shanghai_data at his own discretion.


In [15]:
import pandas as pd
shanghai_data = pd.read_excel("https://github.com/kanav58/Capstone-Week-4-5/blob/master/Shanghai%20Neighborhood.xlsx?raw=true")
shanghai_data

Unnamed: 0,District,Neighborhood,Neighborhood Chinese Name
0,Pudong,Lujiazui,陆家嘴
1,Pudong,Century Park,世纪公园
2,Pudong,Zhoujiadu,周家渡
3,Pudong,Zhangjiang,张江
4,Huangpu,People's Square,人民广场
5,Huangpu,Huaihai Road,淮海路
6,Huangpu,The Bund,外滩
7,Huangpu,Former French Concession,旧法租界
8,Huangpu,Xintiandi,新天地
9,Huangpu,Dapuqiao,打浦桥
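Since the text states that the dataset covers 47 neighborhoods across 16 districts, a quick structural check can confirm that a loaded frame matches. The sketch below is only an illustration: `summarize` is a hypothetical helper, and `sample` reproduces the ten rows printed above rather than the full spreadsheet.

```python
import pandas as pd

# The ten rows shown in the output above (two of the sixteen districts).
sample = pd.DataFrame({
    "District": ["Pudong"] * 4 + ["Huangpu"] * 6,
    "Neighborhood": [
        "Lujiazui", "Century Park", "Zhoujiadu", "Zhangjiang",
        "People's Square", "Huaihai Road", "The Bund",
        "Former French Concession", "Xintiandi", "Dapuqiao",
    ],
})


def summarize(df):
    """Count districts, neighborhoods and repeated (district, neighborhood) pairs."""
    return {
        "districts": int(df["District"].nunique()),
        "neighborhoods": len(df),
        "duplicates": int(df.duplicated(subset=["District", "Neighborhood"]).sum()),
    }


summary = summarize(sample)
per_district = sample.groupby("District")["Neighborhood"].count()
print(summary)
print(per_district)
```

On the full dataset, `summarize(shanghai_data)` would be expected to report 16 districts and 47 neighborhoods if the file matches the description.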


In [24]:
import geopy
from geopy.geocoders import Nominatim

In [83]:
geolocator=Nominatim(user_agent="shanghai_explorer",format_string="%s, Shanghai")

In [26]:
# Append the string ', Shanghai' to the Neighborhood column
shanghai_data['Neighborhood']=shanghai_data['Neighborhood']+', Shanghai'
# This improves the precision of the geocoded location

In [82]:
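import geocoder

def get_latlng(neighborhood, max_attempts=5):
    # Start with no coordinates
    lat_lng_coords = None
    attempts = 0
    # Retry until coordinates are returned or the attempt limit is reached
    while lat_lng_coords is None and attempts < max_attempts:
        # Query Shanghai, the city this project covers
        g = geocoder.arcgis('{}, Shanghai, China'.format(neighborhood))
        lat_lng_coords = g.latlng
        attempts += 1
    return lat_lng_coords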

'1.18.1'

In [87]:
# Get coordinates
shanghai_data['Coordinates'],(longitude,latitude)=geolocator.geocode(shanghai_data['Neighborhood'])

TypeError: 'NoneType' object is not iterable

In [79]:
import geocoder
g = geocoder.google(shanghai_data["Neighborhood"])
print(g.latlng)

None


In [74]:
shanghai_data['Coordinates'] = geolocator.geocode(shanghai_data['Neighborhood'])
shanghai_data['Coordinates'] = shanghai_data['Coordinates'].apply(lambda x: (x.latitude, x.longitude))


AttributeError: 'NoneType' object has no attribute 'latitude'
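The TypeError and AttributeError above are both consistent with the geocoder being handed the whole Neighborhood column as one query and returning `None`, which the following code then tries to unpack or read attributes from. A geocoder generally expects one query string per call. The sketch below shows the per-element pattern with a guard for unmatched names. It uses no network access: `Location`, `fake_geocode` and its coordinates are placeholder assumptions that only mimic an object with `latitude` and `longitude` attributes.

```python
from collections import namedtuple

# Stand-in for a geocoder result; the coordinates used below are placeholders,
# not real geocoding output.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_each(names, geocode):
    """Call geocode once per name; names with no match map to (None, None)."""
    coords = []
    for name in names:
        loc = geocode(name)
        if loc is None:
            coords.append((None, None))
        else:
            coords.append((loc.latitude, loc.longitude))
    return coords


# Hypothetical offline lookup used only to exercise the helper.
_FAKE_INDEX = {"Lujiazui, Shanghai": Location(31.24, 121.50)}


def fake_geocode(query):
    return _FAKE_INDEX.get(query)


result = geocode_each(["Lujiazui, Shanghai", "Nowhere, Shanghai"], fake_geocode)
print(result)
```

With a real geocoder, the same helper could be called as `geocode_each(shanghai_data['Neighborhood'], geolocator.geocode)`, assuming the service is reachable.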

In [71]:


shanghai_data.head(10)



Unnamed: 0,District,Neighborhood,Neighborhood Chinese Name,Coordinates
0,Pudong,"Lujiazui, Shanghai, Shanghai, Shanghai, Shangh...",陆家嘴,
1,Pudong,"Century Park, Shanghai, Shanghai, Shanghai, Sh...",世纪公园,
2,Pudong,"Zhoujiadu, Shanghai, Shanghai, Shanghai, Shang...",周家渡,
3,Pudong,"Zhangjiang, Shanghai, Shanghai, Shanghai, Shan...",张江,
4,Huangpu,"People's Square, Shanghai, Shanghai, Shanghai,...",人民广场,
5,Huangpu,"Huaihai Road, Shanghai, Shanghai, Shanghai, Sh...",淮海路,
6,Huangpu,"The Bund, Shanghai, Shanghai, Shanghai, Shangh...",外滩,
7,Huangpu,"Former French Concession, Shanghai, Shanghai, ...",旧法租界,
8,Huangpu,"Xintiandi, Shanghai, Shanghai, Shanghai, Shang...",新天地,
9,Huangpu,"Dapuqiao, Shanghai, Shanghai, Shanghai, Shangh...",打浦桥,
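The repeated ", Shanghai, Shanghai, ..." in the Neighborhood column suggests the cell that appends the suffix was run several times. One way to make that step safe to re-run is to normalize each name so it ends in exactly one copy of the suffix. The `with_city_suffix` helper below is a hypothetical sketch, not the notebook's own code.

```python
def with_city_suffix(name, suffix=", Shanghai"):
    """Return name ending in exactly one copy of suffix.

    Any copies already present are stripped first, so calling the
    function repeatedly gives the same result as calling it once.
    """
    while name.endswith(suffix):
        name = name[: -len(suffix)]
    return name + suffix


print(with_city_suffix("Lujiazui"))
print(with_city_suffix("Lujiazui, Shanghai, Shanghai, Shanghai"))
```

Applied to the frame, this would read `shanghai_data['Neighborhood'].apply(with_city_suffix)`. Note also that the `format_string="%s, Shanghai"` argument passed to Nominatim earlier appears to add the city to each query in geopy 1.x, so the manual suffix may be redundant when that geocoder is used.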


In [23]:


# Append the string ', Shanghai' to the Neighborhood column
shanghai_data['Neighborhood']=shanghai_data['Neighborhood']+', Shanghai'
# This improves the precision of the geocoded location

# Get coordinates; names the geocoder cannot match become (None, None)
shanghai_data['Coordinates']=shanghai_data['Neighborhood'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude) if x is not None else (None, None))

# Separate the Coordinates column into latitude and longitude columns
shanghai_data[['Latitude','Longitude']]=shanghai_data['Coordinates'].apply(pd.Series)

# Drop Coordinates column
shanghai_data.drop(['Coordinates'],axis=1,inplace=True)

GeocoderUnavailable: Service not available
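A "Service not available" error from a public geocoding service is often transient, so one common response is to retry each query a few times with a pause in between. The sketch below shows that idea in plain Python. `geocode_with_retry` and `flaky_geocode` are hypothetical names, the returned coordinates are placeholders, and the broad `except Exception` is an assumption made to keep the example self-contained; with geopy one would likely narrow it to `geopy.exc.GeocoderServiceError`.

```python
import time


def geocode_with_retry(geocode, query, attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Call geocode(query), retrying on failure with a growing pause.

    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return geocode(query)
        except Exception as err:  # assumption: narrow this for a real service
            last_error = err
            if attempt < attempts - 1:
                sleep(wait_seconds * (attempt + 1))
    raise last_error


# Hypothetical geocoder that fails twice before answering.
calls = {"n": 0}


def flaky_geocode(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Service not available")
    return (31.0, 121.0)  # placeholder coordinates


# sleep is replaced with a no-op so the demonstration runs instantly.
outcome = geocode_with_retry(flaky_geocode, "Xintiandi, Shanghai", sleep=lambda s: None)
print(outcome, calls["n"])
```

geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that wraps a geocode function with a minimum delay and retries, which may be a simpler choice when geopy is already in use.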