# Shanghai Neighborhood Clustering

This notebook explores and clusters the neighborhoods in Shanghai in order to find a suitable community for a new expat in town. 

A neighborhood is defined here as the area within 400 meters (about a 5-minute walk) of a metro station, the main form of transport between home and office.
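To make the 400 meter definition concrete, here is a minimal sketch of a distance check between a station and any other point. The helper names `haversine_m` and `in_neighborhood` are hypothetical and not part of the notebook, and the haversine formula treats the Earth as a sphere, which should be accurate enough at this scale.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def in_neighborhood(station, point, radius_m=400):
    """True if `point` lies within `radius_m` meters of `station`; both are (lat, lon) tuples."""
    return haversine_m(*station, *point) <= radius_m

# Xinzhuang and Waihuanlu coordinates as listed in the station table
xinzhuang = (31.111093, 121.385454)
waihuanlu = (31.120916, 121.393003)
print(round(haversine_m(*xinzhuang, *waihuanlu)), 'meters apart')
print('same neighborhood:', in_neighborhood(xinzhuang, waihuanlu))
```

Adjacent stations on the same line are typically more than 400 meters apart, so their neighborhoods under this definition should rarely overlap.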

There are 3 parts to this notebook:
1. Importing and cleaning the data
2. Calling the Foursquare API to find each neighborhood's characteristics (by most common venue)
3. Clustering the neighborhoods with k-means and analyzing the results

In [4]:
# before we begin, import libraries

# library to handle data in a vectorized manner
import numpy as np 

# library for data analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# library to handle JSON files
import json 

# library to convert an address into latitude and longitude values
!pip install geopy 
from geopy.geocoders import Nominatim 

# library to handle requests
import requests 

# transform JSON file into a pandas dataframe
# (pandas.io.json.json_normalize is deprecated since pandas 1.0; import it from pandas directly)
from pandas import json_normalize 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium==0.5.0 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Part 1: Importing & cleaning data 

### 1.1 Importing data

In [5]:
# this 2016 data is available from https://blog.csdn.net/a364572/article/details/50483568
# I've since updated with 2020 data from https://en.wikipedia.org/wiki/List_of_Shanghai_Metro_stations

shmetro = pd.read_csv('ShanghaiMRTLatLng2020.csv')
shmetro.head()


Unnamed: 0,English,Chinese (S),Location,Transfers[10],Line,Latitude,Longitude,Unnamed: 7
0,Xinzhuang,莘庄,Minhang,5 Jinshan Xinzhuang,1,31.111093,121.385454,
1,Waihuanlu,外环路,Minhang,,1,31.120916,121.393003,
2,Lianhua Road,莲花路,Minhang,,1,31.130957,121.402919,
3,Jinjiang Park,锦江乐园,Xuhui,,1,31.142197,121.414146,
4,Shanghai South Railway Station,上海南站,Xuhui,3 Jinshan [a] Shanghainan[b],1,31.154688,121.430136,


### 1.2 Clean up the data

In [6]:
# 1.2.1. First we'll extract the needed columns

shmetro.drop(columns=['Transfers[10]','Line','Unnamed: 7'],inplace=True)
shmetro.head()


Unnamed: 0,English,Chinese (S),Location,Latitude,Longitude
0,Xinzhuang,莘庄,Minhang,31.111093,121.385454
1,Waihuanlu,外环路,Minhang,31.120916,121.393003
2,Lianhua Road,莲花路,Minhang,31.130957,121.402919
3,Jinjiang Park,锦江乐园,Xuhui,31.142197,121.414146
4,Shanghai South Railway Station,上海南站,Xuhui,31.154688,121.430136


In [7]:
# 1.2.2. Rename the columns

# let's see the column name format
print('before',shmetro.columns)

# we can rename some to make them more clear and get rid of spaces
shmetro.rename(columns={
    'English':'StationName_English',
    'Chinese (S)':'StationName_Chinese',
    ' Latitude ':'Latitude',
    ' Longitude ':'Longitude'
    },inplace=True)

# let's take a look
print('after',shmetro.columns)
shmetro.head()

before Index(['English', 'Chinese (S)', 'Location', ' Latitude ', ' Longitude '], dtype='object')
after Index(['StationName_English', 'StationName_Chinese', 'Location', 'Latitude',
       'Longitude'],
      dtype='object')


Unnamed: 0,StationName_English,StationName_Chinese,Location,Latitude,Longitude
0,Xinzhuang,莘庄,Minhang,31.111093,121.385454
1,Waihuanlu,外环路,Minhang,31.120916,121.393003
2,Lianhua Road,莲花路,Minhang,31.130957,121.402919
3,Jinjiang Park,锦江乐园,Xuhui,31.142197,121.414146
4,Shanghai South Railway Station,上海南站,Xuhui,31.154688,121.430136


In [8]:
# 1.2.3. check for null values in df

print('before',shmetro.shape)

null = pd.isnull(shmetro['Latitude'])
print(shmetro[null])
# empty df means no null values

before (423, 5)
Empty DataFrame
Columns: [StationName_English, StationName_Chinese, Location, Latitude, Longitude]
Index: []
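The check above only looks at the `Latitude` column. As a small sketch of a broader check, the snippet below counts missing values in every column and lists the rows with a missing coordinate. The `toy` frame is a made-up stand-in for `shmetro`, so the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for shmetro (values are illustrative only)
toy = pd.DataFrame({
    'StationName_English': ['A', 'B', 'C'],
    'Latitude': [31.11, np.nan, 31.13],
    'Longitude': [121.38, 121.39, np.nan],
})

# count missing values per column
null_counts = toy.isna().sum()
print(null_counts)

# rows with a missing value in either coordinate column
missing_rows = toy[toy[['Latitude', 'Longitude']].isna().any(axis=1)]
print(missing_rows)
```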


In [9]:
# 1.2.4. next we handle duplicates: many station names appear more than once
# because one station can be an interchange for several metro lines
# we drop the duplicated stations and keep only the first occurrence,
# since the entries for one station are unlikely to be far from each other

shmetro.drop_duplicates(subset=['StationName_Chinese'],keep='first',inplace=True)
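Keeping the first occurrence relies on the assumption that duplicated entries sit close together. A minimal sketch of how that assumption could be checked before dropping rows is shown below. The `toy` frame and its coordinates are made up for illustration and are not taken from the csv file.

```python
import pandas as pd

# Toy rows: station 'A' appears twice (interchange), 'B' once; coordinates are made up
toy = pd.DataFrame({
    'StationName_Chinese': ['A', 'A', 'B'],
    'Latitude': [31.2000, 31.2004, 31.3000],
    'Longitude': [121.4000, 121.4003, 121.5000],
})

# spread (max - min) of the coordinates within each station name, in degrees
spread = toy.groupby('StationName_Chinese')[['Latitude', 'Longitude']].agg(
    lambda s: s.max() - s.min())
print(spread)

# keep the first row per station, as the notebook does
deduped = toy.drop_duplicates(subset=['StationName_Chinese'], keep='first')
print(deduped.shape)
```

A large spread for any station name would suggest that two different places share a name and should not be merged.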


In [10]:
# we'll double check shmetro data's final shape

print('There are',shmetro.shape[0],'stations:')
shmetro.head()

There are 345 stations:


Unnamed: 0,StationName_English,StationName_Chinese,Location,Latitude,Longitude
0,Xinzhuang,莘庄,Minhang,31.111093,121.385454
1,Waihuanlu,外环路,Minhang,31.120916,121.393003
2,Lianhua Road,莲花路,Minhang,31.130957,121.402919
3,Jinjiang Park,锦江乐园,Xuhui,31.142197,121.414146
4,Shanghai South Railway Station,上海南站,Xuhui,31.154688,121.430136


This project was made by Jane Goh 31 Oct 2020 as part of the Coursera IBM Data Science Professional Capstone Project

## Part 2: Use Foursquare API to obtain Shanghai neighborhoods' characteristics

### 2.1 Visualize Shanghai neighborhoods on map

In [11]:
# let's take a look at Shanghai's map
# Prep1. let's get Shanghai's latitude and longitude with geocode

address = 'Shanghai'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)


31.2322758 121.4692071


In [12]:
# Prep2. create map of Shanghai using latitude and longitude values
map_shanghai = folium.Map(location= [latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(shmetro['Latitude'], shmetro['Longitude'], shmetro['StationName_English']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_shanghai)  
    
map_shanghai

### 2.2 Foursquare data

In [17]:
# We are working with Foursquare API to get the neighborhoods details
# below are the parameters to be passed
CLIENT_ID ='QKRAFQVOC3KO42L3WP0JKMBXRHYKDKOR0Y42AK0IG4DGZ3VY'
CLIENT_SECRET ='LOUTQW24A5AGBLKHNANPWHTKZ2Y5TQ0K2JWLQXMH5J1EUFGY'
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: QKRAFQVOC3KO42L3WP0JKMBXRHYKDKOR0Y42AK0IG4DGZ3VY
CLIENT_SECRET:LOUTQW24A5AGBLKHNANPWHTKZ2Y5TQ0K2JWLQXMH5J1EUFGY
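The cell above hard-codes the credentials and prints them in full. One possible alternative, sketched here, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions for illustration, not something the notebook or Foursquare defines.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping, or raise if either is missing."""
    # FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are hypothetical variable names
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

def mask(value, visible=4):
    """Show only the first few characters when printing a secret."""
    return value[:visible] + '*' * max(len(value) - visible, 0)

# demo with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id-1234',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret-5678'}
cid, secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID:', mask(cid))
print('CLIENT_SECRET:', mask(secret))
```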


In [14]:
# check to see the data we are working with is correct
shmetro.loc[0, 'StationName_English']

'Xinzhuang'

In [15]:
# 2.2.1. define new function to get nearby venues from Foursquare API based on 
# stationName, latitude, longitude and within radius of 400m (5-min walking distance)

def getNearbyVenues(names, latitudes, longitudes, radius=400):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['StationName_English', 
                  'Station Latitude', 
                  'Station Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
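The function above assumes every response contains `groups`, and that every venue has at least one category. As a hedged sketch, the parsing step could be separated into a helper that tolerates missing keys. `parse_explore_items` is a hypothetical name, and the `fake` payload is hand-built to mimic only the keys that `getNearbyVenues` reads; it is not real API output.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore response into row tuples, tolerating missing keys."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # a venue without categories gets None instead of raising IndexError
        category = categories[0].get('name') if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# hand-built payload mimicking the keys used above (not real API output)
fake = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe X', 'location': {'lat': 31.11, 'lng': 121.38},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'No Category', 'location': {'lat': 31.12, 'lng': 121.39},
               'categories': []}},
]}]}}

rows = parse_explore_items(fake, 'Xinzhuang', 31.111093, 121.385454)
print(rows)
print(parse_explore_items({'response': {}}, 'Empty', 0.0, 0.0))
```

With a helper like this, one station with an empty or malformed response would not stop the loop over all stations.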

In [18]:
# 2.2.2. get the Shanghai neighborhood venues from Foursquare 

shanghai_venues = getNearbyVenues(names=shmetro['StationName_English'],
                                   latitudes=shmetro['Latitude'],
                                   longitudes=shmetro['Longitude']
                                  )

Xinzhuang
Waihuanlu
Lianhua Road
Jinjiang Park
Shanghai South Railway Station
Caobao Road
Shanghai Indoor Stadium
Xujiahui
Hengshan Road
Changshu Road
South Shaanxi Road
South Huangpi Road
People's Square
Xinzha Road
Hanzhong Road
Shanghai Railway Station
North Zhongshan Road
Yanchang Road
Shanghai Circus World
Wenshui Road
Pengpu Xincun
Gongkang Road
Tonghe Xincun
Hulan Road
Gongfu Xincun
Bao'an Highway
West Youyi Road
Fujin Road
East Xujing
Hongqiao Railway Station
Hongqiao Airport Terminal 2
Songhong Road
Beixinjing
Weining Road
Loushanguan Road
Zhongshan Park
Jiangsu Road
Jing'an Temple
West Nanjing Road
East Nanjing Road
Lujiazui
Dongchang Road
Century Avenue
Shanghai Science and Technology Museum
Century Park
Longyang Road
Zhangjiang Hi-Tech Park
Jinke Road
Guanglan Road
Tangzhen
Middle Chuangxin Road
East Huaxia Road
Chuansha
Lingkong Road
Yuandong Avenue
Haitiansan Road
Pudong International Airport
Shilong Road
Longcao Road
Caoxi Road
Yishan Road
Hongqiao Road
West Yan'an Road


In [None]:
# 2.2.3. check the results df
print('We have obtained',shanghai_venues.shape[0],'venues from Foursquare API:')
shanghai_venues.head()

In [None]:
# 2.2.4. Tabulate the venues results
print(shanghai_venues.groupby('StationName_English').count().shape)
shanghai_venues.groupby('StationName_English').count()

In [None]:
# It seems many neighborhoods have few results, which is not meaningful for clustering
# let's remove the neighborhoods with too few results, say 30 or fewer
# 2.2.5. find neighborhoods with <= 30 venues

L30 = pd.DataFrame(shanghai_venues.groupby('StationName_English').count())
dropV = L30[L30['Venue'] <= 30].index
dropV
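To illustrate the filtering step on a small scale, here is a sketch that counts venues per station and drops the stations at or below a threshold. The `toy_venues` frame is made up, and the threshold of 2 stands in for the notebook's 30.

```python
import pandas as pd

# toy venue list: station A has 3 venues, B has 1, C has 2 (made-up data)
toy_venues = pd.DataFrame({
    'StationName_English': ['A'] * 3 + ['B'] * 1 + ['C'] * 2,
    'Venue': ['v1', 'v2', 'v3', 'v4', 'v5', 'v6'],
})
min_venues = 2  # stand-in threshold; the notebook uses 30

# number of venues per station
counts = toy_venues.groupby('StationName_English')['Venue'].count()

# stations at or below the threshold are dropped (same <= comparison as the cell above)
drop_names = counts[counts <= min_venues].index
kept = toy_venues[~toy_venues['StationName_English'].isin(drop_names)]
print(list(drop_names))
print(kept)
```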


In [None]:
# 2.2.6. drop the neighborhoods with <= 30 venues from the shanghai_venues df

shanghai_venues = shanghai_venues[~shanghai_venues.StationName_English.isin(dropV)]
print('We just went down to',shanghai_venues.shape[0],'venues from the 1942 venues we started with')

In [None]:
# 2.2.7. check the shanghai_venues df again, the shape has decreased

print('We are now left with',shanghai_venues.groupby('StationName_English').count().shape[0],'areas to analyze')
shanghai_venues.groupby('StationName_English').count()

In [None]:
# 2.2.8. Now we will do one-hot encoding of the Shanghai venues categories for each venue

# one hot encoding
shanghai_onehot = pd.get_dummies(shanghai_venues[['Venue Category']], prefix="", prefix_sep="")

# add StationName column back to dataframe
shanghai_onehot['StationName_English'] = shanghai_venues['StationName_English'] 

# move StationName column to the first column
fixed_columns = [shanghai_onehot.columns[-1]] + list(shanghai_onehot.columns[:-1])
shanghai_onehot = shanghai_onehot[fixed_columns]

shanghai_onehot.head()

In [None]:
# double check that the number of rows matches shanghai_venues, so no data is missing
shanghai_onehot.shape

In [None]:
# 2.2.9. group by StationName and take the mean of the one-hot columns,
# which gives each station's share of venues in each category (values between 0 and 1)
shanghai_grouped = shanghai_onehot.groupby('StationName_English').mean().reset_index()
shanghai_grouped
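A small worked example may help show what the one-hot encoding and the grouped mean produce. The `toy` frame is made up, and `dtype=float` is passed to `get_dummies` here only so that the mean is taken over numeric columns regardless of the pandas version.

```python
import pandas as pd

# made-up venues: station A has 4 venues, station B has 2
toy = pd.DataFrame({
    'StationName_English': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Hotel', 'Park', 'Hotel', 'Hotel'],
})

# one column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot['StationName_English'] = toy['StationName_English']

# the mean of the 0/1 columns is the share of each category per station
grouped = onehot.groupby('StationName_English').mean()
print(grouped)
```

Each row sums to 1, so the values are comparable between stations with different numbers of venues.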

In [None]:
# check the shape: rows = number of neighborhoods, columns = number of venue categories + 1 (the station name)
shanghai_grouped.shape

In [None]:
# 2.2.10. define new function to return the most common venue categories in each neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
# 2.2.11. let's find each neighborhood's top 10 venues

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['StationName_English']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
station_venues_sorted = pd.DataFrame(columns=columns)
station_venues_sorted['StationName_English'] = shanghai_grouped['StationName_English']

for ind in np.arange(shanghai_grouped.shape[0]):
    station_venues_sorted.iloc[ind, 1:] = return_most_common_venues(shanghai_grouped.iloc[ind, :], num_top_venues)

station_venues_sorted.tail()
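The column labels above are built by catching an exception once the `indicators` list runs out. A possible alternative, sketched here, is an explicit helper for ordinal suffixes. `ordinal` is a hypothetical function that is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10
columns = ['StationName_English'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)]
print(columns)
```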


## Part 3: Clustering & Analysis

### 3.1 Clustering

In [3]:
# Now we are ready to do the clustering and analysis of neighborhoods
# 3.1.1. run the clustering algorithm

# set number of clusters
kclusters = 5

# drop the 'StationName_English' column because the clustering algorithm does not need it
# (the positional axis argument was removed in pandas 2.0, so use columns=)
shanghai_grouped_clustering = shanghai_grouped.drop(columns='StationName_English')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(shanghai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
kmeans.inertia_


NameError: name 'shanghai_grouped' is not defined
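The cell above fixes `kclusters = 5` and reads `kmeans.inertia_` once. One common way to compare candidate values of k is to look at how the inertia falls as k grows (the "elbow" heuristic). The sketch below runs on synthetic points, not on the Shanghai data, and the choice of three blobs and `n_init=10` is only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic data: three tight blobs of 20 points each (not the Shanghai data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

# fit k-means for several k and record the inertia (within-cluster sum of squares)
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this the inertia drops sharply up to the true number of blobs and flattens afterwards. Real venue data may show a much less clear elbow.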

In [613]:
# 3.1.2. Make a new df shanghai_merged from the original shmetro df
# Remember we removed the stations with <= 30 venue results in (2.2.5) and (2.2.6)?
# We also need to do that to the shmetro df: drop the stations with <= 30 venue results

shanghai_merged = shmetro[~shmetro.StationName_English.isin(dropV)]
shanghai_merged.shape
# this is more rows than the 10 neighborhoods we ran through kmeans; we'll drop the extra rows later

(77, 5)

In [661]:
# 3.1.3. merge the Shanghai neighborhood data with the cluster labels from (3.1.1) and the most common venues data from (2.2.11)

# add clustering labels
station_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join station_venues_sorted onto shanghai_merged so each neighborhood has its latitude/longitude, cluster label and top venues
shanghai_merged = shanghai_merged.join(station_venues_sorted.set_index('StationName_English'), on='StationName_English')

# drop all the extra rows with NaN
shanghai_merged.dropna(axis=0,inplace=True)
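The join above leaves NaN rows for stations without a cluster label, which then have to be dropped and recast to int. As a sketch of one alternative, an inner merge keeps only the stations present in both frames. The two toy frames below are made up for illustration.

```python
import pandas as pd

# made-up station table and cluster labels; station B has no label
stations = pd.DataFrame({
    'StationName_English': ['A', 'B', 'C'],
    'Latitude': [31.10, 31.20, 31.30],
    'Longitude': [121.40, 121.50, 121.60],
})
labels = pd.DataFrame({
    'StationName_English': ['A', 'C'],
    'Cluster Labels': [0, 1],
})

# an inner merge keeps only stations found in both frames,
# so no NaN rows appear and the labels stay integers
merged = stations.merge(labels, on='StationName_English', how='inner')
print(merged)
print(merged.dtypes)
```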

In [659]:
# 3.1.4. cast the cluster column to int instead of float64
shanghai_merged['Cluster Labels']=shanghai_merged['Cluster Labels'].astype(int)
print(shanghai_merged.dtypes)

StationName_English        object
StationName_Chinese        object
Location                   object
Latitude                  float64
Longitude                 float64
Cluster Labels              int32
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object


In [662]:
# 3.1.5 visualize the clusters with map

# create map
map_clusters = folium.Map(location= [latitude,longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.jet(np.linspace(0, 1, len(ys)))
jet = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(shanghai_merged['Latitude'], shanghai_merged['Longitude'], shanghai_merged['StationName_English'], shanghai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=jet[cluster-1],
        fill=True,
        fill_color=jet[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 3.2 Clusters Analysis

In [652]:
# cluster 1: Tourist area/ Downtown
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 0, shanghai_merged.columns[[0] + list(range(5, shanghai_merged.shape[1]))]]

Unnamed: 0,StationName_English,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,People's Square,0,Chinese Restaurant,Hotel,Noodle House,Coffee Shop,Sandwich Place,Bookstore,Shanghai Restaurant,Karaoke Bar,Korean Restaurant,Fast Food Restaurant


In [668]:
# cluster 2: Higher-end residential cluster
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 1, shanghai_merged.columns[[0] + list(range(5, shanghai_merged.shape[1]))]]

Unnamed: 0,StationName_English,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Xinzha Road,1,Fast Food Restaurant,Hotel,Chinese Restaurant,Coffee Shop,Dumpling Restaurant,Lounge,Bakery,Bed & Breakfast,Café,Candy Store
246,Shangcheng Road,1,Coffee Shop,Hotel,Szechuan Restaurant,Japanese Restaurant,Fast Food Restaurant,Hotpot Restaurant,Pizza Place,Athletics & Sports,Kushikatsu Restaurant,Clothing Store


In [667]:
# cluster 3: Central Business District/ Office/ High-rise residential cluster
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 2, shanghai_merged.columns[[0] + list(range(5, shanghai_merged.shape[1]))]]

Unnamed: 0,StationName_English,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Xujiahui,2,Coffee Shop,Clothing Store,Chinese Restaurant,Sandwich Place,Burger Joint,Pizza Place,Shopping Mall,Fast Food Restaurant,Supermarket,Shanghai Restaurant
41,Lujiazui,2,Coffee Shop,Hotel Bar,Scenic Lookout,Hotel,Chinese Restaurant,Japanese Restaurant,Italian Restaurant,Convenience Store,Dumpling Restaurant,Electronics Store
192,Huamu Road,2,Coffee Shop,Cantonese Restaurant,Burger Joint,Pizza Place,Fast Food Restaurant,Clothing Store,Shanghai Restaurant,Sandwich Place,Restaurant,Noodle House


In [655]:
# cluster 4: The Entertainment/ TikTok Influencers' Hangout cluster
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 3, shanghai_merged.columns[[0] + list(range(5, shanghai_merged.shape[1]))]]

Unnamed: 0,StationName_English,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,South Huangpi Road,3,Hotel,Café,Chinese Restaurant,New American Restaurant,Cocktail Bar,Coffee Shop,Park,Ice Cream Shop,Taiwanese Restaurant,Shopping Mall
37,Jing'an Temple,3,Japanese Restaurant,Cocktail Bar,Coffee Shop,Shanghai Restaurant,Burger Joint,Gym,Cantonese Restaurant,Food Court,Café,Lounge


In [656]:
# cluster 5: Sophisticated Hangout Cluster
shanghai_merged.loc[shanghai_merged['Cluster Labels'] == 4, shanghai_merged.columns[[0] + list(range(5, shanghai_merged.shape[1]))]]

Unnamed: 0,StationName_English,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,East Nanjing Road,4,French Restaurant,Chinese Restaurant,Hotel,Italian Restaurant,Lounge,Shopping Mall,Seafood Restaurant,Restaurant,Deli / Bodega,Jazz Club
271,Shanghai Library,4,Bar,Restaurant,Art Gallery,Bistro,Cocktail Bar,Turkish Restaurant,Nightclub,Hotel,Huaiyang Restaurant,Pizza Place


This is the analysis of the Shanghai Neighborhood Clustering project:

Upon examining the clusters above, an expat can infer the characteristics of different areas in town:
Cluster 1: People's Square station is filled with a variety of Asian restaurants, hotels, and a karaoke bar, indicating that this might be a very bustling place, suitable for someone who prefers a busy environment with easy access to food and entertainment.

Cluster 2: Xinzha Road and Shangcheng Road stations are dominated by Asian and some Western restaurants and cafes, hotels and B&Bs, and shopping (sports and clothes), an indication that the area could be convenient for tourists.

Cluster 3: Xujiahui, Lujiazui and Huamu Road stations are first and foremost about coffee shops, followed by a mix of Asian and Western restaurants and access to shopping malls, convenience stores and supermarkets. These point to a convenient city lifestyle.

Cluster 4: South Huangpi Road and Jing'an Temple stations form the first cluster where a gym or park makes it into the top 10, alongside Asian and Western restaurants and cocktail bars. This cluster looks like a good base for work/life balance.

Cluster 5: East Nanjing Road and Shanghai Library stations are defined by their lounges, bistros and delis and largely Western restaurants. And the art gallery and jazz club? This cluster looks like one fine lifestyle neighborhood.
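The descriptions above were read off the per-station top-10 tables. A possible complement, sketched below, is to average the category shares within each cluster and look at the leading category per cluster. The `grouped` frame and the `labels` list are made up and only stand in for `shanghai_grouped` and `kmeans.labels_`.

```python
import pandas as pd

# made-up category shares per station (stand-in for shanghai_grouped)
grouped = pd.DataFrame({
    'StationName_English': ['A', 'B', 'C'],
    'Coffee Shop': [0.50, 0.25, 0.00],
    'Hotel': [0.25, 0.75, 0.25],
    'Park': [0.25, 0.00, 0.75],
})
labels = [0, 0, 1]  # made-up cluster labels (stand-in for kmeans.labels_)

# average category share within each cluster
profile = (grouped.drop(columns='StationName_English')
                  .assign(cluster=labels)
                  .groupby('cluster')
                  .mean())
print(profile)

# leading category per cluster
top = {c: row.sort_values(ascending=False).index[0] for c, row in profile.iterrows()}
print(top)
```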

### 3.3 Improvements

This project could be further improved in the following ways:
In part 1: Introduce a web crawler (to verify the lat/lng coordinates in the csv file) and more comprehensive data cleaning code. The handling of the shmetro df is flawed, as I'm still learning how to work with pandas dataframes properly. 
In parts 1 and 2: Include other potential data sources, such as average property price, surrounding building types, and residential demographics, to make the identification of neighborhood characteristics more meaningful. 
In part 3: Instead of k-means, explore DBSCAN density-based clustering. 
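As a starting point for the DBSCAN idea, here is a minimal sketch using scikit-learn's `DBSCAN` on a handful of made-up 2-D points. The `eps` and `min_samples` values are chosen for this toy data only, and suitable values for the venue data would need to be explored.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# made-up points: two tight groups and one isolated point
points = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
    [10.0, 0.0],
])

# eps is the neighborhood radius; min_samples counts the point itself
db = DBSCAN(eps=0.5, min_samples=2).fit(points)
labels = db.labels_
print(labels)  # a label of -1 marks a point treated as noise
```

Unlike k-means, DBSCAN does not need the number of clusters up front and can leave outliers unassigned, which may suit stations whose venue mix is unlike any other.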

This project was made with the IBM Coursera instructors' teachings, python documentation, and lots of help from those who shared their findings/discussions on the StackExchange forums. Appreciate your constructive feedback!

# About

This project was made by Jane Goh 31 Oct 2020 as part of the Coursera IBM Data Science Professional Capstone Project

I'm a UW '12 graduate working in manufacturing general management in Asia. At work, I have led the managers on a mini analytics evolution: they have gone from making Excel numbers reports in 2014 to making PowerPoint presentations of their own analysis/plans by 2018. The managers have learned how to read, analyze, and interpret SAP-Crystal Reports data (and lots of Excel pivot tables), and over time, come to their own conclusions and recommend action plans. Now, the managers are able to fully understand the company's strategic direction and communicate it to their teams. It is a major cultural change for this large local company. The big change has also brought unprecedented sales growth for the manufacturing division, elevating its cash-flow-generating capabilities and its importance in the group. 

However, my career and personal growth have plateaued since 2019, as my skills and knowledge could no longer keep up with the analytical work that comes with higher-level company strategy, such as monitoring and analyzing macroeconomic trends that affect raw material prices, more effective product pricing strategy, better customer clustering methods, etc. In the first half of 2020, I started self-learning operational efficiency and improvement. That pathway would suit me if I were more manufacturing/operations-based; however, I am more office-based and unable to apply the learnings directly. In September 2020, I came across this data science course on Coursera. Within 1.5 months of daily learning, I can now understand some of the lingo and the tools' usage. I am now brewing up ideas and problems to tackle as I continue on this long learning journey of programming, math, and stats. 