# Project WSL: Why So Long?

## __Table of Contents:__
* [Section 1: Problem Introduction and Datasets](#part1)
* [Section 2: Data Pre-processing](#part2)
    * [Pre-processing Steps](#part21)
    * [Map Matching](#part22)
* [Section 3: Clean Data, now the analysis](#part3)
    * [Aspect 1 of delay: Non-optimal routing](#part31)
    * [Aspect 2 of delay: Speed bottlenecks](#part32)
* [Section 4: Proposed alleviations](#part4)
    * [Can we route better?](#part41)
    * [Can we remove the bottlenecks?](#part42)

***
<a id='part1'></a>
# Section 1: Problem Introduction and Datasets

In a fast-paced society, we are especially concerned when things do not go as efficiently as expected. Now, equipped with some data analytics tools, we want to progress from complaining each time our Grab ride or food order arrives late to actually understanding **why** it happens and **how** we can do better.

<p> This notebook has 3 key parts: (i) data pre-processing, where we read the paper to understand the dataset better and perform the relevant pre-processing steps; (ii) some detective work to find out what could possibly lead to delays; and (iii) some ideas we have come up with to address the causes found in part (ii). But first, some Python dependencies. </p>

### Python dependencies

In [1]:
from preprocess import *
import pandas as pd 
import geopandas as gpd
import osmnx
import folium
import matplotlib.pyplot as plt
import seaborn as sns
import json
import sys
from bs4 import BeautifulSoup
from tqdm import tqdm
from datetime import datetime, timedelta, timezone
import numpy as np

from ipyleaflet import (Map, GeoData, basemaps, WidgetControl, GeoJSON,
                        LayersControl, Icon, Marker,basemap_to_tiles, Choropleth,
                        MarkerCluster, Heatmap,SearchControl, 
                        FullScreenControl)
from ipywidgets import Text, HTML

### Relevant Datasets:
1. External Datasets we found useful

In [2]:
# We have secured 2 external datasets:

# sg_areas represent the towns/boundaries for Singapore, retrieved from data@gov
sg_areas = gpd.read_file('data/master-plan-2019-subzone-boundary-no-sea-geojson.geojson')

# jkt_areas represent some places of interest retrieved from One Jakarta, along with the help of some local expertise (Indo team-mate)
jkt_areas = gpd.read_file("data/jakarta_poi.geojson")

A glimpse of what `sg_areas` looks like:

In [3]:
subzone_name = []
plan_name = []
region_name = []

for location in sg_areas['Description']:
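    # Name the parser explicitly so the result does not depend on which parsers are installed
    temp_soup = BeautifulSoup(location, "html.parser")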
    temp_tr = temp_soup.find_all('tr')

    subzone_name.append(temp_tr[2].td.text.strip())
    plan_name.append(temp_tr[5].td.text.strip())
    region_name.append(temp_tr[7].td.text.strip())
    
sg_areas["subzone"] = subzone_name
sg_areas["plan_name"] = plan_name
sg_areas["region_name"] = region_name

In [4]:
sg_areas.head()

Unnamed: 0,Name,Description,geometry,subzone,plan_name,region_name
0,kml_1,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.88025 1.28386 0.00000, 103.880...",MARINA EAST,MARINA EAST,CENTRAL REGION
1,kml_2,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.83764 1.29560 0.00000, 103.837...",INSTITUTION HILL,RIVER VALLEY,CENTRAL REGION
2,kml_3,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.83410 1.29248 0.00000, 103.834...",ROBERTSON QUAY,SINGAPORE RIVER,CENTRAL REGION
3,kml_4,<center><table><tr><th colspan='2' align='cent...,"MULTIPOLYGON Z (((103.71253 1.29163 0.00000, 1...",JURONG ISLAND AND BUKOM,WESTERN ISLANDS,WEST REGION
4,kml_5,<center><table><tr><th colspan='2' align='cent...,"POLYGON Z ((103.84718 1.29700 0.00000, 103.847...",FORT CANNING,MUSEUM,CENTRAL REGION


We can also examine our `jkt_areas` dataset:

In [5]:
jkt_areas.head()

Unnamed: 0,name,lat,lon,geometry
0,Mal Kelapa Gading,-6.17705,106.902802,"POLYGON ((106.90460 -6.17705, 106.90459 -6.177..."
1,Grand Indonesia,-6.195003,106.819786,"POLYGON ((106.82159 -6.19500, 106.82158 -6.195..."
2,Plaza Indonesia,-6.193793,106.822109,"POLYGON ((106.82391 -6.19379, 106.82390 -6.193..."
3,PIK Avenue,-6.108566,106.740648,"POLYGON ((106.74245 -6.10857, 106.74244 -6.108..."
4,Soetta Airport,-6.126573,106.654152,"POLYGON ((106.65595 -6.12657, 106.65594 -6.126..."


2. The provided dataset from Grab <br>
For context, we collected all the Parquet files using `glob` and converted them to the `Feather` format, which we chose for its efficiency and portability. The next section explains our data pre-processing pipeline.

In [7]:
sg_df_raw = pd.read_feather('data/sgp.ftr')

***
<a id='part2'></a>
# Section 2: Data Pre-Processing
<a id='part21'></a>
### Pre-processing Steps

**Terry** to add a diagram here and explain the key steps. The steps recalled so far:
1. Convert Unix time to a timestamp (already done in the Feather files). It is IMPORTANT to discuss timezones here, since they need careful treatment.
2. Create day, month, year and day-of-week columns (already done in the Feather files).
3. Convert individual columns to either numeric or category types for memory optimization (exact types to be confirmed).
4. Filter out negative speeds. Zero speeds are still being investigated, guided by the explanation in the FAQ.
5. For iOS, a bearing of 180 maps to a speed of 0; this is explained in the paper.
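
The steps above can be sketched on a toy dataframe as follows. This is only an illustration, not the actual `preprocess` function imported from `preprocess.py`: the name `preprocess_sketch`, the choice of categorical columns, and the reading of step 5 (set the speed to 0 for iOS pings whose bearing is exactly 180) are all assumptions.

```python
import pandas as pd

def preprocess_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative version of the listed steps (hypothetical, not the project's code)."""
    out = df.copy()  # work on a copy so the caller's dataframe is untouched

    # Step 1: Unix seconds -> timezone-aware UTC timestamps
    out["time"] = pd.to_datetime(out["pingtimestamp"], unit="s", utc=True)

    # Step 2: calendar columns derived from the timestamp
    out["day"] = out["time"].dt.day
    out["month"] = out["time"].dt.month
    out["year"] = out["time"].dt.year
    out["day_of_week"] = out["time"].dt.dayofweek  # Monday = 0

    # Step 3: store low-cardinality strings as categories to save memory
    for col in ("driving_mode", "osname"):
        out[col] = out[col].astype("category")

    # Step 4: drop rows with negative speed
    out = out[out["speed"] >= 0].copy()

    # Step 5 (assumed reading): iOS pings with bearing 180 get speed 0
    mask = (out["osname"] == "ios") & (out["bearing"] == 180)
    out.loc[mask, "speed"] = 0.0
    return out.reset_index(drop=True)

# Toy rows, not taken from the Grab dataset
sample = pd.DataFrame({
    "pingtimestamp": [1554809147, 1554810296, 1555382655],
    "driving_mode": ["car", "car", "car"],
    "osname": ["android", "ios", "ios"],
    "speed": [5.85, -1.0, 2.55],
    "bearing": [133, 196, 180],
})
clean = preprocess_sketch(sample)
print(clean[["osname", "speed", "bearing", "day", "day_of_week"]])
```

Taking an explicit `.copy()` before assigning columns is one way to avoid the `SettingWithCopyWarning` that appears in the outputs below.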

### Preprocessing for Singapore's dataframe:

In [8]:
sg_df = preprocess(sg_df_raw)
sg_df.to_feather('data/processed_sgp.ftr')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['speed'] = df['bearing'].apply(make_zero)
100%|█████████████████████████████████████████| 5855/5855 [01:09<00:00, 84.01it/s]
100%|███████████████████████████████| 6028377/6028377 [00:11<00:00, 530411.01it/s]
100%|███████████████████████████████| 6028377/6028377 [00:19<00:00, 313136.03it/s]
100%|███████████████████████████████| 6028377/6028377 [00:24<00:00, 248376.64it/s]
100%|███████████████████████████████| 6028377/6028377 [00:21<00:00, 285205.36it/s]
100%|███████████████████████████████| 6028377/6028377 [00:20<00:00, 297246.63it/s]
100%|█████████████████████████████| 29654259/29654259 [00:55<00:00, 531090.15it/s]
100%|█████████████████████████████| 29654259/29654259 [01:38<00:00, 302309.66it/s]


In [9]:
jkt_df_raw = pd.read_feather('data/jkt.ftr')
jkt_df = preprocess(jkt_df_raw)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['speed'] = df['bearing'].apply(make_zero)
100%|███████████████████████████████████████| 11203/11203 [02:07<00:00, 88.10it/s]
100%|███████████████████████████████| 7586349/7586349 [00:14<00:00, 527133.87it/s]
100%|███████████████████████████████| 7586349/7586349 [00:25<00:00, 300638.33it/s]
100%|███████████████████████████████| 7586349/7586349 [00:30<00:00, 245533.53it/s]
100%|███████████████████████████████| 7586349/7586349 [00:26<00:00, 285101.04it/s]
100%|███████████████████████████████| 7586349/7586349 [00:25<00:00, 295840.40it/s]
100%|█████████████████████████████| 53744659/53744659 [01:44<00:00, 513118.51it/s]
100%|█████████████████████████████| 53744659/53744659 [03:06<00:00, 287773.66it/s]


In [16]:
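# pd.to_datetime(..., unit='s') returns naive timestamps that are already in UTC
jkt_df["time"] = pd.to_datetime(jkt_df['pingtimestamp'], unit='s')
# NOTE: 'Etc/GMT+7' follows the POSIX sign convention and means UTC-7, so these two calls
# shift the clock forward by 7 hours: the result is Jakarta wall-clock time labeled as UTC
jkt_df["time"] = jkt_df["time"].dt.tz_localize('Etc/GMT+7').dt.tz_convert('utc')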
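
Because the timezone handling is easy to get wrong, here is a minimal sketch of an unambiguous conversion using only the standard library. It assumes the pings are Unix seconds (which are UTC by definition) and that Jakarta local time is a fixed UTC+7 offset; the names `WIB` and `ping_to_local` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Jakarta (WIB) is UTC+7 and does not observe daylight saving time
WIB = timezone(timedelta(hours=7), name="WIB")

def ping_to_local(epoch_seconds: int, tz: timezone = WIB) -> datetime:
    """Interpret Unix seconds as UTC, then express them in the given timezone."""
    utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc_time.astimezone(tz)

# pingtimestamp of the first Jakarta ping displayed later in this notebook
first_ping = ping_to_local(1554992255)
print(first_ping.isoformat())  # 2019-04-11T21:17:35+07:00
```

The wall-clock time matches the `time` column shown for that ping, but here the offset is stated as `+07:00` rather than `+00:00`.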

In [18]:
jkt_df.reset_index(drop=True).to_feather('data/processed_jkt.ftr')

Alternatively, you can just read the processed feather files:

In [17]:
jkt_df = pd.read_feather('data/processed_jkt.ftr')
sg_df = pd.read_feather('data/processed_sgp.ftr')

Sorting the dataframes by their trip IDs and pingtimestamp:

In [26]:
%%time
jkt_df.sort_values(by=["trj_id", "pingtimestamp"], inplace=True)
sg_df.sort_values(by=["trj_id", "pingtimestamp"], inplace=True)

CPU times: user 1min 21s, sys: 7 s, total: 1min 28s
Wall time: 1min 28s


***
<a id='part22'></a>
### Map Matching

After sorting, we still need to do some map matching, because some GPS pings are inaccurate and land off the road network. We use `Valhalla`, a C++ routing library that implements a Hidden Markov Model (HMM)-based map-matching algorithm, to snap the pings to roads. Our approach is adapted from [this article](https://towardsdatascience.com/map-matching-done-right-using-valhallas-meili-f635ebd17053).

**Sean** to help elaborate key points of map matching, and some implementation details:
Inputs, command to run to get, outputs
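
Until the implementation details are written up, the input side of map matching can be sketched as below. This is an assumption about how a trace might be sent to Valhalla's HTTP map-matching service, not a record of what was actually run: the helper `build_trace_request`, the server address in the comment, and the choice of `"auto"` costing are all hypothetical.

```python
import json

def build_trace_request(pings, costing="auto"):
    """Build a JSON-serializable body for a Valhalla map-matching request.

    `pings` is an iterable of (lat, lon, unix_seconds) tuples sorted by time.
    """
    shape = [{"lat": lat, "lon": lon, "time": t} for lat, lon, t in pings]
    return {
        "shape": shape,
        "costing": costing,         # travel mode used to pick candidate roads
        "shape_match": "map_snap",  # snap the noisy trace to the road network
    }

# Three consecutive pings in the same format as the trip data
pings = [
    (1.341679, 103.749016, 1555767587),
    (1.341559, 103.749207, 1555767588),
    (1.341441, 103.749413, 1555767589),
]
body = json.dumps(build_trace_request(pings))
# The body would then be POSTed to a running Valhalla server, for example
# http://localhost:8002/trace_attributes (host and port are assumptions).
print(body)
```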

<a id='part3'></a>
# Section 3: Clean Data, now the analysis

In [105]:
final_indices = []
grouped = jkt_df.groupby("trj_id")
for name, group in tqdm(grouped):
    if not group.empty:
        final_indices.append(group.iloc[0].name)
        final_indices.append(group.iloc[-1].name)
        
jkt_df_final = jkt_df[jkt_df.index.isin(set(final_indices))]

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55995/55995 [00:28<00:00, 1933.22it/s]
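
The loop above keeps the first and last ping of every trip. A vectorized alternative is sketched below on toy data; the helper name `first_and_last_pings` is hypothetical, and the sketch assumes only the `trj_id` and `pingtimestamp` columns used above.

```python
import pandas as pd

def first_and_last_pings(df: pd.DataFrame) -> pd.DataFrame:
    """Return the first and last ping of every trip without a Python loop."""
    ordered = df.sort_values(["trj_id", "pingtimestamp"])
    grouped = ordered.groupby("trj_id", sort=False)
    ends = pd.concat([grouped.head(1), grouped.tail(1)])
    # A single-ping trip appears in both head and tail, so drop the repeat
    ends = ends[~ends.index.duplicated()]
    return ends.sort_values(["trj_id", "pingtimestamp"])

# Toy trips: "a" has three pings, "b" has two, "c" has one
toy = pd.DataFrame({
    "trj_id": ["a", "a", "a", "b", "b", "c"],
    "pingtimestamp": [3, 1, 2, 10, 11, 7],
})
ends = first_and_last_pings(toy)
print(ends)
```

Like the loop, this keeps the original index labels, so the result can still be traced back to rows of the full dataframe.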


In [48]:
sg_df.sort_values(by=["trj_id", "pingtimestamp"], inplace=True)

In [64]:
final_indices = []
grouped = sg_df.groupby("trj_id")
for name, group in tqdm(grouped):
    if not group.empty:
        final_indices.append(group.iloc[0].name)
        final_indices.append(group.iloc[-1].name)
        
sg_df_final = sg_df[sg_df.index.isin(set(final_indices))]

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28000/28000 [00:15<00:00, 1844.93it/s]


In [66]:
sg_df_final

Unnamed: 0,trj_id,driving_mode,osname,pingtimestamp,rawlat,rawlng,speed,bearing,accuracy,time,day,month,year,day_of_week
19455403,10,car,android,1554809147,1.301775,103.799255,5.851406,133,8.000,19:25:47,9,4,2019,1
24896889,10,car,android,1554810296,1.358001,103.845161,0.000000,196,6.534,19:44:56,9,4,2019,1
14076795,100,car,ios,1555382655,1.345079,103.938477,2.549204,181,11.000,10:44:15,16,4,2019,1
12741206,100,car,ios,1555383648,1.335207,103.842209,5.622902,208,13.000,11:00:48,16,4,2019,1
1317245,1000,car,ios,1554943773,1.435317,103.788643,0.000000,192,16.000,08:49:33,11,4,2019,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4169215,998,car,ios,1555627861,1.296571,103.852600,0.000000,221,12.000,06:51:01,19,4,2019,4
8054042,9982,car,ios,1554826935,1.285460,103.847176,7.806410,209,16.000,00:22:15,10,4,2019,2
10316321,9982,car,ios,1554828066,1.425922,103.787338,20.385883,283,8.000,00:41:06,10,4,2019,2
1145119,9984,car,ios,1555724314,1.331534,103.723389,0.000000,95,8.000,09:38:34,20,4,2019,5


In [107]:
jkt_df_final

Unnamed: 0,trj_id,driving_mode,osname,pingtimestamp,rawlat,rawlng,speed,bearing,accuracy,time,day,month,year,day_of_week
36192264,1,car,android,1554992255,-6.197622,106.769020,5.580000,180,4.288,2019-04-11 21:17:35+00:00,11,4,2019,3
48798825,1,car,android,1554993352,-6.239635,106.801964,1.950000,14,7.551,2019-04-11 21:35:52+00:00,11,4,2019,3
16022400,10000,motorcycle,ios,1555375884,-6.248311,106.930450,11.350000,88,5.000,2019-04-16 07:51:24+00:00,16,4,2019,1
13561729,10000,motorcycle,ios,1555376451,-6.229177,106.947006,7.710000,358,5.000,2019-04-16 08:00:51+00:00,16,4,2019,1
46389183,10002,motorcycle,android,1554702941,-6.249766,106.968163,8.540586,269,3.900,2019-04-08 12:55:41+00:00,8,4,2019,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49334485,9997,car,android,1554790331,-6.122796,106.751923,19.940001,272,4.551,2019-04-09 13:12:11+00:00,9,4,2019,1
46173170,9998,car,android,1555732280,-6.280849,106.782318,10.561894,289,9.000,2019-04-20 10:51:20+00:00,20,4,2019,5
27281830,9998,car,android,1555734514,-6.122941,106.834579,8.160220,103,3.900,2019-04-20 11:28:34+00:00,20,4,2019,5
47051869,9999,motorcycle,android,1555820405,-6.254682,106.943993,6.740000,5,3.900,2019-04-21 11:20:05+00:00,21,4,2019,6


In [108]:
jkt_gdf = gpd.GeoDataFrame(
    jkt_df_final, 
    geometry=gpd.points_from_xy(jkt_df_final.rawlng, jkt_df_final.rawlat,
    crs="EPSG:4326")
)

jkt_gdf.head()

Unnamed: 0,trj_id,driving_mode,osname,pingtimestamp,rawlat,rawlng,speed,bearing,accuracy,time,day,month,year,day_of_week,geometry
36192264,1,car,android,1554992255,-6.197622,106.76902,5.58,180,4.288,2019-04-11 21:17:35+00:00,11,4,2019,3,POINT (106.76902 -6.19762)
48798825,1,car,android,1554993352,-6.239635,106.801964,1.95,14,7.551,2019-04-11 21:35:52+00:00,11,4,2019,3,POINT (106.80196 -6.23964)
16022400,10000,motorcycle,ios,1555375884,-6.248311,106.93045,11.35,88,5.0,2019-04-16 07:51:24+00:00,16,4,2019,1,POINT (106.93045 -6.24831)
13561729,10000,motorcycle,ios,1555376451,-6.229177,106.947006,7.71,358,5.0,2019-04-16 08:00:51+00:00,16,4,2019,1,POINT (106.94701 -6.22918)
46389183,10002,motorcycle,android,1554702941,-6.249766,106.968163,8.540586,269,3.9,2019-04-08 12:55:41+00:00,8,4,2019,0,POINT (106.96816 -6.24977)


In [111]:
jkt_areas

Unnamed: 0,name,lat,lon,geometry
0,Mal Kelapa Gading,-6.177050,106.902802,"POLYGON ((106.90460 -6.17705, 106.90459 -6.177..."
1,Grand Indonesia,-6.195003,106.819786,"POLYGON ((106.82159 -6.19500, 106.82158 -6.195..."
2,Plaza Indonesia,-6.193793,106.822109,"POLYGON ((106.82391 -6.19379, 106.82390 -6.193..."
3,PIK Avenue,-6.108566,106.740648,"POLYGON ((106.74245 -6.10857, 106.74244 -6.108..."
4,Soetta Airport,-6.126573,106.654152,"POLYGON ((106.65595 -6.12657, 106.65594 -6.126..."
...,...,...,...,...
295,Stadion UMS,-6.141693,106.819572,"POLYGON ((106.82137 -6.14169, 106.82136 -6.141..."
296,SAMSAT Jakut Jakpus,-6.137308,106.832554,"POLYGON ((106.83435 -6.13731, 106.83435 -6.137..."
297,GOR Pademangan Barat,-6.133255,106.837168,"POLYGON ((106.83897 -6.13325, 106.83896 -6.133..."
298,Plaza Atrium,-6.176773,106.841064,"POLYGON ((106.84286 -6.17677, 106.84286 -6.176..."


In [112]:
jkt_gdf_sindexed = jkt_gdf
jkt_areas_sindexed = jkt_areas

In [113]:
jkt_gdf_sindexed.sindex
jkt_areas_sindexed.sindex

rtree.index.Index(bounds=[106.65235177091267, -6.394197098189403, 106.9539183121135, -6.104342392845497], size=300)

In [139]:
# `predicate` replaces the deprecated `op` keyword in geopandas 0.10 and later
jkt_sjoined = gpd.sjoin(jkt_gdf_sindexed, jkt_areas_sindexed, how="inner", predicate="within")



In [142]:
jkt_sjoined.drop_duplicates(subset=["trj_id", "pingtimestamp"], inplace=True)

In [77]:
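# Assumption: sg_gdf is built from sg_df_final in the same way as jkt_gdf above
sg_gdf = gpd.GeoDataFrame(sg_df_final, geometry=gpd.points_from_xy(sg_df_final.rawlng, sg_df_final.rawlat, crs="EPSG:4326"))
sg_gdf_sindexed = sg_gdf
sg_areas_sindexed = sg_areas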

In [78]:
sg_gdf_sindexed.sindex
sg_areas_sindexed.sindex

rtree.index.Index(bounds=[103.605700705134, 1.15869870063517, 104.088483065163, 1.47077483208461], size=332)

In [82]:
# `predicate` replaces the deprecated `op` keyword in geopandas 0.10 and later
sjoined = gpd.sjoin(sg_gdf_sindexed, sg_areas_sindexed, predicate="within")



In [157]:
sjoined.drop_duplicates(subset=["trj_id", "pingtimestamp"],inplace=True)

In [None]:
sjoined[sjoined["trj_id"].isin(sjoined["trj_id"].value_counts()[sjoined["trj_id"].value_counts() == 2].index)]

In [164]:
%%time
sjoined.sort_values(by=["trj_id", "pingtimestamp"], inplace=True)

CPU times: user 56.6 ms, sys: 32 µs, total: 56.7 ms
Wall time: 55.4 ms


Finding top start-end trips:

In [168]:
final_pairs = []
grouped = sjoined.groupby("trj_id")
for name, group in tqdm(grouped):
    if not group.empty:
        final_pairs.append((name, group.iloc[0].subzone, group.iloc[-1].subzone))

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28000/28000 [00:25<00:00, 1092.32it/s]


In [171]:
from collections import Counter
counter_dict = Counter([tpl[1:] for tpl in final_pairs])

In [186]:
frequent_pairs = [pair[0] for pair in counter_dict.most_common(30)]
sg_poi = [tpl[0] for tpl in final_pairs if tpl[1:] in frequent_pairs]
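
The pair counting above can be tried on toy data as follows. This is a minimal sketch: the subzone names are placeholders, and the helper `trips_on_top_routes` is hypothetical. It uses a set for the membership test, which scales better than a list when many trips are filtered.

```python
from collections import Counter

# Toy (trip id, start subzone, end subzone) tuples standing in for `final_pairs`
toy_pairs = [
    (1, "A", "B"),
    (2, "A", "B"),
    (3, "B", "A"),
    (4, "C", "A"),
    (5, "A", "B"),
    (6, "B", "A"),
]

def trips_on_top_routes(pairs, top_n):
    """Return ids of trips whose (start, end) pair is among the top_n most common."""
    counts = Counter((start, end) for _, start, end in pairs)
    # A set makes each membership test constant time
    top_routes = {route for route, _ in counts.most_common(top_n)}
    return [trip_id for trip_id, start, end in pairs if (start, end) in top_routes]

selected = trips_on_top_routes(toy_pairs, top_n=2)
print(selected)  # [1, 2, 3, 5, 6]
```

Note that the pairs are directional: a trip from A to B and a trip from B to A count as different routes.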

In [193]:
sg_df_reduced = sg_df.loc[sg_df["trj_id"].isin(sg_poi)]

In [206]:
pd.read_feather('data/sgp_reduced.ftr')["trj_id"].nunique()

748

In [199]:
sg_df_reduced.reset_index(drop=True).to_feather("data/sgp_reduced.ftr")

In [211]:
# Reassign instead of sorting in place: sg_df_reduced is a slice of sg_df,
# and sorting a slice in place triggers pandas' SettingWithCopyWarning
sg_df_reduced = sg_df_reduced.sort_values(by=["trj_id", "pingtimestamp"])


In [218]:
sg_df_reduced

Unnamed: 0,trj_id,driving_mode,osname,pingtimestamp,rawlat,rawlng,speed,bearing,accuracy,time,day,month,year,day_of_week
13554465,10042,car,ios,1555767587,1.341679,103.749016,24.663235,117,16.0,21:39:47,20,4,2019,5
12606262,10042,car,ios,1555767588,1.341559,103.749207,24.663235,117,16.0,21:39:48,20,4,2019,5
11259782,10042,car,ios,1555767589,1.341441,103.749413,24.878941,117,16.0,21:39:49,20,4,2019,5
7075854,10042,car,ios,1555767590,1.341327,103.749619,25.292252,117,16.0,21:39:50,20,4,2019,5
8103374,10042,car,ios,1555767591,1.341211,103.749817,25.788006,117,12.0,21:39:51,20,4,2019,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7829354,9870,car,ios,1554689559,1.340167,103.972992,23.190001,115,5.0,10:12:39,8,4,2019,0
11347021,9870,car,ios,1554689560,1.340093,103.973183,23.260000,114,5.0,10:12:40,8,4,2019,0
13350274,9870,car,ios,1554689561,1.340017,103.973389,23.270000,113,5.0,10:12:41,8,4,2019,0
9149575,9870,car,ios,1554689562,1.339946,103.973595,23.170000,113,5.0,10:12:42,8,4,2019,0


In [224]:
start_end = []
grouped = sg_df_reduced.groupby("trj_id")
for name, group in tqdm(grouped):
    if not group.empty:
        start, end = group.iloc[0], group.iloc[-1]
        start_end.append((name, (start.time.strftime("%H:%M:%S"), start.rawlat, start.rawlng), (end.time.strftime("%H:%M:%S"), end.rawlat, end.rawlng)))

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28000/28000 [00:01<00:00, 16984.97it/s]


In [230]:
len(id_list)

748

In [None]:
pd.read_feather("data/sgp_reduced.ftr")

In [202]:
import pickle
with open("sg_poi.pkl", 'wb') as f:
    pickle.dump(sg_poi, f)

In [None]:
import pickle
with open('sg_poi.pkl', 'rb') as f:
    id_list = pickle.load(f)

In [None]:
id_list

In [6]:
# # preprocessing for jakarta df
# jkt_dfclean = preprocess(jkt_df)
# jkt_dfclean.reset_index(drop=True).to_feather('data/processed_jkt.ftr')

sg_dfclean = preprocess(sg_df)
sg_dfclean.to_feather('data/processed_sgp.ftr')

NameError: name 'sg_df' is not defined

In [4]:
jkt_dfclean = preprocess(jkt_df)
jkt_dfclean.to_feather('data/processed_jkt.ftr')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['speed'] = df['bearing'].apply(make_zero,val=180)
100%|████████████████████████████████████████████████████████████████████████████████| 11203/11203 [02:21<00:00, 79.09it/s]
100%|████████████████████████████████████████████████████████████████████████| 7586349/7586349 [00:15<00:00, 480627.20it/s]
100%|████████████████████████████████████████████████████████████████████████| 7586349/7586349 [00:27<00:00, 274359.05it/s]
100%|████████████████████████████████████████████████████████████████████████| 7586349/7586349 [00:32<00:00, 234558.66it/s]
100%|████████████████████████████████████████████████████████████████████████| 7586349/7586349 [00:28<00:00, 262666.36it/s]
100%|█████████████████████████████████████████████████████████

In [138]:
m = folium.Map([-6.138945, 106.812561], zoom_start=7, tiles='cartodbpositron')
folium.GeoJson(jkt_fucker_gdf).add_to(m)

m

In [11]:
m

In [153]:
# These are the usual ipython objects, including this one you are creating
ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

# Get a sorted list of the objects and their sizes
sorted([(x, sys.getsizeof(globals().get(x))) for x in dir() if not x.startswith('_') and x not in sys.modules and x not in ipython_vars], key=lambda x: x[1], reverse=True)

[('jkt_df', 3015279301),
 ('sg_df', 2553055874),
 ('sjoined', 66556308),
 ('jkt_df_final', 12745629),
 ('jkt_gdf', 12745629),
 ('jkt_gdf_sindexed', 12745629),
 ('sg_df_final', 8501472),
 ('sg_gdf', 8501472),
 ('sg_gdf_sindexed', 8501472),
 ('jkt_sjoined', 7293559),
 ('group', 5697453),
 ('jkt_fucker', 5664093),
 ('jkt_fucker_gdf', 5653381),
 ('sg_df_sample', 3259472),
 ('df_group', 2789472),
 ('final_indices', 927560),
 ('sg_areas', 344790),
 ('sg_areas_sindexed', 344790),
 ('final_df', 110992),
 ('jkt_areas', 28368),
 ('jkt_areas_sindexed', 28368),
 ('plan_name', 2888),
 ('region_name', 2888),
 ('subzone_name', 2888),
 ('BeautifulSoup', 2008),
 ('basemaps', 1192),
 ('Choropleth', 1064),
 ('FullScreenControl', 1064),
 ('GeoData', 1064),
 ('GeoJSON', 1064),
 ('HTML', 1064),
 ('Heatmap', 1064),
 ('Icon', 1064),
 ('LayersControl', 1064),
 ('Map', 1064),
 ('Marker', 1064),
 ('MarkerCluster', 1064),
 ('SearchControl', 1064),
 ('Text', 1064),
 ('WidgetControl', 1064),
 ('location', 753),
 ...]
