# Project on Tartu Smart Bike data analysis
## Introduction to Data Science - LTAT.02.002
## Bike locations data analysis

## Table of Contents
 1. [Reading and selecting locations data for heatmaps](#data)  
 2. [Fix for bike station locations JSON file](#json)  
 3. [Heatmap for July locations data](#heatmap1)  
 4. [Heatmap for September locations data](#heatmap2)  


### 1. Reading and selecting locations data for heatmaps <a name="data"></a>

In [1]:
# Import the necessary libraries
import pandas as pd
import gmplot
# For improved table display in the notebook
from IPython.display import display

# Forward slashes keep the relative paths portable across Windows, Linux and macOS
locations06p2 = pd.read_csv("../data_for_IDS2019_project_team_W17/bicycle_data/locations_201906_part2.csv")
locations07p1 = pd.read_csv("../data_for_IDS2019_project_team_W17/bicycle_data/locations_201907_part1.csv")
locations07p2 = pd.read_csv("../data_for_IDS2019_project_team_W17/bicycle_data/locations_201907_part2.csv")
locations08p1 = pd.read_csv("../data_for_IDS2019_project_team_W17/bicycle_data/locations_201908_part1.csv")
locations08p2 = pd.read_csv("../data_for_IDS2019_project_team_W17/bicycle_data/locations_201908_part2.csv")
locations09 = pd.read_csv("../data_for_IDS2019_project_team_W17/bicycle_data/locations_201909.csv")

locations_datasets = [locations06p2, locations07p1, locations07p2, locations08p1, locations08p2, locations09]

# Data for heatmaps:
# September
data_september = locations_datasets[5] 
data_september.dropna(inplace = True)

# Alternative (disabled): randomly sample the same number of rows from the
# June, July and August locations. Rows with missing values are dropped before
# sampling so that the sample size still matches the September data.
#data_summer = pd.concat(locations_datasets[0:5]).dropna()
#data_summer = data_summer.sample(n=data_september.shape[0])

# July data: drop incomplete rows (as for September), then take as many rows
# as the September data has
data_summer = locations07p2.dropna().iloc[0:data_september.shape[0]]
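
The disabled alternative in the cell above (a random summer sample of the same size as the September data) can be tried out on synthetic data first. The sketch below is only an illustration: `make_locations`, the row counts and the random seeds are made up for the example and do not come from the project data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_locations(n_rows, n_missing):
    """Build a synthetic locations table with a few missing latitudes (hypothetical helper)."""
    frame = pd.DataFrame({
        "latitude": 58.37 + rng.normal(0, 0.01, n_rows),
        "longitude": 26.72 + rng.normal(0, 0.01, n_rows),
    })
    frame.loc[frame.index[:n_missing], "latitude"] = np.nan
    return frame

# Five synthetic "summer" parts and one synthetic "September" table
summer_parts = [make_locations(500, 20) for _ in range(5)]
september = make_locations(300, 10).dropna()

# Drop incomplete rows first, then sample, so the sizes are guaranteed to match
summer_all = pd.concat(summer_parts, ignore_index=True).dropna()
summer_sample = summer_all.sample(n=len(september), random_state=42)
```

Dropping missing rows before sampling matters: sampling first and dropping afterwards could leave fewer summer rows than September rows, which would make the two heatmaps harder to compare.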


### 2. Fix for bike station locations JSON file <a name="json"></a>
The code below was provided by Asko Seeba on the Piazza forum of the LTAT.02.002 Introduction to Data Science course. Link to the post: https://piazza.com/class/k0259zlgyprlw?cid=120

In [2]:
"""
Created on Sun Nov 24 18:05:30 2019

@author: Asko Seeba
"""

import json
import pprint

pp = pprint.PrettyPrinter(indent = 4)

# Change it to your file location:
tartu_bikes_dir = '../data_for_IDS2019_project_team_W17'

with open(tartu_bikes_dir + '/2019_08_28_bicycle_stations_public_and_metallica.json',
          encoding = "utf-8") as f:
    lines = f.readlines()

# The JSON string in the file is broken and must be fixed before the parser will accept it.
# 1. The file is a list of dictionaries, but the opening [ and the closing ] are missing --
#    need to add those.
# 2. There are commas after the last elements before closing } and ] symbols -- that is ok
#    in Python, but not in JSON -- need to remove those.
# 3. There are some missing commas between the list elements -- add those.
#
# The method: strip and join the lines and add the enclosing '[' and ']', so we can
# search for ',}' and ',]', and replace them with '}' and ']' respectively. Then add the
# missing commas.
for i in range(len(lines)):
    lines[i] = lines[i].strip()
json_str = ''.join(lines)
json_str = '[' + json_str + ']'
json_str = json_str.replace(',}', '}')
json_str = json_str.replace(',]', ']')
json_str = json_str.replace('}{', '},{')

stations = json.loads(json_str)

#print('Enjoy!')
#pp.pprint(stations)

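# Write with the same encoding the file was read with, so non-ASCII station names survive
with open(tartu_bikes_dir + '/2019_08_28_bicycle_stations_public_and_metallica_fixed.json', 'w',
          encoding = "utf-8") as f_fixed: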
    f_fixed.write(json_str)
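
To see what the three string replacements do, the same repair can be run on a tiny hand-made input. This is a sketch only: `repair_json_lines` and the sample station records are invented for illustration and are not part of the original code or data.

```python
import json

def repair_json_lines(lines):
    """Apply the same strip/join/replace repair to a list of text lines (hypothetical helper)."""
    json_str = "[" + "".join(line.strip() for line in lines) + "]"
    json_str = json_str.replace(",}", "}")    # trailing comma before a closing brace
    json_str = json_str.replace(",]", "]")    # trailing comma before a closing bracket
    json_str = json_str.replace("}{", "},{")  # missing comma between list elements
    return json.loads(json_str)

# Made-up input showing all three defects: no enclosing brackets,
# trailing commas, and no comma between the two objects
sample_lines = [
    '{\n',
    '  "name": "Station A",\n',
    '  "areaCentroid": {"latitude": 58.38, "longitude": 26.72,},\n',
    '}\n',
    '{\n',
    '  "name": "Station B",\n',
    '  "tags": ["public",],\n',
    '}\n',
]

sample_stations = repair_json_lines(sample_lines)
```

Because the repair works on the raw text, it would also rewrite a string value that happened to contain `,}`, `,]` or `}{`. That is acceptable for a one-off fix of a known file, but it is not a general-purpose JSON repair.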


In [3]:
station_locations = []
infoboxes = []

for station in stations:
    station_locations.append((station.get('areaCentroid').get('latitude'),station.get('areaCentroid').get('longitude')))
    infoboxes.append(station.get('name'))
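
The loop above assumes that every station has an `areaCentroid` entry; if one is missing, `station.get('areaCentroid')` returns `None` and the chained `.get` raises an `AttributeError`. A more defensive variant could look like the sketch below, where `station_markers` and the sample records are hypothetical.

```python
def station_markers(stations):
    """Return (locations, names), skipping stations without usable coordinates (hypothetical helper)."""
    locations, names = [], []
    for station in stations:
        centroid = station.get("areaCentroid") or {}
        lat = centroid.get("latitude")
        lon = centroid.get("longitude")
        if lat is None or lon is None:
            continue  # no coordinates, so the station cannot be drawn on the map
        locations.append((lat, lon))
        names.append(station.get("name", ""))
    return locations, names

# Made-up records: the second one has no centroid and is skipped
example_stations = [
    {"name": "Station A", "areaCentroid": {"latitude": 58.38, "longitude": 26.72}},
    {"name": "Station B"},
    {"name": "Station C", "areaCentroid": {"latitude": 58.37, "longitude": 26.73}},
]
example_locations, example_names = station_markers(example_stations)
```

Keeping the two lists in step matters because the symbol layer pairs each marker with the info box at the same index.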

### 3. Heatmap for July locations data <a name="heatmap1"></a>

In [4]:
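import os

import gmaps
from ipywidgets.embed import embed_minimal_html

# Read the Google Maps API key from the environment instead of hard-coding it.
# GOOGLE_MAPS_API_KEY is an assumed variable name; set it before running the notebook.
gmaps.configure(api_key=os.environ['GOOGLE_MAPS_API_KEY'])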

fig1 = gmaps.figure(center=(58.377180, 26.726092), zoom_level=14, map_type="SATELLITE")

# Heatmap layer built from the July bike locations
heatmap_layer1 = gmaps.heatmap_layer(
    data_summer[['latitude', 'longitude']],
    max_intensity=20, point_radius=5, opacity=0.9)
fig1.add_layer(heatmap_layer1)

# Bike stations layer; the info box of each marker holds the station name
stations_layer1 = gmaps.symbol_layer(
    station_locations, fill_color=(59, 176, 255), fill_opacity=1,
    stroke_color=(66, 135, 245), stroke_opacity=0.0, scale=5,
    info_box_content=infoboxes)
fig1.add_layer(stations_layer1)

# Forward slashes keep the output path portable across operating systems
embed_minimal_html('heatmaps/july.html', views=[fig1])
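
The export above writes into a `heatmaps` directory, and writing the file fails if that directory does not exist yet. A small helper can create it on demand; this is a sketch, and `output_path` is a hypothetical name that is not part of the original notebook.

```python
from pathlib import Path

def output_path(filename, directory="heatmaps"):
    """Create the output directory if needed and return the full file path as a string."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / filename)

july_path = output_path("july.html")
```

The returned string could then be passed as the first argument of `embed_minimal_html` in place of the literal path.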


### 4. Heatmap for September locations data <a name="heatmap2"></a>

In [5]:
fig2 = gmaps.figure(center=(58.377180, 26.726092), zoom_level=14, map_type="SATELLITE")

# Heatmap layer built from the September bike locations
heatmap_layer2 = gmaps.heatmap_layer(
    data_september[['latitude', 'longitude']],
    max_intensity=20, point_radius=5, opacity=0.9)
fig2.add_layer(heatmap_layer2)

# Bike stations layer; the info box of each marker holds the station name
stations_layer2 = gmaps.symbol_layer(
    station_locations, fill_color=(59, 176, 255), fill_opacity=1,
    stroke_color=(66, 135, 245), stroke_opacity=0.0, scale=5,
    info_box_content=infoboxes)
fig2.add_layer(stations_layer2)

# Forward slashes keep the output path portable across operating systems
embed_minimal_html('heatmaps/september.html', views=[fig2])

### July + September in one HTML file

In [6]:
embed_minimal_html('heatmaps/july_and_september.html', views=[fig1, fig2])