## 2. Citybik.es dataset
### 2.1. Datasets Download
Citybik.es is a website that offers an Application Programming Interface (API) exposing usage data for
bike-sharing services throughout the world. Among others, data for one of Turin’s bike-sharing systems
([TO]Bike) is available. For [TO]Bike, the information is available at a “station” granularity: all
the data concerns the bike stations. Useful fields include the station
name, its position (latitude and longitude), the number of available bikes and the number of
free docks. The data is offered in near real time (i.e. it is updated every 15-30 minutes).

The API endpoint to request the data for the [TO]Bike service is the following:

http://api.citybik.es/v2/networks/to-bike

You can download the data either from your browser or with wget (as shown for the Iris dataset).
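
As an alternative to the browser or wget, the download can also be scripted in Python. The following is a minimal sketch that uses only the standard library. The helper name `download_json`, the 10-second timeout and the output file name are illustrative assumptions, not part of the Citybik.es API.

```python
import json
import urllib.request

# Endpoint quoted above for the [TO]Bike network.
TOBIKE_URL = "http://api.citybik.es/v2/networks/to-bike"


def download_json(url, path, timeout=10):
    """Fetch `url`, check that the body parses as JSON, and save it to `path`.

    Hypothetical helper written for this example.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
    obj = json.loads(raw)  # raises ValueError if the body is not valid JSON
    with open(path, "wb") as f:
        f.write(raw)       # store the bytes exactly as they were received
    return obj


# Usage (requires network access):
# obj = download_json(TOBIKE_URL, "tobike.json")
```

Parsing the response before writing it means that an error page or a truncated download is caught immediately, rather than later when the file is loaded.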

This dataset is in the JSON (JavaScript Object Notation) format, which can store complex data structures (i.e. not only tabular data). A JSON value can be a basic data type (string, number, boolean), a list of values (e.g. [0, true, "test"]) or a dictionary (e.g. { "key1": "value1",
"key2": false, "key3": [3, 2, 1]}).

JSON lists and dictionaries map directly onto Python's list and dict. As such, any JSON file can
be loaded as a (possibly nested) Python data structure. To do that, a Python module called
json is available. It can be used as follows:

import json

with open("file.json") as f:
    obj = json.load(f)
    print(obj)

In this example, the file file.json, which contains a JSON object, is opened (with open()) and read
(with json.load()). The contents of the object are then stored into the obj variable.
You can read more about JSON on Wikipedia.
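
To make the mapping between JSON and Python types concrete, the short sketch below parses the dictionary example given earlier from a string. It uses `json.loads`, the string counterpart of `json.load`, and prints the Python type that each JSON value becomes.

```python
import json

# The dictionary example from the text, as a JSON string.
text = '{ "key1": "value1", "key2": false, "key3": [3, 2, 1]}'
obj = json.loads(text)  # json.load does the same for an open file

print(type(obj).__name__)          # dict
print(type(obj["key2"]).__name__)  # bool (JSON false becomes Python False)
print(type(obj["key3"]).__name__)  # list
print(json.dumps(obj))             # serialize back to JSON text
```

Note that JSON spells booleans in lowercase (`true`, `false`), while the loaded Python object contains `True` and `False`.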


### 2.2. Exercises
**1.** Load the previously downloaded Citybik.es dataset as a Python dictionary. You can make use of the
json module presented. You can find the full documentation for the json module here. After the
dictionary is loaded, explore its contents.

In [1]:
import json

In [2]:
with open('tobike.json', 'r') as file:
    obj = json.load(file)
print(list(obj.keys()))
# the top level has a single key, so we go one level deeper to understand the structure
obj2 = obj[list(obj.keys())[0]]
print(list(obj2.keys()))
# the 'stations' key holds the station-level data,
# so we keep its value as 'data' before continuing
data = obj2['stations']
print("---------------------------------")
print(f"-Type of data is {type(data)}.")
print(f"-The list consists of {type(data[0])}.")
# all stations are collected in a list of dictionaries
print(f"-Data consists of {len(data)} different stations.")
print("---------------------------------")
# now explore the fields of one station
for k, v in data[0].items():
    print(f"{k}:{v}")
# some keys have integer values, such as empty_slots and free_bikes
# the id and name keys have string values
# latitude and longitude are floats that give the station's location
# the extra key contains the station's number, reviews, score, status and uid
# the timestamp key shows the time of the last update

['network']
['company', 'href', 'id', 'location', 'name', 'source', 'stations']
---------------------------------
-Type of data is <class 'list'>.
-The list consists of <class 'dict'>.
-Data consists of 145 different stations.
---------------------------------
empty_slots:15
extra:{'number': 4, 'reviews': 340, 'score': 3.9, 'status': 'online', 'uid': '253'}
free_bikes:2
id:9f705b5e090de99e976f4ac6c6911571
latitude:45.072882
longitude:7.667951
name:Porta Susa 1
timestamp:2022-12-16T02:37:15.713000Z


**2.** Count and print the number of active stations (a station is active if its extra.status field is "online").

In [7]:
c = 0
for l in data:
    if l['extra']['status'] == 'online':
        c += 1
print(f"Number of active stations is {c}")

Number of active stations is 120


**3.** Count and print the total number of bikes available (field free_bikes) and the number of free docks
(field empty_slots) throughout all stations.

In [10]:
c2 = 0
for l in data:
    c2 += l['free_bikes']
print(f"Number of free bikes is {c2}")

Number of free bikes is 220
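
The exercise also asks for the number of free docks (field empty_slots), which can be totalled in the same way. The sketch below computes both sums in one helper. The function name `totals` and the three sample stations are hypothetical, made up so that the example runs without the downloaded file. Treating a missing or null count as zero is also an assumption.

```python
def totals(stations):
    """Return (total free bikes, total free docks) over all stations."""
    # `or 0` treats a missing or null count as zero (an assumption).
    bikes = sum(s.get("free_bikes") or 0 for s in stations)
    docks = sum(s.get("empty_slots") or 0 for s in stations)
    return bikes, docks


# Hypothetical sample data with the same field names as the dataset.
sample = [
    {"name": "A", "free_bikes": 2, "empty_slots": 15},
    {"name": "B", "free_bikes": 0, "empty_slots": None},
    {"name": "C", "free_bikes": 5, "empty_slots": 3},
]

bikes, docks = totals(sample)
print(f"Number of free bikes is {bikes}")  # 7
print(f"Number of free docks is {docks}")  # 18
```

Calling `totals(data)` on the list loaded in exercise 1 would give both figures for the real dataset.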


**4.** (*) Given the coordinates (latitude, longitude) of a point (e.g. 45.074512, 7.694419), identify the closest bike station to it that has available bikes. To compute the distance between two points (given their coordinates), you can use the function distance_coords() defined in the code snippet below (which is an implementation of the great-circle distance):

from math import cos, acos, sin

def distance_coords(lat1, lng1, lat2, lng2):
    """Compute the distance among two points."""
    deg2rad = lambda x: x * 3.141592 / 180
    lat1, lng1, lat2, lng2 = map(deg2rad, [ lat1, lng1, lat2, lng2 ])
    R = 6378100 # Radius of the Earth, in meters
    return R * acos(sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lng1 - lng2))


In [11]:
from math import cos, acos, sin

def distance_coords(lat1, lng1, lat2, lng2):
    """Compute the distance among two points."""
    deg2rad = lambda x: x * 3.141592 / 180
    lat1, lng1, lat2, lng2 = map(deg2rad, [ lat1, lng1, lat2, lng2 ])
    R = 6378100 # Radius of the Earth, in meters
    return R * acos(sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lng1 - lng2))

In [24]:
x, y = 45.074512, 7.694419

# keep only the stations that have at least one free bike
availableStations = [l for l in data if l['free_bikes'] > 0]
# distance from the query point to each of those stations
distances = [distance_coords(x, y, k['latitude'], k['longitude']) for k in availableStations]
# the smallest distance identifies the closest station
loc = availableStations[distances.index(min(distances))]
print(f"The ID of closest location with available bike is: {loc['id']}")
print(f"The name is '{loc['name']}' and there are {loc['free_bikes']} free bikes")

The ID of closest location with available bike is: 1916e772eb3e6c88b37a0f584da1e333
The name is 'Regina Margherita 3' and there are 1 free bikes
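
The same search can be written more compactly with `min()` and a `key` function, which avoids building a separate list of distances. The sketch below is one possible variant, not the only way to solve the exercise. The helper name `closest_with_bikes` and the sample stations are hypothetical (only the Porta Susa 1 coordinates come from the output above). The distance function is restated with `math.radians` and a clamp on the `acos` argument, a small defensive change in case rounding pushes the value just outside [-1, 1] for nearly identical points.

```python
from math import acos, cos, radians, sin


def distance_coords(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    lat1, lng1, lat2, lng2 = map(radians, [lat1, lng1, lat2, lng2])
    R = 6378100  # Radius of the Earth, in meters
    c = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lng1 - lng2)
    # Clamp to the domain of acos to guard against floating-point rounding.
    return R * acos(max(-1.0, min(1.0, c)))


def closest_with_bikes(stations, lat, lng):
    """Return the nearest station with at least one free bike, or None."""
    candidates = [s for s in stations if s["free_bikes"] > 0]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda s: distance_coords(lat, lng, s["latitude"], s["longitude"]),
    )


# Hypothetical sample data; field names match the dataset.
sample = [
    {"name": "Porta Susa 1", "latitude": 45.072882, "longitude": 7.667951, "free_bikes": 2},
    {"name": "Nearest but empty", "latitude": 45.0745, "longitude": 7.6944, "free_bikes": 0},
    {"name": "Hypothetical B", "latitude": 45.07, "longitude": 7.69, "free_bikes": 3},
]

loc = closest_with_bikes(sample, 45.074512, 7.694419)
print(loc["name"])  # Hypothetical B
```

Returning `None` when no station has a free bike makes the empty case explicit, whereas calling `min()` on an empty list raises a `ValueError`.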
