# CitiBike exploration

We want to explore the streaming dataset published by CitiBike (a NYC-based bike-share company).

They publish their data through a set of REST endpoints that follow the General Bikeshare Feed Specification (GBFS).

Let's hit their endpoint to explore it a little bit:

In [2]:
import requests

r = requests.get("http://gbfs.citibikenyc.com/gbfs/gbfs.json")

We should always check the status code: `200` means success. If the request does not succeed, we will probably need some "retry" logic:

In [3]:
r.status_code

200
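If the status code were not `200`, one simple option is to retry with exponential backoff. The sketch below is a hypothetical helper (`get_with_retry`, `fetch`, `max_attempts` and `backoff_seconds` are names chosen for illustration, not part of the CitiBike feed or of `requests`). It assumes `requests` is installed when no `fetch` function is passed in, retries only on non-`200` status codes, and lets network exceptions (timeouts, connection errors) propagate.

```python
import time


def get_with_retry(url, fetch=None, max_attempts=3, backoff_seconds=1.0):
    """Call fetch(url) until it returns status 200 or the attempts run out.

    Hypothetical helper. fetch defaults to requests.get with a timeout; it is a
    parameter so the retry logic can be tried without network access. Only
    non-200 status codes are retried; exceptions raised by fetch propagate.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if fetch is None:
        import requests  # imported lazily so the helper can be exercised offline

        def fetch(u):
            return requests.get(u, timeout=10)

    for attempt in range(1, max_attempts + 1):
        response = fetch(url)
        if response.status_code == 200:
            return response
        if attempt < max_attempts:
            # Exponential backoff: wait 1x, 2x, 4x, ... backoff_seconds between tries.
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
    raise RuntimeError(
        f"{url} returned {response.status_code} after {max_attempts} attempts"
    )
```

Usage would be `r = get_with_retry("http://gbfs.citibikenyc.com/gbfs/gbfs.json")`. For production code, mounting a `requests.adapters.HTTPAdapter` configured with urllib3's `Retry` class is a library-supported alternative to a hand-written loop.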

The content is raw bytes. In Python, bytes "look" like a string, but they have a `b` prefix, like this:
```
b'thesearebytesnotastring'
```
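The distinction matters because the same text maps to different bytes under different encodings. This small, self-contained illustration (the string `"café"` is just an example, unrelated to the feed) shows encoding to UTF-8, decoding with the right codec, decoding with the wrong one, and what happens with bytes that are not valid UTF-8.

```python
text = "café"

# Encoding turns text (str) into bytes; 'é' becomes two bytes in UTF-8.
utf8_bytes = text.encode("utf-8")
print(type(utf8_bytes), utf8_bytes)   # <class 'bytes'> b'caf\xc3\xa9'

# Decoding with the right codec recovers the original string.
print(utf8_bytes.decode("utf-8"))     # café

# Decoding with the wrong codec may "succeed" but silently garbles the text.
print(utf8_bytes.decode("latin-1"))   # cafÃ©

# Bytes that are not valid UTF-8 raise an error instead.
try:
    b"caf\xe9".decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```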

In [4]:
r.content

b'{"last_updated":1554069608,"ttl":10,"data":{"en":{"feeds":[{"name":"station_status","url":"https://gbfs.citibikenyc.com/gbfs/en/station_status.json"},{"name":"system_regions","url":"https://gbfs.citibikenyc.com/gbfs/en/system_regions.json"},{"name":"station_information","url":"https://gbfs.citibikenyc.com/gbfs/en/station_information.json"},{"name":"system_alerts","url":"https://gbfs.citibikenyc.com/gbfs/en/system_alerts.json"},{"name":"system_information","url":"https://gbfs.citibikenyc.com/gbfs/en/system_information.json"}]},"es":{"feeds":[{"name":"station_status","url":"https://gbfs.citibikenyc.com/gbfs/es/station_status.json"},{"name":"system_regions","url":"https://gbfs.citibikenyc.com/gbfs/es/system_regions.json"},{"name":"station_information","url":"https://gbfs.citibikenyc.com/gbfs/es/station_information.json"},{"name":"system_alerts","url":"https://gbfs.citibikenyc.com/gbfs/es/system_alerts.json"},{"name":"system_information","url":"https://gbfs.citibikenyc.com/gbfs/es/system

This looks like JSON, which is a very common format for transmitting data over HTTP.

However, we first need to convert these bytes to a string. Recall that in Python 3, strings (`str`) are *Unicode*. To decode bytes into Unicode we need to know how they were encoded in the first place. Most of the time you will encounter the UTF-8 encoding (`'utf-8'` in Python). Let's try it:

In [5]:
msg = r.content.decode('utf-8')
msg

'{"last_updated":1554069608,"ttl":10,"data":{"en":{"feeds":[{"name":"station_status","url":"https://gbfs.citibikenyc.com/gbfs/en/station_status.json"},{"name":"system_regions","url":"https://gbfs.citibikenyc.com/gbfs/en/system_regions.json"},{"name":"station_information","url":"https://gbfs.citibikenyc.com/gbfs/en/station_information.json"},{"name":"system_alerts","url":"https://gbfs.citibikenyc.com/gbfs/en/system_alerts.json"},{"name":"system_information","url":"https://gbfs.citibikenyc.com/gbfs/en/system_information.json"}]},"es":{"feeds":[{"name":"station_status","url":"https://gbfs.citibikenyc.com/gbfs/es/station_status.json"},{"name":"system_regions","url":"https://gbfs.citibikenyc.com/gbfs/es/system_regions.json"},{"name":"station_information","url":"https://gbfs.citibikenyc.com/gbfs/es/station_information.json"},{"name":"system_alerts","url":"https://gbfs.citibikenyc.com/gbfs/es/system_alerts.json"},{"name":"system_information","url":"https://gbfs.citibikenyc.com/gbfs/es/system_

That worked, so the bytes were indeed valid UTF-8. Now we have a real string (containing JSON) that we can deserialize into a Python dict:

In [6]:
import ujson

data = ujson.loads(msg)
data

{'last_updated': 1554069608,
 'ttl': 10,
 'data': {'en': {'feeds': [{'name': 'station_status',
     'url': 'https://gbfs.citibikenyc.com/gbfs/en/station_status.json'},
    {'name': 'system_regions',
     'url': 'https://gbfs.citibikenyc.com/gbfs/en/system_regions.json'},
    {'name': 'station_information',
     'url': 'https://gbfs.citibikenyc.com/gbfs/en/station_information.json'},
    {'name': 'system_alerts',
     'url': 'https://gbfs.citibikenyc.com/gbfs/en/system_alerts.json'},
    {'name': 'system_information',
     'url': 'https://gbfs.citibikenyc.com/gbfs/en/system_information.json'}]},
  'es': {'feeds': [{'name': 'station_status',
     'url': 'https://gbfs.citibikenyc.com/gbfs/es/station_status.json'},
    {'name': 'system_regions',
     'url': 'https://gbfs.citibikenyc.com/gbfs/es/system_regions.json'},
    {'name': 'station_information',
     'url': 'https://gbfs.citibikenyc.com/gbfs/es/station_information.json'},
    {'name': 'system_alerts',
     'url': 'https://gbfs.citib
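`ujson` is a third-party package. The standard-library `json` module produces the same dict and needs no extra install. The sketch below parses a shortened excerpt of the `gbfs.json` response shown above (values copied from that output, feed list trimmed to one entry). In a notebook with a live `requests` response you could also call `r.json()`, which decodes and parses in one step.

```python
import json

# Shortened excerpt of the gbfs.json response above (feed list trimmed).
raw = (
    b'{"last_updated":1554069608,"ttl":10,"data":{"en":{"feeds":'
    b'[{"name":"station_status",'
    b'"url":"https://gbfs.citibikenyc.com/gbfs/en/station_status.json"}]}}}'
)

# Decode bytes -> str, then parse str -> dict (same two steps as with ujson).
data = json.loads(raw.decode("utf-8"))
print(data["ttl"])                                # 10
print(data["data"]["en"]["feeds"][0]["name"])     # station_status

# Since Python 3.6, json.loads also accepts UTF-8 bytes directly.
assert json.loads(raw) == data
```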

In [7]:
data.keys()

dict_keys(['last_updated', 'ttl', 'data'])

In [8]:
data['last_updated']

1554069608

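That is a Unix timestamp (seconds since 1970-01-01 UTC). Now let's look at the `ttl` (time to live), which GBFS defines as the number of seconds before the feed's data will be updated again: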

In [9]:
data['ttl']

10
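To make these two fields concrete, here is a sketch that converts `last_updated` into a UTC datetime and uses `ttl` (in seconds, per the GBFS spec) to decide whether it is time to re-fetch. The values are copied from the outputs above; `is_stale` is a hypothetical helper, not part of any library.

```python
import time
from datetime import datetime, timedelta, timezone

last_updated = 1554069608  # data['last_updated'] from above
ttl = 10                   # data['ttl'] from above, in seconds

updated_at = datetime.fromtimestamp(last_updated, tz=timezone.utc)
expires_at = updated_at + timedelta(seconds=ttl)
print(updated_at.isoformat())  # 2019-03-31T22:00:08+00:00
print(expires_at.isoformat())  # 2019-03-31T22:00:18+00:00


def is_stale(last_updated, ttl, now=None):
    """Hypothetical helper: True once ttl seconds have passed since last_updated."""
    if now is None:
        now = time.time()
    return now >= last_updated + ttl
```

A polling loop could call `is_stale(data['last_updated'], data['ttl'])` and only hit the endpoint again when it returns `True`.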

So this top-level `gbfs.json` feed is just an index: for each language, it lists further REST endpoints where the actual data lives:

In [10]:
data['data']['en']['feeds']

[{'name': 'station_status',
  'url': 'https://gbfs.citibikenyc.com/gbfs/en/station_status.json'},
 {'name': 'system_regions',
  'url': 'https://gbfs.citibikenyc.com/gbfs/en/system_regions.json'},
 {'name': 'station_information',
  'url': 'https://gbfs.citibikenyc.com/gbfs/en/station_information.json'},
 {'name': 'system_alerts',
  'url': 'https://gbfs.citibikenyc.com/gbfs/en/system_alerts.json'},
 {'name': 'system_information',
  'url': 'https://gbfs.citibikenyc.com/gbfs/en/system_information.json'}]
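Rather than hard-coding the feed URLs in later cells, one option is to turn this list into a name-to-URL lookup. The sketch below copies two entries from the output above; in the notebook you would pass `data['data']['en']['feeds']` instead. The name `feed_urls` is illustrative.

```python
# Two entries copied from the 'en' feed list above.
feeds = [
    {"name": "station_status",
     "url": "https://gbfs.citibikenyc.com/gbfs/en/station_status.json"},
    {"name": "station_information",
     "url": "https://gbfs.citibikenyc.com/gbfs/en/station_information.json"},
]

# Map each feed name to its URL so later requests can look it up by name.
feed_urls = {feed["name"]: feed["url"] for feed in feeds}
print(feed_urls["station_status"])
```

A later cell could then call `requests.get(feed_urls['station_information'])`, which keeps working if the operator ever moves a feed to a new URL.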

We are mostly interested in `station_status` and `station_information`. Let's start with `station_information`:

## Station Information and Status

In [15]:
r = requests.get('https://gbfs.citibikenyc.com/gbfs/en/station_information.json')
r.status_code

200

In [16]:
station_info = ujson.loads(r.content.decode('utf-8'))
station_info

{'last_updated': 1554070800,
 'ttl': 10,
 'data': {'stations': [{'station_id': '72',
    'external_id': '66db237e-0aca-11e7-82f6-3863bb44ef7c',
    'name': 'W 52 St & 11 Ave',
    'short_name': '6926.01',
    'lat': 40.76727216,
    'lon': -73.99392888,
    'region_id': 71,
    'rental_methods': ['KEY', 'CREDITCARD'],
    'capacity': 55,
    'rental_url': 'http://app.citibikenyc.com/S6Lr/IBV092JufD?station_id=72',
    'electric_bike_surcharge_waiver': False,
    'eightd_has_key_dispenser': False,
    'has_kiosk': True},
   {'station_id': '79',
    'external_id': '66db269c-0aca-11e7-82f6-3863bb44ef7c',
    'name': 'Franklin St & W Broadway',
    'short_name': '5430.08',
    'lat': 40.71911552,
    'lon': -74.00666661,
    'region_id': 71,
    'rental_methods': ['KEY', 'CREDITCARD'],
    'capacity': 33,
    'rental_url': 'http://app.citibikenyc.com/S6Lr/IBV092JufD?station_id=79',
    'electric_bike_surcharge_waiver': False,
    'eightd_has_key_dispenser': False,
    'has_kiosk': True},
 

In [11]:
r = requests.get('https://gbfs.citibikenyc.com/gbfs/en/station_status.json')
r.status_code

200

Using the same steps as above (decoding and parsing in a single line), let's get hold of the station status:

In [14]:
station_status = ujson.loads(r.content.decode('utf-8'))
station_status

{'last_updated': 1554070289,
 'ttl': 10,
 'data': {'stations': [{'station_id': '72',
    'num_bikes_available': 49,
    'num_ebikes_available': 0,
    'num_bikes_disabled': 0,
    'num_docks_available': 6,
    'num_docks_disabled': 0,
    'is_installed': 1,
    'is_renting': 1,
    'is_returning': 1,
    'last_reported': 1554070069,
    'eightd_has_available_keys': False},
   {'station_id': '79',
    'num_bikes_available': 23,
    'num_ebikes_available': 1,
    'num_bikes_disabled': 1,
    'num_docks_available': 9,
    'num_docks_disabled': 0,
    'is_installed': 1,
    'is_renting': 1,
    'is_returning': 1,
    'last_reported': 1554069814,
    'eightd_has_available_keys': False},
   {'station_id': '82',
    'num_bikes_available': 22,
    'num_ebikes_available': 0,
    'num_bikes_disabled': 3,
    'num_docks_available': 2,
    'num_docks_disabled': 0,
    'is_installed': 1,
    'is_renting': 1,
    'is_returning': 1,
    'last_reported': 1554068368,
    'eightd_has_available_keys': Fa
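The two feeds share a `station_id` key: `station_information` holds mostly static metadata (name, location, capacity) and `station_status` holds the live counts. A natural next step is to join them. The sketch below uses two stations copied from the outputs above, keeping only the fields it needs; in the notebook you would use `station_info['data']['stations']` and `station_status['data']['stations']`. The variable names and the `fill_ratio` field are illustrative.

```python
# Stations copied from the station_information and station_status outputs above
# (only the fields used here are kept).
station_info_stations = [
    {"station_id": "72", "name": "W 52 St & 11 Ave", "capacity": 55},
    {"station_id": "79", "name": "Franklin St & W Broadway", "capacity": 33},
]
station_status_stations = [
    {"station_id": "72", "num_bikes_available": 49, "num_docks_available": 6},
    {"station_id": "79", "num_bikes_available": 23, "num_docks_available": 9},
]

# Index the static information by station_id for O(1) lookups.
info_by_id = {s["station_id"]: s for s in station_info_stations}

merged = []
for status in station_status_stations:
    info = info_by_id.get(status["station_id"])
    if info is None:
        continue  # status for a station with no metadata: skip it
    merged.append({
        "station_id": status["station_id"],
        "name": info["name"],
        "bikes": status["num_bikes_available"],
        "capacity": info["capacity"],
        "fill_ratio": status["num_bikes_available"] / info["capacity"],
    })

for row in merged:
    print(f"{row['name']}: {row['bikes']}/{row['capacity']} bikes ({row['fill_ratio']:.0%})")
```

Skipping unmatched IDs is one design choice; depending on the analysis you might instead keep them with missing metadata.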