# Ford GoBike System Data Flights Exploration
## by *Furawa*

## Data Wrangling

The Ford GoBike System Data Flights includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay area. We will use the data for the whole of 2018, which is split into 12 separate datasets (one per month).  
We will download all 12 datasets, combine them into a single dataset, assess it, and clean it where necessary.  
First, let us import the libraries needed for the process.  

In [26]:
# Import libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt  
import requests
import os
import zipfile
import re

%matplotlib inline

Let us build the URLs of the 12 monthly files hosted on the [Bay Wheels trip history data](https://s3.amazonaws.com/baywheels-data/index.html) page.

In [37]:
# Create a folder to store the downloaded files if it does not exist
folder_name = 'gobike_monthly_data'
os.makedirs(folder_name, exist_ok=True)
# Build the 12 monthly urls; the month is zero-padded to two digits (01 to 12)
base_url = 'https://s3.amazonaws.com/baywheels-data/2018{:02d}-fordgobike-tripdata.csv.zip'
baywheels_urls = [base_url.format(month) for month in range(1, 13)]
baywheels_urls

['https://s3.amazonaws.com/baywheels-data/201801-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201802-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201803-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201804-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201805-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201806-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201807-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201808-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201809-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201810-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201811-fordgobike-tripdata.csv.zip',
 'https://s3.amazonaws.com/baywheels-data/201812-fordgobike-tripdata.csv.zip']

Now we download all the zip files from the urls and store them in the created folder.  

In [48]:
# Download each zip file from the urls with a for loop
for url in baywheels_urls:
    response = requests.get(url)
    # Stop early if the server returns an error instead of the archive
    response.raise_for_status()
    # The file name is the last part of the url, after the final /
    with open(os.path.join(folder_name, url.split('/')[-1]), mode='wb') as file:
        file.write(response.content)
sorted(os.listdir(folder_name))

['201801-fordgobike-tripdata.csv.zip',
 '201802-fordgobike-tripdata.csv.zip',
 '201803-fordgobike-tripdata.csv.zip',
 '201804-fordgobike-tripdata.csv.zip',
 '201805-fordgobike-tripdata.csv.zip',
 '201806-fordgobike-tripdata.csv.zip',
 '201807-fordgobike-tripdata.csv.zip',
 '201808-fordgobike-tripdata.csv.zip',
 '201809-fordgobike-tripdata.csv.zip',
 '201810-fordgobike-tripdata.csv.zip',
 '201811-fordgobike-tripdata.csv.zip',
 '201812-fordgobike-tripdata.csv.zip']
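
Re-running the download cell fetches all 12 archives again. As a minimal sketch, assuming we only want to fetch files that are not already on disk, the helper below skips existing files and streams each archive to a temporary `.part` file before renaming it. The name `download_if_missing` is hypothetical, and the sketch uses the standard library's `urllib.request` in place of `requests`, so it is an alternative to the cell above rather than the method used in this notebook.

```python
import os
import shutil
import urllib.request


def download_if_missing(url, folder, timeout=60):
    """Download url into folder unless the file already exists; return its path.

    Hypothetical helper: the name and the .part convention are assumptions.
    """
    os.makedirs(folder, exist_ok=True)
    # The file name is the last part of the url, after the final /
    path = os.path.join(folder, url.split('/')[-1])
    if os.path.exists(path):
        # Already downloaded: skip the network call
        return path
    # Write to a temporary name first so that an interrupted download
    # is never mistaken for a complete file on the next run
    tmp_path = path + '.part'
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(tmp_path, mode='wb') as file:
        shutil.copyfileobj(response, file)
    os.replace(tmp_path, path)
    return path
```

It could be called as `for url in baywheels_urls: download_if_missing(url, folder_name)`. `urlopen` raises an `HTTPError` when the server answers with an error status, so a failed request does not leave an empty archive behind.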

All the zip files are in the folder, so we can proceed and unzip them.  

In [68]:
import glob
# Unzip each archive into the same folder as the zip files
files = glob.glob(os.path.join(folder_name, '*.zip'))
for file in files:
    with zipfile.ZipFile(file, 'r') as my_zip:
        my_zip.extractall(folder_name)

In [66]:
sorted(os.listdir(folder_name))

['201801-fordgobike-tripdata.csv',
 '201801-fordgobike-tripdata.csv.zip',
 '201802-fordgobike-tripdata.csv',
 '201802-fordgobike-tripdata.csv.zip',
 '201803-fordgobike-tripdata.csv',
 '201803-fordgobike-tripdata.csv.zip',
 '201804-fordgobike-tripdata.csv',
 '201804-fordgobike-tripdata.csv.zip',
 '201805-fordgobike-tripdata.csv',
 '201805-fordgobike-tripdata.csv.zip',
 '201806-fordgobike-tripdata.csv',
 '201806-fordgobike-tripdata.csv.zip',
 '201807-fordgobike-tripdata.csv',
 '201807-fordgobike-tripdata.csv.zip',
 '201808-fordgobike-tripdata.csv',
 '201808-fordgobike-tripdata.csv.zip',
 '201809-fordgobike-tripdata.csv',
 '201809-fordgobike-tripdata.csv.zip',
 '201810-fordgobike-tripdata.csv',
 '201810-fordgobike-tripdata.csv.zip',
 '201811-fordgobike-tripdata.csv',
 '201811-fordgobike-tripdata.csv.zip',
 '201812-fordgobike-tripdata.csv',
 '201812-fordgobike-tripdata.csv.zip']
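
Reading the listing by eye works for 12 files, but the same check can be done programmatically. The sketch below compares the members listed in each zip archive with the files present in the folder. The function name `missing_members` is hypothetical, and the sketch assumes the archives were extracted into the same folder that holds them.

```python
import os
import zipfile


def missing_members(folder):
    """Return (archive, member) pairs for archive members not found in folder.

    Hypothetical helper: assumes each zip was extracted into `folder` itself.
    """
    missing = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith('.zip'):
            continue
        with zipfile.ZipFile(os.path.join(folder, name)) as archive:
            # namelist() gives the paths stored inside the archive
            for member in archive.namelist():
                if not os.path.exists(os.path.join(folder, member)):
                    missing.append((name, member))
    return missing
```

Calling `missing_members(folder_name)` should return an empty list when every archive has been fully extracted. Any pair it does return names the archive to unzip again.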