# Dublin Bike CA 1

## 1 Data Loading Notebook


Ronan Downes  | November 2022 
***


This notebook downloads the publicly available quarterly Dublin Bikes occupancy CSV files directly from the [Smart Dublin](https://data.smartdublin.ie/dataset/dublinbikes-api) website (the station GPS download in section 1.3 is kept for reference but disabled). It also downloads hourly weather data for Phoenix Park (weather station 175) from the website of [Met Éireann](https://www.met.ie/climate/available-data/historical-data), the Irish Meteorological Service. All datasets are saved to a "data" folder, which the code creates inside the folder containing this Jupyter notebook.
Downloading directly from the source avoids errors in filenames, and using relative paths avoids any need for operating-system-specific pathname manipulation.
***

In [1]:
# from IPython.display import Image
# Image(filename =r'bike_sharing.gif', width = 600, height = 300)

In [2]:
# Set up the required libraries and the data directory
import os
import urllib.request  # `import urllib` alone does not reliably expose urllib.request

import pandas as pd

# Downloading data directly from the source URLs avoids cross-platform and end-user errors
os.makedirs("data", exist_ok=True)


### 1.1 Load Bike Data

Retrieve the Dublin Bikes data from [Smart Dublin](https://data.smartdublin.ie/dataset/dublinbikes-api). This study is restricted to the post-COVID-19-lockdown era, but a fork of the project can easily be adapted to any range of available data by commenting or uncommenting tuples in the `dataframes` list below.
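If the date range changes often, the list could instead hold every quarter and be filtered by year rather than edited by hand. A minimal sketch, assuming the filenames keep the `data/YYYY_Qn.csv` pattern used in this notebook; `select_quarters` is a hypothetical helper and the URLs are placeholders:

```python
import os


def select_quarters(sources, first_year, last_year):
    """Return the (path, url) pairs whose filename year lies in [first_year, last_year]."""
    selected = []
    for path, url in sources:
        # Assumes filenames like "data/2021_Q1.csv": the first 4 characters are the year
        year = int(os.path.basename(path)[:4])
        if first_year <= year <= last_year:
            selected.append((path, url))
    return selected


# Placeholder URLs for illustration only; the real links are in the cell below.
sources = [
    ("data/2020_Q4.csv", "https://example.org/2020_Q4.csv"),
    ("data/2021_Q1.csv", "https://example.org/2021_Q1.csv"),
    ("data/2021_Q2.csv", "https://example.org/2021_Q2.csv"),
]
print([path for path, _ in select_quarters(sources, 2021, 2021)])
```

Keeping every tuple active and filtering by year means the list never has to be re-commented to change the study period.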

In [3]:
# `dataframes` is a list of (local filename, download URL) tuples.
# The loop below skips any quarterly CSV already saved in "data" and otherwise
# downloads it with urllib.request.urlretrieve, saving it under the tuple's filename.
# Restrict the date range by commenting tuples in or out.
dataframes = [
#     (
#         "data/2018_Q3.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/9496fac5-e4d7-4ae9-a49a-217c7c4e83d9/download/dublinbikes_20180701_20181001.csv",
#     ),
#     (
#         "data/2018_Q4.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/67ea095f-67ad-47f5-b8f7-044743043848/download/dublinbikes_20181001_20190101.csv",
#     ),
#     (
#         "data/2019_Q1.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/538165d7-535e-4e1d-909a-1c1bfae901c5/download/dublinbikes_20190101_20190401.csv",
#     ),
#     (
#         "data/2019_Q2.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/76fdda3d-d8be-441b-92dd-0ee36d9c5316/download/dublinbikes_20190401_20190701.csv",
#     ),
#     (
#         "data/2019_Q3.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/305d39ac-b6a0-4216-a535-0ae2ddf59819/download/dublinbikes_20190701_20191001.csv",
#     ),
#     (
#         "data/2019_Q4.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/5d23332e-4f49-4c41-b6a0-bffb77b33d64/download/dublinbikes_20191001_20200101.csv",
#     ),
#     (
#         "data/2020_Q1.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/aab12e7d-547f-463a-86b1-e22002884587/download/dublinbikes_20200101_20200401.csv",
#     ),
#     (
#         "data/2020_Q2.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/8ddaeac6-4caf-4289-9835-cf588d0b69e5/download/dublinbikes_20200401_20200701.csv",
#     ),
#     (
#         "data/2020_Q3.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/99a35442-6878-4c2d-8dff-ec43e91d21d7/download/dublinbikes_20200701_20201001.csv",
#     ),
#     (
#         "data/2020_Q4.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/5328239f-bcc6-483d-9c17-87166efc3a1a/download/dublinbikes_20201001_20210101.csv",
#     ),

    # Scope of this research is post-COVID lockdown and restrictions
    (
        "data/2021_Q1.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/7987ddc8-674a-4368-b344-560804771b98/download/dublinbikes_20210101_20210401.csv",
    ),
    (
        "data/2021_Q2.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/1c18f219-3885-403e-aa55-4d4c78ee0204/download/dublinbikes_20210401_20210701.csv",
    ),
    (
        "data/2021_Q3.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/91ccfcb7-0c5b-41e4-be1b-e5d35c609638/download/dublinbikes_20210701_20211001.csv",
    ),
    (
        "data/2021_Q4.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/5bc73751-4280-4423-b64d-18f4cc17986d/download/dublinbikes_20211001_20220101.csv",
    ),
]

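# Download any quarterly CSV that is not already present in "data"
for path, url in dataframes:
    if os.path.exists(path):
        continue
    print(f"Downloading {path} from {url}")
    urllib.request.urlretrieve(url, path)

# Combine the selected quarters into one dataframe and cache it to disk.
# An existing cache is not overwritten, so delete it after changing the selection.
bikes_csv = "data/01_Loaded_Bikes.csv"
df = pd.concat([pd.read_csv(path) for path, _ in dataframes])

if not os.path.exists(bikes_csv):
    df.to_csv(bikes_csv, index=False)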
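The check-then-download pattern in the cell above (and in `retrieve_weather` in 1.2) could be factored into one helper. A minimal sketch; `download_if_missing` is a hypothetical name, and unlike the original cells it also creates the target folder if needed:

```python
import os
import urllib.request


def download_if_missing(path, url):
    """Fetch `url` to `path` unless the file already exists; return True if it downloaded."""
    if os.path.exists(path):
        return False
    # Create the parent folder (e.g. "data") if it does not exist yet
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    print(f"Downloading {path} from {url}")
    urllib.request.urlretrieve(url, path)
    return True
```

In the loop above, each tuple could then be handled with `download_if_missing(path, url)`. Returning a flag makes it easy to log which files were actually fetched on a given run.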
In [4]:
df.tail(3)  # Analysis starts in the next notebook

|  | STATION ID | TIME | LAST UPDATED | NAME | BIKE STANDS | AVAILABLE BIKE STANDS | AVAILABLE BIKES | STATUS | ADDRESS | LATITUDE | LONGITUDE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2776082 | 507 | 2022-01-01 23:45:02 | 2021-11-18 07:11:16 | ORIEL STREET TEST TERMINAL | 1 | 0 | 1 | Open | JCDecaux Ireland, 52 Oriel Street Lower, Dublin 1 | 53.35463 | -6.242615 |
| 2776083 | 507 | 2022-01-01 23:50:02 | 2021-11-18 07:11:16 | ORIEL STREET TEST TERMINAL | 1 | 0 | 1 | Open | JCDecaux Ireland, 52 Oriel Street Lower, Dublin 1 | 53.35463 | -6.242615 |
| 2776084 | 507 | 2022-01-01 23:55:02 | 2021-11-18 07:11:16 | ORIEL STREET TEST TERMINAL | 1 | 0 | 1 | Open | JCDecaux Ireland, 52 Oriel Street Lower, Dublin 1 | 53.35463 | -6.242615 |
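The index values at the left of the output above (such as 2776082) are row positions within the last quarterly file read (here 2021_Q4), because `pd.concat` keeps each input frame's own index unless `ignore_index=True` is passed. A small illustration with made-up frames:

```python
import pandas as pd

# Two toy "quarterly" frames, each with its own 0-based RangeIndex (values are made up)
q1 = pd.DataFrame({"STATION ID": [2, 3], "AVAILABLE BIKES": [5, 7]})
q2 = pd.DataFrame({"STATION ID": [2, 3], "AVAILABLE BIKES": [4, 6]})

kept = pd.concat([q1, q2])                            # index repeats: 0, 1, 0, 1
renumbered = pd.concat([q1, q2], ignore_index=True)   # index runs 0, 1, 2, 3

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Since the combined file is written with `index=False`, the repeated index never reaches `01_Loaded_Bikes.csv`; it only matters for label-based lookups such as `df.loc[...]` inside this notebook.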


### 1.2 Load Weather Data 

Retrieve historical hourly weather data for the Phoenix Park weather station (175). Selecting this station and the hourly option on the website of [Met Éireann](https://www.met.ie/climate/available-data/historical-data), the Irish Meteorological Service, gives the URL used below; the filename hly175.csv indicates hourly data from station 175. The first 15 rows of the file are a data dictionary describing the columns, so they are skipped on import.
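Hard-coding `skiprows=15` assumes the legend length never changes. A sketch of an alternative that locates the header line itself, assuming the header is the first line beginning with `date,` (as the column names in the output further below suggest); `find_header_row` and `read_met_text` are hypothetical helpers:

```python
import io

import pandas as pd


def find_header_row(lines, first_column="date"):
    """Return the 0-based index of the first line that starts the column header."""
    for number, line in enumerate(lines):
        if line.strip().lower().startswith(first_column + ","):
            return number
    raise ValueError(f"no header line starting with {first_column!r} found")


def read_met_text(text, first_column="date"):
    """Parse Met Éireann-style CSV text, skipping whatever legend precedes the header."""
    lines = text.splitlines(keepends=True)
    start = find_header_row(lines, first_column)
    return pd.read_csv(io.StringIO("".join(lines[start:])))


# Placeholder legend lines followed by two data rows taken from the output further below
sample = (
    "legend line 1 (placeholder)\n"
    "legend line 2 (placeholder)\n"
    "date,ind,rain,temp\n"
    "31-oct-2022 20:00,0,0.0,12.4\n"
    "31-oct-2022 21:00,0,2.3,10.6\n"
)
weather = read_met_text(sample)
print(weather)
```

With the real file this might be called as `read_met_text(Path("data/hly175.csv").read_text())` after `from pathlib import Path`; the file's text encoding is an assumption worth checking.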

In [6]:
# Download the Phoenix Park hourly file only if it is not already in "data"
weather_csv = "data/hly175.csv"

def retrieve_weather():
    weather_url = "http://cli.fusio.net/cli/climate_data/webdata/hly175.csv"
    urllib.request.urlretrieve(weather_url, weather_csv)

if not os.path.exists(weather_csv):
    retrieve_weather()

# Skip the 15-line data dictionary at the top of the file
dfw = pd.read_csv(weather_csv, skiprows=15)
if not os.path.exists("data/01_Loaded_Weather.csv"):
    dfw.to_csv("data/01_Loaded_Weather.csv", index=False)


In [7]:
dfw.tail()

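|  | date | ind | rain | ind.1 | temp | ind.2 | wetb | dewpt | vappr | rhum | msl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 168403 | 31-oct-2022 20:00 | 0 | 0.0 | 0 | 12.4 | 0 | 11.4 | 10.4 | 12.6 | 87 | 999.3 |
| 168404 | 31-oct-2022 21:00 | 0 | 2.3 | 0 | 10.6 | 0 | 10.3 | 10.0 | 12.3 | 96 | 998.5 |
| 168405 | 31-oct-2022 22:00 | 0 | 2.0 | 0 | 10.3 | 0 | 10.1 | 10.0 | 12.2 | 97 | 998.1 |
| 168406 | 31-oct-2022 23:00 | 0 | 3.1 | 0 | 10.1 | 0 | 9.9 | 9.7 | 12.1 | 98 | 997.6 |
| 168407 | 01-nov-2022 00:00 | 0 | 0.7 | 0 | 9.9 | 0 | 9.8 | 9.6 | 12.0 | 98 | 997.0 |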
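The `date` column above is text such as `31-oct-2022 20:00`. A sketch of parsing it into timestamps with `pd.to_datetime` and an explicit format, assuming English month abbreviations; time zones are not handled here, so check the file's legend for how times are recorded:

```python
import pandas as pd

# Two timestamps copied from the output above
raw = pd.Series(["31-oct-2022 23:00", "01-nov-2022 00:00"])

# %b matches abbreviated month names (case-insensitively, so "oct" and "nov" parse);
# an explicit format avoids day/month guessing and is faster on large columns.
parsed = pd.to_datetime(raw, format="%d-%b-%Y %H:%M")
print(parsed.tolist())
```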


### 1.3 Load Station GPS

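Retrieves the Dublin Bikes station GPS data from the [Smart Dublin](https://data.smartdublin.ie/dataset/dublinbikes-api) website. The cell is kept for reference but commented out as redundant; note that the quarterly occupancy files loaded in 1.1 already include `LATITUDE` and `LONGITUDE` for each station.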
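Because every row of the bike data carries `NAME`, `LATITUDE` and `LONGITUDE`, a station lookup table can plausibly be derived from it instead of a separate download. A sketch, assuming each station's coordinates do not change over the period (the check in the code tests that); station 507 is taken from the output in 1.1, and station 2 is a placeholder:

```python
import pandas as pd

# Toy rows in the shape of the bike data; station 2 and its coordinates are placeholders
bikes = pd.DataFrame({
    "STATION ID": [507, 507, 2],
    "NAME": ["ORIEL STREET TEST TERMINAL", "ORIEL STREET TEST TERMINAL", "PLACEHOLDER STATION"],
    "LATITUDE": [53.35463, 53.35463, 53.0],
    "LONGITUDE": [-6.242615, -6.242615, -6.0],
})

# Sanity check: each station should have exactly one latitude and one longitude
coords_constant = bool(
    (bikes.groupby("STATION ID")[["LATITUDE", "LONGITUDE"]].nunique() == 1).all().all()
)

# One row per station, keeping the first occurrence
stations = (
    bikes[["STATION ID", "NAME", "LATITUDE", "LONGITUDE"]]
    .drop_duplicates(subset="STATION ID")
    .sort_values("STATION ID")
    .reset_index(drop=True)
)
print(coords_constant)
print(stations)
```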

In [None]:
# Redundant, so commented out.
# gps_csv = "data/01_Loaded_GPS.csv"

# def retrieve_gps():
#     gps_url = "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/2dec86ed-76ed-47a3-ae28-646db5c5b965/download/dublin.csv"
#     urllib.request.urlretrieve(gps_url, gps_csv)

# if not os.path.exists(gps_csv):
#     retrieve_gps()  # Conditionally retrieves the GPS CSV if it is not in "data"

# dfg = pd.read_csv(gps_csv)
# dfg.sample(11)

In [8]:
# 2006 census travel-mode table; this file is not downloaded above and must already be in "data"
dft = pd.read_csv("data/01LoadedTravel2006.csv")
dft.head()

|  | 2006_Census | On_foot | Bicycle | Bus_minibus_coach | Train_DART_LUAS | Motorcycle_scooter | Car_Driver | Car_Passenger | Other | Not_stated | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dublin City | 90982 | 18028 | 63101 | 18138 | 2806 | 85128 | 24346 | 16381 | 10774 | 329684 |
| 1 | Dún Laoghaire-Rathdown | 17516 | 4995 | 15668 | 13629 | 1276 | 50180 | 19778 | 6813 | 1274 | 131129 |
| 2 | Fingal | 24561 | 3220 | 20332 | 16938 | 1318 | 69244 | 20520 | 9357 | 3342 | 168832 |
| 3 | South Dublin | 28469 | 4662 | 26246 | 3148 | 1888 | 71663 | 21452 | 10801 | 3230 | 171559 |


In [10]:
# 2011 census travel-mode table; this file is not downloaded above and must already be in "data"
dfT = pd.read_csv("data/01LoadedTravel2011.csv")
dfT.head()

|  | 2011_Census | On_foot | Bicycle | Bus_minibus_coach | Train_DART_LUAS | Motorcycle_scooter | Car_Driver | Car_Passenger | Van | Other | Not_stated | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dublin City | 89197 | 23265 | 55601 | 18175 | 1944 | 82619 | 25987 | 5231 | 5306 | 13352 | 320677 |
| 1 | Dún Laoghaire-Rathdown | 18450 | 6869 | 13908 | 15570 | 935 | 49525 | 19569 | 2225 | 3764 | 2654 | 133469 |
| 2 | Fingal | 27206 | 3925 | 20728 | 14753 | 1007 | 72058 | 25134 | 4811 | 3506 | 5325 | 178453 |
| 3 | South Dublin | 27765 | 4985 | 22941 | 3152 | 1239 | 68785 | 24468 | 5128 | 2412 | 4914 | 165789 |
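For the Dublin City rows, the mode columns in both census tables add up exactly to `Total`. The sketch below checks that property and then computes the share of the total recorded under `Bicycle`; `cycling_share` is a hypothetical helper, and only the Dublin City rows from the two outputs above are used:

```python
import pandas as pd


def cycling_share(table, total_col="Total", bike_col="Bicycle"):
    """Return bike_col / total_col per row, after checking the mode columns sum to the total."""
    # Numeric columns other than the total are treated as travel modes
    modes = table.drop(columns=[total_col]).select_dtypes("number")
    mismatch = modes.sum(axis=1) != table[total_col]
    if mismatch.any():
        raise ValueError(f"mode columns do not sum to {total_col!r} in rows {list(table.index[mismatch])}")
    return table[bike_col] / table[total_col]


# Dublin City rows copied from the 2006 and 2011 outputs above
dublin_2006 = pd.DataFrame([{
    "2006_Census": "Dublin City", "On_foot": 90982, "Bicycle": 18028,
    "Bus_minibus_coach": 63101, "Train_DART_LUAS": 18138, "Motorcycle_scooter": 2806,
    "Car_Driver": 85128, "Car_Passenger": 24346, "Other": 16381,
    "Not_stated": 10774, "Total": 329684,
}])
dublin_2011 = pd.DataFrame([{
    "2011_Census": "Dublin City", "On_foot": 89197, "Bicycle": 23265,
    "Bus_minibus_coach": 55601, "Train_DART_LUAS": 18175, "Motorcycle_scooter": 1944,
    "Car_Driver": 82619, "Car_Passenger": 25987, "Van": 5231, "Other": 5306,
    "Not_stated": 13352, "Total": 320677,
}])

for year, table in [("2006", dublin_2006), ("2011", dublin_2011)]:
    print(year, f"{cycling_share(table).iloc[0]:.1%}")
```

This prints 5.5% for 2006 and 7.3% for 2011. The sum check guards against a mis-read column before any share is computed.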


### 1.4 Release dataframes and free up memory

In [5]:

# Drop references so the dataframes can be garbage-collected
df = None
dfw = None
dft = None
dfT = None
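Setting a name to `None` (or using `del`) only frees a dataframe once nothing else refers to it; in Jupyter, IPython's output history (`Out`, `_`, `__`) can keep references to values that were displayed. A sketch for checking how much memory a frame uses before releasing it, using pandas' `DataFrame.memory_usage`; the frame here is illustrative:

```python
import gc

import pandas as pd

# Illustrative frame standing in for df, dfw, dft or dfT
frame = pd.DataFrame(
    {"NAME": ["ORIEL STREET TEST TERMINAL"] * 1000, "AVAILABLE BIKES": range(1000)}
)

# deep=True also counts the Python string objects held in object columns
size_mb = frame.memory_usage(deep=True).sum() / 1e6
print(f"frame uses about {size_mb:.2f} MB")

# Removing the last reference lets CPython free the frame; gc.collect() also clears
# reference cycles. The process may not hand the memory back to the OS immediately.
del frame
unreachable = gc.collect()
```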