# Dublin Bike CA 1

## 1 Data Loading Notebook


Ronan Downes  | November 2022 
***


This notebook downloads the publicly available quarterly Dublin Bikes occupancy CSV files and a station GPS file directly from the [Smart Dublin](https://data.smartdublin.ie/dataset/dublinbikes-api) website. It also downloads the hourly weather data for Phoenix Park (weather station 175) from [Met Éireann](https://www.met.ie/climate/available-data/historical-data), the Irish meteorological service. All datasets are saved to a "data" folder, which the code creates inside the folder containing this Jupyter notebook.
This avoids filename errors and any need for operating-system-specific path handling.
***

In [1]:
# from IPython.display import Image
# Image(filename =r'bike_sharing.gif', width = 600, height = 300)

In [3]:
# Set up the required libraries and the data directory
import os
import urllib.request  # import the submodule explicitly; a bare "import urllib" does not guarantee it is loaded

import pandas as pd

# Downloading data directly from the source URLs avoids cross-platform and end-user errors
os.makedirs("data", exist_ok=True)

### 1.1 Load Bike Data

Retrieve Dublin Bikes data from [Smart Dublin](https://data.smartdublin.ie/dataset/dublinbikes-api). This study is restricted to the period after the COVID-19 lockdowns, but a fork of the project can easily be adapted to any range of available data by commenting or uncommenting entries in the `dataframes` list of tuples.
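As an alternative to commenting entries in and out, the date range could be selected with a filter over the list of tuples. The sketch below is illustrative only: `select_quarters` and the placeholder `example.org` URLs are hypothetical, and it assumes the local filenames follow the `data/YYYY_Qn.csv` pattern used in this notebook.

```python
import re


def select_quarters(entries, first, last):
    """Keep (filename, url) tuples whose quarter lies between first and last, inclusive.

    first and last are (year, quarter) tuples, e.g. (2021, 1).
    Assumes filenames follow the data/YYYY_Qn.csv pattern.
    """
    pattern = re.compile(r"(\d{4})_Q([1-4])\.csv$")
    selected = []
    for filename, url in entries:
        match = pattern.search(filename)
        if match is None:
            continue  # skip entries that do not follow the naming pattern
        period = (int(match.group(1)), int(match.group(2)))
        if first <= period <= last:
            selected.append((filename, url))
    return selected


# Placeholder URLs for illustration; the real list uses the Smart Dublin resource links.
sample_entries = [
    ("data/2020_Q4.csv", "https://example.org/dublinbikes_20201001_20210101.csv"),
    ("data/2021_Q1.csv", "https://example.org/dublinbikes_20210101_20210401.csv"),
    ("data/2021_Q2.csv", "https://example.org/dublinbikes_20210401_20210701.csv"),
]

post_lockdown = select_quarters(sample_entries, first=(2021, 1), last=(2021, 4))
print([filename for filename, _ in post_lockdown])
```

With a filter like this, the full list of quarters can stay uncommented and the research scope is set in one place.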

In [None]:

# Define a list of (local filename, download URL) tuples for the urllib requests below.
# The os library checks whether each quarterly CSV file is already in place; if not, urllib retrieves it.
# Comment or uncomment entries to match the date range of the research question.
    
dataframes = [
#      (
#         "data/2018_Q3.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/9496fac5-e4d7-4ae9-a49a-217c7c4e83d9/download/dublinbikes_20180701_20181001.csv",
#     ),
#      (
#         "data/2018_Q4.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/67ea095f-67ad-47f5-b8f7-044743043848/download/dublinbikes_20181001_20190101.csv",
#     ),
#     (
#         "data/2019_Q1.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/538165d7-535e-4e1d-909a-1c1bfae901c5/download/dublinbikes_20190101_20190401.csv",
#     ),
#     (
#         "data/2019_Q2.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/76fdda3d-d8be-441b-92dd-0ee36d9c5316/download/dublinbikes_20190401_20190701.csv",
#     ),
#     (
#         "data/2019_Q3.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/305d39ac-b6a0-4216-a535-0ae2ddf59819/download/dublinbikes_20190701_20191001.csv",
#     ),
#     (
#         "data/2019_Q4.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/5d23332e-4f49-4c41-b6a0-bffb77b33d64/download/dublinbikes_20191001_20200101.csv",
#     ),
#     (
#         "data/2020_Q1.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/aab12e7d-547f-463a-86b1-e22002884587/download/dublinbikes_20200101_20200401.csv",
#     ),
#     (
#         "data/2020_Q2.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/8ddaeac6-4caf-4289-9835-cf588d0b69e5/download/dublinbikes_20200401_20200701.csv",
#     ),
#     (
#         "data/2020_Q3.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/99a35442-6878-4c2d-8dff-ec43e91d21d7/download/dublinbikes_20200701_20201001.csv",
#     ),
#     (
#         "data/2020_Q4.csv",
#         "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/5328239f-bcc6-483d-9c17-87166efc3a1a/download/dublinbikes_20201001_20210101.csv",
#     ),
    
    # Scope of this research is post COVID lockdown and restrictions
    (
        "data/2021_Q1.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/7987ddc8-674a-4368-b344-560804771b98/download/dublinbikes_20210101_20210401.csv",
    ),
    (
        "data/2021_Q2.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/1c18f219-3885-403e-aa55-4d4c78ee0204/download/dublinbikes_20210401_20210701.csv",
    ),
    (
        "data/2021_Q3.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/91ccfcb7-0c5b-41e4-be1b-e5d35c609638/download/dublinbikes_20210701_20211001.csv",
    ),
    (
        "data/2021_Q4.csv",
        "https://data.smartdublin.ie/dataset/33ec9fe2-4957-4e9a-ab55-c5e917c7a9ab/resource/5bc73751-4280-4423-b64d-18f4cc17986d/download/dublinbikes_20211001_20220101.csv",
    ),
]

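# Download any quarterly file that is not already in the data directory
for filename, url in dataframes:
    if os.path.exists(filename):
        continue
    print(f"Downloading {filename} from {url}")
    urllib.request.urlretrieve(url, filename)

# Combine the quarterly files into one dataframe with a single continuous index
df = pd.concat([pd.read_csv(filename) for filename, _ in dataframes], ignore_index=True)

# Save the combined data once; delete this file to force a rebuild after changing the list above
if not os.path.exists("data/01_Loaded_Bikes.csv"):
    df.to_csv("data/01_Loaded_Bikes.csv", index=False)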


In [None]:
df.tail(3)  # Check that the data decoded correctly. Analysis starts in the next notebook
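The check-then-download step used in this section could be wrapped in a small reusable helper. This is a minimal sketch rather than the notebook's own code: `download_if_missing` and the `.part` temporary suffix are hypothetical names, chosen so that an interrupted download is not mistaken for a complete file on the next run.

```python
import os
import urllib.request


def download_if_missing(url, filename):
    """Fetch url to filename unless the file already exists.

    Returns True if a download happened and False if the existing copy was reused.
    """
    if os.path.exists(filename):
        return False
    # Download to a temporary name first so an interrupted transfer
    # does not leave a partial file that later runs would treat as complete.
    partial = filename + ".part"
    urllib.request.urlretrieve(url, partial)
    os.replace(partial, filename)
    return True
```

It would be called once per tuple, for example `download_if_missing(url, filename)` inside a loop over the list of quarterly files.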

### 1.2 Load Weather Data 

Retrieve hourly historical weather data for the Phoenix Park weather station (175). Filtering to this station on the website of [Met Éireann](https://www.met.ie/climate/available-data/historical-data), the Irish meteorological service, is straightforward and gives the URL used below. Note: the filename hly175.csv indicates that the data is recorded hourly at station 175. The leading 15 rows of the file contain the data dictionary and are skipped during import.

In [6]:
# Download the weather CSV only if it is not already in the "data" directory

weather_path = "data/hly175.csv"

def retrieve_weather():
    """Download the hourly station 175 file to weather_path."""
    weather_url = "http://cli.fusio.net/cli/climate_data/webdata/hly175.csv"
    urllib.request.urlretrieve(weather_url, weather_path)

if not os.path.exists(weather_path):
    retrieve_weather()

# Skip the 15-row data dictionary that precedes the column header
dfw = pd.read_csv(weather_path, skiprows=15)
if not os.path.exists("data/01_Loaded_Weather.csv"):
    dfw.to_csv("data/01_Loaded_Weather.csv", index=False)


In [None]:
dfw.head()  # Check that the data decoded correctly. Analysis starts in the next notebook
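The number of rows to skip is hard-coded above, so it would silently break if the length of the data dictionary changed. One possible safeguard is to locate the header row before reading. The sketch below is an assumption-laden illustration: `find_header_row` is a hypothetical helper, the sample text is synthetic and not the real file content, and it assumes the header row is the first line whose first comma-separated field is exactly `date`.

```python
import io


def find_header_row(lines, first_column="date"):
    """Return the zero-based index of the first line whose first field is first_column."""
    for index, line in enumerate(lines):
        if line.strip().split(",")[0].lower() == first_column:
            return index
    raise ValueError(f"No header row starting with {first_column!r} found")


# Synthetic stand-in for the top of a weather file: a short preamble, then header and data.
sample = io.StringIO(
    "Station Name: EXAMPLE\n"
    "date:  -  Date and Time (utc)\n"
    "rain:  -  Precipitation Amount (mm)\n"
    "\n"
    "date,ind,rain\n"
    "01-jan-2021 00:00,0,0.0\n"
)
header_index = find_header_row(sample)
print(header_index)
```

The returned index could then be passed as `skiprows` to `pd.read_csv` in place of the fixed value, after confirming the assumption about the header row against the downloaded file.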

### 1.3 Load Modes of Travel Data in DCC

In [None]:

# pandas assumes UTF-8 when reading a CSV, but Notepad++ reports this file as ANSI.
# Python's "ansi" codec exists only on Windows. "cp1252" (Windows-1252) is assumed here to be
# the code page Notepad++ means by ANSI, and it is available on every platform.
mode_2006_url = "https://opendata.dublincity.ie/TrafficOpenData/Transport/ModeofTravel2006.csv"
dft = pd.read_csv(mode_2006_url, encoding="cp1252")
dft.to_csv("data/01_Loaded_Travel_2006.csv", index=False)
dft  # Check for correct decoding. Only 4 rows, so no need for .head(), .tail() or .sample(). Analysis starts in the next notebook

In [None]:
# pandas assumes UTF-8 when reading a CSV, but Notepad++ reports this file as ANSI.
# Python's "ansi" codec exists only on Windows. "cp1252" (Windows-1252) is assumed here to be
# the code page Notepad++ means by ANSI, and it is available on every platform.
mode_2011_url = "https://opendata.dublincity.ie/TrafficOpenData/Transport/ModeofTravel2011.csv"
dfT = pd.read_csv(mode_2011_url, encoding="cp1252")
dfT.to_csv("data/01_Loaded_Travel_2011.csv", index=False)
dfT  # Check for correct decoding. Only 4 rows, so no need for .head(), .tail() or .sample(). Analysis starts in the next notebook
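When the encoding of a file is uncertain, one option is to try UTF-8 first and fall back to a legacy code page only if decoding fails. The following is a minimal sketch of that idea, not part of the original notebook: `decode_with_fallback` is a hypothetical helper, the sample bytes are made up, and Windows-1252 is assumed as the fallback.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first encoding that succeeds; return (text, encoding)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"None of {encodings} could decode the data")


# "É" is the single byte 0xC9 in Windows-1252; followed by an ASCII letter
# it is not a valid UTF-8 sequence, so the UTF-8 attempt fails.
raw = "Mode,Éire\n".encode("cp1252")
text, used = decode_with_fallback(raw)
print(used, repr(text))
```

The order of the candidates matters: Windows-1252 accepts almost any byte sequence, so it should come after the stricter UTF-8 attempt.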

In [None]:
### 1.4 Release dataframes to free up memory.
df = None
dfw = None
dft = None
dfT = None

## Open 02_Prep_Notebook next