## 1. Load the Non-Hourly Features Data

## a) Consumer Price Index

[BFS.admin.ch "LIK, Totalindex auf allen Indexbasen"](https://www.bfs.admin.ch/bfs/de/home/statistiken/preise/landesindex-konsumentenpreise.assetdetail.29065706.html)


In [17]:
import pandas as pd

# We downloaded the Excel file from BFS.admin.ch and loaded it into a pandas dataframe

cpi = pd.read_excel("Konsumentenpreise_CH.xlsx")

# Save file as csv

cpi.to_csv("data/cpi.csv")
cpi.tail()

Unnamed: 0,Datum / Date,1914-06-01 00:00:00,1939-08-01 00:00:00,1966-09-01 00:00:00,1977-09-01 00:00:00,1982-12-01 00:00:00,1993-05-01 00:00:00,2000-05-01 00:00:00,2005-12-01 00:00:00,2010-12-01 00:00:00,2015-12-01 00:00:00,2020-12-01 00:00:00,% m-1,% m-12
1229,2023-06-01,1094.9,798.1,353.3,209.5,168.1,121.4,114.4,108.7,104.3,107.2,106.3,0.1,1.7
1230,2023-07-01,1093.4,797.0,352.8,209.3,167.8,121.2,114.2,108.6,104.2,107.1,106.2,-0.1,1.6
1231,2023-08-01,1095.7,798.7,353.5,209.7,168.2,121.5,114.5,108.8,104.4,107.3,106.4,0.2,1.6
1232,2023-09-01,1094.9,798.1,353.3,209.5,168.1,121.4,114.4,108.7,104.3,107.2,106.3,-0.1,1.7
1233,2023-10-01,1095.9,798.8,353.6,209.7,168.2,121.5,114.5,108.8,104.4,107.3,106.4,0.1,1.7
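
Later steps usually need only one index base. The following is a minimal sketch of selecting it, assuming the column labels are parsed as timestamps (as the header above suggests). It uses a small hand-made frame instead of the real file; `cpi_demo`, `cpi_2020` and the new column names are hypothetical.

```python
import pandas as pd

# Toy frame mimicking the layout of the CPI output above
# (values copied from the last three rows shown; only two index bases kept).
cpi_demo = pd.DataFrame(
    {
        "Datum / Date": pd.to_datetime(["2023-08-01", "2023-09-01", "2023-10-01"]),
        pd.Timestamp("2015-12-01"): [107.3, 107.2, 107.3],
        pd.Timestamp("2020-12-01"): [106.4, 106.3, 106.4],
    }
)

# Keep the date column and the December 2020 base, then give both plain names.
base = pd.Timestamp("2020-12-01")
cpi_2020 = cpi_demo[["Datum / Date", base]].rename(
    columns={"Datum / Date": "date", base: "cpi_base_2020"}
)
print(cpi_2020)
```

If the real file yields string labels instead of timestamps, `base` would have to be the matching string.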


## b) Retail Trade Turnover

[BFS.admin.ch "Retail Trade Turnover Statistics"](https://www.bfs.admin.ch/bfs/en/home/statistics/industry-services/surveys/dhu.html)

In [22]:
from pyaxis import pyaxis

# We downloaded the .px file from BFS.admin.ch and parsed it into a pandas dataframe

rtt_path = "Retail_Trade_Turnover.px"

px = pyaxis.parse(uri=rtt_path, encoding='ISO-8859-2')

rtt = px["DATA"]

rtt["Monat"] = pd.to_datetime(rtt["Monat"], format="%YM%m")
rtt["DATA"] = pd.to_numeric(rtt["DATA"], errors='coerce')
rtt["DATA"] = rtt["DATA"].round(4)

# Only keep the group "Total Detailhandel ohne Tankstellen" (total retail trade excluding petrol stations)

rtt = rtt[rtt["Branche / Warengruppe"] == "47 (ohne 473): Total Detailhandel ohne Tankstellen"].copy()

rtt.to_csv("data/rtt.csv")
rtt

Unnamed: 0,Bereinigung,Indizes / Veränderungen,Nominal / Real,Branche / Warengruppe,Monat,DATA
285,Unbereinigt,Indizes,Nominal,47 (ohne 473): Total Detailhandel ohne Tankste...,2000-01-01,80.0496
286,Unbereinigt,Indizes,Nominal,47 (ohne 473): Total Detailhandel ohne Tankste...,2000-02-01,80.0300
287,Unbereinigt,Indizes,Nominal,47 (ohne 473): Total Detailhandel ohne Tankste...,2000-03-01,82.1719
288,Unbereinigt,Indizes,Nominal,47 (ohne 473): Total Detailhandel ohne Tankste...,2000-04-01,84.5639
289,Unbereinigt,Indizes,Nominal,47 (ohne 473): Total Detailhandel ohne Tankste...,2000-05-01,92.0505
...,...,...,...,...,...,...
64690,Trend,Veränderungen,Real,47 (ohne 473): Total Detailhandel ohne Tankste...,2023-05-01,-0.0774
64691,Trend,Veränderungen,Real,47 (ohne 473): Total Detailhandel ohne Tankste...,2023-06-01,-0.0709
64692,Trend,Veränderungen,Real,47 (ohne 473): Total Detailhandel ohne Tankste...,2023-07-01,-0.0695
64693,Trend,Veränderungen,Real,47 (ohne 473): Total Detailhandel ohne Tankste...,2023-08-01,-0.0719
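
The output above still mixes several series (adjustment, index versus change, nominal versus real), so each month appears more than once. The following is a minimal sketch of narrowing the frame to one value per month. The choice of the unadjusted nominal index is an assumption made for illustration, and `rtt_demo` and `rtt_series` are hypothetical names.

```python
import pandas as pd

# Toy frame with the same columns as the output above (values copied from it).
rtt_demo = pd.DataFrame(
    {
        "Bereinigung": ["Unbereinigt", "Unbereinigt", "Trend", "Trend"],
        "Indizes / Veränderungen": ["Indizes", "Indizes", "Veränderungen", "Veränderungen"],
        "Nominal / Real": ["Nominal", "Nominal", "Real", "Real"],
        "Monat": pd.to_datetime(["2000-01-01", "2000-02-01", "2023-07-01", "2023-08-01"]),
        "DATA": [80.0496, 80.0300, -0.0695, -0.0719],
    }
)

# Select exactly one combination of the three categorical columns.
mask = (
    (rtt_demo["Bereinigung"] == "Unbereinigt")
    & (rtt_demo["Indizes / Veränderungen"] == "Indizes")
    & (rtt_demo["Nominal / Real"] == "Nominal")
)
rtt_series = rtt_demo.loc[mask, ["Monat", "DATA"]].set_index("Monat")["DATA"]

# After the selection every month should occur only once.
assert rtt_series.index.is_unique
print(rtt_series)
```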


## c) Population

[Open Data Zurich "Bevölkerung nach Monat, Stadtquartier, Geschlecht, Altersgruppe und Herkunft, seit 1998"](https://data.stadt-zuerich.ch/dataset/bev_monat_bestand_quartier_geschl_ag_herkunft_od3250)

In [31]:
# We downloaded the population data directly as a CSV file from Open Data Zurich

population = pd.read_csv("bev325od3250.csv")
population.to_csv("data/population.csv")

population.tail()

Unnamed: 0,StichtagDatJahr,StichtagDatMM,StichtagDatMonat,StichtagDat,SexCd,SexLang,AlterV20ueber80Sort_noDM,AlterV20ueber80Cd_noDM,AlterV20ueber80Kurz_noDM,HerkunftCd,HerkunftLang,KreisCd,KreisLang,QuarCd,QuarLang,DatenstandCd,DatenstandLang,AnzBestWir
210071,2023,9,September,2023-09-30,2,weiblich,5,5,80 u. älter,2,Ausländer*in,11,Kreis 11,115,Oerlikon,P,Provisorisch,75
210072,2023,9,September,2023-09-30,2,weiblich,5,5,80 u. älter,2,Ausländer*in,11,Kreis 11,119,Seebach,P,Provisorisch,97
210073,2023,9,September,2023-09-30,2,weiblich,5,5,80 u. älter,2,Ausländer*in,12,Kreis 12,121,Saatlen,P,Provisorisch,21
210074,2023,9,September,2023-09-30,2,weiblich,5,5,80 u. älter,2,Ausländer*in,12,Kreis 12,122,Schwamendingen-Mitte,P,Provisorisch,47
210075,2023,9,September,2023-09-30,2,weiblich,5,5,80 u. älter,2,Ausländer*in,12,Kreis 12,123,Hirzenbach,P,Provisorisch,60
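
The population file holds one row per reference date, district, sex, age group and origin, so totals require an aggregation over `AnzBestWir`. The following is a minimal sketch of that aggregation using only the five rows shown above, so the sums are not real population totals. `population_demo`, `per_kreis` and `per_date` are hypothetical names.

```python
import pandas as pd

# The five rows from the output above, reduced to the columns needed here.
population_demo = pd.DataFrame(
    {
        "StichtagDat": pd.to_datetime(["2023-09-30"] * 5),
        "KreisLang": ["Kreis 11", "Kreis 11", "Kreis 12", "Kreis 12", "Kreis 12"],
        "QuarLang": ["Oerlikon", "Seebach", "Saatlen", "Schwamendingen-Mitte", "Hirzenbach"],
        "AnzBestWir": [75, 97, 21, 47, 60],
    }
)

# Sum the counts per reference date and Kreis ...
per_kreis = population_demo.groupby(["StichtagDat", "KreisLang"])["AnzBestWir"].sum()

# ... or per reference date only, giving one value per month.
per_date = population_demo.groupby("StichtagDat")["AnzBestWir"].sum()
print(per_kreis)
print(per_date)
```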


## d) Hotel Guests

[Open Data Zurich "Tourismus Stadt Zürich"](https://www.stadt-zuerich.ch/prd/de/index/statistik/themen/wirtschaft/tourismus/tourismus-stadt-zuerich.html)

In [36]:
# We downloaded the Excel file from the Stadt Zürich page linked above and loaded it into a pandas dataframe

hotel_guests = pd.read_excel("hotel_guests.xlsx")

# Save file as csv
hotel_guests.to_csv("data/hotel_guests.csv")

hotel_guests.tail()

Unnamed: 0,Jahr,Monat,Total,Inlandgäste,Auslandgäste,Anteil aus dem Ausland\n(in %)
365,1992,Mai,169410,26985,142425,84.1
366,1992,April,148734,25847,122887,82.6
367,1992,März,154771,30124,124647,80.5
368,1992,Februar,127861,24654,103207,80.7
369,1992,Januar,128417,27343,101074,78.7
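
The hotel data stores the year as a number and the month as a German name, and the rows run from newest to oldest. The following is a minimal sketch of building a proper date column and sorting chronologically. The mapping `MONTHS_DE` and the names `hotel_demo` and `date` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# German month names as they appear in the "Monat" column.
MONTHS_DE = {
    "Januar": 1, "Februar": 2, "März": 3, "April": 4,
    "Mai": 5, "Juni": 6, "Juli": 7, "August": 8,
    "September": 9, "Oktober": 10, "November": 11, "Dezember": 12,
}

# Toy frame with three rows copied from the output above.
hotel_demo = pd.DataFrame(
    {
        "Jahr": [1992, 1992, 1992],
        "Monat": ["März", "Februar", "Januar"],
        "Total": [154771, 127861, 128417],
    }
)

# Assemble year, month and day (first of the month) into a datetime column.
parts = pd.DataFrame(
    {
        "year": hotel_demo["Jahr"],
        "month": hotel_demo["Monat"].map(MONTHS_DE),
        "day": 1,
    }
)
hotel_demo["date"] = pd.to_datetime(parts)

# Sort from oldest to newest.
hotel_demo = hotel_demo.sort_values("date").reset_index(drop=True)
print(hotel_demo)
```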


## 2. Load the Hourly Features Data

## a) Sunshine

[Open Data Zurich "Stündlich aktualisierte Meteodaten, seit 1992"](https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte)


In [38]:
# We read the weather data directly as CSV files from Open Data Zurich

url_cs_2021 = "https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte/download/ugz_ogd_meteo_h1_2021.csv"
url_cs_2022 = "https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte/download/ugz_ogd_meteo_h1_2022.csv"
url_cs_2023 = "https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte/download/ugz_ogd_meteo_h1_2023.csv"

weather_2021 = pd.read_csv(
    url_cs_2021,
    sep=',',
    encoding='utf-8',
)
weather_2022 = pd.read_csv(
    url_cs_2022,
    sep=',',
    encoding='utf-8',
)
weather_2023 = pd.read_csv(
    url_cs_2023,
    sep=',',
    encoding='utf-8',
)

In [39]:
# Convert the column "Datum" to datetime for each dataframe.
# All stations are kept at this point; the rows are narrowed to the sunshine metric below.

for df in (weather_2021, weather_2022, weather_2023):
    df["Datum"] = pd.to_datetime(df["Datum"], format="%Y-%m-%dT%H:%M+0100")

# Concatenate the dataframes of all three years and keep global radiation (StrGlo, in W/m2) as the sunshine metric

concatenated_weather = pd.concat([weather_2021, weather_2022, weather_2023], ignore_index=True)
concatenated_weather = concatenated_weather.loc[concatenated_weather["Einheit"] == "W/m2"]
concatenated_weather = concatenated_weather.sort_values(by="Datum")

concatenated_weather.to_csv("data/weather.csv")
concatenated_weather.tail()

Unnamed: 0,Datum,Standort,Parameter,Intervall,Einheit,Wert,Status
373014,2023-12-08 11:00:00,Zch_Stampfenbachstrasse,StrGlo,h1,W/m2,84.66,provisorisch
373036,2023-12-08 12:00:00,Zch_Stampfenbachstrasse,StrGlo,h1,W/m2,64.78,provisorisch
373058,2023-12-08 13:00:00,Zch_Stampfenbachstrasse,StrGlo,h1,W/m2,54.95,provisorisch
373080,2023-12-08 14:00:00,Zch_Stampfenbachstrasse,StrGlo,h1,W/m2,30.86,provisorisch
373102,2023-12-08 15:00:00,Zch_Stampfenbachstrasse,StrGlo,h1,W/m2,17.28,provisorisch
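
A `.loc[...]` selection returns a new object and does not filter the dataframe in place, so a station filter only takes effect when its result is assigned. The following is a minimal sketch on a toy frame. The Rosengartenstrasse row is invented for illustration, and `weather_demo` and `station` are hypothetical names.

```python
import pandas as pd

# Toy frame in the layout of the raw weather files.
# The two Stampfenbachstrasse values come from the output above;
# the Rosengartenstrasse row is invented for illustration.
weather_demo = pd.DataFrame(
    {
        "Datum": ["2023-12-08T11:00+0100", "2023-12-08T11:00+0100", "2023-12-08T12:00+0100"],
        "Standort": ["Zch_Stampfenbachstrasse", "Zch_Rosengartenstrasse", "Zch_Stampfenbachstrasse"],
        "Einheit": ["W/m2", "°C", "W/m2"],
        "Wert": [84.66, 5.0, 64.78],
    }
)

# Without assignment the selection is computed and discarded.
weather_demo.loc[weather_demo["Standort"] == "Zch_Stampfenbachstrasse"]
assert len(weather_demo) == 3  # nothing was removed

# Assigning the result (and copying it) actually keeps only one station.
station = weather_demo.loc[weather_demo["Standort"] == "Zch_Stampfenbachstrasse"].copy()
station["Datum"] = pd.to_datetime(station["Datum"], format="%Y-%m-%dT%H:%M+0100")
print(station)
```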


## b) Traffic

[Open Data Zurich "Daten der Verkehrszählung zum motorisierten Individualverkehr (Stundenwerte), seit 2012"](https://data.stadt-zuerich.ch/dataset/sid_dav_verkehrszaehlung_miv_od2031)

In [49]:
# We downloaded the CSV file from Open Data Zurich and loaded it into a pandas dataframe


traffic = pd.read_csv("traffic2021-2023.csv")

traffic["MessungDatZeit"] = pd.to_datetime(traffic["MessungDatZeit"], format="%Y-%m-%d %H:%M:%S")

traffic.to_csv("data/traffic.csv")
traffic.head()

Unnamed: 0,MessungDatZeit,AnzFahrzeuge
0,2021-01-01 00:00:00,24624.0
1,2021-01-01 01:00:00,36322.0
2,2021-01-01 02:00:00,27735.0
3,2021-01-01 03:00:00,18375.0
4,2021-01-01 04:00:00,13232.0
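
Since the traffic counts are meant to be hourly, it can be useful to check that no hour is missing. The following is a minimal sketch of such a check on a toy frame from which the 02:00 row was deliberately left out. `traffic_demo`, `expected` and `missing` are hypothetical names, and the check is not part of the original notebook.

```python
import pandas as pd

# Toy frame with values from the output above; 02:00 is left out on purpose.
traffic_demo = pd.DataFrame(
    {
        "MessungDatZeit": pd.to_datetime(
            ["2021-01-01 00:00:00", "2021-01-01 01:00:00", "2021-01-01 03:00:00"]
        ),
        "AnzFahrzeuge": [24624.0, 36322.0, 18375.0],
    }
)

# Build the full hourly range between the first and last timestamp.
expected = pd.date_range(
    traffic_demo["MessungDatZeit"].min(),
    traffic_demo["MessungDatZeit"].max(),
    freq=pd.Timedelta(hours=1),
)

# Timestamps that should exist but are absent from the data.
missing = expected.difference(pd.DatetimeIndex(traffic_demo["MessungDatZeit"]))
print(missing)
```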
