# Testing
### Pedestrian counts on Bahnhofstrasse - hourly values

Only the excerpt in which I create an animated chart as a GIF.

https://data.stadt-zuerich.ch/dataset/hystreet_fussgaengerfrequenzen

Date: 12.07.2022


### Import the required packages

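In [1]:
```python
# datetime is part of the standard library and must not be pip-installed;
# pandas_alive and pivottablejs are imported further down
#%pip install geopandas altair fiona requests folium mplleaflet contextily seaborn plotly leafmap pandas_alive pivottablejs
```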


In [2]:
```python
import pandas as pd
import pivottablejs
from pivottablejs import pivot_ui
import numpy as np
import altair as alt
import matplotlib.pyplot as plt

import datetime
import geopandas as gpd
import folium
import plotly.express as px
import seaborn as sns
import leafmap

import requests
import io
```


In [3]:
```python
# SSL verification is disabled here, e.g. because the HTTPS connection to the data server
# fails behind a proxy. Set SSL_VERIFY = True to re-enable verification.
SSL_VERIFY = False
```

In [4]:
```python
if not SSL_VERIFY:
    # Silence urllib3's warnings about unverified HTTPS requests
    import urllib3
    urllib3.disable_warnings()
```

Define display settings, here the number format of float values (e.g. *'{:,.2f}'.format* with a comma as the thousands separator). The active setting below rounds floats to whole numbers.

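In [5]:
```python
# Alternative (commented out): whole numbers without decimals, other values with one decimal,
# both with a comma as thousands separator
#pd.options.display.float_format = lambda x : '{:,.1f}'.format(x) if (np.isnan(x) | np.isinf(x)) else '{:,.0f}'.format(x) if int(x) == x else '{:,.1f}'.format(x)
# Active setting: floats rounded to whole numbers, no thousands separator
pd.options.display.float_format = '{:.0f}'.format
pd.set_option('display.width', 100)
pd.set_option('display.max_columns', 15)
```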
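To see what the two format strings do, and as a sketch of applying a format only temporarily with `pd.option_context` instead of changing the global option (the sample values are illustrative):

```python
import pandas as pd

# The two format strings from the text, applied to single floats
print('{:,.2f}'.format(1234567.891))  # 1,234,567.89 (comma thousands separator, 2 decimals)
print('{:.0f}'.format(6331.4))        # 6331 (no separator, no decimals)

df = pd.DataFrame({'pedestrians_count': [6331.4, 3016.0]})

# option_context changes the display option only inside the with-block
with pd.option_context('display.float_format', '{:,.2f}'.format):
    print(df)
print(df)  # back to the global setting afterwards
```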

### Time variables
Determine the most recent month loaded. Here it is the data state from two months ago.
Also define a few other potentially useful time variables.

For the difference between `import datetime` and `from datetime import datetime`, see https://stackoverflow.com/questions/15707532/import-datetime-v-s-from-datetime-import-datetime

First, the time variables as strings:

In [6]:
```python
#today_date = datetime.date.today()
#date_time = datetime.datetime.strptime(date_time_string, '%Y-%m-%d %H:%M')
now = datetime.date.today()
date_today = now.strftime("%Y-%m-%d")
year_today = now.strftime("%Y")
month_today = now.strftime("%m")
day_today = now.strftime("%d")
```



And here the time variables as integers:
- `aktuellesJahr`: the current year
- `aktuellerMonat`: the current month right now
- `selectedMonat`: the most recent month in the data, usually two months ago

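In [7]:
```python
#now = datetime.now()
int_times = now.timetuple()

aktuellesJahr = now.year
aktuellerMonat = now.month
# Two months back; the modulo keeps the result in 1..12 in January and February
# (the year is not adjusted)
selectedMonat = (aktuellerMonat - 3) % 12 + 1

print(aktuellesJahr,
      aktuellerMonat,
      'datenstand: ', selectedMonat,
      int_times)
```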


2022 12 datenstand:  10 time.struct_time(tm_year=2022, tm_mon=12, tm_mday=12, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=346, tm_isdst=-1)
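Subtracting 2 from the month number only works from March onwards, and in January and February the data month also lies in the previous year. A minimal sketch that computes year and month together by counting months; the helper name `data_month` is hypothetical and the two-month lag is taken from the text above:

```python
import datetime

def data_month(today: datetime.date, lag_months: int = 2) -> tuple:
    """Return (year, month) lag_months before today, wrapping across year boundaries."""
    # Count months since year 0, step back, then split into year and 0-based month
    total = today.year * 12 + (today.month - 1) - lag_months
    year, month0 = divmod(total, 12)
    return year, month0 + 1

print(data_month(datetime.date(2022, 12, 12)))  # (2022, 10)
print(data_month(datetime.date(2023, 1, 15)))   # (2022, 11)
```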


### Set some path variables

- The package name is actually the **directory name** under which the data and metadata are stored on the Dropzone.
- For SASA processes it is defined on the **product SharePoint ([Link](https://kollaboration.intranet.stzh.ch/orga/ssz-produkte/Lists/SASA_Outputs/PersonalViews.aspx?PageView=Personal&ShowWebPart={6087A3E7-8AC8-40BA-8278-DECFACE124FF}))**.
- On CKAN the package name becomes part of the URL, so its exact spelling matters.

Note: all letters in the package name must be **lowercase**, because CKAN turns uppercase letters into lowercase.
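To catch spelling problems early, a small check can normalise the name and validate it. This is a sketch under an assumption: that CKAN accepts only lowercase ASCII letters, digits, `-` and `_`, with a length of 2 to 100 characters (check this against your CKAN instance). The helper `normalize_package_name` is hypothetical:

```python
import re

# Assumed CKAN rule: lowercase alphanumerics plus "-" and "_", 2 to 100 characters
CKAN_NAME_RE = re.compile(r"[a-z0-9_-]{2,100}")

def normalize_package_name(name: str) -> str:
    """Lowercase and strip a package name, then check it against the assumed CKAN rule."""
    candidate = name.strip().lower()
    if not CKAN_NAME_RE.fullmatch(candidate):
        raise ValueError(f"invalid CKAN package name: {name!r}")
    return candidate

print(normalize_package_name("Hystreet_Fussgaengerfrequenzen"))  # hystreet_fussgaengerfrequenzen
```

Umlauts and spaces are rejected rather than silently replaced, so the name on the Dropzone and in the CKAN URL stays identical.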

**PLEASE ADJUST HERE**

In [8]:
```python
package_name = "hystreet_fussgaengerfrequenzen"
```

In [9]:
```python
dataset_name = "hystreet_fussgaengerfrequenzen_seit2021.csv"
```

**Static paths in the DWH Dropzones**

In [10]:
```python
dropzone_path_integ = r"\\szh\ssz\applikationen\OGD_Dropzone\INT_DWH"
```

In [11]:
```python
dropzone_path_prod = r"\\szh\ssz\applikationen\OGD_Dropzone\DWH"
```

**Static CKAN URLs**

In [12]:
```python
ckan_integ_url = "https://data.integ.stadt-zuerich.ch/dataset/"
```

In [13]:
```python
ckan_prod_url = "https://data.stadt-zuerich.ch/dataset/"
```

### Check the metadata on the CKAN INTEG or PROD website

Variables apparently cannot be evaluated in Markdown cells at the moment, so we proceed as shown below (see https://data-dive.com/jupyterlab-markdown-cells-include-variables): instead of writing a Markdown cell, we create the Markdown from within a code cell and use Python string formatting to make the text dynamic.

In [14]:
```python
from IPython.display import Markdown as md
```

In [15]:
```python
md(" **1. Dataset on INTEG data catalog:** Link {} ".format(ckan_integ_url+package_name))
```

 **1. Dataset on INTEG data catalog:** Link https://data.integ.stadt-zuerich.ch/dataset/hystreet_fussgaengerfrequenzen 

In [16]:
```python
md(" **2. Dataset on PROD data catalog:** Link {} ".format(ckan_prod_url+package_name))
```

 **2. Dataset on PROD data catalog:** Link https://data.stadt-zuerich.ch/dataset/hystreet_fussgaengerfrequenzen 
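A variation of the same idea, as a sketch: build the Markdown string in a small helper and use link syntax so the URL renders as a clickable link. The helper name `dataset_link_md` is hypothetical:

```python
def dataset_link_md(label: str, base_url: str, package_name: str) -> str:
    """Return a Markdown line with a clickable link to a CKAN dataset page."""
    url = base_url + package_name
    return f"**{label}:** [{url}]({url})"

text = dataset_link_md("2. Dataset on PROD data catalog",
                       "https://data.stadt-zuerich.ch/dataset/",
                       "hystreet_fussgaengerfrequenzen")
print(text)
```

In the notebook, the result would be passed on as `md(dataset_link_md(..., ckan_prod_url, package_name))`.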

### Import a dataset

First define the following values:
1) Does the dataset come from PROD or INTEG?
2) Do you load the dataset directly from the DROPZONE or from the INTERNET?

In [17]:
```python
# The datasets are on the INT-DWH Dropzone only for testing; once the test is over, they are on PROD.
# status makes it easy to switch between the two
status = "prod"       # "prod", anything else means INTEG
data_source = "web"   # "dropzone", anything else means web (CKAN download URL)
print(status + " - " + data_source)
```

prod - web


In [18]:
```python
# Build the file path (Dropzone) or the download URL (web) from status and data_source
if status == "prod":
    if data_source == "dropzone":
        fp = dropzone_path_prod + "\\" + package_name + "\\" + dataset_name
    else:
        fp = ckan_prod_url + package_name + '/download/' + dataset_name
else:
    if data_source == "dropzone":
        fp = dropzone_path_integ + "\\" + package_name + "\\" + dataset_name
    else:
        fp = ckan_integ_url + package_name + '/download/' + dataset_name
print("fp is: " + fp)
```

fp is: https://data.stadt-zuerich.ch/dataset/hystreet_fussgaengerfrequenzen/download/hystreet_fussgaengerfrequenzen_seit2021.csv
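The same path logic can also be written with lookup tables instead of nested `if`s. A sketch, reusing the base paths and URLs defined above; `pathlib.PureWindowsPath` joins the UNC Dropzone path with backslashes even when the code runs on another operating system. The function name `build_fp` and the dict names are hypothetical:

```python
from pathlib import PureWindowsPath

DROPZONE = {"prod": r"\\szh\ssz\applikationen\OGD_Dropzone\DWH",
            "integ": r"\\szh\ssz\applikationen\OGD_Dropzone\INT_DWH"}
CKAN = {"prod": "https://data.stadt-zuerich.ch/dataset/",
        "integ": "https://data.integ.stadt-zuerich.ch/dataset/"}

def build_fp(status: str, data_source: str, package_name: str, dataset_name: str) -> str:
    """Return the Dropzone path or the CKAN download URL for a dataset."""
    env = "prod" if status == "prod" else "integ"  # anything other than "prod" means INTEG
    if data_source == "dropzone":
        return str(PureWindowsPath(DROPZONE[env]) / package_name / dataset_name)
    return CKAN[env] + package_name + "/download/" + dataset_name

print(build_fp("prod", "web", "hystreet_fussgaengerfrequenzen",
               "hystreet_fussgaengerfrequenzen_seit2021.csv"))
```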


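Note: SAS dates stored without a format are plain numbers; the commented-out `date_parser` line in the cell below shows how such a value could be converted into a datetime (see `https://stackoverflow.com/questions/26923564/convert-sas-numeric-to-python-datetime`). In this dataset, the `timestamp` column is parsed directly via `parse_dates`.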
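SAS counts dates as days since 1 January 1960. A sketch of the conversion with pandas, using `origin` instead of a hand-written `date_parser` (which is deprecated since pandas 2.0); the column name `sas_date` and its values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"sas_date": [0, 1, 366]})  # hypothetical SAS day counts

# Interpret the numbers as days and count them from the SAS epoch 1960-01-01
df["date"] = pd.to_datetime(df["sas_date"], unit="D", origin="1960-01-01")
print(df)
```

Since 1960 is a leap year, day 366 falls on 1961-01-01.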

In [19]:
```python
# Read the data
if data_source == "dropzone":
    data2betested = pd.read_csv(
        fp
        , sep=','
        , parse_dates=['timestamp']
        , low_memory=False
    )
    print("dropzone")
else:
    # Download via requests so that SSL verification follows SSL_VERIFY
    r = requests.get(fp, verify=SSL_VERIFY)
    r.raise_for_status()  # stop early on HTTP errors such as 404
    r.encoding = 'utf-8'
    data2betested = pd.read_csv(
        io.StringIO(r.text)
        , parse_dates=['timestamp']
        # CONVERT THE SAS DATE INTO A UNIX DATE AND FORMAT IT (not needed for this dataset)
        #, date_parser=lambda s: epoch + datetime.timedelta(days=int(s))
        , low_memory=False)
    print("web")

data2betested.dtypes
```

web


timestamp                          datetime64[ns, UTC]
location_id                                      int64
location_name                                   object
ltr_label                                       object
rtl_label                                       object
weather_condition                               object
temperature                                    float64
pedestrians_count                                int64
unverified                                        bool
ltr_pedestrians_count                            int64
rtl_pedestrians_count                            int64
adult_pedestrians_count                          int64
child_pedestrians_count                          int64
adult_ltr_pedestrians_count                      int64
adult_rtl_pedestrians_count                      int64
child_ltr_pedestrians_count                      int64
child_rtl_pedestrians_count                      int64
zone_1_pedestrians_count                       float64
zone_1_ltr


Compute additional attributes if needed:

In [21]:
```python
data2betested = (
    data2betested
    .copy()
    .assign(
        #Aktualisierungs_Datum_str= lambda x: x.Aktualisierungs_Datum.astype(str),
        timestamp_str = lambda x: x.timestamp.astype(str),              # full timestamp as text
        day_str = lambda x: x.timestamp.dt.strftime("%Y-%m-%d %H:%M"),  # date and hour as text
        hour_str = lambda x: x.timestamp.dt.strftime("%H:%M"),          # hour only
        weekday = lambda x: x.timestamp.dt.dayofweek,                   # 0 = Monday ... 6 = Sunday
        weekday_name = lambda x: x.timestamp.dt.day_name(),
        year = lambda x: x.timestamp.dt.year,
        month = lambda x: x.timestamp.dt.month,
        day = lambda x: x.timestamp.dt.day
    )
    .sort_values('timestamp', ascending=False)  # newest rows first
)
data2betested.columns
#data2betested.dtypes
data2betested
```

| | timestamp | location_id | location_name | ltr_label | rtl_label | weather_condition | temperature | ... | day_str | hour_str | weekday | weekday_name | year | month | day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31643 | 2022-12-12 09:00:00+00:00 | 330 | Bahnhofstrasse (Süd) | Bürkliplatz | Hauptbahnhof | cloudy | -6 | ... | 2022-12-12 09:00 | 09:00 | 0 | Monday | 2022 | 12 | 12 |
| 31642 | 2022-12-12 09:00:00+00:00 | 331 | Bahnhofstrasse (Nord) | Bürkliplatz | Hauptbahnhof | cloudy | -6 | ... | 2022-12-12 09:00 | 09:00 | 0 | Monday | 2022 | 12 | 12 |
| 31641 | 2022-12-12 09:00:00+00:00 | 329 | Bahnhofstrasse (Mitte) | Hauptbahnhof | Bürkliplatz | cloudy | -6 | ... | 2022-12-12 09:00 | 09:00 | 0 | Monday | 2022 | 12 | 12 |
| 31640 | 2022-12-12 08:00:00+00:00 | 330 | Bahnhofstrasse (Süd) | Bürkliplatz | Hauptbahnhof | cloudy | -7 | ... | 2022-12-12 08:00 | 08:00 | 0 | Monday | 2022 | 12 | 12 |
| 31639 | 2022-12-12 08:00:00+00:00 | 331 | Bahnhofstrasse (Nord) | Bürkliplatz | Hauptbahnhof | cloudy | -7 | ... | 2022-12-12 08:00 | 08:00 | 0 | Monday | 2022 | 12 | 12 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3 | 2021-09-28 23:00:00+00:00 | 329 | Bahnhofstrasse (Mitte) | Hauptbahnhof | Bürkliplatz | cloudy | 13 | ... | 2021-09-28 23:00 | 23:00 | 1 | Tuesday | 2021 | 9 | 28 |
| 5 | 2021-09-28 23:00:00+00:00 | 330 | Bahnhofstrasse (Süd) | Bürkliplatz | Hauptbahnhof | cloudy | 13 | ... | 2021-09-28 23:00 | 23:00 | 1 | Tuesday | 2021 | 9 | 28 |
| 2 | 2021-09-28 22:00:00+00:00 | 330 | Bahnhofstrasse (Süd) | Bürkliplatz | Hauptbahnhof | cloudy | 14 | ... | 2021-09-28 22:00 | 22:00 | 1 | Tuesday | 2021 | 9 | 28 |
| 1 | 2021-09-28 22:00:00+00:00 | 331 | Bahnhofstrasse (Nord) | Bürkliplatz | Hauptbahnhof | cloudy | 14 | ... | 2021-09-28 22:00 | 22:00 | 1 | Tuesday | 2021 | 9 | 28 |
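The timestamps are stored in UTC (`+00:00`), so `hour_str` and `weekday_name` in the table describe UTC time; in winter, Zurich local time is one hour ahead. If local hours are wanted for labels, one option is to convert the column first. A sketch with one timestamp taken from the table; `Europe/Zurich` is the standard IANA time-zone name:

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2022-12-12 09:00:00+00:00"], utc=True))

# Convert from UTC to Swiss local time before deriving text labels
local = ts.dt.tz_convert("Europe/Zurich")
print(ts.dt.strftime("%H:%M")[0], "UTC")       # 09:00 UTC
print(local.dt.strftime("%H:%M")[0], "local")  # 10:00 local (CET, UTC+1)
print(local.dt.day_name()[0], local.dt.dayofweek[0])  # Monday 0
```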


### Use the date as index

Although we already parsed the `timestamp` column into a datetime type, it is currently just a regular column.
**To enable quick and convenient queries and aggregations, we turn it into the index of the DataFrame.**

In [22]:
```python
data2betested = data2betested.set_index("timestamp")
```

In [23]:
```python
data2betested.info()
data2betested.index.year.unique()
```

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 31644 entries, 2022-12-12 09:00:00+00:00 to 2021-09-28 22:00:00+00:00
Data columns (total 44 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   location_id                      31644 non-null  int64  
 1   location_name                    31644 non-null  object 
 2   ltr_label                        31644 non-null  object 
 3   rtl_label                        31644 non-null  object 
 4   weather_condition                31644 non-null  object 
 5   temperature                      31644 non-null  float64
 6   pedestrians_count                31644 non-null  int64  
 7   unverified                       31644 non-null  bool   
 8   ltr_pedestrians_count            31644 non-null  int64  
 9   rtl_pedestrians_count            31644 non-null  int64  
 10  adult_pedestrians_count          31644 non-null  int64  
 11  child_pedestrians_count          

Int64Index([2022, 2021], dtype='int64', name='timestamp')
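With the `DatetimeIndex` in place, rows can be selected with partial date strings in `.loc`. A sketch on hypothetical hourly data stored newest-first like `data2betested`; sorting the index ascending first makes sure the slice runs from the earlier to the later bound:

```python
import pandas as pd

# Hypothetical hourly data in UTC, newest row first
idx = pd.date_range("2022-11-25 00:00", "2022-12-12 09:00", freq=pd.offsets.Hour(), tz="UTC")[::-1]
df = pd.DataFrame({"pedestrians_count": range(len(idx))}, index=idx)

# Partial strings: from 26 Nov 12:00 up to the end of 12 Dec (the whole day is included)
window = df.sort_index().loc["2022-11-26 12:00":"2022-12-12"]
print(window.index.min(), window.index.max(), len(window))
```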

### Test with animated charts

Inspiration from:
 - https://jackmckew.dev/creating-animated-plots-with-pandas_alive.html and the GitHub page: https://github.com/JackMcKew/pandas_alive
 - Parameter details: https://jackmckew.github.io/pandas_alive/_modules/pandas_alive/plotting.html
 
The default figsize is (6.5, 3.5), in inches; the size in pixels is inches × dpi. A calculator for inches to pixels: https://www.ninjaunits.com/converters/pixels/inches-pixels/#:~:text=To%20convert%20inches%20to%20pixels%2C%20you%20have%20to%20multiply%20inches,pixels%20on%20a%20computer%20screen.
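The inch-to-pixel conversion is a single multiplication by `dpi`, so a calculator is not strictly needed. A minimal sketch with two hypothetical helper names; it shows, for example, that `figsize=(8.53, 4.8)` at `dpi=150` (used below) gives roughly 1280 × 720 px:

```python
def figsize_to_pixels(figsize, dpi):
    """Return (width_px, height_px) for a figsize given in inches."""
    return figsize[0] * dpi, figsize[1] * dpi

def pixels_to_figsize(width_px, height_px, dpi):
    """Return the figsize in inches that yields the given pixel size."""
    return width_px / dpi, height_px / dpi

print(figsize_to_pixels((6.5, 3.5), 100))   # (650.0, 350.0)
print(figsize_to_pixels((8.53, 4.8), 150))  # roughly 1280 x 720 px
print(pixels_to_figsize(1920, 1080, 150))   # (12.8, 7.2)
```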

In [24]:
```python
data2betested.columns
```

Index(['location_id', 'location_name', 'ltr_label', 'rtl_label', 'weather_condition',
       'temperature', 'pedestrians_count', 'unverified', 'ltr_pedestrians_count',
       'rtl_pedestrians_count', 'adult_pedestrians_count', 'child_pedestrians_count',
       'adult_ltr_pedestrians_count', 'adult_rtl_pedestrians_count', 'child_ltr_pedestrians_count',
       'child_rtl_pedestrians_count', 'zone_1_pedestrians_count', 'zone_1_ltr_pedestrians_count',
       'zone_1_rtl_pedestrians_count', 'zone_1_adult_pedestrians_count',
       'zone_1_child_pedestrians_count', 'zone_2_pedestrians_count',
       'zone_2_ltr_pedestrians_count', 'zone_2_rtl_pedestrians_count',
       'zone_2_adult_pedestrians_count', 'zone_2_child_pedestrians_count',
       'zone_3_pedestrians_count', 'zone_3_ltr_pedestrians_count', 'zone_3_rtl_pedestrians_count',
       'zone_3_adult_pedestrians_count', 'zone_3_child_pedestrians_count',
       'zone_99_pedestrians_count', 'zone_99_ltr_pedestrians_count',
       'zone_99_r

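In [25]:
```python
# Period from 26 November 2022, 12:00 up to today. The index is sorted newest-first,
# so sort_index() first makes it ascending and the slice runs from the earlier to the later date.
myAnimation = data2betested.sort_index().loc["2022-11-26 12":date_today]
#myAnimation= myAnimation[['timestamp_str', 'location_name', 'pedestrians_count', 'ltr_pedestrians_count', 'rtl_pedestrians_count']].query('location_name== "Bahnhofstrasse (Mitte)"').rename(columns=rename_cols)
# Keep only the needed columns and drop the location Bahnhofstrasse (Nord)
myAnimation = myAnimation[['timestamp_str', 'location_name', 'pedestrians_count', 'ltr_pedestrians_count', 'rtl_pedestrians_count']].query('location_name != "Bahnhofstrasse (Nord)"')
myAnimation.head(3)
myAnimation.columns
```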

Index(['timestamp_str', 'location_name', 'pedestrians_count', 'ltr_pedestrians_count',
       'rtl_pedestrians_count'],
      dtype='object')

In [26]:
```python
#myAnimation_wide=pd.pivot_table(myAnimation, values=['pedestrians_count','ltr_pedestrians_count', 'rtl_pedestrians_count'], index='timestamp', columns='location_name')
# Wide format: one row per hour, one column per location
myAnimation_wide = pd.pivot_table(myAnimation, values='pedestrians_count', index='timestamp', columns='location_name')
myAnimation_wide.head(2)
```

| timestamp | Bahnhofstrasse (Mitte) | Bahnhofstrasse (Süd) |
|---|---|---|
| 2022-11-26 12:00:00+00:00 | 6331 | 3016 |
| 2022-11-26 13:00:00+00:00 | 7220 | 3459 |
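`pivot_table` turns the long table (one row per timestamp and location) into a wide table with one column per location, which is the shape from which `plot_animated` draws one line per column. A minimal sketch using the first two hours from the output above:

```python
import pandas as pd

long_df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-11-26 12:00", "2022-11-26 12:00",
                                 "2022-11-26 13:00", "2022-11-26 13:00"], utc=True),
    "location_name": ["Bahnhofstrasse (Mitte)", "Bahnhofstrasse (Süd)"] * 2,
    "pedestrians_count": [6331, 3016, 7220, 3459],
})

# aggfunc defaults to the mean, which leaves single values per cell unchanged
wide = pd.pivot_table(long_df, values="pedestrians_count", index="timestamp", columns="location_name")
print(wide)

# DataFrame.pivot reshapes the same way but raises if a (timestamp, location) pair occurs twice
wide_strict = long_df.pivot(index="timestamp", columns="location_name", values="pedestrians_count")
```

`pivot` is the stricter choice when duplicates would indicate a data error; `pivot_table` silently averages them.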


In [27]:
```python
myLabelEvents = {'1. Sonntagsverkauf': datetime.datetime.strptime("27/11/2022", "%d/%m/%Y"),
                 '2. Sonntagsverkauf': datetime.datetime.strptime("04/12/2022", "%d/%m/%Y")
                }

print(myLabelEvents)
```

{'1. Sonntagsverkauf': datetime.datetime(2022, 11, 27, 0, 0), '2. Sonntagsverkauf': datetime.datetime(2022, 12, 4, 0, 0)}
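The event dates are tz-naive, whereas the index of `myAnimation_wide` is in UTC. pandas refuses ordering comparisons between tz-naive and tz-aware timestamps with a `TypeError`, so it may help to make the events tz-aware before combining them with the UTC data. A sketch, assuming the dates mean midnight Zurich local time; whether `pandas_alive`'s `label_events` actually requires this is not confirmed here:

```python
import datetime
import pandas as pd

myLabelEvents = {'1. Sonntagsverkauf': datetime.datetime.strptime("27/11/2022", "%d/%m/%Y"),
                 '2. Sonntagsverkauf': datetime.datetime.strptime("04/12/2022", "%d/%m/%Y")}

# Assumption: the naive dates are local midnight in Zurich; convert them to UTC like the index
events_utc = {label: pd.Timestamp(dt).tz_localize("Europe/Zurich").tz_convert("UTC")
              for label, dt in myLabelEvents.items()}

index = pd.DatetimeIndex(["2022-11-26 12:00", "2022-12-12 09:00"], tz="UTC")  # first/last hour
try:
    myLabelEvents['1. Sonntagsverkauf'] < index[0]
except TypeError as err:
    print("naive vs aware:", err)

# Once both sides are tz-aware, the comparison works
print({label: index[0] <= ts <= index[-1] for label, ts in events_utc.items()})
```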


In [28]:
```python
import os
import pandas_alive

# Make sure the output folder exists before the GIF is written
os.makedirs('./output_hystreet', exist_ok=True)

myTitle = "Passantenfrequenzen seit dem 26. November 2022"
myZeitformat = '%a: %H:00 Uhr'

# Line chart without a fixed maximum (fixed_max=False)
myAnimation_wide.plot_animated(filename='./output_hystreet/hystreet_xmas_shopping_bhfstr_mitte_sued_notFixedMax.gif'
                               , kind='line'
                               , steps_per_period=5
                               , period_length=100
                               , period_fmt=myZeitformat
                               , title=myTitle
                               , cmap='dark24'
                               , tick_label_size=8
                               , period_label={'x': 0.75, 'y': 0.6}
                               , period_summary_func=None
                               , fixed_max=False
                               , figsize=(8.53, 4.8)
                               , dpi=150
                               , writer=None
                               , enable_progress_bar=True
                               , line_width=2
                               , fill_under_line_color=None
                               , add_legend=True
                               #, label_events={'1. Sonntagsverkauf':datetime.datetime.strptime("27/11/2022", "%d/%m/%Y"),'2. Sonntagsverkauf':datetime.datetime.strptime("04/12/2022", "%d/%m/%Y")}
                               #, kwargs={}
                               #, fig=<Figure size 936x504 with 1 Axes>
                              )
print('terminée')
```

Generating LineChart, plotting ['Bahnhofstrasse (Mitte)', 'Bahnhofstrasse (Süd)']


  0%|          | 0/1906 [00:00<?, ?it/s]

terminée


In [29]:
```python
# With fixed maximum (fixed_max=True), large version: figsize=(8.53, 4.8) at dpi=150
myZeitformat = "%H Uhr: %a %d.%m.%y"

myAnimation_wide.plot_animated(filename='./output_hystreet/hystreet_xmas_shopping_bhfstr_mitte_sued_FixedMax_gross.gif'
                               , kind='line'
                               , steps_per_period=5
                               , period_length=100
                               , period_fmt=myZeitformat
                               , title=myTitle
                               , cmap='dark24'
                               , tick_label_size=9
                               , period_label={'x': 0.35, 'y': 0.7}
                               , period_summary_func=None
                               , fixed_max=True
                               , dpi=150
                               , figsize=(8.53, 4.8)
                               , writer=None
                               , enable_progress_bar=True
                               , line_width=2
                               , fill_under_line_color=None
                               , add_legend=True
                               #, label_events={'1. Sonntagsverkauf':datetime.datetime.strptime("27/11/2022", "%d/%m/%Y"),'2. Sonntagsverkauf':datetime.datetime.strptime("04/12/2022", "%d/%m/%Y")}
                               #, kwargs={}
                               #, fig=<Figure size 936x504 with 1 Axes>
                              )
print('terminée')
```

Generating LineChart, plotting ['Bahnhofstrasse (Mitte)', 'Bahnhofstrasse (Süd)']


  0%|          | 0/1906 [00:00<?, ?it/s]

terminée


In [40]:
```python
# With fixed maximum (fixed_max=True), small version: dpi=144 and the default figsize
myZeitformat = "%H Uhr: %a %d.%m.%y"

myAnimation_wide.plot_animated(filename='./output_hystreet/hystreet_xmas_shopping_bhfstr_mitte_sued_FixedMax_klein.gif'
                               , kind='line'
                               , steps_per_period=5
                               , period_length=100
                               , period_fmt=myZeitformat
                               , title=myTitle
                               , cmap='dark24'
                               , tick_label_size=7
                               , period_label={'x': 0.35, 'y': 0.7}
                               , period_summary_func=None
                               , fixed_max=True
                               , dpi=144
                               , writer=None
                               , enable_progress_bar=True
                               , line_width=2
                               , fill_under_line_color=None
                               , add_legend=True
                               #, label_events={'1. Sonntagsverkauf':datetime.datetime.strptime("27/11/2022", "%d/%m/%Y"),'2. Sonntagsverkauf':datetime.datetime.strptime("04/12/2022", "%d/%m/%Y")}
                               #, kwargs={}
                               #, fig=<Figure size 936x504 with 1 Axes>
                               #, figsize=array([6.5, 3.5])
                              )
print('terminée')
```

Generating LineChart, plotting ['Bahnhofstrasse (Mitte)', 'Bahnhofstrasse (Süd)']


  0%|          | 0/1201 [00:00<?, ?it/s]

terminée
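The three cells above repeat the same arguments and differ only in the file name, `period_fmt`, `tick_label_size`, `period_label`, `fixed_max`, `dpi` and `figsize`. A sketch of keeping the shared arguments in one dict and the variants in a list; the names `common_kwargs` and `variants` are hypothetical, and the `plot_animated` call is left commented out so the block runs without `pandas_alive`:

```python
myTitle = "Passantenfrequenzen seit dem 26. November 2022"
out = './output_hystreet/hystreet_xmas_shopping_bhfstr_mitte_sued_'

# Arguments shared by all three animations
common_kwargs = dict(kind='line', steps_per_period=5, period_length=100, title=myTitle,
                     cmap='dark24', period_summary_func=None, writer=None,
                     enable_progress_bar=True, line_width=2,
                     fill_under_line_color=None, add_legend=True)

# Only the arguments that differ between the three GIFs
variants = [
    dict(filename=out + 'notFixedMax.gif', period_fmt='%a: %H:00 Uhr', tick_label_size=8,
         period_label={'x': 0.75, 'y': 0.6}, fixed_max=False, figsize=(8.53, 4.8), dpi=150),
    dict(filename=out + 'FixedMax_gross.gif', period_fmt='%H Uhr: %a %d.%m.%y', tick_label_size=9,
         period_label={'x': 0.35, 'y': 0.7}, fixed_max=True, figsize=(8.53, 4.8), dpi=150),
    dict(filename=out + 'FixedMax_klein.gif', period_fmt='%H Uhr: %a %d.%m.%y', tick_label_size=7,
         period_label={'x': 0.35, 'y': 0.7}, fixed_max=True, dpi=144),
]

for variant in variants:
    kwargs = {**common_kwargs, **variant}  # variant values win on duplicate keys
    print(kwargs['filename'])
    # myAnimation_wide.plot_animated(**kwargs)
```

A new variant then only needs its own small dict.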


-------------------------------

### Test via matplotlib animation
 - https://towardsdatascience.com/how-to-create-animated-graphs-in-python-bb619cc2dec1


In [31]:
```python
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.animation as animation
```

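In [32]:
```python
# This must point to the ffmpeg executable itself. 'C:Users/...' lacks the slash after the
# drive letter and names a package folder, which is likely why ffmpeg does not appear
# in the list of available writers below.
plt.rcParams['animation.ffmpeg_path'] = 'C:Users/sszsim/.conda/pkgs/ffmpeg'
#plt.rcParams
```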

In [33]:
```python
# Which writers are available?
myWriters = animation.writers.list()
myWriters
```

['pillow', 'html']

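In [34]:
```python
# Initialize a writer that uses Pillow (GIF output) and records at 20 fps.
# The bitrate setting is aimed at video encoders such as ffmpeg; Pillow's GIF output does not use it.
Writer = animation.writers['pillow']
writer = Writer(fps=20, metadata=dict(artist='Me'), bitrate=1800)
```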
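The `writer` object created above is not used in the cells below. As a sketch of how such a writer can be used directly, independent of `FuncAnimation`: `saving()` opens the output file and `grab_frame()` appends the current state of the figure as one frame. The sine data and the file name `sine_demo.gif` are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no window needed
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np
from PIL import Image

writer = animation.PillowWriter(fps=20, metadata=dict(artist='Me'))

fig, ax = plt.subplots(figsize=(4, 2), dpi=100)
x = np.linspace(0, 2 * np.pi, 100)
line, = ax.plot(x, np.sin(x))

# saving() sets up the writer and writes the GIF when the block ends
with writer.saving(fig, "sine_demo.gif", dpi=100):
    for phase in np.linspace(0, 2 * np.pi, 20):
        line.set_ydata(np.sin(x + phase))  # update the existing line instead of re-plotting
        writer.grab_frame()                # append the current figure as one frame
plt.close(fig)

with Image.open("sine_demo.gif") as gif:
    print(gif.n_frames, "frames")
```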

---------
Test based on https://matplotlib.org/stable/api/_as_gen/matplotlib.animation.FuncAnimation.html

In [None]:
```python
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
plt.rcParams['animation.ffmpeg_path'] = 'ffmpeg'

fig = plt.figure()
ax = fig.add_subplot(111)
#div = make_axes_locatable(ax)
#cax = div.append_axes('right', '5%', '5%')
data = np.random.rand(5, 5)
im = ax.imshow(data)
#cb = fig.colorbar(im, cax=cax)
tx = ax.set_title('Frame 0')
```

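In [None]:
```python
cmap = ["copper", 'RdBu_r', 'Oranges', 'cividis', 'hot', 'plasma']

def animate(i):
    # Update the existing image instead of stacking a new imshow() on every frame
    im.set_data(np.random.rand(5, 5))
    im.set_cmap(cmap[i % len(cmap)])
    tx.set_text('Frame {0}'.format(i))

ani = animation.FuncAnimation(fig, animate, frames=10)
# ffmpeg was not found above (only 'pillow' and 'html' are listed), so fall back to a GIF
if animation.FFMpegWriter.isAvailable():
    ani.save('hystreet_plot.mp4', writer=animation.FFMpegWriter())
else:
    ani.save('hystreet_plot.gif', writer=animation.PillowWriter(fps=5))
```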


---------