# VisDa8: Jupyter Notebooks. Interactive notebook tutorial.

## How to participate in the tutorial.
Click on this link to open the tutorial notebook: <br>
https://mybinder.org/v2/gh/startbit96/visda_tutorial/f5e43f6e083bce50ef26bbb6907c409774c82dd4
<br><br>

## Introduction.

### What is a Jupyter Notebook?
The [Jupyter Notebook](https://jupyter.org/) is an open-source web application that allows you to create and share documents that contain live code, equations, visualisations and narrative text. <br>
Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualisation, machine learning, and much more. 
<br><br>

### What software do I need to work with Jupyter Notebooks?
To work with Jupyter Notebooks in Python on your own machine, the simplest route is to install Anaconda. <br>
Anaconda is an open-source distribution of the programming languages Python and R, which aims to simplify package management and software deployment. <br>
You can install Anaconda by following the instructions on the [website](https://docs.anaconda.com/anaconda/install/). <br>
<br>
A standard Anaconda installation already includes Jupyter Notebook. <br>
If it is missing, you can install it by following the instructions on the [Jupyter website](https://jupyter.org/install). <br>
<br>
If you are a more advanced user with Python already installed and prefer to manage your packages manually, you can install Jupyter with [pip](https://pypi.org/project/jupyter/) instead. <br>
<br>
For this tutorial we provide a browser-based version, so you don't have to install anything.
<br><br>

### Basics of working with a Jupyter Notebook.
A notebook is organised as a sequence of cells, each containing either code or text, and each cell can be executed separately. <br>
This allows you to display intermediate results directly or to optimise individual sections of code without having to re-run the entire notebook each time. <br>
You can also insert images or tables into the document. Double click on this cell to see how it works and then press `Shift + Enter` to run this cell again! <br>

<img src="https://www.dlr.de/content/de/bilder/institute/datenwissenschaften/dw-institutsgebaeude.jpg?__blob=normal&v=10__ifc1920w" 
alt="DLR Institut für Datenwissenschaften Jena"
style="width:800px;">

Before we start programming, here are a few **useful shortcuts** for using Jupyter notebooks: <br>

`Shift + Enter` run the current cell and select the cell below. <br>
`Ctrl + Enter` run the selected cells. <br>

While in **command mode** (press `Esc` to activate): <br>

`Enter` switch to **edit mode**. <br>
`a` insert cell above. <br>
`b` insert cell below. <br>
`dd` delete selected cells. <br>
`z` undo cell deletion. <br>
`y` change the cell type to Code. <br>
`m` change the cell type to Markdown (text). <br>

## Get the data from Deutscher Wetterdienst.

In [1]:
# Laines / Joshua.

## Interactive data visualisation.

In [2]:
# Pandas for managing data in dataframes.
import pandas as pd

# Plotly for visualisation.
import plotly.express as px

# Further imports.
import numpy as np

### Import the data to be analysed.

In [3]:
# First file: stations, their station_id and their location.
filepath = './stations_information.csv'
# Import the csv-file to a pandas dataframe and show some rows.
df_stations = pd.read_csv(filepath, header=0, index_col=0)
df_stations

Unnamed: 0,station_id,station_name,station_state,lon,lat,alt
0,3,Aachen,Nordrhein-Westfalen,50.7827,6.0941,202
1,44,Großenkneten,Niedersachsen,52.9336,8.2370,44
2,52,Ahrensburg-Wulfsdorf,Schleswig-Holstein,53.6623,10.1990,46
3,71,Albstadt-Badkap,Baden-Württemberg,48.2156,8.9784,759
4,73,Aldersbach-Kriestorf,Bayern,48.6159,13.0506,340
...,...,...,...,...,...,...
657,15207,Schauenburg-Elgershausen,Hessen,51.2835,9.3590,317
658,15444,Ulm-Mähringen,Baden-Württemberg,48.4418,9.9216,593
659,15555,Kaufbeuren-Oberbeuren,Bayern,47.8761,10.5849,815
660,19171,Hasenkrug-Hardebek,Schleswig-Holstein,54.0038,9.8553,13
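
In the rows shown above, the `lon` column holds values between roughly 48 and 54, while `lat` holds values between roughly 6 and 13. For stations in Germany this suggests that the two labels are swapped in the file. The following is a minimal sketch of one way to correct the labels in pandas, using a few rows copied from the output above. The names `df_sample` and `df_fixed` are hypothetical and not part of the tutorial, which keeps the labels as found in the file.

```python
import pandas as pd

# A few rows copied from the output above (column labels as found in the file).
df_sample = pd.DataFrame({
    "station_id": [3, 44, 52],
    "station_name": ["Aachen", "Großenkneten", "Ahrensburg-Wulfsdorf"],
    "lon": [50.7827, 52.9336, 53.6623],
    "lat": [6.0941, 8.2370, 10.1990],
})

# Swap the two labels so that 'lat' holds latitudes and 'lon' holds longitudes.
df_fixed = df_sample.rename(columns={"lon": "lat", "lat": "lon"})

# Rough plausibility check for Germany: latitudes 47 to 55, longitudes 5 to 16.
plausible = bool(
    df_fixed["lat"].between(47, 55).all() and df_fixed["lon"].between(5, 16).all()
)
print(df_fixed)
print("Coordinates plausible:", plausible)
```

Renaming only changes the labels; the values themselves are left untouched.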


In [10]:
# Second file: measured data for different stations.
filepath = './data_hourly_2020_10.csv'
# Import the csv-file to a pandas dataframe and show some rows.
df_measurements = pd.read_csv(filepath, header=0, index_col=0)
df_measurements

Unnamed: 0,STATIONS_ID,MESS_DATUM,QN_9,TT_TU,RF_TU
0,44,2020-10-01 00:00:00,1,13.5,88.0
1,44,2020-10-01 00:00:01,1,12.7,90.0
2,44,2020-10-01 00:00:02,1,11.6,94.0
3,44,2020-10-01 00:00:03,1,11.5,94.0
4,44,2020-10-01 00:00:04,1,11.4,94.0
...,...,...,...,...,...
366919,19172,2020-10-31 00:00:19,1,12.3,88.0
366920,19172,2020-10-31 00:00:20,1,12.0,86.0
366921,19172,2020-10-31 00:00:21,1,11.5,88.0
366922,19172,2020-10-31 00:00:22,1,11.8,86.0
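
Unless told otherwise, `pd.read_csv` loads `MESS_DATUM` as plain text. The following is a minimal sketch of converting such a column to a datetime type, using timestamps copied from the output above. `df_sample` is a hypothetical stand-in for `df_measurements`, and the format string is an assumption based on the values shown.

```python
import pandas as pd

# Hypothetical stand-in for df_measurements with three rows from the output above.
df_sample = pd.DataFrame({
    "STATIONS_ID": [44, 44, 44],
    "MESS_DATUM": [
        "2020-10-01 00:00:00",
        "2020-10-01 00:00:01",
        "2020-10-01 00:00:02",
    ],
    "TT_TU": [13.5, 12.7, 11.6],
})

# Convert the text column to a datetime type so that it can be
# sorted, filtered and resampled by time.
df_sample["MESS_DATUM"] = pd.to_datetime(
    df_sample["MESS_DATUM"], format="%Y-%m-%d %H:%M:%S"
)

print(df_sample.dtypes)
```

With a datetime column, plotting libraries can also treat the x-axis as a real time axis instead of a list of labels.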


### Visualise the available weather stations on a map.

In [11]:
# Plot every available station and its location on a map.
fig = px.scatter_mapbox(
    df_stations,                                    # The dataframe that contains the information.
    lat="lon", lon="lat",                           # Location columns. The labels are swapped in the file: 'lon' holds the latitude.
    hover_name="station_name",                      # Specify tooltip.
    hover_data=["station_id", "station_state"],
    color_discrete_sequence=["fuchsia"], zoom=4)    # Specify color and zoom.
# Change the layout of the plot.
fig.update_layout(
    mapbox_style="open-street-map",                 # Display a world map in the background.
    width = 800,                                    # Size of our diagram.
    height = 500,
    margin={"r":0,"t":0,"l":0,"b":0})               # Remove the border around the diagram.
# Display the resulting visualisation.
fig.show()

### Extract the weather data for a specific weather station.

In [12]:
# Station id for 'Jena Sternwarte': 2444.
station_id = 2444
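# Extract the rows of the dataframe that meet the condition.
# Copy the selection so that later changes do not affect df_measurements
# (and do not trigger a SettingWithCopyWarning).
df = df_measurements[df_measurements['STATIONS_ID'] == station_id].copy()
# Reset the index (line number).
df.reset_index(drop=True, inplace=True)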
df

Unnamed: 0,STATIONS_ID,MESS_DATUM,QN_9,TT_TU,RF_TU
0,2444,2020-10-01 00:00:00,1,9.0,95.0
1,2444,2020-10-01 00:00:01,1,8.9,96.0
2,2444,2020-10-01 00:00:02,1,9.0,97.0
3,2444,2020-10-01 00:00:03,1,8.8,97.0
4,2444,2020-10-01 00:00:04,1,8.6,97.0
...,...,...,...,...,...
739,2444,2020-10-31 00:00:19,1,11.9,81.0
740,2444,2020-10-31 00:00:20,1,11.3,83.0
741,2444,2020-10-31 00:00:21,1,11.1,84.0
742,2444,2020-10-31 00:00:22,1,10.4,86.0
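
The selection above relies on boolean indexing. The following minimal sketch shows the mechanism on a tiny, made-up dataframe; `df_all`, `mask` and `df_one` are hypothetical names chosen for this illustration.

```python
import pandas as pd

# Tiny made-up dataframe with measurements from two stations.
df_all = pd.DataFrame({
    "STATIONS_ID": [44, 2444, 44, 2444],
    "TT_TU": [13.5, 9.0, 12.7, 8.9],
})

# The comparison yields one boolean per row ...
mask = df_all["STATIONS_ID"] == 2444
# ... and indexing with it keeps only the rows where the mask is True.
# copy() makes the result independent of df_all, reset_index() renumbers the rows.
df_one = df_all[mask].copy().reset_index(drop=True)

print(mask.tolist())
print(df_one)
```

Without `reset_index`, the selected rows would keep their original row labels (here 1 and 3).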


### Plot the weather data.

In [13]:
# Plot all available data for one individual station.
fig = px.line(
    df,                 # The dataframe that contains the information.
    x = "MESS_DATUM",   # The columns to plot on the x and y axes.
    y = "TT_TU")
# Change the layout of the plot.
fig.update_layout(
    width = 800,        # Size of our diagram.
    height = 600,
    title={             # Set the title and format it.
        'text': 'Weather data of ' + df_stations['station_name'][df_stations['station_id'] == station_id].to_list()[0],
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
# Display the resulting visualisation.
fig.show()
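
The title above looks up the station name by its id and takes the first match, which raises an `IndexError` if the id does not occur in `df_stations`. The following minimal sketch shows the same lookup with a fallback. The helper `station_title` and the sample dataframe `df_names` are hypothetical and not part of the tutorial.

```python
import pandas as pd

# Sample rows copied from the station table.
df_names = pd.DataFrame({
    "station_id": [3, 44, 52],
    "station_name": ["Aachen", "Großenkneten", "Ahrensburg-Wulfsdorf"],
})

def station_title(stations, station_id):
    """Return a plot title for the station, or a fallback if the id is unknown."""
    # Select the name column for all rows whose id matches.
    names = stations.loc[stations["station_id"] == station_id, "station_name"]
    if names.empty:
        return f"Weather data of station {station_id}"
    return "Weather data of " + names.iloc[0]

print(station_title(df_names, 44))
print(station_title(df_names, 99999))
```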

### Clean up the data.

In [14]:
# Set illogical values to NaN (Not a Number).
df.loc[df['TT_TU'] < -100, 'TT_TU'] = np.nan
# Fill NaN with previous value.
df['TT_TU'] = df['TT_TU'].ffill()
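
The two cleaning steps can be tried out on a small series. This is a minimal sketch with made-up values; using `-999` as the marker for a missing measurement is an assumption for this illustration, and `temps` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# -999 stands in for a missing-value marker (an assumption for this illustration).
temps = pd.Series([9.0, 8.9, -999.0, 8.8])

# Replace implausible values with NaN; mask() leaves all other values untouched.
temps = temps.mask(temps < -100, np.nan)
# Forward fill: each NaN takes the last valid value before it.
temps = temps.ffill()

print(temps.tolist())  # [9.0, 8.9, 8.9, 8.8]
```

Note that a forward fill cannot repair a missing value at the very start of the series, because there is no earlier value to copy.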

### Plot the weather data again.

In [15]:
# Plot all available data for one individual station.
fig = px.line(
    df,                 # The dataframe that contains the information.
    x = "MESS_DATUM",   # The columns to plot on the x and y axes.
    y = "TT_TU")
# Change the layout of the plot.
fig.update_layout(
    width = 800,        # Size of our diagram.
    height = 600,
    title={             # Set the title and format it.
        'text': 'Weather data of ' + df_stations['station_name'][df_stations['station_id'] == station_id].to_list()[0],
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
# Display the resulting visualisation.
fig.show()