**Course**: Data Visualization (Prof. Dr. Heike Leitte, Jan-Tobias Sohns, TU Kaiserslautern),   **Name**: Goutham Kallempudi,   **Date**: 01.11.2021

# Comments regarding assignments

Each assignment consists of **two pieces**:
1. A Jupyter notebook with practical exercises.
2. An OLAT questionnaire that contains questions regarding the material of the lecture and the notebook.

Modalities for credit points:
- To qualify for the exam (Prüfungsvoraussetzung), you have to obtain at least 80% of the points in each assignment.
- Points are only given through the questionnaire in OLAT. Many questions will be related to material you learned or practiced in the notebook.
- While questionnaires are open, you can retake them until you have enough credit points to pass.

**Submission instructions**:
- Finish the practical exercises in the notebook.
- Fill in the OLAT questionnaire (which includes the submission of an HTML export of the notebook).
- The submission of the notebook is mandatory.
- No group work allowed. You may discuss strategies and solutions, but every student has to do their own implementation.

<div class="alert alert-info">
    
# Assignment 1 - Visualizing Data
</div>

The **goals** of the first assignment are:
- Get familiar with Python programming in the Jupyter notebook.
- Be able to create a data visualization using bokeh.
- Recreate an existing visualization and develop an eye for key features.
- Start thinking critically about design options.



To achieve these goals, your task is to create a visualization of the weather in Kaiserslautern in 2018. The visualization should be similar to the following chart from the New York Times (Jan. 11, 1981, p. 32; Tufte (1983), p. 30) and needs to be implemented in bokeh+pandas:

![New York city's weather for 1980 from the New York Times](http://euclid.psych.yorku.ca/SCS/Gallery/images/NYweather.jpg)


<div class="alert alert-danger">

**Important**: Although no points are awarded for typing the correct answers in the notebooks, you are strongly advised to solve the tasks thoroughly. They are designed to be encouraging and to prepare you for the exam by deepening your understanding of the methods and giving you practical coding experience.
</div>

<div class="alert alert-success">
    
All tasks in this notebook are marked in green.
</div>

<div class="alert alert-info">
    
## 1. Starter Code - Minimal working example
</div>

The following pieces of code load the data for this assignment and generate a minimal chart for the temperature data. More details can be found in the [bokeh documentation](https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html).

First load all necessary python modules:

In [217]:
import pandas as pd

from bokeh.plotting import figure, output_notebook, show
from bokeh.models import Band, ColumnDataSource, PrintfTickFormatter, DatetimeTickFormatter, Label
from bokeh.layouts import column
from bokeh.models.tickers import MonthsTicker
from bokeh.models import FactorRange, Legend
from bokeh.transform import dodge
import datetime  # module import; the code below calls datetime.timedelta

output_notebook()

Load the data from the CSV files using the pandas library and display the first rows of the data table.

In [218]:
df_kl = pd.read_csv('KLweather2018.csv', parse_dates=['Timestamp'], index_col='Timestamp')
df_kl_prec = pd.read_csv('KLweather2018_monthlyPrecipitation.csv', parse_dates=['Timestamp'], index_col='Timestamp')

df_kl.head()

| Timestamp | temp_min | temp_max | temp_normal_min | temp_normal_max | rel_humidity |
|---|---|---|---|---|---|
| 2018-01-01 | 4.8 | 9.3 | -0.031034 | 4.875862 | 79.75 |
| 2018-01-01 | 4.8 | 9.3 | -0.031034 | 4.875862 | 79.75 |
| 2018-01-02 | 4.3 | 6.4 | -0.3 | 4.996552 | 83.58 |
| 2018-01-03 | 5.4 | 10.7 | 0.310345 | 5.182759 | 83.46 |
| 2018-01-04 | 4.8 | 12.4 | -0.351724 | 5.027586 | 90.5 |
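
The preview lists 2018-01-01 twice, so the index appears to contain a repeated timestamp. The following is a minimal sketch of how such rows could be detected and dropped. It uses a small made-up frame (`demo`, an assumption for illustration) rather than the assignment file, and whether deduplication is wanted for the assignment is left open.

```python
import pandas as pd

# Illustrative frame that mimics a repeated first day; not the real data set.
demo = pd.DataFrame(
    {"temp_min": [4.8, 4.8, 4.3], "temp_max": [9.3, 9.3, 6.4]},
    index=pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
)
demo.index.name = "Timestamp"

# Boolean mask that is True for every repeated index value after the first.
is_repeat = demo.index.duplicated(keep="first")
print("repeated rows:", int(is_repeat.sum()))

# Keep only the first occurrence of each timestamp.
deduplicated = demo[~is_repeat]
print(deduplicated)
```

Applied to the notebook, the same mask could be computed on `df_kl.index` before plotting.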


Plot the temperature minimum as a line chart with bokeh using default settings. 

In [219]:
# create a figure
p = figure(plot_height=400, x_axis_type="datetime")

# define the type of glyph that is rendered and its data. here: a polyline
p.line(source=df_kl, x='Timestamp', y='temp_min')

# render the chart
show(p)

<div class="alert alert-info">
    
## 2. Customizing the temperature chart
</div>

As detailed above, your visualization should look like a modern version of the one from the New York Times. This can be achieved by changing the graphical elements and styling visual properties. In the function below some elements are already changed. Update the code to make the temperature chart even more similar:

<div class="alert alert-success">
    
- Depict the normal high and low temperatures as polylines.
- Label the two polylines. You may use the legend functionality.
- Depict the daily temperature range as an area.
- Label the y-axis.
- Style visual attributes (color, line style) to your liking.
    
</div>

Helpful resources:
- [Plotting with basic glyphs](https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html) - Overview of glyph types that are implemented in bokeh; see the examples for all the graphical primitives that can be plotted directly.
- [Styling visual attributes](https://docs.bokeh.org/en/latest/docs/user_guide/styling.html) - See styling options for chart elements

In [220]:
def create_temperature_chart(df, width=900):
    '''Create a bokeh figure for temperature range and normal values.'''
    
    tooltip = [("HIGH", "@Timestamp : @temp_max °")]
        
    # create figure and data source
    p = figure(plot_width=width, plot_height=400, title='Kaiserslautern\'s Weather for 2018',
               tools=['xwheel_zoom', 'pan','hover'], tooltips=tooltip,
               x_axis_type="datetime", x_axis_location="above", y_range=(-15,40))

    df['Timestamp_Right'] =  pd.to_datetime(df.index) + datetime.timedelta(days=0.5)
    source = ColumnDataSource(df)
    
    
    
    # add graphical items
    p.line(source=source, x='Timestamp', y='temp_max', color = "#000000")
    p.line(source=source, x='Timestamp', y='temp_min', color = "#000000")
    
    # mark min/max temperature 
    p.quad(source=source, top='temp_max', bottom='temp_min', left='Timestamp', right='Timestamp_Right', color = "#000000")
    p.line(source=source, x='Timestamp', y='temp_normal_max', color = "gray")
    p.line(source=source, x='Timestamp', y='temp_normal_min', color = "gray")
   
   
    # Filtering data and making annotations (use the df argument, not the global df_kl)
    tmax_id = df['temp_max'].idxmax()
    tmin_id = df['temp_min'].idxmin()
    max_temp = df.at[tmax_id,'temp_max']
    min_temp = df.at[tmin_id,'temp_min']
    
    temp_max_data = pd.to_datetime(tmax_id)
    max_month = temp_max_data.strftime("%b")
    max_day   = temp_max_data.day
    
    temp_min_data = pd.to_datetime(tmin_id)
    min_month = temp_min_data.strftime("%b")
    min_day   = temp_min_data.day
    
    label_max = Label(x=tmax_id, y= max_temp, text=f'High {max_month} {max_day} : {max_temp} °',
      border_line_color='black', border_line_alpha=1.0,
      background_fill_color='white', background_fill_alpha=1.0)
    
    label_min = Label(x=tmin_id, y= min_temp, text=f'Low {min_month} {min_day} : {min_temp} °',
      border_line_color='black', border_line_alpha=1.0,
      background_fill_color='white', background_fill_alpha=1.0)

    
    
    # style visual attributes
    p.xaxis.ticker = MonthsTicker(months=list(range(12)))                          # Generate ticks spaced apart by specific, even multiples of months.
    p.xgrid.ticker = MonthsTicker(months=list(range(12))) 
    p.xaxis.formatter=DatetimeTickFormatter(months=["               %b"])
    p.xaxis.major_label_text_align = 'right'
    p.yaxis[0].formatter = PrintfTickFormatter(format="%2i°")
    p.yaxis.axis_label = "Temperature [°C]"
    p.title.text_font_size = "15pt"
    p.title.align = "center"
    p.background_fill_color = "#B2BEB5"
    p.background_fill_alpha = 0.1
    
    p.add_layout(label_max)
    p.add_layout(label_min)
    return p

p = create_temperature_chart(df_kl)
show(p)

<div class="alert alert-info">
    
## 3. Filtering data and making annotations
</div>

The following piece of code demonstrates how to find maxima in a data column. Use this code to automatically find the highest and lowest temperature values in 2018 and place a mark in the chart above at these positions (e.g. circle the respective data points).

<div class="alert alert-success">
    
- Automatically filter the highest and lowest temperatures in Kaiserslautern in 2018.
- Integrate the code in the chart computation method above and mark the two detected positions.
- Add text labels to the positions. [Label documentation](https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#labels) for bokeh.
    
</div>

In [221]:
tmax_id = df_kl['temp_max'].idxmax()
print("KL temperature maximum:", tmax_id, df_kl.at[tmax_id,'temp_max'])

tmin_id = df_kl['temp_min'].idxmin()
print("KL temperature minimum:", tmin_id, df_kl.at[tmin_id,'temp_min'])

KL temperature maximum: 2018-08-04 00:00:00 35.5
KL temperature minimum: 2018-02-28 00:00:00 -14.0
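
To turn these lookups into annotation text, the index returned by `idxmax`/`idxmin` can be formatted directly, since it is a timestamp. The sketch below is one possible way to do this. The helper name `extreme_label` and the `demo` values are assumptions made for illustration, not part of the assignment code.

```python
import pandas as pd

# Made-up daily values for illustration; the notebook itself uses df_kl.
demo = pd.DataFrame(
    {"temp_min": [-3.0, -14.0, 12.5], "temp_max": [2.0, -4.5, 35.5]},
    index=pd.to_datetime(["2018-01-10", "2018-02-28", "2018-08-04"]),
)


def extreme_label(df, column, kind):
    """Return (timestamp, value, text) for the highest or lowest entry of a column.

    `kind` is either "High" or "Low"; this helper is hypothetical.
    """
    idx = df[column].idxmax() if kind == "High" else df[column].idxmin()
    value = df.at[idx, column]
    # %b yields the abbreviated month name in the current locale.
    text = f"{kind} {idx.strftime('%b')} {idx.day}: {value}°"
    return idx, value, text


high_idx, high_value, high_text = extreme_label(demo, "temp_max", "High")
low_idx, low_value, low_text = extreme_label(demo, "temp_min", "Low")
print(high_text)
print(low_text)
```

The returned timestamp and value can serve as the `x` and `y` of a bokeh `Label`, with the text as its `text` argument.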


<div class="alert alert-info">

## 4. Designing additional charts
</div>

Now design the charts for precipitation and relative humidity.

<div class="alert alert-success">
    
- Create the chart for precipitation. Try to design a bar chart using the hints below.
- Create the chart for humidity.
    
</div>

Hints for temporal x-axis:
- **Width of bars**: On a datetime axis, the width is given in milliseconds. To get the required scaling, specify the width like: `widthInDays = ndays*24*60*60*1000` (24 hours * 60 minutes * 60 seconds * 1000 milliseconds)
- **Position of bars**: You can shift the bars using the dodge function `x=dodge('prec', value, range=p.x_range)`. Keep in mind that you need to define an appropriate `value` by which to shift the bar.
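
The arithmetic behind both hints can be checked without bokeh. The sketch below computes a bar width in milliseconds and the two bar centers that result from shifting by half a bar width in each direction, so that the two bars touch at the original timestamp. The helper names are hypothetical, and the 10-day width is only an example value.

```python
from datetime import datetime, timedelta

MS_PER_DAY = 24 * 60 * 60 * 1000  # hours * minutes * seconds * milliseconds


def bar_width_ms(ndays):
    """Width of a bar spanning `ndays` days on a datetime axis, in milliseconds."""
    return ndays * MS_PER_DAY


def paired_bar_centers(center, ndays):
    """Centers of two bars, each `ndays` days wide, that touch at `center`."""
    half = timedelta(days=ndays / 2)
    return center - half, center + half


width = bar_width_ms(10)
left, right = paired_bar_centers(datetime(2018, 1, 16), 10)
print(width)  # 864000000
print(left, right)
```

On a datetime axis the shift passed to `dodge` uses the same unit as the width, so `-width/2` and `+width/2` correspond to the two centers computed here.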

In [222]:

def create_precipitation_chart(df, width=900):
    '''Create a bokeh figure for monthly precipitation (2018 vs normal values).'''
    
    p = figure(plot_width=width, plot_height=200, tools=['xwheel_zoom'], x_axis_type="datetime")
    df['timestamp'] = df.index
    
    # bar width on a datetime axis is given in milliseconds (here: 10 days)
    bar_width = 10*24*60*60*1000
    
    # draw each series once; the data source already contains all months
    p.vbar(x = dodge('timestamp', -bar_width/2 , range = p.x_range ), top = 'prec_normal', width = bar_width , source = df, fill_color = 'gray',line_color = 'black', legend_label = "normal")
    p.vbar(x = dodge('timestamp', bar_width/2 , range = p.x_range ), top = 'prec', width = bar_width, source = df, fill_color = 'white',line_color = "black" ,legend_label = 'prec', hatch_pattern = "right_diagonal_line" )
    
    
    p.xaxis.visible = False
    p.legend.location = "top_center"
    p.xaxis.ticker = MonthsTicker(months=list(range(12)))                          # Generate ticks spaced apart by specific, even multiples of months.
    p.xgrid.ticker = MonthsTicker(months=list(range(12))) 
    p.xaxis.formatter=DatetimeTickFormatter(months=["               %b"]) 
    return p

show(create_precipitation_chart(df_kl_prec))

In [223]:
def create_humidity_chart(df, width=900):
    '''Create a bokeh figure for relative humidity.'''
    
    p = figure(plot_width=width, plot_height=200, tools=['xwheel_zoom'], title='Relative Humidity as of Noon',
               title_location="below", x_axis_type="datetime", y_range=(0,100) )
    

    # use the df argument (not the global df_kl) so the function works for any data frame
    listofzeros = [0] * df.shape[0]
    
    p.varea(x=df.index, y1=df['rel_humidity'], y2= listofzeros , alpha=0.2, fill_color='gray') 
    p.line(source=df, x='Timestamp', y='rel_humidity', color = "#000000")
    
    
    p.xaxis.visible = False
    p.title.align = "center"
    p.xaxis.ticker = MonthsTicker(months=list(range(12)))                          # Generate ticks spaced apart by specific, even multiples of months.
    p.xgrid.ticker = MonthsTicker(months=list(range(12))) 
    p.xaxis.formatter=DatetimeTickFormatter(months=["               %b"]) 
    return p

show(create_humidity_chart(df_kl))

In [225]:
df_kl

| Timestamp | temp_min | temp_max | temp_normal_min | temp_normal_max | rel_humidity | Timestamp_Right |
|---|---|---|---|---|---|---|
| 2018-01-01 | 4.8 | 9.3 | -0.031034 | 4.875862 | 79.75 | 2018-01-01 12:00:00 |
| 2018-01-01 | 4.8 | 9.3 | -0.031034 | 4.875862 | 79.75 | 2018-01-01 12:00:00 |
| 2018-01-02 | 4.3 | 6.4 | -0.300000 | 4.996552 | 83.58 | 2018-01-02 12:00:00 |
| 2018-01-03 | 5.4 | 10.7 | 0.310345 | 5.182759 | 83.46 | 2018-01-03 12:00:00 |
| 2018-01-04 | 4.8 | 12.4 | -0.351724 | 5.027586 | 90.50 | 2018-01-04 12:00:00 |
| ... | ... | ... | ... | ... | ... | ... |
| 2018-12-27 | -5.1 | 0.8 | 0.055172 | 4.365517 | 94.00 | 2018-12-27 12:00:00 |
| 2018-12-28 | -2.7 | -0.5 | -0.434483 | 3.972414 | 97.21 | 2018-12-28 12:00:00 |
| 2018-12-29 | -2.2 | 3.0 | -1.289655 | 4.972414 | 96.50 | 2018-12-29 12:00:00 |
| 2018-12-30 | 3.1 | 6.1 | -0.486207 | 4.968966 | 91.33 | 2018-12-30 12:00:00 |


In [226]:
df_kl_prec

| Timestamp | prec | prec_normal | timestamp |
|---|---|---|---|
| 2018-01-16 00:00:00 | 128.4 | 60.4 | 2018-01-16 00:00:00 |
| 2018-02-14 12:00:00 | 13.7 | 48.414286 | 2018-02-14 12:00:00 |
| 2018-03-16 00:00:00 | 41.7 | 53.444828 | 2018-03-16 00:00:00 |
| 2018-04-15 12:00:00 | 30.5 | 47.268966 | 2018-04-15 12:00:00 |
| 2018-05-16 00:00:00 | 108.9 | 65.924138 | 2018-05-16 00:00:00 |
| 2018-06-15 12:00:00 | 81.5 | 67.137931 | 2018-06-15 12:00:00 |
| 2018-07-16 00:00:00 | 41.6 | 60.521429 | 2018-07-16 00:00:00 |
| 2018-08-16 00:00:00 | 40.7 | 57.653571 | 2018-08-16 00:00:00 |
| 2018-09-15 12:00:00 | 27.0 | 52.596552 | 2018-09-15 12:00:00 |
| 2018-10-16 00:00:00 | 11.1 | 65.413793 | 2018-10-16 00:00:00 |


<div class="alert alert-info">
    
## 5. Combining multiple charts
</div>

In this last part, we combine the three charts you designed above.

<div class="alert alert-success">
    
- Create the combined weather chart for Kaiserslautern.
- Save a jpg/png-version or screenshot of this chart that can be uploaded in OLAT.
    
</div>

In [224]:
show(column(create_temperature_chart(df_kl), create_precipitation_chart(df_kl_prec),create_humidity_chart(df_kl)))