**Course**: Data Visualization,   **Name**: Kartik Ramesha Kadur,   **Date**: 04.12.2019

**Due date**: 4. December 2019, 23:59

# Comments regarding assignments

Each assignment consists of **two pieces**:
1. A Jupyter notebook with practical exercises.
2. An OLAT questionnaire that contains questions regarding the material of the lecture and the notebook.

Modalities for credit points:
- To qualify for the exam (Prüfungsvoraussetzung), you have to obtain 80% of the points in each assignment.
- Points are only given through the questionnaire in OLAT. Many questions will be related to material you learned or practiced in the notebook.
- While questionnaires are open, you can retake them until you have enough credit points to pass.

**Submission instructions**:
- Finish the practical exercises in the notebook.
- Fill in the OLAT questionnaire (which includes the submission of an HTML export of the notebook).
- No group work allowed! You may discuss strategies and solutions, but every student has to do their own implementation.

# Assignment 2 - Good Chart Design

**Due date**: Wednesday, 4. December 2019, 23:59

The **goals** of the second assignment are:
- Practice visualization design critiques using a given visualization.
- Decompose a given chart into its components and analyze their design.
- Practice visual encoding theory by detecting marks and channels.
- Design an information-rich chart.



To achieve these goals, your task is to analyze, critique, and revise a given visualization that depicts the received and handled tickets (technical issues).

## 1. Problem Setting and Starter Code

**Scenario:** Imagine that you manage an information technology (IT) team. Your team receives tickets, or technical issues, from employees. In the past year, you've had two team members leave and decided at the time not to replace them. You have heard a rumbling of complaints from the remaining employees about having to "pick up the slack". You've just been asked about your hiring needs for the coming year and are wondering if you should hire a couple more people. First, you want to understand what impact the departure of individuals over the past year has had on your team's overall productivity. You plot the monthly trend of incoming tickets and those processed over the past calendar year. You see that there is some evidence your team's productivity is suffering from being short-staffed and now want to turn the quick-and-dirty visual you created into the basis for your hiring request.

Below is the code and chart of the initial ticket visualization. Read through the code and make sure that you understand it. 

In [None]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show, output_file
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.transform import dodge
from bokeh.palettes import brewer

output_notebook()

In [None]:
month = list(range(1, 13))
received = [160,184,241,149,180,161,132,202,160,139,149,177]
processed = [160,184,237,148,181,150,123,156,126,121,124,140]

df = pd.DataFrame({'month': month, 
                   'received': received, 
                   'processed': processed})

In [None]:
p = figure(title="TICKET TREND", plot_height=400)
p.vbar(source=df, x=dodge('month', -0.175, range=p.x_range), top='received', width=0.3, 
       legend_label="Ticket Volume Received")
p.vbar(source=df, x=dodge('month',  0.175, range=p.x_range), top='processed', width=0.3, 
       color="deeppink", legend_label="Ticket Volume Processed")

# remove the toolbar
p.toolbar.logo = None
p.toolbar_location = None

show(p)

## 2. Analysis of the original chart

### Write down the message you want to convey with your chart:

I want to show that my team's productivity is suffering because the team is short-staffed.

### Design critique

Answer the following questions:
- How well does the current design support the message you're trying to make?
- Can you spot problems with the current design?

### Analyze the chart design

Now go through each of the elements of the chart and check if the design is suitable. Make a list of all visible elements of the chart and problems with the design.

- X-axis : Months
- Y-axis : Ticket volume
- Ticket volume received.
- Ticket volume processed.

## 3. Improved design of helper elements

A major problem of the current chart is that it is too cluttered and full of competing information. Improve all chart elements (list above) except for the data representation (bar elements) to achieve a better "background" design.

In [None]:
p = figure(title="TICKET TREND", plot_height=400, plot_width=1000)
p.vbar(source=df, x=dodge('month', -0.175, range=p.x_range), top='received', width=0.3, 
       color="orange", legend_label="Ticket Volume Received")
p.vbar(source=df, x=dodge('month',  0.175, range=p.x_range), top='processed', width=0.3, 
       color="#B3DE69", legend_label="Ticket Volume Processed")

p.ygrid.grid_line_color = 'black'
p.xgrid.grid_line_color = 'grey'
p.y_range.start = 100
p.y_range.end = 250
p.x_range.start = 0.5
p.x_range.end = 12.5
p.legend.location = "top_right"

show(p)
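
For comparison, a variant that pushes the helper elements further into the background could look like the sketch below: light horizontal grid lines only, no plot outline, month abbreviations as tick labels, and a y-axis that starts at zero so bar lengths stay proportional to the ticket counts. This is only one possible design, not the expected solution. The helper name `build_decluttered_chart`, the constant `MONTHS`, and the new title are assumptions made for illustration, and the size arguments are left out because their names differ between Bokeh versions.

```python
import pandas as pd
from bokeh.plotting import figure
from bokeh.transform import dodge

# Same monthly figures as in the starter code
received = [160, 184, 241, 149, 180, 161, 132, 202, 160, 139, 149, 177]
processed = [160, 184, 237, 148, 181, 150, 123, 156, 126, 121, 124, 140]

# Hypothetical tick labels, assuming month 1 is January
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_decluttered_chart():
    """Return the dodged bar chart with a quieter background design."""
    data = pd.DataFrame({"month": list(range(1, 13)),
                         "received": received,
                         "processed": processed})

    fig = figure(title="Ticket volume per month", toolbar_location=None)

    # Bars are unchanged: same dodge offsets, widths, and colors as above
    fig.vbar(source=data, x=dodge("month", -0.175), top="received",
             width=0.3, color="orange", legend_label="Ticket Volume Received")
    fig.vbar(source=data, x=dodge("month", 0.175), top="processed",
             width=0.3, color="#B3DE69", legend_label="Ticket Volume Processed")

    # Zero baseline keeps bar length proportional to the value
    fig.y_range.start = 0

    # Quiet helper elements: no vertical grid, light horizontal grid, no frame
    fig.xgrid.grid_line_color = None
    fig.ygrid.grid_line_color = "lightgrey"
    fig.outline_line_color = None

    # One tick per month, shown as a month abbreviation
    ticks = data["month"].tolist()
    fig.xaxis.ticker = ticks
    fig.xaxis.major_label_overrides = dict(zip(ticks, MONTHS))

    fig.legend.location = "top_right"
    fig.legend.border_line_color = None
    return fig


chart = build_decluttered_chart()
```

In the notebook, `show(chart)` renders the figure once `output_notebook()` has been called.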

## 4. Data Encoding

### Marks and channels

What are the marks and channels used to encode data? Write down the information as discussed in the lecture.

Marks describe the shape/geometry of the encoding.
- Points
- Lines
- Area
- Volume

Channels describe the style of the mark.
- Position
- Shape
- Size
- Color
- Orientation

### Creating new columns

The current dataframe only contains the raw data. Often you need to process the data further to obtain suitable input for your chart. Read the [ten minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) introduction to pandas programming.

Now you are able to do the following dataframe operations:

In [None]:
import numpy as np

a = [int(i*10) for i in np.random.random(6)]
b = [int(i*10) for i in np.random.random(6)]

df_test = pd.DataFrame({"A": a, "B": b})

# select a column
df_test['A']

# create a new column by adding two columns
df_test['A+B'] = df_test['A'] + df_test['B']

# create a new column by applying a function to an existing column
df_test['A+1'] = df_test['A'].apply(lambda x: x+1)
df_test['Size(A)'] = df_test['A'].apply(lambda x: "S" if x < 5 else "M")

df_test

Use these dataframe operations to create the following columns in the ticket dataframe:
- the difference between received and processed tickets
- the total number of open tickets, i.e., the sum of all tickets that have not been handled yet. See method `cumsum()` for a pandas Series.

In [None]:
# Per-month difference between received and processed tickets
df1 = df.copy()
df1['unprocessed'] = df1['received'] - df1['processed']
unprocessed = df1['unprocessed'].tolist()
print("Unprocessed tickets in each month: \n{}".format(unprocessed))

# Running total of open tickets, i.e. tickets not handled so far
open_tickets = df1['unprocessed'].cumsum()
df1['open'] = open_tickets
print("Open Tickets :\n{}".format(open_tickets))
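
As a quick sanity check on the two new columns, the following sketch recomputes them with element-wise column arithmetic and confirms that the last value of the running total equals the yearly total received minus the yearly total processed. It rebuilds the data from the starter lists so that it runs on its own; the names `tickets` and `backlog` are only used for this illustration.

```python
import pandas as pd

# Same monthly figures as in the starter code
received = [160, 184, 241, 149, 180, 161, 132, 202, 160, 139, 149, 177]
processed = [160, 184, 237, 148, 181, 150, 123, 156, 126, 121, 124, 140]

tickets = pd.DataFrame({"month": list(range(1, 13)),
                        "received": received,
                        "processed": processed})

# Column arithmetic works element-wise, so no explicit loop is needed
tickets["unprocessed"] = tickets["received"] - tickets["processed"]

# cumsum() turns the monthly differences into a running backlog
tickets["open"] = tickets["unprocessed"].cumsum()

# The final backlog must equal the difference of the yearly totals
backlog = int(tickets["open"].iloc[-1])
assert backlog == sum(received) - sum(processed)

print(tickets[["month", "unprocessed", "open"]])
```

Month 5 has a negative difference (181 processed against 180 received), which is why the running total dips by one ticket there before it starts to climb.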

### Alternative data encodings

Now change the presentation of the data. Select two new, suitable chart types for the ticket data. Stick to chart types that use the same x- and y-axis as defined above. Implement (at least) two different data encodings. What is your favorite?

In [None]:
fig1 = figure(title = 'Productivity', plot_width=1000, plot_height=400)
names = ['processed','unprocessed']
fig1.varea_stack(['processed', 'unprocessed'], x='month', color=("#B3DE69", "red"), legend_label = names, source=df1)
fig1.legend.items.reverse()
fig1.ygrid.grid_line_color = 'black'
fig1.xgrid.grid_line_color = 'grey'
fig1.xaxis[0].axis_label = 'Months'
fig1.yaxis[0].axis_label = 'Number of Tickets'
show(fig1)

df2 = pd.DataFrame({'month': month, 
                   'open': open_tickets
                   })
fig2 = figure(title="Open Tickets", plot_width=1000, plot_height=300)
fig2.line(x='month', y='open', source = df2, color = "red", legend_label = "Number of unprocessed tickets")
fig2.ygrid.grid_line_color = 'black'
fig2.xgrid.grid_line_color = 'grey'
fig2.legend.location = "top_center"
fig2.xaxis[0].axis_label = 'Months'
fig2.yaxis[0].axis_label = 'Unprocessed Tickets'
show(fig2)

## 5. Revise the chart

Now combine the results from the previous steps and create a final chart that you would send to your boss to ask for additional staff. Also think about adding additional information to your chart with labels etc.
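
One way to add such information is Bokeh's `LabelSet`, which draws one text label per row of a data source. The sketch below is an illustration under stated assumptions, not the required solution: the helper `build_backlog_chart`, the `label` column, the title, and the pixel offsets are all made up for this example, and the size arguments are left out because their names differ between Bokeh versions.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure

# Same monthly figures as in the starter code
received = [160, 184, 241, 149, 180, 161, 132, 202, 160, 139, 149, 177]
processed = [160, 184, 237, 148, 181, 150, 123, 156, 126, 121, 124, 140]


def build_backlog_chart():
    """Return a line chart of the open-ticket backlog with value labels."""
    data = pd.DataFrame({"month": list(range(1, 13)),
                         "received": received,
                         "processed": processed})
    data["open"] = (data["received"] - data["processed"]).cumsum()

    # Label text must be strings; months without a backlog get no label
    data["label"] = data["open"].apply(lambda v: str(v) if v > 0 else "")
    source = ColumnDataSource(data)

    fig = figure(title="Open tickets (running total)")
    fig.line(x="month", y="open", source=source, color="red", line_width=3)

    # One label per row, shifted slightly up and to the left of its point
    labels = LabelSet(x="month", y="open", text="label", source=source,
                      x_offset=-10, y_offset=8, text_font_size="9pt")
    fig.add_layout(labels)

    fig.xaxis.axis_label = "Month"
    fig.yaxis.axis_label = "Open tickets"
    return fig


chart = build_backlog_chart()
```

In the notebook, `show(chart)` renders the figure once `output_notebook()` has been called. Labelling the values directly lets a reader see the backlog size without reading it off the grid.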

In [None]:
fig1 = figure(title = 'Productivity', plot_width=1000, plot_height=400)
names = ['processed','unprocessed']
fig1.varea_stack(['processed', 'unprocessed'], x='month', color=("#B3DE69", "red"), legend_label = names, source=df1)
fig1.legend.items.reverse()
fig1.ygrid.grid_line_color = 'black'
fig1.xgrid.grid_line_color = 'grey'
fig1.xaxis[0].axis_label = 'Months'
fig1.yaxis[0].axis_label = 'Number of Tickets'
show(fig1)

df2 = pd.DataFrame({'month': month, 
                   'open': open_tickets
                   })
x = df2["month"]
y1 = [0 for i in range(0, len(x))]
y2 = df2["open"]
fig2 = figure(title="Open Tickets", plot_width=1000, plot_height=300)

fig2.line(x='month', y='open', source = df2, color = "red", line_width = 3.0, legend_label = "Number of unprocessed tickets")
fig2.varea(x, y1, y2, color = "#FB8072")
fig2.circle(x = 'month', y = 'open', source = df2, fill_color="red", line_color="red")
fig2.ygrid.grid_line_color = 'black'
fig2.xgrid.grid_line_color = 'grey'
fig2.legend.location = "top_center"
fig2.xaxis[0].axis_label = 'Months'
fig2.yaxis[0].axis_label = 'Unprocessed Tickets'
show(fig2)

**Caption**: Loss in Team Productivity

### Write a figure caption

As we learned, each figure needs a caption. Write a caption for your figure. Position it underneath your chart so you can take a screenshot of chart + caption.