# 1. Introduction to visualisations

In [None]:
import pandas as pd
import plotly.express as px

The layout helper function below makes our plots look cleaner. It removes the background grid, makes the background transparent and hides the axis titles.

In [None]:
def layout_helper(fig):
    # hide the gridlines on both axes
    fig.update_layout(xaxis=dict(showgrid=False), yaxis=dict(showgrid=False))
    # make the plotting area and the surrounding paper transparent
    fig.update_layout({'plot_bgcolor':'rgba(0,0,0,0)', 'paper_bgcolor':'rgba(0,0,0,0)'})
    # remove the axis titles
    fig.update_layout(yaxis_title = None, xaxis_title = None)
    return fig

### 1.1 Exploratory to explanatory data analysis
In this part, we show some visualisation basics using the *canton* dataset.
We focus on the most recent data available, which is from 2018.

In [None]:
df = pd.read_csv('https://thomann-public.s3.eu-west-1.amazonaws.com/jst-mapviz/cantons.csv')

In [None]:
df = df[df['year'] == 2018]
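Instead of hard-coding 2018, you could look up the most recent year in the data itself. That way the notebook keeps working if newer data are added. Below is a minimal sketch on a made-up toy DataFrame. Only the column names follow the canton dataset; the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data mimicking the canton dataset's columns (values are made up)
toy = pd.DataFrame({
    'canton': ['ZH', 'ZH', 'BE', 'BE'],
    'year': [2017, 2018, 2017, 2018],
    'taxable_income_chf': [100, 110, 80, 85],
})

latest_year = toy['year'].max()           # most recent year present in the data
latest = toy[toy['year'] == latest_year]  # keep only the rows from that year
```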

In [None]:
df.head()

In [None]:
var_x=df.canton
var_y=df.taxable_income_chf

In [None]:
fig = px.bar(x=var_x,
             y=var_y)
layout_helper(fig).show()

### Sorting
This chart shows the total taxable income of each canton. However, the bars are in no particular order, so no clear message stands out.
In the next chart, we therefore sort the bars by value:

In [None]:
# sort once and reuse the sorted frame for both axes
df_sorted = df.sort_values(by='taxable_income_chf', ascending=False)
fig = px.bar(x=df_sorted.canton,
             y=df_sorted.taxable_income_chf)
layout_helper(fig).show()

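### Horizontal orientation for better readability
Although the vertical bar chart is the most common type, a *horizontal* bar chart is usually the better choice. The category labels are then written horizontally, which makes them much easier to read. Here we also keep only the five cantons with the highest taxable income. Plotly draws the first category of a horizontal bar chart at the bottom, so we sort in ascending order to put the largest bar at the top.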

In [None]:
df_sorted = df.sort_values(by='taxable_income_chf', ascending=True)
# keep the five cantons with the highest taxable income (the last five rows after sorting)
fig = px.bar(y=df_sorted.canton.tail(5),
             x=df_sorted.taxable_income_chf.tail(5),
             orientation='h')
layout_helper(fig).show()

### Add labels and remove unnecessary elements
Finally, we write the value of each bar directly onto the bar as a label. Once the values are on the bars, the x-axis is no longer needed and can be hidden.

In [None]:
df_sorted = df.sort_values(by='taxable_income_chf', ascending=True)
fig = px.bar(y=df_sorted.canton,
             x=df_sorted.taxable_income_chf,
             orientation='h', text_auto='.5s',
             width=700, height=600)

fig.show()

In fact, we need neither gridlines, nor axis titles, nor x-axis tick labels. They only distract from the message.

In [None]:
df_sorted = df.sort_values(by='taxable_income_chf', ascending=True)
fig = px.bar(y=df_sorted.canton,
             x=df_sorted.taxable_income_chf,
             orientation='h',
             text_auto='.5s', width=700, height=600)

# the values are shown on the bars, so the x-axis is no longer needed
fig.update_xaxes(visible=False)

layout_helper(fig).show()

Now compare this chart with a pie chart of the same data. Which one do you find easier to read?

In [None]:
fig = px.pie(df, values='taxable_income_chf', names='canton',width=800,height=800)
fig.show()

## Exercises
Now we turn to the *communities* dataset. Please answer the following questions by choosing an appropriate visualisation type.

1) Which are the top 10 communities by taxable income per capita in 2018?
You can get community data from https://thomann-public.s3.eu-west-1.amazonaws.com/jst-mapviz/communities.csv
2) How could you highlight a particular community, for example Wollerau, in this chart?
Consider using preattentive attributes.

Hint: remove NaN values from the DataFrame first and filter it to the year of interest!
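The hint can be sketched as follows on a toy DataFrame. The column names `community`, `year` and `taxable_income_per_capita_chf` are hypothetical; the real `communities.csv` may use different names, so check them with `df.columns` first.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real communities.csv may use different column names
toy = pd.DataFrame({
    'community': ['A', 'B', 'C', 'D'],
    'year': [2018, 2018, 2017, 2018],
    'taxable_income_per_capita_chf': [50_000, np.nan, 60_000, 45_000],
})

# Drop rows where the value we want to plot is missing ...
clean = toy.dropna(subset=['taxable_income_per_capita_chf'])
# ... then keep only the year of interest
clean_2018 = clean[clean['year'] == 2018]
```

Restricting `dropna` to the column you actually plot avoids discarding rows that only lack values in unrelated columns.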