# Python 101
## Part IX.
---
## Dataframes and visualization
### Act I: Use the pandas, Luke!

<img src="pics/pandas.png" align="left">
<br style="clear:left;"/>

In [None]:
import pandas as pd

#### Part I. Basic pandas operations
- read csv data into a pandas dataframe

In [None]:
data = pd.read_csv('./data/vote2018.csv', quotechar='"', delimiter=';', encoding='utf-8')
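
The `quotechar` and `delimiter` arguments matter when a field contains the delimiter itself. Below is a minimal sketch on a made-up, in-memory sample. `io.StringIO` stands in for the file, and the names, parties and numbers are invented. It assumes only that the real file is semicolon-separated, as the call above implies.

```python
import io

import pandas as pd

# Invented sample, NOT the real vote2018.csv: semicolon-separated,
# with double quotes around a field that itself contains a ';'.
raw = io.StringIO(
    'name;party;region;votes\n'
    '"Kovács; Anna";PARTY-A;Budapest;120\n'
    'Nagy Béla;PARTY-B;Pest;95\n'
)

# encoding= is omitted here because StringIO already holds decoded text;
# it is only needed when reading a file from disk.
sample = pd.read_csv(raw, quotechar='"', delimiter=';')
print(sample)
```

Because of the quotes, `Kovács; Anna` stays one field instead of being split into two columns.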

- show the first 5 rows

In [None]:
data.head()

- get a dataframe's column names

In [None]:
data.columns

- select a subset of columns

In [None]:
data[['name', 'votes']].head()

- select columns with a list comprehension (here: every column except `subid`)

In [None]:
cols_i_want = [col for col in data.columns if col != 'subid']
cols_i_want

In [None]:
data[cols_i_want].head()

__Caution!__ `data[cols]` returns a new dataframe and leaves `data` itself unchanged!  
Assign the result (`data = data[cols]`) if you want to keep working on the subset.

In [None]:
data.head()

In [None]:
data = data[cols_i_want]
data.head()
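
Dropping a single column can also be done with `DataFrame.drop(columns=...)`. The sketch below uses a small made-up dataframe. It shows that both forms give the same result and that neither changes the original until you assign it back.

```python
import pandas as pd

# Made-up stand-in for the vote data (values are invented).
toy = pd.DataFrame({
    'subid': [1, 2, 3],
    'name': ['A', 'B', 'C'],
    'votes': [10, 20, 30],
})

# Both expressions return a NEW dataframe without 'subid'.
kept = toy[[col for col in toy.columns if col != 'subid']]
dropped = toy.drop(columns='subid')

print(kept.equals(dropped))  # True
print(list(toy.columns))     # ['subid', 'name', 'votes'], toy is unchanged
```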

- use aggregation functions  
_How many people voted?_

In [None]:
data['votes'].sum()

- group values to get more insight  
_Let's get the sum of the votes for each party!_

In [None]:
data[['party', 'votes']].groupby('party').sum().head(10)
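
What `groupby` does is easier to see on a few invented rows. The sketch also shows a detail that matters in the next step. Picking the column after `groupby` gives a Series. The double-bracket form used above keeps a one-column dataframe, and `sort_values('votes')` needs that dataframe.

```python
import pandas as pd

# Invented rows (one row per candidate); not taken from vote2018.csv.
toy = pd.DataFrame({
    'party': ['A', 'B', 'A', 'C', 'B'],
    'votes': [100, 40, 60, 10, 5],
})

# groupby() splits the rows by party; sum() adds up the votes in each group.
# Selecting the column after grouping gives a Series indexed by party ...
per_party = toy.groupby('party')['votes'].sum()

# ... while the double-bracket form keeps a DataFrame with a 'votes' column.
per_party_df = toy[['party', 'votes']].groupby('party').sum()

print(per_party)
print(per_party_df)
```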

- ordering dataframes  
_Order results by the number of votes!_

In [None]:
party_votes = (
    data
    [['party', 'votes']]
    .groupby('party')
    .sum()
    .sort_values('votes', ascending=False)
)
party_votes.head()
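
When you only need the top rows, `DataFrame.nlargest(n, column)` is a shortcut for sorting in descending order and taking the head. A small sketch on invented totals compares the two forms.

```python
import pandas as pd

# Invented candidate rows, aggregated to party totals as above.
toy = pd.DataFrame({
    'party': ['A', 'B', 'A', 'C', 'B', 'D'],
    'votes': [100, 40, 60, 10, 5, 70],
})
party_totals = toy.groupby('party')[['votes']].sum()

# Sort-then-head, as in the cell above ...
by_sort = party_totals.sort_values('votes', ascending=False).head(2)
# ... and nlargest, which returns the same top rows directly.
by_nlargest = party_totals.nlargest(2, 'votes')

print(by_nlargest)
```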

In [None]:
len(data['party'].unique())

In [None]:
data.party.nunique()
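
The two cells above give the same number only if the `party` column has no missing values. `unique()` keeps `NaN` as one of the values, but `nunique()` skips it by default. A sketch on an invented column shows the difference.

```python
import numpy as np
import pandas as pd

# Invented party column with one missing value.
parties = pd.Series(['A', 'B', 'A', np.nan])

print(len(parties.unique()))          # NaN is counted as a value here
print(parties.nunique())              # NaN is skipped (dropna=True by default)
print(parties.nunique(dropna=False))  # NaN is counted again
```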

#### Part II. Plotting results

Use a Jupyter "magic" command (`%matplotlib inline`) to draw the plots inside the notebook.  
Also load the plotting libraries `matplotlib` and `seaborn`.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

- simple barplot

In [None]:
party_votes.plot(kind='bar');
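
`DataFrame.plot` can also draw into a matplotlib axes you create yourself. This lets you set labels and a title on the same figure. The sketch uses invented totals in place of `party_votes`. It selects the off-screen `Agg` backend only so that it also runs outside Jupyter, so leave that line out in the notebook.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Invented totals standing in for `party_votes`.
totals = pd.DataFrame({'votes': [160, 70, 45]},
                      index=pd.Index(['A', 'D', 'B'], name='party'))

fig, ax = plt.subplots(figsize=(6, 4))
totals.plot(kind='bar', ax=ax, legend=False)  # draw into our own axes
ax.set_ylabel('votes')
ax.set_title('Votes per party (made-up data)')
fig.tight_layout()
```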

- filtering and plotting  
_Only plot parties with more than 10000 votes!_

We can filter dataframes with `dataframe.loc[condition]`, where `condition` is a boolean Series built from a logical expression on one (or more) column(s). Only the rows where it is `True` are kept.

In [None]:
condition = party_votes['votes'] > 10000
condition.head(15)

In [None]:
vote10k = party_votes.loc[condition]
vote10k.plot(kind='bar');
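
Conditions on one or more columns are combined with `&` (and) and `|` (or). Each comparison needs its own parentheses, because `&` and `|` bind tighter than `>` and `<`. The sketch uses invented totals and also shows the same filter written with `DataFrame.query`.

```python
import pandas as pd

# Invented party totals.
toy = pd.DataFrame(
    {'votes': [160, 70, 45, 10]},
    index=pd.Index(['A', 'D', 'B', 'C'], name='party'),
)

# Two boolean Series combined element-wise with &.
mid_range = toy.loc[(toy['votes'] > 20) & (toy['votes'] < 100)]

# The same filter written as a query string.
mid_range_q = toy.query('votes > 20 and votes < 100')

print(mid_range)
```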

_Plot the top 6 parties!_

In [None]:
top6 = party_votes.head(6)
top6.plot(kind='bar');

---
### Act II: The devil lies in the details!

<img src="pics/evil-panda.png" width=300 height=300 align="left">
<br style="clear:left;"/>

- Nested grouping operations  
_Consider the regional data too._

In [None]:
regional = (
    data
    [['party', 'region', 'votes']]
    .groupby(['region', 'party'])
    .sum()
)
regional.head(10)
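
Grouping by two keys gives a two-level (MultiIndex) row index. If you would rather keep the keys as ordinary columns, `groupby(..., as_index=False)` does that directly. The sketch on invented rows checks that the result matches `reset_index()` on the nested version.

```python
import pandas as pd

# Invented rows; region and party names are made up.
toy = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'North'],
    'party':  ['A', 'B', 'A', 'A', 'A'],
    'votes':  [10, 20, 30, 5, 1],
})

# Default: (region, party) becomes a nested row index.
nested = toy.groupby(['region', 'party'])[['votes']].sum()

# as_index=False keeps region and party as regular columns.
flat = toy.groupby(['region', 'party'], as_index=False)[['votes']].sum()

print(nested)
print(flat)
```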

_Keep only the rows with more than 5000 votes!_

In [None]:
regional5k = regional.loc[regional.votes > 5000]
regional5k.head(10)

- Pivot  
To pivot this dataframe, we first need to flatten the nested (multi-level) index into ordinary columns:

In [None]:
regional5k.reset_index().head()

Now we can pivot this flattened dataframe:

In [None]:
(regional5k
 .reset_index()
 .pivot(index='region', columns='party', values='votes'))

Plot the results:

In [None]:
(
    regional5k
    .reset_index()
    .pivot(index='region', columns='party', values='votes')
    .plot(kind='barh', figsize=(8, 6))
);
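
An alternative to `reset_index()` plus `pivot()` is to unstack the `party` level of the nested index straight into columns. The sketch below uses invented numbers and checks that both routes give the same table. Missing (region, party) pairs become `NaN` in both.

```python
import pandas as pd

# Invented, already-aggregated data with a (region, party) index.
toy = pd.DataFrame({
    'region': ['North', 'North', 'South'],
    'party':  ['A', 'B', 'A'],
    'votes':  [11, 20, 35],
}).set_index(['region', 'party'])

# Route used above: flatten the index, then pivot.
via_pivot = toy.reset_index().pivot(index='region', columns='party', values='votes')

# Shortcut: move the 'party' index level into the columns.
via_unstack = toy['votes'].unstack('party')

print(via_unstack)
```

Note that `pivot` raises an error if a (region, party) pair occurs more than once. In that case `pivot_table`, which aggregates the duplicates, is the usual choice.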

---

## Let's do some...

<img align="left" width=150 src="http://www.reactiongifs.com/r/mgc.gif">

<br style="clear:left;"/>

### Act III: Cool library of the week: <a href="https://mzucker.github.io/2016/09/20/noteshrink.html">noteshrink</a>
#### Export your notes into readable pdfs!

To install:
- install Pillow (in your shell, run `conda install pillow`)
- download and unzip the library (from <a href="https://github.com/mzucker/noteshrink/archive/master.zip">here</a>); the archive unzips to a folder named `noteshrink-master`, so rename it to `noteshrink` to match the paths below
- optional: comment out line 578 in `noteshrink/noteshrink.py`
- optional: install with pip: `pip install -e noteshrink`

Then run it with:  
`python noteshrink/noteshrink.py IMAGE [IMAGE ...] -b OUTPUT_PREFIX`  
For example:

In [None]:
!python ./noteshrink/noteshrink.py noteshrink/examples/notesA1.jpg noteshrink/examples/notesA2.jpg -b example

---
## Final Act: The pandas is strong with this one!

<img src="http://2.bp.blogspot.com/-pgK8KdMmSn8/TsFTOwrGk9I/AAAAAAAABAk/5ondVGyw6w8/s320/Darth+Panda.jpg" align="left">

<br style="clear:left;"/>

## It's your turn - write the missing code snippets!

#### 1.  Plot the number of voters in each region!

#### 2. Who would win if Fidesz didn't participate in the election?

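Hint: you can negate a filter with `~`, but wrap the comparison in parentheses, because `~` binds tighter than `==`: `~(data['party'] == 'FIDESZ-KDNP')`. Alternatively, use `data['party'] != 'FIDESZ-KDNP'`.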
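
Here is a sketch of that negated filter on invented rows. Only the `FIDESZ-KDNP` label comes from the notebook; the other party names and all vote counts are made up.

```python
import pandas as pd

# Invented rows; only the 'FIDESZ-KDNP' label is taken from the notebook.
toy = pd.DataFrame({
    'party': ['FIDESZ-KDNP', 'PARTY-B', 'PARTY-C', 'FIDESZ-KDNP'],
    'votes': [100, 40, 60, 90],
})

# Build the equality mask first, then flip it with ~.
# Written inline it must be ~(toy['party'] == ...), with the parentheses.
is_fidesz = toy['party'] == 'FIDESZ-KDNP'
without_fidesz = toy.loc[~is_fidesz]

# Equivalent, and often easier to read:
without_fidesz_2 = toy.loc[toy['party'] != 'FIDESZ-KDNP']

print(without_fidesz)
```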

#### 3. Who would win in each region if Fidesz didn't participate in the election?

#### 4. How many wins could the opposition get in case of perfect cooperation?

#### 5. Who were the most successful candidates? (top10)

#### 6. List the number of subregions each party participated in!
Hint: the groupby aggregation `.count()` returns the number of non-null items in each group.
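
There are three related group counts, and the sketch below compares them on invented rows. `.count()` counts non-null values per column, `.size()` counts rows, and `.nunique()` counts distinct values. It assumes `subid` identifies a subregion, which the notebook does not state. If a party can appear more than once in the same subregion, counting rows overcounts and `nunique()` is the safer choice.

```python
import numpy as np
import pandas as pd

# Invented rows; we ASSUME 'subid' identifies the subregion.
toy = pd.DataFrame({
    'party': ['A', 'A', 'A', 'B'],
    'subid': [1, 1, 2, 3],
    'votes': [10, np.nan, 5, 7],
})

grouped = toy.groupby('party')
print(grouped.count())             # non-null values per column and group
print(grouped.size())              # rows per group, NaN or not
print(grouped['subid'].nunique())  # distinct subid values per group
```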

#### 7. List the most successful regions for each party!