# Data Science and Visualization (RUC F2023)

## Lecture 3: Data Visualization

# Advanced Visualizations

* ### Advanced bar chart
* ### Pairplot
* ### Correlation heatmap


We demonstrate these plots with a real dataset of gas prices in selected countries from 1990 to 2000. We mainly use the **seaborn** library, which provides functions for more advanced and visually polished plots.

## 0. Setup and load the data

```python
import pandas as pd
import seaborn as sns

# 'C:/Data/gas_prices.csv' is where I put my data file. Change it to the path of the folder that contains your copy of the file.
gas = pd.read_csv('C:/Data/gas_prices.csv')

gas.head()
```

```python
gas.shape  # (number of rows, number of columns)
```
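If you do not have the CSV file at hand, or want to check what you just loaded, the following sketch builds a small stand-in DataFrame and inspects it. The stand-in is hypothetical: it only assumes the layout used throughout this notebook (a `Year` column plus one column per country), and its prices are made up. The inspection lines work unchanged on the real `gas`.

```python
import pandas as pd

# Hypothetical stand-in for gas_prices.csv: same layout ('Year' plus one
# column per country), but the prices are made up for illustration.
gas = pd.DataFrame({
    'Year': [1990, 1991, 1992],
    'France': [3.6, 3.3, 3.4],
    'UK': [2.8, 3.0, 3.1],
    'USA': [1.2, 1.1, 1.1],
})

print(gas.shape)                              # (rows, columns)
print(gas['Year'].min(), gas['Year'].max())   # range of years covered
print(gas.isna().sum())                       # missing values per column
print(gas.dtypes)                             # price columns should be numeric
```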

## 1. Advanced bar chart

Here we show how to plot a bar chart in a 'transformed' manner, i.e., with horizontal bars.

### 1.1 Plot such a bar chart of all countries' gas prices in 1990

We take the first row (the year 1990) and leave out the Year column:

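```python
# First row (1990) without the 'Year' column.
# .drop('Year') works by label, so it does not depend on 'Year' being the first column.
data = gas.iloc[0].drop('Year')
data
```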
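Selecting "the first row" only gives 1990 if the file is sorted by year. A sketch of a label-based alternative, on a hypothetical two-year frame with made-up prices: setting `Year` as the index lets you ask for 1990 directly, and the result matches the positional selection when the rows are in order.

```python
import pandas as pd

# Hypothetical stand-in data (made-up prices) with the same layout as gas_prices.csv
gas = pd.DataFrame({
    'Year': [1990, 1991],
    'France': [3.63, 3.31],
    'UK': [2.82, 3.04],
    'USA': [1.16, 1.14],
})

# Positional: first row, every column after the first ('Year')
by_position = gas.iloc[0, 1:]

# By label: independent of row order and of where the 'Year' column sits
by_year = gas.set_index('Year').loc[1990]

print(by_year)
```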

We call seaborn's **barplot()** function to plot it. To get horizontal bars, we can set the parameter *orient* to 'h'. This parameter does not exist for **bar()** in matplotlib.pyplot, which instead provides a separate **barh()** function for horizontal bars.

NB: If *orient* is not specified, barplot() infers the orientation from the types of *x* and *y*. Here *y* holds the country names (strings) and *x* holds the prices (numbers), so the bars come out horizontal even without orient='h'.

```python
# orient='h' is optional here: seaborn infers horizontal bars from the data types
sns.barplot(y=data.index, x=data.values)
```
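To compare the two libraries side by side, here is a sketch with a few hypothetical (made-up) prices: the left panel uses seaborn with orient='h' spelled out, the right panel uses matplotlib's barh(), which draws horizontal bars without any orient parameter.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical prices (made up), indexed by country like `data` above
data = pd.Series({'France': 3.63, 'UK': 2.82, 'USA': 1.16})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))

# seaborn: strings on y, numbers on x; orient='h' makes the orientation explicit
sns.barplot(y=data.index, x=data.values, orient='h', ax=ax1)
ax1.set_title('seaborn barplot(orient="h")')

# matplotlib: barh() always draws horizontal bars
ax2.barh(data.index, data.values)
ax2.set_title('matplotlib barh()')

plt.close(fig)
```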

We may also want to sort the bars, e.g. from the highest price to the lowest:

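```python
# Avoid naming the variable `sorted`, which would shadow Python's built-in sorted()
sorted_data = data.sort_values(ascending=False)

sns.barplot(y=sorted_data.index, x=sorted_data.values)
```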
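Besides a full descending sort, a common variant is to plot only the most expensive countries. This sketch uses made-up prices; Series.nlargest() returns the N largest values already in descending order, so they can be passed straight to barplot().

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical prices (made up), one value per country like `data` above
data = pd.Series({'France': 3.63, 'Japan': 3.18, 'UK': 2.82, 'USA': 1.16, 'Canada': 1.87})

# Full descending sort: the most expensive country is listed first
sorted_data = data.sort_values(ascending=False)

# Only the 3 most expensive countries, already in descending order
top3 = data.nlargest(3)

fig, ax = plt.subplots()
sns.barplot(y=top3.index, x=top3.values, ax=ax)
ax.set_xlabel('Gas price')
plt.close(fig)
```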

### 1.2 (Exercise) Plot a 'transformed' bar chart of the USA's gas prices over all these years

## 2. Pairplot

### 2.1 Compare European countries with the UK

```python
import matplotlib.pyplot as plt

# Compare European countries with the UK: one scatter plot of UK prices against each country
sns.pairplot(gas, x_vars=['France', 'Germany', 'Italy'], y_vars='UK', height=4.5, aspect=1)
# If the first subplot does not show anything, add diag_kind=None to the call above

# Save the figure to a file:
# dpi - resolution of the saved image in dots per inch
# bbox_inches='tight' - keeps the axis labels from being cropped
plt.savefig('gas_prices_EU.png', dpi=300, bbox_inches='tight')
```
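pairplot() returns a seaborn PairGrid, whose `axes` array has one row per y variable and one column per x variable, and which has its own savefig() method. The sketch below, on a hypothetical frame with made-up prices and a temporary output path, saves through the grid instead of plt.savefig(), so it does not depend on which figure is "current".

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical prices (made up) using the same column names as gas_prices.csv
gas = pd.DataFrame({
    'Year': [1990, 1991, 1992, 1993],
    'France': [3.6, 3.3, 3.2, 3.1],
    'Germany': [2.6, 2.9, 3.3, 3.1],
    'Italy': [4.6, 4.5, 4.5, 3.7],
    'UK': [2.8, 3.0, 3.1, 2.8],
})

# One row (UK) by three columns (France, Germany, Italy) of scatter plots
g = sns.pairplot(gas, x_vars=['France', 'Germany', 'Italy'], y_vars='UK', height=4.5, aspect=1)

# Save via the returned grid; the temp-dir path is just for this sketch
out_path = os.path.join(tempfile.gettempdir(), 'gas_prices_EU.png')
g.savefig(out_path, dpi=100, bbox_inches='tight')
```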

What patterns do you see in the plots above?

### Compare Asian countries with Mexico

```python
# Compare Asian countries with Mexico
sns.pairplot(gas, x_vars=['Japan', 'South Korea'], y_vars='Mexico', height=4.5, aspect=1)
```

### 2.2 (Exercise) Make a pairplot to compare Asia-Pacific countries with the USA, and another one to compare European countries (incl. the UK) with the USA.

What can you tell from the plots?

## 3. Heatmap: Another way to visualize pairwise correlation.

We can plot a heatmap of the correlation matrix: each cell shows the correlation coefficient between the gas prices of two countries, ranging from -1 to 1.
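To see what a single cell of this heatmap contains, the following sketch computes the Pearson correlation coefficient of two hypothetical (made-up) price series by hand and compares it with pandas' corr(), whose default method is 'pearson'.

```python
import numpy as np
import pandas as pd

# Two hypothetical price series (made-up numbers)
france = pd.Series([3.6, 3.3, 3.2, 3.1, 3.4])
uk = pd.Series([2.8, 3.0, 3.1, 2.8, 3.2])

# Pearson r = sum((x - mean_x) * (y - mean_y))
#             / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))
dx = france - france.mean()
dy = uk - uk.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas uses method='pearson' by default, so the two values should agree
r_pandas = france.corr(uk)
print(r_manual, r_pandas)
```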

```python
# Drop the Year column, as we only want to see the correlation among countries
data = gas.drop(columns=['Year'])

# Set the size of the figure
plt.figure(figsize=(16, 6))

# vmin/vmax fix the color scale to the full range of a correlation; annot=True prints each value in its cell
heatmap = sns.heatmap(data.corr(), vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 18}, pad=12);

# Save the heatmap as a .png file
plt.savefig('heatmap_gas_prices.png', dpi=300, bbox_inches='tight')
```
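Because a correlation matrix is symmetric, every value above the diagonal repeats one below it. A common refinement is to hide one triangle with the `mask` parameter of sns.heatmap(). This is a sketch on made-up prices, not part of the original analysis; `k=1` in np.triu keeps the diagonal visible, and `fmt='.2f'` rounds the annotations.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical prices (made up); 'Year' already dropped, as in the cell above
data = pd.DataFrame({
    'France': [3.6, 3.3, 3.2, 3.1, 3.4],
    'Germany': [2.6, 2.9, 3.3, 3.1, 3.5],
    'UK': [2.8, 3.0, 3.1, 2.8, 3.2],
    'USA': [1.2, 1.1, 1.1, 1.1, 1.2],
})
corr = data.corr()

# True strictly above the diagonal -> those cells are hidden
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, annot=True, fmt='.2f', cmap='BrBG', ax=ax)
ax.set_title('Correlation Heatmap (lower triangle)')
plt.close(fig)
```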