# Overview of Visualisation
In this notebook, we will practice data visualisation in Python using English Premier League data used for Fantasy football teams. Information on the dataset can be found [here](https://www.kaggle.com/mauryashubham/english-premier-league-players-dataset).

This notebook will be focusing on the following visualisation packages: 


1. [**Seaborn**](https://seaborn.pydata.org/index.html) for making easy, visually appealing graphics 
    * Better default graphics, and a larger variety of graphs to enhance data communication 
    * More customisable and visually appealing (e.g., [colour palettes](https://seaborn.pydata.org/tutorial/color_palettes.html) & [aesthetics](https://seaborn.pydata.org/tutorial/aesthetics.html))



2. [**Plotly Express**](https://plot.ly/python/plotly-express/) for making interactive, publication-quality graphics 
    * Has more graphing options not typically found in other packages 
    * Can make your visuals [interactive and animated](https://plot.ly/python/animations/)
    * *Optional*: Plotly Dash to make dashboards for your plotly graphics



These other packages could also be useful. Take a look at them, or try them out in the online practice!
3. [**Bokeh**](https://bokeh.pydata.org/en/latest/index.html), another package for making interactive plots
4. [**ggplot**](https://github.com/hadley/ggplot) (a graphics package from R, made useable in Python)

# Load the libraries
Start by loading the libraries that are needed for all the visualisation tools we will be using.

In [None]:
import pandas as pd                        # basic data manipulation
import numpy as np                         # basic data manipulation

import seaborn as sns                      # for seaborn visualisations
%matplotlib inline 
      # inline is to render all figures inside the notebook (required for seaborn too)

import plotly as py                        # needed to export interactive graphics as html 
import plotly.express as px                # for plotly express visualisations

# Import & Clean the Data
Next we need to import our data, the Fantasy English Premier League dataset. Let's start by downloading it from [Google Sheets](https://decd.co/epl-data). Make sure to save it to the same folder as this notebook. 

To import our data in Google Sheets or Excel, we might click the import button. In Python, we can use `pd.read_csv`.

After you've loaded in the data, have a look at it, e.g., `head()`, `tail()`, `sample()`
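
As a rough illustration of the import-and-inspect step, the sketch below reads a tiny in-memory CSV instead of the real file. The rows and column names are made up for the example and are not taken from the actual dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded CSV file (made-up rows and columns)
csv_text = io.StringIO(
    "name,club,position,market_value\n"
    "Player A,Arsenal,FW,12.5\n"
    "Player B,Chelsea,MF,8.0\n"
    "Player C,Everton,DF,\n"
)

# index_col=0 turns the first column ("name") into the row index
toy = pd.read_csv(csv_text, index_col=0)

print(toy.head(2))          # first rows
print(toy.tail(1))          # last rows
print(toy.sample(1))        # one random row
print(toy.isnull().sum())   # number of missing values per column
```

With the real data, you would pass the file name of the downloaded CSV to `pd.read_csv` instead of the in-memory text.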

In [None]:
# Import the data
df = pd.read_csv("", index_col=0)

In [None]:
# Check your data
df.head()

In [None]:
# Drop any null values 
df = df.dropna()

In [None]:
# Check to see if there are any remaining null values
df.isnull().sum()

In [None]:
# Replace '+' symbols in the 'club' values with a space
# regex=False treats '+' as a literal character, not a regex quantifier
df["club"] = df["club"].str.replace('+', ' ', regex=False)

In [None]:
# Check the unique values of 'club' to make sure the replacement worked
df["club"].unique()
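
To see what this kind of clean-up does in isolation, here is a small sketch on a made-up Series (the club strings are illustrative only). It assumes the `+` should be matched literally, which is why `regex=False` is passed.

```python
import pandas as pd

# Made-up club names containing '+' in place of spaces
clubs = pd.Series(["Manchester+United", "West+Ham", "Arsenal"])

# regex=False makes pandas treat '+' as plain text rather than a regex symbol
cleaned = clubs.str.replace("+", " ", regex=False)

print(cleaned.unique())
```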

# Types of Graphs for Exploratory Data Analysis

Now that we're all set up, let's dive into some visualisations. Keep in mind which graphs work best with certain types of data. The `plot` method is built into the Pandas library and is great for quick plotting to get a feel for your data. Let's spend a few minutes using the built-in plots just to see what kind of data we have.

**Categorical data** (objects) is best displayed using:
* Bar Charts
* Stacked Bar Charts
* Grouped Bar Charts

**Numeric data** (integers/floats) is best displayed using:
* Histograms
* Line graphs
* Area plots
* Scatter plots
* Box plots
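
One way to check which features fall into each group is to split the columns by data type. The sketch below uses a made-up DataFrame; the column names are assumptions for illustration, not the real dataset's schema.

```python
import pandas as pd

# Made-up data with two text (categorical) and two numeric features
toy = pd.DataFrame({
    "club": ["Arsenal", "Chelsea", "Everton"],
    "position": ["FW", "MF", "DF"],
    "age": [24, 29, 31],
    "market_value": [12.5, 8.0, 3.2],
})

# Columns that are not numeric are treated as categorical here
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

print("Categorical:", categorical_cols)
print("Numeric:", numeric_cols)
```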

**Consider the following**: 
* What if you have ***both*** numeric and categorical data? With what kinds of graphs could you graph both types of data?
* Which types of graphs are best for visualising 1 feature? For 2 or more features?

A good visualisation resource is [data-to-viz](https://www.data-to-viz.com), a website that helps you choose appropriate graphs based on the data you have.

Try creating some quick plots (e.g., scatter plots, bar plots, histograms). What do you notice about them? Are they useful?

In [None]:
# Basic histogram 
df.plot(y="___", kind="hist")

In [None]:
# Basic scatter plot
df.plot.scatter(x="___", y="___")

In [None]:
# Other basic plots (e.g., bar plot, line graph)

While the in-built plots are useful, they are not that visually appealing. They are useful for gleaning insights from your data, but they are likely not something you would want to present in a formal report.  

This is where other visualisation packages like Seaborn and Plotly come into play.

# Seaborn: Communicating Information through Visuals
Seaborn is a powerful and easy-to-use graphing package for making great visuals. Use [this website](https://seaborn.pydata.org/examples/index.html) for inspiration on the many types of graphs you can make, with sample code. The examples below are the graphs you will most likely use in your analysis. 

**Basic examples of seaborn graphs include:**
* Scatter plots: `sns.relplot(data=, x=, y=, ...)`   
* Line plots: `sns.lineplot(data=, x=, y=, ...)` 
* Histograms: `sns.displot(data=, x=, ...)`   
* Bar plots (defined y-axis): `sns.catplot(data=, x=, y=, kind="bar", ...)` 
* Bar plots (default count y-axis): `sns.countplot(data=, x=, ...)`   
* Box plots: `sns.boxplot(data=, x=, y=, ...)` 
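
The calls listed above can be sketched on a small made-up DataFrame as follows. The column names and values are assumptions for illustration only. Each plot is drawn on its own figure so they do not overlap when run in a single cell.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data: two numeric features and one categorical feature
toy = pd.DataFrame({
    "age": [21, 24, 27, 30, 33, 25],
    "market_value": [5.0, 12.5, 20.0, 15.0, 4.0, 9.5],
    "position": ["FW", "FW", "MF", "MF", "DF", "DF"],
})

# Scatter plot: relplot creates its own figure and returns a FacetGrid
g = sns.relplot(data=toy, x="age", y="market_value", hue="position")

# Count plot: drawn on a separate, explicitly created set of axes
fig, ax = plt.subplots()
sns.countplot(data=toy, x="position", ax=ax)
```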


#### Optional: Using Seaborn's colour palettes

The default colours aren't the most pleasing, but you can use some built-in colour palettes to change them. Use `palette=" "` and enter the name of the palette (either a seaborn pre-made palette or a custom one) inside the `" "` to change the colours to your liking. These can be used on many types of seaborn graphs, which can be seen below.


***Built-in multicolor palettes:***
Example: `palette="muted"`. Other options: `deep`, `pastel`, `bright`, `dark`, `colorblind`, `hls`, `Paired`, and `Set2`.

***Built-in sequential, monochrome, and other colour palettes:***
Example: `palette="GnBu"`, although there are many other options. To make a palette darker, add `_d` after the name (e.g., `GnBu_d`). To reverse the colour order, add `_r` after the name (e.g., `GnBu_r`). 
* *Note: If you enter an invalid palette name, you will get an error. Depending on your seaborn and matplotlib versions, the error message may list the valid palette options.*

***Custom palettes:***
Create a list of colours and save it as a variable. Make sure the number of colours you select matches the number of feature levels (e.g., if a feature has 5 groups, select 5 colours). Then pass that variable as the palette in your graph. Remember, when you use this method, do not put `""` around your palette name. 
* Step 1. `yourcolours = ["red", "orange", "yellow", "green", "blue"]` 
* Step 2. `sns.catplot(....palette = yourcolours,...)`
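
The three palette options described above can be inspected directly with `sns.color_palette`, which returns the colours as RGB tuples. This is a minimal sketch; the choice of 5 colours per palette is arbitrary.

```python
import seaborn as sns

# Built-in multicolour palette, limited to 5 colours
muted = sns.color_palette("muted", n_colors=5)

# Built-in sequential palette in reversed order (note the "_r" suffix)
reversed_seq = sns.color_palette("GnBu_r", n_colors=5)

# Custom palette built from a list of colour names
yourcolours = ["red", "orange", "yellow", "green", "blue"]
custom = sns.color_palette(yourcolours)

print(custom[0])   # RGB values of the first custom colour
```

In a notebook, `sns.palplot(custom)` displays the colours as a strip, which can help when choosing between palettes.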

### Scatterplots

In [None]:
# basic scatterplot 
sns.relplot(data=___, x="___", y="___", 
            hue="___",          # Add a categorical feature by using 'hue=__' 
            size="___")         # Add a scaling feature by using 'size=__' 

In [None]:
# add a pre-made colour palette 
sns.relplot(data=___, x="___", y="___", 
            hue="___", 
            size="___", 
            palette="___")   

### Histograms

In [None]:
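# The default distplot histogram overlays a kernel density estimate (KDE) curve and uses the x-axis feature name as the label
# Note: distplot is deprecated since seaborn 0.11; histplot and displot are the newer alternatives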
sns.distplot(df.___, color="___",
             bins=___)          # Change the number of bins (bars) to see what happens to the histogram

In [None]:
# In the above histogram, add the parameter 'kde=False' to display the frequency only

In [None]:
# In the above histogram, add the parameter 'hist=False' to display the density curve only

### Regressions & Line Graphs

In [None]:
# Basic regression plot 
sns.regplot(data=___, x="___", y="___", color="___",
            marker="___",    # change the data point shape with marker="___" (e.g., "o", "x", "+")
            ci=___)          # change the confidence interval of the regression with 'ci=' a number

In [None]:
# Basic line plot
sns.lineplot(data=___, x="___", y="___", color="___")

In [None]:
# Add a categorical feature using 'hue=___' to the above line plot to show groups

# Add a 'ci' value (see the regression example above) and if you want, a new colour palette

### Bar Plots

In [None]:
# Regular bar plot: use a continuous feature on the y-axis  
sns.catplot(data=___, x="___", y="___", 
            height=___, 
            kind="bar", 
            palette="___")

In [None]:
# Grouped bar plot: like the bar plot above, but adds an extra parameter "hue=___" (a categorical feature)

In [None]:
# What other types of charts can you make with catplot? 
# Try setting kind= to one or more of the following: "point", "strip", "swarm", "box", "violin", or "boxen"

### Count Plots

In [None]:
# Basic count plot: the y-axis is automatically a count value, so you do not include a feature on the y-axis
sns.countplot(data=___, x="___")

In [None]:
# Add a categorical feature using the "hue=___" parameter. What other parameters can you add (e.g., palette)? 

# Plotly Express: Enhancing your Visual Communication
Now for the part you've probably been looking forward to the most: making animated graphs! [Plotly Express](https://plot.ly/python/plotly-express/) can take your visualisations to a whole other level. You can make them [animated](https://plot.ly/python/animations/) and interactive, save them as HTML files, or even host them in a web app. 


***Basic examples of plotly express graphs include:***
* Scatter plots: `px.scatter(df, x=, y=, ...)`
* Line graphs: `px.line(df, x=, y=, ...)`
* Area plots: `px.area(df, x=, y=, ...)`
* Bar plots: `px.bar(df, x=, y=, barmode=, ...)`
* Histograms: `px.histogram(df, x=, y=, barmode=, ...)`

To include categories/group levels in your graphs, use `color = ___`, (as opposed to `hue = ___` with seaborn) to group data by a categorical feature.


***When using scatter plots:***
You can also include smaller graphs of a different variety on the outer margins of your scatter plot. You do this within the main `px.scatter(   )` by including `marginal_y="___"` (for a graph on the y-axis), or `marginal_x="___"` (for a graph on the x-axis), and including the type of graph (e.g., histogram, box, violin), inside the `"___"`. 
* *Note: while it might look good to have graphs outside of other graphs, use this with caution, as it can be too much information and detract from your main findings*


## Below is an example of an animated bubble chart inspired by Hans Rosling's Gapminder:

More information on bubble charts (scatter plots with an added third dimension of "size") can be found [here](https://plot.ly/python/bubble-charts/). 

In [None]:
# This code is taken from the Plotly website. It is a great demonstration of using Plotly Express to tell a story.
gapminder = px.data.gapminder()

gap = px.scatter(gapminder, x="gdpPercap", y="lifeExp", 
           animation_frame="year", 
           animation_group="country", 
           size="pop",            # this scales/sizes the data points based on the value of a numeric feature
           color="continent",     # this is like "hue" in seaborn, to group by a categorical feature
           hover_name="country",  # adds labels from a categorical feature when you hover over a data point
           log_x=True,            # this re-scales the x-axis with log-transformation
           size_max=55,           # this sets the maximum scaling size for size="pop"
           range_x=[100,100000],  # this sets the min and max values to show in the x-axis
           range_y=[25,90],       # this sets the min and max values to show in the y-axis
           title="GDP Per Capita vs. Life Expectancy from 1952-2007, by Country",  # add a title 
           labels={"gdpPercap":"GDP Per Capita", "lifeExp": "Life Expectancy (Years)"})  # rename x-axis and y-axis 
gap.show()

## Now that we've seen what this package can do, let's try to tell a story with our own data by using these tools!

Let's do this in steps, practicing with the different parameters along the way.

### First, make a plotly express graph of your choosing. 
Here are some examples below, but feel free to try out other ones for yourself.

In [None]:
# Basic bubble chart using scatterplot - what other parameters can you add, remove, or change?
px.scatter(df, x="___", y="___", 
          color="___", 
          size="___",     
          marginal_x="rug")  # adds a 'rug plot' above the chart, with one tick mark per data point along the x-axis

In [None]:
# Basic bar plot - what other parameters can you add, remove, or change?
px.bar(df, x="___", y="___", 
       color="___",
       barmode="___",                   # barmode= tells the plot to display the bars in the chart a certain way
       color_discrete_sequence=px.colors.sequential.Plasma)  # plotly color palette parameter (for discrete scales)

In [None]:
# Here's an example of a heatmap, with darker colors having higher counts (or occurrences)
px.density_heatmap(df, x="___", y="___", 
                   color_continuous_scale="Purples")   # plotly color palette parameter (for continuous scales)

In [None]:
# Basic line graph - what other parameters can you add, remove, or change?
px.line(df, x="___", y="___")

In [None]:
# Basic histogram graph - what other parameters can you add, remove, or change?
px.histogram(df, x="___")

Try out other graphs in Plotly Express! Refer back to the website link above for more graphing ideas. 

### Second, add animation functions to the graphics

In [None]:
# Let's animate the scatterplot we made above, using animation_frame="___"
px.scatter(df, x="___", y="___", 
           color="___", 
           size="___", 
           animation_frame="___")  # moves between each group/value as a new frame

In [None]:
# What happens when you add animation_group too? How does it compare to the graph above? Is it more or less useful?
px.scatter(df, x="___", y="___", 
           color="___", 
           size="___", 
           animation_frame="___",  # moves between each group/value as a new frame
           animation_group="___")   # displays grouped data per frame as opposed to individual data points

In [None]:
# Let's try some different graphs with animations! How do you think the animations improve your graphs?
px.bar(df, x="___", y="___", 
       color="___",
       barmode="group",      # can change how bars are displayed by changing barmode
       animation_frame="___") 

What other animations can you come up with? What types of graphs work best with animations?

Also, try adding other parameters such as these:
* `hover_name=___` to show labels for different data points
* `size_max=___` (in scatter plots) to change the max scaling size for the data points (circles)
* add a title and axis labels to your graphs

In [None]:
# Try using the animation parameters with different types of graphs 

In [None]:
# To save your interactive graphs offline as an html file, use the following code (using "gap" graph as an example):
py.offline.plot(gap, filename='gap.html')  
   # py = the plotly library
   # gap = the name assigned to the gapminder demo graph
   # gap.html = the file name you want to assign (must end in html)