# Plotting

In this notebook, we'll start plotting things out using the `matplotlib` library. 

Most notebooks begin with lines like the ones below, so that readers can see early on which modules have been imported. Importing at the top keeps things neat and tells others which modules they need to run your code, so even if you realise later on that you need another module, you should still go back to the top and import it there.

`%matplotlib inline` is an IPython magic command that configures the plotting library, matplotlib, to show its plots inline, directly in the notebook.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
data = pd.read_csv("gdp_asia.csv", index_col = "country")
data.head()

<hr>

To plot, we just use the handy `.plot()` function, built into each DataFrame:
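Here is a minimal sketch of that call. It uses a small stand-in for `gdp_asia.csv` so that it runs on its own: the numbers are made up, and the `gdpPercap_<year>` column names are an assumption based on the columns used later in this notebook.

```python
import pandas as pd

# Stand-in for gdp_asia.csv (made-up values, assumed column names).
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1000.0, 2000.0],
        "gdpPercap_1977": [3000.0, 9000.0],
        "gdpPercap_2007": [8000.0, 40000.0],
    },
    index=pd.Index(["Malaysia", "Singapore"], name="country"),
)

# .loc picks out one row as a Series; .plot() draws it as a line,
# with the column labels along the x-axis.
ax = data.loc["Singapore"].plot()
```

With the real file, `data` would come from the `pd.read_csv` cell above instead.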

However, the graph looks like someone trampled all over some leaves. Our secondary school maths teachers would not be pleased. Let's fix things up a bit by making adjustments to the `plt` module. See if you can figure out what each line means.

Note the ordering--you plot _first_, in the first line, then you adjust the settings!
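The adjustments might look something like the sketch below. The particular settings chosen here (rotated tick labels, axis labels, a title) are only one possibility, and the DataFrame is a made-up stand-in for `gdp_asia.csv` with assumed column names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for gdp_asia.csv (made-up values, assumed column names).
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1000.0, 2000.0],
        "gdpPercap_1977": [3000.0, 9000.0],
        "gdpPercap_2007": [8000.0, 40000.0],
    },
    index=pd.Index(["Malaysia", "Singapore"], name="country"),
)

ax = data.loc["Singapore"].plot()   # plot first...
plt.xticks(rotation=90)             # ...then turn the tick labels sideways,
plt.xlabel("Year")                  # label the x-axis,
plt.ylabel("GDP per capita")        # label the y-axis,
plt.title("GDP per capita of Singapore")  # and add a title.
```

The `plt` functions act on the most recently used plot, which is why the plotting call has to come first.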

Now, how about showing more than one country at a time? You can just plot each of them separately.
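As a rough sketch, plotting two countries on the same axes could look like this. The data is again a made-up stand-in for `gdp_asia.csv`, and passing `ax=ax` to the second call is a choice made here to be explicit about which axes the line is drawn on.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for gdp_asia.csv (made-up values, assumed column names).
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1000.0, 2000.0],
        "gdpPercap_1977": [3000.0, 9000.0],
        "gdpPercap_2007": [8000.0, 40000.0],
    },
    index=pd.Index(["Malaysia", "Singapore"], name="country"),
)

# Each call draws one line; the second reuses the axes of the first.
ax = data.loc["Singapore"].plot()
data.loc["Malaysia"].plot(ax=ax)

# Each line is labelled with its row name, so the legend can tell them apart.
plt.legend()
plt.xticks(rotation=90)
```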

### <font color="red">Exercise 1: Singapore and Neighbours</font>

Plot the GDP per capita growth of Singapore vs. 2 of our ASEAN neighbours. 

Look through the `.plot` [documentation](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html) to make Singapore's plot line red, please. Happy National Day!!!

### <font color="red">Exercise 2: Minimum & Maximum GDP in Europe</font>

Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe. Modify it again to plot the maximum GDP per capita over time for Europe.

In [None]:
data_europe = pd.read_csv('gdp_europe.csv', index_col="country")

In [None]:
#hint: remember the .min() and .max() operations we could call on dataframes?

### <font color="red">Exercise 3: Other plots</font>

Try this out. What does this plot? What does the `s` parameter refer to? What is 1e6?

In [None]:
data_all = pd.read_csv('gdp_pop_all.csv', index_col="country")
data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
              s=data_all['pop_2007']/1e6)

#### Adding preset styles to your plots
Thankfully, matplotlib also has some preset themes that we can use to make our plots look nice.
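A small sketch of applying a preset follows. `ggplot` is just one of the style names that ships with matplotlib, picked here for illustration, and the DataFrame is a made-up stand-in for `gdp_asia.csv`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# List the style names available in this matplotlib installation.
print(plt.style.available)

# Apply a preset before plotting; it affects every plot drawn afterwards.
plt.style.use("ggplot")

# Stand-in for gdp_asia.csv (made-up values, assumed column names).
data = pd.DataFrame(
    {
        "gdpPercap_1952": [1000.0, 2000.0],
        "gdpPercap_1977": [3000.0, 9000.0],
        "gdpPercap_2007": [8000.0, 40000.0],
    },
    index=pd.Index(["Malaysia", "Singapore"], name="country"),
)

ax = data.loc["Singapore"].plot()
plt.title("GDP per capita of Singapore")
```

To go back to the defaults, `plt.style.use("default")` should restore matplotlib's standard look.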

## Plot.ly

<p><b>Getting Started:</b> https://plotly.com/python/getting-started-with-chart-studio/</p>
<p><b>Full Documentation:</b> https://plotly.com/python/chart-studio/ </p>

Plotly is a data visualization toolbox that’s compatible with Jupyter notebooks. It has an offline mode that allows you to save plots locally inside Jupyter notebooks as well as an online mode that allows you to save plots to a Plotly account. Plotly is nice because you can dynamically zoom in on plot sections and hover over your plot to identify specific data points.

Note: notebooks saved with plotly plots inside them can be quite large, because all the generated plot output is stored inside the notebook. To save without the plots, simply clear your output by going to Cell > All Output > Clear.

#### Installing

**If using in Google Colab:**
Plotly is available by default.

**If using in Anaconda:**
The chart-studio module doesn't come with the default installation of Anaconda. To install, open Command Prompt (or Terminal if on Mac) with admin rights and run the following command: ``conda install -c plotly chart-studio``

As an example, to install chart-studio on my Anaconda installation on Windows, I had to:
1. Open Command Prompt with admin privileges
2. Type: ``conda install -c plotly chart-studio``
3. When it asks for yes/no confirmation, type 'y'

If all goes well, you'll be able to import the libraries below!

In [None]:
#import modules
import pandas as pd
import plotly
import plotly.graph_objs as go

### <font color="red">Plotly Basics</font>

Remember our GDP plots up above using matplotlib? Let's try to reproduce them. We'll plot the GDP per capita growth of Singapore from 1952 to 2007 using plot.ly.

In [None]:
data = pd.read_csv("gdp_asia.csv", index_col = "country")


Plotly has great documentation. For example, if you want more info on text and annotation options, plot.ly/python/text-and-annotations/ has many examples.

### <font color="red">Exercise 4: Plotting Multiple Countries</font>

Plot the GDP per capita growth of Singapore alongside two other ASEAN countries from 1952 to 2007 using plotly:

### <font color="red">Exercise 5: Plotly Practice</font>

So far we've only plotted scatter plots with plotly. What about a bar graph? You'll notice we use `plotly.graph_objs.Scatter` in our examples above. To plot a bar graph, we'll use **plotly.graph_objs.Bar**. We'll be using the CSV hosted on plotly's GitHub below:

In [None]:
#Here's some starter code:

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")
df.head()
#This is a dataset hosted by plotly for practice use.

### <font color="red">Exercise 5b: Plotly Practice</font>

Now try splitting the data into three separate bars per entry: one for men, one for women, and one for the gap. For example, under the MIT entry there will be three bars, under the Stanford entry there will be another 3 bars, and so on.

In [None]:
# Hint: create a different trace for each bar

### More Features

### Formatting DataFrames

In [None]:
import plotly.figure_factory as ff
data = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")

table = ff.create_table(data)
table.show()

### Semi-Logarithmic Plots

In [None]:
import plotly
import pandas as pd
df = pd.read_csv('http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt', sep='\t')
df2007 = df[df.year==2007]
df1952 = df[df.year==1952]
df.head(2)

fig = {
    'data': [
        {
            'x': df2007.gdpPercap, 
            'y': df2007.lifeExp, 
            'text': df2007.country, 
            'mode': 'markers', 
            'name': '2007'},
        {
            'x': df1952.gdpPercap, 
            'y': df1952.lifeExp, 
            'text': df1952.country, 
            'mode': 'markers', 
            'name': '1952'}
    ],
    'layout': {
        'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
        'yaxis': {'title': "Life Expectancy"}
    }
}

go.Figure(fig).show()

### Stacked Bar Graphs

In [None]:
import numpy as np

N = 20
x = np.linspace(1, 10, N)
y = np.random.randn(N)+3
y2 = np.random.randn(N)+6
y3 = np.random.randn(N)+9
y4 = np.random.randn(N)+12
df = pd.DataFrame({'x': x, 'y': y, 'y2':y2, 'y3':y3, 'y4':y4})
df.head()

data = [
    go.Bar(
        x=df['x'], # assign x as the dataframe column 'x'
        y=df['y']
    ),
    go.Bar(
        x=df['x'],
        y=df['y2']
    ),
    go.Bar(
        x=df['x'],
        y=df['y3']
    ),
    go.Bar(
        x=df['x'],
        y=df['y4']
    )

]

layout = go.Layout(
    barmode='stack',
    title='Stacked Bar with Pandas'
)

fig = go.Figure(data=data, layout=layout)

fig.show()

### Histograms (Distribution Plots)

In [None]:
data = pd.read_csv("gdp_pop_all.csv", index_col = "country")
life_exp = ff.create_distplot(
    [data.lifeExp_1952, data.lifeExp_2007],
    ['Life Expectancy 1952', 'Life Expectancy 2007'],
    bin_size=2
)
life_exp.show()

### We'll stop introducing Plot.ly here
- There's still plenty more that can be done with plot.ly, but this should serve as a good introduction to creating basic plots.
- If you're curious about examples of more complex plots, a quick web search will turn up plenty.