# Basic basic basic Matplotlib

This is the "starter" version of this notebook. To run it, make sure you've installed Python 3, Pandas, and Matplotlib.

* [Very short video](https://www.youtube.com/watch?v=SiCyTcudoSE) 

* [Pyplot tutorial](https://matplotlib.org/1.4.2/users/pyplot_tutorial.html); includes a list of options for line charts. 

* [Text properties and layout](https://matplotlib.org/1.4.2/users/text_props.html)

* [Matplotlib colors](https://matplotlib.org/2.0.2/api/colors_api.html)

* [List of named colors](https://matplotlib.org/3.1.0/gallery/color/named_colors.html)


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic for interactive plots; if it fails in your setup, try %matplotlib inline
%matplotlib notebook

In [None]:
# create some simple data
year = [1950, 1970, 1990, 2010]
population = [2.519, 3.692, 5.263, 6.972]

In [None]:
# draw a plot
# x axis first, y axis second 
plt.plot(year, population)

In [None]:
# scatterplot 
# x axis first, y axis second 
plt.scatter(year, population)

In [None]:
# add labels to x and y axis
plt.ylabel('People on Earth (Billions)')
plt.xlabel('YEARS')

# title above chart
plt.title('Population Growth')

plt.scatter(year, population)

In [None]:
# add gridlines
# zorder sets drawing order: the grid (zorder=0) goes behind the dots (zorder=3)
# single-letter color codes: r, g, b (red, green, blue) and c, m, y, k (cyan, magenta, yellow, black)
# alpha sets transparency (0 = invisible, 1 = opaque)

fig, ax = plt.subplots()
ax.grid(zorder=0, color='r', alpha=0.3)

# to make dots larger in scatterplot, use s=number
ax.scatter(year, population, zorder=3, color='c', s=200)


See this [insanely detailed StackOverflow post](https://stackoverflow.com/questions/14827650/pyplot-scatter-plot-marker-size#targetText=The%20standard%20size%20of%20points,is%20hence%201%2F72%20inches.) for more info about markers and dots in Matplotlib plots. 
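
The short version of that post, as a rough sketch: `s=` in `scatter()` is the marker area in points squared, while `ms=` in `plot()` is the marker size in points, so `s = ms ** 2` should give markers of roughly the same size. The variable and axes names below are only for illustration.

```python
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
population = [2.519, 3.692, 5.263, 6.972]

ms = 15        # marker size for plot(), in points
s = ms ** 2    # matching size for scatter(), in points squared

# two charts side by side for comparison
fig, (ax_plot, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

ax_plot.plot(year, population, 'c^', ms=ms)
ax_plot.set_title('plot() with ms=15')

ax_scatter.scatter(year, population, color='c', marker='^', s=s)
ax_scatter.set_title('scatter() with s=225')
```

Small differences can remain because of marker edge widths.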

In [None]:
# here 'rs' means 'red square' and ms= sets the marker size
plt.plot(year, population, 'rs', ms=10)

In [None]:
# here 'c^' means 'cyan triangle' and ms= sets the marker size
plt.plot(year, population, 'c^', ms=15)

## Import a CSV and plot a dataframe

In [None]:
# now some pandas work to get data to be plotted 
# plot 4 countries' GDP per capita using a (tiny) CSV 

# import CSV 
df = pd.read_csv('../data/gdp_data_sm.csv')

# how many rows, columns?
df.shape

In [None]:
# show first 5 rows (there are only 4)
df.head()

[Data source](https://databank.worldbank.org/reports.aspx?source=2&series=NY.GDP.PCAP.CD&country=#) (World Bank)

In [None]:
# change the index column 
df = df.set_index('country')

In [None]:
df.head()

In [None]:
# note, the index does not count as a "column" 
df.columns

In [None]:
# check to see whether all values are floats
df.dtypes
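
If `dtypes` reports a non-numeric column, the values were probably read as text. One possible fix, sketched on a made-up DataFrame (the `demo` name and its values are placeholders, not the World Bank data), is `pd.to_numeric`:

```python
import pandas as pd

# placeholder values; the first column is deliberately stored as text
demo = pd.DataFrame(
    {'2000': ['1000.5', '2000.25'], '2010': [1500.0, 2500.0]},
    index=['Country A', 'Country B'],
)
print(demo.dtypes)

# convert every column; values that cannot be parsed become NaN
demo = demo.apply(pd.to_numeric, errors='coerce')
print(demo.dtypes)
```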

In [None]:
# divide every value by 1000 before plotting; round to 2 decimal places
# apply() passes each column to the lambda as a Series, so all columns must be numeric
df = df.apply(lambda col: round(col / 1000, 2))

In [None]:
df.head()
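
The same scaling can probably be written without `apply()`, because pandas arithmetic already works on the whole table at once. A minimal sketch on a made-up DataFrame (`demo` and its values are placeholders, not the real GDP figures):

```python
import pandas as pd

# placeholder values, not real GDP figures
demo = pd.DataFrame(
    {'2000': [12345.678, 2000.0], '2010': [15000.0, 25555.5]},
    index=['Country A', 'Country B'],
)

# column-by-column version, as in the cell above
with_apply = demo.apply(lambda col: round(col / 1000, 2))

# whole-table version: divide everything, then round everything
vectorized = (demo / 1000).round(2)

# compare the two results
print(with_apply.equals(vectorized))
```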

In [None]:
# ready to plot! 
# simpler than you might think - see below for what T does
df.T.plot()

In [None]:
# T transposes the table (rows become columns, columns become rows), making a simple plot possible
# plot() wants to plot COLUMNS as lines 
df.T
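
To see the effect of `T` in isolation, here is a small sketch with a made-up table (`demo`, the country names, and the values are placeholders, not the CSV data):

```python
import pandas as pd

# placeholder values: one row per country, one column per year
demo = pd.DataFrame(
    {'2000': [1.0, 2.0], '2010': [1.5, 2.5], '2020': [2.0, 3.0]},
    index=['Country A', 'Country B'],
)

print(demo.shape)     # rows = countries, columns = years
print(demo.T.shape)   # rows = years, columns = countries

# after transposing, each country is a column, so plot() draws one line per country
print(list(demo.T.columns))
```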

## Make some enhancements to the chart

In [None]:
# create chart with title above chart
ax = df.T.plot(title='GDP Per Capita')

# add labels to x and y axis
ax.set_xlabel('YEARS')
ax.set_ylabel('Thousands (US 2010 dollars)')


In [None]:
# alternative code, same chart
# no plt.figure() needed: df.plot() opens its own figure, so calling it first leaves an empty extra figure
df.T.plot(title='GDP Per Capita')
plt.xlabel('YEARS')
plt.ylabel('Thousands (US 2010 dollars)')
plt.show()
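
A third variant, sketched here under the assumption that you want full control of the figure: create the figure and axes yourself, then hand the axes to pandas with `ax=`. As far as I know, `plot()` draws on a fresh figure unless it receives one this way. The `demo` table and its values are placeholders, not the CSV data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# placeholder values standing in for the CSV data; not real GDP figures
demo = pd.DataFrame(
    {'2000': [1.0, 2.0], '2010': [1.5, 2.5], '2020': [2.0, 3.0]},
    index=['Country A', 'Country B'],
)

# create the figure and axes first, then tell pandas to draw on them
fig, ax = plt.subplots()
demo.T.plot(ax=ax, title='GDP Per Capita')

# labels are set on the same axes object
ax.set_xlabel('YEARS')
ax.set_ylabel('Thousands (US 2010 dollars)')
```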

In [None]:
# change the color palette 
colors = ['mediumpurple', 'forestgreen', 'magenta', 'steelblue']

df.T.plot(title='GDP Per Capita', color=colors)
plt.xlabel('YEARS')
plt.ylabel('Thousands (US 2010 dollars)')
plt.show()

In [None]:
# change the line styles
styles = ['bs-', 'y^-', 'k^-', 'ro-']

df.T.plot(title='GDP Per Capita', style=styles)
plt.xlabel('YEARS')
plt.ylabel('Thousands (US 2010 dollars)')
plt.show()
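
Each style string packs three settings into one: a color code, a marker code, and a line style. So `'bs-'` should read as blue, square markers, solid line. The sketch below, with made-up `x` and `y` values, draws the same line twice, once with the short string and once with the settings spelled out, then prints what Matplotlib stored for each.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# made-up values, just to have something to draw
x = [1, 2, 3]
y = [1, 4, 9]

fig, ax = plt.subplots()

# short form: color, marker, and line style in one string
(short,) = ax.plot(x, y, 'bs-')

# long form: the same three settings as keyword arguments
(spelled_out,) = ax.plot(x, y, color='b', marker='s', linestyle='-')

for line in (short, spelled_out):
    print(line.get_marker(), line.get_linestyle(), to_rgba(line.get_color()))
```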