# Matplotlib - Unit 01 - Introduction

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%201%20-%20Lesson%20Learning%20Outcome.png"> Lesson Learning Outcome

* **Matplotlib Lesson consists of 4 units.**
* By the end of this lesson, you should be able to:
  * Understand what Figures and Axes are
  * Understand and use the major plot types from Matplotlib
  * Customize your plots and save high-resolution figures

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%202%20-%20Unit%20Objective.png"> Unit Objectives

* Understand what Figures and Axes are



---

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Matplotlib is one of the most popular Python libraries for data visualization. 
* It is a powerful plotting library that works well with data stored in NumPy arrays and pandas DataFrames. It offers a wide range of 2D and 3D plots to help you in your Data Science projects.
* It is easy to get started with simple plots in Matplotlib, and the plots are highly customizable, giving you fine control over each plot element.

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Question%20mark%20icon.png">
 **Why do we study Matplotlib?**
  * Data visualisation is the presentation of data in pictorial or graphical form.
  * Presenting your data analytics visually enables business problem owners to grasp complex concepts and identify patterns in the data. 
  * Data visualisation allows you to tell the data story and report your findings clearly.
  * Matplotlib can plot data from common Python data structures, such as lists, NumPy arrays and pandas DataFrames, and it works in scripts, Jupyter notebooks and other Python environments.


## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%203%20-%20Additional%20Learning%20Context.png"> Additional Learning Context

* We encourage you to:
  * Add **code cells and try out** other possibilities, e.g. play around with parameter values in a function/method, or try additional function parameters.
  * Also, **add your comments** in the cells. It can help you to consolidate your learning. 

* Parameters in a given function/method
  * As you may expect, a given function in a package may contain multiple parameters. 
  * Some of them are mandatory, some have predefined default values, and some are optional. We will cover the parameters most commonly used within Data Science for a particular function/method. 
  * However, you should seek additional information in the respective package documentation, where you will find instructions on how to use a given function/method. The studied packages are open source, so this documentation is public.
  * **For Matplotlib the link is [here](https://matplotlib.org/)**.

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%204%20-%20Import%20Package%20for%20Learning.png"> Import Package for Learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # this is how you import pyplot module from matplotlib
import seaborn as sns # used for loading datasets

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Why plt and not matplotlib?

  * There is no `plt` module within Matplotlib. 
  * In the majority of Data Science use cases, we use a Matplotlib submodule called **`pyplot`**, which is imported under the **alias `plt`**.

import matplotlib.pyplot as plt

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Figures And Axes

Matplotlib plots are built from two key concepts: Figures and Axes. 
To illustrate them, let's use a function called **plt.plot()**, which plots y versus x as a line. The documentation can be found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)
  * This is useful to plot one set of numbers against another.
  * By default, it draws a line between the points.
  * In this case, we pass only the y values; when x is omitted, Matplotlib uses the indices 0, 1, ..., N-1 for the x-axis.
  * The data is a NumPy array made with `np.random.rand()`

np.random.seed(seed=19)
plt.plot(np.random.rand(50)) # try adding the format string 'o' as a second argument to draw points instead of a line
plt.show()

Please note the following components:

* A `Figure` object is shown in green.
* An `Axes` object is generated, which is the line chart shown in red.
* The plot information, shown in blue, such as the x- and y-axis, is displayed in the `Axes`.

* An analogy we could make here is:
  * The Figure is a piece of paper where you can draw anything
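  * We can draw one plot, in which case we will have one Axes. Or we can draw multiple plots, say three plots, in which case we would have three Axes. (Matplotlib uses "Axes" for both singular and plural; an "Axis" is a single x- or y-axis inside an Axes.)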
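A quick way to check the "paper and drawing area" analogy is to ask pyplot for the current Figure and Axes right after plotting. The minimal sketch below uses `plt.gcf()` (get current figure) and `plt.gca()` (get current Axes); the variable names are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean slate so the count below is predictable

np.random.seed(seed=19)
line_list = plt.plot(np.random.rand(50))  # implicitly creates one Figure and one Axes

fig = plt.gcf()         # the Figure: the "piece of paper"
ax = plt.gca()          # the Axes: the area where the line was drawn
n_axes = len(fig.axes)  # how many Axes live on this Figure

print(n_axes)  # 1
plt.show()
```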

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> A Figure with 1 Axes

When we draw a single plot (one Axes), we do not need to state that we are drawing one Axes on one Figure. We just plot on the Figure (on "the piece of paper") using `plt.plot()`
  * `plt.plot()` gives the command to plot the data.
  * `plt.show()` displays the Figure. When producing several separate plots in the same notebook cell, call `plt.show()` after each one; otherwise, consecutive `plt.plot()` calls draw onto the same Axes. 
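To see what happens without `plt.show()` in between, here is a small sketch that calls `plt.plot()` twice and then counts the lines on the current Axes and the open Figures (using `plt.gca()` and `plt.get_fignums()`).

```python
import numpy as np
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean slate
np.random.seed(seed=19)

# No plt.show() between these calls, so both lines land on the same Axes
plt.plot(np.random.rand(50))
plt.plot(np.random.rand(50))

n_lines = len(plt.gca().lines)      # lines drawn on the current Axes
n_figures = len(plt.get_fignums())  # Figures currently open
print(n_lines, n_figures)  # 2 1
plt.show()
```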

np.random.seed(seed=19)
plt.plot(np.random.rand(50))
plt.show()

If you want to adjust the Figure size using the approach above, add `plt.figure()` before setting the plot and use the `figsize` parameter.
  * This function allows us to change Figure attributes. In this case, we set the **Figure width and height**, in inches, using a tuple. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html)

np.random.seed(seed=19)
plt.figure(figsize=(12,5))  # line added to adjust Figure size
plt.plot(np.random.rand(50))
plt.show()
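You can read the size back from the Figure object to confirm what `figsize` did. A minimal sketch, keeping a reference to the Figure returned by `plt.figure()` and calling its `get_size_inches()` method:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(seed=19)
fig = plt.figure(figsize=(12, 5))  # width, height in inches
plt.plot(np.random.rand(50))

width, height = fig.get_size_inches()  # read the size back from the Figure
print(width, height)  # 12.0 5.0
plt.show()
```

The size in pixels also depends on the Figure's `dpi` (dots per inch): pixels = inches × dpi.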

When we plot with `plt.SomeFunction()` - like `plt.plot()` we used above - **we automatically create a Figure and an Axes inside the Figure.**
  *  This is useful when we want to create a single plot

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> Alternatively, **we can state explicitly that we will draw one Axes on the Figure**.
  * This can be done with `plt.subplots()`. 
    * In this example, we are not passing any arguments. 
    * The function documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)

fig, axes = plt.subplots()
plt.plot(np.random.rand(50))
plt.show()

You will notice `plt.subplots()` returned two objects: **fig and axes**
  * You can name these variables whatever you want. However, fig and axes are more intuitive.

But, what is fig? Let's check its type

type(fig)

It is a Figure type. Now run fig to see its content

fig

It is the plot itself. Now, what are axes? Let's check its type

type(axes)

It is an Axes object. 
  * It is the area of the Figure where you will plot your data.

When you check axes content, it will not be particularly informative. 
  * However, you will change the Axes content programmatically soon

axes

This approach uses Matplotlib's **object-oriented interface**, where you create your objects (`Figure` and `Axes`) and call methods on them.
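For comparison, here is the same single plot written entirely in the object-oriented style, calling methods on `axes` instead of on `plt`. This is a sketch; the title text is illustrative, and `set_title()` is covered in more detail in Unit 04.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(seed=19)
fig, axes = plt.subplots()           # one Figure, one Axes
axes.plot(np.random.rand(50))        # draw on that specific Axes
axes.set_title("Random values")      # customise that specific Axes
plt.show()
```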

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> A figure with multiple Axes

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> We can use `plt.subplots()` to draw more plots in our Figure with the parameters `nrows` and `ncols`.
* You will notice another parameter: `figsize`.
  * Here you set the Figure's width and height, in inches, as a tuple.

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(10,5))
plt.show()

You will notice the plot above is not useful yet.
  * For the moment, check the content of axes.
  * You will notice it is a set of Axes, in this case 9 Axes stored in a 3x3 NumPy array.

axes

Since it is an array, we can access each Axes and plot the content we are interested in displaying.
  * Note that we left multiple Axes blank intentionally.
  * We accessed `axes[0,0]` and gave the command to plot a NumPy array, using `.plot()`.
  * We accessed `axes[2,2]` and gave the command to plot a NumPy array, using `.plot()`.

fig, axes = plt.subplots(nrows=3, ncols=3)
axes[0,0].plot(np.random.rand(25))
axes[2,2].plot(np.random.rand(200))
plt.show()
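Because `axes` is a NumPy array, you can also loop over it. A short sketch that fills every cell of a 3x3 grid, using `axes.flat` (a flat iterator that walks the array row by row); the number of points per plot is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=3, ncols=3)
print(axes.shape)  # (3, 3)

# axes.flat visits axes[0,0], axes[0,1], ..., axes[2,2] in order
for position, ax in enumerate(axes.flat):
    ax.plot(np.random.rand(10 + position))  # 10 points in the first plot, 18 in the last
plt.show()
```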

---

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> The way you index the Axes depends on the Figure's layout.
  * When your Figure has only one row (or only one column), `axes` is a one-dimensional array, so you use a single index, as shown below.
  * In the previous example, you accessed the first plot using `axes[0,0]`, but here you only need `axes[0]`.
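The one-dimensional array comes from `plt.subplots()` dropping ("squeezing") dimensions of length 1 by default. If you would rather always use `axes[row, col]`, the `squeeze=False` parameter keeps the array 2D. A minimal sketch comparing both:

```python
import matplotlib.pyplot as plt

fig, axes_1d = plt.subplots(nrows=1, ncols=2)                  # one row: 1D array
fig2, axes_2d = plt.subplots(nrows=1, ncols=2, squeeze=False)  # always 2D
print(axes_1d.shape, axes_2d.shape)  # (2,) (1, 2)
```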





fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15,2))
axes[0].plot(np.random.rand(5))
axes[1].plot(np.random.rand(25))
plt.show()

---

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%205%20-%20Practice.png"> **PRACTICE**: You should practice what you covered in this unit.

Feel free to try out your ideas or use the following suggestion.

* Create a figure with multiple plots having 2 rows and 3 columns.
* Using np.random.rand, display data on two of the plots.

# Write your code here.


# Matplotlib - Unit 02 - Matplotlib Plots Part 01

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%202%20-%20Unit%20Objective.png"> Unit Objectives

* Learn and use Matplotlib for line plots, histograms, boxplots and scatter plots



---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%204%20-%20Import%20Package%20for%20Learning.png"> Import Package for Learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns # used for loading datasets

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Matplotlib Plots - Part 01

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%207-%20Note.png"> In our last unit, we learned how to use `plt.plot()` to plot data on the x- and y-axis.
* We will study more examples in this unit.
  * We will not study these examples in detail, such as setting and customising plot parameters like title, axis, grid and colour. We will leave such details until a later unit.
  * However, in some cases in this unit we will introduce a little customisation to better present the examples.

* We will study the following plot types in this lesson:
  * Line Plot
  * Histogram
  * Boxplot
  * Scatter Plot
  * Area plot
  * Bar plot
  * Pie chart
  * 3D Plot
  * Images


* The matplotlib gallery with documented use cases can be found [here](https://matplotlib.org/stable/gallery/index.html)

We will use NumPy arrays and pandas DataFrames (loaded from Seaborn's built-in datasets).

import seaborn as sns
sns.get_dataset_names()

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Line Plot

In the previous unit, we showed an example using `plt.plot()` with NumPy array data. Matplotlib accepts data stored in either NumPy arrays or pandas objects. The next example uses pandas.
  * Consider the `flights` dataset from Seaborn

df = sns.load_dataset('flights').head(50)
print(df.shape)
df.head()

We process the variables so that the index becomes a date type. We will use this date index in an upcoming chart. Note that we convert `year` (stored as an integer) and `month` to strings, so they can be joined into text such as `'Jan-1949'`, which `pd.to_datetime()` then parses into a date.

df['year'] = df['year'].astype('str')
df['month'] = df['month'].astype('str')
df['Date'] = pd.to_datetime(df['month'] + '-' + df['year'] )
df.set_index('Date',inplace=True)
df.head()
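To see what the conversion does without downloading the dataset, here is a sketch on a small hand-made DataFrame with the same `year`/`month` layout (the rows are illustrative). Unlike the cell above, it passes an explicit `format='%b-%Y'` (abbreviated month name, dash, four-digit year); this is an addition, so that pandas does not have to guess the format.

```python
import pandas as pd

# Hypothetical rows with the same layout as the flights dataset
df_demo = pd.DataFrame({"year": [1949, 1949, 1949],
                        "month": ["Jan", "Feb", "Mar"],
                        "passengers": [112, 118, 132]})

df_demo["year"] = df_demo["year"].astype("str")
# 'Jan' + '-' + '1949' -> 'Jan-1949' -> 1949-01-01
df_demo["Date"] = pd.to_datetime(df_demo["month"] + "-" + df_demo["year"], format="%b-%Y")
df_demo = df_demo.set_index("Date")
print(df_demo.index[0])  # 1949-01-01 00:00:00
```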

We plot the number of passengers. When x is not stated explicitly, the index is used for the x-axis.

plt.plot(df['passengers'])
plt.show()

Already you can see a pattern of increasing passenger numbers over time. Next, we want to plot multiple variables in the same line plot.
  * First, we will generate some data using `np.random.randn()`, where the number of values equals the number of rows in the DataFrame (note `df.shape[0]`).

np.random.seed(100)
df['Col1'] = 300 + 20 * np.random.randn(df.shape[0])
df.head()
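`300 + 20 * np.random.randn(n)` shifts and stretches standard normal draws (mean 0, standard deviation 1), so `Col1` should hover around 300 with a spread of roughly 20. A quick check on a larger sample (the sample size of 10,000 is arbitrary):

```python
import numpy as np

np.random.seed(100)
values = 300 + 20 * np.random.randn(10_000)  # mean shifted to 300, spread scaled to 20
print(round(values.mean(), 1), round(values.std(), 1))  # close to 300 and 20
```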

Using `plt.plot()`, we subset both variables.
  * In this case bracket notation was used, but you could use any approach you learned so far (like `.filter()`).
  * Run this code and you will see this numerical data from Col1 plotted on the same chart as the passengers data. 

plt.plot(df[['passengers','Col1']])
plt.show()

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Histograms
We use `plt.hist()` to plot histograms in Matplotlib. The documentation can be found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html)
* A histogram is a bar chart in which the x-axis is divided into intervals (bins) of a numeric variable, and each bar's height on the y-axis is the number of observations that fall into that interval. 
* This is why the data is sometimes said to be grouped into buckets along the x-axis. 
* Let's consider the penguins dataset. It has records for three species of penguins, collected from three islands in the Palmer Archipelago, Antarctica.

df = sns.load_dataset('penguins')
df = df.sample(n=50, random_state=1)
df.head(3)

* In `plt.hist()` you will pass the following arguments:
  * `data`, the object holding your variables, in this case the DataFrame
  * `x`, the variable to plot (alternatively, you can pass an array directly as `x`)
  * `bins`, the number of buckets your data will be grouped into (10 by default)
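`plt.hist()` also returns what it computed: the count in each bin, the bin edges and the drawn bars. A minimal sketch on a tiny hand-made list, with `bins=3`:

```python
import matplotlib.pyplot as plt

plt.close("all")  # start from a fresh Figure
values = [1, 2, 2, 3, 3, 3]
counts, edges, bars = plt.hist(values, bins=3)  # 3 equal-width bins between min and max

print(counts)  # [1. 2. 3.]
print(edges)   # 4 edges for 3 bins, from 1.0 to 3.0
plt.show()
```

Passing a list of edges instead of an integer, for example `bins=[0, 2, 4]`, sets the buckets explicitly.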

fig, axes = plt.subplots(nrows=1,ncols=2,figsize=(10,5))
axes[0].hist(data=df, x='bill_depth_mm')
axes[1].hist(data=df, x='bill_depth_mm', bins=50)
plt.show()

From these plots you can easily see the most common bill depths (in mm) among these penguins. In the right-hand plot, the bill depth values have been grouped into fifty bins; the left-hand plot uses the default of ten. 
To plot multiple variables in one Matplotlib histogram, you can loop over your variables and call `plt.hist()` for each one.
* Then you add `plt.legend()`, so a legend is added to the Figure. The function documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)
* Finally, you plot the Figure with `plt.show()`

for col in ['bill_length_mm','bill_depth_mm']:
  plt.hist(data=df,x=col,label=col)
plt.legend()
plt.show()
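In the loop above, each histogram is drawn on top of the previous one, so overlapping bars can hide each other. One common option is the `alpha` parameter, which makes the bars semi-transparent. A sketch on hypothetical NumPy data (roughly in the range of bill length and depth, in mm), so it runs without the dataset:

```python
import numpy as np
import matplotlib.pyplot as plt

plt.close("all")  # start from a fresh Figure
rng = np.random.default_rng(seed=1)
# Hypothetical measurements, not taken from the penguins dataset
samples = {"length": rng.normal(44, 5, 200), "depth": rng.normal(17, 2, 200)}

for name, values in samples.items():
    plt.hist(values, bins=20, alpha=0.5, label=name)  # alpha=0.5: half-transparent bars
plt.legend()

legend_labels = [text.get_text() for text in plt.gca().get_legend().get_texts()]
plt.show()
```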

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Boxplot

A boxplot is a useful chart to visually show the distribution, or spread, of data. It is sometimes known as a box and whisker plot. The box extends from the first quartile to the third quartile, with the centre line showing the median. By default, each whisker extends from the box to the furthest data point that lies within 1.5 times the box length (the interquartile range) of the box edge. The circles show any data beyond the whiskers and are called fliers or outliers. Do not worry if these terms are unfamiliar, as you will learn about statistics in an upcoming lesson. 
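The whisker rule is easier to see with numbers. The sketch below works through it on a small hand-made array (the values are illustrative) and then compares the result with `matplotlib.cbook.boxplot_stats()`, the helper Matplotlib uses to compute boxplot statistics.

```python
import numpy as np
from matplotlib import cbook

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                                  # the box length
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the fences
whisker_low = data[data >= lower_fence].min()
whisker_high = data[data <= upper_fence].max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, whisker_low, whisker_high, outliers)  # 3.0 5.0 7.0 1 8 [100]

# Matplotlib's own helper computes the same summary
stats = cbook.boxplot_stats(data)[0]
print(stats["whislo"], stats["whishi"], stats["fliers"])
```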



We will consider the iris dataset for delivering a boxplot. This dataset contains records of 3 classes of iris plants, with petal and sepal measurements.

df = sns.load_dataset('iris')
df = df.sample(n=50, random_state=1)
df.head(3)

We will use `plt.boxplot()`. The documentation can be found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html)
* The arguments are `data`, where you pass the dataset, and `x`, where you state the variable you want to plot.

plt.boxplot(data=df,x='sepal_length')
plt.show()

You can use `plt.xticks()` to set the ticks on the x-axis. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xticks.html)
* In the cell below, we plot multiple variables, which we store in the list `col_boxplot`.
* In a loop, we append the data of each variable to another list, which is then passed to `plt.boxplot()`.
* Finally, we set the variable names on the x-axis with `plt.xticks()`. The boxes are placed at x = 1, 2, ..., n, so those are the tick positions.



col_boxplot = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
list_of_data = []
for col in col_boxplot:
  list_of_data.append(df[col])  # one Series per box

plt.boxplot(x=list_of_data)
plt.xticks(np.arange(1, len(col_boxplot) + 1), col_boxplot)  # boxes sit at x = 1, 2, ..., n
plt.show()


<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Challenge%20test.png">
 At first, this code may look like a lot of work just to plot a boxplot of multiple variables. And you are right; there are alternative ways to do this task. The idea here is to present the many existing alternatives, so you can pick the right tool for a particular use case in the workplace.
 What the above plot tells us about the iris flower data is that sepal widths are tightly distributed around 3 cm, while petal length varies a lot and is skewed towards the high end of its distribution. 
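One such alternative: `plt.boxplot()` also accepts a 2D array and draws one box per column, so the loop can be skipped. A sketch with hypothetical random data standing in for the four iris measurements (the means and spreads below are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

plt.close("all")  # start from a fresh Figure
rng = np.random.default_rng(seed=1)
col_boxplot = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
# Hypothetical 50 x 4 array: one column per variable
data_2d = rng.normal(loc=[5.8, 3.0, 3.8, 1.2], scale=[0.8, 0.4, 1.7, 0.7], size=(50, 4))

result = plt.boxplot(data_2d)  # one box per column, placed at x = 1, 2, 3, 4
plt.xticks(np.arange(1, len(col_boxplot) + 1), col_boxplot)
tick_positions = list(plt.gca().get_xticks())
plt.show()
```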

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Scatter Plot
We are using the penguins dataset for this exercise. It has records for 3 different species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica. A scatter plot uses dots to plot two numeric variables against each other. It is useful to see whether there is a relationship between two variables. In the plots below, you will see that a penguin with a longer flipper tends to be heavier.

df = sns.load_dataset('penguins')
df = df.sample(n=50, random_state=1)
df.head()

We use `plt.scatter()` to plot a scatter plot in Matplotlib. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html)

Let's plot a scatter plot, setting `x='flipper_length_mm'` and `y='body_mass_g'`

x, y = 'flipper_length_mm' , 'body_mass_g'
plt.scatter(data=df, x=x, y=y)
plt.ylabel(y)
plt.xlabel(x)
plt.show()

You can add the parameter `c` to colour the dots based on a variable. In this case, we set `c='body_mass_g'`. The colourmap that translates values into colours can be changed with the `cmap` parameter; the options are found [here](https://matplotlib.org/stable/tutorials/colors/colormaps.html).

x, y = 'flipper_length_mm' , 'body_mass_g'
plt.scatter(data=df, x=x, y=y, c=y)
plt.ylabel(y)
plt.xlabel(x)
plt.show()
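A colour only means something if the reader can look it up. A sketch that picks a colourmap with `cmap` and adds a colour scale with `plt.colorbar()`; the small DataFrame is hypothetical so the example runs without the dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

plt.close("all")  # start from a fresh Figure
df_demo = pd.DataFrame({"flipper_length_mm": [181, 190, 195, 210, 220],
                        "body_mass_g": [3750, 3800, 4050, 4800, 5600]})  # hypothetical rows
x, y = "flipper_length_mm", "body_mass_g"

points = plt.scatter(data=df_demo, x=x, y=y, c=y, cmap="viridis")  # cmap picks the colourmap
plt.colorbar(points, label=y)  # colour scale explaining what each colour means
plt.xlabel(x)
plt.ylabel(y)
n_axes = len(plt.gcf().axes)   # the colorbar lives in its own extra Axes
plt.show()
```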

Now we add the parameter `s`, for size. The size of each dot is set according to its `body_mass_g` value.

x, y = 'flipper_length_mm' , 'body_mass_g'
plt.scatter(data=df, x=x, y=y, c=y, s=y)
plt.ylabel(y)
plt.xlabel(x)
plt.show()

You can rescale the dots' size. Passing `s=y/60` will not work, since `y` is a string (the column name) and a string cannot be divided by a number. Instead, pass the column values, `df[y]`, divided by a number for scaling.

x, y = 'flipper_length_mm' , 'body_mass_g'
plt.scatter(data=df, x=x, y=y, c=y, s=df[y]/60)
plt.ylabel(y)
plt.xlabel(x)
plt.show()
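Both points can be checked directly: dividing the column name raises a `TypeError`, while dividing the column values gives the marker sizes (`s` is the marker area in points squared). A sketch with a hypothetical three-row DataFrame:

```python
import pandas as pd
import matplotlib.pyplot as plt

df_demo = pd.DataFrame({"flipper_length_mm": [181, 200, 220],
                        "body_mass_g": [3000, 4200, 6000]})  # hypothetical rows
x, y = "flipper_length_mm", "body_mass_g"

try:
    y / 60  # y is the column *name*, a string: str / int raises TypeError
except TypeError as error:
    caught = error
    print("TypeError:", error)

sizes = df_demo[y] / 60  # the column *values*: 50, 70 and 100
points = plt.scatter(data=df_demo, x=x, y=y, s=sizes)
print(points.get_sizes())
plt.show()
```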

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%205%20-%20Practice.png"> **PRACTICE**: We will use the DataFrame below to practice with.

df_practice = sns.load_dataset('tips')
df_practice = df_practice.sample(n=50, random_state=1)
df_practice.head(10)

Feel free to try out your ideas or use the following suggestion.

Display total_bill and tip on one or more charts of your choice, using what you learned in this unit.

# Write your code here.


# Write your code here


# Matplotlib - Unit 03 - Matplotlib Plots Part 02

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%202%20-%20Unit%20Objective.png"> Unit Objectives

* Learn and use Matplotlib for bar plot, pie chart, 3D plot and images


---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%204%20-%20Import%20Package%20for%20Learning.png"> Import Package for Learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns # used for loading datasets

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Matplotlib Plots - Part 02

In this unit we will learn and deliver:
* bar plot
* pie chart
* 3D plot
* image plotting with Matplotlib

---
### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Bar Plot

You have already seen a variation of the bar plot in the histogram. In a standard bar plot, the x-axis has categories of data and the bar height is proportional to the values in the dataset. You can then visually compare the values for those categories. We are using the penguins dataset. It has records for 3 different species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica.

df = sns.load_dataset('penguins')
df = df.sample(n=50, random_state=1)
df.head(3)

We use `plt.bar()` for bar plots. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html)
* We first count the frequencies with `.value_counts()`, then use `plt.bar()`.


categorical_count = df['species'].value_counts()
categorical_count

We pass that to `plt.bar()`: the category names (located in the index) go on the x-axis via `x`, and the counts set the bar heights via `height`.

plt.bar(x=categorical_count.index, height=categorical_count)
plt.show()
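To confirm that the bar heights are simply the counts, here is a sketch on a hypothetical Series of species names; `plt.bar()` returns a container of bars whose heights can be read back.

```python
import pandas as pd
import matplotlib.pyplot as plt

plt.close("all")  # start from a fresh Figure
species = pd.Series(["Adelie", "Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"])  # hypothetical
categorical_count = species.value_counts()  # Adelie 3, Gentoo 2, Chinstrap 1

bars = plt.bar(x=categorical_count.index, height=categorical_count)
heights = [bar.get_height() for bar in bars]  # one height per category
plt.show()
```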

We use `plt.barh()` for horizontal bar plots. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.barh.html)
* The arguments are `y`, for the categories on the y-axis, and `width`, for the bar lengths (the counts).

categorical_count = df['species'].value_counts()
plt.barh(y=categorical_count.index, width=categorical_count)
plt.show()
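`plt.barh()` draws the first category at the bottom. Sorting the counts in ascending order first is one way to put the longest bar at the top, which often reads more naturally. A sketch with hypothetical data:

```python
import pandas as pd
import matplotlib.pyplot as plt

plt.close("all")  # start from a fresh Figure
species = pd.Series(["Adelie", "Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"])  # hypothetical
categorical_count = species.value_counts().sort_values()  # ascending: smallest bar at the bottom

bars = plt.barh(y=categorical_count.index, width=categorical_count)  # y = categories, width = counts
widths = [bar.get_width() for bar in bars]
plt.xlabel("count")
plt.show()
```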

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Pie Chart

A pie plot shows one series of data. The area of the circle denotes 100% of the data. Each slice or wedge of the pie shows the relative percentage for that category. It provides a quick visual summary of the relative size of a category. We are using the penguins dataset. It has records for 3 different species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica.

df = sns.load_dataset('penguins')
df = df.sample(n=50, random_state=1)
df.head(3)

We are using `plt.pie()`. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.pie.html#matplotlib.pyplot.pie)
* We first compute the relative frequencies with `.value_counts(normalize=True)`, then use `plt.pie()`. The arguments are `x`, the wedge sizes, and `labels`, the wedge names.

categorical_count = df['island'].value_counts(normalize=True)
plt.pie(x=categorical_count, labels=categorical_count.index)
plt.show()
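`plt.pie()` converts the values into fractions of their total itself, so raw counts work as well as normalised ones. The `autopct` parameter writes each wedge's percentage on the chart. A sketch with hypothetical rows:

```python
import pandas as pd
import matplotlib.pyplot as plt

plt.close("all")  # start from a fresh Figure
island = pd.Series(["Biscoe", "Biscoe", "Dream", "Torgersen"])  # hypothetical rows
categorical_count = island.value_counts()  # raw counts: 2, 1, 1

wedges, label_texts, pct_texts = plt.pie(x=categorical_count, labels=categorical_count.index,
                                         autopct="%1.1f%%")  # format each share as a percentage
shares = [text.get_text() for text in pct_texts]
print(shares)  # ['50.0%', '25.0%', '25.0%']
plt.show()
```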

Here we can easily see that Biscoe island accounts for half the rows. 

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%205%20-%20Practice.png"> **PRACTICE**: We will use the DataFrame below to create a chart.

df_practice = sns.load_dataset('tips')
df_practice = df_practice.sample(n=50, random_state=1)
df_practice.head(10)

Feel free to try out your ideas or use the following suggestion.


Using the df_practice DataFrame, create a bar or pie chart, choosing which columns of data you wish to display.

# Write your code here.


---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> 3D Plot
For a 3D plot in Matplotlib, we use `plt.axes(projection='3d')` to create an Axes with three axes (x, y and z). The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axes.html)

fig = plt.figure()
ax = plt.axes(projection="3d")

We can draw a 3D line plot by calling the Axes method `.plot3D()` and passing three arrays, one each for the x, y and z axes.
* First, we create a Figure with `plt.figure()`, then a 3D Axes with `plt.axes()`.
* x, y and z are generated using NumPy mathematical functions. Don't worry about what the functions mean; you may remember sin and cos from school. 
* We plot x, y and z with `.plot3D()`.

fig = plt.figure()
ax = plt.axes(projection='3d')
z = np.linspace(0, 5, 500)
x = z * np.sin(20 * z)
y = z * np.cos(20 * z)
ax.plot3D(x, y, z)
plt.show()
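A 3D Axes has a label for each of its three axes, set with `set_xlabel()`, `set_ylabel()` and `set_zlabel()`. A sketch reusing the spiral above, with placeholder label text:

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection="3d")

z = np.linspace(0, 5, 500)
ax.plot3D(z * np.sin(20 * z), z * np.cos(20 * z), z)
ax.set_xlabel("x")  # one label per axis of the 3D Axes
ax.set_ylabel("y")
ax.set_zlabel("z")
plt.show()
```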


In addition, we can draw a 3D Scatter Plot. The documentation is found [here](https://matplotlib.org/stable/gallery/mplot3d/scatter3d.html)

* We will consider the penguins dataset. It has records for 3 different species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica

df = sns.load_dataset('penguins')
df = df.sample(n=50, random_state=1)
df.head(3)

The approach is similar to the previous example, but here we plot using `.scatter()` and pass the x, y and z data.

fig = plt.figure(figsize=(10,8))
ax = plt.axes(projection='3d')

x = df['bill_depth_mm']
y = df['bill_length_mm']
z = df['flipper_length_mm']

ax.scatter(x, y, z)
plt.show()

<img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%205%20-%20Practice.png"> **PRACTICE**: Use the DataFrame below to create a 3D scatter chart.

df_practice = sns.load_dataset('tips')
df_practice = df_practice.sample(n=50, random_state=1)
df_practice.head(10)

Feel free to try out your ideas or use the following suggestion.

Create a 3D scatter chart using total_bill, tip, and size. Use your own values for figsize.

# Write your code here.


---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Images

Matplotlib can handle image data, as you saw in the NumPy lesson. You can read an image as a NumPy array using `plt.imread()`. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imread.html)
* First, you need your image location. In this case, it is a URL. Reading a URL with `plt.imread()` is deprecated: older Matplotlib versions show the warning "Passing a URL is deprecated", and recent versions raise an error suggesting you open the URL with another library, such as Pillow. In the course, we will use this command to load images from the local directory.
* The second argument, `format`, is optional. If it is not given, the format is deduced from the filename. If nothing can be deduced, PNG is tried.

image_url = "https://ci-jshint.herokuapp.com/static/images/logo.png"
image = plt.imread(image_url, format='png')

You can check its type

type(image)

You can plot the image using `plt.imshow()`. The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html)

plt.imshow(image)
plt.show()
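Since the course loads images from the local directory, here is a self-contained sketch of that workflow: it writes a hypothetical 10 x 20 greyscale gradient to a temporary PNG file with `plt.imsave()`, then reads it back with `plt.imread()`. For PNG files, `plt.imread()` returns floats between 0 and 1.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 10 x 20 greyscale gradient, saved locally as a PNG
gradient = np.tile(np.linspace(0, 1, 20), (10, 1))
path = os.path.join(tempfile.mkdtemp(), "gradient.png")
plt.imsave(path, gradient, cmap="gray")  # stored as an RGBA image

image = plt.imread(path)  # format is deduced from the .png extension
print(image.shape, image.dtype)  # (10, 20, 4) float32
plt.imshow(image)
plt.show()
```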

In practical projects, other libraries are typically used to load images directly from the Internet (web scraping). 
* We will have exercises in future lessons where we will download images from the internet.
* As Matplotlib documentation suggests, we will be using other libraries for managing the task of loading url images.

---

# Matplotlib - Unit 04 - Customizing your plots

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%202%20-%20Unit%20Objective.png"> Unit Objectives

* Customise your plots by adding titles and legends, changing plot layout, adjusting line colour and marker style, adding horizontal or vertical lines, changing the colourmap, updating the grid or adding annotations.
* Save your plots in high resolution.



---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%204%20-%20Import%20Package%20for%20Learning.png"> Import Package for Learning

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns # used for loading datasets

---

## <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Customizing your plots

We will study approaches to customizing your plot, like:
* Mix multiple plot types
* Title, legend and layout
* Line style, colour and marker
* Horizontal and Vertical Lines
* Grid
* Annotation
* Save a plot

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Mix multiple plot types in a Figure

We will consider the penguins dataset. It has records for 3 different species of penguins, collected from 3 islands in the Palmer Archipelago, Antarctica

df = sns.load_dataset('penguins').sample(50, random_state=1)
df.head(3)

Imagine you need 3 plots in a Figure:
  * A bar plot showing the species distribution.
  * A pie plot showing the proportion of different islands in the dataset.
  * A scatter plot describing the relationship between `flipper_length_mm` and `body_mass_g`.

* We will create a Figure and multiple Axes using `plt.subplots()`. For each Axes, we plot the respective chart type
  * Note: the generated Figure will lack some important visualisation elements, like legend, title, axis labels and Figure layout. We will cover these in the next section. The objective of this exercise is to demonstrate that we can use multiple plot types in one Figure.

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(13,5))

categorical_count = df['species'].value_counts()
axes[0].bar(x=categorical_count.index, height=categorical_count)


categorical_count = df['island'].value_counts(normalize=True)
axes[1].pie(x=categorical_count, labels=categorical_count.index)


axes[2].scatter(data=df, x='flipper_length_mm', y= 'body_mass_g')

plt.show()

Let's see how that would be done when using NumPy arrays
  * Consider three numeric arrays created from mathematical functions:
    * x is generated with `np.linspace()`, which returns evenly spaced numbers over a specified interval
    * y1 is the sine of x, made with `np.sin()`
    * y2 is x times the sine of x, also made with `np.sin()`

x = np.linspace(start=0, stop=10, num=500)
y1 = np.sin(x)
y2 = x * np.sin(x)
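`np.linspace()` includes both endpoints by default, so 500 points between 0 and 10 leave 499 equal gaps. A quick check:

```python
import numpy as np

x = np.linspace(start=0, stop=10, num=500)
step = x[1] - x[0]
print(x[0], x[-1], len(x))         # 0.0 10.0 500
print(np.isclose(step, 10 / 499))  # True: 500 points means 499 equal gaps
```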

We will create a Figure with 3 Axes and draw a line plot in the first, and a histogram for y1 and y2 in the second and third Axes, respectively

fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(8,10))

axes[0].plot(x, y1, label='sin(x)')
axes[0].plot(x, y2, label='x * sin(x)')
axes[0].legend()

axes[1].hist(y1, label='sin(x)')
axes[1].legend()

axes[2].hist(y2, label='x * sin(x)')
axes[2].legend()

plt.show()
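The bracket notation depends on the grid shape. `plt.subplots()` returns a 1-D NumPy array of Axes when `nrows` or `ncols` is 1, a 2-D array otherwise, and a single Axes object for the default 1x1 grid (because its `squeeze` parameter defaults to `True`). A quick check:

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig1, axes1 = plt.subplots(nrows=3, ncols=1)
print(axes1.shape)  # (3,)   -> index with axes1[0], axes1[1], ...

fig2, axes2 = plt.subplots(nrows=2, ncols=2)
print(axes2.shape)  # (2, 2) -> index with axes2[row, col]

fig3, axes3 = plt.subplots()
print(isinstance(axes3, Axes))  # True: a single Axes, so no brackets are needed
```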

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Titles, Legend and Tight Layout

* Although charts are visual, you still need to label them for your users' benefit.
* You can add a title and axis labels.
* In the example below, we use random data generated with NumPy.
* In a Figure with a single Axes, call these functions before `plt.show()`:
  * `plt.xlabel()` to set x-axis label. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html)
  * `plt.ylabel()` to set y-axis label. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylabel.html)
  * `plt.title()` to set title. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html)

np.random.seed(seed=19)
x = np.random.randn(50)

plt.plot(x) 
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.title('Plot Title Here')
plt.show()
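The `plt.xlabel()`, `plt.ylabel()` and `plt.title()` calls act on the *current* Axes, which `plt.gca()` returns. Reading the values back, as in this sketch, is a simple way to confirm what was set.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(seed=19)
x = np.random.randn(50)

plt.figure()  # start a fresh Figure so the current Axes is the new one
plt.plot(x)
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.title('Plot Title Here')

ax = plt.gca()  # the Axes that the pyplot calls above modified
print(ax.get_xlabel(), ax.get_ylabel(), ax.get_title())  # X-Axis Y-Axis Plot Title Here
plt.show()
```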

---

For the next few examples, we created an auxiliary function that returns `fig` and `axes` for **a Figure with 4 plots**.
  * It draws mathematical functions (sine, cosine) made with NumPy arrays.

def MultiplePlots(): 
  np.random.seed(seed=50)
  x = np.linspace(0, 10, 500)

  fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10,4))
  axes[0,0].plot(x, np.sin(x) )
  axes[0,1].plot(x, x * np.sin(x) )
  axes[1,0].plot(x, x * np.sin(x**2) )
  axes[1,1].plot(x, np.cos(2*x) * np.sin(x) )

  return fig, axes

Call the function to see what plots it generates.

fig, axes = MultiplePlots()
plt.show()

---

Customise the Figure and Axes
* `MultiplePlots()` returns `fig` and `axes`, so you can customise them.

  * To access a given Axes, use the bracket notation you are already familiar with.
  * `.set_title()` sets the title of the given Axes. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
  * `.set_xlabel()` sets the x-axis label. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlabel.html)
  * `.set_ylabel()` sets the y-axis label. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylabel.html)

fig, axes = MultiplePlots()

axes[0,0].set_title('sin(x)')
axes[0,0].set_xlabel('Time')
axes[0,0].set_ylabel('Level')

axes[0,1].set_title('x * sin(x)')
axes[0,1].set_xlabel('Time')
axes[0,1].set_ylabel('Level')

axes[1,0].set_title('x * sin(x^2)')
axes[1,0].set_xlabel('Time')
axes[1,0].set_ylabel('Level')

axes[1,1].set_title('cos(2x) * sin(x)')
axes[1,1].set_xlabel('Time')
axes[1,1].set_ylabel('Level')

plt.show()
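Setting the same labels on four Axes is repetitive. One option, sketched below, is to loop over `axes.flat` (which walks the 2x2 array row by row) together with a list of titles. A copy of `MultiplePlots()` (without its unused random seed) is included so the block runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np

def MultiplePlots():
    x = np.linspace(0, 10, 500)
    fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 4))
    axes[0, 0].plot(x, np.sin(x))
    axes[0, 1].plot(x, x * np.sin(x))
    axes[1, 0].plot(x, x * np.sin(x**2))
    axes[1, 1].plot(x, np.cos(2 * x) * np.sin(x))
    return fig, axes

fig, axes = MultiplePlots()

# Titles listed in the same row-by-row order as axes.flat
titles = ['sin(x)', 'x * sin(x)', 'x * sin(x^2)', 'cos(2x) * sin(x)']
for ax, title in zip(axes.flat, titles):
    ax.set_title(title)
    ax.set_xlabel('Time')
    ax.set_ylabel('Level')

plt.show()
```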

You may have noticed in the previous Figure that the **x-axis values of the upper Axes overlap with the titles of the lower Axes**.
* You could:
  * Increase the Figure size, using the `figsize` parameter, or
  * Add `plt.tight_layout()`, which adjusts the spacing so the plots no longer overlap. The function documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.tight_layout.html)

fig, axes = MultiplePlots()

axes[0,0].set_title('sin(x)')
axes[0,0].set_xlabel('Time')
axes[0,0].set_ylabel('Level')

axes[0,1].set_title('x * sin(x)')
axes[0,1].set_xlabel('Time')
axes[0,1].set_ylabel('Level')

axes[1,0].set_title('x * sin(x^2)')
axes[1,0].set_xlabel('Time')
axes[1,0].set_ylabel('Level')

axes[1,1].set_title('cos(2x) * sin(x)')
axes[1,1].set_xlabel('Time')
axes[1,1].set_ylabel('Level')

plt.tight_layout()   #### added plt.tight_layout()
plt.show()

When your Figure has multiple plots, you can add a title to each plot and a **title to the whole Figure**, using the Figure's `.suptitle()` method.
  * The function documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)

fig, axes = MultiplePlots()

axes[0,0].set_title('sin(x)')
axes[0,0].set_xlabel('Time')
axes[0,0].set_ylabel('Level')

axes[0,1].set_title('x * sin(x)')
axes[0,1].set_xlabel('Time')
axes[0,1].set_ylabel('Level')

axes[1,0].set_title('x * sin(x^2)')
axes[1,0].set_xlabel('Time')
axes[1,0].set_ylabel('Level')

axes[1,1].set_title('cos(2x) * sin(x)')
axes[1,1].set_xlabel('Time')
axes[1,1].set_ylabel('Level')


fig.suptitle('Different types of functions', fontsize=16, y=1.1) #### added title for the Figure
plt.tight_layout()
plt.show()

You can add and customise a legend:
  * Consider 2 arrays plotted in the same Axes.
  * The first is the mathematical function sin(x), and the second is x * sin(x).
  * Note the `label` argument in `plt.plot()`, which sets the name of that particular line.
  * However, the labels are not displayed yet.

x = np.linspace(0, 10, 500)
plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, x * np.sin(x), label='x * sin(x)')
plt.show()
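The labels are stored on each line as soon as you pass `label`, but they only appear once a legend is created. The sketch below checks this with `get_legend()`, which returns `None` while no legend exists.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, x * np.sin(x), label='x * sin(x)')

handles, labels = ax.get_legend_handles_labels()
legend_before = ax.get_legend()
print(labels)         # ['sin(x)', 'x * sin(x)']
print(legend_before)  # None: the labels exist, but no legend is drawn yet

ax.legend()
print(ax.get_legend() is not None)  # True
plt.show()
```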


What is needed in this case is a legend, so the viewer knows which line is which. We saw `plt.legend()` previously; now we consider some additional arguments:
  * `loc` sets the legend position within the Axes, for example '`upper left`', '`upper right`', '`lower left`', '`lower right`', '`center`', or '`best`' (which lets Matplotlib pick the location that overlaps the data least).
  * `title` sets the legend title.
  * `frameon` is a `True` / `False` flag indicating whether to draw a frame around the legend.

x = np.linspace(0, 10, 500)
plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, x * np.sin(x), label='x * sin(x)')
plt.legend(loc='upper left', title='Legend', frameon=False)
plt.show()
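The `Legend` object returned by `plt.legend()` can be inspected to confirm that the arguments took effect, which is handy when experimenting with options. A small sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
plt.figure()
plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, x * np.sin(x), label='x * sin(x)')
leg = plt.legend(loc='upper left', title='Legend', frameon=False)

print(leg.get_title().get_text())               # Legend
print(leg.get_frame_on())                       # False
print([t.get_text() for t in leg.get_texts()])  # ['sin(x)', 'x * sin(x)']
plt.show()
```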


When you use `plt.subplots()`, whether with one Axes or multiple Axes, pass a `label` to each plotting call and then call `.legend()` on the relevant Axes.
  * The documentation is found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.legend.html)

* The example below shows a Figure with one Axes.

x = np.linspace(0, 10, 500)
fig, axes = plt.subplots()
axes.plot(x, np.sin(x), label='sin(x)')
axes.plot(x, x * np.sin(x), label='x * sin(x)')
axes.legend()
plt.show()

---

The example below shows a Figure with 2 Axes
  * It considers data from NumPy arrays, using `np.linspace()` and `np.sin()`

x = np.linspace(0, 10, 500)
y1 = np.sin(x)
y2 = x * np.sin(x)

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,5))

axes[0].hist(x=y1, label='Histogram of sin(x)')
axes[0].legend(loc='best', frameon=False)

axes[1].plot(x, y1, label='sin(x)')
axes[1].plot(x, y2, label='x * sin(x)')
axes[1].legend(loc='lower left')


plt.show()

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Line Style, Color and Marker

The plotting functions we studied have arguments to set the **line style and colour**.

  * In `plt.plot()`, the colour argument is `color`. You can use the basic options, like **['b', 'g', 'r', 'c', 'm', 'y', 'k']**, or the [hexadecimal](https://htmlcolorcodes.com/) value of your desired colour. Don't forget to add `#` before the hex code when passing it to Matplotlib.
  * In `plt.plot()`, the style argument is `linestyle`. This [link](https://matplotlib.org/3.0.3/gallery/lines_bars_and_markers/line_styles_reference.html) shows its options: **[ '-' , '--' , '-.' , ':' ]**
  * In addition, you can set the line width with the `linewidth` parameter.

* The rule of thumb for customising is to check the plotting function's documentation, so you use the correct argument names.

x = np.linspace(0, 10, 500)
fig, axes = plt.subplots(figsize=(8,4))
axes.plot(x, np.sin(x), color='m', linewidth=4, linestyle=':', label='sin(x)')
axes.plot(x, x * np.sin(x), color='#B35946', linestyle='-.', label='x * sin(x)')
axes.legend()
plt.show()
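Why the `#` matters: Matplotlib only reads a hex string as a colour when it starts with `#`. The sketch below checks this with `matplotlib.colors.is_color_like()`, and reads back each line's `get_color()`, `get_linestyle()` and `get_linewidth()` to show what was applied.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import is_color_like

print(is_color_like('#B35946'))  # True
print(is_color_like('B35946'))   # False: missing '#'
print(is_color_like('m'))        # True: one of the basic single-letter colours

x = np.linspace(0, 10, 500)
fig, axes = plt.subplots(figsize=(8, 4))
# plot() returns a list of lines; the trailing comma unpacks the single line
line1, = axes.plot(x, np.sin(x), color='m', linewidth=4, linestyle=':', label='sin(x)')
line2, = axes.plot(x, x * np.sin(x), color='#B35946', linestyle='-.', label='x * sin(x)')

print(line1.get_color(), line1.get_linestyle(), line1.get_linewidth())  # m : 4.0
print(line2.get_color(), line2.get_linestyle())                         # #B35946 -.
plt.show()
```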


You can also set the **marker** style.
  * The marker options are found [here](https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers)
  * As before, check the plotting function's documentation for the correct argument name. In this case, `.scatter()` uses `marker` to set the marker.

x = np.linspace(0, 10, 500)
fig, axes = plt.subplots(figsize=(10,6))
axes.scatter(x=x, y=np.random.randn(500), marker='2')
plt.show()
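`.plot()` also takes a `marker` argument, and combining it with `linestyle='none'` draws markers only, without the connecting line. A sketch that reads the settings back:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)
fig, axes = plt.subplots(figsize=(8, 4))
# 'o' draws circles; linestyle='none' removes the connecting line
line, = axes.plot(x, np.sin(x), marker='o', linestyle='none')
print(line.get_marker())     # o
print(line.get_linestyle())  # None
plt.show()
```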

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Add horizontal and vertical lines

You can add horizontal and vertical lines to your Figure to highlight something of interest.
  * If your Figure has a single plot, use:
    * `plt.axhline()` to add a horizontal line. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axhline.html)
    * `plt.axvline()` to add a vertical line. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axvline.html)

  * Naturally, you can customise the line with the aspects we learned so far, like colour or line style.

x = np.linspace(0, 10, 500)
plt.plot(x, np.sin(x))
plt.axhline(y=0.5, color='r', linestyle='dashed',linewidth=2)
plt.axvline(x=8, color='g', linestyle=':')

plt.show()
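`axhline()` also accepts `xmin` and `xmax` (and `axvline()` accepts `ymin` and `ymax`), given as fractions of the Axes from 0 to 1 rather than data values. This helps when a line should cover only part of the plot. A short sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
plt.figure()
plt.plot(x, np.sin(x))

# y is in data units; xmin/xmax are fractions of the Axes width (0 = left, 1 = right)
hline = plt.axhline(y=0.5, xmin=0.25, xmax=0.75, color='r', linestyle='dashed')
# x is in data units; ymin/ymax are fractions of the Axes height (0 = bottom, 1 = top)
vline = plt.axvline(x=8, ymin=0, ymax=0.5, color='g', linestyle=':')

print(hline.get_xdata(), hline.get_ydata())
plt.show()
```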

If your Figure has multiple Axes, you will select a given Axes and use:
  * `.axvline()` to add a vertical line. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.axvline.html)
  * `.axhline()` to add a horizontal line. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.axhline.html)

x = np.linspace(0, 10, 500)
y1 = np.random.randn(500)
y2 = np.cos(x)


fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10,6))
axes[0].scatter(x=x, y=y1)
axes[0].axvline(x=8,color='#D1349C', linestyle='-')

axes[1].plot(x,y2)
axes[1].axhline(y=0.5, color='g')

plt.show()

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Colormap

Let's use the mpg dataset. It has car models with their respective miles per gallon, number of cylinders, horsepower, weight, acceleration, country of origin and model name.

df = sns.load_dataset('mpg')
df = df.head(50)
print(df.shape)
df.head(3)

If you draw, for example, a scatter plot and pass the `c` parameter to colour the dots by a variable's level, Matplotlib does not show a colour bar by default, so the reader cannot relate each colour to its level.

plt.figure(figsize=(10,6))
plt.scatter(data=df, x='weight', y='acceleration', c='mpg', cmap='inferno')
plt.show()

You should add `plt.colorbar()` to display a bar relating each colour to its value. The gallery for the Matplotlib colormap reference is found [here](https://matplotlib.org/stable/gallery/color/colormap_reference.html#sphx-glr-gallery-color-colormap-reference-py)

plt.figure(figsize=(10,6))
plt.scatter(data=df, x='weight', y='acceleration', c='mpg', cmap='inferno')
plt.colorbar()
plt.show()
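The object-oriented equivalent passes the scatter result to `fig.colorbar()`, which also accepts a `label` for the bar. This sketch uses random stand-in values instead of the mpg dataset (an assumption, so it runs offline); it also shows that, by default, the colour range spans the minimum and maximum of the values passed to `c`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data, NOT the real mpg dataset
rng = np.random.default_rng(0)
weight = rng.uniform(1600, 5000, size=50)
acceleration = rng.uniform(8, 25, size=50)
mpg = rng.uniform(10, 45, size=50)

fig, ax = plt.subplots(figsize=(10, 6))
sc = ax.scatter(x=weight, y=acceleration, c=mpg, cmap='inferno')
cbar = fig.colorbar(sc, ax=ax, label='mpg')  # colour bar linked to the scatter's colours

# The colour limits default to the range of the data passed to c
print(sc.get_clim())
print(mpg.min(), mpg.max())
plt.show()
```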

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Grid

You can change grid properties with `plt.grid()`, which acts on the current Axes of the Figure. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.grid.html)
* In this case, we only changed the line style. More options are available in the documentation.

x = np.linspace(0, 10, 500)
y = np.random.randn(500)
plt.scatter(x,y)
plt.grid(True, linestyle='-.') 
plt.show()

You can change grid properties at the Axes level with `.grid()`. The function documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.grid.html)
* In this case, we set a line style for the grid and applied it only to the y-axis.

x = np.linspace(0, 10, 500)
y = np.random.randn(500)

fig, axes = plt.subplots()
axes.scatter(x=x, y=y)
axes.grid(True, linestyle='-.', axis='y')
plt.show()
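With `axis='y'`, only the horizontal grid lines (those attached to the y-axis ticks) are switched on. The sketch below checks this through each axis's grid lines. It assumes the default style, where grid lines start switched off (`rcParams['axes.grid']` is `False`).

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
y = np.random.randn(500)

fig, axes = plt.subplots()
axes.scatter(x=x, y=y)
axes.grid(True, linestyle='-.', axis='y')

y_grid = axes.yaxis.get_gridlines()  # grid lines drawn at the y-axis ticks
x_grid = axes.xaxis.get_gridlines()  # grid lines drawn at the x-axis ticks
print(all(line.get_visible() for line in y_grid))  # True: y grid switched on
print(any(line.get_visible() for line in x_grid))  # False: x grid left off
plt.show()
```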

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Annotation

You can annotate your plot with `plt.text()` to convey specific information. The documentation for this function, which works on the current Axes, is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.text.html)

x = np.linspace(0, 10, 500)
plt.plot(x, np.sin(x))

plt.text(x=2, y=0, s='Text Annotation', fontsize=12, c='r')
plt.show()

* To annotate a specific Axes, call `.text()` on that Axes. Its documentation can be found [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.text.html)

x = np.linspace(0, 10, 500)
fig, axes = plt.subplots(nrows=1, ncols=2)
axes[0].plot(x, np.sin(x))
axes[0].text(x=2, y=0, s='Text Annotation', fontsize=12, c='r')
plt.show()
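By default, the `x` and `y` passed to `.text()` are data coordinates. Passing `transform=ax.transAxes` switches to Axes coordinates, where (0, 0) is the lower-left corner and (1, 1) the upper-right, so the note stays in the same spot whatever the data range. A sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Data coordinates: placed at x=2, y=0 on the curve's own scale
data_label = ax.text(x=2, y=0, s='Text Annotation', fontsize=12, c='r')
# Axes coordinates: 5% from the left, 95% up, anchored by the text's top-left corner
corner_label = ax.text(0.05, 0.95, 'Top-left note', transform=ax.transAxes,
                       ha='left', va='top')

print(corner_label.get_position())  # (0.05, 0.95)
plt.show()
```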

---

### <img width="3%" height="3%" align="top"  src="https://codeinstitute.s3.amazonaws.com/predictive_analytics/jupyter_notebook_icons/Icon%2010-%20Lesson%20Content.png"> Save your plot

You can save your plot with `plt.savefig()`. The documentation is [here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html)
* The first argument is the file path, including the image name, where the image should be stored. `bbox_inches='tight'` trims the extra white space around the Figure, and `dpi=150` generates a higher-resolution image.
* First draw your plot, then run the `plt.savefig()` command.
* In this case, the image name is "created_image.png", and it is saved in the current working directory, which here is the root directory of your application.
* When saving an image, you do not need `plt.show()`, since your objective is to save the image rather than display it. If you do call `plt.show()`, call it after `plt.savefig()`; otherwise the saved image may be blank.
* In a Jupyter notebook, the image still appears as output after `plt.savefig()`. This is not due to `plt.show()` or `plt.savefig()`: the notebook's inline backend automatically displays any open Figure at the end of the cell.

x = np.linspace(0, 10, 500)
plt.plot(x, np.sin(3*x) * x)
plt.title("Nice title!")
plt.tight_layout()
plt.savefig('created_image.png', bbox_inches='tight', dpi=150)

Check the root folder of your application to see your new image! The `! ls` command below lists the files in the current directory.

! ls
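The pixel size of a saved image is the Figure size in inches times `dpi`. The sketch below saves to a temporary folder (so it leaves no file behind) and omits `bbox_inches='tight'`, because that option crops the image and so changes its final size.

```python
import os
import tempfile

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
fig, ax = plt.subplots(figsize=(4, 3))  # 4 x 3 inches
ax.plot(x, np.sin(3 * x) * x)
ax.set_title("Nice title!")

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'created_image.png')
    fig.savefig(path, dpi=150)
    saved = os.path.exists(path)
    image = mpimg.imread(path)  # array of shape (height, width, channels)

print(saved)           # True
print(image.shape[:2]) # (450, 600): 3 * 150 rows, 4 * 150 columns
```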

---