---   
 <img align="left" width="75" height="75"  src="../University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Data Science Journey from Beginners to Expert</h1>

---
<h3><div align="right">Instructor: Engr. Muhammad Nadeem Majeed, Ph.D.</div></h3>     

<h1 align="center">Lecture 3.22 (Data Visualization-II)</h1>

## _Data Visualization with Matplotlib_

**Read Documentation for details:** 
https://matplotlib.org/stable/users/index.html

<img align="left" width="500" height="500"  src="images/intromatlab.png"  >
<img align="right" width="400" height="500"  src="images/matplotlibadvantages.png"  >

## Learning agenda of this notebook

1. Anatomy of a Figure
2. Recap of Line Chart
3. Bar-plot
4. Scatter-plot

## 1. Anatomy of a Figure
>- Matplotlib is a versatile library that can draw many different plot elements. 
>- Before creating a plot, let's define some basic terms. The image below labels the important parts of a figure.

<img align="center" width="700" height="500"  src="images/anotomyoffig.png"  >

To begin, let's install the Matplotlib library. We'll use the `matplotlib.pyplot` module for basic plots like line & bar charts. It is often imported with the alias `plt`. 

In [None]:
# To install this library in Jupyter notebook
import sys
#!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install matplotlib --quiet

In [None]:
import matplotlib
matplotlib.__version__ , matplotlib.__path__
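
The terms in the anatomy figure map directly onto objects in code. The cell below is a minimal sketch, assuming only that Matplotlib is installed; the data values and label texts are made up for illustration.

```python
from matplotlib import pyplot as plt

# Figure: the whole canvas; Axes: one plotting area inside the figure
fig, ax = plt.subplots(figsize=(6, 4))

# A line artist drawn on the Axes (illustrative values only)
(line,) = ax.plot([1, 2, 3, 4], [1, 4, 9, 16], marker='o', label="y = x^2")

# Title, axis labels, ticks, grid and legend all belong to the Axes
ax.set_title("Anatomy demo")
ax.set_xlabel("x-axis label")
ax.set_ylabel("y-axis label")
ax.set_xticks([1, 2, 3, 4])
ax.grid(True)
ax.legend(loc='best')

# Every Axes belongs to exactly one Figure
print(type(fig).__name__, type(ax).__name__, ax.figure is fig)

plt.show()
```

Working through `fig` and `ax` like this is the object-oriented style used in the rest of this notebook; the `plt.title()`-style calls act on whichever Axes is currently active.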

## 2. Line Chart (A Recap)

In [None]:
from matplotlib import pyplot as plt
import numpy as np

chemical_exports = [0.810, 0.831, 0.895, 0.91, 0.915, 0.926, 0.945, 0.931, 0.919, 0.921, 0.920, 0.919]
medicine_exports = [0.791, 0.818, 0.832, 0.816, 0.840, 0.833, 0.835, 0.838, 0.842, 0.910, 0.930, 0.940]
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]


fig = plt.figure()
ax = fig.add_subplot()

ax.plot(years, chemical_exports, label="Chemicals", marker='o', c='b', ls='-', lw=2)
ax.plot(years, medicine_exports, label="Medicines", marker='x', c='r', ls=':', lw=2)

ax.set_xlabel("Years")
ax.set_ylabel("Amount (Million US$)")


xvals = np.arange(2011, 2022, 2)
yvals = np.linspace(0.80, 0.98, 10)
ax.set_xticks(xvals)
ax.set_yticks(yvals)


plt.title("LCI Exports in last 12 years")
plt.legend(loc='best')

plt.grid(True)
plt.tight_layout()
plt.show()

## 3. Bar-plot
- A bar plot (or bar chart) is a very common type of plot, used when we want to compare a numeric value across different categories. It shows the relationship between a numerical variable and a categorical variable. Examples include the marks of a single student in different subjects, the marks of multiple students in a single subject, the count of students achieving different grades, the population of different countries, and the sale of cars in different months.
- A bar plot represents categorical data with rectangular bars, with each bar having a height that corresponds to the value it represents.
- The bars can be plotted vertically as well as horizontally.
- We can use the `ax.bar()` method (or `plt.bar()`) to draw a vertical bar chart, and the `ax.barh()` method (or `plt.barh()`) to draw a horizontal bar chart:
```
ax.bar(x, height, width=0.8)
ax.barh(y, width, height=0.8)
```

#### a. Draw Vertical Bar-plot

In [None]:
# creating a dataset containing list of students and their marks
students = ["Hadeed", "Maaz", "Mujahid", "Mohid", "Kamal",  "Jamal"]
english = [85, 45, 60, 70, 83, 35]


# Create figure and axes objects
fig = plt.figure()
ax = fig.add_subplot()

#Draw bar plot between names (categorical variable) and marks (numerical variable) of students
ax.bar(x=students, height=english, width=0.4)

# adding labels
ax.set_xlabel("Names of Students")
ax.set_ylabel("Marks of Students")

# adding title
plt.title("Distribution of Student Marks")

#plt.savefig("mybargraph.png")
# show plot
plt.show()

#### b. Plot more than One bar in Chart
Many times, multiple sets of data are bound to the same variable. In these cases, we need to show the data together on the same chart for comparison. On a bar chart, we can do this by using two sets of bars.

In [None]:
# creating a dataset containing list of students and their marks
students = ["Hadeed", "Maaz", "Mujahid", "Mohid", "Kamal",  "Jamal"]
english = [85, 45, 60, 70, 83, 35]
maths = [ 80, 59, 80, 60, 65, 60]

# Create figure and axes objects
fig = plt.figure()
ax = fig.add_subplot()

#Draw bar plot between names (categorical variable) and marks (numerical variable) of students
ax.bar(x=students, height=english, width=0.8)
ax.bar(x=students, height=maths, width=0.8)

# adding labels
ax.set_xlabel("Names of Students")
ax.set_ylabel("Marks of Students")

# adding title
plt.title("Distribution of Student Marks")


# show plot
plt.show()

#### c. Adjust the Bars, so that they do not overlap

In [None]:
from matplotlib import pyplot as plt
import numpy as np
# creating a dataset containing list of students and their marks
students = ["Hadeed", "Maaz", "Mujahid", "Mohid", "Kamal",  "Jamal"]
english = [85, 45, 60, 70, 83, 35]
maths = [ 80, 59, 80, 60, 65, 60]

# Create figure and axes objects
fig = plt.figure()
ax = fig.add_subplot()

index = np.arange(len(students))

#Draw bar plot between names (categorical variable) and marks (numerical variable) of students
ax.bar(x=index-0.15, height=english, width=0.3, label='English')
ax.bar(x=index+0.15, height=maths, width=0.3, label='Maths')

# adding labels
ax.set_xlabel("Names of Students")
ax.set_ylabel("Marks of Students")

ax.set_xticks(ticks=index, labels=students, color='g')

# adding title
plt.title("Distribution of Student Marks")

plt.legend()
# save the figure, then show the plot
plt.savefig("vbar.png")
plt.show()
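
The `±0.15` offsets above are hand-picked for two series. The following is a sketch of how the offsets could be computed for any number of series, so that each group of bars stays centred on its tick. The `Science` marks and the `group_width` name are hypothetical and are not part of the lecture data.

```python
from matplotlib import pyplot as plt
import numpy as np

students = ["Hadeed", "Maaz", "Mujahid", "Mohid", "Kamal", "Jamal"]
marks = {
    "English": [85, 45, 60, 70, 83, 35],
    "Maths": [80, 59, 80, 60, 65, 60],
    "Science": [75, 50, 66, 72, 90, 41],  # hypothetical values, added for illustration
}

index = np.arange(len(students))
group_width = 0.8                      # total width shared by all bars of one student
bar_width = group_width / len(marks)   # width of a single bar

fig, ax = plt.subplots()
for i, (subject, values) in enumerate(marks.items()):
    # shift each series so that the whole group is centred on the tick position
    offset = (i - (len(marks) - 1) / 2) * bar_width
    ax.bar(x=index + offset, height=values, width=bar_width, label=subject)

ax.set_xticks(index)
ax.set_xticklabels(students)
ax.set_xlabel("Names of Students")
ax.set_ylabel("Marks of Students")
ax.set_title("Distribution of Student Marks")
ax.legend()
plt.show()
```

Keeping `group_width` below 1 leaves a gap between neighbouring students, since the ticks are one unit apart.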

#### d. Draw Horizontal Bar-plot

In [None]:
# creating a dataset containing list of students and their marks
students = ["Hadeed", "Maaz", "Mujahid", "Mohid", "Kamal",  "Jamal"]
english = [85, 45, 60, 70, 83, 35]
maths = [ 80, 59, 80, 60, 65, 60]

# Create figure and axes objects
fig = plt.figure()
ax = fig.add_subplot()

index = np.arange(len(students))

#Draw bar plot between names (categorical variable) and marks (numerical variable) of students
ax.barh(y=index-0.15, width=english, height=0.3, label='English')
ax.barh(y=index+0.15, width=maths, height=0.3, label='Maths')

# adding labels
ax.set_xlabel("Marks of Students")
ax.set_ylabel("Names of Students")


plt.yticks(ticks=index, labels=students, color='g')

# adding title
plt.title("Distribution of Student Marks")

plt.legend()

plt.savefig("hbar.png")

# show plot
plt.show()

#### e. Bar-Plot for a Real-Life DataSet
Download the data from: https://insights.stackoverflow.com/survey/

In [None]:
import pandas as pd
df = pd.read_csv('datasets/so_survey_subset.csv')
df.shape

In [None]:
df

**Let us draw a bar chart with languages along the y-axis and their popularity count along the x-axis.** To do so, we need:
- The unique list of languages under the `LanguageWorkedWith` column
- The count of every language under the `LanguageWorkedWith` column

In [None]:
# Check the NaN values in the different columns, especially LanguageWorkedWith
df.info()

In [None]:
df1 = df.dropna()

In [None]:
df1.info()

In [None]:
languages = df1['LanguageWorkedWith']
languages

In [None]:
from collections import Counter

In [None]:
mylist = ['C', 'HTML', 'Python', 'Java']

In [None]:
# Elements are stored as dictionary keys and their counts are stored as dictionary values.
myctr = Counter(mylist)
myctr

In [None]:
myctr.update(['Python', 'C'])
myctr

In [None]:
myctr.update(['Python', 'HTML'])
myctr

In [None]:
languages

In [None]:
for language in languages:
    print(language.split(';'))

In [None]:
myctr = Counter()
for language in languages:
    myctr.update(language.split(';'))
myctr
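
The same split-and-count step can also be written with pandas string methods. This is a sketch on a small hypothetical Series (`sample_languages` stands in for `df1['LanguageWorkedWith']`), and it checks that both approaches give the same counts.

```python
import pandas as pd
from collections import Counter

# Hypothetical stand-in for df1['LanguageWorkedWith']
sample_languages = pd.Series(["Python;C", "HTML;Python", "Java;Python;C"])

# Counter-based loop, as in the cell above
sample_ctr = Counter()
for language in sample_languages:
    sample_ctr.update(language.split(';'))

# pandas pipeline: split each string, give every language its own row, then count
counts = sample_languages.str.split(';').explode().value_counts()

print(counts)
print(counts.to_dict() == dict(sample_ctr))
```

`value_counts()` returns the counts already sorted from most to least frequent, which is convenient when the chart should be ordered by popularity.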

In [None]:
mydict = dict(myctr)
languages = list(mydict.keys())
popularity = list(mydict.values())

In [None]:
languages

In [None]:
popularity

In [None]:
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot()
ax.barh(y=languages, width=popularity)

**Try to display the above chart with the most popular language at the top** (`Hint: Use Pandas Dataframe`)
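
One possible approach to this exercise is sketched below. The counts are hypothetical placeholders for the `languages` and `popularity` lists, and `df_pop` is a name chosen only for this example.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical counts standing in for the `languages` and `popularity` lists
sample_languages = ["C", "HTML", "Python", "Java"]
sample_popularity = [120, 310, 450, 200]

df_pop = pd.DataFrame({"language": sample_languages, "count": sample_popularity})

# barh draws the first row at the bottom, so sort in ascending order
# to place the largest count at the top of the chart
df_pop = df_pop.sort_values("count", ascending=True)

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(y=df_pop["language"], width=df_pop["count"])
ax.set_xlabel("Number of respondents")
ax.set_ylabel("Language")
plt.show()
```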

#### Stacking the Bars:
>Stacking places multiple bars on top of each other. Students should explore this on their own. It can be done by passing the values of the previous bars to the `bottom` parameter of the next `bar()` call.
```
plt.bar(x, subject1, width=0.8)
plt.bar(x, subject2, width=0.8, bottom = subject1)

```
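
As a starting point for that exploration, here is a minimal runnable sketch of the `bottom` idea, reusing the student marks from the earlier cells.

```python
from matplotlib import pyplot as plt

students = ["Hadeed", "Maaz", "Mujahid", "Mohid", "Kamal", "Jamal"]
english = [85, 45, 60, 70, 83, 35]
maths = [80, 59, 80, 60, 65, 60]

fig, ax = plt.subplots()

# first layer starts at zero
ax.bar(x=students, height=english, width=0.8, label='English')
# second layer: each Maths bar starts where the matching English bar ends
ax.bar(x=students, height=maths, width=0.8, bottom=english, label='Maths')

ax.set_xlabel("Names of Students")
ax.set_ylabel("Total Marks")
ax.set_title("Stacked Student Marks")
ax.legend()
plt.show()
```

For a third layer, `bottom` would need the element-wise sum of the first two lists, for example `np.add(english, maths)`.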

## 4. Scatter-plot
A scatter plot is used to plot data points on a figure based on the horizontal and vertical axes. Scatter plots can be used to show the relationship between two variables. They can also show how data clusters in a dataset. One of the most striking features of drawing scatter plots in Matplotlib is the ability to set different colors and sizes for individual data points by using additional variables.

In a scatter plot, the values of 2 variables are plotted as points on a 2-dimensional grid. Additionally, you can also use a third variable to determine the size or color of the points. Let's try out an example.

#### a. Basic Scatter-plot

In [None]:
import numpy as np
from matplotlib import pyplot as plt

# Creating two arrays of 20 random integers in the range [0, 100)
l1 = np.random.randint(0, 100, 20)
l2 = np.random.randint(0, 100, 20)

# Create figure and axes objects
fig,ax = plt.subplots()

# Draw scatter plot: each point pairs a value from l1 (x) with the value at the same position in l2 (y)
ax.scatter(x=l1, y=l2)

# adding labels
ax.set_xlabel("List 1")
ax.set_ylabel("List 2")

# adding title
plt.title("Scatter Plot")

plt.grid(True)
plt.tight_layout()
plt.show()

#### b. Customize Scatter Plot
- Pass the `s` (size), `marker`, `c` (color), `edgecolor`, `lw` (linewidth), and `alpha` arguments to the `scatter()` method

In [None]:
import numpy as np
from matplotlib import pyplot as plt

# Creating two arrays of 20 random integers in the range [0, 100)
l1 = np.random.randint(0, 100, 20)
l2 = np.random.randint(0, 100, 20)

# Create figure and axes objects
fig, ax = plt.subplots()

# Draw scatter plot: each point pairs a value from l1 (x) with the value at the same position in l2 (y)
ax.scatter(x=l1, y=l2, s=200, marker='o', c='green', edgecolor='black', lw=2, alpha=0.5)

# adding labels
ax.set_xlabel("List 1")
ax.set_ylabel("List 2")

# adding title
plt.title("Scatter Plot")

plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
l1, l2
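
Because `np.random.randint()` draws fresh values on every run, the arrays and the plots above change each time the cells are executed. If a repeatable plot is wanted, one option is a seeded generator, as in this sketch; the seed value `42` is arbitrary.

```python
import numpy as np
from matplotlib import pyplot as plt

# A seeded generator returns the same sequence on every run
rng = np.random.default_rng(seed=42)
l1 = rng.integers(0, 100, 20)
l2 = rng.integers(0, 100, 20)

fig, ax = plt.subplots()
ax.scatter(x=l1, y=l2)
ax.set_xlabel("List 1")
ax.set_ylabel("List 2")
ax.set_title("Scatter Plot (seeded)")
plt.show()
```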

#### c. Houses Dataset

In [None]:
import pandas as pd
df = pd.read_csv('datasets/housesdata.csv')
df

In [None]:
# reading values from dataframe in x and y 
x = df['x']
y = df['y']
# plot data on scatter plot
plt.scatter(x,y)
plt.show()

In [None]:
# reading values from dataframe in x and y 
x = df['x']
y = df['y']

# plot data on scatter plot
plt.scatter(x,y)

# setting labels and title
plt.xlabel("Distance to place X")
plt.ylabel("Distance to place Y")
plt.title("Price of Houses in Different Locations")

plt.show()

**Change the sizes and colors of datapoints based on price column**

In [None]:
import numpy as np
x = df['x']
y = df['y']

# set the marker sizes in the scatter plot from the price column
sizes = df['price'].astype(int)
# also set the color of markers according to the size
colors = sizes

# Instead of passing a single value to the s and c arguments, pass a sequence with one value per data point
# Use the edgecolor and linewidth arguments
# alpha is the blending value, which we can set between 0 (transparent) and 1 (opaque)
plt.scatter(x,y, s=sizes, c = colors, alpha=0.5, edgecolor='black', lw=2)

# setting labels and title
plt.xlabel("Distance to place X")
plt.ylabel("Distance to place Y")
plt.title("Price of Houses in Different Locations")


# saving the figure
plt.savefig("scatterchart.png")

plt.show()
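
The `s` argument is interpreted as marker area in points squared, so raw prices can produce markers that are far too large or too small, depending on their units. The sketch below rescales the prices into a fixed range before plotting. The `houses` rows are hypothetical (only the column names `x`, `y` and `price` come from the dataset above), and the range limits `s_min` and `s_max` are arbitrary choices.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical rows; the real data lives in datasets/housesdata.csv
houses = pd.DataFrame({
    "x": [1.0, 2.5, 4.0, 5.5, 7.0],
    "y": [3.0, 1.5, 6.0, 2.0, 5.0],
    "price": [120000, 250000, 90000, 400000, 310000],
})

price = houses["price"].to_numpy(dtype=float)

# Rescale price linearly into a readable marker-area range (points^2).
# This assumes the prices are not all identical (otherwise it divides by zero).
s_min, s_max = 20.0, 400.0
sizes = s_min + (price - price.min()) / (price.max() - price.min()) * (s_max - s_min)

fig, ax = plt.subplots()
ax.scatter(houses["x"], houses["y"], s=sizes, c=price, alpha=0.5, edgecolor='black', lw=1)
ax.set_xlabel("Distance to place X")
ax.set_ylabel("Distance to place Y")
ax.set_title("Marker size scaled from price")
plt.show()
```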

**Add colorbar**

In [None]:
import numpy as np
x = df['x']
y = df['y']

# set the marker sizes in the scatter plot from the price column
sizes = df['price'].astype(int)
# also set the color of markers according to the size
colors = sizes

# Instead of passing a single value to the s and c arguments, pass a sequence with one value per data point
# Use the edgecolor and linewidth arguments
# alpha is the blending value, which we can set between 0 (transparent) and 1 (opaque)
plt.scatter(x,y, s=sizes, c = colors, alpha=0.5, edgecolor='black', lw=2)


# add a colorbar that maps the marker colors back to price values
cbar = plt.colorbar()
cbar.set_label("Price of house")

# setting labels and title
plt.xlabel("Distance to place X")
plt.ylabel("Distance to place Y")
plt.title("Price of Houses in Different Locations")


# saving the figure
plt.savefig("scatterchart.png")

plt.show()
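
The same colorbar can be built in the object-oriented style used in the bar-plot sections. This is a sketch on hypothetical rows (only the column names come from the dataset above); the `viridis` colormap and the fixed marker size are arbitrary choices.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical rows; the real data lives in datasets/housesdata.csv
houses = pd.DataFrame({
    "x": [1.0, 2.5, 4.0, 5.5, 7.0],
    "y": [3.0, 1.5, 6.0, 2.0, 5.0],
    "price": [120000, 250000, 90000, 400000, 310000],
})

fig, ax = plt.subplots()

# scatter() returns the collection that holds the color mapping
sc = ax.scatter(houses["x"], houses["y"], s=150, c=houses["price"],
                cmap='viridis', alpha=0.7, edgecolor='black')

# pass that collection explicitly so the colorbar knows which colors to describe
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("Price of house")

ax.set_xlabel("Distance to place X")
ax.set_ylabel("Distance to place Y")
ax.set_title("Price of Houses in Different Locations")
plt.show()
```

Passing `sc` explicitly avoids relying on the "current" plot, which matters once a figure holds more than one Axes.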