<h1 style="text-align:center">Learn-DataVisulization-WithMe</h1>

<h4 style="text-align:center">Authored By B.Sasi Vatsal</h4>


![title](https://ubiq.co/analytics-blog/wp-content/uploads/2020/03/principles-good-data-visualization-design.png)

### Welcome to Part 1 of the Learn-DataVisulization-WithMe series. In this notebook we will walk through different plotting techniques in the matplotlib library using the famous Iris dataset, in an easy and understandable way. Let's dig in!

## Importing Libraries

In [None]:
import pandas as pd
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
%matplotlib inline

## Loading the Iris Dataset

In [None]:
iris = datasets.load_iris()
iris

### Separating the Iris Dataset into Features and Labels

In [None]:
# iris_x holds the sepal and petal measurements (length and width, in cm)
iris_x = iris['data']
# iris_y holds the species of each flower, encoded as 0, 1, 2 --> setosa, versicolor, virginica
iris_y = iris['target']
# iris_x_labels contains ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
iris_x_labels = iris['feature_names']
# iris_y_labels contains ['setosa', 'versicolor', 'virginica']
iris_y_labels = iris['target_names']

### Dividing the obtained data into respective species

In [None]:
# we have 3 different species in the iris dataset, i.e. three different kinds of
# flowers belonging to the iris family, namely setosa, versicolor and virginica

# setosa is the first kind and is represented as 0 in the iris dataset
setosa = np.where(iris_y == 0)

# versicolor is the second kind and is represented as 1 in the iris dataset
versicolor = np.where(iris_y == 1)

# virginica is the third kind and is represented as 2 in the iris dataset
virginica = np.where(iris_y == 2)
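
The `np.where` calls above return a tuple of index arrays, which can then be used to pick rows. A boolean mask is an equivalent alternative. The following is only a minimal sketch on made-up stand-in arrays (`demo_features` and `demo_targets` are hypothetical names, not part of the notebook):

```python
import numpy as np

# Hypothetical stand-ins for iris_x / iris_y so the sketch runs on its own.
demo_features = np.array([[5.1, 3.5], [7.0, 3.2], [6.3, 3.3], [4.9, 3.0]])
demo_targets = np.array([0, 1, 2, 0])

# np.where with a single argument returns a tuple of index arrays.
idx = np.where(demo_targets == 0)

# A boolean mask selects the same rows without the intermediate indices.
mask = demo_targets == 0

rows_from_where = demo_features[idx]
rows_from_mask = demo_features[mask]
print(np.array_equal(rows_from_where, rows_from_mask))  # True
```

Either form works for the per-species means computed later; the mask version is simply a little shorter.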

# Data Visualization

## 1. Bar Chart

A bar chart is a graphical representation of data in which each category is drawn as a rectangular bar. The length (or height) of each bar is proportional to the value it represents. One axis shows the categories of a column in the dataset, and the other axis shows the values or counts associated with them. Bar charts can be plotted vertically or horizontally. A vertical bar chart is often called a column chart. When the bars are arranged from the highest to the lowest value, the chart is called a Pareto chart.
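
As a rough illustration of the Pareto ordering mentioned here, the sketch below sorts some bars from highest to lowest and overlays a cumulative percentage line, which Pareto charts commonly include. The categories and counts are made up for illustration and are not taken from the Iris dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical categories and counts, invented for this sketch.
demo_categories = ["A", "B", "C", "D"]
demo_values = np.array([12, 40, 7, 21])

# Sort the bars from the highest to the lowest value.
order = np.argsort(demo_values)[::-1]
sorted_labels = [demo_categories[i] for i in order]
sorted_values = demo_values[order]

# Cumulative share of the total, in percent.
cumulative_pct = np.cumsum(sorted_values) / sorted_values.sum() * 100

positions = np.arange(len(sorted_values))
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(positions, sorted_values, color="steelblue")
ax.set_xticks(positions)
ax.set_xticklabels(sorted_labels)
ax.set_ylabel("count")

# Second y axis for the cumulative percentage line.
ax2 = ax.twinx()
ax2.plot(positions, cumulative_pct, color="black", marker="o")
ax2.set_ylim(0, 110)
ax2.set_ylabel("cumulative %")
ax.set_title("Pareto chart (illustrative data)")
```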

In [None]:
# computing the means of features of respective species
mean_of_setosa = iris_x[setosa].mean(axis=0)
mean_of_versicolor = iris_x[versicolor].mean(axis=0)
mean_of_virginica = iris_x[virginica].mean(axis=0)

In [None]:
# determining the color of our bar chart
cmap = plt.cm.magma
color = [cmap(0.2),cmap(0.3),cmap(0.4)]

In [None]:
figure, ax = plt.subplots(2,2,sharex=True,sharey=True,figsize=(20,20))
for i,feat in enumerate(iris_x_labels):
    # pick the subplot: row i//2, column i%2
    axes = ax[i//2,i%2]
    axes.bar([0,1,2],[mean_of_setosa[i],mean_of_versicolor[i],mean_of_virginica[i]],color=color)
    axes.set_title(feat,size=30)
    axes.set_xticks([0,1,2])
    axes.set_xticklabels(iris_y_labels,size=40)
figure.suptitle("Average feature values per species",size=60,y=1)
figure.tight_layout()

## 2. Pie Chart

A pie chart is a type of chart that displays data in a circular graph. It is one of the most commonly used charts for showing how a whole is divided into parts. The full circle (the pie) represents the whole dataset, and each slice represents one category, with the angle of the slice proportional to that category's share of the total.
Pie charts, also commonly known as pie diagrams, make the proportions in the data easy to read. They are also used to compare the shares of different categories.
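
To make the link between counts and slices concrete, here is a small sketch of the arithmetic behind a pie chart: each category's share of the total becomes a percentage and an angle out of 360 degrees. The label array is a made-up example, not the Iris targets.

```python
import numpy as np

# Hypothetical class labels, invented for this sketch.
demo_labels = np.array([0, 0, 1, 2, 2, 2, 1, 0])

# Unique classes and how often each one occurs.
demo_classes, demo_counts = np.unique(demo_labels, return_counts=True)

# Share of the whole, as a fraction, a percentage, and a slice angle.
fractions = demo_counts / demo_counts.sum()
percentages = fractions * 100
angles = fractions * 360

for cls, pct, ang in zip(demo_classes, percentages, angles):
    print(f"class {cls}: {pct:.1f}% of the data, {ang:.0f} degree slice")
```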

In [None]:
# let's see the percentage of each flower species in the dataset
# for this we use numpy's unique function with return_counts=True, which gives us
# the unique values and the count of each unique value respectively
(unique, counts) = np.unique(iris_y,return_counts=True)
# setting the figure size to make our pie chart look bigger
plt.figure(figsize=(10,10))
# making the pie chart
plt.pie(counts,labels=iris_y_labels,shadow=True,explode=(0.05,0,0),autopct='%.1f%%')
# we can observe that every species has the same share, i.e. there are 50 records of each species
# out of a total of 150 flower records

#### Bonus (Styling Our Pie Chart)

In [None]:
plt.figure(figsize=(10,10))
plt.pie(counts,labels=iris_y_labels,autopct='%.1f%%',textprops={'fontsize':17}, wedgeprops={'edgecolor':'#ffffff'})
# above we increased the font size and added a little bling to the chart with white wedge edges
# now let's turn our chart into a donut, because why not...
circle = plt.Circle(xy=(0,0),radius=0.8,color='white')
# gca = get current axes; this method returns the axes we are currently drawing on
plt.gca().add_artist(circle)
# finally let's add a title to our donut
plt.title('Class Distribution of IRIS DataPoints',fontsize=25)
# whoo, that's our unique-looking donut chart; remember this is actually still a pie chart, we just overlaid a
# white circle with center at (0,0) and radius 0.8 on top of it
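
The donut above is made by covering the middle of the pie with a white circle. As an alternative sketch, matplotlib's wedges also accept a `width` in `wedgeprops`, which draws each slice as a ring segment so no overlay is needed. The counts and names below are written out by hand so the block runs on its own, and the width of 0.3 is an arbitrary choice.

```python
import matplotlib.pyplot as plt

# 50 flowers per species, written out so the sketch is self-contained.
demo_counts = [50, 50, 50]
demo_names = ["setosa", "versicolor", "virginica"]

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    demo_counts,
    labels=demo_names,
    autopct="%.1f%%",
    pctdistance=0.85,  # place the percentages inside the ring
    wedgeprops={"width": 0.3, "edgecolor": "white"},  # ring thickness
)
ax.set_title("Class distribution (donut via wedge width)")
```

With this approach the hole is truly empty, so it also works on figures whose background is not white.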

## 3. Scatter Plot

A scatter plot is a chart type that is normally used to observe and visually display the relationship between variables. The values of the variables are represented by dots. The position of each dot on the horizontal and vertical axes gives the values of the respective data point; hence, scatter plots use Cartesian coordinates to display the values of the variables in a dataset. Scatter plots are also known as scattergrams, scatter graphs, or scatter charts.

Scatter plots are used for:
1. Demonstration of the relationship between two variables
2. Identification of correlational relationships
3. Identification of data patterns
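
A correlation that is visible in a scatter plot can also be put into a number. The sketch below computes the Pearson correlation coefficient with `np.corrcoef` for two small made-up arrays (hypothetical measurements, not the actual Iris columns); values close to 1 or -1 indicate a strong linear relationship.

```python
import numpy as np

# Hypothetical paired measurements, invented for this sketch.
demo_x = np.array([1.4, 1.5, 4.5, 4.7, 5.9, 6.1])
demo_y = np.array([0.2, 0.3, 1.5, 1.4, 2.1, 2.3])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two variables.
r = np.corrcoef(demo_x, demo_y)[0, 1]
print(f"Pearson r = {r:.3f}")
```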


In [None]:
# importing the func formatter
from matplotlib.ticker import FuncFormatter

In [None]:
# don't worry about this for now, we will see FuncFormatter in use in a later cell

# basically FuncFormatter is used here to label the color bar of the scatter plot
# we want the species names from the iris dataset, so we use a lambda function
# that looks up the name in iris_y_labels (made earlier) for each tick value;
# int() makes sure the tick value is a valid index
formatss = FuncFormatter(lambda s, i: iris_y_labels[int(s)])
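
To see what a `FuncFormatter` does on its own, here is a minimal sketch. It wraps a function that receives a tick value and a tick position and returns the label text. The names `demo_names`, `species_name` and `demo_formatter` are hypothetical and only used for this illustration.

```python
from matplotlib.ticker import FuncFormatter

# Species names written out so the sketch is self-contained.
demo_names = ["setosa", "versicolor", "virginica"]

def species_name(value, pos):
    # value is the tick value, pos is the tick position (unused here).
    return demo_names[int(value)]

demo_formatter = FuncFormatter(species_name)

# Calling the formatter directly shows the label it would produce.
print(demo_formatter(1.0, 0))  # versicolor
```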

In [None]:
?plt.scatter

In [None]:
# remember the drill, right? first we set the plot size
plt.figure(figsize=(10,10))
# for a scatter plot we use the scatter method and pass the x and y variables
# additionally you can set the colors with c, the edge color, etc.; for more info run ?plt.scatter in the cell above
plt.scatter(iris_x[:,2],iris_x[:,3],c=iris_y,edgecolor='black')
plt.xlabel(iris_x_labels[2],size=20)
plt.ylabel(iris_x_labels[3],size=20)
# making the color bar; ticks restricts the color bar labels to certain values,
# in our case 0, 1, 2, i.e. the ids of our species
plt.colorbar(ticks=[0,1,2], format=formatss)
# giving it a title; the pad argument adds padding between the title and the plot
plt.title("Petal width vs. petal length by species",size=25,pad=25)

# from the plot we can clearly see that setosa has the smallest petals,
# followed by versicolor, and virginica has the largest petals of the three

## Bonus : 3-Dimensional Scatter Plot

In [None]:
# 2D is boring, so let's make our scatter plot 3D, which takes 3 variables; it looks beautiful
# and is very informative

# for 3D plotting we need to import the following class
from mpl_toolkits.mplot3d import Axes3D

In [None]:
# same as before: defining our figure size
figure = plt.figure(figsize=(20,20))

# this step is crucial: here we create a 3D axes on the figure
# (newer matplotlib versions no longer attach Axes3D(figure) to the figure automatically,
# so we use add_subplot with projection='3d')
three_d_scatter_var = figure.add_subplot(111,projection='3d')
# elev sets the elevation of the viewpoint and azim sets its rotation about the z axis
# try changing the values of elev and azim, it's fun
three_d_scatter_var.view_init(elev=-160,azim=150)
# petal length
three_d_scatter_var.set_xlabel(iris_x_labels[2],size=30)
# petal width
three_d_scatter_var.set_ylabel(iris_x_labels[3],size=30)
# sepal length; there is also sepal width, but a 3D plot only has room for three variables
three_d_scatter_var.set_zlabel(iris_x_labels[0],size=30)

# plotting the scatter plot by passing the three variables
scatter_plot_3d = three_d_scatter_var.scatter(iris_x[:,2],iris_x[:,3],iris_x[:,0], edgecolor='black',c=iris_y,s=80)
# legend_elements returns one legend handle (marker) per class plus default labels;
# we keep the handles and use the species names as labels instead
markers,_ = scatter_plot_3d.legend_elements()
# making the legend
legend = three_d_scatter_var.legend(markers,iris_y_labels,loc='upper left',title='Species', prop={'size': 30})
# explicitly adding the legend to the axes as an artist
three_d_scatter_var.add_artist(legend)

# yeah! that's our 3D scatter plot; each dot represents one flower and shows
# its petal length, petal width and sepal length

## 4. Box Plot

In [None]:
plt.figure(figsize=(20,10))
# making the box plot using the boxplot function
plt.boxplot(iris_x,labels=iris_x_labels,vert=False)
# increasing the tick label font size on both axes using yticks and xticks
plt.yticks(fontsize=25)
plt.xticks(fontsize=25)
# giving our box plot a title
plt.title("Species Features",size=50,pad=50)

# the box plot shows the median, quartiles and range of the individual observations
# of sepal and petal length and width (outliers are drawn as separate points)

# the minimum petal width is about 0.1 cm and the maximum is about 2.5 cm
# we can read off the other features the same way; the box plot is a handy tool for
# feature analysis
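
The numbers a box plot draws can also be computed by hand. The sketch below works out the quartiles, the interquartile range, and the whisker limits using the common 1.5 × IQR rule (which is matplotlib's default `whis` setting) for a small made-up sample; the values in `demo_sample` are invented for illustration.

```python
import numpy as np

# Hypothetical sample with one unusually large value.
demo_sample = np.array([0.2, 0.2, 0.4, 1.0, 1.3, 1.5, 1.8, 2.1, 2.5, 6.0])

# The box spans the first to third quartile; the line inside is the median.
q1, median, q3 = np.percentile(demo_sample, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach to the most extreme data points inside the fences.
whisker_low = demo_sample[demo_sample >= lower_fence].min()
whisker_high = demo_sample[demo_sample <= upper_fence].max()
outliers = demo_sample[(demo_sample < lower_fence) | (demo_sample > upper_fence)]

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```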

## 5. Violin Plot

A violin plot is a hybrid of a box plot and a kernel density plot, which shows peaks in the data. It is used to visualize the distribution of numerical data. The box plot is an old standby for visualizing basic distributions. It is convenient for comparing summary statistics (such as range and quartiles), but it doesn't show how the data is distributed within those ranges. For multimodal distributions (those with multiple peaks) this can be particularly limiting. Unlike a box plot, which can only show summary statistics, a violin plot depicts summary statistics and the density of each variable. Sometimes the median and mean aren't enough to understand a dataset, so the violin plot uses kernel density estimation to show the shape of the distribution. Wider sections of the violin represent a higher probability that members of the population take on the given value; the skinnier sections represent a lower probability.
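
To give a feel for the kernel density estimation behind the violin shape, here is a minimal hand-written sketch with a Gaussian kernel. The two-peaked sample is randomly generated for illustration, `gaussian_kde_1d` is a hypothetical helper, and the fixed bandwidth of 0.3 is an assumption (matplotlib chooses its bandwidth with its own rule).

```python
import numpy as np

# Hypothetical bimodal sample: two clusters of values.
rng = np.random.default_rng(0)
demo_data = np.concatenate([rng.normal(1.5, 0.2, 50), rng.normal(5.0, 0.6, 100)])

def gaussian_kde_1d(data, grid, bandwidth):
    """Average of Gaussian bumps, one centered on each data point."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(data) * bandwidth)

grid = np.linspace(0, 8, 400)
density = gaussian_kde_1d(demo_data, grid, bandwidth=0.3)

# A density should integrate to roughly 1 over the grid.
area = (density * (grid[1] - grid[0])).sum()
print(f"area under the curve: {area:.3f}")
```

Plotting `density` against `grid` would show two bumps, one per cluster; a violin plot mirrors such a curve around its axis, which is why wide sections mark values where many observations lie.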

In [None]:
plt.figure(figsize=(20,10))
# making the violin plot and also making it horizontal
plt.violinplot(iris_x,vert=False)
# routine steps: naming the y axis ticks and increasing their size
plt.yticks([1,2,3,4],iris_x_labels,size=26)
# increasing the x axis font size
plt.xticks(size=26)
# giving our violin plot a title
plt.title("Feature Distribution",size=50,pad=50)


# the violin plot shows where the values of each feature are concentrated
# look at the shape for petal width: some regions are wider than others
# the wider regions are where most of the observations lie, i.e. many flowers
# have a petal width in that range
# the same goes for petal length, sepal width and sepal length

# in short, wide regions have a high frequency and contain most of the points,
# and narrow regions contain few points

### Bonus : Styling Violin Plot

In [None]:
# first quartile, median and third quartile of every feature
quart1,med,quart3 = np.percentile(iris_x,[25,50,75],axis=0)

In [None]:
plt.figure(figsize=(20,10))
# making the violin plot and also making it horizontal
# pt is a dictionary holding the parts of our violin plot (bodies, bars, etc.)
pt = plt.violinplot(iris_x,vert=False)

# we loop through the values under the key 'bodies' and change some of their default properties
for vpfeature in pt['bodies']:
    # changes the fill color of the violin
    vpfeature.set_facecolor("#53B8BB")
    # changes the contour, i.e. the border color
    vpfeature.set_edgecolor("#FF4C29")
    # sets the opacity of the violin (0 is fully transparent, 1 is fully opaque)
    vpfeature.set_alpha(0.5)

plt.yticks([1,2,3,4],iris_x_labels,size=26)
# increasing the x axis font size
plt.xticks(size=26)
# marking the median of each feature with a dot
plt.gca().scatter(med,[1,2,3,4],color="#082032",s=60)
# giving our violin plot a title
plt.title("Feature Distribution",size=50,pad=50)


# voila, we styled our violin plot to make it look better
# there are tons of ways you can customize your violin plot depending on your creativity
# explore them on your own; you'll learn new things along the way

# Thank you, see you in Part 2

## Until then, peace ✌️.