# Basic Univariate, Bivariate & Multivariate Analysis on Iris Dataset

Exploratory Data Analysis (EDA) is the preliminary analysis of a data set, using statistics and visualization tools, to discover the trends, patterns, and relationships among the variables it contains.

Exploratory data analysis methods are cross-classified in two ways. First, each method is either graphical or non-graphical. Second, each method is either univariate, bivariate, or multivariate.




In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import plotly.express as px
import seaborn as sns



In [None]:
df = pd.read_csv('../input/iris-flower-dataset/IRIS.csv')

In [None]:
df.shape

In [None]:
df.head(100)

In [None]:
df.info()

In [None]:
df.describe()

# UNIVARIATE ANALYSIS

Uni means one and variate means variable, so univariate analysis considers only one variable at a time. Its objective is to describe the data, summarize it, and analyze the patterns present in it. It explores each variable in a dataset separately, and it can be applied to two kinds of variables: categorical and numerical.

Some properties that can be easily identified with univariate analysis are central tendency (mean, median, and mode), dispersion (range, variance, and standard deviation), and quartiles (interquartile range).
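
The measures listed above can also be computed non-graphically with pandas. The following is a minimal sketch on a small, hypothetical sample of sepal lengths (the values and the names `sepal_length` and `summary` are illustrative assumptions); in this notebook the same calls could be applied to `df['sepal_length']`.

```python
import pandas as pd

# Hypothetical sample of sepal lengths in cm (illustrative values, not the full dataset)
sepal_length = pd.Series(
    [5.1, 4.9, 4.7, 4.6, 5.0, 7.0, 6.4, 6.9, 5.5, 5.0], name="sepal_length"
)

summary = {
    # Central tendency
    "mean": sepal_length.mean(),
    "median": sepal_length.median(),
    "mode": sepal_length.mode().tolist(),  # a series can have more than one mode
    # Dispersion
    "range": sepal_length.max() - sepal_length.min(),
    "variance": sepal_length.var(),  # sample variance (ddof=1 by default)
    "std": sepal_length.std(),       # sample standard deviation
    # Quartiles
    "iqr": sepal_length.quantile(0.75) - sepal_length.quantile(0.25),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

`df.describe()`, used earlier, reports several of these values at once; computing them individually makes it explicit which measure is which.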

In [None]:
df_setosa=df.loc[df['species']=='Iris-setosa']
df_virginica=df.loc[df['species']=='Iris-virginica']
df_versicolor=df.loc[df['species']=='Iris-versicolor']

**Linear Plot**

In [None]:
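# Plot each species' sepal lengths along a single axis (y fixed at 0) to compare their ranges
plt.plot(df_setosa['sepal_length'], np.zeros_like(df_setosa['sepal_length']), 'o', label='Iris-setosa')
plt.plot(df_virginica['sepal_length'], np.zeros_like(df_virginica['sepal_length']), 'o', label='Iris-virginica')
plt.plot(df_versicolor['sepal_length'], np.zeros_like(df_versicolor['sepal_length']), 'o', label='Iris-versicolor')
plt.xlabel('Sepal length')
plt.yticks([])  # the y-axis carries no information in this plot
plt.legend()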

**Bar Plot**

In [None]:
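# Bar height is the mean sepal length per species; passing every row to plt.bar would overdraw the bars
mean_sepal_length = df.groupby('species')['sepal_length'].mean()
plt.bar(mean_sepal_length.index, mean_sepal_length)
plt.xlabel('Species')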

**Histogram**

In [None]:
# For a categorical column, the histogram shows the number of rows per species
plt.hist(df['species'])
plt.xlabel('Species')
plt.ylabel('Count')

# BIVARIATE ANALYSIS

Bi means two and variate means variable, so bivariate analysis involves two variables. It examines the relationship between the two variables, including whether one may influence the other.

In [None]:
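# Scatter plot of sepal length against sepal width, colored by species
# (`size` was renamed to `height` in seaborn 0.9)
sns.FacetGrid(df, hue="species", height=5).map(plt.scatter, "sepal_length", "sepal_width").add_legend();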
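
A relationship between two variables can also be summarized non-graphically with a correlation coefficient. The sketch below computes the Pearson correlation on a small, hypothetical sample (the values and the names `sample` and `r` are illustrative assumptions); in this notebook the same call could be applied to any two numeric columns of `df`. Note that a correlation measures association only and does not by itself establish cause.

```python
import pandas as pd

# Hypothetical sample of measurements in cm (illustrative values, not the full dataset)
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "petal_length": [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9],
})

# Pearson correlation ranges from -1 (perfect decreasing linear relationship)
# through 0 (no linear relationship) to +1 (perfect increasing linear relationship)
r = sample["sepal_length"].corr(sample["petal_length"])
print(f"Pearson r between sepal length and petal length: {r:.2f}")
```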

# MULTIVARIATE ANALYSIS

Multivariate analysis is required when more than two variables have to be analyzed simultaneously. It is hard for the human brain to visualize a relationship among four variables in a single graph, so multivariate analysis is used to study more complex data sets. Types of multivariate analysis include cluster analysis, factor analysis, multiple regression analysis, and principal component analysis. More than 20 different techniques exist, and which one to choose depends on the type of data and the goal of the analysis.

In [None]:
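# Pairwise scatter plots of all numeric columns, colored by species
# (`size` was renamed to `height` in seaborn 0.9)
sns.pairplot(df, hue="species", height=3)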
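
Principal Component Analysis, one of the techniques named above, reduces several variables to a few combined ones so that they can be plotted together. The following is a minimal sketch using NumPy's singular value decomposition on a small, hypothetical sample of four measurements per flower; the values and the names `X`, `X_centered`, `explained_variance_ratio`, and `X_2d` are illustrative assumptions, and this is only one of several ways to compute PCA.

```python
import numpy as np

# Hypothetical sample: one row per flower, columns are
# sepal length, sepal width, petal length, petal width (cm)
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.9, 3.1, 4.9, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.1, 3.0, 5.9, 2.1],
])

# Center each column so the components describe variation around the mean
X_centered = X - X.mean(axis=0)

# The rows of Vt are the principal directions, ordered by decreasing singular value
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each component
explained_variance_ratio = S**2 / np.sum(S**2)

# Project the four measurements onto the first two components
X_2d = X_centered @ Vt[:2].T

print("Explained variance ratio:", explained_variance_ratio)
print("Projected shape:", X_2d.shape)
```

The two columns of `X_2d` could then be drawn as an ordinary scatter plot, which turns a four-variable problem into a two-dimensional picture.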