# Univariate, Bivariate, and Multivariate Analysis in EDA

* Univariate, bivariate, and multivariate analysis are core techniques in Exploratory Data Analysis (EDA). Each examines a dataset at a different level: one variable at a time, pairs of variables, or several variables together. The sections below cover each type, when to use it, and a real-world example.

## 1. Univariate Analysis

### Definition:
* Univariate analysis involves analyzing one variable at a time. Its main purpose is to summarize and find patterns in individual variables. It helps to understand the distribution, central tendency (mean, median, mode), spread (range, variance, standard deviation), and the presence of outliers.

### Common Techniques:

* Frequency distribution (for categorical data)
* Histogram (for continuous data) or bar plot (for categorical data)
* Boxplot (for distribution and outliers)
* Descriptive statistics (mean, median, mode, variance, etc.)
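
The techniques above can be sketched with pandas. This is a minimal illustration, not a prescribed workflow: the DataFrame, its column names (`age`, `vehicle_age`), and all values are made up for the example, and the 1.5 × IQR fence is the usual rule a boxplot uses to mark outliers.

```python
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "age": [23, 25, 31, 35, 35, 41, 44, 52, 58, 95],
    "vehicle_age": ["< 1 Year", "1-2 Year", "1-2 Year", "> 2 Years", "1-2 Year",
                    "< 1 Year", "1-2 Year", "> 2 Years", "1-2 Year", "< 1 Year"],
})

# Frequency distribution for a categorical variable.
freq = df["vehicle_age"].value_counts()

# Descriptive statistics for a numeric variable.
summary = df["age"].describe()        # count, mean, std, min, quartiles, max
median_age = df["age"].median()
mode_age = df["age"].mode().tolist()  # mode() can return more than one value

# Boxplot-style outlier rule: flag values beyond 1.5 * IQR from the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)
outliers = df.loc[is_outlier, "age"]

print(freq)
print(summary)
print("median:", median_age, "mode:", mode_age)
print("outliers:", outliers.tolist())
```

In this toy data the single age of 95 falls above the upper fence, which is the kind of point a boxplot would draw as an isolated marker.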

### When to use:

* When you want to summarize the behavior of a single variable.
* To understand the distribution of data for numeric or categorical variables.

### Real-world Example:

* Scenario: You are analyzing the ages of customers who purchased insurance.
* Objective: You want to know the age distribution of the customers.

### Univariate Analysis:
* Create a histogram of the "Age" column, showing how many customers fall within different age ranges, and calculate the mean and median age to summarize the overall customer base.
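
A minimal sketch of this step, assuming the ages are available as a NumPy array. The values and the 10-year bin width are arbitrary choices for illustration; `np.histogram` computes the same counts a plotted histogram would display.

```python
import numpy as np

# Hypothetical customer ages; in practice this would be the "Age" column.
ages = np.array([22, 25, 27, 31, 34, 36, 38, 41, 45, 47, 52, 58, 63, 71])

# Bin edges define the age ranges: 20-30, 30-40, ..., 70-80.
bin_edges = np.arange(20, 81, 10)
counts, _ = np.histogram(ages, bins=bin_edges)

for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left}-{right}: {n} customers")

# Summary of central tendency.
mean_age = ages.mean()
median_age = np.median(ages)
print("mean age:", round(mean_age, 1))
print("median age:", median_age)
```

To draw the chart itself, `matplotlib.pyplot.hist(ages, bins=bin_edges)` accepts the same bin edges.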

## 2. Bivariate Analysis
### Definition:
* Bivariate analysis examines the relationship between two variables. It helps identify correlations, trends, and patterns between them, showing how one variable changes as the other changes. An association found this way does not by itself show that one variable causes the other.

### Common Techniques:

* Scatter plot (for continuous variables)
* Correlation coefficient (for continuous variables)
* Bar plot/Box plot (for categorical vs continuous data)
* Line plot (to see how one continuous variable changes across another ordered variable, such as time)
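
For the categorical vs continuous case, the numbers behind a grouped bar plot or box plot can be sketched with a pandas `groupby`. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical data: one categorical and one continuous variable.
df = pd.DataFrame({
    "vehicle_age": ["< 1 Year", "1-2 Year", "> 2 Years",
                    "1-2 Year", "< 1 Year", "> 2 Years"],
    "annual_premium": [28000, 31000, 42000, 33000, 26000, 40000],
})

# Summarize the continuous variable within each category.
by_group = df.groupby("vehicle_age")["annual_premium"].agg(["count", "median", "mean"])
print(by_group)
```

Comparing the per-group medians or means gives a first impression of whether the continuous variable differs across categories.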

### When to use:

* When you want to analyze the relationship between two variables.
* To check whether one variable is associated with another (e.g., "Do insurance premiums rise with age?").

### Real-world Example:

* Scenario: You're analyzing how the Annual Premium varies with Age for insurance buyers.
* Objective: To see if older customers tend to pay higher premiums.

### Bivariate Analysis:
* Draw a scatter plot of "Age" (x-axis) against "Annual Premium" (y-axis) and calculate the correlation coefficient. An upward trend in the scatter plot indicates a positive correlation between age and premium, meaning older customers tend to pay higher premiums.
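
A minimal sketch of the correlation step, using made-up values for both variables. `np.corrcoef` returns the Pearson correlation matrix, and `np.polyfit` with `deg=1` fits the straight trend line that is often overlaid on the scatter plot.

```python
import numpy as np

# Hypothetical paired observations; values are invented for illustration.
age = np.array([22, 28, 35, 41, 47, 53, 60, 66])
annual_premium = np.array([24000, 26500, 30000, 29500, 34000, 36500, 39000, 43000])

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 matrix.
r = np.corrcoef(age, annual_premium)[0, 1]

# Least-squares trend line: premium is approximated by slope * age + intercept.
slope, intercept = np.polyfit(age, annual_premium, deg=1)

print(f"Pearson r = {r:.3f}")
print(f"trend: premium ~ {slope:.1f} * age + {intercept:.1f}")
```

A value of `r` close to +1 matches an upward-sloping scatter plot. The plot itself can be drawn with `matplotlib.pyplot.scatter(age, annual_premium)`.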

## 3. Multivariate Analysis
### Definition:
* Multivariate analysis involves examining more than two variables simultaneously. It helps in understanding the complex interactions between multiple variables, identifying patterns and trends that cannot be observed through univariate or bivariate analysis.

### Common Techniques:

* Multiple regression analysis (for continuous dependent variables)
* Heatmap (to visualize correlations between multiple variables)
* Pair plot (scatterplot matrix to see relationships between several variables)
* Principal Component Analysis (PCA) (to reduce dimensionality)
* Clustering (e.g., K-means) (to group data points based on multiple variables)
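
Two of these techniques can be sketched briefly: the correlation matrix that a heatmap visualizes, and PCA. The data is synthetic, the column names (including `vintage`) are hypothetical, and PCA is computed here through an SVD of the standardized columns rather than a dedicated library class.

```python
import numpy as np
import pandas as pd

# Synthetic data: premium depends on age, "vintage" is unrelated noise.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 70, n)
df = pd.DataFrame({
    "age": age,
    "annual_premium": 20000 + 300 * age + rng.normal(0, 2000, n),
    "vintage": rng.uniform(10, 300, n),  # hypothetical column
})

# Pairwise Pearson correlations; this matrix is what a heatmap displays.
corr = df.corr()
print(corr.round(2))

# PCA via SVD: standardize columns, then decompose.
z = ((df - df.mean()) / df.std()).to_numpy()
_, singular_values, vt = np.linalg.svd(z, full_matrices=False)

# Share of total variance captured by each principal component.
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Project the data onto the first two principal components.
components_2d = z @ vt[:2].T
print(explained_variance_ratio.round(3))
```

Because `age` and `annual_premium` are strongly correlated in this synthetic data, the first component captures most of their shared variation, which is the dimensionality reduction PCA is used for.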

### When to use:

* When you need to analyze complex relationships between several variables at once.
* To build predictive models (e.g., predicting insurance purchase decisions based on multiple customer characteristics).

### Real-world Example:

* Scenario: You want to predict whether a customer will buy insurance based on multiple factors: Age, Gender, Vehicle Age, Annual Premium, and Previously Insured status.
* Objective: Use all these variables to predict the customer’s purchase decision.

### Multivariate Analysis: 
* Use logistic regression (for binary outcome) or decision trees to model the relationship between multiple input variables (features) and the purchase decision (target).
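
A minimal sketch of the logistic regression option, assuming scikit-learn is available. The data is synthetic: the column names mirror the factors in the scenario, but the values and the rule that generates the `purchased` target are invented purely so the example has some signal to learn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic customers; all values are made up for illustration.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "gender": rng.choice(["Male", "Female"], n),
    "vehicle_age": rng.choice(["< 1 Year", "1-2 Year", "> 2 Years"], n),
    "annual_premium": rng.normal(30000, 8000, n),
    "previously_insured": rng.integers(0, 2, n),
})

# Invented rule for the target: older and not previously insured buy more often.
logit = 0.04 * (df["age"] - 45) - 2.0 * df["previously_insured"] + 0.5
df["purchased"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# One-hot encode the categorical features; keep everything numeric.
X = pd.get_dummies(df.drop(columns="purchased"), drop_first=True, dtype=float)
y = df["purchased"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features, then fit logistic regression for the binary outcome.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping `LogisticRegression` for `sklearn.tree.DecisionTreeClassifier` in the pipeline gives the decision tree variant mentioned above. Scaling is not needed for trees but does no harm.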
