## 3. Multivariate Analysis

### Definition
**Examines relationships among three or more variables simultaneously.**

### Purpose
- Uncover complex relationships and interactions
- Reduce dimensionality of data
- Identify groups/patterns across multiple variables
- Build predictive models with multiple predictors
- Control for confounding variables

### Common Techniques

#### Statistical Methods
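- **Multiple Regression**: Extends linear regression to model one outcome from several predictors
- **MANOVA**: Multivariate analysis of variance; tests whether group means differ across several dependent variables at once
- **Factor Analysis**: Models the observed variables as driven by a smaller number of underlying (latent) factors
- **Principal Component Analysis (PCA)**: Reduces dimensionality by projecting the data onto uncorrelated directions of greatest variance
- **Cluster Analysis**: Identifies natural groupings (K-means, hierarchical)
- **Discriminant Analysis**: Classifies observations into predefined groups
- **Structural Equation Modeling (SEM)**: Tests hypothesized networks of relationships among observed and latent variables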
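
As a rough illustration of multiple regression, and of why adding predictors helps control for a confounder, the sketch below fits two least-squares models with NumPy on synthetic data. The variables `x`, `z`, `y`, the helper `ols`, and every coefficient are invented for this example; none of them come from the Titanic notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (assumption): z influences both x and y, so it confounds
# the x -> y relationship. The true effect of x on y is 1.0.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(scale=0.5, size=n)
y = 1.0 * x + 2.0 * z + rng.normal(scale=0.5, size=n)


def ols(predictors, response):
    """Least-squares fit with an intercept; returns [intercept, slope_1, ...]."""
    X = np.column_stack([np.ones(len(response)), *predictors])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef


simple = ols([x], y)       # y ~ x         (ignores the confounder)
multiple = ols([x, z], y)  # y ~ x + z     (controls for the confounder)

print("slope of x, simple regression:  ", round(simple[1], 2))
print("slope of x, multiple regression:", round(multiple[1], 2))
```

In this setup the simple model attributes part of the effect of `z` to `x`, so its slope overshoots; including `z` as a second predictor should bring the coefficient of `x` back near the true value of 1.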

#### Visualizations
- **Pair Plots**: Matrix of scatter plots, one for every pair of variables
- **3D Scatter Plot**: Shows the relationship among three numeric variables
- **Heat Maps**: Visualize correlation matrices as a color-coded grid
- **Parallel Coordinates Plot**: Draws each observation as a line across one axis per variable
- **Bubble Charts**: Scatter plots with a third variable encoded as bubble size
- **Radar/Spider Charts**: Compare multiple variables across categories
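
To make the heat map entry concrete, here is a minimal sketch that computes a correlation matrix with pandas and draws it with `sns.heatmap`. The frame `demo` and its columns `v1` to `v4` are synthetic stand-ins created only for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(42)
n = 200

# Synthetic data (assumption): v1 and v2 share a common signal, v3 moves
# against it, and v4 is pure noise.
base = rng.normal(size=n)
demo = pd.DataFrame({
    "v1": base + rng.normal(scale=0.5, size=n),
    "v2": base + rng.normal(scale=0.5, size=n),
    "v3": -base + rng.normal(scale=0.5, size=n),
    "v4": rng.normal(size=n),
})

# Pairwise Pearson correlations between all columns.
corr = demo.corr()

# Fix the color scale to [-1, 1] so colors are comparable across plots.
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation heat map (synthetic data)")
```

Once the Titanic data is loaded, the same call should work on its numeric columns, for example `df[["Age", "Fare", "SibSp", "Parch"]].corr()`.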


![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")
# plt.rcParams['figure.figsize'] = (20, 6)

In [2]:
df = pd.read_csv('datasets/Titanic-Dataset.csv')
df = df.drop('Cabin', axis=1)  # drop the Cabin column

In [3]:
# List all the variables first so we can decide which relationships are worth examining.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(4)
memory usage: 76.7+ KB
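
With the columns listed, one natural multivariate question is how `Survived` varies with `Sex` and `Pclass` together. The sketch below shows one way to tabulate that with `pivot_table`. To stay self-contained it uses a tiny hand-written frame named `sample`; its rows are hypothetical and are not taken from the Titanic file.

```python
import pandas as pd

# Hypothetical rows for illustration only, NOT real passenger records.
sample = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 0, 1, 1],
    "Pclass":   [1, 1, 3, 3, 1, 3, 1, 3],
    "Sex":      ["female", "male", "female", "female",
                 "male", "male", "female", "male"],
})

# Mean of a 0/1 column is the share of survivors in each Sex x Pclass cell.
survival_rate = sample.pivot_table(
    index="Sex", columns="Pclass", values="Survived", aggfunc="mean"
)
print(survival_rate)
```

Replacing `sample` with the notebook's `df` should produce the same kind of table for the full dataset, since `Survived`, `Pclass`, and `Sex` have no missing values in the `df.info()` output above.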


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)


## Comparison of Analysis Types

| Feature | Univariate | Bivariate | Multivariate |
|---------|-----------|-----------|--------------|
| **Variables** | One | Two | Three or more |
| **Focus** | Distribution, central tendency | Relationships, correlations | Complex interactions, patterns |
| **Key Questions** | "What is the distribution?" | "Are these variables related?" | "How do multiple factors interact?" |
| **Common Tools** | Histograms, box plots | Scatter plots, correlation | PCA, multiple regression |
| **Complexity** | Low | Medium | High |
| **Insight Depth** | Basic descriptions | Relationships | System understanding |

## When to Use Each Analysis Type

### Univariate Analysis
- **Best for**: Initial data exploration
- **When you need to**: Understand the basic properties of a variable
- **Before proceeding to**: More complex analyses

### Bivariate Analysis
- **Best for**: Testing simple hypotheses about relationships
- **When you need to**: Establish if two variables are related
- **Before proceeding to**: Building predictive models

### Multivariate Analysis
- **Best for**: Building predictive models, uncovering complex patterns
- **When you need to**: Understand how multiple variables interact
- **After completing**: Univariate and bivariate analyses
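
The progression from univariate to multivariate analysis can be sketched on a single dataset. The example below uses synthetic data, with invented names `group`, `x`, and `y`, in which the relationship between `x` and `y` flips sign between two groups. It is meant as an illustration of the workflow, not a prescribed recipe.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

# Synthetic data (assumption): the slope of y on x is +1 in group A
# and -1 in group B, i.e. group and x interact.
group = rng.choice(["A", "B"], size=n)
x = rng.normal(size=n)
slope = np.where(group == "A", 1.0, -1.0)
y = slope * x + rng.normal(scale=0.3, size=n)
data = pd.DataFrame({"group": group, "x": x, "y": y})

# 1. Univariate: describe one variable on its own.
summary = data["y"].describe()

# 2. Bivariate: correlate two variables, ignoring everything else.
overall_corr = data["x"].corr(data["y"])

# 3. Multivariate: repeat the correlation within each level of a third variable.
by_group_corr = {
    name: part["x"].corr(part["y"]) for name, part in data.groupby("group")
}

print(summary)
print("overall correlation:", round(overall_corr, 2))
print("correlation by group:", {k: round(v, 2) for k, v in by_group_corr.items()})
```

Here the bivariate step alone would suggest that `x` and `y` are barely related, while the multivariate step shows a strong relationship in each group with opposite signs.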
