# Data Exploration Methods

## 1. Descriptive Statistics


### Summary Statistics
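- Calculate key statistics such as mean, median, mode, variance, and standard deviation for numerical features. `describe()` reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum; mode and variance need separate calls such as `.mode()` and `.var()`.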
- Example:
```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
summary_stats = data.describe()
```
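
Since `describe()` does not report every statistic listed above, the remaining ones can be computed directly. The following is a minimal sketch; the `sample` Series is hypothetical data chosen so that one value repeats and the mode is unambiguous.

```python
import pandas as pd

# Hypothetical sample (not from the dataset above); the value 2 repeats.
sample = pd.Series([1, 2, 2, 3, 7], name='A')

median_value = sample.median()   # middle value of the sorted data
variance = sample.var()          # sample variance (ddof=1 by default)
modes = sample.mode()            # a Series, because several values can tie

print(median_value, variance, modes.tolist())  # 2.0 5.5 [2]
```

`mode()` returns a Series rather than a scalar, so take `modes.iloc[0]` if a single value is needed.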

### Frequency Distribution
- Count how often each distinct value occurs, typically in a categorical column.
- Example:
```python
freq_distribution = data['B'].value_counts()
```
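
Column `B` above is numeric, so a sketch with an actual categorical column may be easier to read. The `colors` Series below is hypothetical; `normalize=True` turns the counts into proportions.

```python
import pandas as pd

# Hypothetical categorical column, used only for illustration.
colors = pd.Series(['red', 'blue', 'red', 'green', 'red', 'blue'], name='color')

counts = colors.value_counts()                 # absolute frequency per category
shares = colors.value_counts(normalize=True)   # relative frequency (sums to 1)

print(counts['red'], shares['red'])  # 3 0.5
```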

### Correlation Matrix
- Measure relationships between numerical variables using Pearson or Spearman correlation coefficients.
- Example:
```python
# Pearson is the default; pass method='spearman' for rank correlation.
correlation_matrix = data.corr()
```
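
To illustrate how the two coefficients differ, the sketch below uses a hypothetical frame in which `y` grows monotonically but not linearly with `x`. Spearman works on ranks, so it reports a perfect relationship, while Pearson stays below 1.

```python
import pandas as pd

# Hypothetical data: y = x**2, a monotonic but non-linear relationship.
curve = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [1, 4, 9, 16, 25]})

pearson = curve.corr(method='pearson')     # measures linear association
spearman = curve.corr(method='spearman')   # measures monotonic (rank) association

print(pearson.loc['x', 'y'])    # slightly below 1
print(spearman.loc['x', 'y'])   # 1.0 up to floating-point rounding
```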

### Skewness and Kurtosis
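- Measure the asymmetry (skewness) and tailedness (kurtosis) of a distribution as a quick check for departures from normality. A normal distribution has skewness 0 and, in pandas, which reports excess kurtosis, kurtosis 0.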
- Example:
```python
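skewness = data['A'].skew()
# The result is NaN when a column has fewer than four values,
# as is the case for the three-row `data` defined earlier.
kurtosis = data['A'].kurtosis()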
```
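
A slightly larger sample makes these numbers easier to interpret. The sketch below uses a hypothetical Series with a single large value, which pulls the right tail out and yields positive skewness and positive excess kurtosis.

```python
import pandas as pd

# Hypothetical right-skewed sample: most values are small, one is large.
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 20])

skewness = values.skew()          # > 0: long right tail
excess_kurtosis = values.kurt()   # > 0: heavier tails than a normal distribution

print(skewness, excess_kurtosis)
```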


## 2. Data Visualization


### Histograms/Bar Charts
- Visualize the distribution of individual variables: histograms for numerical features, bar charts for categorical ones.
- Example:
```python
import matplotlib.pyplot as plt

plt.hist(data['A'], bins=5)
plt.title('Histogram of A')
plt.show()
```

### Box Plots
- Identify outliers and visualize the spread of data across quartiles.
- Example:
```python
plt.boxplot(data['A'])
plt.title('Box Plot of A')
plt.show()
```
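
The points a box plot draws as outliers can also be computed numerically. The sketch below applies the common 1.5 × IQR rule, which matches matplotlib's default whisker length; the `values` Series is hypothetical.

```python
import pandas as pd

# Hypothetical sample with one unusually large value.
values = pd.Series([10, 12, 12, 13, 14, 15, 15, 16, 40])

q1 = values.quantile(0.25)   # first quartile
q3 = values.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range (height of the box)

# Points beyond 1.5 * IQR from the box are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(q1, q3, outliers.tolist())  # 12.0 15.0 [40]
```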

### Scatter Plots
- Visualize relationships between two continuous variables.
- Example:
```python
plt.scatter(data['A'], data['B'])
plt.title('Scatter Plot of A vs B')
plt.show()
```

### Heatmaps
- Visualize correlations between variables using colors.
- Example:
```python
import seaborn as sns

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
```

### Pair Plots
- Show pairwise relationships between features.
- Example:
```python
sns.pairplot(data)
plt.show()
```

### Pie Charts
- Visualize the proportion of categorical variables.
- Example:
```python
data['B'].value_counts().plot.pie(autopct='%1.1f%%')
plt.title('Pie Chart of B')
plt.show()
```

### Violin Plots
- Combine a box plot with a density plot for visualizing data distribution.
- Example:
```python
sns.violinplot(data=data['A'])
plt.title('Violin Plot of A')
plt.show()
```


## 3. Handling Missing Data


### Identify Missing Data
- Use `.isnull()` (or its alias `.isna()`) to check for missing values.
- Example:
```python
missing_data = data.isnull().sum()
```

### Visualize Missing Data
- Use missingness heatmaps to visualize where missing values occur.
- Example:
```python
sns.heatmap(data.isnull(), cbar=False)
plt.title('Missing Data Heatmap')
plt.show()
```

### Missing Value Patterns
- Analyze whether values are missing at random or whether missingness follows a pattern, for example when gaps in one column coincide with gaps in another.
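
One possible way to look for such patterns is to treat missingness itself as data. The sketch below is an illustration under assumptions, not a formal test of randomness: it builds a hypothetical frame `df` in which `age` and `income` are always missing together, then checks the share of missing values per column, the correlation between missingness indicators, and how often each row-level pattern occurs.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'age' and 'income' are missing in the same rows.
df = pd.DataFrame({
    'age':    [25, 32, np.nan, 41, np.nan, 58],
    'income': [40, 52, np.nan, 75, np.nan, 90],
    'city':   ['A', 'B', 'A', None, 'B', 'A'],
})

missing_flags = df.isnull()                 # True where a value is missing

missing_share = missing_flags.mean()        # fraction missing per column

# Correlation of the 0/1 indicators: values near 1 mean two columns
# tend to be missing in the same rows.
missing_corr = missing_flags.astype(int).corr()

# Number of rows for each combination of missing/present columns.
pattern_counts = missing_flags.value_counts()

print(missing_share)
print(missing_corr)
print(pattern_counts)
```

A high indicator correlation, as between `age` and `income` here, suggests the gaps are not independent and may share a cause worth investigating before imputing.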
