# 📝 Data Analysis with Python Cheat Sheet

This is a compiled cheat sheet for your **Data Analysis with Python Final Exam**, covering:

✅ NumPy basics  
✅ pandas basics  
✅ Data Cleaning & Preparation  
✅ Data Wrangling & Aggregation  
✅ Visualization  
✅ File I/O  
✅ Descriptive Stats  
✅ Key Python concepts


## 💥 NumPy Basics
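```python
import numpy as np

# Create arrays
np.array([1,2,3])
np.zeros((2,3))
np.ones((2,3))
np.arange(0,10,2)    # 0, 2, 4, 6, 8 (stop is excluded)
np.linspace(0,1,5)   # 5 evenly spaced values from 0 to 1, inclusive

# Example array used below (values 1..12, shape 3x4)
arr = np.arange(1,13).reshape(3,4)

# Array attributes
arr.shape
arr.dtype

# Indexing/Slicing
arr[1]     # second row
arr[1:3]   # rows 1 and 2
arr[:,0]   # column
arr[0,:]   # row

# Boolean indexing
arr[arr > 5]

# Math
arr.sum(), arr.mean(), arr.std(), arr.min(), arr.max()

# Elementwise
arr1, arr2 = arr, arr + 1
np.add(arr1, arr2)        # same as arr1 + arr2
np.multiply(arr1, arr2)   # same as arr1 * arr2

# Matrix operations
a, b = arr, arr.T
np.dot(a,b)   # matrix product; same as a @ b for 2-D arrays
arr.T  # transpose

# Universal Functions
np.sqrt(arr), np.exp(arr), np.log(arr)

# Conditional
np.where(arr>5, 1, 0)
```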
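
As a worked illustration of the indexing and conditional calls above, here is a minimal sketch on a small sample array. The 3x3 data and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical sample data: a 3x3 array holding 1..9
arr = np.arange(1, 10).reshape(3, 3)

first_col = arr[:, 0]             # array([1, 4, 7])
large = arr[arr > 5]              # 1-D array of matches: array([6, 7, 8, 9])
flags = np.where(arr > 5, 1, 0)   # same shape as arr, 1 where the test holds
col_means = arr.mean(axis=0)      # mean down each column: array([4., 5., 6.])
elementwise = arr * arr           # same as np.multiply(arr, arr)
matmul = arr @ arr                # same as np.dot(arr, arr) for 2-D arrays
```

Note that boolean indexing flattens the result to one dimension, while `np.where` keeps the original shape.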

## 🐼 pandas Basics
```python
import pandas as pd

# Create
df = pd.DataFrame({'A':[1,2],'B':[3,4]})

# Inspect
df.head(), df.info(), df.describe(), df.shape

# Select
df['A']          # column
df[['A','B']]    # multiple columns
df.iloc[0]       # row by integer position
df.loc[0,'A']    # value by row label and column name

# Filter
df[df['A'] > 1]

# Add/Remove
df['C'] = df['A'] + df['B']
df.drop('C', axis=1)   # returns a new DataFrame; df itself is unchanged

# Missing data
df.isnull().sum()   # missing count per column
df.fillna(0)
df.dropna()

# Aggregation
df.groupby('A').mean()
df.groupby('A').agg({'B':'sum'})

# Merge/Join (df1, df2: DataFrames that share a 'key' column)
pd.merge(df1, df2, on='key')
pd.concat([df1, df2], axis=0)              # stack rows
df1.join(df2.set_index('key'), on='key')   # join matches against the other frame's index

# File I/O
pd.read_csv('file.csv')
df.to_csv('file.csv')      # pass index=False to omit the row index
pd.read_excel('file.xlsx')
df.to_excel('file.xlsx')
```
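
The difference between `iloc` (position) and `loc` (label) only shows up when the index is not the default 0, 1, 2. The sketch below uses a made-up DataFrame with labels 10, 20, 30 to show this, and also that `drop` leaves the original untouched.

```python
import pandas as pd

# Hypothetical data with a non-default index so position and label differ
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[10, 20, 30])

by_position = df.iloc[0]        # first row, regardless of its label
by_label = df.loc[10]           # row whose index label is 10
value = df.loc[20, 'B']         # single value: 5

trimmed = df.drop('B', axis=1)  # returns a copy; df still has column 'B'
filtered = df[df['A'] > 1]      # rows labelled 20 and 30
```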

## 🎨 Visualization
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Basic plots
df.plot()
df['A'].plot(kind='bar')
df['A'].plot(kind='hist')

# Matplotlib
plt.plot(x, y)
plt.scatter(x, y)
plt.bar(x, y)
plt.hist(x)
plt.boxplot(x)

plt.title('Title')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

# Seaborn
sns.heatmap(df.corr(numeric_only=True), annot=True)  # numeric_only skips text columns
sns.boxplot(x='col1', y='col2', data=df)
```

## 🧹 Data Cleaning
```python
# Missing values
df.isnull()
df.fillna(0)
df.ffill()   # forward fill; fillna(method='ffill') is deprecated in recent pandas
df.dropna()

# Duplicates
df.duplicated()
df.drop_duplicates()

# Replace
df.replace({'old':'new'})

# Binning
pd.cut(df['A'], bins=3)
pd.qcut(df['A'], q=4)

# Map
df['B'] = df['B'].map({'A':1, 'B':2})   # values missing from the dict become NaN
```
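
To make the fill and binning calls above concrete, here is a small sketch on made-up Series. The data is purely illustrative; the outlier (100) is there to show how `pd.cut` and `pd.qcut` split the same values differently.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, 100.0])

n_missing = s.isnull().sum()   # 2
zero_filled = s.fillna(0)      # NaN -> 0
forward = s.ffill()            # NaN -> previous valid value: 1, 1, 3, 3, 100

values = pd.Series([1, 2, 3, 4, 100])
equal_width = pd.cut(values, bins=2)   # bins span equal ranges of the values
equal_count = pd.qcut(values, q=2)     # bins hold roughly equal numbers of rows
```

With the outlier present, `pd.cut` puts four values in the lower bin and one in the upper, while `pd.qcut` splits them three and two.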

## 📐 Data Aggregation
```python
# GroupBy
df.groupby('col')['val'].mean()
df.groupby(['col1','col2']).agg({'val':'sum'})

# Transform
df.groupby('group')['value'].transform('mean')

# Filter
df.groupby('group').filter(lambda x: x['val'].mean()>50)

# Binning: explicit edges (cut) vs. quantiles (qcut)
pd.cut(df['col'], bins=[0,10,20])
pd.qcut(df['col'], q=4)
```
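
The three groupby patterns above return differently shaped results, which is easy to mix up. A minimal sketch, assuming a made-up DataFrame with columns `group` and `val`:

```python
import pandas as pd

# Hypothetical data: two groups of two rows each
df = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b'],
    'val':   [10, 30, 60, 80],
})

# Aggregate: one row per group (a=20, b=70)
per_group = df.groupby('group')['val'].mean()

# Transform: one value per original row, the group mean broadcast back
broadcast = df.groupby('group')['val'].transform('mean')

# Filter: keeps or drops whole groups; only group b has mean > 50
kept = df.groupby('group').filter(lambda g: g['val'].mean() > 50)
```

`transform` is the one to reach for when the group statistic needs to sit next to the original rows, for example to compute each row's deviation from its group mean.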

## 📄 File I/O
```python
# CSV
pd.read_csv('file.csv')
df.to_csv('file.csv')

# Excel
pd.read_excel('file.xlsx')
df.to_excel('file.xlsx')

# JSON
pd.read_json('file.json')
df.to_json('file.json')

# HTML Table
pd.read_html('url')   # returns a list of DataFrames, one per table found
```

## 📊 Descriptive Statistics
```python
df.describe()
df['col'].value_counts()
df['col'].unique()
df['col'].nunique()
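df.corr(numeric_only=True)   # pairwise correlation of numeric columns
df.cov(numeric_only=True)    # pairwise covariance of numeric columns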
```
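
A short sketch of what the counting calls above return, using a made-up DataFrame with one text column and one numeric column:

```python
import pandas as pd

# Hypothetical data: 'col' is categorical, 'n' is numeric
df = pd.DataFrame({'col': ['x', 'y', 'x', 'x'], 'n': [1, 2, 3, 4]})

counts = df['col'].value_counts()   # Series sorted by frequency: x=3, y=1
distinct = df['col'].unique()       # array of distinct values in order of appearance
n_distinct = df['col'].nunique()    # 2
corr = df.corr(numeric_only=True)   # only 'n' is numeric, so a 1x1 matrix
```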

## 📝 Python Basics
```python
# List Comprehension
[x**2 for x in range(10) if x%2==0]

# Dict comprehension
{k: i for i, k in enumerate(['a','b','c'])}   # {'a': 0, 'b': 1, 'c': 2}

# Lambda
f = lambda x: x*2

# map/filter
list(map(f, [1,2,3]))
list(filter(lambda x: x>1, [1,2,3]))
```
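
`map` and `filter` can usually be rewritten as comprehensions, which is often easier to read. A small sketch with a made-up list:

```python
numbers = [1, 2, 3, 4, 5]

# map + lambda versus a list comprehension
doubled_map = list(map(lambda x: x * 2, numbers))
doubled_comp = [x * 2 for x in numbers]

# filter + lambda versus a comprehension with a condition
big_filter = list(filter(lambda x: x > 2, numbers))
big_comp = [x for x in numbers if x > 2]
```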

✅ **REMINDERS:**
- Missing data mechanisms: MCAR (missing completely at random), MAR (missing at random), MNAR (missing not at random)
- groupby: split → apply → combine
- Matplotlib layers: backend, artist, scripting
- join/merge types: inner, outer, left, right
- Visualization: line=trend, bar=category, scatter=relation
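
The four join/merge types listed above differ in which keys survive. A minimal sketch, assuming two made-up DataFrames that overlap on keys `b` and `c`:

```python
import pandas as pd

# Hypothetical frames: 'a' exists only on the left, 'd' only on the right
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

# Row count of the merged result for each join type
rows = {how: len(pd.merge(left, right, on='key', how=how))
        for how in ['inner', 'outer', 'left', 'right']}
# inner keeps shared keys (2 rows), outer keeps all keys (4 rows),
# left and right keep every key from one side (3 rows each)
```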
