# DSAI Python Lab Exam Cheatsheet

**Topics:** OOP, NumPy, Pandas, Matplotlib, Seaborn

*Format: Each section has a concise Markdown summary followed by example code.*

> Tip: Run cells top-to-bottom to import libraries and seed data.

## 1) Object-Oriented Programming (OOP)

**Key ideas:** classes, objects (instances), attributes, methods, `__init__`, `self`, special methods (`__repr__`, `__str__`, `__lt__`, `__gt__`), inheritance, composition.
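
The lab code in this section relies on composition: an `Institute` *has* `Department`s, which *have* `Student`s. Inheritance is shown here as a minimal sketch. The `Person` and `TA` classes are hypothetical and do not come from the labs.

```python
class Person:
    def __init__(self, name: str):
        self.name = name

    def describe(self) -> str:
        return f"{self.name} (person)"

class TA(Person):                     # TA inherits everything from Person
    def __init__(self, name: str, course: str):
        super().__init__(name)        # let the parent set self.name
        self.course = course

    def describe(self) -> str:        # override the parent's method
        return f"{self.name} (TA for {self.course})"

people = [Person("Asha"), TA("Ravi", "DSAI")]
for p in people:
    print(p.describe())               # each object runs its own class's version
print(isinstance(people[1], Person))  # True: a TA "is a" Person
```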

**Patterns you may need (from labs/assignments):**
- *Student ⟶ Department ⟶ Institute* hierarchy (composition).
- Comparisons by average/score using `__gt__` / `__lt__` to enable sorting/ranking.
- Readable print with `__repr__` / `__str__` for debugging.
- Use `@classmethod` (receives the class as `cls`) or `@staticmethod` (receives neither instance nor class) when a method does not need the instance (`self`).
- Avoid mutable default args in `__init__` (use `None` then set inside).
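
The decorator and printing patterns listed above are shown in the minimal sketch below. The `Course` class, its `passing_mark`, and the `"CODE:m1,m2"` string format are made up for illustration.

```python
class Course:
    """Hypothetical class, used only to illustrate the patterns listed above."""
    passing_mark = 40                 # immutable class attribute: safe to share

    def __init__(self, code: str, marks=None):
        self.code = code
        self.marks = list(marks) if marks is not None else []   # no mutable default

    @classmethod
    def from_string(cls, text: str):
        # Alternate constructor: "CS101:55,72,38" -> Course("CS101", [55, 72, 38])
        code, raw = text.split(":")
        return cls(code, [int(x) for x in raw.split(",")])

    @staticmethod
    def is_pass(mark: int) -> bool:
        # Uses neither self nor cls, so it is a plain function living in the class
        return mark >= Course.passing_mark

    def __repr__(self):               # unambiguous; used by the REPL and inside containers
        return f"Course({self.code!r}, {self.marks!r})"

    def __str__(self):                # human-friendly; used by print() and str()
        return f"{self.code}: {len(self.marks)} marks"

c = Course.from_string("CS101:55,72,38")
print(c)                                          # CS101: 3 marks
print([c])                                        # [Course('CS101', [55, 72, 38])]
print([m for m in c.marks if Course.is_pass(m)])  # [55, 72]
```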

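**Common pitfalls:** forgetting `self` (in a method's parameter list or when reading an attribute), writing to nested attributes from outside the class (e.g., `inst.departments[0].students.append(s)`) instead of going through methods like `add_student`, and mutable class attributes (e.g., a list defined in the class body) that are shared across all instances.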
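
Both mutable-sharing pitfalls can be reproduced in a few lines. All three classes below are hypothetical.

```python
class BadDept:
    students = []                     # class attribute: ONE list shared by every instance

    def add(self, name):
        self.students.append(name)    # mutates the shared class-level list

class BadDept2:
    def __init__(self, students=[]):  # the default list is created once, when def runs
        self.students = students

class GoodDept:
    def __init__(self, students=None):
        # fresh list per instance; copying also avoids aliasing the caller's list
        self.students = list(students) if students is not None else []

a, b = BadDept(), BadDept()
a.add("x")
print(b.students)   # ['x']  -> b sees a's student

c, d = BadDept2(), BadDept2()
c.students.append("y")
print(d.students)   # ['y']  -> the same default list is reused

e, f = GoodDept(), GoodDept()
e.students.append("z")
print(f.students)   # []     -> independent lists
```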


In [None]:
import numpy as np

class Student:
    def __init__(self, name: str, roll_number: int, scores):
        self.name = name
        self.roll_number = roll_number
        self.scores = np.array(scores, dtype=float)

    def average(self): return float(np.mean(self.scores))
    def highest_score(self): return float(np.max(self.scores))
    def lowest_score(self): return float(np.min(self.scores))
    def std(self): return float(np.std(self.scores, ddof=0))

    # Compare by average score
    def __gt__(self, other): return self.average() > other.average()
    def __lt__(self, other): return self.average() < other.average()
    def __repr__(self): return f"Student(name={self.name!r}, avg={self.average():.2f})"

class Department:
    def __init__(self, name: str):
        self.name = name
        self.students = []  # list[Student]

    def add_student(self, student: Student):
        self.students.append(student)

    def department_average(self):
        # subject-wise average across all students (stack and mean axis=0)
        if not self.students: return None
        stacked = np.vstack([s.scores for s in self.students])
        return np.mean(stacked, axis=0)

    def topper(self):
        return max(self.students) if self.students else None

    def rank_students(self):
        return sorted(self.students, reverse=True)

    def weakest_subject(self):
        # return index of column with lowest mean
        avg = self.department_average()
        return int(np.argmin(avg)) if avg is not None else None

class Institute:
    def __init__(self):
        self.departments = []  # list[Department]

    def add_department(self, dept: Department):
        self.departments.append(dept)

    def institute_average(self):
        # subject-wise average across all departments/students
        all_scores = []
        for d in self.departments:
            for s in d.students:
                all_scores.append(s.scores)
        if not all_scores: return None
        return np.mean(np.vstack(all_scores), axis=0)

    def best_department(self):
        # highest overall average (mean of all subjects); empty departments are skipped
        candidates = [d for d in self.departments if d.students]
        if not candidates: return None
        return max(candidates, key=lambda d: np.mean(d.department_average()))

    def overall_topper(self):
        all_students = [s for d in self.departments for s in d.students]
        return max(all_students) if all_students else None

    def find_student_by_roll(self, roll_number: int):
        for d in self.departments:
            for s in d.students:
                if s.roll_number == roll_number:
                    return s
        return None

# --- Quick demo ---
np.random.seed(42)  # reproducible random scores
cs = Department("CS")
ee = Department("EE")
for i in range(1, 6):
    cs.add_student(Student(f"CS_Student_{i}", 100+i, np.random.randint(60, 100, size=6)))
    ee.add_student(Student(f"EE_Student_{i}", 200+i, np.random.randint(55, 98, size=6)))

inst = Institute()
inst.add_department(cs); inst.add_department(ee)

print("CS topper:", cs.topper())
print("EE weakest subject index:", ee.weakest_subject())
print("Institute avg (subject-wise):", inst.institute_average().round(2))
print("Overall topper:", inst.overall_topper())
print("Search roll 103:", inst.find_student_by_roll(103))

## 2) NumPy

**Why:** fast vectorized math on `ndarray`s (homogeneous types).

**Essentials:**
- Creation: `np.array`, `np.zeros/ones/eye`, `np.arange`, `np.linspace`, `np.random.*`
- Shape/Reshape/Transpose: `.shape`, `.reshape`, `.T`
- Indexing/Slicing: `a[i,j]`, `a[:,0]`, boolean masks `a[a>0]`
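- Aggregations: `sum/mean/std/min/max` with `axis` (`axis=0` collapses rows, giving one value per column; `axis=1` gives one value per row)
- Sorting & Top-k: `np.sort`, `np.argsort`, `np.argmax/argmin`
- Broadcasting: shapes are aligned from the last axis, and each pair of dimensions must be equal or 1 (missing leading axes count as 1). So `(4,3)` with `(3,)` works, but `(4,3)` with `(4,)` needs reshaping to `(4,1)`
- Randomness: `np.random.seed(42)` (legacy global RNG) or `rng = np.random.default_rng(42)` (Generator API) for reproducibility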
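
The sketch below demonstrates axis aggregations, broadcasting (including the failing case), and best-first top-k on a small hand-written score matrix. The values are arbitrary.

```python
import numpy as np

# 4 students x 3 subjects (arbitrary values)
scores = np.array([[70, 80, 90],
                   [60, 75, 85],
                   [90, 65, 70],
                   [80, 70, 95]], dtype=float)

# axis=0 collapses rows -> one value per column (subject)
col_mean = scores.mean(axis=0)           # shape (3,)
col_std = scores.std(axis=0)             # shape (3,)

# Broadcasting: (4,3) with (3,) -> the (3,) row is reused for every student
z = (scores - col_mean) / col_std        # per-subject z-scores, shape (4,3)

# axis=1 collapses columns; keepdims keeps shape (4,1) so it broadcasts per row
row_total = scores.sum(axis=1, keepdims=True)
share = scores / row_total               # each row now sums to 1

try:
    scores / scores.sum(axis=1)          # (4,3) with (4,): trailing dims 3 vs 4 -> error
except ValueError as err:
    print("broadcast error:", err)

# Top-k column indices per row, best first
k = 2
topk_idx = np.argsort(scores, axis=1)[:, -k:][:, ::-1]
print(topk_idx)

# Reproducible randomness with the Generator API
same = np.array_equal(np.random.default_rng(42).integers(0, 10, 5),
                      np.random.default_rng(42).integers(0, 10, 5))
print(same)   # True
```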

**Common patterns (from labs):**
- BMI computation from height/weight arrays
- 2D sales matrix ⟶ weekly totals, best product, top-2 weeks via `argsort`


In [None]:
import numpy as np

# Reproducible RNG
np.random.seed(42)

# --- BMI Example ---
heights_in = np.array([65, 70, 75])     # inches
weights_lb = np.array([150, 180, 210])  # pounds
heights_m = heights_in * 0.0254
weights_kg = weights_lb * 0.453592
bmi = weights_kg / (heights_m ** 2)
print("BMI:", bmi.round(2))
print("BMI<25 mask:", bmi < 25)
print("BMI<25 values:", bmi[bmi < 25].round(2))

# --- 2D sales (products x weeks) ---
sales = np.random.randint(10, 200, size=(6, 8))  # 6 products, 8 weeks
weekly_totals = sales.sum(axis=0)
best_product_idx = sales.sum(axis=1).argmax()
top2_weeks_each_product = np.argsort(sales, axis=1)[:, -2:][:, ::-1]  # best week first
print("Weekly totals:", weekly_totals)
print("Best product index:", best_product_idx)
print("Top-2 weeks per product (col indices, best first):\n", top2_weeks_each_product)

## 3) Pandas

**Essentials:**
- IO: `pd.read_csv` / `df.to_csv`, `pd.read_excel`, `pd.read_json` / `df.to_json`, ...
- Inspect: `.head()`, `.info()`, `.shape`, `.dtypes`, `.describe()`
- Select: `df['col']`, `df[['c1','c2']]`, `df.loc[row_label, col_label]`, `df.iloc[i,j]`
- Filter: boolean masks `df[df.col>0]`, combine with `&` / `|`; or `df.query("col>0 and y==1")`
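- Assign: `df.loc[mask, 'col'] = val` in a single call, not chained `df[mask]['col'] = val` (which may modify a temporary copy and triggers `SettingWithCopyWarning`)
- Missing: `df.isna()`, `df.dropna()`, `df.fillna(val)`; forward/backward fill with `df.ffill()` / `df.bfill()` (`fillna(method=...)` is deprecated since pandas 2.1)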
- Group/Aggregate: `df.groupby(keys).agg({...})`
- Pipe/Chaining: `df.pipe(f).pipe(g)` for readable pipelines
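
The sketch below covers masked assignment, `query`, forward-fill, and named aggregation on a made-up frame. `scores_df` and its columns are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up data, for illustration only
scores_df = pd.DataFrame({
    'dept':  ['CS', 'CS', 'EE', 'EE', 'ME'],
    'score': [78.0, np.nan, 64.0, 91.0, 55.0],
    'year':  [1, 2, 1, 2, 1],
})

# Assign with ONE .loc call (row mask, column label)
scores_df['status'] = 'ok'
scores_df.loc[scores_df['score'] < 60, 'status'] = 'at risk'   # NaN < 60 is False

# Combine conditions with & / | (parentheses required), or write the same filter in query()
m1 = scores_df[(scores_df['dept'] == 'CS') | (scores_df['year'] == 2)]
m2 = scores_df.query("dept == 'CS' or year == 2")

# Forward-fill: carry the last valid value down
scores_df['score_ffill'] = scores_df['score'].ffill()

# Named aggregation: new_column=(source_column, function)
summary = scores_df.groupby('dept').agg(
    mean_score=('score', 'mean'),   # NaN is skipped
    n_rows=('score', 'size'),       # counts rows, including NaN
    n_scored=('score', 'count'),    # counts non-NaN values only
)
print(summary)
```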

**Common patterns (from labs):**
- Fill missing with median/mean
- Drop high-missing columns
- Z-score / IQR outlier flags
- Pipeline functions: `handle_missing → detect_outliers → summarize`
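
The lab cell in this section flags outliers with z-scores only. The IQR rule and dropping mostly-empty columns can be sketched as follows. The toy values and the 50% missing-value cutoff are assumptions; the 1.5 * IQR fence is the conventional (Tukey) choice.

```python
import numpy as np
import pandas as pd

# Toy values, for illustration only
data = pd.DataFrame({
    'Fare':  [7.25, 71.83, 7.925, 8.05, 15.50, 8.46, 512.33],
    'Cabin': [None, 'C85', None, None, None, None, 'B51'],
    'Age':   [22, 38, 26, np.nan, 35, 54, 36],
})

# Drop columns whose fraction of missing values exceeds the (assumed) 50% cutoff
missing_share = data.isna().mean()
data = data.drop(columns=missing_share[missing_share > 0.5].index)

# Fill the remaining numeric gaps with the median (robust to outliers)
data['Age'] = data['Age'].fillna(data['Age'].median())

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = data['Fare'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
data['Outlier_Fare_IQR'] = ~data['Fare'].between(lower, upper)
print(data)
```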


In [None]:
import pandas as pd
import numpy as np

# Sample mini-dataset (Titanic-like fields)
df = pd.DataFrame({
    'Age':      [22, 38, 26, np.nan, 40],
    'Fare':     [7.25, 71.83, 7.925, 8.05, 15.50],
    'Survived': [0, 1, 1, 0, 1],
    'Cabin':    ['C85', None, None, 'C123', 'E33']
})
print(df.head(), "\n")
print("shape:", df.shape, "columns:", list(df.columns))

# Handle missing: fill Age with median, drop Cabin
df['Age'] = df['Age'].fillna(df['Age'].median())
df = df.drop(columns=['Cabin'])

# Derived feature
df['Fare_per_Year'] = (df['Fare'] / df['Age']).round(3)
print("\nCleaned DF:\n", df)

# GroupBy example
grouped = df.groupby('Survived')[['Age','Fare']].mean().round(2)
print("\nMeans by Survived:\n", grouped)

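# Z-score outlier flag on Fare
# Caveat: with ddof=0, |z| can never exceed sqrt(n-1) (= 2 for these 5 rows), so Z>3 cannot fire here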
z = (df['Fare'] - df['Fare'].mean()) / df['Fare'].std(ddof=0)
df['Outlier_Fare_Z'] = (z.abs() > 3)
print("\nOutlier flags (Z>3):\n", df[['Fare','Outlier_Fare_Z']])

# Simple pipeline functions
def handle_missing(data: pd.DataFrame) -> pd.DataFrame:
    d = data.copy()
    if 'Age' in d: d['Age'] = d['Age'].fillna(d['Age'].median())
    return d

def detect_outliers(data: pd.DataFrame) -> pd.DataFrame:
    d = data.copy()
    if 'Fare' in d:
        z = (d['Fare'] - d['Fare'].mean()) / d['Fare'].std(ddof=0)
        d['Outlier_Fare_Z'] = (z.abs() > 3)
    return d

def summarize(data: pd.DataFrame) -> pd.DataFrame:
    return data.describe(include='all')

summary = (df.pipe(handle_missing)
             .pipe(detect_outliers)
             .pipe(summarize))
print("\nSummary via pipeline:\n", summary)

## 4) Matplotlib

**Essentials:**
- Quick plots: `plt.plot`, `plt.scatter`, `plt.bar`, `plt.hist`, `plt.pie`
- Anatomy: *Figure* (canvas) → *Axes* (subplots)
- OO API: `fig, ax = plt.subplots(); ax.plot(...); fig.suptitle(...)`
- Decorate: `plt.title/xlabel/ylabel`, `plt.legend`, `plt.axhline/axvline`, `plt.xlim/ylim`
- Subplots: `plt.subplots(r, c)`, `axs[i,j]`
- Save: `plt.savefig("out.png")` before `plt.show()`
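
The sketch below shows the Figure/Axes (OO) workflow and saving. It writes the PNG to an in-memory buffer so nothing lands on disk; a filename such as `"out.png"` works the same way.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)

fig, axs = plt.subplots(1, 2, figsize=(8, 3))     # one Figure holding a 1x2 array of Axes
axs[0].plot(x, np.sin(x), label='sin(x)')
axs[0].set(title='Line', xlabel='x', ylabel='y')  # Axes.set: several properties in one call
axs[0].legend()
axs[1].hist(np.sin(x), bins=10)
axs[1].set_title('Histogram of sin(x)')
fig.suptitle('OO API demo')
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=100)  # save before plt.show(): afterwards the figure may be closed
plt.close(fig)                           # release the figure (useful when plotting in loops)
print(len(buf.getvalue()), "bytes of PNG")
```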

**Exam-friendly patterns:**
- 2x2 dashboard (line, bar, scatter, pie)
- Highlight point: plot a marker at argmax
- Reference lines: healthy thresholds, targets
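
For the highlight pattern, `ax.annotate` can label the argmax point with an arrow. This sketch reuses the sleep values from the dashboard cell. The label text and offsets are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

months = np.arange(1, 13)
sleep = np.array([7.1, 6.9, 7.3, 7.0, 7.2, 7.4, 6.8, 6.9, 7.1, 6.7, 7.0, 7.2])

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, sleep, marker='o', label='Sleep')
i = int(sleep.argmax())                                # index of the highest value
ax.annotate(f"max {sleep[i]:.1f} h",                   # label text
            xy=(months[i], sleep[i]),                  # point the arrow targets
            xytext=(months[i] + 1, sleep[i] + 0.15),   # where the text is placed
            arrowprops=dict(arrowstyle='->'))
ax.axhline(7, linestyle='--', color='red', label='7 h target')   # reference line
ax.set(xlabel='Month', ylabel='Sleep (h)', ylim=(6.5, 7.7))
ax.legend()
plt.show()
```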


In [None]:
import numpy as np
import matplotlib.pyplot as plt

months = np.arange(1, 13)
steps = np.array([220,200,250,270,300,310,290,280,260,240,230,250]) * 1000
cals  = np.array([68,64,72,76,82,85,80,78,74,70,69,73]) * 1000
sleep = np.array([7.1,6.9,7.3,7.0,7.2,7.4,6.8,6.9,7.1,6.7,7.0,7.2])
hrate = np.array([75,78,74,76,73,72,77,76,75,79,78,74])

# 2x2 dashboard
fig, axs = plt.subplots(2, 2, figsize=(10,8))

# (0,0) line
axs[0,0].plot(months, steps, marker='o', label='Steps')
axs[0,0].plot(months, cals, marker='s', label='Calories')
imax = steps.argmax()
axs[0,0].plot(months[imax], steps[imax], 'ro')
axs[0,0].set_title("Steps & Calories"); axs[0,0].legend()

# (0,1) bar
axs[0,1].bar(months, sleep)
axs[0,1].axhline(7, linestyle='--', color='red', label='7h')
axs[0,1].set_title("Sleep Hours"); axs[0,1].legend()

# (1,0) scatter colored by steps
sc = axs[1,0].scatter(sleep, hrate, c=steps, cmap='viridis')
axs[1,0].set_xlabel('Sleep (h)'); axs[1,0].set_ylabel('Heart Rate')
axs[1,0].set_title("Heart Rate vs Sleep")
fig.colorbar(sc, ax=axs[1,0], label='Steps')

# (1,1) pie
axs[1,1].pie([steps.sum(), cals.sum()], labels=['Steps','Calories'],
             autopct='%1.1f%%')
axs[1,1].set_title("Yearly Steps vs Calories")

fig.suptitle("Health & Fitness Analysis (Demo)")
plt.tight_layout()
plt.show()

## 5) Seaborn

**Essentials:**
- High-level API built on Matplotlib; integrates with DataFrames.
- `hue`, `style`, `size` add semantics (groups/colors/markers).
- Common: `sns.scatterplot`, `sns.boxplot`, `sns.violinplot`, `sns.barplot` (mean by default),
  `sns.heatmap` (for correlations), `sns.pairplot`, `sns.jointplot`.
- Theme: `sns.set_theme()`, palettes, `sns.set_context('talk')` for bigger fonts.
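
The sketch below puts all three semantic mappings on one scatterplot, using the demo heart values. The `talk` context is reset afterwards so that later plots are not enlarged.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Values copied from the demo fallback in this section's code cell
toy = pd.DataFrame({
    'age':     [63, 41, 67, 62, 58, 45, 54, 50],
    'sex':     [1, 0, 1, 0, 1, 0, 1, 0],
    'chol':    [233, 204, 286, 268, 354, 180, 250, 210],
    'thalach': [150, 172, 108, 160, 165, 185, 140, 175],
    'target':  [1, 1, 0, 0, 1, 1, 0, 1],
})

sns.set_theme(style='whitegrid')
sns.set_context('talk')   # larger fonts/lines: 'paper' < 'notebook' (default) < 'talk' < 'poster'

fig, ax = plt.subplots(figsize=(7, 5))
# hue -> colour, style -> marker shape, size -> marker area (one column each)
sns.scatterplot(data=toy, x='age', y='thalach', hue='target', style='sex', size='chol', ax=ax)
ax.set_title('Max heart rate vs age')
plt.show()

sns.set_context('notebook')   # restore the default scale for later plots
```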

**Lab-style tasks (heart data):**
- Pairplot on selected cols with `hue='target'`
- Box/Violin splits by category
- Heatmap of `df.corr()` with `annot=True`
- Jointplot (`kind='scatter'` / `'kde'`) for (age vs thalach)
- Barplots of group means with `hue` (e.g., `cp` vs avg `thalach` by `target`)
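
The sketch below, using the demo heart values, covers two helpers for these tasks. The first is the pandas table that a hue barplot visualises (handy for checking numbers). The second is a correlation heatmap with the redundant upper triangle hidden through `sns.heatmap`'s `mask` argument.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Values copied from the demo fallback in this section's code cell
toy = pd.DataFrame({
    'age':     [63, 41, 67, 62, 58, 45, 54, 50],
    'cp':      [3, 1, 0, 0, 2, 2, 1, 3],
    'chol':    [233, 204, 286, 268, 354, 180, 250, 210],
    'thalach': [150, 172, 108, 160, 165, 185, 140, 175],
    'target':  [1, 1, 0, 0, 1, 1, 0, 1],
})

# The numbers behind sns.barplot(x='cp', y='thalach', hue='target'): group means
bar_table = toy.groupby(['cp', 'target'])['thalach'].mean().unstack('target')
print(bar_table)   # NaN where a (cp, target) combination has no rows

# Correlation heatmap with the mirror-image upper triangle (and diagonal) hidden
corr = toy.corr(numeric_only=True)
mask = np.triu(np.ones_like(corr, dtype=bool))   # True = cell not drawn
plt.figure(figsize=(6, 5))
ax = sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
ax.set_title('Lower-triangle correlations')
plt.show()
```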

> The code auto-loads **heart.csv** from this folder if present; otherwise it uses a tiny demo sample.


In [None]:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

sns.set_theme()

heart_path = Path('heart.csv')  # relative to the working directory (normally the notebook's folder)
if heart_path.exists():
    heart_df = pd.read_csv(heart_path)
else:
    # Tiny demo fallback
    heart_df = pd.DataFrame({
        'age':    [63, 41, 67, 62, 58, 45, 54, 50],
        'sex':    [1,  0,  1,  0,  1,  0,  1,  0],
        'cp':     [3,  1,  0,  0,  2,  2,  1,  3],
        'trestbps':[145,130,160,140,120,130,150,110],
        'chol':   [233,204,286,268,354,180,250,210],
        'thalach':[150,172,108,160,165,185,140,175],
        'target': [1,  1,  0,  0,  1,  1,  0,  1]
    })

# 1) Pairplot
sns.pairplot(heart_df[['age','chol','thalach','target']], hue='target')

# 2) Boxplots
plt.figure(figsize=(6,4))
sns.boxplot(x='target', y='trestbps', data=heart_df)
plt.title('Resting BP by Heart Disease (target)')
plt.figure(figsize=(6,4))
sns.boxplot(x='sex', y='chol', data=heart_df)
plt.title('Cholesterol by Sex')
plt.show()

# 3) Violin plots
plt.figure(figsize=(6,4))
sns.violinplot(x='target', y='thalach', data=heart_df, inner='quartile')
plt.title('Max Heart Rate by Heart Disease')
plt.figure(figsize=(6,4))
sns.violinplot(x='sex', y='age', data=heart_df, inner='quartile')
plt.title('Age Distribution by Sex')
plt.show()

# 4) Correlation heatmap
corr = heart_df[['age','sex','cp','trestbps','chol','thalach','target']].corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Matrix (Heart Data)')
plt.show()

# 5) Jointplots (age vs thalach)
sns.jointplot(x='age', y='thalach', data=heart_df, hue='target', kind='scatter')
sns.jointplot(x='age', y='thalach', data=heart_df, hue='target', kind='kde')

# 6) Barplots (means with hue)
plt.figure(figsize=(6,4))
sns.barplot(x='cp', y='thalach', hue='target', data=heart_df, estimator=np.mean, errorbar='sd')
plt.title('Chest Pain Type vs Avg Max HeartRate')
plt.figure(figsize=(6,4))
sns.barplot(x='sex', y='chol', hue='target', data=heart_df, estimator=np.mean, errorbar='sd')
plt.title('Sex vs Avg Cholesterol')
plt.show()

---

### Quick Reference
- **NumPy:** `np.mean(a, axis=0)`, `np.argsort(a, axis=1)[:, -k:]`, boolean masks, broadcasting
- **Pandas:** `df.loc[mask, 'col'] = v`, `df.groupby(k).agg({...})`, `df.fillna(...)`, pipelines with `.pipe`
- **Matplotlib:** `fig, axs = plt.subplots(...)`, `.set_title`, `plt.axhline`, `plt.savefig(...)`
- **Seaborn:** `hue`, `pairplot`, `jointplot`, `heatmap(corr, annot=True)`, `violinplot`, `boxplot`, `barplot(estimator=np.mean)`

> Tip: If `heart.csv` exists alongside this notebook, the Seaborn cell loads it; otherwise it falls back to the small demo data. The other cells always use built-in sample data.
