# **Q1 — Generate a CSV File Using Python Libraries (pandas)**

### ✔️ **Python Code**

```python
import pandas as pd

# Your details
data = {
    "ID": ["YOUR_ID"],
    "Name": ["YOUR_NAME"]
}

# Create dataframe
df = pd.DataFrame(data)

# Save as CSV
df.to_csv("student_info.csv", index=False)

print("CSV file created successfully!")
```

### **Notes**

* `pandas.DataFrame()` is used to store tabular data.
* `to_csv()` writes the data to a CSV file.
* `index=False` omits the DataFrame's row index, so the file contains only the `ID` and `Name` columns.
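
To see what `index=False` changes, here is a small sketch that compares the CSV text with and without the index. It relies on `to_csv()` returning the CSV as a string when no file path is given; the variable names `with_index` and `without_index` are only for illustration.

```python
import pandas as pd

# Same placeholder details as in the answer above
df = pd.DataFrame({"ID": ["YOUR_ID"], "Name": ["YOUR_NAME"]})

# With no path argument, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print("With index:")
print(with_index)       # header starts with an empty field, rows start with 0
print("Without index:")
print(without_index)    # only the ID and Name columns
```

The version with the index has an extra unnamed first column, which usually shows up as `Unnamed: 0` when the file is read back with `pd.read_csv()`.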

---

# **Q2 — Data Cleaning, Normalization, Encoding, Querying**

### Using **pandas**, **sklearn**, **numpy**

---

# Step-by-Step Clean Code (FULL LIBRARY-BASED)

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# ----------------------------
# STEP 0: Create the dataset
# ----------------------------
data = {
    "Student_ID": [101, 102, 103, 104, 105],
    "CGPA": [3.8, 3.2, np.nan, 3.9, 2.7],           # Missing value
    "Major": ["CSE", "EEE", "Business", "CSE", "Economics"],
    "Internships": [2, 1, 0, 3, np.nan],             # Missing value
    "Placed": ["Yes", "No", "No", "Yes", "No"]
}

df = pd.DataFrame(data)
print("Original Data:\n", df)
```

---

# Handle Missing Values (Using pandas)

```python
# Fill missing CGPA with mean
df["CGPA"] = df["CGPA"].fillna(df["CGPA"].mean())

# Fill missing Internships with median
df["Internships"] = df["Internships"].fillna(df["Internships"].median())

print("\nAfter Handling Missing Values:\n", df)
```
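
As a quick check of the fill values, the sketch below recomputes them on the two columns alone. It assumes the same five rows as the dataset in Step 0; the names `cgpa_mean` and `intern_median` are only for illustration.

```python
import numpy as np
import pandas as pd

cgpa = pd.Series([3.8, 3.2, np.nan, 3.9, 2.7])
internships = pd.Series([2, 1, 0, 3, np.nan])

# mean() and median() skip NaN by default
cgpa_mean = cgpa.mean()                # (3.8 + 3.2 + 3.9 + 2.7) / 4
intern_median = internships.median()   # middle of the sorted values 0, 1, 2, 3

cgpa_filled = cgpa.fillna(cgpa_mean)
internships_filled = internships.fillna(intern_median)

print("CGPA fill value:", round(cgpa_mean, 2))
print("Internships fill value:", intern_median)
```

Note that with an even number of known values the median falls halfway between two of them, so the filled internship count is not a whole number. If a whole number is required, rounding it is one option, but that is a choice for you to make rather than something the question specifies.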

---

# Normalize Numeric Columns (Using MinMaxScaler)

```python
scaler = MinMaxScaler()

df[["CGPA", "Internships"]] = scaler.fit_transform(df[["CGPA", "Internships"]])

print("\nAfter Normalization:\n", df)
```

---

# Encode Categorical Columns (Using pandas get_dummies)

```python
df_encoded = pd.get_dummies(df, columns=["Major", "Placed"], drop_first=True)

print("\nAfter Encoding Categorical Features:\n", df_encoded)
```

---

# Find Student With Highest CGPA in CSE Department

```python
# Filter CSE department
cse_df = df[df["Major"] == "CSE"]

# Find row with max CGPA
top_cgpa_cse = cse_df.loc[cse_df["CGPA"].idxmax()]

print("\nStudent with Highest CGPA in CSE Dept:\n", top_cgpa_cse)
```
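
Because the query above runs after normalization, the CGPA it prints is the scaled value. The sketch below runs the same query on unscaled values to show the original CGPA. It assumes the missing CGPA for student 103 has already been filled with the mean (3.4), and the name `raw` is only for illustration.

```python
import pandas as pd

# Unscaled data after missing-value handling (3.4 is the mean-filled CGPA)
raw = pd.DataFrame({
    "Student_ID": [101, 102, 103, 104, 105],
    "CGPA": [3.8, 3.2, 3.4, 3.9, 2.7],
    "Major": ["CSE", "EEE", "Business", "CSE", "Economics"],
})

# Keep only CSE students
cse_raw = raw[raw["Major"] == "CSE"]

# idxmax() returns the index label of the largest CGPA; .loc fetches that row
top_raw = cse_raw.loc[cse_raw["CGPA"].idxmax()]

print("Student_ID:", top_raw["Student_ID"])
print("CGPA:", top_raw["CGPA"])
```

Min-max scaling preserves order, so the same student is selected either way; only the displayed CGPA differs.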

---

# **FINAL OUTPUTS YOU WILL GET**

### **After Missing Value Handling**

* CGPA and Internships will have no NaN values.

### **After Normalization**

* CGPA and Internships will be between **0 and 1**.

### **After Encoding**

You will get columns like:

```
Major_CSE, Major_EEE, Major_Economics, Placed_Yes
```

### **Highest CGPA in CSE Dept**

Returns the student:

* Student_ID: **104**
* CGPA: highest among CSE students (shown as the scaled value 1.0, because the query runs after normalization)

### ✔️ Missing Value Handling

* **Mean** for continuous values (CGPA)
* **Median** for count values (Internships), since it is less affected by outliers than the mean

### ✔️ Normalization Method

* **MinMaxScaler**:

  $$
  x' = \frac{x - \min}{\max - \min}
  $$
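
The formula can be checked by hand against `MinMaxScaler`. This is a small sketch on the CGPA column only, assuming the missing value has already been filled with the mean (3.4); the names `manual` and `scaled` are only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# CGPA after missing-value handling (3.4 is the mean-filled value)
cgpa = pd.DataFrame({"CGPA": [3.8, 3.2, 3.4, 3.9, 2.7]})

# Apply x' = (x - min) / (max - min) directly
col = cgpa["CGPA"]
manual = (col - col.min()) / (col.max() - col.min())

# Same column through MinMaxScaler (returns a 2D NumPy array)
scaled = MinMaxScaler().fit_transform(cgpa)[:, 0]

print(manual.round(4).tolist())
print("Matches MinMaxScaler:", np.allclose(manual, scaled))
```

The smallest CGPA (2.7) maps to 0 and the largest (3.9) maps to 1; every other value lands in between.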

### ✔️ Encoding

* **One-Hot Encoding via pandas.get_dummies()**

### ✔️ Querying

* Filter by department
* Use `.idxmax()` to find max CGPA
