
# Week 2, Day 7: Review and Feedback Session

## Session Overview
This session reviews the key concepts covered in Week 2 and provides practice exercises to reinforce them:

1. Introduction to Pandas
2. Data Visualization
3. Exploratory Data Analysis
4. Basic Statistics
5. Probability Fundamentals

## Learning Objectives
- Reinforce key concepts
- Practice problem-solving
- Identify areas for improvement
- Prepare for Week 3

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

## 1. Pandas Review Exercise

In [None]:
# Create sample dataset
data = {
    'Name': ['John', 'Emma', 'Alex', 'Sarah', 'Mike'],
    'Age': [25, 30, 22, 28, 35],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'London'],
    'Salary': [50000, 60000, 45000, 70000, 65000]
}

df = pd.DataFrame(data)

# Practice tasks
print("1. Basic DataFrame operations:")
print("\nShape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nData types:\n", df.dtypes)

print("\n2. Data filtering:")
print("\nPeople over 25:")
print(df[df['Age'] > 25])

print("\n3. Groupby operation:")
print("\nAverage salary by city:")
print(df.groupby('City')['Salary'].mean())
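
The sample dataset above has no gaps, but quiz question 1 asks about missing values, so a short sketch may help. It assumes a hypothetical variant of the data (`df_missing`) with two entries replaced by `NaN`, and shows the two usual options: dropping incomplete rows with `dropna()` or filling them with `fillna()`. Filling with the column median is only one possible choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with two missing entries
df_missing = pd.DataFrame({
    'Name': ['John', 'Emma', 'Alex', 'Sarah', 'Mike'],
    'Age': [25, np.nan, 22, 28, 35],
    'Salary': [50000, 60000, np.nan, 70000, 65000],
})

# Count missing values per column
missing_counts = df_missing.isna().sum()
print("Missing values per column:\n", missing_counts)

# Option 1: drop every row that contains at least one missing value
dropped = df_missing.dropna()
print("\nAfter dropna():\n", dropped)

# Option 2: fill each numeric column's gaps with that column's median
medians = df_missing[['Age', 'Salary']].median()
filled = df_missing.fillna(medians)
print("\nAfter fillna(median):\n", filled)
```

Dropping rows is simple but discards data (two of five rows here), while filling keeps every row at the cost of introducing estimated values.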

## 2. Data Visualization Review

In [None]:
def visualization_review():
    # Generate sample data
    np.random.seed(42)
    data = np.random.normal(100, 15, 1000)
    categories = ['A', 'B', 'C', 'D']
    values = [23, 45, 56, 78]

    # Create multiple plots
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 10))

    # Histogram
    sns.histplot(data, kde=True, ax=ax1)
    ax1.set_title('Histogram with KDE')

    # Bar plot
    ax2.bar(categories, values)
    ax2.set_title('Bar Plot')

    # Box plot
    sns.boxplot(y=data, ax=ax3)
    ax3.set_title('Box Plot')

    # Scatter plot
    x = np.random.normal(50, 10, 100)
    y = x + np.random.normal(0, 10, 100)
    ax4.scatter(x, y)
    ax4.set_title('Scatter Plot')

    plt.tight_layout()
    plt.show()

visualization_review()

## 3. Statistical Concepts Review

In [None]:
def statistics_review():
    # Generate sample data
    np.random.seed(42)
    data = np.random.normal(100, 15, 1000)

    # Calculate basic statistics
    print("Basic Statistics:")
    print(f"Mean: {np.mean(data):.2f}")
    print(f"Median: {np.median(data):.2f}")
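    # np.std and np.var default to the population formulas (ddof=0);
    # pass ddof=1 for the sample versions
    print(f"Standard Deviation: {np.std(data):.2f}")
    print(f"Variance: {np.var(data):.2f}")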

    # Calculate z-scores
    z_scores = stats.zscore(data)

    # Visualize original data and z-scores
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    sns.histplot(data, kde=True, ax=ax1)
    ax1.set_title('Original Data Distribution')

    sns.histplot(z_scores, kde=True, ax=ax2)
    ax2.set_title('Z-Score Distribution')

    plt.tight_layout()
    plt.show()

statistics_review()
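
To make the z-score step less of a black box, the sketch below computes $z = (x - \mu) / \sigma$ by hand and compares it with `stats.zscore`. It regenerates the same sample as `statistics_review()`; the variable names `manual_z` and `scipy_z` are illustrative. Both `np.std` and `stats.zscore` default to the population standard deviation (`ddof=0`), which is why the two results agree.

```python
import numpy as np
from scipy import stats

# Same sample as in statistics_review()
np.random.seed(42)
data = np.random.normal(100, 15, 1000)

# z = (x - mean) / std, using the population standard deviation (ddof=0)
manual_z = (data - data.mean()) / data.std()
scipy_z = stats.zscore(data)

print("Manual and SciPy z-scores match:", np.allclose(manual_z, scipy_z))

# Standardized data has mean 0 and standard deviation 1 (up to rounding)
print(f"Mean of z-scores: {abs(manual_z.mean()):.2f}")
print(f"Std of z-scores: {manual_z.std():.2f}")

# The sample standard deviation (ddof=1) divides by n - 1 instead of n,
# so it is slightly larger than the population value
print(f"Population std (ddof=0): {np.std(data):.4f}")
print(f"Sample std (ddof=1): {np.std(data, ddof=1):.4f}")
```

Standardizing changes the scale of the data but not its shape, which is why the two histograms above look alike apart from their x-axes.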

## 4. Comprehensive Review Exercise

In [None]:
def comprehensive_review():
    # Create sample dataset
    np.random.seed(42)
    n_samples = 200

    data = {
        'feature1': np.random.normal(0, 1, n_samples),
        'feature2': np.random.normal(2, 1.5, n_samples),
        'category': np.random.choice(['A', 'B', 'C'], n_samples),
        'target': np.random.uniform(0, 100, n_samples)
    }

    df = pd.DataFrame(data)

    # 1. Data Analysis
    print("Basic Statistics:")
    print(df.describe())

    # 2. Visualization
    plt.figure(figsize=(15, 5))

    # Scatter plot
    plt.subplot(131)
    sns.scatterplot(data=df, x='feature1', y='feature2', hue='category')
    plt.title('Feature Relationship')

    # Box plot
    plt.subplot(132)
    sns.boxplot(data=df, x='category', y='target')
    plt.title('Target Distribution by Category')

    # Distribution plot
    plt.subplot(133)
    sns.histplot(data=df, x='target', kde=True)
    plt.title('Target Distribution')

    plt.tight_layout()
    plt.show()

    # 3. Statistical Analysis
    print("\nCorrelation Analysis:")
    correlation = df[['feature1', 'feature2', 'target']].corr()
    print(correlation)

comprehensive_review()
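
The columns in `comprehensive_review()` are generated independently, so the off-diagonal correlations it prints should be close to zero. As a rough point of comparison, the sketch below contrasts an independent column with a hypothetical `dependent` column that is built from `feature1` plus noise. The column names, the coefficient 3, and the use of `np.random.default_rng` are assumptions made for this illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_samples = 200

feature1 = rng.normal(0, 1, n_samples)
# Drawn independently of feature1, like the columns in comprehensive_review()
independent = rng.normal(2, 1.5, n_samples)
# Hypothetical column built from feature1 plus noise: correlated by construction
dependent = 3 * feature1 + rng.normal(0, 1, n_samples)

demo = pd.DataFrame({
    'feature1': feature1,
    'independent': independent,
    'dependent': dependent,
})

# Pearson correlation matrix (the default method of DataFrame.corr)
corr = demo.corr()
print(corr.round(2))
```

A coefficient near 0 indicates no linear relationship, while values near 1 or -1 indicate a strong one. Correlation alone does not establish causation.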

## 5. Week 2 Review Quiz

### Multiple Choice Questions

1. Which Pandas methods can be used to handle missing values?
   - a) remove()
   - b) fillna()
   - c) dropna()
   - d) Both b and c

2. What type of plot is best for showing the distribution of a continuous variable?
   - a) Bar plot
   - b) Histogram
   - c) Scatter plot
   - d) Line plot

3. Which measure of central tendency is most affected by outliers?
   - a) Mean
   - b) Median
   - c) Mode
   - d) Range

4. What does the standard deviation measure?
   - a) Central tendency
   - b) Variability
   - c) Correlation
   - d) Causation

5. Which visualization library is built on top of Matplotlib?
   - a) NumPy
   - b) Pandas
   - c) Seaborn
   - d) SciPy

6. What is the primary purpose of EDA?
   - a) Model building
   - b) Data cleaning
   - c) Understanding patterns in data
   - d) Feature engineering

7. Which probability concept is most relevant to machine learning?
   - a) Permutations
   - b) Combinations
   - c) Conditional probability
   - d) Counting principles

8. What does `df.describe()` show?
   - a) Column names
   - b) Missing values
   - c) Statistical summary
   - d) Data types

9. Which plot is best for showing relationships between variables?
   - a) Histogram
   - b) Scatter plot
   - c) Box plot
   - d) Bar plot

10. What is the IQR used for?
    - a) Measuring central tendency
    - b) Detecting outliers
    - c) Calculating correlation
    - d) Determining causation

Answers: 1-d, 2-b, 3-a, 4-b, 5-c, 6-c, 7-c, 8-c, 9-b, 10-b
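
The answers to questions 3 and 10 can be checked numerically. The sketch below reuses the salary values from the Pandas exercise and appends one hypothetical extreme salary to show how the mean and median react. It then flags outliers with the common 1.5 × IQR rule; that multiplier is a widely used convention assumed here, not something fixed by this notebook.

```python
import numpy as np

# Salaries from the Pandas review exercise
salaries = np.array([50000, 60000, 45000, 70000, 65000])
# Hypothetical extreme value added for illustration
with_outlier = np.append(salaries, 1_000_000)

# Question 3: the mean shifts strongly, the median barely moves
mean_without, median_without = np.mean(salaries), np.median(salaries)
mean_with, median_with = np.mean(with_outlier), np.median(with_outlier)
print(f"Mean:   {mean_without:.0f} -> {mean_with:.0f}")
print(f"Median: {median_without:.0f} -> {median_with:.0f}")

# Question 10: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(with_outlier, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = with_outlier[(with_outlier < lower) | (with_outlier > upper)]
print(f"IQR: {iqr:.0f}, bounds: [{lower:.0f}, {upper:.0f}]")
print("Outliers:", outliers)
```

Because quartiles depend on rank order rather than magnitude, the IQR bounds stay stable even when the extreme value is very large.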

## Week 2 Summary

### Key Concepts Covered:
1. Pandas DataFrame operations and data manipulation
2. Data visualization techniques with Matplotlib and Seaborn
3. Exploratory Data Analysis methods
4. Basic statistical concepts and calculations
5. Probability fundamentals and applications

### Preparation for Week 3:
- Review any challenging concepts
- Complete practice exercises
- Prepare for introduction to Machine Learning
- Review Python programming fundamentals

### Additional Resources:
- Pandas documentation: https://pandas.pydata.org/docs/
- Seaborn tutorial: https://seaborn.pydata.org/tutorial.html
- Statistics refresher: https://www.khanacademy.org/math/statistics-probability