# Week 1, Day 7: Hackathon Challenge
## Create a Simple Python Program for Data Analysis

## Challenge Overview
Create a data analysis program that demonstrates your understanding of:
- Python basics
- Data structures
- Functions
- NumPy operations

## Task Description
You will create a program that analyzes student performance data, including:
1. Data generation and cleaning
2. Basic statistical analysis
3. Data visualization
4. Report generation

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt

## Part 1: Data Generation and Structure

In [None]:
# Create sample student data
def generate_student_data(num_students=30):
    np.random.seed(42)  # For reproducibility

    # Generate random scores
    math_scores = np.random.normal(75, 15, num_students).round(2)
    science_scores = np.random.normal(70, 12, num_students).round(2)
    english_scores = np.random.normal(80, 10, num_students).round(2)

    # Ensure scores are between 0 and 100
    math_scores = np.clip(math_scores, 0, 100)
    science_scores = np.clip(science_scores, 0, 100)
    english_scores = np.clip(english_scores, 0, 100)

    return {
        'math': math_scores,
        'science': science_scores,
        'english': english_scores
    }

# Generate data
student_data = generate_student_data()
print("Sample of generated data:")
for subject in student_data:
    print(f"\n{subject.capitalize()} scores (first 5):", student_data[subject][:5])
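
The function above seeds NumPy's global random state with `np.random.seed`. As an alternative sketch, a local `np.random.default_rng` generator keeps the randomness contained inside the function and makes it easy to add subjects. The names `generate_student_data_rng` and `subject_params` are hypothetical, introduced only for illustration. A generator produces different numbers than the legacy seeded functions, so the scores will not match the cell above.

```python
import numpy as np

def generate_student_data_rng(num_students=30, seed=42, subject_params=None):
    """Hypothetical variant of generate_student_data using a local Generator."""
    if subject_params is None:
        # (mean, standard deviation) per subject, taken from the cell above
        subject_params = {
            'math': (75, 15),
            'science': (70, 12),
            'english': (80, 10),
        }
    rng = np.random.default_rng(seed)  # local generator; global state untouched
    data = {}
    for subject, (mean, std) in subject_params.items():
        scores = rng.normal(mean, std, num_students)
        # Clip to the valid range, then round to two decimals
        data[subject] = np.clip(scores, 0, 100).round(2)
    return data

demo_data = generate_student_data_rng()
print({subject: scores[:3] for subject, scores in demo_data.items()})
```

Passing a different `subject_params` dictionary is one way to approach the first challenge task without editing the function body.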

## Part 2: Statistical Analysis

In [None]:
def analyze_scores(scores_dict):
    analysis_results = {}

    for subject, scores in scores_dict.items():
        analysis_results[subject] = {
            'mean': np.mean(scores),
            'median': np.median(scores),
            'std': np.std(scores),  # population standard deviation (ddof=0)
            'min': np.min(scores),
            'max': np.max(scores)
        }

    return analysis_results

# Perform analysis
analysis = analyze_scores(student_data)

# Print results
print("Statistical Analysis Results:")
for subject, stats in analysis.items():
    print(f"\n{subject.capitalize()}:")
    for metric, value in stats.items():
        print(f"{metric}: {value:.2f}")
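
`np.std` defaults to the population standard deviation (`ddof=0`). If the 30 students are treated as a sample from a larger group, the sample standard deviation (`ddof=1`) may be the better choice. The sketch below shows both, along with quartiles from `np.percentile`. The helper name `extra_stats` and the `example_scores` array are made up for illustration.

```python
import numpy as np

def extra_stats(scores):
    """Hypothetical helper: additional summary statistics for one subject."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    return {
        'std_population': np.std(scores),       # divides by n
        'std_sample': np.std(scores, ddof=1),   # divides by n - 1
        'q1': q1,
        'q3': q3,
        'iqr': q3 - q1,                         # interquartile range
    }

example_scores = np.array([60.0, 70.0, 80.0, 90.0])  # made-up data
result = extra_stats(example_scores)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With only four values the two standard deviations differ noticeably; the gap shrinks as the number of students grows.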

## Part 3: Data Visualization

In [None]:
def visualize_data(data_dict):
    plt.figure(figsize=(15, 5))

    # Box plot
    plt.subplot(131)
    # Convert dict views to lists so Matplotlib receives plain sequences
    plt.boxplot(list(data_dict.values()))
    plt.xticks(range(1, len(data_dict) + 1), list(data_dict.keys()))
    plt.title('Score Distribution by Subject')
    plt.ylabel('Scores')

    # Histogram
    plt.subplot(132)
    for subject, scores in data_dict.items():
        plt.hist(scores, alpha=0.5, label=subject)
    plt.title('Score Frequency Distribution')
    plt.xlabel('Scores')
    plt.ylabel('Frequency')
    plt.legend()

    # Bar plot of means
    plt.subplot(133)
    means = [np.mean(scores) for scores in data_dict.values()]
    plt.bar(list(data_dict.keys()), means)
    plt.title('Average Scores by Subject')
    plt.ylabel('Mean Score')

    plt.tight_layout()
    plt.show()

# Create visualizations
visualize_data(student_data)

## Part 4: Performance Analysis and Grading

In [None]:
def analyze_performance(data_dict):
    # Calculate overall scores
    overall_scores = np.mean(list(data_dict.values()), axis=0)

    # Grade distribution function
    def assign_grade(score):
        if score >= 90: return 'A'
        elif score >= 80: return 'B'
        elif score >= 70: return 'C'
        elif score >= 60: return 'D'
        else: return 'F'

    # Calculate grades
    grades = [assign_grade(score) for score in overall_scores]
    # Count each grade, in alphabetical order so the output is stable
    grade_counts = {grade: grades.count(grade) for grade in sorted(set(grades))}

    # Calculate pass rate (grades A, B, C)
    passing_grades = {'A', 'B', 'C'}
    pass_rate = sum(1 for grade in grades if grade in passing_grades) / len(grades) * 100

    return {
        'grade_distribution': grade_counts,
        'pass_rate': pass_rate,
        'overall_average': np.mean(overall_scores)
    }

# Perform performance analysis
performance = analyze_performance(student_data)

# Print results
print("Performance Analysis Results:")
print("\nGrade Distribution:")
for grade, count in performance['grade_distribution'].items():
    print(f"Grade {grade}: {count} students")
print(f"\nPass Rate: {performance['pass_rate']:.2f}%")
print(f"Overall Average: {performance['overall_average']:.2f}")
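
The grading step above loops over scores in Python. As one possible alternative, `np.digitize` can map a whole array of scores to grade bins at once, and `np.unique` can count the results. This is a sketch using the same cutoffs as `assign_grade`; the names `assign_grades_vectorized`, `GRADE_LETTERS`, and `GRADE_CUTOFFS` are hypothetical, and the example scores are made up.

```python
import numpy as np

GRADE_LETTERS = np.array(['F', 'D', 'C', 'B', 'A'])
GRADE_CUTOFFS = [60, 70, 80, 90]  # same thresholds as assign_grade above

def assign_grades_vectorized(scores):
    """Hypothetical vectorized counterpart of assign_grade."""
    # np.digitize returns i such that cutoffs[i-1] <= score < cutoffs[i]
    indices = np.digitize(scores, GRADE_CUTOFFS)
    return GRADE_LETTERS[indices]

example_overall = np.array([95.0, 90.0, 89.99, 72.5, 60.0, 59.9])  # made-up data
example_grades = assign_grades_vectorized(example_overall)

# Count how many students received each grade
letters, counts = np.unique(example_grades, return_counts=True)
grade_counts = {str(k): int(v) for k, v in zip(letters, counts)}
print(example_grades)
print(grade_counts)
```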

## Challenge Tasks

Now it's your turn! Complete the following tasks:

1. Modify the data generation to include more subjects or different score distributions
2. Add new statistical measures to the analysis
3. Create additional visualizations
4. Implement a function to identify top performers and subjects needing improvement
5. Generate a comprehensive report combining all analyses
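
As a starting point for task 4, the sketch below ranks students by their average across subjects and flags subjects whose mean falls below a cutoff. The function names, the `top_n` parameter, and the `threshold` of 70 are assumptions chosen for illustration, and the data is made up.

```python
import numpy as np

def find_top_performers(data_dict, top_n=3):
    """Hypothetical helper: (student index, average) for the top_n students."""
    overall = np.mean(list(data_dict.values()), axis=0)
    # argsort is ascending; reverse it and keep the first top_n indices
    top_indices = np.argsort(overall)[::-1][:top_n]
    return [(int(i), float(overall[i])) for i in top_indices]

def subjects_needing_improvement(data_dict, threshold=70.0):
    """Hypothetical helper: subjects whose mean score is below threshold."""
    return [s for s, scores in data_dict.items() if np.mean(scores) < threshold]

# Made-up data for four students
example_data = {
    'math': np.array([90.0, 60.0, 75.0, 85.0]),
    'science': np.array([80.0, 55.0, 65.0, 70.0]),
    'english': np.array([85.0, 70.0, 80.0, 95.0]),
}
top = find_top_performers(example_data, top_n=2)
weak = subjects_needing_improvement(example_data)
print("Top performers (index, average):", top)
print("Subjects needing improvement:", weak)
```

Student indices here are zero-based positions in the score arrays, since the generated data has no student names.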

### Bonus Challenges:
- Add student demographics and analyze performance patterns
- Implement data export functionality to CSV
- Create a correlation analysis between subjects
- Add predictive analytics for future performance
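
Two of the bonus items can be sketched with tools already at hand: `np.corrcoef` for correlation between subjects and the standard-library `csv` module for export. The helper names and the made-up data below are hypothetical. The example writes to an in-memory buffer; writing to a real file (for instance a hypothetical `scores.csv`) would use `open(..., 'w', newline='')` instead.

```python
import csv
import io
import numpy as np

def correlation_matrix(data_dict):
    """Hypothetical helper: Pearson correlation between subjects."""
    subjects = list(data_dict)
    # np.corrcoef treats each row as one variable (here, one subject)
    matrix = np.corrcoef(np.vstack([data_dict[s] for s in subjects]))
    return subjects, matrix

def write_scores_csv(data_dict, file_obj):
    """Hypothetical helper: one row per student, one column per subject."""
    subjects = list(data_dict)
    writer = csv.writer(file_obj)
    writer.writerow(['student_id'] + subjects)
    rows = zip(*(data_dict[s] for s in subjects))
    for student_id, row in enumerate(rows, start=1):
        writer.writerow([student_id] + [f"{score:.2f}" for score in row])

# Made-up data: science rises with math, english falls as math rises
example_data = {
    'math': np.array([60.0, 70.0, 80.0, 90.0]),
    'science': np.array([65.0, 75.0, 85.0, 95.0]),
    'english': np.array([90.0, 80.0, 70.0, 60.0]),
}
subjects, corr = correlation_matrix(example_data)
buffer = io.StringIO()  # in-memory stand-in for a file
write_scores_csv(example_data, buffer)
print(subjects)
print(np.round(corr, 2))
print(buffer.getvalue())
```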

In [None]:
# Your solution here

# Example structure for a solution:
def enhanced_analysis():
    # 1. Generate enhanced data
    # Your code here

    # 2. Perform additional statistical analysis
    # Your code here

    # 3. Create new visualizations
    # Your code here

    # 4. Identify patterns and insights
    # Your code here

    # 5. Generate comprehensive report
    # Your code here

    pass

## Evaluation Criteria

Your solution will be evaluated based on:
1. Code organization and clarity (20%)
2. Proper use of Python data structures and NumPy (25%)
3. Accuracy of analysis and calculations (25%)
4. Visualization quality and insights (20%)
5. Additional features and creativity (10%)

## Submission Guidelines
1. Ensure all code is well-commented
2. Include a brief explanation of your approach
3. Document any additional features you implemented
4. Test your code with different datasets
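
One way to approach the last guideline is to run your functions on tiny datasets whose answers can be checked by hand. The sketch below assumes a stand-in function named `summarize`; replace it with your own analysis function.

```python
import numpy as np

def summarize(scores):
    """Stand-in for your own analysis function (hypothetical)."""
    scores = np.asarray(scores, dtype=float)
    return {
        'mean': float(np.mean(scores)),
        'median': float(np.median(scores)),
        'min': float(np.min(scores)),
        'max': float(np.max(scores)),
    }

def check_summary():
    # Tiny dataset with answers that are easy to verify by hand
    result = summarize([50, 60, 100])
    assert result['mean'] == 70.0
    assert result['median'] == 60.0
    assert result['min'] == 50.0 and result['max'] == 100.0
    # Edge case: a single student
    single = summarize([88])
    assert single['mean'] == single['median'] == 88.0
    return True

checks_passed = check_summary()
print("All checks passed:", checks_passed)
```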