The following assignments are provided to assess the learner's Python understanding after Python Base Camp 1. Please attempt them without the help of Gen AI 🙂. You may, of course, consult sample programs, online references, etc.

Assignment 1:::

The file ‘city_temperature.csv’ provides sample daily average temperatures for 5 cities across 3 months. Write a Python program that reads this data and processes it into numpy arrays to meet the following requirements:

Function to identify the day on which the maximum temperature was recorded for each city. Return a dict (city - temp pair).
Function to compute the monthly average temperature for each city. Return a dict.
Treat the timeline as a 5-day sliding window. For each city, identify the number of 5-day stretches in which the temperature stayed above the monthly average.


In [1]:
import pandas as pd
import numpy as np

# Read the CSV file
df = pd.read_csv('city_temperature.csv - 1')

# Display the first few rows to understand the data
print("Dataset shape:", df.shape)
print("\nFirst few rows:")
print(df.head())
print("\nData info:")
print(df.info())

Dataset shape: (92, 6)

First few rows:
         Date  London  Tokyo  Sydney  Cairo   Rio
0  01-06-2024    15.5   23.1    13.8   28.5  21.0
1  02-06-2024    16.1   23.5    14.0   28.8  21.2
2  03-06-2024    16.3   23.8    14.2   29.1  21.5
3  04-06-2024    15.9   24.0    13.9   29.5  21.3
4  05-06-2024    16.5   24.2    13.5   29.8  21.7

Data info:
<class 'pandas.DataFrame'>
RangeIndex: 92 entries, 0 to 91
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    92 non-null     str    
 1   London  92 non-null     float64
 2   Tokyo   92 non-null     float64
 3   Sydney  92 non-null     float64
 4   Cairo   92 non-null     float64
 5   Rio     92 non-null     float64
dtypes: float64(5), str(1)
memory usage: 4.4 KB
None


In [2]:
# Convert data to numpy arrays for each city
cities = df.columns[1:]  # Skip the 'Date' column
city_arrays = {}

for city in cities:
    city_arrays[city] = df[city].values

print("Cities processed:", list(city_arrays.keys()))
print("\nArray shape for London:", city_arrays['London'].shape)

Cities processed: ['London', 'Tokyo', 'Sydney', 'Cairo', 'Rio']

Array shape for London: (92,)
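As an alternative to the per-city dictionary above, the city columns could be held in a single 2-D numpy array and reduced along the day axis. The following is only a sketch on made-up sample rows; the variable names `temps`, `max_rows` and `max_day` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up sample rows in the same layout as city_temperature.csv
sample = pd.DataFrame({
    "Date": ["01-06-2024", "02-06-2024", "03-06-2024"],
    "London": [15.5, 16.1, 16.3],
    "Tokyo": [23.1, 23.8, 23.5],
})

cities = list(sample.columns[1:])       # skip the 'Date' column
temps = sample[cities].to_numpy()       # shape: (n_days, n_cities)

# Row index of the maximum temperature for every city at once
max_rows = temps.argmax(axis=0)

# city -> (date, temperature) pair, as the assignment asks for
max_day = {
    city: (sample["Date"].iloc[row], float(temps[row, col]))
    for col, (city, row) in enumerate(zip(cities, max_rows))
}
print(max_day)
```

With one array, `temps.mean(axis=0)` would likewise give every city's average in a single call.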


In [3]:
# Function 1: Identify the day when max temperature recorded for each city
def max_temperature_day(city_arrays, df):
    """
    Find the day (index) when maximum temperature was recorded for each city.
    
    Args:
        city_arrays: Dictionary with city names as keys and numpy arrays as values
        df: Original dataframe with Date column
    
    Returns:
        Dictionary with city as key and a dict with 'date', 'day_index' and 'max_temperature' as value
    """
    max_temp_days = {}
    
    for city, temps in city_arrays.items():
        max_idx = np.argmax(temps)  # Get index of max temperature
        max_temp = temps[max_idx]
        date = df['Date'].iloc[max_idx]
        max_temp_days[city] = {'date': date, 'day_index': max_idx, 'max_temperature': max_temp}
    
    return max_temp_days

# Function 2: Identify the monthly average temperature for each city
def monthly_average_temperature(city_arrays, df):
    """
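    Calculate the average temperature for each city across the entire period.
    Note: this is one mean over all the days in the file, not a separate mean per calendar month.
    
    Args:
        city_arrays: Dictionary with city names as keys and numpy arrays as values
        df: Original dataframe (not used in the calculation)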
    
    Returns:
        Dictionary with city as key and average temperature as value
    """
    monthly_avg = {}
    
    for city, temps in city_arrays.items():
        avg_temp = np.mean(temps)
        monthly_avg[city] = avg_temp
    
    return monthly_avg

# Function 3: Count 5-day stretches where temperature stayed above monthly average
def five_day_stretches_above_average(city_arrays, monthly_avg):
    """
    Count the overlapping 5-day windows in which every day's temperature exceeded the city's average.
    
    Args:
        city_arrays: Dictionary with city names as keys and numpy arrays as values
        monthly_avg: Dictionary with city as key and monthly average temperature as value
    
    Returns:
        Dictionary with city as key and count of stretches above average as value
    """
    stretches_above_avg = {}
    window_size = 5
    
    for city, temps in city_arrays.items():
        avg_temp = monthly_avg[city]
        count = 0
        
        # Iterate through the array with a sliding window of size 5
        for i in range(len(temps) - window_size + 1):
            window = temps[i:i + window_size]
            # Check if all temperatures in the window are above the monthly average
            if np.all(window > avg_temp):
                count += 1
        
        stretches_above_avg[city] = count
    
    return stretches_above_avg

print("All functions defined successfully!")

All functions defined successfully!
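The `monthly_average_temperature` function above returns one mean over the whole period. If the requirement is read as one average per calendar month, a possible sketch is shown below. It assumes the `DD-MM-YYYY` date format seen in the first rows of the file; the helper name `per_month_average` and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

def per_month_average(df, date_format="%d-%m-%Y"):
    """Return {city: {month_number: mean_temperature}} (hypothetical helper)."""
    # Month number (1-12) of every row, as a numpy array
    months = pd.to_datetime(df["Date"], format=date_format).dt.month.to_numpy()
    result = {}
    for city in df.columns[1:]:          # skip the 'Date' column
        temps = df[city].to_numpy()
        result[city] = {
            int(m): float(temps[months == m].mean())
            for m in np.unique(months)
        }
    return result

# Made-up sample rows, two days in each of two months
sample = pd.DataFrame({
    "Date": ["01-06-2024", "02-06-2024", "01-07-2024", "02-07-2024"],
    "London": [15.0, 17.0, 20.0, 22.0],
})
monthly = per_month_average(sample)
print(monthly)   # {'London': {6: 16.0, 7: 21.0}}
```

Under this reading, the sliding-window count would compare each day against the average of its own month rather than against a single value per city.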


In [4]:
# Execute the functions
max_temp_days = max_temperature_day(city_arrays, df)
monthly_avg_temps = monthly_average_temperature(city_arrays, df)
stretches_above_avg = five_day_stretches_above_average(city_arrays, monthly_avg_temps)

print("=" * 70)
print("ASSIGNMENT 1 RESULTS")
print("=" * 70)

print("\n1. MAXIMUM TEMPERATURE DAY FOR EACH CITY:")
print("-" * 70)
for city, info in max_temp_days.items():
    print(f"\n{city}:")
    print(f"  Date: {info['date']}")
    print(f"  Day Index: {info['day_index']}")
    print(f"  Max Temperature: {info['max_temperature']:.1f}°C")

print("\n\n2. MONTHLY AVERAGE TEMPERATURE FOR EACH CITY:")
print("-" * 70)
for city, avg_temp in monthly_avg_temps.items():
    print(f"{city}: {avg_temp:.2f}°C")

print("\n\n3. COUNT OF 5-DAY STRETCHES WITH TEMPERATURE ABOVE MONTHLY AVERAGE:")
print("-" * 70)
for city, count in stretches_above_avg.items():
    print(f"{city}: {count} stretches")

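======================================================================
ASSIGNMENT 1 RESULTS
======================================================================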

1. MAXIMUM TEMPERATURE DAY FOR EACH CITY:
----------------------------------------------------------------------

London:
  Date: 30-08-2024
  Day Index: 90
  Max Temperature: 21.7°C

Tokyo:
  Date: 30-08-2024
  Day Index: 90
  Max Temperature: 31.9°C

Sydney:
  Date: 28-08-2024
  Day Index: 88
  Max Temperature: 16.0°C

Cairo:
  Date: 30-08-2024
  Day Index: 90
  Max Temperature: 36.8°C

Rio:
  Date: 30-08-2024
  Day Index: 90
  Max Temperature: 27.4°C


2. MONTHLY AVERAGE TEMPERATURE FOR EACH CITY:
----------------------------------------------------------------------
London: 19.23°C
Tokyo: 28.26°C
Sydney: 14.31°C
Cairo: 33.37°C
Rio: 24.31°C


3. COUNT OF 5-DAY STRETCHES WITH TEMPERATURE ABOVE MONTHLY AVERAGE:
----------------------------------------------------------------------
London: 42 stretches
Tokyo: 42 stretches
Sydney: 19 stretches
Cairo: 42 stretches
Rio: 36 stretches
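The counts above come from overlapping windows, so a run of seven consecutive days above the average contributes three stretches. The same count can be sketched without an explicit Python loop using `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). The helper name `count_windows_above` and the sample values are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_windows_above(temps, threshold, window_size=5):
    """Count overlapping windows whose values are all above the threshold."""
    temps = np.asarray(temps, dtype=float)
    if temps.size < window_size:
        return 0                          # not enough days for one window
    # Shape: (n_windows, window_size); each row is one sliding window
    windows = sliding_window_view(temps, window_size)
    return int(np.all(windows > threshold, axis=1).sum())

# Made-up values: only the window starting at the second day qualifies
sample = [10, 12, 13, 14, 15, 16, 9, 17]
count = count_windows_above(sample, threshold=11)
print(count)          # 1

# Seven consecutive qualifying days give 7 - 5 + 1 = 3 overlapping windows
run_count = count_windows_above([20] * 7, threshold=11)
print(run_count)      # 3
```

If non-overlapping stretches were wanted instead, the counting rule would need to change; the sketch mirrors the overlapping interpretation used in the notebook.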


Assignment 2:::

The file ‘student_scores.csv’ provides sample data of students and their test scores, along with attendance and project status. Use this CSV file to create a Python program (using pandas) that meets the following requirements. Implement them as functions in a module.

A function that reads the CSV file and returns the subject-wise topper(s) and overall topper(s), such that:
Toppers have at least 60% attendance.
Toppers have submitted the project.

A function that returns a data frame with the following columns added to the original data:
‘Average Score’ : for each student
‘Grade’ : based on average score (A : >= 90; B : 75 .. 89.99; C : 60 .. 74.99; D : < 60)
‘Performance’ :
‘Excellent’ : Grade A and attendance > 90%, project submitted
‘Needs Attention’ : Grade D OR project not submitted OR attendance < 60%
‘Satisfactory’ : all others
A function that exports the summary statistics of the subject-wise marks and attendance to a CSV file.
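The Performance rules listed above can be expressed as boolean masks and combined with `numpy.select`. This is a minimal sketch on made-up rows, assuming the ‘Grade’ column has already been computed; it is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Made-up rows covering each rule
sample = pd.DataFrame({
    "Grade": ["A", "A", "D", "B", "B"],
    "Attendance (%)": [95.0, 85.0, 99.0, 55.0, 80.0],
    "Project Submitted": [True, True, True, True, False],
})

submitted = sample["Project Submitted"].astype(bool)
attendance = sample["Attendance (%)"]

# Excellent: Grade A and attendance > 90%, project submitted
excellent = (sample["Grade"] == "A") & (attendance > 90) & submitted
# Needs Attention: Grade D OR project not submitted OR attendance < 60%
needs_attention = (sample["Grade"] == "D") | ~submitted | (attendance < 60)

# Rows matching neither mask fall through to the default label
sample["Performance"] = np.select(
    [excellent.to_numpy(), needs_attention.to_numpy()],
    ["Excellent", "Needs Attention"],
    default="Satisfactory",
)
print(sample)
```

The two masks cannot both be true for the same row, so their order in `np.select` does not affect the result here.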



In [5]:
# Load and explore the student_scores data
df_students = pd.read_csv('student_scores.csv - 1')

print("Dataset shape:", df_students.shape)
print("\nFirst few rows:")
print(df_students.head())
print("\nColumn names:")
print(df_students.columns.tolist())
print("\nData info:")
print(df_students.info())

Dataset shape: (200, 6)

First few rows:
             Name  Math  Science  English  Attendance (%)  Project Submitted
0    Robert Roman    78       93       96           96.55               True
1  Joseph Sanchez    91       62       60           98.71               True
2  Christina Hall    68       76       57           99.80              False
3       Ann Brown    54       79       94           52.79               True
4  Thomas Herrera    82       96       86           86.85               True

Column names:
['Name', 'Math', 'Science', 'English', 'Attendance (%)', 'Project Submitted']

Data info:
<class 'pandas.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Name               200 non-null    str    
 1   Math               200 non-null    int64  
 2   Science            200 non-null    int64  
 3   English            200 non-null    int64  
 4   Attend

In [6]:
# Function 1: Get subject-wise and overall toppers with conditions
def get_toppers(df):
    """
    Find subject-wise and overall toppers with conditions:
    - At least 60% attendance
    - Project submitted
    
    Args:
        df: DataFrame with student data
    
    Returns:
        Dictionary with 'subject_toppers' and 'overall_toppers'
    """
    # Filter students based on conditions
    filtered_df = df[(df['Attendance (%)'] >= 60) & (df['Project Submitted'] == True)].copy()
    
    if filtered_df.empty:
        return {'subject_toppers': {}, 'overall_toppers': {'toppers': [], 'average_score': 0}}
    
    # Subject-wise toppers
    subjects = ['Math', 'Science', 'English']
    subject_toppers = {}
    
    for subject in subjects:
        max_score = filtered_df[subject].max()
        toppers = filtered_df[filtered_df[subject] == max_score]['Name'].tolist()
        subject_toppers[subject] = {
            'toppers': toppers,
            'score': max_score
        }
    
    # Overall topper(s) - based on average score
    filtered_df['Average_Score'] = filtered_df[subjects].mean(axis=1)
    max_avg = filtered_df['Average_Score'].max()
    overall_toppers = filtered_df[filtered_df['Average_Score'] == max_avg]['Name'].tolist()
    
    result = {
        'subject_toppers': subject_toppers,
        'overall_toppers': {
            'toppers': overall_toppers,
            'average_score': max_avg
        }
    }
    
    return result

# Function 2: Add Average Score, Grade, and Performance columns
def add_student_analysis(df):
    """
    Add columns for Average Score, Grade, and Performance to the dataframe.
    
    Args:
        df: DataFrame with student data
    
    Returns:
        DataFrame with new columns added
    """
    df_analysis = df.copy()
    subjects = ['Math', 'Science', 'English']
    
    # Calculate Average Score
    df_analysis['Average Score'] = df_analysis[subjects].mean(axis=1)
    
    # Assign Grade
    def assign_grade(avg_score):
        if avg_score >= 90:
            return 'A'
        elif avg_score >= 75:
            return 'B'
        elif avg_score >= 60:
            return 'C'
        else:
            return 'D'
    
    df_analysis['Grade'] = df_analysis['Average Score'].apply(assign_grade)
    
    # Assign Performance
    def assign_performance(row):
        grade = row['Grade']
        attendance = row['Attendance (%)']
        project = row['Project Submitted']
        
        # Excellent: Grade A and attendance > 90%, project submitted
        if grade == 'A' and attendance > 90 and project == True:
            return 'Excellent'
        
        # Needs Attention: Grade D OR project not submitted OR attendance < 60%
        if grade == 'D' or project == False or attendance < 60:
            return 'Needs Attention'
        
        # Satisfactory: All others
        return 'Satisfactory'
    
    df_analysis['Performance'] = df_analysis.apply(assign_performance, axis=1)
    
    return df_analysis

# Function 3: Export summary statistics
def export_summary_statistics(df, filename='summary_statistics.csv'):
    """
    Export summary statistics of the subject-wise marks and attendance to a CSV file.
    
    Args:
        df: DataFrame with student data
        filename: Output filename
    
    Returns:
        Summary statistics dataframe
    """
    subjects = ['Math', 'Science', 'English']
    summary_data = []
    
    # Subject-wise statistics
    for subject in subjects:
        summary_data.append({
            'Subject': subject,
            'Mean': df[subject].mean(),
            'Median': df[subject].median(),
            'Std Dev': df[subject].std(),
            'Min': df[subject].min(),
            'Max': df[subject].max()
        })
    
    # Attendance statistics
    summary_data.append({
        'Subject': 'Attendance (%)',
        'Mean': df['Attendance (%)'].mean(),
        'Median': df['Attendance (%)'].median(),
        'Std Dev': df['Attendance (%)'].std(),
        'Min': df['Attendance (%)'].min(),
        'Max': df['Attendance (%)'].max()
    })
    
    summary_df = pd.DataFrame(summary_data)
    summary_df.to_csv(filename, index=False)
    
    return summary_df

print("All functions defined successfully!")

All functions defined successfully!
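The grade boundaries in `assign_grade` can also be applied to the whole column at once with `pandas.cut`. The sketch below uses left-closed bins so that 60, 75 and 90 fall into the higher grade, matching the `>=` comparisons above. The helper name `grade_from_average` and the sample scores are illustrative.

```python
import numpy as np
import pandas as pd

def grade_from_average(avg_scores):
    """Map average scores to grades A-D using left-closed bins (hypothetical helper)."""
    bins = [-np.inf, 60, 75, 90, np.inf]   # [-inf, 60), [60, 75), [75, 90), [90, inf)
    labels = ["D", "C", "B", "A"]
    return pd.cut(avg_scores, bins=bins, labels=labels, right=False).astype(str)

# Made-up averages placed on and around each boundary
scores = pd.Series([59.99, 60.0, 74.99, 75.0, 89.99, 90.0, 100.0])
grades = grade_from_average(scores).tolist()
print(grades)   # ['D', 'C', 'C', 'B', 'B', 'A', 'A']
```

Missing averages would need separate handling, since `pd.cut` leaves them unlabelled.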


In [7]:
# Execute all Assignment 2 functions
print("=" * 80)
print("ASSIGNMENT 2 RESULTS")
print("=" * 80)

# Function 1: Get toppers
print("\n1. SUBJECT-WISE AND OVERALL TOPPERS (with conditions):")
print("-" * 80)
toppers_result = get_toppers(df_students)

print("\nSubject-Wise Toppers:")
for subject, info in toppers_result['subject_toppers'].items():
    print(f"  {subject}: {', '.join(info['toppers'])} (Score: {info['score']})")

print("\nOverall Topper(s):")
print(f"  {', '.join(toppers_result['overall_toppers']['toppers'])} (Average Score: {toppers_result['overall_toppers']['average_score']:.2f})")

# Function 2: Add analysis columns
print("\n\n2. STUDENT ANALYSIS WITH GRADES AND PERFORMANCE:")
print("-" * 80)
df_with_analysis = add_student_analysis(df_students)

# Display relevant columns
display_cols = ['Name', 'Math', 'Science', 'English', 'Average Score', 'Grade', 'Attendance (%)', 'Performance']
print("\nSample of processed data:")
print(df_with_analysis[display_cols].head(10).to_string(index=False))

print("\n\nPerformance Summary:")
performance_counts = df_with_analysis['Performance'].value_counts()
for perf, count in performance_counts.items():
    print(f"  {perf}: {count} students")

print("\n\nGrade Distribution:")
grade_counts = df_with_analysis['Grade'].value_counts().sort_index(ascending=False)
for grade, count in grade_counts.items():
    print(f"  Grade {grade}: {count} students")

# Function 3: Export summary statistics
print("\n\n3. SUMMARY STATISTICS (exported to CSV):")
print("-" * 80)
summary_stats = export_summary_statistics(df_with_analysis)
print(summary_stats.to_string(index=False))
print("\nFile saved as: summary_statistics.csv")

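================================================================================
ASSIGNMENT 2 RESULTS
================================================================================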

1. SUBJECT-WISE AND OVERALL TOPPERS (with conditions):
--------------------------------------------------------------------------------

Subject-Wise Toppers:
  Math: Rachel Mcneil, Ashley Garcia, Heidi Edwards (Score: 99)
  Science: Benjamin Stein (Score: 98)
  English: Warren Harris (Score: 99)

Overall Topper(s):
  Ashley Garcia (Average Score: 93.67)


2. STUDENT ANALYSIS WITH GRADES AND PERFORMANCE:
--------------------------------------------------------------------------------

Sample of processed data:
           Name  Math  Science  English  Average Score Grade  Attendance (%)     Performance
   Robert Roman    78       93       96      89.000000     B           96.55    Satisfactory
 Joseph Sanchez    91       62       60      71.000000     C           98.71    Satisfactory
 Christina Hall    68       76       57      67.000000     C           99.80 Needs Attention
      Ann Brown    54       79       94      75.666667     B           52.79 Needs Attenti

In [8]:
# Display the complete processed dataframe
print("\n" + "=" * 80)
print("COMPLETE STUDENT ANALYSIS DATAFRAME")
print("=" * 80)
print(f"\nTotal Students: {len(df_with_analysis)}\n")

# Show all columns for reference
print("Full dataframe (all columns):")
print(df_with_analysis.to_string(index=False))





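================================================================================
COMPLETE STUDENT ANALYSIS DATAFRAME
================================================================================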

Total Students: 200

Full dataframe (all columns):
               Name  Math  Science  English  Attendance (%)  Project Submitted  Average Score Grade     Performance
       Robert Roman    78       93       96           96.55               True      89.000000     B    Satisfactory
     Joseph Sanchez    91       62       60           98.71               True      71.000000     C    Satisfactory
     Christina Hall    68       76       57           99.80              False      67.000000     C Needs Attention
          Ann Brown    54       79       94           52.79               True      75.666667     B Needs Attention
     Thomas Herrera    82       96       86           86.85               True      88.000000     B    Satisfactory
     Zachary Walker    47       91       74           77.30               True      70.666667     C    Satisfactory
    Melinda Ramirez    60       40       63           85.29               True      54.333333     D

In [9]:
print("\nEnd of Assignment 2 Results")




End of Assignment 2 Results
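Assignment 2 asks for a topper function that reads the CSV file itself, whereas `get_toppers` receives an already loaded data frame. A thin wrapper along the following lines could close that gap. This is a sketch only: the function name `toppers_from_csv`, the `min_attendance` parameter and the in-memory sample data are assumptions, and the column names are taken from the file shown earlier.

```python
import io
import pandas as pd

SUBJECTS = ["Math", "Science", "English"]

def toppers_from_csv(path_or_buffer, min_attendance=60):
    """Read the scores CSV and return eligible toppers per subject and overall."""
    df = pd.read_csv(path_or_buffer)
    # Eligibility: at least 60% attendance and project submitted
    eligible = df[(df["Attendance (%)"] >= min_attendance)
                  & df["Project Submitted"].astype(bool)]
    result = {}
    if eligible.empty:
        return result
    for subject in SUBJECTS:
        best = eligible[subject].max()
        # Keep every student tied on the best score
        result[subject] = eligible.loc[eligible[subject] == best, "Name"].tolist()
    average = eligible[SUBJECTS].mean(axis=1)
    result["Overall"] = eligible.loc[average == average.max(), "Name"].tolist()
    return result

# Made-up data: Ben fails the attendance rule, Dev has no project
csv_text = """Name,Math,Science,English,Attendance (%),Project Submitted
Asha,90,80,70,95.0,True
Ben,99,99,99,50.0,True
Cara,85,95,60,88.0,True
Dev,100,100,100,99.0,False
"""
toppers = toppers_from_csv(io.StringIO(csv_text))
print(toppers)
```

In the sample, Asha and Cara both average 80, so both appear as overall toppers. Placing this function together with `add_student_analysis` and `export_summary_statistics` in one `.py` file would satisfy the "functions in a module" requirement.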
