# Week 1 - Day 3 Lab: Data & Matrix Manipulation
In this lab, you'll work with a realistic weather dataset. You'll use **Pandas** to explore and clean the data, and **NumPy** to perform matrix operations.

**Dataset:** `hourly_weather_10_days.csv` (10 days of hourly weather data)

## Step 1: Load the Data
- Use Pandas to load the CSV file
- Display the first few rows
- Check the number of rows and columns

In [1]:
# TODO: Load the data into a DataFrame
import pandas as pd

# Replace the file path if needed
df = pd.read_csv('hourly_weather_10_days.csv')
print(df.head())

n_rows, n_cols = df.shape
print(f'The dataset contains {n_rows} rows.')
print(f'The dataset contains {n_cols} columns.')

             timestamp  temperature_C  humidity_%  wind_speed_kmph  \
0  2023-03-01 00:00:00           16.6        74.4              5.7   
1  2023-03-01 01:00:00           16.2        78.5              5.0   
2  2023-03-01 02:00:00           15.3        73.3              4.7   
3  2023-03-01 03:00:00           15.8        72.4              1.3   
4  2023-03-01 04:00:00           20.9        70.6              6.8   

   pressure_hPa  visibility_km  
0        1012.5            9.5  
1        1012.1           10.3  
2           NaN           11.1  
3        1005.0            8.9  
4        1016.3            9.8  
The dataset contains 240 rows.
The dataset contains 6 columns.
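
The `timestamp` column loads as plain text by default. As a minimal sketch, assuming a small in-memory CSV that only mimics the layout of the lab file (the `sample_csv` contents are made up for illustration), `parse_dates` can convert it to datetimes while reading:

```python
import io

import pandas as pd

# Hypothetical three-row sample that mimics the lab file's layout.
sample_csv = io.StringIO(
    "timestamp,temperature_C,pressure_hPa\n"
    "2023-03-01 00:00:00,16.6,1012.5\n"
    "2023-03-01 01:00:00,16.2,1012.1\n"
    "2023-03-01 02:00:00,15.3,\n"
)

# parse_dates converts the named column to datetime64 during loading.
sample = pd.read_csv(sample_csv, parse_dates=["timestamp"])

# shape is a (rows, columns) tuple, so it can be unpacked in one step.
n_rows, n_cols = sample.shape
print(sample.dtypes)
print(f"{n_rows} rows, {n_cols} columns")
```

Parsing at load time is optional; converting later with `pd.to_datetime` gives the same result.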


## Step 2: Basic Exploration
- Check column names and data types
- Display basic statistics using `.describe()`
- Count missing values in each column

In [2]:
# TODO: Explore the DataFrame
df.info()  # info() prints its report directly and returns None, so no print() is needed
print(df.describe())

print(df.isna().sum())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   timestamp        240 non-null    object 
 1   temperature_C    228 non-null    float64
 2   humidity_%       224 non-null    float64
 3   wind_speed_kmph  226 non-null    float64
 4   pressure_hPa     223 non-null    float64
 5   visibility_km    228 non-null    float64
dtypes: float64(5), object(1)
memory usage: 11.4+ KB
       temperature_C  humidity_%  wind_speed_kmph  pressure_hPa  visibility_km
count     228.000000  224.000000       226.000000    223.000000     228.000000
mean       21.315789   66.795982        10.105310   1011.884753       9.989474
std         3.421237    8.190300         3.940668      5.187080       1.022166
min        11.500000   47.800000         1.300000    998.100000       6.800000
25%        18.700000   61.075000         6.625000   1008.900000       9.275

## Step 3: Handle Missing Values
- Drop or fill missing values
- Justify your approach (e.g., fill with mean, forward fill, etc.)

In [3]:
# TODO: Fill missing values
# Approach: fill each numeric column with its own mean. At most 17 of the
# 240 values are missing in any column, so mean imputation keeps every
# hourly row and leaves the column means unchanged.
print(df.isna().sum())  # missing values before filling

# Equivalent to df['col'] = df['col'].fillna(df['col'].mean()) for every numeric column
df = df.fillna(df.mean(numeric_only=True))

print(df.isna().sum())  # missing values after filling


timestamp           0
temperature_C      12
humidity_%         16
wind_speed_kmph    14
pressure_hPa       17
visibility_km      12
dtype: int64
timestamp          0
temperature_C      0
humidity_%         0
wind_speed_kmph    0
pressure_hPa       0
visibility_km      0
dtype: int64
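
To help justify a choice between the strategies named in this step, the sketch below compares three of them on a short, made-up series (the `temps` values are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings with two gaps.
temps = pd.Series([16.0, np.nan, 18.0, 20.0, np.nan, 24.0])

mean_filled = temps.fillna(temps.mean())  # same global value in every gap
forward_filled = temps.ffill()            # repeat the last observed value
interpolated = temps.interpolate()        # linear between the neighbours

comparison = pd.DataFrame({
    "original": temps,
    "mean": mean_filled,
    "ffill": forward_filled,
    "interpolate": interpolated,
})
print(comparison)
```

Mean filling ignores the time of day, while forward fill and interpolation use neighbouring readings, which may suit hourly weather data better. Which one is appropriate depends on how long the gaps are.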


## Step 4: Data Analysis
- Calculate daily average temperature
- Find max, min, mean for each metric
- Which hour of the day is the most humid on average?

In [6]:
# TODO: Perform analysis
# Use groupby, aggregation, and filtering functions

# Calculate the average temperature for each day.
# Grouping by day-of-month is safe here because all 10 days fall in the same month.
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['days'] = df['timestamp'].dt.day
avg_temp_by_day = df.groupby('days')['temperature_C'].mean()
print(avg_temp_by_day)

# Find the max, min, and mean of each metric
metrics_summary = df[['temperature_C', 'humidity_%', 'wind_speed_kmph', 'pressure_hPa', 'visibility_km']].agg(['max', 'min', 'mean'])
print(metrics_summary)

# In which hour of the day is the humidity the highest on average?
df['hour'] = df['timestamp'].dt.hour  # timestamp is already datetime
avg_humidity_by_hour = df.groupby('hour')['humidity_%'].mean()
most_humid_hour = avg_humidity_by_hour.idxmax()
print(f"Most humid hour on average: {most_humid_hour}:00")




days
1     21.263158
2     21.258991
3     21.304825
4     21.425658
5     21.529825
6     21.858333
7     21.179825
8     20.947807
9     20.792325
10    21.597149
Name: temperature_C, dtype: float64
      temperature_C  humidity_%  wind_speed_kmph  pressure_hPa  visibility_km
max       28.700000   88.100000         17.80000   1027.000000      12.600000
min       11.500000   47.800000          1.30000    998.100000       6.800000
mean      21.315789   66.795982         10.10531   1011.884753       9.989474
Most humid hour on average: 1:00
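
Grouping by day-of-month would merge days from different months if the data ever spanned a month boundary. As a hedged sketch on made-up data (the `toy` frame is hypothetical), grouping by the full calendar date avoids that:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 48 hourly readings that cross a month boundary.
timestamps = pd.Timestamp("2023-03-31") + pd.to_timedelta(np.arange(48), unit="h")
toy = pd.DataFrame({
    "timestamp": timestamps,
    "temperature_C": np.arange(48, dtype=float),
})

# Group by the full calendar date instead of the day-of-month number.
daily_mean = toy.groupby(toy["timestamp"].dt.date)["temperature_C"].mean()
print(daily_mean)

# Hour-of-day averages work the same way as in the lab cell.
toy["hour"] = toy["timestamp"].dt.hour
hourly_mean = toy.groupby("hour")["temperature_C"].mean()
peak_hour = hourly_mean.idxmax()
print(f"Hour with the highest average: {peak_hour}:00")
```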


## Step 5: NumPy Matrix Exercises
Convert relevant DataFrame columns into NumPy arrays and perform matrix operations.

In [7]:
# TODO: Extract temperature and wind_speed as NumPy arrays
import numpy as np

temp = df['temperature_C'].to_numpy()
wind = df['wind_speed_kmph'].to_numpy()

### a) Reshape into matrix form
- Assume each row is a day
- Reshape temperature into a (10, 24) matrix
- Calculate daily min, max, and mean using axis-based operations

In [None]:
# TODO: Reshape and aggregate
# Hint: temp_matrix = temp.reshape((10, 24))
# Write functions to find min, max, mean across rows

shaped_temp = temp.reshape((10,24))
#print(shaped_temp)

max_temp_row = np.max(shaped_temp, axis=1)
print("Maximum Temperature in a day: ",max_temp_row)
min_temp_row = np.min(shaped_temp, axis=1)
print("Minimum Temperature in a day: ",min_temp_row)
mean_temp_row = np.round(np.mean(shaped_temp, axis=1), 1)
print("Average Temperature in a day: ",mean_temp_row)

Maximum Temperature in a day:  [28.2 28.7 25.7 27.1 24.9 26.2 25.9 26.  27.1 28.5]
Minimum Temperature in a day:  [14.7 15.7 13.6 15.9 12.4 15.5 15.3 13.5 14.3 11.5]
Average Temperature in a day:  [21.3 21.3 21.3 21.4 21.5 21.9 21.2 20.9 20.8 21.6]
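
The axis argument decides which dimension is collapsed. The sketch below uses a made-up 3 x 4 array in place of the (10, 24) matrix to show the difference (the `readings` values are hypothetical):

```python
import numpy as np

# Hypothetical stand-in: 3 "days" with 4 readings each instead of 10 x 24.
readings = np.arange(12, dtype=float)

# reshape fills row by row, so the first 4 readings become row 0 (day 1).
matrix = readings.reshape(3, 4)

daily_min = matrix.min(axis=1)    # axis=1 collapses the columns: one value per day
daily_max = matrix.max(axis=1)
daily_mean = matrix.mean(axis=1)
slot_mean = matrix.mean(axis=0)   # axis=0 collapses the rows: one value per time slot

print(daily_min, daily_max, daily_mean, slot_mean)
```

This relies on the readings being sorted by time with no missing hours; otherwise the rows would not line up with calendar days.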


### b) Normalize the temperature matrix
- Subtract the mean and divide by std deviation
- Do it manually using NumPy functions

In [24]:
# TODO: Normalize temp_matrix
def normalize(matrix):
    """Z-score each row (day): subtract the row mean, then divide by the row std."""
    mean = np.mean(matrix, axis=1)
    std = np.std(matrix, axis=1)
    # np.newaxis turns the (10,) vectors into (10, 1) columns so they broadcast across the 24 hours
    normalized_matrix = (matrix - mean[:, np.newaxis]) / std[:, np.newaxis]
    return normalized_matrix

# Apply it to the reshaped temperature matrix
normalized_temp = normalize(shaped_temp)
print("Normalized Temperature Matrix: ",normalized_temp)

print("Minimum normalized value: ",normalized_temp.min())
print("Maximum normalized value: ",normalized_temp.max())

Normalized Temperature Matrix:  [[-1.27475461 -1.3841015  -1.63013202 -1.4934484  -0.09927547 -0.1266122
   0.42012229  0.33811211 -0.0172653   1.89630539  0.01438775  1.18555056
   1.13087711  0.58414263  1.73228504  0.69348953  0.20142849  0.6661528
   0.50213246  0.22876522 -0.83736702 -0.50932633 -0.42731616 -1.79415237]
 [-1.7532813  -1.81871456 -1.1970986  -0.41189949  0.24243309 -0.01929994
   0.4714495   0.07884995  0.76589917  2.43444727 -0.24831635  1.25664861
   0.27514972  1.06034883  0.43873287  0.20971646  0.86404905  0.01858247
   0.96219894  0.24243309 -0.44461612 -0.80449905 -1.55698153 -1.06623208]
 [-2.26136292 -1.64501376 -0.82321488 -1.58631384  1.28998223 -0.11881585
  -0.14816581 -0.17751577  0.29208359  1.28998223  1.02583259 -0.03076597
   1.23128231  0.17468375  0.32143355  0.99648263  0.90843275  0.0032182
   0.58558319  0.90843275  0.40948343 -0.11881585 -0.79386492 -1.73306364]
 [-1.50871823 -0.87019625 -1.12560504 -1.73220093  0.02373453  0.21529113
   1.8
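
One way to check a row-wise normalization is to confirm that every row ends up with mean 0 and standard deviation 1. The sketch below does this on randomly generated data; the helper name `zscore_rows` and the `toy` matrix are assumptions for illustration, and `keepdims=True` is used as an alternative to `np.newaxis`:

```python
import numpy as np

def zscore_rows(matrix):
    """Subtract each row's mean and divide by that row's standard deviation."""
    # keepdims=True keeps shape (n_rows, 1), so the result broadcasts over columns.
    mean = matrix.mean(axis=1, keepdims=True)
    std = matrix.std(axis=1, keepdims=True)
    return (matrix - mean) / std

# Hypothetical data with the same shape as the lab's temperature matrix.
rng = np.random.default_rng(0)
toy = rng.normal(loc=21.0, scale=3.0, size=(10, 24))

z = zscore_rows(toy)
print(np.allclose(z.mean(axis=1), 0.0))  # each row is centred on 0
print(np.allclose(z.std(axis=1), 1.0))   # each row has unit spread
```

A row with zero variance would cause a division by zero; this sketch does not guard against that case.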

### c) Apply custom mask/filter
- Create a mask for wind speed > 15 kmph
- Use it to extract high-wind readings

In [None]:
# TODO: Create boolean mask and filter wind speeds
# mask = wind > 15
# high_wind = wind[mask]

boolean_mask = wind > 15
high_wind = wind[boolean_mask]
print("High Wind Speeds: ",high_wind)

High Wind Speeds:  [17.6 16.  16.5 16.3 16.7 15.8 17.8 15.1 16.3 15.2 17.  15.9 15.6 15.8
 15.4 15.6 16.3 15.3 16.2 16.9 15.3 15.2 15.5 17.4 17.4 15.4 15.4 16.5
 17.  15.7]


## Final Challenge: Write Your Own Function
Write a function `daily_summary(matrix)` that takes a NumPy matrix of shape (10, 24) and returns a list with one summary dictionary (min, max, mean) per day.

In [None]:
# TODO: Write and test your function
def daily_summary(matrix):
    """Return a list of dicts, one per row (day), with the max, min, and mean temperature."""
    max_temp = np.max(matrix, axis=1)
    min_temp = np.min(matrix, axis=1)
    mean_temp = np.mean(matrix, axis=1)
    summary = []
    for i in range(len(max_temp)):
        summary.append({
            'day': i + 1,
            'max_temp': max_temp[i],
            'min_temp': min_temp[i],
            'mean_temp': mean_temp[i]
        })
    return summary

# Call the function with the reshaped temperature matrix
daily_summary(shaped_temp)



[{'day': 1,
  'max_temp': np.float64(28.2),
  'min_temp': np.float64(14.7),
  'mean_temp': np.float64(21.263157894736846)},
 {'day': 2,
  'max_temp': np.float64(28.7),
  'min_temp': np.float64(15.7),
  'mean_temp': np.float64(21.258991228070176)},
 {'day': 3,
  'max_temp': np.float64(25.7),
  'min_temp': np.float64(13.6),
  'mean_temp': np.float64(21.304824561403507)},
 {'day': 4,
  'max_temp': np.float64(27.1),
  'min_temp': np.float64(15.9),
  'mean_temp': np.float64(21.425657894736844)},
 {'day': 5,
  'max_temp': np.float64(24.9),
  'min_temp': np.float64(12.4),
  'mean_temp': np.float64(21.52982456140351)},
 {'day': 6,
  'max_temp': np.float64(26.2),
  'min_temp': np.float64(15.5),
  'mean_temp': np.float64(21.858333333333334)},
 {'day': 7,
  'max_temp': np.float64(25.9),
  'min_temp': np.float64(15.3),
  'mean_temp': np.float64(21.17982456140351)},
 {'day': 8,
  'max_temp': np.float64(26.0),
  'min_temp': np.float64(13.5),
  'mean_temp': np.float64(20.947807017543862)},
 {'day': 9
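
The `np.float64(...)` wrappers in the output come from storing NumPy scalars in the dictionaries. As a hedged sketch, a variant that converts to plain Python floats could look like this; the name `daily_summary_plain`, the rounding to two decimals, and the `toy` matrix are assumptions for illustration:

```python
import numpy as np

def daily_summary_plain(matrix):
    """Hypothetical variant: one dict per row, holding plain Python floats."""
    return [
        {
            "day": i + 1,
            "max_temp": float(row.max()),
            "min_temp": float(row.min()),
            "mean_temp": round(float(row.mean()), 2),
        }
        for i, row in enumerate(matrix)
    ]

# Hypothetical 2 x 3 matrix standing in for the (10, 24) temperature matrix.
toy = np.array([[14.0, 20.0, 26.0],
                [15.0, 21.0, 24.0]])

summary = daily_summary_plain(toy)
for entry in summary:
    print(entry)
```

Plain floats print more compactly and serialize to JSON without extra handling.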

## ✅ Submit your notebook once complete.
- Add comments where necessary