# Week 1 - Day 3 Lab: Data & Matrix Manipulation
In this lab, you'll work with a realistic weather dataset. You'll use **Pandas** to explore and clean the data, and **NumPy** to perform matrix operations.

**Dataset:** `hourly_weather_10_days.csv` (10 days of hourly weather data)

## Step 1: Load the Data
- Use Pandas to load the CSV file
- Display the first few rows
- Check the number of rows and columns
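
The checklist above can be sketched on a small inline sample so that it runs without the CSV file. The three sample rows are copied from the first rows of the dataset shown in this lab; `sample_csv` and `sample_df` are illustrative names only, and `shape` is one common way (not the only one) to get the row and column counts.

```python
import io
import pandas as pd

# Small inline sample standing in for hourly_weather_10_days.csv.
# Assumption: same column layout as the real file; the third row has a
# missing pressure reading.
sample_csv = io.StringIO(
    "timestamp,temperature_C,humidity_%,wind_speed_kmph,pressure_hPa,visibility_km\n"
    "2023-03-01 00:00:00,16.6,74.4,5.7,1012.5,9.5\n"
    "2023-03-01 01:00:00,16.2,78.5,5.0,1012.1,10.3\n"
    "2023-03-01 02:00:00,15.3,73.3,4.7,,11.1\n"
)
sample_df = pd.read_csv(sample_csv, parse_dates=["timestamp"])

n_rows, n_cols = sample_df.shape  # shape is a (rows, columns) tuple
print(sample_df.head())
print(f"{n_rows} rows x {n_cols} columns")
```

Parsing `timestamp` at load time with `parse_dates` is optional; it only saves a later `pd.to_datetime` call.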

In [13]:
# TODO: Load the data into a DataFrame
import pandas as pd

# Replace the file path if needed
df = pd.read_csv('hourly_weather_10_days.csv')
df.head()

             timestamp  temperature_C  humidity_%  wind_speed_kmph  pressure_hPa  visibility_km
0  2023-03-01 00:00:00           16.6        74.4              5.7        1012.5            9.5
1  2023-03-01 01:00:00           16.2        78.5              5.0        1012.1           10.3
2  2023-03-01 02:00:00           15.3        73.3              4.7           NaN           11.1
3  2023-03-01 03:00:00           15.8        72.4              1.3        1005.0            8.9
4  2023-03-01 04:00:00           20.9        70.6              6.8        1016.3            9.8


## Step 2: Basic Exploration
- Check column names and data types
- Display basic statistics using `.describe()`
- Count missing values in each column

In [2]:
# TODO: Explore the DataFrame
print(df.info())
print(df.describe())
print(df.isna().sum())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   timestamp        240 non-null    object 
 1   temperature_C    228 non-null    float64
 2   humidity_%       224 non-null    float64
 3   wind_speed_kmph  226 non-null    float64
 4   pressure_hPa     223 non-null    float64
 5   visibility_km    228 non-null    float64
dtypes: float64(5), object(1)
memory usage: 11.4+ KB
None
       temperature_C  humidity_%  wind_speed_kmph  pressure_hPa  visibility_km
count     228.000000  224.000000       226.000000    223.000000     228.000000
mean       21.315789   66.795982        10.105310   1011.884753       9.989474
std         3.421237    8.190300         3.940668      5.187080       1.022166
min        11.500000   47.800000         1.300000    998.100000       6.800000
25%        18.700000   61.075000         6.625000   1008.900000       9.275


## Step 3: Handle Missing Values
- Drop or fill missing values
- Justify your approach (e.g., fill with mean, forward fill, etc.)
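
To help with the justification, the sketch below compares three common fill strategies on a tiny made-up series. The values are hypothetical and not taken from the dataset; `fillna`, `ffill`, and `interpolate` are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings with two gaps (not taken from the dataset).
s = pd.Series([10.0, np.nan, 14.0, np.nan, 18.0])

mean_filled = s.fillna(s.mean())  # every gap gets the overall mean (14.0)
ffilled = s.ffill()               # every gap repeats the previous reading
interpolated = s.interpolate()    # every gap is filled by linear interpolation

print(pd.DataFrame({"mean": mean_filled, "ffill": ffilled, "interp": interpolated}))
```

For hourly weather readings, a mean fill ignores the time of day, while forward fill and interpolation follow the local trend. Which one is appropriate depends on how long the gaps are, so treat this as a starting point for your own justification.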

In [12]:
# Check whether there are missing values
print('Columns having missing values:', df.isnull().sum())


# TODO: Fill missing values
# Example: df['column'] = df['column'].fillna(df['column'].mean())
df['temperature_C'] = df['temperature_C'].fillna(df['temperature_C'].mean())
df['humidity_%'] = df['humidity_%'].fillna(df['humidity_%'].mean())
df['wind_speed_kmph'] = df['wind_speed_kmph'].fillna(df['wind_speed_kmph'].mean())
df['pressure_hPa'] = df['pressure_hPa'].fillna(df['pressure_hPa'].mean())
df['visibility_km'] = df['visibility_km'].fillna(df['visibility_km'].mean())


print('Missing values per column after filling:', df.isnull().sum())

# Fill in your logic here
# Alternative: fill every numeric column with its own mean in a single call
df2 = df.fillna(df.mean(numeric_only=True))

Columns having missing values: timestamp          0
temperature_C      0
humidity_%         0
wind_speed_kmph    0
pressure_hPa       0
visibility_km      0
dtype: int64
Missing values per column after filling: timestamp          0
temperature_C      0
humidity_%         0
wind_speed_kmph    0
pressure_hPa       0
visibility_km      0
dtype: int64




## Step 4: Data Analysis
- Calculate daily average temperature
- Find max, min, mean for each metric
- Which hour of the day is the most humid on average?
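
The first and third bullets both come down to a `groupby` on part of the timestamp. Below is a minimal sketch on a synthetic two-day frame. The data is made up, the column names are assumed to match the lab's `df`, and `demo` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Synthetic two-day hourly frame (assumption: same column names as the lab's df).
hours = np.arange(48)
demo = pd.DataFrame({
    "timestamp": pd.Timestamp("2023-03-01") + pd.to_timedelta(hours, unit="h"),
    "temperature_C": np.where(hours < 24, 20.0, 22.0),   # day 1 = 20, day 2 = 22
    "humidity_%": 60.0 + (hours % 24 == 1) * 20.0,       # 01:00 is the humid hour
})

# Daily average temperature: group by the calendar date of each timestamp.
daily_avg_temp = demo.groupby(demo["timestamp"].dt.date)["temperature_C"].mean()

# Most humid hour on average: group by hour of day, then take the index of the max.
avg_humidity_by_hour = demo.groupby(demo["timestamp"].dt.hour)["humidity_%"].mean()
most_humid_hour = avg_humidity_by_hour.idxmax()

print(daily_avg_temp)
print("Most humid hour on average:", most_humid_hour)
```

`idxmax()` returns the group label (here the hour) rather than the humidity value, which is what the third bullet asks for.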

In [35]:
# TODO: Perform analysis
# Use groupby, aggregation, and filtering functions
# Placeholder example:
# df['timestamp'] = pd.to_datetime(df['timestamp'])
# df['hour'] = df['timestamp'].dt.hour
# avg_humidity_by_hour = df.groupby('hour')['humidity_%'].mean()

# Overall average of the temperature column
# (this is the mean over all 10 days, not a per-day average)
average = df['temperature_C'].mean()
print('TASK 01: average of the temperature column is:', average)

# Find max, min, mean for each metric
print("\n\nTASK 02: FINDING MAX, MIN, MEAN FOR EVERY COLUMN")
summary = df.describe().loc[['min', 'mean', 'max']]
print(summary)

print("\n\nTASK 03: average humidity by hour of the day")
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['hour'] = df['timestamp'].dt.hour
avg_humidity_by_hour = df.groupby('hour')['humidity_%'].mean()
print(avg_humidity_by_hour)

TASK 01: average of the temperature column is: 21.31578947368421


TASK 02: FINDING MAX, MIN, MEAN FOR EVERY COLUMN
                timestamp  temperature_C  humidity_%  wind_speed_kmph  \
min   2023-03-01 00:00:00      11.500000   47.800000          1.30000   
mean  2023-03-05 23:30:00      21.315789   66.795982         10.10531   
max   2023-03-10 23:00:00      28.700000   88.100000         17.80000   

      pressure_hPa  visibility_km  hour           timestamp1  
min     998.100000       6.800000   0.0  2023-03-01 00:00:00  
mean   1011.884753       9.989474  11.5  2023-03-05 23:30:00  
max    1027.000000      12.600000  23.0  2023-03-10 23:00:00  


TASK 03: average humidity by hour of the day
hour
0     78.170000
1     78.420000
2     75.414286
3     71.940000
4     69.310000
5     68.611111
6     65.770000
7     65.044444
8     63.490000
9     59.650000
10    58.710000
11    58.910000
12    59.422222
13    58.330000
14    61.366667
15    60.888889
16    59.600000
17    64.030000
1

## Step 5: NumPy Matrix Exercises
Convert relevant DataFrame columns into NumPy arrays and perform matrix operations.

In [39]:
# TODO: Extract temperature and wind_speed as NumPy arrays
import numpy as np

temp = df['temperature_C'].to_numpy()
wind = df['wind_speed_kmph'].to_numpy()

type(temp)  # confirm the column is now a NumPy array

numpy.ndarray

### a) Reshape into matrix form
- Assume each row is a day
- Reshape temperature into a (10, 24) matrix
- Calculate daily min, max, and mean using axis-based operations
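
As a rough illustration of what "axis-based" means here, the sketch below reshapes a synthetic 48-element array into two days of 24 hours. The data is made up, and `hourly` and `matrix` are illustrative names. With one row per day, `axis=1` collapses the hourly columns and leaves one value per day.

```python
import numpy as np

# Synthetic stand-in for 2 days x 24 hourly readings (not the lab data).
hourly = np.arange(48, dtype=float)
hourly[5] = np.nan  # one missing reading, to show why the nan-aware functions matter

# -1 lets NumPy infer the number of days; reshape raises ValueError if the
# length is not a multiple of 24.
matrix = hourly.reshape(-1, 24)

# axis=1 aggregates across the 24 columns of each row, giving one value per day.
daily_min = np.nanmin(matrix, axis=1)
daily_max = np.nanmax(matrix, axis=1)
daily_mean = np.nanmean(matrix, axis=1)

print(matrix.shape)
print(daily_min, daily_max, daily_mean)
```

Plain `np.min`, `np.max`, and `np.mean` would return `nan` for any day that contains a missing reading, which is why the nan-aware variants are used in this sketch.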

In [49]:
# TODO: Reshape and aggregate
# Hint: temp_matrix = temp.reshape((10, 24))
# Write functions to find min, max, mean across rows

temp_matrix = temp.reshape((10, 24))


daily_min = np.nanmin(temp_matrix, axis=1)
daily_max = np.nanmax(temp_matrix, axis=1)
daily_mean = np.nanmean(temp_matrix, axis=1)

print('Min: ',daily_min)
print('Max: ',daily_max)
print('Mean: ',daily_mean)

Min:  [14.7 15.7 13.6 15.9 12.4 15.5 15.3 13.5 14.3 11.5]
Max:  [28.2 28.7 25.7 27.1 24.9 26.2 25.9 26.  27.1 28.5]
Mean:  [21.26086957 21.25652174 21.30434783 21.43043478 21.53913043 21.85833333
 21.17391304 20.8952381  20.76956522 21.62272727]


### b) Normalize the temperature matrix
- Subtract the mean and divide by std deviation
- Do it manually using NumPy functions
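
The formula is the usual z-score, `z = (x - mean) / std`. The task does not say whether the mean and standard deviation should be taken over the whole matrix or per day, so the sketch below shows both on a tiny made-up matrix. The function name `zscore` and its `axis` parameter are illustrative, not part of the lab.

```python
import numpy as np

def zscore(matrix, axis=None):
    """Z-score normalize while ignoring NaNs.

    axis=None uses one global mean/std; axis=1 normalizes each row (day)
    separately. A zero std (constant data) would produce inf/nan and is
    not handled in this sketch.
    """
    mean = np.nanmean(matrix, axis=axis, keepdims=True)
    std = np.nanstd(matrix, axis=axis, keepdims=True)
    return (matrix - mean) / std  # keepdims=True makes this broadcast correctly

demo = np.array([[1.0, 2.0, 3.0],
                 [10.0, np.nan, 30.0]])

global_z = zscore(demo)           # one mean/std for the whole matrix
per_row_z = zscore(demo, axis=1)  # each row ends up with mean 0 and std 1
print(global_z)
print(per_row_z)
```

Missing readings stay `nan` after normalization in both variants; they are skipped when computing the statistics but are not filled in.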

In [50]:
# TODO: Normalize temp_matrix
# Placeholder for function: def normalize(matrix):
# return ...

# Apply it to temp_matrix
def normalize(matrix):
    mean = np.nanmean(matrix)
    std = np.nanstd(matrix)

    return (matrix - mean) / std

normalized_temp_matrix = normalize(temp_matrix)
print(normalized_temp_matrix)

[[-1.38142018 -1.49859422 -1.76223579 -1.61576825 -0.12179932 -0.15109283
   0.43477733  0.34689681 -0.0339188   2.01662679         nan  1.25499557
   1.19640855  0.61053838  1.84086574  0.72771242  0.20042927  0.69841891
   0.52265786  0.22972278 -0.91272405 -0.56120195 -0.47332142 -1.93799684]
 [-1.58647474 -1.64506176 -1.0884851  -0.3854409   0.20042927 -0.0339188
   0.40548383  0.05396173  0.6691254   2.16309433 -0.23897336  1.10852803
   0.22972278  0.93276698  0.37619032  0.17113576  0.75700593         nan
   0.84488645  0.20042927 -0.41473441 -0.736963   -1.41071369 -0.97131107]
 [-2.26022543 -1.64506176 -0.82484352 -1.58647474  1.28428908 -0.12179932
  -0.15109283 -0.18038634  0.28830979  1.28428908  1.0206475  -0.0339188
   1.22570206  0.17113576  0.3176033   0.99135399  0.90347347         nan
   0.58124488  0.90347347  0.40548383 -0.12179932 -0.79555002 -1.73294228]
 [-1.35212667 -0.76625651 -1.00060457 -1.55718123  0.05396173  0.22972278
   1.69439819  0.69841891  0.17113576

### c) Apply custom mask/filter
- Create a mask for wind speed > 15 kmph
- Use it to extract high-wind readings
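
A boolean mask is an array of `True`/`False` values with the same shape as the data. A minimal sketch on made-up wind speeds follows; `wind_demo` and its values are hypothetical.

```python
import numpy as np

# Hypothetical wind speeds in kmph (not taken from the dataset).
wind_demo = np.array([5.7, 16.0, np.nan, 15.0, 17.6, 12.3])

mask = wind_demo > 15            # boolean array; a NaN comparison evaluates to False
high_wind = wind_demo[mask]      # keeps only the values where the mask is True
high_idx = np.flatnonzero(mask)  # positions of those readings in the original array

print(high_wind, high_idx, mask.sum())
```

Because the comparison is strict, a reading of exactly 15.0 is excluded, and `mask.sum()` counts the matches since `True` is treated as 1.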

In [60]:
# TODO: Create boolean mask and filter wind speeds
# mask = wind > 15
# high_wind = wind[mask]
mask = wind > 15  # boolean NumPy mask built from the array, not the DataFrame column
high_wind = wind[mask]
print('These are the values having speed > 15:\n', high_wind)


These are the values having speed > 15:
 [17.6 16.  16.5 16.3 16.7 15.8 17.8 15.1 16.3 15.2 17.  15.9 15.6 15.8
 15.4 15.6 16.3 15.3 16.2 16.9 15.3 15.2 15.5 17.4 17.4 15.4 15.4 16.5
 17.  15.7]


## Final Challenge: Write Your Own Function
Write a function `daily_summary(matrix)` that takes a NumPy matrix of shape (10, 24) and returns a summary dictionary for each day.
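
One possible shape for the return value is sketched below. It is a hypothetical variant, not the required solution: `daily_summary_dict` is an illustrative name, the result is keyed by 1-based day number, and the values are converted to plain Python floats so they print as `14.7` rather than `np.float64(14.7)`. The demo matrix is 2 x 3 for brevity, but the function works the same way on a (10, 24) matrix.

```python
import numpy as np

def daily_summary_dict(matrix):
    """Return {day_number: {'min', 'max', 'mean'}} for each row, ignoring NaNs."""
    mins = np.nanmin(matrix, axis=1)
    maxs = np.nanmax(matrix, axis=1)
    means = np.nanmean(matrix, axis=1)
    return {
        day: {"min": float(lo), "max": float(hi), "mean": float(avg)}
        for day, (lo, hi, avg) in enumerate(zip(mins, maxs, means), start=1)
    }

# Made-up readings: two "days" of three values each, one value missing.
demo = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])
print(daily_summary_dict(demo))
```

Computing the three statistics once with `axis=1` avoids a Python-level loop over the hourly values, though a per-row loop is equally valid for a matrix this small.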

In [63]:
# TODO: Write and test your function
# Example usage:
# summaries = daily_summary(temp_matrix)
def daily_summary(matrix):
    summary_list = []

    for day in matrix:
        day_summary = {
            'min': np.nanmin(day),
            'max': np.nanmax(day),
            'mean': np.nanmean(day)
        }
        summary_list.append(day_summary)

    return summary_list

# Call the function on the temperature matrix
summaries = daily_summary(temp_matrix)

# Print the summary for each day
for i, summary in enumerate(summaries):
    print(f"Day {i + 1} Summary: {summary}")


Day 1 Summary: {'min': np.float64(14.7), 'max': np.float64(28.2), 'mean': np.float64(21.260869565217394)}
Day 2 Summary: {'min': np.float64(15.7), 'max': np.float64(28.7), 'mean': np.float64(21.256521739130434)}
Day 3 Summary: {'min': np.float64(13.6), 'max': np.float64(25.7), 'mean': np.float64(21.304347826086957)}
Day 4 Summary: {'min': np.float64(15.9), 'max': np.float64(27.1), 'mean': np.float64(21.430434782608696)}
Day 5 Summary: {'min': np.float64(12.4), 'max': np.float64(24.9), 'mean': np.float64(21.539130434782606)}
Day 6 Summary: {'min': np.float64(15.5), 'max': np.float64(26.2), 'mean': np.float64(21.858333333333334)}
Day 7 Summary: {'min': np.float64(15.3), 'max': np.float64(25.9), 'mean': np.float64(21.17391304347826)}
Day 8 Summary: {'min': np.float64(13.5), 'max': np.float64(26.0), 'mean': np.float64(20.895238095238096)}
Day 9 Summary: {'min': np.float64(14.3), 'max': np.float64(27.1), 'mean': np.float64(20.769565217391307)}
Day 10 Summary: {'min': np.float64(11.5), 'max'

## ✅ Submit your notebook once complete.
- Add comments where necessary