# **Assignment: Exploring Weather Data Using NumPy**

---

## Objective

This assignment focuses on analyzing weather data using **NumPy**. You will:
1. Create and manipulate datasets.
2. Perform statistical computations on NumPy arrays.
3. Solve real-world problems involving temperature data.

---

## Dataset Description

The dataset represents daily temperature records for one year (365 days). It consists of the following columns:
1. **Day of the Year**: An integer from 1 to 365.
2. **Minimum Temperature**: A random integer from -10 to 15, inclusive (in degrees Celsius).
3. **Maximum Temperature**: A random integer from 5 to 35, inclusive (in degrees Celsius).

To ensure uniformity across all submissions, we’ve fixed the random seed for data generation.
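The short sketch below shows two details behind the data generation. `np.random.randint(low, high, size)` excludes `high`, so an inclusive range of -10 to 15 needs `high=16`. Re-seeding with the same value reproduces exactly the same draws. The variable names `first` and `second` are illustrative only.

```python
import numpy as np

# np.random.randint draws from [low, high): `high` is EXCLUSIVE,
# so an inclusive range of -10..15 needs high=16.
np.random.seed(42)
first = np.random.randint(-10, 16, size=365)

# Re-seeding with the same value reproduces exactly the same sequence.
np.random.seed(42)
second = np.random.randint(-10, 16, size=365)

print(np.array_equal(first, second))          # True
print(first.min() >= -10, first.max() <= 15)  # True True
```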

---

## Instructions

1. Use **only NumPy** for all computations.
2. Do **NOT** use loops unless explicitly stated.
3. Pass all visible and hidden test cases to complete the assignment.

---

## Tasks

### Task 1: Generate the Dataset

Write a function `generate_weather_data()` that:
1. Generates a NumPy array with the following columns:
   - Column 1: Days of the year (1 to 365).
   - Column 2: Minimum temperatures (random integers from -10 to 15, inclusive).
   - Column 3: Maximum temperatures (random integers from 5 to 35, inclusive).
2. Uses a fixed random seed (`np.random.seed(42)`) to ensure reproducibility.
3. Returns the generated array.
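To illustrate the expected layout, here is a minimal sketch using `np.column_stack` on hypothetical three-day toy data. Each 1-D array becomes one column, so each row holds `[day, min, max]`. The real dataset has 365 rows instead of 3.

```python
import numpy as np

# Hypothetical three-day toy data, only to show the layout.
days = np.arange(1, 4)            # [1, 2, 3]
min_temp = np.array([-2, 4, 0])
max_temp = np.array([12, 30, 18])

# column_stack turns each 1-D array into one column of a 2-D array.
data = np.column_stack((days, min_temp, max_temp))

print(data.shape)          # (3, 3)
print(data[1].tolist())    # [2, 4, 30]   -> the row for day 2
print(data[:, 2].tolist()) # [12, 30, 18] -> the maximum-temperature column
```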

---

### Task 2: Basic Statistics

Write a function `basic_statistics(data)` that:
1. Accepts the weather data array from Task 1.
2. Computes and returns:
   - The average minimum temperature.
   - The average maximum temperature.
   - The highest maximum temperature and the day it occurred.
   - The lowest minimum temperature and the day it occurred.
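As a sketch of how these statistics can be computed without loops, the example below runs on hypothetical toy rows of the form `[day, min, max]`. One NumPy detail matters here: `np.argmax` and `np.argmin` return the index of the *first* occurrence when values tie. The day should then be read from column 0 rather than derived from the index.

```python
import numpy as np

# Hypothetical toy rows: [day, min, max]; days 2 and 4 tie for the highest max.
data = np.array([
    [1, -3, 20],
    [2,  1, 33],
    [3, -8, 15],
    [4,  5, 33],
])

min_temps = data[:, 1]
max_temps = data[:, 2]

# argmax/argmin return the FIRST matching index when values tie.
max_idx = np.argmax(max_temps)   # 1 (day 2, not day 4)
min_idx = np.argmin(min_temps)   # 2

# Read the day from column 0 instead of assuming day == index + 1.
print(int(max_temps[max_idx]), int(data[max_idx, 0]))    # 33 2
print(int(min_temps[min_idx]), int(data[min_idx, 0]))    # -8 3
print(float(min_temps.mean()), float(max_temps.mean()))  # -1.25 25.25
```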

---

### Task 3: Daily Temperature Range

Write a function `daily_temperature_range(data)` that:
1. Computes the daily temperature range (maximum - minimum) for all days.
2. Returns:
   - An array of daily ranges.
   - The day with the largest temperature range and its value.
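A minimal sketch of the vectorized approach on hypothetical toy rows `[day, min, max]` follows. Subtracting the two columns computes every day's range in one step. `argmax` then locates the largest range, and its day is read from column 0.

```python
import numpy as np

# Hypothetical toy rows: [day, min, max].
data = np.array([
    [1, -3, 20],
    [2,  1, 33],
    [3, -8, 15],
    [4, -9, 34],
])

# Element-wise subtraction gives one range per day, with no loop.
ranges = data[:, 2] - data[:, 1]
idx = int(np.argmax(ranges))

print(ranges.tolist())                        # [23, 32, 23, 43]
print((int(data[idx, 0]), int(ranges[idx])))  # (4, 43)
```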

---

### Task 4: Heatwave Identification

Write a function `identify_heatwaves(data)` that:
1. Defines a **heatwave** as three or more consecutive days on which the maximum temperature is strictly above 30°C (a day at exactly 30°C does not count).
2. Identifies and returns:
   - The total number of heatwaves.
   - A list of tuples, where each tuple gives the first and last day of a heatwave (both inclusive).

**Example**:
- If maximum temperatures on days 74, 75, and 76 are all above 30°C, it is a single heatwave: `(74, 76)`.
- A single day above 30°C is **not** a heatwave.
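One loop-free way to find such runs is sketched below on hypothetical toy data, under the assumption that detecting runs of `True` via `np.diff` is acceptable. This is a padded variant of the edge-detection idea: a `False` is added at both ends of the boolean mask, so every hot run has a visible rise (+1) and fall (-1). Runs touching the first or last day are then handled without special cases.

```python
import numpy as np

# Hypothetical toy data: 10 days of maximum temperatures.
days = np.arange(1, 11)
max_temp = np.array([31, 32, 34, 29, 31, 33, 28, 31, 31, 35])

hot = max_temp > 30  # strictly above 30 °C

# Pad with False on both ends so every run of True has a rise and a fall.
padded = np.concatenate(([False], hot, [False])).astype(int)
edges = np.diff(padded)
starts = np.flatnonzero(edges == 1)     # index where a hot run begins
ends = np.flatnonzero(edges == -1) - 1  # index where a hot run ends (inclusive)

# Keep only runs of three or more consecutive hot days.
keep = (ends - starts + 1) >= 3
heatwaves = list(zip(days[starts[keep]].tolist(), days[ends[keep]].tolist()))

print(len(heatwaves), heatwaves)  # 2 [(1, 3), (8, 10)]
```

Both qualifying runs here touch an edge of the array (days 1-3 and 8-10), while the two-day run on days 5-6 is rejected.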

```python
#### STUDENT CODE CELL

import numpy as np

def generate_weather_data():
    np.random.seed(42)  # Fixed seed for reproducibility
    days = np.arange(1, 366)                         # Column 1: day of year, 1..365
    min_temp = np.random.randint(-10, 16, size=365)  # Column 2: -10..15 (high is exclusive)
    max_temp = np.random.randint(5, 36, size=365)    # Column 3: 5..35 (high is exclusive)
    data = np.column_stack((days, min_temp, max_temp))  # Shape (365, 3)
    return data
```

```python
def basic_statistics(data):
    min_temps = data[:, 1]
    max_temps = data[:, 2]
    avg_min = min_temps.mean()
    avg_max = max_temps.mean()
    # argmax/argmin return the first index if several days tie
    max_idx = np.argmax(max_temps)
    min_idx = np.argmin(min_temps)
    # Pair each extreme temperature with its day (column 0)
    max_temp = (max_temps[max_idx], data[max_idx, 0])
    min_temp = (min_temps[min_idx], data[min_idx, 0])
    return avg_min, avg_max, max_temp, min_temp
```

```python
def daily_temperature_range(data):
    min_temps = data[:, 1]
    max_temps = data[:, 2]

    ranges = max_temps - min_temps  # Vectorized: one range per day
    idx_max = ranges.argmax()

    # (day, largest range) as plain Python ints
    max_range_info = (int(data[idx_max, 0]), int(ranges[idx_max]))

    return ranges, max_range_info
```

```python
def identify_heatwaves(data):
    max_temps = data[:, 2]  # Extract max temperatures
    hot = max_temps > 30    # Strictly above 30 °C

    # +1 marks a cool-to-hot transition, -1 a hot-to-cool transition
    changes = np.diff(hot.astype(int))
    starts = np.where(changes == 1)[0] + 1
    ends = np.where(changes == -1)[0]

    # Handle runs that touch the first or last day of the year
    if hot[0]:
        starts = np.insert(starts, 0, 0)
    if hot[-1]:
        ends = np.append(ends, len(hot) - 1)

    # Keep only runs of at least three consecutive hot days
    mask = (ends - starts + 1) >= 3
    heatwave_ranges = list(zip(data[starts[mask], 0].astype(int),
                               data[ends[mask], 0].astype(int)))

    return len(heatwave_ranges), heatwave_ranges
```

```python
# VISIBLE TEST CASES
data = generate_weather_data()

# Test Case 1: Dataset generation
assert data.shape == (365, 3), "Test Case 1 Failed: Dataset shape incorrect."

# Test Case 2: Basic statistics
avg_min, avg_max, max_temp, min_temp = basic_statistics(data)
assert np.isclose(avg_min, 2.10, atol=0.01), "Test Case 2 Failed: Incorrect average minimum temperature."
assert np.isclose(avg_max, 20.89, atol=0.01), "Test Case 2 Failed: Incorrect average maximum temperature."

# Test Case 3: Daily temperature range
ranges, max_range_info = daily_temperature_range(data)
assert max_range_info == (364, 45), "Test Case 3 Failed: Incorrect day or value for largest temperature range."
```

---

## Submission Instructions

1. Submit a Python script or notebook that includes:
   - Your implementations of all required functions.
   - Outputs from running the visible test cases.
2. Ensure your code passes **all visible and hidden test cases**.

---

## Grading Rubric

| Task                          | Points |
|-------------------------------|--------|
| Dataset generation            | 0      |
| Basic statistics              | 0      |
| Daily temperature range       | 0      |
| Heatwave identification       | 0      |
| Passing visible test cases    | 25     |
| Passing hidden test cases     | 25     |
| **Total**                     | **50** |