# Project 1: Weather Analyzer

For this project, you are going to write code to analyze the high and low temperatures near Grand Rapids over the past 30 years. In this workspace, you should have two data files:

* `grr_high_low_94_24.csv.gz` contains the actual high and low temperatures measured at the airport from October 1994 through early February 2024.
* `iona_mi_data.csv.gz` contains estimated high and low temperatures in a city near Grand Rapids. This data is similar to, but not identical to, the data measured at the airport.

Both of these files have been compressed in `gzip` format to reduce the space consumed on the PrairieLearn servers. Please do not store uncompressed versions of these files in your PrairieLearn workspace. The file `grr_sample.csv` contains the first 500 lines of the data file uncompressed, so you can see the structure of the file.

**Important!** Take time to carefully check your work before submitting.
You are limited to *three* submissions per day. Do not use the PrairieLearn autograder
as a substitute for preparing your own test cases.

This three-submission-per-day limit is a policy, not a technical limit. In other words,
PrairieLearn will not prevent you from making additional submissions, but doing so may lower your score.


### Task 1

Some of the tasks below will require you to determine which of two dates comes earlier or later in the year (e.g., 14 April or 3 May). Complete the two functions below.

(Python does have a `datetime` module, but it assumes that every date has a year, which does not hold for the code you will write in this project. You are allowed to use `datetime` if you like, but
these functions can be implemented in five lines or fewer without it.)
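
As one illustration of how short these comparisons can be, Python compares tuples element by element, so a `(month, day)` pair orders the same way the calendar does. The sketch below is only one possible approach; the names `comes_before` and `comes_after` are hypothetical, chosen so they do not collide with the graded functions.

```python
# Hypothetical helpers illustrating tuple comparison; not the graded functions.

def comes_before(m1: int, d1: int, m2: int, d2: int) -> bool:
    # Tuples compare left to right: months first, days only on a tie.
    return (m1, d1) < (m2, d2)


def comes_after(m1: int, d1: int, m2: int, d2: int) -> bool:
    return (m1, d1) > (m2, d2)


print(comes_before(4, 14, 5, 3))   # 14 April vs. 3 May
print(comes_after(4, 14, 5, 3))
print(comes_before(5, 3, 5, 3))    # identical dates are neither before nor after
```

Identical dates are a useful edge case to test, since both functions should return `False` for them.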

In [11]:
#grade DO NOT REMOVE

# return True if m1/d1 comes earlier in the year than m2/d2, otherwise return False
# Do not use Python's datetime module

def is_date_before(m1: int, d1: int, m2: int, d2: int) -> bool:
    if m1 < m2:
        return True
    elif m1 == m2:
        return d1 < d2
    else:
        return False


# return True if m1/d1 comes later in the year than m2/d2, otherwise return False
# Do not use Python's datetime module

def is_date_after(m1: int, d1: int, m2: int, d2: int) -> bool:
    if m1 > m2:
        return True
    elif m1 == m2:
        return d1 > d2
    else:
        return False


In [12]:
# Use this block to test your code.
# (This block is not run by the auto-grader)
# Add more tests.
print(is_date_before(1, 1, 2, 1))

print(is_date_after(1, 1, 2, 1))


True
False


### Task 2

Write a function `all_high_above` that returns the list of all dates on which the high temperature was above the given threshold temperature. Dates should be strings in `mm/dd/yyyy` format.
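
The data file stores dates without leading zeros (for example `10/1/1994`), so they need padding to reach `mm/dd/yyyy`. Here is a minimal sketch of that step, assuming every input has exactly three `/`-separated numeric parts; `pad_date` is a hypothetical helper name, not part of the assignment.

```python
def pad_date(raw: str) -> str:
    # Split 'm/d/yyyy' and left-pad month and day to two digits.
    month, day, year = raw.split('/')
    return f"{int(month):02d}/{int(day):02d}/{year}"


print(pad_date('10/1/1994'))  # 10/01/1994
print(pad_date('7/5/2012'))   # 07/05/2012
```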

The "starter code" in the function below shows how to read the csv data from a file that is compressed. You may either use this code as is or modify it if you like. 

**Important**: Please do not uncompress or modify the `.gz` files. The uncompressed files are large and storing them would strain the PrairieLearn system.

In [13]:
import gzip
import csv

def display_csv_content(filename: str):
    with gzip.open(filename, 'rt') as csv_file:
        f = csv.reader(csv_file)
        for i, row in enumerate(f):
            if i >= 50:  # Print only the first 50 rows
                break
            print(row)

# Example usage:
display_csv_content('grr_high_low_94_24.csv.gz')


['STATION', 'NAME', 'DATE', 'TAVG', 'TMAX', 'TMIN', 'TOBS']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/1/1994', '', '63', '43', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/2/1994', '', '61', '36', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/3/1994', '', '61', '36', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/4/1994', '', '58', '44', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/5/1994', '', '58', '40', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/6/1994', '', '71', '40', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/7/1994', '', '77', '53', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '10/8/1994', '', '72', '51', '']
['USW00094860', 'GRAND RAPIDS GERALD R FORD INTERNATIONAL AIRPORT, MI US', '

In [14]:
#grade DO NOT REMOVE
import gzip
import csv

def all_high_above(filename: str, threshold: float) -> list[str]:
    dates_above_threshold = []
    
    with gzip.open(filename, 'rt') as file:
        reader = csv.reader(file)
        header = next(reader)  # Skip the header row
        
        for row in reader:
            date = row[2]
            high_temp = row[4]
            
            if high_temp and threshold < float(high_temp):
                date_parts = date.split('/')
                if len(date_parts) == 3:
                    month = date_parts[0].zfill(2)
                    day = date_parts[1].zfill(2)
                    year = date_parts[2]
                    date = f"{month}/{day}/{year}"
                    dates_above_threshold.append(date)
    return dates_above_threshold

In [15]:
# Use this block to test your code.
# (This block is not run by the auto-grader)
# Add more tests.
print(all_high_above('grr_high_low_94_24.csv.gz', 99))

['07/05/2012', '07/06/2012']
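
Because submissions are limited, it may help to test against a tiny file whose answer is known in advance. The sketch below writes a few made-up rows (not real measurements) to a temporary gzip file with the same column layout as the header shown earlier, then filters on `TMAX`. The helper names `write_sample_gz` and `highs_above` are hypothetical, and `highs_above` is a simplified stand-in rather than the graded function (it returns the raw `DATE` strings without padding).

```python
import csv
import gzip
import os
import tempfile

HEADER = ['STATION', 'NAME', 'DATE', 'TAVG', 'TMAX', 'TMIN', 'TOBS']


def write_sample_gz(path: str, rows: list[list[str]]) -> None:
    # Write the header plus the given rows as gzip-compressed CSV text.
    with gzip.open(path, 'wt', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(HEADER)
        writer.writerows(rows)


def highs_above(path: str, threshold: float) -> list[str]:
    # Return raw DATE values whose TMAX is strictly above the threshold.
    with gzip.open(path, 'rt', newline='') as src:
        return [row['DATE'] for row in csv.DictReader(src)
                if row['TMAX'] and float(row['TMAX']) > threshold]


# Made-up rows: one missing TMAX, one exactly at the threshold, one above it.
sample = [
    ['X', 'TEST', '7/4/2000', '', '', '60', ''],
    ['X', 'TEST', '7/5/2000', '', '90', '65', ''],
    ['X', 'TEST', '7/6/2000', '', '95', '70', ''],
]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'sample.csv.gz')
    write_sample_gz(sample_path, sample)
    result = highs_above(sample_path, 90)

print(result)  # ['7/6/2000']
```

The temporary directory is deleted when the `with` block ends, so nothing extra is left in the workspace.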


### Task 3

Write a function `minima_for_day` that returns the list of years in which the minimum temperature for the given month and day was observed. The return value is a list because the same minimum may have been observed in several different years.
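
To make the tie case concrete, the sketch below works on an in-memory list of `(year, low_temperature)` pairs for a single calendar day instead of the data file. The values are made up for illustration, and `years_with_minimum` is a hypothetical helper name.

```python
def years_with_minimum(readings: list[tuple[int, float]]) -> list[int]:
    # readings holds (year, low_temperature) pairs for one calendar day.
    if not readings:
        return []
    lowest = min(temp for _, temp in readings)
    return [year for year, temp in readings if temp == lowest]


# Made-up values: 1999 and 2010 tie for the lowest reading.
print(years_with_minimum([(1998, 41.0), (1999, 35.0), (2010, 35.0)]))  # [1999, 2010]
print(years_with_minimum([]))  # []
```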

In [16]:
#grade DO NOT REMOVE
import gzip
import csv

def minima_for_day(filename: str, month: int, day: int) -> list[int]:
    min_temp = float('inf')  # Initialize min_temp to positive infinity
    years_with_min_temp = []

    with gzip.open(filename, 'rt') as csv_file:
        f = csv.reader(csv_file)
        next(f, None)  # Skip the header row
        for row in f:
            date_parts = row[2].split('/')  # Assuming the date is in the third column
            row_month = int(date_parts[0])
            row_day = int(date_parts[1])
            if row_month == month and row_day == day:
                try:
                    temperature = float(row[5])  # Assuming the temperature is in the sixth column
                except ValueError:
                    continue  # Skip rows where temperature conversion fails
                if temperature < min_temp:
                    min_temp = temperature
                    years_with_min_temp = [int(date_parts[2])]  # Extract the year from the date and convert to int
                elif temperature == min_temp:
                    years_with_min_temp.append(int(date_parts[2]))

    return years_with_min_temp  # Empty list if no matching temperature was found

In [17]:
# Use this block to test your code.
# (This block is not run by the auto-grader)
# Add your own tests.
# Test with provided data file
print(minima_for_day('grr_high_low_94_24.csv.gz', 10, 5))  # Years with the lowest minimum recorded on 10/5

[2004]


### Task 4

Write a function `earliest_low_above` that returns the earliest date in any year that the low temperature was above `threshold` degrees. Dates should be strings in `mm/dd/yyyy` format.
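
One reading of "earliest date in any year" is to order dates by month and day first, and to use the year only to break ties. The sketch below applies that reading to a few made-up dates. `earliest_in_year` is a hypothetical helper, and the tie-breaking rule is an assumption rather than something the task statement specifies.

```python
def earliest_in_year(dates: list[str]) -> str:
    # Each date is 'm/d/yyyy'; the sort key is (month, day, year).
    def key(date: str) -> tuple[int, int, int]:
        month, day, year = (int(part) for part in date.split('/'))
        return (month, day, year)

    month, day, year = key(min(dates, key=key))
    return f"{month:02d}/{day:02d}/{year}"


# Made-up dates: 1/2 beats 3/1 regardless of year, and 2004 beats 2010 on the tie.
print(earliest_in_year(['3/1/1995', '1/2/2010', '1/2/2004']))  # 01/02/2004
```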



In [18]:
#grade DO NOT REMOVE
import csv
import gzip
from datetime import datetime

def earliest_low_above(filename: str, threshold: float) -> str:
    earliest_date_str = None
    above_threshold = []

    with gzip.open(filename, 'rt') as csv_file:
        f = csv.reader(csv_file)
        next(f, None)  # Skip the header row

        # Collect every row whose low temperature (TMIN, column index 5) exceeds the threshold
        for row in f:
            if not row[5]:
                continue  # Skip rows with no recorded low temperature
            if float(row[5]) > threshold:
                date = datetime.strptime(row[2], '%m/%d/%Y')
                above_threshold.append({'date': date, 'month_day': (date.month, date.day)})

    if above_threshold:
        # Earliest month/day wins; if several years share it, take the earliest year
        earliest = min(above_threshold, key=lambda x: (x['month_day'], x['date']))
        earliest_date_str = earliest['date'].strftime('%m/%d/%Y')

    return earliest_date_str  # None if no low temperature exceeded the threshold


In [19]:
# Test with provided data file and threshold 30
print(earliest_low_above('grr_high_low_94_24.csv.gz', 30))

# Test with provided data file and threshold 40
print(earliest_low_above('grr_high_low_94_24.csv.gz', 40))

# Test with provided data file and threshold 50
print(earliest_low_above('grr_high_low_94_24.csv.gz', 50))

# Test with provided data file and threshold 60
print(earliest_low_above('grr_high_low_94_24.csv.gz', 60))


01/01/2006
01/02/2004
01/07/2008
03/21/2012
