# Project 1: Weather Analyzer

For this project, you are going to write code to analyze the high and low temperatures near Grand Rapids over the past 30 years. In this workspace, you should have two data files:

* `grr_high_low_94_24.csv.gz` contains the actual high and low temperatures measured at the airport from October 1994 through early February 2024.
* `clarksville_mi_data.csv.gz` contains estimated high and low temperatures in a city near Grand Rapids. This data is similar to, but not identical to, the data measured at the airport.

Both of these files have been compressed in `gzip` format to reduce the space consumed on the PrairieLearn servers. Please do not store uncompressed versions of these files in your PrairieLearn workspace. The file `grr_sample.csv` contains the first 500 lines of the data file uncompressed, so you can see the structure of the file.

**Important!** Take time to carefully check your work before submitting.
You are limited to *three* submissions per day. Do not use the PrairieLearn autograder
as a substitute for preparing your own test cases.

This three-submission-per-day limit is a policy, not a technical limit. In other words,
PrairieLearn will not prevent you from making additional submissions, but doing so may lower your score.


## Task 1

Some of the tasks below will require you to determine which of two dates comes earlier or later in the year (e.g., 14 April or 3 May). Complete the two functions below.

(Python does have a `datetime` module, but this library assumes that all dates have a year, which is not a valid assumption for the code you will be writing for this project. You are allowed to use `datetime` if you like, but
these functions can be implemented in five lines or fewer without it.)
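
As a rough illustration of why no year is needed, Python compares tuples element by element, so a `(month, day)` pair orders the same way the calendar does. The sketch below is only one possible approach; the helper names `comes_before` and `comes_after` are hypothetical and are not part of the assignment.

```python
# Illustrative sketch: (month, day) tuples compare month first, then day.
# The helper names below are hypothetical and not part of the starter code.

def comes_before(m1: int, d1: int, m2: int, d2: int) -> bool:
    """Return True if m1/d1 falls strictly earlier in the year than m2/d2."""
    return (m1, d1) < (m2, d2)


def comes_after(m1: int, d1: int, m2: int, d2: int) -> bool:
    """Return True if m1/d1 falls strictly later in the year than m2/d2."""
    return (m1, d1) > (m2, d2)


print(comes_before(4, 14, 5, 3))  # 14 April vs 3 May -> True
print(comes_after(4, 14, 5, 3))   # -> False
print(comes_before(2, 1, 2, 1))   # identical dates are not "before" -> False
```

Note that identical dates are neither before nor after each other, so both helpers return `False` in that case.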

In [42]:
#grade DO NOT REMOVE

# return True if m1/d1 comes earlier in the year than m2/d2, otherwise return False
# Do not use Python's datetime module

def is_date_before(m1: int, d1: int, m2: int, d2: int) -> bool:
    # An earlier month always wins; within the same month, compare the days
    return m1 < m2 or (m1 == m2 and d1 < d2)


# return True if m1/d1 comes later in the year than m2/d2, otherwise return False
# Do not use Python's datetime module

def is_date_after(m1: int, d1: int, m2: int, d2: int) -> bool:
    # A later month always wins; within the same month, compare the days
    return m1 > m2 or (m1 == m2 and d1 > d2)


In [43]:
# Use this block to test your code.
# (This block is not run by the auto-grader)
# Add more tests.
print(is_date_before(1, 1, 2, 1))

print(is_date_after(1, 1, 2, 1))


True
False


In [44]:
print(is_date_before(2, 5, 2, 30))



True


In [45]:
print(is_date_after(2, 1, 2, 1))

False


## Task 2

Write a function `all_low_below` that returns the list of all dates on which the low temperature was below the given threshold temperature. Dates should be strings in `mm/dd/yyyy` format.

The "starter code" in the function below shows how to read the CSV data from a compressed file. You may either use this code as is or modify it if you like.

**Important**: Please do not uncompress or modify the `.gz` files. The uncompressed files are large and storing them would strain the PrairieLearn system.
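
To illustrate how a gzip-compressed CSV can be read without ever writing an uncompressed copy, here is a minimal, self-contained sketch. It builds a tiny compressed file in a temporary directory and reads it back. The column names and values are hypothetical, chosen only for illustration; the real data files may have a different layout.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical rows for illustration only; the real files may have more columns.
sample_rows = [
    ["DATE", "TMAX", "TMIN"],
    ["1/5/2000", "30", "12"],
    ["1/6/2000", "25", "-3"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv.gz")

    # Write a small compressed CSV so the example is self-contained.
    with gzip.open(path, "wt", newline="") as gz_file:
        csv.writer(gz_file).writerows(sample_rows)

    # 'rt' opens the gzip stream in text mode, so csv.reader receives strings.
    with gzip.open(path, "rt", newline="") as gz_file:
        reader = csv.reader(gz_file)
        header = next(reader)  # first row holds the column names
        rows = list(reader)    # remaining rows are data, every field a string

print(header)  # ['DATE', 'TMAX', 'TMIN']
print(rows)    # [['1/5/2000', '30', '12'], ['1/6/2000', '25', '-3']]
```

Every field comes back as a string, so temperatures need to be converted with `int()` or `float()` before they are compared to a threshold.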

In [4]:
#grade DO NOT REMOVE
import gzip
import csv

def all_low_below(filename: str, threshold: int | float) -> list[str]:
    result = []
    with gzip.open(filename, 'rt') as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        for row in reader:
            # Column -5 holds the date string (split on '/'); column -2 holds the low
            month, day, year = (int(part) for part in row[-5].split('/'))
            # Compare against the threshold as given: truncating a float threshold
            # with int() would miss lows between int(threshold) and threshold
            if int(row[-2]) < threshold:
                result.append(f'{month:02d}/{day:02d}/{year:04d}')
    return result
      
            
        
        
      

In [5]:
# Use this block to test your code.
# (This block is not run by the auto-grader)
# Add more tests.
print(all_low_below('grr_high_low_94_24.csv.gz', -10))

['02/03/1996', '02/04/1996', '01/27/2003', '02/05/2007', '02/28/2014', '01/14/2015', '02/20/2015', '02/27/2015', '12/27/2017', '01/06/2018', '02/17/2021']


## Task 3

Write a function `minima_for_day` that returns the list of years in which the minimum temperature for the given month and day was observed. The return value is a list because the same minimum may have been observed in several different years.
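
The idea of collecting every year that ties for the minimum can be sketched in a single pass over the data. The example below uses a small, made-up list of `(year, low)` pairs rather than the real data file, and the function name `years_with_minimum` is hypothetical.

```python
# Hypothetical (year, low temperature) readings for one calendar date.
readings = [(2001, 14), (2002, 9), (2003, 11), (2004, 9), (2005, 20)]


def years_with_minimum(pairs: list[tuple[int, int]]) -> list[int]:
    """Return every year whose value equals the smallest value seen."""
    best = None
    years: list[int] = []
    for year, low in pairs:
        if best is None or low < best:
            best = low          # new minimum: discard the years collected so far
            years = [year]
        elif low == best:
            years.append(year)  # tie with the current minimum
    return years


print(years_with_minimum(readings))  # -> [2002, 2004]
```

Resetting the list whenever a strictly lower value appears avoids a second pass over the data to find the ties.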

In [46]:
#grade DO NOT REMOVE
import gzip
import csv

def minima_for_day(filename: str, month: int, day: int) -> list[int]:
    years = []
    lowest = float('inf')  # lowest temperature seen so far for this month/day
    with gzip.open(filename, 'rt') as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        for row in reader:
            row_month, row_day, row_year = (int(part) for part in row[-5].split('/'))
            if row_month == month and row_day == day:
                low = int(row[-2])
                if low < lowest:
                    # New minimum: start the list of years over
                    lowest = low
                    years = [row_year]
                elif low == lowest:
                    # Tie with the current minimum: record this year too
                    years.append(row_year)
    return years





In [47]:
print(minima_for_day('grr_high_low_94_24.csv.gz',3,12))

[1999, 2014, 2017, 2022]


## Task 4

Write a function `earliest_high_below` that returns the earliest date in any year that the high temperature was below `threshold` degrees. Dates should be strings in `mm/dd/yyyy` format.

**Only consider dates on or after 1 July.**
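
To make "earliest date in any year" concrete, the sketch below orders dates by month and day first, ignoring the year except to break ties. It works on a small, made-up list of readings rather than the real file. The function name `earliest_after_july` and the tie-breaking rule (earliest year wins) are assumptions for illustration, not requirements stated by the assignment.

```python
# Hypothetical (month, day, year, high temperature) readings.
readings = [
    (3, 2, 1999, 28),    # before 1 July, so it is ignored
    (10, 4, 2001, 48),
    (9, 21, 2010, 49),
    (9, 21, 2005, 47),
    (11, 15, 1998, 30),
]


def earliest_after_july(rows, threshold):
    """Return the earliest calendar date (mm/dd/yyyy) on or after 1 July
    whose high is below threshold, or None if no reading qualifies."""
    candidates = [
        (month, day, year)
        for month, day, year, high in rows
        if (month, day) >= (7, 1) and high < threshold
    ]
    if not candidates:
        return None
    # Tuples sort by month, then day, then year, so min() picks the earliest
    # calendar date and breaks ties with the earliest year (an assumption).
    month, day, year = min(candidates)
    return f"{month:02d}/{day:02d}/{year}"


print(earliest_after_july(readings, 50))  # -> 09/21/2005
print(earliest_after_july(readings, 20))  # -> None
```

Here 21 September beats 4 October even though the October reading comes from an earlier year, because the comparison looks at the position within the year first.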


In [6]:
#grade DO NOT REMOVE



import pandas as pd

def earliest_high_below(filename: str, threshold: int|float) -> str:
    # Read the CSV file into a DataFrame
    df = pd.read_csv(filename)
    
    # Convert the 'DATE' column to datetime format
    df['date'] = pd.to_datetime(df['DATE'])
    
    # Filter the DataFrame to include only dates on or after 1 July
    df = df[df['date'].dt.month >= 7]
    
    # Filter the DataFrame to include only high temperatures below the threshold
    below_threshold = df[df['TMAX'] < threshold].copy()
    
    # Create a new column for month and day
    below_threshold['month_day'] = below_threshold['date'].dt.strftime('%m/%d')
    
    # Find the earliest month and day when the high temperature was below threshold
    earliest_month_day = below_threshold['month_day'].min()
    
    # Find the earliest date corresponding to the earliest month and day
    earliest_date = below_threshold.loc[below_threshold['month_day'] == earliest_month_day, 'date'].min()
    
    # Convert the earliest date to string in mm/dd/yyyy format
    earliest_date_str = earliest_date.strftime('%m/%d/%Y')
    
    return earliest_date_str




In [7]:
# Use this block to test your code.
# (This block is not run by the auto-grader)
# Add your own tests.
earliest_high_below('grr_high_low_94_24.csv.gz', 110 )

'07/01/1995'