# GEOS 505: Problem Set 1

__Instructions__: Complete the two problems below. One is a coding problem, where you will review and comment existing code. Where you are asked to provide descriptive text and answer questions, please do so via well formatted Markdown cells below the problem.  

__Due Date and Time__: September 19, 2025 at 5:00 PM MT

__Turn In Via__: Commit and push your complete notebook to your personal GitHub repository for the class, and submit the URL for notebook via Canvas. 

## Problem 1

### 1. Problem Background:

As a new graduate student, your research is examining the performance of 5-day precipitation forecasts at a number of Snotel sites in the Upper Boise River Basin. The forecasts are generated by a research center at a university in the western United States using the [Weather Research and Forecasting](https://www2.mmm.ucar.edu/wrf/users/) model. They run a 360 hr (i.e., 15 day) forecast using WRF every day at the top of every hour for the entire western United States, and puts the output you need on an Amazon Web Services (AWS) S3 bucket that is freely available.  

File pattern:
WRF_fcst_YYYYMMDD_HHz_VVV.nc

Where:
- YYYY is the year of the forecast date
- MM is the month of the forecast date
- DD is the day of the forecast
- HH is the hour the forecast was initiated (0-23)
- VVV is the valid hour of the forecast from the time of initiation in hours (0-360 hr in 3 hr increments). So 000 is the initial hour of the forecast, 003 would be the 3rd hour, 006 the 6th, and so forth.
- 'WRF_fcst_' is the prefix of the file name
- '.nc' is the file extension (this stands for NetCDF file - a real file format)

### 2. Instructions: 

1. Write a function that takes as input the forecast year, month, day, and hour, as well as a maximum valid hour and returns a list of file names that contain the forecast data
2. Test your code for three days (2025-03-01, 2025-04-01, and 2025-05-01), for forecast hour 6, and a maximum valid hour of 120 hours. Print the results of the returned list to the screen and verify manually. 

### 3. Concepts Assessed:

- String handling
- For loops
- Basic functions

In [1]:
import numpy as np

In [None]:
def first_create_file_names(year:str, month:str, day:str, max_valid_hr:int, file_extension:str) -> list:
    """
    This function creates a string that represents a file name for valid hours of the Weather Research and forecasting model (WRF).
    This is version of the function uses a while loop to create the list of valid hours. 

    Parameters:
        year (string): YYYY
        month (string): MM
        day (string): DD
        max_valid_hr (integer): number between 0-360
        file_extension (string): computer readable file extension without a leading period.

    Returns:
        List[string]: a list of string representing file names.

    Raises:
        ValueError: none
        TypeError: none
    """
    file_list = []
    valid_hr = 0
    fx_hr = 6
    vvv_list = [0]
    while valid_hr < max_valid_hr:
        valid_hr += 3
        vvv_list.append(valid_hr)
    for i in vvv_list:
        file_name = f"WRF_fcst_{year}{month}{day}_{fx_hr:02d}z_{i:03d}{file_extension}"
        file_list.append(file_name)
    return file_list

In [None]:
def second_create_file_names(year:str, month:str, day:str, max_valid_hr:float, file_extension:str) -> list:
    """
    This function creates a string that represents a file name for valid hours of the Weather Research and forecasting model (WRF).
    This is version of the function uses numpy linspace to create the list of valid hours.

    Parameters:
        year (string): YYYY
        month (string): MM
        day (string): DD
        max_valid_hr (float or int or np.array): number between 0-360
        file_extension (string): computer readable file extension without a leading period.

    Returns:
        List[string]: a list of string representing file names.

    Raises:
        ValueError: none
        TypeError: none
    """
    file_list = []
    valid_hr = 0
    fx_hr = 6
    vvv_list = np.linspace(0,max_valid_hr,num=40) # how make this multiples of 3 using modulo?

    for i in vvv_list:
        file_name = f"WRF_fcst_{year:4d}{month:02d}{day:02d}_{fx_hr:02d}z_{fx_hr:03d}z_{i:.0f}.{file_extension}"
        file_list.append(file_name)
    return file_list

In [None]:
file_name_list = second_create_file_names("2025", "03", "01", 120, "nc")
print(*file_name_list, sep='\n')

WRF_fcst_20250301_06z_006z_0.nc
WRF_fcst_20250301_06z_006z_3.nc
WRF_fcst_20250301_06z_006z_6.nc
WRF_fcst_20250301_06z_006z_9.nc
WRF_fcst_20250301_06z_006z_12.nc
WRF_fcst_20250301_06z_006z_15.nc
WRF_fcst_20250301_06z_006z_18.nc
WRF_fcst_20250301_06z_006z_22.nc
WRF_fcst_20250301_06z_006z_25.nc
WRF_fcst_20250301_06z_006z_28.nc
WRF_fcst_20250301_06z_006z_31.nc
WRF_fcst_20250301_06z_006z_34.nc
WRF_fcst_20250301_06z_006z_37.nc
WRF_fcst_20250301_06z_006z_40.nc
WRF_fcst_20250301_06z_006z_43.nc
WRF_fcst_20250301_06z_006z_46.nc
WRF_fcst_20250301_06z_006z_49.nc
WRF_fcst_20250301_06z_006z_52.nc
WRF_fcst_20250301_06z_006z_55.nc
WRF_fcst_20250301_06z_006z_58.nc
WRF_fcst_20250301_06z_006z_62.nc
WRF_fcst_20250301_06z_006z_65.nc
WRF_fcst_20250301_06z_006z_68.nc
WRF_fcst_20250301_06z_006z_71.nc
WRF_fcst_20250301_06z_006z_74.nc
WRF_fcst_20250301_06z_006z_77.nc
WRF_fcst_20250301_06z_006z_80.nc
WRF_fcst_20250301_06z_006z_83.nc
WRF_fcst_20250301_06z_006z_86.nc
WRF_fcst_20250301_06z_006z_89.nc
WRF_fcst_20250

In [28]:
file_name_list = second_create_file_names("2025", "04", "01", 120, "nc")
print(*file_name_list, sep='\n')

WRF_fcst_20250401_06z_006z_0.nc
WRF_fcst_20250401_06z_006z_3.nc
WRF_fcst_20250401_06z_006z_6.nc
WRF_fcst_20250401_06z_006z_9.nc
WRF_fcst_20250401_06z_006z_12.nc
WRF_fcst_20250401_06z_006z_15.nc
WRF_fcst_20250401_06z_006z_18.nc
WRF_fcst_20250401_06z_006z_22.nc
WRF_fcst_20250401_06z_006z_25.nc
WRF_fcst_20250401_06z_006z_28.nc
WRF_fcst_20250401_06z_006z_31.nc
WRF_fcst_20250401_06z_006z_34.nc
WRF_fcst_20250401_06z_006z_37.nc
WRF_fcst_20250401_06z_006z_40.nc
WRF_fcst_20250401_06z_006z_43.nc
WRF_fcst_20250401_06z_006z_46.nc
WRF_fcst_20250401_06z_006z_49.nc
WRF_fcst_20250401_06z_006z_52.nc
WRF_fcst_20250401_06z_006z_55.nc
WRF_fcst_20250401_06z_006z_58.nc
WRF_fcst_20250401_06z_006z_62.nc
WRF_fcst_20250401_06z_006z_65.nc
WRF_fcst_20250401_06z_006z_68.nc
WRF_fcst_20250401_06z_006z_71.nc
WRF_fcst_20250401_06z_006z_74.nc
WRF_fcst_20250401_06z_006z_77.nc
WRF_fcst_20250401_06z_006z_80.nc
WRF_fcst_20250401_06z_006z_83.nc
WRF_fcst_20250401_06z_006z_86.nc
WRF_fcst_20250401_06z_006z_89.nc
WRF_fcst_20250

In [29]:
file_name_list = second_create_file_names("2025", "05", "01", 120, "nc")
print(*file_name_list, sep='\n')

WRF_fcst_20250501_06z_006z_0.nc
WRF_fcst_20250501_06z_006z_3.nc
WRF_fcst_20250501_06z_006z_6.nc
WRF_fcst_20250501_06z_006z_9.nc
WRF_fcst_20250501_06z_006z_12.nc
WRF_fcst_20250501_06z_006z_15.nc
WRF_fcst_20250501_06z_006z_18.nc
WRF_fcst_20250501_06z_006z_22.nc
WRF_fcst_20250501_06z_006z_25.nc
WRF_fcst_20250501_06z_006z_28.nc
WRF_fcst_20250501_06z_006z_31.nc
WRF_fcst_20250501_06z_006z_34.nc
WRF_fcst_20250501_06z_006z_37.nc
WRF_fcst_20250501_06z_006z_40.nc
WRF_fcst_20250501_06z_006z_43.nc
WRF_fcst_20250501_06z_006z_46.nc
WRF_fcst_20250501_06z_006z_49.nc
WRF_fcst_20250501_06z_006z_52.nc
WRF_fcst_20250501_06z_006z_55.nc
WRF_fcst_20250501_06z_006z_58.nc
WRF_fcst_20250501_06z_006z_62.nc
WRF_fcst_20250501_06z_006z_65.nc
WRF_fcst_20250501_06z_006z_68.nc
WRF_fcst_20250501_06z_006z_71.nc
WRF_fcst_20250501_06z_006z_74.nc
WRF_fcst_20250501_06z_006z_77.nc
WRF_fcst_20250501_06z_006z_80.nc
WRF_fcst_20250501_06z_006z_83.nc
WRF_fcst_20250501_06z_006z_86.nc
WRF_fcst_20250501_06z_006z_89.nc
WRF_fcst_20250

## Problem 2

### 1. Problem Background

You just started grad school and inhereted some code from the final chapter of a previous student's thesis. That last chapter was completed in about 1 month and your advisor has said they want you to follow up on the precipitation analysis the student has done with a new version of the precipitation dataset that was just released. 

As you dig through their code to make it work, you realize it's not commented well at all. Additionally, the student is now working Goldman Sachs working on CliFi (climate finance) problems and, well, you can't afford their hourly rate to get some help! In order to get the data, you would need to modify the name of the files, because the revised dataset uses a different naming convention. The previous student's code to create a list of the names of files to download and analyze is below. 

### 2. Instructions

1. In your own words, what does the code do? Is it correct? 
2. Go through the old student's code and add comments describing what the code is doing. Comments should be thorough enough that they're helpful for you, but also the student that comes after you. 
3. Are there some things in the code that you would do to edit the code for clarity (i.e., to make it more readible)? Make a list of these things, but do not implement them. 

### 3. Concepts Assessed:

- Peer review
- Commenting code
- String handling
- For loops
- Functions

In [None]:
def IsLeapYear(year):
    
    if (year % 4 == 0): # if the year is divisible by 4 execute the next line
        if (year % 500 == 0): # if the year is divisible by 500 return True and exit the function, if this is false go to next line
            return True
        elif (year % 100 == 0): # if the year is divisible by 4 and 100 but not 500 return False and exit the function 
            return False
        else:
            return True # if the all the above are True return True and exit the function
    else:
        return False # if the year is not divisible by 4 return false and exit the function
    
def ReturnFileList(file_base, file_ext, start_yr, end_yr):
    # list with number of days in each month
    DiM = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    # create new variable with value of the start_year
    year = start_yr
    #empty list for file_names
    files = []
    

    for i in range(end_yr - start_yr + 1): # iterate over the number of years in the dataset
        for j in range(12): # iterate over the number of months in a year
            if (j+1==2) and IsLeapYear(year): # decide if the index value is the second month of the year and the year is a leap year
                days_in_month = 29 # number of days february has during a leap year
            else:
                days_in_month = DiM[j] # if the year is not a leap use the number of days in the DiM list
            
            for k in range(days_in_month): # iterate over the number of days in february list (either 28 or 29)
                
                file_name = f'{file_base}-{year:4d}-{(j+1):02d}-{(k+1):02d}{file_ext}' # put the file name together
                files.append(file_name) # add the file name to the end of the files list
                
        year = year + 1 # assign the year variable to the next one on the list 
    
    return files

In [None]:
StartYr = 1998
EndYr = 2000

file_list = ReturnFileList('precip','.nc',StartYr,EndYr)

print(str(len(file_list))+' files')
print(*file_list, sep='\n')



1096 files


Your Markdown answers go here...

2.1.1 IsLeapYear takes one argument called year and returns a boolean. The function body contains nested if statements that controls code execution based on the result of modulo expressions. If the year argument is divisible by 4 AND 500 with remainder 0 the function returns True. If the number is divisible by 4 AND 100 with remainder 0 the function returns False. If the function is divisible only by 4 with remainder 0 the function returns True. If the function is not evenly divisble by 4 the function returns False.
2.1.2 ReturnFileList takes two string and two number arguments and returns a list of strings. The body of the function starts with a list of int assigned to variable DiM (a very opaque name) which represents the number of days in each month of a non-leap year. There are a couple of inialization variables for year and the empty file name list. Then theres a series of nested loops with logic to construct the file name string. The outer most loop iterates over the number of years between the startYear and endYear arguments, adding an additional value at the end because the range function is not inclusive. The second loop inwards iterates over a range of 12. Inside this for loop is an if statement that returns True if the index value j is exactly equal to 2 AND the IsLeapYear function returns True for the local year variable then a new variable days_in_month variable is initialized and assigned a value of 29. When the if statement returns False the days_in_month variable is assigned the value of the locally defined DiM at index j. The third loop iterates over the index range in whatever day_in_month value is assigned above. This loop complies the file name f-string with formatting and append the f-string to the files list. The now non-empty files list is returned.

These function seem to produce the correct outputs for the years given since the number of files in the list is 1096 and one of the years is a leap year 365 + 366 + 365 = 1096 BUT this function will give correct output for all years. The correct logic for the second if statement must be `if (year % 400 == 0):`. This sent me down a rabbit hole learning new things about leap years.

2.3.0 PEP8 Issues - function names and variables should be lowercase with underscorse. The arguments of both functions should have type declarations in the function signature and the return value should type checked, missing doc string. 
2.3.1 Other issues - writing verbose variable names reduces the need for commenting
- Change Dim to days_in_month
- create a variable above the loops called number_of_years and assign the expression `end_yr - start_yr + 1` and use this variable in the first loop `for year in range(number_of_years):`
- remove the year varialbe and just use startYear in the IsLeapYear function.
- change the index variable i to 'year' and j to 'month'
- change days_in_month to 'days_in_february' because that's the only month with different days in a leap year.
- change the index variable k to 'day'
