# Exploring Chicago Weather

Building on my "Exploring NOAA GSOD Weather Stations" Jupyter notebook, I'll now retrieve data for just the Chicago Midway Airport weather station and analyze it.

That earlier work showed that the Chicago Midway Airport station has the unique identifiers `USAF` = 725340 and `WBAN` = 14819, and that data files exist for the years 1973 through 2024.

In [31]:
import pandas as pd

def getAllData(usaf, wban, beginYear, endYear):
    data = []
    for year in range(int(beginYear), int(endYear) + 1):
        requestUrl = f"https://www.ncei.noaa.gov/data/global-summary-of-the-day/access/{year}/{usaf}{wban}.csv"
        print(f"Retrieving data for {year}")
        # Read the FRSHTT column as a string: it is a run of zeroes and ones (e.g. "001000")
        # that would otherwise be parsed as an integer, dropping its leading zeroes
        data.append(pd.read_csv(requestUrl, dtype={'FRSHTT': 'string'}))
    return pd.concat(data)
        
gsod = getAllData(usaf=725340, wban=14819, beginYear=1973, endYear=2024)
# How many rows do we have?
print(f"Rows returned: {gsod.shape[0]}")

Retrieving data for 1973
Retrieving data for 1974
Retrieving data for 1975
Retrieving data for 1976
Retrieving data for 1977
Retrieving data for 1978
Retrieving data for 1979
Retrieving data for 1980
Retrieving data for 1981
Retrieving data for 1982
Retrieving data for 1983
Retrieving data for 1984
Retrieving data for 1985
Retrieving data for 1986
Retrieving data for 1987
Retrieving data for 1988
Retrieving data for 1989
Retrieving data for 1990
Retrieving data for 1991
Retrieving data for 1992
Retrieving data for 1993
Retrieving data for 1994
Retrieving data for 1995
Retrieving data for 1996
Retrieving data for 1997
Retrieving data for 1998
Retrieving data for 1999
Retrieving data for 2000
Retrieving data for 2001
Retrieving data for 2002
Retrieving data for 2003
Retrieving data for 2004
Retrieving data for 2005
Retrieving data for 2006
Retrieving data for 2007
Retrieving data for 2008
Retrieving data for 2009
Retrieving data for 2010
Retrieving data for 2011
Retrieving data for 2012
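
One fragility worth noting in `getAllData`: it interpolates `usaf` and `wban` straight into the file name. That works for Midway, but a station whose identifier starts with zeroes would produce the wrong file name if the identifier is passed as an integer. Below is a minimal sketch of a URL builder that pads both parts. It assumes the file name is always a 6-digit USAF followed by a 5-digit WBAN, which matches the Midway URL above; `build_gsod_url` and `GSOD_BASE_URL` are hypothetical names, not part of the notebook.

```python
# Base path taken from the request URL used in getAllData.
GSOD_BASE_URL = "https://www.ncei.noaa.gov/data/global-summary-of-the-day/access"

def build_gsod_url(usaf, wban, year):
    """Build the yearly CSV URL for one station.

    Assumption: the file name is a 6-digit USAF followed by a 5-digit WBAN,
    so shorter identifiers are left-padded with zeroes.
    """
    station_id = f"{str(usaf).zfill(6)}{str(wban).zfill(5)}"
    return f"{GSOD_BASE_URL}/{int(year)}/{station_id}.csv"

# Midway needs no padding, so this reproduces the URL pattern used above.
print(build_gsod_url(725340, 14819, 1973))
```

Inside `getAllData`, the f-string on the `requestUrl` line could be swapped for a call to this helper without changing anything else.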


In [32]:
def cleanData(df):
    df = df[['DATE', 'NAME', 'TEMP', 'MIN', 'MAX', 'PRCP', 'SNDP', 'FRSHTT']]
    df = df[(df['MIN'] < 9999.9) & (df['MAX'] < 9999.9)].copy()
    df['FOG']     = df['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('100000', 2)))
    df['RAIN']    = df['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('010000', 2)))
    df['SNOW']    = df['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('001000', 2)))
    df['HAIL']    = df['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('000100', 2)))
    df['THUNDER'] = df['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('000010', 2)))
    df['TORNADO'] = df['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('000001', 2)))
    df = df.drop(columns=['FRSHTT'])    
    # For the `SNDP` and `PRCP` values, if there is no snow accumulation or rain precipitation 
    # then 999.9 or 99.99 (respectively) may appear as a value. For my purposes, it would be
    # more helpful if these showed up as 0.
    df.loc[df['SNDP'] == 999.9, 'SNDP'] = 0
    df.loc[df['PRCP'] == 99.99, 'PRCP'] = 0
    df['DATE'] = pd.to_datetime(df['DATE'])
    df['YEAR'] = df['DATE'].dt.year
    df['MONTH'] = df['DATE'].dt.month
    df['DAY_OF_MONTH'] = df['DATE'].dt.day
    df['DAY_OF_YEAR'] = df['DATE'].dt.dayofyear
    return df

gsod = cleanData(gsod)
print(gsod.head(10))

        DATE                           NAME  TEMP   MIN   MAX  PRCP  SNDP  \
0 1973-01-01  CHICAGO MIDWAY AIRPORT, IL US  25.2  19.9  30.0  0.00   0.0   
1 1973-01-02  CHICAGO MIDWAY AIRPORT, IL US  22.0  14.0  32.0  0.00   0.0   
2 1973-01-03  CHICAGO MIDWAY AIRPORT, IL US  29.3  21.9  39.0  0.00   0.0   
3 1973-01-04  CHICAGO MIDWAY AIRPORT, IL US  27.0  17.1  41.0  0.63   0.0   
4 1973-01-05  CHICAGO MIDWAY AIRPORT, IL US  11.5   8.1  16.0  1.38   0.0   
5 1973-01-06  CHICAGO MIDWAY AIRPORT, IL US  11.0   6.1  19.0  0.00   0.0   
6 1973-01-07  CHICAGO MIDWAY AIRPORT, IL US  17.7  12.9  23.0  1.50   0.0   
7 1973-01-08  CHICAGO MIDWAY AIRPORT, IL US  22.7  19.9  26.1  0.00   0.0   
8 1973-01-09  CHICAGO MIDWAY AIRPORT, IL US  13.2   8.1  21.0  0.00   0.0   
9 1973-01-10  CHICAGO MIDWAY AIRPORT, IL US   7.4   0.0  19.0  0.00   0.0   

     FOG   RAIN   SNOW   HAIL  THUNDER  TORNADO  YEAR  MONTH  DAY_OF_MONTH  \
0  False  False   True  False    False    False  1973      1             1
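
`cleanData` reads the six `FRSHTT` characters positionally: fog, rain, snow, hail, thunder, tornado. The same decoding can be sketched without the binary round-trip by indexing each character directly. This is an illustrative alternative, not the code that produced the output above. `decode_frshtt` is a hypothetical helper, and it assumes every value is a 6-character string of `0`s and `1`s.

```python
# Flag order follows the bit masks used in cleanData, left to right.
FLAG_NAMES = ("FOG", "RAIN", "SNOW", "HAIL", "THUNDER", "TORNADO")

def decode_frshtt(flags):
    """Map a 6-character FRSHTT string such as "001000" to named booleans."""
    if len(flags) != len(FLAG_NAMES) or set(flags) - {"0", "1"}:
        raise ValueError(f"Expected six 0/1 characters, got {flags!r}")
    return {name: char == "1" for name, char in zip(FLAG_NAMES, flags)}

# The first row of the cleaned data shows SNOW as the only flag set.
print(decode_frshtt("001000"))
```

Raising on malformed input is a design choice for the sketch: it surfaces rows where the column was not read as a 6-character string instead of silently producing wrong flags.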

## Weather Questions

I've got some 🔥BURNING🔥 questions about the trends of weather in Chicago, and especially about our decline into 🥶COLD DAYS🥶. 

1. How often have we had snowfall on Christmas? How much was there?
2. How often has it been above 40℉ on Christmas?
3. How often have we had Decembers where the temperature reaches at least 50℉ for more than one day in a row? How long is that streak?
4. What's the last day in December when the temperature was at least 50℉?
5. For each year, and for each month of the winter, what was the biggest temperature differential in a 2-day period?
6. As we approach winter, it feels like it sharply declines from warm temperatures (> 70℉) to cold temperatures (< 32℉). When does the first sub-freezing (< 32℉) temperature happen each Fall/Winter, and how long before that day was it still warm (> 70℉)?
7. Every year it feels like we have a very short Fall in Chicago, which I would characterize as a time during which the temperatures exist between 32℉ and 70℉. How long do the temperatures exist in this range?
8. When's the last Fall day (after September 22) when the temperature is at least 70℉?
9. When's the first Fall day (after September 22) that the temperature drops to below freezing?
10. When's the first Fall day (after September 22) that the temperature never gets above freezing?

Having grown up in Doylestown, Pennsylvania, I'm used to different winter weather. I also want to compare all of these answers to what has historically happened around my hometown and see if it really is harsher in Chicago or if that's just my imagination.

### 1. How often have we had snowfall on Christmas? How much was there?

In [33]:
gsod[(gsod['MONTH'] == 12) & (gsod['DAY_OF_MONTH'] == 25) & gsod['SNOW'] & (gsod['SNDP'] > 0)][['DATE', 'SNDP']]

           DATE  SNDP
358  1977-12-25   2.0
358  1978-12-25   0.8
358  2002-12-25   3.1
358  2005-12-25   1.2
359  2008-12-25   5.1
358  2009-12-25   1.2
358  2010-12-25   7.1
358  2017-12-25   2.0


Out of roughly 50 years of data, only 8 Christmases had both a snow report and snow on the ground (`SNOW` set and `SNDP` > 0)!

The deepest snow cover came in 2010 (7.1 inches) and 2008 (5.1 inches). We haven't had snow on Christmas since 2017.

### 2. How often has it been above 40℉ on Christmas?

In [34]:
gsod[(gsod['MONTH'] == 12) & (gsod['DAY_OF_MONTH'] == 25) & (gsod['MAX'] > 40)][['DATE', 'MAX']]

           DATE   MAX
358  1973-12-25  44.6
358  1978-12-25  41.0
358  1979-12-25  52.0
358  1982-12-25  62.1
358  1987-12-25  46.9
355  1991-12-25  42.1
358  1994-12-25  50.9
358  2006-12-25  43.0
358  2007-12-25  42.1
358  2009-12-25  43.0


### 3. How often have we had Decembers where the temperature reaches at least 50℉ for more than one day in a row? How long is that streak?

In [35]:
min_temp = 50
warm_days_by_year = gsod[(gsod['MONTH'] == 12) & (gsod['MAX'] >= min_temp)].groupby('YEAR')
for year, data in warm_days_by_year:
    longest_streak = 1
    current_streak = 1
    for i in range(1, len(data)):
        if data.iloc[i]['DAY_OF_MONTH'] == data.iloc[i - 1]['DAY_OF_MONTH'] + 1:
            current_streak += 1
        else:
            longest_streak = current_streak
            current_streak = 1
    print(f"Longest streak of >={min_temp}℉ for {year}: {longest_streak} days")

Longest streak of >=50℉ for 1973: 1 days
Longest streak of >=50℉ for 1975: 3 days
Longest streak of >=50℉ for 1976: 1 days
Longest streak of >=50℉ for 1977: 1 days
Longest streak of >=50℉ for 1979: 3 days
Longest streak of >=50℉ for 1980: 1 days
Longest streak of >=50℉ for 1982: 3 days
Longest streak of >=50℉ for 1984: 1 days
Longest streak of >=50℉ for 1985: 1 days
Longest streak of >=50℉ for 1987: 1 days
Longest streak of >=50℉ for 1988: 1 days
Longest streak of >=50℉ for 1990: 1 days
Longest streak of >=50℉ for 1991: 3 days
Longest streak of >=50℉ for 1992: 1 days
Longest streak of >=50℉ for 1993: 1 days
Longest streak of >=50℉ for 1994: 1 days
Longest streak of >=50℉ for 1995: 1 days
Longest streak of >=50℉ for 1996: 1 days
Longest streak of >=50℉ for 1997: 1 days
Longest streak of >=50℉ for 1998: 1 days
Longest streak of >=50℉ for 1999: 4 days
Longest streak of >=50℉ for 2001: 1 days
Longest streak of >=50℉ for 2002: 2 days
Longest streak of >=50℉ for 2003: 1 days
Longest streak o
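
One caveat about the loop above: `longest_streak` is overwritten each time a run ends, and a run that is still going at the last row is never recorded, so the printed values can undercount. A December whose only warm days are the 1st, 2nd and 3rd would report 1. Below is a minimal sketch of a running-maximum version, written as a standalone helper that takes the sorted day-of-month values for one December. `longest_consecutive_run` is a hypothetical name, not something from the notebook.

```python
def longest_consecutive_run(days):
    """Return the length of the longest run of consecutive integers.

    `days` is assumed to be sorted ascending with no duplicates, e.g. the
    DAY_OF_MONTH values of one year's warm December days.
    """
    if not days:
        return 0
    longest = current = 1
    for previous, today in zip(days, days[1:]):
        # Extend the run if the days are adjacent, otherwise start a new one.
        current = current + 1 if today == previous + 1 else 1
        # Track the maximum on every step so a run at the end still counts.
        longest = max(longest, current)
    return longest

print(longest_consecutive_run([1, 2, 3, 10, 20]))
```

Within the per-year loop this could be called as `longest_consecutive_run(data['DAY_OF_MONTH'].tolist())`. Re-running the cell with it may change some of the streak lengths shown above.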

### 4. What's the last day in December when the temperature was at least 50℉?

In [36]:
print(f"Last day in December with max temperature >= {min_temp}℉")
for year, data in warm_days_by_year:
    print(f"{data.iloc[-1]['YEAR']}, December {data.iloc[-1]['DAY_OF_MONTH']}: {data.iloc[-1]['MAX']}℉")

Last day in December with max temperature >= 50℉
1973, December 5: 55.4℉
1975, December 15: 66.0℉
1976, December 20: 54.0℉
1977, December 18: 60.1℉
1979, December 25: 52.0℉
1980, December 8: 53.1℉
1982, December 28: 61.0℉
1984, December 29: 64.9℉
1985, December 1: 50.0℉
1987, December 9: 54.0℉
1988, December 20: 57.0℉
1990, December 29: 50.0℉
1991, December 13: 60.1℉
1992, December 31: 52.9℉
1993, December 10: 52.0℉
1994, December 27: 52.0℉
1995, December 3: 57.9℉
1996, December 24: 50.0℉
1997, December 19: 51.8℉
1998, December 19: 51.8℉
1999, December 9: 53.6℉
2001, December 6: 64.4℉
2002, December 31: 55.9℉
2003, December 10: 55.4℉
2004, December 31: 57.2℉
2006, December 31: 59.0℉
2007, December 23: 52.0℉
2008, December 28: 54.0℉
2009, December 2: 50.0℉
2010, December 31: 55.9℉
2011, December 19: 51.1℉
2012, December 20: 50.0℉
2013, December 28: 51.1℉
2014, December 27: 51.1℉
2015, December 24: 57.9℉
2016, December 26: 55.0℉
2017, December 19: 51.1℉
2018, December 28: 54.0℉
2019, Dec

### 5. For each year, and for each month of the winter, what was the biggest temperature differential in a 2-day period?

In [37]:
# First let's get just the data for winter months, which we'll consider to be December - March
winter = gsod[gsod['MONTH'].isin([12, 1, 2, 3])].groupby(['YEAR', 'MONTH'])
for (year, month), group in winter:
    max_positive_diff = 0
    max_negative_diff = 0
    for i in range(1, len(group)):
        if group.iloc[i]['MAX'] - group.iloc[i-1]['MIN'] > max_positive_diff:
            max_positive_diff = group.iloc[i]['MAX'] - group.iloc[i-1]['MIN']
        if group.iloc[i]['MIN'] - group.iloc[i-1]['MAX'] < max_negative_diff:
            max_negative_diff = group.iloc[i]['MIN'] - group.iloc[i-1]['MAX']
    print(f"{month}/{year}")
    print(f"Max positive differential: {max_positive_diff}")
    print(f"Max negative differential: {max_negative_diff}")

1/1973
Max positive differential: 30.1
Max negative differential: -32.9
2/1973
Max positive differential: 23.9
Max negative differential: -29.0
3/1973
Max positive differential: 35.099999999999994
Max negative differential: -30.0
12/1973
Max positive differential: 34.9
Max negative differential: -32.4
1/1974
Max positive differential: 43.6
Max negative differential: -36.4
2/1974
Max positive differential: 41.0
Max negative differential: -34.0
3/1974
Max positive differential: 44.099999999999994
Max negative differential: -41.9
12/1974
Max positive differential: 31.200000000000003
Max negative differential: -27.9
1/1975
Max positive differential: 27.0
Max negative differential: -58.2
2/1975
Max positive differential: 38.0
Max negative differential: -35.1
3/1975
Max positive differential: 41.1
Max negative differential: -37.900000000000006
12/1975
Max positive differential: 39.900000000000006
Max negative differential: -43.0
1/1976
Max positive differential: 36.0
Max negative differentia
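
The comparison in the loop above pairs each day's `MAX` with the previous day's `MIN` (the biggest warm-up) and each day's `MIN` with the previous day's `MAX` (the biggest cool-down). Here is a minimal sketch of the same comparison as a standalone function, with rounding added so values like `35.099999999999994` print as `35.1`. `two_day_swings` is a hypothetical helper that takes plain lists, one entry per day in date order. Like the loop above, it only compares days within the list it is given, so a swing that straddles two months is missed when the data is grouped by month.

```python
def two_day_swings(mins, maxes, ndigits=1):
    """Return (biggest_rise, biggest_drop) between consecutive days.

    biggest_rise: largest value of today's MAX minus yesterday's MIN.
    biggest_drop: most negative value of today's MIN minus yesterday's MAX.
    Both start at 0, mirroring the loop in the notebook.
    """
    biggest_rise = 0.0
    biggest_drop = 0.0
    for i in range(1, len(mins)):
        biggest_rise = max(biggest_rise, maxes[i] - mins[i - 1])
        biggest_drop = min(biggest_drop, mins[i] - maxes[i - 1])
    # Round to hide binary floating-point noise in the printed result.
    return round(biggest_rise, ndigits), round(biggest_drop, ndigits)

# MIN and MAX for January 1-5, 1973, copied from the head() output earlier.
jan_mins = [19.9, 14.0, 21.9, 17.1, 8.1]
jan_maxes = [30.0, 32.0, 39.0, 41.0, 16.0]
print(two_day_swings(jan_mins, jan_maxes))
```

For a group from the `winter` groupby, the inputs would be `group['MIN'].tolist()` and `group['MAX'].tolist()`.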

### First Fall day where the minimum temperature was < 32℉

In [38]:
minFreezingDays = gsod[(gsod['MIN'] < 32) & (gsod['MONTH'] > 8)]
# Group by YEAR and get the first DAY_OF_MONTH and MONTH values for each group
result1 = minFreezingDays.groupby('YEAR').agg({'MONTH': 'first', 'DAY_OF_MONTH': 'first'}).reset_index()

print(result1)

    YEAR  MONTH  DAY_OF_MONTH
0   1973     11             5
1   1974     10             2
2   1975     11            13
3   1976     10            22
4   1977     11            11
5   1978     10            24
6   1979     10            26
7   1980     10             5
8   1981     10            23
9   1982     10            23
10  1983     11             5
11  1984     11             2
12  1985     11            21
13  1986     11             2
14  1987     10            12
15  1988     10            13
16  1989     11             3
17  1990     11             8
18  1991     11             2
19  1992     10            19
20  1993     10            31
21  1994     11            19
22  1995     11             3
23  1996     10            31
24  1997     10            28
25  1998     11             7
26  1999     10            24
27  2000     11            11
28  2001     11            20
29  2002     11             1
30  2003     11             7
31  2004     10            17
32  2005  

### First Fall day where the maximum temperature was < 32℉ (the whole day was below freezing)

In [39]:
maxFreezingDays = gsod[(gsod['MAX'] < 32) & (gsod['MONTH'] > 8)]
# Group by YEAR and get the first DAY_OF_MONTH and MONTH values for each group
result2 = maxFreezingDays.groupby('YEAR').agg({'MONTH': 'first', 'DAY_OF_MONTH': 'first'}).reset_index()

print(result2)

    YEAR  MONTH  DAY_OF_MONTH
0   1973     12            11
1   1974     12            18
2   1975     12             1
3   1976     11            29
4   1977     11            27
5   1978     12             9
6   1979     12             1
7   1980     12             3
8   1981     11            21
9   1982     11            13
10  1983     11            30
11  1984     12             4
12  1985     12             2
13  1986     11            12
14  1987     12            16
15  1988     12             1
16  1989     11            18
17  1990     12            23
18  1991     11             3
19  1992     12             5
20  1993     11            29
21  1994     12            11
22  1995     11            12
23  1996     11            12
24  1997     11            16
25  1998     12            22
26  1999     12            21
27  2000     11            21
28  2001     12            24
29  2002     11            25
30  2003     12            12
31  2004     12            14
32  2005  

### Last Fall day where the maximum temperature was > 70℉

In [40]:
maxWarmDays = gsod[(gsod['MAX'] > 70) & (gsod['MONTH'] > 8)]
# Group by YEAR and get the last DAY_OF_MONTH and MONTH values for each group
result3 = maxWarmDays.groupby('YEAR').agg({'MONTH': 'last', 'DAY_OF_MONTH': 'last'}).reset_index()

print(result3)

    YEAR  MONTH  DAY_OF_MONTH
0   1973     10            25
1   1974     11             3
2   1975     11            18
3   1976     10            13
4   1977     11             4
5   1978     11             6
6   1979     11             1
7   1980     10            16
8   1981     10            30
9   1982     10             8
10  1983     10            28
11  1984     10            28
12  1985     10            24
13  1986     10            22
14  1987     11             3
15  1988     10            16
16  1989     11            14
17  1990     11            15
18  1991     10            29
19  1992     10            24
20  1993     10            25
21  1994     10            22
22  1995     10            23
23  1996     10            17
24  1997     10            13
25  1998     10            27
26  1999     11            18
27  2000     11             1
28  2001     11             7
29  2002     10            12
30  2003     10            30
31  2004     10            30
32  2005  
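
The three tables above hold the pieces needed for the "how short is Fall?" question: the last > 70℉ day and the first < 32℉ day for each year. Below is a minimal sketch of how the gap between them could be computed from the month and day values. `days_from_last_warm_to_first_freeze` is a hypothetical helper, and the sample values are copied from the 1973 and 1974 rows printed above.

```python
from datetime import date

def days_from_last_warm_to_first_freeze(year, last_warm, first_freeze):
    """Days between the last warm day and the first freezing day of a year.

    `last_warm` and `first_freeze` are (month, day) tuples. A negative result
    means the first freeze arrived before the last warm day.
    """
    return (date(year, *first_freeze) - date(year, *last_warm)).days

# 1973: last > 70℉ day was Oct 25, first MIN < 32℉ day was Nov 5.
print(days_from_last_warm_to_first_freeze(1973, (10, 25), (11, 5)))
# 1974: last > 70℉ day was Nov 3, first MIN < 32℉ day was Oct 2.
print(days_from_last_warm_to_first_freeze(1974, (11, 3), (10, 2)))
```

The 1974 case is worth a look: the first freeze came before the last 70℉ day, so a simple "last warm to first freeze" definition of Fall can come out negative. Applied to the notebook's data, the inputs would come from joining `result1` and `result3` on `YEAR`.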