# Exploring Chicago Weather

Based on my work in the "Exploring NOAA GSOD Weather Stations" Jupyter notebook, I'll now retrieve data just for the Chicago Midway Airport weather station and do some analysis on that data. I'll also merge in cloudcover data I retrieved from Visual Crossing. 

Based on the prior work, I determined that the Chicago Midway Airport weather station has the unique identifiers of `USAF` = 725340, `WBAN` = 14819 and that there are data files for years 1973 through 2024. To match the data from Visual Crossing, I'll only retrieve years 2000 to 2024.

In [1]:
import pandas as pd

In [2]:
# This cloud cover data is retrieved from https://www.visualcrossing.com/weather/weather-data-services
cloud_cover = pd.read_csv('chicago, il 2000-01-01 to 2024-05-14 - CLOUD COVER.csv')
print(f"Retrieved {len(cloud_cover)} records.")
cloud_cover['datetime'] = pd.to_datetime(cloud_cover['datetime'])
cloud_cover = cloud_cover.rename(columns={'datetime': 'date'})
# Let's see what the last 100 days with less than 50% cloud cover were
cloud_cover[cloud_cover['cloudcover'] < 50][-100:]

Retrieved 8901 records.


Unnamed: 0,date,cloudcover
8574,2023-06-23,48.1
8575,2023-06-24,29.3
8584,2023-07-03,39.6
8585,2023-07-04,33.5
8590,2023-07-09,31.9
...,...,...
8881,2024-04-25,40.7
8886,2024-04-30,27.6
8894,2024-05-08,34.5
8897,2024-05-11,41.8


In [3]:
def getAllDataGSOD(usaf, wban, beginYear, endYear):
    data = []
    for year in range(int(beginYear), int(endYear) + 1):
        requestUrl = f"https://www.ncei.noaa.gov/data/global-summary-of-the-day/access/{year}/{usaf}{wban}.csv"
        print(f"Retrieving data for {year}")
        # We need to cast FRSHTT column as a string because it looks like a string of zeroes and ones 
        # and would otherwise be interpreted as an integer
        data.append(pd.read_csv(requestUrl, dtype={'FRSHTT': 'string'}))
    return pd.concat(data)
        
gsod = getAllDataGSOD(usaf = 725340, wban = 14819, beginYear = 2000, endYear = 2024)
# How many rows do we have?
print(f"Rows returned: {gsod.shape[0]}")

Retrieving data for 2000
Retrieving data for 2001
Retrieving data for 2002
Retrieving data for 2003
Retrieving data for 2004
Retrieving data for 2005
Retrieving data for 2006
Retrieving data for 2007
Retrieving data for 2008
Retrieving data for 2009
Retrieving data for 2010
Retrieving data for 2011
Retrieving data for 2012
Retrieving data for 2013
Retrieving data for 2014
Retrieving data for 2015
Retrieving data for 2016
Retrieving data for 2017
Retrieving data for 2018
Retrieving data for 2019
Retrieving data for 2020
Retrieving data for 2021
Retrieving data for 2022
Retrieving data for 2023
Retrieving data for 2024
Rows returned: 8900


## Data Cleaning

Now let's apply the same data cleaning we applied in the "Exploring NOAA GSOD Weather Stations" Jupyter notebook.

In [4]:
# I only care about these basic parameters
gsod = gsod[['DATE', 'NAME', 'TEMP', 'MIN', 'MAX', 'PRCP', 'SNDP', 'FRSHTT']]
gsod = gsod.rename(columns={'DATE': 'date', 
                            'NAME': 'station_name', 
                            'TEMP': 'avg_temp', 
                            'MIN': 'min_temp', 
                            'MAX': 'max_temp',
                            'PRCP': 'precipitation_depth',
                            'SNDP': 'snow_depth'})
# Ignore any observations where the temperature min/max was invalid
gsod = gsod[(gsod['min_temp'] < 9999.9) & (gsod['max_temp'] < 9999.9)].copy()

# FRSHTT is a binary bit mask indicating presence of Fog, Rain, Snow, Hail, Thunder, and Tornado
# gsod['fog']     = gsod['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('100000', 2)))
gsod['rain']    = gsod['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('010000', 2)))
gsod['snow']    = gsod['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('001000', 2)))
# gsod['hail']    = gsod['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('000100', 2)))
# gsod['thunder'] = gsod['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('000010', 2)))
# gsod['tornado'] = gsod['FRSHTT'].apply(lambda s: bool(int(s, 2) & int('000001', 2)))
gsod = gsod.drop(columns=['FRSHTT'])

# For the `SNDP` and `PRCP` values, if there is no snow accumulation or rain precipitation 
# then 999.9 or 99.99 (respectively) may appear as a value. For my purposes, it would be
# more helpful if these showed up as 0.
gsod.loc[gsod['snow_depth'] == 999.9, 'snow_depth'] = 0
gsod.loc[gsod['precipitation_depth'] == 99.99, 'precipitation_depth'] = 0

# The DATE Field is a string and needs to be parsed, and then it will be helpful for filtering
# to break out the components of the date into separate fields.
gsod['date'] = pd.to_datetime(gsod['date'])
gsod['year'] = gsod['date'].dt.year
gsod['month'] = gsod['date'].dt.month
gsod['day_of_month'] = gsod['date'].dt.day
gsod['day_of_year'] = gsod['date'].dt.dayofyear

# Let's merge in the cloudcover data
merged_df = pd.merge(gsod, cloud_cover[['date', 'cloudcover']], on='date', how='left')
gsod = merged_df

print(gsod.head(10))

        date                   station_name  avg_temp  min_temp  max_temp  \
0 2000-01-01  CHICAGO MIDWAY AIRPORT, IL US      40.5      34.0      50.0   
1 2000-01-02  CHICAGO MIDWAY AIRPORT, IL US      50.5      43.0      61.0   
2 2000-01-03  CHICAGO MIDWAY AIRPORT, IL US      38.7      37.0      45.0   
3 2000-01-04  CHICAGO MIDWAY AIRPORT, IL US      30.6      26.6      37.4   
4 2000-01-05  CHICAGO MIDWAY AIRPORT, IL US      21.8      15.8      26.6   
5 2000-01-06  CHICAGO MIDWAY AIRPORT, IL US      31.3      25.0      43.0   
6 2000-01-07  CHICAGO MIDWAY AIRPORT, IL US      27.8      23.0      34.0   
7 2000-01-08  CHICAGO MIDWAY AIRPORT, IL US      34.4      26.6      46.4   
8 2000-01-09  CHICAGO MIDWAY AIRPORT, IL US      39.1      37.0      41.0   
9 2000-01-10  CHICAGO MIDWAY AIRPORT, IL US      43.1      41.0      46.9   

   precipitation_depth  snow_depth   rain   snow  year  month  day_of_month  \
0                 0.00         0.0  False  False  2000      1            

## Weather Questions

I've got some 🔥BURNING🔥 questions about the trends of weather in Chicago, and especially about our decline into 🥶COLD DAYS🥶. 

1. How often have we had snowfall on Christmas? How much was there?
2. How often has it been above 40℉ on Christmas?
3. How often have we had Decembers where the temperature reaches at least 50℉ for more than one day in a row? How long is that streak?
4. What's the last day in December when the temperature was at least 50℉?
5. For each year, and for each month of the winter, what was the biggest temperature differential in a 2-day period?
6. As we approach winter, it feels like it sharply declines from warm temperatures (> 70℉) to cold temperatures (< 32℉). When does the first sub-freezing (< 32℉) temperature happen each Fall/Winter, and how long before that day was it still warm (> 70℉)?
7. Every year it feels like we have a very short Fall in Chicago, which I would characterize as a time during which the temperatures exist between 32℉ and 70℉. How long do the temperatures exist in this range?
8. When's the last Fall day (after September 22) when the temperature is at least 70℉?
9. When's the first Fall day (after September 22) that the temperature drops to below freezing?
10. When's the first Fall day (after September 22) that the temperature never gets above freezing?

Having grown up in Doylestown, Pennsylvania I'm used to different winter weather. I want to also compare all of these answers to what has historically happened around my hometown and see if really is harsher in Chicago or if that's just my imagination.

### 1. How often have we had snowfall on Christmas? How much was there?

In [6]:
gsod[(gsod['month'] == 12) & (gsod['day_of_month'] == 25) & gsod['snow'] & (gsod['snow_depth'] > 0)][['date', 'snow_depth']]

Unnamed: 0,date,snow_depth
1089,2002-12-25,3.1
2185,2005-12-25,1.2
3280,2008-12-25,5.1
3645,2009-12-25,1.2
4010,2010-12-25,7.1
6566,2017-12-25,2.0


Out of 50 years, we've only had 8 years where there was accumulated snow on Christmas! 

In 2010 and 2008 we had very significant snowfall on Christmas. We haven't had snow on Christmas since 2017. 

### 2. How often has it been above 40℉ on Christmas?

In [7]:
gsod[(gsod['month'] == 12) & (gsod['day_of_month'] == 25) & (gsod['max_temp'] > 40)][['date', 'max_temp']]

Unnamed: 0,date,max_temp
2550,2006-12-25,43.0
2915,2007-12-25,42.1
3645,2009-12-25,43.0
4375,2011-12-25,45.0
5471,2014-12-25,44.1
5836,2015-12-25,46.0
6931,2018-12-25,46.9
7296,2019-12-25,61.0
8027,2021-12-25,59.0
8757,2023-12-25,60.1


### 3. How often have we had Decembers where the temperature reaches at least 50℉ for more than one day in a row? How long is that streak?

In [9]:
min_temp = 50
warm_days_by_year = gsod[(gsod['month'] == 12) & (gsod['max_temp'] >= min_temp)].groupby('year')
for year, data in warm_days_by_year:
    longest_streak = 1
    current_streak = 1
    for i in range(1, len(data)):
        if data.iloc[i]['day_of_month'] == data.iloc[i - 1]['day_of_month'] + 1:
            current_streak += 1
        else:
            longest_streak = current_streak
            current_streak = 1
    print(f"Longest streak of >={min_temp}℉ for December {year}: {longest_streak} days")

Longest streak of >=50℉ for December 2001: 1 days
Longest streak of >=50℉ for December 2002: 2 days
Longest streak of >=50℉ for December 2003: 1 days
Longest streak of >=50℉ for December 2004: 2 days
Longest streak of >=50℉ for December 2006: 2 days
Longest streak of >=50℉ for December 2007: 1 days
Longest streak of >=50℉ for December 2008: 1 days
Longest streak of >=50℉ for December 2009: 1 days
Longest streak of >=50℉ for December 2010: 1 days
Longest streak of >=50℉ for December 2011: 2 days
Longest streak of >=50℉ for December 2012: 3 days
Longest streak of >=50℉ for December 2013: 3 days
Longest streak of >=50℉ for December 2014: 1 days
Longest streak of >=50℉ for December 2015: 1 days
Longest streak of >=50℉ for December 2016: 1 days
Longest streak of >=50℉ for December 2017: 5 days
Longest streak of >=50℉ for December 2018: 4 days
Longest streak of >=50℉ for December 2019: 6 days
Longest streak of >=50℉ for December 2020: 3 days
Longest streak of >=50℉ for December 2021: 6 days


### 4. What's the last day in December when the temperature was at least 50℉?

In [10]:
print(f"Last day in December with max temperature >= {min_temp}℉")
for year, data in warm_days_by_year:
    print(f"{data.iloc[-1]['year']}, December {data.iloc[-1]['day_of_month']}: {data.iloc[-1]['max_temp']}℉")

Last day in December with max temperature >= 50℉
2001, December 6: 64.4℉
2002, December 31: 55.9℉
2003, December 10: 55.4℉
2004, December 31: 57.2℉
2006, December 31: 59.0℉
2007, December 23: 52.0℉
2008, December 28: 54.0℉
2009, December 2: 50.0℉
2010, December 31: 55.9℉
2011, December 19: 51.1℉
2012, December 20: 50.0℉
2013, December 28: 51.1℉
2014, December 27: 51.1℉
2015, December 24: 57.9℉
2016, December 26: 55.0℉
2017, December 19: 51.1℉
2018, December 28: 54.0℉
2019, December 30: 54.0℉
2020, December 24: 52.0℉
2021, December 25: 59.0℉
2022, December 30: 57.0℉
2023, December 26: 55.9℉


### 5. For each year, and for each month of the winter, what was the biggest temperature differential in a 2-day period?

In [11]:
# First let's get just the data for winter months, which we'll consider to be December - March
winter = gsod[gsod['month'].isin([12, 1, 2, 3])].groupby(['year', 'month'])

for (year, month), group in winter:
    max_positive_diff = 0
    max_negative_diff = 0
    for i in range(1, len(group)):
        if group.iloc[i]['max_temp'] - group.iloc[i-1]['min_temp'] > max_positive_diff:
            max_positive_diff = group.iloc[i]['max_temp'] - group.iloc[i-1]['min_temp']
        if group.iloc[i]['min_temp'] - group.iloc[i-1]['max_temp'] < max_negative_diff:
            max_negative_diff = group.iloc[i]['min_temp'] - group.iloc[i-1]['max_temp']
    print(f"In any 2-day period, the biggest temperature increase in {month}/{year} was +{max_positive_diff} F.")
    print(f"In any 2-day period, the biggest temperature decrease in {month}/{year} was {max_negative_diff} F.")

In any 2-day period, the biggest temperature increase in 1/2000 was +28.3 F.
In any 2-day period, the biggest temperature decrease in 1/2000 was -30.4 F.
In any 2-day period, the biggest temperature increase in 2/2000 was +37.8 F.
In any 2-day period, the biggest temperature decrease in 2/2000 was -25.9 F.
In any 2-day period, the biggest temperature increase in 3/2000 was +40.9 F.
In any 2-day period, the biggest temperature decrease in 3/2000 was -45.400000000000006 F.
In any 2-day period, the biggest temperature increase in 12/2000 was +32.4 F.
In any 2-day period, the biggest temperature decrease in 12/2000 was -32.6 F.
In any 2-day period, the biggest temperature increase in 1/2001 was +24.1 F.
In any 2-day period, the biggest temperature decrease in 1/2001 was -19.6 F.
In any 2-day period, the biggest temperature increase in 2/2001 was +36.4 F.
In any 2-day period, the biggest temperature decrease in 2/2001 was -39.6 F.
In any 2-day period, the biggest temperature increase in 3/2

## First Fall day where the minimum temperature was < 32 F

In [13]:
minFreezingDays = gsod[(gsod['min_temp'] < 32) & (gsod['month'] > 8)]
# Group by year and get the first day_of_month and month values for each group
result1 = minFreezingDays.groupby('year').agg({'month': 'first', 'day_of_month': 'first'}).reset_index()

print(result1)

    year  month  day_of_month
0   2000     11            11
1   2001     11            20
2   2002     11             1
3   2003     11             7
4   2004     10            17
5   2005     11            10
6   2006     10            24
7   2007     11             7
8   2008     11            10
9   2009     10            11
10  2010     11             6
11  2011     11            11
12  2012     11             6
13  2013     10            22
14  2014     11             2
15  2015     11            14
16  2016     11            20
17  2017     11             8
18  2018     10            21
19  2019     11             1
20  2020     10            30
21  2021     11             3
22  2022     11            13
23  2023     10            31


## First day where the maximum temperature was < 32 F (the whole day was below freezing)

In [14]:
maxFreezingDays = gsod[(gsod['max_temp'] < 32) & (gsod['month'] > 8)]
# Group by year and get the first day_of_month and month values for each group
result2 = maxFreezingDays.groupby('year').agg({'month': 'first', 'day_of_month': 'first'}).reset_index()

print(result2)

    year  month  day_of_month
0   2000     11            21
1   2001     12            24
2   2002     11            25
3   2003     12            12
4   2004     12            14
5   2005     11            17
6   2006     12             2
7   2007     11            23
8   2008     11            21
9   2009     12            10
10  2010     12             1
11  2011     12            10
12  2013     11            24
13  2014     11            13
14  2015     11            22
15  2016     12             8
16  2017     12            24
17  2018     11            13
18  2019     11            12
19  2020     12            25
20  2021     12             7
21  2022     11            19
22  2023     11            28


## Last Fall day where the maximum temperature was > 70 F

In [15]:
maxWarmDays = gsod[(gsod['max_temp'] > 70) & (gsod['month'] > 8)]
# Group by YEAR and get the first DAY_OF_MONTH and MONTH values for each group
result3 = maxWarmDays.groupby('year').agg({'month': 'last', 'day_of_month': 'last'}).reset_index()

print(result3)

    year  month  day_of_month
0   2000     11             1
1   2001     11             7
2   2002     10            12
3   2003     10            30
4   2004     10            30
5   2005     10            18
6   2006     11             9
7   2007     10            22
8   2008     11             5
9   2009     11             8
10  2010     10            26
11  2011     11            13
12  2012     12             3
13  2013     10            12
14  2014     10            28
15  2015     11             5
16  2016     11            17
17  2017     10            22
18  2018     10            10
19  2019     10            11
20  2020     11            10
21  2021     10            20
22  2022     11            10
23  2023     11             6
