# Cloud Cover Data

I'm investigating how climate affects mood, especially in the winter, and I know that sunlight has a big impact on my own experience with Seasonal Affective Disorder (SAD). The best proxy I've found for possible daily exposure to sunlight is [cloud cover](https://en.wikipedia.org/wiki/Cloud_cover). A helpful visual reference with photos can be found [here](https://www.eoas.ubc.ca/courses/atsc113/flying/met_concepts/01-met_concepts/01c-cloud_coverage/index.html).

The NOAA GSOD dataset does not include cloud cover data, so I got it from [Visual Crossing](https://www.visualcrossing.com/resources/documentation/weather-data/where-can-i-find-historical-cloud-and-visibility-data/). I had to create an account with a credit card to retrieve the cloud cover values for January 1, 2000 to the present (May 14, 2024 as of this writing). I downloaded the data as a CSV file that contains only the date and the cloud cover value, which ranges from 0 to 100, where 0 is completely clear skies and 100 is completely overcast all day.

In [3]:
import pandas as pd

# This cloud cover data is retrieved from https://www.visualcrossing.com/weather/weather-data-services
df = pd.read_csv('chicago, il 2000-01-01 to 2024-05-14 - CLOUD COVER.csv')
print(f"Retrieved {len(df)} records.")
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.rename(columns={'datetime': 'date'})
df[-100:]

Retrieved 8901 records.


Unnamed: 0,date,cloudcover
8801,2024-02-05,79.2
8802,2024-02-06,79.0
8803,2024-02-07,78.4
8804,2024-02-08,76.5
8805,2024-02-09,27.2
...,...,...
8896,2024-05-10,71.0
8897,2024-05-11,41.8
8898,2024-05-12,44.5
8899,2024-05-13,70.9


These numbers are hard to interpret at a glance, so let's simplify them using the standard meteorological unit for cloud cover, the [okta](https://en.wikipedia.org/wiki/Okta), which counts roughly how many eighths of the sky are covered by clouds. A more precise table of this correspondence can be found [here](https://www.researchgate.net/figure/Calculation-of-cloudiness-in-percentage-for-corresponding-okta-values_tbl1_331176763), with symbols available [here](https://en.wikipedia.org/wiki/Okta#Unicode). I'll reproduce it below:

| Cloud Cover Range (%) | Okta | Symbol |
| --------------------- | ---- | ------ |
| 0                     | 0    | ◯      |
| (0, 18.75)            | 1    | ⌽      |
| [18.75, 31.25)        | 2    | ◔      |
| [31.25, 43.75)        | 3    | ◔      |
| [43.75, 56.25)        | 4    | ◑      |
| [56.25, 68.75)        | 5    | ◑      |
| [68.75, 81.25)        | 6    | ◕      |
| [81.25, 100)          | 7    | ◕      |
| 100                   | 8    | ⬤      |


In [4]:
def get_emoji(cloudcover):
    """Map a cloud cover percentage (0-100) to its okta symbol; return '' for anything else."""
    if cloudcover == 0:
        return '◯'
    elif 0 < cloudcover < 18.75:
        return '⌽'
    elif 18.75 <= cloudcover < 31.25:
        return '◔'
    elif 31.25 <= cloudcover < 43.75:
        return '◔'
    elif 43.75 <= cloudcover < 56.25:
        return '◑'
    elif 56.25 <= cloudcover < 68.75:
        return '◑'
    elif 68.75 <= cloudcover < 81.25:
        return '◕'
    elif 81.25 <= cloudcover < 100:
        return '◕'
    elif cloudcover == 100:
        return '⬤'
    else:
        return ''

df['emoji'] = df['cloudcover'].apply(get_emoji)

df[-100:]


Unnamed: 0,date,cloudcover,emoji
8801,2024-02-05,79.2,◕
8802,2024-02-06,79.0,◕
8803,2024-02-07,78.4,◕
8804,2024-02-08,76.5,◕
8805,2024-02-09,27.2,◔
...,...,...,...
8896,2024-05-10,71.0,◕
8897,2024-05-11,41.8,◔
8898,2024-05-12,44.5,◑
8899,2024-05-13,70.9,◕
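
The `if`/`elif` chain in `get_emoji` returns only the symbol, and several okta values share a symbol. As a possible complement, here is a minimal sketch that computes the numeric okta value for a whole column at once with `pd.cut`. The helper name `to_okta`, the `BIN_EDGES` constant, and the sample values are my own assumptions for illustration; the bin edges are copied from the table above.

```python
import pandas as pd

# Left edge of each bin for okta 1-7, plus the closing edge at 100 (from the table above).
BIN_EDGES = [0, 18.75, 31.25, 43.75, 56.25, 68.75, 81.25, 100]

def to_okta(cloudcover: pd.Series) -> pd.Series:
    """Map cloud cover percentages to okta values 0-8 (hypothetical helper)."""
    # right=False makes every bin half-open, [left, right), so codes 0..6 cover [0, 100).
    okta = pd.cut(cloudcover, bins=BIN_EDGES, right=False, labels=False) + 1
    # Exactly 0 and exactly 100 get their own okta values, as in the table.
    okta = okta.mask(cloudcover == 0, 0).mask(cloudcover == 100, 8)
    # Missing values or values outside 0-100 stay <NA> in the nullable integer dtype.
    return okta.astype("Int64")

# Made-up sample values, chosen to sit on or near the bin boundaries.
sample = pd.Series([0, 18.6, 18.75, 44.5, 79.2, 99.9, 100])
print(to_okta(sample).tolist())
```

In the notebook this could be applied as `df['okta'] = to_okta(df['cloudcover'])`, which keeps the number available for grouping or counting alongside the symbol.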


In [6]:
# Filter for the month of May in each year from 2000 to 2023
df_may = df[(df['date'].dt.month == 5) & (df['date'].dt.year >= 2000) & (df['date'].dt.year <= 2023)]

# Filter rows where cloudcover <= 50
filtered_df = df_may[df_may['cloudcover'] <= 50]

# Identify streaks of at least two consecutive dates
streaks = []
current_streak = []

for i in range(len(filtered_df)):
    if i == 0 or (filtered_df.iloc[i]['date'] - filtered_df.iloc[i-1]['date']).days == 1:
        current_streak.append(filtered_df.iloc[i])
    else:
        if len(current_streak) >= 2:
            streaks.append(pd.DataFrame(current_streak))
        current_streak = [filtered_df.iloc[i]]

if len(current_streak) >= 2:
    streaks.append(pd.DataFrame(current_streak))

# Print each streak
for i, streak in enumerate(streaks):
    print(f"Streak {i+1}:\n", streak)

Streak 1:
           date  cloudcover emoji
122 2000-05-02        18.6     ⌽
123 2000-05-03        35.9     ◔
Streak 2:
           date  cloudcover emoji
134 2000-05-14        22.4     ◔
135 2000-05-15        40.6     ◔
Streak 3:
           date  cloudcover emoji
144 2000-05-24        46.3     ◑
145 2000-05-25        20.9     ◔
Streak 4:
           date  cloudcover emoji
493 2001-05-08        35.8     ◔
494 2001-05-09        43.0     ◔
Streak 5:
           date  cloudcover emoji
497 2001-05-12        38.4     ◔
498 2001-05-13        17.5     ⌽
Streak 6:
           date  cloudcover emoji
853 2002-05-03         9.4     ⌽
854 2002-05-04         4.3     ⌽
855 2002-05-05        32.6     ◔
Streak 7:
           date  cloudcover emoji
871 2002-05-21        18.9     ◔
872 2002-05-22        32.6     ◔
Streak 8:
            date  cloudcover emoji
1236 2003-05-21        37.7     ◔
1237 2003-05-22        39.1     ◔
Streak 9:
            date  cloudcover emoji
1241 2003-05-26        22.0     ◔
1242 
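
The streak loop above looks up rows one at a time with `.iloc`. A vectorized alternative is sketched below, assuming the same idea: a new streak starts wherever the gap to the previous date is not exactly one day. The function name `find_streaks`, its `min_length` parameter, and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd

def find_streaks(days: pd.DataFrame, min_length: int = 2) -> list:
    """Split `days` into runs of consecutive calendar dates (hypothetical helper).

    `days` needs a datetime column named 'date'. Runs shorter than
    `min_length` rows are dropped.
    """
    days = days.sort_values("date")
    # True on every row that does not directly follow the previous day
    # (the first row compares against NaT, so it always starts a streak).
    starts_new = days["date"].diff() != pd.Timedelta(days=1)
    # The running count of streak starts gives each run its own id.
    streak_id = starts_new.cumsum().rename("streak_id")
    return [group for _, group in days.groupby(streak_id) if len(group) >= min_length]

# Made-up toy data: two runs (2 days and 3 days) separated by an isolated day.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2000-05-02", "2000-05-03", "2000-05-07",
                            "2000-05-14", "2000-05-15", "2000-05-16"]),
    "cloudcover": [18.6, 35.9, 10.0, 22.4, 40.6, 5.0],
})
for number, streak in enumerate(find_streaks(toy), start=1):
    print(f"Streak {number}:\n", streak)
```

Called as `find_streaks(filtered_df)`, this should group the rows the same way as the loop, since May 31 of one year and May 1 of the next are never one day apart.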

In [22]:
# Determine the season, using fixed boundaries on the 21st of March, June, September, and December
def get_season(date):
    year = date.year
    if date >= pd.Timestamp(year=year, month=12, day=21) or date < pd.Timestamp(year=year, month=3, day=21):
        return 'Winter'
    elif date >= pd.Timestamp(year=year, month=3, day=21) and date < pd.Timestamp(year=year, month=6, day=21):
        return 'Spring'
    elif date >= pd.Timestamp(year=year, month=6, day=21) and date < pd.Timestamp(year=year, month=9, day=21):
        return 'Summer'
    elif date >= pd.Timestamp(year=year, month=9, day=21) and date < pd.Timestamp(year=year, month=12, day=21):
        return 'Fall'

def get_month(date):
    return f"{date.month:02d}-{date.strftime('%B')}"

# Label each date with its season and a sortable month name (e.g. '01-January')
df['season'] = df['date'].apply(get_season)
df['month'] = df['date'].apply(get_month)

# Filter the dataframe for the specified date range
filtered_df = df[(df['date'] >= '2000-01-01') & (df['date'] <= '2024-04-30')]

# Calculate the average cloud cover for each season
seasonal_avg_cloudcover = filtered_df.groupby('season')['cloudcover'].mean()

# Calculate the average cloud cover for each month
monthly_avg_cloudcover = filtered_df.groupby('month')['cloudcover'].mean()

print("Average cloud cover for each season between 2000-01-01 and 2024-04-30:")
print(seasonal_avg_cloudcover)

print("Average cloud cover for each month between 2000-01-01 and 2024-04-30:")
print(monthly_avg_cloudcover)

Average cloud cover for each season between 2000-01-01 and 2024-04-30:
season
Fall      65.535852
Spring    65.784349
Summer    55.512636
Winter    70.355833
Name: cloudcover, dtype: float64
Average cloud cover for each month between 2000-01-01 and 2024-04-30:
month
01-January      72.823871
02-February     67.591231
03-March        69.060129
04-April        65.987333
05-May          66.442070
06-June         60.124306
07-July         56.443145
08-August       55.365188
09-September    53.762917
10-October      62.918952
11-November     66.847500
12-December     73.755108
Name: cloudcover, dtype: float64
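
`get_season` builds four `pd.Timestamp` boundaries for every date it checks. A more compact sketch, assuming the same fixed boundaries on the 21st, compares `(month, day)` tuples instead. The names `SEASON_STARTS` and `season_of` are hypothetical.

```python
import pandas as pd

# (month, day) on which each season starts, matching the boundaries in get_season.
SEASON_STARTS = [
    ((3, 21), "Spring"),
    ((6, 21), "Summer"),
    ((9, 21), "Fall"),
    ((12, 21), "Winter"),
]

def season_of(date: pd.Timestamp) -> str:
    """Return the season for a date using fixed 21st-of-the-month boundaries."""
    key = (date.month, date.day)
    # Dates before March 21 belong to the winter that began the previous December.
    season = "Winter"
    for start, name in SEASON_STARTS:
        # Tuples compare month first, then day, so the last start reached wins.
        if key >= start:
            season = name
    return season

print(season_of(pd.Timestamp("2024-03-20")), season_of(pd.Timestamp("2024-03-21")))
```

It can be swapped in with `df['season'] = df['date'].apply(season_of)`. The seasonal averages should not change, because the boundaries are the same.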
