# Welcome to the AWS Global Surface Summary of the Day data notebook!
#### **Audience:** Anybody with a computer and access to at least 4GB of memory.
#### **Intent:** Build familiarity with GSOD data and understand how it could be used in analysis. 
#### **Outcome:** Statistics and plots for one selected surface station over its entire data record.          

The NOAA Global Surface Summary of the Day (GSOD) dataset contains daily summaries from over 9,000 weather stations worldwide. Stations may report up to 18 variables: mean temperature, mean dew point, mean sea level pressure, mean station pressure, mean visibility, mean wind speed, maximum sustained wind speed, maximum wind gust, maximum temperature, minimum temperature, precipitation amount, and snow depth, plus occurrence indicators for fog, rain or drizzle, snow or ice pellets, hail, thunder, and tornado/funnel cloud. Each station has a different period of record. The earliest files in GSOD date from 1929, and the most recent files become available 1-2 days after the observations were recorded. See the documentation here: https://data.noaa.gov/dataset/dataset/global-surface-summary-of-the-day-gsod.
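
The six occurrence indicators are not used in the plots in this notebook, but they are easy to unpack. A minimal sketch, assuming the indicators arrive as a single six-digit `FRSHTT` value (one digit per indicator, in the order listed above) and that pandas may have dropped leading zeros when reading it as a number. The helper name `decode_frshtt` and the flag labels are illustrative, not part of the dataset:

```python
# Labels are illustrative; the order is assumed to follow the indicator list above.
FLAG_NAMES = (
    "fog",
    "rain_or_drizzle",
    "snow_or_ice_pellets",
    "hail",
    "thunder",
    "tornado_or_funnel_cloud",
)


def decode_frshtt(value):
    """Turn a FRSHTT value such as 10010 or "010010" into a dict of booleans."""
    # Restore any leading zeros that were lost when the value was parsed as a number
    digits = str(int(value)).zfill(6)
    return {name: digit == "1" for name, digit in zip(FLAG_NAMES, digits)}


# Example: rain and thunder reported, nothing else
flags = decode_frshtt(10010)
print(flags)
```

Missing values are not handled here; a row with no indicator value would need to be filtered out first.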

This notebook uses GSOD data to analyze a weather station's complete record, producing a local synopsis of the most relevant weather variables.

#### **Read and follow the steps below before beginning the notebook.**
1. Open the URL (https://www.ncei.noaa.gov/maps/daily/?layers=0001) to see the global dataset. 
2. Using the map, pan to your region of interest to see available stations.  
3. Select the blue wrench icon to the right of the "Global Summary of the Day" layer in the left-hand menu. Use one of the selection tools to choose the station(s) you want to analyze. 
4. Click the blue "Download Station List" icon to download a CSV file named "stations.csv". 
5. Look through the CSV file to decide which station(s) you'd like to analyze. Base your decision on:
    - The station's period of record (how much data is available for this station?)
    - The station's location (how close is the station to your area of interest?)
    - The station's elevation (is the elevation of the station similar to that of your interest area?)
6. Copy the station's STATION_ID value and paste it into the station_id variable in the cell below, replacing the default value.
7. You're ready to begin making interesting plots and observations about your local weather station!
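
The comparison in step 5 can also be done in code. This is a minimal sketch, assuming the downloaded file has begin-date and end-date columns. Only `STATION_ID` is named in the steps above; the column names `BEGIN_DATE` and `END_DATE`, the `YYYY-MM-DD` date format, and the sample rows are assumptions for illustration, so adjust them to match the header of your own file:

```python
import csv
import io


def rank_by_record_length(csv_text, id_col="STATION_ID",
                          begin_col="BEGIN_DATE", end_col="END_DATE"):
    """Return (station_id, years_of_record) pairs, longest record first.

    The default column names other than STATION_ID are hypothetical.
    """
    ranked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        begin_year = int(row[begin_col][:4])
        end_year = int(row[end_col][:4])
        ranked.append((row[id_col], end_year - begin_year + 1))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up sample rows standing in for the contents of stations.csv
sample = """STATION_ID,BEGIN_DATE,END_DATE
00000000001,1973-01-01,2020-12-31
00000000002,1942-08-01,2023-06-30
"""

ranked = rank_by_record_length(sample)
print(ranked)
```

For a real file, read its text first (for example with `open("stations.csv").read()`) and pass that to the function.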

In [None]:
station_id = "72315003812"

### Import Python modules

In [None]:
import s3fs
from IPython.display import display
import ipywidgets as widgets
import numpy as np
import requests
import matplotlib.pyplot as plt
import csv
import pandas as pd
from datetime import datetime

import warnings
warnings.filterwarnings('ignore')

### Run the definitions!

In [None]:
# Get variables from AWS

currentYear = datetime.now().year
# The earliest GSOD files date from 1929, so earlier years would only produce failed requests
year_options = list(range(1929, currentYear+1))

date_list = []
temp_list = []
max_list = []
min_list = []
slp_list = []
wdsp_list = []
gust_list = []
prcp_list = []
sndp_list = []

for year in year_options:
    try:
        df = pd.read_csv(f'https://noaa-gsod-pds.s3.amazonaws.com/{year}/{station_id}.csv')

        date_list.append(df.DATE)
        temp_list.append(df.TEMP)
        min_list.append(df.MIN)
        max_list.append(df.MAX)
        slp_list.append(df.SLP)
        wdsp_list.append(df.WDSP)
        gust_list.append(df.GUST)
        prcp_list.append(df.PRCP)
        sndp_list.append(df.SNDP)
        
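    except Exception:
        # No file for this station in this year (or the file could not be read), so skip the year
        pass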

dates = pd.concat(date_list, ignore_index=True)
temps = pd.concat(temp_list, ignore_index=True)
mins = pd.concat(min_list, ignore_index=True)
maxs = pd.concat(max_list, ignore_index=True)
slps = pd.concat(slp_list, ignore_index=True)
wdsps = pd.concat(wdsp_list, ignore_index=True)
gusts = pd.concat(gust_list, ignore_index=True)
prcps = pd.concat(prcp_list, ignore_index=True)
sndps = pd.concat(sndp_list, ignore_index=True)

# Put lists together
df_concat = pd.concat([dates, temps, mins, maxs, slps, wdsps, gusts, prcps, sndps], axis=1)

# Change datetime formats
df_concat.DATE = pd.to_datetime(df_concat.DATE, format='%Y-%m-%d') 
df_concat["day"] = df_concat.DATE.dt.strftime('%b %d')
df_concat["year"] = df_concat.DATE.dt.strftime('%Y')
df_concat["month"] = df_concat.DATE.dt.strftime('%m')
df_concat = df_concat[df_concat.day != "Feb 29"]

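# QC: replace GSOD missing-value codes column by column, so that a valid
# sea level pressure of 999.9 hPa is not mistaken for a missing wind or snow value
missing_codes = {"TEMP": 9999.9, "MIN": 9999.9, "MAX": 9999.9, "SLP": 9999.9,
                 "WDSP": 999.9, "GUST": 999.9, "SNDP": 999.9, "PRCP": 99.99}
for column, code in missing_codes.items():
    df_concat[column] = df_concat[column].replace(code, np.nan)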

# Average variables (numeric columns only; the text columns cannot be averaged)
df_averaged = df_concat.groupby(df_concat.day).mean(numeric_only=True)


In [None]:
# High temperature
def high_temps():
    fig, ax = plt.subplots(figsize=(10, 2), dpi=300)
    plt.scatter(df_concat["day"], df_concat["year"], cmap = "turbo", s=2,c= df_concat["MAX"], marker="s")

    ax.xaxis.set_major_locator(plt.MaxNLocator(13))
    ax.yaxis.set_major_locator(plt.MaxNLocator(10))
    plt.gca().invert_yaxis()
    ax.margins(x=0, y=0.03)

    plt.colorbar(label = "Temperature (\u00b0F)")
    plt.title("Daily High Temperatures")
    plt.xticks(rotation=45)

    plt.show()
    plt.close("all")

    # Fun stats
    # idxmax skips missing values, unlike Python's built-in max()
    hottest = df_concat.loc[df_concat["MAX"].idxmax()]
    print(f"The hottest temperature ever recorded for your location was {hottest['MAX']}\u00b0F on {hottest['day']}, {hottest['year']}.")
    
    exceed_100 = df_concat[df_concat.MAX >= 100]
    days_per_year = len(exceed_100) / df_concat["year"].nunique()
    print(f"On average, your location meets or exceeds 100\u00b0F on {round(days_per_year, 1)} days per year.")
    
    print(f"The hottest 'high temperature' typically occurs on {df_averaged['MAX'].idxmax()} and averages {round(df_averaged.MAX.max(), 1)}\u00b0F")

In [None]:
# Low temperature
def low_temps():
    # Keep only freezing temperatures; .loc avoids pandas chained assignment
    df_concat.loc[df_concat.MIN > 32, "MIN"] = np.nan

    fig, ax = plt.subplots(figsize=(10, 2), dpi=300)

    plt.scatter(df_concat["day"], df_concat["year"], cmap = "gist_rainbow",s=2,c= df_concat["MIN"], marker="s")

    ax.xaxis.set_major_locator(plt.MaxNLocator(13))
    ax.yaxis.set_major_locator(plt.MaxNLocator(10))
    plt.gca().invert_yaxis()
    ax.margins(x=0, y=0.03)

    plt.colorbar(label = "Temperature (\u00b0F)")
    plt.title("Freezing Temperatures (Below 32\u00b0F)")
    plt.xticks(rotation=45)

    plt.show()
    plt.close("all")

    # Cold stats
    # idxmin skips missing values
    coldest = df_concat.loc[df_concat["MIN"].idxmin()]
    print(f"The coldest temperature ever recorded for your location is {coldest['MIN']}\u00b0F on {coldest['day']}, {coldest['year']}.")

    freezing = df_concat[df_concat.MIN <= 32].copy()
    freezing["month"] = freezing["month"].astype(int)
    # Numeric key for calendar order; comparing the "day" strings would sort alphabetically
    freezing["month_day"] = freezing.DATE.dt.month * 100 + freezing.DATE.dt.day

    late_freeze = freezing.loc[(freezing.month > 4) & (freezing.month < 8)]
    if late_freeze.empty:
        print("There are no instances of late freezes occurring on or after May 1st.")
    else:
        latest = late_freeze.loc[late_freeze.month_day.idxmax()]
        print(f"There have been {late_freeze['year'].nunique()} years in the weather record where a freeze occurred on or after May 1st; the latest freeze on record occurred on {latest['day']}, {latest['year']}.")

    early_freeze = freezing.loc[(freezing.month > 6) & (freezing.month < 10)]
    if early_freeze.empty:
        print("There are no instances of early freezes occurring before October 1st.")
    else:
        earliest = early_freeze.loc[early_freeze.month_day.idxmin()]
        print(f"There have been {early_freeze['year'].nunique()} years in the weather record where a freeze occurred before October 1st; the earliest freeze on record occurred on {earliest['day']}, {earliest['year']}.")

    print(f"The coldest 'low temperature' typically occurs on {df_averaged['MIN'].idxmin()} and averages {round(df_averaged.MIN.min(), 1)}\u00b0F")

In [None]:
# Precip amounts
def precip_amount():
    df_concat["PRCP"] = df_concat["PRCP"].replace(0, np.nan)

    fig, ax = plt.subplots(figsize=(10, 2), dpi=300)

    plt.scatter(df_concat["day"], df_concat["year"], cmap = 'jet', vmin = 0, vmax = 2, s=2,c= df_concat["PRCP"], marker="s")

    ax.xaxis.set_major_locator(plt.MaxNLocator(13))
    ax.yaxis.set_major_locator(plt.MaxNLocator(10))
    plt.gca().invert_yaxis()
    ax.margins(x=0, y=0.03)

    plt.colorbar(label = "Precipitation (in)")
    plt.title("Precipitation")
    plt.xticks(rotation=45)

    plt.show()
    plt.close("all")

    # Rain facts

    rain_day = (df_concat.loc[df_concat['PRCP'] == np.nanmax(df_concat.PRCP)]).day.values[0]
    rain_year = (df_concat.loc[df_concat['PRCP'] == np.nanmax(df_concat.PRCP)]).year.values[0]

    print(f"The most rain ever recorded for your location is {np.nanmax(df_concat.PRCP)} inches on {rain_day}, {rain_year}.")
    
    cumulative_year = df_concat.groupby('year').PRCP.sum()
    # Drop the first and last years of the record, which are usually incomplete
    cumulative_year = cumulative_year[1:-1]
    print(f"The rainiest year on record was {cumulative_year.idxmax()}, with {round(cumulative_year.max(), 2)} inches of rain.")
    print(f"The driest year on record was {cumulative_year.idxmin()}, with {round(cumulative_year.min(), 2)} inches of rain.")

In [None]:
# Snow depths
def snow_depth():
    df_concat["SNDP"] = df_concat["SNDP"].replace(999.9, np.nan)

    fig, ax = plt.subplots(figsize=(10, 2), dpi=300)

    plt.scatter(df_concat["day"], df_concat["year"], s=2,c= df_concat["SNDP"], marker="s")

    ax.xaxis.set_major_locator(plt.MaxNLocator(13))
    ax.yaxis.set_major_locator(plt.MaxNLocator(10))
    plt.gca().invert_yaxis()
    ax.margins(x=0, y=0.03)

    plt.colorbar(label = "Snow depth (in)")
    plt.title("Snow Depth")
    plt.xticks(rotation=45)

    plt.show()
    plt.close("all")

    # Snow facts

    snow_day = (df_concat.loc[df_concat['SNDP'] == np.nanmax(df_concat.SNDP)]).day.values[0]
    snow_year = (df_concat.loc[df_concat['SNDP'] == np.nanmax(df_concat.SNDP)]).year.values[0]
    
    years_with_snow = df_concat.loc[(df_concat.SNDP > 0)]

    print(f"The greatest snow depth ever recorded at your location is {np.nanmax(df_concat.SNDP)} inches on {snow_day}, {snow_year}.")
    print(f"{len(pd.unique(years_with_snow['year']))} out of {len(pd.unique(df_concat['year']))} years in this station's record have had measurable snow depth.")

In [None]:
# Sea level pressure
def sea_level_pressure():
    fig, ax = plt.subplots(figsize=(10, 2), dpi=300)
    plt.scatter(df_concat["day"], df_concat["year"], s=2, c= df_concat["SLP"], marker="s")

    ax.xaxis.set_major_locator(plt.MaxNLocator(13))
    ax.yaxis.set_major_locator(plt.MaxNLocator(10))
    plt.gca().invert_yaxis()
    ax.margins(x=0, y=0.03)

    plt.colorbar(label = "Sea level pressure (hPa)")
    plt.title("Sea level pressure")
    plt.xticks(rotation=45)

    plt.show()
    plt.close("all")

In [None]:
# Wind speeds
def wind_speeds():
    fig, ax = plt.subplots(figsize=(10, 2), dpi=300)

    plt.scatter(df_concat["day"], df_concat["year"], s=2,c= df_concat["WDSP"], marker="s")

    ax.xaxis.set_major_locator(plt.MaxNLocator(13))
    ax.yaxis.set_major_locator(plt.MaxNLocator(10))
    plt.gca().invert_yaxis()
    ax.margins(x=0, y=0.03)

    plt.colorbar(label = "Wind speed (knots)")
    plt.title("Mean Wind Speed")
    plt.xticks(rotation=45)

    plt.show()
    plt.close("all")

In [None]:
# Wind gusts
def wind_gust():
    fig, ax = plt.subplots(figsize=(10, 2), dpi=300)

    plt.scatter(df_concat["day"], df_concat["year"], s=2,c= df_concat["GUST"], marker="s")

    ax.xaxis.set_major_locator(plt.MaxNLocator(13))
    ax.yaxis.set_major_locator(plt.MaxNLocator(10))
    plt.gca().invert_yaxis()
    ax.margins(x=0, y=0.03)

    plt.colorbar(label = "Wind Gust (knots)")
    plt.title("Maximum Daily Wind Gust")
    plt.xticks(rotation=45)

    plt.show()
    plt.close("all")
    
    gust_day = (df_concat.loc[df_concat['GUST'] == np.nanmax(df_concat.GUST)]).day.values[0]
    gust_year = (df_concat.loc[df_concat['GUST'] == np.nanmax(df_concat.GUST)]).year.values[0]
    print(f"The strongest wind gust ever recorded for your location is {np.nanmax(df_concat.GUST)} knots on {gust_day}, {gust_year}.")
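
The "typically occurs on" statistics above rely on averaging each calendar day across every year of the record. A minimal sketch of that idea on made-up data, assuming a `DATE` column of datetimes as in `df_concat`. The helper name `daily_climatology` and the sample values are illustrative and not part of the notebook:

```python
import numpy as np
import pandas as pd


def daily_climatology(df, column):
    """Average `column` for each calendar day across all years, skipping Feb 29."""
    day = df["DATE"].dt.strftime("%b %d")
    keep = day != "Feb 29"
    # mean() ignores missing values, so gaps in the record do not drag the average down
    return df.loc[keep, column].groupby(day[keep], sort=False).mean()


# Made-up record: three July 4ths (one missing) and one leap day
dates = pd.to_datetime(["2019-07-04", "2020-02-29", "2020-07-04", "2021-07-04"])
demo = pd.DataFrame({"DATE": dates, "MAX": [90.0, 50.0, np.nan, 94.0]})

clim = daily_climatology(demo, "MAX")
print(clim)
```

Dropping Feb 29 keeps every calendar day backed by the same number of candidate years, which is the same choice the definitions cell makes.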

### Find the information!

In [None]:
# Temperatures! 

high_temps()
low_temps()

In [None]:
# Precipitation

precip_amount()
snow_depth()

In [None]:
# Surface variables

sea_level_pressure()
wind_speeds()
wind_gust()
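
The yearly precipitation totals printed above drop the first and last years of the record, since those are often incomplete. A stricter variant is sketched here: it keeps only years with enough daily reports. The helper name `annual_totals` and the 360-day threshold are assumptions chosen for illustration, and the data is made up:

```python
import pandas as pd


def annual_totals(df, column="PRCP", min_days=360):
    """Sum `column` per year, keeping only years with at least `min_days` non-missing reports."""
    valid = df.dropna(subset=[column])
    by_year = valid.groupby(valid["DATE"].dt.year)[column]
    totals = by_year.sum()
    counts = by_year.count()
    return totals[counts >= min_days]


# Made-up record: all of 2021, but only the first 30 days of 2022
dates = pd.date_range("2021-01-01", "2022-01-30", freq="D")
demo = pd.DataFrame({"DATE": dates, "PRCP": 0.1})

totals = annual_totals(demo)
print(totals)
```

Because the function counts non-missing values, it would need to run before dry days are masked out for plotting, as `precip_amount()` does.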

**GSOD Product Documentation:**
- https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00516 (NCEI dataset landing page)
- https://www.ncei.noaa.gov/data/global-summary-of-the-day/doc/readme.txt (Overview of the dataset)

**CSP Access:**
- AWS: https://registry.opendata.aws/noaa-gsod/
- Google: https://console.cloud.google.com/marketplace/details/noaa-public/gsod?filter=solution-type:dataset&q=NOAA&id=c6c1b652-3958-4a47-9e58-552a546df47f

A distinguishing feature of this Jupyter notebook is that you are not required to download any datasets; all data is pulled directly from the cloud. You can learn more about NOAA's efforts to move more data to the cloud at this site: https://www.noaa.gov/nodd/datasets. As we continue to make more data widely accessible on the cloud, we'll also create more Jupyter notebooks like this one, so anyone can visualize weather and climate data without any cost or restriction.
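
Pulling data directly from the cloud comes down to building one URL per year and station, following the pattern used in the definitions cell. A minimal sketch of that step on its own; the helper names `gsod_url` and `candidate_urls` are illustrative:

```python
BUCKET_URL = "https://noaa-gsod-pds.s3.amazonaws.com"


def gsod_url(year, station_id):
    """Build the URL of one station's file for one year."""
    return f"{BUCKET_URL}/{year}/{station_id}.csv"


def candidate_urls(station_id, first_year, last_year):
    """List the URLs to try for a station, one per year (both ends inclusive)."""
    return [gsod_url(year, station_id) for year in range(first_year, last_year + 1)]


urls = candidate_urls("72315003812", 1929, 1931)
for url in urls:
    print(url)
```

Each URL can be passed to `pd.read_csv`. Years in which the station has no file produce an HTTP error, which the notebook catches and skips.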