# Climate Record for Pittsburgh, Pennsylvania

This example demonstrates how to set up a Python notebook, download and
clean raw data, convert values in the cleaned table using formulas and
functions, and plot the results. The location can be changed so that new
data can be processed and plotted.

# About this data

**Description - Global Climate at a Glance**
The data is compiled at the request of NOAA for near real-time analysis 
of monthly climatic data (temperature and precipitation) from across the 
continental United States. Units depend on user preference (mm or in for 
precipitation, °F or °C for temperature); this notebook uses degrees 
Fahrenheit. The period of record varies by station. Data is available at 
the national, regional, state, and city levels. The Global Historical 
Climatology Network (GHCN) dataset is a composite of climate records 
from numerous quality-checked and merged sources. 
https://www.ncei.noaa.gov/pub/data/cdo/documentation/GHCND_documentation.pdf

**Data Citation - Pittsburgh, PA City Time Series**
NOAA National Centers for Environmental Information, Climate at a Glance: 
City Time Series, published August 2023, retrieved on September 10, 2023 from 
https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/city/time-series

The journal article describing GHCN-Daily is: Menne, M.J., I. Durre, R.S. 
Vose, B.E. Gleason, and T.G. Houston, 2012: An overview of the Global 
Historical Climatology Network-Daily Database. Journal of Atmospheric and
Oceanic Technology, 29, 897-910, doi:10.1175/JTECH-D-11-00103.1.

**Site Description - Pittsburgh, Pennsylvania**
The data is recorded at a station at Pittsburgh International Airport, 
with a period of record from 02-01-1945 to the present. The station is 
located at lat/long 40.48459°, -80.21448° and at an elevation of 341 m. 
Metadata for station GHCND:USW00094823 is available at: 
https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USW00094823/detail

# Set up the notebook

In [10]:
# Importing libraries for tabular data and plotting

import hvplot.pandas
import pandas as pd

# Download and clean-up the Data

In [11]:
# Define the URL for the Pittsburgh, PA max temperature data download

pit_temp_url = (
    'https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/'
    'city/time-series/USW00094823/tmax/1/7/1948-2023.csv?base_prd='
    'true&begbaseyear=1991&endbaseyear=2020')

pit_temp_url

'https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/city/time-series/USW00094823/tmax/1/7/1948-2023.csv?base_prd=true&begbaseyear=1991&endbaseyear=2020'
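
Since the location can be changed, it may help to assemble the URL from its parts. The following is a minimal sketch, not an official NOAA client: `build_cag_url` is a hypothetical helper, and the meaning of each path segment (station, parameter, time scale in months, month, year range) is an assumption inferred from the URL above rather than taken from NOAA documentation.

```python
def build_cag_url(station_id, parameter='tmax', timescale=1, month=7,
                  start_year=1948, end_year=2023,
                  base_start=1991, base_end=2020):
    """Assemble a Climate at a Glance city time series CSV URL.

    Hypothetical helper: the role of each path segment is inferred from
    the URL used in this notebook, not from NOAA documentation.
    """
    base = ('https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/'
            'city/time-series')
    return (f'{base}/{station_id}/{parameter}/{timescale}/{month}/'
            f'{start_year}-{end_year}.csv?base_prd=true'
            f'&begbaseyear={base_start}&endbaseyear={base_end}')


# Reproduces the Pittsburgh URL defined above
pit_temp_url = build_cag_url('USW00094823')
```

Passing a different station identifier would, under the same assumptions, produce the URL for another city; the result should be checked against the Climate at a Glance website before use.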

In [12]:
# List all current variables

%whos

Variable         Type         Data/Info
---------------------------------------
convert_f_to_c   function     <function convert_f_to_c at 0x7fa3a8c4d990>
hvplot           module       <module 'hvplot' from '/o<...>ages/hvplot/__init__.py'>
pd               module       <module 'pandas' from '/o<...>ages/pandas/__init__.py'>
pit_temp_df      DataFrame        year  temp_f  anomaly<...>\n\n[76 rows x 4 columns]
pit_temp_url     str          https://www.ncei.noaa.gov<...>ear=1991&endbaseyear=2020
temperature_f    Series       0     82.9\n1     85.1\n2<...>ength: 76, dtype: float64


In [13]:
# Import temperature data for Pittsburgh, PA from NCEI

pit_temp_df = pd.read_csv(pit_temp_url, header=4,
                          names=['year', 'temp_f', 'anomaly'])
pit_temp_df

Unnamed: 0,year,temp_f,anomaly
0,194807,82.9,0.0
1,194907,85.1,2.2
2,195007,78.7,-4.2
3,195107,82.9,-0.1
4,195207,85.8,2.9
...,...,...,...
71,201907,84.0,1.1
72,202007,87.5,4.6
73,202107,82.0,-1.0
74,202207,83.2,0.3
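
To see what `header=4` and `names=` do without a network connection, the same call can be tried on a small in-memory file. This is a sketch under assumptions: the preamble and header lines below are hypothetical stand-ins for whatever the real download contains, and only the three data rows are copied from the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: four preamble lines,
# one header line, then data rows (values copied from the output above).
sample_csv = io.StringIO(
    'preamble line 1\n'
    'preamble line 2\n'
    'preamble line 3\n'
    'preamble line 4\n'
    'Date,Value,Anomaly\n'
    '194807,82.9,0.0\n'
    '194907,85.1,2.2\n'
    '195007,78.7,-4.2\n'
)

# header=4 treats the fifth line (index 4) as the header row and skips
# everything before it; names= then replaces that header with our own labels.
sample_df = pd.read_csv(sample_csv, header=4,
                        names=['year', 'temp_f', 'anomaly'])
print(sample_df)
```

Renaming the columns at import time keeps later cells short, since `temp_f` and `anomaly` can be referenced directly.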


In [14]:
# Extract the year from the date

pit_temp_df.year = pd.to_datetime(pit_temp_df.year, format='%Y%m').dt.year
pit_temp_df

Unnamed: 0,year,temp_f,anomaly
0,1948,82.9,0.0
1,1949,85.1,2.2
2,1950,78.7,-4.2
3,1951,82.9,-0.1
4,1952,85.8,2.9
...,...,...,...
71,2019,84.0,1.1
72,2020,87.5,4.6
73,2021,82.0,-1.0
74,2022,83.2,0.3
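
The year extraction can be cross-checked on a few values. The sketch below repeats the date-parsing approach from the cell above (with an explicit cast to string, which is an addition here) and compares it with plain integer arithmetic; the variable names are illustrative only.

```python
import pandas as pd

# First three YYYYMM values from the table above
yyyymm = pd.Series([194807, 194907, 195007])

# Approach used in the notebook: parse as a date, then take the year
years_from_dates = pd.to_datetime(yyyymm.astype(str), format='%Y%m').dt.year

# Alternative: split the integer arithmetically
years_from_ints = yyyymm // 100   # drop the last two digits
months = yyyymm % 100             # keep the last two digits

print(years_from_dates.tolist(), years_from_ints.tolist(), months.tolist())
```

Both approaches give the same years here; the date-parsing version has the advantage of raising an error if a value is not a valid year and month.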


# Convert the data using equations and functions

In [15]:
# Convert units from Fahrenheit to Celsius (Option #1)
# Use a basic equation to convert temperature

pit_temp_df['temp_c'] = (pit_temp_df['temp_f'] - 32) * 5 / 9
pit_temp_df

Unnamed: 0,year,temp_f,anomaly,temp_c
0,1948,82.9,0.0,28.277778
1,1949,85.1,2.2,29.500000
2,1950,78.7,-4.2,25.944444
3,1951,82.9,-0.1,28.277778
4,1952,85.8,2.9,29.888889
...,...,...,...,...
71,2019,84.0,1.1,28.888889
72,2020,87.5,4.6,30.833333
73,2021,82.0,-1.0,27.777778
74,2022,83.2,0.3,28.444444
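
As a worked check of the equation used above, here is the conversion for the 1948 value, with $T_F$ and $T_C$ introduced here as shorthand for the Fahrenheit and Celsius temperatures:

```latex
T_C = (T_F - 32) \times \frac{5}{9}
    = (82.9 - 32) \times \frac{5}{9}
    = \frac{254.5}{9}
    \approx 28.28\,^{\circ}\mathrm{C}
```

This agrees with the `temp_c` value of 28.277778 in the first row of the table.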


In [16]:
# Convert units from Fahrenheit to Celsius (Option #2)
# Assign the Fahrenheit temperature column to a variable

temperature_f = pit_temp_df['temp_f']


# Use a function to convert temperatures from Fahrenheit to Celsius

def convert_f_to_c(temperature_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    temperature_c = (temperature_f - 32) * 5 / 9  # Assign result to variable
    return temperature_c


# Apply the function to each value in the column and store the results

pit_temp_df['temp_c'] = temperature_f.apply(convert_f_to_c)
pit_temp_df

Unnamed: 0,year,temp_f,anomaly,temp_c
0,1948,82.9,0.0,28.277778
1,1949,85.1,2.2,29.500000
2,1950,78.7,-4.2,25.944444
3,1951,82.9,-0.1,28.277778
4,1952,85.8,2.9,29.888889
...,...,...,...,...
71,2019,84.0,1.1,28.888889
72,2020,87.5,4.6,30.833333
73,2021,82.0,-1.0,27.777778
74,2022,83.2,0.3,28.444444
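
A conversion function that only computes and returns a value can be reused on a single number or on a whole column. The sketch below is one way to write it, not the only one; `sample_f` is a hypothetical three-value series copied from the table above so the example runs without the download.

```python
import pandas as pd


def convert_f_to_c(temperature_f):
    """Convert Fahrenheit to Celsius; accepts a number or a pandas Series."""
    return (temperature_f - 32) * 5 / 9


# First three temp_f values from the table above
sample_f = pd.Series([82.9, 85.1, 78.7])

by_element = sample_f.apply(convert_f_to_c)   # one call per value
vectorized = convert_f_to_c(sample_f)         # one call on the whole column

print(convert_f_to_c(212))                    # boiling point of water
print(vectorized.round(2).tolist())
```

Because pandas arithmetic is applied to every element of a Series, the direct call gives the same values as `.apply` and is usually faster on large columns.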


# Create a visual representation of the data

In [17]:
# Plot temperature in Fahrenheit by year for Pittsburgh, PA

pit_temp_df.hvplot(
    x='year', y='temp_f',
    title='July Maximum Temperature in Pittsburgh, PA',
    xlabel='Year', ylabel='Temperature (°F)')



In [18]:
%%capture
%%bash
jupyter nbconvert ncei_temp_pittsburgh.ipynb --to html --no-input