# Exploring the NOAA NCEI Precipitation Data
##### Author: John Mays
- Data downloaded from [the NOAA NCEI website](https://www.ncei.noaa.gov/access) as a CSV in the Global Summary of the Year (GSOY) format
- Only one city (and one station) for now: Seattle (SEATTLE SAND POINT WEATHER FORECAST OFFICE)

In [1]:
import pandas as pd
# pd.set_option("display.max_columns", 40)

In [2]:
seattle = pd.read_csv("../data/USW00094290.csv")

In [3]:
seattle.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 76 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   STATION          33 non-null     object 
 1   DATE             33 non-null     int64  
 2   LATITUDE         33 non-null     float64
 3   LONGITUDE        33 non-null     float64
 4   ELEVATION        33 non-null     float64
 5   NAME             33 non-null     object 
 6   CDSD             30 non-null     float64
 7   CDSD_ATTRIBUTES  0 non-null      float64
 8   CLDD             30 non-null     float64
 9   CLDD_ATTRIBUTES  30 non-null     object 
 10  DP01             29 non-null     float64
 11  DP01_ATTRIBUTES  29 non-null     object 
 12  DP10             29 non-null     float64
 13  DP10_ATTRIBUTES  29 non-null     object 
 14  DP1X             29 non-null     float64
 15  DP1X_ATTRIBUTES  29 non-null     object 
 16  DSND             21 non-null     float64
 17  DSND_ATTRIBUTES  0
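The `info()` listing shows some columns that are only partly filled and at least one (`CDSD_ATTRIBUTES`) with no values at all. A minimal sketch of how such columns could be counted and dropped follows. The miniature frame and its `CDSD` numbers are made up for illustration; only the column names come from the listing above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 76-column table; CDSD values are invented.
sample = pd.DataFrame(
    {
        "DATE": [2019, 2020, 2021],
        "CDSD": [120.0, np.nan, 150.0],               # partly missing
        "CDSD_ATTRIBUTES": [np.nan, np.nan, np.nan],  # entirely empty
    }
)

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Names of the columns that contain no data at all.
empty_columns = list(sample.columns[sample.isna().all()])

# Copy of the frame without the all-empty columns.
trimmed = sample.dropna(axis="columns", how="all")

print(missing_per_column)
print(empty_columns)
```

Dropping only the all-empty columns keeps partly filled measurements such as `CDSD` available for later inspection.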

In [4]:
columns = list(seattle.columns)
print(columns)

['STATION', 'DATE', 'LATITUDE', 'LONGITUDE', 'ELEVATION', 'NAME', 'CDSD', 'CDSD_ATTRIBUTES', 'CLDD', 'CLDD_ATTRIBUTES', 'DP01', 'DP01_ATTRIBUTES', 'DP10', 'DP10_ATTRIBUTES', 'DP1X', 'DP1X_ATTRIBUTES', 'DSND', 'DSND_ATTRIBUTES', 'DSNW', 'DSNW_ATTRIBUTES', 'DT00', 'DT00_ATTRIBUTES', 'DT32', 'DT32_ATTRIBUTES', 'DX32', 'DX32_ATTRIBUTES', 'DX70', 'DX70_ATTRIBUTES', 'DX90', 'DX90_ATTRIBUTES', 'DYFG', 'DYFG_ATTRIBUTES', 'DYHF', 'DYHF_ATTRIBUTES', 'DYTS', 'DYTS_ATTRIBUTES', 'EMNT', 'EMNT_ATTRIBUTES', 'EMSD', 'EMSD_ATTRIBUTES', 'EMSN', 'EMSN_ATTRIBUTES', 'EMXP', 'EMXP_ATTRIBUTES', 'EMXT', 'EMXT_ATTRIBUTES', 'FZF0', 'FZF0_ATTRIBUTES', 'FZF1', 'FZF1_ATTRIBUTES', 'FZF2', 'FZF2_ATTRIBUTES', 'FZF3', 'FZF3_ATTRIBUTES', 'FZF5', 'FZF5_ATTRIBUTES', 'FZF6', 'FZF6_ATTRIBUTES', 'FZF7', 'FZF7_ATTRIBUTES', 'FZF8', 'FZF8_ATTRIBUTES', 'HDSD', 'HDSD_ATTRIBUTES', 'HTDD', 'HTDD_ATTRIBUTES', 'PRCP', 'PRCP_ATTRIBUTES', 'SNOW', 'SNOW_ATTRIBUTES', 'TAVG', 'TAVG_ATTRIBUTES', 'TMAX', 'TMAX_ATTRIBUTES', 'TMIN', 'TMIN_AT

The meaning of each column and the general schema are documented in the [GSOY README](https://www.ncei.noaa.gov/pub/data/metadata/documents/GSOYReadme.txt).

*NOTE:* For this dataset, I think "precipitation" means anything that is not snow (mostly rain, plus sleet and hail), but I'll try to confirm that with some simple tests, such as: *"can the total precipitation be lower than the total snowfall?"*
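The test quoted in the note could be sketched as below. The numbers are the `PRCP` and `SNOW` values for 2017 to 2021 copied from the table displayed later in this notebook; the helper name `years_with_more_snow_than_precip` is hypothetical.

```python
import pandas as pd

# PRCP and SNOW for 2017-2021, copied from the table shown later in the notebook.
sample = pd.DataFrame(
    {
        "PRCP": [1139.0, 947.5, 795.5, 1132.5, 1017.3],
        "SNOW": [101.0, 20.0, 291.0, 102.0, 361.0],
    },
    index=pd.Index([2017, 2018, 2019, 2020, 2021], name="DATE"),
)


def years_with_more_snow_than_precip(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where SNOW exceeds PRCP (rows with NaN never match)."""
    return df[df["SNOW"] > df["PRCP"]]


violations = years_with_more_snow_than_precip(sample)
print(f"{len(violations)} of {len(sample)} years have SNOW > PRCP")
```

A non-empty result would rule out `PRCP` being a simple total that includes snowfall depth. An empty result is inconclusive on its own, so it is only a first check.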

The columns (so-called *"data types"*) I would like to pay attention to are:
- `DATE`: year
- `PRCP`: total annual precipitation
- `SNOW`: total annual snowfall
- `DYFG`: number of days with fog
- `DYHF`: number of days with "heavy" fog
- `DYTS`: number of days with thunderstorms
- `EMXP`: highest daily total of precipitation
- `DP01`: number of days with at least 0.01 inches of unspecified precipitation, probably rain w/o snow
- `DP10`: number of days with at least 0.10 inches of unspecified precipitation, probably rain w/o snow
- `DP1X`: number of days with at least 1.00 inches of unspecified precipitation, probably rain w/o snow
- `DSNW`: number of days with at least 1.00 inches of snowfall

Units for the totals (`PRCP`, `SNOW`, `EMXP`) depend on the option chosen at download, inches or millimeters; annual `PRCP` values above 1,000 in this file suggest millimeters.
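The day-count columns are nested by definition: any day that reaches the 1.00-inch threshold also reaches the 0.10-inch one, so `DP1X` should never exceed `DP10`. A small sketch of that consistency check, using the 2012 to 2016 values copied from the table displayed later in this notebook (the variable names are illustrative):

```python
import pandas as pd

# DP10 and DP1X for 2012-2016, copied from the table shown later in the notebook.
sample = pd.DataFrame(
    {
        "DP10": [122.0, 80.0, 108.0, 85.0, 115.0],
        "DP1X": [5.0, 3.0, 9.0, 4.0, 7.0],
    },
    index=pd.Index([2012, 2013, 2014, 2015, 2016], name="DATE"),
)

# True only if every year satisfies DP1X <= DP10.
nested_ok = (sample["DP1X"] <= sample["DP10"]).all()

# Fraction of 0.10-inch days that also reached the 1.00-inch threshold.
heavy_share = sample["DP1X"] / sample["DP10"]

print(nested_ok)
print(heavy_share.round(3))
```

A failed check would point to a misread column or a data-quality problem rather than to unusual weather.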

In [5]:
# Columns of interest and their meanings, paraphrased from the GSOY README.
partial_schema = pd.DataFrame(
    {
        "Column" : [
            "DATE",
            "PRCP",
            "SNOW",
            "DYFG",
            "DYHF",
            "DYTS",
            "EMXP",
            "DP01",
            "DP10",
            "DP1X",
            "DSNW"
        ],
        "Description" : [
            "year AD",
            "total annual precipitation (inches or mm, per download settings)",
            "total annual snowfall (inches or mm, per download settings)",
            "number of days with fog",
            "number of days with 'heavy' fog",
            "number of days with thunderstorms",
            "highest daily total of precipitation (inches or mm, per download settings)",
            "number of days with at least 0.01 inches of unspecified precipitation, probably rain w/o snow",
            "number of days with at least 0.10 inches of unspecified precipitation, probably rain w/o snow",
            "number of days with at least 1.00 inches of unspecified precipitation, probably rain w/o snow",
            "number of days with at least 1.00 inches of snowfall"
        ]
    }
)

In [6]:
columns_selection = list(partial_schema["Column"])

In [7]:
seattle = seattle[columns_selection]

In [8]:
seattle = seattle.set_index("DATE")
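The cells above load every column and then narrow the frame down. One alternative, sketched here, is to let `pd.read_csv` do the selection and indexing at load time with `usecols` and `index_col`. The inline CSV is a hypothetical extract: the numbers are taken from the 2020 and 2021 rows of this notebook's table, and the `STATION` value is assumed from the file name.

```python
import io

import pandas as pd

# Hypothetical extract standing in for ../data/USW00094290.csv.
csv_text = """STATION,DATE,PRCP,SNOW,DYTS
USW00094290,2020,1132.5,102.0,4.0
USW00094290,2021,1017.3,361.0,3.0
"""

wanted = ["DATE", "PRCP", "SNOW"]

# Read only the wanted columns and use DATE as the index in one step.
subset = pd.read_csv(io.StringIO(csv_text), usecols=wanted, index_col="DATE")

print(subset)
```

The resulting columns follow the file's order, not the order of the `usecols` list, so reorder explicitly if the order matters.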

In [9]:
seattle.loc[2012:2022]

| DATE | PRCP | SNOW | DYFG | DYHF | DYTS | EMXP | DP10 | DP1X | DSNW |
|------|------|------|------|------|------|------|------|------|------|
| 2012 | 1191.1 | 196.0 | 42.0 | NaN | 4.0 | 66.0 | 122.0 | 5.0 | 3.0 |
| 2013 | 756.1 | 23.0 | 65.0 | NaN | 5.0 | 38.9 | 80.0 | 3.0 | 0.0 |
| 2014 | 1188.1 | 51.0 | 72.0 | 1.0 | 4.0 | 36.3 | 108.0 | 9.0 | 1.0 |
| 2015 | 1000.1 | 0.0 | 78.0 | 10.0 | 9.0 | 55.4 | 85.0 | 4.0 | 0.0 |
| 2016 | 1145.1 | 35.0 | 61.0 | 3.0 | 5.0 | 44.7 | 115.0 | 7.0 | 0.0 |
| 2017 | 1139.0 | 101.0 | 57.0 | 2.0 | 5.0 | 46.0 | 95.0 | 8.0 | 3.0 |
| 2018 | 947.5 | 20.0 | 65.0 | 9.0 | 5.0 | 42.4 | 95.0 | 3.0 | 0.0 |
| 2019 | 795.5 | 291.0 | 45.0 | 2.0 | 3.0 | 67.1 | 88.0 | 2.0 | 5.0 |
| 2020 | 1132.5 | 102.0 | 54.0 | 6.0 | 4.0 | 55.6 | 110.0 | 6.0 | 2.0 |
| 2021 | 1017.3 | 361.0 | 32.0 | 2.0 | 3.0 | 45.0 | 106.0 | 3.0 | 5.0 |
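The magnitudes in this table are worth a quick unit check. Read as inches, an annual `PRCP` above 1,000 would mean well over 80 feet of rain per year; read as millimeters, it comes to roughly 40 inches. The sketch below does that conversion on the ten `PRCP` values shown above. That the file is metric is an inference from these magnitudes, not something the notebook confirms.

```python
import pandas as pd

MM_PER_INCH = 25.4

# Annual PRCP for 2012-2021, copied from the table above.
prcp = pd.Series(
    [1191.1, 756.1, 1188.1, 1000.1, 1145.1, 1139.0, 947.5, 795.5, 1132.5, 1017.3],
    index=pd.Index(range(2012, 2022), name="DATE"),
    name="PRCP",
)

# What the values would be in inches if the raw numbers are millimeters.
prcp_if_mm_in_inches = prcp / MM_PER_INCH

# What the mean would be in feet if the raw numbers were already inches.
mean_feet_if_inches = prcp.mean() / 12

print(prcp_if_mm_in_inches.round(1))
print(round(mean_feet_if_inches, 1))
```

If the millimeter reading holds, the same conversion applies to `SNOW` and `EMXP`, while the day-count columns need no conversion.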


In [10]:
# seattle.reset_index(inplace=True)