# Project-5: SNOTEL Water Level Analysis

##### Grant Hicks, Kathleen Wang, Samuel Yeager
--------

#### Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA
from sklearn.model_selection import GridSearchCV, train_test_split
import statsmodels.api as sm

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

# Suppressing warnings on models
import warnings
warnings.filterwarnings("ignore")

### Problem Statement
The Natural Resources Conservation Service (NRCS) tracks precipitation levels in water basins across the Western United States using an automated system known as SNOTEL. This information is critical for resource management and provides insight into climate conditions in the regions where the data is gathered. Knowing which areas are experiencing abnormal conditions can be key for planning resource management operations. Our goal is to determine whether we can build a model that predicts precipitation levels in a water basin using data gathered from NRCS SNOTEL sites.

----------------------------

### Contents
- Background
- Data Imports and Cleaning
- Exploratory Data Analysis
- Data Visualization
- ARIMA Model
- Conclusions and Recommendations

## Background
#### SNOTEL
The Natural Resources Conservation Service (NRCS) uses an automated system known as SNOTEL (SNOwpack TELemetry) to collect snowpack and climate data in the Western United States. Growing out of a manual measurement system, SNOTEL has been reliably collecting data to produce water supply forecasts and support resource management activities since 1980. SNOTEL uses meteor burst communications to collect and transmit data in near real time without the use of satellites. There are more than 730 SNOTEL sites in 11 states, all designed to operate without maintenance for a year, since they are typically in remote locations and maintenance trips can involve long hikes or helicopter flights. The NRCS National Water and Climate Center in Portland, Oregon houses the central computer that controls operation of the sites and receives the data they gather.

|                  More information on SNOTEL can be found at the following links                  |
|:------------------------------------------------------------------------------------------------:|
| [SNOTEL Data Collection Network Fact Sheet](https://www.wcc.nrcs.usda.gov/factpub/sntlfct1.html) |
| [SNOTEL Brochure](https://www.wcc.nrcs.usda.gov/snotel/snotel_brochure.pdf)                      |
| [Snow Telemetry and Snow Course Data and Products](https://www.wcc.nrcs.usda.gov/snow/)          |

Our main focus was the currently reported precipitation level and the water-year-to-date precipitation at each site.
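
Year-to-date precipitation in SNOTEL reports is measured over the water year, which runs from October 1 through September 30 and is labeled by the calendar year in which it ends. As a minimal sketch (the helper name `water_year` is our own and is not part of the project code), a date can be mapped to its water year like this:

```python
import pandas as pd

def water_year(dates: pd.Series) -> pd.Series:
    """Map each date to its water year (Oct 1 - Sep 30, labeled by the ending year)."""
    dates = pd.to_datetime(dates)
    # October through December belong to the water year ending the next September.
    return dates.dt.year + (dates.dt.month >= 10).astype(int)

# Illustrative dates only; the first matches the observation date used in this project.
obs = pd.Series(["2011-02-10", "2010-10-01", "2010-09-30"])
wy = water_year(obs)
print(wy.tolist())
```

A February 10th observation therefore reports precipitation accumulated since the previous October 1.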

--------------------------

## Data Imports and Cleaning
The data covers SNOTEL sites in the Columbia River Basin, observed on February 10th of each year from 1990 to 2021.

SNOTEL Snow/Precipitation Update Reports were gathered [here](https://wcc.sc.egov.usda.gov/reports/SelectUpdateReport.html).

#### Data Dictionary
|Feature                 |Description                                             |
|------------------------|--------------------------------------------------------|
| Lat                    | Decimal Latitude of SNOTEL Site                        |
| Long                   | Decimal Longitude of SNOTEL Site                       |
| YYYYMMDD               | Date of Observation                                    |
| Basin_name             | SNOTEL Site Sub-basin Name                             |
| Station_id             | SNOTEL System Identification Code                      |
| Acton_id               | Snow Survey Program ACTON Code                         |
| Station_name           | SNOTEL Station Name                                    |
| Elevation              | SNOTEL Site Elevation (feet)                           |
| Wteq_amt               | Current Snow Water Equivalent (inches)                 |
| Wteq_med               | Snow Water Equivalent Median (inches)                  |
| Wteq_amt_pct_med       | Current Snow Water Equivalent as Percent of Median     |
| Wteq_amt_pct_med_flag  | Snow Water Equivalent Validity Code                    |
| Prec_wytd_amt          | Water Year to Date Precipitation (inches)              |
| Prec_wytd_avg          | Water Year to Date Precipitation Average (inches)      |
| Prec_wytd_pctavg       | Water Year to Date Precipitation as Percent of Average |
| Prec_wytd_pct_avg_flag | Water Year to Date Precipitation Validity Code         |
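
The percent-of-median and percent-of-average features relate a current amount to its historical reference value. The sketch below assumes the percentage is simply `amount / reference * 100`, rounded to a whole number; this is our assumption about how the report derives it, not something the report documents here. The first two rows reuse `wteq_amt` and `wteq_med` values from the first rows of the dataset, and the third row is made up to show the zero-median case.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "wteq_amt": [12.1, 37.4, 0.0],   # current snow water equivalent (inches)
    "wteq_med": [12.2, 38.6, 0.0],   # median snow water equivalent (inches)
})

# Replace a zero median with NaN so the division yields NaN instead of inf.
median = sample["wteq_med"].replace(0, np.nan)

# Assumed formula: percent of median = amount / median * 100, rounded.
sample["pct_med_check"] = (sample["wteq_amt"] / median * 100).round()
print(sample)
```

Comparing a recomputed column like `pct_med_check` against the reported `wteq_amt_pct_med` is one way to spot rows where the validity flag deserves a closer look.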

-------------------------

Since the data we looked at spanned 30 years, we each took a decade and cleaned it. Each raw csv file originally contained many lines before the actual data, so to prepare the files for pandas we used a Python script to iterate over each file and remove those leading lines. This script is in the Scripts folder as 'strip_script.py'. The cleaned csv files were then moved to the Data folder. Once each decade had been prepared, the data was merged into a single csv file, 'allyears.csv', located in the data folder.
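
The cleaning and merging steps can be sketched roughly as follows. This is not the contents of 'strip_script.py'; the function name `strip_preamble`, the assumption that the header row starts with `basin_name`, and the in-memory sample reports (with made-up values) are all hypothetical and only illustrate the idea.

```python
import io
import pandas as pd

def strip_preamble(text: str, header_prefix: str = "basin_name") -> str:
    """Drop every line before the first line that starts with the header prefix."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.lower().startswith(header_prefix):
            return "\n".join(lines[i:]) + "\n"
    raise ValueError("header row not found")

# Synthetic stand-ins for two raw report files (values are invented).
raw_a = "report title\ngenerated on some date\nbasin_name,station_id,wteq_amt\nbasin_a,1,10.0\n"
raw_b = "report title\nbasin_name,station_id,wteq_amt\nbasin_a,1,11.0\nbasin_b,2,5.5\n"

# Clean each report, parse it, and stack the results into one frame.
frames = [pd.read_csv(io.StringIO(strip_preamble(raw))) for raw in (raw_a, raw_b)]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

With real files, the same function could be applied to each file's text before writing the cleaned copy to the Data folder.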

In [2]:
allyears = pd.read_csv('data/allyears.csv')
allyears.head()

Unnamed: 0.1,Unnamed: 0,yyyymmdd,lat,long,station_id,acton_id,station_name,elevation,wteq_amt,wteq_med,...,"lower columbia, hood river",owyhee malheur,"raft, goose, salmon falls, bruneau",snake above palisades,"umatilla, walla walla, willow",upper clark fork river basin,"weiser, payette, boise","white, green, cedar, skykomish, snoqualmi, baker, skagit",willamette,"yakima, ahtanum"
0,0,2011-02-10,48.566667,-115.45,311,15A08S,Banfield Mountain,5600,12.1,12.2,...,0,0,0,0,0,0,0,0,0,0
1,1,2011-02-10,48.3,-116.066667,323,16A08S,Bear Mountain,5400,37.4,38.6,...,0,0,0,0,0,0,0,0,0,0
2,2,2011-02-10,48.983333,-115.816667,918,15A05S,Garver Creek,4250,7.7,6.9,...,0,0,0,0,0,0,0,0,0,0
3,3,2011-02-10,48.916667,-114.766667,500,14A11S,Grave Creek,4300,13.1,11.4,...,0,0,0,0,0,0,0,0,0,0
4,4,2011-02-10,48.3,-114.833333,510,14A14S,Hand Creek,5035,9.6,7.8,...,0,0,0,0,0,0,0,0,0,0


In [4]:
# Dropping the 'Unnamed: 0' column

allyears.drop(columns = 'Unnamed: 0', inplace = True)
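
An 'Unnamed: 0' column typically appears when a DataFrame is saved with its index and then read back without telling pandas which column is the index. The small round trip below, using an in-memory buffer and a toy frame rather than the project files, shows where the column comes from and how `index=False` on write avoids it.

```python
import io
import pandas as pd

# Toy frame for illustration only.
df = pd.DataFrame({"station_id": [311, 323], "wteq_amt": [12.1, 37.4]})

# Writing with the default index=True stores the index as a nameless first column.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(with_index.columns.tolist())

# Writing with index=False leaves nothing to drop after reading.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
clean = pd.read_csv(buf)
print(clean.columns.tolist())
```

Passing `index_col=0` to `pd.read_csv` is another option when the file on disk has already been written with its index.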

In [5]:
# Getting a look at the numbers and some statistics

allyears.describe()

Unnamed: 0,lat,long,station_id,elevation,wteq_amt,wteq_med,wteq_amt_pct_med,prec_wytd_amt,prec_wytd_avg,prec_wytd_pctavg,...,"lower columbia, hood river",owyhee malheur,"raft, goose, salmon falls, bruneau",snake above palisades,"umatilla, walla walla, willow",upper clark fork river basin,"weiser, payette, boise","white, green, cedar, skykomish, snoqualmi, baker, skagit",willamette,"yakima, ahtanum"
count,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,...,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0,8812.0
mean,45.426398,-116.435764,607.206537,5970.510667,16.031355,15.847878,101.75244,25.145098,24.859998,101.08148,...,0.021788,0.034044,0.029051,0.054471,0.018157,0.054471,0.065365,0.033591,0.073763,0.028938
std,1.855368,3.536988,178.07011,1550.921705,10.40304,8.26238,40.035746,17.415914,15.231577,28.114302,...,0.146001,0.181354,0.16796,0.226958,0.133527,0.226958,0.247184,0.180183,0.2614,0.167641
min,41.233333,-123.366667,302.0,420.0,0.0,0.0,0.0,2.7,6.0,30.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,43.95,-119.833333,466.0,4930.0,8.7,9.9,78.0,13.2,14.1,82.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,45.183333,-115.7,609.0,5850.0,13.7,14.0,99.0,20.0,21.1,99.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,46.983333,-113.95,748.0,7180.0,21.2,20.7,121.0,31.7,31.7,117.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,48.983333,-110.05,1165.0,9580.0,71.9,47.6,622.0,160.5,99.4,234.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
