# Example Data Science Project for USDA
Shane M. Wilkins, PhD
2021-01-21

## Introduction

This notebook provides a gentle introduction to some common data science methods that may be useful to USDA staff working on research projects for the Agency.
We will only use publicly available official government data.
None of this data contains personally identifiable information.

Please note that this is not an official report of the USDA.
This project is purely educational, and all examples are given for illustration only.

### The CRISP-DM Model


## Business Problem

## Data Sources

### USDA Data Sources

USDA maintains a library of APIs available at [usda.gov](https://www.usda.gov/media/digital/developer-resources).

The NRCS National Water and Climate Center's Air and Water Database (AWDB) exposes a SOAP [API](https://www.wcc.nrcs.usda.gov/web_service/AWDB_Web_Service_Tutorial.htm).

In [5]:
import numpy as np # NumPy provides fast numerical arrays
import pandas as pd # Pandas provides useful data manipulation features
import plotly.express as px # Plotly creates interactive, beautiful visualizations

In [None]:
from zeep import Client # Zeep provides access to SOAP APIs

client = Client('https://www.wcc.nrcs.usda.gov/awdbWebService/services?WSDL')

# example API call from AWDB docs
# https://www.wcc.nrcs.usda.gov/web_service/AWDB_Web_Service_Reference.htm#getdata

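stationTriplets = ["302:OR:SNTL"] # SNOTEL station ANEROID LAKE #2 in Oregon
elementCd = "PREC" # accumulated precipitation
ordinal = 1
heightDepth = None
duration = "DAILY" # enum values are passed to zeep as plain strings
getFlags = True
beginDate = "2010-01-01"
endDate = "2010-01-31"
alwaysReturnDailyFeb29 = False

# Pass each parameter by name so zeep can map it onto the SOAP operation
result = client.service.getData(
    stationTriplets=stationTriplets,
    elementCd=elementCd,
    ordinal=ordinal,
    heightDepth=heightDepth,
    duration=duration,
    getFlags=getFlags,
    beginDate=beginDate,
    endDate=endDate,
    alwaysReturnDailyFeb29=alwaysReturnDailyFeb29,
)
result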
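
Once a daily series comes back, a natural next step is to pair each value with its calendar date so the data can be tabulated or plotted. The sketch below uses only the standard library and a made-up sample record. It assumes each returned record carries a `beginDate` string, a `values` list, and an optional `flags` list; that shape is an assumption here and should be checked against the actual `result`. The helper name `daily_rows` is hypothetical.

```python
from datetime import date, timedelta


def daily_rows(begin_date, values, flags=None):
    """Pair each daily value with its calendar date, starting at begin_date."""
    # Keep only the YYYY-MM-DD part so a trailing time component is tolerated.
    start = date.fromisoformat(begin_date[:10])
    if flags is None:
        flags = [None] * len(values)
    return [
        {
            "date": start + timedelta(days=i),
            "value": None if v is None else float(v),  # missing days stay None
            "flag": f,
        }
        for i, (v, f) in enumerate(zip(values, flags))
    ]


# Made-up numbers for illustration only; this is not real station data.
sample = {
    "beginDate": "2010-01-01 00:00:00",
    "values": [10.1, None, 10.4],
    "flags": ["V", "V", "V"],
}

rows = daily_rows(sample["beginDate"], sample["values"], sample["flags"])
for row in rows:
    print(row["date"], row["value"], row["flag"])
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)` for further analysis with pandas.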