# Using .df() to create dataframes
When you request data from the NWIS, you'll often want every parameter recorded at a site, along with all of the data quality flags. Later, when it's time to graph the data or check its quality, you'll usually want only a subset of the dataset.

The NWIS class can create a dataframe that contains only the data you want: call the .df() method with a few arguments to select the columns you need.

In [1]:
import hydrofunctions as hf
print(hf.__version__)
%matplotlib inline

0.1.8dev


Request all of the data from two sites and describe the dataset:

In [2]:
test = hf.NWIS(['01589317', '01589330'], 'iv', period='P10D')
test

Requested data from https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01589317%2C01589330&period=P10D


USGS:01589317: TRIBUTARY TO DEAD RUN TRIBUTARY AT WOODLAWN, MD
    00060: <5 * Minutes>  Discharge, cubic feet per second
    00065: <5 * Minutes>  Gage height, feet
USGS:01589330: DEAD RUN AT FRANKLINTOWN, MD
    00060: <5 * Minutes>  Discharge, cubic feet per second
    00065: <5 * Minutes>  Gage height, feet
Start: 2019-07-14 03:15:00+00:00
End:   2019-07-24 02:40:00+00:00

Use the .df() method with no arguments to create a dataframe that contains everything we requested.

We'll use .head() to view only the first five rows.

In [3]:
test.df().head()

datetimeUTC,USGS:01589317:00060:00000_qualifiers,USGS:01589317:00060:00000,USGS:01589317:00065:00000_qualifiers,USGS:01589317:00065:00000,USGS:01589330:00060:00000_qualifiers,USGS:01589330:00060:00000,USGS:01589330:00065:00000_qualifiers,USGS:01589330:00065:00000
2019-07-14 03:15:00+00:00,P,0.21,P,0.46,P,1.96,P,0.51
2019-07-14 03:20:00+00:00,P,0.21,P,0.46,P,1.8,P,0.5
2019-07-14 03:25:00+00:00,P,0.21,P,0.46,P,1.8,P,0.5
2019-07-14 03:30:00+00:00,P,0.21,P,0.46,P,1.8,P,0.5
2019-07-14 03:35:00+00:00,P,0.21,P,0.46,P,1.8,P,0.5


You can list the column names, too.

In [4]:
test.df().columns

Index(['USGS:01589317:00060:00000_qualifiers', 'USGS:01589317:00060:00000',
       'USGS:01589317:00065:00000_qualifiers', 'USGS:01589317:00065:00000',
       'USGS:01589330:00060:00000_qualifiers', 'USGS:01589330:00060:00000',
       'USGS:01589330:00065:00000_qualifiers', 'USGS:01589330:00065:00000'],
      dtype='object')
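
Each column name packs several pieces of information into one colon-separated string. The following is a minimal sketch, not part of hydrofunctions, that splits the names shown above into their parts. `ColumnName` and `parse_column` are hypothetical helpers, and reading the fourth field as a statistic code is an assumption; the output above only shows that it is `00000` for these columns.

```python
from typing import NamedTuple

QUALIFIER_SUFFIX = "_qualifiers"


class ColumnName(NamedTuple):
    """Hypothetical container for the parts of a column name."""
    agency: str
    site: str
    parameter: str
    statistic: str  # assumed meaning of the fourth field
    is_flag: bool


def parse_column(name: str) -> ColumnName:
    """Split a name such as 'USGS:01589317:00060:00000_qualifiers'."""
    is_flag = name.endswith(QUALIFIER_SUFFIX)
    if is_flag:
        # Drop the suffix so that only the four colon-separated fields remain.
        name = name[: -len(QUALIFIER_SUFFIX)]
    agency, site, parameter, statistic = name.split(":")
    return ColumnName(agency, site, parameter, statistic, is_flag)


# Column names copied from the output above.
COLUMNS = [
    "USGS:01589317:00060:00000_qualifiers", "USGS:01589317:00060:00000",
    "USGS:01589317:00065:00000_qualifiers", "USGS:01589317:00065:00000",
    "USGS:01589330:00060:00000_qualifiers", "USGS:01589330:00060:00000",
    "USGS:01589330:00065:00000_qualifiers", "USGS:01589330:00065:00000",
]

for column in COLUMNS:
    print(parse_column(column))
```

Every data column has a matching `_qualifiers` column, which is why the full dataframe has eight columns for two sites with two parameters each.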

Now, let's select only the columns for site 01589317.
If you don't ask for the qualifier flags, hydrofunctions assumes that you want only the data.

In [5]:
test.df('01589317').head()

datetimeUTC,USGS:01589317:00060:00000,USGS:01589317:00065:00000
2019-07-14 03:15:00+00:00,0.21,0.46
2019-07-14 03:20:00+00:00,0.21,0.46
2019-07-14 03:25:00+00:00,0.21,0.46
2019-07-14 03:30:00+00:00,0.21,0.46
2019-07-14 03:35:00+00:00,0.21,0.46


You can also request only the qualifier flags, which is useful for a quality control check.

The flag 'P' means the data are provisional. The USGS may revise these data after conducting a site visit.
A more complete listing of qualifier flags can be found here: https://waterdata.usgs.gov/usa/nwis/uv?codes_help#dv_cd1

In [6]:
test.df('01589330', 'flags').head()

datetimeUTC,USGS:01589330:00060:00000_qualifiers,USGS:01589330:00065:00000_qualifiers
2019-07-14 03:15:00+00:00,P,P
2019-07-14 03:20:00+00:00,P,P
2019-07-14 03:25:00+00:00,P,P
2019-07-14 03:30:00+00:00,P,P
2019-07-14 03:35:00+00:00,P,P
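
A quality control check usually starts with counting how often each flag occurs. The sketch below is one possible way to do that with the standard library. It assumes the flag columns have already been converted to a plain dictionary of lists (for instance with pandas' `DataFrame.to_dict('list')`); `tally_flags` is a hypothetical helper, and the values are the five rows shown above.

```python
from collections import Counter

# The five rows of qualifier flags shown above, keyed by column name.
flag_columns = {
    "USGS:01589330:00060:00000_qualifiers": ["P", "P", "P", "P", "P"],
    "USGS:01589330:00065:00000_qualifiers": ["P", "P", "P", "P", "P"],
}


def tally_flags(columns):
    """Count how many times each qualifier flag appears in each column."""
    return {name: Counter(values) for name, values in columns.items()}


for name, counts in tally_flags(flag_columns).items():
    print(name, dict(counts))
```

Any flag other than the ones you expect will show up as an extra key in the tally, which makes unusual records easy to spot.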


You can select the parameter you want using the parameter code ('00060' is discharge), 
but some codes also have an alias to make things easier to remember.

In [7]:
test.df('discharge').head()  # 'q' is another alias for discharge; 'stage' is for '00065'

datetimeUTC,USGS:01589317:00060:00000,USGS:01589330:00060:00000
2019-07-14 03:15:00+00:00,0.21,1.96
2019-07-14 03:20:00+00:00,0.21,1.8
2019-07-14 03:25:00+00:00,0.21,1.8
2019-07-14 03:30:00+00:00,0.21,1.8
2019-07-14 03:35:00+00:00,0.21,1.8


The previous example selected discharge data at both sites in the dataset.
You can combine arguments in any order to get just the columns you want.
For example, `.df('01589317', 'stage')` returns the stage data at a single site.

This will list only the qualifier flags for discharge at the site 01589317:

In [8]:
test.df('q', 'flags', '01589317').head()

datetimeUTC,USGS:01589317:00060:00000_qualifiers
2019-07-14 03:15:00+00:00,P
2019-07-14 03:20:00+00:00,P
2019-07-14 03:25:00+00:00,P
2019-07-14 03:30:00+00:00,P
2019-07-14 03:35:00+00:00,P
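
To make the selection rules concrete, here is a rough sketch that imitates the behaviour shown in the examples above on a plain list of column names. It is not how hydrofunctions implements `.df()`; `select_columns` and `ALIASES` are hypothetical, and the rule that a five-digit argument is a parameter code while any other all-digit argument is a site number is an assumption made for this sketch.

```python
# Aliases mentioned in this notebook: 'discharge' and 'q' for '00060', 'stage' for '00065'.
ALIASES = {"discharge": "00060", "q": "00060", "stage": "00065"}
SUFFIX = "_qualifiers"

# Column names copied from the output of test.df().columns.
COLUMNS = [
    "USGS:01589317:00060:00000_qualifiers", "USGS:01589317:00060:00000",
    "USGS:01589317:00065:00000_qualifiers", "USGS:01589317:00065:00000",
    "USGS:01589330:00060:00000_qualifiers", "USGS:01589330:00060:00000",
    "USGS:01589330:00065:00000_qualifiers", "USGS:01589330:00065:00000",
]


def select_columns(columns, *args):
    """Return the column names that match the given sites, parameters, and 'flags'."""
    if not args:
        return list(columns)  # no arguments: everything, data and flags

    sites_present = {name.split(":")[1] for name in columns}
    want_flags = False
    sites, params = [], []
    for arg in args:  # the order of the arguments does not matter
        if arg == "flags":
            want_flags = True
        elif arg in ALIASES:
            params.append(ALIASES[arg])
        elif arg.isdigit() and len(arg) == 5:  # assumed: parameter code
            params.append(arg)
        elif arg.isdigit():  # assumed: site number
            if arg not in sites_present:
                raise ValueError(f"The site '{arg}' is not in this dataset.")
            sites.append(arg)
        else:
            raise ValueError(f"Unrecognized argument: {arg!r}")

    selected = []
    for name in columns:
        is_flag = name.endswith(SUFFIX)
        base = name[: -len(SUFFIX)] if is_flag else name
        _, site, param, _ = base.split(":")
        if is_flag != want_flags:
            continue  # data columns by default, flag columns only when asked
        if sites and site not in sites:
            continue
        if params and param not in params:
            continue
        selected.append(name)
    return selected


print(select_columns(COLUMNS, "q", "flags", "01589317"))
print(select_columns(COLUMNS, "01589317", "stage"))
```

Because each argument is classified by its content rather than its position, the same columns come back no matter what order the arguments are given in.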


Hydrofunctions will let you know if you request something you don't have.

In [9]:
test.df('01580000')

ValueError: The site '01580000' is not in this dataset.

Let's use our data for something interesting!

In [None]:
test.df('q').plot()

In [None]:
test.df('stage').plot()