Downloads State Water Use data for 2000/2005/2010 and creates a formatted Physical Water Use Supply table. 

Sample URL (Louisiana)
https://waterdata.usgs.gov/la/nwis/water_use?format=rdb&rdb_compression=value&wu_area=County&wu_year=2000%2C2005%2C2010&wu_county=ALL&wu_category=ALL&wu_county_nms=--ALL%2BCounties--&wu_category_nms=--ALL%2BCategories--

Workflow
* Construct the url and download the data into a pandas data frame
* Melt/gather the usage columns into row values under the column name 'Group'
* Remove rows with no usage data (identified by not having "Mgal" in the 'Group' name)
* Convert the 'MGal' column to float
* Sum the usage by 'Group' and save the result to a CSV file
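
The melt and filter steps in the workflow can be sketched on a toy table. This is a minimal illustration, not the notebook's actual data: the two usage column names and all values below are invented, and only `county_nm` and `year` are used as row headings to keep the example short.

```python
import pandas as pd

# Toy stand-in for the downloaded table; column names mimic the USGS layout,
# but the names and values are invented for illustration only.
df_wide = pd.DataFrame({
    'county_nm': ['A Parish', 'B Parish'],
    'year': [2010, 2010],
    'Total Population total population of area, in thousands': [61.8, 25.7],
    'Public Supply total self-supplied withdrawals, fresh, in Mgal/d': [5.82, 4.27],
})

# Melt the usage columns into rows: labels go under 'Group', values under 'MGal'
df_long = pd.melt(df_wide, id_vars=['county_nm', 'year'],
                  var_name='Group', value_name='MGal')

# Keep only the rows whose 'Group' label mentions 'Mgal' (i.e. volume data)
df_volume = df_long[df_long['Group'].str.contains('Mgal')].copy()
print(df_volume)
```

Melting first means the volume filter becomes a single string test on one column rather than a selection over many column names.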

In [1]:
#Import modules
import sys, os, urllib
import pandas as pd
import numpy as np

In [2]:
#Specify the state and year to process
state = 'la' #Louisiana
year = 2010

In [3]:
#Set the output file location to the Data/State data folder
outFolder = '../../Data/Statedata/'
outFN = os.path.join(outFolder, '{0}_{1}.csv'.format(state,year)) #join avoids a doubled path separator

In [4]:
#Set the data URL path and parameters and construct the url
path = 'https://waterdata.usgs.gov/{}/nwis/water_use?'.format(state)
values = {'format':'rdb',
         'rdb_compression':'value',
         'wu_area':'County',
         'wu_year': year,
         'wu_county':'ALL',
         'wu_category':'ALL',
         'wu_county_nms':'--ALL+Counties--',
         'wu_category_nms':'--ALL+Categories--'
        }
from urllib.parse import urlencode #Python 3 location of urlencode (was urllib.urlencode in Python 2)
url = path + urlencode(values)
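
As a quick check on how the query string is encoded, the sketch below builds a shortened URL with the Python 3 `urlencode` function. The parameter subset is chosen only for illustration, and the comma-separated multi-year value is an assumption based on the sample URL at the top of this notebook.

```python
from urllib.parse import urlencode

state = 'la'
# Shortened parameter set, for illustration only
params = {
    'format': 'rdb',
    'wu_year': '2000,2005,2010',          # several years in one request, as in the sample URL
    'wu_county_nms': '--ALL+Counties--',
}

# urlencode percent-encodes reserved characters: ',' becomes %2C and '+' becomes %2B
query = urlencode(params)
url = 'https://waterdata.usgs.gov/{}/nwis/water_use?'.format(state) + query
print(url)
```

The encoded pieces (`2000%2C2005%2C2010` and `--ALL%2BCounties--`) match the corresponding parts of the sample URL.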

In [5]:
#Pull data in using the URL and remove the 2nd row of headers
dfRaw = pd.read_table(url,comment='#',header=[0,1],na_values='-')
dfRaw.columns = dfRaw.columns.droplevel(level=1)
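
The two-row header handling can be tried offline on a small string. The sketch below uses an invented miniature of an RDB-style response (comment lines starting with `#`, a row of column names, a row of format codes, then tab-separated data); the column names and values are made up, and `pd.read_csv` with `sep='\t'` is used as a stand-in for `pd.read_table`.

```python
import io
import pandas as pd

# Invented miniature of an RDB-style response, for illustration only
rdb_text = (
    "# Hypothetical header comment\n"
    "state_cd\tcounty_nm\tyear\tSome use, in Mgal/d\n"
    "2s\t20s\t4s\t22s\n"
    "22\tAcadia Parish\t2010\t5.82\n"
    "22\tAllen Parish\t2010\t-\n"
)

# Read both header rows as a two-level column index, skipping '#' comments
# and treating '-' as a missing value
df = pd.read_csv(io.StringIO(rdb_text), sep='\t', comment='#',
                 header=[0, 1], na_values='-')

# Drop the second level (the format codes), leaving only the column names
df.columns = df.columns.droplevel(level=1)
print(df)
```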

In [None]:
#Read locally, for debugging, and drop the 2nd row of headers
#dfRaw = pd.read_table('../../Data/Proprietary/LA.txt',comment='#',header=[0,1],na_values='-')
#dfRaw.columns = dfRaw.columns.droplevel(level=1)
dfRaw.head()

In [9]:
#Tidy the data
rowHeadings = ['county_cd', 'county_nm', 'state_cd', 'state_name', 'year']
dfTidy = pd.melt(dfRaw,id_vars=rowHeadings,value_name='MGal',var_name='Group')
dfTidy.shape

(17664, 7)

In [10]:
#Remove rows that don't have volume data (i.e. keep only rows with 'Mgal' in the 'Group' name)
#.copy() avoids a SettingWithCopyWarning when the MGal column is modified later
dfTidy = dfTidy[dfTidy['Group'].str.contains('Mgal')].copy()
dfTidy.shape

(15040, 7)

In [11]:
#Change the type of the MGal column to float
#(np.float was removed in NumPy 1.24; the builtin float is equivalent)
dfTidy['MGal'] = dfTidy['MGal'].astype(float)
dfTidy['MGal'].sum()

50620.130000000005

In [12]:
dfTidy.head()

     county_cd          county_nm  state_cd  state_name  year                                              Group  MGal
256          1      Acadia Parish        22   Louisiana  2010  Public Supply self-supplied groundwater withdr...  5.82
257          3       Allen Parish        22   Louisiana  2010  Public Supply self-supplied groundwater withdr...  4.27
258          5   Ascension Parish        22   Louisiana  2010  Public Supply self-supplied groundwater withdr...  3.02
259          7  Assumption Parish        22   Louisiana  2010  Public Supply self-supplied groundwater withdr...  0.00
260          9   Avoyelles Parish        22   Louisiana  2010  Public Supply self-supplied groundwater withdr...  3.85


In [None]:
#Create a list of water use classes (unfinished; the empty strings are placeholders)
useClasses = ["Aquaculture",
              "Commercial",
              "Domestic",
              "Fossil-fuel Thermoelectric",
              "Geothermal Thermoelectric",
              "",
              "",
              "",
              
             ]


In [16]:
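#Summarize: total MGal for each Group, keeping NaN where a group has no data
#(min_count=1 stops recent pandas versions from reporting all-NaN groups as 0)
dfState = dfTidy.groupby(['Group'])['MGal'].sum(min_count=1)
dfState.to_csv(outFN)
dfState.head()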

Group
Aquaculture consumptive use, fresh, in Mgal/d                               NaN
Aquaculture consumptive use, saline, in Mgal/d                              NaN
Aquaculture self-supplied groundwater withdrawals, fresh, in Mgal/d      197.39
Aquaculture self-supplied groundwater withdrawals, saline, in Mgal/d        NaN
Aquaculture self-supplied surface-water withdrawals, fresh, in Mgal/d    113.67
Name: MGal, dtype: float64
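
The NaN totals in the output above depend on the pandas version: since pandas 0.22, a grouped `sum()` over values that are all missing returns 0 unless `min_count` is set. A minimal sketch of the difference, using invented group labels and values:

```python
import numpy as np
import pandas as pd

# Invented data: group 'a' has values, group 'b' is entirely missing
df = pd.DataFrame({
    'Group': ['a, in Mgal/d', 'a, in Mgal/d', 'b, in Mgal/d', 'b, in Mgal/d'],
    'MGal': [1.5, 2.0, np.nan, np.nan],
})

# Default behaviour: an all-NaN group sums to 0.0
default_sum = df.groupby('Group')['MGal'].sum()

# min_count=1 requires at least one non-missing value, otherwise the result is NaN
strict_sum = df.groupby('Group')['MGal'].sum(min_count=1)

print(default_sum)
print(strict_sum)
```

Keeping NaN distinguishes "no data reported" from a reported total of zero, which matters when the summary is read back from the CSV.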