# Data analysis with pandas

> Objectives:
> * Be able to load and save data using pandas
> * Be able to access columns, rows, and elements in DataFrames and Series objects
> * Be able to perform aggregate computations across different variables

[pandas](http://pandas.pydata.org/) is a Python library that provides tools for processing and manipulating data.

Typically, you will see pandas imported as "`pd`", which is shorter and therefore easier to type than the full name `pandas`:

In [1]:
import pandas as pd

## Loading data

First and foremost, pandas gives us a convenient way to read in data in CSV ("comma-separated values") format. In this lesson, we have two CSV files containing information about precipitation in California. The first file we'll take a look at is `precip_monthly.csv`, which contains monthly aggregate data:

In [2]:
!head precip_monthly.csv

region,subregion,station,abbreviation,elevation,month,precip,avg precip,pct of avg,year,date
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Oct,5.55,7.53,74.0,1987,1987-10-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Nov,8.21,14.14,58.0,1987,1987-11-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Dec,7.53,16.37,46.0,1987,1987-12-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Jan,14.73,16.45,90.0,1987,1987-01-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Feb,8.65,11.95,72.0,1987,1987-02-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Mar,15.05,11.08,136.0,1987,1987-03-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Apr,1.06,6.47,16.0,1987,1987-04-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,May,2.81,4.43,63.0,1987,1987-05-01
NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Jun,0.68,0.83,82.0,1987,1987-06-01


To load it, we call the `read_csv` function, and pandas automatically figures out how to read the file for us:

In [3]:
monthly = pd.read_csv("precip_monthly.csv")
monthly

Unnamed: 0,region,subregion,station,abbreviation,elevation,month,precip,avg precip,pct of avg,year,date
0,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Oct,5.55,7.53,74,1987,1987-10-01
1,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Nov,8.21,14.14,58,1987,1987-11-01
2,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Dec,7.53,16.37,46,1987,1987-12-01
3,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Jan,14.73,16.45,90,1987,1987-01-01
4,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Feb,8.65,11.95,72,1987,1987-02-01
5,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Mar,15.05,11.08,136,1987,1987-03-01
6,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Apr,1.06,6.47,16,1987,1987-04-01
7,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,May,2.81,4.43,63,1987,1987-05-01
8,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Jun,0.68,0.83,82,1987,1987-06-01
9,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,Jul,0.28,0.56,50,1987,1987-07-01


The type of object that is returned is called a "data frame" (`DataFrame`), and is one of the two fundamental data types that pandas uses (the other is the `Series`, which we will meet shortly):

In [4]:
type(monthly)

pandas.core.frame.DataFrame

A data frame is essentially a table with labeled rows and columns. By default, when reading in a CSV file, pandas creates a numerical index for the rows and uses the first row of the CSV as the column names. However, a numerical index isn't necessarily what we want. In this case, it might be more useful for the rows to correspond to the different stations and the dates on which the data were recorded. To do this, we can use the `set_index` method of the data frame object:

In [5]:
monthly = pd.read_csv("precip_monthly.csv")
monthly = monthly.set_index(['station', 'date'])
monthly

station,date,region,subregion,abbreviation,elevation,month,precip,avg precip,pct of avg,year
Gasquet Ranger Station,1987-10-01,NORTH COAST,SMITH RIVER,GAS,384,Oct,5.55,7.53,74,1987
Gasquet Ranger Station,1987-11-01,NORTH COAST,SMITH RIVER,GAS,384,Nov,8.21,14.14,58,1987
Gasquet Ranger Station,1987-12-01,NORTH COAST,SMITH RIVER,GAS,384,Dec,7.53,16.37,46,1987
Gasquet Ranger Station,1987-01-01,NORTH COAST,SMITH RIVER,GAS,384,Jan,14.73,16.45,90,1987
Gasquet Ranger Station,1987-02-01,NORTH COAST,SMITH RIVER,GAS,384,Feb,8.65,11.95,72,1987
Gasquet Ranger Station,1987-03-01,NORTH COAST,SMITH RIVER,GAS,384,Mar,15.05,11.08,136,1987
Gasquet Ranger Station,1987-04-01,NORTH COAST,SMITH RIVER,GAS,384,Apr,1.06,6.47,16,1987
Gasquet Ranger Station,1987-05-01,NORTH COAST,SMITH RIVER,GAS,384,May,2.81,4.43,63,1987
Gasquet Ranger Station,1987-06-01,NORTH COAST,SMITH RIVER,GAS,384,Jun,0.68,0.83,82,1987
Gasquet Ranger Station,1987-07-01,NORTH COAST,SMITH RIVER,GAS,384,Jul,0.28,0.56,50,1987
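
As a minimal sketch of the two loading steps above, the example below reads a tiny, made-up CSV from an in-memory string rather than from `precip_monthly.csv` (the station names and values are illustrative assumptions, not taken from the real file). It also shows that `read_csv` accepts an `index_col` argument, which should give the same result as calling `set_index` afterwards:

```python
import io

import pandas as pd

# A tiny, made-up stand-in for precip_monthly.csv (illustrative values only)
csv_text = """station,date,precip
Station A,1987-10-01,5.55
Station A,1987-11-01,8.21
Station B,1987-10-01,3.76
"""

# Two steps: read with the default numerical index, then set the index
two_step = pd.read_csv(io.StringIO(csv_text)).set_index(['station', 'date'])

# One step: ask read_csv to build the index from the same two columns
one_step = pd.read_csv(io.StringIO(csv_text), index_col=['station', 'date'])

print(one_step)
```

Which form to use is mostly a matter of taste; `set_index` has the advantage that it can also be applied later, after the data has already been loaded.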


## Accessing rows, columns, and elements

To access a column in a DataFrame, we index into the DataFrame as if it were a dictionary. For example, to get just the precipitation for each station and date:

In [6]:
monthly['precip']

station                 date      
Gasquet Ranger Station  1987-10-01     5.55
                        1987-11-01     8.21
                        1987-12-01     7.53
                        1987-01-01    14.73
                        1987-02-01     8.65
                        1987-03-01    15.05
                        1987-04-01     1.06
                        1987-05-01     2.81
                        1987-06-01     0.68
                        1987-07-01     0.28
                        1987-08-01     0.00
                        1987-09-01     0.05
Crescent City 1 N       1987-10-01     3.76
                        1987-11-01     8.54
                        1987-12-01     5.09
                        1987-01-01    10.78
                        1987-02-01     5.64
                        1987-03-01    12.02
                        1987-04-01     1.75
                        1987-05-01     1.22
                        1987-06-01     0.09
                        1987-07-01     0.

The type of object that is returned is a `Series` object, which is the 1D equivalent of a DataFrame. We can further index into this Series object, for example, to get the precipitation for one particular station:

In [7]:
monthly['precip']['San Jose']

date
1987-10-01    0.08
1987-11-01    0.17
1987-12-01    0.85
1987-01-01    1.60
1987-02-01    2.10
1987-03-01    1.84
1987-04-01    0.14
1987-05-01    0.00
1987-06-01    0.00
1987-07-01    0.00
1987-08-01    0.00
1987-09-01    0.00
1988-10-01    0.93
1988-11-01    1.65
1988-12-01    3.31
1988-01-01    2.08
1988-02-01    0.62
1988-03-01    0.06
1988-04-01    1.82
1988-05-01    0.66
1988-06-01    0.01
1988-07-01    0.00
1988-08-01    0.00
1988-09-01    0.00
1989-10-01    0.06
1989-11-01    1.42
1989-12-01    2.14
1989-01-01    1.06
1989-02-01    1.07
1989-03-01    1.91
              ... 
1998-04-01     NaN
1998-05-01     NaN
1998-06-01     NaN
1998-07-01     NaN
1998-08-01     NaN
1998-09-01     NaN
1999-10-01     NaN
1999-11-01     NaN
1999-12-01     NaN
1999-01-01     NaN
1999-02-01     NaN
1999-03-01     NaN
1999-04-01     NaN
1999-05-01     NaN
1999-06-01     NaN
1999-07-01     NaN
1999-08-01     NaN
1999-09-01     NaN
2000-10-01     NaN
2000-11-01     NaN
2000-12-01     NaN
2000-01
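
The column-then-row indexing shown above can be tried out on a small, self-contained example. The data frame below is a made-up stand-in for `monthly` (hypothetical station names and values), indexed by station and date in the same way:

```python
import pandas as pd

# Made-up data with the same (station, date) index layout as `monthly`
demo = pd.DataFrame({
    'station': ['Station A', 'Station A', 'Station B'],
    'date': ['1987-10-01', '1987-11-01', '1987-10-01'],
    'precip': [5.55, 8.21, 3.76],
}).set_index(['station', 'date'])

# Selecting a column gives a Series that keeps the two-level index
precip = demo['precip']
print(type(precip).__name__)

# Indexing with a station label leaves a Series indexed by date only
one_station = precip['Station A']
print(one_station)

# Indexing once more with a date label gives a single element
element = precip['Station A']['1987-10-01']
print(element)
```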

If we want to access the data the other way around -- i.e., access the row(s) first, and then the `'precip'` column -- we need to index slightly differently, using the `.loc` attribute:

In [8]:
monthly.loc['San Jose']

date,region,subregion,abbreviation,elevation,month,precip,avg precip,pct of avg,year
1987-10-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Oct,0.08,0.69,12,1987
1987-11-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Nov,0.17,1.45,12,1987
1987-12-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Dec,0.85,2.46,35,1987
1987-01-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Jan,1.60,2.79,57,1987
1987-02-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Feb,2.10,2.38,88,1987
1987-03-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Mar,1.84,2.03,91,1987
1987-04-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Apr,0.14,1.14,12,1987
1987-05-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,May,0.00,0.36,0,1987
1987-06-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Jun,0.00,0.06,0,1987
1987-07-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Jul,0.00,0.04,0,1987


This returns another DataFrame, which we can then index as we saw earlier:

In [9]:
monthly.loc['San Jose']['precip']

date
1987-10-01    0.08
1987-11-01    0.17
1987-12-01    0.85
1987-01-01    1.60
1987-02-01    2.10
1987-03-01    1.84
1987-04-01    0.14
1987-05-01    0.00
1987-06-01    0.00
1987-07-01    0.00
1987-08-01    0.00
1987-09-01    0.00
1988-10-01    0.93
1988-11-01    1.65
1988-12-01    3.31
1988-01-01    2.08
1988-02-01    0.62
1988-03-01    0.06
1988-04-01    1.82
1988-05-01    0.66
1988-06-01    0.01
1988-07-01    0.00
1988-08-01    0.00
1988-09-01    0.00
1989-10-01    0.06
1989-11-01    1.42
1989-12-01    2.14
1989-01-01    1.06
1989-02-01    1.07
1989-03-01    1.91
              ... 
1998-04-01     NaN
1998-05-01     NaN
1998-06-01     NaN
1998-07-01     NaN
1998-08-01     NaN
1998-09-01     NaN
1999-10-01     NaN
1999-11-01     NaN
1999-12-01     NaN
1999-01-01     NaN
1999-02-01     NaN
1999-03-01     NaN
1999-04-01     NaN
1999-05-01     NaN
1999-06-01     NaN
1999-07-01     NaN
1999-08-01     NaN
1999-09-01     NaN
2000-10-01     NaN
2000-11-01     NaN
2000-12-01     NaN
2000-01

To summarize:

In [10]:
# column indexing --> Series
monthly['precip']

# column, then row indexing --> Series or element
monthly['precip']['San Jose']

# row indexing --> DataFrame or Series
monthly.loc['San Jose']

# row, then column indexing --> Series or element
monthly.loc['San Jose']['precip']

date
1987-10-01    0.08
1987-11-01    0.17
1987-12-01    0.85
1987-01-01    1.60
1987-02-01    2.10
1987-03-01    1.84
1987-04-01    0.14
1987-05-01    0.00
1987-06-01    0.00
1987-07-01    0.00
1987-08-01    0.00
1987-09-01    0.00
1988-10-01    0.93
1988-11-01    1.65
1988-12-01    3.31
1988-01-01    2.08
1988-02-01    0.62
1988-03-01    0.06
1988-04-01    1.82
1988-05-01    0.66
1988-06-01    0.01
1988-07-01    0.00
1988-08-01    0.00
1988-09-01    0.00
1989-10-01    0.06
1989-11-01    1.42
1989-12-01    2.14
1989-01-01    1.06
1989-02-01    1.07
1989-03-01    1.91
              ... 
1998-04-01     NaN
1998-05-01     NaN
1998-06-01     NaN
1998-07-01     NaN
1998-08-01     NaN
1998-09-01     NaN
1999-10-01     NaN
1999-11-01     NaN
1999-12-01     NaN
1999-01-01     NaN
1999-02-01     NaN
1999-03-01     NaN
1999-04-01     NaN
1999-05-01     NaN
1999-06-01     NaN
1999-07-01     NaN
1999-08-01     NaN
1999-09-01     NaN
2000-10-01     NaN
2000-11-01     NaN
2000-12-01     NaN
2000-01
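
Each chained form above builds an intermediate object before the second lookup. As a hedged alternative, `.loc` also accepts a row label and a column label in a single call, and the `.xs()` method selects on an inner index level such as the date. The sketch below uses a small made-up data frame (hypothetical station names and values) rather than the real `monthly` data:

```python
import pandas as pd

# Made-up data with the same (station, date) index layout as `monthly`
demo = pd.DataFrame({
    'station': ['Station A', 'Station A', 'Station B'],
    'date': ['1987-10-01', '1987-11-01', '1987-10-01'],
    'precip': [5.55, 8.21, 3.76],
    'year': [1987, 1987, 1987],
}).set_index(['station', 'date'])

# Row key (a full (station, date) tuple) and column label in one .loc call
element = demo.loc[('Station A', '1987-10-01'), 'precip']
print(element)

# Cross-section: all stations for one date, selected on the 'date' level
same_date = demo.xs('1987-10-01', level='date')
print(same_date)
```

The result of `.xs()` here is a DataFrame indexed by station alone, because the selected level is dropped.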

## Saving data

Let's say we want to save out just the San Jose data. We can do this using the `.to_csv()` method of the DataFrame:

In [11]:
monthly.loc['San Jose'].to_csv('san_jose.csv')

In [12]:
!head san_jose.csv

date,region,subregion,abbreviation,elevation,month,precip,avg precip,pct of avg,year
1987-10-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Oct,0.08,0.69,12.0,1987
1987-11-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Nov,0.17,1.45,12.0,1987
1987-12-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Dec,0.85,2.46,35.0,1987
1987-01-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Jan,1.6,2.79,57.0,1987
1987-02-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Feb,2.1,2.38,88.0,1987
1987-03-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Mar,1.84,2.03,91.0,1987
1987-04-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Apr,0.14,1.14,12.0,1987
1987-05-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,May,0.0,0.36,0.0,1987
1987-06-01,SAN FRANCISCO BAY,SOUTH SF BAY AREA,SNJ,67,Jun,0.0,0.06,0.0,1987
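
To see what `.to_csv()` writes and how to read it back, here is a small round-trip sketch. It writes to an in-memory buffer instead of a file, and the data frame is a made-up stand-in for the San Jose rows (only two columns, values copied from the first two rows shown above):

```python
import io

import pandas as pd

# Made-up stand-in for monthly.loc['San Jose'], indexed by date
san_jose = pd.DataFrame(
    {'precip': [0.08, 0.17], 'year': [1987, 1987]},
    index=pd.Index(['1987-10-01', '1987-11-01'], name='date'),
)

# Write the CSV text into a buffer; the index is saved as the first column
buffer = io.StringIO()
san_jose.to_csv(buffer)
print(buffer.getvalue())

# Read it back, telling pandas which column holds the index
buffer.seek(0)
restored = pd.read_csv(buffer, index_col='date')
print(restored)
```

Note that the index name (`date`) becomes the header of the first column, which is why `index_col='date'` is enough to restore the original layout.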


### Exercise

Write code to load the "monthly" data and then pull out only the data corresponding to the month of July. Save the resulting DataFrame as a CSV file called `'july_precipitation.csv'`.

In [13]:
pd.read_csv("precip_monthly.csv").set_index('month').loc['Jul'].to_csv('july_precipitation.csv')

After you're done with the exercise, take a look at your `'july_precipitation.csv'` file to see the results!

In [14]:
!head july_precipitation.csv

month,region,subregion,station,abbreviation,elevation,precip,avg precip,pct of avg,year,date
Jul,NORTH COAST,SMITH RIVER,Gasquet Ranger Station,GAS,384,0.28,0.56,50.0,1987,1987-07-01
Jul,NORTH COAST,SMITH RIVER,Crescent City 1 N,CCC,40,0.32,0.48,67.0,1987,1987-07-01
Jul,NORTH COAST,KLAMATH RIVER,Tule Lake,TLL,4035,2.6,0.22,1182.0,1987,1987-07-01
Jul,NORTH COAST,KLAMATH RIVER,Callahan,CAL,3185,1.02,0.18,567.0,1987,1987-07-01
Jul,NORTH COAST,KLAMATH RIVER,Fort Jones RS,FJN,2725,0.55,0.36,153.0,1987,1987-07-01
Jul,NORTH COAST,KLAMATH RIVER,Yreka,YRK,2625,1.44,0.3,480.0,1987,1987-07-01
Jul,NORTH COAST,KLAMATH RIVER,Happy Camp RS,HAP,1120,0.72,0.32,225.0,1987,1987-07-01
Jul,NORTH COAST,KLAMATH RIVER,Klamath River At Orleans,OLS,430,0.02,0.14,14.0,1987,1987-07-01
Jul,NORTH COAST,TRINITY RIVER,Coffee Creek Ranger Station,CFF,4400,0.9,1.27,71.0,1987,1987-07-01
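
The solution above selects the July rows by making `month` the index. Another common approach, sketched here on made-up data (hypothetical station names), is to filter with a boolean condition, which leaves `month` as an ordinary column:

```python
import pandas as pd

# Made-up stand-in for the monthly data
demo = pd.DataFrame({
    'station': ['Station A', 'Station A', 'Station B', 'Station B'],
    'month': ['Jun', 'Jul', 'Jun', 'Jul'],
    'precip': [0.68, 0.28, 0.09, 0.32],
})

# Comparing a column to a value gives a Series of True/False values
is_july = demo['month'] == 'Jul'

# Indexing the DataFrame with that Series keeps only the True rows
july = demo[is_july]
print(july)
```

The filtered frame could then be saved with `.to_csv()` exactly as in the exercise; passing `index=False` would leave out the numerical row index.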


## Aggregate computations

One of the really powerful operations that pandas can do involves splitting the data up into particular groups, performing some operation on each group, and then recombining the results.

For example, how would we compute the total precipitation per year for each station? To do this, we want to:

* split the data into groups, where each group corresponds to one station and year
* sum across the precipitation values for each group
* recombine the resulting sums
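
To make the three steps above concrete, here is a sketch that carries them out by hand with a plain Python dictionary. The data is made up (hypothetical station names and values), and this is only an illustration of the idea, not how pandas implements it:

```python
import pandas as pd

# Made-up stand-in for the monthly data
demo = pd.DataFrame({
    'station': ['Station A', 'Station A', 'Station B', 'Station B', 'Station A'],
    'year': [1987, 1987, 1987, 1987, 1988],
    'precip': [1.0, 2.0, 0.5, 0.25, 4.0],
})

# Split and sum: one running total per (station, year) group
totals = {}
for row in demo.itertuples(index=False):
    key = (row.station, row.year)
    totals[key] = totals.get(key, 0.0) + row.precip

# Recombine: turn the per-group sums back into a single pandas object
by_hand = pd.Series(totals)
by_hand.index.names = ['station', 'year']
print(by_hand)
```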

In pandas, this is really easy! First, we'll want to "reset" the index of our DataFrame so that everything is in columns:

In [15]:
monthly = monthly.reset_index()
monthly

Unnamed: 0,station,date,region,subregion,abbreviation,elevation,month,precip,avg precip,pct of avg,year
0,Gasquet Ranger Station,1987-10-01,NORTH COAST,SMITH RIVER,GAS,384,Oct,5.55,7.53,74,1987
1,Gasquet Ranger Station,1987-11-01,NORTH COAST,SMITH RIVER,GAS,384,Nov,8.21,14.14,58,1987
2,Gasquet Ranger Station,1987-12-01,NORTH COAST,SMITH RIVER,GAS,384,Dec,7.53,16.37,46,1987
3,Gasquet Ranger Station,1987-01-01,NORTH COAST,SMITH RIVER,GAS,384,Jan,14.73,16.45,90,1987
4,Gasquet Ranger Station,1987-02-01,NORTH COAST,SMITH RIVER,GAS,384,Feb,8.65,11.95,72,1987
5,Gasquet Ranger Station,1987-03-01,NORTH COAST,SMITH RIVER,GAS,384,Mar,15.05,11.08,136,1987
6,Gasquet Ranger Station,1987-04-01,NORTH COAST,SMITH RIVER,GAS,384,Apr,1.06,6.47,16,1987
7,Gasquet Ranger Station,1987-05-01,NORTH COAST,SMITH RIVER,GAS,384,May,2.81,4.43,63,1987
8,Gasquet Ranger Station,1987-06-01,NORTH COAST,SMITH RIVER,GAS,384,Jun,0.68,0.83,82,1987
9,Gasquet Ranger Station,1987-07-01,NORTH COAST,SMITH RIVER,GAS,384,Jul,0.28,0.56,50,1987


Now, we use the `groupby` method to specify which columns should be used to form the groups:

In [16]:
monthly.groupby(['station', 'year'])

<pandas.core.groupby.DataFrameGroupBy object at 0x107f0f630>

We can index into this "groupby" object just like a DataFrame, and select only the precipitation data:

In [17]:
monthly.groupby(['station', 'year'])['precip']

<pandas.core.groupby.SeriesGroupBy object at 0x107f0f908>

Finally, we can calculate summary statistics on these groups. For example, a sum of the precipitation each year for each station:

In [18]:
monthly.groupby(['station', 'year'])['precip'].sum()

station                year
Adin RS                1987    10.33
                       1988     8.31
                       1989    14.44
                       1990    13.50
                       1991    12.73
                       1992     7.95
                       1993    20.62
                       1994    10.81
                       1995    22.27
                       1996    22.74
                       1997    18.59
                       1998    27.50
                       1999    16.38
                       2000    14.58
                       2001     7.59
                       2002    11.68
                       2003    13.95
                       2004    13.68
                       2005    16.55
                       2006    20.99
                       2007     9.71
                       2008    10.26
                       2009    10.32
                       2010    13.20
                       2011    16.40
                       2012     9.95
          

The result is a Series whose index holds the station and year labels, and whose values are the total precipitation for the corresponding station and year.
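
As a small sketch of how such a result can be used, the example below computes the same kind of grouped sum on made-up data (hypothetical station names and values), looks up one total by its station and year, and turns the result back into a flat table with `reset_index`. The column name `total_precip` is an arbitrary choice for this illustration:

```python
import pandas as pd

# Made-up stand-in for the monthly data
demo = pd.DataFrame({
    'station': ['Station A', 'Station A', 'Station B', 'Station B', 'Station A'],
    'year': [1987, 1987, 1987, 1987, 1988],
    'precip': [1.0, 2.0, 0.5, 0.25, 4.0],
})

totals = demo.groupby(['station', 'year'])['precip'].sum()

# Look up a single total using a (station, year) tuple
one_total = totals.loc[('Station A', 1987)]
print(one_total)

# Move the index levels back into columns and name the values column
table = totals.reset_index(name='total_precip')
print(table)
```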

Similar computations follow the same basic recipe. For example, to compute the *average* precipitation per *month*:

In [19]:
monthly.groupby(['station', 'month'])['precip'].mean()

station                   month
Adin RS                   Apr      1.440714
                          Aug      0.233462
                          Dec      1.961786
                          Feb      1.431429
                          Jan      1.790714
                          Jul      0.257778
                          Jun      0.876538
                          Mar      1.765714
                          May      1.635000
                          Nov      1.546429
                          Oct      0.958929
                          Sep      0.390417
Alturas RS                Apr      1.284400
                          Aug      0.343636
                          Dec      1.356400
                          Feb      0.968462
                          Jan      1.222000
                          Jul      0.271364
                          Jun      0.769565
                          Mar      1.416800
                          May      1.479565
                          Nov      1.388800


The `.apply()` method of the groupby object is more general still: it lets us run *any* computation that we can write as a function on each group. For example, if we wanted to compute the mean and standard deviation of the precipitation in one go:

In [20]:
def stats(data):
    return pd.Series(
        [data.mean(), data.std()],     # compute the mean and standard deviation of one particular group
        index=['mean', 'stddev'],      # label the computed statistics
        name=data.name                 # give a name to the result, so pandas knows how to put everything
                                       # back together
    )

monthly.groupby(['station', 'month'])['precip'].apply(stats)

station                month        
Adin RS                Apr    mean      1.440714
                              stddev    0.866128
                       Aug    mean      0.233462
                              stddev    0.334281
                       Dec    mean      1.961786
                              stddev    1.650423
                       Feb    mean      1.431429
                              stddev    0.952430
                       Jan    mean      1.790714
                              stddev    1.508683
                       Jul    mean      0.257778
                              stddev    0.421511
                       Jun    mean      0.876538
                              stddev    0.906505
                       Mar    mean      1.765714
                              stddev    0.919176
                       May    mean      1.635000
                              stddev    1.376637
                       Nov    mean      1.546429
                              st

If we want to make these statistics (`mean` and `stddev`) correspond to columns, rather than an additional level in the index, we can use the `.unstack()` method:

In [21]:
monthly.groupby(['station', 'month'])['precip'].apply(stats).unstack()

station,month,mean,stddev
Adin RS,Apr,1.440714,0.866128
Adin RS,Aug,0.233462,0.334281
Adin RS,Dec,1.961786,1.650423
Adin RS,Feb,1.431429,0.952430
Adin RS,Jan,1.790714,1.508683
Adin RS,Jul,0.257778,0.421511
Adin RS,Jun,0.876538,0.906505
Adin RS,Mar,1.765714,0.919176
Adin RS,May,1.635000,1.376637
Adin RS,Nov,1.546429,1.035443
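
For built-in statistics like these, the `.agg()` method of the groupby object offers a shorter route to a similar table: given a list of function names, it returns one column per statistic, so no `.unstack()` is needed. The sketch below uses made-up data (hypothetical station names and values); note that the second column is named `std` rather than `stddev`:

```python
import pandas as pd

# Made-up stand-in for the monthly data
demo = pd.DataFrame({
    'station': ['Station A', 'Station A', 'Station B', 'Station B'],
    'month': ['Jul', 'Jul', 'Jul', 'Jul'],
    'precip': [1.0, 3.0, 2.0, 6.0],
})

# One row per (station, month) group, one column per statistic
stats_table = demo.groupby(['station', 'month'])['precip'].agg(['mean', 'std'])
print(stats_table)
```

A custom function passed to `.apply()` remains the more flexible option when the computation is not one of the built-in aggregations.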


### Exercise

Use the `.groupby()` method to compute the average yearly precipitation for each region, and modify the resulting DataFrame so that the rows correspond to regions, and the columns correspond to years. Store the result in a variable called `region_yearly_precip`.

In [22]:
region_yearly_precip = monthly.groupby(['region', 'year'])['precip'].mean().unstack()
region_yearly_precip

year,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,...,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
region
CENTRAL COAST,1.090809,1.430694,1.226241,0.946806,1.624861,1.895278,2.766944,1.475662,3.38305,1.844792,...,3.258615,2.406429,0.949231,1.693058,1.36916,2.365,2.56456,1.354375,1.19732,1.005435
COLORADO RIVER,0.2475,0.455417,0.11625,0.226458,0.312708,0.548333,0.576875,0.226667,0.415625,0.134,...,0.842576,0.19,0.070833,0.303621,0.203333,0.375606,0.318143,0.216818,0.253881,0.253437
NORTH COAST,2.681029,2.919242,3.560303,2.782821,2.341033,2.552572,4.545688,2.378487,5.36663,4.621439,...,3.611474,5.257094,3.222577,3.370113,2.681959,3.641623,4.043207,3.080824,3.149342,2.078874
NORTH LAHONTAN,0.891042,1.056458,2.075486,1.415625,1.318645,1.128065,2.462727,1.062968,3.151169,2.511346,...,1.992429,2.614681,1.136143,1.238293,1.431469,1.585407,2.65597,1.276615,1.45896,1.24203
SACRAMENTO RIVER,1.816042,2.332746,3.307723,2.405227,2.308409,2.385038,4.285682,2.056565,5.668232,4.115,...,4.032883,5.373669,2.503723,2.456358,3.018444,3.626492,4.756993,2.840516,3.060113,2.315586
SAN FRANCISCO BAY,1.335,1.744444,1.884028,1.48631,1.592619,1.889167,3.140723,1.610139,4.041316,3.521528,...,3.398571,3.98662,1.709583,2.105286,2.247361,2.966571,3.508485,2.098788,2.073333,1.726615
SAN JOAQUIN RIVER,1.229135,1.492066,1.860304,1.587179,1.726571,1.819453,3.200962,1.539708,3.945577,2.657724,...,3.52417,3.322664,1.442214,1.703564,2.050106,2.608949,3.596989,1.577774,1.662043,1.270935
SOUTH COAST,0.819286,1.517584,0.976944,0.892629,1.506389,1.809167,3.038222,1.116477,2.684722,1.182333,...,3.076452,1.142148,0.496467,1.309571,0.930897,1.654774,2.099871,1.057171,0.793791,0.707103
SOUTH LAHONTAN,0.637162,1.046987,0.851603,0.762308,0.956026,1.142692,1.756154,0.707226,1.822821,1.075641,...,1.815068,0.967333,0.26944,0.776379,0.606027,1.052763,1.182994,0.561203,0.445235,0.611299
TULARE LAKE,1.239474,1.483174,1.580833,1.118742,1.647933,1.43551,2.497785,1.604747,3.101541,2.006,...,2.482179,2.493932,1.101446,1.588042,1.553212,2.201125,3.067315,1.515292,1.229172,1.064625
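
The same kind of region-by-year table can also be built with the `pivot_table` method, which combines the grouping, averaging, and reshaping into one call. The sketch below compares the two approaches on made-up data (hypothetical region names and values):

```python
import pandas as pd

# Made-up stand-in for the monthly data
demo = pd.DataFrame({
    'region': ['REGION A', 'REGION A', 'REGION A', 'REGION B', 'REGION B', 'REGION B'],
    'year': [1987, 1987, 1988, 1987, 1987, 1988],
    'precip': [1.0, 3.0, 5.0, 0.5, 1.5, 2.0],
})

# Approach used in the exercise: group, average, then unstack the years
via_groupby = demo.groupby(['region', 'year'])['precip'].mean().unstack()

# Alternative: say directly what the rows, columns, and values should be
via_pivot = demo.pivot_table(
    index='region', columns='year', values='precip', aggfunc='mean'
)

print(via_pivot)
```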
