# Lab 10 - Pruning Lakes

Recall that one of the files (the one whose name starts with `mces`) contains water quality measurements for lakes in the Twin Cities. In this lab, we will narrow the list down to the lakes that have at least one measurement of each type (phosphorus and Secchi depth) in every year from 2004 through 2014.

## Tasks

Build a query that produces a list of lake names and codes meeting the following criteria:

1. Includes only years after 2003.
2. Includes only lakes that have at least one non-null measurement of each type in each year.
3. Contains both the lake name and the lake code (`DNR_ID_Site_Number`).
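Criterion 2 admits two readings. A year can count if some single sample row has both a phosphorus and a Secchi value, or it can count if each measurement type appears at least once that year, possibly on different dates. The notebook's first pipeline drops rows missing either value, which implements the stricter, same-row reading. The minimal pandas sketch below contrasts the two on hypothetical toy data (the site code and values are made up). Which reading the lab intends is an assumption worth checking.

```python
import pandas as pd

# Hypothetical toy data: in 2004 this site has a Secchi reading and a
# phosphorus reading, but they sit on different sample rows.
samples = pd.DataFrame({
    "DNR_ID_Site_Number": ["82010200-01"] * 3,
    "Year": [2004, 2004, 2005],
    "Secchi_Depth_RESULT": [1.0, None, 0.7],
    "Total_Phosphorus_RESULT": [None, 0.15, 0.12],
})
measures = ["Secchi_Depth_RESULT", "Total_Phosphorus_RESULT"]
keys = [samples["DNR_ID_Site_Number"], samples["Year"]]

# Reading A: each measurement type has at least one non-null value that year.
# GroupBy.count() counts the non-null entries in each column.
per_type_ok = (samples.groupby(keys)[measures].count() > 0).all(axis=1)

# Reading B: at least one single row has both values that year.
same_row_ok = samples[measures].notna().all(axis=1).groupby(keys).any()

print(pd.DataFrame({"per_type": per_type_ok, "same_row": same_row_ok}))
```

Under reading A, 2004 qualifies; under reading B it does not, so the two readings can yield different lake lists.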


## Suggested workflow

1. Filter and mutate as needed.
2. Group and aggregate (hint: you will need to do this twice).
3. Filter on the number of years observed for each lake (we want 11, one for each year from 2004 through 2014).

```python
import pandas as pd
from dfply import *
```

```python
date_cols = ['START_DATE', 'END_DATE']
# The file is tab-separated; pandas 2.0+ only accepts `sep` as a keyword argument.
lakes = pd.read_csv('./data/MinneMUDAC_raw_files/mces_lakes_1999_2014.txt', sep='\t', parse_dates=date_cols)
```
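The file is tab-separated, and the date columns are parsed on load. A minimal sketch on a hypothetical two-row excerpt (only a few of the real columns, with values mirroring the `head()` output below) shows the keyword form and that empty fields are read as `NaN`, which the later non-null filtering relies on.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt in the same tab-separated layout as the MCES
# file; only a handful of the real columns are included.
raw = (
    "LAKE_NAME\tSTART_DATE\tEND_DATE\tTotal_Phosphorus_RESULT\n"
    "Acorn Lake\t2006-04-16\t2006-04-16\t0.156\n"
    "Acorn Lake\t2006-05-01\t2006-05-01\t\n"
)

date_cols = ["START_DATE", "END_DATE"]
# sep is passed by keyword; parse_dates converts the listed columns to datetimes.
sample = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=date_cols)

print(sample.dtypes)
print(sample)
```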



```python
lakes.head()
```

| | PROJECT_ID | DATA_SET_TITLE | LAKE_NAME | CITY | COUNTY | DNR_ID_Site_Number | MAJOR_WATERSHED | WATER_PLANNING_AUTHORITY | LAKE_SITE_NUMBER | START_DATE | ... | Secchi_Depth_RESULT_SIGN | Secchi_Depth_RESULT | Secchi_Depth_QUALIFIER | Secchi_Depth_Units | Total_Phosphorus_RESULT_SIGN | Total_Phosphorus_RESULT | Total_Phosphorus_QUALIFIER | Total_Phosphorus_Units | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7108 | Citizen Assisted Monitoring Program (CAMP) for... | Acorn Lake | Oakdale | Washington | 82010200-01 | Lower St. Croix River | Valley Branch WD | 1 | 2006-04-16 | ... | | 1.0 | Approved | m | | 0.156 | Approved | mg/L | -92.971711 | 45.016556 |
| 1 | 7108 | Citizen Assisted Monitoring Program (CAMP) for... | Acorn Lake | Oakdale | Washington | 82010200-01 | Lower St. Croix River | Valley Branch WD | 1 | 2006-05-01 | ... | | | | m | | | | mg/L | -92.971711 | 45.016556 |
| 2 | 7108 | Citizen Assisted Monitoring Program (CAMP) for... | Acorn Lake | Oakdale | Washington | 82010200-01 | Lower St. Croix River | Valley Branch WD | 1 | 2006-05-02 | ... | | 0.66 | Approved | m | | 0.107 | Approved | mg/L | -92.971711 | 45.016556 |
| 3 | 7108 | Citizen Assisted Monitoring Program (CAMP) for... | Acorn Lake | Oakdale | Washington | 82010200-01 | Lower St. Croix River | Valley Branch WD | 1 | 2006-05-16 | ... | | 0.66 | Approved | m | | 0.141 | Approved | mg/L | -92.971711 | 45.016556 |
| 4 | 7108 | Citizen Assisted Monitoring Program (CAMP) for... | Acorn Lake | Oakdale | Washington | 82010200-01 | Lower St. Croix River | Valley Branch WD | 1 | 2006-05-30 | ... | | 0.5 | Approved | m | | 0.029 | Approved | mg/L | -92.971711 | 45.016556 |


```python
lakes.columns
```

```
Index(['PROJECT_ID', 'DATA_SET_TITLE', 'LAKE_NAME', 'CITY', 'COUNTY',
       'DNR_ID_Site_Number', 'MAJOR_WATERSHED', 'WATER_PLANNING_AUTHORITY',
       'LAKE_SITE_NUMBER', 'START_DATE', 'START_HOURMIN24', 'END_DATE',
       'END_HOURMIN24', 'SAMPLE_DEPTH_IN_METERS', 'Seasonal_Lake_Grade_RESULT',
       'Seasonal_Lake_Grade_QUALIFIER', 'Seasonal_Lake_Grade_Units',
       'Physical_Condition_RESULT', 'Physical_Condition_QUALIFIER',
       'Physical_Condition_Units', 'Recreational_Suitability_RESULT',
       'Recreational_Suitability_QUALIFIER', 'Recreational_Suitability_Units',
       'Secchi_Depth_RESULT_SIGN', 'Secchi_Depth_RESULT',
       'Secchi_Depth_QUALIFIER', 'Secchi_Depth_Units',
       'Total_Phosphorus_RESULT_SIGN', 'Total_Phosphorus_RESULT',
       'Total_Phosphorus_QUALIFIER', 'Total_Phosphorus_Units', 'longitude',
       'latitude'],
      dtype='object')
```

```python
lakes_w_non_empty_measurements = (lakes
                                    >> select('LAKE_NAME',
                                              'DNR_ID_Site_Number',
                                              'Secchi_Depth_RESULT',
                                              'Total_Phosphorus_RESULT',
                                              'START_DATE')
                                    # keep rows where both measurements are present;
                                    # X refers to the piped frame, not the original `lakes`
                                    >> filter_by(X.Total_Phosphorus_RESULT.notna(),
                                                 X.Secchi_Depth_RESULT.notna())
                                    # extract the sampling year, then drop the full date
                                    >> mutate(Year = X.START_DATE.dt.year)
                                    >> drop(X.START_DATE)
                                    >> filter_by(X.Year >= 2004)
                                 )
```
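For comparison, the same select, filter, mutate, and drop steps can be written in plain pandas without dfply. This is a sketch on hypothetical toy rows (`lakes_toy` is illustrative, not the real data); `dropna(subset=...)` plays the role of the non-null `filter_by`, and `assign` the role of `mutate`.

```python
import pandas as pd

# Hypothetical toy rows standing in for `lakes`; column names match the MCES file.
lakes_toy = pd.DataFrame({
    "LAKE_NAME": ["Acorn Lake", "Acorn Lake", "Acorn Lake", "Bass Lake"],
    "DNR_ID_Site_Number": ["82010200-01", "82010200-01", "82010200-01", "82012300-01"],
    "Secchi_Depth_RESULT": [1.0, None, 0.66, 2.1],
    "Total_Phosphorus_RESULT": [0.156, 0.107, 0.141, 0.030],
    "START_DATE": pd.to_datetime(["2006-04-16", "2006-05-02", "2006-05-16", "2003-07-01"]),
})

non_empty = (
    lakes_toy[["LAKE_NAME", "DNR_ID_Site_Number", "Secchi_Depth_RESULT",
               "Total_Phosphorus_RESULT", "START_DATE"]]              # select
    .dropna(subset=["Total_Phosphorus_RESULT", "Secchi_Depth_RESULT"])  # both present
    .assign(Year=lambda d: d["START_DATE"].dt.year)                     # mutate
    .drop(columns="START_DATE")                                          # drop
    .query("Year >= 2004")                                               # 2004 onward
)
print(non_empty)
```

Row 1 is dropped for its missing Secchi value and row 3 for its 2003 date.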

```python
lakes_w_non_empty_measurements.head()
```

| | LAKE_NAME | DNR_ID_Site_Number | Secchi_Depth_RESULT | Total_Phosphorus_RESULT | Year |
|---|---|---|---|---|---|
| 0 | Acorn Lake | 82010200-01 | 1.0 | 0.156 | 2006 |
| 2 | Acorn Lake | 82010200-01 | 0.66 | 0.107 | 2006 |
| 3 | Acorn Lake | 82010200-01 | 0.66 | 0.141 | 2006 |
| 4 | Acorn Lake | 82010200-01 | 0.5 | 0.029 | 2006 |
| 5 | Acorn Lake | 82010200-01 | 0.5 | 0.058 | 2006 |


```python
lakes_w_complete_measurement = (lakes_w_non_empty_measurements
                                    # first pass: one row per lake per year
                                    >> group_by(X.LAKE_NAME, X.DNR_ID_Site_Number, X.Year)
                                    >> summarise(cnt = n(X.Year))
                                    >> ungroup()
                                    # second pass: number of years with data for each lake
                                    >> group_by(X.LAKE_NAME, X.DNR_ID_Site_Number)
                                    >> summarise(cnt = n(X.LAKE_NAME))
                                    # 2004 through 2014 is 11 years
                                    >> filter_by(X.cnt >= 11)
                               )
```
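The two rounds of grouping exist because the first collapses samples to one row per lake per year and the second counts those rows per lake. Below is a plain-pandas sketch of the same idea on hypothetical toy data; the threshold `n_years_required` is lowered to 2 because the toy data spans only two years. The last lines show that `nunique` on `Year` collapses both stages into one.

```python
import pandas as pd

# Hypothetical toy input shaped like `lakes_w_non_empty_measurements`:
# one row per sample, already restricted to 2004 onward.
samples = pd.DataFrame({
    "LAKE_NAME": ["Acorn Lake"] * 3 + ["Bass Lake"] * 2,
    "DNR_ID_Site_Number": ["82010200-01"] * 3 + ["82012300-01"] * 2,
    "Year": [2004, 2004, 2005, 2004, 2004],
})

keys = ["LAKE_NAME", "DNR_ID_Site_Number"]
n_years_required = 2  # would be 11 for the real 2004-2014 window

# Stage 1: one row per (lake, year), with the number of samples that year.
per_year = samples.groupby(keys + ["Year"]).size().reset_index(name="cnt")

# Stage 2: number of (lake, year) rows, i.e. distinct years, per lake.
per_lake = per_year.groupby(keys).size().reset_index(name="cnt")
complete = per_lake[per_lake["cnt"] >= n_years_required]
print(complete)

# Equivalent one-step version: count distinct years per lake directly.
n_years = samples.groupby(keys)["Year"].nunique()
print(n_years[n_years >= n_years_required])
```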

```python
lakes_w_complete_measurement.shape
```

```
(49, 3)
```

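```python
# The row index carries no information here, so leave it out of the file
# (consistent with how lakes_by_year is saved below).
lakes_w_complete_measurement.to_csv('./data/MinneMUDAC_raw_files/lakes_w_complete_measurement.csv', index=False)
```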

```python
lakes_w_complete_measurement.LAKE_NAME
```

```
2          Alimagnet Lake
6          Armstrong Lake
9               Bass Lake
10           Bavaria Lake
16       Big Comfort Lake
20              Bone Lake
40       Cobblecrest Lake
43             Colby Lake
49           Crystal Lake
50     DeMontreville Lake
56             Eagle Lake
58            Earley Lake
59         East Boot Lake
67           Farquar Lake
70              Fish Lake
71            Forest Lake
77      George Watch Lake
81           Goggins Lake
84             Goose Lake
103             Jane Lake
108           Keller Lake
110           Kismet Lake
111        Klawitter Pond
113               La Lake
114        Lac Lavon Lake
118              Lee Lake
132             Long Lake
137      Lower Prior Lake
144           Marion Lake
145         Markgraf Lake
153         McKusick Lake
163         Mitchell Lake
168        Northwood Lake
173            Olson Lake
175          Orchard Lake
187        Pine Tree Lake
191           Powers Lake
194    Regional Park Lake
195         
```

```python
lakes_w_complete_measurement.head()
```

| | DNR_ID_Site_Number | LAKE_NAME | cnt |
|---|---|---|---|
| 2 | 19002100-01 | Alimagnet Lake | 11 |
| 6 | 82011602-01 | Armstrong Lake | 11 |
| 9 | 82012300-01 | Bass Lake | 11 |
| 10 | 10001900-01 | Bavaria Lake | 11 |
| 16 | 13005300-01 | Big Comfort Lake | 11 |


```python
lakes_by_year = (lakes_w_non_empty_measurements
                 # keep only the lakes that passed the completeness check
                 >> filter_by(X.DNR_ID_Site_Number.isin(lakes_w_complete_measurement.DNR_ID_Site_Number))
                 >> group_by(X.DNR_ID_Site_Number, X.LAKE_NAME, X.Year)
                 >> summarise(mean_phos = X.Total_Phosphorus_RESULT.mean(),
                              med_phos = X.Total_Phosphorus_RESULT.median(),
                              sd_phos = X.Total_Phosphorus_RESULT.std(),
                              mean_secchi = X.Secchi_Depth_RESULT.mean(),
                              med_secchi = X.Secchi_Depth_RESULT.median(),
                              sd_secchi = X.Secchi_Depth_RESULT.std()
                             )
                 # each (site, lake, year) group is already a single row, so a
                 # per-group `head` kept every row; drop the grouping instead
                 >> ungroup()
            )
lakes_by_year
```

| | Year | LAKE_NAME | DNR_ID_Site_Number | mean_phos | med_phos | sd_phos | mean_secchi | med_secchi | sd_secchi |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | George Watch Lake | 02000500-01 | 0.199000 | 0.1440 | 0.127817 | 0.705000 | 0.875 | 0.363967 |
| 1 | 2005 | George Watch Lake | 02000500-01 | 0.210083 | 0.1465 | 0.166585 | 0.681667 | 0.690 | 0.402759 |
| 2 | 2006 | George Watch Lake | 02000500-01 | 0.164286 | 0.1750 | 0.077940 | 0.728571 | 0.700 | 0.292770 |
| 3 | 2007 | George Watch Lake | 02000500-01 | 0.203714 | 0.0990 | 0.179936 | 0.562857 | 0.500 | 0.352927 |
| 4 | 2008 | George Watch Lake | 02000500-01 | 0.148833 | 0.1335 | 0.074505 | 0.550000 | 0.500 | 0.236797 |
| 5 | 2009 | George Watch Lake | 02000500-01 | 0.105600 | 0.0960 | 0.047343 | 0.538000 | 0.600 | 0.153970 |
| 6 | 2010 | George Watch Lake | 02000500-01 | 0.173000 | 0.1700 | 0.077680 | 0.493333 | 0.550 | 0.247083 |
| 7 | 2011 | George Watch Lake | 02000500-01 | 0.119417 | 0.1115 | 0.050024 | 0.973333 | 1.030 | 0.295737 |
| 8 | 2012 | George Watch Lake | 02000500-01 | 0.264900 | 0.2120 | 0.155748 | 0.359000 | 0.260 | 0.235440 |
| 9 | 2013 | George Watch Lake | 02000500-01 | 0.310500 | 0.2845 | 0.225538 | 0.365000 | 0.225 | 0.372492 |
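The per-year summary can also be expressed with pandas named aggregation. In this sketch the values are hypothetical toy numbers and `complete_ids` stands in for the qualifying site codes. `as_index=False` returns the grouping keys as ordinary columns, so no ungrouping step is needed. Like `Series.std()` in the dfply version, the `"std"` aggregation uses the sample standard deviation (`ddof=1`).

```python
import pandas as pd

# Hypothetical toy samples shaped like `lakes_w_non_empty_measurements`.
samples = pd.DataFrame({
    "DNR_ID_Site_Number": ["02000500-01"] * 4,
    "LAKE_NAME": ["George Watch Lake"] * 4,
    "Year": [2004, 2004, 2005, 2005],
    "Total_Phosphorus_RESULT": [0.10, 0.20, 0.30, 0.50],
    "Secchi_Depth_RESULT": [1.0, 0.5, 0.4, 0.6],
})

# Hypothetical list of qualifying site codes
# (stands in for lakes_w_complete_measurement.DNR_ID_Site_Number).
complete_ids = ["02000500-01"]

by_year = (
    samples[samples["DNR_ID_Site_Number"].isin(complete_ids)]
    .groupby(["DNR_ID_Site_Number", "LAKE_NAME", "Year"], as_index=False)
    .agg(
        mean_phos=("Total_Phosphorus_RESULT", "mean"),
        med_phos=("Total_Phosphorus_RESULT", "median"),
        sd_phos=("Total_Phosphorus_RESULT", "std"),
        mean_secchi=("Secchi_Depth_RESULT", "mean"),
        med_secchi=("Secchi_Depth_RESULT", "median"),
        sd_secchi=("Secchi_Depth_RESULT", "std"),
    )
)
print(by_year)
```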


```python
lakes_by_year.to_csv('./data/MinneMUDAC_raw_files/lakes_by_year.csv', index=False)
```

```python
lakes_by_year['LAKE_NAME']
```

```
0      George Watch Lake
1      George Watch Lake
2      George Watch Lake
3      George Watch Lake
4      George Watch Lake
5      George Watch Lake
6      George Watch Lake
7      George Watch Lake
8      George Watch Lake
9      George Watch Lake
10     George Watch Lake
11            Riley Lake
12            Riley Lake
13            Riley Lake
14            Riley Lake
15            Riley Lake
16            Riley Lake
17            Riley Lake
18            Riley Lake
19            Riley Lake
20            Riley Lake
21            Riley Lake
22          St. Joe Lake
23          St. Joe Lake
24          St. Joe Lake
25          St. Joe Lake
26          St. Joe Lake
27          St. Joe Lake
28          St. Joe Lake
29          St. Joe Lake
             ...        
509          Forest Lake
510          Forest Lake
511          Forest Lake
512          Forest Lake
513          Forest Lake
514          Forest Lake
515          Forest Lake
516          Forest Lake
517          Kismet Lake
```
