<a href="https://colab.research.google.com/github/mocorderos/Water_Impair_Iowa/blob/main/ImpairedWaterListFeb25.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **About this data**

### **Summary of impaired waters data analysis**

The dataset includes the last five **impaired waters lists** published biennially by the **Iowa DNR** to comply with the **Clean Water Act** and approved by the **EPA**.

A **segment** is a continuous water body where water quality remains similar throughout. Each segment has **two or more Designated Uses** based on its function.

### **Key context and limitations**
- The **Iowa DNR monitors only half** of the registered water bodies in the state (**data to be confirmed**).
- The analysis covers **five cycles** because data from **2014 and earlier** is not comparable.

### **Criteria for impairment**
A segment is **impaired** if it fails to meet designated use standards due to:
- **Pollutants** (e.g., excess nutrients, chemicals)
- **Biological decline** (e.g., reduced fish or aquatic life diversity)
- **Recreational risks** (e.g., high **E. coli**, algal toxins)

---

# **Analysis of Impaired Waters (2016-2024)**  

This analysis examines the last five **Impaired Waters Lists** published by the Department of Natural Resources for the **2016, 2018, 2020, 2022, and 2024** assessment cycles. Over the past decade, these reports have identified **694 impaired water segments**, with:  

- **Rivers:** 79%  
- **Lakes:** 18%  
- **Reservoirs:** 1%  
- **Wetlands:** 0.7%  

---

## **Rivers**  

Among the **555 impaired river segments**, **79% (443 segments)** have remained impaired for at least **10 years**, consistently appearing in all five assessment cycles. This deterioration has persisted across the same or different designated uses of these segments. Notably, **61 river segments** have been impaired for at least **20 years**: they were first listed as impaired for a specific use in the **2004** cycle.  

### **Key Findings:**  
- **51%** of the permanently impaired river segments are designated for **recreational use**, affecting areas where people **swim and water ski**.  
- **E. coli contamination** is the leading cause of pollution, affecting **51% of these segments**.  

### **Impact and Remediation Challenges:**  
- **More than half** of the permanently impaired river segments are classified as **low impact, low complexity/cost**.  
- **43%** fall into the **low impact but high complexity/cost** category, indicating that while the environmental risk may be lower, remediation remains a significant challenge.  

---

## **Lakes**  

Among the **127 impaired lake segments**, **42% (54 segments)** have remained impaired for at least **10 years**, consistently appearing in all five assessment cycles. This deterioration has persisted across the same or different designated uses of these segments. **Six lake segments** have been impaired for at least **20 years**: they were first listed as impaired for a specific use in the **2004** cycle.  

### **Key Findings:**  
- **49%** of permanently impaired lake segments are designated for **recreational use**, primarily swimming and water skiing.  
- The leading causes of pollution in lakes are:  
  - **Algal Growth: Chlorophyll a** – affecting **25% of impaired lake segments**  
  - **Bacteria: Indicator Bacteria (E. coli)** – affecting **16%**  

### **Impact and Remediation Challenges:**  
- **40%** of permanently impaired lake segments are classified as **low impact, high complexity/cost**.  
- **29%** are classified as **high impact, high complexity/cost**.  
- **25%** fall under **high impact, low complexity/cost**.  



# **Imports**

In [3]:
import pandas as pd

# **Connect with Drive**

In [4]:
#Connection to the drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
# Load the datasets
df = pd.read_csv("/content/drive/MyDrive/Water/ImpairedWater/impaired/allfivecycles.csv")
# df2 = pd.read_csv("/content/drive/MyDrive/Water/ImpairedWater/Delistings/combined.csv")

# **Impaired list dataset | Cleaning**

In [8]:
# Check columns and rows | Impaired
df.shape

(3958, 18)

In [9]:
# Check columns
df.head(1)

Unnamed: 0,AssessID,SegID,cycle,name,adbCode,type,size,status,use,support,impCode,impairment,listingRationale,dataSource,tmdlPriority,legacyAdbCode,cycleListed,impairmentStatus
0,2553,1,2016,Shrickers Slough,01-MAQ-1,Wetland,140.0,Final,BWW1,PS,5a,Algal Growth: Chlorophyll a,Adverse impacts on plant/animal communities,Ambient monitoring: Long-Term Resource Monito...,Tier IV,IA 01-MAQ-0005-L_0,2004,Continuing


In [10]:
# Check data type of each column
df.dtypes

Unnamed: 0,0
AssessID,int64
SegID,int64
cycle,int64
name,object
adbCode,object
type,object
size,float64
status,object
use,object
support,object


In [11]:
# Convert to lowercase, remove leading/trailing spaces
new_columns = df.columns.str.strip().str.lower()
df.columns = new_columns
# df.head(1)

In [12]:
# Display missing values
df.isnull().sum()

Unnamed: 0,0
assessid,0
segid,0
cycle,0
name,0
adbcode,0
type,0
size,0
status,0
use,0
support,0


In [13]:
# Fill missing values in 'tmdlpriority', 'datasource', and 'legacyadbcode' with 'Unknown'
df['tmdlpriority'] = df['tmdlpriority'].fillna('Unknown')
df['datasource'] = df['datasource'].fillna('Unknown')
df['legacyadbcode'] = df['legacyadbcode'].fillna('Unknown')
# df.isnull().sum()
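
The three `fillna` calls above can also be written as a single assignment over a list of columns. This is a minimal sketch on a made-up toy frame (the values are illustrative, not taken from the dataset); it leaves numeric columns such as `size` untouched.

```python
import pandas as pd

# Toy frame standing in for the real data; the values are made up for illustration.
toy = pd.DataFrame({
    "tmdlpriority": ["Tier I", None],
    "datasource": [None, "Ambient monitoring"],
    "legacyadbcode": ["IA 01", None],
    "size": [140.0, 2.0],
})

# Fill only the text columns in one step.
text_cols = ["tmdlpriority", "datasource", "legacyadbcode"]
toy[text_cols] = toy[text_cols].fillna("Unknown")

# Confirm that no missing values remain in those columns.
remaining_nulls = int(toy[text_cols].isnull().sum().sum())
print(remaining_nulls)
```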

In [14]:
# Convert 'cycle', 'segid', and 'size' to integer
df['cycle'] = df['cycle'].astype(int)
df['segid'] = df['segid'].astype(int)
df['size'] = df['size'].astype(int)
df.dtypes

Unnamed: 0,0
assessid,int64
segid,int64
cycle,int64
name,object
adbcode,object
type,object
size,int64
status,object
use,object
support,object
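
Note that `astype(int)` truncates any fractional part, so a size of 3.7 would become 3. Before converting, it may be worth checking how many rows would change. The sketch below uses hypothetical sizes, not values from the dataset.

```python
import pandas as pd

# Hypothetical sizes; two of them carry a fractional part.
sizes = pd.Series([140.0, 2.0, 3.7, 0.4], name="size")

# Rows whose value is not a whole number would be altered by astype(int).
has_fraction = sizes != sizes.round(0)
n_affected = int(has_fraction.sum())

truncated = sizes.astype(int)           # drops the fractional part
rounded = sizes.round(0).astype(int)    # rounds to the nearest whole number first

print(n_affected)
print(truncated.tolist(), rounded.tolist())
```

If `n_affected` is zero on the real column, the conversion is lossless; otherwise keeping `size` as a float may be the safer choice.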


In [15]:
# Add two columns that provide descriptions and classifications for the "use" column.
designated_uses_map = {
    'A1': ('Recreational Uses', 'Swimming and water skiing'),
    'A2': ('Recreational Uses', 'Fishing and shoreline activities'),
    'A3': ('Recreational Uses', 'Wading or playing in the water'),
    'BWW1': ('Aquatic Life Uses', 'Sport fish'),
    'BWW2': ('Aquatic Life Uses', 'Small perennial streams, non-game fish'),
    'BWW3': ('Aquatic Life Uses', 'Intermittent pools, non-game fish'),
    'BLW': ('Aquatic Life Uses', 'Impoundments supporting lake communities'),
    'BCW1': ('Protect Aquatic Life', 'Cold water habitat, diverse species'),
    'BCW2': ('Protect Aquatic Life', 'Small cold-water streams, no trout'),
    'C': ('Drinking Water Uses', 'Potable water source waters'),
    'HH': ('Other Uses', 'Fish harvest for consumption'),
    'OIW': ('Other Uses', 'Outstanding state resource water'),
    'GenUse': ('General Use', 'General use')
}

df['designateduses'] = df['use'].map(lambda x: designated_uses_map.get(x, ('Unknown', 'Unknown'))[0])
df['usedescription'] = df['use'].map(lambda x: designated_uses_map.get(x, ('Unknown', 'Unknown'))[1])
# df.head(1)
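
An alternative to the two `lambda` lookups is to turn the dictionary into a small lookup table and join it once. The sketch below uses a two-entry excerpt of the mapping and a hypothetical unmapped code `ZZ` to show how unknown codes can be listed for review.

```python
import pandas as pd

# Two-entry excerpt of the mapping; "ZZ" below is a hypothetical unmapped code.
uses_map = {
    "A1": ("Recreational Uses", "Swimming and water skiing"),
    "BWW1": ("Aquatic Life Uses", "Sport fish"),
}
toy = pd.DataFrame({"use": ["A1", "BWW1", "ZZ"]})

# Build a lookup table indexed by use code, then join it on the 'use' column.
new_cols = ["designateduses", "usedescription"]
lookup = pd.DataFrame.from_dict(uses_map, orient="index", columns=new_cols)
toy = toy.join(lookup, on="use")
toy[new_cols] = toy[new_cols].fillna("Unknown")

# Codes that did not match any entry in the mapping.
unmapped = toy.loc[toy["designateduses"] == "Unknown", "use"].unique().tolist()
print(unmapped)
```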

In [16]:
# Add a column that describes each "tmdlpriority" tier.
tmdlpriority_map = {
    'N/A': 'N/A',
    'Tier I': 'High impact, low complexity/cost',
    'Tier II': 'High impact, high complexity/cost',
    'Tier III': 'Low impact, low complexity/cost',
    'Tier IV': 'Low impact, high complexity/cost'
}

df['tmdlprioritydescription'] = df['tmdlpriority'].map(lambda x: tmdlpriority_map.get(x, 'Unknown'))
# df.head(1)


In [17]:
# Create a new column to determine how many years a segment has been impaired since its 'cyclelisted' year
df['years_impaired'] = 2024 - df['cyclelisted']
df[['cyclelisted', 'years_impaired']].head()


Unnamed: 0,cyclelisted,years_impaired
0,2004,20
1,2004,20
2,2014,10
3,2012,12
4,2004,20
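
`years_impaired` is computed per row, and a segment can have several rows (one per cycle, use and impairment) with different `cyclelisted` values. To get a single figure per segment, one option is to take the earliest listing year. This is a sketch on made-up rows; the column names match the notebook, the values do not come from the dataset.

```python
import pandas as pd

# Made-up rows: segment 2 has two impairments listed in different cycles.
toy = pd.DataFrame({
    "segid": [1, 1, 2, 2],
    "cycle": [2022, 2024, 2024, 2024],
    "cyclelisted": [2004, 2004, 2014, 2020],
})

# Use the latest cycle in the data instead of hard-coding 2024.
latest_cycle = toy["cycle"].max()

# Earliest listing year per segment, and years elapsed since then.
first_listed = toy.groupby("segid")["cyclelisted"].min()
years_since_first_listing = latest_cycle - first_listed

print(years_since_first_listing)
```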


In [18]:
# Display columns name
list(df.columns)

['assessid',
 'segid',
 'cycle',
 'name',
 'adbcode',
 'type',
 'size',
 'status',
 'use',
 'support',
 'impcode',
 'impairment',
 'listingrationale',
 'datasource',
 'tmdlpriority',
 'legacyadbcode',
 'cyclelisted',
 'impairmentstatus',
 'designateduses',
 'usedescription',
 'tmdlprioritydescription',
 'years_impaired']

In [19]:
# New shape
df.shape

(3958, 22)

In [128]:
# Count unique segments across all types
unique_segid_count = df['segid'].nunique()
unique_segid_count

694

# **Analysis**

#### **How many segments are there per type and per year?**


In [20]:
segments_per_type_year = df.groupby(['cycle', 'type'])['segid'].nunique().reset_index()
segments_per_type_year = segments_per_type_year.sort_values(by=['type', 'cycle'], ascending=[True, True])
segments_per_type_year


Unnamed: 0,cycle,type,segid
0,2016,Lake,86
4,2018,Lake,92
8,2020,Lake,88
12,2022,Lake,88
16,2024,Lake,89
1,2016,Reservoir,7
5,2018,Reservoir,7
9,2020,Reservoir,3
13,2022,Reservoir,3
17,2024,Reservoir,3
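
The long table above can be easier to compare across cycles when reshaped so that each cycle is a column. The sketch below rebuilds only the lake and reservoir rows shown in the output and pivots them.

```python
import pandas as pd

# Counts copied from the lake and reservoir rows of the output above.
counts = pd.DataFrame({
    "cycle": [2016, 2018, 2020, 2022, 2024] * 2,
    "type": ["Lake"] * 5 + ["Reservoir"] * 5,
    "segid": [86, 92, 88, 88, 89, 7, 7, 3, 3, 3],
})

# One row per water body type, one column per assessment cycle.
wide = counts.pivot(index="type", columns="cycle", values="segid")
print(wide)
```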


#### **How many segments are there per type?**

In [31]:
# Count segments per type across all years
segments_per_type = df.groupby('type')['segid'].nunique().reset_index()

# Calculate the total number of unique segments
total_segments = segments_per_type['segid'].sum()

# Add a percentage column without decimals
segments_per_type['percentage'] = ((segments_per_type['segid'] / total_segments) * 100).astype(int)

# Sort by segment count in descending order
segments_per_type = segments_per_type.sort_values(by='segid', ascending=False)

# Display the result
segments_per_type


Unnamed: 0,type,segid,percentage
2,River,555,79
0,Lake,127,18
1,Reservoir,7,1
3,Wetland,5,0
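
Because `astype(int)` truncates, the wetland share prints as 0 even though five segments are listed. A sketch of the same calculation with one decimal place, using only the counts from the table above:

```python
import pandas as pd

# Unique segment counts per type, copied from the output above.
counts = pd.DataFrame({
    "type": ["River", "Lake", "Reservoir", "Wetland"],
    "segid": [555, 127, 7, 5],
})

# Round to one decimal instead of truncating to an integer.
counts["percentage"] = (counts["segid"] / counts["segid"].sum() * 100).round(1)
print(counts)
```

With rounding, the river share reads 80.0 rather than the truncated 79, and wetlands read 0.7 rather than 0.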


In [27]:
# Count the number of unique 'SegID' values in the DataFrame
unique_id_count = df['segid'].nunique()
print(unique_id_count)


694


# **Rivers**

In [131]:
# Create dataset for river segments consistently present in all five cycles
df_rivers = df[df['type'] == 'River']
df_rivers.shape
river_segments_consecutive = df_rivers.groupby('segid')['cycle'].nunique().reset_index()
total_unique_river_segments = df_rivers['segid'].nunique()
print("Total unique river segments:", total_unique_river_segments)
consecutive_river_segments = river_segments_consecutive[river_segments_consecutive['cycle'] == 5]
print("Total unique river consecutive:", consecutive_river_segments)


Total unique river segments: 555
Total unique river consecutive:      segid  cycle
0        2      5
1       13      5
2       14      5
3       15      5
4       16      5
..     ...    ...
544   6598      5
545   6599      5
546   6600      5
550   6620      5
554   6638      5

[443 rows x 2 columns]


In [132]:
# Count unique segments
unique_segments_count = consecutive_river_segments['segid'].nunique()
print("Number of unique river segments present in all five cycles:", unique_segments_count)



Number of unique river segments present in all five cycles: 443
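
The "present in all five cycles" filter is applied to rivers here and to lakes later. A small helper can keep the two in sync and avoid hard-coding the number of cycles. The function name `segments_in_all_cycles` and the toy data are hypothetical; this is a sketch, not the notebook's own method.

```python
import pandas as pd

def segments_in_all_cycles(frame: pd.DataFrame, water_type: str) -> pd.Index:
    """Return segids of `water_type` that appear in every cycle present in `frame`."""
    n_cycles = frame["cycle"].nunique()
    subset = frame[frame["type"] == water_type]
    cycles_per_segment = subset.groupby("segid")["cycle"].nunique()
    return cycles_per_segment[cycles_per_segment == n_cycles].index

# Toy data with three cycles: river 1 appears in all of them, river 2 misses one.
toy = pd.DataFrame({
    "segid": [1, 1, 1, 2, 2, 3, 3, 3],
    "type": ["River"] * 5 + ["Lake"] * 3,
    "cycle": [2020, 2022, 2024, 2020, 2024, 2020, 2022, 2024],
})

persistent_rivers = segments_in_all_cycles(toy, "River").tolist()
persistent_lakes = segments_in_all_cycles(toy, "Lake").tolist()
print(persistent_rivers, persistent_lakes)
```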


In [133]:
# As percentage of total.
total_river_segments = df_rivers['segid'].nunique()
representation_percentage = (unique_segments_count / total_river_segments) * 100
print(f"Percentage of total river segments that are present in all five cycles: {int(representation_percentage)}%")

Percentage of total river segments that are present in all five cycles: 79%


In [134]:
# Most frequent uses among river segments present in all 5 cycles | https://programs.iowadnr.gov/adbnet/Docs/Codex/Designated%20Uses
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
use_counts = filtered_rivers.groupby(['use', 'usedescription'])['segid'].nunique().reset_index()
total_use_segments = use_counts['segid'].sum()
use_counts['percentage'] = ((use_counts['segid'] / total_use_segments) * 100).astype(int)
use_counts = use_counts.sort_values(by='segid', ascending=False)
print("Most frequent uses in deteriorated river segments:")
print(use_counts)

Most frequent uses in deteriorated river segments:
      use                          usedescription  segid  percentage
0      A1               Swimming and water skiing    300          51
5    BWW2  Small perennial streams, non-game fish     97          16
4    BWW1                              Sport fish     81          13
7      HH            Fish harvest for consumption     35           6
1      A2        Fishing and shoreline activities     29           5
3    BCW1     Cold water habitat, diverse species     27           4
6  GenUse                             General use      6           1
2      A3          Wading or playing in the water      5           0
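
One detail that may be worth double-checking in the table above: the percentage column divides by the sum of the per-use counts (580 segment-use pairs, adding up the `segid` column), not by the 443 segments present in all five cycles. A segment impaired for two uses is counted once per use. The sketch below reuses only the printed counts to show how the A1 share depends on the denominator.

```python
# Per-use unique segment counts, copied from the output above.
use_counts = [300, 97, 81, 35, 29, 27, 6, 5]
a1_segments = 300

# River segments present in all five cycles, as reported earlier in the notebook.
persistent_segments = 443

# Share of segment-use pairs (what the table reports).
share_of_pairs = a1_segments / sum(use_counts) * 100

# Share of persistent segments that have an A1 impairment.
share_of_segments = a1_segments / persistent_segments * 100

print(int(share_of_pairs), int(share_of_segments))  # prints: 51 67
```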


In [135]:
# Most frequent uses among river segments present in all 5 cycles
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
use_counts = filtered_rivers.groupby(['use','designateduses'])['segid'].nunique().reset_index()
total_use_segments = use_counts['segid'].sum()
use_counts['percentage'] = ((use_counts['segid'] / total_use_segments) * 100).astype(int)
use_counts = use_counts.sort_values(by='segid', ascending=False)
print("Most frequent uses in deteriorated river segments:")
print(use_counts)

Most frequent uses in deteriorated river segments:
      use        designateduses  segid  percentage
0      A1     Recreational Uses    300          51
5    BWW2     Aquatic Life Uses     97          16
4    BWW1     Aquatic Life Uses     81          13
7      HH            Other Uses     35           6
1      A2     Recreational Uses     29           5
3    BCW1  Protect Aquatic Life     27           4
6  GenUse           General Use      6           1
2      A3     Recreational Uses      5           0


In [136]:
# Most common impairments
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
impairment_counts = filtered_rivers.groupby(['impairment'])['segid'].nunique().reset_index()
total_impairment_segments = impairment_counts['segid'].sum()
impairment_counts['percentage'] = ((impairment_counts['segid'] / total_impairment_segments) * 100).astype(int)
impairment_counts = impairment_counts.sort_values(by='segid', ascending=False)
print("Most frequent impairments in deteriorated river segments:")
print(impairment_counts)

Most frequent impairments in deteriorated river segments:
                                           impairment  segid  percentage
0               Bacteria: Indicator Bacteria- E. coli    301          51
4       Biological: low aquatic macroinvertebrate IBI     53           8
5   Biological: low fish & invert IBIs- cause unknown     48           8
8                  Fish Consumption Advisory: Mercury     34           5
6                            Biological: low fish IBI     30           5
11                  Fish Kill: Caused By Animal Waste     19           3
19                 Fish Kill: Due To Unknown Toxicity     16           2
2           Biological: loss of native mussel species     12           2
23           Organic Enrichment: Low Dissolved Oxygen     12           2
32                                                 pH      8           1
22                                   Metals: Selenium      8           1
25                 Temperature: Thermal Modifications      7      

In [137]:
# Most common TMDL priority levels
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
tmdlpriority_counts = filtered_rivers.groupby(['tmdlpriority'])['segid'].nunique().reset_index()
total_tmdlpriority_segments = tmdlpriority_counts['segid'].sum()
tmdlpriority_counts['percentage'] = ((tmdlpriority_counts['segid'] / total_tmdlpriority_segments) * 100).astype(int)
tmdlpriority_counts = tmdlpriority_counts.sort_values(by='segid', ascending=False)
print("Most frequent TMDL priorities in deteriorated river segments:")
print(tmdlpriority_counts)

Most frequent TMDL priorities in deteriorated river segments:
  tmdlpriority  segid  percentage
1     Tier III    298          52
2      Tier IV    245          43
3      Unknown     21           3
0      Tier II      3           0


In [138]:
# Most common TMDL priority levels | https://programs.iowadnr.gov/adbnet/Docs/Codex/TMDL%20Prioritization
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
tmdlpriority_counts = filtered_rivers.groupby(['tmdlprioritydescription'])['segid'].nunique().reset_index()
total_tmdlpriority_segments = tmdlpriority_counts['segid'].sum()
tmdlpriority_counts['percentage'] = ((tmdlpriority_counts['segid'] / total_tmdlpriority_segments) * 100).astype(int)
tmdlpriority_counts = tmdlpriority_counts.sort_values(by='segid', ascending=False)
print("Most frequent TMDL priorities in deteriorated river segments:")
print(tmdlpriority_counts)

Most frequent TMDL priorities in deteriorated river segments:
             tmdlprioritydescription  segid  percentage
2    Low impact, low complexity/cost    298          52
1   Low impact, high complexity/cost    245          43
3                            Unknown     21           3
0  High impact, high complexity/cost      3           0
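
The group, count, percentage and sort steps are repeated for uses, impairments and TMDL priorities. They could be wrapped in one function. The name `share_by` and the toy rows below are hypothetical; the percentage keeps the same definition as the cells above (share of the summed group counts, truncated to an integer).

```python
import pandas as pd

def share_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count unique segids per value of `column` and add a truncated percentage."""
    counts = frame.groupby(column)["segid"].nunique().reset_index()
    counts["percentage"] = (counts["segid"] / counts["segid"].sum() * 100).astype(int)
    return counts.sort_values(by="segid", ascending=False).reset_index(drop=True)

# Toy rows: segment 1 appears twice under the same tier and is counted once.
toy = pd.DataFrame({
    "segid": [1, 1, 2, 3, 4],
    "tmdlpriority": ["Tier III", "Tier III", "Tier III", "Tier III", "Tier IV"],
})

result = share_by(toy, "tmdlpriority")
print(result)
```

The same call would then work for `'use'`, `'impairment'` or `'tmdlprioritydescription'` on the river and lake subsets.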


In [139]:
# Segments present in all five cycles that were first listed in 2004, by use and name
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
use_counts = filtered_rivers['use'].value_counts()
valid_uses = use_counts[use_counts >= 5].index
filtered_rivers = filtered_rivers[filtered_rivers['use'].isin(valid_uses)]
filtered_rivers = filtered_rivers[filtered_rivers['cyclelisted'] == 2004]
print("Most frequent uses in deteriorated river segments:")
print(filtered_rivers.groupby(['use', 'usedescription','cyclelisted','name'])['segid'].nunique().reset_index())



Most frequent uses in deteriorated river segments:
     use                          usedescription  cyclelisted  \
0     A1               Swimming and water skiing         2004   
1     A1               Swimming and water skiing         2004   
2     A1               Swimming and water skiing         2004   
3     A1               Swimming and water skiing         2004   
4     A1               Swimming and water skiing         2004   
..   ...                                     ...          ...   
56  BWW2  Small perennial streams, non-game fish         2004   
57  BWW2  Small perennial streams, non-game fish         2004   
58  BWW2  Small perennial streams, non-game fish         2004   
59  BWW2  Small perennial streams, non-game fish         2004   
60  BWW2  Small perennial streams, non-game fish         2004   

                  name  segid  
0     Des Moines River      1  
1           Iowa River      4  
2   Little Sioux River      1  
3    South Skunk River      1  
4       
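
The printed result above is cut off because pandas wraps wide frames and elides long ones. One way to see every row on a single line is `DataFrame.to_string`. The sketch below uses the first three river names and counts visible in the output; the frame itself is a stand-in for the grouped result.

```python
import pandas as pd

# Stand-in for the grouped result, using the first rows visible in the output above.
toy = pd.DataFrame({
    "use": ["A1", "A1", "A1"],
    "name": ["Des Moines River", "Iowa River", "Little Sioux River"],
    "segid": [1, 4, 1],
})

# Render the full frame as text, without the index and without truncation.
text = toy.to_string(index=False)
print(text)
```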

# **Lakes**

In [144]:
# Create dataset for lake segments consistently present in all five cycles
df_lake = df[df['type'] == 'Lake']
lake_segments_consecutive = df_lake.groupby('segid')['cycle'].nunique().reset_index()
total_unique_lake_segments = df_lake['segid'].nunique()
print("Total unique lake segments:", total_unique_lake_segments)
# Keep only the lake segments that appear in all five cycles
consecutive_lake_segments = lake_segments_consecutive[lake_segments_consecutive['cycle'] == 5]
print("Total unique lake consecutive:", consecutive_lake_segments)

Total unique lake segments: 127
Total unique lake consecutive:      segid  cycle
0       20      5
3      356      5
5      463      5
10     657      5
11     658      5
12     677      5
16     758      5
17     773      5
19     778      5
21     796      5
22     818      5
23     832      5
27     862      5
30     888      5
31     896      5
33     929      5
34     930      5
35     950      5
40    1016      5
41    1019      5
42    1035      5
45    1073      5
46    1080      5
48    1085      5
52    1134      5
53    1143      5
55    1168      5
62    1231      5
66    1255      5
68    1281      5
70    1304      5
71    1318      5
73    1358      5
74    1361      5
75    1367      5
81    1404      5
83    1435      5
85    1470      5
89    1477      5
93    1532      5
97    1625      5
98    1629      5
102   1649      5
109   1663      5
112   1711      5
114   1716      5
115   1734      5
116   1735      5
118   1754      5
121   1988      5
122   2064      5

In [145]:
# Shape of the lake subset
df_lake.shape

(688, 22)

In [158]:
df_lake.head(1)

Unnamed: 0,assessid,segid,cycle,name,adbcode,type,size,status,use,support,...,listingrationale,datasource,tmdlpriority,legacyadbcode,cyclelisted,impairmentstatus,designateduses,usedescription,tmdlprioritydescription,years_impaired
10,90,20,2016,Backbone Lake,01-MAQ-20,Lake,2,Final,A1,PS,...,Geometric mean criterion exceeded,Beach monitoring: Iowa DNR WQMA,Tier III,IA 01-MAQ-0090-L_0,2004,Continuing,Recreational Uses,Swimming and water skiing,"Low impact, low complexity/cost",20


In [159]:
# Count unique segments
unique_segments_count = consecutive_lake_segments['segid'].nunique()
print("Number of unique lake segments present in all five cycles:", unique_segments_count)

Number of unique lake segments present in all five cycles: 54


In [160]:
# As percentage of total.
total_lake_segments = df_lake['segid'].nunique()
representation_percentage = (unique_segments_count / total_lake_segments) * 100
print(f"Percentage of total lake segments that are present in all five cycles: {int(representation_percentage)}%")

Percentage of total lake segments that are present in all five cycles: 42%


In [161]:
# Most frequent uses among lake segments present in all 5 cycles
filtered_lake = df_lake[df_lake['segid'].isin(consecutive_lake_segments['segid'])]
use_counts = filtered_lake.groupby(['use', 'usedescription'])['segid'].nunique().reset_index()
total_use_segments = use_counts['segid'].sum()
use_counts['percentage'] = ((use_counts['segid'] / total_use_segments) * 100).astype(int)
use_counts = use_counts.sort_values(by='segid', ascending=False)
print("Most frequent uses in deteriorated lake segments:")
print(use_counts)

Most frequent uses in deteriorated lake segments:
   use                            usedescription  segid  percentage
0   A1                 Swimming and water skiing     35          49
1  BLW  Impoundments supporting lake communities     21          29
2   HH              Fish harvest for consumption     15          21


In [154]:
# Most frequent uses among lake segments present in all 5 cycles
filtered_lake = df_lake[df_lake['segid'].isin(consecutive_lake_segments['segid'])]
use_counts = filtered_lake.groupby(['use', 'designateduses'])['segid'].nunique().reset_index()
total_use_segments = use_counts['segid'].sum()
use_counts['percentage'] = ((use_counts['segid'] / total_use_segments) * 100).astype(int)
use_counts = use_counts.sort_values(by='segid', ascending=False)
print("Most frequent uses in deteriorated lake segments:")
print(use_counts)

Most frequent uses in deteriorated lake segments:
   use     designateduses  segid  percentage
0   A1  Recreational Uses     35          49
1  BLW  Aquatic Life Uses     21          29
2   HH         Other Uses     15          21


In [155]:
# Most common impairments
filtered_lake = df_lake[df_lake['segid'].isin(consecutive_lake_segments['segid'])]
impairment_counts = filtered_lake.groupby(['impairment'])['segid'].nunique().reset_index()
total_impairment_segments = impairment_counts['segid'].sum()
impairment_counts['percentage'] = ((impairment_counts['segid'] / total_impairment_segments) * 100).astype(int)
impairment_counts = impairment_counts.sort_values(by='segid', ascending=False)
print("Most frequent impairments in deteriorated lake segments:")
print(impairment_counts)

Most frequent impairments in deteriorated lake segments:
                                  impairment  segid  percentage
0                Algal Growth: Chlorophyll a     29          25
2      Bacteria: Indicator Bacteria- E. coli     19          16
3         Fish Consumption Advisory: Mercury     14          12
8        Turbidity: Secchi Disk Transparency     13          11
10                                        pH     13          11
9                Turbidity: Suspended Solids      9           8
7                                  Turbidity      8           7
6   Organic Enrichment: Low Dissolved Oxygen      4           3
1                Algal Growth: Cyanobacteria      1           0
4            Fish Consumption Advisory: PCBs      1           0
5                            Metals: Mercury      1           0


In [163]:
# Most common TMDL priority levels
filtered_lake = df_lake[df_lake['segid'].isin(consecutive_lake_segments['segid'])]
tmdlpriority_counts = filtered_lake.groupby(['tmdlpriority'])['segid'].nunique().reset_index()
total_tmdlpriority_segments = tmdlpriority_counts['segid'].sum()
tmdlpriority_counts['percentage'] = ((tmdlpriority_counts['segid'] / total_tmdlpriority_segments) * 100).astype(int)
tmdlpriority_counts = tmdlpriority_counts.sort_values(by='segid', ascending=False)
print("Most frequent TMDL priorities in deteriorated lake segments:")
print(tmdlpriority_counts)

Most frequent TMDL priorities in deteriorated lake segments:
  tmdlpriority  segid  percentage
3      Tier IV     31          40
1      Tier II     23          29
0       Tier I     20          25
4      Unknown      2           2
2     Tier III      1           1


In [164]:
# Most common TMDL priority levels
filtered_lake = df_lake[df_lake['segid'].isin(consecutive_lake_segments['segid'])]
tmdlpriority_counts = filtered_lake.groupby(['tmdlprioritydescription'])['segid'].nunique().reset_index()
total_tmdlpriority_segments = tmdlpriority_counts['segid'].sum()
tmdlpriority_counts['percentage'] = ((tmdlpriority_counts['segid'] / total_tmdlpriority_segments) * 100).astype(int)
tmdlpriority_counts = tmdlpriority_counts.sort_values(by='segid', ascending=False)
print("Most frequent TMDL priorities in deteriorated lake segments:")
print(tmdlpriority_counts)

Most frequent TMDL priorities in deteriorated lake segments:
             tmdlprioritydescription  segid  percentage
2   Low impact, high complexity/cost     31          40
0  High impact, high complexity/cost     23          29
1   High impact, low complexity/cost     20          25
4                            Unknown      2           2
3    Low impact, low complexity/cost      1           1


In [165]:
# Segments present in all five cycles that were first listed in 2004, by use and name
filtered_lake = df_lake[df_lake['segid'].isin(consecutive_lake_segments['segid'])]
use_counts = filtered_lake['use'].value_counts()
valid_uses = use_counts[use_counts >= 5].index
filtered_lake = filtered_lake[filtered_lake['use'].isin(valid_uses)]
filtered_lake = filtered_lake[filtered_lake['cyclelisted'] == 2004]
print("Most frequent uses in deteriorated lake segments:")
print(filtered_lake.groupby(['use', 'usedescription','cyclelisted','name'])['segid'].nunique().reset_index())


Most frequent uses in deteriorated lake segments:
  use             usedescription  cyclelisted  \
0  A1  Swimming and water skiing         2004   
1  A1  Swimming and water skiing         2004   
2  A1  Swimming and water skiing         2004   
3  A1  Swimming and water skiing         2004   
4  A1  Swimming and water skiing         2004   
5  A1  Swimming and water skiing         2004   
6  A1  Swimming and water skiing         2004   

                               name  segid  
0                     Backbone Lake      1  
1                       Browns Lake      1  
2                       Desoto Bend      1  
3                    Lake Hendricks      1  
4                       Lake Manawa      1  
5                Roberts Creek Lake      1  
6  White Oak Conservation Area Lake      1  


In [None]:
# https://programs.iowadnr.gov/adbnet/Docs/Codex/Integrated%20Report%20Categories