<a href="https://colab.research.google.com/github/mocorderos/Water_Impair_Iowa/blob/main/ImpairedWaterListFeb25.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**About this data**

### **Summary of impaired waters data analysis**

The dataset includes the last five **impaired waters lists** published biennially by the **Iowa DNR** to comply with the **Clean Water Act** and approved by the **EPA**.

A **segment** is a continuous water body where water quality remains similar throughout. Each segment has **two or more Designated Uses** based on its function.

### **Key context and limitations**
- The **Iowa DNR monitors only half** of the registered water bodies in the state (**data to be confirmed**).
- The analysis covers **five cycles** because data from **2014 and earlier** is not comparable.

### **Criteria for impairment**
A segment is **impaired** if it fails to meet designated use standards due to:
- **Pollutants** (e.g., excess nutrients, chemicals)
- **Biological decline** (e.g., reduced fish or aquatic life diversity)
- **Recreational risks** (e.g., high **E. coli**, algal toxins)

---

# **Analysis of Impaired Waters (2016-2024)**  

This analysis examines the last five **Impaired Waters Lists** published by the Department of Natural Resources for the **2016, 2018, 2020, 2022, and 2024** assessment cycles. Over the past decade, these reports have identified **694 impaired water segments**, with:  

- **Rivers:** 79%  
- **Lakes:** 18%  
- **Reservoirs:** 1%  
- **Wetlands:** 0.7%  

---

## **Rivers**  

Among the **555 impaired river segments**, **79% (443 segments)** have remained impaired for at least **10 years**, consistently appearing in all five assessment cycles. This deterioration has persisted across the same or different designated uses of these segments. Notably, **61 river segments** have been impaired for at least **24 years**, with impairment traced back to the year they were first classified as deteriorated for a specific use.  

### **Key Findings:**  
- **51%** of the listing records for the permanently impaired river segments fall under the **recreational use** for **swimming and water skiing**.  
- **E. coli contamination** is the leading cause of impairment, accounting for **55%** of these listing records.  

### **Impact and Remediation Challenges:**  
- **More than half (55%)** of the listing records for the permanently impaired river segments are classified as **low impact, low complexity/cost**.  
- **44%** fall into the **low impact but high complexity/cost** category, indicating that while the environmental risk may be lower, remediation remains a significant challenge.  

---

## **Lakes**  

Among the **127 impaired lake segments**, **42% (54 segments)** have remained impaired for at least **10 years**, consistently appearing in all five assessment cycles. This deterioration has persisted across the same or different designated uses of these segments. **Six lake segments** have been impaired for at least **24 years**, with impairment traced back to the year they were first classified as deteriorated for a specific use.  

### **Key Findings:**  
- **35 of the 54** permanently impaired lake segments are designated for **recreational use**, primarily swimming and water skiing; this is **49%** of their segment-use combinations.  
- The leading causes of impairment in these lakes are:  
  - **Algal Growth: Chlorophyll a** – listed for **29 segments** (**25%** of segment-impairment combinations)  
  - **Bacteria: Indicator Bacteria (E. coli)** – listed for **19 segments** (**16%**)  

### **Impact and Remediation Challenges:**  
- Among permanently impaired lake segments, **31** (**40%** of segment-priority combinations) are classified as **low impact, high complexity/cost**.  
- **23 segments** (**29%**) are classified as **high impact, high complexity/cost**.  
- **20 segments** (**25%**) fall under **high impact, low complexity/cost**.  



#**Imports**

In [64]:
import pandas as pd

In [65]:
from google.colab import files

#**Connect with Drive**

In [73]:
#Connection to the drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [74]:
# Load the datasets
df = pd.read_csv("/content/drive/MyDrive/Water/ImpairedWater/impaired/allfivecycles.csv")
# df2 = pd.read_csv("/content/drive/MyDrive/Water/ImpairedWater/Delistings/combined.csv")

#**Impaired list dataset | Cleaning**

In [75]:
# Check columns and rows | Impaired
df.shape

(3958, 18)

In [72]:
# Preview the first few rows to verify column names and data types
df.head(1)

Unnamed: 0,AssessID,SegID,cycle,name,adbCode,type,size,status,use,support,impCode,impairment,listingRationale,dataSource,tmdlPriority,legacyAdbCode,cycleListed,impairmentStatus
0,2553,1,2016,Shrickers Slough,01-MAQ-1,Wetland,140.0,Final,BWW1,PS,5a,Algal Growth: Chlorophyll a,Adverse impacts on plant/animal communities,Ambient monitoring: Long-Term Resource Monito...,Tier IV,IA 01-MAQ-0005-L_0,2004,Continuing


In [83]:
# Inspect data types
df.dtypes

Unnamed: 0,0
assessid,int64
segid,int64
cycle,int64
name,object
adbcode,object
type,object
size,int64
status,object
use,object
support,object


In [76]:
# Convert to lowercase, remove leading/trailing spaces
new_columns = df.columns.str.strip().str.lower()
df.columns = new_columns
# df.head(1)

In [77]:
# Display missing values
df.isnull().sum()

Unnamed: 0,0
assessid,0
segid,0
cycle,0
name,0
adbcode,0
type,0
size,0
status,0
use,0
support,0


In [80]:
# Fill missing values in 'tmdlpriority', 'datasource', and 'legacyadbcode' with 'Unknown'
df['tmdlpriority'] = df['tmdlpriority'].fillna('Unknown')
df['datasource'] = df['datasource'].fillna('Unknown')
df['legacyadbcode'] = df['legacyadbcode'].fillna('Unknown')
# df.isnull().sum()

In [81]:
# Convert 'cycle', 'segid', and 'size' to integer
df['cycle'] = df['cycle'].astype(int)
df['segid'] = df['segid'].astype(int)
df['size'] = df['size'].astype(int)
df.dtypes

Unnamed: 0,0
assessid,int64
segid,int64
cycle,int64
name,object
adbcode,object
type,object
size,int64
status,object
use,object
support,object


In [176]:
# Add columns that provide descriptions and classifications for the "use" column.
# Source: https://programs.iowadnr.gov/adbnet/Docs/Codex/Designated%20Uses
designated_uses_map = {
'A1': ('Recreational Uses', 'Swimming and water skiing'),
'A2': ('Recreational Uses', 'Fishing and shoreline activities'),
'A3': ('Recreational Uses', 'Wading or playing in the water'),
'BWW1': ('Aquatic Life Uses', 'Sport fish habitats'),
'BWW2': ('Aquatic Life Uses', 'Small streams with non-game fish'),
'BWW3': ('Aquatic Life Uses', 'Intermittent pools with non-game fish'),
'BLW': ('Aquatic Life Uses', 'artificial and natural impoundment'),
'BCW1': ('Cold Water Habitat', 'Diverse species, including trout'),
'BCW2': ('Cold Water Habitat', 'Small streams without trout'),
'C': ('Drinking Water Uses', 'Potable water sources'),
'HH': ('Human Health', 'Fish harvested for consumption'),
'OIW': ('Outstanding Waters', 'Exceptional state resource waters'),
'GenUse': ('General Use', 'Broad, unspecified uses')
}

df['designateduses'] = df['use'].map(lambda x: designated_uses_map.get(x, ('Unknown', 'Unknown'))[0])
df['usedescription'] = df['use'].map(lambda x: designated_uses_map.get(x, ('Unknown', 'Unknown'))[1])

In [86]:
# Add a column that provides a description for each "tmdlpriority" tier.
# Source: https://programs.iowadnr.gov/adbnet/Assessments/Summary/2024/Impaired
tmdlpriority_map = {
    'N/A': 'N/A',
    'Tier I': 'High impact, low complexity/cost',
    'Tier II': 'High impact, high complexity/cost',
    'Tier III': 'Low impact, low complexity/cost',
    'Tier IV': 'Low impact, high complexity/cost'
}

df['tmdlprioritydescription'] = df['tmdlpriority'].map(lambda x: tmdlpriority_map.get(x, 'Unknown'))
# df.head(1)

In [88]:
# Create mapping dictionary for impairment classification
impairment_map = {
    'Bacteria: Indicator Bacteria- E. coli': 'E. coli',
    'Fish Consumption Advisory: Mercury': 'Other',
    'Biological: low aquatic macroinvertebrate IBI': 'Other',
    'Biological: low fish & invert IBIs- cause unknown': 'Other',
    'Algal Growth: Chlorophyll a': 'Fertilizer Tie',
    'pH': 'Other',
    'Biological: low fish IBI': 'Other',
    'Organic Enrichment: Low Dissolved Oxygen': 'Fertilizer Tie',
    'Fish Kill: Caused By Animal Waste': 'CAFO Connection',
    'Fish Kill: Due To Unknown Toxicity': 'Other',
    'Turbidity': 'Other',
    'Biological: loss of native mussel species': 'Other',
    'Turbidity: Secchi Disk Transparency': 'Link with agriculture',
    'Turbidity: Suspended Solids': 'Link with agriculture',
    'Fish Kill: Caused By Fertilizer Spill': 'Fertilizer Tie',
    'Temperature: Thermal Modifications': 'Other',
    'Fish Kill: Caused By Pesticides': 'Pesticide Tie',
    'Metals: Aluminum': 'Other',
    'Wastewater': 'Other',
    'Metals: Selenium': 'Other',
    'Toxic Organics: Priority Organics': 'Other',
    'Toxic Inorganics: Ammonia': 'Fertilizer Tie',
    'Biological: low Biological Integrity': 'Other',
    'Bacteria: Indicator Bacteria- fecal coliform': 'Other',
    'pH- High': 'Other',
    'Fish Consumption Advisory: PCBs': 'Other',
    'Fish Kill: Caused By Fuel Spill': 'Other',
    'Fish Kill: Caused By Chlorine': 'Other',
    'Metals: Chromium': 'Other',
    'Turbidity: Siltation/Turbidity': 'Other',
    'Fish Kill: Caused By Silage Runoff': 'Other',
    'Toxic Organics: Coal Tar': 'Other',
    'Temperature: Water': 'Other',
    'Fish Kill: Caused By Other': 'Other',
    'Fish Kill: Caused By Organic Enrichment/Low Dissolved Oxygen': 'Fertilizer Tie',
    'Dissolved Solids: Chloride': 'Other',
    'Fish Kill: Due To Natural Causes': 'Other',
    'Fish Kill: Caused By Wastewater': 'Other',
    'Fish Kill: Caused By Spill': 'Other',
    'Algal Growth: Cyanobacteria': 'Fertilizer Tie',
    'Toxic Organics: PCBs': 'Other',
    'Sedimentation/Siltation': 'Other',
    'Metals: Mercury': 'Other',
    'Fish Kill: Caused By Ammonia': 'Fertilizer Tie',
    'Fish Kill: Cause Unknown': 'Other',
    'Pesticides': 'Pesticide Tie',
    'Metals: Copper': 'Other',
    'Fish Kill: Caused By Petroleum Spill': 'Other',
    'Aesthetics: Aesthetically Objectionable Conditions': 'Other'
}

# Mapping impairment descriptions to classification
df['impairment_classification'] = df['impairment'].map(lambda x: impairment_map.get(x, 'Unknown'))
# df.head(1)

In [89]:
# Create a new column to determine how many years a segment has been impaired since its 'cyclelisted' year
df['years_impaired'] = 2024 - df['cyclelisted']
df[['cyclelisted', 'years_impaired']].head()


Unnamed: 0,cyclelisted,years_impaired
0,2004,20
1,2004,20
2,2014,10
3,2012,12
4,2004,20
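
`years_impaired` is computed per row, so a segment with several uses or impairments can carry several different values. One possible way to get a single figure per segment is to take its earliest `cyclelisted` year. The sketch below uses toy data; `toy`, `REFERENCE_YEAR`, and the 24-year threshold are illustrative assumptions, and the result does not prove that the impairment was continuous over the whole period:

```python
import pandas as pd

# Hypothetical rows: segment 1 has two listings with different start years.
toy = pd.DataFrame({
    "segid": [1, 1, 2, 3],
    "cyclelisted": [2000, 2014, 2004, 1998],
})
REFERENCE_YEAR = 2024  # same reference year as the cell above

# Earliest listing year per segment, then years elapsed since that listing.
earliest = toy.groupby("segid")["cyclelisted"].min()
years_since_first_listing = REFERENCE_YEAR - earliest

# Segments whose first listing is at least 24 years old.
long_term = years_since_first_listing[years_since_first_listing >= 24]
print(long_term)
```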


In [90]:
# Display columns name
list(df.columns)

['assessid',
 'segid',
 'cycle',
 'name',
 'adbcode',
 'type',
 'size',
 'status',
 'use',
 'support',
 'impcode',
 'impairment',
 'listingrationale',
 'datasource',
 'tmdlpriority',
 'legacyadbcode',
 'cyclelisted',
 'impairmentstatus',
 'designateduses',
 'usedescription',
 'tmdlprioritydescription',
 'impairment_classification',
 'years_impaired']

In [91]:
# New shape
df.shape

(3958, 23)

In [92]:
# unique segments all type
unique_segid_count = df['segid'].nunique()
unique_segid_count

694

#**Analysis**

####**Number of unique water segments per year**


In [97]:
segments_per_type_year = df.groupby(['cycle', 'type'])['segid'].nunique().reset_index()
segments_per_type_year = segments_per_type_year.sort_values(by=['type', 'cycle'], ascending=[True, True])
segments_per_type_year


Unnamed: 0,cycle,type,segid
0,2016,Lake,86
4,2018,Lake,92
8,2020,Lake,88
12,2022,Lake,88
16,2024,Lake,89
1,2016,Reservoir,7
5,2018,Reservoir,7
9,2020,Reservoir,3
13,2022,Reservoir,3
17,2024,Reservoir,3


####**Unique segments by type**

In [100]:
# Count segments per type across all years
segments_per_type = df.groupby('type')['segid'].nunique().reset_index()

# Calculate the total number of unique segments
total_segments = segments_per_type['segid'].sum()

# Add a percentage column without decimals
segments_per_type['percentage'] = ((segments_per_type['segid'] / total_segments) * 100).astype(int)

# Sort by segment count in descending order
segments_per_type = segments_per_type.sort_values(by='segid', ascending=False)

# Display the result
segments_per_type


Unnamed: 0,type,segid,percentage
2,River,555,79
0,Lake,127,18
1,Reservoir,7,1
3,Wetland,5,0
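
The `percentage` column above is built with `.astype(int)`, which truncates instead of rounding, so a share just under a whole number loses almost a full point and small categories collapse to 0. As a quick check, this sketch recomputes the shares from the counts in the table; the names `counts`, `truncated`, and `rounded` are illustrative and not part of the notebook:

```python
# Unique segment counts per type, copied from the table above.
counts = {"River": 555, "Lake": 127, "Reservoir": 7, "Wetland": 5}
total = sum(counts.values())  # 694 unique segments

# int() drops the fractional part, mirroring .astype(int) on positive values.
truncated = {k: int(v / total * 100) for k, v in counts.items()}

# round(..., 1) keeps one decimal place, so small shares stay visible.
rounded = {k: round(v / total * 100, 1) for k, v in counts.items()}

for k in counts:
    print(f"{k}: truncated={truncated[k]}%, rounded={rounded[k]}%")
```

Swapping `.astype(int)` for `.round(1)` in the notebook cell would presumably give the rounded figures, at the cost of changing the published percentages.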


In [101]:
# Count the number of unique 'SegID' values in the DataFrame
unique_id_count = df['segid'].nunique()
print(unique_id_count)

694


# **Rivers**

In [201]:
# Create dataset for river segments consistently present in all five cycles
df_rivers = df[df['type'] == 'River']
df_rivers.shape
river_segments_consecutive = df_rivers.groupby('segid')['cycle'].nunique().reset_index()
total_unique_river_segments = df_rivers['segid'].nunique()
consecutive_river_segments = river_segments_consecutive[river_segments_consecutive['cycle'] == 5]
consecutive_river_segments

# ## Download
# filename = 'consecutive_river_segments.csv'
# consecutive_river_segments.to_csv(filename)
# # Download the file to your local machine
# files.download(filename)

In [106]:
# Unique river segments that have been classified as impaired in five consecutive assessment cycles
consecutive_river_segments['segid'].nunique()

443
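
The filter above keeps segments with five distinct `cycle` values, which works here because the dataset holds exactly five cycles. A slightly more explicit variant, sketched on toy data, compares each segment's cycles against the expected set; `toy`, `expected_cycles`, and `persistent_ids` are hypothetical names used only for illustration:

```python
import pandas as pd

# Hypothetical toy data: one row per impairment listing.
toy = pd.DataFrame({
    "segid": [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
    "cycle": [2016, 2018, 2020, 2022, 2024,
              2016, 2018,
              2016, 2016, 2018, 2020, 2022, 2024],
})

expected_cycles = {2016, 2018, 2020, 2022, 2024}

# Collect the set of cycles each segment appears in. Duplicate rows within
# a cycle (segment 3 is listed twice in 2016) do not affect the result.
cycles_by_segment = toy.groupby("segid")["cycle"].agg(set)

# Keep segments whose cycles cover every expected cycle.
covers_all = cycles_by_segment.map(lambda seen: expected_cycles <= seen)
persistent_ids = sorted(cycles_by_segment[covers_all].index)
print(persistent_ids)
```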

In [114]:
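# Percentage of river segments impaired in all five cycles
total_river_segments = df_rivers['segid'].nunique()
unique_segments_count = consecutive_river_segments['segid'].nunique()
representation_percentage = (unique_segments_count / total_river_segments) * 100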
print(f"The percentage of unique river segments impaired in five consecutive cycles in relation to the total number of segments assessed:{int(representation_percentage)}%")

The percentage of unique river segments impaired in five consecutive cycles in relation to the total number of segments assessed:79%


In [142]:
# Most frequent uses among river segments impaired in all five cycles | Source: https://programs.iowadnr.gov/adbnet/Docs/Codex/Designated%20Uses
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
use_counts = filtered_rivers.groupby(['usedescription']).size().reset_index(name='counts')
use_counts['percentages'] = (use_counts['counts']/ use_counts['counts'].sum()*100).round(0)
use_counts.sort_values(by='counts',ascending=False)



Unnamed: 0,usedescription,counts,percentages
6,Swimming and water skiing,1478,51.0
4,Small streams with non-game fish,499,17.0
5,Sport fish habitats,403,14.0
2,Fish harvested for consumption,166,6.0
1,"Diverse species, including trout",149,5.0
3,Fishing and shoreline activities,143,5.0
0,"Broad, unspecified uses",30,1.0
7,Wading or playing in the water,21,1.0
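
The `counts` above come from `.size()`, which counts listing rows, so a segment listed for the same use in several cycles or for several impairments is counted more than once. The lake cells later in the notebook count distinct `segid` values instead. The difference can be sketched on toy data; `toy` and the derived names are hypothetical:

```python
import pandas as pd

# Hypothetical listings: segment 1 appears three times for the same use.
toy = pd.DataFrame({
    "segid": [1, 1, 1, 2, 3, 3],
    "usedescription": [
        "Swimming and water skiing", "Swimming and water skiing",
        "Swimming and water skiing", "Sport fish habitats",
        "Swimming and water skiing", "Sport fish habitats",
    ],
})

# Rows per use: segment 1 contributes three rows to the swimming count.
rows_per_use = toy.groupby("usedescription").size()

# Distinct segments per use: each segment counts once per use.
segments_per_use = toy.groupby("usedescription")["segid"].nunique()

# Share of all distinct segments that carry each use. These shares can sum
# to more than 100% because one segment may have several uses.
share_of_segments = (segments_per_use / toy["segid"].nunique() * 100).round(1)

print(rows_per_use, segments_per_use, share_of_segments, sep="\n\n")
```

Which denominator is appropriate depends on whether the claim is about listings or about segments.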


In [147]:
# Most frequent impairments among river segments classified as impaired in five consecutive assessment cycles
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
use_counts_impairment = filtered_rivers.groupby(['impairment']).size().reset_index(name='counts')
use_counts_impairment['percentages'] = (use_counts_impairment['counts']/ use_counts_impairment['counts'].sum()*100).round(0)
use_counts_impairment.sort_values(by='counts',ascending=False)

Unnamed: 0,impairment,counts,percentages
0,Bacteria: Indicator Bacteria- E. coli,1593,55.0
4,Biological: low aquatic macroinvertebrate IBI,227,8.0
5,Biological: low fish & invert IBIs- cause unknown,218,8.0
8,Fish Consumption Advisory: Mercury,161,6.0
6,Biological: low fish IBI,138,5.0
19,Fish Kill: Due To Unknown Toxicity,71,2.0
11,Fish Kill: Caused By Animal Waste,69,2.0
2,Biological: loss of native mussel species,60,2.0
32,pH,54,2.0
23,Organic Enrichment: Low Dissolved Oxygen,50,2.0


In [149]:
# Most frequent impairments among river segments classified as impaired in five consecutive assessment cycles | impairment_classification = reclassification of the impairment
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
use_counts_impairment_classification = filtered_rivers.groupby(['impairment_classification']).size().reset_index(name='counts')
use_counts_impairment_classification['percentages'] = (use_counts_impairment_classification['counts']/ use_counts_impairment_classification['counts'].sum()*100).round(0)
use_counts_impairment_classification.sort_values(by='counts',ascending=False)

Unnamed: 0,impairment_classification,counts,percentages
1,E. coli,1593,55.0
3,Other,1110,38.0
2,Fertilizer Tie,96,3.0
0,CAFO Connection,69,2.0
4,Pesticide Tie,21,1.0


In [177]:
# Most frequent TMDL priorities among river segments classified as impaired in five consecutive assessment cycles
filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]
use_counts_tmdl = filtered_rivers.groupby(['tmdlprioritydescription']).size().reset_index(name='counts')
use_counts_tmdl['percentages'] = (use_counts_tmdl['counts']/ use_counts_tmdl['counts'].sum()*100).round(0)
use_counts_tmdl.sort_values(by='counts',ascending=False)

Unnamed: 0,tmdlprioritydescription,counts,percentages
2,"Low impact, low complexity/cost",1581,55.0
1,"Low impact, high complexity/cost",1279,44.0
3,Unknown,24,1.0
0,"High impact, high complexity/cost",5,0.0


In [154]:
# test shape
filtered_rivers.shape

(2889, 23)

In [None]:
# # Segments first listed 20 or more years ago (2004 or earlier)
# filtered_rivers = df_rivers[df_rivers['segid'].isin(consecutive_river_segments['segid'])]

# filter = filtered_rivers.groupby('use')['cyclelisted'].nunique().reset_index()
# filter = filter[filter['cyclelisted'] > 5]['use']
# filtered_rivers = filtered_rivers[filtered_rivers['use'].isin(filter)]

# #use_counts = filtered_rivers['use'].value_counts()
# #valid_uses = use_counts[use_counts >= 5].index
# #filtered_rivers = filtered_rivers[filtered_rivers['use'].isin(valid_uses)]
# #filtered_rivers = filtered_rivers[filtered_rivers['cyclelisted'] == 2004]
# print("Most frequent uses in deteriorated river segments:")
# print(filtered_rivers.groupby(['use', 'usedescription','cyclelisted','name'])['segid'].nunique().reset_index())

# # ## Download
# # sixtseg = filtered_rivers.groupby(['use', 'usedescription','cyclelisted','name'])['segid'].nunique().reset_index()
# # filename = 'sixtseg.csv'
# # sixtseg.to_csv(filename)
# # # Download the file to your local machine
# # files.download(filename)

In [200]:
# df_20yrs = filtered_rivers [
#     (filtered_rivers['cyclelisted'] == 2004) &
#     (filtered_rivers['years_impaired'] == 20)]

# # Now df_20yrs contains every (segment, use) pair that:
# # - first became impaired for that use in 2004, and
# # - has remained impaired for 20 years (till 2024).
# print(df_20yrs)


# **Lakes**

In [202]:
# Create dataset for segments (type = 'Lake') consistently present in all five cycles
df_lakes = df[df['type'] == 'Lake']

df_lakes.shape

lake_segments_consecutive = df_lakes.groupby('segid')['cycle'].nunique().reset_index()
total_unique_lake_segments = df_lakes['segid'].nunique()
print("Total unique lake segments:", total_unique_lake_segments)

consecutive_lake_segments = lake_segments_consecutive[lake_segments_consecutive['cycle'] == 5]
# print("Total unique lake consecutive:", consecutive_lake_segments)

# # Download
# filename = 'consecutive_lake_segments.csv'
# consecutive_lake_segments.to_csv(filename)
# files.download(filename)

Total unique lake segments: 127


In [179]:
# Check df head
df_lakes.head(1)

Unnamed: 0,assessid,segid,cycle,name,adbcode,type,size,status,use,support,...,datasource,tmdlpriority,legacyadbcode,cyclelisted,impairmentstatus,designateduses,usedescription,tmdlprioritydescription,impairment_classification,years_impaired
10,90,20,2016,Backbone Lake,01-MAQ-20,Lake,2,Final,A1,PS,...,Beach monitoring: Iowa DNR WQMA,Tier III,IA 01-MAQ-0090-L_0,2004,Continuing,Recreational Uses,Swimming and water skiing,"Low impact, low complexity/cost",E. coli,20


In [180]:
# Count unique lake segments
unique_segments_count_lake = consecutive_lake_segments['segid'].nunique()
print('Unique lake segments that have been classified as impaired in five consecutive assessment cycles', unique_segments_count_lake)

Unique lake segments that have been classified as impaired in five consecutive assessment cycles 54


In [174]:
# As percentage of total lake segments
total_lake_segments = df_lakes['segid'].nunique()
unique_segments_count_lake = consecutive_lake_segments['segid'].nunique()
representation_percentage_lakes = (unique_segments_count_lake / total_lake_segments) * 100
print(f"The percentage of unique lake segments impaired in five consecutive cycles in relation to the total number of segments assessed: {int(representation_percentage_lakes)}%")

The percentage of unique lake segments impaired in five consecutive cycles in relation to the total number of segments assessed: 42%


In [186]:
# Most frequent uses in lake segments consistently present across all five cycles
filtered_lakes = df_lakes[df_lakes['segid'].isin(consecutive_lake_segments['segid'])]
use_counts_lakes = (filtered_lakes.groupby(['use', 'usedescription'])['segid'].nunique().reset_index())
total_use_segments_lakes = use_counts_lakes['segid'].sum()
use_counts_lakes['percentage'] = ((use_counts_lakes['segid'] / total_use_segments_lakes) * 100).astype(int)
use_counts_lakes = use_counts_lakes.sort_values(by='segid', ascending=False)
print("Most frequent uses in lake segments present in all five cycles:")
print(use_counts_lakes)

Most frequent uses in lake segments present in all five cycles:
   use                      usedescription  segid  percentage
0   A1           Swimming and water skiing     35          49
1  BLW  artificial and natural impoundment     21          29
2   HH      Fish harvested for consumption     15          21


In [195]:
# Most frequent impairments among lake segments classified as impaired in five consecutive assessment cycles
filtered_lakes = df_lakes[df_lakes['segid'].isin(consecutive_lake_segments['segid'])]
use_counts_lakes = (filtered_lakes.groupby(['impairment'])['segid'].nunique().reset_index())
total_use_segments_lakes = use_counts_lakes['segid'].sum()
use_counts_lakes['percentage'] = ((use_counts_lakes['segid'] / total_use_segments_lakes) * 100).astype(int)
use_counts_lakes = use_counts_lakes.sort_values(by='segid', ascending=False)
use_counts_lakes


Unnamed: 0,impairment,segid,percentage
0,Algal Growth: Chlorophyll a,29,25
2,Bacteria: Indicator Bacteria- E. coli,19,16
3,Fish Consumption Advisory: Mercury,14,12
8,Turbidity: Secchi Disk Transparency,13,11
10,pH,13,11
9,Turbidity: Suspended Solids,9,8
7,Turbidity,8,7
6,Organic Enrichment: Low Dissolved Oxygen,4,3
1,Algal Growth: Cyanobacteria,1,0
4,Fish Consumption Advisory: PCBs,1,0


In [196]:
# test
filtered_lakes.shape

(433, 23)

In [197]:
# Test
filtered_lakes['segid'].nunique()

54

In [199]:
# Most frequent TMDL priorities among lake segments classified as impaired in five consecutive assessment cycles
filtered_lakes = df_lakes[df_lakes['segid'].isin(consecutive_lake_segments['segid'])]
use_counts_lakes = (filtered_lakes.groupby(['tmdlprioritydescription'])['segid'].nunique().reset_index())
total_use_segments_lakes = use_counts_lakes['segid'].sum()
use_counts_lakes['percentage'] = ((use_counts_lakes['segid'] / total_use_segments_lakes) * 100).astype(int)
use_counts_lakes = use_counts_lakes.sort_values(by='segid', ascending=False)
use_counts_lakes

Unnamed: 0,tmdlprioritydescription,segid,percentage
2,"Low impact, high complexity/cost",31,40
0,"High impact, high complexity/cost",23,29
1,"High impact, low complexity/cost",20,25
4,Unknown,2,2
3,"Low impact, low complexity/cost",1,1
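
The percentages above use the sum of the per-category segment counts as the denominator, not the 54 persistent lake segments, so they describe segment-priority combinations. A small worked calculation with the counts from the table shows both readings; the variable names are illustrative only:

```python
# Unique lake segments per TMDL priority description, from the table above.
segments_by_priority = {
    "Low impact, high complexity/cost": 31,
    "High impact, high complexity/cost": 23,
    "High impact, low complexity/cost": 20,
    "Unknown": 2,
    "Low impact, low complexity/cost": 1,
}
persistent_lake_segments = 54  # lake segments listed in all five cycles

# Denominator used by the cell above: the sum of per-category counts.
pair_total = sum(segments_by_priority.values())

# For each category: (share of combinations, share of the 54 segments).
shares = {
    label: (round(n / pair_total * 100, 1),
            round(n / persistent_lake_segments * 100, 1))
    for label, n in segments_by_priority.items()
}

for label, (of_pairs, of_segments) in shares.items():
    print(f"{label}: {of_pairs}% of combinations, {of_segments}% of segments")
```

The category counts add up to more than 54, which indicates that some segments carry more than one priority tier across their listings.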


In [None]:
# Most common TMDL priority levels
filtered_lake = df_lakes[df_lakes['segid'].isin(consecutive_lake_segments['segid'])]
tmdlpriority_counts = filtered_lake.groupby(['tmdlprioritydescription'])['segid'].nunique().reset_index()
total_tmdlpriority_segments = tmdlpriority_counts['segid'].sum()
tmdlpriority_counts['percentage'] = ((tmdlpriority_counts['segid'] / total_tmdlpriority_segments) * 100).astype(int)
tmdlpriority_counts = tmdlpriority_counts.sort_values(by='segid', ascending=False)
print("Most frequent TMDL priorities in deteriorated lake segments:")
print(tmdlpriority_counts)

Most frequent TMDL priorities in deteriorated lake segments:
             tmdlprioritydescription  segid  percentage
2   Low impact, high complexity/cost     31          40
0  High impact, high complexity/cost     23          29
1   High impact, low complexity/cost     20          25
4                            Unknown      2           2
3    Low impact, low complexity/cost      1           1


In [None]:
# https://programs.iowadnr.gov/adbnet/Docs/Codex/Integrated%20Report%20Categories