We really just need 3 things:

1) Map data and a way to interact with this data
2) A way to determine where <strong> water </strong> is in this map data
3) A way to rank this water and determine which candidates are viable fishing spots

# 1) Map Data

- We only want data around my current location: I am not going to drive more than <strong> 1.5 hours </strong> to reach these spots. HeiGIT caps the free API key at just 1 hour, so that is good enough.
- We want a representation of the data that we can visualize, while still working with the underlying data for our analysis.

In [1]:
home_lat = 39.734838
home_lon = -90.234682

In [2]:
import openrouteservice
from shapely.geometry import shape, Polygon
import geopandas as gpd
import pandas as pd

import os

# Initialize ORS client. The key is read from an environment variable
# (ORS_API_KEY is an assumed name) instead of being hardcoded in the notebook.
client = openrouteservice.Client(key=os.environ["ORS_API_KEY"])

# Get isochrone (3600 seconds = 1 hour)
iso = client.isochrones(
    locations=[[home_lon, home_lat]],
    profile='driving-car',
    range=[3600]
)

iso_polygon = shape(iso['features'][0]['geometry'])
gdf_iso = gpd.GeoDataFrame(geometry=[iso_polygon], crs="EPSG:4326")

gdf_iso.to_file("data/isochrone.geojson", driver="GeoJSON")


In [3]:
import folium

gdf_iso = gpd.read_file("data/isochrone.geojson")

# Project to metric CRS for accurate centroid calculation
gdf_proj = gdf_iso.to_crs(epsg=3857)

centroid_proj = gdf_proj.geometry.centroid.iloc[0]

centroid = gpd.GeoSeries([centroid_proj], crs=3857).to_crs(epsg=4326).geometry.iloc[0]
center = centroid.y, centroid.x

m = folium.Map(location=center, zoom_start=10)
folium.GeoJson(gdf_iso).add_to(m)
m

The GeoJSON file is really just a filter to reduce the search space for bodies of water. It is only a polygon: it contains no information about what may or may not be water. For that we need OpenStreetMap (OSM), a free, crowd-sourced, editable geographic data source built as a collaborative effort to create a detailed map of the world.
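
To make the "filter" idea concrete, here is a minimal pure-Python sketch of the question the polygon answers: is a given point inside it or not? In the notebook itself shapely and osmnx handle this for us. The function name `point_in_polygon` and the rectangle `toy_ring` are made up for illustration and are not the real isochrone.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangle around the home coordinates, standing in for the isochrone
toy_ring = [(-91.0, 39.0), (-89.5, 39.0), (-89.5, 40.5), (-91.0, 40.5)]

print(point_in_polygon(-90.234682, 39.734838, toy_ring))  # home point
print(point_in_polygon(-87.6298, 41.8781, toy_ring))      # a point well outside the rectangle
```

The polygon can only say "inside" or "outside"; whether a point is water has to come from another source.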

# 2) Where is the water?

In [4]:
import osmnx as ox
print(ox.__version__)

1.3.0


In [5]:
gdf_iso = gpd.read_file("data/isochrone.geojson")
iso_polygon = gdf_iso.geometry.iloc[0]

# Query OSM for water features inside the polygon
tags = {"natural": "water", "waterway": True}  # lakes, ponds, rivers, streams
water_bodies = ox.geometries_from_polygon(iso_polygon, tags)

In [6]:
water_bodies.head(5)

element_type,osmid,attribution,source,geometry,waterway,ele,gnis:feature_id,name,ref:US:NID,leisure,fixme,...,fishing,area,golf,swimming,phone,website,src,wheelchair,culvert,operator:wikipedia
node,354172488,,,POINT (-90.34152 39.12725),dam,202.0,1691796.0,West Lake Country Club Lake Dam,IL00672,,,...,,,,,,,,,,
way,83730392,,,"POLYGON ((-90.38002 39.44405, -90.38011 39.443...",,173.0,1691909.0,White Hall Reservoir,,,,...,,,,,,,,,,
way,83730393,,,"POLYGON ((-90.37587 39.44232, -90.37587 39.442...",dam,173.0,1691908.0,White Hall Reservoir Dam,IL00757,,,...,,,,,,,,,,
way,103735121,,Bing,"POLYGON ((-90.36747 39.42697, -90.36684 39.427...",,,,,,,,...,,,,,,,,,,
way,103735122,,Bing,"LINESTRING (-90.36766 39.4266, -90.36638 39.42...",dam,,,Fitzjarrell Lake Dam,IL50409,,,...,,,,,,,,,,


In [7]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# water_bodies.dtypes

In [8]:
print(water_bodies['name'].isna().sum(),"total records without a name")
print(water_bodies['name'].count(), "total records with a name")

2362 total records without a name
334 total records with a name


In [9]:
df_non_na = water_bodies[~water_bodies['name'].isna()]

In [11]:
df_non_na.head()

element_type,osmid,attribution,source,geometry,waterway,ele,gnis:feature_id,name,ref:US:NID,leisure,fixme,note,seamark:type,nodes,boat,deep_draft,name:ar,name:en,name:fa,name:fr,name:he,name:mia,name:oj,ship,source:deep_draft,NHD:ComID,NHD:FCode,NHD:FDate,NHD:FTYPE,NHD:RESOLUTION,natural,man_made,water,order:strahler,wikidata,wikipedia,intermittent,tidal,lock,layer,operator,operator:wikidata,tunnel,ways,type,landuse,source:name,check_date,salt,alt_name,material,NHD:FType,NHD:way_id,heritage,heritage:operator,heritage:website,historic,nrhp:criteria,nrhp:inscription_date,protection_title,ref,ref:nrhp,source_ref,start_date,operator:short,operator:type,NHD:ReachCode,created_by,NHD:Elevation,wp:reviewed,fishing,area,golf,swimming,phone,website,src,wheelchair,culvert,operator:wikipedia
node,354172488,,,POINT (-90.34152 39.12725),dam,202.0,1691796.0,West Lake Country Club Lake Dam,IL00672,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
way,83730392,,,"POLYGON ((-90.38002 39.44405, -90.38011 39.443...",,173.0,1691909.0,White Hall Reservoir,,,,,,"[975232779, 975232791, 975232795, 975232798, 9...",,,,,,,,,,,,,,,,,water,reservoir_covered,reservoir,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
way,83730393,,,"POLYGON ((-90.37587 39.44232, -90.37587 39.442...",dam,173.0,1691908.0,White Hall Reservoir Dam,IL00757,,,,,"[975233405, 2012791637, 2012791639, 2012791436...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
way,103735122,,Bing,"LINESTRING (-90.36766 39.4266, -90.36638 39.42...",dam,,,Fitzjarrell Lake Dam,IL50409,,,,,"[1197597702, 1197597724]",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
way,134212726,,Bing,"LINESTRING (-89.96112 39.199, -89.96156 39.199...",river,,,Macoupin Creek,,,,,,"[12407842309, 12407842310, 12407842311, 124078...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [10]:
# We have lots of attributes. Which ones are mostly empty, and which are mostly full?
# Keep only the columns that are missing in fewer than 90% of rows.

missing = water_bodies.isna().mean() * 100
df_cleaned = water_bodies.loc[:, missing < 90]

In [11]:
import folium
from shapely.geometry import mapping

home_lat, home_lon = 39.734838, -90.234682
m = folium.Map(location=[home_lat, home_lon], zoom_start=11)

# Add home marker
folium.Marker(
    [home_lat, home_lon],
    popup="Home (Jacksonville, IL)",
    icon=folium.Icon(color='red', icon='home')
).add_to(m)

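# Add water bodies
for _, row in water_bodies.iterrows():
    geom = row['geometry']
    # Most features have no name; row.get returns NaN (not the default) for
    # those, so fall back to a placeholder explicitly
    name = row['name'] if pd.notna(row.get('name')) else 'Unnamed'
    # Multi-part geometries are drawn too instead of being silently skipped
    if geom.geom_type in ("Polygon", "MultiPolygon"):
        folium.GeoJson(mapping(geom),
                       tooltip=name,
                       style_function=lambda x: {"color": "blue", "fillOpacity": 0.4}).add_to(m)
    elif geom.geom_type in ("LineString", "MultiLineString"):
        folium.GeoJson(mapping(geom),
                       tooltip=name,
                       style_function=lambda x: {"color": "green"}).add_to(m)
    elif geom.geom_type == "Point":
        folium.Marker([geom.y, geom.x], tooltip=name).add_to(m)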

m

# 3) Which waters might be good for fishing?

The map above is nice to play around with. It even picks up bodies of water that have no name on the map and may not be public. What is the next step? <br>

I am thinking we scrape the iFishIllinois site, extracting names of lakes and fish present. <br>

Once we have these two data sources, we clean up the data and look for either <br>

a) a matching lake name within the 1-hour drive isochrone (simple filtering)
b) bodies of water without a name that meet a certain size requirement (using some math and extraction of polygon information)
c) streams connecting to the Illinois River (somehow tell using proximity and path finding?)
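
For option b), the "math" could be as simple as estimating each polygon's area and comparing it to a cutoff. Below is a rough sketch, assuming a small polygon given as (lon, lat) pairs: it projects the ring to local meters with an equirectangular approximation and applies the shoelace formula. The names `ring_area_acres` and `is_big_enough`, the `MIN_ACRES` cutoff, and the `toy_pond` coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius in meters
SQ_M_PER_ACRE = 4046.8564224
MIN_ACRES = 5.0                # hypothetical size cutoff, not from the notebook

def ring_area_acres(ring):
    """Approximate area in acres of a small ring of (lon, lat) vertices."""
    lat0 = math.radians(sum(lat for _, lat in ring) / len(ring))
    # Equirectangular projection to local meters around the ring's mean latitude
    pts = [(math.radians(lon) * math.cos(lat0) * EARTH_RADIUS_M,
            math.radians(lat) * EARTH_RADIUS_M) for lon, lat in ring]
    # Shoelace formula over consecutive vertex pairs (ring is closed implicitly)
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2 / SQ_M_PER_ACRE

def is_big_enough(ring, min_acres=MIN_ACRES):
    """True if the ring's approximate area meets the cutoff."""
    return ring_area_acres(ring) >= min_acres

# Made-up 0.01 x 0.01 degree square, for illustration only
toy_pond = [(-90.38, 39.44), (-90.37, 39.44), (-90.37, 39.45), (-90.38, 39.45)]
print(round(ring_area_acres(toy_pond), 1), is_big_enough(toy_pond))
```

With geopandas the same idea would likely be a `to_crs` into a suitable projected CRS followed by `.area`. Note that areas measured in EPSG:3857 are inflated away from the equator, so a local UTM or equal-area CRS would be a safer choice for that step.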

## a) Scraping iFishIllinois, and matching on what we have in water_bodies

The thought process here is easy to follow: iFishIllinois, the DNR website for Illinois, contains information about bodies of water. What I want to do is programmatically scrape this site, pulling out all the information present about each body of water. <br>

A simple first check is to see whether this information already exists in CSV format on their site. <strong>It does not.</strong>

For each lake or pond that appears on https://ifishillinois.org/profiles/select_lake.php, we basically want to do the following:

1) Fetch its contents with curl, or read the text on the page some other way
2) Search these contents for specific fish names. Each becomes a binary column, 1 if present, 0 if not:
   - bullhead
   - bluegill
   - catfish
   - crappie
   - largemouth bass
   - redear sunfish
   - muskellunge or muskie or musky
   - trout
   - saugeye
   - pike
   - wiper
   - smallmouth bass
   - walleye
   - carp or buffalo
   - gar
   - yellow bass
   - white bass
   - drum
   - striped bass
   - burbot
   - perch
   - sauger
   - bowfin
3) Grab all the information present in each lake's lake information table: county, acreage, average depth, etc. <br>
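
One thing to watch for in step 2: short names such as gar or pike also occur inside unrelated words ("regarding", "turnpike"), so a plain substring search can over-count. A minimal sketch of a whole-word check follows; the helper name `has_species` and the sample sentences are made up for illustration.

```python
import re

def has_species(text, names):
    """True if any of the given names appears as a whole word or phrase in text."""
    pattern = r'\b(?:' + '|'.join(re.escape(n) for n in names) + r')\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(has_species("Regarding parking at the turnpike", ["gar"]))       # substring only
print(has_species("Longnose gar are present", ["gar"]))                # whole word
print(has_species("Muskie were stocked", ["muskellunge", "muskie", "musky"]))
```

Whole-word matching is stricter than it may look: a plural such as "gars" would not match unless the pattern allows an optional trailing "s".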

We will handle moving bodies of water a different way.

### 1) Get lake urls

In [56]:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import re
import time

def get_lake_urls(base_url):
    """Extract all lake/pond profile URLs from the main page"""
    response = requests.get(base_url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.content, 'html.parser')
    
    lake_links = []
    
    # Find all links that lead to waterbody profiles
    for link in soup.find_all('a', href=True):
        href = link['href'].strip()  # strip any whitespace
        if '/profiles/waterbody.php' in href:
            # urljoin resolves root-relative, relative, and absolute hrefs alike
            lake_links.append(urljoin(base_url, href))
    
    return lake_links


In [57]:
# lake_links = get_lake_urls("https://ifishillinois.org/profiles/select_lake.php")

In [58]:
# for i in range(0,5):
    # print(lake_links[i])

Working..

### 2) Detect fish species

In [59]:
def detect_fish_species(page_content):
    """Search for fish species in page content (case-insensitive)"""
    content_lower = page_content.lower()
    
    fish_species = {
        'bullhead': 0,
        'bluegill': 0,
        'catfish': 0,
        'crappie': 0,
        'largemouth_bass': 0,
        'redear_sunfish': 0,
        'muskellunge': 0, 
        'trout': 0,
        'saugeye': 0,
        'pike': 0,
        'wiper': 0,
        'smallmouth_bass': 0,
        'walleye': 0,
        'carp_buffalo': 0,
        'gar': 0,
        'yellow_bass': 0,
        'white_bass': 0,
        'drum': 0,
        'striped_bass': 0,
        'burbot': 0,
        'perch': 0,
        'sauger': 0,
        'bowfin': 0
    }
    
    patterns = {
        'bullhead': r'bullhead',
        'bluegill': r'bluegill',
        'catfish': r'catfish',
        'crappie': r'crappie',
        'largemouth_bass': r'largemouth bass',
        'redear_sunfish': r'redear sunfish',
        'muskellunge': r'(?:muskellunge|muskie|musky)(?!\s+creel\s+survey)',
        'trout': r'(?<!catchable\s)trout(?!\s+(?:fishing|guide|tips|program))',
        'saugeye': r'saugeye',
        'pike': r'pike',
        'wiper': r'wiper',
        'smallmouth_bass': r'smallmouth bass',
        'walleye': r'walleye',
        'carp_buffalo': r'carp|buffalo',
        'gar': r'\bgar\b',  # word boundaries so words like "regarding" do not match
        'yellow_bass': r'yellow bass',
        'white_bass': r'white bass',
        'drum': r'drum',
        'striped_bass': r'striped bass',
        'burbot': r'burbot',
        'perch': r'perch',
        'sauger': r'sauger',
        'bowfin': r'bowfin'
    }
    
    for species, pattern in patterns.items():
        if re.search(pattern, content_lower):
            fish_species[species] = 1
    
    return fish_species

### 3) Extract lake information

In [60]:
def extract_lake_info(soup):
    """Extract lake details from the information table"""
    lake_info = {}
    
    # Look for the lake information table
    tables = soup.find_all('table')
    
    for table in tables:
        rows = table.find_all('tr')
        for row in rows:
            cells = row.find_all(['td', 'th'])
            if len(cells) >= 2:
                key = cells[0].get_text(strip=True).lower()
                value = cells[1].get_text(strip=True)
                
                if 'county' in key:
                    lake_info['county'] = value
                elif 'acreage' in key or 'acres' in key:
                    lake_info['acreage'] = value
                elif 'depth' in key:
                    lake_info['average_depth'] = value
                elif 'swimming' in key:
                    lake_info['swimming'] = value
    
    return lake_info

### 4) Main scraping function (bring it all together)

In [61]:
def scrape_lake_data(lake_url):
    """Scrape individual lake page for fish species and lake info"""
    try:
        response = requests.get(lake_url, timeout=30)  # avoid hanging on a stalled request
        response.raise_for_status()
        
        soup = BeautifulSoup(response.content, 'html.parser')
        page_content = soup.get_text()
        
        # Get lake name from title or header
        lake_name = soup.find('title').get_text(strip=True) if soup.find('title') else 'Unknown'
        
        # Extract fish species data
        fish_data = detect_fish_species(page_content)
        
        # Extract lake information
        lake_info = extract_lake_info(soup)
        
        # Combine all data
        result = {
            'lake_name': lake_name,
            'url': lake_url,
            **fish_data,
            **lake_info
        }
        
        return result
        
    except Exception as e:
        print(f"Error scraping {lake_url}: {e}")
        return None

In [62]:
def main():
    base_url = "https://ifishillinois.org/profiles/select_lake.php"
    
    # Get all lake URLs
    print("Getting lake URLs...")
    lake_urls = get_lake_urls(base_url)
    print(f"Found {len(lake_urls)} lakes/ponds")
    
    # Scrape each lake
    all_data = []
    for i, url in enumerate(lake_urls):
        if i % 20 == 0 or i == len(lake_urls) - 1:
            print(f"Scraping lake {i+1}/{len(lake_urls)}: {url}")
        
        data = scrape_lake_data(url)
        if data:
            all_data.append(data)
        
        time.sleep(1)
    
    df = pd.DataFrame(all_data)
    
    # Save to CSV
    df.to_csv('illinois_lakes_fish_data.csv', index=False)
    print(f"Saved data for {len(df)} lakes to CSV")
    
    return df

if __name__ == "__main__":
    df = main()

Getting lake URLs...
Found 777 lakes/ponds
Scraping lake 1/777: https://ifishillinois.org/profiles/waterbody.php?waternum=01000
Scraping lake 21/777: https://ifishillinois.org/profiles/waterbody.php?waternum=00249
Scraping lake 41/777: https://ifishillinois.org/profiles/waterbody.php?waternum=04084
Scraping lake 61/777: https://ifishillinois.org/profiles/waterbody.php?waternum=00276
Scraping lake 81/777: https://ifishillinois.org/profiles/waterbody.php?waternum=00519
Scraping lake 101/777: https://ifishillinois.org/profiles/waterbody.php?waternum=15099
Scraping lake 121/777: https://ifishillinois.org/profiles/waterbody.php?waternum=53018
Scraping lake 141/777: https://ifishillinois.org/profiles/waterbody.php?waternum=04706
Scraping lake 161/777: https://ifishillinois.org/profiles/waterbody.php?waternum=00049
Scraping lake 181/777: https://ifishillinois.org/profiles/waterbody.php?waternum=00255
Scraping lake 201/777: https://ifishillinois.org/profiles/waterbody.php?waternum=02076
Scrapi

In [63]:
ifish = pd.read_csv('illinois_lakes_fish_data.csv')
ifish.head()

Unnamed: 0,lake_name,url,bullhead,bluegill,catfish,crappie,largemouth_bass,redear_sunfish,muskellunge,trout,saugeye,pike,wiper,smallmouth_bass,walleye,carp_buffalo,gar,yellow_bass,white_bass,drum,striped_bass,burbot,perch,sauger,bowfin
0,Lake Profile -- ANDERSON LAKE,https://ifishillinois.org/profiles/waterbody.p...,1,1,1,1,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0
1,Lake Profile -- ANNA CITY LAKE,https://ifishillinois.org/profiles/waterbody.p...,0,1,1,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Lake Profile -- ARGYLE LAKE,https://ifishillinois.org/profiles/waterbody.p...,0,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,0,0,0,0,0,1,0
3,Lake Profile -- ARROWHEAD LAKE,https://ifishillinois.org/profiles/waterbody.p...,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Lake Profile -- ASHLAND NEW RESERVOIR,https://ifishillinois.org/profiles/waterbody.p...,0,1,1,1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
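
The `lake_name` values carry a "Lake Profile -- " prefix and are upper-case, while the OSM `name` values seen earlier (for example "White Hall Reservoir") are mixed case. Matching the two sources by name would therefore need some normalization first. A minimal sketch, assuming the prefix always has this form; `normalize_lake_name` is a hypothetical helper, not part of the notebook:

```python
import re

def normalize_lake_name(raw):
    """Drop the 'Lake Profile -- ' prefix, lower-case, and collapse punctuation/whitespace."""
    name = re.sub(r'^\s*lake profile\s*--\s*', '', raw, flags=re.IGNORECASE)
    name = re.sub(r'[^a-z0-9 ]+', ' ', name.lower())
    return re.sub(r'\s+', ' ', name).strip()

print(normalize_lake_name("Lake Profile -- ANDERSON LAKE"))
print(normalize_lake_name("White Hall Reservoir"))
```

Applying the same function to both name columns gives keys that can be compared directly. Exact matches will still miss naming variants, so this is a starting point rather than a complete matcher.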


In [65]:
ifish.value_counts('muskellunge')

muskellunge
1    777
Name: count, dtype: int64
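
Every one of the 777 pages is flagged for muskellunge, which suggests the pattern is matching text shared by all pages rather than each lake's own species list (an assumption, not verified against the site). A quick way to spot columns like this is to compute the share of rows flagged per species. The sketch below uses the first five rows shown in the `ifish.head()` output; the helper names and the 0.95 threshold are hypothetical.

```python
def flag_rates(rows, species):
    """Fraction of rows in which each species flag equals 1."""
    total = len(rows)
    return {s: sum(int(r[s]) for r in rows) / total for s in species}

def suspicious_columns(rows, species, threshold=0.95):
    """Species flagged in at least `threshold` of all rows."""
    rates = flag_rates(rows, species)
    return sorted(s for s, rate in rates.items() if rate >= threshold)

# Values copied from the five rows of the ifish.head() output
sample_rows = [
    {'bullhead': 1, 'muskellunge': 1, 'trout': 0},
    {'bullhead': 0, 'muskellunge': 1, 'trout': 0},
    {'bullhead': 0, 'muskellunge': 1, 'trout': 1},
    {'bullhead': 0, 'muskellunge': 1, 'trout': 0},
    {'bullhead': 0, 'muskellunge': 1, 'trout': 0},
]
species = ['bullhead', 'muskellunge', 'trout']
print(flag_rates(sample_rows, species))
print(suspicious_columns(sample_rows, species))
```

On the full DataFrame the same check would be roughly `ifish[species_columns].mean()`, where `species_columns` is an assumed list of the fish column names.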