# Scraping a fixed URL site

We'll often encounter websites where the URL never changes. Here are a few examples:


- <a href="https://www.seethroughny.net/">See Through NY</a> 
- <a href="https://restructuring.ra.kroll.com/pge/Home-ClaimInfo">PG&E fire victim creditors</a>


Winter's coming, and we want to track critical notices from an energy company to see whether prices spike because of its maintenance issues.

From <a href="https://infopost.enbridge.com/InfoPost/">this homepage</a>, we want to scrape all the critical notices for all of its business units.

<img src="https://sandeepmj.github.io/image-host/energy-scrape.png">

Let's explore the site to come up with our scrape strategy.

A good approach is always to scrape a single page first to confirm that we can, and then go after all the pages.

## Single Page Scrape

Determine how to scrape a single page.


In [3]:
%pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [4]:
## import libraries
import lxml ## HTML parser that pd.read_html uses under the hood
import requests ## request content from websites
import pandas as pd ## organize scraped data
from bs4 import BeautifulSoup ## parse content from websites as html
from random import uniform, sample ## uniform for random floats, sample for random samples
import time ## to slow down our scrapes


In [5]:
## target url
url = "https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=AG&type=CRI"

In [6]:
## scrape table with pandas
all_data = pd.read_html(url)
type(all_data)
type(all_data[0])

pandas.core.frame.DataFrame

In [7]:
## grab the first table from the list
df = all_data[0]
df

Unnamed: 0,Notice Type,Posted Date/Time,Notice Effective Date/Time,Notice End Date/Time,Notice Identifier,Subject,Response Date/Time
0,Capacity Constraint,10/20/2025 03:01:17 PM,10/21/2025 09:00:00 AM,10/22/2025 09:00:00 AM,168903,AGT Pipeline Conditions for 10/21/2025,
1,Capacity Constraint,10/19/2025 03:09:24 PM,10/20/2025 09:00:00 AM,10/21/2025 09:00:00 AM,168875,AGT Pipeline Conditions for 10/20/2025,
2,Capacity Constraint,10/18/2025 03:02:46 PM,10/19/2025 09:00:00 AM,10/20/2025 09:00:00 AM,168828,AGT Pipeline Conditions for 10/19/2025,
3,Capacity Constraint,10/17/2025 03:13:53 PM,10/18/2025 09:00:00 AM,10/19/2025 09:00:00 AM,168804,AGT Pipeline Conditions for 10/18/2025,
4,Capacity Constraint,10/16/2025 03:01:34 PM,10/17/2025 09:00:00 AM,10/18/2025 09:00:00 AM,168740,AGT Pipeline Conditions for 10/17/2025,
...,...,...,...,...,...,...,...
124,Capacity Constraint,07/25/2025 08:16:35 PM,07/25/2025 08:16:35 PM,07/26/2025 09:00:00 AM,165530,AGT Pipeline Conditions for 7/25/2025 -- INTRADAY,
125,Capacity Constraint,07/25/2025 03:04:42 PM,07/26/2025 09:00:00 AM,07/27/2025 09:00:00 AM,165513,AGT Pipeline Conditions for 7/26/2025,
126,Capacity Constraint,07/24/2025 03:30:44 PM,07/25/2025 09:00:00 AM,07/26/2025 09:00:00 AM,165485,AGT Pipeline Conditions for 7/25/2025,
127,Operational Flow Order,07/24/2025 03:00:00 PM,07/24/2025 03:00:00 PM,10/22/2025 08:37:10 AM,165453,AGT Operational Flow Order -- EFF 7/28,


In [8]:
## GET ALL URLS

## get notice id numbers

notice_ids = df["Notice Identifier"].to_list()
notice_ids[:10]




[168903,
 168875,
 168828,
 168804,
 168740,
 168712,
 168691,
 168642,
 168607,
 168589]

In [9]:
## pieces of the detail-page url that go around each notice id

start_url = "https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1="
end_url = "&type=CRI&Embed=2&pipe=AG"

In [10]:
## build links with a for loop
links_fl = []
for notice_id in notice_ids:
    links_fl.append(f"{start_url}{notice_id}{end_url}")
links_fl

['https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168903&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168875&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168828&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168804&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168740&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168712&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168691&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168642&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168607&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?st

In [11]:
## create list of links with a list comprehension

links_lc = [f"{start_url}{notice_id}{end_url}" for notice_id in notice_ids]
links_lc

['https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168903&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168875&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168828&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168804&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168740&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168712&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168691&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168642&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168607&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?st
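If you'd rather not glue query strings together by hand, the standard library's `urllib.parse.urlencode` can build the query part from a dict and escape values as needed. A minimal sketch, assuming the detail-page parameters stay exactly as they appear in the links above (`strKey1`, `type`, `Embed`, `pipe`); `build_detail_link` is a hypothetical helper name, and the two ids are copied from the output above:

```python
from urllib.parse import urlencode

# Base of the notice-detail page, taken from the links above
DETAIL_BASE = "https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp"

def build_detail_link(notice_id, pipe="AG"):
    """Hypothetical helper: build one notice-detail URL from its parts."""
    # dicts keep insertion order, so the parameters come out in this order
    params = {"strKey1": notice_id, "type": "CRI", "Embed": 2, "pipe": pipe}
    return f"{DETAIL_BASE}?{urlencode(params)}"

links_qs = [build_detail_link(notice_id) for notice_id in [168903, 168875]]
print(links_qs[0])
```

The result is the same URL the f-string builds; the gain is that unusual characters in a value would be escaped for us.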

## Next step: scrape ALL the gas lines.

What is our approach?

### Using ```Headers``` When Web Scraping

**Headers make your scraper look more like a real browser and less like a bot.**

Many websites detect and block requests that lack typical browser information, returning 403 errors or empty content.

**The key header is `User-Agent`:**
```python
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}

```
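To avoid passing `headers=` on every call, a `requests.Session` can carry default headers for all of its requests. A minimal sketch using the shortened `User-Agent` string above and the notice-list URL from this notebook. Preparing the request (without sending it) lets us confirm the header is attached:

```python
import requests

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}

# A session reuses connections and sends its default headers with every request
session = requests.Session()
session.headers.update(HEADERS)

# Build the request without sending it, so we can inspect what would go out
notices_url = "https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=AG&type=CRI"
prepared = session.prepare_request(requests.Request("GET", notices_url))
print(prepared.headers["User-Agent"])

# To actually fetch the page you would call, for example:
# response = session.get(notices_url, timeout=30)
# response.raise_for_status()  # raises an error on 4xx/5xx, e.g. a 403 block
```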

### `pd.read_html()` usually works without headers

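**`pd.read_html()` is designed specifically for parsing HTML tables, not general web scraping.**

It extracts `<table>` elements from a page's HTML. When given a URL, it downloads the page itself with Python's built-in `urllib`, without browser-style headers, and many websites serve that request without complaint.

Keep in mind that the server sees the same kind of request whether we plan to keep one table or the whole page. So when `pd.read_html()` succeeds, it simply means the site isn't blocking that default client; if a site returns 403 errors, fetch the page with `requests` and headers first.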
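When a site does refuse the default Python client, one workaround is to fetch the page with `requests` (headers included) and hand the HTML text to `pd.read_html()`. Recent pandas versions warn if literal HTML isn't wrapped in `io.StringIO`, and the `match=` argument keeps only tables whose text contains the given string. A sketch, assuming the notices table keeps its `Notice Identifier` column; `tables_from_html` and `fetch_tables` are hypothetical helper names:

```python
from io import StringIO

import pandas as pd
import requests

def tables_from_html(html, match="Notice Identifier"):
    """Parse every <table> whose text contains `match` into a DataFrame."""
    # Wrapping the string in StringIO tells pandas it's HTML, not a path or URL
    return pd.read_html(StringIO(html), match=match)

def fetch_tables(url, headers, timeout=30):
    """Hypothetical helper: download with browser-like headers, then parse."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # stop early on a 403 or other HTTP error
    return tables_from_html(response.text)

# Offline check with a tiny hand-written table (no network needed)
sample_html = """
<table>
  <tr><th>Notice Type</th><th>Notice Identifier</th></tr>
  <tr><td>Capacity Constraint</td><td>168903</td></tr>
</table>
"""
tables = tables_from_html(sample_html)
print(tables[0])
```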

In [14]:
## create headers

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}

In [15]:
## request the homepage; the pipeline menu lives in <ul id="dropdown">

homepage = "https://infopost.enbridge.com/InfoPost/"
response = requests.get(homepage, headers=headers)

In [16]:
type(response)

requests.models.Response

In [17]:
response.status_code

200

In [18]:
response.text

'<!DOCTYPE html>\r\n<!-- template.asp -->\r\n\r\n\r\n<HTML lang="en">\r\n<HEAD>\r\n<meta charset="utf-8"/>\r\n\r\n <meta http-equiv="X-UA-Compatible" content="IE=edge">\r\n<meta name="viewport" content="width=device-width, initial-scale=1.0">\r\n\r\n<link rel="shortcut icon" href="favicon.ico" />\r\n<link href="css/jquery-ui.min.css" rel="stylesheet" type="text/css" />\r\n<link href="css/bootstrap.min.css" rel="stylesheet">\r\n<link href="css/font-awesome-ie7.min.css" rel="stylesheet"/>\r\n<link href="css/font-awesome.css" rel="stylesheet">\r\n\r\n<link href="css/link.css" rel="stylesheet" />\r\n<link href="css/print.css" rel="stylesheet" media="print" />\r\n<link href="css/infopost-custom.css" rel="stylesheet" media="screen" />\r\n<link href="css/environment.css" rel="stylesheet" media="screen" />\r\n<!-- HTML5 shim, for IE6-9 support of HTML5 elements -->\r\n<!--[if lt IE 9]>\r\n\t<link href="css/link-ie.css" rel="stylesheet" />\r\n\t<script src="scripts/html5shiv.js" type="text/java

In [19]:
soup = BeautifulSoup(response.text, "html.parser")
print(soup.prettify())

<!DOCTYPE html>
<!-- template.asp -->
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <link href="favicon.ico" rel="shortcut icon"/>
  <link href="css/jquery-ui.min.css" rel="stylesheet" type="text/css"/>
  <link href="css/bootstrap.min.css" rel="stylesheet"/>
  <link href="css/font-awesome-ie7.min.css" rel="stylesheet">
   <link href="css/font-awesome.css" rel="stylesheet"/>
   <link href="css/link.css" rel="stylesheet">
    <link href="css/print.css" media="print" rel="stylesheet"/>
    <link href="css/infopost-custom.css" media="screen" rel="stylesheet"/>
    <link href="css/environment.css" media="screen" rel="stylesheet"/>
    <!-- HTML5 shim, for IE6-9 support of HTML5 elements -->
    <!--[if lt IE 9]>
	<link href="css/link-ie.css" rel="stylesheet" />
	<script src="scripts/html5shiv.js" type="text/javascript"></script>
	<script src="scripts/html

In [20]:
## get dropdown html
dropdown = soup.find(id="dropdown")
type(dropdown)
dropdown

<ul class="dropdown-menu select-pipe-dropdown-menu" id="dropdown">
<li><a href="AGHome.asp?Pipe=AG">Algonquin (AGT)</a></li><li><a href="BGSHome.asp?Pipe=BGS">Bobcat Gas Storage (BGS)</a></li><li><a href="BIGHome.asp?Pipe=BIG">BIG Pipeline (BIG)</a></li><li><a href="BSPHome.asp?Pipe=BSP">Big Sandy Pipeline (BSP)</a></li><li><a href="EGHome.asp?Pipe=EG">MHP Egan (EHP)</a></li><li><a href="ETHome.asp?Pipe=ET">East Tennessee (ETNG)</a></li><li><a href="GBHome.asp?Pipe=GB">Garden Banks (GB)</a></li><li><a href="GPLHome.asp?Pipe=GPL">Generation  Pipeline (GPL)</a></li><li><a href="MCGPHome.asp?Pipe=MCGP">Mississippi Canyon (MCGP)</a></li><li><a href="MBHome.asp?Pipe=MB">MHP Moss Bluff (MBHP)</a></li><li><a href="MNCAHome.asp?Pipe=MNCA">Maritimes &amp; Northeast Canada (MNCA)</a></li><li><a href="MNUSHome.asp?Pipe=MNUS">Maritimes &amp; Northeast U.S. (MNUS)</a></li><li><a href="MRHome.asp?Pipe=MR">Manta Ray Offshore Gathering Company (MR)</a></li><li><a href="NPCHome.asp?Pipe=NPC">Nautilus P

In [21]:
## get atags
atags = dropdown.find_all("a")
atags

[<a href="AGHome.asp?Pipe=AG">Algonquin (AGT)</a>,
 <a href="BGSHome.asp?Pipe=BGS">Bobcat Gas Storage (BGS)</a>,
 <a href="BIGHome.asp?Pipe=BIG">BIG Pipeline (BIG)</a>,
 <a href="BSPHome.asp?Pipe=BSP">Big Sandy Pipeline (BSP)</a>,
 <a href="EGHome.asp?Pipe=EG">MHP Egan (EHP)</a>,
 <a href="ETHome.asp?Pipe=ET">East Tennessee (ETNG)</a>,
 <a href="GBHome.asp?Pipe=GB">Garden Banks (GB)</a>,
 <a href="GPLHome.asp?Pipe=GPL">Generation  Pipeline (GPL)</a>,
 <a href="MCGPHome.asp?Pipe=MCGP">Mississippi Canyon (MCGP)</a>,
 <a href="MBHome.asp?Pipe=MB">MHP Moss Bluff (MBHP)</a>,
 <a href="MNCAHome.asp?Pipe=MNCA">Maritimes &amp; Northeast Canada (MNCA)</a>,
 <a href="MNUSHome.asp?Pipe=MNUS">Maritimes &amp; Northeast U.S. (MNUS)</a>,
 <a href="MRHome.asp?Pipe=MR">Manta Ray Offshore Gathering Company (MR)</a>,
 <a href="NPCHome.asp?Pipe=NPC">Nautilus Pipeline Company (NPC)</a>,
 <a href="NXCAHome.asp?Pipe=NXCA">NEXUS ULC (NXCA)</a>,
 <a href="NXUSHome.asp?Pipe=NXUS">NEXUS U.S. (NXUS)</a>,
 <a href

In [22]:
## get the hrefs with a list comprehension
hrefs = [atag.get("href") for atag in atags]
hrefs

['AGHome.asp?Pipe=AG',
 'BGSHome.asp?Pipe=BGS',
 'BIGHome.asp?Pipe=BIG',
 'BSPHome.asp?Pipe=BSP',
 'EGHome.asp?Pipe=EG',
 'ETHome.asp?Pipe=ET',
 'GBHome.asp?Pipe=GB',
 'GPLHome.asp?Pipe=GPL',
 'MCGPHome.asp?Pipe=MCGP',
 'MBHome.asp?Pipe=MB',
 'MNCAHome.asp?Pipe=MNCA',
 'MNUSHome.asp?Pipe=MNUS',
 'MRHome.asp?Pipe=MR',
 'NPCHome.asp?Pipe=NPC',
 'NXCAHome.asp?Pipe=NXCA',
 'NXUSHome.asp?Pipe=NXUS',
 'SESHHome.asp?Pipe=SESH',
 'SGHome.asp?Pipe=SG',
 'SRHome.asp?Pipe=SR',
 'STTHome.asp?Pipe=STT',
 'TEHome.asp?Pipe=TE',
 'TPGSHome.asp?Pipe=TPGS',
 'VCPHome.asp?Pipe=VCP',
 'WEHome.asp?Pipe=WE',
 'WRGSHome.asp?Pipe=WRGS']
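The `find()` plus `find_all()` steps can also be collapsed into a single CSS selector with BeautifulSoup's `select()`. A sketch on a trimmed copy of the dropdown markup shown above; on the live page you'd call `.select(...)` on the soup we already made (the demo uses its own `demo_soup` so it doesn't overwrite ours):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <ul id="dropdown"> markup from the homepage
html = """
<ul class="dropdown-menu select-pipe-dropdown-menu" id="dropdown">
  <li><a href="AGHome.asp?Pipe=AG">Algonquin (AGT)</a></li>
  <li><a href="BGSHome.asp?Pipe=BGS">Bobcat Gas Storage (BGS)</a></li>
</ul>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# "#dropdown a[href]" = every <a> that has an href, inside the element with id="dropdown"
hrefs_css = [a["href"] for a in demo_soup.select("#dropdown a[href]")]
print(hrefs_css)
```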

In [23]:
## get the hrefs with a for loop
hrefs_fl = []
for atag in atags:
    hrefs_fl.append(atag.get("href"))

hrefs_fl

['AGHome.asp?Pipe=AG',
 'BGSHome.asp?Pipe=BGS',
 'BIGHome.asp?Pipe=BIG',
 'BSPHome.asp?Pipe=BSP',
 'EGHome.asp?Pipe=EG',
 'ETHome.asp?Pipe=ET',
 'GBHome.asp?Pipe=GB',
 'GPLHome.asp?Pipe=GPL',
 'MCGPHome.asp?Pipe=MCGP',
 'MBHome.asp?Pipe=MB',
 'MNCAHome.asp?Pipe=MNCA',
 'MNUSHome.asp?Pipe=MNUS',
 'MRHome.asp?Pipe=MR',
 'NPCHome.asp?Pipe=NPC',
 'NXCAHome.asp?Pipe=NXCA',
 'NXUSHome.asp?Pipe=NXUS',
 'SESHHome.asp?Pipe=SESH',
 'SGHome.asp?Pipe=SG',
 'SRHome.asp?Pipe=SR',
 'STTHome.asp?Pipe=STT',
 'TEHome.asp?Pipe=TE',
 'TPGSHome.asp?Pipe=TPGS',
 'VCPHome.asp?Pipe=VCP',
 'WEHome.asp?Pipe=WE',
 'WRGSHome.asp?Pipe=WRGS']

In [24]:
## refresher: list positions start at 0
animals = ["cat", "dog", "rat"]
animals[1]

'dog'

In [25]:
## split each href at '=' and keep what comes after it (the unit code)
unit_codes = [href.split('=')[1] for href in hrefs]
unit_codes

['AG',
 'BGS',
 'BIG',
 'BSP',
 'EG',
 'ET',
 'GB',
 'GPL',
 'MCGP',
 'MB',
 'MNCA',
 'MNUS',
 'MR',
 'NPC',
 'NXCA',
 'NXUS',
 'SESH',
 'SG',
 'SR',
 'STT',
 'TE',
 'TPGS',
 'VCP',
 'WE',
 'WRGS']
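`split('=')[1]` works here because each href contains exactly one `=`. If a link ever carried more query parameters, the standard library's `urllib.parse` can pull out `Pipe` by name instead. A sketch; `pipe_code` is a hypothetical helper name:

```python
from urllib.parse import parse_qs, urlsplit

def pipe_code(href):
    """Return the value of the Pipe query parameter, or None if it's missing."""
    query = parse_qs(urlsplit(href).query)  # e.g. {"Pipe": ["AG"]}
    return query.get("Pipe", [None])[0]

print(pipe_code("AGHome.asp?Pipe=AG"))
print(pipe_code("WEHome.asp?Pipe=WE&extra=1"))
```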

In [26]:
for href in hrefs:
    print(href.split("=")[1])
    print("**********")

AG
**********
BGS
**********
BIG
**********
BSP
**********
EG
**********
ET
**********
GB
**********
GPL
**********
MCGP
**********
MB
**********
MNCA
**********
MNUS
**********
MR
**********
NPC
**********
NXCA
**********
NXUS
**********
SESH
**********
SG
**********
SR
**********
STT
**********
TE
**********
TPGS
**********
VCP
**********
WE
**********
WRGS
**********


In [27]:
type(unit_codes)

list

In [28]:
unit_codes[0][1] ## strings index like lists: 2nd character of 'AG'

'G'

In [29]:
## target codes only


In [30]:
## url templates
start_url = "https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe="
end_url = "&type=CRI"

In [31]:
## attach codes to base urls
links = [f"{start_url}{unit_code}{end_url}" for unit_code in unit_codes]
links

['https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=AG&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=BGS&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=BIG&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=BSP&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=EG&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=ET&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=GB&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=GPL&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=MCGP&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=MB&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=MNCA&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=MNUS&type=CRI',
 'https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe=MR&type=CRI',
 '

In [32]:
## length of links
len(links)

25

### ```enumerate()```

`enumerate()` hands us a running count alongside each item, a low-effort way to track our progress through a loop.

In [34]:
## quick explanation
## run this cell
fruits = ["apple", "orange", "plum", "pear", "banana"]
fruits

['apple', 'orange', 'plum', 'pear', 'banana']

In [35]:
## demo enumerate
for i, fruit in enumerate(fruits, start = 1):
    print(i, fruit)

1 apple
2 orange
3 plum
4 pear
5 banana


In [45]:
## scrape all gas units at one time
df_list = []
broken_links = []
total_links = len(unit_codes)

for counter, unit_code in enumerate(unit_codes, start = 1):
    target_link = f"{start_url}{unit_code}{end_url}"
    print(f"Scraping {counter} of {total_links}")
    try:
        data = pd.read_html(target_link)
        df = data[0]
        df["unit"] = unit_code ## tag each row with its pipeline code
        df_list.append(df)
    except Exception as error: ## e.g. "No tables found" or an HTTP error
        print(f"{unit_code} was busted or had no table: {error}")
        broken_links.append(target_link)
    finally:
        snooze = uniform(10,20)
        print(f"Snoozing for {snooze} seconds")
        time.sleep(snooze)

print("Done scraping all units")

Scraping 1 of 25
Snoozing for 11.736814837358606 seconds
Scraping 2 of 25
Snoozing for 12.65261045661666 seconds
Scraping 3 of 25
Snoozing for 14.305255926675562 seconds
Scraping 4 of 25
Snoozing for 14.176912763312753 seconds
Scraping 5 of 25
Snoozing for 12.611683960931952 seconds
Scraping 6 of 25
Snoozing for 18.477275098054047 seconds
Scraping 7 of 25
Snoozing for 15.897912439191995 seconds
Scraping 8 of 25
Snoozing for 14.337505073866762 seconds
Scraping 9 of 25
Snoozing for 14.910662509099245 seconds
Scraping 10 of 25
Snoozing for 19.006073596918114 seconds
Scraping 11 of 25
Snoozing for 14.440939643031577 seconds
Scraping 12 of 25
Snoozing for 15.886532565485776 seconds
Scraping 13 of 25
Snoozing for 14.880572238352485 seconds
Scraping 14 of 25
Snoozing for 12.508706762874139 seconds
Scraping 15 of 25
Snoozing for 11.623358793332958 seconds
Scraping 16 of 25
Snoozing for 15.7627844996405 seconds
Scraping 17 of 25
Snoozing for 12.086519460592994 seconds
Scraping 18 of 25
Snoozing
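Here is the same loop restructured as a function, so it can be reused and checked without hitting the site. A sketch: `scrape_units`, `build_link`, `fetch` and `sleep` are hypothetical names introduced here. In the notebook, `fetch` would be something like `lambda link: pd.read_html(link)[0]` and `sleep` would be `time.sleep`:

```python
import time
from random import uniform

import pandas as pd

def scrape_units(unit_codes, build_link, fetch, sleep=time.sleep, delay=(10, 20)):
    """Scrape one table per unit code; return (list of DataFrames, broken links)."""
    df_list, broken_links = [], []
    total = len(unit_codes)
    for counter, unit_code in enumerate(unit_codes, start=1):
        link = build_link(unit_code)
        print(f"Scraping {counter} of {total}")
        try:
            df = fetch(link)
            df["unit"] = unit_code  # remember which pipeline each row came from
            df_list.append(df)
        except Exception as error:  # e.g. ValueError("No tables found"), HTTP errors
            print(f"{unit_code} failed: {error}")
            broken_links.append(link)
        finally:
            sleep(uniform(*delay))  # pause between requests, success or not
    return df_list, broken_links

# Offline demo with a fake fetcher: no network and no real waiting
def fake_fetch(link):
    if "pipe=BGS" in link:
        raise ValueError("No tables found")
    return pd.DataFrame({"Notice Identifier": [1]})

frames, broken = scrape_units(
    ["AG", "BGS"],
    build_link=lambda code: f"https://infopost.enbridge.com/InfoPost/NoticesList.asp?pipe={code}&type=CRI",
    fetch=fake_fetch,
    sleep=lambda seconds: None,
)
```

Passing `fetch` and `sleep` in as arguments is what makes the offline demo possible; the real run just swaps in `pd.read_html` and `time.sleep`.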

In [49]:
## call our list
## what do we have?
df_list[0]

Unnamed: 0,Notice Type,Posted Date/Time,Notice Effective Date/Time,Notice End Date/Time,Notice Identifier,Subject,Response Date/Time,unit
0,Capacity Constraint,10/20/2025 03:01:17 PM,10/21/2025 09:00:00 AM,10/22/2025 09:00:00 AM,168903,AGT Pipeline Conditions for 10/21/2025,,AG
1,Capacity Constraint,10/19/2025 03:09:24 PM,10/20/2025 09:00:00 AM,10/21/2025 09:00:00 AM,168875,AGT Pipeline Conditions for 10/20/2025,,AG
2,Capacity Constraint,10/18/2025 03:02:46 PM,10/19/2025 09:00:00 AM,10/20/2025 09:00:00 AM,168828,AGT Pipeline Conditions for 10/19/2025,,AG
3,Capacity Constraint,10/17/2025 03:13:53 PM,10/18/2025 09:00:00 AM,10/19/2025 09:00:00 AM,168804,AGT Pipeline Conditions for 10/18/2025,,AG
4,Capacity Constraint,10/16/2025 03:01:34 PM,10/17/2025 09:00:00 AM,10/18/2025 09:00:00 AM,168740,AGT Pipeline Conditions for 10/17/2025,,AG
...,...,...,...,...,...,...,...,...
124,Capacity Constraint,07/25/2025 08:16:35 PM,07/25/2025 08:16:35 PM,07/26/2025 09:00:00 AM,165530,AGT Pipeline Conditions for 7/25/2025 -- INTRADAY,,AG
125,Capacity Constraint,07/25/2025 03:04:42 PM,07/26/2025 09:00:00 AM,07/27/2025 09:00:00 AM,165513,AGT Pipeline Conditions for 7/26/2025,,AG
126,Capacity Constraint,07/24/2025 03:30:44 PM,07/25/2025 09:00:00 AM,07/26/2025 09:00:00 AM,165485,AGT Pipeline Conditions for 7/25/2025,,AG
127,Operational Flow Order,07/24/2025 03:00:00 PM,07/24/2025 03:00:00 PM,10/22/2025 08:37:10 AM,165453,AGT Operational Flow Order -- EFF 7/28,,AG


In [55]:
## concat
df = pd.concat(df_list, ignore_index = True)
df

Unnamed: 0,Notice Type,Posted Date/Time,Notice Effective Date/Time,Notice End Date/Time,Notice Identifier,Subject,Response Date/Time,unit
0,Capacity Constraint,10/20/2025 03:01:17 PM,10/21/2025 09:00:00 AM,10/22/2025 09:00:00 AM,168903,AGT Pipeline Conditions for 10/21/2025,,AG
1,Capacity Constraint,10/19/2025 03:09:24 PM,10/20/2025 09:00:00 AM,10/21/2025 09:00:00 AM,168875,AGT Pipeline Conditions for 10/20/2025,,AG
2,Capacity Constraint,10/18/2025 03:02:46 PM,10/19/2025 09:00:00 AM,10/20/2025 09:00:00 AM,168828,AGT Pipeline Conditions for 10/19/2025,,AG
3,Capacity Constraint,10/17/2025 03:13:53 PM,10/18/2025 09:00:00 AM,10/19/2025 09:00:00 AM,168804,AGT Pipeline Conditions for 10/18/2025,,AG
4,Capacity Constraint,10/16/2025 03:01:34 PM,10/17/2025 09:00:00 AM,10/18/2025 09:00:00 AM,168740,AGT Pipeline Conditions for 10/17/2025,,AG
...,...,...,...,...,...,...,...,...
862,Other,09/29/2025 05:16:33 PM,09/29/2025 05:16:33 PM,12/28/2025 05:16:33 PM,168038,Transferring Supply shipper account balances f...,,WE
863,Other,09/29/2025 05:12:09 PM,09/29/2025 05:12:09 PM,12/28/2025 05:12:09 PM,168037,Transferring Station #2 Shipper account balanc...,,WE
864,Other,09/29/2025 04:42:33 PM,09/29/2025 04:42:33 PM,12/28/2025 04:42:33 PM,168036,LINK Station 2 Balancing - Recommended Shipper...,,WE
865,Other,09/29/2025 04:17:42 PM,09/29/2025 04:17:42 PM,12/28/2025 04:05:42 PM,168035,The LINK system is OPEN for Westcoast Customers,,WE
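The date columns come back as plain text. Before we can ask whether notices cluster around price spikes, they need to be real datetimes. A sketch, assuming every value follows the `10/20/2025 03:01:17 PM` pattern seen above (`errors="coerce"` turns anything that doesn't fit into `NaT` instead of raising). The two timestamps are copied from the table; `posted` is a hypothetical column name:

```python
import pandas as pd

# Two posted timestamps copied from the table above
notices = pd.DataFrame({
    "Posted Date/Time": ["10/20/2025 03:01:17 PM", "09/29/2025 05:16:33 PM"],
})

# %I = hour on a 12-hour clock, %p = AM/PM marker
notices["posted"] = pd.to_datetime(
    notices["Posted Date/Time"], format="%m/%d/%Y %I:%M:%S %p", errors="coerce"
)
print(notices["posted"])
```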


In [67]:
df.sample(10)

Unnamed: 0,Notice Type,Posted Date/Time,Notice Effective Date/Time,Notice End Date/Time,Notice Identifier,Subject,Response Date/Time,unit
237,Capacity Constraint,08/15/2025 02:31:15 PM,08/16/2025 09:00:00 AM,08/17/2025 09:00:00 AM,166288,ETNG Pipeline Conditions for 8/16/2025,,ET
526,Capacity Constraint,09/17/2025 02:45:33 PM,09/18/2025 09:00:00 AM,09/19/2025 09:00:00 AM,167521,SR Storage Conditions for 9/18/2025,,SR
87,Capacity Constraint,08/16/2025 03:05:00 PM,08/17/2025 09:00:00 AM,08/18/2025 09:00:00 AM,166324,AGT Pipeline Conditions for 8/17/2025,,AG
187,Capacity Constraint,09/24/2025 02:45:10 PM,09/25/2025 09:00:00 AM,09/26/2025 09:00:00 AM,167775,ETNG Pipeline Conditions for 9/25/2025,,ET
449,Capacity Constraint,08/04/2025 02:46:21 PM,08/05/2025 09:00:00 AM,08/06/2025 09:00:00 AM,165870,SESH Pipeline Conditions for 8/5/2025,,SESH
81,Operational Flow Order,08/21/2025 03:00:00 PM,08/23/2025 09:00:00 AM,08/25/2025 09:00:00 AM,166467,AGT Operational Flow Order -- EFF 8/23,,AG
393,Capacity Constraint,09/29/2025 02:22:16 PM,09/30/2025 09:00:00 AM,10/01/2025 09:00:00 AM,168008,SESH Pipeline Conditions for 9/30/2025,,SESH
95,Capacity Constraint,08/12/2025 02:49:39 PM,08/13/2025 09:00:00 AM,08/14/2025 09:00:00 AM,166177,AGT Pipeline Conditions for 8/13/2025,,AG
418,Capacity Constraint,09/04/2025 02:45:20 PM,09/05/2025 09:00:00 AM,09/06/2025 09:00:00 AM,167022,SESH Pipeline Conditions for 9/5/2025,,SESH
859,Capacity Constraint,09/30/2025 07:55:01 AM,09/30/2025 07:55:01 AM,12/29/2025 07:49:01 AM,168041,FSJML to Station #2 Capacity for October 1 - 1...,,WE


### Capture actual critical notices text

In [83]:
## create link to each text description.

## get notice id numbers
ids = list(df["Notice Identifier"])
ids

[168903,
 168875,
 168828,
 168804,
 168740,
 168712,
 168691,
 168642,
 168607,
 168589,
 168557,
 168555,
 168487,
 168474,
 168457,
 168418,
 168432,
 168396,
 168370,
 168324,
 168269,
 168236,
 168202,
 168169,
 168152,
 168133,
 168122,
 168061,
 168013,
 167929,
 167901,
 167885,
 167867,
 167832,
 167806,
 167777,
 167743,
 167712,
 167687,
 167659,
 167612,
 167610,
 167570,
 167553,
 167536,
 167500,
 167498,
 167462,
 167429,
 167398,
 167361,
 167357,
 167291,
 167310,
 167252,
 167237,
 167190,
 167169,
 167168,
 167144,
 167130,
 167095,
 167045,
 167010,
 166972,
 166934,
 166902,
 166871,
 166820,
 166758,
 166774,
 166740,
 166714,
 166688,
 166620,
 166589,
 166574,
 166558,
 166540,
 166539,
 166499,
 166467,
 166450,
 166418,
 166389,
 166370,
 166351,
 166324,
 166295,
 166285,
 166246,
 166260,
 166240,
 166237,
 166232,
 166177,
 166155,
 166124,
 166091,
 166049,
 166025,
 166008,
 166003,
 165972,
 165962,
 165928,
 165896,
 165890,
 165866,
 165859,
 165857,
 

In [85]:
## name of units
codes = df["unit"]
codes

0      AG
1      AG
2      AG
3      AG
4      AG
       ..
862    WE
863    WE
864    WE
865    WE
866    WE
Name: unit, Length: 867, dtype: object

### What are the elements that make up the url call?

In [87]:
## build parts of url
start_url = "https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1="
end_url = "&type=CRI&Embed=2&pipe="

### Build our URLs using ```zip()```

In [99]:
## zip together and build new list of text descriptions

text_list = []
for notice_id, code in zip(ids, codes): ## notice_id avoids shadowing Python's built-in id()
    # print(notice_id, code)
    text_list.append(f"{start_url}{notice_id}{end_url}{code}")

sample(text_list, 15)

['https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168850&type=CRI&Embed=2&pipe=MB',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=167045&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=167924&type=CRI&Embed=2&pipe=TE',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=167885&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168434&type=CRI&Embed=2&pipe=SR',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168437&type=CRI&Embed=2&pipe=TE',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=167109&type=CRI&Embed=2&pipe=SR',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=167806&type=CRI&Embed=2&pipe=AG',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=165540&type=CRI&Embed=2&pipe=ET',
 'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?st
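`zip()` pairs each id with its unit by position. Another option is to build the links as a new column, so every URL stays on the same row as its notice. A sketch on a two-row stand-in for our `df` (ids and units copied from the tables above); `detail_url` is a hypothetical column name:

```python
import pandas as pd

start_url = "https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1="
end_url = "&type=CRI&Embed=2&pipe="

# Two-row stand-in for the combined notices DataFrame
notices = pd.DataFrame({"Notice Identifier": [168903, 168038], "unit": ["AG", "WE"]})

# Column-wise string concatenation; the ids are ints, so convert them first
notices["detail_url"] = (
    start_url + notices["Notice Identifier"].astype(str) + end_url + notices["unit"]
)
print(notices["detail_url"].tolist())
```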

In [None]:
## pull out 10 random samples
## use our sample method from the random library (imported earlier)
sample(text_list, 10)


In [101]:
url = text_list[0]
url

'https://infopost.enbridge.com/InfoPost/NoticeListDetail.asp?strKey1=168903&type=CRI&Embed=2&pipe=AG'

In [103]:
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
soup

<!-- template.asp -->
<html><head><title>Algonquin Gas Transmission, LLC 006951446 : Critical Notices</title><link href="/styles/infopost.css" rel="stylesheet" type="text/css"/><link href="NoticeDetail.css" rel="stylesheet" type="text/css"/></head><body><div id="main"><script type="text/javascript"> function Print() {if (document.queryCommandSupported('print')) {document.execCommand('print', false, null);	}else {window.parent.print(); }}</script><div class="recordsetBar"><a class="recordsetBar" href="javascript:history.go(-1)">Back</a><a class="recordsetBar" href="#" onclick="Print();return false">Print</a></div><div id="bulletinContents"><div class="headingArea"><div id="heading">TSP: <br/>TSP Name: <br/>Critical Notice Description: <br/>Notice Effective Date: <br/>Notice Effective Time: <br/>Notice End Date: <br/>Notice End Time: <br/>Notice Identifier: <br/>Notice Status Description: <br/>Notice Type: <br/>Posting Date: <br/>Posting Time: <br/>Prior Notice Identifier: <br/>Required 
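The notice body sits inside `<div id="bulletinContents">`. A sketch of pulling its text out, assuming that id is the same on every notice page (we've only looked at one so far); `notice_text` is a hypothetical helper, tried here on a trimmed copy of the markup above:

```python
from bs4 import BeautifulSoup

def notice_text(html):
    """Return the visible text of the bulletin, or None if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    contents = soup.find(id="bulletinContents")
    if contents is None:
        return None
    # separator=" " keeps words from running together across <br/> tags
    return contents.get_text(" ", strip=True)

sample_html = '<div id="bulletinContents"><div id="heading">TSP: <br/>TSP Name: <br/></div></div>'
print(notice_text(sample_html))
```

Returning `None` for a page without the div lets a loop over `text_list` record the gap and move on instead of crashing.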