# Building an Agentic Pipeline 

Now that we have the individual pieces working, let's put it all together into a full Airflow pipeline.

```python
import sys
import subprocess

# find the root of the current git repo and add it to our import path
root_dir = (
    subprocess.check_output(["git", "rev-parse", "--show-toplevel"], stderr=subprocess.DEVNULL)
    .decode("utf-8")
    .strip()
)

sys.path.append(root_dir)
```
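`git rev-parse --show-toplevel` fails when the notebook runs somewhere without the git CLI or outside a checkout (for example, inside some worker containers). As a minimal pure-Python fallback, assuming the repo root is the nearest ancestor directory that contains a `.git` entry, you can walk up the directory tree. `find_repo_root` is a hypothetical helper, not part of this project.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Optional[Path] = None) -> Optional[Path]:
    """Return the nearest directory (starting at `start`) that contains a `.git` entry.

    Hypothetical fallback for `git rev-parse --show-toplevel`; returns None when
    no ancestor looks like a repository root.
    """
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        # `.git` is a directory in normal clones and a file in worktrees/submodules
        if (candidate / ".git").exists():
            return candidate
    return None
```

Usage would mirror the cell above: `root = find_repo_root()` and, if it is not `None`, `sys.path.append(str(root))`.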

## Content Extraction 

Content is scraped from RSS feeds published by public media outlets (NPR and PBS). The job is designed to run every day at 5 PM (provided the server and scheduler are running) and writes its outputs to the `agentic-de/bronze` data directory.

The code cells defined here will be consolidated into a single Airflow task in our agentic pipeline.
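In Airflow, a daily 5 PM run is usually expressed as the cron expression `0 17 * * *` (minute 0, hour 17, every day), passed as the DAG's schedule (`schedule` in recent Airflow versions, `schedule_interval` in older ones). To make that expression concrete, the purely illustrative helper below computes the next 17:00 after a given time. It is not how Airflow's scheduler computes runs, and it assumes the timezone is whatever `now` carries.

```python
from datetime import datetime, time, timedelta

RUN_AT = time(hour=17, minute=0)  # 5 PM, matching the cron expression "0 17 * * *"


def next_run_after(now: datetime, run_at: time = RUN_AT) -> datetime:
    """Return the first datetime strictly after `now` whose wall-clock time is `run_at`."""
    candidate = datetime.combine(now.date(), run_at, tzinfo=now.tzinfo)
    if candidate <= now:
        # today's slot has already passed, so the next run is tomorrow
        candidate += timedelta(days=1)
    return candidate
```

If the scheduler is down at 17:00, nothing runs until it is back up; whether Airflow then backfills every missed interval depends on the DAG's `catchup` setting.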

```python
from airflow.dags.utils.helpers import generate_npr_feed_urls

# get RSS feeds from public media sources
npr_rss_feeds = generate_npr_feed_urls()
pbs_rss_feeds = [
    "https://www.pbs.org/newshour/feeds/rss/headlines",
    "https://www.pbs.org/newshour/feeds/rss/politics",
    "https://www.pbs.org/newshour/feeds/rss/brooks-and-capehart",
]

# combine both sources into a single crawl list
rss_feeds_to_crawl = npr_rss_feeds + pbs_rss_feeds

# status update
print(f"Preparing to request {len(rss_feeds_to_crawl)} RSS feeds")
```

```text
Preparing to request 232 RSS feeds
```
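Concatenating two lists keeps any URL that appears in both. If the NPR helper and the hand-written PBS list could ever overlap, an order-preserving de-duplication is cheap: `dict.fromkeys` keeps the first occurrence of each key in insertion order. `dedupe_feed_urls` and its trailing-slash normalization are illustrative assumptions, not part of the project's helpers (some servers do treat `/rss` and `/rss/` differently, so drop the normalization if that matters).

```python
from typing import Iterable, List


def dedupe_feed_urls(urls: Iterable[str]) -> List[str]:
    """Drop duplicate feed URLs while keeping the original order.

    Hypothetical helper: URLs that differ only by surrounding whitespace or a
    trailing slash are treated as the same feed.
    """
    normalized = (url.strip().rstrip("/") for url in urls)
    # dicts preserve insertion order (Python 3.7+), so the first occurrence wins
    return list(dict.fromkeys(normalized))
```

Applied here, it would read `rss_feeds_to_crawl = dedupe_feed_urls(npr_rss_feeds + pbs_rss_feeds)`.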


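```python
import tqdm

from airflow.dags.utils.helpers import request_rss_feed

raw_feed_data = []
# only request the first 5 feeds while testing; drop the slice to crawl all of them
for url in tqdm.tqdm(rss_feeds_to_crawl[:5], desc="Requesting RSS feeds", unit="feed"):
    try:
        feed_data = request_rss_feed(url)
        # skip feeds that came back empty
        if feed_data:
            raw_feed_data.append(feed_data)
    except Exception as e:
        # log the failure without breaking the progress bar, then move on
        tqdm.tqdm.write(f"Error requesting {url}: {e}")
```

```text
Requesting RSS feeds: 100%|██████████| 5/5 [00:02<00:00,  1.90feed/s]
```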
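At roughly 1.9 feeds per second, the sequential loop would need about two minutes for all 232 feeds. Since each request spends most of its time waiting on the network, a thread pool is a common way to overlap them. The sketch below is an alternative to the loop above, not the project's code: `fetch` stands in for `request_rss_feed`, and `max_workers=8` is an arbitrary assumption to tune to what the feed servers tolerate.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Sequence, Tuple


def fetch_feeds_concurrently(
    urls: Sequence[str],
    fetch: Callable[[str], Any],
    max_workers: int = 8,
) -> Tuple[List[Any], Dict[str, Exception]]:
    """Fetch every URL on a thread pool.

    Returns (results, errors): results holds the non-empty payloads in the same
    order as `urls`; errors maps each failing URL to the exception it raised.
    """
    payloads: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                payloads[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    # restore input order and skip empty payloads, as the notebook loop does
    results = [payloads[url] for url in urls if url in payloads and payloads[url]]
    return results, errors
```

In the notebook this would be called as `raw_feed_data, failures = fetch_feeds_concurrently(rss_feeds_to_crawl, request_rss_feed)`.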


```python
import os

from airflow.dags.utils.aws import S3

# upload only the first feed as a smoke test; the IAM role ARN is read from the environment
S3.upload_raw_rss_data(raw_feed_data[0], role_arn=os.getenv("DIGI_INNO_ROLE_ARN"))
```

```text
[2025-06-11T15:48:21.249-0400] {credentials.py:1352} INFO - Found credentials in shared credentials file: ~/.aws/credentials
Success!
```
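The extraction cells above could collapse into one callable that an Airflow task runs each day. This sketch is hypothetical: it takes the fetch and upload functions as parameters so it can be tested without network or AWS access, it uploads every feed rather than just `raw_feed_data[0]`, and the `agentic-de/bronze/rss/<date>/<hash>.json` key layout is an assumption, not the project's actual layout.

```python
import hashlib
from datetime import date
from typing import Any, Callable, Dict, Iterable, List


def bronze_key(url: str, run_date: date, prefix: str = "agentic-de/bronze/rss") -> str:
    """Build a deterministic object key for one feed on one run date (assumed layout)."""
    # hashing the URL gives a key that is path-safe and identical across reruns
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}/{run_date.isoformat()}/{digest}.json"


def extract_rss_to_bronze(
    feed_urls: Iterable[str],
    fetch: Callable[[str], Any],
    upload: Callable[[Any, str], None],
    run_date: date,
) -> Dict[str, List[str]]:
    """Fetch each feed, upload non-empty payloads, and return a small run summary."""
    summary: Dict[str, List[str]] = {"uploaded": [], "empty": [], "failed": []}
    for url in feed_urls:
        try:
            payload = fetch(url)
            if not payload:
                summary["empty"].append(url)
                continue
            key = bronze_key(url, run_date)
            upload(payload, key)
            summary["uploaded"].append(key)
        except Exception:
            # one bad feed (fetch or upload) should not fail the whole daily task
            summary["failed"].append(url)
    return summary
```

Because keys depend only on the URL and the run date, rerunning a day overwrites that day's objects instead of duplicating them. Inside a DAG, `fetch` would be `request_rss_feed` and `upload` a thin wrapper around the project's S3 helper.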


## Transformation 

Here's where we'll embed our agent! For each file, it will decide which transformation pipeline the file should be sent to.
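One way to structure this is to keep the agent's job narrow: given a file, it returns the name of a pipeline from a fixed registry, and plain Python does the dispatch. The sketch below is hypothetical: `classify` stands in for the agent (here a trivial keyword rule so the example runs offline), and the pipeline names are placeholders. Checking the agent's answer against the registry, with a fallback, guards against a model naming a pipeline that doesn't exist.

```python
from typing import Callable, Dict

# Hypothetical transformation pipelines; each takes raw feed text and returns a result.
PIPELINES: Dict[str, Callable[[str], str]] = {
    "news_article": lambda raw: f"parsed article ({len(raw)} chars)",
    "opinion": lambda raw: f"parsed opinion piece ({len(raw)} chars)",
    "unknown": lambda raw: "sent to manual review",
}


def keyword_classifier(raw: str) -> str:
    """Stand-in for the agent: a keyword rule so the sketch runs without an LLM."""
    return "opinion" if "capehart" in raw.lower() else "news_article"


def route(raw: str, classify: Callable[[str], str] = keyword_classifier) -> str:
    """Ask the classifier for a pipeline name and run that pipeline.

    Any answer that is not a registered pipeline falls back to "unknown".
    """
    choice = classify(raw).strip()
    pipeline = PIPELINES.get(choice, PIPELINES["unknown"])
    return pipeline(raw)
```

In the real pipeline, `classify` would wrap the LLM call; constraining its output to the registry's keys keeps each routing decision easy to audit.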