# Your personalized news podcast with Jina Reader, PromptPerfect, and Bark

This notebook will:

- Scrape RSS feeds from news sources for the latest articles.
- Summarize each article.
- Generate a one-paragraph news report from those summaries.
- Read it to you via text-to-speech.

This notebook is a companion to [this blog post]() on [Jina AI's blog]().

## Define some settings

These settings define:

- **Feed URLs**: The feeds you want to extract data from. In this example they come from a couple of tech news websites.
- **Maximum quantities**: To keep this example manageable, we want to limit things to a certain number of feeds, news items, and sentences in the spoken output.

In [None]:
feed_urls = [
    "https://www.osnews.com/feed/",
    "https://www.theregister.com/headlines.atom"
]

In [None]:
# Maximum number of feeds to fetch
MAX_FEEDS = 10

# Maximum news items per feed to fetch
MAX_ENTRIES = 3

# Maximum sentences of the news script to convert to speech
VOICE_MAX_SENTENCES = 7

## Add API keys

You will be prompted to enter your PromptPerfect and Replicate API keys below.

In [None]:
import getpass
# Label the prompt so it's clear which key is being requested
PROMPTPERFECT_KEY = getpass.getpass("PromptPerfect API key: ")

In [None]:
import os

In [None]:
os.environ["REPLICATE_API_TOKEN"] = getpass.getpass("Replicate API token: ")
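
If you rerun the notebook often, typing both keys each time gets tedious. The following is a minimal sketch, not part of the original notebook, of a helper that reads a key from an environment variable and only prompts when it is missing. The name `get_secret` and the `PROMPTPERFECT_KEY` environment variable are assumptions made for illustration.

```python
import getpass
import os

def get_secret(name, env=os.environ, ask=getpass.getpass):
    """Return env[name] if it is set and non-empty, otherwise prompt for it."""
    value = env.get(name)
    if value:
        return value
    # Fall back to an interactive, non-echoing prompt
    return ask(f"{name}: ")

# Hypothetical usage in the notebook:
# PROMPTPERFECT_KEY = get_secret("PROMPTPERFECT_KEY")
```

The `env` and `ask` parameters exist only so the helper can be tried out without touching the real environment or blocking on a prompt.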

## Get article URLs from feeds

Extract the latest stories from the feeds we defined.

In [None]:
!pip install feedparser

In [None]:
import feedparser, requests

In [None]:
page_urls = []

In [None]:
for feed_url in feed_urls[:MAX_FEEDS]:
    feed = feedparser.parse(feed_url)
    for entry in feed["entries"][:MAX_ENTRIES]:
        page_urls.append(entry["link"])
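
The loop above assumes every entry has a `link` and that no story appears in more than one feed. As a hedged sketch of a slightly more defensive variant, the hypothetical helper `collect_links` below skips entries without a link and drops duplicates while preserving order. It works on plain dictionaries shaped like `feedparser` results, and the sample URLs are made up.

```python
def collect_links(feeds, max_feeds, max_entries):
    """Collect unique article links from already-parsed feeds, in order."""
    seen = set()
    links = []
    for feed in feeds[:max_feeds]:
        for entry in feed.get("entries", [])[:max_entries]:
            link = entry.get("link")
            # Skip entries with no link and links we have already collected
            if link and link not in seen:
                seen.add(link)
                links.append(link)
    return links

# Stand-in data shaped like feedparser output (hypothetical URLs)
sample_feeds = [
    {"entries": [{"link": "https://example.com/a"},
                 {"title": "entry without a link"},
                 {"link": "https://example.com/b"}]},
    {"entries": [{"link": "https://example.com/a"},
                 {"link": "https://example.com/c"}]},
]
print(collect_links(sample_feeds, max_feeds=10, max_entries=3))
```

In the notebook, you would pass in `[feedparser.parse(url) for url in feed_urls]` together with `MAX_FEEDS` and `MAX_ENTRIES`.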

## Extract article text

Pass each article URL collected in the previous step to Jina Reader, which returns the text of the article without clutter such as sidebars, headers, and footers.

In [None]:
articles = []

for url in page_urls:
    print(f"Processing {url}")
    reader_url = f"https://r.jina.ai/{url}"
    article = requests.get(reader_url)
    articles.append(article.text)
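
The loop above stores whatever comes back, even if a request fails or hangs. The sketch below shows one way to add a timeout and skip failed URLs. It is an illustration rather than the notebook's own method: `fetch_articles` is a hypothetical name, the 30-second timeout is an arbitrary assumption, and the HTTP function is passed in as `get` so that you can supply `requests.get`.

```python
READER_PREFIX = "https://r.jina.ai/"

def fetch_articles(urls, get, timeout=30):
    """Fetch article text via Jina Reader, skipping URLs that fail.

    `get` is any callable with the same interface as `requests.get`.
    """
    articles = []
    for url in urls:
        try:
            response = get(READER_PREFIX + url, timeout=timeout)
            # Raise for 4xx/5xx responses instead of keeping an error page
            response.raise_for_status()
        except Exception as error:
            # In the notebook this could be narrowed to requests.RequestException
            print(f"Skipping {url}: {error}")
            continue
        articles.append(response.text)
    return articles
```

A call such as `articles = fetch_articles(page_urls, requests.get)` would replace the loop above.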

## Summarize each article

Pass each article text to a customized Prompt-as-a-Service on PromptPerfect, generating a summary of each.

Since we're using several Prompt-as-a-Service endpoints, let's define one function that we can use throughout the notebook:
- The function's `prompt_id` parameter defines which prompt we call. Each Prompt-as-a-Service has a unique ID.
- The `template_dict` parameter lets us fill in the prompt's variables, like the article text or the concatenated summaries.

In [None]:
def get_paas_response(prompt_id, template_dict):
    url = f"https://api.promptperfect.jina.ai/{prompt_id}"

    headers = {
        "x-api-key": f"token {PROMPTPERFECT_KEY}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, headers=headers, json={"parameters": template_dict})
    if response.status_code == 200:
        text = response.json()["data"]
        return text
    else:
        return response.text
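
Note that on a non-200 response the function above returns the raw error text, which the later cells would treat as a summary. The following is a hedged sketch of a stricter variant that raises instead. The name `get_paas_response_strict`, the explicit `api_key` and `post` parameters, and the 60-second timeout are assumptions for illustration; the URL, headers, and payload mirror the original function.

```python
def get_paas_response_strict(prompt_id, template_dict, api_key, post):
    """Call a Prompt-as-a-Service and raise if the call does not succeed.

    `post` is any callable with the same interface as `requests.post`.
    """
    url = f"https://api.promptperfect.jina.ai/{prompt_id}"
    headers = {
        "x-api-key": f"token {api_key}",
        "Content-Type": "application/json",
    }
    response = post(url, headers=headers,
                    json={"parameters": template_dict}, timeout=60)
    if response.status_code != 200:
        # Surface the failure instead of returning it as if it were a result
        raise RuntimeError(
            f"Prompt {prompt_id} failed ({response.status_code}): {response.text}"
        )
    return response.json()["data"]
```

In the notebook you would call it with `api_key=PROMPTPERFECT_KEY` and `post=requests.post`, and decide per article whether to retry or skip when it raises.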

In [None]:
summaries = []

for article in articles:
    summary = get_paas_response(prompt_id="mkuMXLdx1kMU0Xa8l19A", template_dict={"article": article})
    summaries.append(summary)

In [None]:
summaries

In [None]:
# Put all of the summaries into one text string as bullet points
# (prefix every summary, including the first, with "- ")
concat_summaries = "\n".join(f"- {summary}" for summary in summaries)

## Convert summaries to news report script

Use another Prompt-as-a-Service to generate a natural-sounding news report from the summaries.

In [None]:
news_script = get_paas_response(prompt_id="tmW07mipzJ14HgAjOcfD", template_dict={"summaries": concat_summaries})

## Convert news report script to speech

Use the Bark model on Replicate to convert the text to natural-sounding speech.

**Note:** The long `b76242...` string is required to make the model work. It's not an API key; it identifies the specific version of the Bark model to run on Replicate.

In [None]:
!pip install replicate

In [None]:
import datetime

import replicate

def get_voice(script, voice="announcer"):
    """Convert `script` to speech with Bark and return the path of the saved WAV file."""
    input_data = {
        "prompt": script,
        "history_prompt": voice,
    }

    # The string after the colon pins a specific version of the Bark model
    output = replicate.run(
        "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
        input=input_data
    )

    os.makedirs("output", exist_ok=True)

    # Name the file after the current date and time; avoid ":" because
    # it is not allowed in file names on Windows
    now = datetime.datetime.now()
    filename = f'output/{now.strftime("%Y-%m-%d-%H-%M.wav")}'

    # Download the generated audio and write it to disk
    response = requests.get(output["audio_out"])

    with open(filename, "wb") as file:
        file.write(response.content)

    return filename
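
Because the file name is built from a timestamp with minute precision, two runs within the same minute write to the same file. As a small sketch of one way around that, the hypothetical helper `timestamped_path` below includes seconds and uses only characters that are safe in file names on common platforms. It does not create the directory, so the caller still needs `os.makedirs`.

```python
import datetime
from pathlib import Path

def timestamped_path(directory="output", suffix=".wav", now=None):
    """Build a path like output/2024-01-02-03-04-05.wav (no colons)."""
    now = now or datetime.datetime.now()
    return Path(directory) / f"{now:%Y-%m-%d-%H-%M-%S}{suffix}"

# A fixed datetime makes the result reproducible for this example
path = timestamped_path(now=datetime.datetime(2024, 1, 2, 3, 4, 5))
print(path.name)  # 2024-01-02-03-04-05.wav
```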

### Try snippet first

We don't want to spend too much money on the Replicate API, so we'll split the news script into sentences (more or less) and take only the first few to start (based on `VOICE_MAX_SENTENCES`, defined earlier in the notebook).

In [None]:
chunks = news_script.split(". ")

In [None]:
snippet = ". ".join(chunks[:VOICE_MAX_SENTENCES])
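
Splitting on `". "` ignores sentences that end in `!` or `?` and drops the final period of the snippet. The sketch below is one possible alternative using a regular expression that keeps the closing punctuation attached. The helper name `first_sentences` is an assumption, and the approach is still approximate: abbreviations such as "U.S." followed by a space will be treated as sentence endings.

```python
import re

def first_sentences(text, max_sentences):
    """Return the first `max_sentences` sentences of `text` as one string."""
    # Split on whitespace that follows ., ! or ?; the punctuation stays
    # attached to the sentence it ends
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

script = "Good evening. Here is the news! Is it good? We shall see. Goodbye."
print(first_sentences(script, 3))
# Good evening. Here is the news! Is it good?
```

In the notebook this would be called as `first_sentences(news_script, VOICE_MAX_SENTENCES)`.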

In [None]:
snippet

In [None]:
voice = get_voice(snippet)

## Play audio

In [None]:
from IPython.display import Audio

In [None]:
Audio(voice, autoplay=True)

## Next steps

We're not going to dive into how to turn this content into a podcast. There are plenty of better guides out there that can cover that!