# Your personalized news podcast with Jina Reader, PromptPerfect, and Bark

This notebook will:

- Scrape your RSS feeds for the latest news articles
- Summarize each article
- Generate a one-paragraph news report from those summaries
- Read it to you via text-to-speech

## Define some settings

These settings define:

- **Feed URLs**: The feed URLs you want to extract data from. In this example they're from a couple of tech news websites.
- **Maximum quantities**: To keep this example manageable, we want to limit things to a certain number of feeds, news items, and sentences in the spoken output.

In [133]:
feed_urls = [
    "https://www.osnews.com/feed/",
    "https://www.theregister.com/headlines.atom"
]

In [134]:
# Maximum number of feeds to fetch
MAX_FEEDS = 10

# Maximum news items per feed to fetch
MAX_ENTRIES = 3

# Maximum sentences of the news script to convert to speech
VOICE_MAX_SENTENCES = 7

## Add keys

Enter your API keys for PromptPerfect and Replicate. `getpass` prompts for each key without echoing it to the screen.

In [58]:
import getpass
PROMPTPERFECT_KEY = getpass.getpass()

 ········


In [106]:
import os

os.environ["REPLICATE_API_TOKEN"] = getpass.getpass()

 ········


## Get article URLs from feeds

Extract the latest stories from the feeds we defined.

In [None]:
!pip install feedparser

In [135]:
import feedparser, requests

In [136]:
page_urls = []

In [137]:
for feed_url in feed_urls[:MAX_FEEDS]:
    feed = feedparser.parse(feed_url)
    for entry in feed["entries"][:MAX_ENTRIES]:
        page_urls.append(entry["link"])
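
The loop above assumes every entry has a `link` and that no two feeds carry the same story. As a minimal sketch of a more defensive version, the hypothetical helper `collect_links` below skips entries without a link and drops duplicates. Plain dictionaries with made-up URLs stand in for parsed feeds so the example runs without network access.

```python
def collect_links(feeds, max_feeds=10, max_entries=3):
    """Return unique article links: up to max_entries from each of the first max_feeds feeds."""
    seen = set()
    links = []
    for feed in feeds[:max_feeds]:
        for entry in feed.get("entries", [])[:max_entries]:
            link = entry.get("link")
            # Skip entries with no link and links we have already collected
            if link and link not in seen:
                seen.add(link)
                links.append(link)
    return links


# Stand-in data shaped like parsed feeds (hypothetical URLs)
sample_feeds = [
    {"entries": [{"link": "https://example.com/a"},
                 {"title": "entry without a link"},
                 {"link": "https://example.com/b"}]},
    {"entries": [{"link": "https://example.com/a"},
                 {"link": "https://example.com/c"}]},
]

print(collect_links(sample_feeds, max_feeds=2, max_entries=3))
```

In the notebook itself, the list passed in would be something like `[feedparser.parse(u) for u in feed_urls]`.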

## Extract article text

Pass each URL to Jina Reader to extract the text of the article without clutter such as sidebars, headers, and footers.

In [138]:
articles = []

for url in page_urls:
    print(f"Processing {url}")
    reader_url = f"https://r.jina.ai/{url}"
    article = requests.get(reader_url)
    articles.append(article.text)

Processing https://www.osnews.com/story/139354/microsoft-shows-banner-in-settings-app-to-push-users-from-local-accounts-to-microsoft-accounts/
Processing https://www.osnews.com/story/139351/gtk-graphics-offload-revisited/
Processing https://www.osnews.com/story/139349/google-is-combining-its-android-and-hardware-teams-and-its-all-about-ai/
Processing https://go.theregister.com/feed/www.theregister.com/2024/04/19/germany_arrests_alleged_russian_spies/
Processing https://go.theregister.com/feed/www.theregister.com/2024/04/19/wing_commander_windows_95/
Processing https://go.theregister.com/feed/www.theregister.com/2024/04/19/uk_smart_meters_pac/
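
A single failed request in the loop above would stop the whole run. The sketch below shows one way to keep going: the download step is passed in as a function, and URLs that fail are recorded instead of raising. The names `reader_url`, `fetch_articles`, and `fake_fetch` are hypothetical, and the fake fetcher exists only so the example runs offline.

```python
def reader_url(page_url):
    """Prefix a page URL with the Jina Reader endpoint used in this notebook."""
    return f"https://r.jina.ai/{page_url}"


def fetch_articles(page_urls, fetch):
    """Call fetch(url) -> text for each page; collect failures instead of raising."""
    texts = []
    failed = []
    for page_url in page_urls:
        try:
            texts.append(fetch(reader_url(page_url)))
        except Exception as error:  # broad on purpose: any failure skips this URL
            failed.append((page_url, str(error)))
    return texts, failed


# In the notebook, `fetch` could wrap requests, for example:
#   def fetch(url):
#       response = requests.get(url, timeout=30)
#       response.raise_for_status()
#       return response.text

def fake_fetch(url):
    """Offline stand-in that fails for any URL containing 'broken'."""
    if "broken" in url:
        raise RuntimeError("simulated network error")
    return f"text of {url}"


demo_texts, demo_failed = fetch_articles(
    ["https://example.com/ok", "https://example.com/broken"], fake_fetch
)
print(demo_texts)
print(demo_failed)
```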


## Summarize each article

Pass each article text to a customized Prompt-as-a-Service on PromptPerfect, generating a shorter summary of each.

Since we're using several prompts-as-services, let's define one function that we can use throughout the script.

In [139]:
def get_paas_response(id, template_dict):
    url = f"https://api.promptperfect.jina.ai/{id}"

    headers = {
        "x-api-key": f"token {PROMPTPERFECT_KEY}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, headers=headers, json={"parameters": template_dict})
    if response.status_code == 200:
        text = response.json()["data"]
        return text
    else:
        return response.text
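
Note that `get_paas_response` returns the raw response body when the status code is not 200, so a failed call would be stored as if it were a summary. A minimal sketch of a stricter alternative follows. `PaasError` and `parse_paas_response` are hypothetical names, and the only assumption about the response format is the `"data"` field already used above.

```python
import json


class PaasError(RuntimeError):
    """Hypothetical exception for a failed Prompt-as-a-Service call."""


def parse_paas_response(status_code, body_text):
    """Return the "data" field of a successful response, otherwise raise PaasError."""
    if status_code != 200:
        # Include only the start of the body to keep the message readable
        raise PaasError(f"HTTP {status_code}: {body_text[:200]}")
    return json.loads(body_text)["data"]


print(parse_paas_response(200, '{"data": "A short summary."}'))

try:
    parse_paas_response(401, "invalid key")
except PaasError as error:
    print(f"Skipped: {error}")
```

With this approach the calling loop can catch the exception and skip the article rather than passing an error message on to the news script.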

In [140]:
summaries = []

for article in articles:
    summary = get_paas_response("mkuMXLdx1kMU0Xa8l19A", {"article": article})
    summaries.append(summary)

In [141]:
summaries

['Microsoft is rolling out account-related notifications in the Windows 10 Settings app encouraging users to switch from local accounts to Microsoft accounts, which offer data backups, subscription management, and added security features. The banners are removable and can be disabled, but the push towards Microsoft accounts may become more persistent in the future.',
 'The article discusses the reintroduction of support for dmabufs and graphics offload in GTK 4.14, with updates and improvements since its initial introduction last fall. The author, Thom Holwerda, provides an overview of the changes and enhancements that have been made, pointing readers to the GTK Development Blog for more detailed information. The article is aimed at readers with a technical understanding of graphics offload technology.',
 'Google is undergoing significant internal reorganizations to focus on AI, with the creation of a new team called "Platforms and Devices" overseen by Rick Osterloh. This team will man

In [154]:
# Put all of the summaries into one text string as bullet points
concat_summaries = "\n".join(f"- {summary}" for summary in summaries)
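
To see what the bulleted string looks like, here is a small sketch with made-up summaries. The hypothetical helper `to_bullets` also collapses stray whitespace and drops blank entries, which the notebook cell does not do. Each item is prefixed individually because `str.join` only places its separator between items.

```python
def to_bullets(summaries):
    """Format summaries as one "- " bullet per line, skipping blank entries."""
    # Collapse internal whitespace (including newlines) so each bullet stays on one line
    cleaned = (" ".join(summary.split()) for summary in summaries)
    return "\n".join(f"- {summary}" for summary in cleaned if summary)


print(to_bullets(["First summary.", "   ", "Second\nsummary."]))
```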

## Convert summaries to news script

Use another Prompt-as-a-Service to generate a natural-sounding news report from the summaries.

In [144]:
news_script = get_paas_response("tmW07mipzJ14HgAjOcfD", {"summaries": concat_summaries})

## Convert script to speech

Use the Bark model on Replicate to convert the text to natural-sounding speech.

In [None]:
!pip install replicate

In [145]:
import replicate

def get_voice(script, voice="announcer"):
    input_data = {
        "prompt": script,
        "history_prompt": voice,
    }
    
    output = replicate.run(
        "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
        input=input_data
    )

    os.makedirs("output", exist_ok=True)

    import datetime

    # Get current date and time
    now = datetime.datetime.now()
    
    # Format the date and time (no ":" because Windows filenames cannot contain it)
    filename = f'output/{now.strftime("%Y-%m-%d-%H-%M.wav")}'

    response = requests.get(output["audio_out"])

    with open(filename, "wb") as file:
        file.write(response.content)
    
    return filename
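
Timestamped filenames are easy to get subtly wrong: colons are not allowed in Windows filenames, and a name with only minute resolution is overwritten by a second run in the same minute. A sketch that avoids both is shown below. `make_output_path` is a hypothetical helper, not part of the notebook.

```python
import datetime
from pathlib import Path


def make_output_path(now, directory="output"):
    """Build a .wav path from a datetime, using only hyphens and second resolution."""
    return Path(directory) / now.strftime("%Y-%m-%d-%H-%M-%S.wav")


# A fixed datetime keeps the example reproducible
example = make_output_path(datetime.datetime(2024, 4, 19, 9, 5, 7))
print(example.as_posix())
```

Inside `get_voice`, the call would be `make_output_path(datetime.datetime.now())`, and the returned `Path` can be passed straight to `open`.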

### Try a snippet first

We don't want to spend too much money on the Replicate API, so we'll split the news script into sentences (more or less) and convert only the first few, as set by `VOICE_MAX_SENTENCES`, defined earlier in the notebook.

In [146]:
chunks = news_script.split(". ")

In [147]:
snippet = ". ".join(chunks[:VOICE_MAX_SENTENCES])
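
Splitting on `". "` ignores sentences that end in `?` or `!`, and rejoining the chunks leaves the last kept sentence without its full stop. As a rough alternative, the sketch below splits on whitespace that follows `.`, `!`, or `?`, so the punctuation stays attached. `first_sentences` is a hypothetical helper, and it is still only approximate: abbreviations such as "U.S. " are treated as sentence ends.

```python
import re


def first_sentences(text, max_sentences):
    """Return roughly the first max_sentences sentences of text."""
    # Split on whitespace preceded by ., ! or ?; the lookbehind keeps the punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


print(first_sentences("One. Two! Three? Four.", 2))
```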

In [148]:
snippet

"On today's tech news roundup, Microsoft is encouraging Windows 10 users to switch to Microsoft accounts for added security and features, while Google undergoes internal reorganizations to focus on AI integration in its products, raising concerns about trustworthiness and usability. In a shocking development, German authorities have arrested two German-Russian citizens suspected of being Russian spies planning to bomb industrial and military targets in support of Ukraine against Vladimir Putin's invasion, highlighting the escalating tensions in the region. Meanwhile, the UK government is facing challenges as millions of smart meters will become obsolete once 2G and 3G networks are switched off, with concerns about the cost implications for consumers. And in a blast from the past, Wing Commander III: Heart of the Tiger played a crucial role in testing Windows 95, revealing an issue with the copy hotkey that was fixed thanks to the dedication of testers. Stay tuned for more updates on th

In [149]:
voice = get_voice(snippet)

## Play audio

In [150]:
from IPython.display import Audio

In [152]:
Audio(voice, autoplay=True)

## Next steps

We're not going to dive into how to turn this content into a podcast. There are plenty of better guides out there that can cover that!