🌟 Exercise 4 : Scrape And Categorize News Articles From A JavaScript-Enabled News Site

Task:

Visit a section of a news website (e.g., the Technology section of BBC News).
Scrape news article titles and their publication dates.
Categorize articles based on their publication month.

Instructions:

Use Selenium to navigate to a specific news section on the website.
Extract and parse the HTML content that is dynamically loaded via JavaScript.
Using BeautifulSoup, extract news article titles and publication dates.
Categorize articles by their publication month (e.g., ‘January’, ‘February’).
Print the categorized lists of articles.
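
The instructions above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a verified scraper for any particular site: the helper names (`fetch_rendered_html`, `extract_articles`, `group_by_month`, `main`), the CSS selectors, and the page URL in `main` are all hypothetical, and the sketch assumes each article card exposes its date in a `<time datetime="...">` element. It also needs Selenium with a local Chrome install.

```python
from collections import defaultdict
from datetime import datetime


def fetch_rendered_html(url, wait_selector, timeout=10):
    """Open url in headless Chrome and return the HTML after JavaScript has run."""
    # Selenium is imported lazily so the helpers below work without a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one matching element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_articles(html, card_selector, title_selector):
    """Return (title, datetime) pairs for cards with a title and a dated <time> tag."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for card in soup.select(card_selector):
        title_tag = card.select_one(title_selector)
        time_tag = card.select_one("time[datetime]")
        if title_tag is None or time_tag is None:
            continue  # skip cards without a title or a machine-readable date
        # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
        stamp = time_tag["datetime"].replace("Z", "+00:00")
        articles.append((title_tag.get_text(strip=True), datetime.fromisoformat(stamp)))
    return articles


def group_by_month(articles):
    """Group (title, datetime) pairs under the full month name."""
    grouped = defaultdict(list)
    for title, published in articles:
        grouped[published.strftime("%B")].append((title, published.date()))
    return dict(grouped)


def main():
    # Hypothetical URL and selectors: inspect the live page and adjust them.
    html = fetch_rendered_html("https://www.bbc.com/news/technology", "article")
    grouped = group_by_month(extract_articles(html, "article", "h2"))
    for month, items in grouped.items():
        print(f"Articles Published in {month}:")
        for idx, (title, date) in enumerate(items, 1):
            print(f"{idx}. [{date}] {title}")
        print('-' * 60)
```

Calling `main()` would run the whole flow. Keeping the browser step, the parsing step, and the grouping step in separate functions means the last two can be checked against saved HTML without launching Chrome.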

In [6]:
import requests
from bs4 import BeautifulSoup
from collections import defaultdict
from datetime import datetime

# This cell reads the section's RSS feed, which is plain XML, so a simple
# HTTP request is enough and no browser automation is involved.
url = 'https://feeds.bbci.co.uk/news/technology/rss.xml'

response = requests.get(url, timeout=10)
# Raise an HTTPError on a 4xx/5xx response rather than exiting the interpreter.
response.raise_for_status()

# The 'xml' parser requires the lxml package.
soup = BeautifulSoup(response.content, 'xml')

# Each <item> element describes one article.
items = soup.find_all('item')

# Maps month name -> list of (title, publication date).
categorized_articles = defaultdict(list)

for item in items:
    title = item.title.text.strip()
    pub_date = item.pubDate.text.strip()
    # Feed dates follow the pattern 'Mon, 23 Sep 2024 10:15:00 GMT'.
    date_obj = datetime.strptime(pub_date, '%a, %d %b %Y %H:%M:%S %Z')
    month = date_obj.strftime('%B')

    categorized_articles[month].append((title, date_obj.date()))

for month, articles in categorized_articles.items():
    print(f"Articles Published in {month}:")
    for idx, (title, date) in enumerate(articles, 1):
        print(f"{idx}. [{date}] {title}")
    print('-' * 60)

Articles Published in September:
1. [2024-09-23] Telegram will now provide some user data to authorities
2. [2024-09-23] CrowdStrike: Company to face questions over global IT outage
3. [2024-09-21] MrBeast is YouTube's biggest star - now he faces 54-page lawsuit
4. [2024-09-20] A Tamagotchi comeback? Toy gets first UK store as global sales double
5. [2024-09-20] Sky Glass customers complain as TVs won't turn on
6. [2024-09-20] LinkedIn suspends AI training using UK user data
7. [2024-09-20] Taiwan says it did not make Hezbollah pager parts
8. [2024-09-19] 'Hunger Games' studio Lionsgate announce AI video deal
9. [2024-09-19] Nintendo sues 'Pokémon with guns' video game firm
10. [2024-09-19] Brazil fines Musk's X for site's return after ban
11. [2024-09-18] MrBeast and Amazon named in lawsuit over Beast Games
12. [2024-09-18] Google scores rare legal win as 1.49bn euro fine scrapped
13. [2024-09-17] Instagram boosts privacy and parental control on teen accounts
14. [2024-09-18] The Pluc