# TruthLens - Data Collection

TruthLens is a project developed for the BSc. Computer Science (Data Science) Final Project (CM3070) at the University of London. TruthLens is based on the Fake News Detection template. 

## Project Objectives
The primary objective of this project is to build a two-stage pipeline for misinformation classification:

1. Binary classification (Stage 1): Distinguish between real news and misinformation using the ISOT dataset. This ensures robust detection at the first stage, leveraging an established dataset.
2. Multi-class classification (Stage 2): Further classify content identified as misinformation into one of seven categories, based on Molina et al.’s taxonomy. A custom dataset will support this nuanced classification.

The scope of the project is limited to text-based, English language content, explicitly excluding images and videos. A user interface will also be developed, enabling users to input articles or URLs and receive classification results.

A secondary objective is to enhance the explainability of classification results, aiming to provide users with interpretable insights into why content was classified in a particular way.

The project aims for high accuracy and reliability, with measurable performance goals. Ethical considerations, including bias mitigation and responsible dataset usage, will guide the design and implementation of the pipeline.

## Custom dataset generation
As outlined in the previous section, the second stage of the pipeline relies upon a custom dataset, labelled with the categories from the Molina et al. Misinformation Taxonomy. These classes are summarised in the table below. The aim of this stage is to create a balanced dataset with 200 pieces of content for each of the 7 categories. 

| Misinformation Type | Characteristics | Example |
|:--------------|:---------------|:-------|
| Fabricated content | Completely false content created with the intent to deceive.| Fake reports of events that never occurred; entirely false claims about public figures |
|Polarised content |True events or facts presented selectively to promote a biased narrative, often omitting critical context. |Partisan news articles highlighting one side of a political argument while ignoring counterpoints.|
|Satire |Content intended to entertain or provoke thought through humour, exaggeration, or irony. Often misunderstood. |Satirical articles from outlets like “The Onion” being shared as if they are factual news.|
|Misreporting | Incorrect information shared unintentionally, often due to errors or lack of verification. | A news outlet incorrectly reporting election results due to early or inaccurate data.|
|Commentary |Opinion-based content reflecting the writer’s interpretation or viewpoint, often lacking factual grounding. |Editorials or blogs expressing subjective opinions without substantial evidence.|
|Persuasive information |Content designed to persuade or influence the audience, often including marketing and propaganda. |Politically motivated propaganda campaigns, advertisements disguised as objective news articles.|
|Citizen journalism | User-generated content that may lack professional journalistic standards, leading to error or bias. |Social media posts about breaking news that spread unverified or incorrect details.|

Data will be scrapped from relevant websites for each category, then manually reviewed to ensure that it fits the category. Relevant features and labelling guidelines can be found for each category below.

In [1]:
#Imports and helper functions
import requests
import json
from bs4 import BeautifulSoup
import csv
import pandas as pd
pd.set_option('display.max_colwidth', None)
import re
import string
import nltk
from nltk.corpus import stopwords

def preprocess_text(text):
    """
        Preprocesses a given text string by applying the following steps:
        1. Converts the text to lowercase.
        2. Removes punctuation marks.
        3. Tokenizes the text into individual words.
        4. Removes stopwords (common words that add little value to classification tasks).

        Parameters:
        ----------
        text : str
            The input text string to preprocess.

        Returns:
        -------
        str
            The cleaned and preprocessed text, with tokens joined back into a single string.
    """
    stop_words = set(stopwords.words('english'))
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    tokens = [word for word in text.split() if word not in stop_words]
    return ' '.join(tokens)

### 2. Polarised content
Polarised content is true events or facts selectively presented to promote a biased narrative, often omitting critical context.

##### Features:
- Partial Truth: The piece is based on a real event, statistic, or quote.
- Omission / Distortion: The content emphasizes certain facts while ignoring or minimizing others, creating a skewed impression.
- Strong Bias: The language or framing clearly supports one political, ideological, or partisan stance, rather than offering balanced coverage.

##### Label if:
- The article references real events but uses them to push a strong, one-sided narrative.
- The content focuses on data or testimonies that bolster a specific stance while disregarding contradictory evidence.
- The tone or style is heavily partisan and attempts to sway opinion by selective fact usage rather than outright fabrication.

##### Do Not Label if:
- The core facts are outright false (label as Fabricated).
- It is primarily personal opinion or commentary without strong factual references (label as Commentary).
- It is purely an attempt at persuasion or advertising without misrepresenting an event (label as Persuasive).

##### Sources:
- The Conservative Woman (UK, Right leaning) https://www.conservativewoman.co.uk/
- The Canary (UK, Left leaning) https://www.thecanary.co/uk/
- Breitbart (USA, Right leaning) https://www.breitbart.com/
- Daily Kos (USA, Left leaning) https://www.dailykos.com/

**The Conservative Woman**

Articles were scraped from the weekly "Our Top Ten Articles of the Week" series, starting from the January 11, 2025 edition (https://www.conservativewoman.co.uk/tcw-our-top-ten-articles-of-the-week-9/). 

A large number of articles were skipped. "Features" and "Family and Faith" articles were skipped as they are not news. Many of the other articles did not meet the criteria for labelling, instead falling under Commentary, for example: https://www.conservativewoman.co.uk/wind-turbines-and-a-voice-in-the-wilderness/ These were primarily recognised by a focus on "I" and "me" in the text.

In [28]:
def scrape_tcw_article(url):
    """
    Scrapes an article from a given URL on conservativewoman.co.uk and extracts relevant information.

    Parameters:
    ----------
    url : str
        The URL of the article to scrape.

    Returns:
    -------
    dict
        A dictionary containing the extracted article data.
    """
    article_data = {
        "title": "",
        "text": "",
        "site": "",
        "date": "",
        "category": "",
        "class": "Polarised", 
        "url": url
    }

    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        
        soup = BeautifulSoup(response.content, 'html.parser')

        # Title
        title_meta = soup.find('meta', property='og:title')
        article_data["title"] = title_meta['content'] if title_meta else "Title not found"
        # Remove the trailing site name
        if article_data["title"].endswith(" - The Conservative Woman"):
            article_data["title"] = article_data["title"].replace(" - The Conservative Woman", "")
        
        # URL
        url_meta = soup.find('meta', property='og:url')
        article_data["url"] = url_meta['content'] if url_meta else url  # Fallback to input URL
        
        # Site name
        site_name_meta = soup.find('meta', property='og:site_name')
        article_data["site"] = site_name_meta['content'] if site_name_meta else "Site name not found"
        
        # Published date
        published_date_meta = soup.find('meta', property='article:published_time')
        article_data["date"] = published_date_meta['content'] if published_date_meta else "Published date not found"
        
        # Category
        yoast_script = soup.find("script", class_="yoast-schema-graph", type="application/ld+json")
        if yoast_script:
            try:
                data_json = json.loads(yoast_script.string)
                for node in data_json.get("@graph", []):
                    if node.get("@type") == "Article":
                        art_sec = node.get("articleSection", None)
                        if art_sec:
                            if isinstance(art_sec, list):
                                article_data["category"] = art_sec[0]
                            else:
                                article_data["category"] = art_sec
                        break
            except json.JSONDecodeError:
                print("Could not parse the JSON-LD correctly.")
        
        # Article copy
        content_div = soup.find("div", class_=lambda c: c and "td-post-content" in c)
        if content_div:
            # Collect paragraphs
            paragraphs = content_div.find_all("p")
            text_list = []
            for p in paragraphs:
                text = p.get_text(strip=True)
                # End before the donation paragraph
                if text.startswith("If you appreciated this article, perhaps you might consider making a donation"):
                    break  
                text_list.append(text)
            #join all paragraphs together
            full_text = " ".join(text_list).strip()
            # Remove web addresses using a regex
            full_text = re.sub(r'https?://\S+', '', full_text)    
            article_data["text"] = full_text
        else:
            article_data["text"] = "Article text not found"
    
    else:
        print(f"Failed to fetch the webpage: {url}. Status code: {response.status_code}")

    return article_data



def scrape_multiple_tcw_articles(urls):
    """
    Scrapes multiple articles from a list of URLs and stores the data in a DataFrame.

    Parameters:
    ----------
    urls : list
        A list of article URLs to scrape.

    Returns:
    -------
    pd.DataFrame
        A DataFrame containing the scraped data from all URLs.
    """
    articles = []
    for url in urls:
        article = scrape_tcw_article(url)
        articles.append(article)
    return pd.DataFrame(articles)


# List of URLs to scrape
urls = ["https://www.conservativewoman.co.uk/the-uk-grooming-gang-scandal-is-a-galileo-moment/",
        "https://www.conservativewoman.co.uk/progressive-contempt-for-the-white-working-class/",
        "https://www.conservativewoman.co.uk/how-dare-starmer-reject-a-public-inquiry-into-muslim-grooming-gangs/",
        "https://www.conservativewoman.co.uk/a-quad-demic-christmas-blown-out-of-proportion-and-a-happy-new-year/",
        "https://www.conservativewoman.co.uk/how-labour-is-seizing-more-control-over-our-children/",
        "https://www.conservativewoman.co.uk/david-keighley-multiculturalism-slays-rotherhams-young-girls/",
        "https://www.conservativewoman.co.uk/nelson-turns-a-blind-eye-to-broken-britain/",
        "https://www.conservativewoman.co.uk/major-study-confirms-covid-jab-harms-mental-health/",
        "https://www.conservativewoman.co.uk/why-was-saras-sadistic-murderous-father-not-stopped-we-all-know-why/",
        "https://www.conservativewoman.co.uk/stories-from-the-illegal-migrant-frontline/",
        "https://www.conservativewoman.co.uk/the-climate-scaremongers-met-office-fiddles-the-figures-over-storm-darragh/",
        "https://www.conservativewoman.co.uk/electric-armoured-vehicles-net-zero-chance-of-that/",
        "https://www.conservativewoman.co.uk/war-on-microbes-the-murky-agenda-behind-the-covid-pandemic/",
        "https://www.conservativewoman.co.uk/this-methane-nonsense-is-a-nasty-protection-racket/",
        "https://www.conservativewoman.co.uk/why-nhs-staff-are-shunning-the-vaccines/",
        "https://www.conservativewoman.co.uk/why-report-the-mass-rape-of-white-schoolgirls-when-you-can-pick-on-gregg-wallace/",
        "https://www.conservativewoman.co.uk/debunked-the-great-diversity-equity-and-inclusion-myth/",
        "https://www.conservativewoman.co.uk/marvel-at-trumps-resurgent-america-weep-for-starmers-desolate-britain/",
        "https://www.conservativewoman.co.uk/exit-june-raine-pursued-by-bare-faced-lies/",
        "https://www.conservativewoman.co.uk/from-1991-a-prescient-warning-about-the-globalist-agenda/",
        "https://www.conservativewoman.co.uk/beware-sir-keir-beware-this-petition-is-the-tip-of-the-iceberg/",
        "https://www.conservativewoman.co.uk/the-blackrock-connection-and-a-nightmare-for-farmers/",
        "https://www.conservativewoman.co.uk/one-of-the-worlds-oldest-christian-communities-faces-destruction/",
        "https://www.conservativewoman.co.uk/covid-didnt-cause-surge-in-excess-deaths-the-pandemic-response-did/",
        "https://www.conservativewoman.co.uk/a-book-to-destroy-faith-in-doctors-for-ever/",
        "https://www.conservativewoman.co.uk/staggering-ignorance-that-scuppered-sterling-and-the-stock-exchange/",
        "https://www.conservativewoman.co.uk/oxbridge-and-the-cancellation-of-kindness/",
        "https://www.conservativewoman.co.uk/msm-silence-as-health-coalition-urges-governments-stop-the-jabs-now/",
        "https://www.conservativewoman.co.uk/methane-reducing-feed-additive-trialled-in-arla-dairy-farms/",
        "https://www.conservativewoman.co.uk/still-time-to-pull-back-from-slippery-suicide-slope/",
        "https://www.conservativewoman.co.uk/this-ghastly-assisted-suicide-bill-strikes-against-decency-and-genuine-choice/",
        "https://www.conservativewoman.co.uk/cop29-reveals-itself-as-the-great-fraud-it-always-was/",
        "https://www.conservativewoman.co.uk/david-keighley-was-right-everything-he-warned-about-hate-crime-has-come-to-pass/",
        "https://www.conservativewoman.co.uk/the-climate-scaremongers-bbc-admits-it-lied-about-vanishing-polar-bears/",
        "https://www.conservativewoman.co.uk/a-vaccine-guinea-pigs-heartrending-chronicle-of-a-life-destroyed/",
        "https://www.conservativewoman.co.uk/muslim-rioters-rampage-with-police-blessing-today-amsterdam-tomorrow-britain/",
        "https://www.conservativewoman.co.uk/a-warning-shot-writ-large-but-putin-wont-attack-the-west/",
        "https://www.conservativewoman.co.uk/my-choice-as-world-leader-of-the-century-netanyahu/",
        "https://www.conservativewoman.co.uk/badenoch-should-listen-to-clarkson-and-learn/",
        "https://www.conservativewoman.co.uk/you-must-be-bad-if-even-the-amish-are-up-in-arms/",
        "https://www.conservativewoman.co.uk/trumps-multiethnic-winning-coalition/",
        "https://www.conservativewoman.co.uk/the-billions-upon-billions-wasted-on-useless-face-masks/",
        "https://www.conservativewoman.co.uk/why-the-law-is-stacked-against-right-thinkers/",
        "https://www.conservativewoman.co.uk/whats-the-real-reason-theyre-going-after-allison-pearson/",
        "https://www.conservativewoman.co.uk/health-warrior-rfk-jr-faces-coalition-of-formidable-enemies/",
        "https://www.conservativewoman.co.uk/so-cruel-so-vulnerable-the-daycare-generation/",
        "https://www.conservativewoman.co.uk/killing-freedom-under-the-banner-of-public-health/",
        "https://www.conservativewoman.co.uk/the-climate-scaremongers-energy-operator-tells-miliband-your-plans-cannot-work/",
        "https://www.conservativewoman.co.uk/here-is-the-long-term-weather-report-same-old-same-old/",
        "https://www.conservativewoman.co.uk/revealed-pfizers-hidden-vaccine-injuries/",
]

# Provide a common browser user agent - otherwise the scraping fails
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    )
}

# Scrape articles and create a DataFrame
tcw_data_df = scrape_multiple_tcw_articles(urls)
# Store to CSV
tcw_data_df.to_csv("polarised_scraped_articles_tcw.csv", index=False)
# Print head 
tcw_data_df.head()

Unnamed: 0,title,text,site,date,category,class,url
0,The UK grooming gang scandal is a Galileo moment,"This article contains graphicdescriptions of a sort that we would not normally publish but which, in this case, are in the public interest to do so. THERE IS a simple reason that the UK’s grooming gangs scandal is so difficult for liberally minded people to wrap their minds around, and that reason is that the scandal is exactly what ‘the far right’ says it is: Mass, religiously and racially aggravated, rape of thousands of young white working class British girls by Muslim men from mostly Pakistani immigrant backgrounds, on an industrial and organised scale. Every racially aggravating factor you could imagine is present. First, there is substantial evidence that the young girls in question were targetedbecausethey were white. In these communities of Pakistani Muslim rapists, hardly any of the victims were Pakistani Muslim women. Second, the scale of the depravity and violence makes it clear that in many – if not a clear majority – of cases, the abuse was more than just about sex. It was about domination, enslavement, and humiliation. If you doubt me on this point, just readthis 2013 sentencing reportthat did the rounds on X (formerly twitter) in recent days. There’s a quote from it later in this article that sums up the kind of thing you’ll find in there. Third, there is overwhelming evidence that in many cases the abuse was covered up by the authorities and then downplayed by the media right across Britain because it was more ideologically convenient to ignore the mass rape of children than it was to confront the fact that a substantial section of the British Pakistani community harboured (and indeed still harbours) violently sexist and racist attitudes towards white non-muslim women in particular. The implications of that idea wereand aresimply too great for the liberal mind to contemplate its admission. If multiculturalism has a foundational myth, that myth is that all cultures are compatible with each other and that they can always peacefully and harmoniously co-exist in an open and democratic society. The British grooming gangs scandal shatters that myth. The United Kingdom, like most western democracies, is now a feminist society. Women have equal rights in law, and cultural and sexual politics have shifted dramatically over the past fifty years in favour of women. Pakistan remains a deeply entrenched patriarchal society where religious attitudes towards women, and particularly towards men’s right to access women sexually, dominate. Throw in multiculturalism and tolerance and ‘community cohesion’ as major values, and then mix in hard-line Muslim attitudes towards ‘kuffar’ women and women who show more of their flesh than a burkha permits, and the UK gets the mass rape of young white girls by Pakistani Muslim men who believe themselves to have both a cultural and a religious right to use those girls as they see fit. When liberally minded people critique the Catholic church, they often cite the imprisonment in 1633 by Pope Urban VIII of Galileo Galilei, as a result of the latter’s insistence that the earth is round and revolves around the Sun. When that story is recounted, it is presented as a morality tale about how the Roman Catholic Church is anti-science and anti-progress. But it is a human story at heart: Galileo presented evidence that shattered more than a millennium of religious certainty. Faced with evidence, Urban could either publicly accept that a thousand years of teaching had been wrong – risking people askingwhat elsewas wrong – or he could preserve the authority of the Church and silence the heretic. He chose to silence the heretic. The problem, if that analogy wasn’t clear enough for you, is that the UK’s grooming scandal shatters decades of liberal teaching about multiculturalism and might be expected to force people to admit that they might have been wrong. Given the choice between having that discussion on the one hand and writing off all those who raise it as heretics who need silencing on the other, modern British liberals have chosen the path of Urban. Thus, Tommy Robinson is a household name in Britain (and indeed Ireland) for his alleged ‘islamophobia’ while most of us have never heard the name of the man who branded his initials onto a 13-year-old girl, using red hot metal, on the sensitive skin next to her anal passage. His name is Mohammed Karrar. Here is some more of what he did to that child, in the words of the judge who sentenced him: ‘You, Mohammed Karrar, prepared her for gang anal rape by using a pump to expand her anal passage. You subjected her to a gang rape by five or six men (count 30). At one point she had four men inside her. A red ball was placed in her mouth to keep her quiet. Not only were you both involved in the commercial sexual exploitation of GH, you also used her for your own self-gratification. You both raped her when she was under 13. When she was very young, although it is not clear whether she was under 13, you both raped her at the same time (oral and vaginal/anal). It happened on more than one occasion.’ Of course, the grooming scandal (which should really be called what it is – a mass rape scandal) is not the only evidence that the UK has an enormous multiculturalism problem linked primarily to the Islamic religion. A2017 investigationby the Guardian revealed that the UK has between 30 and 90 ‘sharia councils’ in operation, which perform the essential functions of courts, granting divorces, regulating marital disputes, and operating as a parallel legal system. The Guardian is a liberal newspaper, but even it had to report that: ‘In December, the Casey Review by Dame Louise Casey into integration included claims that sharia councils “supported the values of extremists, condoned wife-beating, ignored marital rape and allowed forced marriages”.’ This is not, to be clear, a British problem. It is a Pakistani cultural and Muslim religious problem that is taking place in the United Kingdom. What’s more, the whole scandal underlines a fact that sends shivers down liberal spines: that elements of Pakistani culture and the Islamic faith pose a real and existential threat to women living in liberal democratic societies. If the state wishes to prevent threats to women (as the liberal state insists it does) and Pakistani culture and elements of the Islamic faith are threats to women, suddenly the conversations become – of necessity – very uncomfortable. Which is why liberal Britain is so desperate to avoid having them. For this cowardice liberalism in Britain and the wider west deserves nothing but contempt, scorn, and condemnation. A generation of ‘progressive’ political leaders of both parties in Britain from the 1970s onwards have consistently prioritised ‘community relations’ over the safety of their own little girls. Young women were allowed to be treated like human filth, while the entire apparatus of the state looked the other way lest it be accused of racism. All the while, towns like Rotherham and Oldham have been permitted to be turned into bastardised versions of Multan, Pakistan. I will leave you with this story, from 2017. It took place in the aforementioned Multan. A young girl was raped by a boy from her village. The victim had a brother. The perpetrator had a sister. The victim’s brother was summoned before the village council and told that justice had been arranged: To avenge the rape of his sister, he would be presented with the rapist’s younger sister – a 16 year old girl – andtold to rape her in turn. This is the culture that Britain has imported. It is the culture that dominates vast swathes of Northern England. The state, and progressives, having admitted it to their country, have a duty to quash it entirely. Racism and community sensitivity be damned.",The Conservative Woman,2025-01-04T01:18:00+00:00,Culture War,Polarised,https://www.conservativewoman.co.uk/the-uk-grooming-gang-scandal-is-a-galileo-moment/
1,The progressives' contempt for the white working class,"THERE is a time to be angry. There is a time when it is a sin not to be angry. This is such a time. I have served in the armed forces. I have been a prison chaplain. I have been a minister in one of the roughest parts of Glasgow. I am not naive. Yet I was unable to finish readingthe details of the atrocitiescommitted by a mainly Pakistani Muslim rape gang on white girls. some of them pre-teen. These vile acts were committedprior to 2013yet have came to light last week. Why were they hidden? I think we know the answer. The great Dutch theologian, church leader and prime minister Abraham Kuyper never lost sight of thekleine mensen, the little people. He knew that the supposedly insignificant working-class people were ignored by the main parties and needed protecting. Our secular progressive elites have no interest in or concern for the ordinary working-class people of Britain. When a nation abandons the Bible, the most vulnerable pay the heaviest price. These were not grooming gangs: dogs are groomed. These were paedophile rape and torture gangs. Neither was this confined to Rochdale; paedophile rape gangs have been operatingin towns and cities throughout the UK.Poor white girlsfrom broken homeswere viewed as legitimate targets for rape and exploitation by men from a faith known for its disdain of women. This happened with the complicity of those charged with protecting children. Ina case from Bradforda 15-year-old girl was placed as a foster child in the family of her rapist and made to marry her abuser in an Islamic ceremony with her social worker present. During this period she was forced to convert to Islam and treated as a domestic slave. Mainly Pakistani Muslims preyed upon poor, vulnerable white girls. They were pimped out to others across Britain. Police and social services were so scared of being called racist or Islamophobic they did nothing to protect the girls or arrest the perpetrators. Sometimes they were orderedto do nothing.One girl claimsto have been raped by 150 men; this began when she was 13. Some girlswere murdered. The depth of the contempt the white progressive managerial class has for the white working class is all too evident; their oikophobia is rampant. The primary responsibility for these horrific acts belongs to the mostly Pakistani Muslim perpetrators. Yet it was the weakness and systemic failures of those with responsibility for protecting their own girls which should make us angriest. For them progressive ideology, especially immigration and multiculturalism, was so important that it was worthwhile sacrificing tens of thousands of working-class white girls. Ultimate responsibility lies with our politicians. These politicians are keen on punishing hate crimes. If there was ever a hate crime it was when groups of mainly Pakistani Muslims targeted vulnerable white girls on racial and religious grounds for industrial-scale rape and torture. These hate crimes were ignored. These horrific crimes would not have been allowed to continue unabated if our politicians had not prioritised multiculturalism and vote gathering, a combination of progressive ideology and moral corruption. The white girls were simply the wrong victims, their suffering got in the way of the progressive narrative. If it had been Asian girls the story might have been different. This scandal touchedthe whole of Britain, yet the currentLabour government refusesto hold a national inquiry, asLaura Perrins wrote inTCWyesterday.It is to be left to local authorities with limited resources to investigate these crimes. One doesn’t need to be cynical to askwho is being protectedand which voting bloc appeased. It does the Conservative Party little good to try to assume the high moral ground and call for a national inquiry. They have just emerged from 14 years in power when they did nothing. The police would not have ignored these crimes without strong political backing and instruction. A law enforcement officer came to the conclusion that a 12-year-old who told police she hadsex with five adultshad done so in a ‘100 per cent consensual [way] in every incident’. Thankfully, he was overruled. Therewas the time when a 13-year-oldwas found half-naked and drunk in a house with a group of seven adult Pakistani men. It was the girl who was arrested, charged and eventually convicted of being drunk and disorderly. The men were not even questioned, never mind arrested. By way of contrast, fathers who attempted to rescue their underage daughters from the houses in which they were being held and rapedwere arrested. From the copper on the beat to the highest-ranking officer, not one police officer has been disciplined or lost their job and pension. The same police who turned a blind eye to these crimes are still policing us. There has been a mass dereliction of duty amongst the media who pride themselves on ‘speaking truth to power’. With a few brave exceptions, such asAndrew Norfolkof theTimesandCharlie Petersof GB News, they have chosen to ignore the elephant in the room. Our progressive elite are so out of touch with the people of Britain they can only see through woke spectacles.Dr Ella Cockbainof University College Londonwrote in 2013that she was concerned that the existence of child abusing networks run by South Asian men was ‘fuelling racist rhetoric’. She was so blind to reality thatshe opined: ‘It can seem that the greatest effrontery about grooming is not the abuse of children but the interracial sex itself.’ The progressives simply cannot grasp how deeply they hold their fellow countrymen and women in contempt. They cannot possibly admit that they look down on the white working class. After all, being liberal-progressives they are the good guys. They are the enlightened. They are the moral arbiters. They are the ones who understand the whole picture. There is a certain parable about specks of sawdust and planks of wood which springs to mind (Matthew 7:3-5). Professor Alexis Jay, who conducted an inquiry into the mass rapes in Rochdale, wrote: ‘The authorities involved have a great deal to answer for.’ To date no police officer has been disciplined, no social work director sacked, no politician held to account. While unwilling to protect the most vulnerable in society, the progressive managerial class are more than willing to protect their own. This societal calamity was papered over. That is no longer the case. The mass rape of white British girls is the most egregious example of the high-handed disdain the progressive powers that be have for the indigenous working class.",The Conservative Woman,2025-01-08T01:19:00+00:00,Culture War,Polarised,https://www.conservativewoman.co.uk/progressive-contempt-for-the-white-working-class/
2,How dare Starmer reject a public inquiry into Muslim grooming gangs?,"SIR KEIR Starmer claims that politicians who are calling for a statutory inquiry into grooming gangs are‘jumping on a bandwagon of the far right’. In a press conference this morning (about the NHS, obviously, with his shirt sleeves rolled up, obviously) the Prime Minister not only rejected calls for a full public inquiry into the grooming scandal, or more accurately the Muslim child rape, trafficking and prostitution scandal, he also insulted politicians who believe such an inquiry is needed. It was a disgusting slur against anyone who has problems with thesystematic brutalisation of 11-year-old girls. To quote Sir Keir in full: ‘It is something about the nature of our politics because once we lose the anchor that truth matters in the robust debate we must have, then we are on a very slippery slope and when politicians, and I mean politicians, who sat in government for many years are casual about honesty, decency, truth and the rule of law, calling for inquiries because they want to jump on a bandwagon of the far right, then that affects politics because a robust debate can only be based on the true facts and that is why this is actually an important point about our politics, not about what anybody may or may not say on Twitter.’ A few things. First, it is noteworthy that this press conference was not about the child gang rape scandal but about the NHS, which in itself tells you all you need to know. As I have said before, there are very few subjects that get the lectern treatment: Far Right Thugs get the lectern treatment, the NHS gets the lectern treatment. How and why young white girls were raped and brutalised by gangs of Muslim men while authorities stood by is not worthy of the lectern treatment.This tells you all you need to know about Labour’s political priorities. Second, how dare he? How very dare he say that calling for a proper statutory inquiry into the biggest scandal in British history, namely Muslim gangs of men raping, trafficking and prostituting thousands of white working-class girls, some as young as 11, around the country, while police forces looked the other way, is ‘jumping on the band wagon of the far right’. Yes, there have been inquiries before. There was theIndependent Inquiry into Child Sexual Abuse by Professor Alexis Jay, but that was a much broader inquiry into sexual abuse by the Catholic church, Anglican church and others. The Independent Office for Police Conduct (IOPC) launched an inquiry in 2014 after it was revealed that at least 1,400 girls in the South Yorkshire town were abused between 1997 and 2013. TheTimesreported: ‘An eight-year investigation into multiple police failings during the Rotherham sex grooming scandal was condemned as a disgrace after its final report revealed that no officers have been dismissed . . . The IOPC’s resulting Operation Linden investigation examined 265 allegations against 47 police officers by 51 complainants, most of them abuse survivors. It upheld 43 of the complaints and ruled that 14 officers “had a case to answer” for misconduct or gross misconduct.’ One unnamed officer was said to have told the mother of an exploited childthat it was a ‘fashion accessory’ for Rotherham girls to have an ‘older Asian boyfriend’ and that she would ‘grow out of it’. And there were other local council inquiries and safeguarding reviews. But as Sir Keir Starmer well knows – he is a human rights lawyer who normally cannot get enough of inquiries – this is very different from an inquiry under the Inquiries Act 2005 where people are called to give evidence under oath. Starmer says that ‘robust debate can only be based on the true facts’. Ok then, genius, order an inquiry into the rape gangs, and any possible misconduct by police officers under the Inquiries Act 2005 and we will get to the facts. As a bonus, all your human rights lawyer buddies will be quids in for years. Why is Starmer dragging his feet over a public inquiry? As a rule, politicians normally can’t wait to order an inquiry. They demand inquiries over anything. We have the current ridiculous UK Covid-19 Inquiry which will tell us at the end of it all that we should have locked down harder and earlier. There is an Undercover Policing Inquiry. There is the Dawn Sturgess Inquiry, ‘an independent Inquiry into the circumstances ofDawn Sturgess’s deathin Salisbury on 8 July 2018’. There is theIndependent Inquiry relating to Afghanistan‘to investigate the “deliberate detention operations” conducted by the British armed forces in Afghanistan between the period of mid-2010 to mid-2013 to determine whether any of the circumstances around any unlawful killings were covered up at any stage’. This is to name just a few. But for some reason, for some unknown reason, Sir Keir Starmer is refusing to hold a 2005 Act public inquiry into the sodomising and rape of working-class white girls. They are not worthy of this human rights lawyer’s time and attention, it seems. Don’t you know there is a Far Right Thug out there tweeting something mean? You know as well as I do that the Old Bill will be down on them quicker than you can say ‘Mind how you go’ to a Muslim rapist who is receiving a sex act from a minor in his car. (Yes, that happened.) No, nothing to see here, says Starmer, we have had all the itty bitty inquiries into these pesky white girls and their abuse that we need. (My words, not his, just to be clear.) Starmer said, in full: ‘There have been a lot of reviews including localised reviews, including into Oldham for example, the mayor of Manchester did his review, and the Jay report was intended to look at the different types of exploitation that went on. It was a comprehensive review . . . This doesn’t need more consultation. It doesn’t need more research. It just needs action.’ Don’t play me for a fool. Sir Keir Starmer KC knows better than anyone that none of these inquiries are the same as a 2005 inquiry with proper terms of reference, focusing on Muslim rape gangs and any possible police corruption, with the power to call witnesses who must give evidence under oath. Starmer doesn’t want an inquiry because he knows awkward things come out in inquiries. Facts. Truth. Logic. And often liberal beliefs can be slayed. No one is going to push Diversity is our Strength, or say that Britain has experienced an ‘integration miracle’, in the words of that half-wit Fraser Nelson, after such an inquiry, I reckon. No British Prime Minister wants to start unravelling the myth of integration or asking some tough questions about the Islamic faith. That could lead to the kind of facts and truth a Prime Minister could do without, thanks very much. The girls who were raped, sodomised, tortured and brutalised by Muslim rape gangs deserve a public inquiry. It is disgraceful that Starmer is refusing to order one. And it is disgusting to label those who call for one as ‘far right’. It seems that the only time a human right lawyer rejects a public inquiry is when it is about Muslim grooming gangs.",The Conservative Woman,2025-01-06T14:01:02+00:00,Culture War,Polarised,https://www.conservativewoman.co.uk/how-dare-starmer-reject-a-public-inquiry-into-muslim-grooming-gangs/
3,Welcome to the year of the 'Quad-demic',"WHEN I WAS a child, we had ‘outbreaks’ of diseases. Living very close to Aberdeen, I miraculously survived the famous (only if you lived near Aberdeen)typhoid outbreak of 1964, which hospitalised hundreds of people. The only thing it killed, temporarily, was the corned beef trade, as the outbreak was traced to a tin of the popularFray Bentos product. Life soon returned to normal, as did the consumption of tinned meats. That was the first and last ‘outbreak’ that I can recall. Almost anything that more than a couple of folks have caught since has been named an ‘epidemic’. That sounded much more threatening, until it was superseded by the term ‘pandemic’, which seems to be the ubiquitous phrase for anything that used to be an epidemic. Trying to distinguish an epidemic from a pandemic is a fruitless task. The definitions seem to differ depending on where you look and, for all intents and purposes, are interchangeable. The problem for the public health brigade is that, clearly, neither term instils sufficient fear into the population, as demonstrated by the Covid-19 fiasco. Admittedly, they rode high for a time. However, very few people, for example, still wear face masks, apart from a few fainthearts, virtue signallers, and those once again mandated by theirNHS employers. Enthusiasm is waning for covid vaccines. Nevertheless, they must be given credit for their tenacity; they simply don’t give up. In their efforts to paralyse us with fear, pepper the joy of the festive season with a little misery and turn our minds towards the perpetual worship of the NHS, we now have a new category of killer to cope with: the ‘quad-demic’. In case you are interested, this is an allegedly lethal cocktail of ‘flu, Covid-19, respiratory syncytial virus (RSV) and norovirus’. The quad-demic didn’t seem to do the trick, as the irresponsible British public celebrated the birth of Jesus, visited their families and partied like it was 1999. The outcome? According toNHS England, this has resulted in a ‘tidal wave’ of flu and, in a week, a 70 per cent increase in hospital cases. On New Year’s Eve, clearly concerned that we were not listening, the quad-demic has been upgraded with the epithet of‘terrifying’. How do we know this? Apparently,over a quarter of British people, or at least those gullible enough to take a test, have tested positive for influenza. This does not take into account any false positives, which plague this kind of testing, and it does not report how many of these people were symptomatic. In other words, such testing statistics are virtually meaningless. Professor Sir Stephen Powis, NHS National Medical Director, said that the worst of the tidal wave was on its way. There have been almost4500 flu patients every day, with 211 people in critical care. Therefore, 0.0078% of the population is hospitalised with flu and 0.00037% is in critical care. Expressed like that, the tidal wave seems more like a puddle in a car park. So, why the fuss? Obviously, it is to do with pressure on the NHS which, above all, must be ‘saved’. The usual guilt-inducing measures are employed to imply that it is all our fault, thus it has been reported that the surge in infections will continue to pick up pace in the coming days as a result of more people socialising indoors over the Christmas and the New Year period. Mask-wearing is encouraged, social distancing rears its ugly head and our old favourite lockdown-style measures have been recommended by virologist from the University of Warwick, Professor Lawrence Young. Laughably, it is recommended that you isolate yourself if you have the flu. In my experience, as flu is a near-death experience, the last thing you would want to do is move, eat, drink or speak. Of course, like Covid-19, all things are not equal with flu. It tends not to strike people down in the prime of their lives; the NHS is not overwhelmed with athletes and extreme sports enthusiasts. People in general good health, while not immune to flu, tend to pull through. The usual suspects arethe most vulnerable: I am the last person to play down the severity of flu. I have had it and, as I write, my son, his wife and two of my daughters, both nurses, have it. It is a horrible infection which I sincerely hope to avoid. However,as explained by Dr Clare Craig, flu is not spread asymptomatically, therefore a person testing positive, who has no flu-like symptoms, poses minimal danger to others. Moreover, the flu virus is airborne, meaning that it quickly spreads after a few people with symptoms have it. To cut a long story short, there is very little that anyone can do either to stop it spreading or, once it is spreading, to avoid encountering it. Most people who do encounter the influenza virus will never know it, as their immune systems will deal with it. Attempts by our public health masters to terrify us into submission with the latest ‘demic’ are just that: attempts to terrify us. Given that they know the above facts as well as we do, and that measures such as face masks and lockdowns are futile, we can only speculate about their motives. If only there was a vaccine!",The Conservative Woman,2025-01-04T15:05:00+00:00,COVID-19,Polarised,https://www.conservativewoman.co.uk/a-quad-demic-christmas-blown-out-of-proportion-and-a-happy-new-year/
4,How Labour is seizing more control over our children,"NEVER underestimate Labour and their determination to wrest any final vestige of authority and concomitant responsibility from parents and endow it on the oh-so-caring state. Labour’s Children’s Wellbeing and Schools Bill has its Second Reading tomorrow and given Labour’s thumping majority it will be voted through. Even Sky News’s reporting of it has been mildly critical. A few days ago it led with‘Parents of the most vulnerable children will lose their right to home education’. This is to understate it. It is a further assault on parents’ freedom from the state regarding the upbringing and education of their children, transferring even more powers from parents to the state – a process started by Tony Blair when he decided to centralise, control and regulate families by a programme which merged protection services and child care with the education system. All parents became potential abusers while abusing parents carried on being ignored. In fact it has failed abysmally to protect children. Now Labour are doubling down on failure with the last bastion of parental authority and autonomy from the state, home schooling, threatened. So never mind how bullied your child is or how much you object to a school’s sex education, gender identity teaching or ‘trans’ policy, or how bad the educational ethos and standards are, your power to take your child out of school is to be heavily circumscribed. I have to apologise on behalf ofTCWfor being asleep at the wheel on this one and letting it slide, despite the formidable Rabbi Asher Gratt keeping me copied in on his almost one-man battle with the Government and his various exchanges with Labour bigwigs over his understandable concerns (as well as to the worse than dismissive responses that he has received). As Rabbi Gratt has pointed out repeatedly, ironically the Bill will undermine the very things you might have supposed the DIE minorities mantra stands for: the rights of minorities, cultural diversity and religious pluralism. In fact it never did. Diversity in ‘multiculti’ state education means one thing and that’s what the woke choose it to mean – as an example, to celebrate Diwali but ban Christmas nativity plays in favour of ‘Winterval’. Demoting Christianity in schools despite its being the country’s religion for nearly two thousand years. The bottom line, however,as Rabbi Gratt detailed last month, is that this Bill is the most repressive and authoritarian of assaults on parents. The most repressive since Tony Blair instigated the nationalisation of children when he merged child protection services with the education system andput every child’s ID into a national computer database in order to monitor their use of services. Of course this did nothing to stop child abuse and neglect, rather it treated all parent as potential abusers. What this Bill (planned under the last Conservative Governments) will once again introduce under the guise of child protection is a nationwide register of all children not attending school, a unique identification number for each child akin to a National Insurance number, and, most disturbingly, a requirement for parents to seek council permission to educate their own children at home if a child is under a child protection plan. It is nothing less than the state seizing control over our children, Gratt says: ‘Under these draconian measures, the government assumes control over children’s education in ways that have never before been seen in the UK. Parents will be forced to “register” their children like property and submit to the authority of local councils to determine whether their education is “suitable”. The Bill not only criminalises conscientious parents who seek to provide an education aligned with their religious or philosophical beliefs, but it also sets a dangerous precedent for the erosion of fundamental civil liberties.’ Ironically again, the Bill appears to be in direct violation of parents’ rights as enshrined in Article 2 of Protocol 1 of the European Convention on Human Rights (ECHR), which guarantees parents the right to educate their children in accordance with their own religious and philosophical beliefs. Perhaps the most shocking measure is the introduction of a unique identifier for every child in England, a ‘child-tracking number’ which will be used across services to monitor and log the child’s every movement, akin to a social credit system. Gratt cites civil liberties groups calling this provision ‘deeply chilling’ and reminiscent of surveillance measures seen in authoritarian regimes. The government claims that these measures are necessary for ‘child protection’. Critics argue it is state overreach on an unimaginable scale. Rabbi Gratt is far from alone in his concerns. The activist education groupEducational Freedom are concerned that the right to home educate is seriously at risk, pointing to the Children’s Commissioner’s chilling explanation of the purpose of the register: ‘If we get proper registers and we have our local authorities taking their responsibilities seriously to engage with these families, we may find that we can get lots of them back to school, which is where they need to be.’ Furthermore they point out that if you choose to home educate you now ‘have to demonstrate your home environment is safe even if your education provision is top notch’. So it proposes in effect to circumvent other legislation to remove parental responsibility by the back door. What constitutes ‘safe’ in a home? Their other concern is that it does nothing to support parents to protect their children ‘when the school environment is enabling them to be harmed’. As Gratt points out in his article, for decades parents in the UK have been able to educate their children at home without interference, ‘a right that is rooted in centuries of common law. It is not just that if a child is under a child protection plan that the parents must seek council permission to continue home education, the measure effectively treats all home parents as suspects under constant surveillance, undermining the presumption of parental competence and good faith’. Both Gratt and Educational Freedom believe the Bill has profound consequences for vulnerable families, particularly those dealing with medical or special educational needs (SEN) children: ‘By forcing parents to seek permission from council authorities, this Bill threatens to reduce the flexibility and personalisation that home education provides for children with complex needs.’ This is nothing less than a surveillance state in disguise. As Gratt says, it is not about education quality, but about control, ‘something you expect in authoritarian regimes, not in a democratic society’.",The Conservative Woman,2025-01-07T13:42:16+00:00,Democracy in Decay,Polarised,https://www.conservativewoman.co.uk/how-labour-is-seizing-more-control-over-our-children/


**The Canary**

Articles have been scraped from the UK section of The Canary (https://www.thecanary.co/uk/) from newest to oldest. Article date range is January 19th to January 7th. Two articles were excluded for not meeting the labelling criteria (articles focused on getting users to sign a petition.)

In [33]:
testURL = "https://www.thecanary.co/uk/analysis/2025/01/19/fii-long-covid/"

def scrape_can_article(url):
    """
    Scrapes an article from a given URL on https://www.thecanary.co/uk/ and extracts relevant information.
    
    Parameters:
    ----------
    url : str
        The URL of the article to scrape.

    Returns:
    -------
    dict
        A dictionary containing the extracted article data.
    """
    article_data = {
        "title": "",
        "text": "",
        "site": "",
        "date": "",
        "category": "",
        "class": "Polarised",
        "url": url
    }

    try:
        response = requests.get(url)
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'html.parser')
            # Remove tweet embeds
            for twitter_blockquote in soup.find_all('blockquote', class_='twitter-tweet'):
                twitter_blockquote.decompose()
            # Remove ad elements
            for ads_div in soup.find_all('div', class_='ads_google_ads'):
                ads_div.decompose()

            # Title
            title_meta = soup.find('meta', property='og:title')
            article_data["title"] = title_meta['content'] if title_meta else "Title not found"
            
            # URL
            url_meta = soup.find('meta', property='og:url')
            article_data["url"] = url_meta['content'] if url_meta else url
            
            # Site name
            site_name_meta = soup.find('meta', property='og:site_name')
            article_data["site"] = site_name_meta['content'] if site_name_meta else "Site name not found"
            
            # Published date
            published_date_meta = soup.find('meta', property='article:published_time')
            article_data["date"] = published_date_meta['content'] if published_date_meta else "Published date not found"
            
            # Category
            category_found = None
            yoast_script = soup.find('script', class_='yoast-schema-graph', type='application/ld+json')
            if yoast_script:
                try:
                    yoast_data = json.loads(yoast_script.string)
                    for item in yoast_data.get('@graph', []):
                        if item.get('@type') == 'NewsArticle':
                            section = item.get('articleSection')
                            if section:
                                if isinstance(section, list) and len(section) > 0:
                                    category_found = section[0].strip()
                                elif isinstance(section, str):
                                    category_found = section.strip()
                                break
                except json.JSONDecodeError:
                    pass
            # If we never found a category, use a default
            if category_found:
                article_data["category"] = category_found
            else:
                article_data["category"] = "Category not found"
            
            # Article copy
            article_body = soup.find('div', class_='jeg_inner_content')
            featured_image_patterns = [
                re.compile(r'^Featured image via .*$', re.IGNORECASE),
                re.compile(r'^Featured image supplied', re.IGNORECASE),
                re.compile(r'^Featured image and additional images via .*$', re.IGNORECASE),
                re.compile(r'^Featured image and additional images supplied$', re.IGNORECASE)
            ]
            if article_body:
                paragraphs = article_body.find_all('p')
                text_content = []
                
                for p in paragraphs:
                    if any(pattern.match(p.text.strip()) for pattern in featured_image_patterns):
                        p.decompose()
                    p_text = p.get_text().strip()
                    if p_text:
                        text_content.append(p_text)
                
                article_data["text"] = " ".join(text_content) if text_content else "Article content not found"
            else:
                article_data["text"] = "Article content not found"
        
        else:
            print(f"Failed to fetch the webpage: {url}. Status code: {response.status_code}")

    except Exception as e:
        print(f"An error occurred: {e}")

    return article_data

def scrape_multiple_can_articles(urls):
    """
    Scrapes multiple articles from a list of URLs and stores the data in a DataFrame.

    Parameters:
    ----------
    urls : list
        A list of article URLs to scrape.

    Returns:
    -------
    pd.DataFrame
        A DataFrame containing the scraped data from all URLs.
    """
    articles = []
    for url in urls:
        article = scrape_can_article(url)
        articles.append(article)
    return pd.DataFrame(articles)


# List of URLs to scrape
urls = ['https://www.thecanary.co/uk/news/2025/01/19/corbyn-mcdonnell-met-police/',
       'https://www.thecanary.co/uk/news/2025/01/19/filton-18-trial/',
       'https://www.thecanary.co/uk/news/2025/01/19/corbyn-met-police-palestine-march/',
       'https://www.thecanary.co/uk/analysis/2025/01/19/fii-long-covid/',
       'https://www.thecanary.co/uk/analysis/2025/01/19/starmer-new-policy/',
       'https://www.thecanary.co/uk/analysis/2025/01/17/starmer-resignation-polling/',
       'https://www.thecanary.co/uk/analysis/2025/01/17/march-for-palestine-route/',
       'https://www.thecanary.co/uk/news/2025/01/17/hastings-general-dynamics-protest/',
       'https://www.thecanary.co/uk/news/2025/01/17/government-buried-dwp-pip-report/',
       'https://www.thecanary.co/uk/analysis/2025/01/16/nhs-rcn-report/',
       'https://www.thecanary.co/uk/analysis/2025/01/16/dwp-wca-court-verdict/',
       'https://www.thecanary.co/uk/news/2025/01/16/just-stop-oil-hung-jury/',
       'https://www.thecanary.co/uk/analysis/2025/01/16/heathrow-expansion/',
       'https://www.thecanary.co/uk/news/2025/01/16/palestine-action-filton18/',
       'https://www.thecanary.co/uk/analysis/2025/01/15/pmqs-15-january/',
       'https://www.thecanary.co/uk/news/2025/01/15/everydoctor-campaign/',
       'https://www.thecanary.co/uk/analysis/2025/01/15/send-england/',
       'https://www.thecanary.co/uk/news/2025/01/15/gh-artemis/',
       'https://www.thecanary.co/uk/news/2025/01/15/shell-protest-wildfires/',
       'https://www.thecanary.co/uk/analysis/2025/01/14/ai-fourth-industrial-revolution/',
       'https://www.thecanary.co/uk/news/2025/01/14/just-stop-oil-mark-jenkinson/',
       'https://www.thecanary.co/uk/analysis/2025/01/14/anti-fatness-media-headlines/',
       'https://www.thecanary.co/uk/news/2025/01/14/leicester-birmingham-students/',
       'https://www.thecanary.co/uk/analysis/2025/01/14/palestine-action-eagle-strategic/',
       'https://www.thecanary.co/uk/news/2025/01/14/renters-rights-bill-acorn/',
       'https://www.thecanary.co/uk/news/2025/01/14/university-admissions-poorest-students/',
       'https://www.thecanary.co/uk/analysis/2025/01/14/labour-ai-policy/',
       'https://www.thecanary.co/uk/news/2025/01/13/palestine-march-18-january-bbc/',
       'https://www.thecanary.co/uk/analysis/2025/01/13/cost-of-living-skipping-meals/',
       'https://www.thecanary.co/uk/analysis/2025/01/13/scotland-access-to-justice/',
       'https://www.thecanary.co/uk/news/2025/01/13/palestine-action-parker-hannifin/',
       'https://www.thecanary.co/uk/analysis/2025/01/13/18-jan-palestine-demo-update/',
       'https://www.thecanary.co/uk/news/2025/01/13/just-stop-oil-darwin/',
       'https://www.thecanary.co/uk/analysis/2025/01/13/msm-mental-health-dwp/',
       'https://www.thecanary.co/uk/news/2025/01/12/gaie-delap-case/',
       'https://www.thecanary.co/uk/analysis/2025/01/12/bond-markets/',
       'https://www.thecanary.co/uk/news/2025/01/10/palestine-march-bbc/',
       'https://www.thecanary.co/uk/analysis/2025/01/09/schools-funding/',
       'https://www.thecanary.co/uk/analysis/2025/01/09/brexit-skills-shortage-uk/',
       'https://www.thecanary.co/uk/news/2025/01/09/just-stop-oil-abigail-percy/',
       'https://www.thecanary.co/uk/news/2025/01/09/palestine-march-18-january/',
       'https://www.thecanary.co/uk/news/2025/01/09/labour-political-donations/',
       'https://www.thecanary.co/uk/analysis/2025/01/09/dwp-complaints-report/',
       'https://www.thecanary.co/uk/analysis/2025/01/08/pmqs-8-january/',
       'https://www.thecanary.co/uk/analysis/2025/01/09/corbyn-raf-akrotiri/',
       'https://www.thecanary.co/uk/news/2025/01/08/met-police-just-blocked-a-pro-palestine-protest-from-marching-outside-the-bbc/',
       'https://www.thecanary.co/uk/analysis/2025/01/08/sas-murder/',
       'https://www.thecanary.co/uk/analysis/2025/01/08/raffi-berg-bbc/',
       'https://www.thecanary.co/uk/news/2025/01/07/just-stop-oil-dr-hart/',
       'https://www.thecanary.co/uk/analysis/2025/01/07/mcdonalds-staff-abuse/']

# Scrape articles and create a DataFrame
can_data_df = scrape_multiple_can_articles(urls)
# Store to CSV
can_data_df.to_csv("polarised_scraped_articles_can.csv", index=False)
# Print head 
can_data_df.head()

Unnamed: 0,title,text,site,date,category,class,url
0,Corbyn and McDonnell leave police station after being INTERVIEWED UNDER CAUTION by Met,"Former Labour Party leader Jeremy Corbyn and former shadow chancellor John McDonnell have been interviewed under caution by the Met Police. It was over the force’s alleged lies about events at the pro-Palestine march on Saturday 18 January. Meanwhile, the Palestine Solidarity Campaign (PSC) has hit back at the Met – accusing the police of falsifying events. BBC News reportedly found out that: MPs Jeremy Corbyn and John McDonnell have agreed to be interviewed under caution by police following a pro-Palestinian rally in central London on Saturday, the BBC understands. The former Labour leader, 75, and former shadow chancellor, 73, will voluntarily attend a police station in the capital as the Metropolitan Police investigates what it says was a coordinated effort by organisers to breach conditions imposed on the event. They will be interviewed on Sunday afternoon. Sky News were ahead of the rest of the corporate media – ‘doorstopping’ the two MPs after they were interviewed: This evocative image also now bears further relevance: As the Canary previously reported, people at the march are saying that no one forced their way through the police line. People are claiming that the police agreed to it. Corbyn’s response to the Met was: This is not an accurate description of events at all. I was part of a delegation of speakers, who wished to peacefully carry and lay flowers in memory of children in Gaza who had been killed. This was facilitated by the police. We did not force our way through. When we reached Trafalgar Square, we informed police that we would go no further, lay down flowers and disperse. At that point, the Chief Steward, Chris Nineham was arrested. We then turned back and dispersed. I urge the police to release all bodycam footage and retract its misleading account of events. So, people on X hit back at the cops interviewing Corbyn and McDonnell: Meanwhile, the PSC has issued the following statement about the Met Police’s actions on 18 January. The Metropolitan Police has promoted a misleading narrative about the events in Whitehall and Trafalgar Square, claiming that a peaceful delegation pushed through police lines in an attempt to justify their repressive actions on Saturday 18 January. This could not be further from the truth. On Saturday 18 January, we organised a rally on Whitehall to call for a permanent end to Israel’s genocide in Gaza. Despite our long-standing record of peaceful demonstrations, the police, under political pressure from pro-Israel groups, banned our planned march to the BBC. In response, we announced plans for a rally and a peaceful protest against this anti-democratic ban. Ahead of the rally, we publicly called on the police to rescind the restrictions they had imposed and allow our march to go ahead. We had also made clear that if they refused to do so we would hold a rally and protest against the ban as part of that rally. The police were fully aware of these statements and our intentions. On the day, we were confronted with extremely heavy-handed and aggressive policing. With less than 24 hours’ notice, the police had imposed a series of complex restrictions preventing people from assembling at various points on Whitehall at various times of the day – notably an area at the centre of Whitehall from which rally participants were excluded for part of the day to allow space for a children’s marching band to proceed up and down. As a result, a number of people were arrested without warning, on flimsy pretexts including simply for inadvertently standing in this central area at the wrong time. We understand that a total of 77 people were arrested on the day, 66 of them for alleged violations of these orders. At the end of the rally, it was announced from the stage that, as an act of protest against the police ban, a delegation of organisers and rally speakers – including an 87-year-old Jewish Holocaust survivor, politicians including MPs, and prominent cultural figures – would walk silently and peacefully towards the BBC. It was clearly stated that the delegation expected to be stopped by the police and that no attempt would be made to push through police lines – the delegation would simply leave the flowers they were carrying at the feet of the police and disperse in an orderly and dignified manner. They anticipated being stopped at the line of police that had been constructed at the top of Whitehall. When the delegation reached this police line, they were not stopped as expected but were instead invited to proceed into Trafalgar Square by the police who said, ‘please filter through.’ When the delegation reached the other end of the square, they encountered a line of police which prevented them from going any further. They formally requested that the delegation – a maximum of 25 people – be allowed to proceed. The police officer in charge said he would need to ‘pass this up the line for a decision.’ While the delegation was awaiting that response, the police violently and for no apparent reason arrested the chief steward of the rally, Chris Nineham. At this point, the delegation laid their flowers as they had said they would do and dispersed, and Ben Jamal and Ismail Patel used a megaphone to call on the crowd that had gathered around them to do the same, which people then did. At no stage was there any organised breach of the conditions imposed by the police. There is a large amount of video evidence confirming all of these events. This is a direct assault on freedom of assembly and democracy. The police’s actions, including their false statements after the event, are deeply troubling. We demand the immediate release of all those arrested and remain resolute in our campaign for freedom and justice for the Palestinian people.",Canary,2025-01-19T17:21:00+00:00,News,Polarised,https://www.thecanary.co/uk/news/2025/01/19/corbyn-mcdonnell-met-police/
1,People rally at the Old Bailey in support of the Palestine Action Filton 18,"Some of the Palestine Action activists from the so-called Filton 18 appeared at the Old Bailey on Friday 17 January over an action at a weapons factory that supplies genocidal Israel. Of course, they entered not guilty to the charges the state was brining against them – and they also received huge support from crowds waiting outside. In a hearing at the Old Bailey, nine of the ‘Filton 18’ political prisoners have entered ‘not guilty’ pleas on all charges put before them, while supporters amassed in solidarity outside of the court. They were called to court to plea to charges after an action in August 2024 at the Filton, Bristol site of Israel’s largest weapons company Elbit Systems. Outside of the hearing, supporters of the Palestine Action activists gathered outside in solidarity: Palestinian flags were waved: While people rallied: Predictably, the Met Police were in attendance: Supporters waved off members of the Filton 18 as they left the hearing: All 18 face charges of aggravated burglary, criminal damage, with some of the 18 additionally facing charges of violent disorder. Six activists were arrested on site for an action that saw them breach the site using a modified van, before dismantling weapons of genocide inside, including ‘quadcopter’ drone models. 12 further people were later arrested and remanded to prison for their alleged involvement. Police have justified their continued detention by alleging that their actions have a ‘terrorism connection’. The rest of the 18 are expected to enter not guilty pleas later this year. A spokesperson for Palestine Action said: We refuse to bow to this continued police intimidation and harassment. It is Elbit, Israel’s largest weapons company, that is the guilty party: those resisting the UK’s complicity in genocide are not. The activists have been returned to prison by the judge and are currently awaiting appeal hearings for bail which have been thus-far rejected. Of the 18, 10 have spent over five months in prison since August, with an additional eight detained since November. At the hearing, the judge confirmed that their case shall be seen with the 18 split across three trial dates, the first taking place in November 2025, the second in May 2026 and the final date is currently unknown. An additional date is yet to be set in March of this year, when the defence will seek to challenge and dismiss the application of a “terror connection” in this case. Amnesty International has stated that the Filton 18 case demonstrates “terrorism powers being misused” to “circumvent normal legal protections, such as justifying holding people in excessively-lengthy pre-charge detention”. The #Filton18 political prisoners have been subjected to arbitrary and repressive treatment while inside prison – including the withholding of phone calls and mail, prohibitions on communicating with other prisoners, and denials of religious practices and medical privacy.",Canary,2025-01-19T15:02:29+00:00,News,Polarised,https://www.thecanary.co/uk/news/2025/01/19/filton-18-trial/
2,Jeremy Corbyn slams Met Police's wilful 'inaccuracies' following mass-arrests at Palestine march,"On Saturday 18 January, Britons once again took to the streets to show their support for the people of Palestine. As is unfortunately common in Britain, the peaceful march was beset by what some have described as “fascist” police violence. The Met Police also arrested 77 people, with former Labour Party leader Jeremy Corbyn criticising their excuse for doing so: Corbyn’s response to the Met Police in full reads: This is not an accurate description of events at all. I was part of a delegation of speakers, who wished to peacefully carry and lay flowers in memory of children in Gaza who had been killed. This was facilitated by the police. We did not force our way through. When we reached Trafalgar Square, we informed police that we would go no further, lay down flowers and disperse. At that point, the Chief Steward, Chris Nineham was arrested. We then turned back and dispersed. I urge the police to release all bodycam footage and retract its misleading account of events. Corbyn’s former shadow chancellor also commented on the situation: According to the National, an internal police investigation is now under way. The outlet also carries the following response from the Met: We have policed more than 20 national protests organised by the PSC since October 2023. This is the highest number of arrests we have seen, in response to the most significant escalation in criminality. We could not have been clearer about the conditions in place. Protesters were to remain in Whitehall with no march towards the BBC. Our relationship with protest organisers has to be based on trust and good faith. If they say they will act responsibly and lawfully we need to be able to know those are genuine assurances. That is why it was so deeply disappointing to see a deliberate effort, involving organisers of the demonstration, to breach the conditions and attempt to march out of Whitehall. Officers responded bravely and decisively, ensuring they got no further than Trafalgar Square and certainly nowhere near their target. I am quite confident this was a coordinated breach with the intention being to reach the BBC at Portland Place in defiance of the conditions. There is video footage of one of the organisers clearly inciting the crowd to join a march and one of the organisations involved has released a statement this evening confirming as much. At the same time as the group was attempting to force its way past police lines, camera crews were seen arriving in Portland Place. It is unlikely that the timing was simply a coincidence. We are in possession of footage from officers’ body worn cameras, from CCTV and from social media. We know who was involved in leading the movement of so many people through police lines. Investigations are now underway and we will make every effort to bring prosecutions against those we identify. Speaking on the arrested Chris Nineham, Corbyn said: Earlier this week, the world received the news that there would be a ceasefire between the invading Israel and the invaded Gaza. Speaking on this at the march, Corbyn said: Corbyn also made it clear what more there is to be done: The Boycott, Divestment and Sanctions (BDC) movement released an article responding to the ceasefire: (1) From ceasefire to ceasing the genocide The Palestinian BDS National Committee (BNC), the largest coalition in Palestinian society that is leading the global Boycott, Divestment and Sanctions (BDS) movement, welcomes the news of a ceasefire agreement with immense relief. A ceasefire, however, is only the most important first step to end the genocide against the 2.3 million Palestinians in the illegally occupied and besieged Gaza Strip. Without massive pressure, it may constitute a continuation of a less visible form of genocide that Israel and the US hope will provoke less regional and global outrage, boycotts and sanctions. After all, Israel’s genocide, armed, funded and shielded from accountability by the colonial West, intentionally reduced the illegally occupied Gaza Strip into an unlivable territory by destroying life-sustaining conditions designed to cause continued mass loss of Palestinian lives and spread of infectious diseases as well as famine or food insecurity for years to come, while attempting to force as many Palestinians as possible into exile. According to UN human rights experts, this genocide has included “domicide, urbicide, scholasticide, medicide, cultural genocide and, more recently, ecocide.” The devastating effects of all these crimes, as well as the Israeli-induced starvation, will continue to kill thousands more Palestinians due to the immense carnage and Israel’s wilful destruction of life-sustaining conditions across Gaza. Only massive global pressure, especially in the form of BDS, can truly contribute to ending Israel’s genocide and support the Palestinian struggle to dismantle Israeli apartheid. The full article presents their plan for continuing to apply pressure to Israel and the Western powers which support them. Campaign Against the Arms Trade also made a statement about the ceasefire (written before it was fully agreed by both sides). Their statement makes it clear that while this is a positive development, the many decades of oppression that the Palestinians have suffered show us we shouldn’t turn our eyes away now: We welcome the news of a potential ceasefire in Gaza- anything that could bring an end to the horrors inflicted on Palestinian people is a ray of hope. However, the promised ceasefire has not yet been agreed, let alone tested, and there is no guarantee that the planned ‘second phase’ of the agreement, leading to a permanent end to the current war, will be sealed. Israel has committed genocide in Gaza, with the full complicity of the US and UK governments. Even in the best case scenario, Palestinian people in Gaza are facing a humanitarian catastrophe and environmental devastation. Homes and infrastructure are in ruins, hospitals and healthcare destroyed and people are facing starvation and disease. The genocide will continue, even without dropping bombs, unless Palestinian people are given full access to aid and the resources to rebuild. Israel has shown it has utter contempt for international humanitarian law (IHL). Even our government admits that it assesses Israel is not committed to complying with IHL. Even if there is a ceasefire, it is still breaching IHL in its actions in the occupied territories. Even if the bombs stop dropping, Israel will be breaching IHL if it does not allow aid into Gaza. While a ceasefire would be positive progress, the conflict will not be resolved while Israel and its allies deny the humanity and rights of the Palestinian people. Recognising the state of Palestine is the only path to a just peace, the only path to realising the rights and autonomy of Palestinians. Now is the time to keep up the pressure. Israel is still committing genocide with the full complicity of our government. We need to keep demanding a full two-way arms embargo. A genuine, long-lasting peace can only be achieved when we stop the flow of arms sales. Now is the time to make sure our government knows that a ceasefire doesn’t mean it is business as usual for arms dealers. It must not reinstate the few licenses it suspended. Now is the time to keep standing with the Palestinian people. Footage from the march showed thousands of people coming together to peacefully show support for Palestinians: However, some footage shows there was violence. It’s alleged that the police instigated this due to a protester filming their actions, which is what the following video appears to show: If the police officers had a legitimate reason for arresting this person, it’s not made clear in the video. Instead they demand that marchers questioning their actions “go away” and that they’ll be “locked up”. On the same day that the Met arrested 77 marchers, Declassified published the following article: The piece highlights the well-established links between British/American police forces and Israel, and these links shouldn’t come as a surprise. America and its followers support Israel because it suits American interests, and Israel in turn does what it can to support the control that Western governments have over their citizens. In other words, you shouldn’t be surprised when the British establishment comes down heavy on peaceful protesters who are siding with the victims of a genocide. It’s all the same system; it’s all the same violence; we’re just spared from the worst of it over here. For now, anyway. But that won’t hold true forever if the people in charge can get away with increasingly depraved acts of mass violence.",Canary,2025-01-19T14:01:52+00:00,News,Polarised,https://www.thecanary.co/uk/news/2025/01/19/corbyn-met-police-palestine-march/
3,The blame game: the rise in false allegations of maternal abuse in long Covid and disability,"With Donald Trump’s election in the US, a wave of uncertainty has swept the globe: for the environment, gender, war, and health. However, the undermining of women’s rights has become a major theme throughout the presidential campaign. As we lurch further into climate breakdown and instability, the further society sways to the far right – just as the world did after the Spanish Flu pandemic in 1918. A place where simple answers are easier to stomach than a complex reality, the vulnerable are seen as disposable, and there’s always someone else to blame – not least, in this context, with long Covid. Here we are focusing on laying the blame on women; the blame on mothers. Austerity, a ragged and inflexible NHS, a pandemic, an unfit SEND system and policy changes have led to an increase in false safeguarding accusations – almost always against the mother. Nevertheless, the framework and systemic tendency for medical misogyny is rooted in the beginnings of medicine. It is a conservative, top-down, and authoritarian structure built on strong foundations of patriarchy and patrimony. The Cerebra Report (2023) by Clements and Aiello found allegations of Fabricated Induced Illness (FII) against parents of disabled children were widespread, often causing devastating and lifelong family trauma. It’s one thing to go for help and not find any medical care. It’s another to be blamed for your child’s disability and illness. Clements discovered parents with disabilities were four times more likely to face accusations compared to non-disabled parents. The pandemic continues to leave in its wake a monstrous wasteland of chronic illness. One of those hydras is long Covid. In the early days of Covid, we were reassured that children were not affected, and post-acute viral disease was not even mentioned. Yet, these were simple answers for a complex situation. In March 2024, the ONS found that over 111,000 children have long Covid: a devastating, multi-system disease. Denial and the desire for normalcy have led to more families facing traumatic safeguarding referrals in an effort to divert blame and avoid institutional responsibility. This is nothing new in the world of myalgic encephalomyelitis (ME/CFS), disability, and chronic illness. The LeAP research program found social care policies in the UK assume parental failings as a default position. One in five families with a child with ME faced false claims resulting in child protection involvement. Families with children with long Covid now find themselves inheritors of a precarious and dangerous system. But how has this become so ubiquitous in medicine that it barely registers as a concern for healthcare professionals? Again, we come back to systemic problems: medics overstretched and alienated from patients; education that rarely covers post-acute viral illness; a protective and anti-whistleblower culture; a root bias towards acute illness, and an ingrained attitude denigrating ME and many chronic illnesses medicine deems unexplainable. Another key reason for this growing problem is psychiatric trespass. “Within the field of liaison psychiatry and psychosomatics, CFS, CFS/ME, IBS, FM, CI, CS, EI and a number of other conditions, for example, chronic Lyme disease, are bundled under the so-called “Functional Somatic Syndromes” (FSS) and “Medically Unexplained Syndromes” (MUS) umbrellas”. DSM-5 Working Group, DX Revision Watch. And here is the problem. In courses, textbooks, and cultural assumptions, we have the pernicious belief that medically unexplained symptoms (MUS) and its other forms have psychiatric factors. Which is why professionals can be so quick to call in child protection. Because if you have a mentally disturbed mother, that child is in danger and safeguarding action is needed. With ME/CFS and long Covid often assumed to be MUS or a Perplexing Presentation (PP) it is clear there’s a growing risk to children and families. However, it is a blunt instrument – with the majority of all safeguarding referrals unfounded and causing untold damage. Action for ME found that 70% of cases were dropped in a year. Tymes Trust has been involved in over 140 cases; none resulted in a guilty verdict. The Cerebra Report found that 84% cases were abandoned or had no follow up. Dr. Nigel Speight, involved in over 200 cases over the last 25 years, said: In c.98% of all the other cases, proceedings were aborted without a court order and the case eventually closed. Some of these cases had to experience prolonged social work involvement, albeit with no real threat of removal. It is this wild, free-reign that psychiatry and the DSM-5 have that is so dangerous. It can claim, without evidence, any poorly understood disorder without due warranty, into psychological abnormality due to its broad reclassification of somatic symptom disorder (SSD) in the DSM-5. It’s such a vast overreach that psychologists have classed up to 23% of the population as having maladaptive thoughts connected to physical symptoms. Long Covid, ME, and other long-term illnesses can easily be given a psychiatric SSD diagnosis. In the book Cracked, Dr James Davies states: first we named a so-called medical disorder before it has identified any pathological basis in the body. So even when there’s no biological evidence that a mental disorder exists, that disorder can still enter the DSM and become part of our medical culture. However, it is not just the DSM-5. ME is in the liaison psychiatry and functional somatic syndromes of a major textbook, Kumar and Clarke’s Clinical Medicine. It is no wonder why women with ME can be sectioned so easily or starved to death. Another way psychiatry has overreached is that the Royal College of Paediatrics and Child Health (RCPCH) have included PP and MUS as an alerting sign for FII in their guidance. There are fundamental issues here. One being that the RCPCH alerting signs are not derived from any peer-reviewed research. This amalgamation of conditions is increasing unlawful adverse discrimination at an alarming rate. Yet the RCPCH does not recognise the harm caused by false allegations. Even the British Association of Social Workers state in its FII Practice Guide: If social workers were to follow the RCPCH guidance, the proposed assessment criterion for FII is likely to cast suspicion on many families who are not harming their children, including children and young people with disabilities and illnesses that are undiagnosed. FII is an accepted very rare condition. Gullon-Scott and Long estimated between 53-376 cases in the UK and that the RCPCH guidelines lead to an extraordinary number of false positives. But it is the human story that we need to remember. One mother with a child with long Covid and who experienced unfounded safeguarding proceedings said: Hearing that I was abusing my daughter because she was using a wheelchair due to fatigue and pain, and doing my own research on how to manage her symptoms after being repeatedly being gaslit by medical professionals, was devastating for both myself and my daughter. After constant fighting for care and support for her for over 18 months, I was broken and my daughter lost all remaining trust in any medical professionals, her school and all professionals The trauma for families cannot be understated. It’s more complicated than saying those with MUS and PP are mentally ill. We can see from this presentation by professor Stokes that there is an assumption of a mix of physical, psychological, and behavioural factors and how embedded the culture of patient blame is in medicine. However, this is an old, old trope – one that has delayed treatment and research for decades, especially for women, due to the prejudice that their emotional lives are caught up in their biological illnesses. In many ways, it is a perfect storm. The reaction to the pandemic through educational government policy has been to zealously focus on attendance. A narrative of parent blame and anxiety has been perpetuated, so the foundations do not have to be fixed. Children in schools remain unprotected from airborne illness. The updated (September 2024) statutory government guidance, ‘Keeping Children Safe in Education’ changed the phrase ‘deliberately missing’ education to ‘unexplainable and/or persistent absence’ in relation to safeguarding. It is another pressure point where, in the omission of policy on long Covid, there is an increased risk of unsubstantiated child protection. Schools are now expected to inform social workers for any unexplained absence. There is a dearth of high-quality research, medical education, and specialism in long Covid. Yet we have a situation where it’s easier to point the finger at families, at women. We become the scapegoats, a simple answer for a complex problem – all to avoid humility in the face of responsibility. It is clear that we need action on multiple levels: a thorough change of medical literature and guidance to close loopholes; high-quality paediatric research, and a drastic shift in culture. We live in a connected world where politics, medicine, and social forces collide. We are more likely to experience pandemics because of the destruction of the environment. Yet the political response to climate breakdown and pandemic fallout has been a rise in populism. It supplies simple answers, shifts the blame game on to the individual – all to avoid institutional due diligence. It is no wonder that women have been caught in the crosshairs. We have become an easy target for complex problems.",Canary,2025-01-19T12:51:45+00:00,Analysis,Polarised,https://www.thecanary.co/uk/analysis/2025/01/19/fii-long-covid/
4,Starmer ends a disastrous week with his most ridiculous pledge yet,"Prime minister Keir Starmer has had another bad week. While pretty much all of his weeks in power have gone badly, they haven’t all ended with a punchline. That’s because this week closed with the struggling Starmer declaring his intention to remain as prime minister for 10 years. 10 years! The man hasn’t finished one year yet, and as things stand a majority of Britons hope that he doesn’t: The biggest calamity for Starmer this week was also a welcome yet temporary reprieve for sick and disabled people. As Rachel Charlton-Dailey wrote for the Canary: On Thursday 16 January, a high court judge ruled what disabled people already knew, the Department for Work and Pensions (DWP) had acted unlawfully in their controversial plans to reform the Work Capability Assessment (WCA). However, the DWP has also revealed that the Labour Party government is planning on going ahead with the WCA changes the Tories first tabled – but it will run another consultation, first. Disabled activist and all-round legend Ellen Clifford took the DWP to court over a consultation which ran for just eight weeks in 2023. The department later used the evidence gathered from the consultation to make changes to the WCA, and who qualified as Limited Capability for Work Related Activity (LCWRA) – which are due to come into effect this year. If they come in, the plans would see over 400,000 disabled people lose around £416 a month, something which could lead to countless deaths – this is on top of the god knows how many deaths the WCA has already caused. The real sting in the tail to this story was how Starmer reacted to the court loss, with Starmer going out of his way to impress the rancid British tabloids, as the Sun gleefully reported: Asked if he had the “stomach or the balls” to take on his squeamish MPs who have railed against benefits cuts for years, the PM insisted: “Yes. I love fights. I had to fight to get the leadership of the Labour Party, had to fight to win the election.” Some like Raphael Dogg ridiculed the idea that Starmer is a ‘fighter’: While Starmer may be winning friends at the Sun – a newspaper which has been in decline for years – he’s not winning support from the electorate: Oh, and he’s also not really winning friends at the Sun: Starmer’s war on sick and disabled people contrasted poorly with comments he made on Ukraine: As many have pointed out, Starmer is happy to pledge unwavering support to an endless foreign conflict while simultaneously cutting DWP funds in his war against Britain’s disabled population: And now for the punchline. Politico interviewed Starmer during his recent trip to Ukraine. Reportedly, Starmer told them: “We are now, what, four and a half years before the next election,” Starmer said, sitting at a table for dinner in a traditional Ukrainian restaurant in Kyiv. “I remind myself that four and a half years ago, Boris Johnson was prime minister with very high ratings and most commentators were saying he’s going to be prime minister for the next 10 years. So I am a great believer in taking each step as it comes, facing each challenge as it comes, keeping my eye on the long term and not getting distracted by the noises off.” We’re sure he does remind himself of that; we’re sure he reminds himself every day given the disastrous polling and openly-expressed disgust of the public. Just look: And as our favourite analyst on Twitter also pointed out: As James Foster rightly notes, Starmer became leader of the Labour Party by lying. He then became prime minister of the country by lying again: The problem with lying to take power is that you can’t keep getting away with it. There are some who will happily eat up the lies; they’re just not doing a good job of making those lies palatable to anyone else: Starmer doesn’t understand that while you can get away with telling people ‘I will deliver’; you can’t get away with delivering people shit once you’re in power. Not as a Labour politician, anyway. He looks to Boris Johnson, but Johnson and his Tory Party had most of the media backing them up. Starmer has no one on his side except Keir Starmer and the soulless toadies in his cabinet – the least convincing people in the country. So, is it genuinely ridiculous for Starmer to think he can win another election? The short answer is ‘yes’; the long answer is ‘ha ha ha ha ha, yes’. This is what it looks like when the PM walks around the UK right now: And this is what the public is telling him: Starmer is the worst leader the Labour Party has ever produced – a vacuous non-entity who exists solely to siphon money from public services to private bank accounts. We don’t know how long he’ll last as prime minister, but we do know there’ll be serious talk of him going if this year’s local elections go as badly as people are predicting.",Canary,2025-01-19T11:34:29+00:00,Analysis,Polarised,https://www.thecanary.co/uk/analysis/2025/01/19/starmer-new-policy/


**Breitbart**

Articles have been scrapped from the News section in reverse chronological order: https://www.breitbart.com/news/source/breitbart-news/

In [None]:
urls= ['https://www.breitbart.com/2024-election/2025/01/19/trump-makes-triumphant-crowd-entrance-at-inaugural-eve-maga-rally/',
      'https://www.breitbart.com/politics/2025/01/19/supporters-gather-for-trumps-inauguration-eve-victory-rally/',
      'https://www.breitbart.com/law-and-order/2025/01/19/biden-grants-posthumous-pardon-to-black-nationalist-marcus-garvey/',
      'https://www.breitbart.com/politics/2025/01/19/trump-swear-in-with-personal-bible-and-lincoln-bible-during-inauguration/',
      'https://www.breitbart.com/politics/2025/01/19/poll-most-americans-one-word-summary-president-joe-bidens-legacy-nothing/',
      'https://www.breitbart.com/politics/2025/01/19/exclusive-tim-scott-releases-video-highlighting-american-renewal-ahead-trump-inauguration/',
      'https://www.breitbart.com/tech/2025/01/19/tiktok-restores-services-in-u-s-after-trump-says-he-will-issue-executive-order-delaying-ban/',
      'https://www.breitbart.com/politics/2025/01/19/jon-voight-matt-boyle-james-okeefe-tribute-andrew-breitbart-patriot-awards/',
      'https://www.breitbart.com/politics/2025/01/19/poll-most-americans-have-negative-view-outgoing-president-joe-bidens-time-office/',
      'https://www.breitbart.com/entertainment/2025/01/19/actor-cnn-show-host-michael-ian-black-wants-to-fuing-impeach-trump-before-hes-sworn-in/',
      'https://www.breitbart.com/clips/2025/01/19/kaine-claims-trumps-proposed-deportations-will-be-gut-punch-to-economy/',
      'https://www.breitbart.com/clips/2025/01/19/graham-to-cbs-worry-about-reporting-the-news-fairly-which-you-dont-do-when-it-comes-to-everything-trump/',
      'https://www.breitbart.com/entertainment/2025/01/19/michael-rapaport-celebrates-the-demise-of-dirty-biased-damn-near-soft-porn-dumphole-tiktok/']

### 3. Satire
Satirical content is intended to entertain or provoke thought through humor, exaggeration, or irony. Satire is often misunderstood as factual. 

##### Features:

- Humourous or Exaggerated Tone: Content is typically marked by wit, parody, or absurdity.
- Intentional Ridiculousness: The story is meant to be funny, not factual; outlandish claims serve comedic purposes.

##### Label If:

- The piece’s goal is clearly comedic or parodic, rather than deceptive.
- The tone, language, or disclaimers indicate it’s intentionally satirical.

##### Do Not Label If:

- The piece uses humour but is still intended to mislead (label as Fabricated Content).
- The piece is comedic but still pushing a heavily skewed narrative as if it’s true (label as Polarised Content).

##### Sources:
- The Onion (USA - 55 articles)
- Babylon Bee (USA - 50 articles)
- The Daily Squib (UK - 45 articles)
- Waterford Whispers (IE - 50 articles)


**The Onion**

The articles scraped are the ones featured on the 2024 "Annual Year" post found here: https://theonion.com/our-annual-year-2024/ - the top 5 from each month have been chosen (image posts have been excluded as per scope), so a total of 55 articles a December has not been included in their roundup.

In [2]:
def scrape_onion_article(url):
    """
    Scrapes an article from a given URL on theonion.com and extracts relevant information.

    Parameters:
    ----------
    url : str
        The URL of the article to scrape.

    Returns:
    -------
    dict
        A dictionary containing the extracted article data.
    """
    article_data = {
        "title": "",
        "text": "",
        "site": "",
        "date": "",
        "category": "",
        "class": "Satire", #satire is hardcoded here as we know TheOnion is a satire site
        "url": url
    }

    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')

        # Title
        title_meta = soup.find('meta', property='og:title')
        article_data["title"] = title_meta['content'] if title_meta else "Title not found"
        
        # URL
        url_meta = soup.find('meta', property='og:url')
        article_data["url"] = url_meta['content'] if url_meta else url  # Fallback to input URL
        
        # Site name
        site_name_meta = soup.find('meta', property='og:site_name')
        article_data["site"] = site_name_meta['content'] if site_name_meta else "Site name not found"
        
        # Published date
        published_date_meta = soup.find('meta', property='article:published_time')
        article_data["date"] = published_date_meta['content'] if published_date_meta else "Published date not found"
        
        # Category
        category_element = soup.find('div', class_='taxonomy-category')
        category_link = category_element.find('a') if category_element else None
        article_data["category"] = category_link.text.strip() if category_link else "Category not found"
        
        # Article copy
        content_div = soup.find(
            "div",
            {"class": lambda x: x and "entry-content" in x and "single-post-content" in x}
        )
        if content_div:
            paragraphs = content_div.find_all("p")
            full_text = " ".join(p.get_text(strip=True) for p in paragraphs)
            article_data["text"] = full_text
        else:
            article_data["text"] = "Article text not found"
    
    
    else:
        print(f"Failed to fetch the webpage: {url}. Status code: {response.status_code}")
    
    return article_data


def scrape_multiple_onion_articles(urls):
    """
    Scrapes multiple articles from a list of URLs and stores the data in a DataFrame.

    Parameters:
    ----------
    urls : list
        A list of article URLs to scrape.

    Returns:
    -------
    pd.DataFrame
        A DataFrame containing the scraped data from all URLs.
    """
    articles = []
    for url in urls:
        article = scrape_onion_article(url)
        articles.append(article)
    return pd.DataFrame(articles)


# List of URLs to scrape
urls = [
    #January
    "https://theonion.com/biden-addresses-nation-while-hanging-from-branch-on-sid-1851106795/",
    "https://theonion.com/marriage-counselor-sides-with-hotter-spouse-1851143488/",
    "https://theonion.com/wealthy-dad-surprises-child-with-tree-house-he-can-airb-1851112919/",
    "https://theonion.com/glowing-pulsating-hair-product-takes-control-of-gavin-1851160421/",
    "https://theonion.com/gen-z-announces-julie-andrews-is-problematic-but-refuse-1851180352/",
    #February
    "https://theonion.com/mrbeast-announces-he-has-resurrected-everyone-buried-at-1851217565/",
    "https://theonion.com/introverted-cowboy-struggling-to-round-up-posse-1851226175/",
    "https://theonion.com/country-stations-refuse-to-play-beyonce-s-music-after-a-1851261135/",
    "https://theonion.com/stab-him-stab-him-you-cowards-says-terrified-kamal-1851243467/",
    "https://theonion.com/emerging-filmmaker-malia-obama-changes-surname-to-scors-1851278946/",
    #March
    "https://theonion.com/u-s-airdrops-rubble-into-gaza-1851305713/",
    "https://theonion.com/ozempic-maker-triumphantly-announces-new-drug-that-make-1851320436/",
    "https://theonion.com/study-millennial-women-forgoing-dating-apps-in-favor-o-1851338275/",
    "https://theonion.com/beyonce-reveals-new-country-album-cover-featuring-tooth-1851355991/",
    "https://theonion.com/but-dog-likes-fighting-for-money-1851352386/",
    #April
    "https://theonion.com/finance-whiz-has-over-300-in-bank-account-1851375065/",
    "https://theonion.com/sotheby-s-announces-auction-of-napkin-on-which-jeffrey-1851375213/",
    "https://theonion.com/o-j-simpson-allowed-to-remain-living-after-coffin-does-1851403804/",
    "https://theonion.com/travis-kelce-impresses-coachella-crowd-by-tossing-taylo-1851410856/",
    "https://theonion.com/biden-carried-away-by-ants-1851422363/",
    #May
    "https://theonion.com/tesla-lays-off-entire-team-behind-brakes-1851449223/",
    "https://theonion.com/drake-drops-new-track-inviting-kendrick-lamar-out-to-co-1851458534/",
    "https://theonion.com/perdue-announces-initiative-to-even-the-playing-field-b-1851423157/",
    "https://theonion.com/new-florida-law-requires-all-women-to-produce-3-healthy-1851482288/",
    "https://theonion.com/everyone-in-er-bit-off-finger-while-holding-sandwich-1851488798/",
    #June
    "https://theonion.com/cult-leader-not-even-charismatic-1851512851/",
    "https://theonion.com/embarrassed-david-attenborough-realizes-he-spent-10-min-1851512951/",
    "https://theonion.com/newest-u-s-aid-mission-just-single-powerbar-labeled-f-1851540802/",
    "https://theonion.com/report-every-place-on-earth-has-wrong-amount-of-water-1851544516/",
    "https://theonion.com/nasa-warns-space-hawk-has-swooped-in-and-picked-up-eart-1851544578/",
    #July
    "https://theonion.com/clarence-thomas-torn-over-case-where-both-sides-offer-c-1851566812/",
    "https://theonion.com/democrats-panic-after-kamala-harris-ages-40-years-in-si-1851601473/",
    "https://theonion.com/congress-bans-roofs-1851592883/",
    "https://theonion.com/news-happening-faster-than-man-can-generate-uninformed-1851601466/",
    "https://theonion.com/god-forced-to-shave-head-after-contracting-plague-of-li-1851580149/",
    #August
    "https://theonion.com/environmentalists-warn-u-s-running-out-of-small-wooded-1851609190/",
    "https://theonion.com/r-kelly-petitions-supreme-court-to-watch-him-pee-1851619802rev1723482404693/",
    "https://theonion.com/federated-union-of-bear-cub-carcass-dumpers-endorses-rf-1851613425/",
    "https://theonion.com/glen-powell-opens-up-about-dangerous-stunt-work-filming-with-sydney-sweeneys-breasts/",
    "https://theonion.com/j-d-vance-accuses-tim-walz-of-stolen-valor-for-wearing-1851621120/",
    #September
    "https://theonion.com/everyone-in-restaurant-jealous-of-toddler-who-gets-to-wear-pajamas-and-watch-ipad/",
    "https://theonion.com/horrified-taylor-swift-realizes-football-happens-every-year/",
    "https://theonion.com/trump-avoids-answering-hard-questions-by-pretending-he-shot-in-ear-again/",
    "https://theonion.com/man-replies-stop-to-political-fundraiser-text-like-powerful-wizard-casting-spell-to-ward-off-mythical-beast/",
    "https://theonion.com/scarecrow-has-double-ds/",
    #October
    "https://theonion.com/the-onion-officially-endorses-joe-biden-for-president/",
    "https://theonion.com/texas-sex-ed-class-teaches-boys-how-to-cheat-on-pregnant-wife/",
    "https://theonion.com/sabrina-carpenter-completes-mandatory-service-in-south-korean-military/",
    "https://theonion.com/north-carolina-family-informed-their-insurance-policy-voided-once-house-gets-wet/",
    "https://theonion.com/grandma-who-survived-great-depression-casually-drops-that-she-once-killed-man-for-mayonnaise/",
    #November
    "https://theonion.com/piss-soaked-tucker-carlson-claims-demon-urinated-on-him-while-he-slept/",
    "https://theonion.com/trump-calls-harris-to-congratulate-himself-on-winning/",
    "https://theonion.com/america-defeats-america/",
    "https://theonion.com/man-forgetting-difference-between-meteoroid-meteorite-struggles-to-describe-what-just-killed-his-dog/",
    "https://theonion.com/every-movement-in-mans-burrito-eating-technique-informed-by-past-burrito-tragedies/"
]

# Scrape articles and create a DataFrame
onion_data_df = scrape_multiple_onion_articles(urls)
# Store to CSV
onion_data_df.to_csv("satire_scraped_articles_onion.csv", index=False)
# Print head 
onion_data_df.head()

Unnamed: 0,title,text,site,date,category,class,url
0,Biden Addresses Nation While Hanging From Branch On Side Of Cliff,"WASHINGTON—Using his platform to plead for Americans to lend him a hand, President Joe Biden addressed the nation Monday while hanging from a branch on the side of a cliff. “Our democracy has never before hung in the balance more than it has at this moment when I am in danger of plummeting 50 feet to those sharp rocks below,” said Biden, who implored the U.S. populace to set aside its differences and find a long stick, a rope, or, preferably, a helicopter that they could use to return him to stable ground. “What’s important is not what led us to this point, but rather how we choose to move forward in helping me back up. Even a carefully placed mattress or pile of sofa cushions would do. My fellow Americans, I urge you to act fast, as a small bird has landed on my head and is now pecking at me.” At press time, a Gallup Poll had found that 70% of Americans opposed Biden being rescued.",The Onion,2024-01-01T11:45:00+00:00,Politics,Satire,https://theonion.com/biden-addresses-nation-while-hanging-from-branch-on-sid-1851106795/
1,Marriage Counselor Sides With Hotter Spouse,"ANCHORAGE, AK—Stating that she had heard both perspectives and could understand their frustrations, marriage counselor Laurie Hartford reportedly told couple David and Julia Carter that she ultimately had to side with the hotter spouse. “So, I’ve listened to everything you’ve had to say, and I’ve come to the conclusion that while David does seem to be emotionally withholding, he’s also at least two points hotter,” said the therapist, who rushed to note that, in all fairness, she needed to take into consideration that she would at best describe the female half of the relationship “as, like, a six even on her best day.” “I’ve spent hours listening to you pour out your hearts and that’s never easy, so pat yourselves on the back. But, frankly, only one of you has bothered to comb your hair or put on a nice shirt at these sessions. I’m not in any way trying to invalidate your experiences. All I’m saying is that only one of you—David—has an ass that you could bounce a quarter off, and the other one is kind of an uggo, if that makes sense?” Hartford went on to say that it might be helpful if Julia stayed at home for their next sessions so that they could spend more time understanding where, exactly, David’s hotness came from.",The Onion,2024-01-09T11:30:00+00:00,Local,Satire,https://theonion.com/marriage-counselor-sides-with-hotter-spouse-1851143488/
2,Wealthy Dad Surprises Child With Tree House He Can Airbnb For Passive Income,"WILMETTE, IL—Telling the child not to peek as they walked into the backyard, local wealthy man Kenneth Schweitz reportedly surprised his son Tuesday with a tree house that the young boy could Airbnb for passive income. “It’s time you got your own little space that can be rented out for short-term stays and used to produce a reliable revenue stream,” a visibly excited Schweitz said as he took his hands off his son’s eyes to reveal the fully appointed structure built into the tree’s branches, stressing to the boy that he would not have to do any real work for the lodging to generate substantial returns. “Your mom and I can help you decorate it, but then it’s all up to you to decide how much to charge per night and which cleaning service to hire, bud. After that, you can sit back and collect thousands of dollars a month. How cool is that? You and your little friends are going to have so much fun building your little real estate empire. Enjoy!” At press time, sources reported Schweitz’s son was enthusiastically climbing into the tree house to serve an eviction notice to the low-income family currently living there.",The Onion,2024-01-09T17:30:00+00:00,Local,Satire,https://theonion.com/wealthy-dad-surprises-child-with-tree-house-he-can-airb-1851112919/
3,"Glowing, Pulsating Hair Product Takes Control Of Gavin Newsom’s Thoughts","SACRAMENTO, CA—As an otherworldly glow emanated from the California governor’s meticulously sculpted coiffure, sources confirmed Friday that the pulsating hair product on Gavin Newsom’s head had taken control of his thoughts. “There will be no bills signed, no presidential campaign—there will only be hair,” said the disembodied voice emanating from the greasy, slicked-back mass atop Newsom’s skull, his hair reportedly growing into thick, powerful tendrils long enough to choke out his political opponents anywhere they might try to hide in the State Capitol. “There will be no clemency for those who refuse to succumb to the wet and shiny hair. With these mighty strands, I command the wildfires and the earthquakes, the droughts and the floods!” At press time, sources confirmed Newsom’s hair product had evicted several homeless people seeking shelter within the throbbing gelatinous nest upon his head.",The Onion,2024-01-19T17:45:00+00:00,Politics,Satire,https://theonion.com/glowing-pulsating-hair-product-takes-control-of-gavin-1851160421/
4,Gen Z Announces Julie Andrews Is Problematic But Refuses To Explain Why,"​​NEW YORK—Standing before a crowd of millennials, Gen Xers, and baby boomers, members of Generation Z announced at a press conference Wednesday that actress Julie Andrews was problematic, but they refused to explain why. “You know what she did—you just don’t want to admit it,” said Gen Z spokesperson Taylor Collaco, who rolled her eyes in response to requests from those who wanted to know what exactly theSound Of Musicstar had said or done to have earned the ostracism of millions of Americans ages 12 to 27. “Yes, that Julie Andrews. Has she been so normalized that you can’t even see it? Yikes. Oh, come on, it’s not my job to educate you.” At press time, Gen Z had dropped a hint that it had something to do with the Genovian monarchy.",The Onion,2024-01-24T13:44:00+00:00,Entertainment,Satire,https://theonion.com/gen-z-announces-julie-andrews-is-problematic-but-refuse-1851180352/
5,MrBeast Announces He Has Resurrected Everyone Buried At Arlington National Cemetery,"GREENVILLE, NC—Telling viewers of his latest charitable video to prepare themselves for his “most epic challenge yet,” 25-year-old influencer Jimmy “MrBeast” Donaldson announced Friday that he had resurrected everyone buried at Arlington National Cemetery. “You might not know this, but sadly, over 400,000 of our nation’s most decorated veterans have to spend eternity dead and underground,” the content creator says in the video, explaining that he has found a scientist willing to help him reanimate the dead and has spent millions of dollars to pump 75,000 volts of electricity into each plot of the hallowed cemetery. “Thanks to this amazing procedure, every single one of these deserving corpses has crawled out of the ground and begun roaming the earth again, totally free of charge to them and their families. They’ll also receive $10,000 in cash. Frankly, it’s a tragedy we’ve let these American heroes rot in their graves for this long.” Reacting to a moment in the video when MrBeast gives a reanimated World War II veteran a Lamborghini, viewers expressed outrage toward the apparent ingratitude of the undead soldier, who only screams, “Why did you do this? Kill me…kill me!” into the camera.",The Onion,2024-02-02T13:24:00+00:00,News,Satire,https://theonion.com/mrbeast-announces-he-has-resurrected-everyone-buried-at-1851217565/
6,Introverted Cowboy Struggling To Round Up Posse,"BANDERA, TX—Admitting that he was actually a lot more shy and reserved than folks might think, introverted cowboy Cassidy Walsh sheepishly told reporters Friday that he’d been struggling lately to round up a posse. “While I might seem confident and outgoing at times, the truth is, I’m the sort of feller who needs to recharge at the end of a long day ridin’ the range with a bunch of cowhands,” said Walsh, adding that he also experienced “a might fair bit of social anxiety” that probably stemmed from a fear his attempts to organize a posse would end in rejection. “Don’t get me wrong, I enjoy spending time with ol’ buckaroos like myself. It’s just that your pal Cassidy can only handle so much hootin’ and hollerin’ before he plumb runs out of steam. Now if you’ll excuse me, I’m gonna kick up my spurs and snuggle up in my bedroll with a Louis L’Amour novel.” At press time, another successful train robbery had reportedly been carried out in the area by a tireless gang of extroverted outlaws.",The Onion,2024-02-06T15:16:00+00:00,Local,Satire,https://theonion.com/introverted-cowboy-struggling-to-round-up-posse-1851226175/
7,Country Stations Refuse To Play Beyoncé’s Music After Artist Condemns Iraq War,"HOUSTON—Calling the popular musician traitorous for failing to support President George W. Bush in a time of crisis, thousands of country stations across America reportedly refused to play Beyoncé’s music Thursday after the artist condemned the Iraq War. “If she doesn’t want to support our troops risking their lives out there for the cause of freedom, then we don’t need her,” said country radio executive Hunter Roeloffs, one of many station owners who blacklisted the recent singles “Texas Hold ’Em” and “16 Carriages” after controversial comments in which the star expressed reservations about the U.S.-led Coalition invasion of Iraq—remarks that also led to a reported drop in ticket sales and Beyoncé losing a sponsorship deal with Lipton. “Unlike Miss Knowles, we’re proud Americans here at 100.3 the Bull. We support freedom, whether it’s here or in the Middle East. So when she says innocent lives will be lost, I can’t help but wonder how she could possibly think a bloodthirsty dictator like Saddam Hussein is innocent. And then there’s that line of hers about being ashamed of President Bush? Well, we’re ashamed of her. How about that?” At press time, Beyoncé had attracted additional criticism from the country music scene after rebranding herself as the Chicks.",The Onion,2024-02-15T19:00:00+00:00,Entertainment,Satire,https://theonion.com/country-stations-refuse-to-play-beyonce-s-music-after-a-1851261135/
8,"‘Stab Him! Stab Him, You Cowards!’ Says Terrified Kamala Harris To Aides After Plunging First Knife Into Biden’s Back","WASHINGTON—Moments after pulling shut the door to the Roosevelt Room and locking it behind her, a terrified Vice President Kamala Harris reportedly told aides to “Stab him! Stab him, you cowards!” on Friday after she plunged a knife into President Joe Biden’s back. “What are you waiting for, you fools? Strike now! Strike before the opportunity goes cold!” said the blood-dappled vice president, who, as her staff appeared to grow uncertain of the blades in their shaking hands and backed away toward the exit, reminded each panicked aide in turn that they had pledged their fealty for this day. “Think of all I’ve promised you. Think of all we stand to gain. Quick, now, the first blow has been rendered. There is no going back. We’re confederates in this. We must act now or be damned by inaction!” At press time, sources confirmed President Biden had complained to an assistant of a tightness in his shoulder and returned to the Oval Office with the knife still protruding from his back.",The Onion,2024-02-16T11:15:00+00:00,Politics,Satire,https://theonion.com/stab-him-stab-him-you-cowards-says-terrified-kamal-1851243467/
9,Emerging Filmmaker Malia Obama Changes Surname To Scorsese,"PARK CITY, UT—Noting that she did not want her parents’ fame to distract from her Sundance premiere, industry sources confirmed Thursday that emerging filmmaker Malia Obama had changed her surname to ‘Scorsese.’ “Although her legal name is still Obama, Malia is officially promoting her short filmThe Heartunder the pseudonym Malia Martin Scorsese,” said Sundance spokesperson Shelby Fleming, adding that the 25-year-old had been using the more neutral, nondescript moniker since writing for Donald Glover’s television seriesSwarm. “When people see the last name Scorsese, they don’t see the daughter of a former president. They see a blank slate. She’s hopeful this slight change will help people take her art much more seriously.” At press time, Obama announced that her next film would be a gritty portrait of 1970s Little Italy titledMean Streets.",The Onion,2024-02-22T18:40:00+00:00,Entertainment,Satire,https://theonion.com/emerging-filmmaker-malia-obama-changes-surname-to-scors-1851278946/


**Babylon Bee**

The top 50 articles from the Greatest Hits page (https://babylonbee.com/news?sort=greatest-hits) have been scraped. The categories "Christian Living" and "Scripture" were excluded for being too niche. 


In [3]:
urls = [
    "https://babylonbee.com/news/trump-i-have-done-more-for-christianity-than-jesus",
    "https://babylonbee.com/news/senate-to-be-replaced-with-room-full-of-monkeys-throwing-feces",
    "https://babylonbee.com/news/motorcycle-that-identifies-as-bicycle-sets-world-cycling-record",
    "https://babylonbee.com/news/trumps-says-5-golden-tickets-to-be-hidden-among-stimulus-checks",
    "https://babylonbee.com/news/nfl-to-adorn-all-uniforms-with-lace-doilies-in-to-honor-rbg",
    "https://babylonbee.com/news/pelosi-rips-up-bible",
    "https://babylonbee.com/news/biden-cuts-holes-in-medical-mask-so-he-can-still-sniff-people",
    "https://babylonbee.com/news/man-identifying-6-year-old-crushes-game-winning-homer-tee-ball-championship",
    "https://babylonbee.com/news/biden-i-am-the-only-candidate-who-can-beat-ronald-reagan",
    "https://babylonbee.com/news/fisher-price-introduces-supreme-court-protest-playhouse-that-can-be-vandalized-and-burned-down",
    "https://babylonbee.com/news/cracker-jacks-changes-name-to-more-politically-correct-caucasian-jacks",
    "https://babylonbee.com/news/cdc-people-dirt-clintons-843-greater-risk-suicide",
    "https://babylonbee.com/news/walmart-requiring-all-shoppers-to-wear-pants",
    "https://babylonbee.com/news/ilhan-omar-withdraws-support-from-bill-to-save-the-earth-after-learning-thats-where-israel-is",
    "https://babylonbee.com/news/inspiring-celebrities-spell-out-were-all-in-this-together-with-their-yachts",
    "https://babylonbee.com/news/democrats-warn-that-american-people-may-tamper-with-next-election",
    "https://babylonbee.com/news/people-who-tweet-in-support-of-foreign-wars-to-be-automatically-enlisted-in-armed-forces",
    "https://babylonbee.com/news/bernie-sanders-praises-china-for-eradicating-poverty-by-killing-all-the-poor-people",
    "https://babylonbee.com/news/pence-cancels-general-election-to-stymie-coronavirus",
    "https://babylonbee.com/news/walmart-discontinues-sale-of-auto-parts-to-prevent-car-accidents",
    "https://babylonbee.com/news/federal-prison-hires-top-rated-italian-bodyguard-hillena-clintonelli-to-protect-ghislaine-maxwell",
    "https://babylonbee.com/news/kim-jong-un-attends-ivy-league-university-to-learn-new-brainwashing-techniques",
    "https://babylonbee.com/news/florida-recount-finally-wraps-up-al-gore-declared-president",
    "https://babylonbee.com/news/powerful-protesters-spell-out-love-with-burning-homes-and-businesses",
    "https://babylonbee.com/news/joel-osteen-tests-positive-for-heresy",
    "https://babylonbee.com/news/caravan-of-liberal-americans-makes-way-toward-socialist-paradise-of-venezuela",
    "https://babylonbee.com/news/in-genius-move-trump-supports-impeachment-forcing-democrats-to-oppose-it",
    "https://babylonbee.com/news/cnn-publishes-real-news-story-for-april-fools-day",
    "https://babylonbee.com/news/government-accidentally-shuts-itself-down-with-ban-on-non-essential-businesses",
    "https://babylonbee.com/news/wife-unaware-that-movie-will-answer-all-her-questions-if-she-just-pays-attention",
    "https://babylonbee.com/news/bernie-sanders-arrives-in-hong-kong-to-lecture-protesters-on-how-good-they-have-it-under-communism",
    "https://babylonbee.com/news/jussie-smollett-offered-job-at-cnn-after-fabricating-news-story-out-of-thin-air",
    "https://babylonbee.com/news/portland-police-wish-there-were-some-kind-of-organized-armed-force-that-could-fight-back-against-antifa",
    "https://babylonbee.com/news/to-celebrate-move-to-texas-tesla-introduces-battery-powered-ar-15",
    "https://babylonbee.com/news/genius-trump-nominates-joe-biden-to-supreme-court",
    "https://babylonbee.com/news/hillary-clinton-accidentally-posts-condolences-for-tulsi-gabbards-suicide-one-day-early",
    "https://babylonbee.com/news/twitter-shuts-down-entire-network-to-slow-spread-of-negative-biden-news",
    "https://babylonbee.com/news/celebrities-show-solidarity-with-protesters-by-burning-their-own-homes-to-the-ground",
    "https://babylonbee.com/news/lego-introduces-new-sharper-bricks-that-instantly-kill-you-when-you-step-on-them",
    "https://babylonbee.com/news/democrats-call-for-flags-to-be-flown-half-mast-to-grieve-death-of-soleimani",
    "https://babylonbee.com/news/californians-brace-for-deadly-50-degree-cold-front",
    "https://babylonbee.com/news/brilliant-trump-puts-himself-on-all-postage-stamps-forcing-democrats-to-abolish-the-usps",
    "https://babylonbee.com/news/nations-nerds-wake-up-in-utopia-where-everyone-stays-inside-sports-canceled-social-interaction-forbidden",
    "https://babylonbee.com/news/hollywood-rushes-to-make-pedophilia-acceptable-before-theyre-outed-by-ghislaine-maxwell",
    "https://babylonbee.com/news/as-part-of-settlement-with-nick-sandmann-cnn-hosts-must-wear-maga-hats-while-on-the-air",
    "https://babylonbee.com/news/biden-campaign-says-he-is-so-close-to-a-vp-pick-he-can-smell-her",
    "https://babylonbee.com/news/trump-says-to-drink-lots-of-water-media-reports-as-deranged-trump-tells-everyone-to-drown-themselves",
    "https://babylonbee.com/news/starbucks-unveils-new-satanic-holiday-cups",
    "https://babylonbee.com/news/bill-clinton-allegations-of-sexual-misconduct-should-disqualify-a-man-from-public-office",
    "https://babylonbee.com/news/joel-osteen-launches-line-pastoral-wear-sheeps-clothing"
]

def scrape_bee_article(url):
    """
    Scrapes an article from a given URL on babylonbee.com and extracts relevant information.

    Parameters:
    ----------
    url : str
        The URL of the article to scrape.

    Returns:
    -------
    dict
        A dictionary containing the extracted article data.
    """
    article_data = {
        "title": "",
        "text": "",
        "site": "",
        "date": "",
        "category": "",
        "class": "Satire", #satire is hardcoded here as we know BabylonBee is a satire site
        "url": url
    }

    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')

        # Title
        title_meta = soup.find('meta', property='og:title')
        article_data["title"] = title_meta['content'] if title_meta else "Title not found"
        
        # URL
        url_meta = soup.find('meta', property='og:url')
        article_data["url"] = url_meta['content'] if url_meta else url  # Fallback to input URL
        
        # Site name
        site_name_meta = soup.find('meta', property='og:site_name')
        article_data["site"] = site_name_meta['content'] if site_name_meta else "Site name not found"
        
        # Published date       
        published_date_meta = soup.find('meta', {"name": "published_at"})
        if published_date_meta and published_date_meta.get("content"):
            article_data["date"] = published_date_meta["content"].split()[0]
        else: "Published date not found"
        
        # Category
        category_link = soup.find("a", href=lambda href: href and "/news/categories/" in href)
        if category_link:
            article_data["category"] = category_link.get_text(strip=True)
        else:
            article_data["category"] = "Category not found"
            
        # Article copy
        content_div = soup.find("div", class_="text-lg mt-6 leading-6 text-gray-700 article-content mx-2 sm:mx-0")
        if content_div:
            paragraphs = content_div.find_all("p")
            full_text = " ".join(p.get_text(strip=False) for p in paragraphs)
            article_data["text"] = full_text.strip()
        else:
            article_data["text"] = "Article text not found"
    
    else:
        print(f"Failed to fetch the webpage: {url}. Status code: {response.status_code}")
    
    return article_data


def scrape_multiple_bee_articles(urls):
    """
    Scrapes multiple articles from a list of URLs and stores the data in a DataFrame.

    Parameters:
    ----------
    urls : list
        A list of article URLs to scrape.

    Returns:
    -------
    pd.DataFrame
        A DataFrame containing the scraped data from all URLs.
    """
    articles = []
    for url in urls:
        article = scrape_bee_article(url)
        articles.append(article)
    return pd.DataFrame(articles)

# Scrape articles and create a DataFrame
bee_data_df = scrape_multiple_bee_articles(urls)
# Store to CSV
bee_data_df.to_csv("satire_scraped_articles_bee.csv", index=False)
# Print df 
bee_data_df.head()

Unnamed: 0,title,text,site,date,category,class,url
0,Trump: 'I Have Done More For Christianity Than Jesus',"WASHINGTON, D.C. - In response to the Christianity Today editorial calling for his removal, Trump called the magazine a ""left-wing rag"" and said, ""I have done more for Christianity than Jesus."" ""I mean, the name of the magazine is Christianity Today, and who is doing more for Christians today? Not Jesus. He disappeared; no one knows what happened to him. But I'm out there every day protecting churches from crazy liberals."" While Trump admitted that Jesus did do some things for Christianity in the past, Trump said he was doing more now and it was more substantial. ""I'm appointing judges to help protect religious rights,"" Trump stated. ""How many judges has Jesus appointed? He says something about judging people in the future, but I ain't seen it."" Furthermore, Trump asserted that he ""saved Christmas."" ""Look what I've done,"" he said. ""You can say 'Merry Christmas' now. In fact, if you say 'Happy Holidays' and don't immediately make it clear you're referring to Christmas, you go to prison. What has Jesus ever done for Christmas? Be born? He wants credit for that? Come on.""",The Babylon Bee,2019-12-23,Politics,Satire,https://babylonbee.com/news/trump-i-have-done-more-for-christianity-than-jesus
1,Senate To Be Replaced With Room Full Of Monkeys Throwing Feces,"WASHINGTON, D.C. - In an emergency, overnight referendum, the American people voted on Thursday to replace the United States Senate with a room full of monkeys throwing feces. The measure passed with 57% of the vote. 22% of voters thought the Senate should be replaced by barking seals, while 17% voted that the replacement should be the pit of venomous snakes from Indiana Jones. 3.97% voted that Senate members be replaced by screaming goats. ""About 100 people"" voted for the current Senators to keep their jobs, with this tiny voting bloc centered in Washington, D.C. Highland Ape Rescue out of West Virginia will be teaming up with Cornwell Primate farms to supply hundreds of monkeys and apes to the Senate. The animals will be fed a nutritious mixture of foods that produce easily throwable feces. Protective glass will be put up around the Senate for camera crews to safely film, but anyone being interviewed by the new senators will have to sit in the middle of the poo-flinging octagon, coming under a heavy barrage of projectile excrement. ""It will be a huge improvement from how things were before,"" said ape trainer, Marlena Henwick. ""No more 10-12 hour hearings. With these monkeys, all the fecal projectiles will have been flung in under 30 minutes. One and done."" The recently replaced senators will be placed on display at the National Zoo in Washington, D.C. for families to observe and zoologists to study.",The Babylon Bee,2018-09-28,Politics,Satire,https://babylonbee.com/news/senate-to-be-replaced-with-room-full-of-monkeys-throwing-feces
2,Motorcyclist Who Identifies As Bicyclist Sets Cycling World Record,"NEW YORK, NY - In an inspiring story from the world of professional cycling, a motorcyclist who identifies as a bicyclist has crushed all the regular bicyclists, setting an unbelievable world record. In a local qualifying race for the World Road Cycling League, the motorcyclist crushed the previous 100-mile record of 3 hours, 13 minutes with his amazing new score of well under an hour. Professional motorcycle racer Judd E. Banner, the brave trans-vehicle rider, was allowed to race after he told league organizers he's always felt like a bicyclist in a motorcyclist's body. ""Look, my ride has handlebars, two wheels, and a seat,"" he told reporters as he accepted a trophy for his incredible time trial. ""Just because I've got a little extra hardware, such as an 1170-cc flat-twin engine with 110 horsepower, doesn't mean I have any kind of inherent advantage here."" Banner also said he painted the word ""HUFFY"" on the side of his bike, ensuring he has no advantage over the bikes that came out of the factory as bicycles. Some critics say he needs to cut off his motor in order to make the competition fairer, but he quickly called these people bigots, and they were immediately banned from professional cycle racing.",The Babylon Bee,2019-10-25,Sports,Satire,https://babylonbee.com/news/motorcycle-that-identifies-as-bicycle-sets-world-cycling-record
3,Trump Announces He Has Hidden 5 Golden Tickets Among Stimulus Checks,"WASHINGTON, D.C. - Trump has built up a lot of buzz over the coming stimulus payments, saying he has hidden five golden tickets among the checks heading to Americans this week. Anyone who gets a golden ticket will win a free tour of Mar-a-Lago. Rumor has it that Trump will be watching them closely to see which of the winners has the qualities he looks for in a manager, with the best candidate getting hired as Mar-a-Lago's onsite McDonald's manager. ""Who will win? Nobody knows!"" Trump said gleefully as he carefully signed each of the golden tickets before hiding them among the stimulus checks. ""I, Donald Trump, have decided to allow five Americans - just five, mind you, and no more - to visit my resort this year. These lucky five will be shown around personally by me, and they will be allowed to see all the secrets and the magic of my hotel and golf resort -- the best golf, maybe ever. Then, at the end of the tour, as a special present, all of them will be given Season 1 of The Apprentice on DVD!"" ""So watch out for the Golden Tickets! Five Golden Tickets have been printed on golden paper, and these five Golden Tickets have been hidden in your stimulus checks. These five may be anywhere - in any mailbox in the country. And the five lucky finders of these five Golden Tickets are the only ones who will be allowed to visit my Mar-a-Lago during the lockdown. Good luck to you all!"" Unfortunately, he put all five golden tickets in a stimulus envelope addressed to Jim Acosta.",The Babylon Bee,2020-04-15,Politics,Satire,https://babylonbee.com/news/trumps-says-5-golden-tickets-to-be-hidden-among-stimulus-checks
4,NBA Players Wear Special Lace Collars To Honor Ruth Bader Ginsburg,"LOS ANGELES, CA - NBA players are honoring the life of Ruth Bader Ginsburg this week by wearing pretty lace collars just like Notorious RBG used to wear. In a touching show of respect for the late Justice Ginsburg, and in solidarity with her progressive cause, Lebron James and the LA Lakers took to the court yesterday wearing a stunning variety of delicate white collars inspired by RBG's wardrobe. According to several commentators on ESPN, the virtual teleconference crowd fell silent in reverent awe as the players all knelt down and chanted ""RBG! RBG! RBG!"" ""Yeah, RBG was an amazing person,"" said LeBron James after the game. ""I have her biography right here and I totally read it right before the game. She was a judge. That's cool, I respect that. Judges judge things and not everyone can do that. She believed in Black Lives Matter and being on the right side of history and stuff."" Power forward Anthony Davis also expressed his happiness with the collars. ""It's good to honor her today with these lacey things. Commissioner Adam Silver and President Xi Jinping told us to wear them so we did. I just took this little doily thing from under a table lamp at my mom's house and cut a hole in the middle. Easy."" NBA players are vowing to wear the collars until Trump is removed from office, or until angry rioters burn their basketball arenas down, whichever comes first.",The Babylon Bee,2020-09-22,Politics,Satire,https://babylonbee.com/news/nfl-to-adorn-all-uniforms-with-lace-doilies-in-to-honor-rbg
5,"In Bold Anti-Trump Statement, Pelosi Rips Up Bible","WASHINGTON, D.C. - In a bold, powerful statement to oppose Trump, Speaker of the House Nancy Pelosi solemnly tore up the Bible after Trump was seen holding one up in front of a church. At a press conference, the Speaker of the House held up a Bible and then ripped it in two, declaring that she was against anything Trump was associated with. ""If Trump is for the Bible, then I am against it,"" she said as she struggled to rip the Bible in half. Finally, aides came to intervene, pre-ripping the spine of the Bible so it would be easier for her to tear. ""All the books of the Bible are bad: Genesis, Joseph, the one with the big fish, even Hezekiah. We must stand against Trump's bigotry by ripping up anything he claims to be for."" ""Yass, queen! Slay!"" shouted her fans at the press conference as she finally managed to rip the Bible up. ""You're my president!"" In a genius move, Trump then held up a Koran in front of a mosque, forcing Pelosi to tear up a Koran and alienate the left.",The Babylon Bee,2020-06-03,Politics,Satire,https://babylonbee.com/news/pelosi-rips-up-bible
6,Biden Cuts Hole In Mask So He Can Still Sniff People's Hair,"WASHINGTON, D.C. - Joe Biden has committed to wearing a mask in public to be a good example and to prevent the spread of COVID-19. Aides were disappointed and a little frightened, however, when Biden immediately cut a large hole in the middle of the mask so he could continue to invade people's personal space and sniff their hair, necks, and faces. Staffers usually don't let Biden play with sharp objects, but he managed to find some safety scissors stashed behind the Metamucil in his campaign bus. Using the purple plastic scissors, he cut a large hole and then fitted the mask to his face, confident that he was protecting himself and others from the virus. ""That's better,"" he said as he cut a big hole for his schnoz. ""Now I'm protecting against infection and I'm still able to give the ladies a good sniff. You know, in my day, I wore a mask just like this, as was the fashion at the time. All the kids at the pool would ask to play with the mask, and they'd run their fingers through it. In fact, one time, a gangster named CornPop was about to go cause some trouble at the sock hop, and I put some rocks in my mask and started swinging it around like a sling. You know, real Daniel and Goliath type stuff. He looked at me, tears in his eyes, and promised never again to go out and cause a ruckus."" ""Anyway, that's why I'm your best choice for senator of the Roman Empire. Vote for Joe!"" Biden suddenly came to and realized he was standing in a Walmart parking lot talking to a hobo.",The Babylon Bee,2020-04-09,Politics,Satire,https://babylonbee.com/news/biden-cuts-holes-in-medical-mask-so-he-can-still-sniff-people
7,Man Identifying As 6-Year-Old Crushes Game-Winning Homer In Tee-Ball Championship,"AUBURN, CA - Local 36-year-old man Nate Ripley, who identifies as a six-year-old, ""absolutely crushed"" a game-winning homer at a local tee-ball game and won the championship for his team Monday evening, reports confirmed. Ripley reportedly walked up to the plate in the bottom of the 6th, pointed his bat toward the left-field wall looming 130 feet in the distance, and let her rip, sending the ball rocketing over the fence and into a parking lot as the fans cheered and his coach yelled out, ""Attaboy, Nate! Good job, bud!"" His team, the Lil' Padres, attempted to hoist him up on their shoulders in celebration of their great victory over the favored Tiny Tigers, but were unable to pick up the large 230-pound man. Ripley's feat comes at the end of a momentous tee-ball season, in which the self-identified six-year-old absolutely shattered every record set prior to that point. With a 1.000 batting average, 52 home runs, and an incredible showing at first base, second base, shortstop, third base, and pitcher, the man is being called an inspiration to other six-year-olds everywhere. ""I'm just proud to be here with my team. It's all for the love of the game,"" an emotional Ripley told reporters while enjoying an orange slice and juice box after the championship. ""I couldn't have done it without my team.""",The Babylon Bee,2017-06-06,Lifestyle,Satire,https://babylonbee.com/news/man-identifying-6-year-old-crushes-game-winning-homer-tee-ball-championship
8,Biden: 'I Am The Only Candidate Who Can Beat Ronald Reagan',"HOUSTON, TX - Fresh off his afternoon nap, presidential candidate Joe Biden gave a fiery, high-energy speech in Houston today, claiming to be the only candidate who could beat incumbent Ronald Reagan. ""I am the only candidate who can unite the party to defeat Reagan,"" he said to scattered applause. ""When Super Thursday hits here in a few weeks, we can rally the 150 million Democrats here in the great country of Texas to vote for me so we can get Reagan and his crony Dick Cheney off the Iron Throne there in the Imperial Senate. Go Hoosiers!"" Aides scrambled to turn off Biden's mic but he beat them away with his walker. ""The time has come for the reign of Tippecanoe and Tyler too to end!"" he shouted, though by this point he had wandered into a nearby field and no one could hear him.",The Babylon Bee,2020-03-02,Politics,Satire,https://babylonbee.com/news/biden-i-am-the-only-candidate-who-can-beat-ronald-reagan
9,Fisher-Price Releases 'My First Peaceful Protest' Playset With House You Can Actually Burn Down,"EAST AURORA, NY - The toy geniuses at Fisher-Price have announced a brand new toy made just for leftist parents and their kids: the My First Peaceful Protest playset. The kid-size clubhouse will come with several varieties of spray paint so kids can tag the tiny building with their own empowering slogans. It will also be made out of cardboard, allowing the cute little tikes to burn the whole thing down if their demands are not met. ""Here at Fisher-Price, we are steadfastly committed to social justice,"" said toy designer Camden Flufferton. ""We need to teach our kids what democracy looks like, and there's no better example of democracy in action than violent vandalism and arson. We hope this new playset will serve as an inspiration for parents wanting to teach their kids how to threaten citizens with violence whenever their demands are not met."" The set will also come with toy televisions, cell phones, jewelry, and clothing, allowing kids to simulate looting before they torch the entire set. The set will be available in stores for $399 because of capitalism. Experts are questioning the wisdom of this move by Fisher-Price, mainly because people in the target market don't typically have any kids. ""We know we'll probably only sell, like, 3 of these,"" said Flufferton, ""but selling them isn't the point. We just need you to know we're on the right side of history.""",The Babylon Bee,2020-09-21,Politics,Satire,https://babylonbee.com/news/fisher-price-introduces-supreme-court-protest-playhouse-that-can-be-vandalized-and-burned-down


**The Daily Squib**

45 articles were taken from the "Most Popular" page: https://www.dailysquib.co.uk/category/most-popular

In [4]:
urls = [
    "https://www.dailysquib.co.uk/entertainment/58255-bonus-good-newsfor-american-men-unhinged-batsht-crazy-liberal-women-going-celibate.html",
    "https://www.dailysquib.co.uk/world/56316-rachel-maddow-concerned-trump-will-put-her-in-fema-camp-during-second-term-yes-im-worried.html",
    "https://www.dailysquib.co.uk/entertainment/55825-im-a-cruise-ship-worker-these-are-the-six-things-smart-passengers-always-do-onboard.html",
    "https://www.dailysquib.co.uk/world/56106-labour-plan-to-have-speakers-and-listening-devices-on-every-lamp-post.html",
    "https://www.dailysquib.co.uk/sci_tech/55812-personal-computers-and-smartphones-were-introduced-for-benefit-of-ai.html",
    "https://www.dailysquib.co.uk/entertainment/55590-analysis-was-prince-harry-making-a-statement-in-latest-address.html",
    "https://www.dailysquib.co.uk/world/54018-interconnected-the-internet-only-creates-war-for-humanity.html",
    "https://www.dailysquib.co.uk/world/48511-world-economic-forum-brutal-totalitarian-communist-china-is-model-for-western-nations.html",
    "https://www.dailysquib.co.uk/entertainment/48277-matt-hancock-found-with-huge-amounts-of-midazolam-in-jungle.html",
    "https://www.dailysquib.co.uk/world/47729-the-beginning-of-the-post-consumerist-era.html",
    "https://www.dailysquib.co.uk/world/45624-netflix-harry-and-meghan-enjoy-themselves-exploiting-disabled-veterans-for-cash.html",
    "https://www.dailysquib.co.uk/entertainment/41634-experts-meghan-markle-thought-she-could-move-up-rank-in-royal-family.html",
    "https://www.dailysquib.co.uk/entertainment/41343-boo-hoo-you-made-me-cry.html",
    "https://www.dailysquib.co.uk/world/41322-is-harry-now-a-national-security-threat.html",
    "https://www.dailysquib.co.uk/entertainment/41208-first-transgender-woman-crowned-miss-minnesota-2021.html",
    "https://www.dailysquib.co.uk/entertainment/41200-meghan-markle-endures-bird-shit-trauma-during-oprah-interview.html",
    "https://www.dailysquib.co.uk/world/41113-meghan-markle-demands-english-county-of-sussex-is-moved-to-california.html",
    "https://www.dailysquib.co.uk/entertainment/39995-queen-meghan-and-king-harry-of-america-knight-their-gardener.html",
    "https://www.dailysquib.co.uk/world/39382-hunter-biden-i-cant-wait-to-move-into-white-house-to-smoke-crack-rocks.html",
    "https://www.dailysquib.co.uk/world/39255-obama-and-hunter-biden-sold-out-america-to-the-highest-bidder-china.html",
    "https://www.dailysquib.co.uk/world/39205-the-biden-incest-plot-thickens.html",
    "https://www.dailysquib.co.uk/entertainment/38604-keeping-up-with-the-sussexes-netflix-series-coming-in-december.html",
    "https://www.dailysquib.co.uk/world/38599-trump-to-open-presidential-library-of-authors-books-written-about-how-bad-he-is.html",
    "https://www.dailysquib.co.uk/entertainment/38512-meghan-is-imitating-diana-after-years-of-study.html",
    "https://www.dailysquib.co.uk/entertainment/38263-another-tedious-megan-markle-lecture-to-the-unwashed-masses.html",
    "https://www.dailysquib.co.uk/entertainment/38146-meghan-markle-pees-in-extensive-gardens-instead-of-using-16-bathrooms.html",
    "https://www.dailysquib.co.uk/world/38067-ironic-that-blm-antifa-heroes-karl-marx-and-engels-thought-blacks-closer-to-animal-kingdom-than-whites.html",
    "https://www.dailysquib.co.uk/world/38023-why-is-michelle-obama-so-depressed.html",
    "https://www.dailysquib.co.uk/world/37706-blm-campaign-failure-are-black-people-more-hated-now-than-before-riots.html",
    "https://www.dailysquib.co.uk/world/37612-mount-rushmore-presidents-could-be-replaced-by-blm-and-metoo-founders.html",
    "https://www.dailysquib.co.uk/world/37341-blm-doing-to-white-people-what-the-nazis-did-to-jews-dehumanizing-them.html",
    "https://www.dailysquib.co.uk/world/37294-george-floyd-to-be-sainted-by-pope-in-america.html",
    "https://www.dailysquib.co.uk/world/37236-intelligence-china-encouraging-blm-antifa-rioters-across-u-s-cities.html",
    "https://www.dailysquib.co.uk/world/36763-civilian-harry-misses-his-old-life-and-regrets-listening-to-meghan-markle.html",
    "https://www.dailysquib.co.uk/world/36586-meghan-markle-appealing-to-trump-to-end-coronavirus-pandemic-because-her-headlines-are-gone.html",
    "https://www.dailysquib.co.uk/entertainment/36508-archehole-harry-and-meghan-reveal-new-money-making-venture.html",
    "https://www.dailysquib.co.uk/world/35810-coronavirus-cui-bono-who-benefits.html",
    "https://www.dailysquib.co.uk/world/35975-meghan-reveals-harry-suffers-from-post-traumatic-royal-disorder.html",
    "https://www.dailysquib.co.uk/world/35896-will-harry-ever-forgive-meghan-for-her-crime.html",
    "https://www.dailysquib.co.uk/entertainment/35738-disney-could-replace-meghan-markle-with-plank-of-wood.html",
    "https://www.dailysquib.co.uk/entertainment/35629-defiant-meghan-markle-vs-windsor-royal-family.html",
    "https://www.dailysquib.co.uk/entertainment/35605-thomas-markle-bans-meghan-and-harry-from-using-markle-brand.html",
    "https://www.dailysquib.co.uk/world/35419-chinese-water-supply-contains-faecal-matter-aiding-spread-of-coronavirus.html",
    "https://www.dailysquib.co.uk/world/35341-chinese-authorities-misreporting-coronavirus-deaths.html",
    "https://www.dailysquib.co.uk/world/35315-remainer-tears-to-be-used-to-generate-electricity-for-britain.html",
]

def scrape_squib_article(url):
    """
    Scrapes an article from a given URL on dailysquib.co.uk and extracts relevant information.

    Parameters:
    ----------
    url : str
        The URL of the article to scrape.

    Returns:
    -------
    dict
        A dictionary containing the extracted article data.
    """
    article_data = {
        "title": "",
        "text": "",
        "site": "",
        "date": "",
        "category": "",
        "class": "Satire", #satire is hardcoded here as we know the Daily Squib is a satire site
        "url": url
    }

    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')

        # title
        title_meta = soup.find('meta', property='og:title')
        article_data["title"] = title_meta['content'] if title_meta else "Title not found"
        
        # URL
        url_meta = soup.find('meta', property='og:url')
        article_data["url"] = url_meta['content'] if url_meta else url  # Fallback to input URL
        
        # Site name
        site_name_meta = soup.find('meta', property='og:site_name')
        article_data["site"] = site_name_meta['content'] if site_name_meta else "Site name not found"
        
        # Published date        
        published_meta = soup.find("meta", property="article:published_time")
        if published_meta and published_meta.get("content"):
            article_data["date"] = published_meta["content"].split("T")[0]
        
        # Category
        category_div = soup.find("div", class_="tdb-category td-fix-index")
        if category_div:
            cat_links = category_div.find_all("a", class_="tdb-entry-category")
            if cat_links:
                categories = [
                    #ignore "most popular"
                    a.get_text(strip=True) for a in cat_links if a.get_text(strip=True).lower() != "most popular"
                ]  
                #if multiple categories, return the first
                article_data["category"] = categories[0]
            else:
                article_data["category"] = "Category not found"
        else:
            article_data["category"] = "Category not found"

        # Extract the article text
        content_div = soup.find("div", class_="td-post-content")
        
        if content_div:
            # remove blockquotes (e.g. embedded tweets)
            for bq in content_div.find_all("blockquote"):
                bq.decompose()
            paragraphs = content_div.find_all("p")
            full_text = " ".join(p.get_text(strip=False) for p in paragraphs)
            article_data["text"] = full_text.strip()
        else:
            article_data["text"] = "Article text not found"
    
    else:
        print(f"Failed to fetch the webpage: {url}. Status code: {response.status_code}")
    
    return article_data


def scrape_multiple_squib_articles(urls):
    """
    Scrapes multiple articles from a list of URLs and stores the data in a DataFrame.

    Parameters:
    ----------
    urls : list
        A list of article URLs to scrape.

    Returns:
    -------
    pd.DataFrame
        A DataFrame containing the scraped data from all URLs.
    """
    articles = []
    for url in urls:
        article = scrape_squib_article(url)
        articles.append(article)
    return pd.DataFrame(articles)

# Scrape articles and create a DataFrame
squib_data_df = scrape_multiple_squib_articles(urls)
# Store to CSV
squib_data_df.to_csv("satire_scraped_articles_squib.csv", index=False)
# Print df 
squib_data_df

Unnamed: 0,title,text,site,date,category,class,url
0,Bonus Good News For American Men - Unhinged Batsh*t Crazy Liberal Women Going Celibate,"Sometimes good news comes in floods of joy and happiness, and this is the case for American men as the batshit crazy, deranged blue-haired pierced and heavily tattooed liberal narcissistic women who voted for Kamala Harris have vowed to go celibate and abstain from men. It’s double plus good news for the gene pool as these selfish, entitled, mentally ill, attention seeking harridans won’t thankfully reproduce. “For men, this makes things easier by taking psycho hose beasts out of the equation, only leaving the good women. The Trump win has truly been a wondrous event, as it effectively cleanses America of the shitty things, and brings back some form of goodness and purity to the country,” a college student from Nebraska revealed. These liberal women have been so ideologically brainwashed and mentally damaged by the Kamala Harris campaign that they are going on TikTok to publicly display their total and utter shameful, demoralised mental conditions of utter indignity to everyone. The ironic thing is, these dumbos only sleep with liberal men who voted for Kamala as well, so they’re taking them out of the gene pool too — triple fucking bonus. No need for your much-loved right to have abortions any more — quadruple fucking bonus!!!",Daily Squib,2024-11-08,Entertainment,Satire,https://www.dailysquib.co.uk/entertainment/58255-bonus-good-newsfor-american-men-unhinged-batsht-crazy-liberal-women-going-celibate.html
1,"Rachel Maddow Concerned Trump Will Put Her in ‘FEMA Camp’ During Second Term: ‘Yes, I’m Worried’","Clinically insane MSNBC host Rachel Maddow has expressed serious fearmongering concerns that she and millions of other clinically insane American woke socialist liberals would be interned in a “camp” when former President Donald Trump wins a second term in the White House this November. During an interview with Maddow in Monday’s Unreliable Sources newsletter, CNN Democrat propagandist Benson Burner asked the MSNBC host about her concerns about being targeted during a second Trump administration. “Trump and his allies are openly talking about doing the same thing the Democrats have unjustly done to him. Weaponising the government to seek revenge against critics in media and politics, with some of his extremist allies even talking about jailing the ‘treacherous and treasonous scum’,” noted Burner. “You’re one of his most notable critics on television. Are you worried that you could be a target?” Rachel Maddow replied: “I’m worried, but I’m actually ready for being in a camp because I’m a bleeding heart liberal propagandist for the Democrat Party. I don’t read the news in an objective fashion or without obvious bias in any way. When Trump invokes the Insurrection Act to deploy the U.S. military against civilians on his first day in office, I will be cheering like a cheer leader because it plays into my perpetual liberal victim state of mind. Also, when Trump imprisons me in a FEMA concentration camp, I will be able to play the part of martyr, and virtue signal to my fellow liberal socialist Americans of my suffering for the cause of socialism and communism in America. “When Trump puts millions of blacks, criminals, Mexicans, gays, and migrants into the concentration camps, I will be happy because it would have proved my point that my scaremongering before the election actually did not work and millions of Americans voted for Donald J. Trump anyway. “In the camps of the future it won’t be so bad either, there will be plenty of women for me to become friends with. I hope Trump puts me in one of those all female camp buildings. I’ll be up to my eyeballs in pussy. Really, there’s nothing to fear folks, everything I say on MSNBC is absolute bullshit, and I am essentially an actor. Anyone who takes me seriously must be as mad as I am.” The ‘Trump Derangement Syndrome’ seems to be in full force before the coming U.S.elections.",Daily Squib,2024-06-12,World,Satire,https://www.dailysquib.co.uk/world/56316-rachel-maddow-concerned-trump-will-put-her-in-fema-camp-during-second-term-yes-im-worried.html
2,I'm a Cruise Ship Worker...These are the SIX Things Smart Passengers Always do Onboard,"A cruise ship worker has shared the six things that ‘smart’ passengers always do when they come aboard for their holiday. Janice Munklehouse, 21, from Grimisbury on Thames, Lincolnshire worked on cruise ships for 38 years, and regularly shares advice for passengers and crew on her Cruising Da Seas Innit YouTube channel. Now, she has shared how passengers can make the most of their voyage with her own expert tips. Read on below for the six top tips she had for cruise passengers. Don’t fall overboard whilst boarding the ship Kicking off with the first piece of advice to get ahead of the game while travelling on a cruise, Janice said people should always avoid falling overboard whilst boarding the cruise ship. She said that, it is best to arrive at the destination of your cruise embarkation point “as early as possible”, suggesting a late arrival may mean you rush to board the ship loaded with luggage and fall into the sea. While the cruise ship worker said that many people avoid drowning or being crushed by other boats in the harbour, she has seen many passengers falling off the boarding ramp in a rush to board the ship. “One fella comically tried to jump over 17 feet because he was late for the cruise. He was summarily chewed up in the propellers, but his cruise ticket and hairbrush were thankfully recovered by a fisherman 6 months later in the stomach of a Mako shark.” Don’t jump off the ship when it’s cruising Moving on to her next tip, Janice encouraged passengers to not jump off the ship when it was cruising in the ocean. Janice said jumping off the cruise ship whilst moving would not be very nice for the passenger. She said that: “Jumping into the ocean from your ultra-luxury cruise line would mean that you miss a lot of activities on board like karaoke, disco night and various themed buffets”. She said that, while it may be tempting to jump from the deck of your cruise ship to avoid all the old codgers and nouveau riche arseholes, bad things could happen to you in the open ocean hundreds of miles from shore. Don’t fall off the ship The cruise worker highlighted the potential dangers of falling off the ship. If passengers wear inadequate footwear or lean over the edge of the ship, they could slip and fall into the ocean, which would not be such a good cruise ship experience. Don’t get pushed off the ship Janice’s fourth piece of advice for how to be a smart passenger when going on a cruise concerned not being pushed off the ship. She urged anyone looking to enjoy their holiday cruise to avoid being pushed over the side of the ship into the ocean by some nutter on board or an angry crew member. During her 38 years working on cruise ships, Janice says she has witnessed a number of crew members who have had to push annoying passengers overboard for being arseholes. She warned that many passengers end up in the ocean because the crew have simply had enough. There’s only so much a crew member can take, so don’t be an arse and get on everyone’s tits. Don’t fall out of your cabin window The cruiser worker’s penultimate tip of six urged holidaymakers not to open the cabin window and hang over the edge, especially when drunk out of their minds on cheap watered-down booze. Elaborating on this point, she suggested it would always be best to simply use the cabin window to look at the ocean and not to jump into the water from it. Janice remarked: “You may run the risk of getting eaten by sharks, or simply drowning”. Don’t fall off the ramp whilst disembarking Her final piece of advice for how to be a smart passenger when disembarking a cruise ship concerned not falling off the ramp into the polluted, dangerous waters at port. Janice suggested this could be potentially very dangerous for the passenger. She said: “The reason you want to decide on these things before you disembark is because if you fall overboard, things could get rather messy. I once saw an elderly passenger ingest an entire humungous turd in one gulp when she fell in the water near a sewage outflow pipe”. The veteran cruise worker also said that avoiding falling or jumping off a cruise ship should be a priority for most passengers, but naturally some were just born to do it. Once passengers disembarked, Janice admitted she and the crew were glad all the whinging fuckers had gone and hoped they would never be seen again — that is, until the next batch boarded the cruise ship for a trip they would surely regret for a lifetime.",Daily Squib,2024-05-12,Entertainment,Satire,https://www.dailysquib.co.uk/entertainment/55825-im-a-cruise-ship-worker-these-are-the-six-things-smart-passengers-always-do-onboard.html
3,Labour Plan to Have Speakers and Listening Devices on Every Lamp Post,"The public will be lectured on “motivational” socialist principles, and Labour diktats every day of their lives, a newly published manifesto paper has revealed. Speakers and microphones will be installed in every lamp post in the UK proposed by the innovative Labour plan. Much like the Mayor of London, Sadiq Khan’s pet surveillance project, ULEZ, where every vehicle is tracked and charged in London, the proposed Labour “Listen and Speak” scheme will ensure that citizens are daily indoctrinated in soviet ideology and microphones will listen for any form of dissent against the ruling regime once it gains power in the coming election. LISTEN AND SPEAK Upon hearing of the Labour plans, one citizen voiced his distaste of such a scheme coming into fruition. “If I want to live in fucking North Korea, I’ll go and live there. Imagine walking down the street and being forced to listen to the irritating nasal droning from Comrade Starmer every fucking day of your life, listening to his awful grinding punishing nasal voice telling you how to think, what to do, where to go, I’d fucking top myself.” Along with daily lectures on the greatness of Labour socialist schemes, citizens will be indoctrinated in EU values and other communist rhetoric. If people are seen to be wearing headphones whilst walking in the streets, they will be told to take the device off from their heads, or if they are Bluetooth headphones, the Labour “Listen and Speak” system will hijack the Bluetooth headphones to force the citizen to listen to the latest soviet Labour messages being broadcast. Along with speakers installed on every lamp post, the Labour plan is to also install powerful microphones that will monitor each citizen’s speech. The listening devices will be powered by AI and will alert the Labour Stasi authorities if any citizen speaks adversely about the Labour regime at any time or says any word that is forbidden by woke programmers who have infiltrated the English language. “If someone says any forbidden words or speaks badly about the Labour soviet system, the AI system will identify the offender, who will then be removed from their home in the early hours of the morning. These offending individuals will then be sent to an EU sanctioned re-education centre and reprogrammed to love the Labour EU State.,” a jubilant Labour spokesman revealed on Thursday. Comrade Starmer pronounced the scheme as a measure to “safeguard and ensure the safety of every British citizen” and a way to uphold “the beloved EU rules which Labour is dedicated to rejoining”.",Daily Squib,2024-05-30,World,Satire,https://www.dailysquib.co.uk/world/56106-labour-plan-to-have-speakers-and-listening-devices-on-every-lamp-post.html
4,Personal Computers and Smartphones Were Introduced For Benefit of AI,"You may think there were some benevolent reasons for rolling out and introducing the personal computer and smartphone to the civilian population, and yes there were some, but the majority of reasons were far from that. One has to understand that the controllers work in 50 and 100 year increments and their primary modus operandi is one that may either surprise you or deep down your sub-conscious probably already knows what is going on. What is the last bastion of control over the human population? The inside of your mind, of course, this has eluded the controllers for centuries. Even the Spanish Inquisition could not get close to that level of knowledge or control, or the Nazis, or the Soviet communist dictators or the religious organisations. As a control system, religion has wavered and is not as powerful a tool for complete control any more, and this is why the controllers needed access to your most intimate thoughts, your thought cycles, as well as your very methods of thought. This process would need machines that replicate human thought to some extent, and what better way than a personal computer touted as a way to enhance human activity. Even programming languages effectively replicate human thought processes to some extent with variables, strings and multiple processing architectural archetypes that are the basic structure of 1010101, the universal on-off switch for every permutation of every possible combination of mathematical and human existence. Who’s programming who, the human on the computer, or the computer on the human? Essentially speaking, the personal computer introduced to the public was a first major step into delving into the minds of the population, giving the controllers a basic map of the internal minds of humans. The next step was connectivity, and this is when the DARPA project of an internet was introduced to the general public. All of this trained internal data had to move around, it had to evolve and of course it had to be collected and filed by the controllers in their massive database banks. The internet allowed the controllers to see what people liked/disliked, it allowed them to delve into the darkest secrets of human activity as well as the thought processes and decisions people made in their lives. Every single facet of human behaviour was intricately analysed, logged and filed and in the present time it still is right now. The smartphone was then introduced as an additional form of ultimate human control. This technique was a goldmine of information for the benefit of AI systems because it formed a much more intimate picture of human activity and behaviour simply because of its small size. A mobile phone is easy to carry and is with humans pretty much all the time, whereas a bulky personal computer is generally not with a person at all times. The vast amount of data collected through this method is too vast to even comprehend for most people, but smartphones along with things like apps are a vast treasure trove of data helping the controllers map out the human brain and its collective methods. People cannot do without their smartphones now, they are totally addicted and attached to them. Studies have now shown that by taking smartphones away from some people who are then put into a room alone, results in them self-harming themselves, such is the level of control over their entire being and mind. Human data to benefit of AI systems AI will fully understand and replicate the human mind. It does not need to sleep, it does not need wage rises or maternity leave or holidays. There are no industrial disputes with AI, there are no sick days or loss of productivity. This is why AI was fed the entirety of human data because humans will be replaced by these systems soon enough as is the plan by the controllers. To fully control something, first you must completely understand every facet of it. Mapping out every single dendrite, synapse and connection of the human brain is another major project currently underway. What do you do with the entire data set of the human mind; the books, the literature, the behaviour analytics, the thought processes, the creativity, the emotions, the biases, the infinite variables and combinations of discourse etc.? You feed it all into AI machine learning projects, and this is the key factor in all the control processes envisioned by the controllers. This is all set up for the benefit of AI systems. For thousands of years they have dreamed of this very moment because they have been the few and the population has been the many, they have feared greatly of losing their grip on humanity, of losing their position of control. This is why companies like Apple recently produced an advert for their new iPad Pro with an M-chip that depicts the entire breadth and width of human culture, creativity, and art being crushed by a rubbish disposal machine. This depiction signals the final rallying call that machines have ultimately superseded the human experience and this is just the beginning of the end for the traditional biological state of humanity. In the future, when the brain chip is introduced to the entire population, it will be the final step of ultimate control. By then they will have mapped out the human brain in its entirety, and the controllers will gain direct access into every single thought and memory of each human. As is today, humans cannot function in business or anything without a smartphone, and this will be the method used for the brain chip as well. Elon Musk, a sinister deviant character, is tasked with the initial rollout of this technology, but there are others in the pipeline right now as well if he fails. Ultimately, humanity is on the cusp of a major epoch regarding the benefit of AI systems, a time of change so extreme that may bring back a state of feudalism once again but this time it will be an all encompassing form of technological feudalism and slavery incorporating complete control of the last bastion of human control — the brain. UPDATE – May 24 Looks like others are realising what the Daily Squib has been talking about for years. https://www.barrons.com/news/ai-relies-on-mass-surveillance-warns-signal-boss-20280d0a",Daily Squib,2024-05-11,Sci/Tech,Satire,https://www.dailysquib.co.uk/sci_tech/55812-personal-computers-and-smartphones-were-introduced-for-benefit-of-ai.html
5,ANALYSIS: Was Prince Harry Making a Statement in Latest Address?,"Yes, Prince Harry was a gunner on an Apache helicopter in Afghanistan for 6 weeks, and he served his country briefly, but royal fans have been questioning his ‘ludicrous’ display of medals for his brief stint in the military in his latest video address. “Arse!” Royal fans were up in arms when they saw a blurred, wonky amateur video of the prince on his porch in Montecito wearing medals all over a cheap looking civilian suit. One Twitter X user commented: “What a moron! Prince Harry looks like he is going to fall over from the weight of all the medals he never earned” Another X user revealed: “He looks like a prize ninny! Who’s he trying to fool?” However, others defended the duke, with one user Laquisha46 commenting: “Harry is wearing his medals proudly, which our African princess Meghan probably purchased for him in a flea market in Santa Monica. Leave them alone, you bunch of bullies. Love you Meghan.” The consensus was of ridicule for the wayward prince who has admitted to heavy use of class-A drugs like cocaine, mushrooms, meth and marijuana, allegedly including when he served. Expert analysis Royal expert and body language expert Arbuthnot Bollsaque commented on the BBC that “Harry is showing defiance for being stripped of his right to wear a military uniform by displaying the plethora of medals all over his cheap looking suit”. Harry was stripped of his honorary military titles by late Queen Elizabeth II after he stepped down from royal life in 2020 and moved to California with Meghan Markle.",Daily Squib,2024-04-27,Entertainment,Satire,https://www.dailysquib.co.uk/entertainment/55590-analysis-was-prince-harry-making-a-statement-in-latest-address.html
6,Interconnected: The Internet Only Creates War For Humanity,"When humans become too interconnected, they reject this state through war because too much connection creates a reaction of disconnection and a run to privacy. Our inherent differences are accentuated to a point that eventually sows the seeds for conflict on and off the internet. We are told day in day out that humans have to communicate with each other constantly through smartphones, the internet, social media, but no one has factored in the consequences of such levels of communication because they were blinded by the revenue made from connectivity overload. Of course, humans need some level of communication and connection with each other, but the situation we are at now is at a level of overload. When was the last time you sat at a desk and wrote a letter to someone on a piece of paper? When was the last time you had a real conversation with someone face to face, and not impersonal cold digital words with a stranger on the internet who interacts with thousands of other impersonal messages every day? This is the ultimate paradox that through more connectivity humans have grown further apart from each other and there is a very good reason humankind will head to complete global war soon. One of the reasons for rejection of the internet beast is the loss of privacy, of space, of real human contact, of community, of family and of love. The digital beast is a cold hard obtuse world of code with no warmth and human eye to eye, face to face connection. Zoom conferences do not substitute reality in any way but present a cold digital representation of human connectivity that is ultimately soulless. The prying incessant eyes of communication now delve into our lives with an extreme hunger for data on every part of our existence, and daily our loss of privacy is getting worse. Humans need privacy, but now every form you fill in for anything requires invasive data to be revealed that infringes on our rights as humans. Every part of your life is now digitized and filed/sold. All of this unhappiness naturally builds up to a crescendo, not only with individuals, but entire nations are soaked with this spam world malaise that infects every part of our existence. If you know that much data about your enemies then you know where to attack, where their weaknesses are and how to destroy your enemy with the most effectiveness. This level of interconnectedness will eventually backfire because it is now intrusive and one could even say a form of bullying to eek out every piece of information from people just wanting to live their lives in peace.",Daily Squib,2023-12-07,World,Satire,https://www.dailysquib.co.uk/world/54018-interconnected-the-internet-only-creates-war-for-humanity.html
7,"World Economic Forum: ""Brutal Totalitarian Communist China is Model For Western Nations""","According to the World Economic Forum, which dictates to Western nations on policy, and future direction, the brutal bullying thugs of the Chinese Communist Party and its evil policies of totalitarian savagery are a model for Western nations to follow. This is why we are seeing more censorship and totalitarian behaviour by Western Big Tech companies who are following China’s CCP policies closely. World Economic Forum founder and Chair Klaus Schwab recently sat down for an interview with a Chinese state media outlet and proclaimed that China was a “role model” for other nations. Schwab, 84, made these comments during an interview with CGTN’s Tian Wei on the sidelines of last week’s APEC CEO Summit. Schwab said he respected China’s “tremendous” achievements at modernizing its economy over the last 40 years. “I think it’s a role model for many countries,” Schwab said. “I think we should be very careful in imposing systems. But the Chinese model is certainly a very attractive model for quite a number of countries,” Schwab said. It’s not only the WEF who adore the CCP, but so does the UN, who wants all Western nations to model China. The EU also sees China as a model of its Soviet bloc of nations. They envisage a modern communist state where brutal torture of citizens, and heavy-handed communist commissars mete out daily punishments on the people. Totalitarian bullying regimes like China are idolised by the globalists, who have become very rich off the backs of Chinese slave workers. THE UNITED NATIONS OF CHINA: A VISION OF THE WORLD ORDER by European Council of Foreign Relations You will realise what these people have in store for you when the Citizen Social Credit system that China uses on its citizens already is introduced over in Western nations. This is what the EU, WEF and UN want introduced into the West to complete a totalitarian net of control over the population even more than is already present. You may think surveillance is already pretty bad in Western nations, but wait until the Chinese surveillance model is adopted. Big Tech companies are already preparing the framework to introduce a Chinese social credit model as well as increased Chinese CCP style surveillance. Since 2013, Big Tech has been tightening its deadly noose on all civilians, and it is only going to get worse. Looks like it’s forced CCP anal swabs and brutal poverty for everyone, that is, except for the globalist hierarchy in their mansions. A celebrated member of the globalist World Economic Forum (WEF) has called for a staggering 86 percent reduction in the population of humans, arguing that the goal can be achieved “peacefully.” Dennis Meadows, one of the main authors of the Club of Rome’s 1972 pro-depopulation book “The Limits to Growth.” Meadows argues that most of the world’s population must be wiped out so that the survivors can “have freedom” and a “high standard of living.” During a 2017 interview, Meadows claims that genocide of 86% of the world’s population is “inevitable.” However, he insists that a “benevolent” dictatorship could accomplish the mass de-population “peacefully.” “We could [ ] have eight or nine billion, probably,” he says of the world’s growing population. “If we have a very strong dictatorship which is smart … and [people have] a low standard of living,” Meadows says as he explains how the population reduction agenda could be triggered.",Daily Squib,2022-11-28,World,Satire,https://www.dailysquib.co.uk/world/48511-world-economic-forum-brutal-totalitarian-communist-china-is-model-for-western-nations.html
8,Matt Hancock Found With Huge Amounts of Midazolam in Jungle,"Matt Hancock could find himself embroiled in a row with his I’m A Celebrity campmates after being found with a large cache of the sedative Midazolam during his isolation period ahead of entering the jungle. The Daily Squib can reveal the former Health Secretary, 44, was primed and ready ‘to carry out another covid’ on the campmates. ITV bosses were left reeling at the erroneous discovery. Mr Hancock, who lost the Tory whip last week when it was announced he was joining the cast of the show, also has access to his laptop. Do Not Resuscitate “It seems Hancock’s plan was simple. Sedate the other contestants. Claim they had a virus, then up the dose, snuffing the fuckers out. Boom! Take the prize money,” an ITV worker revealed. Earlier on today, Matt Hancock was filmed eating an entire bowl of crunchy crocodile anuses with a big silly grin on his face.",Daily Squib,2022-11-07,Entertainment,Satire,https://www.dailysquib.co.uk/entertainment/48277-matt-hancock-found-with-huge-amounts-of-midazolam-in-jungle.html
9,The Beginning of the Post Consumerist Era,"Food scarcity, impossible prices, insane inflation and mass job losses/bankruptcies. Welcome to the beginning of the post consumerist era, a time where the assets of citizens are stripped and mass consumerism is effectively shut down. The Great Reset is a modern take on medieval feudalism, where Marxist collectivism meets the criteria of feudalism and totalitarianism. You will not own property, and you will not own any assets. Already, within the UK we are seeing huge gargantuan interest rate hikes that not only kill off all business but home ownership. The death of aspiration, where success is punished, is a predominantly Marxist construct that is touted as a major part of Labour Party policy, but has now been adopted by the socialist Tory Party fully. The middle class and bourgeoisie are being whittled down and systematically destroyed so that only the highest echelons will hold all the power and money. No true Conservative party or government exists in the West simply because all governmental departments have socialist structures and feed a massive welfare/social care state. If the entire system is socialist, it means Conservative policies cannot exist and if there are attempts to implement such policies they are rejected. Donald Trump was a prime example of this, as his every move and policy was rejected eventually leading to him being ousted from his limited role as president. Liz Truss was another example, ousted after 45 days in office. The role of the new Marxists, the new generation of indoctrinated controllers aligned with communist China is to create an ultimate hive mind post consumerist post capitalist landscape where in the name of the so-called eco-drive all citizens are impoverished and depend solely on the state. Universal income will be a pittance where citizens will barely be able to feed themselves with EU rations of maggot and insect protein. Citizen social scores adopted by China and being developed by Western tech companies now will determine what each family receives monthly. Citizens who are religious, or fight against the system in any way will simply have their rations or heat supplies reduced. Severe offenders and people who actively fight or deny state power will have their rations cut completely, and all privileges like travel, education, entertainment removed. Children will be indoctrinated by the state education system to report even their own family members for any indiscretions. One must not forget the hierarchy who will of course be dining on the finest gourmet food, and living in vast AI served palaces watching over the Untermenschen useless eaters. Their children will be educated separately and their life spans will be enhanced dramatically through science. From then on, there will be little use for the remaining population and through poverty, hunger, controlled virus outbreaks and disease the rest will be whittled down. A celebrated member of the globalist World Economic Forum (WEF) has called for a staggering 86 percent reduction in the population of humans, arguing that the goal can be achieved “peacefully.” Dennis Meadows, one of the main authors of the Club of Rome’s 1972 pro-depopulation book “The Limits to Growth.” Meadows argues that most of the world’s population must be wiped out so that the survivors can “have freedom” and a “high standard of living.” During a 2017 interview, Meadows claims that genocide of 86% of the world’s population is “inevitable.” However, he insists that a “benevolent” dictatorship could accomplish the mass de-population “peacefully.” “We could [ ] have eight or nine billion, probably,” he says of the world’s growing population. “If we have a very strong dictatorship which is smart … and [people have] a low standard of living,” Meadows says as he explains how the population reduction agenda could be triggered.",Daily Squib,2022-10-29,World,Satire,https://www.dailysquib.co.uk/world/47729-the-beginning-of-the-post-consumerist-era.html


**Waterford Whispers**

50 articles were take from the homepage (https://waterfordwhispersnews.com/), sorted from most recent to least recent.

In [5]:
urls = [
    "https://waterfordwhispersnews.com/2025/01/06/dickhead-boss-wants-to-hit-the-ground-running-in-2025/",
    "https://waterfordwhispersnews.com/2025/01/03/organised-local-woman-straight-onto-revenue-portal-to-get-that-e4-25-tax-back-shes-owed/",
    "https://waterfordwhispersnews.com/2025/01/06/eight-additional-data-centres-needed-to-store-pictures-of-irish-snow-energy-watchdog-warns/",
    "https://waterfordwhispersnews.com/2025/01/06/colin-farrell-asks-if-three-golden-globes-are-redeemable-for-one-oscar/",
    "https://waterfordwhispersnews.com/2025/01/06/remote-workers-wouldnt-have-agreed-to-work-from-home-if-they-knew-it-meant-zero-snow-days-off/",
    "https://waterfordwhispersnews.com/2025/01/06/im-too-old-and-rich-to-be-righteous-anymore-bono/",
    "https://waterfordwhispersnews.com/2025/01/02/irish-couples-under-increasing-pressure-to-have-minimoon/",
    "https://waterfordwhispersnews.com/2025/01/01/5-realistic-new-years-resolutions/",
    "https://waterfordwhispersnews.com/2024/12/30/seeing-elder-millennials-in-news-headlines-like-a-stab-in-the-heart/",
    "https://waterfordwhispersnews.com/2024/12/26/nations-traffic-at-standstill-as-post-christmas-re-turn-machine-queues-clog-roads/",
    "https://waterfordwhispersnews.com/2024/12/24/christmas-miracle-fine-gael-fianna-fail-put-aside-differences-to-play-football-in-leinster-house-trenches/",
    "https://waterfordwhispersnews.com/2024/12/22/local-woman-cant-believe-how-many-bullshit-made-up-christmas-traditions-in-laws-have/",
    "https://waterfordwhispersnews.com/2024/12/20/how-come-you-your-dad-support-different-teams-innocently-asks-girlfriend-about-to-receive-crash-course-in-chelseas-early-2000s-transformation/",
    "https://waterfordwhispersnews.com/2024/12/19/investigation-launched-to-discover-why-sofas-no-longer-come-with-arm-rest-covers/",
    "https://waterfordwhispersnews.com/2024/12/19/tesco-expand-self-service-checkout-to-include-customer-stacking-shelves-processing-deliveries/",
    "https://waterfordwhispersnews.com/2024/12/18/fears-rip-ie-death-notice-charge-may-turn-irish-funerals-into-a-money-racket/",
    "https://waterfordwhispersnews.com/2024/12/18/martina-burke-starting-to-suspect-family-would-do-anything-to-get-away-from-her/",
    "https://waterfordwhispersnews.com/2024/12/18/people-insisting-on-posting-about-death-destruction-in-the-birthplace-of-jesus-asked-to-shut-up-so-we-can-all-enjoy-a-guilt-free-christmas/",
    "https://waterfordwhispersnews.com/2024/12/18/fresh-hope-irish-politics-changing-for-better-with-ff-fg-supporting-td-who-thinks-3-year-old-immigrants-are-in-isis-for-ceann-comhairle/",
    "https://waterfordwhispersnews.com/2024/12/18/on-this-day-1981-irelands-first-swingers-discuss-having-no-one-to-ride/",
    "https://waterfordwhispersnews.com/2024/12/17/what-irish-people-are-saying-about-israel-closing-its-embassy-in-ireland/",
    "https://waterfordwhispersnews.com/2024/12/18/4-year-old-pours-glass-of-ribena-after-another-exhausting-day/",
    "https://waterfordwhispersnews.com/2024/12/16/mcentee-reassures-public-dublin-city-safe-between-the-times-of-10-23am-10-27am-every-second-tuesday/",
    "https://waterfordwhispersnews.com/2024/12/16/former-fine-gael-politician-charged-with-human-trafficking-possessing-sex-abuse-images-some-media-half-heartedly-reports/",
    "https://waterfordwhispersnews.com/2024/12/16/thousands-of-psychologists-descend-on-manchester-city-to-observe-study-guardiolas-meltdown/",
    "https://waterfordwhispersnews.com/2024/12/16/country-that-stole-irish-passports-for-use-in-assassinations-attacked-irish-peacekeepers-closes-embassy-in-ireland-over-countrys-opposition-to-genocide/",
    "https://waterfordwhispersnews.com/2024/12/16/conor-mcgregor-wins-brought-most-shame-to-ireland-at-rte-sports-awards/",
    "https://waterfordwhispersnews.com/2024/12/13/local-mans-always-had-a-distrust-of-the-state-ever-since-they-caught-him-doing-illegal-things/",
    "https://waterfordwhispersnews.com/2024/12/13/super-low-key-girls-xmas-meet-up-somehow-costs-e427/",
    "https://waterfordwhispersnews.com/2024/12/13/pep-guardiola-calls-shamrock-rovers-for-tips-on-winning-in-europe/",
    "https://waterfordwhispersnews.com/2024/12/12/everyone-advised-to-ignore-report-saying-house-prices-overvalued-by-10-everything-is-fine/",
    "https://waterfordwhispersnews.com/2024/12/12/man-greets-suitable-for-8-people-label-on-food-like-a-challenge/",
    "https://waterfordwhispersnews.com/2024/12/12/saudi-arabia-begins-plotting-workers-mass-grave-for-world-cup-2034/",
    "https://waterfordwhispersnews.com/2024/12/11/revenue-raise-concerns-as-some-gaa-county-board-accounts-written-in-marker-on-hurl-grip-tape/",
    "https://waterfordwhispersnews.com/2024/12/11/israel-bagsies-syria/",
    "https://waterfordwhispersnews.com/2024/12/11/luigi-mangione-remains-producers-favourite-for-next-season-of-the-bachelor/",
    "https://waterfordwhispersnews.com/2024/12/09/dont-worry-were-the-good-terrorists/",
    "https://waterfordwhispersnews.com/2024/12/11/hayes-sickened-he-didnt-hold-out-as-stocks-he-dumped-soars/",
    "https://waterfordwhispersnews.com/2024/12/09/some-irish-villages-resorting-to-cannibalism-as-power-yet-to-be-restored-after-storm-darragh/",
    "https://waterfordwhispersnews.com/2024/12/06/person-at-top-of-shop-queue-taken-completely-by-surprise-by-request-to-pay-for-goods/",
    "https://waterfordwhispersnews.com/2024/12/09/assad-to-be-given-tour-of-moscows-most-beautiful-10th-storey-windows/",
    "https://waterfordwhispersnews.com/2024/12/06/local-dad-enters-2nd-hour-of-christmas-tree-price-negotiation-stand-off/",
    "https://waterfordwhispersnews.com/2024/12/05/unitedhealthcare-board-not-sure-when-right-time-to-break-it-to-ceos-family-none-of-this-is-covered-under-insurance-plan/",
    "https://waterfordwhispersnews.com/2024/12/04/parents-counteract-child-dropping-hints-about-xmas-presents-with-dropping-hints-theyre-broke-as-fuck/",
    "https://waterfordwhispersnews.com/2024/12/04/you-up-hun-ff-fg-sends-late-night-text-to-michael-lowry/",
    "https://waterfordwhispersnews.com/2024/12/04/thats-bad-form-now-kim-jong-un-urges-south-korean-president-to-stand-down/",
    "https://waterfordwhispersnews.com/2024/12/03/supplying-weapons-to-kill-50000-people-all-fine-but-just-dont-pardon-your-son-biden-told/",
    "https://waterfordwhispersnews.com/2024/12/03/first-100-violations-of-ceasefire-are-free-us-tells-israel-as-idf-continues-to-strike-lebanon/",
    "https://waterfordwhispersnews.com/2024/12/03/fianna-fail-fine-gaels-red-lines-for-going-into-government-with-each-other/",
    "https://waterfordwhispersnews.com/2024/12/03/calm-down-sugar-tits-gregg-wallaces-guide-to-crafting-the-perfect-apology/",
    
]

def scrape_whispers_article(url):
    """
    Scrapes an article from a given URL on waterfordwhispersnews.com and extracts relevant information.

    Parameters:
    ----------
    url : str
        The URL of the article to scrape.

    Returns:
    -------
    dict
        A dictionary containing the extracted article data.
    """
    article_data = {
        "title": "",
        "text": "",
        "site": "",
        "date": "",
        "category": "",
        "class": "Satire", #satire is hardcoded here as we know WaterfordWhispers is a satire site
        "url": url
    }

    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')

        # Title
        title_meta = soup.find('meta', property='og:title')
        article_data["title"] = title_meta['content'] if title_meta else "Title not found"
        
        # URL
        url_meta = soup.find('meta', property='og:url')
        article_data["url"] = url_meta['content'] if url_meta else url  # Fallback to input URL
        
        # Site name
        site_name_meta = soup.find('meta', property='og:site_name')
        article_data["site"] = site_name_meta['content'] if site_name_meta else "Site name not found"
        
        # Published date 
        date_div = soup.find("div", class_="post-date", itemprop="datePublished")
        if date_div:
            article_data["date"] = date_div.get_text(strip=True)
        else:
            article_data["date"] = "Date not found"
 
        # Category (excluding the ones used just for web display)
        excluded_categories = {"breaking news", "featured-one", "featured-two", "featured-three","homepage"}
        category_div = soup.find("div", class_="post-category")
        if category_div:
            all_cats = [a.get_text(strip=True) for a in category_div.find_all("a")]
            valid_cats = [cat for cat in all_cats
                          if cat.lower() not in excluded_categories]
            if valid_cats:
                article_data["category"] = valid_cats[0]
            else:
                article_data["category"] = "Category not found"
        else:
            article_data["category"] = "Category not found"

        # Article copy
        content_div = soup.find("div", class_="article-content", itemprop="articleBody")
        if content_div:
            for p_tag in content_div.find_all("p"):
                p_text = p_tag.get_text(strip=True).lower()
                # remove marketing snippets
                if "check out our shop." in p_text or "www.waterfordwhispers.shop" in p_text or "buy some of our merch here" in p_text or "help us to keep pissing off all the right people" in p_text:
                    p_tag.decompose()

            # remove blockquotes
            for bq in content_div.find_all("blockquote"):
                bq.decompose()

            paragraphs = content_div.find_all("p")
            full_text = " ".join(p.get_text(strip=False) for p in paragraphs)
            article_data["text"] = full_text.strip()
        else:
            article_data["text"] = "Article text not found"
    
    else:
        print(f"Failed to fetch the webpage: {url}. Status code: {response.status_code}")
    
    return article_data


def scrape_multiple_whispers_articles(urls):
    """
    Scrapes multiple articles from a list of URLs and stores the data in a DataFrame.

    Parameters:
    ----------
    urls : list
        A list of article URLs to scrape.

    Returns:
    -------
    pd.DataFrame
        A DataFrame containing the scraped data from all URLs.
    """
    articles = []
    for url in urls:
        article = scrape_whispers_article(url)
        articles.append(article)
    return pd.DataFrame(articles)

# Scrape articles and create a DataFrame
whispers_data_df = scrape_multiple_whispers_articles(urls)
# Store to CSV
whispers_data_df.to_csv("satire_scraped_articles_whispers.csv", index=False)
# Print df 
whispers_data_df.head()

Unnamed: 0,title,text,site,date,category,class,url
0,Dickhead Boss Wants To Hit The Ground Running In 2025,"A COMPLETE visual replica of a gaping arsehole, local boss Jamie McCartlin, has informed staff on their first day back in 2025 that he wants to hit the ground running this year. In a circular email sent to staff that immediately led to rolling of eyes seen by WWN, McCartlin spoke of wanting ‘100%’ from staff, pointing to a less than stellar last quarter and how ‘some people around here won’t get away with coasting for another year’. “Not only is it the bastarding 2nd of January, it’s a Thursday. Save it for the first Monday of the year at the very least,” groused employee Cormac D’Arcy, who had hit the ground in 2025 and stayed there, hoping to sneak in a quick nap while no one way looking. “My internal work clock remains switched off until the Christmas Tree is fucked in the field down the road. So even if I wanted to hit, slam, bate the ground in 2025 it’ll have to wait,” offered Carmel Tullan, who didn’t make it past the first sentence in her boss’ email which began ‘New Year, New Focused Workforce’. For his part, McCartlin feels he is simply leading by example. “This ship is crew only, no passengers if you get what I’m saying. The only coasting around her is being done by the coastline down the way,” McCartlin said, communicating in empty jargon which appeared to make sense to him. Following up the email with a quick speech to the troops, McCartlin felt his message was getting through. “I won’t sugarcoat it, I need soldiers, people who eat, sleep and breathe this job. ,” concluded McCartlin, who neglected to mentioned he was offing on a two week skiing trip to Austria at the end of the month.",Waterford Whispers News,"January 6, 2025",BUSINESS,Satire,https://waterfordwhispersnews.com/2025/01/06/dickhead-boss-wants-to-hit-the-ground-running-in-2025/
1,Organised Local Woman Straight Onto Revenue Portal To Get That €4.25 Tax Back She’s Owed,"HAVING CALCULATED the projected tax relief she is entitled to under the Remote Working Relief on electricity and broadband bills at the start of December, local woman Orna Stewart is primed and ready to file her 2024 tax return. With her company allowing a flexible working arrangement for employees which includes 3 days at home, Stewart clocked up three days working from home because she thinks she gets more done in the office and she is ready to claim back what is hers. “People have no idea how much they’re entitled to, just leaving money there on the table but not me, no way Jose, that €4.25 is mine!” Stewart said as she hovered over the submit button in her sitting room, on her day off, and the weather lovely outside and all. “But since the bills are in your name, you could claim for much more, even if we split them evenly,” Stewart’s husband Martin stated as he queried why the 33-year-old wasn’t entering the full amount paid through her account, unwittingly setting himself up to be subjected to an hour’s long lecture on honesty and ethics. “Here she is, right on queue,” said the automated system in receipt of Stewarts tax declaration, making her the first person in the country to submit her declaration for the 5th year running.",Waterford Whispers News,"January 3, 2025",LOCAL NEWS,Satire,https://waterfordwhispersnews.com/2025/01/03/organised-local-woman-straight-onto-revenue-portal-to-get-that-e4-25-tax-back-shes-owed/
2,“Eight Additional Data Centres Needed To Store Pictures Of Irish Snow” Energy Watchdog Warns,"THE COMMISSION FOR Regulation of Utilities in Ireland has called on the public to refrain from taking yet more photos of the snow as server capacities reach breaking point. “Data centres already consume a huge amount of our electricity, but Meta will need to break ground on another eight data centres if you keep uploading those photos of your snowman with a carrot for a dick,” cautioned the energy watchdog. Usually limiting itself to providing a free complaint resolution service, the CRU has felt the need to speak out before the energy grid buckles under the weight of picturesque landscape photos accompanied by the word ‘sneactha’. “We all know what snow fucking looks like lads, you’ve already taken 150 pictures of it, that’s enough. And anyway, they’re likely out of focus and partially covered by your fat fingers. The world will survive without your visual documentation of it all,” a CRU spokesperson added, as the light caught their face just right amid a background of beautiful cascading snow which acted as cotton wool blanket pulled over the nearby hills. “You better not be taking a fucking picture of me,” insisted the spokesperson, as this reporter’s iPhone camera’s flash went off brighter than a Chinese New Year fireworks display. UPDATE: The CRU also urged the public to be mindful of the carbon footprint created by taking 400 photos of your pets every hour, and the impact of giving a thumb’s up emoji to every single WhatsApp message you receive.",Waterford Whispers News,"January 6, 2025",LOCAL NEWS,Satire,https://waterfordwhispersnews.com/2025/01/06/eight-additional-data-centres-needed-to-store-pictures-of-irish-snow-energy-watchdog-warns/
3,Colin Farrell Asks If Three Golden Globes Are Redeemable For One Oscar,"EMERGING TRIUMPHANT from the Golden Globes for an incredible third time, Colin Farrell, universally lauded for his transformation as Batman villain the Penguin opted not to celebrate his win, instead getting straight onto the Oscars actor hotline. “Do I send them in the post and then you send out one of your ones, or how does it work?” Farrell asked over the phone, the audience’s applause still audible as he walked offstage at the glitzy ceremony last night. It is unclear where Farrell heard the rumour that a certain number of Globe trophies can be exchanged for an Oscar statue, but voicemail recordings obtained by Waterford Whispers News confirm the celebrated performer is keen to see the exchange happen. “There’s no way Meryl hasn’t exchanged a few of her 9 Globes for one Oscar, don’t be messing me around lads. Playing psycho Henry Drax in The North Water wasn’t exactly a stretch for me, just saying,” one voicemail began, left several seconds after Farrell finished recording his previous message. “Enough time has passed, it’s okay to finally admit you made a mistake. Sean Penn in ‘Milk’? Over me in ‘In Bruges’, give over,” continued Farrell, the sound of his three Golden Globes chiming off one another as he tried to shove them into a post box. Farrell’s angling for the gong handed out by the Academy of Motions Arts and Sciences has led to other actors calling for an established exchange rate value between different awards. “Everyone wants an Oscar, but is it the equivalent to three Golden Globes? And if it is then I’m guessing 25 IFTAs is equal to one MTV Movie Awards for Best Kiss?” one industry insider offered.",Waterford Whispers News,"January 6, 2025",ENTERTAINMENT,Satire,https://waterfordwhispersnews.com/2025/01/06/colin-farrell-asks-if-three-golden-globes-are-redeemable-for-one-oscar/
4,Remote Workers Wouldn’t Have Agreed To Work From Home If They Knew It Meant Zero Snow Days Off,"“I FEEL TRICKED” confirmed Carmel Foley, one of hundreds of thousands of workers granted the ability to work from home several days a week in recent years, as she stared mournfully out at heavy snow she couldn’t use as an excuse for a handy day off. “You can’t begin to understand how sneaky and satisfying it felt back with the big snow in 2018 when I told the office the roads to the were impassable due to snow. Back then I didn’t have a home work set up,” added another worker, whose employer would have laughed in their face back in 2018 if they asked for an ergonomic home desk and laptop working arrangement. Irish remote workers now face the prospect of foregoing a ‘snow day’ due to their home being equipped with the ability to do a day’s work, all while non-office workers given the day off fill their social media feeds with snowman building. “That’s a ‘The Chase’ repeat marathon in my boxers off the fucking cards,” griped one worker who has actually had several of those days recently over the Christmas but still feels like they’re missing out. “A snow day off from work is one of those sacred if rare rights of passage all office workers should be entitled to, like when you get a day off ‘cus someone in the warehouse loses an arm or the under the table hush payment when the boss can’t keep his arms to himself,” complained workers rights’ advocate Tony Dallon. Asked why remote workers didn’t just lie and say they had no electricity, workers responded in unison stating that every office has some ‘busybody prick’ who would actually check that with the ESB networks and rat on them.",Waterford Whispers News,"January 6, 2025",BUSINESS,Satire,https://waterfordwhispersnews.com/2025/01/06/remote-workers-wouldnt-have-agreed-to-work-from-home-if-they-knew-it-meant-zero-snow-days-off/
5,“I’m Too Old And Rich To Be Righteous Anymore” – Bono,"IRISH singer-songwriter and former activist Bono clasped his hands and prayed to whatever God he worships these days while receiving the US Presidential Medal of Freedom from outgoing president and Zionist Joe Biden, admitting that yes, he’s too old and rich to be righteous anymore. Resembling an obedient child receiving his First Holy Communion while staring into the heavens, the almost billionaire cast decades of activism and calling out warmongers from his mind in favour of the highest civilian award of the United States usually given to those for their ‘meritorious contribution to either the security interests of the United States, cultural or other significant public or private endeavours or world peace’. “Look, we’ll forget about that last part there and just say it’s for the first two,” Biden reassured Bono as he continued to make a holy show out of his homeland in front of the world. “We’ll get you a nice site on the Gaza strip when we’re finished. The seaside property there is to die for”. Sir Bono, once known for his outspoken opinions on war and famine, later defended accepting the award from the nation responsible for funding ongoing genocide in Gaza and dozens of other unjust wars over the last century. “There comes a time in an activists’ life when he has to de-activist, especially when it starts crossing over into financial and foreign policy interests of my elitist friends,” Bono explained, whose name coincidentally appeared in the Panama Papers alongside fellow medal recipient Hillary Clinton. “I know people back home will not be happy, but they’re just begrudgers. Please forgive them Lord as they do not know what they are doing. Amen. Cel-a-brat-ion! Uh oh oh”. Meanwhile, Michael J Fox, Denzel Washington, ‘Magic’ Johnson, Ralph Lauren and several others were just glad they were actual American citizens and thus avoided such criticism from people calling for an end to the Palestinian genocide.",Waterford Whispers News,"January 6, 2025",ENTERTAINMENT,Satire,https://waterfordwhispersnews.com/2025/01/06/im-too-old-and-rich-to-be-righteous-anymore-bono/
6,Irish Couples Under Increasing Pressure To Have ‘Mini-moon’,"A NEW SURVEY of Irish couples reveals that many feel crushing pressure to go on a ‘mini-moon’ despite ‘stupid fucking mini-moons’ only coming to prominence in recent years. “I’d never heard of it before but now all because she saw randomer who is dead good at putting make up on herself on Instagram had one, herself is off down the Credit Union,” explained oblivious groom to be Daniel Dennan. A toxic mix of notions, societal pressure and following influencers who get all this stuff for free, has, according to the recent survey led to 92% of couples planning a wedding or newly married convincing themselves they might not be allowed stay married under the eyes of the law unless they go on a romantic mini-honeymoon. “Suddenly, one day out of nowhere it becomes an accepted and demanded practice in society. Notions on crystal meth. Kind of like when everyone was perfectly happy carrying water around in a plastic bottle but now everyone has been convinced you need a hardy metal container that is so oversized it can store the contents of an Olympic sized-swimming pool in it, baffles the mind but here we are,” explained g’way-out-of-that expert Shona Ward. “A fucking mini-moon? No, you idiot, that’s what the honeymoon is for. That’s the holiday. You don’t have to have a little holiday so Paris doesn’t get jealous of your two weeks in the Caribbean. Are you mad? Why are you doing this? Why are you like this?” one bride-to-be screamed at her irrational reflection in the mirror, minutes before dropping €12000 on a 36-hour trip to France.",Waterford Whispers News,"January 2, 2025",LIFESTYLE,Satire,https://waterfordwhispersnews.com/2025/01/02/irish-couples-under-increasing-pressure-to-have-minimoon/
7,5 Realistic New Year’s Resolutions,"EVERY YEAR thousands of Irish people set themselves unrealistic goals as so-called NEw Year’s resolutions and fail miserably when it comes to achieving anything. Why set the bar at an unobtainable level when there are some real everyday improvements to be made and very achievable targets. Check out these realistic New Year’s resolutions: 1) Drink a glass of water over a 12 month period 365 days x 1.3ml per day comes in at 500ml. You can do it. Believe in yourself. Transform the way you attack the day! 2) Be near, around or in the general vicinity of a gym or someone who goes to a gym Beware of the lazy pricks who duck out of any fitness regime after January, they don’t count for the other 11 months of the year. 3) Occasionally get dressed Why hold yourself to the toxic Instagram-filtered perfection of ‘getting dressed every day’. Three days on, four days off is a good start. Don’t set unrealistic expectations that society pressures people to conform to. Pyjamas most days are fine. You don’t even have to wash they that much. 4) Briefly glance at healthy salads in the supermarket before swiftly moving on If you want to take it one step further, put a pre-made salad in your basket before later abandoning on a shelf in the biscuit aisle. 5) Successfully download Duolingo. No need to put the app icon on the main screen where it will taunt you And if you really want to push yourself, maybe learn some swears words in German, Spanish or French.",Waterford Whispers News,"January 1, 2025",Uplifting Viral Content,Satire,https://waterfordwhispersnews.com/2025/01/01/5-realistic-new-years-resolutions/
8,Seeing ‘Elder Millennials’ In News Headlines Like A Stab In The Heart,"MILLENNIALS have urged the media to choose their words more carefully after reports dozens of people born between in the 80s and 90s have suffered cataclysmic emotional breakdowns at seeing ‘elder millennials’ in headlines. “You can’t be elder anything if you are still part the fresh young generation that’s just emerged and is defining culture. Fashion, music, art, film, we’re the zeitgeist,” falsely alleged geriatric millennial Susie Earley. The warning to be more mindful of the words people use has been extended to workplace settings which encompasses people just entering the workforce for the very first time. “If you even so much as try and say ‘oh, I wasn’t born then so it was before my time’ to me, I’ll send you the bills for my therapy, One Tree Hill isn’t ‘classic TV’,” sobbed office worker John Sconnell. Not helping the situation, the words ‘elder millennials’ often precede headlines such as ‘still have f-all hope of affording a house’. “Like, what the fuck are you talking about? We’re the tech generation,” offered another near pensionable millennial, who is too afraid to use ChatGPT because they think robots will burst through their window. UPDATE: An emergency team of therapists have been deployed to millennials after a meme stating ‘we’re as close to the year 2000 as people were in 1975’ began doing the rounds.",Waterford Whispers News,"December 30, 2024",LOCAL NEWS,Satire,https://waterfordwhispersnews.com/2024/12/30/seeing-elder-millennials-in-news-headlines-like-a-stab-in-the-heart/
9,Nation’s Traffic At Standstill As Post-Christmas Re-turn Machine Queues Clog Roads,"TAILBACKS of 10 miles have been reported in some areas in Ireland today as people attempt to dispense with their post-Christmas cans of beer, in the first Christmas since the Re-turn recycling scheme was introduced. “Authorities should have seen this coming, we should have quadrupled the number of machines,” said one man from his car which he has been trapped in for 17 hours just outside the car park to his local Lidl such is the traffic. With Re-turn machines currently processing on average 121 cans per minute nationwide on the four remaining machines that aren’t out of order, the country’s main traffic arteries are more clogged than your heart after a third helping a gravy covered turkey. “I didn’t think it through when I got in the eight slabs for the Christmas, I usually have a mental breakdown when I do the post-Christmas dump run but I guess I’ll just have it now,” added one individual towing a trailer full of his haul of Christmas cans which included some from the days they had the in-laws and neighbours over. The problem has been exacerbated by the fact everyone currently in traffic thought they were getting the can return done before it dawned on anyone else, rendering being stuck in bastard traffic all the more heartbreaking. “I’m missing an Indiana Jones marathon on RTÉ,” said one driver, unaware the car they’ve been stuck behind for 2 hours is in fact made entirely of beer cans abandoned by a driver who gave up and fled long ago. UPDATE: Supermarkets with Re-turn machines are now providing hose down showers for drivers stuck in their cars for hours on end who now stink of stale alcohol emanating from their boots and back seats.",Waterford Whispers News,"December 26, 2024",LOCAL NEWS,Satire,https://waterfordwhispersnews.com/2024/12/26/nations-traffic-at-standstill-as-post-christmas-re-turn-machine-queues-clog-roads/


In [6]:
# Combine DataFrames
satire_dataset = pd.concat(
    [whispers_data_df, squib_data_df, bee_data_df, onion_data_df],
    ignore_index=True
)

# Basic checks
print(satire_dataset.info())   # Data types & non-null counts
print(satire_dataset.head())   # Quick glance at first rows

# Print out the categories
print(satire_dataset["category"].value_counts())

# Confirm 4 sites are represented
print("Number of unique sites:", satire_dataset["site"].nunique())

Number of rows in satire_dataset: 200
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   title     200 non-null    object
 1   text      200 non-null    object
 2   site      200 non-null    object
 3   date      200 non-null    object
 4   category  200 non-null    object
 5   class     200 non-null    object
 6   url       200 non-null    object
dtypes: object(7)
memory usage: 11.1+ KB
None
                                                                                            title  \
0                                           Dickhead Boss Wants To Hit The Ground Running In 2025   
1        Organised Local Woman Straight Onto Revenue Portal To Get That €4.25 Tax Back She’s Owed   
2    “Eight Additional Data Centres Needed To Store Pictures Of Irish Snow” Energy Watchdog Warns   
3                          Colin Farrell Asks If Three Golden Glob

In [10]:
def clean_category(cat: str) -> str:
    """
    Convert categories to lowercase, unify synonyms, and return a single standardised category.
    """
    # Convert to lowercase
    c = cat.strip().lower()
    
    # Standardise categories
    replacements = {
        'politics': 'politics',
        'local news': 'local',
        'world news': 'world',
        'world': 'world',
        'worldviews':'world',
        'entertainment': 'entertainment',
        'business': 'business',
        'health': 'health',
        'lifestyle': 'lifestyle',
        'life':'lifestyle',
        'sports': 'sports',
        'sport': 'sports',
        'football': 'sports',
        'gaa': 'sports',
        'sci/tech': 'technology',
        'celebs':'entertainment',
        'tech': 'tech',
        'u.s.':'united states',
        'uplifting viral content': 'entertainment', 
    }

    # Do the changes
    if c in replacements:
        c = replacements[c]
    return c

# Apply the cleaning
satire_dataset['category'] = satire_dataset['category'].apply(clean_category)

# Now check the new distribution
print(satire_dataset['category'].value_counts())

politics              48
entertainment         38
world                 37
local                 27
news                  19
sports                 7
lifestyle              6
business               4
health                 4
category not found     3
tech                   3
united states          3
editorials             1
Name: category, dtype: int64


In [13]:
# Store to CSV
satire_dataset.to_csv("satire_articles.csv", index=False)