## MoreOrLess Analysis


This notebook analyses episodes of the [More or Less podcast](https://www.bbc.co.uk/programmes/p02nrss1/episodes/player) from BBC Radio 4.

# Setup

In [3]:
# import all necessary modules and packages
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm
import pandas as pd
import numpy as np
#import seaborn as sns
#import matplotlib.pyplot as plt
import warnings
import time
warnings.simplefilter(action='ignore', category=FutureWarning)

In [137]:
pd.set_option('display.max_rows', None)  # Optionally, ensure all rows are shown

In [27]:
addresses_df = pd.read_csv('episode_links.csv')

# Webscraping

In [204]:
base_url = 'https://www.bbc.co.uk/programmes/p02nrss1/episodes/player?page='
page_numbers = range(1, 74)  # pages 1 to 73

# Create the list of URLs for pages 1 to 73
pages_links_list = [base_url + str(page_number) for page_number in page_numbers]
pages_links_list[0:5]
# pages_links_list[len(pages_links_list)-1]

['https://www.bbc.co.uk/programmes/p02nrss1/episodes/player?page=1',
 'https://www.bbc.co.uk/programmes/p02nrss1/episodes/player?page=2',
 'https://www.bbc.co.uk/programmes/p02nrss1/episodes/player?page=3',
 'https://www.bbc.co.uk/programmes/p02nrss1/episodes/player?page=4',
 'https://www.bbc.co.uk/programmes/p02nrss1/episodes/player?page=5']

In [181]:
episodes_links_list = []

for url in tqdm(pages_links_list):  # Iterate over the links and track progress
    request = requests.get(url)
    soup = BeautifulSoup(request.text, 'html.parser')

    # Find all h2 tags with the class "programme__titles"
    link_soup = soup.find_all('h2', class_="programme__titles")

    # For each h2 tag, find the <a> inside it and get the href
    for tag in link_soup:
        a_tag = tag.find('a')
        if a_tag and a_tag.get('href'):
            episodes_links_list.append(a_tag['href'])

    # Adding a 1-second delay
    time.sleep(1)


100%|█████████████████████████████████████████████████████████████████████████████| 73/73 [01:35<00:00,  1.30s/it]


In [206]:
len(episodes_links_list)

729
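
With 729 links collected across 73 pages, it may be worth checking for duplicates before requesting every episode page. The following is a minimal sketch, not part of the original notebook: `normalize_links` and `BASE` are hypothetical names, and the relative href in the sample is an assumed form included only to show that `urljoin` handles both cases.

```python
from urllib.parse import urljoin

BASE = "https://www.bbc.co.uk"  # assumed site root for resolving relative hrefs


def normalize_links(hrefs, base=BASE):
    """Resolve hrefs against `base` and drop duplicates, keeping first-seen order."""
    absolute = (urljoin(base, href) for href in hrefs)
    return list(dict.fromkeys(absolute))


sample = [
    "https://www.bbc.co.uk/programmes/p0l2j7d1",
    "/programmes/p0l1qbfb",  # hypothetical relative form of an episode link
    "https://www.bbc.co.uk/programmes/p0l2j7d1",  # duplicate of the first entry
]
unique_links = normalize_links(sample)
```

Applied to `episodes_links_list`, a shorter result than the input would indicate that some episodes appear on more than one listing page.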

In [208]:
all_dfs = []

for url in tqdm(episodes_links_list): # Iterate over the links and track progress
    request = requests.get(url)
    soup = BeautifulSoup(request.text, 'html.parser')

    
    date_soup = soup.find_all('time')  # Grab all <time> tags
    date = [tag['datetime'] for tag in date_soup if tag.get('datetime')]

    title_soup = soup.find('h1', class_='no-margin')
    title = title_soup.text.strip() if title_soup else None

    # Description (handles multi-paragraph)
    desc_container = soup.find('div', class_='synopsis-toggle__long')
    if desc_container:
        paragraphs = desc_container.find_all('p')
        description = "\n".join(p.text.strip() for p in paragraphs)
    else:
        description = None

    # If empty or not found, fallback to .text--prose.longest-synopsis
    if not description:
        fallback_container = soup.find('div', class_='text--prose longest-synopsis')
        if fallback_container:
            fallback_paragraphs = fallback_container.find_all('p')
            if fallback_paragraphs:
                description = " ".join(p.text.strip() for p in fallback_paragraphs)


    duration = None
    meta_tags = soup.find_all('p', class_='episode-panel__meta')

    for tag in meta_tags:
        if 'minutes' in tag.text:
            duration = tag.text.strip()
            break

    
    #current_df = pd.DataFrame({'Episode': episodes, 'Date': dates, 'Title': titles, 'Duration': durations, 'Categories': categories})
    current_df = pd.DataFrame({'Date': date, 'Title': title, 'Description': description, 'Duration': duration, 'Links': url})

    # Append the current DataFrame to the list
    all_dfs.append(current_df)

    # Adding a 0.5-second delay
    time.sleep(0.5)

# Concatenate all DataFrames in the list
result_df = pd.concat(all_dfs, ignore_index=True)

100%|███████████████████████████████████████████████████████████████████████████| 729/729 [09:48<00:00,  1.24it/s]
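
The scraper stores the duration as free text taken from the first `episode-panel__meta` paragraph containing the word "minutes". For numeric analysis it could be converted to an integer. This is a small sketch assuming values look like `'9 minutes'` or `'27 minutes'`; `parse_minutes` is a hypothetical helper, and any other wording returns `None`.

```python
import re


def parse_minutes(duration):
    """Return the number of minutes in a string like '27 minutes', or None."""
    if not isinstance(duration, str):
        return None
    match = re.search(r"(\d+)\s*minutes?", duration)
    return int(match.group(1)) if match else None


short_episode = parse_minutes("9 minutes")
long_episode = parse_minutes("27 minutes")
missing = parse_minutes(None)
```

It could be applied with `result_df["Duration"].apply(parse_minutes)` to obtain a column suitable for summary statistics.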


In [234]:
print(result_df.iloc[3]['Description'])

Neighbours, everybody needs good neighbours, and since the end of the Second World War that’s exactly what the US and Canada have been. They’ve enjoyed free trade agreements, close knit economic ties - and not so friendly ice hockey matches.
But recently this relationship has soured, with President Trump calling them “one of the nastiest countries to deal with”. It looks like the era of mostly free trade is over, with a raft of tariffs set to come into force on April the 2nd, or “liberation day” a Donald Trump calls it.
But is President Trump right about the trading relationship between the two countries? What does he mean when he claims that “the US subsidises Canada $200 billion a year”?
Presenter: Tim HarfordProducer: Lizzy McNeillSeries Producer: Tom CollsEditor: Richard VadonProduction co-ordinator: Katie MorrisonStudio manager: Andrew Mills
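
The credits in the last paragraph run together because the page separates them with line breaks rather than paragraphs, and `p.text` drops those breaks. One possible way to split them afterwards is sketched below. The list of role labels is an assumption taken from this single example, and `split_credits` is a hypothetical helper.

```python
import re

# Assumed credit labels, taken from the example description above
ROLES = ["Series Producer", "Production co-ordinator", "Studio manager",
         "Presenter", "Producer", "Editor"]

# Zero-width position: a non-space character followed directly by "<role>: ".
# The lookbehind stops "Producer" from matching inside "Series Producer".
_BREAK = re.compile(r"(?<=\S)(?=(?:" + "|".join(map(re.escape, ROLES)) + r"): )")


def split_credits(line):
    """Split a run-together credits line into a {role: name} dict."""
    parts = _BREAK.sub("\n", line).split("\n")
    return dict(part.split(": ", 1) for part in parts if ": " in part)


credits = split_credits(
    "Presenter: Tim HarfordProducer: Lizzy McNeillSeries Producer: Tom Colls"
    "Editor: Richard VadonProduction co-ordinator: Katie Morrison"
    "Studio manager: Andrew Mills"
)
```

Keeping the credits out of the text passed to the NER step might also reduce false positives, since several of the spurious "places" found later are people's names.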


---

# NLP

In [216]:
import spacy
# !python -m spacy download en_core_web_sm  # run once to install the model


# Load small English language model
nlp = spacy.load("en_core_web_sm")

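# Function to extract place-like entities (GPE, LOC) plus nationalities and groups (NORP)
def extract_locations(text):
    if not isinstance(text, str):  # Description is None when nothing was scraped
        return []
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC", "NORP")]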

# Apply it to the dataframe
result_df["Places"] = result_df["Description"].apply(extract_locations)

result_df

           Date                                              Title  \
0    2025-04-12                    How much is a human life worth?   
1    2025-04-04              Trump tariffs: All about the deficits   
2    2025-04-02          Is one in four people in the UK disabled?   
3    2025-03-29                What’s Trump’s problem with Canada?   
4    2025-03-26                 Could a 2% wealth tax raise £24bn?   
5    2025-03-22  What are the chances of an asteroid hitting ea...   
6    2025-03-19  Why are more people claiming disability benefits?   
7    2025-03-17                  How did lockdown impact children?   
8    2025-03-15           What is an IQ map and can we trust them?   
9    2025-03-12                DOGE, apples and irregular migrants   
10   2025-03-08  Is there really $500bn of Rare Earths in Ukraine?   
11   2025-03-05     Defence Spending, Rare Earths and Trunk Truths   
12   2025-03-01  Has the US really given Ukraine more aid than ...   
13   2025-02-22  Are

# Postprocessing

In [243]:
# Flatten all lists into a single list, then make a set
all_places = [place for place_list in result_df["Places"] if isinstance(place_list, list) for place in place_list]
unique_places = set(all_places)
unique_places

{'A.I.',
 'Accra',
 'Afghanistan',
 'Africa',
 'Africa Check',
 'African',
 'African American',
 'African Americans',
 'African-American',
 'Allwell',
 'America',
 'American',
 'Americans',
 'Amsterdam',
 'Anglican',
 'Antarctica',
 'Antisocial',
 'Arctic',
 'Argentina',
 'Armageddon',
 'Asia',
 'Asian',
 'AstraZeneca',
 'Atlantic',
 'Australia',
 'Australians',
 'Author',
 'Avengers',
 'Ayeisha',
 'Balochistan',
 'Barcelona',
 'Beijing',
 'Belgian',
 'Belgium',
 'Birmingham',
 'Black Boy Lane',
 'Black Hispanic',
 'Blue Planet',
 'Blue Planet II',
 'Blur',
 'Bolivia',
 'Bono',
 'Brazil',
 'Britain',
 'British',
 'British Muslim',
 'British Muslims',
 'Britons',
 'Brits',
 'Brussels',
 'Bulletin',
 'Calais',
 'California',
 'Calmac',
 'Cambodia',
 'Canada',
 'Canadian',
 'Cape Town',
 'Cardiologist',
 'Caribbean',
 'Carina',
 'Central Africa',
 'Central America',
 'Charlotte',
 'Chernobyl',
 'Childbirth Connection',
 'China',
 'Chinese',
 'Christian',
 'Christians',
 'Christie Aschwand
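
A set shows which entities occur but not how often, and frequency could help prioritize the manual review. The sketch below counts each entity once per episode using `collections.Counter`. The `places_column` list is a stand-in for `result_df["Places"]`, filled with a few rows from this dataset.

```python
from collections import Counter

# Stand-in for result_df["Places"]: a few rows from the dataset
places_column = [
    ['US', 'US'],
    ['Canada', 'UK', 'UK'],
    ['US', 'Canada', 'US', 'Canada'],
    [],
]

# set(row) counts a place once per episode, so repeated mentions
# within one description do not inflate the totals
episode_counts = Counter(place for row in places_column for place in set(row))
most_common = episode_counts.most_common()
```

Entities that appear in only one episode are more likely to be NER mistakes, so sorting the review list by ascending count is one way to surface them first.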

### Manually go through this list and decide which entries to replace or drop

In [249]:
# Mapping from entity variants to canonical place names
to_be_replaced = {
    'Africa Check': 'Africa',
    'African American': 'US',
    'African Americans': 'US',
    'African-American': 'US',
    'British': 'Britain',
    'British Muslim': 'Britain',
    'British Muslims': 'Britain',
    'Britons': 'Britain',
    'Brits': 'Britain',
    'Dutch': 'Netherlands',
    'England?Labour': 'England',
    'English': 'England',
    'English Usage': 'England',
    'Insulation Britain': 'Britain',
    'Quartz Africa': 'Africa',
    'the Russian Invasion of Ukraine': 'Ukraine',
    'the Southern Scottish': 'Scotland'
}

# Entities to drop (false positives from the NER step)
to_be_dropped = [
    'A.I.', 'Africa Check', 'African American', 'African Americans', 'African-American',
    'Allwell', 'Anglican', 'Antisocial', 'Armageddon', 'AstraZeneca', 'Author', 'Avengers',
    'Ayeisha', 'Black Boy Lane', 'Blue Planet', 'Blue Planet II', 'Blur', 'Bono', 'British',
    'British Muslim', 'British Muslims', 'Britons', 'Brits', 'Bulletin', 'Calmac', 'Cardiologist',
    'Carina', 'Charlotte', 'Childbirth Connection', 'Christian', 'Christians', 'Christie Aschwanden',
    'Complex', 'Conservative', 'Conservatives', 'Coronavirus', 'Counting', 'Covid', 'Cruyff', 'Darin',
    'Data', 'Delta', 'Democrat', 'Democratic', 'Democrats', 'Dictionnaire', 'Diwali', 'Dr', 'Dutch',
    'Easter', 'England?Labour', 'English', 'English Usage', 'Ethnologue', 'Eugene',
    'Europa League Draw', 'Getty', 'Googol2.8', 'Gulf', 'Historian', 'Hjalmar', 'Insulation Britain',
    'Kantar', 'Lottery', 'MRP', 'Marina Adshade', 'Marina Ashdade', 'Mathematician', 'Mediazona',
    'Melville', 'Mersenne', 'Nazi', 'Netflix', 'Newton', 'Norovirus', 'Ortakoy', 'Paradise',
    'Pythagoras', 'Qanon', 'Quartz Africa', 'Republican', 'Robots', 'Samantha Burgess', 'Singh',
    'Spitfire', 'Spotify', 'Stand', 'Strid', 'The Coastline ParadoxWhy', 'Trumpton', 'Us',
    'Vaccines', 'Videos', 'covid-19', 'inequalityPoliticians', 'needs?Are', 'non-Hodgkin', 'n’t',
    'obese', 'the Russian Invasion of Ukraine', 'the United States Agency for International Development',
    'the Southern Scottish'
]

# Create the clean version
def clean_places(place_list):
    if not isinstance(place_list, list):
        return []
    cleaned = []
    for place in place_list:
        if place in to_be_replaced:
            cleaned.append(to_be_replaced[place])
        elif place not in to_be_dropped:
            cleaned.append(place)
        # else: it is in to_be_dropped and has no replacement => drop it
    return list(set(cleaned))  # Deduplicate (order is not preserved)

# Apply to DataFrame
result_df['Places_clean'] = result_df['Places'].apply(clean_places)
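
Because `set` does not preserve order, the cleaned lists can come back in a different order from the original mentions. If order matters, an order-preserving variant is possible. The sketch below is one such variant under stated assumptions: `replacements` and `dropped` are small excerpts of the mapping and drop list above, and `clean_places_ordered` is a hypothetical name.

```python
# Small excerpts of the mapping and drop list defined above
replacements = {'British': 'Britain', 'Brits': 'Britain', 'Dutch': 'Netherlands'}
dropped = {'Charlotte', 'Christian', 'Netflix'}  # a set gives fast membership tests


def clean_places_ordered(place_list):
    """Replace or drop entities, then deduplicate while keeping first-seen order."""
    if not isinstance(place_list, list):
        return []
    cleaned = [replacements.get(place, place) for place in place_list
               if place in replacements or place not in dropped]
    return list(dict.fromkeys(cleaned))


row_a = clean_places_ordered(['New Zealand', 'British', 'Charlotte'])
row_b = clean_places_ordered(['UK', 'Europe', 'Christian', 'Charlotte'])
```

As in the original function, the replacement check comes first, so an entity listed in both structures is replaced rather than dropped.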


In [254]:
result_df.to_csv('full_dataset_cleaned.csv', index=False)

# Geocoding

In [5]:
# result_df = pd.read_csv('full_dataset_cleaned.csv')

In [31]:
result_df_sample = result_df[1:15]
result_df_sample

Unnamed: 0,Date,Title,Description,Duration,Links,Places,Places_clean
1,2025-04-04,Trump tariffs: All about the deficits,US President Donald Trump has announced sweepi...,9 minutes,https://www.bbc.co.uk/programmes/p0l2j7d1,"['US', 'US']",['US']
2,2025-04-02,Is one in four people in the UK disabled?,"Donald Trump is raising tariffs on Canada, but...",27 minutes,https://www.bbc.co.uk/programmes/p0l1qbfb,"['Canada', 'UK', 'UK']","['UK', 'Canada']"
3,2025-03-29,What’s Trump’s problem with Canada?,"Neighbours, everybody needs good neighbours, a...",9 minutes,https://www.bbc.co.uk/programmes/p0l1069n,"['US', 'Canada', 'US', 'Canada']","['Canada', 'US']"
4,2025-03-26,Could a 2% wealth tax raise £24bn?,Some Labour politicians have been calling for ...,29 minutes,https://www.bbc.co.uk/programmes/p0l09td8,"['UK', 'Europe', 'Christian', 'Charlotte']","['UK', 'Europe']"
5,2025-03-22,What are the chances of an asteroid hitting ea...,"On 27 December 2024, astronomers spotted an as...",9 minutes,https://www.bbc.co.uk/programmes/p0kzlsrg,[],[]
6,2025-03-19,Why are more people claiming disability benefits?,More working age people are claiming disabilit...,29 minutes,https://www.bbc.co.uk/programmes/p0kyydsp,"['UK', 'Russia']","['UK', 'Russia']"
7,2025-03-17,How did lockdown impact children?,"In March 2020, the covid pandemic forced the U...",42 minutes,https://www.bbc.co.uk/programmes/p0ky87qk,['UK'],['UK']
8,2025-03-15,What is an IQ map and can we trust them?,You may have seen a map circulated on social m...,9 minutes,https://www.bbc.co.uk/programmes/p0ky8xm3,"['Africa', 'Charlotte']",['Africa']
9,2025-03-12,"DOGE, apples and irregular migrants",It’s been 12 weeks since President Trump annou...,29 minutes,https://www.bbc.co.uk/programmes/p0kxs3ql,"['New Zealand', 'British', 'Charlotte']","['New Zealand', 'Britain']"
10,2025-03-08,Is there really $500bn of Rare Earths in Ukraine?,As part of the fast-moving argument over US mi...,9 minutes,https://www.bbc.co.uk/programmes/p0kwvfrl,"['US', 'Ukraine', 'US', 'Ukraine']","['Ukraine', 'US']"


In [9]:
# Take the Places_clean column from the sample (one list of places per episode)
addresses = result_df_sample['Places_clean'].tolist()

# Function to geocode address using Nominatim API
def geocode_address(address):
    url = 'https://nominatim.openstreetmap.org/search'
    headers = {'User-Agent': 'Nicolas'}
    params = {'q': address, 'format': 'json'}
    
    response = requests.get(url, headers=headers, params=params)
    if response.status_code == 200:
        results = response.json()
        if results:
            return results[0]['lat'], results[0]['lon']  # Return the latitude and longitude of the first result
    return None, None  # Return None if no results or an error occurred

# Create a list to hold geocoded results
geocoded_addresses = []

# Geocode each address
for address in tqdm(addresses):
    lat, lon = geocode_address(address)
    geocoded_addresses.append({'address': address, 'latitude': lat, 'longitude': lon})
    time.sleep(1)  # Sleep to respect Nominatim's usage policy

# Convert results to a DataFrame
geocoded_df = pd.DataFrame(geocoded_addresses)

# Optionally, save the geocoded results to a new CSV file
geocoded_df.to_csv('geocoded_locations.csv', index=False)

print('Geocoding complete. Results saved to geocoded_locations.csv.')


100%|█████████████████████████████████████████████████████████████████████████████| 14/14 [00:19<00:00,  1.39s/it]

Geocoding complete. Results saved to geocoded_locations.csv.
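
If `Places_clean` holds a list per episode, each request above sends a whole list as the query, and the same place is looked up once for every episode that mentions it. A possible alternative is to geocode each distinct place name once. The sketch below assumes that approach: `geocode_unique` is a hypothetical helper that accepts any callable returning `(lat, lon)`, such as `geocode_address`, and `fake_geocode` is an offline stand-in whose coordinates are placeholders, not real results.

```python
def geocode_unique(place_lists, geocode):
    """Geocode each distinct place name once.

    `geocode` is any callable mapping a name to (lat, lon),
    for example the geocode_address function defined above.
    """
    # dict.fromkeys deduplicates while keeping first-seen order
    unique_places = dict.fromkeys(p for places in place_lists for p in places)
    rows = []
    for place in unique_places:
        lat, lon = geocode(place)
        rows.append({'address': place, 'latitude': lat, 'longitude': lon})
    return rows


# Offline stand-in for the Nominatim call; the coordinates are placeholders
def fake_geocode(name):
    return {'UK': ('0.0', '0.0')}.get(name, (None, None))


rows = geocode_unique([['US'], ['UK', 'Canada'], ['Canada', 'US']], fake_geocode)
```

When calling the real service, a `time.sleep(1)` between requests would still be needed inside the loop to respect Nominatim's usage policy.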





In [53]:
import geopandas as gpd

# Load the world country geometries
world = gpd.read_file("ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp")
# Add a lowercase version of country names for matching
world["name_lower"] = world["SOVEREIGNT"].str.lower()

# Check what's in it
print(world.columns)

Index(['featurecla', 'scalerank', 'LABELRANK', 'SOVEREIGNT', 'SOV_A3',
       'ADM0_DIF', 'LEVEL', 'TYPE', 'TLC', 'ADMIN',
       ...
       'FCLASS_ID', 'FCLASS_PL', 'FCLASS_GR', 'FCLASS_IT', 'FCLASS_NL',
       'FCLASS_SE', 'FCLASS_BD', 'FCLASS_UA', 'geometry', 'name_lower'],
      dtype='object', length=170)


In [61]:
# Apply alias map
alias_map = {
    'us': 'united states',
    'uk': 'united kingdom',
    'britain': 'united kingdom',
    'russia': 'russian federation',
    'gaza': 'palestine',
    'korea': 'south korea',
    # Add more aliases if needed
}

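# Lowercase each place and substitute its alias if one exists; unmapped places are kept as-is
normalized_places = set(alias_map.get(place.lower(), place.lower()) for place in all_places)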
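
Once the places are normalized, it helps to see which ones still have no counterpart in `world["name_lower"]`, since those need either a new alias or a different matching strategy (cities and continents will never match a country name). This is a minimal sketch: `known_names` is a hypothetical stand-in for `set(world["name_lower"])`, and the real values depend on the shapefile.

```python
alias_map = {
    'us': 'united states',
    'uk': 'united kingdom',
    'britain': 'united kingdom',
}

# Hypothetical stand-in for set(world["name_lower"]); real values depend on the shapefile
known_names = {'united states', 'united kingdom', 'canada'}


def normalize(place):
    """Lowercase a place name and apply the alias map if an alias exists."""
    key = place.strip().lower()
    return alias_map.get(key, key)


places = ['US', 'Britain', 'Canada', 'Accra']
normalized = {normalize(p) for p in places}
unmatched = sorted(normalized - known_names)
```

Printing `unmatched` against the real `world["name_lower"]` values would also reveal aliases whose target does not match the spelling used in the shapefile.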