## Import Packages

In [46]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import openai

## Scraping Website

In [47]:
# Base URL and years of interest
base_url = "https://www.pmo.gov.sg/Newsroom/National-Day-Rally-"
years = range(2004, 2024)  # 2004 to 2023

In [48]:
# Prepare a list to store dictionaries for each year and its speech content
speeches = []

# Iterate over the years and scrape the content
for year in years:
    url = f"{base_url}{year}"
    response = requests.get(url)
    response.raise_for_status()  # Raises an error for bad status codes
    
    # Parse the content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find all <p> tags and exclude <p class="summary"> tags
    paragraphs = soup.find_all('p', class_=lambda x: x != 'summary')
    
    # Extract text from each paragraph
    speech_content = "\n".join([para.get_text(strip=True) for para in paragraphs])
    
    # Add the year, speech content, and URL to the list
    speeches.append({'Year': year, 'Speech': speech_content, 'URL': url})
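
Because `raise_for_status()` only catches HTTP error codes, a page that loads successfully but holds little or no speech text would go unnoticed. A quick sanity check on the scraped text can flag such years. This is a sketch under assumptions: `find_short_speeches` and the `min_words` threshold are hypothetical, and the sample records are invented for illustration.

```python
def find_short_speeches(records, min_words=100):
    """Return the years whose scraped text has fewer than `min_words` words."""
    return [r['Year'] for r in records if len(r['Speech'].split()) < min_words]

# Invented sample records in the same shape as the `speeches` list.
sample = [
    {'Year': 2019, 'Speech': "word " * 500, 'URL': "https://example.com/2019"},
    {'Year': 2020, 'Speech': "", 'URL': "https://example.com/2020"},
]
print(find_short_speeches(sample))
```

Running the same function on the real `speeches` list would show which years need a different URL.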

In [49]:
# Convert the list of dictionaries to a DataFrame
df = pd.DataFrame(speeches)
df

Unnamed: 0,Year,Speech,URL
0,2004,\nThe first part of the English speech starts ...,https://www.pmo.gov.sg/Newsroom/National-Day-R...
1,2005,"\nFriends and fellow Singaporeans, 40 years ag...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
2,2006,"\nFriends and fellow Singaporeans, my focus to...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
3,2007,"\nFriends and fellow Singaporeans, Singapore i...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
4,2008,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
5,2009,"\nFriends and fellow Singaporeans, this is a s...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
6,2010,"\nFellow Singaporeans, good evening.\nOur econ...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
7,2011,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
8,2012,\nFriends and fellow Singaporeans\nWe have tra...,https://www.pmo.gov.sg/Newsroom/National-Day-R...
9,2013,\nPM Lee delivered his 2013 National Day Rally...,https://www.pmo.gov.sg/Newsroom/National-Day-R...


The loop above could not scrape the speeches for 2020, 2021 and 2022:
- For 2021 and 2022, "-English" must be appended to the URL.
- The 2020 National Day Rally was not held due to the COVID-19 pandemic, so the 2020 National Day Message is scraped instead.
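
The URL rules above can also be captured in a small lookup helper, so that every year resolves to the right address in one pass. This is a minimal sketch, not the notebook's own code: `resolve_url` and `URL_OVERRIDES` are hypothetical names, and the URLs are the ones used in this notebook.

```python
BASE_URL = "https://www.pmo.gov.sg/Newsroom/National-Day-Rally-"

# Years whose pages do not follow the default pattern (taken from the notes above).
URL_OVERRIDES = {
    2020: "https://www.pmo.gov.sg/Newsroom/National-Day-Message-2020",
    2021: BASE_URL + "2021-English",
    2022: BASE_URL + "2022-English",
}

def resolve_url(year, overrides=URL_OVERRIDES):
    """Return the override URL for a year if one exists, else the default pattern."""
    return overrides.get(year, f"{BASE_URL}{year}")

for year in (2019, 2020, 2021):
    print(year, resolve_url(year))
```

Keeping the exceptions in a dictionary means a new irregular year only needs one extra entry.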

In [50]:
# Add the exception URLs for 2020, 2021, and 2022
exceptions = {
    2021: "https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2021-English",
    2022: "https://www.pmo.gov.sg/Newsroom/National-Day-Rally-2022-English",
    2020: "https://www.pmo.gov.sg/Newsroom/National-Day-Message-2020"
}

# Iterate over the exceptional years and update the 'Speech' content in 'speeches' list
for year, url in exceptions.items():
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    paragraphs = soup.find_all('p', class_=lambda x: x != 'summary')
    speech_content = "\n".join([para.get_text(strip=True) for para in paragraphs])
    
    # Find the entry for the specific year in the speeches list
    for speech_dict in speeches:
        if speech_dict['Year'] == year:
            # Update the 'Speech' value and 'URL' for that year
            speech_dict['Speech'] = speech_content
            speech_dict['URL'] = url  # Update the URL in case of exceptions
            break

In [51]:
df = pd.DataFrame(speeches)
df

Unnamed: 0,Year,Speech,URL
0,2004,\nThe first part of the English speech starts ...,https://www.pmo.gov.sg/Newsroom/National-Day-R...
1,2005,"\nFriends and fellow Singaporeans, 40 years ag...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
2,2006,"\nFriends and fellow Singaporeans, my focus to...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
3,2007,"\nFriends and fellow Singaporeans, Singapore i...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
4,2008,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
5,2009,"\nFriends and fellow Singaporeans, this is a s...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
6,2010,"\nFellow Singaporeans, good evening.\nOur econ...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
7,2011,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...
8,2012,\nFriends and fellow Singaporeans\nWe have tra...,https://www.pmo.gov.sg/Newsroom/National-Day-R...
9,2013,\nPM Lee delivered his 2013 National Day Rally...,https://www.pmo.gov.sg/Newsroom/National-Day-R...


In [64]:
print(df.loc[df['Year'] == 2021, 'Speech'].iloc[0])


My fellow Singaporeans
Good evening again
My last National Day Rally was two years ago.
Since then, COVID-19 has changed our world. Globally, it has taken millions of lives, sickened many more and disrupted countless jobs and businesses. In Singapore, each time it looks like we have beaten the virus, it breaks through in a different place and forces us to tighten up again.
But we have done better than many countries. We have kept our people safe and protected our livelihoods. I thank you all for your trust and cooperation. Your discipline and resilience have made all the difference in the fight against COVID-19.
I want to especially thank all those on the frontline, Singaporeans and non-Singaporeans, who have fought so hard, for so long. Some of you are here at the Rally. Others are joining us virtually. Welcome again and thank you all!
Many of you have gone beyond the call of duty. Like Aisha Abdul Rahman, a passenger service agent at Changi Airport. After passenger flights were shar

## Preprocessing

In [53]:
import spacy
import re

# Load spaCy's small English pipeline (tokenizer, tagger, parser, lemmatizer, NER)
nlp = spacy.load("en_core_web_sm")

# List of custom stopwords to add
custom_stopwords = [
    'singapore', 'singapores', 'singaporean', 'singaporeans',
    'people', 'year', 'one', 'also', 'say', 'good', 'government',
    'think', 'got', 'going', 'get', 'go', 'national', 'day',
    'rally', 'dont', 'thats'
]

# Add the custom stopwords to spaCy's default stopword list
for word in custom_stopwords:
    nlp.vocab[word].is_stop = True

In [54]:
# Function to preprocess text using spaCy
def preprocess_text_spacy(text):
    # Convert to lowercase and remove punctuation and numbers
    text = re.sub(r'[^\w\s]|[\d]', '', text.lower())
    
    # Parse the text with spaCy
    # This runs the entire pipeline
    doc = nlp(text)
    
    # Lemmatize and remove stopwords and words shorter than 3 characters
    tokens = [token.lemma_ for token in doc if not token.is_stop and len(token.text) > 2]
    
    # Join the tokens back into a string
    text = ' '.join(tokens).strip()
    return text
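
To see what the regular expression in the cleaning step does on its own, the sketch below applies it to a made-up sentence without spaCy. `strip_punct_and_digits` is a hypothetical helper name. Apostrophes are deleted rather than replaced by a space, so contractions collapse into single tokens such as `dont`, which is presumably why `dont` and `thats` appear in the custom stopword list.

```python
import re

def strip_punct_and_digits(text):
    """Lowercase the text, then delete punctuation and digits (same pattern as above)."""
    return re.sub(r'[^\w\s]|[\d]', '', text.lower())

sample = "Don't worry: that's 40 years ago!"
cleaned = strip_punct_and_digits(sample)
print(cleaned.split())
```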

In [55]:
# Apply the preprocessing function to each speech
df['processed_speech'] = df['Speech'].apply(preprocess_text_spacy)

In [56]:
df

Unnamed: 0,Year,Speech,URL,processed_speech
0,2004,\nThe first part of the English speech starts ...,https://www.pmo.gov.sg/Newsroom/National-Day-R...,english speech start second english speech vie...
1,2005,"\nFriends and fellow Singaporeans, 40 years ag...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow year ago set uncertain future kn...
2,2006,"\nFriends and fellow Singaporeans, my focus to...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow focus tonight future rapidly cha...
3,2007,"\nFriends and fellow Singaporeans, Singapore i...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow thing look grow buzz confidence ...
4,2008,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow evening tonight start talk econo...
5,2009,"\nFriends and fellow Singaporeans, this is a s...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow significant anniversary selfgove...
6,2010,"\nFellow Singaporeans, good evening.\nOur econ...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,fellow evening economy shake recession boom ha...
7,2011,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow evening general election team fr...
8,2012,\nFriends and fellow Singaporeans\nWe have tra...,https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow travel world know story date que...
9,2013,\nPM Lee delivered his 2013 National Day Rally...,https://www.pmo.gov.sg/Newsroom/National-Day-R...,lee deliver speech august institute technical ...


## Count Length of Speech

In [57]:

def count_words(speech_text):
    # Count whitespace-separated words in the raw speech text
    return len(speech_text.split())

# Apply the word counting function to the 'Speech' column
df['Word_Count'] = df['Speech'].apply(count_words)

In [58]:
df

Unnamed: 0,Year,Speech,URL,processed_speech,Word_Count
0,2004,\nThe first part of the English speech starts ...,https://www.pmo.gov.sg/Newsroom/National-Day-R...,english speech start second english speech vie...,21256
1,2005,"\nFriends and fellow Singaporeans, 40 years ag...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow year ago set uncertain future kn...,13872
2,2006,"\nFriends and fellow Singaporeans, my focus to...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow focus tonight future rapidly cha...,12696
3,2007,"\nFriends and fellow Singaporeans, Singapore i...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow thing look grow buzz confidence ...,14493
4,2008,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow evening tonight start talk econo...,15209
5,2009,"\nFriends and fellow Singaporeans, this is a s...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow significant anniversary selfgove...,13080
6,2010,"\nFellow Singaporeans, good evening.\nOur econ...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,fellow evening economy shake recession boom ha...,14353
7,2011,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow evening general election team fr...,12600
8,2012,\nFriends and fellow Singaporeans\nWe have tra...,https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow travel world know story date que...,12328
9,2013,\nPM Lee delivered his 2013 National Day Rally...,https://www.pmo.gov.sg/Newsroom/National-Day-R...,lee deliver speech august institute technical ...,14870


In [59]:
import os
import openai
import pandas as pd

# Initialize the OpenAI client; the API key is read from the OPENAI_API_KEY
# environment variable instead of being hardcoded in the notebook
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def summarize_speech(speech):
    # Send a request to the OpenAI API with the speech content and instructions for the summary
    response = client.chat.completions.create(
      model="gpt-4-turbo",
      messages=[
        {
          "role": "system",
          "content": ("Create a structured summary of the provided speech text. "
                      "Begin with an introductory paragraph of 2-3 sentences. "
                      "Then list the key themes, each followed by an elaboration of 2-3 sentences. "
                      "Conclude with a closing paragraph of 2-3 sentences. "
                      "Focus the summary on Singapore's developments and plans. "
                      "Ensure the language is professional and the summary does not exceed 300 words.")
        },
        {
          "role": "user",
          "content": speech
        }
      ],
      temperature=0,
      max_tokens=1000  # Adjust tokens if necessary to fit the 300 word limit
    )
    summary = response.choices[0].message.content
    return summary

# Apply the summarization function to each row in the DataFrame
df['Summary'] = df['Speech'].apply(summarize_speech)
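
`df['Speech'].apply(summarize_speech)` stops at the first failed API call, and the summaries already returned are never assigned to the column. One way to make the step more forgiving is to wrap the call so that a failure is recorded as `None` and can be retried later. A minimal sketch, assuming any callable that takes a speech string; `summarize_safely` and the stand-in `fake_summarize` are hypothetical, and no real API is called:

```python
def summarize_safely(speech, summarize_fn):
    """Call summarize_fn(speech); return None instead of raising on failure."""
    try:
        return summarize_fn(speech)
    except Exception as exc:  # broad on purpose: any API or network error
        print(f"Summary failed: {exc}")
        return None

def fake_summarize(speech):
    # Stand-in for summarize_speech so the sketch runs without an API key.
    if not speech:
        raise ValueError("empty speech")
    return speech[:20]

results = [summarize_safely(s, fake_summarize)
           for s in ["My fellow Singaporeans, good evening.", ""]]
print(results)
```

In the notebook this could be used as `df['Speech'].apply(lambda s: summarize_safely(s, summarize_speech))`, after which rows with a missing summary can be retried individually.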


In [60]:
df.head()

Unnamed: 0,Year,Speech,URL,processed_speech,Word_Count,Summary
0,2004,\nThe first part of the English speech starts ...,https://www.pmo.gov.sg/Newsroom/National-Day-R...,english speech start second english speech vie...,21256,The speech delivered by the speaker focuses on...
1,2005,"\nFriends and fellow Singaporeans, 40 years ag...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow year ago set uncertain future kn...,13872,The speech delivered by the speaker focuses on...
2,2006,"\nFriends and fellow Singaporeans, my focus to...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow focus tonight future rapidly cha...,12696,The speech focuses on Singapore's strategic pl...
3,2007,"\nFriends and fellow Singaporeans, Singapore i...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow thing look grow buzz confidence ...,14493,The speech delivered focuses on Singapore's on...
4,2008,"\nFriends and fellow Singaporeans, good evenin...",https://www.pmo.gov.sg/Newsroom/National-Day-R...,friend fellow evening tonight start talk econo...,15209,The speech delivered by the speaker primarily ...


In [62]:
print(df.loc[df['Year'] == 2023, 'Summary'].iloc[0])

The speech delivered by the leader of Singapore addresses several critical themes concerning the nation's past achievements, current challenges, and future directions. The speech is structured to reflect on the resilience of Singaporeans, outline the ongoing and upcoming economic and social strategies, and emphasize the importance of community and government support systems.

**Key Themes:**

1. **Post-COVID Recovery and Resilience:**
   The leader praises the resilience and unity of Singaporeans in overcoming the COVID-19 pandemic, highlighting the nation's strength and preparedness for future challenges. The successful handling of the pandemic has set a precedent for managing other national crises.

2. **Geopolitical and Economic Challenges:**
   The speech outlines the current geopolitical tensions, particularly between the US and China, and their implications for Singapore. The leader also discusses the economic uncertainties, including the impact of protectionism and global warmin

In [82]:
df.to_csv('output.csv', index=False)