# Analyst Builder: Python Programming for Beginners - Project 5

## Web Scraper + Regular Expression Project
- **URL**: http://analytictech.com/mb021/mlk.htm
- **Web Scraping**: Downloading and parsing MLK's speech HTML
- **Text Cleaning**: Removing punctuation, converting to lowercase, and splitting into words
- **Word Frequency Analysis**: Counting word occurrences and saving the results to a CSV file in Google Drive
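
The three steps above can be sketched end to end without any network access. The version below is only a minimal illustration: it assumes a small inline HTML sample and uses the standard library (`html.parser` and `collections.Counter`) as stand-ins for BeautifulSoup and pandas. The names `ParagraphText`, `word_counts_from_html`, and `sample_html` are hypothetical and do not appear in the notebook.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self.in_p = True
            self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'p':
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


def word_counts_from_html(html):
    """Extract <p> text, clean it, and count words (hypothetical helper)."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    text = ' '.join(parser.paragraphs)
    text = re.sub(r'[^\w\s]', '', text).lower()  # drop punctuation, then lowercase
    return Counter(text.split())                 # split on whitespace and count


# Made-up sample page; the heading text is ignored because it is not in a <p> tag
sample_html = '<html><body><h1>Title</h1><p>Free at last!\r\nFree at last.</p></body></html>'
sample_counts = word_counts_from_html(sample_html)
print(sample_counts.most_common())
```

The cells that follow perform the same steps one at a time on the real page, which makes each intermediate result easy to inspect.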

In [None]:
# Setup

from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
from google.colab import drive
import os

drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
url = 'http://analytictech.com/mb021/mlk.htm'

page = requests.get(url)  # Request the page once
print(page, '\n')  # Print the response already received instead of sending a second request

soup = BeautifulSoup(page.text, 'html')  # Parse HTML content; 'html' lets BeautifulSoup pick the best installed HTML parser
print('MLK Speech:\n', soup.prettify())  # Display formatted HTML for inspection

<Response [200]> 

MLK Speech:
 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="Microsoft FrontPage 4.0" name="GENERATOR"/>
  <title>
   Martin Luther King Jr.'s 1962 Speech
  </title>
 </head>
 <body alink="#FF0000" bgcolor="#FFFFFF" link="#0000FF" text="#000000" vlink="#551A8B">
  <h1>
   <font size="5">
    Transcript of speech by
   </font>
   <br/>
   Dr. Martin Luther King Jr.
   <br/>
   August 28, 1963. Lincoln Memorial in Washington D.C.
  </h1>
  <hr color="#008080" noshade="" size="5"/>
  <p>
   I am happy to join with you today in what will go down in
history as the greatest demonstration for freedom in the history
of our nation.
  </p>
  <p>
   Five score years ago a great American in whose symbolic shadow
we stand today signed the Emancipation Proclamation. This
momentous decree came as a great beckoning light of hope to
millions of Negro slaves who had been seared i

In [None]:
mlk_speech_1 = soup.find_all('p')  # Find all paragraph tags containing speech text
mlk_speech_1

[<p>I am happy to join with you today in what will go down in
 history as the greatest demonstration for freedom in the history
 of our nation. </p>,
 <p>Five score years ago a great American in whose symbolic shadow
 we stand today signed the Emancipation Proclamation. This
 momentous decree came as a great beckoning light of hope to
 millions of Negro slaves who had been seared in the flames of
 withering injustice. It came as a joyous daybreak to end the long
 night of their captivity. </p>,
 <p>But one hundred years later the Negro is still not free. One
 hundred years later the life of the Negro is still sadly crippled
 by the manacles of segregation and the chains of discrimination. </p>,
 <p>One hundred years later the Negro lives on a lonely island of
 poverty in the midst of a vast ocean of material prosperity. </p>,
 <p>One hundred years later the Negro is still languishing in the
 comers of American society and finds himself in exile in his own
 land. </p>,
 <p>We all have c

In [None]:
mlk_speech_2 = [p.text for p in mlk_speech_1]  # Extract text from each paragraph tag
mlk_speech_2

['I am happy to join with you today in what will go down in\r\nhistory as the greatest demonstration for freedom in the history\r\nof our nation. ',
 'Five score years ago a great American in whose symbolic shadow\r\nwe stand today signed the Emancipation Proclamation. This\r\nmomentous decree came as a great beckoning light of hope to\r\nmillions of Negro slaves who had been seared in the flames of\r\nwithering injustice. It came as a joyous daybreak to end the long\r\nnight of their captivity. ',
 'But one hundred years later the Negro is still not free. One\r\nhundred years later the life of the Negro is still sadly crippled\r\nby the manacles of segregation and the chains of discrimination. ',
 'One hundred years later the Negro lives on a lonely island of\r\npoverty in the midst of a vast ocean of material prosperity. ',
 'One hundred years later the Negro is still languishing in the\r\ncomers of American society and finds himself in exile in his own\r\nland. ',
 "We all have come

In [None]:
mlk_speech_3 = ' '.join(mlk_speech_2)  # Combine paragraphs into a single string
mlk_speech_3

'I am happy to join with you today in what will go down in\r\nhistory as the greatest demonstration for freedom in the history\r\nof our nation.  Five score years ago a great American in whose symbolic shadow\r\nwe stand today signed the Emancipation Proclamation. This\r\nmomentous decree came as a great beckoning light of hope to\r\nmillions of Negro slaves who had been seared in the flames of\r\nwithering injustice. It came as a joyous daybreak to end the long\r\nnight of their captivity.  But one hundred years later the Negro is still not free. One\r\nhundred years later the life of the Negro is still sadly crippled\r\nby the manacles of segregation and the chains of discrimination.  One hundred years later the Negro lives on a lonely island of\r\npoverty in the midst of a vast ocean of material prosperity.  One hundred years later the Negro is still languishing in the\r\ncomers of American society and finds himself in exile in his own\r\nland.  We all have come to this hallowed spo

In [None]:
mlk_speech_4 = mlk_speech_3.replace('\r\n', ' ')  # Replace newline characters with spaces
mlk_speech_4

'I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.  Five score years ago a great American in whose symbolic shadow we stand today signed the Emancipation Proclamation. This momentous decree came as a great beckoning light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of their captivity.  But one hundred years later the Negro is still not free. One hundred years later the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination.  One hundred years later the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity.  One hundred years later the Negro is still languishing in the comers of American society and finds himself in exile in his own land.  We all have come to this hallowed spot to remind America of the fierce ur
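
Replacing `'\r\n'` handles Windows-style line endings only, and the output above still contains double spaces where paragraphs were joined. One possible alternative, shown here as a sketch rather than as the notebook's method, is to split on any whitespace and rejoin with single spaces. The `sample` string is made up for illustration.

```python
# Hypothetical sample mimicking the scraped text: a '\r\n' break, a double space, and a bare '\n'
sample = 'go down in\r\nhistory.  Five score\nyears ago'

# Replacing only the '\r\n' pair leaves the double space and the bare '\n' untouched
replaced = sample.replace('\r\n', ' ')

# Alternative: split on any run of whitespace, then rejoin with single spaces
normalized = ' '.join(sample.split())

print(repr(replaced))
print(repr(normalized))
```

Either form works for word counting as long as the later split step treats runs of whitespace as a single separator.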

In [None]:
mlk_speech_5 = re.sub(r'[^\w\s]', '', mlk_speech_4)  # Remove punctuation
mlk_speech_5

'I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation  Five score years ago a great American in whose symbolic shadow we stand today signed the Emancipation Proclamation This momentous decree came as a great beckoning light of hope to millions of Negro slaves who had been seared in the flames of withering injustice It came as a joyous daybreak to end the long night of their captivity  But one hundred years later the Negro is still not free One hundred years later the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination  One hundred years later the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity  One hundred years later the Negro is still languishing in the comers of American society and finds himself in exile in his own land  We all have come to this hallowed spot to remind America of the fierce urgency of
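
The pattern `[^\w\s]` matches every character that is neither a word character (letter, digit, or underscore) nor whitespace. A small sketch on a made-up sentence shows the side effects worth knowing about: apostrophes and hyphens are removed, so contractions and hyphenated words fuse together, while underscores survive.

```python
import re

# Hypothetical sentence chosen to exercise apostrophes, hyphens, and underscores
sample = "We can't be satisfied -- not yet! snake_case stays; well-known merges."

cleaned = re.sub(r'[^\w\s]', '', sample)  # delete everything except word characters and whitespace
print(cleaned)
```

For a word-frequency count this is usually acceptable, but it does mean that "can't" is tallied as `cant`.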

In [None]:
mlk_speech_6 = mlk_speech_5.lower()  # Convert text to lowercase for consistent word counting
mlk_speech_6

'i am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation  five score years ago a great american in whose symbolic shadow we stand today signed the emancipation proclamation this momentous decree came as a great beckoning light of hope to millions of negro slaves who had been seared in the flames of withering injustice it came as a joyous daybreak to end the long night of their captivity  but one hundred years later the negro is still not free one hundred years later the life of the negro is still sadly crippled by the manacles of segregation and the chains of discrimination  one hundred years later the negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity  one hundred years later the negro is still languishing in the comers of american society and finds himself in exile in his own land  we all have come to this hallowed spot to remind america of the fierce urgency of

In [None]:
mlk_speech_7 = mlk_speech_6.split()  # Split on whitespace; unlike re.split(r'\s+', ...), this yields no empty string when the text ends with a space
mlk_speech_7

['i',
 'am',
 'happy',
 'to',
 'join',
 'with',
 'you',
 'today',
 'in',
 'what',
 'will',
 'go',
 'down',
 'in',
 'history',
 'as',
 'the',
 'greatest',
 'demonstration',
 'for',
 'freedom',
 'in',
 'the',
 'history',
 'of',
 'our',
 'nation',
 'five',
 'score',
 'years',
 'ago',
 'a',
 'great',
 'american',
 'in',
 'whose',
 'symbolic',
 'shadow',
 'we',
 'stand',
 'today',
 'signed',
 'the',
 'emancipation',
 'proclamation',
 'this',
 'momentous',
 'decree',
 'came',
 'as',
 'a',
 'great',
 'beckoning',
 'light',
 'of',
 'hope',
 'to',
 'millions',
 'of',
 'negro',
 'slaves',
 'who',
 'had',
 'been',
 'seared',
 'in',
 'the',
 'flames',
 'of',
 'withering',
 'injustice',
 'it',
 'came',
 'as',
 'a',
 'joyous',
 'daybreak',
 'to',
 'end',
 'the',
 'long',
 'night',
 'of',
 'their',
 'captivity',
 'but',
 'one',
 'hundred',
 'years',
 'later',
 'the',
 'negro',
 'is',
 'still',
 'not',
 'free',
 'one',
 'hundred',
 'years',
 'later',
 'the',
 'life',
 'of',
 'the',
 'negro',
 'is',
 '
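
Two common ways of splitting on whitespace behave differently at the edges of a string. The sketch below, using a made-up `sample`, compares `re.split(r'\s+', ...)` with `str.split()`: the regex version returns an empty string when the text ends with whitespace, and that empty string would later be counted as a word.

```python
import re

sample = 'free at last  free at last '  # note the double space and the trailing space

regex_words = re.split(r'\s+', sample)  # keeps an empty string after the trailing space
plain_words = sample.split()            # drops leading and trailing whitespace entirely

print(regex_words)
print(plain_words)
```

Because each scraped paragraph ends with a space, the difference can show up in the final counts.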

In [None]:
mlk_df = pd.DataFrame(mlk_speech_7).value_counts()  # Count occurrences of each word; the result is a Series sorted by descending count
mlk_df

0,count
the,54
of,49
to,29
and,27
a,20
...,...
jews,1
joyous,1
judged,1
land,1
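
As a cross-check on the pandas tally, the same kind of count can be produced with the standard library. This is a sketch on a made-up word list, not the notebook's method: `collections.Counter.most_common()` returns `(word, count)` pairs sorted by descending count.

```python
from collections import Counter

# Hypothetical word list standing in for the cleaned speech
words = ['the', 'dream', 'of', 'the', 'free', 'of', 'the']

counts = Counter(words)
top_two = counts.most_common(2)  # the two most frequent words with their counts
print(top_two)
```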


In [None]:
# Create a folder in Google Drive to save results
new_folder_path = '/content/drive/MyDrive/Analyst Builder/Web Scraper + Regex Project'
os.makedirs(new_folder_path, exist_ok=True)

# Save word counts to CSV in the new folder
csv_path = os.path.join(new_folder_path, 'mlk_speech_counts.csv')
mlk_df.to_csv(csv_path, header=['Counts'], index_label='Word')
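
Outside Colab there is no mounted Drive, so the save step can be tried locally instead. The sketch below assumes a temporary directory as a stand-in for the Drive folder and uses the standard `csv` module rather than pandas. The sample counts are made up, and only the `Word` and `Counts` column labels are taken from the notebook.

```python
import csv
import os
import tempfile
from collections import Counter

# Hypothetical counts standing in for the real word tally
counts = Counter(['the', 'of', 'the', 'dream'])

# A temporary directory stands in for the mounted Google Drive folder
out_dir = os.path.join(tempfile.mkdtemp(), 'Web Scraper + Regex Project')
os.makedirs(out_dir, exist_ok=True)
out_path = os.path.join(out_dir, 'mlk_speech_counts.csv')

with open(out_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Word', 'Counts'])     # same column labels as the notebook
    writer.writerows(counts.most_common())  # one row per word, highest count first

# Read the file back to confirm what was written
with open(out_path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)
```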