# Web Scraper for Friends Scripts

The episode scripts come from Crazy For Friends, a Friends fansite.

[Crazy For Friends](https://www.livesinabox.com/friends/scripts.shtml)

*Disclaimer for scripts: This project is in no way associated with Friends, Warner Bros., NBC, or Bright/Kauffman/Crane Productions. It is for educational purposes only.*

### Install Dependencies

In [21]:
!pip install requests beautifulsoup4




### Import Libraries

In [22]:
import requests
import os
from bs4 import BeautifulSoup

### Fetch the Webpage

In [23]:
# Base URL of the website
base_url = "https://www.livesinabox.com/friends/"
# URL of the page containing the script links
scripts_page_url = "https://www.livesinabox.com/friends/scripts.shtml"

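# Send a request to the website; the timeout keeps the cell from hanging forever
response = requests.get(scripts_page_url, timeout=30)
# Stop here on an HTTP error (4xx/5xx) instead of parsing an error page
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')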

### Folder to Save Script Files

In [24]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [27]:
!ls "/content/drive/My Drive/Colab/friends"

data  friends_script_generation.ipynb  web_scraper.ipynb


In [44]:
# Directory to save script files
scripts_dir = "/content/drive/My Drive/Colab/friends/scripts_data"
os.makedirs(scripts_dir, exist_ok=True)  # This will create the directory if it doesn't exist

### Locate the Links to Scripts

The index page links to each script through anchor tags (`<a>`). The cell below collects every anchor that has an `href` attribute; narrowing these down to script pages happens in the next step.

In [45]:
# Collect every <a> tag that has an href attribute (script links are filtered later)
script_links = soup.find_all('a', href=True)
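
The loops later in this notebook build each script URL as `base_url + href`, which only works while every `href` is relative to the site folder. A minimal sketch of a more defensive alternative is shown here, assuming the script links can still be recognised by the substring `season`. The helper name `resolve_script_urls` and the sample hrefs are hypothetical and not taken from the site.

```python
from urllib.parse import urljoin


def resolve_script_urls(hrefs, page_url, keyword="season"):
    """Return absolute, de-duplicated URLs for the hrefs that contain `keyword`."""
    seen = set()
    urls = []
    for href in hrefs:
        # Skip empty hrefs and links that do not look like script pages
        if not href or keyword not in href:
            continue
        # urljoin handles relative, root-relative, and already absolute hrefs
        absolute = urljoin(page_url, href)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Hypothetical hrefs for illustration; the real values come from the index page
sample_hrefs = [
    "season/0101.html",
    "season/0101.html",
    "index.shtml",
    "/friends/season/0102.html",
]
script_urls = resolve_script_urls(
    sample_hrefs, "https://www.livesinabox.com/friends/scripts.shtml"
)
```

In the notebook itself this could be called as `resolve_script_urls([a["href"] for a in script_links], scripts_page_url)`, which also avoids fetching the same page twice when the index links to it more than once.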

### Extract and Visit Each Script Link

Next, iterate over the extracted links, fetch each script page, and save its text. The code below assumes the episode title sits in the page's `<h1>` tag and the script text in `<p>` tags; pages that are laid out differently will need their own handling.

In [None]:
# Original version: saves every linked page and falls back to 'Untitled' when there is no <h1>
# Loop through all the found <a> tags to extract and save the scripts
for link in script_links:
    href = link.get('href')
    if href and 'season' in href:
        # Construct the full URL for the script page
        script_url = base_url + href
        # Fetch the script page
        script_response = requests.get(script_url)
        script_soup = BeautifulSoup(script_response.content, 'html.parser')

        # The script title is in the <h1> tag
        title_tag = script_soup.find('h1')
        title_text = title_tag.get_text(strip=True) if title_tag else 'Untitled'
        filename = f"{title_text}.txt".replace('/', ' ').replace(' ', '_')
        filepath = os.path.join(scripts_dir, filename)

        # Find all paragraph tags and concatenate their text
        paragraphs = script_soup.find_all('p')
        script_text = '\n'.join(paragraph.get_text(strip=True) for paragraph in paragraphs)

        # Write the script text to a file
        with open(filepath, 'w', encoding='utf-8') as file:
            file.write(script_text)
        print(f"Saved script: {filename}")

In [48]:
# Revised version: only saves pages whose <h1> title starts with "The One"
for link in script_links:
    href = link.get('href')
    if href and 'season' in href:
        script_url = base_url + href
        script_response = requests.get(script_url)
        script_soup = BeautifulSoup(script_response.content, 'html.parser')

        title_tag = script_soup.find('h1')
        if title_tag:
            title_text = title_tag.get_text(strip=True)
            if title_text.startswith("The One"):
                filename = f"{title_text}.txt".replace('/', ' ').replace(' ', '_')
                filepath = os.path.join(scripts_dir, filename)

                paragraphs = script_soup.find_all('p')
                script_text = '\n'.join(paragraph.get_text(strip=True) for paragraph in paragraphs)

                with open(filepath, 'w', encoding='utf-8') as file:
                    file.write(script_text)
                print(f"Saved script: {filename}")


Saved script: The_One_Where_Monica_Gets_a_New_Roommate_(The_Pilot-The_Uncut_Version).txt
Saved script: The_One_With_the_Sonogram_at_the_End.txt
Saved script: The_One_With_the_Thumb.txt
Saved script: The_One_With_George_Stephanopoulos.txt
Saved script: The_One_With_the_East_German_Laundry_Detergent.txt
Saved script: The_One_With_the_Butt.txt
Saved script: The_One_With_the_Blackout.txt
Saved script: The_One_Where_Nana_Dies_Twice.txt
Saved script: The_One_Where_Underdog_Gets_Away.txt
Saved script: The_One_With_the_Monkey.txt
Saved script: The_One_With_Mrs._Bing.txt
Saved script: The_One_With_the_Dozen_Lasagnas.txt
Saved script: The_One_With_the_Boobies.txt
Saved script: The_One_With_the_Candy_Hearts.txt
Saved script: The_One_With_the_Stoned_Guy.txt
Saved script: The_One_With_Two_Parts,_part_1.txt
Saved script: The_One_With_Two_Parts,_Part_2.txt
Saved script: The_One_With_All_The_Poker.txt
Saved script: The_One_Where_the_Monkey_Gets_Away.txt
Saved script: The_One_With_the_Evil_Orthodontist
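
The saved names keep whatever punctuation the episode title contains, because the loop only replaces `/` and spaces. A stricter sanitiser could look like the sketch below. The function name `safe_filename` and the exact set of allowed characters are assumptions for illustration, not part of the original notebook.

```python
import re


def safe_filename(title, extension=".txt"):
    """Turn an episode title into a conservative file name."""
    # Drop straight and curly quotes entirely so "Can’t" becomes "Cant"
    cleaned = re.sub(r"[\"'‘’“”]", "", title)
    # Collapse every other run of unsafe characters into a single underscore
    cleaned = re.sub(r"[^A-Za-z0-9.-]+", "_", cleaned).strip("_.")
    # Fall back to a placeholder when nothing usable is left
    return (cleaned or "Untitled") + extension


example_name = safe_filename('The One After "I Do"')
```

Note that this maps different titles to the same name if they differ only in punctuation, so it trades some fidelity for names that need no shell quoting.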

And now we should have all of our scripts!
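
One way to check this from Python rather than by eye is sketched below. A page without any `<p>` tags would produce an empty file, so the sketch also reports zero-byte files. The helper name `summarize_scripts` is hypothetical.

```python
from pathlib import Path


def summarize_scripts(directory):
    """Return (sorted .txt file names, names of the empty ones) for a folder."""
    files = sorted(Path(directory).glob("*.txt"))
    names = [f.name for f in files]
    # A zero-byte file suggests the page had no text in <p> tags
    empty = [f.name for f in files if f.stat().st_size == 0]
    return names, empty
```

Calling `summarize_scripts(scripts_dir)` returns both lists, so `len(names)` gives the number of saved scripts and `empty` points at pages worth inspecting by hand.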

In [49]:
# Checking our data folder
!ls "/content/drive/My Drive/Colab/friends/scripts_data"

'The_One_After_"I_Do".txt'
 The_One_After_Ross_Says_Rachel.txt
 The_One_After_the_Superbowl.txt
 The_One_After_Vegas.txt
 The_One_At_The_Beach.txt
 The_One_Hundredth.txt
 The_One_In_Massapequa.txt
 The_One_In_Vegas.txt
 The_One_That_Could_Have_Been.txt
 The_One_The_Last_Night.txt
 The_One_The_Morning_After.txt
 The_One_Where_Chandler_Can’t_Cry.txt
 The_One_Where_Chandler_Can’t_Remember_Which_Sister.txt
 The_One_Where_Chandler_Crosses_a_Line.txt
 The_One_Where_Chandler_Doesn’t_Like_Dogs.txt
 The_One_Where_Chandler_Takes_a_Bath.txt
 The_One_Where_Dr._Remoray_Dies.txt
 The_One_Where_Eddie_Moves_In.txt
"The_One_Where_Eddie_Won't_Go.txt"
 The_One_Where_Emma_Cries.txt
 The_One_Where_Everyone_Finds_Out.txt
 The_One_Where_Joey_Dates_Rachel.txt
 The_One_Where_Joey_Loses_His_Insurance.txt
 The_One_Where_Joey_Moves_Out.txt
 The_One_Where_Joey_Tells_Rachel.txt
 The_One_Where_Monica_and_Richard_Are_Friends.txt
'The_One_Where_Monica_Gets_a_New_Roommate_(The_Pilot-The_Uncut_Version).txt'
 The_One_Whe