# Falcon 9 & Falcon Heavy Launches - Web Scraping
**Author:** Muhammad Munawar Shahzad 

**Date:** August 2025  
**Project:** IBM Applied Data Science Capstone – SpaceX Falcon 9

**Repository:** `falcon9_project`  


Objectives of this notebook:
- Learn how to perform web scraping using Python's `requests` and `BeautifulSoup`.
- Extract launch data from Wikipedia.
- Store the extracted data in a structured format (CSV) for later analysis.

In [1]:
# Step 1: Import necessary libraries for web scraping and data handling
import requests                      # To fetch HTML content from the web
from bs4 import BeautifulSoup        # To parse and navigate HTML
import pandas as pd                  # To store and clean data in a tabular format

# Display a message to confirm imports
print("Libraries imported successfully.")


Libraries imported successfully.


We chose the Wikipedia page for "List of Falcon 9 and Falcon Heavy launches" because:
- It contains comprehensive, up-to-date launch records.
- Wikipedia pages are publicly accessible and have structured tables.
- The data can be scraped without requiring authentication.

In [2]:
# Step 2: Define the target URL
url = "https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches"

# Step 3: Fetch the HTML content using requests
response = requests.get(url)

# Step 4: Check status code to ensure the request was successful
print("Status Code:", response.status_code)

# Step 5: Store HTML content in a variable
html_content = response.text

Status Code: 200
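
The request above relies on the library defaults. As a rough sketch of a more defensive fetch, the snippet below sends an identifying `User-Agent`, sets a timeout, and raises on HTTP errors instead of only printing the status code. The helper names `build_session` and `fetch_html`, the User-Agent string, and the 30-second timeout are assumptions made for illustration; they are not part of the original notebook.

```python
import requests

URL = "https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches"


def build_session(user_agent="falcon9_project-notebook/0.1 (educational use)"):
    """Return a Session that sends an identifying User-Agent (hypothetical value)."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(url, session, timeout=30):
    """Fetch a page and return its HTML text, raising on 4xx/5xx responses."""
    response = session.get(url, timeout=timeout)  # timeout in seconds (assumed value)
    response.raise_for_status()                   # raises requests.HTTPError on failure
    return response.text


session = build_session()
# html_content = fetch_html(URL, session)  # uncomment to perform the network request
```

Raising on failure means a blocked or rate-limited request stops the notebook early, rather than passing an error page on to the parser.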


In [3]:
# Step 6: Create a BeautifulSoup object to parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Step 7: Find all tables on the page
html_tables = soup.find_all('table', {"class": "wikitable"})

# Step 8: Check number of tables found
print("Number of tables found:", len(html_tables))

Number of tables found: 5
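
Before committing to `html_tables[0]`, it can help to look at what each matched table contains. The sketch below lists the row count and first-row headers of every `wikitable` in a small, made-up HTML sample; `sample_html` and `summarize_tables` are hypothetical names, and the sample does not reproduce the real Wikipedia markup.

```python
from bs4 import BeautifulSoup

# Made-up sample: two wikitables and one unrelated table
sample_html = """
<table class="wikitable"><tr><th>Flight No.</th><th>Orbit</th></tr>
<tr><th>286</th><td>LEO</td></tr></table>
<table class="wikitable sortable"><tr><th>Year</th><th>Launches</th></tr>
<tr><td>2024</td><td>2</td></tr></table>
<table class="infobox"><tr><th>Ignored</th></tr></table>
"""


def summarize_tables(html):
    """Return (index, row count, first-row headers) for each wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    summaries = []
    for index, table in enumerate(soup.find_all("table", class_="wikitable")):
        first_row = table.find("tr")
        headers = [th.get_text(" ", strip=True) for th in first_row.find_all("th")]
        summaries.append((index, len(table.find_all("tr")), headers))
    return summaries


for index, n_rows, headers in summarize_tables(sample_html):
    print(index, n_rows, headers)
```

Running the same function on `html_content` would show which of the five tables hold launch records and whether their headers agree.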


In [4]:
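# Step 9: Extract column names from the first table
# Note: find_all('th') also returns the row-header cells (flight numbers),
# so the list printed below is longer than the real header; the next cell fixes this.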
first_launch_table = html_tables[0]

column_names = []
for th in first_launch_table.find_all('th'):
    col_name = th.text.strip()
    if col_name:
        column_names.append(col_name)

print("Column Names:", column_names)

# Step 10: Extract row data
table_rows = []
for tr in first_launch_table.find_all('tr')[1:]:  # Skip header row
    cells = tr.find_all(['td', 'th'])
    row = [cell.text.strip() for cell in cells]
    if row:
        table_rows.append(row)

print("Number of rows extracted:", len(table_rows))

Column Names: ['Flight No.', 'Date andtime (UTC)', 'Version,booster[i]', 'Launchsite', 'Payload[j]', 'Payload mass', 'Orbit', 'Customer', 'Launchoutcome', 'Boosterlanding', '286', '287', '288', '289', '290', '291', '292', '293', '294', '295', '296', '297', '298', '299', '300', '301', '302', '303', '304', '305', '306', '307', '308', '309', '310', '311', '312', '313', '314', '315', '316', '317', '318', '319', '320', '321', '322', '323', '324', '325', '326', '327', '328', '329', '330', '331', '332', '333', '334', '335', '336', '337', '338', '339', '340', '341', '342', '343', '344', '345', '346', '347', '348', '349', 'FH 10', '350', '351', '352', '353', '354', '355', '356', '357', '358', '359', '360', '361', '362', '363', '364', '365', '366', '367', '368', '369', '370', '371', '372', '373', '374', '375', '376', '377', '378', '379', 'FH 11', '380', '381', '382', '383', '384', '385', '386', '387', '388', '389', '390', '391', '392', '393', '394', '395', '396', '397', '398', '399', '400', '401
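
The column list above contains flight numbers because each data row in the table starts with a `<th>` cell, and `find_all('th')` on the whole table collects those too. The following minimal sketch reproduces the effect on a made-up three-column table (the HTML is an illustrative assumption, not the actual page source) and shows that restricting the search to the first `<tr>` yields only the real headers.

```python
from bs4 import BeautifulSoup

# Made-up table in which every data row begins with a <th> row header
sample_html = """
<table class="wikitable">
<tr><th>Flight No.</th><th>Orbit</th><th>Customer</th></tr>
<tr><th>286</th><td>LEO</td><td>SpaceX</td></tr>
<tr><th>287</th><td>GTO</td><td>Ovzon</td></tr>
</table>
"""

table = BeautifulSoup(sample_html, "html.parser").find("table")

# Every <th> in the table: column headers plus row headers
all_th = [th.get_text(strip=True) for th in table.find_all("th")]

# Only the <th> cells in the first row: the column headers
header_only = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]

print(all_th)       # ['Flight No.', 'Orbit', 'Customer', '286', '287']
print(header_only)  # ['Flight No.', 'Orbit', 'Customer']
```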

In [5]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Step 1: Find all tables with class 'wikitable'
tables = soup.find_all('table', {"class": "wikitable"})

# Step 2: Select first table (you can loop later if multiple needed)
first_table = tables[0]

# Step 3: Extract header only from the first row
header_row = first_table.find('tr')
column_names = [th.get_text(strip=True) for th in header_row.find_all('th')]

# Step 4: Extract all rows after the header
table_rows = []
for tr in first_table.find_all('tr')[1:]:
    cells = tr.find_all(['td', 'th'])
    row = [cell.get_text(strip=True) for cell in cells]
    if row and len(row) == len(column_names):  # only keep rows with correct column length
        table_rows.append(row)

# Step 5: Create DataFrame safely
df = pd.DataFrame(table_rows, columns=column_names)

# Step 6: Clean and show
df.replace('\n', ' ', regex=True, inplace=True)
print(df.head())


  Flight No.         Date andtime (UTC) Version,booster[i]  \
0        286   January 3, 202403:44[24]        F9B5B1082‑1   
1        287   January 3, 202423:04[25]       F9B5B1076‑10   
2        288   January 7, 202422:35[29]       F9B5B1067‑16   
3        289  January 14, 202408:59[31]       F9B5B1061‑18   
4        290  January 15, 202401:52[32]       F9B5B1073‑12   

              Launchsite                         Payload[j]  \
0      Vandenberg,SLC‑4E  Starlink:Group 7-9(22 satellites)   
1  Cape Canaveral,SLC‑40                            Ovzon-3   
2  Cape Canaveral,SLC‑40  Starlink:Group 6-35(23satellites)   
3      Vandenberg,SLC‑4E  Starlink:Group 7-10(22satellites)   
4  Cape Canaveral,SLC‑40  Starlink:Group 6-37(23satellites)   

             Payload mass Orbit Customer Launchoutcome    Boosterlanding  
0  ~16,800 kg (37,000 lb)   LEO   SpaceX       Success  Success (OCISLY)  
1     1,800 kg (4,000 lb)   GTO    Ovzon       Success    Success (LZ‑1)  
2  ~17,100 kg (37,700 l
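
In the preview above, values such as `January 3, 202403:44[24]` run the date, time, and citation marker together, because `get_text(strip=True)` joins the text fragments of a cell without a separator. One possible cleanup, sketched below, passes a space as the separator and then removes bracketed markers with a regular expression. The helper name `clean_cell` and the sample cell markup are assumptions; note that the pattern removes any bracketed text, not only citations.

```python
import re
from bs4 import BeautifulSoup

# Matches bracketed markers such as [24] or [i], with any preceding whitespace
CITATION_RE = re.compile(r"\s*\[[^\]]*\]")


def clean_cell(cell):
    """Return cell text with fragments space-separated and bracketed markers removed."""
    text = cell.get_text(" ", strip=True)  # join text fragments with a single space
    text = CITATION_RE.sub("", text)       # drop [24], [i], and similar markers
    return " ".join(text.split())          # collapse any remaining runs of whitespace


# Hypothetical markup for a date/time cell with a citation
sample = BeautifulSoup(
    '<td>January 3, 2024<br/>03:44'
    '<sup class="reference"><a href="#cite_note-24">[24]</a></sup></td>',
    "html.parser",
).td

print(clean_cell(sample))
```

Applying such a helper to both header and data cells would also separate merged header words like `Date andtime (UTC)`.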

In [6]:
# -------------------------------------------------
# Save scraped data to raw folder
# -------------------------------------------------
from pathlib import Path

# Define project folders (adjust if not already defined)
PROJECT_ROOT = Path.cwd().parents[0]  # assumes the notebook lives in 'notebooks/'
RAW_DIR = PROJECT_ROOT / "data" / "raw"
RAW_DIR.mkdir(parents=True, exist_ok=True)

# Define file path
RAW_DATA_PATH = RAW_DIR / "falcon9_web_scraped.csv"

# Save DataFrame to raw folder
df.to_csv(RAW_DATA_PATH, index=False)
print(f"💾 Raw web scraping data saved to: {RAW_DATA_PATH}")

💾 Raw web scraping data saved to: d:\Projects\falcon9_project\data\raw\falcon9_web_scraped.csv
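
A quick way to confirm the save step worked is to read the file back and compare its shape and columns with the DataFrame in memory. The sketch below does this in a temporary directory with a small made-up DataFrame so it can run anywhere; `save_and_verify` and the demo data are hypothetical, and in the notebook the same check could be pointed at `df` and `RAW_DATA_PATH`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_and_verify(frame, path):
    """Write frame to CSV, reload it as strings, and compare shape and columns."""
    path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, index=False)
    # Read everything as text so values such as 'FH 10' and '286' are not re-typed
    reloaded = pd.read_csv(path, dtype=str, keep_default_na=False)
    return reloaded.shape == frame.shape and list(reloaded.columns) == list(frame.columns)


# Made-up demo data standing in for the scraped launch table
demo = pd.DataFrame({"Flight No.": ["286", "FH 10"], "Orbit": ["LEO", "GTO"]})

with tempfile.TemporaryDirectory() as tmp:
    ok = save_and_verify(demo, Path(tmp) / "data" / "raw" / "demo.csv")

print(ok)
```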


In this notebook, we:
- Fetched HTML content from Wikipedia.
- Parsed and navigated HTML using BeautifulSoup.
- Extracted launch table data.
- Cleaned and structured the data using pandas.
- Saved the data to CSV for later use in analysis.

In the next notebook, we will:
- Load the scraped CSV file from `data/raw`.
- Perform exploratory data analysis (EDA).
- Create visualizations for Falcon 9 & Falcon Heavy launches.

References:
- Wikipedia: [List of Falcon 9 and Falcon Heavy launches](https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches)
- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/
- Pandas Documentation: https://pandas.pydata.org/