## Objectives

Web scrape Falcon 9 launch records with `BeautifulSoup`:

*   Extract a Falcon 9 launch records HTML table from Wikipedia
*   Parse the table and convert it into a Pandas data frame


First, let's import the required packages for this lab.


In [1]:
import sys

import requests
from bs4 import BeautifulSoup
import re
import unicodedata
import pandas as pd

To keep the lab tasks consistent, you will scrape the data from a snapshot of the `List of Falcon 9 and Falcon Heavy launches` Wikipedia page, updated on `9th June 2021`.


In [2]:
static_url = "https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922"
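The `oldid` query parameter is what pins `static_url` to a single revision of the page, so the scraped table does not change when the live article is edited. A minimal sketch of building such a permalink with the standard library; `permalink` is a hypothetical helper name, not part of the lab.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# MediaWiki entry point on English Wikipedia
BASE = "https://en.wikipedia.org/w/index.php"

def permalink(title: str, revision_id: int) -> str:
    """Hypothetical helper: build a URL pinned to one page revision (oldid)."""
    return f"{BASE}?{urlencode({'title': title, 'oldid': revision_id})}"

url = permalink("List_of_Falcon_9_and_Falcon_Heavy_launches", 1027686922)
print(url)

# Parsing the query string recovers the revision id that fixes the snapshot
query = parse_qs(urlsplit(url).query)
print(query["oldid"])  # ['1027686922']
```

Changing only the revision id is enough to scrape a different snapshot of the same page.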

### TASK 1: Request the Falcon9 Launch Wiki page from its URL


First, let's send an HTTP GET request for the Falcon 9 launch HTML page and keep the HTTP response.


In [3]:
# use requests.get() method with the provided static_url
# assign the response to an object
data = requests.get(static_url)
data

<Response [200]>
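`<Response [200]>` means the server answered with HTTP status 200 (OK). In a script it is safer to check the status programmatically and to set a timeout than to read the printed response. A sketch under that assumption; `fetch_html` is a hypothetical helper, and the hand-built `requests.Response` exists only to show, without network access, what `raise_for_status()` does on a 404.

```python
import requests

def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: download a page and fail loudly on HTTP errors."""
    response = requests.get(url, timeout=timeout)  # avoid hanging forever
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx codes
    return response.text

# Offline illustration: a response object with a 404 status code
fake = requests.Response()
fake.status_code = 404
try:
    fake.raise_for_status()
    outcome = "ok"
except requests.HTTPError as err:
    outcome = f"HTTPError: {err}"
print(outcome)
```

With such a helper, `data = requests.get(static_url)` could become `html = fetch_html(static_url)`, and `html` would then be passed to `BeautifulSoup` in place of `data.text`.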

Create a `BeautifulSoup` object from the HTML `response`


In [4]:
# html5lib parses the page the way a web browser does (requires the html5lib package)
soup = BeautifulSoup(data.text, "html5lib")

Print the page title to verify that the `BeautifulSoup` object was created properly.


In [5]:
# Use soup.title attribute
soup.title

<title>List of Falcon 9 and Falcon Heavy launches - Wikipedia</title>
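`soup.title` returns the whole `<title>` tag, markup included. A short sketch on a hypothetical inline page showing how to get just the text with `.string`; it uses Python's built-in `"html.parser"` instead of `html5lib` so that it runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in page, not the real Wikipedia markup
html = """
<html><head><title>List of Falcon 9 and Falcon Heavy launches - Wikipedia</title></head>
<body><p>placeholder</p></body></html>
"""

# "html.parser" ships with Python, so no html5lib install is needed here
soup = BeautifulSoup(html, "html.parser")

print(soup.title)         # the whole <title> tag
print(soup.title.name)    # the tag name: title
print(soup.title.string)  # only the text inside the tag
```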

In [6]:
# Use the find_all function in the BeautifulSoup object, with element type `table`
# Assign the result to a list called `html_tables`
html_tables = soup.find_all('table') 

print('number of tables is', len(html_tables))

number of tables is 24
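Not every `<table>` on a Wikipedia page necessarily holds launch data, which is why the lab skips the first two. Assuming the data tables carry the `wikitable` CSS class (worth checking against the snapshot before relying on it), `find_all` can filter by class directly. The markup below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one non-data table plus two data tables
html = """
<table class="infobox"><tr><td>nav</td></tr></table>
<table class="wikitable plainrowheaders"><tr><th>Flight No.</th></tr></table>
<table class="wikitable"><tr><th>Flight No.</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

all_tables = soup.find_all("table")
# class_ matches when "wikitable" is any one of the element's classes
data_tables = soup.find_all("table", class_="wikitable")

print(len(all_tables), len(data_tables))  # 3 2
```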


The actual launch records start at the third table, which is our target table.


In [7]:
# Select the third table (index 2), the first one holding launch records
first_launch_table = html_tables[2]
# Uncomment to inspect the raw HTML of the table
#print(first_launch_table)
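`pd.read_html` parses the table automatically, but the `Date andtime (UTC)` label it produces suggests that line breaks inside header cells are dropped without a space. A minimal sketch of extracting header names by hand from the `<th>` cells, assuming the headers contain `<br>` tags and `<sup>` footnote markers; the HTML and the `column_name` helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical header row with a <br> line break and a <sup> footnote marker
html = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Date and<br>time (UTC)</th>
      <th>Version,<br>Booster <sup>[b]</sup></th><th>Launch site</th></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

def column_name(th):
    """Hypothetical helper: visible header text without footnote markers."""
    for sup in th.find_all("sup"):
        sup.decompose()  # remove footnote references such as [b]
    # separator " " keeps the words on either side of a <br> apart
    return th.get_text(" ", strip=True)

column_names = [column_name(th) for th in table.find_all("th")]
print(column_names)
```

Applied to `first_launch_table`, the same loop would yield one name per header cell.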

In [8]:
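from io import StringIO
# read_html returns a list of DataFrames; StringIO avoids the pandas >= 2.1 deprecation of literal HTML strings
data_table = pd.read_html(StringIO(str(first_launch_table)), flavor='bs4')[0]
data_table.columns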

Index(['Flight No.', 'Date andtime (UTC)', 'Version,Booster [b]',
       'Launch site', 'Payload[c]', 'Payload mass', 'Orbit', 'Customer',
       'Launchoutcome', 'Boosterlanding'],
      dtype='object')
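Several labels above still carry footnote markers (`[b]`, `[c]`). A sketch that strips them with `re`, already imported in the first cell; `clean_label` is a hypothetical helper, and the labels are copied from the output above. Fused words such as `Date andtime` cannot be repaired by a regex alone and still need an explicit rename.

```python
import re

import pandas as pd

# Column labels copied from the data_table.columns output
columns = pd.Index(['Flight No.', 'Date andtime (UTC)', 'Version,Booster [b]',
                    'Launch site', 'Payload[c]', 'Payload mass', 'Orbit',
                    'Customer', 'Launchoutcome', 'Boosterlanding'])

def clean_label(label: str) -> str:
    """Hypothetical helper: drop [x] footnote markers, collapse whitespace."""
    label = re.sub(r"\[[^\]]*\]", "", label)
    return re.sub(r"\s+", " ", label).strip()

cleaned = list(columns.map(clean_label))
print(cleaned)
```

On the real frame this would be `data_table.columns = data_table.columns.map(clean_label)`.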

In [9]:
df = data_table
# Save the table without writing the DataFrame index as an extra column
df.to_csv('spacex_web_scraped.csv', index=False)
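`index=False` keeps pandas from writing the row index as an extra, unnamed first column. A sketch comparing both forms on a small hypothetical frame, using in-memory text rather than `spacex_web_scraped.csv`.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the scraped launch table
frame = pd.DataFrame({"Flight No.": [1, 2], "Orbit": ["LEO", "GTO"]})

# Without a path, to_csv returns the CSV text instead of writing a file
with_index = frame.to_csv()
without_index = frame.to_csv(index=False)

# Reading each version back shows the difference in columns
cols_with_index = list(pd.read_csv(StringIO(with_index)).columns)
cols_without_index = list(pd.read_csv(StringIO(without_index)).columns)
print(cols_with_index)     # ['Unnamed: 0', 'Flight No.', 'Orbit']
print(cols_without_index)  # ['Flight No.', 'Orbit']
```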