<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


# **SpaceX Falcon 9 First Stage Landing Prediction**


## Web Scraping Falcon 9 and Falcon Heavy Launch Records from Wikipedia


Estimated time needed: **40** minutes


In this lab, you will use web scraping to collect historical Falcon 9 launch records from the Wikipedia page titled `List of Falcon 9 and Falcon Heavy launches`:

https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_1_L2/images/Falcon9_rocket_family.svg)


A successful Falcon 9 first-stage landing looks like this:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/landing_1.gif)


Several examples of an unsuccessful landing are shown here:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/crash.gif)


More specifically, the launch records are stored in an HTML table, shown below:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_1_L2/images/falcon9-launches-wiki.png)


## Objectives
Web scrape Falcon 9 launch records with `BeautifulSoup`:
- Extract a Falcon 9 launch records HTML table from Wikipedia
- Parse the table and convert it into a Pandas data frame


First, let's install and import the packages required for this lab.


In [1]:
!pip3 install beautifulsoup4
!pip3 install requests



In [2]:
import sys

import requests
from bs4 import BeautifulSoup
import re
import unicodedata
import pandas as pd

We also provide some helper functions to help you process the scraped HTML table.


In [3]:
def date_time(table_cells):
    """
    Return the date and time from an HTML table cell.
    Input: a table data cell (<td>) element
    """
    return [data_time.strip() for data_time in list(table_cells.strings)][0:2]

def booster_version(table_cells):
    """
    Return the booster version from an HTML table cell.
    Input: a table data cell (<td>) element
    """
    out=''.join([booster_version for i,booster_version in enumerate( table_cells.strings) if i%2==0][0:-1])
    return out

def landing_status(table_cells):
    """
    Return the landing status from an HTML table cell.
    Input: a table data cell (<td>) element
    """
    out=[i for i in table_cells.strings][0]
    return out


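def get_mass(table_cells):
    """
    Return the payload mass text (up to and including "kg") from an HTML table cell.
    Input: a table data cell (<td>) element
    Returns 0 when the cell is empty.
    """
    mass=unicodedata.normalize("NFKD", table_cells.text).strip()
    if mass:
        # Note: if "kg" is absent, find() returns -1 and only the first character is kept
        new_mass=mass[0:mass.find("kg")+2]
    else:
        new_mass=0
    return new_mass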


def extract_column_from_header(row):
    """
    Return the column name from an HTML table header cell.
    Input: a table header cell (<th>) element
    """
    if (row.br):
        row.br.extract()
    if row.a:
        row.a.extract()
    if row.sup:
        row.sup.extract()
        
    column_name = ' '.join(row.contents)
    
    # Filter out purely numeric names (empty names are dropped by the caller)
    if not(column_name.strip().isdigit()):
        column_name = column_name.strip()
        return column_name    


To keep the lab tasks consistent, you will scrape the data from a snapshot of the `List of Falcon 9 and Falcon Heavy launches` Wikipedia page, last updated on `9th June 2021`.


In [10]:
static_url = "https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922"


Next, request the HTML page from the above URL and get a `response` object


### TASK 1: Request the Falcon9 Launch Wiki page from its URL


First, let's send an HTTP GET request for the Falcon9 Launch HTML page and receive an HTTP response.


In [55]:
#added to practice for test
practiceResponse = requests.get("https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches")
#practiceResponse.text


In [11]:
# use requests.get() method with the provided static_url
# assign the response to an object

task1Response = requests.get(static_url)
#task1Response.text

Create a `BeautifulSoup` object from the HTML `response`


In [13]:
# Use BeautifulSoup() to create a BeautifulSoup object from a response text content
#soup = BeautifulSoup(practiceResponse.content, 'html.parser')
soup = BeautifulSoup(task1Response.content, 'html.parser')

Print the page title to verify if the `BeautifulSoup` object was created properly 


In [14]:
# Use soup.title attribute
soup.title.string

'List of Falcon 9 and Falcon Heavy launches - Wikipedia'

### TASK 2: Extract all column/variable names from the HTML table header


Next, we want to collect all relevant column names from the HTML table header


Let's try to find all tables on the wiki page first. If you need to refresh your memory about `BeautifulSoup`, please check the external reference link towards the end of this lab


In [15]:
# Use the find_all function in the BeautifulSoup object, with element type `table`
# Assign the result to a list called `html_tables`
html_tables = soup.find_all('table')

The launch records start from the third table, which is our first target table.


In [65]:
# Let's print the third table and check its content
first_launch_table = html_tables[2]
print(first_launch_table)

<table class="wikitable sticky-header" style="width:100%;">
<tbody><tr>
<th scope="col">Date and time (<a href="/wiki/Coordinated_Universal_Time" title="Coordinated Universal Time">UTC</a>)<sup class="reference" id="cite_ref-nextSFupcoming_393-1"><a href="#cite_note-nextSFupcoming-393"><span class="cite-bracket">[</span>385<span class="cite-bracket">]</span></a></sup>
</th>
<th scope="col"><a href="/wiki/List_of_Falcon_9_first-stage_boosters" title="List of Falcon 9 first-stage boosters">Version,<br/>booster</a><sup class="reference" id="cite_ref-booster_22-2"><a href="#cite_note-booster-22"><span class="cite-bracket">[</span>f<span class="cite-bracket">]</span></a></sup>
</th>
<th scope="col">Launch site
</th>
<th scope="col">Payload<sup class="reference" id="cite_ref-Dragon_23-2"><a href="#cite_note-Dragon-23"><span class="cite-bracket">[</span>g<span class="cite-bracket">]</span></a></sup>
</th>
<th scope="col">Orbit
</th>
<th scope="col">Customer
</th></tr>
<tr>
<td rowspan="2">4 O

You should be able to see the column names embedded in the table header elements `<th>` as follows:


```
<tr>
<th scope="col">Flight No.
</th>
<th scope="col">Date and<br/>time (<a href="/wiki/Coordinated_Universal_Time" title="Coordinated Universal Time">UTC</a>)
</th>
<th scope="col"><a href="/wiki/List_of_Falcon_9_first-stage_boosters" title="List of Falcon 9 first-stage boosters">Version,<br/>Booster</a> <sup class="reference" id="cite_ref-booster_11-0"><a href="#cite_note-booster-11">[b]</a></sup>
</th>
<th scope="col">Launch site
</th>
<th scope="col">Payload<sup class="reference" id="cite_ref-Dragon_12-0"><a href="#cite_note-Dragon-12">[c]</a></sup>
</th>
<th scope="col">Payload mass
</th>
<th scope="col">Orbit
</th>
<th scope="col">Customer
</th>
<th scope="col">Launch<br/>outcome
</th>
<th scope="col"><a href="/wiki/Falcon_9_first-stage_landing_tests" title="Falcon 9 first-stage landing tests">Booster<br/>landing</a>
</th></tr>
```


Next, we just need to iterate through the `<th>` elements and apply the provided `extract_column_from_header()` to extract the column names one by one.


In [66]:
column_names = []

# Apply find_all() function with `th` element on first_launch_table
# Iterate each th element and apply the provided extract_column_from_header() to get a column name
# Append the Non-empty column name (`if name is not None and len(name) > 0`) into a list called column_names

th_elements = first_launch_table.find_all('th')
for th in th_elements:
    columnName = extract_column_from_header(th)
    if columnName is not None and len(columnName) > 0:
        column_names.append(columnName)

NameError: name 'extract_column_from_header' is not defined

Check the extracted column names


In [11]:
print(column_names)

['Flight No.', 'Date and time ( )', 'Launch site', 'Payload', 'Payload mass', 'Orbit', 'Customer', 'Launch outcome']


### TASK 3: Create a data frame by parsing the launch HTML tables


We will create an empty dictionary whose keys are the column names extracted in the previous task. Later, this dictionary will be converted into a Pandas data frame.


In [12]:
launch_dict= dict.fromkeys(column_names)

# Remove an irrelevant column
del launch_dict['Date and time ( )']

# Initialize each value in launch_dict as an empty list
launch_dict['Flight No.'] = []
launch_dict['Launch Site'] = []
launch_dict['Payload'] = []
launch_dict['Payload mass'] = []
launch_dict['Orbit'] = []
launch_dict['Customer'] = []
launch_dict['Launch outcome'] = []
# Added some new columns
launch_dict['Version Booster'] = []
launch_dict['Booster landing'] = []
launch_dict['Date'] = []
launch_dict['Time'] = []
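
One detail worth checking at this step is that dictionary keys are case-sensitive, so `'Launch Site'` and `'Launch site'` are two separate keys. The sketch below uses a shortened, hypothetical list of column names (not the full lab output) to show how `dict.fromkeys` followed by a differently cased assignment can leave one key set to `None`, which would later appear as an empty column in the data frame.

```python
# Hypothetical, shortened list of extracted column names (not the full lab output).
sample_columns = ['Flight No.', 'Launch site', 'Payload']

sample_dict = dict.fromkeys(sample_columns)   # every value starts as None
sample_dict['Flight No.'] = []
sample_dict['Launch Site'] = []               # capital "S": this adds a new key
sample_dict['Payload'] = []

# Keys that were never given a list still hold None.
unset_keys = [key for key, value in sample_dict.items() if value is None]
print(unset_keys)
```

Listing the unset keys like this before filling the dictionary is one simple way to catch a mismatch early.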

Next, we just need to fill `launch_dict` with the launch records extracted from the table rows.


HTML tables on Wiki pages often contain unexpected annotations and other noise, such as reference links (`B0004.1[8]`), missing values (`N/A [e]`), and inconsistent formatting.
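
As a rough illustration of handling such annotations, the sketch below strips bracketed footnote markers with a regular expression. The helper name `strip_reference_markers` is hypothetical and is not part of the lab's provided functions; it assumes that anything inside square brackets is a reference marker.

```python
import re

def strip_reference_markers(text):
    """Remove bracketed footnote markers such as [8] or [e] and trim whitespace."""
    return re.sub(r'\[[^\]]*\]', '', text).strip()

booster = strip_reference_markers('B0004.1[8]')
missing = strip_reference_markers('N/A [e]')
print(booster, missing)
```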


To simplify the parsing process, we have provided an incomplete code snippet below to help you fill `launch_dict`. Please complete the TODOs in the following code snippet, or write your own logic to parse all launch tables:


In [13]:
extracted_row = 0
# Extract each table
for table_number,table in enumerate(soup.find_all('table',"wikitable plainrowheaders collapsible")):
    # Get each table row
    for rows in table.find_all("tr"):
        # Check whether the first table heading is a number corresponding to a launch number
        flag=False
        if rows.th and rows.th.string:
            flight_number=rows.th.string.strip()
            flag=flight_number.isdigit()
        # Get the table data cells
        row=rows.find_all('td')
        # If the heading is a number, save the cells in the dictionary
        if flag:
            extracted_row += 1
            # Flight Number value
            # TODO: Append the flight_number into launch_dict with key `Flight No.`
            launch_dict['Flight No.'].append(flight_number)
            #print(flight_number)
            datatimelist=date_time(row[0])
            
            # Date value
            # TODO: Append the date into launch_dict with key `Date`
            date = datatimelist[0].strip(',')
            launch_dict['Date'].append(date)
            #print(date)
            
            # Time value
            # TODO: Append the time into launch_dict with key `Time`
            time = datatimelist[1]
            launch_dict['Time'].append(time)
            #print(time)
              
            # Booster version
            # TODO: Append the bv into launch_dict with key `Version Booster`
            bv=booster_version(row[1])
            if not(bv):
                bv=row[1].a.string
            launch_dict['Version Booster'].append(bv)
            #print(bv)
            
            # Launch Site
            # TODO: Append the launch_site into launch_dict with key `Launch Site`
            launch_site = row[2].a.string
            launch_dict['Launch Site'].append(launch_site)
            #print(launch_site)
            
            # Payload
            # TODO: Append the payload into launch_dict with key `Payload`
            payload = row[3].a.string
            launch_dict['Payload'].append(payload)
            #print(payload)
            
            # Payload Mass
            # TODO: Append the payload_mass into launch_dict with key `Payload mass`
            payload_mass = get_mass(row[4])
            launch_dict['Payload mass'].append(payload_mass)
            #print(payload_mass)
            
            # Orbit
            # TODO: Append the orbit into launch_dict with key `Orbit`
            orbit = row[5].a.string
            launch_dict['Orbit'].append(orbit)
            #print(orbit)
            
            # Customer
            # TODO: Append the customer into launch_dict with key `Customer`
            try:
                customer = row[6].a.string if row[6].a else row[6].string
            except AttributeError:
                customer = None
            launch_dict['Customer'].append(customer)
            #print(customer)
            
            # Launch outcome
            # TODO: Append the launch_outcome into launch_dict with key `Launch outcome`
            launch_outcome = list(row[7].strings)[0]
            launch_dict['Launch outcome'].append(launch_outcome)
            #print(launch_outcome)
            
            # Booster landing
            # TODO: Append the booster_landing into launch_dict with key `Booster landing`
            booster_landing = landing_status(row[8])
            launch_dict['Booster landing'].append(booster_landing)
            #print(booster_landing)
            
            
                        

In [14]:
print(launch_dict['Version Booster'])

['F9 v1.07B0003.18', 'F9 v1.07B0004.18', 'F9 v1.07B0005.18', 'F9 v1.07B0006.18', 'F9 v1.07B0007.18', 'F9 v1.17B10038', 'F9 v1.1', 'F9 v1.1', 'F9 v1.1', 'F9 v1.1', 'F9 v1.1', 'F9 v1.1[', 'F9 v1.1[', 'F9 v1.1[', 'F9 v1.1[', 'F9 v1.1[', 'F9 v1.1[', 'F9 v1.1[', 'F9 v1.1[', 'F9 FT[', 'F9 v1.1[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FT♺[', 'F9 FT[', 'F9 FT[', 'F9 FT[', 'F9 FTB1029.2195', 'F9 FT[', 'F9 FT[', 'F9 B4[', 'F9 FT[', 'F9 B4[', 'F9 B4[', 'F9 FTB1031.2220', 'F9 B4[', 'F9 FTB1035.2227', 'F9 FTB1036.2227', 'F9 B4[', 'F9 FTB1032.2245', 'F9 FTB1038.2268', 'F9 B4[', 'F9 B4B1041.2268', 'F9 B4B1039.2292', 'F9 B4[', 'F9 B5311B1046.1268', 'F9 B4B1043.2322', 'F9 B4B1040.2268', 'F9 B4B1045.2336', 'F9 B5', 'F9 B5349B1048[', 'F9 B5B1046.2354', 'F9 B5[', 'F9 B5B1048.2364', 'F9 B5B1047.2268', 'F9 B5B1046.3268', 'F9 B5[', 'F9 B5[', 'F9 B5B1049.2397', 'F9 B5B1048.3399', 'F9 B5[]413', 'F9 B5[', 'F9 B5B1049.3434', 'F9 B5B1051.2420', 'F9

After you have filled in the parsed launch record values in `launch_dict`, you can create a data frame from it.


In [15]:
print(launch_dict['Payload mass'])

[0, 0, '525 kg', '4,700 kg', '4,877 kg', '500 kg', '3,170 kg', '3,325 kg', '2,296 kg', '1,316 kg', '4,535 kg', '4,428 kg', '2,216 kg', '2,395 kg', '570 kg', '4,159 kg', '1,898 kg', '4,707 kg', '1,952 kg', '2,034 kg', '553 kg', '5,271 kg', '3,136 kg', '4,696 kg', '3,100 kg', '3,600 kg', '2,257 kg', '4,600 kg', '9,600 kg', '2,490 kg', '5,600 kg', '5,300 kg', 'C', '6,070 kg', '2,708 kg', '3,669 kg', '9,600 kg', '6,761 kg', '3,310 kg', '475 kg', '4,990 kg', '9,600 kg', '5,200 kg', '3,500 kg', '2,205 kg', '9,600 kg', 'C', '4,230 kg', '2,150 kg', '6,092 kg', '9,600 kg', '2,647 kg', '362 kg', '3,600 kg', '6,460 kg', '5,384 kg', '2,697 kg', '7,075 kg', '9,600 kg', '5,800 kg', '7,060 kg', '3,000 kg', '5,300 kg', '~4,000 kg', '2,500 kg', '4,400 kg', '9,600 kg', '4,850 kg', '12,055 kg', '2,495 kg', '13,620 kg', '4,200 kg', '2,268 kg', '6,500 kg', '15,600 kg', '2,617 kg', '6,956 kg', '15,600 kg', '12,050 kg', '15,600 kg', '15,600 kg', '1,977 kg', '15,600 kg', '15,600 kg', '12,530 kg', '15,600 kg',

In [16]:
import re

# Function to extract numeric value from a string
def extract_numeric(value):
    if isinstance(value, (int, float)):
        return float(value)
    elif isinstance(value, str):
        # Extract all numeric characters (including decimal point)
        numeric_string = re.sub(r'[^\d.]', '', value)
        try:
            return float(numeric_string)
        except ValueError:
            return None
    return None

In [17]:
# Extract numeric values from the 'Payload mass' list
payload_masses = [extract_numeric(mass) for mass in launch_dict['Payload mass']]

# Remove None values (which represent non-numeric or invalid entries)
payload_masses = [mass for mass in payload_masses if mass is not None]

# Calculate the mean
if payload_masses:
    mean_payload_mass = sum(payload_masses) / len(payload_masses)
    # Replace the 'Payload mass' list with the calculated mean value
    launch_dict['Payload mass'] = [mean_payload_mass]
else:
    print("No valid payload mass data available")

print(launch_dict['Payload mass'])

[431005.0423728814]
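
Replacing the whole list with a single mean leaves the `Payload mass` column with one entry, so it no longer lines up with the other columns. A possible alternative, sketched below under stated assumptions, keeps one value per launch: it reads the first number in each entry, treats `0` and unparseable entries (such as `'C'`) as unknown, and fills those with the mean of the known values. The helper `mass_to_float` and the sample list are illustrative only.

```python
import re
from statistics import mean

def mass_to_float(value):
    """Return the first number in a mass entry as a float, or None if there is none."""
    if isinstance(value, (int, float)):
        return float(value)
    match = re.search(r'\d[\d,]*(?:\.\d+)?', str(value))
    if match is None:
        return None
    return float(match.group().replace(',', ''))

# A few entries in the same style as the printed list above.
sample_masses = [0, '525 kg', '4,700 kg', 'C', '~4,000 kg']
parsed = [mass_to_float(m) for m in sample_masses]

# Mean over entries that parsed to a positive mass
# (treating 0 as "unknown" is an assumption of this sketch).
known = [m for m in parsed if m]
fill_value = mean(known)

# Same length as the input, so it still lines up with the other columns.
cleaned = [m if m else fill_value for m in parsed]
print(cleaned)
```

Taking only the first number also avoids gluing the two ends of a range into one very large value, which stripping every non-digit character would do.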


In [18]:
df= pd.DataFrame({ key:pd.Series(value) for key, value in launch_dict.items() })



We can now export it to a <b>CSV</b> for the next section. However, to keep the answers consistent, and in case you have difficulty finishing this lab, the following labs will use a provided dataset so that each lab is independent.


<code>df.to_csv('spacex_web_scraped.csv', index=False)</code>


In [121]:
df.to_csv('spacex_web_scraped.csv', index=False)

In [31]:
df

Unnamed: 0,Flight No.,Launch site,Payload,Payload mass,Orbit,Customer,Launch outcome,Launch Site,Version Booster,Booster landing,Date,Time
0,1,,Dragon Spacecraft Qualification Unit,431005.042373,LEO,SpaceX,Success\n,CCAFS,F9 v1.07B0003.18,Failure,4 June 2010,18:45
1,2,,Dragon,,LEO,NASA,Success,CCAFS,F9 v1.07B0004.18,Failure,8 December 2010,15:43
2,3,,Dragon,,LEO,NASA,Success,CCAFS,F9 v1.07B0005.18,No attempt\n,22 May 2012,07:44
3,4,,SpaceX CRS-1,,LEO,NASA,Success\n,CCAFS,F9 v1.07B0006.18,No attempt,8 October 2012,00:35
4,5,,SpaceX CRS-2,,LEO,NASA,Success\n,CCAFS,F9 v1.07B0007.18,No attempt\n,1 March 2013,15:10
...,...,...,...,...,...,...,...,...,...,...,...,...
116,117,,Starlink,,LEO,SpaceX,Success\n,CCSFS,F9 B5B1051.10657,Success,9 May 2021,06:42
117,118,,Starlink,,LEO,SpaceX,Success\n,KSC,F9 B5B1058.8660,Success,15 May 2021,22:56
118,119,,Starlink,,LEO,SpaceX,Success\n,CCSFS,F9 B5B1063.2665,Success,26 May 2021,18:59
119,120,,SpaceX CRS-22,,LEO,NASA,Success\n,KSC,F9 B5B1067.1668,Success,3 June 2021,17:29
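
The `Date` and `Time` columns are still plain strings, and some outcome values carry a trailing newline (for example `Success\n`). A minimal sketch of cleaning both is shown below; the helper `parse_launch_datetime` is hypothetical, and the `%B` format code assumes English month names in the current locale.

```python
from datetime import datetime

def parse_launch_datetime(date_text, time_text):
    """Combine strings like '4 June 2010' and '18:45' into one datetime value."""
    return datetime.strptime(f'{date_text} {time_text}', '%d %B %Y %H:%M')

launched = parse_launch_datetime('4 June 2010', '18:45')
outcome = 'Success\n'.strip()   # drop the trailing newline
print(launched.isoformat(), outcome)
```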


In [16]:
# Call the SpaceX API
practiceResponse = requests.get("https://api.spacexdata.com/v3/launches")
json_data = practiceResponse.json()

In [17]:
df = pd.DataFrame(json_data)
df.shape

(111, 31)

In [20]:
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import
df_from_json = pd.json_normalize(json_data)
#df.head
#df.to_csv('spacex_data2.csv', index=False)
#df
df_from_json['static_fire_date_utc']


0      2006-03-17T00:00:00.000Z
1                          None
2                          None
3      2008-09-20T00:00:00.000Z
4                          None
                 ...           
106                        None
107    2020-11-17T13:17:00.000Z
108    2020-11-21T16:31:00.000Z
109                        None
110                        None
Name: static_fire_date_utc, Length: 111, dtype: object

In [21]:
# Exclude Falcon 1; the remaining rows cover both Falcon 9 and Falcon Heavy
filtered_df2 = df_from_json[df_from_json['rocket.rocket_id'] != 'falcon1']
#filtered_df2.shape
falcon9launches = len(filtered_df2)
falcon9launches

106

In [7]:
df_from_json.shape

(111, 99)

In [18]:
#filtered_df = df[df['rocket.rocket_name'] != 'Falcon 1']
#filtered_df.shape

filtered_df2 = df[df['rocket.rocket_id'] != 'falcon1']
#filtered_df2.shape
falcon9launches = len(filtered_df2)
falcon9launches

KeyError: 'rocket.rocket_id'
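
This `KeyError` is consistent with `df` having been built by `pd.DataFrame(json_data)`, which keeps `rocket` as a single column of dictionaries, whereas `json_normalize` expands nested dictionaries into dotted column names such as `rocket.rocket_id`. The pure-Python sketch below illustrates that flattening idea; `flatten_record` is a hypothetical helper for explanation and is not how pandas implements it.

```python
def flatten_record(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value   # lists and scalars are kept as-is
    return flat

# Hypothetical, trimmed-down launch record in the v3 shape.
sample_launch = {
    'flight_number': 1,
    'rocket': {'rocket_id': 'falcon1', 'rocket_name': 'Falcon 1'},
}

flat_launch = flatten_record(sample_launch)
print(sorted(flat_launch))
```

In the unflattened record the only top-level rocket key is `rocket`, which is why a lookup of `rocket.rocket_id` fails there.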

In [33]:
df_falcon1 = df[df['rocket.rocket_name'] == 'Falcon 1']
df_falcon1.shape

(5, 99)

In [34]:
df_falcon9 = df[df['rocket.rocket_name'] == 'Falcon 9']
df_falcon9.shape

(103, 99)

In [40]:
#pd.set_option('display.max_columns', None)
df_from_json.head()

#reset
pd.reset_option('display.max_columns')
pd.reset_option('display.max_rows')

In [41]:
df_from_json.columns.tolist()

['flight_number',
 'mission_name',
 'mission_id',
 'upcoming',
 'launch_year',
 'launch_date_unix',
 'launch_date_utc',
 'launch_date_local',
 'is_tentative',
 'tentative_max_precision',
 'tbd',
 'launch_window',
 'ships',
 'launch_success',
 'details',
 'static_fire_date_utc',
 'static_fire_date_unix',
 'crew',
 'rocket.rocket_id',
 'rocket.rocket_name',
 'rocket.rocket_type',
 'rocket.first_stage.cores',
 'rocket.second_stage.block',
 'rocket.second_stage.payloads',
 'rocket.fairings.reused',
 'rocket.fairings.recovery_attempt',
 'rocket.fairings.recovered',
 'rocket.fairings.ship',
 'telemetry.flight_club',
 'launch_site.site_id',
 'launch_site.site_name',
 'launch_site.site_name_long',
 'launch_failure_details.time',
 'launch_failure_details.altitude',
 'launch_failure_details.reason',
 'links.mission_patch',
 'links.mission_patch_small',
 'links.reddit_campaign',
 'links.reddit_launch',
 'links.reddit_recovery',
 'links.reddit_media',
 'links.presskit',
 'links.article_link',
 'li

In [46]:
def extract_landing_info(row):
    try:
        rocket_info = row['rocket']
        core_info = rocket_info['first_stage']['cores'][0]
        return {
            'flight_number': row['flight_number'],
            'mission_name': row['mission_name'],
            'launch_year': row['launch_year'],
            'rocket_name': rocket_info['rocket_name'],
            'rocket_type': rocket_info['rocket_type'],
            'core_serial': core_info['core_serial'],
            'landing_type': core_info['landing_type'],
            'land_success': core_info['land_success'],
            'landing_vehicle': core_info['landing_vehicle']
        }
    except (KeyError, IndexError, TypeError):
        return None

In [49]:
landing_data = [extract_landing_info(row) for _, row in df_from_json.iterrows()]
landing_data

[None,
 None,
 None,
 ...
 None]

(output abridged: 111 entries, all `None`)
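
Every entry is `None`, most likely because the rows of `df_from_json` are already flattened: `row['rocket']` raises a `KeyError`, and the `except` clause turns it into `None`. One way around this, sketched below, is to read the nested dictionaries directly. The helper `landing_info_from_launch` and the sample record are hypothetical; the field names are taken from the function above.

```python
def landing_info_from_launch(launch):
    """Pull first-core landing fields from one nested launch dictionary."""
    cores = launch.get('rocket', {}).get('first_stage', {}).get('cores', [])
    if not cores:
        return None
    core = cores[0]
    return {
        'flight_number': launch.get('flight_number'),
        'core_serial': core.get('core_serial'),
        'landing_type': core.get('landing_type'),
        'land_success': core.get('land_success'),
    }

# Hypothetical record; the values are illustrative, not taken from the API.
sample_launch = {
    'flight_number': 20,
    'rocket': {'first_stage': {'cores': [
        {'core_serial': 'B0000', 'landing_type': 'RTLS', 'land_success': True}
    ]}},
}

info = landing_info_from_launch(sample_launch)
empty = landing_info_from_launch({'flight_number': 1})
print(info, empty)
```

With the real data, the same function could be applied to each dictionary in the list returned by `response.json()` rather than to data frame rows.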

In [34]:
# Make the API call
response = requests.get('https://api.spacexdata.com/v5/launches')
#https://api.spacexdata.com/v3/launches

data = response.json()

# Use json_normalize to flatten the JSON
df = pd.json_normalize(data)
df.columns.tolist()
#df.head()
# Filter out Falcon 1 launches
#falcon_9_df = df[df['rocket.rocket_id'] != 'falcon1']

# Count the Falcon 9 launches
#falcon_9_launches = len(falcon_9_df)

#print(f"Number of Falcon 9 launches: {falcon_9_launches}")

['static_fire_date_utc',
 'static_fire_date_unix',
 'net',
 'window',
 'rocket',
 'success',
 'failures',
 'details',
 'crew',
 'ships',
 'capsules',
 'payloads',
 'launchpad',
 'flight_number',
 'name',
 'date_utc',
 'date_unix',
 'date_local',
 'date_precision',
 'upcoming',
 'cores',
 'auto_update',
 'tbd',
 'launch_library_id',
 'id',
 'fairings.reused',
 'fairings.recovery_attempt',
 'fairings.recovered',
 'fairings.ships',
 'links.patch.small',
 'links.patch.large',
 'links.reddit.campaign',
 'links.reddit.launch',
 'links.reddit.media',
 'links.reddit.recovery',
 'links.flickr.small',
 'links.flickr.original',
 'links.presskit',
 'links.webcast',
 'links.youtube_id',
 'links.article',
 'links.wikipedia',
 'fairings']

In [10]:
# Make the API call
#response = requests.get('https://api.spacexdata.com/v3/launches')
#data = response.json()

# Use json_normalize to flatten the JSON
#df = pd.json_normalize(data)

# The landing pad information is nested within the cores array
# We need to extract it and create a new column
df['landingPad'] = df['rocket.first_stage.cores'].apply(
    lambda x: x[0].get('landing_vehicle') if x and isinstance(x[0], dict) else None
)

# Count the missing values in the landingPad column
missing_landing_pad = df['landingPad'].isnull().sum()

print(f"Number of missing values in the landingPad column: {missing_landing_pad}")

Number of missing values in the landingPad column: 38


In [23]:
# Fetch data from the SpaceX API
url = "https://api.spacexdata.com/v3/launches"
response = requests.get(url)
data = response.json()

# Convert the JSON data to a pandas DataFrame
df = pd.json_normalize(data)

# Filter for Falcon 9 launches
falcon_9_launches = df[df['rocket.rocket_id'] == 'falcon9']

# Count the number of Falcon 9 launches
falcon_9_count = len(falcon_9_launches)

print(f"Number of Falcon 9 launches: {falcon_9_count}")
print(df['rocket.rocket_id'].value_counts())

Number of Falcon 9 launches: 103
falcon9        103
falcon1          5
falconheavy      3
Name: rocket.rocket_id, dtype: int64


In [14]:
#df.head()
pd.set_option('display.max_columns', None)
#df_from_json.head()

#reset
#pd.reset_option('display.max_columns')
#pd.reset_option('display.max_rows')
df.head()

Unnamed: 0,flight_number,mission_name,mission_id,upcoming,launch_year,launch_date_unix,launch_date_utc,launch_date_local,is_tentative,tentative_max_precision,tbd,launch_window,ships,launch_success,details,static_fire_date_utc,static_fire_date_unix,crew,rocket.rocket_id,rocket.rocket_name,rocket.rocket_type,rocket.first_stage.cores,rocket.second_stage.block,rocket.second_stage.payloads,rocket.fairings.reused,rocket.fairings.recovery_attempt,rocket.fairings.recovered,rocket.fairings.ship,telemetry.flight_club,launch_site.site_id,launch_site.site_name,launch_site.site_name_long,launch_failure_details.time,launch_failure_details.altitude,launch_failure_details.reason,links.mission_patch,links.mission_patch_small,links.reddit_campaign,links.reddit_launch,links.reddit_recovery,links.reddit_media,links.presskit,links.article_link,links.wikipedia,links.video_link,links.youtube_id,links.flickr_images,timeline.webcast_liftoff,rocket.fairings,timeline.go_for_prop_loading,timeline.rp1_loading,timeline.stage1_lox_loading,timeline.stage2_lox_loading,timeline.engine_chill,timeline.prelaunch_checks,timeline.propellant_pressurization,timeline.go_for_launch,timeline.ignition,timeline.liftoff,timeline.maxq,timeline.meco,timeline.stage_sep,timeline.second_stage_ignition,timeline.seco-1,timeline.dragon_separation,timeline.dragon_solar_deploy,timeline.dragon_bay_door_deploy,timeline.fairing_deploy,timeline.payload_deploy,timeline.second_stage_restart,timeline.seco-2,timeline.webcast_launch,timeline.payload_deploy_1,timeline.payload_deploy_2,timeline.first_stage_boostback_burn,timeline.first_stage_entry_burn,timeline.first_stage_landing,timeline,timeline.beco,timeline.side_core_sep,timeline.side_core_boostback,timeline.center_stage_sep,timeline.center_core_boostback,timeline.side_core_entry_burn,timeline.center_core_entry_burn,timeline.side_core_landing,timeline.center_core_landing,timeline.first_stage_landing_burn,timeline.stage1_rp1_loading,timeline.stage2_rp1_loading,timeline.seco-3,timeline.seco-4,last_date_update,last_ll_launch_date,last_ll_update,last_wiki_launch_date,last_wiki_revision,last_wiki_update,launch_date_source
0,1,FalconSat,[],False,2006,1143239400,2006-03-24T22:30:00.000Z,2006-03-25T10:30:00+12:00,False,hour,False,0.0,[],False,Engine failure at 33 seconds and loss of vehicle,2006-03-17T00:00:00.000Z,1142554000.0,,falcon1,Falcon 1,Merlin A,"[{'core_serial': 'Merlin1A', 'flight': 1, 'blo...",1.0,"[{'payload_id': 'FalconSAT-2', 'norad_id': [],...",False,False,False,,,kwajalein_atoll,Kwajalein Atoll,Kwajalein Atoll Omelek Island,33.0,,merlin engine failure,https://images2.imgbox.com/40/e3/GypSkayF_o.png,https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png,,,,,,https://www.space.com/2196-spacex-inaugural-fa...,https://en.wikipedia.org/wiki/DemoSat,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,[],54.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,DemoSat,[],False,2007,1174439400,2007-03-21T01:10:00.000Z,2007-03-21T13:10:00+12:00,False,hour,False,0.0,[],False,Successful first stage burn and transition to ...,,,,falcon1,Falcon 1,Merlin A,"[{'core_serial': 'Merlin2A', 'flight': 1, 'blo...",1.0,"[{'payload_id': 'DemoSAT', 'norad_id': [], 're...",False,False,False,,,kwajalein_atoll,Kwajalein Atoll,Kwajalein Atoll Omelek Island,301.0,289.0,harmonic oscillation leading to premature engi...,https://images2.imgbox.com/be/e7/iNqsqVYM_o.png,https://images2.imgbox.com/4f/e3/I0lkuJ2e_o.png,,,,,,https://www.space.com/3590-spacex-falcon-1-roc...,https://en.wikipedia.org/wiki/DemoSat,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,[],60.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,3,Trailblazer,[],False,2008,1217734440,2008-08-03T03:34:00.000Z,2008-08-03T15:34:00+12:00,False,hour,False,0.0,[],False,Residual stage 1 thrust led to collision betwe...,,,,falcon1,Falcon 1,Merlin C,"[{'core_serial': 'Merlin1C', 'flight': 1, 'blo...",1.0,"[{'payload_id': 'Trailblazer', 'norad_id': [],...",False,False,False,,,kwajalein_atoll,Kwajalein Atoll,Kwajalein Atoll Omelek Island,140.0,35.0,residual stage-1 thrust led to collision betwe...,https://images2.imgbox.com/4b/bd/d8UxLh4q_o.png,https://images2.imgbox.com/3d/86/cnu0pan8_o.png,,,,,,http://www.spacex.com/news/2013/02/11/falcon-1...,https://en.wikipedia.org/wiki/Trailblazer_(sat...,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,[],14.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,4,RatSat,[],False,2008,1222643700,2008-09-28T23:15:00.000Z,2008-09-28T11:15:00+12:00,False,hour,False,0.0,[],True,Ratsat was carried to orbit on the first succe...,2008-09-20T00:00:00.000Z,1221869000.0,,falcon1,Falcon 1,Merlin C,"[{'core_serial': 'Merlin2C', 'flight': 1, 'blo...",1.0,"[{'payload_id': 'RatSat', 'norad_id': [33393],...",False,False,False,,,kwajalein_atoll,Kwajalein Atoll,Kwajalein Atoll Omelek Island,,,,https://images2.imgbox.com/e0/a7/FNjvKlXW_o.png,https://images2.imgbox.com/e9/c9/T8CfiSYb_o.png,,,,,,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,[],5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,5,RazakSat,[],False,2009,1247456100,2009-07-13T03:35:00.000Z,2009-07-13T15:35:00+12:00,False,hour,False,0.0,[],True,,,,,falcon1,Falcon 1,Merlin C,"[{'core_serial': 'Merlin3C', 'flight': 1, 'blo...",1.0,"[{'payload_id': 'RazakSAT', 'norad_id': [35578...",False,False,False,,,kwajalein_atoll,Kwajalein Atoll,Kwajalein Atoll Omelek Island,,,,https://images2.imgbox.com/8d/fc/0qdZMWWx_o.png,https://images2.imgbox.com/a7/ba/NBZSw3Ho_o.png,,,,,http://www.spacex.com/press/2012/12/19/spacexs...,http://www.spacex.com/news/2013/02/12/falcon-1...,https://en.wikipedia.org/wiki/RazakSAT,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,[],5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [16]:
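# Exclude Falcon 1 launches (the remaining rows include Falcon Heavy as well as Falcon 9)
falcon_9_launches = df[df['rocket.rocket_name'] != 'Falcon 1']

# Count the remaining launches (Falcon 9 plus Falcon Heavy)
falcon_9_count = len(falcon_9_launches)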

print(f"Number of Falcon 9 launches: {falcon_9_count}")
print(df['rocket.rocket_name'].value_counts())

Number of Falcon 9 launches: 106
Falcon 9        103
Falcon 1          5
Falcon Heavy      3
Name: rocket.rocket_name, dtype: int64


In [22]:
url = "https://api.spacexdata.com/v3/launches"
response = requests.get(url)
data = response.json()
df2 = pd.read_json(data)
df2.head()

ValueError: Invalid file path or buffer object type: <class 'list'>

In [21]:
falcon_9_launches = df2[df2['rocket.rocket_name'] != 'Falcon 1']

# Count the number of Falcon 9 launches
falcon_9_count = len(falcon_9_launches)

print(f"Number of Falcon 9 launches: {falcon_9_count}")
print(df2['rocket.rocket_name'].value_counts())

KeyError: 'rocket.rocket_name'

In [3]:
# Fetch data from the SpaceX API v5
url = 'https://api.spacexdata.com/v5/launches'
response = requests.get(url)
data = response.json()

# Count Falcon 9 launches
falcon_9_count = sum(1 for launch in data if launch['rocket'] == '5e9d0d95eda69973a809d1ec')

print(f"Number of Falcon 9 launches: {falcon_9_count}")

Number of Falcon 9 launches: 195


In [40]:
df3 = pd.json_normalize(data)

#df3.head()
#df3['cores']
df3.to_csv('spacex_web_scrapedv5.csv', index=False)

In [5]:
url = 'https://api.spacexdata.com/v5/launches'
response = requests.get(url)
data = response.json()

missing_landpad_count = 0
total_cores = 0
null_landpad_count = 0

for launch in data:
    for core in launch['cores']:
        total_cores += 1
        if 'landpad' not in core or core['landpad'] is None:
            missing_landpad_count += 1
        #if 'landpad' not in core or core['landpad'].isnull():
        #    null_landpad_count += 1

print(f"Total cores: {total_cores}")
print(f"Missing landpad values: {missing_landpad_count}")
print(f"Null landpad values: {null_landpad_count}")

Total cores: 215
Missing landpad values: 58
Null landpad values: 0
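
The commented-out `.isnull()` check in the cell above could not work on plain dict values, since `None` and strings have no `isnull` method; that is why `null_landpad_count` is never incremented and stays at 0. The sketch below separates "key absent" from "value is null" using ordinary Python checks. The `sample_launches` records and the `pad-1` identifier are made up for illustration.

```python
# Hypothetical records mimicking the "cores" list of each launch.
sample_launches = [
    {"name": "Launch A", "cores": [{"core": "c1", "landpad": None}]},
    {"name": "Launch B", "cores": [{"core": "c2", "landpad": "pad-1"},
                                   {"core": "c3"}]},
]

total_cores = 0
absent_key_count = 0   # the "landpad" key is not present at all
null_value_count = 0   # the key is present but holds None (JSON null)

for launch in sample_launches:
    for core in launch["cores"]:
        total_cores += 1
        if "landpad" not in core:
            absent_key_count += 1
        elif core["landpad"] is None:
            null_value_count += 1

print(total_cores, absent_key_count, null_value_count)  # 3 1 1
```

Keeping the two counters separate shows whether the API omits the field or returns it as null, which a single combined counter hides.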


In [10]:

def get_falcon9_launches():
    url = 'https://api.spacexdata.com/v5/launches'
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for bad status codes
        launches = response.json()
        #print("Raw API response:", launches)  # Print raw response for inspection
        falcon9_count = sum(1 for launch in launches if 'falcon9' in launch['name'].lower())
        return falcon9_count
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None



In [11]:
falcon9_launches = get_falcon9_launches()
print(f"Total Falcon 9 launches: {falcon9_launches}")

Total Falcon 9 launches: 0
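
A count of 0 is expected here: the `name` field holds the mission name (for example `FalconSat` or `RatSat`), not the rocket model, so the substring `falcon9` never matches. The following small illustration uses hypothetical records; the third mission name is invented, and the rocket IDs are the ones that appear in this notebook.

```python
FALCON_9_ID = "5e9d0d95eda69973a809d1ec"  # rocket ID used for Falcon 9 in this notebook

# Hypothetical launches: "name" is a mission name, "rocket" is a rocket ID.
sample_launches = [
    {"name": "FalconSat", "rocket": "5e9d0d95eda69955f709d1eb"},
    {"name": "RatSat", "rocket": "5e9d0d95eda69955f709d1eb"},
    {"name": "Example Mission", "rocket": FALCON_9_ID},
]

# Matching on the mission name finds nothing, even for "FalconSat".
by_name = sum(1 for launch in sample_launches if "falcon9" in launch["name"].lower())

# Matching on the rocket ID finds the one Falcon 9 launch.
by_rocket_id = sum(1 for launch in sample_launches if launch["rocket"] == FALCON_9_ID)

print(by_name, by_rocket_id)  # 0 1
```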


In [12]:

def get_falcon9_launches():
    url = 'https://api.spacexdata.com/v5/launches'
    try:
        response = requests.get(url)
        response.raise_for_status()
        launches = response.json()
        falcon9_count = sum(1 for launch in launches if launch['rocket'] == '5e9d0d95eda69973a809d1ec')
        return falcon9_count
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None



In [13]:
falcon9_launches = get_falcon9_launches()
print(f"Total Falcon 9 launches: {falcon9_launches}")

Total Falcon 9 launches: 195


In [14]:
def get_falcon9_launches():
    url = 'https://api.spacexdata.com/v5/launches'
    try:
        response = requests.get(url)
        response.raise_for_status()
        launches = response.json()
        # Normalize JSON data to create a DataFrame
        df = pd.json_normalize(launches)
        # Count Falcon 9 launches by filtering based on the rocket ID
        falcon9_count = df[df['rocket'] == '5e9d0d95eda69973a809d1ec'].shape[0]
        return falcon9_count
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None



In [15]:
falcon9_launches = get_falcon9_launches()
print(f"Total Falcon 9 launches: {falcon9_launches}")

Total Falcon 9 launches: 195


In [24]:
# Setting this option will print all columns of a dataframe
pd.set_option('display.max_columns', None)
# Setting this option will print all of the data in a feature
pd.set_option('display.max_colwidth', None)

In [25]:
url = "https://api.spacexdata.com/v4/launches"
response = requests.get(url)
data = response.json()
df3 = pd.json_normalize(data)

In [26]:
df3.head()

Unnamed: 0,static_fire_date_utc,static_fire_date_unix,net,window,rocket,success,failures,details,crew,ships,capsules,payloads,launchpad,flight_number,name,date_utc,date_unix,date_local,date_precision,upcoming,cores,auto_update,tbd,launch_library_id,id,fairings.reused,fairings.recovery_attempt,fairings.recovered,fairings.ships,links.patch.small,links.patch.large,links.reddit.campaign,links.reddit.launch,links.reddit.media,links.reddit.recovery,links.flickr.small,links.flickr.original,links.presskit,links.webcast,links.youtube_id,links.article,links.wikipedia,fairings
0,2006-03-17T00:00:00.000Z,1142554000.0,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}]",Engine failure at 33 seconds and loss of vehicle,[],[],[],[5eb0e4b5b6c3bb0006eeb1e1],5e9e4502f5090995de566f86,1,FalconSat,2006-03-24T22:30:00.000Z,1143239400,2006-03-25T10:30:00+12:00,hour,False,"[{'core': '5e9e289df35918033d3b2623', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cd9ffd86e000604b32a,False,False,False,[],https://images2.imgbox.com/94/f2/NN6Ph45r_o.png,https://images2.imgbox.com/5b/02/QcxHUb5V_o.png,,,,,[],[],,https://www.youtube.com/watch?v=0a_00nJ_Y88,0a_00nJ_Y88,https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html,https://en.wikipedia.org/wiki/DemoSat,
1,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 301, 'altitude': 289, 'reason': 'harmonic oscillation leading to premature engine shutdown'}]","Successful first stage burn and transition to second stage, maximum altitude 289 km, Premature engine shutdown at T+7 min 30 s, Failed to reach orbit, Failed to recover first stage",[],[],[],[5eb0e4b6b6c3bb0006eeb1e2],5e9e4502f5090995de566f86,2,DemoSat,2007-03-21T01:10:00.000Z,1174439400,2007-03-21T13:10:00+12:00,hour,False,"[{'core': '5e9e289ef35918416a3b2624', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdaffd86e000604b32b,False,False,False,[],https://images2.imgbox.com/f9/4a/ZboXReNb_o.png,https://images2.imgbox.com/80/a2/bkWotCIS_o.png,,,,,[],[],,https://www.youtube.com/watch?v=Lk4zQ2wP-Nc,Lk4zQ2wP-Nc,https://www.space.com/3590-spacex-falcon-1-rocket-fails-reach-orbit.html,https://en.wikipedia.org/wiki/DemoSat,
2,,,False,0.0,5e9d0d95eda69955f709d1eb,False,"[{'time': 140, 'altitude': 35, 'reason': 'residual stage-1 thrust led to collision between stage 1 and stage 2'}]",Residual stage 1 thrust led to collision between stage 1 and stage 2,[],[],[],"[5eb0e4b6b6c3bb0006eeb1e3, 5eb0e4b6b6c3bb0006eeb1e4]",5e9e4502f5090995de566f86,3,Trailblazer,2008-08-03T03:34:00.000Z,1217734440,2008-08-03T15:34:00+12:00,hour,False,"[{'core': '5e9e289ef3591814873b2625', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32c,False,False,False,[],https://images2.imgbox.com/6c/cb/na1tzhHs_o.png,https://images2.imgbox.com/4a/80/k1oAkY0k_o.png,,,,,[],[],,https://www.youtube.com/watch?v=v0w9p3U8860,v0w9p3U8860,http://www.spacex.com/news/2013/02/11/falcon-1-flight-3-mission-summary,https://en.wikipedia.org/wiki/Trailblazer_(satellite),
3,2008-09-20T00:00:00.000Z,1221869000.0,False,0.0,5e9d0d95eda69955f709d1eb,True,[],"Ratsat was carried to orbit on the first successful orbital launch of any privately funded and developed, liquid-propelled carrier rocket, the SpaceX Falcon 1",[],[],[],[5eb0e4b7b6c3bb0006eeb1e5],5e9e4502f5090995de566f86,4,RatSat,2008-09-28T23:15:00.000Z,1222643700,2008-09-28T11:15:00+12:00,hour,False,"[{'core': '5e9e289ef3591855dc3b2626', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdbffd86e000604b32d,False,False,False,[],https://images2.imgbox.com/95/39/sRqN7rsv_o.png,https://images2.imgbox.com/a3/99/qswRYzE8_o.png,,,,,[],[],,https://www.youtube.com/watch?v=dLQ2tZEH6G0,dLQ2tZEH6G0,https://en.wikipedia.org/wiki/Ratsat,https://en.wikipedia.org/wiki/Ratsat,
4,,,False,0.0,5e9d0d95eda69955f709d1eb,True,[],,[],[],[],[5eb0e4b7b6c3bb0006eeb1e6],5e9e4502f5090995de566f86,5,RazakSat,2009-07-13T03:35:00.000Z,1247456100,2009-07-13T15:35:00+12:00,hour,False,"[{'core': '5e9e289ef359184f103b2627', 'flight': 1, 'gridfins': False, 'legs': False, 'reused': False, 'landing_attempt': False, 'landing_success': None, 'landing_type': None, 'landpad': None}]",True,False,,5eb87cdcffd86e000604b32e,False,False,False,[],https://images2.imgbox.com/ab/5a/Pequxd5d_o.png,https://images2.imgbox.com/92/e4/7Cf6MLY0_o.png,,,,,[],[],http://www.spacex.com/press/2012/12/19/spacexs-falcon-1-successfully-delivers-razaksat-satellite-orbit,https://www.youtube.com/watch?v=yTaIDooc8Og,yTaIDooc8Og,http://www.spacex.com/news/2013/02/12/falcon-1-flight-5,https://en.wikipedia.org/wiki/RazakSAT,


In [32]:
df3['rocket']

0      5e9d0d95eda69955f709d1eb
1      5e9d0d95eda69955f709d1eb
2      5e9d0d95eda69955f709d1eb
3      5e9d0d95eda69955f709d1eb
4      5e9d0d95eda69955f709d1eb
                 ...           
200    5e9d0d95eda69973a809d1ec
201    5e9d0d95eda69973a809d1ec
202    5e9d0d95eda69973a809d1ec
203    5e9d0d95eda69974db09d1ed
204    5e9d0d95eda69973a809d1ec
Name: rocket, Length: 205, dtype: object
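
To see how launches split across rocket IDs, `value_counts` on the `rocket` column gives a quick summary. The sketch below runs on a tiny hypothetical series rather than the real `df3`. The ID-to-name mapping is partly an assumption: the Falcon 9 ID is the one used in the cells above, the Falcon 1 ID is inferred from the first five rows (all Falcon 1 missions), and the Falcon Heavy label for the third ID is not confirmed anywhere in this notebook.

```python
import pandas as pd

# Assumed mapping from rocket ID to a readable name (see the note above).
ROCKET_NAMES = {
    "5e9d0d95eda69955f709d1eb": "Falcon 1",
    "5e9d0d95eda69973a809d1ec": "Falcon 9",
    "5e9d0d95eda69974db09d1ed": "Falcon Heavy",  # assumption, not confirmed by the notebook
}

# Hypothetical sample standing in for df3['rocket'].
rocket_ids = pd.Series([
    "5e9d0d95eda69955f709d1eb",
    "5e9d0d95eda69973a809d1ec",
    "5e9d0d95eda69973a809d1ec",
    "5e9d0d95eda69974db09d1ed",
], name="rocket")

# Map IDs to names; IDs missing from the mapping are kept as-is instead of becoming NaN.
rocket_names = rocket_ids.map(ROCKET_NAMES).fillna(rocket_ids)
counts = rocket_names.value_counts()
print(counts)
```

Counting per rocket in one pass avoids hardcoding a separate filter for each rocket ID.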

In [38]:
df3.columns.tolist()

['static_fire_date_utc',
 'static_fire_date_unix',
 'net',
 'window',
 'rocket',
 'success',
 'failures',
 'details',
 'crew',
 'ships',
 'capsules',
 'payloads',
 'launchpad',
 'flight_number',
 'name',
 'date_utc',
 'date_unix',
 'date_local',
 'date_precision',
 'upcoming',
 'cores',
 'auto_update',
 'tbd',
 'launch_library_id',
 'id',
 'fairings.reused',
 'fairings.recovery_attempt',
 'fairings.recovered',
 'fairings.ships',
 'links.patch.small',
 'links.patch.large',
 'links.reddit.campaign',
 'links.reddit.launch',
 'links.reddit.media',
 'links.reddit.recovery',
 'links.flickr.small',
 'links.flickr.original',
 'links.presskit',
 'links.webcast',
 'links.youtube_id',
 'links.article',
 'links.wikipedia',
 'fairings']

In [27]:
missing_landpad_count = 0
total_cores = 0
null_landpad_count = 0

for launch in data:
    for core in launch['cores']:
        total_cores += 1
        if 'landpad' not in core or core['landpad'] is None:
            missing_landpad_count += 1
        #if 'landpad' not in core or core['landpad'].isnull():
        #    null_landpad_count += 1

print(f"Total cores: {total_cores}")
print(f"Missing landpad values: {missing_landpad_count}")
print(f"Null landpad values: {null_landpad_count}")

Total cores: 215
Missing landpad values: 58
Null landpad values: 0


In [7]:
def get_falcon9_launches():
    url = 'https://api.spacexdata.com/v4/launches'
    try:
        response = requests.get(url)
        response.raise_for_status()
        launches = response.json()
        # Normalize JSON data to create a DataFrame
        df9 = pd.json_normalize(launches)
        # Count Falcon 9 launches by filtering based on the rocket ID
        falcon9_count = df9[df9['rocket'] == '5e9d0d95eda69973a809d1ec'].shape[0]
        return falcon9_count
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None

In [8]:
url = 'https://api.spacexdata.com/v4/launches'
response = requests.get(url)
launches = response.json()
df9 = pd.json_normalize(launches)

In [9]:
falcon9_launches = get_falcon9_launches()
print(f"Total Falcon 9 launches: {falcon9_launches}")

Total Falcon 9 launches: 195


In [48]:
df9.to_csv('spacex_v4_20241001_1930.csv', index=False)

In [53]:
def analyze_spacex_launches():
    url = 'https://api.spacexdata.com/v4/launches'
    try:
        response = requests.get(url)
        response.raise_for_status()
        launches = response.json()
        # Normalize JSON data to create a DataFrame
        df = pd.json_normalize(launches)
        
        # Count Falcon 9 launches
        falcon9_count = df[df['rocket'] == '5e9d0d95eda69973a809d1ec'].shape[0]
        
        # Count missing values in the landingPad column
        missing_landing_pad = df['cores.0.landpad'].isna().sum()
        
        return falcon9_count, missing_landing_pad
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return None, None



In [54]:
falcon9_launches, missing_landing_pad = analyze_spacex_launches()
print(f"Total Falcon 9 launches: {falcon9_launches}")
print(f"Missing values in landingPad column: {missing_landing_pad}")

KeyError: 'cores.0.landpad'
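
The `KeyError` occurs because `pd.json_normalize` flattens nested dicts into dotted columns but leaves lists such as `cores` as a single column, so no `cores.0.landpad` column is ever created. One possible way around this, sketched on hypothetical records, is to expand the list with `record_path` and carry launch-level fields along with `meta`. The sample data, rocket labels, and `pad-1` value are invented.

```python
import pandas as pd

# Hypothetical launches, each with a list of cores.
sample_launches = [
    {"name": "Launch A", "rocket": "r1",
     "cores": [{"core": "c1", "landpad": None}]},
    {"name": "Launch B", "rocket": "r2",
     "cores": [{"core": "c2", "landpad": "pad-1"},
               {"core": "c3", "landpad": None}]},
]

# One row per core; "name" and "rocket" are repeated from the parent launch.
cores_df = pd.json_normalize(sample_launches, record_path="cores", meta=["name", "rocket"])

# Count cores whose landpad is null.
missing_landpad = int(cores_df["landpad"].isna().sum())
print(cores_df.shape, missing_landpad)  # (3, 4) 2
```

This counts per core rather than per launch, which matches the nested loop used in the earlier landpad cells.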

In [16]:
def soup_to_dict(tag):
    if tag.name is None:
        return tag.string
    return {
        'name': tag.name,
        'attrs': tag.attrs,
        'contents': [soup_to_dict(child) for child in tag.contents if child != '\n']
    }

In [18]:
import json

# Convert soup to dictionary
data_dict = soup_to_dict(soup)

# Convert dictionary to JSON
json_data = json.dumps(data_dict)

#normalize json
df = pd.json_normalize(json.loads(json_data))

In [28]:
html_tables = soup.find_all('table')
#html_tables[2]

In [30]:
df = pd.read_html(str(html_tables[2]))
df.head()

AttributeError: 'list' object has no attribute 'head'

In [36]:
# Convert the selected table to a pandas DataFrame
df = pd.read_html(str(html_tables))[2]

# Convert the DataFrame to JSON
json_data = df.to_json(orient='records')
#json_data
#df = pd.json_normalize(json_data)
# If you want to pretty-print the JSON
json_data_pretty = json.dumps(json.loads(json_data), indent=4)

#print(json_data_pretty)

In [37]:
df = pd.json_normalize(json_data_pretty)

NotImplementedError: 
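
`pd.json_normalize` expects a dict or a list of dicts. `json_data_pretty` is a formatted JSON string, which is what triggers the `NotImplementedError` above. The sketch below shows the round trip with the parsing step included, on a small hypothetical table standing in for the scraped one (the values are invented).

```python
import json
import pandas as pd

# Hypothetical two-row table standing in for the scraped launch table.
table_df = pd.DataFrame({
    "Flight No.": [1, 2],
    "Launch outcome": ["Success", "Success"],
})

json_text = table_df.to_json(orient="records")  # a JSON string
records = json.loads(json_text)                 # parsed back into a list of dicts

# Passing the parsed list (not the string) is what json_normalize needs.
df_from_json = pd.json_normalize(records)
print(df_from_json.shape)  # (2, 2)
```

Pretty-printing with `json.dumps(..., indent=4)` is useful for inspection only; the parsed object from `json.loads` is the one to hand to `json_normalize`.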

In [38]:
def table_to_json(soup, table_index):
    tables = soup.find_all('table')
    if table_index >= len(tables):
        raise IndexError("Table index out of range")
    
    selected_table = tables[table_index]
    headers = selected_table.find_all('th')
    
    if headers:
        df = pd.read_html(str(selected_table))[0]
    else:
        rows = selected_table.find_all('tr')
        data = [[td.text.strip() for td in row.find_all('td')] for row in rows]
        df = pd.DataFrame(data, columns=[f'Column_{i}' for i in range(len(data[0]))])
    
    return json.dumps(json.loads(df.to_json(orient='records')), indent=4)


In [40]:

# Usage
json_data = table_to_json(soup, 2)  # Convert the third table (index 2) to JSON
#print(json_data)

In [43]:
#selected_table = soup.find_all('table')[table_index]
selected_table = html_tables[2]

# Convert the table to a DataFrame
df = pd.read_html(str(selected_table))[0]

# Convert the DataFrame to JSON
json_data = df.to_json(orient='records')

# Parse the JSON string back into a Python object
parsed_json = json.loads(json_data)

# Use pd.json_normalize; importing json_normalize from pandas.io.json is deprecated
df_from_json = pd.json_normalize(parsed_json)


In [45]:
df_from_json

Unnamed: 0,Flight No.,Date andtime (UTC),"Version,Booster [b]",Launch site,Payload[c],Payload mass,Orbit,Customer,Launchoutcome,Boosterlanding
0,1,"4 June 2010,18:45",F9 v1.0[7]B0003.1[8],"CCAFS,SLC-40",Dragon Spacecraft Qualification Unit,,LEO,SpaceX,Success,Failure[9][10](parachute)
1,1,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...,First flight of Falcon 9 v1.0.[11] Used a boil...
2,2,"8 December 2010,15:43[13]",F9 v1.0[7]B0004.1[8],"CCAFS,SLC-40",Dragon demo flight C1(Dragon C101),,LEO (ISS),".mw-parser-output .plainlist ol,.mw-parser-out...",Success[9],Failure[9][14](parachute)
3,2,"Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of...","Maiden flight of Dragon capsule, consisting of..."
4,3,"22 May 2012,07:44[17]",F9 v1.0[7]B0005.1[8],"CCAFS,SLC-40",Dragon demo flight C2+[18](Dragon C102),"525 kg (1,157 lb)[19]",LEO (ISS),NASA (COTS),Success[20],No attempt
5,3,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...,Dragon spacecraft demonstrated a series of tes...
6,4,"8 October 2012,00:35[21]",F9 v1.0[7]B0006.1[8],"CCAFS,SLC-40",SpaceX CRS-1[22](Dragon C103),"4,700 kg (10,400 lb)",LEO (ISS),NASA (CRS),Success,No attempt
7,4,"8 October 2012,00:35[21]",F9 v1.0[7]B0006.1[8],"CCAFS,SLC-40",Orbcomm-OG2[23],172 kg (379 lb)[24],LEO,Orbcomm,Partial failure[25],No attempt
8,4,"CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa...","CRS-1 was successful, but the secondary payloa..."
9,5,"1 March 2013,15:10",F9 v1.0[7]B0007.1[8],"CCAFS,SLC-40",SpaceX CRS-2[22](Dragon C104),"4,877 kg (10,752 lb)",LEO (ISS),NASA (CRS),Success,No attempt


## Authors


<a href="https://www.linkedin.com/in/yan-luo-96288783/">Yan Luo</a>


<a href="https://www.linkedin.com/in/nayefaboutayoun/">Nayef Abou Tayoun</a>


<!--
## Change Log
-->


<!--
| Date (YYYY-MM-DD) | Version | Changed By | Change Description      |
| ----------------- | ------- | ---------- | ----------------------- |
| 2021-06-09        | 1.0     | Yan Luo    | Tasks updates           |
| 2020-11-10        | 1.0     | Nayef      | Created the initial version |
-->


Copyright © 2021 IBM Corporation. All rights reserved.
