In this Jupyter notebook, I use Selenium to scrape the publicly available SpaceX launch data from Wikipedia and structure it into a pandas dataframe. In a second part, that dataframe will be used to visualize trends in the data and answer some analytics questions.

## Initiate dependencies and open references

In [1]:
#Import corresponding libraries for selenium webscraping.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import re
import copy
import pandas as pd
import sqlite3

In [2]:
#Define the driver path to reference.
driver_path = "/Users/lherna/Documents/chrome/chromedriver"

In [3]:
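#Initiate the Chrome driver using the previously defined path.
#Note: this is the Selenium 3 API; in Selenium 4 the executable_path argument is replaced by a Service object.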
driver = webdriver.Chrome(executable_path = driver_path)

In [4]:
#Load the Wikipedia page that lists the Falcon 9 and Falcon Heavy launches.
driver.get('https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches')

## Data Exploration 

In [5]:
#Let's have a look at our data.
driver.find_element_by_xpath("//table[contains(@class, 'wikitable plainrowheaders collapsible mw-collapsible mw-made-collapsible')][1]/tbody/tr").text

'hide\nFlight No. Date and\ntime (UTC) Version,\nBooster [b] Launch site Payload Payload mass Orbit Customer Launch\noutcome Booster\nlanding'

As we can see, the header row comes back as a single run of text with embedded line breaks, so the data is not yet in a usable structure. The screenshot below shows how the table is laid out on the Wikipedia page. Cleaning and formatting it takes some work, but it can be done in a couple of steps.

![SpaceX_Wiki](spacex_screenshot.png)

The work is done in a two-step process. In the first step, the data is extracted from the page with a nested while loop, which relies on a helper function, check_missing, to tell whether an empty cell is only a gap in the row or the row has truly been exhausted. During this pass, an initial cleanup is also applied through regular expression filters and custom element replacements.
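The per-cell cleanup described above can be gathered into a single helper, which would avoid repeating the same four lines in two places inside the loop. This is only a minimal sketch: the name `clean_cell` is hypothetical and not part of the notebook, and it reproduces nothing beyond the four substitutions the loop applies.

```python
import re

def clean_cell(text):
    """Apply the notebook's four cleanup steps to the text of one cell."""
    text = re.sub(r'\[.*?\]', '', text)    # remove citation markers such as [12] or [b]
    text = text.replace('\n', '')          # remove line breaks inside the cell
    text = text.replace('°', ' degrees')   # spell out the degree symbol
    text = text.replace('♺', '')           # remove the recycling symbol
    return text

print(clean_cell('1,898 kg\n(4,184 lb)[12]'))  # -> 1,898 kg(4,184 lb)
```

Note that line breaks are removed without inserting a space, which is likely why values such as `F9 v1.0B0003` appear fused in the output further down.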

In [6]:
#Define a function that checks whether an empty cell is only a gap in the row
#or marks the true end of the row; it is used in the nested while loop below.
def check_missing(t2_str):
    try:
        #The XPath string ends with the index of the current cell, e.g. ".../td[5]".
        #Increment that index to look one cell ahead
        #(a regex is used so that multi-digit indices are handled as well).
        t2_str = re.sub(r'\[(\d+)\]$', lambda m: '[' + str(int(m.group(1)) + 1) + ']', t2_str)
        #If the next cell exists, its text is extracted
        #and passed back to the loop.
        t2 = driver.find_element_by_xpath(t2_str).text
        return t2
    #If there is no next cell, Selenium raises NoSuchElementException;
    #an empty string is returned to tell the loop that it has reached the end of the row.
    except NoSuchElementException:
        return ''

In [7]:
#Let's start by defining parameters and objects that will be used to extract the data.
tmain = []
k = 1  # table
ttest = True
i = 1   # row
test = True
j = 1   # column
t2 = True
#The main loop iterates over the page,
#looking for each of the tables that hold the launch data.
while(ttest):
    #The try-except block handles NoSuchElementException,
    #which signals that there are no more tables to scrape.
    try:
        #Build the XPath for the k-th table,
        #so that every table that has data is scraped.
        ttest_str = "//table[contains(@class, 'wikitable plainrowheaders collapsible mw-collapsible mw-made-collapsible')][" + str(k) + "]"
        #Using the find by xpath option available for selenium, 
        #I extract the text to be used in the next iteration in order for the loop to continue.
        ttest = driver.find_element_by_xpath(ttest_str).text
        t_smain = []
        #The second loop uses the same approach as the main loop to iterate over the rows of the table.
        while(test):
            try:
                #The XPath is rebuilt on each iteration from the current table and row indices.
                test_str = "//table[contains(@class, 'wikitable plainrowheaders collapsible mw-collapsible mw-made-collapsible')][" + str(k) + "]/tbody/tr[" + str(i) + "]"
                test = driver.find_element_by_xpath(test_str).text
                t_submain = []
                #The innermost loop handles the data
                #in each cell of the row.
                while(t2):
                    try: 
                        #The XPath is extended one last time to the point of extraction:
                        #the table, row and cell indices are updated as the iterations continue.
                        t2_str = "//table[contains(@class, 'wikitable plainrowheaders collapsible mw-collapsible mw-made-collapsible')][" + str(k) + "]/tbody/tr[" + str(i) + "]/td[" + str(j) + "]"
                        t2 = driver.find_element_by_xpath(t2_str).text
                        #While we are looping to extract the data, an initial cleanup step can be done.
                        t2 = re.sub(r'\[.*?\]','', t2)            # Remove the citation indices.
                        t2 = t2.replace('\n','')                  # Remove explicit new line characters.
                        t2 = t2.replace('°', ' degrees')          # Replace degree symbol with actual word.
                        t2 = t2.replace('♺', '')                  # Remove recycling symbol from the data.
                        print(t2)
                        #Having drilled down to each cell, 
                        #the data is appended to a list.
                        t_submain.append(t2)
                        #Some tables have cells without data,
                        #so this conditional tests whether there is more data in the row or whether the row has ended.
                        if t2 == '':
                            print('Checking missing.')
                            #Call the function defined earlier,
                            #passing the XPath of the current (empty) cell as the argument.
                            t2_check = check_missing(t2_str)
                            #If the look-ahead found a cell, clean it and append it to the row list;
                            #otherwise leave the row as it is.
                            if t2_check:
                                t2 = t2_check
                                #Similar to the steps taken prior to the check,
                                #if the data exists, a cleaning step can be made.
                                t2 = re.sub(r'\[.*?\]','', t2)      # Remove the citation indices.
                                t2 = t2.replace('\n','')            # Remove explicit new line characters.
                                t2 = t2.replace('°', ' degrees')    # Replace degree symbol with actual word.
                                t2 = t2.replace('♺', '')            # Remove recycling symbol from the data.
                                t_submain.append(t2)
                            else:
                                pass
                    #Everything so far extracts data from the page and adds it to lists.
                    #To handle the case where we have reached the end of a row,
                    #Selenium's NoSuchElementException is caught and the cell loop is exited.
                    except NoSuchElementException:
                        break
                
                    j += 1
                i += 1
                j=1
                print('')
            #Again, the exception signals that we have reached the end of the table.
            except NoSuchElementException:
                break
            #Back in the row loop,
            #append the finished row (the list of cells) to the table's list of rows.
            t_smain.append(t_submain)
        k += 1
        i = 1
        print('')
        print('')
    #Finally, the exception ends the loop once the last table on the page has been scraped.
    except NoSuchElementException:
        break
    #At the end of each pass, append the table (the list of rows) to the main list.
    tmain.append(t_smain)


4 June 2010,18:45
F9 v1.0B0003
CCAFS,SLC-40
Dragon Spacecraft Qualification Unit

Checking missing.
LEO
SpaceX
Success
Failure(parachute)

First flight of Falcon 9 v1.0. Used a boilerplate version of Dragon capsule which was not designed to separate from the second stage.(more details below) Attempted to recover the first stage by parachuting it into the ocean, but it burned up on reentry, before the parachutes even deployed.

8 December 2010,15:43 
F9 v1.0 B0004
CCAFS,SLC-40
Dragon demo flight C1(Dragon C101)

Checking missing.
LEO (ISS)
NASA (COTS)NRO
Success 
Failure (parachute)

Maiden flight of Dragon capsule, consisting of over 3 hours of testing thruster maneuvering and reentry. Attempted to recover the first stage by parachuting it into the ocean, but it disintegrated upon reentry, before the parachutes were deployed. (more details below) It also included two CubeSats, and a wheel of Brouère cheese.

22 May 2012,07:44
F9 v1.0B0005
CCAFS,SLC-40
Dragon demo flight C2+(Dragon C10

SpaceX CRS-6(Dragon C108.1)
1,898 kg (4,184 lb)
LEO (ISS)
NASA (CRS)
Success
Failure(drone ship)

After second-stage separation, a controlled-descent test was attempted with the first stage. After the booster contacted the ship, it tipped over due to excess lateral velocity caused by a stuck throttle valve that delayed downthrottle at the correct time.

27 April 2015,23:03
F9 v1.1B1016
Cape Canaveral,LC-40
TürkmenÄlem 52 degreesE / MonacoSAT
4,707 kg (10,377 lb)
GTO
Turkmenistan NationalSpace Agency
Success
No attempt

Original intended launch was delayed over a month after an issue with the helium pressurisation system was identified on similar parts in the assembly plant. Subsequent launch successfully positioned this first Turkmen satellite at 52.0 degrees East.

28 June 2015,14:21
F9 v1.1B1018
Cape Canaveral,LC-40
SpaceX CRS-7(Dragon C109)
1,952 kg (4,303 lb)
LEO (ISS)
NASA (CRS)
Failure(in flight)
Precluded(drone ship)

Launch performance was nominal until an overpressure incident

LEO (ISS)
NASA (CRS)
Success
Success(ground pad)

First Falcon 9 flight from the historic LC-39A launchpad at Kennedy Space Center, and first uncrewed launch from LC-39A since Skylab-1. The flight carried supplies and materials to support ISS Expeditions 50 and 51, and third return of first stage booster to landing pad at Cape Canaveral Landing Zone 1.

16 March 2017,06:00
F9 FTB1030
KSC,LC-39A
EchoStar 23
5,600 kg (12,300 lb)
GTO
EchoStar
Success
No attempt

First uncrewed non-station launch from LC-39A since Apollo 6. Launched a communications satellite for broadcast services over Brazil. Due to the payload size launch into a GTO, the booster was expended into the Atlantic Ocean and did not feature landing legs and grid fins.

30 March 2017,22:27
F9 FT B1021.2
KSC,LC-39A
SES-10
5,300 kg (11,700 lb)
GTO
SES
Success
Success(drone ship)

First payload to fly on a reused first stage, B1021, previously launched with CRS-8, and first to land intact a second time. Additionally, this flight 

31 January 2018,21:25
F9 FT B1032.2
CCAFS,SLC-40
GovSat-1 / SES-16
4,230 kg (9,330 lb)
GTO
SES
Success
Controlled(ocean)

Reused booster from the classified NROL-76 mission in May 2017. Following a successful experimental soft ocean landing that used three engines, the booster unexpectedly remained intact. Recovery was talked about and a Craiglist ad believed to be made by Elon Musk jokingly said the booster was for sale at US$9.9 million if the buyer brought their own tugboat. Despite this, recovery was not attempted, and the booster was subsequently destroyed. GovSat-1 satellite was put into a high-energy Supersynchronous Transfer Orbit of 250 x 51,500 km.

6 February 2018,20:45
Falcon HeavyB1033 (core)
KSC,LC-39A
Elon Musk's Tesla Roadster
~1,250 kg (2,760 lb)
Heliocentric0.99–1.67 AU(close to Mars transfer orbit)
SpaceX
Success
Failure(drone ship)

B1023.2 (side) 
Success(ground pad)

B1025.2 (side) 
Success(ground pad)

Maiden flight of Falcon Heavy, using two recovered Falcon 9 c

F9 B5B1048.1
VAFB,SLC-4E
Iridium NEXT-7(10 satellites)
9,600 kg (21,200 lb)
Polar LEO
Iridium Communications
Success
Success(drone ship)

Seventh Iridium NEXT launch, with 10 communication satellites. The booster landed safely on the drone ship in the worst weather conditions for any landing yet attempted. Mr. Steven boat with an upgraded 4x size net was used to attempt fairing recovery but failed due to harsh weather.

7 August 2018,05:18
F9 B5 B1046.2
CCAFS,SLC-40
Merah Putih (formerly Telkom 4)
5,800 kg (12,800 lb)
GTO
Telkom Indonesia
Success
Success(drone ship)

Indonesian comsat intended to replace the aging Telkom 1 at 108.0 degrees East. First reflight of a Block 5-version booster.

10 September 2018,04:45
F9 B5B1049.1
CCAFS,SLC-40
Telstar 18V / Apstar-5C
7,060 kg (15,560 lb)
GTO
Telesat
Success
Success(drone ship)

Condosat for 138.0 degrees East over Asia and Pacific. Delivered to a GTO orbit with apogee close to 18,000 km.

8 October 2018,02:22
F9 B5 B1048.2
VAFB,SLC-4E
SAOC


25 June 2019,06:30
Falcon HeavyB1057 core
KSC,LC-39A
Space Test Program Flight 2 (STP-2)
3,700 kg (8,200 lb)
LEO / MEO
USAF
Success
Failure(drone ship)

B1052.2(side) 
Success(ground pad)

B1053.2(side) 
Success(ground pad)

USAF Space Test Program Flight 2 (STP-2) carried 24 small satellites, including: FormoSat-7 A/B/C/D/E/F integrated using EELV Secondary Payload Adapter, DSX, Prox-1 GPIM, DSAC, ISAT, SET, COSMIC-2, Oculus-ASR, OBT, NPSat, and several CubeSats including E-TBEx, LightSail 2, TEPCE, PSAT, and three ELaNa 15 CubeSats. Total payload mass was 3,700 kg. The mission lasted six hours during which the second stage ignited four times and went into different orbits to deploy satellites including a "propulsive passivation maneuver".Third flight of Falcon Heavy. The side boosters from the Arabsat-6A mission just 2.5 months before were reused on this flight and successfully returned to LZ-1 and LZ-2. The center core, in use for the first time, underwent the most energetic reentr

18 March 2020,12:16
F9 B5 B1048.5
KSC,LC-39A
Starlink 5 v1.0 (60 satellites)
15,600 kg (34,400 lb)
LEO
SpaceX
Success
Failure(drone ship)

Fifth operational launch of Starlink satellites. It was the first time a first stage booster flew for a fifth time and the second time the fairings were reused (Starlink flight in May 2019). Towards the end of the first stage burn, the booster suffered premature shut down of an engine, the first of a Merlin 1D variant and first since the CRS-1 mission in October 2012. However, the payload still reached the targeted orbit. This was the second Starlink launch booster landing failure in a row, later revealed to be caused by residual cleaning fluid trapped inside a sensor.

22 April 2020,19:30
F9 B5 B1051.4
KSC,LC-39A
Starlink 6 v1.0 (60 satellites)
15,600 kg (34,400 lb)
LEO
SpaceX
Success
Success(drone ship)

Sixth operational launch of Starlink satellites. The 84th flight of the Falcon 9 rocket, it surpassed Atlas V to become the most-flown operationa

F9 B5B1063.1
VAFB,SLC-4E
Sentinel-6 Michael Freilich (Jason-CS A)
1,192 kg (2,628 lb)
LEO
NASA / NOAA / ESA / EUMETSAT
Success
Success(ground pad)

Named after the former director of NASA's Earth science program, it is a radar altimeter satellite part of the Ocean Surface Topography constellation located at 1336 km and 66 degrees inclination, and a follow-up to Jason 3 as a partnership between the United States (NOAA and NASA), Europe (EUMETSAT, ESA, CNES).

25 November 202002:13
F9 B5 B1049.7
CCAFS,SLC-40
Starlink 15 v1.0 (60 satellites)
15,600 kg (34,400 lb)
LEO
SpaceX
Success
Success(drone ship)

First time a booster was launched for a seventh time and first time SpaceX completed four launches in a single month.

6 December 202016:17:08
F9 B5 B1058.4
KSC,LC-39A
SpaceX CRS-21
2,972 kg (6,552 lb)
LEO (ISS)
NASA (CRS)
Success
Success(drone ship)

First launch of phase 2 of the CRS contract of six launches awarded in January 2016. It was the first launch of the upgraded version Cargo Dr

## Data cleanup

Now that the data is in a structured format that I can work with and has gone through an initial cleaning process, the next step is to organize it further so that it can end up as a dataframe for the analysis part. Before doing that, I first had to retrieve the most important piece needed to reach that goal: the column titles.

In [8]:
#The title row was skipped by the nested loop above because its cells sit under th rather than td elements,
#so the column titles are extracted separately here.
t = 1
title = True
title_list = []
while(title):
    try:
        title_str = "//table[contains(@class, 'wikitable plainrowheaders collapsible mw-collapsible mw-made-collapsible')][1]/tbody/tr/th[" + str(t) + "]"
        title = driver.find_element_by_xpath(title_str).text
        title = title.replace('\n', ' ')
        title = title.replace('hide', '')
        title = re.sub(r'\[.*?\]','', title) 
        title_list.append(title)
        t += 1
    except NoSuchElementException:
        break
title_list = title_list[1:] + ['Description']
title_list

['Date and time (UTC)',
 'Version, Booster ',
 'Launch site',
 'Payload',
 'Payload mass',
 'Orbit',
 'Customer',
 'Launch outcome',
 'Booster landing',
 'Description']

In [9]:
#After that hard work, continue on a deep copy so the scraped data stays untouched and does not need to be scraped again.
spacex_main = copy.deepcopy(tmain)

With the titles available, I could start assembling the data into lists that form a precursor of the final dataframe. The first step, taken below, is to find the rows that need complementary data before they can be added to the dataframe:

In [10]:
#Nested list comprehension to see what rows in each table are incomplete.
[[i for i in spacex_main[j] if len(i) < 9 and len(i) > 1] for j in range(len(spacex_main))]

[[['Orbcomm-OG2', '172 kg (379 lb)', 'LEO', 'Orbcomm', 'Partial failure']],
 [],
 [],
 [],
 [],
 [['B1023.2 (side) ', 'Success(ground pad)'],
  ['B1025.2 (side) ', 'Success(ground pad)']],
 [['B1052.1(side)', 'Success(ground pad)'],
  ['B1053.1(side)', 'Success(ground pad)'],
  ['B1052.2(side) ', 'Success(ground pad)'],
  ['B1053.2(side) ', 'Success(ground pad)']],
 [],
 []]
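Since locating the right index is the hard part of fixing these rows, it can help to report each short row together with its position. The following is a minimal sketch, assuming the same nested layout as `spacex_main` (tables, then rows, then cells); the function name `find_incomplete`, its `expected` parameter, and the toy data are illustrative and not part of the notebook.

```python
def find_incomplete(tables, expected=9):
    """Return (table_index, row_index, row) for rows with more than 1 but fewer than `expected` cells."""
    return [
        (t_idx, r_idx, row)
        for t_idx, table in enumerate(tables)
        for r_idx, row in enumerate(table)
        if 1 < len(row) < expected
    ]

# Toy data shaped like the scraped result: one-cell rows stand for descriptions.
toy = [
    [[], ['a'] * 9, ['description'], ['B1023.2 (side) ', 'Success(ground pad)']],
    [[], ['b'] * 9, ['description']],
]
for hit in find_incomplete(toy):
    print(hit)  # -> (0, 3, ['B1023.2 (side) ', 'Success(ground pad)'])
```

The indices reported this way are valid only until the first `insert`, because each insertion shifts the rows that come after it.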

As we can see, there are 7 rows that still need data to be complete (i.e. the same size as the rest of the rows), so we can fill them in manually using list concatenation. It is possible to automate this process, but it is challenging, since it is not straightforward to work out which index a specific element belongs to (e.g. B1023.2 (side) may belong to element 1 or 2, etc.). Alternatively, this could be semi-automated by asking for user input to handle each modification, but since there isn't much data, the modification is done fully by hand.

In [11]:
#First row modification containing flight attributes.
spacex_main[0][8] = spacex_main[0][7][0:3] + spacex_main[0][8][0:5] + [spacex_main[0][7][-1]]
#Duplicate and place description for flight in appropriate location.
spacex_main[0].insert(8,spacex_main[0][9])

In [12]:
#Second row modification containing flight attributes.
spacex_main[5][6] = [spacex_main[5][5][0]] + [spacex_main[5][6][0]] + spacex_main[5][5][2:8] + [spacex_main[5][6][1]]
#Duplicate and place description for flight in appropriate location.
spacex_main[5].insert(6,spacex_main[5][8])

In [13]:
#Third row modification containing flight attributes.
spacex_main[5][8] = [spacex_main[5][5][0]] + [spacex_main[5][8][0]] + spacex_main[5][5][2:8] + [spacex_main[5][8][1]]
#Duplicate and place description for flight in appropriate location.
spacex_main[5].insert(8,spacex_main[5][9])

In [14]:
#Fourth row modification containing flight attributes.
spacex_main[6][8] = [spacex_main[6][7][0]] + [spacex_main[6][8][0]] + spacex_main[6][7][2:8] + [spacex_main[6][8][1]]
#Duplicate and place description for flight in appropriate location.
spacex_main[6].insert(8,spacex_main[6][10])

In [15]:
#Fifth row modification containing flight attributes.
spacex_main[6][10] = [spacex_main[6][7][0]] + [spacex_main[6][10][0]] + spacex_main[6][7][2:8] + [spacex_main[6][10][1]]
#Duplicate and place description for flight in appropriate location.
spacex_main[6].insert(10,spacex_main[6][11])

In [16]:
#Sixth row modification containing flight attributes.
spacex_main[6][20] = [spacex_main[6][19][0]] + [spacex_main[6][20][0]] + spacex_main[6][19][2:8] + [spacex_main[6][20][1]]
#Duplicate and place description for flight in appropriate location.
spacex_main[6].insert(20,spacex_main[6][22])

In [17]:
#Seventh row modification containing flight attributes.
spacex_main[6][22] = [spacex_main[6][19][0]] + [spacex_main[6][22][0]] + spacex_main[6][19][2:8] + [spacex_main[6][22][1]]
#Duplicate and place description for flight in appropriate location, ready to iterate!
spacex_main[6].insert(22,spacex_main[6][23])
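The six side-booster edits above all follow one pattern: the date comes from the parent flight row, the booster name and landing result come from the two-cell side row, and the six cells in between (launch site through launch outcome) come from the parent. Below is a minimal sketch of that pattern as a helper; `fill_side_booster` is a hypothetical name, and the sample values are copied from the scraper output shown earlier.

```python
def fill_side_booster(parent, side):
    """Build a full 9-cell row for a side booster from its parent flight row."""
    booster, landing = side
    # date + side booster name + parent cells 2..7 + side booster landing result
    return [parent[0], booster] + parent[2:8] + [landing]

parent = [
    '6 February 2018,20:45', 'Falcon HeavyB1033 (core)', 'KSC,LC-39A',
    "Elon Musk's Tesla Roadster", '~1,250 kg (2,760 lb)',
    'Heliocentric0.99–1.67 AU(close to Mars transfer orbit)',
    'SpaceX', 'Success', 'Failure(drone ship)',
]
side = ['B1023.2 (side) ', 'Success(ground pad)']

row = fill_side_booster(parent, side)
print(len(row))  # -> 9
print(row[1], '|', row[-1])
```

The helper only removes the repeated concatenation; choosing which parent row and which position to write to still has to be done by hand, as in the cells above.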

In [18]:
#Check if any more are missing.
[[i for i in spacex_main[j] if len(i) < 9 and len(i) > 1] for j in range(len(spacex_main))]

[[], [], [], [], [], [], [], [], []]

In [19]:
#Example of the target layout: each data row is followed by its description, ready to be joined and transformed into a dataframe.
spacex_main[0][1:3]

[['4 June 2010,18:45',
  'F9 v1.0B0003',
  'CCAFS,SLC-40',
  'Dragon Spacecraft Qualification Unit',
  '',
  'LEO',
  'LEO',
  'SpaceX',
  'Success',
  'Failure(parachute)'],
 ['First flight of Falcon 9 v1.0. Used a boilerplate version of Dragon capsule which was not designed to separate from the second stage.(more details below) Attempted to recover the first stage by parachuting it into the ocean, but it burned up on reentry, before the parachutes even deployed.']]

Now that all of the data is aligned in the same format as the example above, the next step is to join each data row with its description, completing the rows in preparation for the dataframe transformation. In the case of the list of 2 lists shown above, the result is a single list containing all of the data for that launch.

In [20]:
#Join data with comments to complete rows.
spacex_main = [[[inner for outer in spacex_main[j][i:i+2] for inner in outer] for i in range(1,int((len(spacex_main[j])-1))) if i % 2 != 0] for j in range(len(spacex_main))]
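The comprehension above is dense, so here is an equivalent, more readable sketch for a single table. It assumes, as `spacex_main[0][1:3]` suggests, that index 0 of each table holds the empty header placeholder and that data rows and description rows alternate after it. The name `join_with_descriptions` is hypothetical and the descriptions in the toy table are shortened stand-ins.

```python
from itertools import chain

def join_with_descriptions(table):
    """Pair each data row with the description row that follows it."""
    body = table[1:]                         # skip the header placeholder
    pairs = zip(body[0::2], body[1::2])      # (data row, description row)
    return [list(chain(data, desc)) for data, desc in pairs]

toy_table = [
    [],
    ['4 June 2010,18:45', 'LEO', 'Success'],
    ['First flight of Falcon 9 v1.0.'],
    ['8 December 2010,15:43', 'LEO (ISS)', 'Success'],
    ['Maiden flight of Dragon capsule.'],
]
for joined in join_with_descriptions(toy_table):
    print(joined)
```

Applying it to every table, for example with `[join_with_descriptions(t) for t in spacex_main]`, would give the same nested result as the one-line version.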

In [21]:
#Flatten the 3D list to a 2D list, ready for dataframe transform!
spacex_main = [sitem for ssublist in spacex_main for sitem in ssublist]
spacex_main[0]

['4 June 2010,18:45',
 'F9 v1.0B0003',
 'CCAFS,SLC-40',
 'Dragon Spacecraft Qualification Unit',
 '',
 'LEO',
 'LEO',
 'SpaceX',
 'Success',
 'Failure(parachute)',
 'First flight of Falcon 9 v1.0. Used a boilerplate version of Dragon capsule which was not designed to separate from the second stage.(more details below) Attempted to recover the first stage by parachuting it into the ocean, but it burned up on reentry, before the parachutes even deployed.']

## Data preparation

As part of the last couple of steps, the data is checked once more to confirm that every row has the same number of elements. The rows that contain a duplicated value are corrected, and the result is then loaded into a dataframe.

In [22]:
#First two datapoints do not have the same dimensions as the rest of the data.
for i in range(len(spacex_main)):
    print(i,len(spacex_main[i]))

0 11
1 11
2 10
3 10
4 10
5 10
6 10
7 10
8 10
9 10
10 10
11 10
12 10
13 10
14 10
15 10
16 10
17 10
18 10
19 10
20 10
21 10
22 10
23 10
24 10
25 10
26 10
27 10
28 10
29 10
30 10
31 10
32 10
33 10
34 10
35 10
36 10
37 10
38 10
39 10
40 10
41 10
42 10
43 10
44 10
45 10
46 10
47 10
48 10
49 10
50 10
51 10
52 10
53 10
54 10
55 10
56 10
57 10
58 10
59 10
60 10
61 10
62 10
63 10
64 10
65 10
66 10
67 10
68 10
69 10
70 10
71 10
72 10
73 10
74 10
75 10
76 10
77 10
78 10
79 10
80 10
81 10
82 10
83 10
84 10
85 10
86 10
87 10
88 10
89 10
90 10
91 10
92 10
93 10
94 10
95 10
96 10
97 10
98 10
99 10
100 10
101 10
102 10
103 10
104 10
105 10
106 10
107 10
108 10
109 10
110 10
111 10
112 10
113 10
114 10
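The same check can be condensed so that it does not print one line per row. This is a minimal sketch using `collections.Counter`; the helper name `length_summary` and the toy rows are illustrative, and the most common row length is assumed to be the correct one.

```python
from collections import Counter

def length_summary(rows):
    """Count row lengths and list the indices of rows that deviate from the most common length."""
    counts = Counter(len(row) for row in rows)
    expected = counts.most_common(1)[0][0]
    odd = [i for i, row in enumerate(rows) if len(row) != expected]
    return counts, odd

toy_rows = [['x'] * 11, ['x'] * 11, ['x'] * 10, ['x'] * 10, ['x'] * 10]
counts, odd = length_summary(toy_rows)
print(dict(counts))  # -> {11: 2, 10: 3}
print(odd)           # -> [0, 1]
```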


In [23]:
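#Upon further inspection, it can be seen that the orbit is duplicated in both rows.
#This happens because the look-ahead in check_missing appends the next cell (the orbit)
#after an empty payload mass, and the loop then reads that same cell again.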
spacex_main[1]

['8 December 2010,15:43 ',
 'F9 v1.0 B0004',
 'CCAFS,SLC-40',
 'Dragon demo flight C1(Dragon C101)',
 '',
 'LEO (ISS)',
 'LEO (ISS)',
 'NASA (COTS)NRO',
 'Success ',
 'Failure (parachute)',
 'Maiden flight of Dragon capsule, consisting of over 3 hours of testing thruster maneuvering and reentry. Attempted to recover the first stage by parachuting it into the ocean, but it disintegrated upon reentry, before the parachutes were deployed. (more details below) It also included two CubeSats, and a wheel of Brouère cheese.']

In [24]:
#Removal of first duplicate data.
spacex_main[0].pop(5)

'LEO'

In [25]:
#Removal of second duplicate data.
spacex_main[1].pop(5)

'LEO (ISS)'
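To see where the duplicated orbit comes from, the cell loop can be imitated on a plain list instead of a live page. This is a simplified sketch, not the scraper itself: `scrape_row` is a hypothetical stand-in for the innermost loop, and the look-ahead plays the role of `check_missing`.

```python
def scrape_row(cells):
    """Mimic the cell loop: append each cell, and look one cell ahead after an empty one."""
    row = []
    j = 0
    while j < len(cells):
        value = cells[j]
        row.append(value)
        if value == '':
            ahead = cells[j + 1] if j + 1 < len(cells) else ''
            if ahead:
                row.append(ahead)   # the look-ahead result is appended here...
            else:
                break               # nothing ahead: the loop treats this as the end of the row
        j += 1                      # ...and the same cell is read again on the next pass
    return row

row = scrape_row(['4 June 2010,18:45', '', 'LEO', 'SpaceX'])
print(row)  # -> ['4 June 2010,18:45', '', 'LEO', 'LEO', 'SpaceX']

# Removing the element right after the empty cell restores the row, as pop(5) does above.
fixed = row[:]
fixed.pop(fixed.index('') + 1)
print(fixed)  # -> ['4 June 2010,18:45', '', 'LEO', 'SpaceX']
```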

In [26]:
spacex_df = pd.DataFrame(spacex_main, columns = title_list)

In [27]:
spacex_df.head()

| | Date and time (UTC) | Version, Booster | Launch site | Payload | Payload mass | Orbit | Customer | Launch outcome | Booster landing | Description |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 June 2010,18:45 | F9 v1.0B0003 | CCAFS,SLC-40 | Dragon Spacecraft Qualification Unit | | LEO | SpaceX | Success | Failure(parachute) | First flight of Falcon 9 v1.0. Used a boilerpl... |
| 1 | 8 December 2010,15:43 | F9 v1.0 B0004 | CCAFS,SLC-40 | Dragon demo flight C1(Dragon C101) | | LEO (ISS) | NASA (COTS)NRO | Success | Failure (parachute) | Maiden flight of Dragon capsule, consisting of... |
| 2 | 22 May 2012,07:44 | F9 v1.0B0005 | CCAFS,SLC-40 | Dragon demo flight C2+(Dragon C102) | 525 kg (1,157 lb) | LEO (ISS) | NASA (COTS) | Success | No attempt | Dragon spacecraft demonstrated a series of tes... |
| 3 | 8 October 2012,00:35 | F9 v1.0B0006 | CCAFS,SLC-40 | SpaceX CRS-1(Dragon C103) | 4,700 kg (10,400 lb) | LEO (ISS) | NASA (CRS) | Success | No attempt | CRS-1 was successful, but the secondary payloa... |
| 4 | 8 October 2012,00:35 | F9 v1.0B0006 | CCAFS,SLC-40 | Orbcomm-OG2 | 172 kg (379 lb) | LEO | Orbcomm | Partial failure | No attempt | CRS-1 was successful, but the secondary payloa... |


As a final step, the data is written to a database using Python's sqlite3 module so that it can be accessed in the second part for the analysis.

In [28]:
#Open a connection to the SQLite database file (SQLite is file-based, so there is no server; the file is created if it does not exist).
conn = sqlite3.connect('spacex_data.db')

In [29]:
#Send the data as a table to the database.
spacex_df.to_sql(name='spacex_flights',con=conn)



In [30]:
#Ready!
pd.read_sql_query('SELECT * FROM spacex_flights', conn)

| | index | Date and time (UTC) | Version, Booster | Launch site | Payload | Payload mass | Orbit | Customer | Launch outcome | Booster landing | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 4 June 2010,18:45 | F9 v1.0B0003 | CCAFS,SLC-40 | Dragon Spacecraft Qualification Unit | | LEO | SpaceX | Success | Failure(parachute) | First flight of Falcon 9 v1.0. Used a boilerpl... |
| 1 | 1 | 8 December 2010,15:43 | F9 v1.0 B0004 | CCAFS,SLC-40 | Dragon demo flight C1(Dragon C101) | | LEO (ISS) | NASA (COTS)NRO | Success | Failure (parachute) | Maiden flight of Dragon capsule, consisting of... |
| 2 | 2 | 22 May 2012,07:44 | F9 v1.0B0005 | CCAFS,SLC-40 | Dragon demo flight C2+(Dragon C102) | 525 kg (1,157 lb) | LEO (ISS) | NASA (COTS) | Success | No attempt | Dragon spacecraft demonstrated a series of tes... |
| 3 | 3 | 8 October 2012,00:35 | F9 v1.0B0006 | CCAFS,SLC-40 | SpaceX CRS-1(Dragon C103) | 4,700 kg (10,400 lb) | LEO (ISS) | NASA (CRS) | Success | No attempt | CRS-1 was successful, but the secondary payloa... |
| 4 | 4 | 8 October 2012,00:35 | F9 v1.0B0006 | CCAFS,SLC-40 | Orbcomm-OG2 | 172 kg (379 lb) | LEO | Orbcomm | Partial failure | No attempt | CRS-1 was successful, but the secondary payloa... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 110 | 110 | 25 November 202002:13 | F9 B5 B1049.7 | CCAFS,SLC-40 | Starlink 15 v1.0 (60 satellites) | 15,600 kg (34,400 lb) | LEO | SpaceX | Success | Success(drone ship) | First time a booster was launched for a sevent... |
| 111 | 111 | 6 December 202016:17:08 | F9 B5 B1058.4 | KSC,LC-39A | SpaceX CRS-21 | 2,972 kg (6,552 lb) | LEO (ISS) | NASA (CRS) | Success | Success(drone ship) | First launch of phase 2 of the CRS contract of... |
| 112 | 112 | 13 December 202017:30:00 | F9 B5 B1051.7 | CCSFS,SLC-40 | SXM 7 | 7,000 kg (15,000 lb) | GTO | Sirius XM | Success | Success(drone ship) | Launched the largest, high-power broadcasting ... |
| 113 | 113 | 19 December 202014:00:00 | F9 B5 B1059.5 | KSC,LC-39A | NROL-108 | Classified | LEO | NRO | Success | Success(ground pad) | The planned launch was not known by the public... |
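Because the column titles contain spaces and commas, they have to be wrapped in double quotes when the table is queried with SQL. The sketch below shows this on an in-memory database so that it runs on its own; the two-column table and its five rows are a reduced stand-in for `spacex_flights`, not the real file. `TRIM` is included because at least one scraped value (`'Success '`) carries a trailing space, which would otherwise be counted as a separate group.

```python
import sqlite3

# In-memory stand-in for spacex_data.db (an assumption made for the sake of the example).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE spacex_flights ("Launch outcome" TEXT, "Booster landing" TEXT)')
rows = [
    ('Success', 'Failure(parachute)'),
    ('Success ', 'Failure (parachute)'),   # trailing space, as in the scraped data
    ('Success', 'No attempt'),
    ('Success', 'No attempt'),
    ('Partial failure', 'No attempt'),
]
conn.executemany('INSERT INTO spacex_flights VALUES (?, ?)', rows)

# Double quotes mark "Launch outcome" as a column identifier despite the space.
query = '''
    SELECT TRIM("Launch outcome") AS outcome, COUNT(*)
    FROM spacex_flights
    GROUP BY TRIM("Launch outcome")
    ORDER BY outcome
'''
outcome_counts = conn.execute(query).fetchall()
print(outcome_counts)  # -> [('Partial failure', 1), ('Success', 4)]
conn.close()
```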
