<div style="text-align:center; cursor: auto;">
    <a href="https://www.credly.com/badges/5277e6b4-acf1-4f18-b83f-05d1d2ef3059/public_url" target="_blank">
        <img 
            src="applied-data-science-capstone.png" 
            width="150" 
            alt="IBM Applied Data Science Badge" 
            style="object-fit: cover; border-radius: 50%;">
    </a>
</div>

**<center><h2>EDA via SQL</h2></center>**

## Overview of the Dataset

SpaceX has gained worldwide attention for a series of historic milestones. 

It is the only private company ever to return a spacecraft from low Earth orbit, which it first accomplished in December 2010.
SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars, whereas other providers charge upward of 165 million dollars per launch. Much of the savings comes from SpaceX's ability to reuse the first stage.


Therefore, if we can determine whether the first stage will land, we can estimate the cost of a launch.

This information is useful to a competing company that wants to bid against SpaceX for a rocket launch.

This dataset includes a record for each payload carried during a SpaceX mission into outer space.


### Connect to the database

In [1]:
import csv, sqlite3
import pandas as pd

In [2]:
# open (or create) the local SQLite database file and get a cursor for raw SQL statements
con = sqlite3.connect("my_data1.db")
cur = con.cursor()

In [3]:
df = pd.read_csv("spacex_dataset.csv")
# df = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/data/Spacex.csv')
df.to_sql("SPACEXTABLE", con, if_exists='replace', index=False, method="multi")

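# rebuild the cleaned table on every run so it always reflects the freshly loaded data
cur.execute("DROP TABLE IF EXISTS SPACEXTABLE_CLEAN")
cur.execute("CREATE TABLE SPACEXTABLE_CLEAN AS SELECT * FROM SPACEXTABLE WHERE Date IS NOT NULL")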
con.commit()

df_clean = pd.read_sql_query("SELECT * FROM SPACEXTABLE_CLEAN", con)
df_clean

Unnamed: 0,Date,Time (UTC),Booster_Version,Launch_Site,Payload,PAYLOAD_MASS__KG_,Orbit,Customer,Mission_Outcome,Landing_Outcome
0,2010-06-04,18:45:00,F9 v1.0 B0003,CCAFS LC-40,Dragon Spacecraft Qualification Unit,0,LEO,SpaceX,Success,Failure (parachute)
1,2010-12-08,15:43:00,F9 v1.0 B0004,CCAFS LC-40,"Dragon demo flight C1, two CubeSats, barrel of...",0,LEO (ISS),NASA (COTS) NRO,Success,Failure (parachute)
2,2012-05-22,7:44:00,F9 v1.0 B0005,CCAFS LC-40,Dragon demo flight C2,525,LEO (ISS),NASA (COTS),Success,No attempt
3,2012-10-08,0:35:00,F9 v1.0 B0006,CCAFS LC-40,SpaceX CRS-1,500,LEO (ISS),NASA (CRS),Success,No attempt
4,2013-03-01,15:10:00,F9 v1.0 B0007,CCAFS LC-40,SpaceX CRS-2,677,LEO (ISS),NASA (CRS),Success,No attempt
...,...,...,...,...,...,...,...,...,...,...
96,2020-11-05,23:24:23,F9 B5B1062.1,CCAFS SLC-40,"GPS III-04 , Crew-1",4311,MEO,USSF,Success,Success
97,2020-11-16,0:27:00,F9 B5B1061.1,KSC LC-39A,"Crew-1, Sentinel-6 Michael Freilich",12500,LEO (ISS),NASA (CCP),Success,Success
98,2020-11-21,17:17:08,F9 B5B1063.1,VAFB SLC-4E,"Sentinel-6 Michael Freilich, Starlink 15 v1.0",1192,LEO,NASA / NOAA / ESA / EUMETSAT,Success,Success
99,2020-11-25,2:13:00,F9 B5 B1049.7,CCAFS SLC-40,"Starlink 15 v1.0, SpaceX CRS-21",15600,LEO,SpaceX,Success,Success
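
The `Date IS NOT NULL` filter used to build `SPACEXTABLE_CLEAN` drops rows that have no launch date, presumably blank lines in the CSV. One way to see how many rows such a filter removes is to compare row counts before and after. The sketch below is only an illustration: it uses an in-memory database and made-up rows, and the name `con_demo` is an assumption, not part of the notebook.

```python
import sqlite3

# In-memory database with hypothetical rows; the last row mimics a blank CSV line.
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE SPACEXTABLE (Date TEXT, Launch_Site TEXT)")
con_demo.executemany(
    "INSERT INTO SPACEXTABLE VALUES (?, ?)",
    [("2010-06-04", "CCAFS LC-40"), ("2010-12-08", "CCAFS LC-40"), (None, None)],
)

# Build the cleaned table, keeping only rows that have a date.
con_demo.execute("DROP TABLE IF EXISTS SPACEXTABLE_CLEAN")
con_demo.execute(
    "CREATE TABLE SPACEXTABLE_CLEAN AS SELECT * FROM SPACEXTABLE WHERE Date IS NOT NULL"
)

# Compare row counts to see how many rows the filter removed.
raw_rows = con_demo.execute("SELECT COUNT(*) FROM SPACEXTABLE").fetchone()[0]
clean_rows = con_demo.execute("SELECT COUNT(*) FROM SPACEXTABLE_CLEAN").fetchone()[0]
print(f"raw rows: {raw_rows}, clean rows: {clean_rows}, dropped: {raw_rows - clean_rows}")
```

Running the same two `COUNT(*)` queries against the notebook's `con` would show how many rows were removed from the real data.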


## Some Insights Extracted via SQL

##### All launch site names

In [4]:
unique_launch_site_query = "SELECT DISTINCT Launch_Site FROM SPACEXTABLE_CLEAN"
launch_sites_df = pd.read_sql_query(unique_launch_site_query, con)

print(launch_sites_df)

    Launch_Site
0   CCAFS LC-40
1   VAFB SLC-4E
2    KSC LC-39A
3  CCAFS SLC-40



##### Up to 10 launches from launch sites whose names begin with CCA


In [5]:
cca_query = "SELECT * FROM SPACEXTABLE_CLEAN WHERE Launch_Site LIKE 'CCA%' LIMIT 10;"
cca_df = pd.read_sql_query(cca_query, con)
print(cca_df)

         Date Time (UTC) Booster_Version  Launch_Site  \
0  2010-06-04   18:45:00  F9 v1.0  B0003  CCAFS LC-40   
1  2010-12-08   15:43:00  F9 v1.0  B0004  CCAFS LC-40   
2  2012-05-22    7:44:00  F9 v1.0  B0005  CCAFS LC-40   
3  2012-10-08    0:35:00  F9 v1.0  B0006  CCAFS LC-40   
4  2013-03-01   15:10:00  F9 v1.0  B0007  CCAFS LC-40   
5  2013-12-03   22:41:00         F9 v1.1  CCAFS LC-40   
6  2014-01-06   22:06:00         F9 v1.1  CCAFS LC-40   
7  2014-04-18   19:25:00         F9 v1.1  CCAFS LC-40   
8  2014-07-14   15:15:00         F9 v1.1  CCAFS LC-40   
9  2014-08-05    8:00:00         F9 v1.1  CCAFS LC-40   

                                             Payload  PAYLOAD_MASS__KG_  \
0               Dragon Spacecraft Qualification Unit                  0   
1  Dragon demo flight C1, two CubeSats, barrel of...                  0   
2                              Dragon demo flight C2                525   
3                                       SpaceX CRS-1                500 

##### Total payload mass carried by boosters launched by NASA (CRS)

In [6]:
total_payload_mass_CRS_query = "SELECT SUM(PAYLOAD_MASS__KG_) FROM SPACEXTABLE_CLEAN WHERE Customer = 'NASA (CRS)'"
print('TOTAL PAYLOAD: ', pd.read_sql_query(total_payload_mass_CRS_query, con).iloc[0,0])

TOTAL PAYLOAD:  45596
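
The customer name in the query above is written directly into the SQL string. If the same total is needed for other customers, a bound parameter avoids rebuilding the string each time. The following is a minimal sketch under stated assumptions: the in-memory table holds a few made-up rows, and `con_demo` and `total_payload_for` are hypothetical names introduced for illustration.

```python
import sqlite3

# In-memory database with a few hypothetical rows (not the real dataset).
con_demo = sqlite3.connect(":memory:")
con_demo.execute(
    "CREATE TABLE SPACEXTABLE_CLEAN (Customer TEXT, PAYLOAD_MASS__KG_ INTEGER)"
)
con_demo.executemany(
    "INSERT INTO SPACEXTABLE_CLEAN VALUES (?, ?)",
    [("NASA (CRS)", 500), ("NASA (CRS)", 677), ("SpaceX", 0)],
)


def total_payload_for(con, customer):
    """Sum payload mass for one customer, passing the name as a bound parameter."""
    row = con.execute(
        "SELECT SUM(PAYLOAD_MASS__KG_) FROM SPACEXTABLE_CLEAN WHERE Customer = ?",
        (customer,),
    ).fetchone()
    return row[0]  # SUM returns NULL (None in Python) when no rows match


total_crs = total_payload_for(con_demo, "NASA (CRS)")
print("TOTAL PAYLOAD: ", total_crs)
```

With pandas, the same `?` placeholder can be filled through the `params` argument of `pd.read_sql_query`.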


##### Average payload mass carried by booster version F9 v1.1


In [7]:
average_payload_mass_F9_query = "SELECT AVG(PAYLOAD_MASS__KG_) FROM SPACEXTABLE WHERE Booster_Version LIKE '%F9 v1.1%'"
print('AVERAGE PAYLOAD: ', pd.read_sql_query(average_payload_mass_F9_query, con).iloc[0,0])

AVERAGE PAYLOAD:  2534.6666666666665


##### Date when the first successful landing outcome (ground pad) was achieved

In [8]:
first_successful_landing_date_query = "SELECT MIN(Date) FROM SPACEXTABLE_CLEAN WHERE Landing_Outcome = 'Success (ground pad)'"
print('DATE: ', pd.read_sql_query(first_successful_landing_date_query, con).iloc[0,0])

DATE:  2015-12-22


##### Boosters that landed successfully on a drone ship with a payload mass between 4000 and 6000 kg

In [9]:
successful_drone_ship_names_query = """
SELECT Booster_Version FROM SPACEXTABLE_CLEAN
WHERE Landing_Outcome = 'Success (drone ship)'
    AND PAYLOAD_MASS__KG_ > 4000
    AND PAYLOAD_MASS__KG_ < 6000;
"""

print(pd.read_sql_query(successful_drone_ship_names_query, con))

  Booster_Version
0     F9 FT B1022
1     F9 FT B1026
2  F9 FT  B1021.2
3  F9 FT  B1031.2
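
The query above uses strict comparisons (`>` and `<`), so a payload of exactly 4000 or 6000 kg is excluded. SQL's `BETWEEN` includes both endpoints. The sketch below contrasts the two on made-up rows in an in-memory database; the booster names `demo-A` to `demo-C` and the name `con_demo` are illustrative assumptions.

```python
import sqlite3

# In-memory database with hypothetical boosters sitting on and between the bounds.
con_demo = sqlite3.connect(":memory:")
con_demo.execute(
    "CREATE TABLE SPACEXTABLE_CLEAN (Booster_Version TEXT, PAYLOAD_MASS__KG_ INTEGER)"
)
con_demo.executemany(
    "INSERT INTO SPACEXTABLE_CLEAN VALUES (?, ?)",
    [("demo-A", 4000), ("demo-B", 5000), ("demo-C", 6000)],
)

# Strict comparison: the endpoints 4000 and 6000 are excluded.
strict_rows = con_demo.execute(
    "SELECT Booster_Version FROM SPACEXTABLE_CLEAN "
    "WHERE PAYLOAD_MASS__KG_ > 4000 AND PAYLOAD_MASS__KG_ < 6000 "
    "ORDER BY Booster_Version"
).fetchall()

# BETWEEN: both endpoints are included.
inclusive_rows = con_demo.execute(
    "SELECT Booster_Version FROM SPACEXTABLE_CLEAN "
    "WHERE PAYLOAD_MASS__KG_ BETWEEN 4000 AND 6000 "
    "ORDER BY Booster_Version"
).fetchall()

print("strict:   ", [r[0] for r in strict_rows])
print("inclusive:", [r[0] for r in inclusive_rows])
```

Which form is appropriate depends on whether "between 4000 and 6000" is meant to include the bounds.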


##### Total number of successful and failed mission outcomes


In [10]:
successful_outcomes_query = """
SELECT COUNT(*)
FROM SPACEXTABLE
WHERE Mission_Outcome LIKE 'Success%'
"""

successful_missions = pd.read_sql_query(successful_outcomes_query, con)


failed_outcomes_query = """
SELECT COUNT(*)
FROM SPACEXTABLE
WHERE Mission_Outcome LIKE 'Failure%'
"""

failed_missions = pd.read_sql_query(failed_outcomes_query, con)


print('Successful Missions: ', successful_missions.iloc[0,0])
print('Failed Missions: ', failed_missions.iloc[0,0])


Successful Missions:  100
Failed Missions:  1
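
The two counts above come from two separate queries. As an alternative, both can be produced in one pass by mapping each outcome to a label and grouping on it. This is a minimal sketch on made-up rows in an in-memory database; `con_demo`, the column aliases `Outcome` and `Missions`, and the sample outcome strings are assumptions for illustration.

```python
import sqlite3

# In-memory database with hypothetical mission outcomes (not the real dataset).
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE SPACEXTABLE_CLEAN (Mission_Outcome TEXT)")
con_demo.executemany(
    "INSERT INTO SPACEXTABLE_CLEAN VALUES (?)",
    [
        ("Success",),
        ("Success",),
        ("Success (payload status unclear)",),
        ("Failure (in flight)",),
    ],
)

# Collapse every outcome into 'Success' or 'Failure', then count each group.
outcome_counts_query = """
SELECT
    CASE WHEN Mission_Outcome LIKE 'Success%' THEN 'Success' ELSE 'Failure' END AS Outcome,
    COUNT(*) AS Missions
FROM SPACEXTABLE_CLEAN
WHERE Mission_Outcome LIKE 'Success%' OR Mission_Outcome LIKE 'Failure%'
GROUP BY Outcome
"""
outcome_counts = dict(con_demo.execute(outcome_counts_query).fetchall())

print("Successful Missions: ", outcome_counts.get("Success", 0))
print("Failed Missions: ", outcome_counts.get("Failure", 0))
```

The `WHERE` clause keeps the same two `LIKE` patterns as the original queries, so outcomes matching neither pattern are still left out.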


##### All booster versions that have carried the maximum payload mass

In [11]:
max_payload_mass = """
SELECT "Booster_Version"
FROM SPACEXTABLE_CLEAN
WHERE "PAYLOAD_MASS__KG_" = (
    SELECT MAX("PAYLOAD_MASS__KG_")
    FROM SPACEXTABLE_CLEAN
);
"""

print(pd.read_sql_query(max_payload_mass, con))

   Booster_Version
0    F9 B5 B1048.4
1    F9 B5 B1049.4
2    F9 B5 B1051.3
3    F9 B5 B1056.4
4    F9 B5 B1048.5
5    F9 B5 B1051.4
6    F9 B5 B1049.5
7   F9 B5 B1060.2 
8   F9 B5 B1058.3 
9    F9 B5 B1051.6
10   F9 B5 B1060.3
11  F9 B5 B1049.7 


##### All failed drone ship landings in 2015, with month number, booster version, and launch site

In [12]:
failed_drone_ship_2015_query = """
SELECT strftime('%m', "Date") AS Month,
       "Landing_Outcome",
       "Booster_Version",
       "Launch_Site"
FROM SPACEXTABLE
WHERE "Landing_Outcome" = 'Failure (drone ship)'
  AND strftime('%Y', "Date") = '2015'
"""

print(pd.read_sql_query(failed_drone_ship_2015_query, con))

  Month       Landing_Outcome Booster_Version  Launch_Site
0    01  Failure (drone ship)   F9 v1.1 B1012  CCAFS LC-40
1    04  Failure (drone ship)   F9 v1.1 B1015  CCAFS LC-40


##### Ranking of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between 2010-06-04 and 2021-03-20, in descending order of count

In [13]:
landing_outcomes_ranking = """
SELECT 
    "Landing_Outcome", 
    COUNT(*) AS Outcome_Count,
    RANK() OVER (ORDER BY COUNT(*) DESC) AS Rank
FROM SPACEXTABLE_CLEAN
WHERE "Date" BETWEEN '2010-06-04' AND '2021-03-20'
GROUP BY "Landing_Outcome"
ORDER BY Rank;
"""

df_landing_outcomes = pd.read_sql_query(landing_outcomes_ranking, con)
print("{:<25} {:<15} {:<5}".format('Landing_Outcome', 'Outcome_Count', 'Rank'))
for i, row in df_landing_outcomes.iterrows():
    print("{:<25} {:<15} {:<5}".format(row['Landing_Outcome'], row['Outcome_Count'], row['Rank']))

Landing_Outcome           Outcome_Count   Rank 
Success                   38              1    
No attempt                21              2    
Success (drone ship)      14              3    
Success (ground pad)      9               4    
Failure (drone ship)      5               5    
Controlled (ocean)        5               5    
Failure                   3               7    
Uncontrolled (ocean)      2               8    
Failure (parachute)       2               8    
Precluded (drone ship)    1               10   
No attempt                1               10   
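
`No attempt` appears twice in the ranking above, at ranks 2 and 10. One possible cause, assumed here rather than confirmed from the data, is that some labels carry stray leading or trailing whitespace, so `GROUP BY` treats them as different values. The sketch below groups on `TRIM(Landing_Outcome)` instead. It runs on made-up rows in an in-memory database, and `con_demo`, `Outcome`, and `Outcome_Rank` are hypothetical names.

```python
import sqlite3

# In-memory database with hypothetical labels; one 'No attempt' has a trailing space.
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE SPACEXTABLE_CLEAN (Landing_Outcome TEXT)")
con_demo.executemany(
    "INSERT INTO SPACEXTABLE_CLEAN VALUES (?)",
    [("No attempt",), ("No attempt ",), ("Success",), ("Success",), ("Success",)],
)

# Group on the trimmed label so whitespace variants are counted together.
# Window functions such as RANK() need SQLite 3.25 or newer.
trimmed_ranking_query = """
SELECT
    TRIM(Landing_Outcome) AS Outcome,
    COUNT(*) AS Outcome_Count,
    RANK() OVER (ORDER BY COUNT(*) DESC) AS Outcome_Rank
FROM SPACEXTABLE_CLEAN
GROUP BY TRIM(Landing_Outcome)
ORDER BY Outcome_Rank
"""
trimmed_ranking = con_demo.execute(trimmed_ranking_query).fetchall()

for outcome, count, rank in trimmed_ranking:
    print("{:<25} {:<15} {:<5}".format(outcome, count, rank))
```

If the duplicate rows in the real table differ for another reason, such as different spelling, `TRIM` alone will not merge them.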
