## Project Findings Notes

### EDA Findings
* SpaceX's landing success rate was 66.67%, twice its failure rate of 33.33%. (bar chart of percentages)
* Over the years, the number of successes increased and the number of failures decreased. This may reflect experience and lessons learned from previous launches. (line plot of successes and failures over time)
* Launch sites differ in success rate: CCAFS SLC-40 has a success rate of 60%, while KSC LC-39A and VAFB SLC-4E each have a success rate of 77%. (We should make a bar chart to show this.)
* As the number of flights increases at the CCAFS SLC-40 launch site, the number of successful landings increases as well, and the number of failed landings decreases. The same holds for the other launch sites.
* Payload mass does not differ much across launches from the CCAFS SLC-40 launch site. However, three of the launches with the highest payloads were successful. (scatter plot between “Payload Mass” and “Launch Site” by Class)
* Related to the previous point: in the Payload vs. Launch Site scatter plot, the VAFB SLC-4E launch site has no launches with a heavy payload mass (greater than 10,000 kg).
* Most low Earth orbits, such as ISS, VLEO, PO, LEO, and SSO, saw a large number of successes. However, most of the company's missions targeted low Earth orbits to begin with. GTO, which is a high Earth orbit, had more missions than GEO, but GTO had a landing success rate of only 51.85%. (bar chart of “Orbit” counts by Class)
* In LEO, success appears related to the flight number; in GTO, on the other hand, there seems to be no relationship between flight number and success. (scatter plot between “Flight Number” and “Orbit” by Class)
* KSC LC-39A is the site with the largest share of successful launches, 41.2% (meaning that, of all missions whose first stage landed successfully, 41.2% were launched from this site). (pie chart from the dashboard)
* KSC LC-39A is also the site with the highest launch success rate: 76.9% of its launches succeeded and 23.1% failed. (pie chart from the dashboard)
* The payload range with the highest number of successful launches is 0 to 10,000 kg, with 24 successful launches. The range with the highest success rate is 3,000 to 4,000 kg, at 72.73%, with 8 successful launches and just 3 failed ones. (pie chart from the dashboard)
* The payload ranges with the fewest successful launches are (6,000 - 7,000), (6,000 - 8,000), and (6,000 - 9,000) kg, each with 0 successes and 4 failures, giving a 0% success rate.
* The Booster Version Category with the highest success rate is B5, at 100%. However, it was used in only 1 mission, so we dropped it and considered the remaining categories. Among those, FT has the highest success rate, at 66.67%. (from the dashboard)

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px

In [2]:
spacex_df = pd.read_csv('data/spacex_launch_dash.csv')
spacex_df.head(2)

Unnamed: 0.1,Unnamed: 0,Flight Number,Launch Site,class,Payload Mass (kg),Booster Version,Booster Version Category
0,0,1,CCAFS LC-40,0,0.0,F9 v1.0 B0003,v1.0
1,1,2,CCAFS LC-40,0,0.0,F9 v1.0 B0004,v1.0


In [3]:
spacex_df.loc[spacex_df['Launch Site'] == 'CCAFS LC-40', 'class'].value_counts(normalize=True)

class
0    0.730769
1    0.269231
Name: proportion, dtype: float64

In [4]:
spacex_df.loc[spacex_df['Launch Site'] == 'KSC LC-39A', 'class'].value_counts(normalize=True).sort_index().reset_index()

Unnamed: 0,class,proportion
0,0,0.230769
1,1,0.769231


In [12]:
fig = px.scatter(spacex_df, x='Payload Mass (kg)', y='class',
                 color='Booster Version Category'
                 )
fig.update_traces(marker_size=10)
fig.show()

In [13]:
filtered = spacex_df.loc[spacex_df['Launch Site'] == 'CCAFS LC-40']
filtered.head()

Unnamed: 0.1,Unnamed: 0,Flight Number,Launch Site,class,Payload Mass (kg),Booster Version,Booster Version Category
0,0,1,CCAFS LC-40,0,0.0,F9 v1.0 B0003,v1.0
1,1,2,CCAFS LC-40,0,0.0,F9 v1.0 B0004,v1.0
2,2,3,CCAFS LC-40,0,525.0,F9 v1.0 B0005,v1.0
3,3,4,CCAFS LC-40,0,500.0,F9 v1.0 B0006,v1.0
4,4,5,CCAFS LC-40,0,677.0,F9 v1.0 B0007,v1.0


In [14]:
fig = px.scatter(filtered, x='Payload Mass (kg)', y='class',
                 color='Booster Version Category'
                 )
fig.update_traces(marker_size=10)
fig.show()

In [16]:
list(spacex_df['Launch Site'].unique())

['CCAFS LC-40', 'VAFB SLC-4E', 'KSC LC-39A', 'CCAFS SLC-40']

## Finding Insights Visually

Now with the dashboard completed, you should be able to use it to analyze SpaceX launch data, and answer the following questions:

* Which site has the largest successful launches?
    * **KSC LC-39A** with **41.2%** of the total successful launches made by SpaceX
* Which site has the highest launch success rate?
    * **KSC LC-39A** with **76.9%** successful launches versus 23.1% failed ones
* Which payload range(s) has the highest launch success rate?
    * The range with the highest number of successful launches is 0 to 10,000 kg, with 24. The range with the highest success rate is 3,000 to 4,000 kg, at 72.73%, with 8 successful launches and just 3 failed ones.
* Which payload range(s) has the lowest launch success rate?
    * The ranges with the fewest successful launches are (6,000 - 7,000), (6,000 - 8,000), and (6,000 - 9,000) kg, each with 0 successes and 4 failures, giving a 0% success rate.
* Which F9 Booster version (v1.0, v1.1, FT, B4, B5, etc.) has the highest launch success rate?

In [13]:
print(list(range(0, 10000+1, 1000)))

[0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]


In [21]:
import itertools

payload_mass_ranges = list(range(0, 10000+1, 1000))

pairs = list(itertools.combinations(payload_mass_ranges, 2))

type(pairs)

list

In [33]:
print(pairs)

[(0, 1000), (0, 2000), (0, 3000), (0, 4000), (0, 5000), (0, 6000), (0, 7000), (0, 8000), (0, 9000), (0, 10000), (1000, 2000), (1000, 3000), (1000, 4000), (1000, 5000), (1000, 6000), (1000, 7000), (1000, 8000), (1000, 9000), (1000, 10000), (2000, 3000), (2000, 4000), (2000, 5000), (2000, 6000), (2000, 7000), (2000, 8000), (2000, 9000), (2000, 10000), (3000, 4000), (3000, 5000), (3000, 6000), (3000, 7000), (3000, 8000), (3000, 9000), (3000, 10000), (4000, 5000), (4000, 6000), (4000, 7000), (4000, 8000), (4000, 9000), (4000, 10000), (5000, 6000), (5000, 7000), (5000, 8000), (5000, 9000), (5000, 10000), (6000, 7000), (6000, 8000), (6000, 9000), (6000, 10000), (7000, 8000), (7000, 9000), (7000, 10000), (8000, 9000), (8000, 10000), (9000, 10000)]


In [108]:
mydict = {'Range': [], 'sucess': [], 'failure': []}

for i in pairs:
    mydict['Range'].append(str(i))
    mydict['sucess'].append(0)
    mydict['failure'].append(0)
    
df = pd.DataFrame(mydict)

In [105]:
df.head(2)

Unnamed: 0,Range,sucess,failure
0,"(0, 1000)",0,0
1,"(0, 2000)",0,0


In [109]:
# Count successes and failures for every (low, high) payload range.
for j, (low, high) in enumerate(pairs):
    for mass, launch_class in zip(spacex_df['Payload Mass (kg)'], spacex_df['class']):
        # Half-open interval: low <= mass < high
        if low <= mass < high:
            if launch_class == 1:
                df.iloc[j, 1] += 1  # column 1 is 'sucess'
            else:
                df.iloc[j, 2] += 1  # column 2 is 'failure'
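
The counting loop above visits every launch once per range pair. A possible alternative, sketched here under assumptions, replaces the inner loop with a boolean mask and builds the table in one pass. The `toy_df` data and the smaller edge list are hypothetical, and the column is spelled `success` here rather than `sucess` as in the notebook.

```python
import itertools

import pandas as pd

# Hypothetical toy data standing in for spacex_df; the values are made up.
toy_df = pd.DataFrame({
    "Payload Mass (kg)": [500.0, 1500.0, 2500.0, 3500.0, 3500.0],
    "class": [0, 1, 0, 1, 1],
})

# Smaller edge list than the notebook's 0..10000, to keep the example short.
edges = list(range(0, 4000 + 1, 1000))
pairs = list(itertools.combinations(edges, 2))

mass = toy_df["Payload Mass (kg)"]
rows = []
for low, high in pairs:
    # Half-open interval [low, high), the same rule as the loop above.
    in_range = (mass >= low) & (mass < high)
    n_success = int(toy_df.loc[in_range, "class"].sum())
    n_failure = int(in_range.sum()) - n_success
    rows.append({"Range": str((low, high)),
                 "success": n_success,
                 "failure": n_failure})

range_counts = pd.DataFrame(rows)
print(range_counts)
```

Building the rows in a list and creating the frame once also avoids repeated `iloc` assignments, which tend to be slow on larger data.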
            

In [119]:
df.sort_values(by='sucess', ascending=False)

Unnamed: 0,Range,sucess,failure,success_rate
9,"(0, 10000)",24,32,0.4286
18,"(1000, 10000)",22,24,0.4783
26,"(2000, 10000)",21,22,0.4884
5,"(0, 6000)",21,26,0.4468
6,"(0, 7000)",21,30,0.4118
7,"(0, 8000)",21,30,0.4118
8,"(0, 9000)",21,30,0.4118
16,"(1000, 8000)",19,22,0.4634
4,"(0, 5000)",19,23,0.4524
17,"(1000, 9000)",19,22,0.4634


In [117]:
df['success_rate'] = round(df['sucess'] / (df['sucess'] + df['failure']), 4)

In [118]:
df.sort_values(by='success_rate', ascending=False)

Unnamed: 0,Range,sucess,failure,success_rate
27,"(3000, 4000)",8,3,0.7273
20,"(2000, 4000)",13,8,0.619
54,"(9000, 10000)",3,2,0.6
53,"(8000, 10000)",3,2,0.6
51,"(7000, 10000)",3,2,0.6
12,"(1000, 4000)",14,10,0.5833
28,"(3000, 5000)",11,8,0.5789
21,"(2000, 5000)",16,13,0.5517
29,"(3000, 6000)",13,11,0.5417
13,"(1000, 5000)",17,15,0.5312
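
Because the ranges above overlap (every pair of edges is a range), the same launches are counted in many rows. If non-overlapping 1,000 kg bins are enough, one option is `pd.cut`, sketched below. This is an alternative to the notebook's approach and not what it does; `toy_df`, `bin_edges`, and `payload_bin` are hypothetical names with made-up values.

```python
import pandas as pd

# Hypothetical toy data standing in for spacex_df; the values are made up.
toy_df = pd.DataFrame({
    "Payload Mass (kg)": [0.0, 500.0, 1500.0, 2500.0, 3500.0, 3500.0],
    "class": [0, 0, 1, 0, 1, 1],
})

bin_edges = list(range(0, 4000 + 1, 1000))

# right=False gives half-open bins [low, high), so a 0 kg payload
# falls into the first bin instead of being dropped.
toy_df["payload_bin"] = pd.cut(
    toy_df["Payload Mass (kg)"], bins=bin_edges, right=False
)

bin_stats = (
    toy_df.groupby("payload_bin", observed=False)["class"]
    .agg(n_launches="count", n_successes="sum", success_rate="mean")
    .reset_index()
)
print(bin_stats)
```

With disjoint bins, each launch contributes to exactly one row, which makes the "highest success rate" comparison easier to interpret.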


### Which F9 Booster version (v1.0, v1.1, FT, B4, B5, etc.) has the highest launch success rate?

In [5]:
spacex_df.head(2)

Unnamed: 0.1,Unnamed: 0,Flight Number,Launch Site,class,Payload Mass (kg),Booster Version,Booster Version Category
0,0,1,CCAFS LC-40,0,0.0,F9 v1.0 B0003,v1.0
1,1,2,CCAFS LC-40,0,0.0,F9 v1.0 B0004,v1.0


In [13]:
# Make a column class_label from the class column
# 1 replaces by success and 0 by failure
spacex_df['class_label'] = spacex_df['class'].replace(1, 'success').replace(0, 'failure')
spacex_df.head()

Unnamed: 0.1,Unnamed: 0,Flight Number,Launch Site,class,Payload Mass (kg),Booster Version,Booster Version Category,class_label
0,0,1,CCAFS LC-40,0,0.0,F9 v1.0 B0003,v1.0,failure
1,1,2,CCAFS LC-40,0,0.0,F9 v1.0 B0004,v1.0,failure
2,2,3,CCAFS LC-40,0,525.0,F9 v1.0 B0005,v1.0,failure
3,3,4,CCAFS LC-40,0,500.0,F9 v1.0 B0006,v1.0,failure
4,4,5,CCAFS LC-40,0,677.0,F9 v1.0 B0007,v1.0,failure


In [16]:
# Finds the counts for successful and failed launches by Booster Version Category
success_failure_by_booster = pd.pivot_table(spacex_df, index='Booster Version Category', columns='class_label', values='class', aggfunc='count').reset_index()
# Rename the columns
success_failure_by_booster.rename(columns={'failure': 'n_failures', 'success': 'n_successes'}, inplace=True)

In [17]:
# Finds success rate by Booster Version Category
success_rate_by_booster = spacex_df.groupby('Booster Version Category')[['class']].mean().reset_index().sort_values('class', ascending=False)
# Rename column class to success_rate
success_rate_by_booster.rename(columns={'class': 'success_rate'}, inplace=True)

In [18]:
# Merge the two dataframes
booster_success_launch_data = success_failure_by_booster.merge(success_rate_by_booster, on='Booster Version Category')
# Preview the data
booster_success_launch_data

Unnamed: 0,Booster Version Category,n_failures,n_successes,success_rate
0,B4,5.0,6.0,0.545455
1,B5,,1.0,1.0
2,FT,8.0,16.0,0.666667
3,v1.0,5.0,,0.0
4,v1.1,14.0,1.0,0.066667
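
The findings drop B5 because it flew only once. As a rough sketch, that rule could be made explicit with a minimum-launch threshold, and the counts and rate could come from one `groupby` so that missing categories show 0 instead of NaN. The `toy_df` data and the `MIN_LAUNCHES` value are assumptions for illustration, not figures from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for spacex_df; the values are made up.
toy_df = pd.DataFrame({
    "Booster Version Category": ["FT", "FT", "FT", "B4", "B4", "B5", "v1.0"],
    "class": [1, 1, 0, 1, 0, 1, 0],
})

MIN_LAUNCHES = 2  # hypothetical threshold; categories below it are ignored

booster_stats = (
    toy_df.groupby("Booster Version Category")["class"]
    .agg(n_launches="count", n_successes="sum", success_rate="mean")
    .reset_index()
)
# Failures follow from the other two counts, so no NaN values appear.
booster_stats["n_failures"] = (
    booster_stats["n_launches"] - booster_stats["n_successes"]
)

# Keep only categories with enough launches, then take the best rate.
eligible = booster_stats[booster_stats["n_launches"] >= MIN_LAUNCHES]
best = eligible.sort_values("success_rate", ascending=False).iloc[0]
print(best["Booster Version Category"], round(best["success_rate"], 4))
```

Stating the threshold as a named constant keeps the exclusion of single-launch categories visible and easy to revisit.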
