# **SpaceLA** Project

* Description: an overview of space launches (1950-2023)
* Tools used: BS4, Pandas, Numpy, Plotly
* Goal: bring together the skills learned so far while having fun

In [None]:
# install required libraries
!pip install requests
!pip install bs4
!pip install pandas
!pip install numpy
!pip install plotly
!pip install openpyxl  # required by df.to_excel for .xlsx files

In [3]:
# --- Libraries --------------
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from itertools import chain
import plotly.express as px

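# project helpers used below (explicit names instead of a wildcard import)
from functions import scrapingByYear, fixDateFormat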

The user can choose which year to analyze!

In [53]:
year = input("Which year in spaceflight? Year: ")

Which year in spaceflight? Year: 1951
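
Since `input` returns a string, it may help to validate the value before scraping. A minimal sketch, assuming the 1950-2023 range stated in the project description; `parse_year` is a hypothetical helper, not part of `functions`.

```python
def parse_year(text, first=1950, last=2023):
    """Return the year as an int, or raise ValueError if it is not usable."""
    year = int(text.strip())  # raises ValueError for non-numeric input
    if not first <= year <= last:
        raise ValueError(f"year must be between {first} and {last}, got {year}")
    return year


print(parse_year(" 1951 "))
```

In the notebook this could wrap the `input(...)` call, asking again whenever a `ValueError` is raised.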


In [54]:
# download data and headers
data, headers = scrapingByYear(year)

In [55]:
# join every 2 consecutive lists into a single list (one record)
merged_data = [list(chain(*data[i:i + 2])) for i in range(0, len(data), 2)]
# keep only the records with the expected number of columns (drops rows that are not launches)
filtered_data = [lst for lst in merged_data if len(lst) == 11]
data = filtered_data
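
To make the merge-and-filter step easier to follow, here is the same pattern on toy data. The rows below are made up for illustration (4 columns per record instead of the 11 used above).

```python
from itertools import chain

# Toy stand-in for scraped rows: each launch is split across two consecutive rows.
rows = [
    ["18 January", "V-2"], ["White Sands", "Launch failure"],
    ["22 January", "Aerobee"], ["White Sands", "Successful"],
    ["footnote"],  # stray row that does not belong to any launch
]

group_size = 2        # consecutive rows per record
expected_columns = 4  # 11 in the notebook, 4 for this toy data

# Flatten each group of rows into one record.
merged = [list(chain(*rows[i:i + group_size])) for i in range(0, len(rows), group_size)]
# Keep only the records that have the expected number of columns.
records = [rec for rec in merged if len(rec) == expected_columns]

print(records)
```

The stray `["footnote"]` row forms a group of its own with a single column, so the length filter drops it.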

In [56]:
# pandas DataFrame
df = pd.DataFrame(data, columns = headers[:-1])

# normalize the date format (only a partial fix for now)
fixDateFormat(df)

# save to excel file to better visualize data
df.to_excel("output.xlsx")
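
The scraped dates carry no year (for example `18 January 20:14` or `29 January`). One possible way to turn them into full timestamps is sketched below. This is not the implementation of `fixDateFormat`; `parse_launch_date` is a hypothetical helper, and it assumes English month names (`%B` depends on the locale).

```python
from datetime import datetime


def parse_launch_date(text, year):
    """Parse '18 January 20:14' or '29 January' into a datetime for the given year."""
    stamped = f"{text.strip()} {year}"
    # Try the format with a time first, then the date-only format.
    for fmt in ("%d %B %H:%M %Y", "%d %B %Y"):
        try:
            return datetime.strptime(stamped, fmt)
        except ValueError:
            continue
    return None  # unrecognized format: leave it for manual inspection


print(parse_launch_date("18 January 20:14", 1951))
print(parse_launch_date("29 January", "1951"))
```

Returning `None` instead of raising keeps unusual entries visible as missing values rather than stopping the whole run.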

## Enter Pandas
#### Data Analysis with Pandas


In [57]:
# Exploring the dataset
df.head(5)

Unnamed: 0,Date and time (UTC),Rocket,Flight number,Launch site,LSP,Payload,Operator,Orbit,Function,Decay (UTC),Outcome
0,18 January 20:14,V-2,V-2 No. 54,White Sands LC-33,GE / US Army,,NRL,Suborbital,Cosmic Radiation / Solar UV / Solar X-Ray,18 January,Launch failure
1,22 January 22:55,Aerobee RTV-N-10,A19,White Sands LC-35,US Navy,,APL,Suborbital,Aeronomy,22 January,Successful
2,25 January 15:00,Aerobee RTV-N-10,A20,White Sands LC-35,US Navy,,APL,Suborbital,Ozone Aeronomy,25 January,Successful
3,29 January,R-1,,Kapustin Yar,OKB-1,,OKB-1,Suborbital,Missile test,29 January,Successful
4,30 January,R-1,,Kapustin Yar,OKB-1,,OKB-1,Suborbital,Missile test,30 January,Successful


In [58]:
# Keep diving in
df.tail()

Unnamed: 0,Date and time (UTC),Rocket,Flight number,Launch site,LSP,Payload,Operator,Orbit,Function,Decay (UTC),Outcome
56,27 September 00:06,Aerobee XASR-SC-1,SC 21,White Sands LC-35,US Army,,USASC / University of Michigan,Suborbital,Aeronomy,27 September,Successful
57,17 October 18:17,Aerobee RTV-A-1a,USAF 20,Holloman LC-A,US Air Force,,AFCRC / Boston University,Suborbital,Ionospheric,17 October,Successful
58,29 October 21:04,V-2,V-2 No. 60,White Sands LC-33,US Army,,USASC / University of Michigan,Suborbital,Aeronomy,29 October,Successful
59,1 November 09:46,Aerobee XASR-SC-1,SC 20,White Sands LC-35,US Army,Grenades,USASC,Suborbital,Aeronomy,1 November,Successful
60,3 November 00:35,Aerobee XASR-SC-1,SC 22,White Sands LC-35,US Army,Grenades,USASC,Suborbital,Aeronomy,3 November,Successful


In [59]:
# Print dtypes
print(df.dtypes)

Date and time (UTC)    object
Rocket                 object
Flight number          object
Launch site            object
LSP                    object
Payload                object
Operator               object
Orbit                  object
Function               object
Decay (UTC)            object
Outcome                object
dtype: object


In [60]:
# Total Launches
print("\n--- Total Launches ---")
totalLaunches = len(df.index)
print(totalLaunches)


--- Total Launches ---
61


In [61]:
# NaN Analysis --------------------------------
print("\n--- NaN Overview ---")
# Blank spaces are NaN
df = df.replace(r'^\s*$', np.nan, regex=True)

# Print total NaN count
nanTotal = df.isnull().sum().sum()
print("Total: ", nanTotal)
print("% of cells: ", round(100 * nanTotal / (df.shape[0] * df.shape[1]), 2), "%")


--- NaN Overview ---
Total:  102
% of cells:  15.2 %


In [62]:
# Print NaN by column
print(pd.isnull(df).sum()[pd.isnull(df).sum() > 0].to_string())

Flight number    36
Payload          53
Outcome          13


# Data Visualization

In [65]:
#print("\n--- Function Overview ---")
functionCount = df['Function'].value_counts()
filtered_functionCount = functionCount[functionCount >= 5]
#print(filtered_functionCount.to_string())

#print("\n--- Orbit Overview ---")
orbitCount = df['Orbit'].value_counts()
filtered_orbitCount = orbitCount[orbitCount >= 5]
#print(filtered_orbitCount.to_string())

In [66]:
# Function bar chart
colors = ['blue', 'red', 'green', 'yellow', 'purple']
# Resize list to match len of data
colors = colors * (len(filtered_functionCount) // len(colors)) + colors[:len(filtered_functionCount) % len(colors)]


fig = px.bar(x=filtered_functionCount.index, 
             y=filtered_functionCount.values,
             color=colors,
             color_discrete_map='identity',  # use the values as literal color names
             labels={'x':'Function', 'y':'Count'},
             title='Function of the Launches')
fig.show()
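
The palette-resizing expression in the cell above can also be written with `itertools`. A small sketch; `repeat_palette` is a hypothetical helper name.

```python
from itertools import cycle, islice


def repeat_palette(palette, n):
    """Return the first n colors obtained by cycling through the palette."""
    return list(islice(cycle(palette), n))


print(repeat_palette(['blue', 'red', 'green'], 5))
```

This gives one color per bar whatever the number of categories, without the integer-division arithmetic.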

In [67]:
# NaN pie chart
# NaN sum for each column
nan_counts = pd.isnull(df).sum()
# keep only the columns with at least one NaN
nan_counts = nan_counts[nan_counts > 0]

# pie chart
fig = px.pie(names=nan_counts.index, 
             values=nan_counts.values, 
             title='NaN counts in each column')
fig.show()

In [68]:
# Outcome pie chart
# Count the occurrences of each unique value in the 'Outcome' column
outcome_counts = df['Outcome'].value_counts()

# Pie chart
fig = px.pie(names=outcome_counts.index, 
             values=outcome_counts.values, 
             title='Outcomes of space launches')
fig.show()

## Possible upgrades

> the user can enter a range of years

Note: how to handle a list of DataFrames, or how to merge several DataFrames while preserving the year information (e.g. in a new column)
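
One possible approach to the note above is to tag each per-year DataFrame with a `Year` column and then concatenate them. A minimal sketch on made-up data; `combine_years` is a hypothetical helper, and in the notebook each frame would come from running the scraping and cleaning steps once per year.

```python
import pandas as pd


def combine_years(frames_by_year):
    """Concatenate per-year DataFrames, adding a 'Year' column to each one."""
    tagged = [frame.assign(Year=year) for year, frame in frames_by_year.items()]
    return pd.concat(tagged, ignore_index=True)


# Toy frames standing in for the scraped data of two years.
frames = {
    1951: pd.DataFrame({"Rocket": ["V-2", "R-1"],
                        "Outcome": ["Launch failure", "Successful"]}),
    1952: pd.DataFrame({"Rocket": ["Aerobee"], "Outcome": ["Successful"]}),
}

combined = combine_years(frames)
print(combined.groupby("Year")["Rocket"].count())
```

With the year stored as a column, the existing `value_counts` and Plotly cells could be grouped or colored by `Year` without handling a list of DataFrames.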
