<a href="https://colab.research.google.com/github/natnew/Python-Projects-Collecting-and-Manipulating-Data/blob/main/Collecting_and_Manipulating_For_Rocket_Launches_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

Weather is tracked and analyzed every day to help aircraft fly safely. Many weather conditions must be monitored to keep the likelihood of an incident as low as possible. With rocket launches, the stakes are higher: even a small amount of misread or untracked data can have devastating consequences. (Microsoft 2021)

This module will look at:



* Conditions (cloudy, partly cloudy, fair, rain, thunder, heavy storm)
* Temperature
* Humidity
* Wind speed
* Wind direction
* Precipitation
* Visibility
* Sea level
* Pressure



## Questions

"What day in X number of years will be least likely to require a launch push due to weather?"

"Will the weather in this area at this time cause any potential issues for the launch?"

## Collect Data

The data we will explore is weather data from NOAA and Weather Underground for the dates of NASA rocket launches. The launch dates are taken from the list of NASA missions Wikipedia page.

You can find the data here: <br>
https://en.wikipedia.org/wiki/List_of_NASA_missions <br>
https://www.noaa.gov/ <br>
https://www.wunderground.com/history

## Missing Data

The Excel file has extensive data about each launch. However, as you start to explore this data, you might find a significant problem. Only one row represents a rocket launch that was supposed to happen but was pushed because of weather concerns:

Row 294 - Space X Dragon - May 27, 2020 (Microsoft 2021)
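
As a rough sketch of how this imbalance could be checked, the snippet below counts launched versus postponed attempts. The `sample` DataFrame is a hypothetical stand-in for a few rows of the spreadsheet, and it assumes the postponed launch is marked `N` in the `Launched?` column, which the source does not state explicitly.

```python
import pandas as pd

# Hypothetical stand-in for a few raw rows (before any cleaning).
# Assumption: a postponed launch carries "N" in the "Launched?" column.
sample = pd.DataFrame({
    "Name": [None, "Pioneer 3", "Space X Dragon", None],
    "Date": pd.to_datetime(["1958-12-05", "1958-12-06", "2020-05-27", "2020-05-28"]),
    "Launched?": [None, "Y", "N", None],
})

# Rows with a mission name are launch attempts; the rest are surrounding days.
attempts = sample[sample["Name"].notna()]

# Count launched (Y) versus postponed (N) attempts.
outcome_counts = attempts["Launched?"].value_counts()
print(outcome_counts)

# List the postponed attempts on their own.
postponed = attempts[attempts["Launched?"] == "N"]
print(postponed[["Name", "Date"]])
```

On the full dataset the same filter would be applied to `launch_data` before the missing values are filled, since filling turns every non-launch day into an `N` as well.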

# Import Libraries

In [1]:
# Pandas library is used for handling tabular data
import pandas as pd

# NumPy is used for handling numerical series operations (addition, multiplication, and ...)
import numpy as np

# Sklearn library contains all the machine learning packages we need to digest and extract patterns from the data
from sklearn import linear_model, model_selection, metrics
from sklearn.model_selection import train_test_split

# Machine learning libraries used to build a decision tree
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

# Sklearn's preprocessing library is used for processing and cleaning the data 
from sklearn import preprocessing

# for visualizing the tree
import pydotplus
from IPython.display import Image

# Read Data into a variable

In [2]:
launch_data = pd.read_excel('RocketLaunchDataCompleted.xlsx')
launch_data.head()

Unnamed: 0,Name,Date,Time (East Coast),Location,Crewed or Uncrewed,Launched?,High Temp,Low Temp,Ave Temp,Temp at Launch Time,Hist High Temp,Hist Low Temp,Hist Ave Temp,Percipitation at Launch Time,Hist Ave Percipitation,Wind Direction,Max Wind Speed,Visibility,Wind Speed at Launch Time,Hist Ave Max Wind Speed,Hist Ave Visibility,Sea Level Pressure,Hist Ave Sea Level Pressure,Day Length,Condition,Notes
0,,1958-12-04,,Cape Canaveral,,,75.0,68.0,71.0,,75.0,55.0,65.0,0.0,0.08,E,16.0,15.0,,,,30.22,,10:26:00,Cloudy,
1,,1958-12-05,,Cape Canaveral,,,78.0,70.0,73.39,,75.0,55.0,65.0,0.0,0.09,E,14.0,10.0,,,,30.2,,10:26:00,Cloudy,
2,Pioneer 3,1958-12-06,01:45:00,Cape Canaveral,Uncrewed,Y,73.0,0.0,60.21,62.0,75.0,55.0,65.0,0.0,0.09,NE,15.0,10.0,11.0,,,30.25,,10:25:00,Cloudy,
3,,1958-12-07,,Cape Canaveral,,,76.0,57.0,66.04,,75.0,55.0,65.0,0.0,0.08,N,10.0,10.0,,,,30.28,,10:25:00,Partly Cloudy,
4,,1958-12-08,,Cape Canaveral,,,79.0,60.0,70.52,,75.0,55.0,65.0,0.0,0.09,E,12.0,10.0,,,,30.23,,12:24:00,Partly Cloudy,


# Explore Data

In [4]:
launch_data.columns

Index(['Name', 'Date', 'Time (East Coast)', 'Location', 'Crewed or Uncrewed',
       'Launched?', 'High Temp', 'Low Temp', 'Ave Temp', 'Temp at Launch Time',
       'Hist High Temp', 'Hist Low Temp', 'Hist Ave Temp',
       'Percipitation at Launch Time', 'Hist Ave Percipitation',
       'Wind Direction', 'Max Wind Speed', 'Visibility',
       'Wind Speed at Launch Time', 'Hist Ave Max Wind Speed',
       'Hist Ave Visibility', 'Sea Level Pressure',
       'Hist Ave Sea Level Pressure', 'Day Length', 'Condition', 'Notes'],
      dtype='object')

# Data Cleaning

In [5]:
launch_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 26 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   Name                          60 non-null     object        
 1   Date                          300 non-null    datetime64[ns]
 2   Time (East Coast)             59 non-null     object        
 3   Location                      300 non-null    object        
 4   Crewed or Uncrewed            60 non-null     object        
 5   Launched?                     60 non-null     object        
 6   High Temp                     299 non-null    float64       
 7   Low Temp                      299 non-null    float64       
 8   Ave Temp                      299 non-null    float64       
 9   Temp at Launch Time           59 non-null     float64       
 10  Hist High Temp                299 non-null    float64       
 11  Hist Low Temp                 29

Observation: <br>
* Hist Ave Max Wind Speed, Hist Ave Visibility, and Hist Ave Sea Level Pressure have no data. 
* Wind Speed at Launch Time, Temp at Launch Time, Launched?, Crewed or Uncrewed, Time (East Coast), and Name have at most 60 values, because the data includes only 60 launches. 
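
The same observations can be read off more directly by counting missing values per column. The snippet below is a minimal sketch on a hypothetical three-row `sample`; on the real data the calls would be made on `launch_data`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the launch table: one launch day among three days.
sample = pd.DataFrame({
    "Name": [None, "Pioneer 3", None],
    "High Temp": [78.0, 73.0, 76.0],
    "Temp at Launch Time": [np.nan, 62.0, np.nan],
    "Hist Ave Visibility": [np.nan, np.nan, np.nan],
})

# Number of missing values per column, largest first.
missing_counts = sample.isna().sum().sort_values(ascending=False)
print(missing_counts)

# Columns with no data at all are candidates for dropping rather than filling.
empty_columns = sample.columns[sample.isna().all()].tolist()
print(empty_columns)
```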

Ways that data will be cleaned: <br>
* Rows without a Y in the Launched? column didn't have a rocket launch, so set their missing values to N.
* For rows missing information on whether the rocket was crewed or uncrewed, assume uncrewed. Uncrewed is more likely because there were fewer crewed missions.
* For missing wind direction, mark it as unknown.
* For missing condition data, assume it was a typical day and use fair.
* For any other missing data, use a value of 0.

In [6]:
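## To handle missing values, we will fill the missing values with appropriate values 
# Assign each result back rather than calling fillna(inplace=True) on a selected column:
# recent pandas versions warn about that pattern, and it may not update launch_data.
launch_data['Launched?'] = launch_data['Launched?'].fillna('N')
launch_data['Crewed or Uncrewed'] = launch_data['Crewed or Uncrewed'].fillna('Uncrewed')
launch_data['Wind Direction'] = launch_data['Wind Direction'].fillna('unknown')
launch_data['Condition'] = launch_data['Condition'].fillna('Fair')
# Every remaining missing value becomes 0
launch_data = launch_data.fillna(0)
launch_data.head()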

Unnamed: 0,Name,Date,Time (East Coast),Location,Crewed or Uncrewed,Launched?,High Temp,Low Temp,Ave Temp,Temp at Launch Time,Hist High Temp,Hist Low Temp,Hist Ave Temp,Percipitation at Launch Time,Hist Ave Percipitation,Wind Direction,Max Wind Speed,Visibility,Wind Speed at Launch Time,Hist Ave Max Wind Speed,Hist Ave Visibility,Sea Level Pressure,Hist Ave Sea Level Pressure,Day Length,Condition,Notes
0,0,1958-12-04,0,Cape Canaveral,Uncrewed,N,75.0,68.0,71.0,0.0,75.0,55.0,65.0,0.0,0.08,E,16.0,15.0,0.0,0.0,0.0,30.22,0.0,10:26:00,Cloudy,0
1,0,1958-12-05,0,Cape Canaveral,Uncrewed,N,78.0,70.0,73.39,0.0,75.0,55.0,65.0,0.0,0.09,E,14.0,10.0,0.0,0.0,0.0,30.2,0.0,10:26:00,Cloudy,0
2,Pioneer 3,1958-12-06,01:45:00,Cape Canaveral,Uncrewed,Y,73.0,0.0,60.21,62.0,75.0,55.0,65.0,0.0,0.09,NE,15.0,10.0,11.0,0.0,0.0,30.25,0.0,10:25:00,Cloudy,0
3,0,1958-12-07,0,Cape Canaveral,Uncrewed,N,76.0,57.0,66.04,0.0,75.0,55.0,65.0,0.0,0.08,N,10.0,10.0,0.0,0.0,0.0,30.28,0.0,10:25:00,Partly Cloudy,0
4,0,1958-12-08,0,Cape Canaveral,Uncrewed,N,79.0,60.0,70.52,0.0,75.0,55.0,65.0,0.0,0.09,E,12.0,10.0,0.0,0.0,0.0,30.23,0.0,12:24:00,Partly Cloudy,0


In [7]:
launch_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 26 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   Name                          300 non-null    object        
 1   Date                          300 non-null    datetime64[ns]
 2   Time (East Coast)             300 non-null    object        
 3   Location                      300 non-null    object        
 4   Crewed or Uncrewed            300 non-null    object        
 5   Launched?                     300 non-null    object        
 6   High Temp                     300 non-null    float64       
 7   Low Temp                      300 non-null    float64       
 8   Ave Temp                      300 non-null    float64       
 9   Temp at Launch Time           300 non-null    float64       
 10  Hist High Temp                300 non-null    float64       
 11  Hist Low Temp                 30

Observation: We now have a cleaner dataset to work with. 
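
As an alternative sketch, the cleaning rules listed earlier can be written as a single dictionary of per-column defaults passed to `fillna`, followed by a check that nothing is left missing. The two-row `sample` and the `defaults` name are hypothetical; only the column names and fill values come from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample using the notebook's column names.
sample = pd.DataFrame({
    "Launched?": [None, "Y"],
    "Crewed or Uncrewed": [None, "Uncrewed"],
    "Wind Direction": ["E", None],
    "Condition": [None, "Cloudy"],
    "Temp at Launch Time": [np.nan, 62.0],
})

# Per-column defaults, mirroring the cleaning rules described in the text.
defaults = {
    "Launched?": "N",
    "Crewed or Uncrewed": "Uncrewed",
    "Wind Direction": "unknown",
    "Condition": "Fair",
}

# fillna returns a new DataFrame; the second call covers everything left over.
cleaned = sample.fillna(value=defaults).fillna(0)

# Total number of missing values remaining after cleaning.
remaining_missing = int(cleaned.isna().sum().sum())
print(remaining_missing)
print(cleaned)
```

Keeping the defaults in one place makes it easier to revisit them later, for example when trying fill values other than 0.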

# Data Manipulation

In [8]:
## As part of the data cleaning process, we convert text data to numbers because scikit-learn models expect numeric input
label_encoder = preprocessing.LabelEncoder()

# Three columns have categorical text info, and we convert them to numbers
launch_data['Crewed or Uncrewed'] = label_encoder.fit_transform(launch_data['Crewed or Uncrewed'])
launch_data['Wind Direction'] = label_encoder.fit_transform(launch_data['Wind Direction'])
launch_data['Condition'] = label_encoder.fit_transform(launch_data['Condition'])

In [9]:
launch_data.head()

Unnamed: 0,Name,Date,Time (East Coast),Location,Crewed or Uncrewed,Launched?,High Temp,Low Temp,Ave Temp,Temp at Launch Time,Hist High Temp,Hist Low Temp,Hist Ave Temp,Percipitation at Launch Time,Hist Ave Percipitation,Wind Direction,Max Wind Speed,Visibility,Wind Speed at Launch Time,Hist Ave Max Wind Speed,Hist Ave Visibility,Sea Level Pressure,Hist Ave Sea Level Pressure,Day Length,Condition,Notes
0,0,1958-12-04,0,Cape Canaveral,1,N,75.0,68.0,71.0,0.0,75.0,55.0,65.0,0.0,0.08,0,16.0,15.0,0.0,0.0,0.0,30.22,0.0,10:26:00,0,0
1,0,1958-12-05,0,Cape Canaveral,1,N,78.0,70.0,73.39,0.0,75.0,55.0,65.0,0.0,0.09,0,14.0,10.0,0.0,0.0,0.0,30.2,0.0,10:26:00,0,0
2,Pioneer 3,1958-12-06,01:45:00,Cape Canaveral,1,Y,73.0,0.0,60.21,62.0,75.0,55.0,65.0,0.0,0.09,2,15.0,10.0,11.0,0.0,0.0,30.25,0.0,10:25:00,0,0
3,0,1958-12-07,0,Cape Canaveral,1,N,76.0,57.0,66.04,0.0,75.0,55.0,65.0,0.0,0.08,1,10.0,10.0,0.0,0.0,0.0,30.28,0.0,10:25:00,6,0
4,0,1958-12-08,0,Cape Canaveral,1,N,79.0,60.0,70.52,0.0,75.0,55.0,65.0,0.0,0.09,0,12.0,10.0,0.0,0.0,0.0,30.23,0.0,12:24:00,6,0


Observation: We have data in a format that can be explored, manipulated and presented.
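
Because one `LabelEncoder` is refit for each column, only the mapping of the last column remains available afterwards. The sketch below, on a hypothetical five-row `sample`, keeps one encoder per column so that each text-to-number mapping can be inspected and reversed. The `encoders` dictionary and the `(encoded)` column names are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample built from the text values visible in the first rows above.
sample = pd.DataFrame({
    "Wind Direction": ["E", "E", "NE", "N", "E"],
    "Condition": ["Cloudy", "Cloudy", "Cloudy", "Partly Cloudy", "Partly Cloudy"],
})

encoders = {}  # one fitted encoder per column, so each mapping stays available
for column in ["Wind Direction", "Condition"]:
    encoder = LabelEncoder()
    sample[column + " (encoded)"] = encoder.fit_transform(sample[column])
    encoders[column] = encoder

# classes_ is sorted, and the position of each label is its integer code.
wind_mapping = {label: code for code, label in enumerate(encoders["Wind Direction"].classes_)}
print(wind_mapping)

# inverse_transform recovers the original text from the codes.
decoded = encoders["Condition"].inverse_transform(sample["Condition (encoded)"])
print(list(decoded))
```

Note that the codes depend on which labels are present: a column with more distinct values, as in the full dataset, will assign different numbers to some labels than this small sample does.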

# Further Exploration

Ways that the data exploration journey can be extended include:<br>
* Explore the data further: Look up articles and reports on each launch. Were there considerations made about weather before launch? Were there weather conditions around these dates that might have been worrisome?
* Explore the missing weather data: What about the dates that NASA didn't choose to launch rockets? Beyond individual days, were there seasons that NASA avoided? What kind of weather profile do those seasons tend to have?
* Explore the missing launch data: Can you find data on launches that were pushed because of weather? Is there data about other countries' launches that you can incorporate?
* Explore other data manipulations: Could we have used better values to fill in missing data?
* Decide what data you would want: If you had access to NASA's subject matter experts and data sources, what do you think would be most important to making a launch or push decision? If you could ask an expert something, what would it be?
* Evaluate similar problems: Are there similar problems that you can use to help fill in this data? For example, are airplane delays because of weather in the area also an indicator?
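
On the question of better fill values, one possible direction is to compare filling with 0 against filling with each column's median. The sketch below uses a hypothetical four-row `sample`; it illustrates the idea and is not a recommendation from the original module.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing low temperature among otherwise plausible readings.
sample = pd.DataFrame({
    "High Temp": [75.0, 78.0, 73.0, 76.0],
    "Low Temp": [68.0, 70.0, np.nan, 57.0],
})

# Filling with 0 drags the column average far below any observed value.
zero_filled = sample.fillna(0)

# Filling with each column's median keeps the value inside the observed range.
median_filled = sample.fillna(sample.median(numeric_only=True))

print(zero_filled["Low Temp"].mean())
print(median_filled["Low Temp"].mean())
```

A temperature of 0 is a real, extreme reading rather than a neutral placeholder, so a model trained on zero-filled data may treat missing days as unusually cold ones.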