## Objective
1. Visualize the launches in an informative way
2. Identify any outlier launches that should be reviewed
3. Identify any interesting patterns in the data (seasonality, poorly performing parts)

## 1. Visualize the launches in an informative way
- The data covers only the window from 5 seconds before to 15 seconds after launch
- I assume the presumption is that something unusual might happen after the zip is launched
- Outlier launches need to be detected: launches where the distance or speed after launch is lower than expected once wind speed is accounted for
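
One way to "account for the wind speed" is to fit a simple trend of post-launch speed against wind speed and flag launches that fall far below it. The sketch below is only an illustration of that idea: the column names `wind_speed_mps` and `speed_after_launch_mps` are hypothetical, the data is synthetic, and a straight-line fit is an assumption rather than anything established by this notebook.

```python
import numpy as np
import pandas as pd

def flag_wind_adjusted_outliers(summary, x_col, y_col, z_thresh=3.0):
    """Flag rows whose y value is unusually low for their x value."""
    # Fit y ~ slope * x + intercept by ordinary least squares
    slope, intercept = np.polyfit(summary[x_col], summary[y_col], deg=1)
    residuals = summary[y_col] - (slope * summary[x_col] + intercept)
    # Standardize the residuals; only strongly negative ones are flagged
    z = (residuals - residuals.mean()) / residuals.std()
    return summary.assign(residual_z=z, is_outlier=z < -z_thresh)

# Synthetic stand-in for per-launch summary data (hypothetical columns)
rng = np.random.default_rng(0)
wind = rng.uniform(0, 10, size=200)
speed = 30 - 0.5 * wind + rng.normal(0, 0.5, size=200)
speed[7] -= 10  # one deliberately slow launch
demo = pd.DataFrame({'wind_speed_mps': wind,
                     'speed_after_launch_mps': speed})
flagged = flag_wind_adjusted_outliers(demo, 'wind_speed_mps',
                                      'speed_after_launch_mps')
```

Only the low side is flagged here because the concern is launches that underperform; a two-sided check would compare `z.abs()` against the threshold instead.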

### a. Load the data into a single dataframe

In [1]:
import pandas as pd
import re
from glob import glob
import yaml

from sklearn.preprocessing import MinMaxScaler
import plotly.plotly as py
import plotly.graph_objs as go

In [2]:
with open('creds.yaml', 'r') as f:
    creds = yaml.safe_load(f)

In [3]:
import plotly
plotly.tools.set_credentials_file(username=creds['plotly']['username'], 
                                  api_key=creds['plotly']['apikey'])

In [4]:
hires_flight_csv = glob('../data/flight*.csv')

In [5]:
reg = r'../data/flight_(\d*)'

In [6]:
hires_flight_data = pd.DataFrame()
for csv in hires_flight_csv:
    csv_data = pd.read_csv(csv)
    flight_number = re.match(reg, csv).group(1)
    csv_data['flight_id'] = int(flight_number)
    # Assign the result back; otherwise hires_flight_data stays empty
    # and the later groupby on 'flight_id' raises a KeyError
    hires_flight_data = pd.concat([hires_flight_data, csv_data])

### b. Check that the range of values for the time after launch is the same
- This is important because if distance traveled is the metric, every flight must cover the same time span after launch

In [7]:
max_time_after_launch = hires_flight_data.groupby('flight_id').\
                                          agg({'seconds_since_launch':'max'})

In [None]:
max_time_after_launch.describe()

It looks like almost all flights have a datapoint within one hundredth of a second of the 15-second mark. This will make it easy to compare one flight with another.
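
The "within one hundredth of a second" check can also be made explicit, so the few flights that fall short are listed by id. This is a small sketch on toy data; the helper name `flights_missing_end_sample` and the tolerance default are my own choices, not part of the original analysis.

```python
import pandas as pd

def flights_missing_end_sample(df, target_s=15.0, tol_s=0.01):
    """Return ids of flights whose last sample is not within tol_s of target_s."""
    last_time = df.groupby('flight_id')['seconds_since_launch'].max()
    too_far = (last_time - target_s).abs() > tol_s
    return last_time.index[too_far].tolist()

# Toy data: flight 3 stops recording early
demo = pd.DataFrame({
    'flight_id': [1, 1, 2, 2, 3, 3],
    'seconds_since_launch': [-5.0, 15.0, -5.0, 14.995, -5.0, 12.3],
})
short_flights = flights_missing_end_sample(demo)
```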

One metric that could be important is the velocity 15 seconds after launch. This metric is only meaningful if every flight is relatively straight, so plot the positions for a random sample of flights to check.
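
As a rough illustration of the "velocity after 15 seconds" metric, the sketch below takes the last sample of each flight and computes the magnitude of its NED velocity vector. It assumes the `velocity_ned_mps[...]` columns used later in this notebook; the helper name `final_speed` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

VELOCITY_COLS = ['velocity_ned_mps[0]', 'velocity_ned_mps[1]',
                 'velocity_ned_mps[2]']

def final_speed(df):
    """Speed (magnitude of the NED velocity) at each flight's last sample."""
    # Sort by time so that tail(1) picks the latest row of every flight
    last_rows = df.sort_values('seconds_since_launch').groupby('flight_id').tail(1)
    speed = np.sqrt((last_rows[VELOCITY_COLS] ** 2).sum(axis=1))
    return pd.Series(speed.values, index=last_rows['flight_id'].values,
                     name='final_speed_mps').sort_index()

# Toy data: two flights with two samples each
demo = pd.DataFrame({
    'flight_id': [1, 1, 2, 2],
    'seconds_since_launch': [0.0, 15.0, 0.0, 15.0],
    'velocity_ned_mps[0]': [1.0, 3.0, 0.0, 6.0],
    'velocity_ned_mps[1]': [0.0, 4.0, 0.0, 8.0],
    'velocity_ned_mps[2]': [0.0, 0.0, 0.0, 0.0],
})
speeds = final_speed(demo)
```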

### c. Plot the path of a sample of planes

In [None]:
flight_summaries = pd.read_csv('../data/summary_data.csv')

In [None]:
flight_sample = flight_summaries.sample(n=10, random_state=42)

In [None]:
hires_flight_data.columns

In [None]:
set(flight_summaries.columns).intersection(set(hires_flight_data.columns))

In [None]:
select_flights = flight_sample[['flight_id']].merge(hires_flight_data, 
                                    how='inner',
                                    on='flight_id')
select_flights = select_flights.reset_index(drop=True)
select_flights.head(2)

In [None]:
coordinates = ['flight_id', 'seconds_since_launch', 'position_ned_m[0]', 'position_ned_m[1]',
       'position_ned_m[2]']

In [None]:
select_flight_groups = select_flights.groupby('flight_id')

In [None]:
traces = []
for flight, flight_data in select_flight_groups:
    temp_trace = go.Scatter3d(x=flight_data['position_ned_m[1]'],
                            y=flight_data['position_ned_m[0]'],
                            z=flight_data['position_ned_m[2]'] * (-1),  # NED z points down; negate so up is positive
                            hovertext = flight_data['seconds_since_launch'],
                            mode='lines',
                            name=flight)
    traces.append(temp_trace)
layout = go.Layout(title='Trajectory for 10 random flights')
py.iplot(traces, layout=layout, filename='trajectory')

In [None]:
traces = []
for flight, flight_data in select_flight_groups:
    temp_trace = go.Scatter3d(x=flight_data['velocity_ned_mps[1]'],
                            y=flight_data['velocity_ned_mps[0]'],
                            z=flight_data['velocity_ned_mps[2]'],
                            hovertext = flight_data['seconds_since_launch'],
                            mode='lines',
                            name=flight)
    traces.append(temp_trace)
layout = go.Layout(title='Velocity for 10 random flights')
py.iplot(traces, layout=layout, filename='velocity')

It looks like the flights are primarily along the same XY plane.

## 2. Quantify the deviation from the flight path
First, make sure that each flight has the same number of datapoints.

In [None]:
flight_data_points = hires_flight_data.groupby('flight_id').\
                                          agg({'seconds_since_launch':'count'})
flight_data_points.describe()

In [None]:
# Keep only flights that have exactly 1001 samples
flights_with_good_data = flight_data_points.loc[flight_data_points.seconds_since_launch == 1001]
flights_with_good_data.shape
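
If only the complete flights should feed the later statistics, the high-resolution rows can be filtered by their per-flight sample count. This is a sketch on toy data, not a step the notebook itself performs; the helper name `keep_complete_flights` is hypothetical, and the expected count is passed in rather than hard-coded.

```python
import pandas as pd

def keep_complete_flights(df, expected_samples):
    """Keep only rows of flights that have exactly expected_samples rows."""
    # transform('count') broadcasts each flight's row count back onto its rows
    counts = df.groupby('flight_id')['seconds_since_launch'].transform('count')
    return df.loc[counts == expected_samples]

# Toy data: flight 1 has 3 samples, flight 2 only 2
demo = pd.DataFrame({
    'flight_id': [1, 1, 1, 2, 2],
    'seconds_since_launch': [-5.0, 0.0, 15.0, -5.0, 0.0],
})
complete = keep_complete_flights(demo, expected_samples=3)
```

With the real data the call would presumably be `keep_complete_flights(hires_flight_data, 1001)`.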

In [None]:
position_columns = ['seconds_since_launch', 'position_ned_m[0]', 'position_ned_m[1]','position_ned_m[2]',
                    'orientation_rad[0]', 'orientation_rad[1]', 'orientation_rad[2]']

In [None]:
downsampled_data = hires_flight_data.copy()
downsampled_data['seconds_since_launch'] = downsampled_data.seconds_since_launch.round(1)
flight_maxmin = downsampled_data[position_columns].groupby('seconds_since_launch').agg(['mean', 'std'])
# Flatten the (column, statistic) MultiIndex into names like 'position_ned_m[0]mean'
cols = [''.join(col) for col in flight_maxmin.columns]
flight_maxmin.columns = cols
flight_maxmin.head()

In [None]:
data_and_agg = flight_maxmin.merge(downsampled_data[['flight_id'] + position_columns],
                                   left_index=True,
                                   right_on='seconds_since_launch')
data_and_agg.head()

In [None]:
for column in ['position_ned_m[0]', 'position_ned_m[1]','position_ned_m[2]',
                'orientation_rad[0]', 'orientation_rad[1]', 'orientation_rad[2]']:
    # z-score each value against the fleet mean and std at the same time step
    mean_values = data_and_agg[column + 'mean']
    std_values = data_and_agg[column + 'std']
    data_and_agg[column + '_scaled'] = (data_and_agg[column] - mean_values) / std_values

In [None]:
data_and_agg.head(2)

In [None]:
data_and_agg.loc[(data_and_agg.flight_id == 17459) & (data_and_agg.seconds_since_launch == -5.0)]
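
To turn the per-sample scaled values into a single number per flight, one option is to take the largest absolute z-score a flight reaches in any of the scaled columns. The sketch below shows that idea on toy data; the helper name `flight_deviation_scores` is hypothetical, and "max absolute z-score" is just one possible summary, not the notebook's chosen metric.

```python
import pandas as pd

def flight_deviation_scores(df, scaled_cols):
    """Largest absolute z-score per flight across the given columns."""
    # Worst deviation in each row, then the worst row of each flight
    per_row_max = df[scaled_cols].abs().max(axis=1)
    per_flight = per_row_max.groupby(df['flight_id']).max()
    return per_flight.sort_values(ascending=False)

# Toy data with two of the '_scaled' columns
demo = pd.DataFrame({
    'flight_id': [1, 1, 2, 2],
    'position_ned_m[0]_scaled': [0.1, -0.4, 2.0, -4.5],
    'position_ned_m[1]_scaled': [0.3, 0.2, -1.0, 0.5],
})
scores = flight_deviation_scores(
    demo, ['position_ned_m[0]_scaled', 'position_ned_m[1]_scaled'])
```

Flights at the top of `scores` would be the first candidates to review.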

## 3. Visualize the flight metrics with PCA

In [None]:
scaled_fleet_data = data_and_agg[['position_ned_m[0]_scaled', 
                                 'position_ned_m[1]_scaled', 
                                 'position_ned_m[2]_scaled', 
                                 'orientation_rad[0]_scaled', 
                                 'orientation_rad[1]_scaled', 
                                 'orientation_rad[2]_scaled']]