# Exercise 2
- multiple files - pitch_at_liftoff

Reading multiple files to extract the pitch from multiple flights.
In this exercise we extract the pitch_at_liftoff from all the flights by defining functions and calling them, in order, on each of the files found.



### Reading multiple files using the os library

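The `os.walk` function walks through a directory tree. For each folder it visits, it yields the folder path (`root`), the subfolders in it (`dirs`) and the files in it (`files`).
```python
import os
for root, dirs, files in os.walk(directory):
    print(root, dirs, files)
```
Each iteration gives the directories (folders) and all files within the respective folder; `directory` is the path of the folder to start from.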
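To see what the walk yields, it can be tried on a throwaway directory. The sketch below is only an illustration: the folder and file names are made up, and `tempfile` is used so nothing is left behind. It also shows that joining `root` and the filename gives a path that still works when a file sits in a subfolder.

```python
import os
import tempfile

# Build a small throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "2018", "march"))
    for relative in ["a.csv",
                     os.path.join("2018", "b.csv"),
                     os.path.join("2018", "march", "notes.txt")]:
        with open(os.path.join(tmp, relative), "w") as handle:
            handle.write("")

    walked = {}
    for root, dirs, files in os.walk(tmp):
        # Store results relative to tmp so they are easy to read.
        walked[os.path.relpath(root, tmp)] = (sorted(dirs), sorted(files))

    # Joining root and filename gives a path that also works for nested files.
    csv_paths = sorted(
        os.path.relpath(os.path.join(root, name), tmp)
        for root, dirs, files in os.walk(tmp)
        for name in files
        if name.endswith(".csv")
    )

print(walked)
print(csv_paths)
```

A function that returns only the bare filename is fine as long as all files sit directly in the starting folder, which is the case in this exercise.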


In [3]:
import os

def get_filenames_in_directory(directory):
    rawfiles = []

    # os.walk() function - https://www.tutorialspoint.com/python/os_walk.htm
    for root, dirs, files in os.walk(directory):
        for filename in files:
            # keep only files whose name ends in .csv
            if filename.endswith('.csv'):
                rawfiles.append(filename)
    return rawfiles

In [4]:
raw_dir = '../data/raw/'
rawfiles = get_filenames_in_directory(raw_dir)

### Reading just one file to test the logic of our settings.

In this case we select the first file with `rawfiles[0]`:
```python
flight_df = pd.read_csv(raw_dir + rawfiles[0])
```


In [5]:
import pandas as pd

# Calling the pandas method for reading a csv.
flight_df = pd.read_csv(raw_dir + rawfiles[0])

### Function for getting the pitch at liftoff, explanation of one of the lines


The line below is true when every `GROUND_SPEED` value from row `i` up to, but not including, row `i+3` is greater than 30.

NOTE:  
`csv_data['GROUND_SPEED'][0:3]` only considers frames 0, 1 and 2; the end index is excluded.

```python
if all(x > 30 for x in csv_data['GROUND_SPEED'][i:i+3]):
```
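The slice and the `all(...)` check can be tried on a few made-up values. This is a minimal sketch using a plain list instead of a dataframe column; the values and the helper name `stable_above` are assumptions for illustration. It also covers an edge case: near the end of the data the slice holds fewer than three values, so the helper asks for a full window.

```python
ground_speed = [10, 28, 31, 35, 40, 42]  # made-up values, one per frame

# [0:3] covers frames 0, 1 and 2; the end index is excluded.
first_window = ground_speed[0:3]

def stable_above(values, start, threshold, window=3):
    """True if `window` consecutive values from `start` all exceed `threshold`."""
    chunk = values[start:start + window]
    # Require a full window: near the end of the data the slice gets shorter,
    # and all() of an empty slice is True.
    return len(chunk) == window and all(x > threshold for x in chunk)

print(first_window)
print(stable_above(ground_speed, 1, 30))  # window is 28, 31, 35
print(stable_above(ground_speed, 2, 30))  # window is 31, 35, 40
print(stable_above(ground_speed, 5, 30))  # only one value left
```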

In [17]:
# NOTE:
# This is only a simplified version of pitch_at_liftoff. Feel free to adjust the thresholds and conditions.
def get_frame_number_at_last_second_on_ground(csv_data):
    # setting variable
    frame_number_at_liftoff = 0

    for i, _squat_nose in enumerate(csv_data['SQUAT_NOSE']):

        # setting variables to be used in logic
        have_liftoff = False
        stable_gs = False
        stable_pitch = False
        nose_left_ground = False

        # check ground speed > 30 for 3 secs
        if 'GROUND_SPEED' in csv_data:
            # check through all of the row values for ground speed 3 secs ahead
            if all(x > 30 for x in csv_data['GROUND_SPEED'][i:i+3]):
                stable_gs = True

        # check pitch > 4 for 3 secs
        if 'PITCH' in csv_data:
            # check through all of the row values for pitch 3 secs ahead
            if all(x > 4 for x in csv_data['PITCH'][i:i+3]):
                stable_pitch = True

        if 'SQUAT_NOSE' in csv_data:
            # check if the squat has left the ground
            if csv_data['SQUAT_NOSE'][i] == 0:
                # now we know that the squat left the ground
                nose_left_ground = True
                
        # combine the conditions into a single liftoff check
        if nose_left_ground and (stable_gs or stable_pitch):
            have_liftoff = True

        # saving the frame at that value
        if have_liftoff:
            frame_number_at_liftoff = i
            break

    # we know the frame at which the flight lifts off, so we return the frame one second before it.
    # NOTE: this returns -1 if no liftoff is found, or if it is found at frame 0.
    return frame_number_at_liftoff - 1
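The function above returns `frame_number_at_liftoff - 1`, which is -1 when no frame satisfies the conditions. One possible variation, sketched below, returns `None` in that case so the caller can tell the two situations apart. It is not the notebook's function: the name `find_last_frame_on_ground`, the keyword parameters and the sample values are assumptions, it works on a plain dict of lists so it runs without any data files, and unlike the original it requires a full three-frame window.

```python
def find_last_frame_on_ground(columns, gs_limit=30, pitch_limit=4, window=3):
    """Return the frame just before liftoff, or None if no liftoff is found.

    `columns` maps column names to sequences with one value per frame.
    """
    def stable(name, start, limit):
        values = columns.get(name)
        if values is None:
            return False
        chunk = values[start:start + window]
        return len(chunk) == window and all(x > limit for x in chunk)

    for i, squat_nose in enumerate(columns["SQUAT_NOSE"]):
        nose_left_ground = squat_nose == 0
        if nose_left_ground and (stable("GROUND_SPEED", i, gs_limit)
                                 or stable("PITCH", i, pitch_limit)):
            # max() keeps the result a valid index when liftoff is at frame 0
            return max(i - 1, 0)
    return None

# Made-up data: the nose leaves the ground at frame 3.
flight = {
    "SQUAT_NOSE":   [1, 1, 1, 0, 0, 0, 0],
    "GROUND_SPEED": [0, 20, 50, 90, 120, 140, 150],
    "PITCH":        [0, 0, 1, 5, 8, 10, 12],
}
taxi_only = {"SQUAT_NOSE": [1, 1, 1], "GROUND_SPEED": [5, 8, 6]}

print(find_last_frame_on_ground(flight))
print(find_last_frame_on_ground(taxi_only))
```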

In [18]:
def get_pitch_at_liftoff(csv_data):
    
    # specify output dictionary
    output = {}
    
    # get the frame_number for last_second
    last_second_on_ground = get_frame_number_at_last_second_on_ground(csv_data)
    
    # loop through the pitches
    pitches = []

    for pitch_column in ['PITCH', 'PITCH_2', 'PITCH_3', 'PITCH_4']:
        pitches.append(csv_data[pitch_column][last_second_on_ground])

    # saving the output
    output['pitch_at_liftoff'] = min(pitches)
    output['frame_number_at_last_second_on_ground'] = last_second_on_ground
    
    return output

In [19]:
get_pitch_at_liftoff(flight_df)

{'pitch_at_liftoff': 0.0, 'frame_number_at_last_second_on_ground': 798}
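`get_pitch_at_liftoff` assumes that all four pitch columns exist in every file. If some files lack a column or contain missing readings, a more tolerant lookup could look like the sketch below. The helper name `min_pitch_at_frame` and the sample values are assumptions, and a plain dict of lists stands in for the dataframe.

```python
import math

def min_pitch_at_frame(columns, frame,
                       candidates=("PITCH", "PITCH_2", "PITCH_3", "PITCH_4")):
    """Smallest pitch at `frame` over the candidate columns that are present."""
    values = [columns[name][frame] for name in candidates if name in columns]
    # Drop missing readings (NaN) so they cannot distort min().
    values = [v for v in values if not math.isnan(v)]
    return min(values) if values else None

# Made-up data: only two of the four pitch columns exist.
flight = {"PITCH": [0.5, 3.0, 7.5], "PITCH_2": [0.4, 3.2, float("nan")]}

print(min_pitch_at_frame(flight, 1))
print(min_pitch_at_frame(flight, 2))
print(min_pitch_at_frame({}, 0))
```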

# Putting it all together

0. Test with one file.
1. Get the important frame_number (the last second on the ground).
2. Use that frame_number to read the values of interest at that point.
3. Once everything is wrapped in functions, we can apply them to all of the files in one directory.

In [20]:
get_filenames_in_directory(raw_dir)

['2018-03-02--04_9V-SMP_A-350-122_SIA_0236__fda86edc-767f-4526-8433-e4e9751042cf.csv',
 '2018-03-04--00_9V-SSH_A-330-1648_SIA_0930__e9caf21c-df80-4eb0-a4f4-9bfdbf0c1e15.csv']

### Combining the filename function with the function that creates the output



In [21]:
analysis_data = []

for filename in get_filenames_in_directory(raw_dir):
    # reading the flight
    flight = pd.read_csv(raw_dir + filename)
    
    # getting pitch from the function
    pitch_output = get_pitch_at_liftoff(flight)
    
    # appending that to a list (array) for further analysis
    analysis_data.append(pitch_output)
analysis_data

[{'pitch_at_liftoff': 0.0, 'frame_number_at_last_second_on_ground': 798},
 {'pitch_at_liftoff': 8.0859375, 'frame_number_at_last_second_on_ground': 890}]
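With only the numbers in the output, it is hard to tell which flight each row belongs to, and one unreadable file would stop the whole loop. The sketch below shows one way to handle both, under stated assumptions: `run_on_all_files` and `count_lines` are hypothetical names, and a trivial line counter stands in for the real analysis so the example runs without flight data.

```python
import os
import tempfile

def run_on_all_files(directory, analyse, extension=".csv"):
    """Call `analyse(path)` on every matching file below `directory`."""
    results, failures = [], []
    for root, dirs, files in os.walk(directory):
        for filename in sorted(files):
            if not filename.endswith(extension):
                continue
            path = os.path.join(root, filename)
            try:
                output = dict(analyse(path))
            except Exception as error:  # keep going; report the file afterwards
                failures.append((filename, repr(error)))
                continue
            output["filename"] = filename  # so each row can be traced back
            results.append(output)
    return results, failures

def count_lines(path):
    """Stand-in analysis: count the lines of a file; reject empty files."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines == 0:
        raise ValueError("empty file")
    return {"n_lines": n_lines}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.csv", "x\n1\n2\n"), ("b.csv", ""), ("c.txt", "skip")]:
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(text)
    results, failures = run_on_all_files(tmp, count_lines)

print(results)
print(failures)
```

In this notebook the `analyse` argument could be something like `lambda path: get_pitch_at_liftoff(pd.read_csv(path))`.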

# Further programming

Load the analysis data into a dataframe. This turns the list of outputs into a table with one row per flight, ready for further investigation and analysis.

In [22]:
analysis_df = pd.DataFrame.from_records(analysis_data)
analysis_df

| | frame_number_at_last_second_on_ground | pitch_at_liftoff |
|---|---|---|
| 0 | 798 | 0.0 |
| 1 | 890 | 8.085938 |


A typical step when looking at parameters is to check summary statistics for all the values. Pandas provides a simple method for this: `df.describe()`.

In [23]:
analysis_df.describe()

| | frame_number_at_last_second_on_ground | pitch_at_liftoff |
|---|---|---|
| count | 2.0 | 2.0 |
| mean | 844.0 | 4.042969 |
| std | 65.053824 | 5.717621 |
| min | 798.0 | 0.0 |
| 25% | 821.0 | 2.021484 |
| 50% | 844.0 | 4.042969 |
| 75% | 867.0 | 6.064453 |
| max | 890.0 | 8.085938 |
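The numbers in this table can be checked by hand, which helps when reading them. By default `describe()` reports the sample standard deviation (dividing by n - 1) and interpolates linearly for the quartiles. The sketch below reproduces the values for the two flights above using only Python's `statistics` module.

```python
import statistics

pitches = [0.0, 8.0859375]
frames = [798, 890]

pitch_mean = statistics.mean(pitches)
pitch_std = statistics.stdev(pitches)   # sample standard deviation (n - 1)
frame_std = statistics.stdev(frames)
# "inclusive" interpolates linearly between the smallest and largest value
frame_quartiles = statistics.quantiles(frames, n=4, method="inclusive")

print(round(pitch_mean, 6))
print(round(pitch_std, 6))
print(round(frame_std, 6))
print(frame_quartiles)
```

With only two flights these statistics say little; they become useful once more files are added to the directory.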
