# Mission Command Data Analysis - Pre-Part - Preparing the Data for Use
## By: Matthew Jacobsen


Welcome to the Mission Command Data Analysis project walkthrough.  This series is designed to illustrate some of the Python techniques that can be (and have been) used in the analysis of mission command data.  Part 1 describes in more detail what this actually entails.  This part is a precursor to Part 1, as it will use some pre-built tools included in the portfolio file for this series.  The goal of these tools is the following:

- Generate a set of fictional data to practice our analytical skills on

- Convert that fictional data into a format that requires some processing to extract

- Save that data for use in the remainder of this series

As it is beneficial to walk through the entire process, I would recommend downloading the files discussed in each section, so that you can generate your own fictional data and walk through the process, while keeping your conclusions separate from my own.  Keep in mind that, while the data we are creating is fictional, we are not manipulating it in order to get specific numbers.  Randomness has been injected in as many places as possible to ensure the analysis is somewhat more realistic than it might otherwise be.

### Generating the Data

Files for this section: fictional_data_maker.xlsx

The Excel file listed above is used in this section to generate the fictional data.  A general description of how to use the spreadsheet is documented in its various tabs.  Start on the network_data tab to generate your fictional network data, then copy it without headers and paste the values into the third tab, where you will save it out as CSV data.  Name the file whatever you wish.  For this series, we will use 'fictional_network_data.csv'.  Then copy and paste values in the indicated columns on the mcwidget_data tab.  The first several columns will then hold the autogenerated widgetized data, with some random errors and time delays.  As with the network data, copy and paste these values and save them as CSV under a different name.  For this series, we will use 'fictional_mcwidget_data.csv'.

### 'Network'-izing the Data

In order to make the data appear similar to what we'd likely see measured from a network, we'll need to work our data into that format.  For internal data, that will mean our data looks something like a JavaScript Object Notation (JSON) message (as an example), with the measured time being the key for that entry.  For external data, we'll have a text-based string with the values encapsulated.  In the end, we'll have two pickled dictionary objects, one for the internal data and one for the external data, that we can use in the remainder of this series.  So, let's get started.

In [1]:
import json
import os
import pickle
from csv import reader

# Work from the folder that holds the CSV files; adjust this path for your own machine
os.chdir(r'C:\Users\Matt\Documents\Personal File\Python Code Packages\Data-Science-Portfolio\Mission Command Data Analysis')

In [2]:
# newline='' is the setting the csv module documentation recommends for files passed to reader
with open('fictional_network_data.csv','r',newline='') as network_in:
    csv_reader = reader(network_in)
    network_data_list = list(csv_reader)

with open('fictional_mcwidget_data.csv','r',newline='') as widget_in:
    csv_reader = reader(widget_in)
    widget_data_list = list(csv_reader)

In [3]:
widget_data_list[:5]

[['9', '10', '1500', '38.8898', '77.0526', '31', '1586526756300'],
 ['10', '7', '1672', '38.8896', '77.0455', '23', '1586414317326'],
 ['24', '24', '24', '24', '24', '24', '1586703862239'],
 ['10', '5', '1500', '38.8896', '77.0474', '17', '1586565377077'],
 ['24', '24', '24', '24', '24', '24', '1586863937247']]
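
To make the list-of-lists output easier to read, here is a minimal sketch that labels a single row.  The column names are illustrative and are inferred from the field labels used when the strings are built later in this section.  Treating the last column as a Unix timestamp in milliseconds is an assumption based on its magnitude, not something the spreadsheet states.

```python
from datetime import datetime, timezone

# Column order inferred from the field labels used later in this section;
# the names themselves are illustrative, not part of the original files.
COLUMNS = ["to", "from", "altitude", "latitude", "longitude", "speed", "measured_time"]

# First row of the sample output above.
sample_row = ['9', '10', '1500', '38.8898', '77.0526', '31', '1586526756300']

# Pair each column name with its value.
labeled = dict(zip(COLUMNS, sample_row))

# Assumption: measured_time is milliseconds since the Unix epoch.
measured_at = datetime.fromtimestamp(int(labeled["measured_time"]) / 1000, tz=timezone.utc)

print(labeled["altitude"])
print(measured_at.isoformat())
```

If the assumption holds, the sample timestamps fall in April 2020.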

Alright, so we know what we are pulling in and we've shown that this method gets us a list of lists.  So, how can we format this for our use?  We'll need to do the two separately, so let's start with the network data.  This is intended to be a string, with the measured time as the key.  Here's how we'll do this:

In [4]:
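network_dict = {}
for row in network_data_list:
    # The seventh column holds the measured time, which becomes the dictionary key
    measured_time = row[6]
    # '\\n' writes a literal backslash-n into the string rather than a real line break
    text_string = 'To:{} \\n From:{} \\n Altitude:{} \\n Latitude:{} \\n Longitude:{} \\n Speed:{} \\n'.format(row[0],row[1],row[2],row[3],row[4],row[5])
    network_dict[measured_time] = text_string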

In [5]:
network_dict[list(network_dict.keys())[2]]

'To:3 \\n From:7 \\n Altitude:1500 \\n Latitude:38.8892 \\n Longitude:77.0463 \\n Speed:25 \\n'
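
Since the point of this format is that it takes some processing to extract, it may help to see roughly what that extraction could look like.  The sketch below is one possible approach, not necessarily the one used later in the series, and `parse_network_string` is a hypothetical helper name.  It relies on the separator being a literal backslash followed by `n`, as the output above shows.

```python
def parse_network_string(text):
    """Split a 'Label:value \\n ...' string into a dict of label -> value."""
    fields = {}
    # The separator is a literal backslash followed by 'n', not a newline.
    for chunk in text.split('\\n'):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the trailing separator
        label, _, value = chunk.partition(':')
        fields[label] = value
    return fields

# The entry shown in the output above.
sample = 'To:3 \\n From:7 \\n Altitude:1500 \\n Latitude:38.8892 \\n Longitude:77.0463 \\n Speed:25 \\n'
parsed = parse_network_string(sample)
print(parsed['Altitude'])
```

All values come back as strings, so any numeric work would still need an explicit conversion.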

Now let's do the same for the widget data!  This data will be in JSON format (hence the json library), but we can use much the same method.

In [6]:
widget_dict = {}
for row in widget_data_list:
    measured_time = row[6]
    # Build a plain dict first; json.dumps then serializes it to a JSON string
    record = {"to":row[0],"from":row[1],"altitude":row[2],"latitude":row[3],"longitude":row[4],"speed":row[5]}
    widget_dict[measured_time] = json.dumps(record)

In [7]:
widget_dict[list(widget_dict.keys())[2]]

'{"to": "24", "from": "24", "altitude": "24", "latitude": "24", "longitude": "24", "speed": "24"}'
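
Each stored value is a JSON string, so `json.loads` turns it back into a dictionary.  The sketch below also flags entries where every field holds the same value, like the one above.  Treating such entries as the injected random errors is an assumption on my part, and `looks_like_error` is a hypothetical helper name.

```python
import json

def looks_like_error(record):
    """Return True when every field in the record holds the same value."""
    return len(set(record.values())) == 1

# The entry shown in the output above.
entry = '{"to": "24", "from": "24", "altitude": "24", "latitude": "24", "longitude": "24", "speed": "24"}'
record = json.loads(entry)

# A well-formed entry built from the second sample row shown earlier.
good = json.loads('{"to": "10", "from": "7", "altitude": "1672", "latitude": "38.8896", "longitude": "77.0455", "speed": "23"}')

print(looks_like_error(record))
print(looks_like_error(good))
```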

Now, all that is left is to export the dictionaries as pickle files and we are ready for our analysis in the next part of this series.  

In [8]:
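# Pickle files are binary, so open them in 'wb' mode.
# protocol=2 is an older pickle format that Python 2 can also read.
with open('pickled_widget_data.pkl','wb') as widget_out:
    pickle.dump(widget_dict, widget_out, protocol=2)

with open('pickled_network_data.pkl','wb') as network_out:
    pickle.dump(network_dict, network_out, protocol=2)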
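
A quick sanity check after writing is to reload a pickle and compare it with what was saved.  The sketch below does this on a small toy dictionary in a temporary folder, so it does not touch your real files.  The names `toy_dict` and `toy_data.pkl` are made up for illustration.  It also compares the number of entries with the number of rows, because a dictionary keyed on measured time keeps only the last row for any timestamp that repeats.

```python
import os
import pickle
import tempfile

# Two rows from the sample output earlier in this part.
rows = [
    ['9', '10', '1500', '38.8898', '77.0526', '31', '1586526756300'],
    ['10', '7', '1672', '38.8896', '77.0455', '23', '1586414317326'],
]

# Key each row on its measured time, as the real dictionaries are.
toy_dict = {row[6]: row[:6] for row in rows}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_data.pkl')
    with open(path, 'wb') as handle:
        pickle.dump(toy_dict, handle, protocol=2)
    with open(path, 'rb') as handle:
        reloaded = pickle.load(handle)

# The reloaded object should equal what was written.
round_trip_ok = reloaded == toy_dict
# Fewer entries than rows would mean repeated timestamps overwrote each other.
no_rows_lost = len(reloaded) == len(rows)

print(round_trip_ok, no_rows_lost)
```

The same two comparisons can be run against `widget_dict` and `network_dict` using the real file names.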

You should now have two pickle files ready for use in the remainder of this series.  See you in the next installment!