# Mission Command Data Analysis - Part 1 - Obtaining the Data
## By: Matthew Jacobsen


When evaluating automated mission command systems, the primary source of data for users is the Common Operating Picture, or COP.  This satellite-type overview looks similar to Google Maps (which is what we will use for this demonstration), with multiple types of data overlaid on it. One measure of the effectiveness of any system purporting to provide such mission command capabilities is the accuracy of the information presented to the user.  There are two components to this:

- Processing of the data in order to prepare for display

- Display of the data to the user

In this series, we will cover how to assess both components using some fictional data, from the initial processing through to drawing conclusions.  While the data used is fictional, the process illustrated is the same one that would be used to analyze a real system.

### Overview

Before getting too far into the weeds on this, let's start by defining what we are talking about.  Mission Command is, according to the US Army, 

*The exercise of authority and direction by the commander using mission orders to enable disciplined initiative within the commander's intent to empower agile and adaptive leaders in the conduct of unified land operations.*

In non-military terms, it is ensuring that the commander of a given unit is able to express what he or she wants done, and have the remainder of the unit work to make that happen.  That said, one key component of exercising the commander's intent is being aware of what is occurring on the battlefield.  Often, this is shown to the commander via an overhead map view with graphics displayed on top to indicate where things are happening and where he or she has resources available for tasks.  

Think of something like the figure below.

<img src='example_cop1.png'>

<center> Figure 1 : Example Common Operating Picture </center>

For the purposes of this example project, we will use the graphic above and assume that our assessment scenario is a unit (blue circles) protecting the Washington DC National Mall from invading forces (red diamonds).  The scenario has already played out and we know what happened, but we want to understand how well the mission command tool used by the defending unit aided them.  Let's imagine the setup below for our tool, which we will call MCWidget. This system has an internal data type and needs to convert that into a simpler data type in order to communicate with other systems over some network.  Figure 2 shows an example flow of data for this system. 

<img src='example_dataflow.png'>

<center> Figure 2 : Example Data Flow </center>

So, in order to assess the MCWidget, we need to know that the translations are happening correctly and that the data shown to the Soldier corresponds to the translated values.  

### Importing and Normalizing "Instrumented" Data

For this example, we have three data sources, which include:

- Screen Capture Information of the Soldier's Terminal (we will be using Figure 1 for this purpose)

- Example Network Data (Simple Text Message Format)

- Example Translated Data (JavaScript Object Notation Messages)

Let's import our simulated data for this and see what we have. 

*As a note for the reader, the data used in this simulated experiment was generated by establishing a range of acceptable values and allowing all parameters to vary randomly within this range (thanks to numpy).  Additionally, the delivery times vary by up to 15 seconds, with the occasional value being more than 60 seconds off.  In a similarly random fashion, some of the messages will translate incorrectly, to give us some interesting data to analyze.*
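The generation approach described in the note can be sketched roughly as follows. This is only an illustration, not the script that produced the sample files: the function name `simulate_messages`, the value ranges, the 5% late-delivery rate, the 10% error rate, and the exact text layout of the network message are all assumptions made for this sketch.

```python
import json

import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable


def simulate_messages(n_messages, start_ms=1586200000000, error_rate=0.1):
    """Return (network, widget) dicts keyed by observed time in milliseconds."""
    network, widget = {}, {}
    for i in range(n_messages):
        # Space the sends ten minutes apart (plus jitter) so keys never collide.
        sent_ms = start_ms + i * 600_000 + int(rng.integers(0, 60_000))
        # Hypothetical "acceptable" ranges for each parameter.
        fields = {
            "to": str(int(rng.integers(1, 11))),
            "from": str(int(rng.integers(1, 11))),
            "altitude": str(int(rng.integers(500, 2501))),
            "latitude": f"{rng.uniform(38.8890, 38.8900):.4f}",
            "longitude": f"{rng.uniform(77.0090, 77.0600):.4f}",
            "speed": str(int(rng.integers(10, 36))),
        }
        # Network side: "Label:value" fields joined by a literal backslash-n marker.
        network[sent_ms] = " \\n ".join(
            f"{name.capitalize()}:{value}" for name, value in fields.items()
        ) + " \\n"

        # Most deliveries lag by under 15 s; a few are more than 60 s late.
        if rng.random() < 0.05:
            delay_ms = int(rng.integers(60_001, 120_000))
        else:
            delay_ms = int(rng.integers(0, 15_000))

        # Occasionally corrupt one field to mimic a translation error.
        translated = dict(fields)
        if rng.random() < error_rate:
            translated["altitude"] = str(int(fields["altitude"]) + 100)
        widget[sent_ms + delay_ms] = json.dumps(translated)
    return network, widget


network_messages, widget_messages = simulate_messages(20)
```

Both dictionaries are keyed by an observed time in milliseconds, so they could be pickled and then loaded in the same way as the sample files.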

The sample data used for this exercise is available for download, if you would like to explore.  The files are pickled_widget_data.pkl (translated data in widget) and pickled_network_data.pkl (network).  For this example, our internal data will be in JavaScript Object Notation (hence the json package) and our network data will be string text.  The first step is to load the data into a DataFrame, for easier handling and numerical work.

In [1]:
import pickle
import pandas as pd
import json
import re

In [2]:
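with open('pickled_widget_data.pkl','rb') as widget_data:
    # The pickle holds a dict of {observed time: JSON message string}.
    in_data = pickle.load(widget_data)
    widget_df = pd.DataFrame()
    for key in in_data.keys():
        # Parse the JSON text and build a one-row frame indexed by observed time.
        data = {key:json.loads(in_data[key])}
        row = pd.DataFrame(data)
        # Each new row is placed ahead of the rows collected so far.
        widget_df = pd.concat([row.T,widget_df])

# Show one raw message (the last one processed) for reference.
print(in_data[key])
# Move the observed time out of the index and into a named column.
widget_df.index = widget_df.index.set_names(['Internal Observed Time'])
widget_df = widget_df.reset_index()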

{"to": "4", "from": "5", "altitude": "1500", "latitude": "38.8897", "longitude": "77.0206", "speed": "25"}


In [3]:
widget_df.head()

Unnamed: 0,Internal Observed Time,altitude,from,latitude,longitude,speed,to
0,1586214751618,1500,5,38.8897,77.0206,25,4
1,1586526747179,1500,2,38.8899,77.0592,31,8
2,1586551614191,835,8,38.8892,77.0481,28,5
3,1586515475433,1186,8,38.8895,77.0353,25,1
4,1586271183767,1834,7,38.8898,77.0353,16,7


In [4]:
with open('pickled_network_data.pkl','rb') as network_data:
    # The pickle holds a dict of {observed time: text message string}.
    in_data = pickle.load(network_data)
    columns = ['External Observed Time','To','From','Altitude','Latitude','Longitude','Speed']
    network_df = pd.DataFrame()
    for key in in_data.keys():
        text_message = in_data[key]
        # Fields are separated by a literal backslash-n sequence padded with spaces.
        split_text_message = text_message.split(' \\n ')
        concat_data = [int(key)]
        for entry in split_text_message:
            # Keep only the value that follows the "label:" prefix.
            value = entry[entry.find(':')+1:]
            concat_data.append(value)
        out_row = pd.DataFrame(concat_data)
        network_df = pd.concat([out_row.T, network_df],ignore_index=True)

network_df.columns=columns
# The last field keeps a trailing separator, so strip it from Speed.
network_df['Speed'] = network_df['Speed'].apply(lambda x : re.sub(r' \\n','',x))

In [5]:
network_df.head()

Unnamed: 0,External Observed Time,To,From,Altitude,Latitude,Longitude,Speed
0,1586214746999,4,5,1500,38.8897,77.0206,25
1,1586551601348,5,8,835,38.8892,77.0481,28
2,1586515475138,1,8,1186,38.8895,77.0353,25
3,1586271176350,7,7,1834,38.8898,77.0353,16
4,1586627112164,6,10,2109,38.8895,77.0191,28


From these, we now have DataFrame objects containing the information from each of these simulated datasets.  An example of a raw widget message is printed beneath the cell that builds widget_df.  As will be shown in the next part of this example project, records from the two sets can be paired by minimizing the difference between their observed times, after which the remaining fields can be compared. However, as we said at the beginning, this only gets us to a conclusion regarding the *translation* of the data.  It says nothing regarding what was displayed to the user.  
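As a preview of the time-matching idea, here is a minimal sketch that uses `pd.merge_asof` to pair each network record with the widget record closest to it in time. This is one possible approach and not necessarily the one used in the next part. The two small frames reuse a few rows from the outputs above, with the times held as integers, and the 120-second tolerance and the `Delay (ms)` and `Altitude Match` column names are assumptions made for illustration.

```python
import pandas as pd

# A few rows copied from the outputs above (times in milliseconds).
network_df = pd.DataFrame({
    "External Observed Time": [1586214746999, 1586551601348,
                               1586515475138, 1586271176350],
    "Altitude": ["1500", "835", "1186", "1834"],
})
widget_df = pd.DataFrame({
    "Internal Observed Time": [1586214751618, 1586551614191,
                               1586515475433, 1586271183767],
    "altitude": ["1500", "835", "1186", "1834"],
})

# merge_asof requires both frames to be sorted on their key columns.
network_sorted = network_df.sort_values("External Observed Time")
widget_sorted = widget_df.sort_values("Internal Observed Time")

# Pair each network row with the nearest widget row, within 120 seconds.
matched = pd.merge_asof(
    network_sorted,
    widget_sorted,
    left_on="External Observed Time",
    right_on="Internal Observed Time",
    direction="nearest",
    tolerance=120_000,
)

# Delivery delay and a simple field-level translation check.
matched["Delay (ms)"] = (
    matched["Internal Observed Time"] - matched["External Observed Time"]
)
matched["Altitude Match"] = matched["Altitude"] == matched["altitude"]
print(matched)
```

A nearest-time match can pair the wrong records when messages arrive close together, so the remaining fields still need to be checked after the merge.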

### Extracting Data from a Map-based Graphic for Comparison

To make that last leap and connect the internal data with what is on the screen, we need to be able to extract data from a screen capture image.  To do this, we need two reference points that are separated both vertically and horizontally. Looking at the map, we can use the geographic coordinates of the US Capitol Building and the White House. For reference, the publicly available coordinates for these locations are:

- U.S. Capitol Building at 38.8899° N, 77.0091° W or [38.8899, -77.0091]

- White House at 38.8977° N, 77.0365° W or [38.8977, -77.0365]
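To show why two reference points with both horizontal and vertical spread are enough, here is a minimal sketch of a linear pixel-to-coordinate conversion. It is not the Image Data Extractor implementation. It assumes a north-up image with no rotation, covering an area small enough that latitude and longitude vary linearly with pixel position. The function name `make_pixel_to_latlon` and the pixel positions of the two landmarks are hypothetical.

```python
def make_pixel_to_latlon(ref_a, ref_b):
    """Build a converter from two references of the form ((x, y), (lat, lon))."""
    (xa, ya), (lat_a, lon_a) = ref_a
    (xb, yb), (lat_b, lon_b) = ref_b
    if xa == xb or ya == yb:
        raise ValueError("references need both horizontal and vertical spread")

    # Degrees per pixel along each axis (latitude falls as y grows downward).
    lon_per_px = (lon_b - lon_a) / (xb - xa)
    lat_per_px = (lat_b - lat_a) / (yb - ya)

    def convert(x, y):
        """Return (latitude, longitude) for the pixel at column x, row y."""
        return (lat_a + (y - ya) * lat_per_px,
                lon_a + (x - xa) * lon_per_px)

    return convert


# Hypothetical pixel positions for the two landmarks in the screen capture.
white_house = ((200, 100), (38.8977, -77.0365))
us_capitol = ((1400, 500), (38.8899, -77.0091))

pixel_to_latlon = make_pixel_to_latlon(white_house, us_capitol)
print(pixel_to_latlon(800, 300))  # a pixel halfway between the two references
```

If the two references shared a row or a column, one of the two scale factors would be undefined, which is why the sketch rejects that case.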

To do this, I wrote a code package called Image Data Extractor. Documentation for this package will be put up elsewhere, but a brief description is in order here. As seen in the interactions below, the package first asks for the image file that the user wants to analyze.  Then, it allows the user to either double right-click or double left-click on the map.  After the clicks logged in the output that follows this narrative, the image window looks like the figure below. 

<img src='annotated_example_cop1.png'>

<center> Figure 3 : Annotated Example Common Operating Picture </center>

The two insets in this figure display the markers put on the image as a result of the double clicks.  Over the White House, there is a green dot, indicating that this is a reference location in the image.  Over the blue graphics, a dark blue dot is placed, indicating that these are targets for data extraction.  The data output below includes the calculated latitude and longitude, as well as the color values of the pixel clicked (for other analytical purposes).  

In [6]:
import ImageDataExtractor

In [7]:
ImageDataExtractor.main()

Enter the path to the image you wish to extract data from: C:\Users\Matt\Documents\Personal File\Python Code Packages\Data-Science-Portfolio\Mission Command Data Analysis\example_cop1.png
What type of reference are you selecting                        (1 = MGRS, 2 = Decimal Lat/Long)? 2
Please enter details of the reference                        (e.g. [41.999, -93.000]): [38.8977, -77.0365]
What type of reference are you selecting                        (1 = MGRS, 2 = Decimal Lat/Long)? 2
Please enter details of the reference                        (e.g. [41.999, -93.000]): [38.8899, -77.0091]
[38.8899, -77.03861611185087, (0, 176, 240, 255)]
[38.88915053380783, -77.03591624500666, (0, 176, 240, 255)]
[38.889566903914584, -77.03533249001332, (0, 176, 240, 255)]
[38.89003879003558, -77.03591624500666, (0, 176, 240, 255)]
[38.889566903914584, -77.02037376830893, (0, 176, 240, 255)]
[38.889511387900356, -77.01913328894807, (0, 176, 240, 255)]


In [6]:
map_data = [
    [38.8899, -77.03861611185087],
    [38.88915053380783, -77.03591624500666],
    [38.889566903914584, -77.03533249001332],
    [38.889566903914584, -77.02037376830893],
    [38.889434328358206, -77.03303736263737],
    [38.889511387900356, -77.01913328894807]
]

map_df = pd.DataFrame(map_data,columns=['Latitude','Longitude'])

map_df.head()

Unnamed: 0,Latitude,Longitude
0,38.8899,-77.038616
1,38.889151,-77.035916
2,38.889567,-77.035332
3,38.889567,-77.020374
4,38.889434,-77.033037


### Saving the Data for Analysis

Now that all of the data required is converted and stored as a DataFrame, we will store it for use in the next part of this overview.  In that portion, we will cover how to merge the data across these different sets and begin the analysis process. Until then, thank you for your attention.

In [7]:
with open('map_df.pkl','wb') as map_out:
    pickle.dump(map_df,map_out,protocol=2)

with open('network_df.pkl','wb') as network_out:
    pickle.dump(network_df, network_out, protocol=2)

with open('mcwidget_df.pkl','wb') as mcwidget_out:
    pickle.dump(widget_df, mcwidget_out, protocol=2)
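Before moving on, it can be worth confirming that a saved DataFrame loads back unchanged. The following is a minimal sketch of such a round-trip check. It uses a temporary directory and a two-row stand-in for map_df so that it does not touch the real output files. The variable names `restored` and `round_trip_ok` are introduced here for illustration.

```python
import os
import pickle
import tempfile

import pandas as pd

# Two-row stand-in for map_df, using values from the table above.
map_df = pd.DataFrame(
    [[38.8899, -77.03861611185087],
     [38.88915053380783, -77.03591624500666]],
    columns=["Latitude", "Longitude"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "map_df.pkl")
    # Write with the same protocol used in the cell above.
    with open(path, "wb") as map_out:
        pickle.dump(map_df, map_out, protocol=2)
    # Read the file back while the temporary directory still exists.
    with open(path, "rb") as map_in:
        restored = pickle.load(map_in)

# True when the values, dtypes, and labels all survive the round trip.
round_trip_ok = restored.equals(map_df)
print(round_trip_ok)
```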