# **Section 1** - Web App
----
Our implementation of ZEBRA runs on the web, using the Python web framework Flask. We chose this so that the implementation isn't dependent on any specific operating system and can be accessed by anyone with a smart watch. In addition, most team members had experience developing in Python and preferred the language.



## User Story: Setting up Web App
-----

### Motivation:

We used Flask because it is easy to set up and use. No additional database configuration or server setup is required. Once Flask is installed, running the application is as simple as running the command ```flask run```.

### Folder Structure

Part of setting up a Flask application is making sure the folder structure is correct, so that the app knows where to find each page. The Flask app is stored in the folder named **zebra_web**. Since our git repository also holds other folders, such as the watch data and our backend, the top-level folder **Csc490_Zebra_Project** is not just a Flask app.

Inside the **zebra_web** directory there is the **\__init__\.py** file, a **templates** folder and a **static** folder. The **\__init__\.py** file is the main Python file. All the HTML pages are stored in the **templates** folder. Static content, such as the CSS and the model, is stored in the **static** folder.

```
Csc490_Zebra_Project
|
└─── zebra_web
      |
      ├─── __init__.py
      ├─── static
      |     ├─── css
      |     └─── model.joblib
      └─── templates
            ├─── index.html
            ├─── login.html
            ├─── main.html
            └─── ...
```

### Running Web App Locally 
To run the Flask server locally, run the following commands in your terminal, starting from the root directory **Csc490_Zebra_Project**. The commands are written for Windows PowerShell; other shells set environment variables differently (for example, `export FLASK_APP=zebra_web` in bash). After running all of the commands, open a browser and go to ```http://localhost:5000/```.

In [0]:
cd zebra_web

pip install passlib
pip install flask
pip install numpy
pip install -U scikit-learn
pip install joblib

$env:FLASK_APP = "zebra_web"
$env:FLASK_ENV = "development"

flask run

Running the Flask server for the first time calls the `setup()` function, which creates and initializes all the global variables that we use for storing data:

In [0]:
all_keys_pressed = {}
all_keys_released = {}
watch_data = {}
authenticate = {}

def setup():
    all_keys_pressed["data"] = []
    all_keys_released["data"] = []
    watch_data["data"] = []
    authenticate["data"] = []

setup()

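We wrap each list in a module-level dictionary instead of using bare global lists. Flask itself permits either, but replacing a bare global list from inside a function requires a `global` declaration, while replacing an entry such as `all_keys_pressed["data"]` does not. Since we only have one user account, we didn't worry about the global variables being shared by every request.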
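
To illustrate why the dictionary wrapper is convenient, here is a minimal sketch in plain Python, independent of Flask. The names `bare_list`, `wrapped` and the `reset_*` functions are made up for this example and are not part of our code.

```python
# Hypothetical names for illustration only; not part of the ZEBRA code base.
bare_list = [1, 2, 3]
wrapped = {"data": [1, 2, 3]}


def reset_bare_without_global():
    # This assignment creates a new LOCAL name; the module-level list is untouched.
    bare_list = []
    return bare_list


def reset_bare_with_global():
    # Rebinding a module-level name needs an explicit declaration.
    global bare_list
    bare_list = []


def reset_wrapped():
    # Replacing a dictionary entry mutates the shared dict, so no declaration is needed.
    wrapped["data"] = []


reset_bare_without_global()
still_full = list(bare_list)  # copy taken before the real reset
reset_bare_with_global()
reset_wrapped()
```

After running this, `still_full` still holds the three original items, while both `bare_list` and `wrapped["data"]` are empty.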

### References
1. https://flask.palletsprojects.com/en/1.1.x/tutorial/factory/
2. https://flask.palletsprojects.com/en/1.1.x/tutorial/layout/

## User Story: Login Request Form
-----

### Motivation:
In order to demonstrate continuous authentication in our app, there needs to be a page where users can log in, so that they can later be logged out when their input doesn't match the watch data.

### Design Consideration:
Since we aren’t worried about creating a login process with multiple users, the login functionality will simply be a web page that accepts a set of default credentials, username and password, and once authenticated the page will redirect users to the main page. There is no backend configuration or lookup that needs to be done. 

### Implementation:
Following tutorials found online<sup>1,2</sup>, we created an HTML file named `login.html` that requires users to enter a username and password, with a button that submits the credentials for validation.
Clicking the button submits the form as a POST request to the Flask web app. Below is a snippet of the credentials form from the HTML page.


In [0]:
<form class="form-signin" action="login" method="POST" style="margin-bottom:50px;">
    <label for="inputUserName" class="sr-only">User Name</label>
    <input type="text" name="inputUserName" id="inputUserName" 
      class="form-control" placeholder="User name" required>

    <label for="inputPassword" class="sr-only">Password</label>
    <input type="password" name="inputPassword" id="inputPassword" 
      class="form-control" placeholder="Password" required>

    <div class="text-center">
        <button id="btnLogin" class="btn btn-lg btn-primary" type="submit">
            Login
        </button>
    </div>
</form>

The URL endpoint, when running locally, is ```http://localhost:5000/login```. Flask maps this route to the `login()` function below, defined in **\__init__\.py**. When the user first opens the page, which is a `GET` request, the login page is displayed using Flask's `render_template()` function, which renders the Jinja template.

In [0]:
@app.route('/login', methods=['GET', 'POST'])
def login():
    error = None
    if request.method == 'POST':
        if sha256_crypt.verify(request.form['inputUserName'], userandpassword) and \
            sha256_crypt.verify(request.form['inputPassword'], userandpassword):
            return redirect(url_for('main'))
        else:
            flash('You entered the wrong credentials. Please try again', 'danger')
            error = 'Invalid Credentials. Please try again.'
    return render_template('login.html', error=error)


### Challenges
1. Understanding how POST requests work, specifically within Flask, and how they can be used with forms
2. Working out how a single Flask function can both display a web page and verify a POST request

### References
1. https://realpython.com/introduction-to-flask-part-2-creating-a-login-page/
2. https://code.tutsplus.com/tutorials/creating-a-web-app-from-scratch-using-python-flask-and-mysql--cms-22972


## User Story: Verify Login
-----

### Motivation:
Once the user sends a login request with a set of credentials, there needs to be a process that verifies and authenticates them.

### Design Consideration:
* There's only one default credential, since we aren't worried about registering new users
* The credential can't be stored in plain text; it needs to be securely hashed.

### Implementation:
When the user submits their credentials, the `POST` request is sent and the code inside the if statement is executed. If the username and password are correct, the user is redirected to the **main.html** page with the ```redirect(...)``` function.

In [0]:
if request.method == 'POST':
    if sha256_crypt.verify(request.form['inputUserName'], userandpassword) and \
            sha256_crypt.verify(request.form['inputPassword'], userandpassword):
        return redirect(url_for('main'))

We hash the credentials with passlib<sup>1</sup>, a password-hashing library, using its **SHA256**-based `sha256_crypt` scheme. The ```sha256_crypt.verify(...)``` call then checks a submitted value against the stored hash.

In [0]:
from passlib.hash import sha256_crypt
# hash() is the current name; encrypt() is deprecated since passlib 1.7
userandpassword = sha256_crypt.hash("admin")
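
To show the idea behind hash-then-verify without needing passlib installed, here is a rough sketch using only the Python standard library. It swaps in PBKDF2-HMAC-SHA256 from `hashlib`, which is a different algorithm and hash format from passlib's `sha256_crypt`. The helper names `hash_secret` and `verify_secret` are made up for this example.

```python
import hashlib
import hmac
import os


def hash_secret(secret, salt=None, rounds=100_000):
    """Return (salt, digest) for a secret using PBKDF2-HMAC-SHA256.

    Stand-in for passlib's sha256_crypt; NOT the same algorithm or format.
    """
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every stored hash
    digest = hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, rounds)
    return salt, digest


def verify_secret(candidate, salt, expected, rounds=100_000):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_secret(candidate, salt, rounds)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_secret("admin")
```

Only `salt` and `stored` would need to be kept; `verify_secret("admin", salt, stored)` succeeds and any other candidate fails, so the plain-text credential never has to be stored.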

### References
1. https://pythonprogramming.net/password-hashing-flask-tutorial/

## User Story: Login Error Flashing
-----

### Motivation:
If the user enters the wrong credentials, a warning message should flash to let them know.

### Design Consideration:
Display the text in a way that pops and stands out. To achieve this look we use Bootstrap's alert styling.

### Implementation:
If the user enters the wrong credentials, a flash message is set in the ```login()``` function, in **\__init__\.py**.

In [0]:
@app.route('/login', methods=['GET', 'POST'])
def login():
    ...
    if request.method == 'POST':
        ...
        else:
            flash('You entered the wrong credentials. Please try again', 'danger')
            error = 'Invalid Credentials. Please try again.'
    return render_template('login.html', error=error)


When **login.html** re-renders, it contains a section that only appears because a flash message was set earlier<sup>1</sup>. Below is the snippet that displays the error message about the wrong credentials inside a red rectangle.

In [0]:
<div>
    <div>
        {% with messages = get_flashed_messages(with_categories=true) %}
        {% if messages %}
        {% for category, message in messages %}
        <div class="alert alert-{{ category }} alert-dismissible fade in" role="alert">
            <span>{{ message }}</span>
        </div>
        {% endfor %}
        {% endif %}
        {% endwith %}
    </div>
</div>

### Challenges
1. Understanding how flash works within Flask, since flash is specifically designed for displaying error messages or giving feedback to users after an action has been done.


### References
1. https://flask.palletsprojects.com/en/1.1.x/patterns/flashing/

## User Story: Main Page
-----

### Motivation:
Once the user is logged in they need a page to be redirected to, which is the main page. The main page also displays the information that we currently need; in the end it will display information about keyboard input.

### Design Consideration:
Since we need to keep track of the keys that the user inputs, there needs to be something on the web page that the user can type into, such as a text box.

The keyboard input should display information such as the timestamp of the key, which key was pressed, along with the specific action (key press, key up, etc.)

### Implementation:
The endpoint for the function is `http://localhost:5000/main`. Like the login, the endpoint is mapped in **\__init__\.py** to a function named `main()`. The function simply renders that HTML page.


In [0]:
@app.route('/main')
def main():
    return render_template('main.html')

The HTML page, **main.html**, has several parts. First, it contains a div with the text area where the user types and from which we record the data.

In [0]:
<div>
    <h4 id="message" style="margin-top: 200px;"> Please type something...</h4>
    <textarea id="textarea" rows="4" cols="100"> </textarea>
</div>

Another part of the page handles the keylogging, which is explained in other user stories. That code lives in `textarea.addEventListener(...)`.

There is another set of divs under the event listeners that are empty at the moment. These divs are placeholders to display text regarding keyboard input; the html within the divs gets updated whenever there’s any keyboard input.

In [0]:
<div class="container">
    <div class="row">
        <div class="col-md-2">
            <p> Timestamp </p>
        </div>
        <div class="col-md-2">
            <p> Key action </p>
        </div>
        <div class="col-md-2">
            <p> Key </p>
        </div>
    </div>
    <div class="row">
        <div class="col-md-2">
            <p id="timestamp"> </p>
        </div>
        <div class="col-md-2">
            <p id="keyaction"> </p>
        </div>
        <div class="col-md-2">
            <p id="key"> </p>
        </div>
    </div>
</div>

A snippet of code inside that event listener updates the HTML text: it looks up each element by its id and changes its inner HTML. New entries are prepended so that the newest information shows at the top of the screen. Due to spacing and formatting, `prepend` did not render neatly for `key`, so we prepend manually there, as shown below.


In [0]:
document.getElementById('timestamp').prepend(timestamp + "\n");
document.getElementById('keyaction').prepend("keypress\n");
document.getElementById('key').innerHTML = event.key + "<br>" + document.getElementById('key').innerHTML;

## User Story: Log Out - Inactivity 
-----

### Motivation:
When the user hasn't interacted with the system for a set amount of time (for example, 1 minute), log them off. As part of an authentication process, a period of inactivity indicates that the user is no longer using the system: they have stepped away or are using a different app. For security purposes they should be logged off.

### Design Consideration:
When the user is logged off they should be redirected to a page that notifies them that they’ve been logged out. From there the user can then login using the nav bar login button. 

### Implementation:
The user will be logged out while on the main page, so the event needs to be fired from the **main.html** page. Below is a script inside **main.html** that sets a timer to click the logout button 60 seconds after the last key release or mouse movement<sup>1</sup>. Whenever there is another key release or mouse movement the timer is reset, so the logout only executes 60 seconds after the last event.


In [0]:
<script>
    let timeout = null;
    window.addEventListener('keyup', function (e) {
        clearTimeout(timeout);
        timeout = setTimeout(function () {
            document.getElementById("logout").click();
        }, 60000);
    });
    window.addEventListener('mousemove', function (e) {
        clearTimeout(timeout);
        timeout = setTimeout(function () {
            document.getElementById("logout").click();
        }, 60000);
    });
</script>

After the logout button is clicked, it redirects the user to a logout page, **logout.html** that simply has text saying `You've been logged out` and a button that allows the user to log back in. 

In [0]:
<div class="jumbotron">
    <h1 style="text-align: center;">You've been logged out</h1>
    <div class="text-center">
        <a class="btn btn-lg btn-primary" href="login" role="button">Login</a>
    </div>
</div>

The logout page is rendered by the logout function in the **\__init__\.py** file. The endpoint of this URL is `http://localhost:5000/logout`.

When the user first lands on the page, only the last line is executed which renders the **logout.html** page. The if statement regarding the POST method executes when the user hits the login button on the page. The login button triggers a POST request. Once received it will redirect users to the login page, **login.html**. 

In [0]:
@app.route('/logout', methods=['GET', 'POST'])
def logout():
    if request.method == 'POST':
        return redirect(url_for('login'))

    return render_template('logout.html')

### Challenges
- Figuring out how to create a function that would timeout or trigger some event after some time. Luckily found a simple tutorial online. 
- How to trigger or render another webpage
  - Work around was simulating a click on the logout button
- How to communicate to a function that a button was pressed and do some action
  - Work around was to use a POST request to accept an incoming button request

### References
1. https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript


## User Story: Log Out - Different user 
-----

### Motivation:
Using the predictions from our model (whose values are produced in other user stories), create a function that logs the user off when their keystroke movements don't align with the predictions.

### Design Consideration:
We need to take the ```prediction()``` function and its value into consideration when designing this functionality, and we need to decide on a threshold for how many false positives can be allowed. If x of the predictions are positive, leave the user logged in; otherwise log them out.
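
As a rough sketch of what such a threshold could look like (this was never implemented in our app), the class below keeps a sliding window of recent batch predictions. The class name `AuthWindow` and the values `window_size=5` and `min_positive=3` are assumptions chosen only for illustration.

```python
from collections import deque


class AuthWindow:
    """Hypothetical helper: decide whether to keep the user logged in."""

    def __init__(self, window_size=5, min_positive=3):
        # Only the most recent `window_size` predictions are kept
        self.results = deque(maxlen=window_size)
        self.min_positive = min_positive

    def record(self, authenticated):
        """Store one prediction; return True if the user should stay logged in."""
        self.results.append(bool(authenticated))
        # Do not log anyone out until the window holds enough evidence
        if len(self.results) < self.results.maxlen:
            return True
        return sum(self.results) >= self.min_positive


window = AuthWindow(window_size=5, min_positive=3)
decisions = [window.record(r) for r in [True, False, False, True, False, False]]
```

With these made-up inputs the first four calls return True because the window is not full yet, and the fifth and sixth return False because fewer than three of the last five predictions were positive.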

### Implementation:
Unfortunately we were not able to implement this because the predictions were not consistently accurate: the user would have been logged off the moment they logged in.


## User Story: Keyboard Data Collection (User Side)
-----

### Motivation:
The keyboard data is the primary information needed to be paired with the smart watch information. We need to collect keystroke data so it can be used with our model. The data needs to be collected as the user is typing, and then sent to the Flask server, where the information can be further processed and formatted.

### Design Consideration:
Capture keyboard data using the text box that was set up in _User Story: Main Page_, in the **main.html** page. An event listener needs to be added to the text box so that each key event can be recorded.

The formatting of the keyboard data will be:


In [0]:
{'timestamp': timestamp, 'action': 'keypress' or 'keyup', 'key': event.key};

- **timestamp** is the Unix time (in milliseconds, from `Date.now()`) when the key event occurred
- **action** is either `keypress` or `keyup`
- **key** is the actual letter, special character or number pressed.

_Note:_ The **action** is used in another user story to help separate these key events. After formatting the information and logging it to the web page (which is taken care of by _User Story: Main Page_), the keystroke is sent in a POST request. The Flask server then processes and further refines the data.

### Implementation:
We implement key listeners in JavaScript for the `keyup` and `keypress` events, attached to the text box where the user types.

Here is an example for the `keyup` event listener:


In [0]:
textarea.addEventListener('keyup', function (event) {

    var timestamp = Date.now();
    keystrokes = {'timestamp': timestamp,'action':'keyup','key':event.key};
    
    // ...

In the event listener we have an ajax call that makes the POST request with the keyboard input data. The POST request is received by the flask server, in **\__init__\.py** under the `get_keystrokes` function. 


In [0]:
$.ajax({
        type: "POST",
        url: "/keyboard",
        contentType: "application/json;charset=UTF-8",
        dataType: "json",
        data: JsonKeys,
    }).done(function (data) {
        console.log(data);
        console.log("successed, keyup");
        //success, fires when the ajax call is complete
    }).fail(function (err) {
        console.log(err);
        console.log("failed keyup");
    });

## User Story: Keyboard Data Collection (Server Side)
-----

### Motivation:

Once we received our keyboard data on the flask server, we want to format it in a way that will make it easier to predict on. 

### Design Considerations: 

Since our neural network is predicting sequences that start from the first `keyup` event and end with the next `keypress` event, we need to separate the incoming data by key events. We also need to assign each key a group based on how close the keys are to each other, in the same way that they were grouped when we trained the neural net.

### Implementation:
When we have an incoming POST request at the /keyboard route the `get_keystrokes` function extracts the key event type, the key that the event was triggered on, and the time when the event was triggered (in unix time).


In [0]:
if request.method == 'POST':
    data = request.get_json()
    if (data):
        event = data["action"]
        key = data["key"].lower()
        timestamp = int(data["timestamp"])
        seperate_key_events(timestamp, event, key)

In that function the `seperate_key_events` function gets called with the parameters (timestamp, event, key). 
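
The extraction step can be tried outside Flask with a small stand-alone sketch. The helper `parse_keystroke` and the set `VALID_ACTIONS` are hypothetical names for this example. Unlike the route above, the sketch also rejects malformed payloads, which is an added assumption about how bad input might be handled.

```python
import json

# Hypothetical: the two actions the front end is described as sending
VALID_ACTIONS = {"keypress", "keyup"}


def parse_keystroke(payload):
    """Return (timestamp, event, key) from a JSON payload, or None if it is invalid."""
    try:
        data = json.loads(payload)
    except (TypeError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    try:
        event = data["action"]
        key = str(data["key"]).lower()      # same lower-casing as the route
        timestamp = int(data["timestamp"])  # Unix time sent by the browser
    except (KeyError, TypeError, ValueError):
        return None
    if event not in VALID_ACTIONS:
        return None
    return timestamp, event, key


parsed = parse_keystroke('{"timestamp": 1585799543415, "action": "keyup", "key": "I"}')
```

Here `parsed` is a tuple in the same `(timestamp, event, key)` order that `seperate_key_events` expects.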

In `seperate_key_events` we separate the `keypress` events and `keyup` events into two separate lists. This is so that the same keys line up at the same index.



In [0]:
if event == "keypress":
    all_keys_pressed["data"].append([timestamp, key])
elif event == "keyup":
    all_keys_released["data"].append([timestamp, key])

# After a few keys have been pressed, the lists will look something like this:
# >>> all_keys_pressed["data"]
# [[1585799543415, "i"], [1585799543724, " "], [1585799544082, "a"]]
# >>> all_keys_released["data"]
# [[1585799543467, "i"], [1585799543771, " "], [1585799544138, "a"]]

We filter out keys that do not have `keypress` events by adding the condition `and key not in special_keys["keys"]` to our if statements.

In [0]:
special_keys = {"keys": ["shift", "control", "alt", "meta", "backspace", "arrowleft", "arrowright", "arrowup", "arrowdown"]}

if event == "keypress" and key not in special_keys["keys"]:
    # ...
if event == "keyup" and key not in special_keys["keys"]:
    # ...

Finally we need to group our keys based on their location on the keyboard. We mapped each key as follows:

In [0]:
keys_mapped = {'1': 1, '2': 1, '3': 1, '!': 1, '@': 1, '#': 1, '4': 2, '5': 2,
               '6': 2, '$': 2, '%': 2, '^': 2, '7': 3, '8': 3, '&': 3, '*': 3,
               '9': 4, '0': 4, '(': 4, ')': 4, '-': 5, '_': 5, '=': 5, '+': 5,
               'backspace': 5, 'q': 6, 'tab': 6, 'w': 7, 'e': 7, 'r': 7, 't': 8,
               'y': 8, 'u': 9, 'i': 9, 'o': 9, 'p': 9, '[': 10, ']': 10,
               '\\': 10, '{': 10, '}': 10, '|': 10, 'a': 11, 's': 11, 'd': 11,
               'f': 12, 'g': 12, 'h': 12, 'j': 13, 'k': 13, 'l': 13, ';': 14,
               "'": 14, '"': 14, ':': 14, 'enter': 14, 'z': 15, 'x': 15, 'c': 15,
               ' ': 16, 'v': 16, 'b': 16, 'n': 16, 'm': 16, ',': 17, '.': 17, '/': 17,
               '<': 17, '>': 17, '?': 17, 'arrowleft': 18, 'arrowright': 18,
               'arrowup': 18, 'arrowdown': 18} 

When storing our key events in the `all_keys_pressed` and `all_keys_released` dictionaries, we store the group number that each key is mapped to. If the key isn't in the mapping, it is assigned to the "other" group, 19.

Here is the code with that addition for key pressed event:

In [0]:
all_keys_pressed["data"].append([timestamp, keys_mapped.get(key, 19)])
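
Putting the filtering, grouping and storing steps together, a stand-alone sketch might look like the following. It uses an abbreviated stand-in for `keys_mapped`, passes the two lists in as arguments instead of using the globals, and the function name `store_key_event` is made up for this example.

```python
SPECIAL_KEYS = {"shift", "control", "alt", "meta", "backspace",
                "arrowleft", "arrowright", "arrowup", "arrowdown"}

# Abbreviated stand-in for the full keys_mapped table
KEYS_MAPPED = {"a": 11, "s": 11, "i": 9, " ": 16}
OTHER_GROUP = 19


def store_key_event(pressed, released, timestamp, event, key):
    """Append [timestamp, group] to the matching list, skipping special keys."""
    if key in SPECIAL_KEYS:
        return
    entry = [timestamp, KEYS_MAPPED.get(key, OTHER_GROUP)]
    if event == "keypress":
        pressed.append(entry)
    elif event == "keyup":
        released.append(entry)


pressed, released = [], []
events = [
    (1585799543415, "keypress", "i"),
    (1585799543467, "keyup", "i"),
    (1585799543500, "keyup", "shift"),   # special key: ignored
    (1585799543724, "keypress", " "),
    (1585799543771, "keyup", " "),
    (1585799544082, "keypress", "~"),    # unmapped key: falls into group 19
]
for timestamp, event, key in events:
    store_key_event(pressed, released, timestamp, event, key)
```

The `shift` release is dropped, so the two lists stay aligned by index, and the unmapped `~` lands in the "other" group.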

## User Story: Get and Mapping Sequences
-----

### Motivation:

When we have collected enough typing data and watch data for a batch, we would like to begin prepping our data to be used for predictions. We want to format the watch data in a batch of sequences from when a key is released to when the next key is pressed.

### Design Considerations: 

* We need to choose a batch size that is small enough to be able to quickly start predicting, but also big enough to compensate for the watch data delay. 

We decided to make our model predict on batches of 10. For that we need to wait for 11 separate key up events and 11 key pressed events, because we remove the first key pressed event and the last key up event, and pair the others to form sequences.

### Implementation:

In our `seperate_key_events` function we add a condition that calls the `_get_sequences()` function once enough key events of each type have been stored:

In [0]:
if len(all_keys_pressed["data"]) > 11 and len(all_keys_released["data"]) > 11:
    ret = _get_sequences()

The `_get_sequences` function only starts building sequences if there is enough watch data, which it checks by comparing the time of the latest watch reading with the time of the last key event in the batch:

In [0]:
def _get_sequences():
    # Make sure that the watch data exists
    if watch_data["data"] == []:
        return None

    pressed = all_keys_pressed["data"][:11]
    released = all_keys_released["data"][:11]

    # Make sure the last item in the watch data is >= time last key is released
    time_released = released[-1][0]
    time_watch = watch_data["data"][-1][0]

    if time_watch < time_released:
        return None

To create a sequence in our batch, we remove the first and last actions to create the offset:

In [0]:
    # Remove the first pressed event and the last released event
    pressed = pressed[1:]
    released = released[:-1]
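
To make the offset concrete, the sketch below pairs each release with the following press on made-up timestamps. The function `pair_batch` is a hypothetical helper that leaves out the watch-data check and returns `(start, end, key_group)` tuples instead of the two lists.

```python
def pair_batch(pressed, released, batch_size=10):
    """Pair release i with press i + 1; return None if there are too few events."""
    needed = batch_size + 1
    if len(pressed) < needed or len(released) < needed:
        return None
    pressed = pressed[:needed][1:]      # drop the first press
    released = released[:needed][:-1]   # drop the last release
    # Each interval runs from one key's release to the next key's press
    return [(r[0], p[0], p[1]) for r, p in zip(released, pressed)]


# Made-up data: key i is pressed at 100*i and released 40 ms later
pressed = [[100 * i, i] for i in range(11)]
released = [[100 * i + 40, i] for i in range(11)]
intervals = pair_batch(pressed, released)
```

Eleven presses and eleven releases yield ten intervals, the first running from time 40 to time 100 and labelled with the key that ends it.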

If the `_get_sequences` function returns a sequence of keys, it is time to call our `map_sequences` function to map each keyup-keypress pair into a sequence of movements using the watch data:

In [0]:
# In seperate_key_events: 
ret = _get_sequences()

if ret is not None:

    # We want to remove the batch from the currently stored because we have 
    # (or are about to) use it
    all_keys_pressed["data"] = all_keys_pressed["data"][11:]
    all_keys_released["data"] = all_keys_released["data"][11:]

    pressed, released = ret[0], ret[1]
    sequences, predictions = map_sequences(pressed, released)

In `map_sequences` we go through each of the 10 pairs of key events and locate the acceleration data that falls between each start and end time. We then store these sequences in a list, and the keys they correspond to in a different list.

In [0]:
    for i in range(len(keys_pressed)):
        start = int(keys_released[i][0])
        end = int(keys_pressed[i][0])
        key = keys_pressed[i][1]

        sequence = []

        while len(watch_data["data"]) != 0:
            # Remove the line so we don't have to iterate through everything again
            line = watch_data["data"].pop(0)
            if line == [''] or len(line) < 4:
                continue

            time, acc_x, acc_y, acc_z = line[0], line[1], line[2], line[3]

            current_time = int(time)

            # Happens before the start - keep searching
            if current_time < start:
                continue

            # Happens after the end - we are done with our sequence
            if current_time >= end:
                break

            sequence.append([float(acc_x), float(acc_y), float(acc_z)])
        sequences.append(sequence)
        predictions.append(key)

    return (sequences, predictions)
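
The same windowing idea can be tried on toy data with the sketch below. It is a variation, not our implementation: it walks the watch rows with an index instead of `pop(0)`, assumes the rows are clean and sorted by time, and does not discard the first reading after an interval. The name `slice_watch_data` is made up for this example.

```python
def slice_watch_data(watch_rows, intervals):
    """Collect [x, y, z] readings with start <= time < end for each interval.

    watch_rows: rows of [time, x, y, z], assumed sorted by time.
    intervals: list of (start, end) pairs, assumed sorted and non-overlapping.
    """
    sequences = []
    idx = 0
    for start, end in intervals:
        sequence = []
        # Skip readings that happen before the interval starts
        while idx < len(watch_rows) and int(watch_rows[idx][0]) < start:
            idx += 1
        # Keep readings until the interval ends
        while idx < len(watch_rows) and int(watch_rows[idx][0]) < end:
            _, x, y, z = watch_rows[idx][:4]
            sequence.append([float(x), float(y), float(z)])
            idx += 1
        sequences.append(sequence)
    return sequences


# Made-up readings at times 0, 10, ..., 90
watch_rows = [[t, float(t), 0.0, 9.8] for t in range(0, 100, 10)]
sequences = slice_watch_data(watch_rows, [(15, 45), (60, 80)])
```

The first interval picks up the readings at times 20, 30 and 40, and the second picks up those at 60 and 70.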

## User Story: Set up prediction data
-----

### Motivation:

The Random Forest Classifier by sklearn uses numpy arrays, so we need to convert all our lists to numpy arrays. We also need to make every sequence the same length, otherwise the model will raise errors. Since we trained the model on sequences of length 270, we need to pad the shorter sequences and cut the longer ones.

### Implementation:
After `map_sequences` returns in the `seperate_key_events` function, the `setup_predict` function is called. It first pads the sequences by calling `padding(sequences)`, which appends [0, 0, 0] to each sequence until it has length 270. If a sequence is longer than 270, everything after the 270th entry is ignored.

Finally we convert the batch of sequences into a numpy array of shape `10 x 270 x 3` using `np.stack()`:


In [0]:
import numpy as np


def padding(sequences):
    max_len = 270  # sequence length the model was trained on

    padded_sequences = []
    for sequence in sequences:
        # Cut sequences longer than max_len; slicing also copies the list
        sequence = sequence[:max_len]
        # Pad shorter sequences with zero readings
        while len(sequence) < max_len:
            sequence.append([0, 0, 0])
        padded_sequences.append(sequence)

    # Stack the list of sequences into one array of shape (batch, 270, 3)
    np_sequences = np.stack(padded_sequences)

    return np_sequences

In `setup_predict` we then call the predict function using the returned ys. 

## User Story: Making Prediction
-----

### Implementation:
`predict()` is called in `setup_predict()` using the ys we just generated and padded, and the ts (key predictions we got from `map_sequences`)

We import and load our model at the beginning by loading the joblib file, which is stored in the static folder:

In [0]:
import os

import joblib

SITE_ROOT = os.path.realpath(os.path.dirname(__file__))
path = os.path.join(SITE_ROOT, "static", "weights.joblib")
model = joblib.load(path)

Back in predict() we have to reshape our 3-dimensional matrix into a 2-dimensional matrix:

In [0]:
N, nx, ny = batch_of_10.shape
ys = batch_of_10.reshape((N,nx*ny))
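
To see what this reshape does to the data, here is a small pure-Python sketch of the same flattening on a toy batch. The function `flatten_batch` is a made-up helper that only mirrors the row-major order numpy uses by default.

```python
def flatten_batch(batch):
    """Turn a nested list of shape (N, steps, 3) into shape (N, steps * 3)."""
    return [[value for reading in sequence for value in reading]
            for sequence in batch]


# Toy batch: 2 sequences, 2 readings each, 3 axes per reading
batch = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [0, 0, 0]]]
flat = flatten_batch(batch)

# For the real batch, each sequence becomes one row of 270 * 3 features
features_per_sequence = 270 * 3
```

Each sequence ends up as one flat row with its readings laid end to end, so a batch of shape `10 x 270 x 3` becomes `10 x 810`.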

Then we just call the model's `score()` function to have our model classify our input and score the classified results against the expected results:

In [0]:
acc = model.score(ys, ts)  # ys is the flattened batch, shape (10, 810)

If the accuracy is over 20% we return True, otherwise False.
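
For a classifier, sklearn's `score()` returns the mean accuracy, that is, the fraction of predicted labels that match the expected ones. The sketch below reproduces that decision by hand on made-up key groups. The function name `is_authenticated` and the sample lists are assumptions for illustration.

```python
def is_authenticated(predicted, expected, threshold=0.2):
    """Return True if the fraction of matching labels is above the threshold."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal in length")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    accuracy = correct / len(expected)
    return accuracy > threshold


# Made-up key groups for one batch of 10
expected = [9, 16, 11, 11, 7, 8, 9, 12, 13, 16]
three_right = [9, 16, 11, 0, 0, 0, 0, 0, 0, 0]
two_right = [9, 16, 0, 0, 0, 0, 0, 0, 0, 0]

result_three = is_authenticated(three_right, expected)
result_two = is_authenticated(two_right, expected)
```

Three matches out of ten pass, while exactly two out of ten do not, because the comparison is strictly "over" 20%.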

Each result is stored as a row, a dictionary holding the batch of keys and whether it was authenticated, which is appended to the list in `authenticate["data"]`:

In [0]:
def setup_predict(sequences, predictions):
    ys = padding(sequences)

    auth = predict(ys, predictions)
    row = {"Keys": predictions, "Authenticated": auth}

    authenticate["data"].append(row)

## User Story: Displaying Predictions
-----

### Motivation:

Since we decided not to implement a logout feature, for now we just have our results being displayed in a table showing what batch of keys were used, and True/False if the user has been authenticated.

### Implementation:
We created `predictions_log.html` in templates, which displays a table with the given headers and data. The template takes `{{colnames}}` and `{{preds}}` as variables:

In [0]:
<!DOCTYPE html>
<html>
    <head>
        <!-- Form template from https://www.w3docs.com/tools/editor/5910 -->
        <title>Display Predictions</title>
        <link rel="stylesheet" href="/static/css/table.css">
    </head>
    <body>
        <div class="preds">
            <table id="table">
                <thead>
                    <tr>
                        {% for col in colnames %}
                        <th>{{ col }}</th>
                        {% endfor %}
                    </tr>
                </thead>
                <tbody>
                    {% for pred in preds %}
                    <tr>
                        {% for col in colnames %}
                        <td>{{ pred[col] }}</td>
                        {% endfor %}
                    </tr>
                    {% endfor %}
                </tbody>
            </table>
        </div>
    </body>
</html>

The tutorial for how to create tables used a css file, so we used that as well, and slightly modified it.

When a user visits the `/pred` link in the flask app, the `printpred()` function is run, which renders the template using the input keys, and true/false authenticated values as values for our rows/cols:



In [0]:
@app.route('/pred', methods=["GET"])
def printpred():

    colnames=["Keys", "Authenticated"]
    rows = authenticate["data"]
    return render_template("predictions_log.html", colnames=colnames, preds=rows)

### References:
1. Table and Css: https://www.w3schools.com/howto/howto_js_filter_table.asp

## User Story: Logging watch and keyboard data
-----

### Motivation:
While testing our model, we want to make sure that we are receiving data and that it is the correct data. 

### Implementation
We created a template file called `log.html` which we used to display any list of data that we give it, and the length of the given list. The elements will then be displayed in a list using a for loop:  


In [0]:
<!DOCTYPE html>
<html>
<head>
</head>
        <body style="background-color: white">
                {%for i in range(0, len_data)%}
                    <li>{{data[i]}}</li>
                {%endfor%}
        </body>
</html>

When a user sends a `GET` request to the `/watch` or `/keyboard` links in the web app, the corresponding function returns the rendered template, using `watch_data["data"]` or `keyboard_data["data"]` as the data argument and its length as the length argument.

Here is an example for `/watch`:


In [0]:
@app.route('/watch', methods = ['GET', 'POST'])
def get_watch():
    # ...
    
    if request.method == 'GET':
        len_watch = len(watch_data["data"])
        return render_template("log.html", data=watch_data["data"], len_data=len_watch)

## User Story: Parallelism
-----

### Motivation:
Our web app takes a long time to receive and set up data, especially during the data processing stages. We wanted to speed the process up by using threading to work in parallel.

### Design Considerations: 
A few things to note:
* When introducing threading we need to be very careful when accessing shared data.

* Most of our processing functions access the watch_data and keyboard_data dictionaries, which are shared.

* We can use locks to block other threads from accessing shared data while one thread is already accessing it.

* It probably isn't feasible to use locks in code that is constantly accessing shared data, so we don't use threading in that case.

* We cannot have threads modifying the watch_data and keyboard_data dictionaries/lists because the order matters.

With that in mind, we decided to parallelize only the work from `setup_predict` onward, because that function accesses the shared variable `authenticate` just once, at the very end, and the order in which results are appended there is not important. 

### Implementation

The lock is a global variable that is created when the flask app is run for the first time: `lock = threading.Lock()`. 

In `seperate_key_events` we changed the function to create a thread which calls `setup_predict()`, using the sequences, predictions, and lock as parameters. We use `thread.start()` to start the thread:

In [0]:
def seperate_key_events(timestamp, event, key):
  # ...
  if len(all_keys_pressed["data"]) > 11 and len(all_keys_released["data"]) > 11:
    # ...
    if ret is not None:
      # ...

      sequences, predictions = map_sequences(pressed, released)
      thread = threading.Thread(target=setup_predict, args=(sequences, predictions, lock))
      thread.start()

When we are ready to access `authenticate` we get the lock using `lock.acquire()` and release the lock when finished (or if an error occurs) by calling `lock.release()`:

In [0]:
def setup_predict(sequences, predictions, lock):
  
  # ...

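  # Hold the lock only while touching the shared `authenticate` dict.
  # try/finally releases the lock exactly once, even if append raises;
  # releasing in `except` and again afterwards would release it twice.
  lock.acquire()
  try:
      authenticate["data"].append(row)
  finally:
      lock.release()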
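
The pattern above can be tried in isolation. The following is a minimal, self-contained sketch rather than the project's code: the `authenticate` dictionary here is a stand-in, and the "prediction" is a placeholder multiplication. It uses `with lock:`, which acquires the lock and releases it on exit, including when an exception is raised.

```python
import threading

# Stand-in for the shared `authenticate` dictionary (illustrative only).
authenticate = {"data": []}
lock = threading.Lock()


def setup_predict(row, lock):
    """Do some thread-local work, then record one result in shared state."""
    result = row * 2  # placeholder for the real prediction step
    # `with lock` acquires the lock and always releases it on exit.
    with lock:
        authenticate["data"].append(result)


threads = [threading.Thread(target=setup_predict, args=(i, lock)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker before reading the shared list

# Threads may finish in any order, so sort before inspecting the results.
print(sorted(authenticate["data"]))
```

Because the threads can finish in any order, this only suits shared data where ordering is unimportant, which matches the reasoning in the design considerations.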

## User Story: Mouse Data Collection
-----

**SCRAPPED**

### Motivation:
We are looking to authenticate a user based on their trackpad movements as well as their keystrokes. For this reason, we would like the mouse movements from the webpage to be collected and put into a format that can be sent to our backend server and processed using our tool.

### Implementation:
In a script tag on **main.html**, on every mouse move event, we create a JavaScript object holding the change in X, the change in Y, and a Unix timestamp in milliseconds.

In [0]:
$( "body" ).mousemove(function( e ) {
    var mouseCoords = {
        'X': e.pageX-prevX,
        'Y': e.pageY-prevY,
        'timestamp': Date.now()
    };

We then send this data to our Flask app in a POST request, using JSON as the payload format. 

In [0]:
// Skip events where the pointer did not actually move
if (mouseCoords.X !== 0 || mouseCoords.Y !== 0) {
    var jsonMouseData = JSON.stringify(mouseCoords);
    
    // Send mouse data using a POST call
    $.ajax({ 
        method: "POST", 
        url: "/main",
        data: jsonMouseData
    }).done(function(data){
       // success
    }).fail(function(err){
       // request failed
    });
}

For testing purposes we would also display the data on the page.

In [0]:
$('#mouseMovement').prepend('<p><small>' +(mouseCoords.X)+' , '+(mouseCoords.Y)+' : ' +mouseCoords.timestamp+'</small></p>');

**This idea was scrapped early on because the group realised that the movements on a trackpad were too small to work with, so the mouse data is no longer being displayed in `/main`**

### Challenges: 
One challenge we faced was that server calls were either delayed or not sent at all, because the mouse was generating more than 50 move events per second. In future implementations, the mouse data should be sent in batches of 5 or 10 to prevent delays.

# **Section 2** - Watch App
----
The Apple Watch application was an essential part of the project: it allowed us to collect accelerometer data and to determine the type of activity (running, walking, static) the user is doing.


## User Story: Collecting accelerometer and gyroscope data
-----

### Motivation:

The user's accelerometer and gyroscope data were collected to train the model and, later, to let the web application determine whether the person using it should be deauthenticated. 


### Implementation:
This task was implemented using the CoreMotion framework, which allowed us to check whether motion data is available. If device motion was not available, the application would not proceed. 

In [0]:
if !motionManager.isDeviceMotionAvailable {
               print("Device Motion is not available.")
               return
}

If the device motion data was available, then we would call the processDeviceMotion() with a deviceMotion (the motion data) parameter to process the data. It would also update the user screen to say "Collecting data".

In [0]:
 // Unwrap the optional safely instead of force-unwrapping with `!`
 if let deviceMotion = deviceMotion {
                   self.processDeviceMotion(deviceMotion)
                   self.accelerationString = "Collecting data"
                   self.gyroscopeString = "Collecting data"
 }

The processDeviceMotion() function logged the data to the IDE, which allowed us to track the data collection progress during development. It also created and sent the POST request, which is described in a separate user story. 

In [0]:
 os_log("Motion: %@, %@, %@, %@",
                 String(timestamp),
                 String(deviceMotion.userAcceleration.x),
                 String(deviceMotion.userAcceleration.y),
                 String(deviceMotion.userAcceleration.z))

## User Story: Determining the type of user activity (eliminated in the development process)
-----

### Motivation:

In addition to recording the user's accelerometer and gyroscope data, the group decided to also determine the type of activity the user was performing (static, walking, running, etc.), which would allow the system to log the user out if they were not performing a static (sitting-typing or standing-typing) activity. This would serve as an additional logout feature alongside the main one of deauthenticating the user when the keyboard and watch data do not correlate.


### Implementation:
CoreMotion allowed us to implement this feature using a CMMotionActivityManager object, which provides data about the device's motion activity. This was implemented in a similar manner to the collection of the accelerometer data. If the activity data was available, the function would store it in a string (which could later be used to display the activity or send it to the server).

In [0]:
func startActUpdates() {
        if CMMotionActivityManager.isActivityAvailable() {
            motionActivityManager.startActivityUpdates(to: queue, withHandler: {
                activityData
                in
                if activityData!.walking == true {
                    self.activityStr = "Walking"
                } else if activityData!.running == true {
                    self.activityStr = "Running"
                } else if activityData!.automotive == true {
                    self.activityStr = "Automotive"
                } else if activityData!.stationary == true {
                    self.activityStr = "Stationary"
                }
            })
        }
    }

### Challenges:

This user story was eliminated during development because the watch took a long time (usually about a minute) to process the motion data and switch between activities, which was too slow for our project.

## User Story: Sending accelerometer data to the server
-----

### Motivation:

The web application running on a server requires movement data from a smartwatch in order to determine whether a user should be deauthenticated. The data is sent using an HTTPS POST request. 


### Implementation:
To keep the data organized, we created a new struct, WatchData, that stores the data that needs to be sent to the server.

In [0]:
 struct WatchData: Codable {
        var Ax: Double
        var Ay: Double
        var Az: Double
        var TimeStamp: Int64
  }

Then an HTTPS POST request is created after collecting the data, which is sent to the server as a JSON object.

In [0]:
let s = WatchData(Ax: deviceMotion.userAcceleration.x, Ay: deviceMotion.userAcceleration.y, Az: deviceMotion.userAcceleration.z, TimeStamp: timestamp)

var request = URLRequest(url: URL(string: "https://dev3.horizon.tom.srl/watch")!)
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
request.httpMethod = "POST"

let jsonData = try? JSONEncoder().encode(s)  // encode the WatchData value created above
request.httpBody = jsonData

After sending the HTTPS request, if the request is accepted, the program prints the returned response string, otherwise it prints an error.

In [0]:
let task = URLSession.shared.dataTask(with: request) { (data, response, error) in
    if let error = error {
        print("Error took place \(error)")
        return
    }

    if let data = data, let dataString = String(data: data, encoding: .utf8) {
        print("Response data string:\n \(dataString)")
    }
}
// A data task is created in a suspended state; resume() actually sends the request
task.resume()
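
For reference, the JSON body produced from `WatchData` has the keys `Ax`, `Ay`, `Az` and `TimeStamp`. The sketch below shows one way such a body could be decoded in Python on the receiving side. It is an illustration only: the `WatchSample` class, the `parse_watch_payload` helper and the example values are made up here and are not taken from the project's Flask code.

```python
import json
from dataclasses import dataclass


@dataclass
class WatchSample:
    """Mirrors the fields of the Swift `WatchData` struct (hypothetical class)."""
    ax: float
    ay: float
    az: float
    timestamp: int


def parse_watch_payload(body: str) -> WatchSample:
    """Decode one JSON request body into a WatchSample."""
    raw = json.loads(body)
    return WatchSample(
        ax=float(raw["Ax"]),
        ay=float(raw["Ay"]),
        az=float(raw["Az"]),
        timestamp=int(raw["TimeStamp"]),
    )


# Made-up example payload using the key names from the WatchData struct.
example_body = '{"Ax": 0.01, "Ay": -0.02, "Az": 0.98, "TimeStamp": 1585000000000}'
sample = parse_watch_payload(example_body)
print(sample)
```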

# **Section 3** - Training Machine Learning Model
----
We needed to incorporate machine learning into our project and train a model from scratch that could process Apple Watch movements and keyboard presses.

## User Story: Setting up
-----

### Motivation:
Before we are able to write anything, we need to import the data files. 


### Implementation:
Since we are working in Google Colab, we can upload the data files to Google Drive and then import them using `glob`, which finds all pathnames in the drive that match a pattern. In this case we are looking for all `.log` and `.csv` files in the main project folder.

In [0]:
# First we connect to google drive
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

In [0]:
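import glob

# path_to_watch and path_to_keys point at the Drive folders holding the data
watch_data = path_to_watch + '*.csv'
keyboard_log = path_to_keys + '*.log'

watch_acceleration = {}
keyboard_logs = {}

for file in glob.glob(watch_data):
  # str.strip() removes characters, not a prefix or suffix; that is enough
  # here to reduce a name like watch_<number>.csv to its number
  filename = file.split("/")[-1].strip(".csv").strip("watch_")
  with open(file) as f:
    # skip blank lines, split every other line into its comma-separated fields
    wa = [line.strip().split(",") for line in f if line.strip()]
  watch_acceleration[filename] = wa
  watch_acc = wa  # rows of the most recently read watch file

for file in glob.glob(keyboard_log):
  # shift the number in the file name to a 0-based index
  filename = int(file.split("/")[-1].strip(".log").strip("keys_"))-1
  with open(file) as f:
    kd = [line.strip() for line in f]
  keyboard_logs[filename] = kd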

The watch data is stored as a string `"timestamp, acceleration_x, acceleration_y, acceleration_z"` and the keyboard data is stored as another string `"index.html:xx timestamp, event, key, location"`. 
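
As a small illustration of the watch format, the sketch below turns one such string into typed values. The helper name `parse_watch_line` and the sample values are made up for this example, and it assumes the timestamp is an integer, as in the watch app's `Int64` field.

```python
def parse_watch_line(line):
    """Split "timestamp, ax, ay, az" into an int timestamp and three floats."""
    timestamp, ax, ay, az = (field.strip() for field in line.split(","))
    return int(timestamp), float(ax), float(ay), float(az)


# Made-up sample line in the documented format.
example = "1585000000123, 0.012, -0.034, 0.981"
print(parse_watch_line(example))
```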

## User Story: Data Processing Functions
-----

### Motivation:
As in Zebra web, we need to separate all our key events into `keyup` and `keypressed` events. We also want to assign every event pair a sequence of watch data. Finally, we want to pad these sequences and convert everything into a numpy array.

### Design Considerations: 
* We want every sequence to begin from when a key is lifted to when the next key is pressed.
* Since the first event is always a keypress, and we want our sequence to begin with a keyup, we need to remove the first element of every batch.
* We also need to remove the last event in the batch, which is always a keyup event, to account for the offset created from the previous step.
* Every batch is equal to all the data in one key log file.
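
The considerations above can be sketched in a few lines. This is not the project's mapping function: `build_windows` and `samples_in_window` are hypothetical helpers, the data is made up, and the sketch assumes that the i-th keypress and the i-th keyup belong to the same keystroke and that all lists are in time order.

```python
def build_windows(keys_pressed, keys_released):
    """Pair each keyup with the following keypress inside one batch.

    Both inputs are lists of (timestamp, key) tuples in time order.
    """
    starts = keys_released[:-1]  # drop the last keyup: no keypress follows it
    ends = keys_pressed[1:]      # drop the first keypress: no keyup precedes it
    return [(up[0], down[0]) for up, down in zip(starts, ends)]


def samples_in_window(watch_rows, start, end):
    """Collect [ax, ay, az] for watch rows with start <= timestamp <= end."""
    return [list(row[1:]) for row in watch_rows if start <= row[0] <= end]


# Made-up batch: three keystrokes and four watch samples.
pressed = [(100, "a"), (300, "b"), (520, "c")]
released = [(180, "a"), (390, "b"), (600, "c")]
watch = [(150, 0.1, 0.0, 1.0), (200, 0.2, 0.1, 0.9),
         (250, 0.0, 0.1, 1.0), (400, 0.3, 0.2, 0.8)]

windows = build_windows(pressed, released)
sequences = [samples_in_window(watch, start, end) for start, end in windows]
print(windows)
print(sequences)
```

Each window runs from a keyup to the next keypress, so three keystrokes yield two windows.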

### Implementation:
The only difference in this implementation versus the one implemented in Zebra Web is that we are not working with JSON formatted data, but strings containing the logged info.

Each key event is formatted as `"index.html:xx timestamp, event, key, location"`. We use the following regex to extract the timestamp, event, and key from the string:


In [0]:
import re

p = re.compile(r"index\.html:[0-9]* (?P<timestamp>[0-9]*), (?P<event>[a-z]*), (?P<key>([a-zA-Z0-9]*|[^a-zA-Z0-9_])), (?P<location>(left|center|right))")

s = p.search(line)  # line is one entry from a key log file

timestamp = int(s.group("timestamp"))
event     = s.group("event")
key       = s.group("key").lower()
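
To make the pattern concrete, here is a self-contained run against made-up log lines. The sample lines and the `parse_key_line` helper are illustrative assumptions that follow the documented `"index.html:xx timestamp, event, key, location"` layout; the regex is the one shown above with the dot in `index.html` escaped.

```python
import re

KEY_LINE = re.compile(
    r"index\.html:[0-9]* (?P<timestamp>[0-9]*), (?P<event>[a-z]*), "
    r"(?P<key>([a-zA-Z0-9]*|[^a-zA-Z0-9_])), (?P<location>(left|center|right))"
)


def parse_key_line(line):
    """Return (timestamp, event, key) for one log line, or None if it does not match."""
    match = KEY_LINE.search(line)
    if match is None:
        return None
    return int(match.group("timestamp")), match.group("event"), match.group("key").lower()


# Made-up sample lines in the documented layout.
print(parse_key_line("index.html:42 1585000000123, keyup, A, left"))
print(parse_key_line("index.html:42 1585000000456, keyup, ;, right"))
print(parse_key_line("not a key event"))
```

Returning `None` for lines that do not match avoids an `AttributeError` from calling `.group()` on a failed match.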

Another difference from the web implementation is that we batch the data separately, based on which log file the keys are found in.

On the web we only work with a single batch of 11 pairs of key events. 

We need to remove the first and last events for every batch before combining the lists together.

Here is how we modified the code to work with multiple batches:

In [0]:
# Separate key events into pressed and released lists, one batch per log file
for file_number in keyboard_logs.keys():
  key_log = keyboard_logs[file_number] 

  keys_pressed = []
  keys_released = []
  for line in key_log:
    # ...

# Combine keyup-keypress pairs
keys_pressed = []
keys_released = []

for index in all_keys_pressed.keys():
  kp = all_keys_pressed[index]
  keys_pressed.extend(kp[1:]) # remove the first keypress from each file
                              # before adding to the master keys_pressed list

for index in all_keys_released.keys():
  kr = all_keys_released[index]
  keys_released.extend(kr[0:-1]) # remove the last keyup from each file

The mapping function that maps every key to a sequence of watch movements is identical to the one in Zebra web, since the Zebra web version is a direct copy of the one written for the backend.

After the sequences are mapped to predictions, we need to call the padding function. 

Unlike the padding function in Zebra web, this padding function will look for the longest sequence length and pad every other sequence to be the same length:

In [0]:
# Length of the longest sequence; every other sequence is padded up to it
max_seq_len = len(max(sequences, key=len))

padded_sequences = []
for sequence in sequences:
  while len(sequence) < max_seq_len:
     sequence.append([0, 0, 0])
     # ...

**Since the longest sequence (after manually fixing our data many times) is 270 movements long, we set that as the maximum sequence length on the Zebra web end.**
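
A minimal sketch of both padding variants follows: padding to the longest sequence (as in the backend) and padding to a fixed length (as on the web end). The `pad_sequences` helper is hypothetical and differs from the code above in two ways that are assumptions of this example: it returns new lists instead of modifying the inputs, and it cuts sequences that are longer than a fixed `max_len`.

```python
def pad_sequences(sequences, max_len=None, pad_value=(0, 0, 0)):
    """Return copies of the sequences, zero-padded (or cut) to max_len.

    With max_len=None, the length of the longest sequence is used.
    """
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        trimmed = [list(step) for step in seq[:max_len]]
        # Build a fresh [0, 0, 0] per missing step so rows never share a list.
        trimmed.extend(list(pad_value) for _ in range(max_len - len(trimmed)))
        padded.append(trimmed)
    return padded


# Made-up sequences of [x, y, z] movements with lengths 1 and 3.
sequences = [[[1, 2, 3]], [[1, 1, 1], [2, 2, 2], [3, 3, 3]]]

to_longest = pad_sequences(sequences)           # backend style: pad to longest
to_fixed = pad_sequences(sequences, max_len=2)  # web style: fixed length
print(to_longest)
print(to_fixed)
```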

### Challenges: 
There were a few issues with the keyboard data when creating sequences: 
* The watch still records data if the user is no longer typing.
* If every long pause is included as a sequence, the calculations will be too slow.
* We needed to discard movements that were longer than a certain threshold.

This was fixed by taking advantage of the way we create sequences out of every `key.log` file. Since we treat every log file as one batch, we always remove the first and last key events. We therefore manually split any log file containing a long pause into two separate log files, where the second file contains everything from the point the user resumes typing after the pause.
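
The split described above was done by hand. As a rough sketch of how it might be automated, the function below cuts a time-ordered list of key events into batches wherever the gap between consecutive events exceeds a threshold. The `split_on_pauses` helper, the 5000 ms threshold and the event tuples are all assumptions made for this example, not values from the project.

```python
def split_on_pauses(events, max_gap_ms):
    """Split time-ordered (timestamp, ...) events wherever the gap exceeds max_gap_ms."""
    batches = []
    current = []
    for event in events:
        # Start a new batch when the pause since the previous event is too long.
        if current and event[0] - current[-1][0] > max_gap_ms:
            batches.append(current)
            current = []
        current.append(event)
    if current:
        batches.append(current)
    return batches


# Made-up events: two keystrokes, a long pause, then one more keystroke.
events = [
    (0, "keypress", "a"), (90, "keyup", "a"),
    (200, "keypress", "b"), (290, "keyup", "b"),
    (9000, "keypress", "c"), (9100, "keyup", "c"),
]
batches = split_on_pauses(events, max_gap_ms=5000)
print([len(batch) for batch in batches])
```

Each resulting batch could then be treated like one log file, with its first keypress and last keyup removed as before.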

## User Story: Models
-----

### Motivation:
The *Pitfalls in Designing Zero-Effort Deauthentication* paper discusses using a Random Forest Classifier for implementing their version of ZEBRA, so we decided to do the same.


### Implementation

We first reshaped our numpy sequence data into two-dimensional arrays:


In [0]:
# Our shape is (5680, 270, 3)

# Sklearn expects 2d arrays... gotta reshape
N, nx, ny = np_sequences.shape
new_sequences = np_sequences.reshape((N,nx*ny))

Next we decide how much of our data to use for training and how much for testing. We split our predictions at the same index:

In [0]:
# Usually we want to have about 70% Training and 30% for Testing
# 70% of 5680 is about 3976
train_data, test_data = new_sequences[:3976,:], new_sequences[3976:,:]
train_ts, test_ts = predictions[:3976], predictions[3976:]
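
The split index does not have to be hard-coded. The sketch below derives it from the number of samples; `split_index` is a hypothetical helper, and it uses integer arithmetic so that the result does not depend on floating-point rounding.

```python
def split_index(n_samples, train_percent=70):
    """Index of the first test sample, computed with integer arithmetic."""
    return n_samples * train_percent // 100


n_samples = 5680
cut = split_index(n_samples)
# Number of training samples and number of test samples.
print(cut, n_samples - cut)
```

With such a helper, the slices above could be written as `new_sequences[:cut, :]` and `new_sequences[cut:, :]`.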

Next we create the RF Classifier using the following hyperparameters: 

In [0]:
rfc=RandomForestClassifier(n_estimators=150, max_features=0.15, min_samples_leaf=60, oob_score=True)

* `n_estimators` is the number of trees we have in our forest; we need a high number, but not so high that training becomes too slow, so we chose 150 trees

* `min_samples_leaf` is the minimum number of samples required to be at a leaf node; the tutorial we were using suggested using a value over 50, so we used 60

* `max_features` is the number of features to look at when deciding how to split; 0.15 means the classifier will consider 15% of the features at each split 

* `oob_score` enables out-of-bag scoring, which estimates accuracy on the samples each tree did not see during training; it acts as a built-in form of cross validation that can be used to further tune hyperparameters, so we kept it set to True
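
To give a sense of scale for `max_features`, the arithmetic below works out how many features 15% corresponds to for our flattened data. It assumes the rule described in the scikit-learn documentation for a float `max_features`, namely `max(1, int(max_features * n_features))`.

```python
n_timesteps, n_axes = 270, 3
n_features = n_timesteps * n_axes  # columns after flattening (N, 270, 3) to (N, 810)

max_features = 0.15
# Assumed rule for a float max_features, per the scikit-learn documentation.
features_per_split = max(1, int(max_features * n_features))
print(n_features, features_per_split)
```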


Finally we train and test the model:

In [0]:
# Train the model using the training set
rfc.fit(train_data,train_ts)

# Test our accuracy
test_ys=rfc.predict(test_data)

We measure the accuracy using `metrics.accuracy_score(test_ts, test_ys)` which gave us about 23% accuracy. 

Finally we store the state of our model using the joblib package:

In [0]:
import joblib

filename = "/content/gdrive/My Drive/School Winter 2020/Csc490/data/weights.joblib"
weights = joblib.dump(rfc,filename)

### References: 
1. Pitfalls Paper: https://arxiv.org/pdf/1505.05779.pdf
2. Tuning Random Forest: https://www.analyticsvidhya.com/blog/2015/06/tuning-random-forest-model/ 