# Task: Wifi Positioning with RSSI Fingerprints
You will get a dataset that was gathered by the Minodes team to enable wifi positioning in a store. In this store we installed 145 of our sensors, also called **nodes**.

As you walk through the store, your phone searches for available wifi networks by sending probe requests (see the 802.11 standard). Each node that receives a probe request records its received signal strength indication (**RSSI**), which ranges from 0 dBm (very strong signal) to -100 dBm (very weak signal). The **RSSI** value depends on the distance between your phone and the sensor. In the ideal case, a single probe request is received by all nodes in the store, each with a different signal strength. In our data processing we collect all these probe requests and aggregate them, so that for each probe request the **RSSI** values of all nodes are available.

A data set might look like this:


|  ID             | RSSI value node 1 | RSSI value node 2  | .... | RSSI value node n  | fr_zone_id |
| -------------   |-------------      | -----              | -----| -------------------| ------- |
| 1               |-70                | -70                | .... |  -70               | 1       |
| 2               |-55                | -60                | .... |  -45               | 2       |
| 2               |-60                | -60                | .... |  -45               | 2       |
| ...             |.......            | ....               | .... | ....               | ..      |

Each row of this table is called a fingerprint; it indicates the likely location of a person.
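
The aggregation into one row per probe request can be pictured with a small pandas sketch. The column names (`probe_id`, `node`, `rssi`) and the values are made up for illustration; the actual Minodes processing pipeline is not part of this exercise and may work differently.

```python
import pandas as pd

# Hypothetical long-format records: one row per (probe request, node) observation.
records = pd.DataFrame({
    "probe_id": [1, 1, 2, 2, 2],
    "node": ["node_1", "node_2", "node_1", "node_2", "node_3"],
    "rssi": [-70, -65, -55, -60, -45],
})

# One row per probe request, one column per node.
# NaN marks a node that did not receive that probe request.
fingerprints = records.pivot(index="probe_id", columns="node", values="rssi")
print(fingerprints)
```

In this toy example probe request 1 was not received by `node_3`, so that cell stays empty (NaN) in the wide table.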

To locate a person within the store, we divide the store into **zones** that represent its compartments. For each of these zones we collect reference data, which is stored in the file *training_set.csv* of this exercise.

The goal of this task is to use machine learning techniques to predict the correct zones based on **RSSI** fingerprints. At the end of this task you need to predict the zones of the unknown **RSSI** fingerprints stored in the file *test_without_target.csv*.

Since we at Minodes love to code, we have done some preparation work, so you can focus on the machine learning.

In case you have any questions, please contact **alexander.mueller@minodes.com**.

## Remarks

* Please use Python 3.x
* I suggest using an Anaconda Python distribution: https://www.continuum.io/downloads
* The code is tested with Python 3.4
* Please use pre-existing libraries

## Requirements
The current versions of:
* pandas
* numpy
* scikit-learn (provides the `DictVectorizer` imported below)

## Some imports and a preprocessing routine

In [1]:
import ast
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer

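def preprocess_x_values(x_raw_values):
    """
    A simple preprocessing routine returning the feature matrix for the given
    fingerprints (one row per fingerprint, one column per node).
    """
    # Nodes that are missing from a fingerprint are filled with 0.
    v = DictVectorizer(sparse=False)

    return v.fit_transform(x_raw_values)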
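
To make the output of this routine concrete, here is a minimal sketch of what `DictVectorizer` does with two made-up fingerprints. The node names and values are hypothetical and only serve as an illustration.

```python
from sklearn.feature_extraction import DictVectorizer

# Two made-up fingerprints; the node names are hypothetical.
toy_fingerprints = [
    {"node_a": -70, "node_b": -55},
    {"node_b": -60, "node_c": -45},
]

vectorizer = DictVectorizer(sparse=False)
toy_matrix = vectorizer.fit_transform(toy_fingerprints)

# Column order of the matrix, sorted by node name.
print(vectorizer.feature_names_)
# One row per fingerprint; a node absent from a fingerprint becomes 0.0.
print(toy_matrix)
```

Note that a node missing from a fingerprint is encoded as 0, which is the same value the RSSI scale uses for a very strong signal.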

### Loading the data

In [2]:
training_set = pd.read_csv('training_set.csv')

In [3]:
# Load training data

# Parse the string in each 'values' entry into a dictionary of features
training_set['values'] = training_set['values'].apply(ast.literal_eval)

# preparing x and y sets 
# x are the features of the set
# y is the class to be predicted
x_raw = training_set['values']
y = training_set['fr_zone_id']

# preprocess the node dictionary to get feature vectors
x_features = preprocess_x_values(x_raw)

In [4]:
x_features.shape

(63134, 137)

In [5]:
x_features

array([[  0.,   0.,   0., ...,   0.,   0.,   0.],
       [  0., -75., -71., ...,   0.,   0.,   0.],
       [  0.,   0.,   0., ...,   0.,   0.,   0.],
       ..., 
       [  0.,   0.,   0., ..., -75.,   0., -78.],
       [-54., -58., -50., ...,   0.,   0.,   0.],
       [  0.,   0.,   0., ...,   0.,   0.,   0.]])

### Implementation

Think about how to validate your model (based on accuracy), which classifier you want to use, and how you want to score the test set in the end.

You need to add only a couple of lines of code to have a basic solution. Do this first and then iterate on it.
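
One possible shape of such a basic solution is sketched below: hold out part of the labelled data, fit a classifier, and report accuracy on the held-out part. The random forest, the 75/25 split, and the synthetic arrays `x_demo` and `y_demo` are assumptions made for illustration; they stand in for `x_features` and `y` and are not a recommended or reference solution.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for x_features and y (hypothetical data, not the real set).
rng = np.random.RandomState(0)
n_samples, n_nodes, n_zones = 600, 10, 3
y_demo = rng.randint(0, n_zones, size=n_samples)
zone_centres = rng.uniform(-90, -40, size=(n_zones, n_nodes))
x_demo = zone_centres[y_demo] + rng.normal(0, 3, size=(n_samples, n_nodes))

# Hold out 25% of the labelled data, keeping the zone proportions equal.
x_train, x_val, y_train, y_val = train_test_split(
    x_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

# Fit on the training part only, then score on the held-out part.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(x_train, y_train)

val_accuracy = accuracy_score(y_val, clf.predict(x_val))
print("validation accuracy: {:.3f}".format(val_accuracy))
```

Any other scikit-learn classifier can be swapped in for `RandomForestClassifier` without changing the rest of the sketch, and cross-validation is an alternative to a single hold-out split.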

In [6]:
#  please start here

### Predict an unknown dataset

In [7]:
test_set = pd.read_csv('test_without_target.csv', index_col='id')
test_set['values'] = test_set['values'].apply(ast.literal_eval)
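
When turning the test fingerprints into features, keep in mind that `preprocess_x_values` fits a new `DictVectorizer` on every call, so calling it separately on the test set may produce a different column layout than the one the classifier was trained on. The sketch below, using made-up fingerprints, shows one way to keep the columns aligned by reusing a fitted vectorizer. It is an illustration under these assumptions, not the intended solution.

```python
from sklearn.feature_extraction import DictVectorizer

# Made-up fingerprints standing in for training_set['values'] and test_set['values'].
train_dicts = [{"node_a": -70, "node_b": -55}, {"node_b": -60, "node_c": -45}]
test_dicts = [{"node_a": -65, "node_d": -80}]  # node_d never appeared in training

vectorizer = DictVectorizer(sparse=False)
x_train_demo = vectorizer.fit_transform(train_dicts)  # learns the column layout
x_test_demo = vectorizer.transform(test_dicts)        # reuses it; unseen node_d is ignored

print(x_test_demo)
```

The test matrix has the same three columns as the training matrix, so a classifier fitted on `x_train_demo` could call `predict` on `x_test_demo` directly.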

### Submit your solution
Please zip the complete folder with your solution and send it back to **alexander.mueller@minodes.com**. We will review it as soon as possible and get back to you!