<center><h1 style="color:blue">Indoor Location & Navigation</h1></center>
<center><h1 style="color:blue">Identify the position of a smartphone in a shopping mall</h1></center>
<img src="https://storage.googleapis.com/kaggle-competitions/kaggle/22559/logos/header.png?t=2020-09-30-17-40-59">

# 1. Welcome to the Competition

**By Microsoft Research**
- Accurate indoor positioning, based on public sensors and user permission, allows for a great location-based experience even when we aren’t outside.
- Current positioning solutions have poor accuracy, particularly in multi-level buildings, or generalize poorly to small datasets. Additionally, GPS was built for a time before smartphones. Today’s use cases often require more granularity than is typically available indoors.
- The task is to predict the ***indoor position of smartphones based on real-time sensor data***, provided by indoor positioning technology company **XYZ10 in partnership with Microsoft Research**. We'll locate devices using ***“active” localization data***, which is made available with the cooperation of the user. Unlike passive localization methods (e.g. radar, camera), the data provided for this competition requires explicit user permission.
- The dataset comprises nearly 30,000 traces from over 200 buildings.

# 1.1 Evaluation Criteria and Metrics

Submissions are evaluated on the `mean position error`, defined as:

$$\text{mean position error} = \frac{1}{N} \sum_{i=1}^{N}  
                                                \left( \sqrt{( \hat{x}_i - x_i )^{2} + ( \hat{y}_i - y_i )^{2}} 
                                                + p \cdot | \hat{f}_{i} - f_i | \right)$$
where:

- $N$ is the number of rows in the test set  
- $\hat{x}_i, \hat{y}_i$ are the predicted locations for a given test row
- $x_i, y_i$ are the ground truth locations for a given test row
- $p$ is the floor penalty, set at 15
- $\hat{f}_{i}, f_{i}$ are the predicted and ground truth integer floor level for a given test row
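
As a rough illustration of the formula above, here is a minimal NumPy sketch for scoring predictions on a local validation split. The function name `mean_position_error` and its argument names are my own choices, not part of any competition code.

```python
import numpy as np

FLOOR_PENALTY = 15  # p in the formula above


def mean_position_error(x_hat, y_hat, f_hat, x, y, f, p=FLOOR_PENALTY):
    """Mean over rows of: Euclidean (x, y) error + p * absolute floor error."""
    x_hat, y_hat, f_hat = map(np.asarray, (x_hat, y_hat, f_hat))
    x, y, f = map(np.asarray, (x, y, f))
    horizontal = np.sqrt((x_hat - x) ** 2 + (y_hat - y) ** 2)
    vertical = p * np.abs(f_hat - f)
    return float(np.mean(horizontal + vertical))
```

Note how a single wrong floor costs as much as a 15-unit horizontal miss, so getting the floor right matters a lot.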

IMPORTANT: The integer `floor` used in the submission must be mapped from the char/int floors used in the dataset. The mapping is as follows:

- F1, 1F → 0
- F2, 2F → 1 (and so on for higher floors)
- B1, 1B → -1
- B2, 2B → -2

There are other floor names in the training data, e.g., LG2, LM, etc., which you may decide to use for training, but none of these non-standard floors are found in the test set.
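
The mapping above could be sketched as a small helper. This is only one possible way to do it: the name `floor_to_int` is hypothetical, and it assumes the pattern continues in the obvious way (F3 → 2, B3 → -3, ...). Non-standard names such as LG2 or LM return `None` so you can decide what to do with them.

```python
import re


def floor_to_int(name):
    """Map a standard floor name (F1, 1F, B2, 2B, ...) to the submission integer.

    Returns None for non-standard names such as LG2 or LM.
    """
    match = re.fullmatch(r"(?:([FB])(\d+)|(\d+)([FB]))", name.strip().upper())
    if match is None:
        return None
    letter = match.group(1) or match.group(4)
    number = int(match.group(2) or match.group(3))
    # Ground-level floors are zero-based; basements count down from -1.
    return number - 1 if letter == "F" else -number
```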

# 1.2 Submission Format

For each `site_path_timestamp` row in the test set, we must predict the floor converted to an integer as per above and the `x` and `y` of the waypoint. The file should contain a header and have the following format:

```
site_path_timestamp,floor,x,y
5a0546857ecc773753327266_046cfa46be49fc10834815c6_1578474564146,0,15.0,55.0
5a0546857ecc773753327266_046cfa46be49fc10834815c6_1578474573154,0,25.0,65.0
5a0546857ecc773753327266_046cfa46be49fc10834815c6_1578474579463,0,35.0,75.0
etc.
```
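
Since each id packs three fields together, it can be handy to split it back apart. A minimal sketch, assuming that site and path ids never contain an underscore themselves (true for the sample rows above, but I have not verified it for every row); `split_submission_id` is a hypothetical helper name.

```python
def split_submission_id(site_path_timestamp):
    """Split 'site_path_timestamp' into (site, path, timestamp in ms)."""
    # Assumption: exactly two underscores, i.e. ids contain none themselves.
    site, path, timestamp = site_path_timestamp.split("_")
    return site, path, int(timestamp)
```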

# 2. Data

- The dataset for this competition consists of dense indoor signatures of WiFi, geomagnetic field, iBeacons etc., as well as ground truth waypoint locations collected from hundreds of buildings in Chinese cities.
- The data found in path trace files (*.txt) corresponds to an indoor path between position p_1 and p_2 walked by a site-surveyor.

# 2.1 The Data Collection Process

- During the walk, an Android smartphone is held flat in front of the surveyor's body, and a sensor data recording app runs on the device to collect IMU (accelerometer, gyroscope) and geomagnetic field (magnetometer) readings, as well as WiFi and Bluetooth iBeacon scanning results.
- A detailed description of the trace file format, along with other details and processing scripts, is available at this [github link](https://github.com/location-competition/indoor-location-competition-20).
- In addition to raw trace files, floor plan metadata (e.g., raster image, size, GeoJSON) are also included for each floor.

<p style="color:red">In the training files, you may find occasionally that a line is missing the ending newline character, causing it to run on to the next line. It is up to you how you want to handle this issue. This issue is not found in the test data.</p>
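
One possible way to handle the run-on lines is to re-split the raw text wherever a new record starts. The sketch below is an assumption-heavy workaround, not an official fix: it assumes every record begins with a 13-digit millisecond timestamp followed by a tab and a type name starting with `TYPE_`. The helper name `split_run_on_lines` is made up.

```python
import re

# Assumption: a record starts with a 13-digit ms timestamp, a tab, then "TYPE_".
RECORD_START = re.compile(r"(\d{13}\tTYPE_)")


def split_run_on_lines(text):
    """Return the lines of a trace file, splitting records that were fused together."""
    # Put a newline in front of every record start, then drop the empty
    # lines this creates for records that were already on their own line.
    fixed = RECORD_START.sub(r"\n\1", text)
    return [line for line in fixed.split("\n") if line]
```

Usage would look like `lines = split_run_on_lines(open(path).read())`. Header lines starting with `#` pass through unchanged.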

# 2.2 Files

- **train** - training path files, organized by site and floor; each path file contains the data of a single path on a single floor
- **test** - test path files, organized by site and floor; each path file contains the data of a single path on a single floor, but without the waypoint (x, y) data; the task of this competition is, for a given site-path file, to predict the floor and waypoint locations at the timestamps given in the sample_submission.csv file
- **metadata** - floor metadata folder, organized by site and floor, which includes the following for each floor:
    - floor_image.png
    - floor_info.json
    - geojson_map.json
- **sample_submission.csv** - a sample submission file in the correct format; each row has a unique id which contains a site id, a path id, and the timestamp within the trace for which to make a prediction; see the Evaluation page for the required integer mapping of floor names

# 3. EDA

In [None]:
import os
import gc
import json
import glob
import random
import numpy as np
import pandas as pd
from PIL import Image
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

In [None]:
train_dir = "../input/indoor-location-navigation/train"
test_dir = "../input/indoor-location-navigation/test"
meta_dir = "../input/indoor-location-navigation/metadata"
ss = "../input/indoor-location-navigation/sample_submission.csv"

# 3.1 Train

In [None]:
train_files = glob.glob(os.path.join(train_dir, "**/*.txt"), recursive=True)
print("Number of files in Train: ", len(train_files))

Okay, now let's see what lies inside a txt file. Feel free to click on the `output` button and have a look at the content of a txt file.

In [None]:
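# The `with` block closes the file automatically, so no explicit close() is needed.
with open(train_files[0], "r") as fh:
    for line in fh:
        # Each line already ends with a newline; avoid printing a second one.
        print(line, end="")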

### Read this section only after sections 2 and 3.1. If you haven't read them yet, do that first.

Now, let us reason together. What do we have in the training folder?

> Each \*.txt file contains the path trace data collected by various sensors. This data corresponds to an indoor path between position p_1 and p_2 walked by a site-surveyor.


But what trace data? Let's have a look. Well, if you have already inspected the relevant code block output, here is what it has to offer:

- startTime: Probably the time the person started navigating
- SiteId: 
- SiteName: 
- FloorId: 
- FloorName: 
- Brand: 
- Model: 
- AndroidName: 
- APILevel: 
- type * X (where X may be the number of sensors in the device): 
- VersionName:
- VersionCode: 

and `THE TEN` data types mentioned below.

We will see which of these we need, but first:

## What the organizers have to offer:


- The first column is Unix time in milliseconds. Specifically, we use SensorEvent.timestamp for sensor data and system time for WiFi and Bluetooth scans.

- The second column is the data type (ten in total).

    - TYPE_ACCELEROMETER
    - TYPE_MAGNETIC_FIELD
    - TYPE_GYROSCOPE
    - TYPE_ROTATION_VECTOR
    - TYPE_MAGNETIC_FIELD_UNCALIBRATED
    - TYPE_GYROSCOPE_UNCALIBRATED
    - TYPE_ACCELEROMETER_UNCALIBRATED
    - TYPE_WIFI
    - TYPE_BEACON
    - TYPE_WAYPOINT: ground truth location labeled by the surveyor

- Data values start from the third column.

* Columns 3-5 of TYPE_ACCELEROMETER, TYPE_MAGNETIC_FIELD, TYPE_GYROSCOPE, TYPE_ROTATION_VECTOR are SensorEvent.values[0-2] from the callback function onSensorChanged(). Column 6 is SensorEvent.accuracy.

* Columns 3-8 of TYPE_ACCELEROMETER_UNCALIBRATED, TYPE_GYROSCOPE_UNCALIBRATED, TYPE_MAGNETIC_FIELD_UNCALIBRATED are SensorEvent.values[0-5] from the callback function onSensorChanged(). Column 9 is SensorEvent.accuracy.

* Values of TYPE_BEACON are obtained from ScanRecord.getBytes(). The results are decoded based on iBeacon protocol using the code below.

```
val major = ((scanRecord[startByte + 20].toInt() and 0xff) * 0x100 + (scanRecord[startByte + 21].toInt() and 0xff))
val minor = ((scanRecord[startByte + 22].toInt() and 0xff) * 0x100 + (scanRecord[startByte + 23].toInt() and 0xff))
val txPower = scanRecord[startByte + 24]
```

* Distance in column 8 is calculated as

```
private static double calculateDistance(int txPower, double rssi) {
  if (rssi == 0) {
    return -1.0; // if we cannot determine distance, return -1.
  }
  double ratio = rssi*1.0/txPower;
  if (ratio < 1.0) {
    return Math.pow(ratio,10);
  }
  else {
    double accuracy =  (0.89976)*Math.pow(ratio,7.7095) + 0.111;
    return accuracy;
  }
}
```
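
If you want to recompute or sanity-check that distance column in the notebook, the Java snippet above translates to Python almost line for line. This is just my port for convenience, with a Python-style name (`calculate_distance`); one difference to be aware of is that a `tx_power` of 0 raises `ZeroDivisionError` here, whereas Java floating-point division would not throw.

```python
def calculate_distance(tx_power, rssi):
    """Python port of the organizers' calculateDistance() shown above."""
    if rssi == 0:
        return -1.0  # distance cannot be determined
    ratio = rssi * 1.0 / tx_power  # raises ZeroDivisionError if tx_power == 0
    if ratio < 1.0:
        return ratio ** 10
    return 0.89976 * ratio ** 7.7095 + 0.111
```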

Did you read all that? Well, it's okay if you didn't; I've got you covered. But before that, let's read the Evaluation section again and ask: `What do we need to do in the first place?` Well, clean the data first xD. Just joking; head over to the Evaluation section to understand the goal better. But remember that this section will help us organize our data and make it model-ready.

Let's find a way to organize these things:

In [None]:
def read_txt(txt_path):
    # Collect the tab-separated tokens of the header lines (those starting
    # with "#"), which hold the metadata; data lines are skipped.
    unique_keys = []
    with open(txt_path, 'r') as fh:
        for line in fh:
            if line.startswith("#"):
                tokens = line.rstrip("\n").split("\t")
                # The leading "#" marker itself becomes an empty string.
                unique_keys.extend('' if tok == "#" else tok for tok in tokens)
    return unique_keys

read_txt(train_files[0])

# 3.2 Test

In [None]:
test_files = glob.glob(os.path.join(test_dir, "**/*.txt"), recursive=True)
print("Number of files in Test: ", len(test_files))

Feel free to click on the `output` button to see what lies inside a test `*.txt` file.

In [None]:
with open(test_files[0], "r") as fh:
    for line in fh:
        # Each line already ends with a newline; avoid printing a second one.
        print(line, end="")

In [None]:
names = ['Time', 'Type'] + ['col'+str(x) for x in range(1,9)]
df = pd.read_csv(train_files[0], sep='\t', comment='#', header=None, names=names)
df.head()
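
The DataFrame read above mixes all record types in one table, so most `col*` cells are empty for any given row. One way to organize it is to split by the `Type` column and drop the columns a type never uses. The sketch below runs on a tiny made-up sample (not real competition data) and assumes that TYPE_WAYPOINT rows carry `x` and `y` in columns 3 and 4.

```python
import io

import pandas as pd

# Hypothetical three-record trace in the same tab-separated layout.
sample = (
    "#\tstartTime:1578474564000\n"
    "1578474564100\tTYPE_WAYPOINT\t15.0\t55.0\n"
    "1578474564120\tTYPE_ACCELEROMETER\t0.1\t0.2\t9.8\t2\n"
    "1578474573154\tTYPE_WAYPOINT\t25.0\t65.0\n"
)
names = ['Time', 'Type'] + ['col' + str(x) for x in range(1, 9)]
df = pd.read_csv(io.StringIO(sample), sep='\t', comment='#', header=None, names=names)

# One sub-frame per record type, keeping only the columns that type fills.
by_type = {t: g.dropna(axis=1, how='all') for t, g in df.groupby('Type')}

# Assumption: waypoint rows are (Time, Type, x, y).
waypoints = by_type['TYPE_WAYPOINT'].rename(columns={'col1': 'x', 'col2': 'y'})
```

The same `by_type` trick should work on a real trace file by swapping `io.StringIO(sample)` for a path from `train_files`.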

# 3.3 Metadata

In [None]:
floor_images = glob.glob(os.path.join(meta_dir, "**/*.png"), recursive=True)
floor_info = glob.glob(os.path.join(meta_dir, "**/floor_info.json"), recursive=True)
GeoMaps = glob.glob(os.path.join(meta_dir, "**/geojson_map.json"), recursive=True)
                                      
print("Number of Floor Images in Meta Data: ", len(floor_images))
print("Number of Floor Info(in JSON) in Meta Data: ", len(floor_info))
print("Number of Geo Map (in JSON) in Meta Data: ", len(GeoMaps))

In [None]:
for _ in range(5):
    img = Image.open(floor_images[np.random.randint(0, len(floor_images))])
    display(img)

# 3.4 Sample Submission

In [None]:
sub = pd.read_csv(ss)
sub.head()

# References:

- Competition GitHub Page: https://github.com/location-competition/indoor-location-competition-20
- Competition Site: https://location20.xyz10.com/

# <h1 style="color:red">Work in Progress...</h1>