# **Introduction**

You have probably experienced this before: you are in an underground car park and have just activated your navigation system, but it has trouble locating you on the map because the concrete walls degrade the GPS signal. In this use case, better accuracy would only be a "nice to have", but in other indoor applications it can become a necessity.

In this competition, we are asked to predict the indoor position of smartphones from real-time sensor data in order to improve the accuracy of indoor positioning systems.


In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

from dataclasses import dataclass

import matplotlib.pyplot as plt # visualization
plt.rcParams.update({'font.size': 14})
import seaborn as sns # visualization

import warnings # Suppress warnings
warnings.filterwarnings('ignore')

from tqdm import tqdm

import json
import plotly.graph_objs as go
from PIL import Image

# **Dataset Overview**

The dataset is provided by XYZ10, a Chinese company specializing in indoor positioning technology. It consists of path trace recordings of a person walking from one point to another. During each walk, the following sensor signals are recorded:

* accelerometer
* magnetic field
* gyroscope
* rotation vector
* WiFi
* Bluetooth iBeacon
* ground truths (waypoint locations)


Let's have a first look at one of the trace files to get a rough feel for the data. Unfortunately, this time we don't have the comfort of the .csv format; instead, we are given tab-separated text files.
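Each non-header line of a trace file is a tab-separated record: a timestamp, a record type such as `TYPE_ACCELEROMETER`, and type-specific values. This is the layout that `read_data_file` further down relies on. The sketch below splits a single line; the helper name `parse_record` and the sample line are hypothetical, made up for illustration.

```python
def parse_record(line):
    """Split one trace-file line into (timestamp_ms, record_type, values).

    Returns None for blank lines and for header lines starting with '#'.
    """
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    fields = line.split('\t')
    timestamp_ms = int(fields[0])   # first column: Unix time in milliseconds
    record_type = fields[1]         # second column: e.g. 'TYPE_ACCELEROMETER'
    values = fields[2:]             # remaining columns depend on the record type
    return timestamp_ms, record_type, values


# Hypothetical accelerometer line; the sensor values are made up for illustration
sample_line = "1573713056850\tTYPE_ACCELEROMETER\t-0.51\t3.21\t9.12"
timestamp_ms, record_type, values = parse_record(sample_line)
print(timestamp_ms, record_type, [float(v) for v in values])
```

The values stay strings here because their meaning (and type) depends on the record type, which is why `read_data_file` converts them separately in each branch.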

# **Unix Timestamp**

The first column is the Unix time in milliseconds. If you are not familiar with Unix time, the Wikipedia article on it is a good starting point. In short, Unix time counts the seconds elapsed since 00:00:00 UTC on 1 January 1970; here it is given in milliseconds.

At this point, I am not yet sure whether we need to convert Unix timestamps into human-readable ones, but here is the conversion just in case. Since the timestamps are in milliseconds, we first divide them by 1000. The sample below starts at 1573713056850 and ends at 1573713091483, which corresponds to a short trace of 34.633 s recorded on November 14th, 2019.

In [None]:
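from datetime import datetime, timezone

# Start and end of the example trace, in milliseconds since the Unix epoch
start_time = 1573713056850
end_time = 1573713091483

# fromtimestamp() expects seconds, so divide by 1000; tz=timezone.utc makes
# the output independent of the machine's local time zone
start = datetime.fromtimestamp(start_time / 1000.0, tz=timezone.utc)
end = datetime.fromtimestamp(end_time / 1000.0, tz=timezone.utc)

print(start)
print(end)
print(end - start)  # duration of the trace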
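When a whole column of timestamps needs converting, pandas can do it in one vectorized call. A minimal sketch using `pd.to_datetime` with `unit='ms'`; note that the result is naive UTC time, so it may differ from a local-time conversion by the time zone offset.

```python
import pandas as pd

# Example trace boundaries in milliseconds since the Unix epoch
timestamps_ms = pd.Series([1573713056850, 1573713091483])

# unit='ms' interprets the integers as milliseconds; the result is naive UTC time
times = pd.to_datetime(timestamps_ms, unit='ms')
duration = times.iloc[-1] - times.iloc[0]

print(times)
print(duration)
```

The same call works on the first column of any sensor array, e.g. after wrapping it in a `pd.Series`.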

# **Waypoint**

A waypoint is a point of reference used for location and navigation. In general, a waypoint can be given as a latitude and longitude, a well-known building, or a natural feature; in this dataset, waypoints are x/y coordinates on the floor map. Let's load the example trace and look at its waypoints first to get a feeling for the data.

The following code is adapted from <a href="https://www.kaggle.com/ihelon/indoor-location-exploratory-data-analysis">@ihelon's notebook</a> and originally comes from the <a href="https://github.com/location-competition/indoor-location-competition-20/blob/master/visualize_f.py">competition's GitHub page.</a>


In [None]:

@dataclass
class ReadData:
    acce: np.ndarray
    acce_uncali: np.ndarray
    gyro: np.ndarray
    gyro_uncali: np.ndarray
    magn: np.ndarray
    magn_uncali: np.ndarray
    ahrs: np.ndarray
    wifi: np.ndarray
    ibeacon: np.ndarray
    waypoint: np.ndarray


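def read_data_file(data_filename):
    """Parse a trace .txt file into one NumPy array per record type.

    Lines starting with '#' are headers and are skipped; every other line is a
    tab-separated record whose second field names the record type.
    """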
    acce = []
    acce_uncali = []
    gyro = []
    gyro_uncali = []
    magn = []
    magn_uncali = []
    ahrs = []
    wifi = []
    ibeacon = []
    waypoint = []
    
    
    with open(data_filename, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    for line_data in lines:
        line_data = line_data.strip()
        if not line_data or line_data[0] == '#':
            continue

        line_data = line_data.split('\t')

        if line_data[1] == 'TYPE_WAYPOINT':
            waypoint.append([int(line_data[0]), float(line_data[2]), float(line_data[3])])
            continue
            
        if line_data[1] == 'TYPE_ACCELEROMETER':
            acce.append([int(line_data[0]), float(line_data[2]), float(line_data[3]), float(line_data[4])])
            continue
        
        if line_data[1] == 'TYPE_ACCELEROMETER_UNCALIBRATED':
            acce_uncali.append([int(line_data[0]), float(line_data[2]), float(line_data[3]), float(line_data[4])])
            continue
        
        if line_data[1] == 'TYPE_GYROSCOPE':
            gyro.append([int(line_data[0]), float(line_data[2]), float(line_data[3]), float(line_data[4])])
            continue
        
        if line_data[1] == 'TYPE_GYROSCOPE_UNCALIBRATED':
            gyro_uncali.append([int(line_data[0]), float(line_data[2]), float(line_data[3]), float(line_data[4])])
            continue
        
        if line_data[1] == 'TYPE_MAGNETIC_FIELD':
            magn.append([int(line_data[0]), float(line_data[2]), float(line_data[3]), float(line_data[4])])
            continue

        if line_data[1] == 'TYPE_MAGNETIC_FIELD_UNCALIBRATED':
            magn_uncali.append([int(line_data[0]), float(line_data[2]), float(line_data[3]), float(line_data[4])])
            continue

        if line_data[1] == 'TYPE_ROTATION_VECTOR':
            ahrs.append([int(line_data[0]), float(line_data[2]), float(line_data[3]), float(line_data[4])])
            continue
        
        if line_data[1] == 'TYPE_WIFI':
            sys_ts = line_data[0]
            ssid = line_data[2]
            bssid = line_data[3]
            rssi = line_data[4]
            lastseen_ts = line_data[6]
            wifi_data = [sys_ts, ssid, bssid, rssi, lastseen_ts]
            wifi.append(wifi_data)
            continue

        if line_data[1] == 'TYPE_BEACON':
            ts = line_data[0]
            uuid = line_data[2]
            major = line_data[3]
            minor = line_data[4]
            rssi = line_data[6]
            ibeacon_data = [ts, '_'.join([uuid, major, minor]), rssi]
            ibeacon.append(ibeacon_data)
            continue
            
    acce = np.array(acce)
    acce_uncali = np.array(acce_uncali)
    gyro = np.array(gyro)
    gyro_uncali = np.array(gyro_uncali)
    magn = np.array(magn)
    magn_uncali = np.array(magn_uncali)
    ahrs = np.array(ahrs)
    wifi = np.array(wifi)
    ibeacon = np.array(ibeacon)
    waypoint = np.array(waypoint)
    
    return ReadData(acce, acce_uncali, gyro, gyro_uncali, magn, magn_uncali, ahrs, wifi, ibeacon, waypoint)
    
sample_file = read_data_file("../input/indoor-location-navigation/train/5a0546857ecc773753327266/F2/5dccf516c04f060006e6e3c9.txt")

print('acce shape:', sample_file.acce.shape)
print('acce_uncali shape:', sample_file.acce_uncali.shape)
print('gyro shape:', sample_file.gyro.shape)
print('gyro_uncali shape:', sample_file.gyro_uncali.shape)
print('magn shape:', sample_file.magn.shape)
print('magn_uncali shape:',sample_file.magn_uncali.shape)
print('ahrs shape:', sample_file.ahrs.shape)
print('wifi shape:', sample_file.wifi.shape)
print('ibeacon shape:', sample_file.ibeacon.shape)
print('waypoint shape:', sample_file.waypoint.shape)
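The shapes only tell us how many samples each sensor produced. A quick way to relate them to time is to estimate each sensor's sampling rate from its timestamp column. A minimal sketch; the helper name `estimate_sampling_rate_hz` is hypothetical, and it uses the median interval so that occasional gaps in a recording do not dominate the estimate.

```python
import numpy as np


def estimate_sampling_rate_hz(sensor):
    """Estimate a sensor's sampling rate from its timestamp column.

    `sensor` is an array like sample_file.acce whose first column holds Unix
    timestamps in milliseconds.
    """
    timestamps_ms = np.asarray(sensor)[:, 0].astype(np.int64)
    intervals_ms = np.diff(timestamps_ms)          # time between consecutive samples
    return 1000.0 / float(np.median(intervals_ms))  # samples per second


# Synthetic data: 50 samples spaced 20 ms apart (i.e. 50 Hz)
synthetic = np.column_stack([1573713056850 + np.arange(50) * 20, np.zeros((50, 3))])
print(estimate_sampling_rate_hz(synthetic))
```

On the example trace, `estimate_sampling_rate_hz(sample_file.acce)` gives the accelerometer rate; the same call works for `gyro`, `magn` and `ahrs`.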

In [None]:
waypoint_df = pd.DataFrame(sample_file.waypoint)
waypoint_df.columns = ['timestamp', 'waypoint_x','waypoint_y']
display(waypoint_df.style.set_caption('Waypoint'))
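Connecting consecutive waypoints with straight lines gives a rough length for the walked path. A minimal sketch, assuming straight segments between waypoints (so the result is a lower bound on the distance actually walked) and returning the length in the same units as `waypoint_x`/`waypoint_y`; the helper name `waypoint_path_length` is hypothetical.

```python
import numpy as np


def waypoint_path_length(waypoint):
    """Sum of straight-line distances between consecutive waypoints.

    `waypoint` has rows of [timestamp, x, y], like sample_file.waypoint.
    """
    xy = np.asarray(waypoint, dtype=float)[:, 1:3]
    steps = np.diff(xy, axis=0)                    # displacement from one waypoint to the next
    return float(np.linalg.norm(steps, axis=1).sum())


# Synthetic waypoints: a 3-4-5 step followed by a step of length 6
demo = np.array([[0, 0.0, 0.0], [1000, 3.0, 4.0], [2000, 3.0, 10.0]])
print(waypoint_path_length(demo))
```

For the example trace, call `waypoint_path_length(sample_file.waypoint)`.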

# **Inertial Measurement Unit (IMU)**

The inertial measurement unit (IMU) is a sensor unit that measures the specific force, angular rate, and orientation of a body; here, the body is a phone. These quantities are measured by accelerometers, gyroscopes, and, in this case, also magnetometers.

* Accelerometer: measures acceleration (the rate of change of velocity), including gravity
* Gyroscope: measures angular velocity (the rate of rotation)
* Magnetometer: measures the magnetic field

In this example, the accelerometer, gyroscope, and magnetometer arrays have the same shape. Note that this holds for many traces but not for all of them. When it does, we can concatenate the arrays into a single dataframe for an initial analysis of the data.
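A minimal sketch of that concatenation, assuming each array holds rows of `[timestamp, x, y, z]` and that all three sensors share identical timestamps. The shapes matching does not guarantee this, so the sketch checks it and raises instead of silently misaligning rows. The helper name `imu_to_dataframe` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def imu_to_dataframe(acce, gyro, magn):
    """Combine accelerometer, gyroscope and magnetometer arrays into one DataFrame."""
    if not (acce.shape == gyro.shape == magn.shape):
        raise ValueError("IMU arrays have different shapes; align them first")
    if not (np.array_equal(acce[:, 0], gyro[:, 0]) and np.array_equal(acce[:, 0], magn[:, 0])):
        raise ValueError("IMU timestamps differ; align them first")
    df = pd.DataFrame({'timestamp': acce[:, 0].astype(np.int64)})
    for name, arr in (('acce', acce), ('gyro', gyro), ('magn', magn)):
        for i, axis in enumerate('xyz', start=1):   # columns 1..3 hold x, y, z
            df[f'{name}_{axis}'] = arr[:, i]
    return df


# Synthetic example: three samples at 0, 20 and 40 ms for each sensor
t = np.array([0.0, 20.0, 40.0])
imu_df = imu_to_dataframe(np.column_stack([t, np.ones((3, 3))]),
                          np.column_stack([t, np.full((3, 3), 2.0)]),
                          np.column_stack([t, np.full((3, 3), 3.0)]))
print(imu_df)
```

For the example trace, the call would be `imu_to_dataframe(sample_file.acce, sample_file.gyro, sample_file.magn)`. Traces whose arrays differ would need resampling or a nearest-timestamp join first, which this sketch deliberately leaves out.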



# **Analysing The Data**

In [None]:
# numpy, pandas, matplotlib, json, PIL and dataclass are already imported above
import os
import glob
from tqdm.notebook import tqdm
from pathlib import Path
dataset_path = Path('../input/indoor-location-navigation')
os.listdir(dataset_path)

In [None]:
train_sites = os.listdir(dataset_path/"train")
print(f'There are {len(train_sites)} sites in the training set')

In [None]:
example_site = os.listdir(dataset_path/"train")[10]
example_site_path = dataset_path/"train"/example_site
print('Floors for example site:')
print(os.listdir(example_site_path))


In [None]:
# Number of floor folders for each training site
floors_per_site = [len(os.listdir(dataset_path/"train"/site)) for site in os.listdir(dataset_path/"train")]
print(f'There are a total of {sum(floors_per_site)} floors. On average, each site has {np.mean(floors_per_site):.2f} floors')
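Beyond the average, it can help to see which floor names occur most often across sites. A minimal sketch using `collections.Counter`; the helper name `count_floor_names` and the example site IDs are hypothetical.

```python
from collections import Counter


def count_floor_names(floors_by_site):
    """Count how often each floor name occurs across sites.

    `floors_by_site` maps a site ID to the list of its floor folder names.
    """
    counts = Counter()
    for floors in floors_by_site.values():
        counts.update(floors)   # add one count per floor name of this site
    return counts


# Hypothetical site IDs and floors, for illustration only
example = {'site_a': ['B1', 'F1', 'F2'], 'site_b': ['F1', 'F2', 'F3']}
floor_counts = count_floor_names(example)
print(floor_counts.most_common(2))
```

On the real data, the mapping can be built with `{site: os.listdir(dataset_path/"train"/site) for site in os.listdir(dataset_path/"train")}`.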

In [None]:
print('Path text files for example floor:')
print(os.listdir(example_site_path/'B1'))


In [None]:
print(f"There are {len(list((dataset_path/'train').rglob('*.txt')))} path text files in the training set")
print(f"There are {len(os.listdir(dataset_path/'test'))} path text files in the test set")
print(f'There are {len(os.listdir(dataset_path/"metadata"))} sites in the metadata, just like the training set')

metadata_example_site = os.listdir(dataset_path/"metadata")[10]
metadata_example_site_path = dataset_path/"metadata"/metadata_example_site
metadata_example_floor_path = dataset_path/"metadata"/metadata_example_site/os.listdir(metadata_example_site_path)[0]
print(os.listdir(metadata_example_floor_path))

In [None]:
Image.open(metadata_example_floor_path/'floor_image.png')

In [None]:
# The with-statement closes each file automatically, so no explicit close() is needed
with open(metadata_example_floor_path/'geojson_map.json') as geojson_map:
    data = json.load(geojson_map)
print(data)

with open(metadata_example_floor_path/'floor_info.json') as floor_info:
    data = json.load(floor_info)
print(data)
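To overlay waypoints on `floor_image.png`, floor coordinates have to be mapped to pixel coordinates. A minimal sketch under stated assumptions: that `floor_info.json` stores the floor size under `map_info` with keys `width` and `height` in the same units as the waypoint coordinates (verify against the printed output above), and that the map's y axis points up while image rows go down. The helper name `floor_to_pixel` is hypothetical.

```python
import numpy as np


def floor_to_pixel(xy, floor_info, image_size):
    """Convert floor-map coordinates to pixel coordinates of floor_image.png.

    Assumptions: floor_info['map_info'] holds the floor 'width' and 'height' in
    the same units as xy; the map origin is the bottom-left corner with y
    pointing up, while image pixels start at the top-left with rows going down.
    """
    width = floor_info['map_info']['width']
    height = floor_info['map_info']['height']
    img_w, img_h = image_size                        # PIL's Image.size is (width, height)
    xy = np.asarray(xy, dtype=float)
    px = xy[:, 0] / width * img_w                     # scale x to image columns
    py = (1.0 - xy[:, 1] / height) * img_h            # scale and flip y to image rows
    return np.column_stack([px, py])


# Hypothetical floor of 100 x 50 units rendered as a 1000 x 500 pixel image
info = {'map_info': {'width': 100.0, 'height': 50.0}}
pixels = floor_to_pixel([[0, 0], [100, 50], [50, 25]], info, (1000, 500))
print(pixels)
```

With the real files, `floor_info` would be the dict loaded from `floor_info.json`, `image_size` would be `Image.open(metadata_example_floor_path/'floor_image.png').size`, and `xy` could be the waypoint columns of a trace recorded on that same floor.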


In [None]:
example_floor_path = example_site_path/'B1'
example_txt_path = example_floor_path/os.listdir(example_floor_path)[0]
with open(example_txt_path) as example_txt:  # closed automatically on exit
    data = example_txt.read()
print(data)
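The raw file is long, so a per-type summary is often more useful than printing everything. A minimal sketch that counts records by their type field, skipping blank and `#` header lines the same way `read_data_file` does; the helper name `count_record_types` and the snippet are hypothetical.

```python
from collections import Counter


def count_record_types(text):
    """Count records per type in the raw text of a trace file.

    Blank lines and header lines starting with '#' are skipped, mirroring
    read_data_file; the record type is the second tab-separated field.
    """
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        counts[line.split('\t')[1]] += 1
    return counts


# Hypothetical snippet of a trace file (values made up for illustration)
snippet = ("#\theader line\n"
           "1573713056850\tTYPE_ACCELEROMETER\t-0.51\t3.21\t9.12\n"
           "1573713056850\tTYPE_GYROSCOPE\t0.01\t0.02\t0.03\n"
           "1573713056870\tTYPE_ACCELEROMETER\t-0.49\t3.19\t9.10\n")
print(count_record_types(snippet).most_common())
```

Applied to the file read above, `count_record_types(data)` summarizes how many records of each sensor type it contains.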