## [Google Smartphone Decimeter Challenge](https://www.kaggle.com/c/google-smartphone-decimeter-challenge): EDA

**Objective:**  Current mobile phones only offer 3-5 meters of positioning accuracy. The challenge is to compute location down to decimeter or even centimeter resolution, if possible.

What is a GNSS?
> "*A satellite navigation system with global coverage may be termed a global navigation satellite system (GNSS). As of September 2020, the United States' Global Positioning System (GPS), Russia's Global Navigation Satellite System (GLONASS), China's BeiDou Navigation Satellite System (BDS) and the European Union's Galileo are fully operational GNSSs. Japan's Quasi-Zenith Satellite System (QZSS) is a (US) GPS satellite-based augmentation system to enhance the accuracy of GPS, with satellite navigation independent of GPS scheduled for 2023. The Indian Regional Navigation Satellite System (IRNSS) plans to expand to a global version in the long term.*

> *Global coverage for each system is generally achieved by a satellite constellation of 18–30 medium Earth orbit (MEO) satellites spread between several orbital planes. The actual systems vary, but use orbital inclinations of >50° and orbital periods of roughly twelve hours (at an altitude of about 20,000 kilometres or 12,000 miles).*" (Source: [Wikipedia](https://en.wikipedia.org/wiki/Satellite_navigation))

## <center style="background-color:Gainsboro; width:40%;">Contents</center>
* [The training data files](#train)
* [Raw data](#Raw)
* [OrientationDeg](#OrientationDeg)
* [UncalMag](#UncalMag)
* [UncalGyro](#UncalGyro)
* [Ground truth file](#Ground_truth)
* [Derived values file](#Derived_values)
* [The test data](#Test_data)
* [The haversine and Vincenty formulas](#Haversine)
* [Related reading](#Related_reading)

<a class="anchor" id="train"></a>
## <center style="background-color:Gainsboro; width:40%;">Training data files</center>
Let us take a look at just one of the [Global Navigation Satellite System (GNSS)](https://www.euspa.europa.eu/european-space/what-euspace/what-gnss) log files:

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)

import numpy as np
import missingno as msno
import seaborn as sns
sns.set(font_scale=1.3)
import matplotlib.pyplot as plt

#-------------------------------------------------------------------------
# The following is thanks to https://www.kaggle.com/sohier/loading-gnss-logs by Sohier Dane
#-------------------------------------------------------------------------
def gnss_log_to_dataframes(path):
    print('Loading ' + path, flush=True)
    gnss_section_names = {'Raw','UncalAccel', 'UncalGyro', 'UncalMag', 'Fix', 'Status', 'OrientationDeg'}
    with open(path) as f_open:
        datalines = f_open.readlines()

    datas = {k: [] for k in gnss_section_names}
    gnss_map = {k: [] for k in gnss_section_names}
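    for dataline in datalines:
        is_header = dataline.startswith('#')
        dataline = dataline.strip('#').strip().split(',')
        # header lines ("# Raw,...") give the column names of each record type;
        # skip over notes, version numbers, blank lines and record types not listed above
        if is_header and dataline[0] in gnss_section_names:
            gnss_map[dataline[0]] = dataline[1:]
        elif not is_header and dataline[0] in gnss_section_names:
            datas[dataline[0]].append(dataline[1:])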

    results = dict()
    for k, v in datas.items():
        results[k] = pd.DataFrame(v, columns=gnss_map[k])
    # pandas doesn't properly infer types from these lists by default
    for k, df in results.items():
        for col in df.columns:
            if col == 'CodeType':
                continue
            results[k][col] = pd.to_numeric(results[k][col])

    return results

The training data directories:

In [None]:
!ls -1 ../input/google-smartphone-decimeter-challenge/train/

In [None]:
!ls -1 ../input/google-smartphone-decimeter-challenge/train/ | wc -w

We can see that there are 29 directories with training data. Let us select just one of those directories and look at the GNSS log:

In [None]:
dfs = gnss_log_to_dataframes('../input/google-smartphone-decimeter-challenge/train/2021-04-22-US-SJC-1/SamsungS20Ultra/SamsungS20Ultra_GnssLog.txt')

<a class="anchor" id="Raw"></a>
## <center style="background-color:Gainsboro; width:40%;">Raw data</center>

In [None]:
dfs_raw = dfs['Raw']

`utcTimeMillis` - Milliseconds since UTC epoch (1970/1/1), converted from [GnssClock](https://developer.android.com/reference/android/location/GnssClock)

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="utcTimeMillis", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('utcTimeMillis')
#plt.xlim([0, 600])
plt.show();
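Raw millisecond counts are hard to read on a histogram axis. A minimal sketch that converts `utcTimeMillis` into timezone-aware timestamps; the output column name `utcDatetime` and the helper name are hypothetical, chosen for illustration only:

```python
import pandas as pd

def add_utc_datetime(df: pd.DataFrame, col: str = 'utcTimeMillis') -> pd.DataFrame:
    """Return a copy of df with a hypothetical 'utcDatetime' column derived from milliseconds since 1970-01-01 UTC."""
    out = df.copy()
    out['utcDatetime'] = pd.to_datetime(out[col], unit='ms', utc=True)
    return out
```

For example, `add_utc_datetime(dfs_raw)['utcDatetime'].agg(['min', 'max'])` would show when the drive started and ended.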

`TimeNanos` - The GNSS receiver internal hardware clock value in nanoseconds.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="TimeNanos", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('TimeNanos')
#plt.xlim([0, 600])
plt.show();

`BiasNanos` - The clock's sub-nanosecond bias.

`BiasUncertaintyNanos` - The clock's bias uncertainty (1-sigma) in nanoseconds.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="BiasUncertaintyNanos", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('BiasUncertaintyNanos')
#plt.xlim([0, 600])
plt.show();

`DriftNanosPerSecond` - The clock's drift in nanoseconds per second.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="DriftNanosPerSecond", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('DriftNanosPerSecond')
#plt.xlim([0, 600])
plt.show();

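`Svid` - The satellite ID. Svid values are only unique within a constellation (GPS and Galileo, for example, both use small integers), so a satellite is identified by the (`ConstellationType`, `Svid`) pair.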

In [None]:
dfs_raw['Svid'].value_counts().to_frame().T
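Because the same Svid can belong to satellites of different constellations, the counts above can merge unrelated satellites. A minimal sketch that counts measurements per (`ConstellationType`, `Svid`) pair instead; `unique_satellites` and the `n_measurements` column are hypothetical names:

```python
import pandas as pd

def unique_satellites(raw: pd.DataFrame) -> pd.DataFrame:
    """Number of measurements per physical satellite, keyed by (ConstellationType, Svid)."""
    return (raw.groupby(['ConstellationType', 'Svid'])
               .size()
               .rename('n_measurements')
               .reset_index())
```

`len(unique_satellites(dfs_raw))` then gives the number of distinct satellites tracked during the drive.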

`State` - Integer signifying the sync state of the satellite signal. Each bit of the integer flags a particular state of the measurement. See the `metadata/raw_state_bit_map.json` file for the mapping between bits and states.

In [None]:
dfs_raw['State'].value_counts().to_frame().T
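Since `State` is a bit field, its raw values are easier to interpret once decoded. A minimal sketch, assuming a subset of the bit values defined as `STATE_*` constants on Android's `GnssMeasurement` class (the `metadata/raw_state_bit_map.json` file remains the authoritative mapping for this dataset). The criterion in `tow_usable` (time of week decoded or known, millisecond part not ambiguous) is an illustrative assumption, not the competition's rule:

```python
import pandas as pd

# Subset of Android GnssMeasurement STATE_* flags (assumed; check raw_state_bit_map.json)
STATE_CODE_LOCK = 1 << 0
STATE_BIT_SYNC = 1 << 1
STATE_SUBFRAME_SYNC = 1 << 2
STATE_TOW_DECODED = 1 << 3
STATE_MSEC_AMBIGUOUS = 1 << 4
STATE_TOW_KNOWN = 1 << 14

def set_bits(value):
    """Positions of the bits that are set in an integer state value."""
    value = int(value)
    return [i for i in range(value.bit_length()) if (value >> i) & 1]

def tow_usable(state: pd.Series) -> pd.Series:
    """True where the time of week is decoded or known and the millisecond part is not ambiguous."""
    has_tow = (state & (STATE_TOW_DECODED | STATE_TOW_KNOWN)) != 0
    ambiguous = (state & STATE_MSEC_AMBIGUOUS) != 0
    return has_tow & ~ambiguous
```

For example, `set_bits(16431)` returns `[0, 1, 2, 3, 5, 14]`, and `dfs_raw[tow_usable(dfs_raw['State'])]` would keep only the measurements passing this filter.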

`ReceivedSvTimeNanos` - The received GNSS satellite time, at the measurement time, in nanoseconds.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="ReceivedSvTimeNanos", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('ReceivedSvTimeNanos')
#plt.xlim([0, 600])
plt.show();
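Together with the receiver clock fields, `ReceivedSvTimeNanos` is what a pseudorange is built from: the signal travel time multiplied by the speed of light. A minimal sketch for GPS measurements only, assuming the Raw record also contains the `FullBiasNanos` and `TimeOffsetNanos` columns (both fields of Android's `GnssClock`/`GnssMeasurement` that are not discussed above) and that the time of week has been decoded. It ignores the corner case of a GPS week rollover between transmission and reception. `gps_pseudorange_m` is a hypothetical helper:

```python
import pandas as pd

SPEED_OF_LIGHT_MPS = 299_792_458.0
NANOS_PER_WEEK = 7 * 24 * 3600 * 1_000_000_000

def gps_pseudorange_m(raw: pd.DataFrame) -> pd.Series:
    """Pseudorange in meters: (receive time of week - ReceivedSvTimeNanos) * c."""
    # TimeNanos - FullBiasNanos is ~1e18 ns; keep it as int64, since float64 cannot
    # represent it to nanosecond precision (assumes the columns parsed as int64)
    t_rx_int_ns = raw['TimeNanos'] - raw['FullBiasNanos']
    # reduce to GPS time of week while still an integer, then add the small float terms
    t_rx_tow_ns = t_rx_int_ns % NANOS_PER_WEEK + raw['TimeOffsetNanos'] - raw['BiasNanos']
    return (t_rx_tow_ns - raw['ReceivedSvTimeNanos']) * 1e-9 * SPEED_OF_LIGHT_MPS
```

It would be applied to a GPS-only subset such as `dfs_raw[dfs_raw['ConstellationType'] == 1]`; other constellations use different time scales for `ReceivedSvTimeNanos`.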

`ReceivedSvTimeUncertaintyNanos` - The error estimate (1-sigma) for the received GNSS time, in nanoseconds.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="ReceivedSvTimeUncertaintyNanos", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('ReceivedSvTimeUncertaintyNanos')
#plt.xlim([0, 600])
plt.show();
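A time uncertainty translates into a range uncertainty through the speed of light, so 1 ns corresponds to roughly 0.3 m. A minimal sketch of that conversion (the helper name is hypothetical); it works on scalars as well as on a pandas column:

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0

def sv_time_uncertainty_m(unc_ns):
    """Convert a 1-sigma time uncertainty in nanoseconds into a range uncertainty in meters."""
    return unc_ns * 1e-9 * SPEED_OF_LIGHT_MPS
```

For example, `sv_time_uncertainty_m(dfs_raw['ReceivedSvTimeUncertaintyNanos'])` expresses the plotted column in meters.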

`Cn0DbHz` - The carrier-to-noise density in dB-Hz.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="Cn0DbHz", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('Cn0DbHz')
#plt.xlim([0, 600])
plt.show();

`PseudorangeRateMetersPerSecond` - The pseudorange rate at the timestamp in m/s.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="PseudorangeRateMetersPerSecond", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('PseudorangeRateMetersPerSecond')
#plt.xlim([0, 600])
plt.show();
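The pseudorange rate is closely related to the Doppler shift of the carrier. Android documents the relation as pseudorange rate = -k × Doppler shift for a constant k; the sketch below assumes k is the carrier wavelength c / f, using the `CarrierFrequencyHz` column of the same Raw record. `doppler_hz` is a hypothetical helper:

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0

def doppler_hz(prr_mps, carrier_hz):
    """Doppler shift in Hz from a pseudorange rate in m/s (positive rate = satellite moving away)."""
    wavelength_m = SPEED_OF_LIGHT_MPS / carrier_hz
    return -prr_mps / wavelength_m
```

For example, `doppler_hz(dfs_raw['PseudorangeRateMetersPerSecond'], dfs_raw['CarrierFrequencyHz'])` gives one Doppler value per measurement. A receding satellite (positive rate) yields a negative Doppler shift.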

`PseudorangeRateUncertaintyMetersPerSecond` - The uncertainty (1-sigma) of the pseudorange rate, in m/s.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="PseudorangeRateUncertaintyMetersPerSecond", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('PseudorangeRateUncertaintyMetersPerSecond')
#plt.xlim([0, 600])
plt.show();

`AccumulatedDeltaRangeState` - The state of the 'Accumulated Delta Range' (carrier-phase) measurement. Like `State`, it is a bit field in which each bit flags a particular condition of the measurement.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="AccumulatedDeltaRangeState", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('AccumulatedDeltaRangeState')
#plt.xlim([0, 600])
plt.show();
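Because `AccumulatedDeltaRangeState` is a bit field, a histogram of its integer values says little about data quality. A minimal sketch of a filter, assuming the `ADR_STATE_*` bit values defined on Android's `GnssMeasurement` class (valid = bit 0, reset = bit 1, cycle slip = bit 2). Treating "valid and neither reset nor cycle slip" as usable is an illustrative assumption, and `adr_usable` is a hypothetical helper:

```python
import pandas as pd

# Android GnssMeasurement ADR_STATE_* flags (assumed subset)
ADR_STATE_VALID = 1 << 0
ADR_STATE_RESET = 1 << 1
ADR_STATE_CYCLE_SLIP = 1 << 2

def adr_usable(adr_state: pd.Series) -> pd.Series:
    """True where the carrier-phase measurement is valid and has no reset or cycle slip."""
    valid = (adr_state & ADR_STATE_VALID) != 0
    broken = (adr_state & (ADR_STATE_RESET | ADR_STATE_CYCLE_SLIP)) != 0
    return valid & ~broken
```

`adr_usable(dfs_raw['AccumulatedDeltaRangeState']).mean()` would give the fraction of measurements with usable carrier phase.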

`AccumulatedDeltaRangeMeters` - The accumulated delta range since the last channel reset, in meters.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="AccumulatedDeltaRangeMeters", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('AccumulatedDeltaRangeMeters')
#plt.xlim([0, 600])
plt.show();

`AccumulatedDeltaRangeUncertaintyMeters` - The accumulated delta range's uncertainty (1-sigma) in meters.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="AccumulatedDeltaRangeUncertaintyMeters", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('AccumulatedDeltaRangeUncertaintyMeters')
#plt.xlim([0, 600])
plt.show();

`CarrierFrequencyHz` - The carrier frequency of the tracked signal, in Hz.

In [None]:
dfs_raw['CarrierFrequencyHz'].value_counts().to_frame().T
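The carrier frequency determines the wavelength λ = c / f, which is needed to express `AccumulatedDeltaRangeMeters` in carrier cycles. For GPS L1 (1575.42 MHz) this gives about 0.19 m. A minimal sketch; both helper names are hypothetical:

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0

def wavelength_m(carrier_hz):
    """Carrier wavelength in meters, lambda = c / f."""
    return SPEED_OF_LIGHT_MPS / carrier_hz

def adr_cycles(adr_m, carrier_hz):
    """Accumulated delta range converted from meters into carrier cycles."""
    return adr_m / wavelength_m(carrier_hz)
```

For example, `adr_cycles(dfs_raw['AccumulatedDeltaRangeMeters'], dfs_raw['CarrierFrequencyHz'])` converts the whole column.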

`ConstellationType` - GNSS constellation type.

In [None]:
dfs_raw['ConstellationType'].value_counts().to_frame().T

What do these integer codes mean? The `metadata` folder contains a mapping:

In [None]:
constellation_type_mapping = pd.read_csv('../input/google-smartphone-decimeter-challenge/metadata/constellation_type_mapping.csv')
constellation_type_mapping

We can see that we have data from the European [Galileo](http://www.esa.int/Applications/Navigation/Galileo), the US [GPS](https://www.gps.gov/), the Russian [GLONASS](https://www.glonass-iac.ru/en/), the Chinese [BeiDou](https://en.wikipedia.org/wiki/BeiDou) and the Japanese [Quasi-Zenith Satellite System (QZSS)](https://en.wikipedia.org/wiki/Quasi-Zenith_Satellite_System) satellite systems.
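To label plots with constellation names instead of integer codes, the codes can be mapped to names. A minimal sketch that hard-codes the `CONSTELLATION_*` values of Android's `GnssStatus` class (an assumption; `metadata/constellation_type_mapping.csv` is the authoritative mapping for this dataset). `constellation_names` is a hypothetical helper:

```python
import pandas as pd

# Android GnssStatus CONSTELLATION_* values (assumed to match constellation_type_mapping.csv)
CONSTELLATION_NAMES = {0: 'UNKNOWN', 1: 'GPS', 2: 'SBAS', 3: 'GLONASS',
                       4: 'QZSS', 5: 'BEIDOU', 6: 'GALILEO', 7: 'IRNSS'}

def constellation_names(codes: pd.Series) -> pd.Series:
    """Map ConstellationType codes to names; unknown codes become 'UNKNOWN'."""
    return codes.map(CONSTELLATION_NAMES).fillna('UNKNOWN')
```

For example, `constellation_names(dfs_raw['ConstellationType']).value_counts()` gives the same counts as above with readable labels.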

`AgcDb` - The Automatic Gain Control level in dB.

In [None]:
fig, ax = plt.subplots(figsize=(15, 7))
sns.histplot(data = dfs_raw, x="AgcDb", kde=True, kde_kws={"bw_adjust":.45} , color='sienna', alpha=0.85)
plt.title('AgcDb')
#plt.xlim([0, 600])
plt.show();

<a class="anchor" id="OrientationDeg"></a>
## <center style="background-color:Gainsboro; width:40%;">OrientationDeg</center>

In [None]:
dfs_orientation = dfs['OrientationDeg']
dfs_orientation

* `yawDeg` - If the screen is in portrait mode, this value equals the azimuth (modulo 360°). If the screen is in landscape mode, it equals the sum (modulo 360°) of the screen rotation angle (either 90° or 270°) and the azimuth. The azimuth is the angle of rotation about the -z axis, i.e. the angle between the device's y axis and magnetic north.
* `rollDeg` - Roll, the angle of rotation about the y axis. This value represents the angle between a plane perpendicular to the device's screen and a plane perpendicular to the ground.
* `pitchDeg` - Pitch, the angle of rotation about the x axis. This value represents the angle between a plane parallel to the device's screen and a plane parallel to the ground.
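Because `yawDeg` wraps around at 360°, naive subtraction of two angles is misleading (350° and 10° are only 20° apart). A minimal sketch of a wrapped angular difference, which could be used, for example, to compare `yawDeg` with the ground-truth `courseDegree`. Keep in mind that `yawDeg` is referenced to magnetic north and `courseDegree` to true north, and that the phone need not be aligned with the direction of travel. `angle_diff_deg` is a hypothetical helper:

```python
import numpy as np

def angle_diff_deg(a_deg, b_deg):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float) + 180.0) % 360.0 - 180.0
```

For example, `angle_diff_deg(350, 10)` gives -20.0 rather than 340.0.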

![](https://qph.fs.quoracdn.net/main-qimg-5b1137edd49a238813e4f9bef255cd55)

<a class="anchor" id="UncalMag"></a>
## <center style="background-color:Gainsboro; width:40%;">UncalMag</center>
A [magnetometer](https://en.wikipedia.org/wiki/Magnetometer) is a device that measures magnetic field or magnetic dipole moment. Some magnetometers measure the direction, strength, or relative change of a magnetic field at a particular location. A compass is one such device, one that measures the direction of an ambient magnetic field, in this case, the Earth's magnetic field. (Source: Wikipedia)

In [None]:
dfs_UncalMag = dfs['UncalMag']
dfs_UncalMag
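Android reports uncalibrated magnetometer readings together with a per-axis hard-iron bias estimate, and the calibrated field is the reading minus the bias. A minimal sketch that applies this correction and returns the field strength. The column names in the usage example are assumptions, so check `dfs_UncalMag.columns` first. `calibrated_norm` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def calibrated_norm(df: pd.DataFrame, value_cols, bias_cols) -> pd.Series:
    """Subtract per-axis bias estimates from the readings and return the vector magnitude per row."""
    calibrated = df[value_cols].to_numpy(dtype=float) - df[bias_cols].to_numpy(dtype=float)
    return pd.Series(np.linalg.norm(calibrated, axis=1), index=df.index)
```

A possible usage, assuming columns named like `UncalMagXMicroT` and `BiasXMicroT`: `calibrated_norm(dfs_UncalMag, ['UncalMagXMicroT', 'UncalMagYMicroT', 'UncalMagZMicroT'], ['BiasXMicroT', 'BiasYMicroT', 'BiasZMicroT'])`. The Earth's field is roughly 25 to 65 µT, so values far outside that range hint at local disturbances such as the car body.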

<a class="anchor" id="UncalGyro"></a>
## <center style="background-color:Gainsboro; width:40%;">UncalGyro</center>
Uncalibrated readings from the phone's internal [gyroscope](https://en.wikipedia.org/wiki/Gyroscope).

In [None]:
dfs_UncalGyro = dfs['UncalGyro']
dfs_UncalGyro
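Integrating an angular rate over time gives the accumulated rotation. A minimal sketch using the trapezoidal rule, optionally subtracting Android's per-axis drift estimate first. It assumes timestamps in milliseconds (e.g. `utcTimeMillis`) and rates in rad/s. Column names such as `UncalGyroZRadPerSec` or `DriftZRadPerSec` are assumptions, so check `dfs_UncalGyro.columns`. The device z axis only coincides with the vertical when the phone lies flat, and gyro drift makes the integral diverge over long intervals. `integrated_rotation_deg` is a hypothetical helper:

```python
import numpy as np

def integrated_rotation_deg(t_millis, rate_rad_s, drift_rad_s=None):
    """Cumulative rotation in degrees from an angular-rate series (trapezoidal rule)."""
    t = np.asarray(t_millis, dtype=float) / 1000.0
    w = np.asarray(rate_rad_s, dtype=float)
    if drift_rad_s is not None:
        w = w - np.asarray(drift_rad_s, dtype=float)  # calibrated rate = uncalibrated - drift
    increments = 0.5 * (w[1:] + w[:-1]) * np.diff(t)
    return np.degrees(np.concatenate([[0.0], np.cumsum(increments)]))
```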

<a class="anchor" id="Ground_truth"></a>
## <center style="background-color:Gainsboro; width:40%;">Ground truth file</center>
We also have an associated `ground_truth.csv` file:

In [None]:
ground_truth = pd.read_csv('../input/google-smartphone-decimeter-challenge/train/2021-04-22-US-SJC-1/SamsungS20Ultra/ground_truth.csv')
ground_truth

Of particular interest is the [heightAboveWgs84EllipsoidM](https://support.pix4d.com/hc/en-us/articles/211739726-When-to-use-the-Geoid-Height-Above-the-Ellipsoid-Function) column:

![](https://support.pix4d.com/hc/article_attachments/206328206/conversions_heights2.png)

This is the height, in meters, above the [reference ellipsoid of the World Geodetic System 1984 (WGS 84)](https://www.mathworks.com/help/map/ref/wgs84ellipsoid.html).
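The ellipsoidal height is related to the more familiar height above mean sea level by the standard relation

$$ H = h - N $$

where $h$ is the ellipsoidal height (`heightAboveWgs84EllipsoidM`), $N$ is the geoid undulation (the height of the geoid above the ellipsoid, taken from a separate geoid model such as EGM96) and $H$ is the orthometric height, which approximates the height above mean sea level.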

In the other columns we see
* `latDeg`, `lngDeg` - The WGS84 latitude, longitude (in decimal degrees) estimated by the reference GNSS receiver (NovAtel SPAN).
* `timeSinceFirstFixSeconds` - The elapsed time (in seconds) since the first location fix.
* `hDop` - Horizontal dilution of precision (DOP), from the NMEA GGA sentence; describes how errors in the measurements affect the final horizontal position estimate.
* `vDop` - Vertical dilution of precision (DOP), from the NMEA GSA sentence; describes how errors in the measurements affect the final vertical position estimate.
* `courseDegree` - The course angle over ground, measured clockwise from true north (in degrees).
* `speedMps` - The speed over ground in meters per second.

Out of curiosity, let us look at the maximum speed (in m/s):

In [None]:
max_speed_mps = ground_truth['speedMps'].max()
max_speed_mps

and in miles per hour (1 mile = 1609.344 m):

In [None]:
max_speed_mph = max_speed_mps*3600/1609.344
max_speed_mph

<a class="anchor" id="Derived_values"></a>
## <center style="background-color:Gainsboro; width:40%;">Derived values file</center>

In [None]:
derived = pd.read_csv('../input/google-smartphone-decimeter-challenge/train/2021-04-22-US-SJC-1/SamsungS20Ultra/SamsungS20Ultra_derived.csv')
derived

From this data we can calculate a *corrected pseudorange*:

> correctedPrM = rawPrM + satClkBiasM - isrbM - ionoDelayM - tropoDelayM

where `rawPrM` is the raw pseudorange in meters, `satClkBiasM` is the satellite clock bias in meters, `isrbM` is the inter-signal range bias in meters, `ionoDelayM` is the ionospheric delay in meters, estimated with the [Klobuchar model](https://gssc.esa.int/navipedia//index.php/Klobuchar_Ionospheric_Model), and `tropoDelayM` is the tropospheric delay in meters, estimated with the [EGNOS model](https://doi.org/10.1017/S0373463300001107).
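The formula translates directly into a column operation on the derived file. A minimal sketch; the helper name `corrected_pseudorange_m` is hypothetical, while the column names are the ones used in the formula:

```python
import pandas as pd

def corrected_pseudorange_m(derived: pd.DataFrame) -> pd.Series:
    """correctedPrM = rawPrM + satClkBiasM - isrbM - ionoDelayM - tropoDelayM (all in meters)."""
    return (derived['rawPrM'] + derived['satClkBiasM'] - derived['isrbM']
            - derived['ionoDelayM'] - derived['tropoDelayM'])
```

For example, `derived['correctedPrM'] = corrected_pseudorange_m(derived)` adds the corrected value as a new column.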


![](https://hi-static.z-dn.net/files/d73/e70767f1ef85a33968c9a5300afc3f53.jpg)

<a class="anchor" id="Test_data"></a>
## <center style="background-color:Gainsboro; width:40%;">The test data</center>

In [None]:
!ls -1 ../input/google-smartphone-decimeter-challenge/test/

In [None]:
!ls -1 ../input/google-smartphone-decimeter-challenge/test/ | wc -w

We can see that there are 19 directories of test data.

<a class="anchor" id="Haversine"></a>
## <center style="background-color:Gainsboro; width:40%;">The haversine and Vincenty formulas</center>
The haversine formula is used to calculate the great-circle distance between two points on a sphere, given their longitudes and latitudes. 

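$$ d = 2r\arcsin\left(\sqrt{\operatorname{hav}(\theta)}\right) $$

where $r$ is the radius of the sphere, $\theta = d/r$ is the central angle between the two points, $\operatorname{hav}(x) = \sin^2(x/2)$ is the haversine function, and

$$ \operatorname{hav}(\theta) = \operatorname{hav}\left(\varphi_2 - \varphi_1\right) + \cos\left(\varphi_1\right)\cos\left(\varphi_2\right)\operatorname{hav}\left(\lambda_2 - \lambda_1\right) $$

where $\varphi$ is the latitude and $\lambda$ is the longitude, both in radians.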
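The formula vectorizes naturally with NumPy. A minimal sketch, assuming a mean Earth radius of 6,371 km and inputs in decimal degrees (as in `latDeg` and `lngDeg`). `haversine_m` is a hypothetical helper:

```python
import numpy as np

EARTH_MEAN_RADIUS_M = 6_371_000.0  # assumed mean Earth radius

def _hav(x):
    """Haversine function hav(x) = sin^2(x / 2)."""
    return np.sin(x / 2.0) ** 2

def haversine_m(lat1_deg, lng1_deg, lat2_deg, lng2_deg, r=EARTH_MEAN_RADIUS_M):
    """Great-circle distance in meters; accepts scalars, NumPy arrays or pandas Series."""
    phi1, lam1, phi2, lam2 = map(np.radians, (lat1_deg, lng1_deg, lat2_deg, lng2_deg))
    h = _hav(phi2 - phi1) + np.cos(phi1) * np.cos(phi2) * _hav(lam2 - lam1)
    # clip guards against floating-point values marginally above 1
    return 2.0 * r * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
```

For example, `haversine_m(ground_truth['latDeg'].shift(), ground_truth['lngDeg'].shift(), ground_truth['latDeg'], ground_truth['lngDeg'])` gives the distance travelled between consecutive ground-truth epochs (NaN for the first row).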

However, the Earth is not a perfect sphere. It is closer to an oblate spheroid, which is what the WGS 84 reference ellipsoid models. Vincenty developed more precise [formulas](https://en.wikipedia.org/wiki/Vincenty%27s_formulae) that compute distances on such an ellipsoid iteratively. Although the equations are fairly involved, the Python package [Vincenty](https://github.com/maurycyp/vincenty) (`!pip install vincenty`) makes the calculation trivial.
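To see what the package computes, here is a self-contained sketch of Vincenty's inverse formula that follows the equations in the linked Wikipedia article with WGS 84 parameters. It is not the package's own code. The iteration can fail to converge for nearly antipodal points, which does not occur between positions along a drive. `vincenty_inverse_m` is a hypothetical name:

```python
import math

WGS84_A = 6_378_137.0              # semi-major axis (m)
WGS84_F = 1 / 298.257223563        # flattening
WGS84_B = WGS84_A * (1 - WGS84_F)  # semi-minor axis (m)

def vincenty_inverse_m(lat1_deg, lng1_deg, lat2_deg, lng2_deg, tol=1e-12, max_iter=200):
    """Distance in meters between two points on the WGS 84 ellipsoid (Vincenty's inverse formula)."""
    a, b, f = WGS84_A, WGS84_B, WGS84_F
    big_l = math.radians(lng2_deg - lng1_deg)
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1_deg)))  # reduced latitudes
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2_deg)))
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam, cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1.0 - sin_alpha ** 2
        # on an equatorial line cos2_alpha == 0 and cos(2*sigma_m) is taken as 0
        cos_2sm = cos_sigma - 2.0 * sin_u1 * sin_u2 / cos2_alpha if cos2_alpha != 0.0 else 0.0
        c = f / 16.0 * cos2_alpha * (4.0 + f * (4.0 - 3.0 * cos2_alpha))
        lam_prev = lam
        lam = big_l + (1.0 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (-1.0 + 2.0 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge (nearly antipodal points?)")

    u_sq = cos2_alpha * (a ** 2 - b ** 2) / b ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return b * big_a * (sigma - delta_sigma)
```

As a sanity check, one degree of longitude along the equator comes out as the equatorial radius times π/180 (about 111,319.49 m), slightly more than the spherical haversine value.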

<a class="anchor" id="Related_reading"></a>
## <center style="background-color:Gainsboro; width:40%;">Related reading</center>
* [Guoyu (Michael) Fu, Mohammed Khider, Frank van Diggelen "*Android Raw GNSS Measurement Datasets for Precise Positioning*", Proceedings of the 33rd International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2020) pp. 1925-1937 (2020)](https://doi.org/10.33012/2020.17628)