This is a simple submission built from the baseline location estimates provided by the host. After writing the submission, I sanity-check the approach on the training data, where the ground truth is known.

## Submission

In [None]:
from pathlib import Path
import pandas as pd

In [None]:
data_path = Path("../input/google-smartphone-decimeter-challenge")
test_base = pd.read_csv(
    data_path / 'baseline_locations_test.csv')
sub = pd.read_csv(
    data_path / 'sample_submission.csv')

sub.assign(
    latDeg = test_base.latDeg,
    lngDeg = test_base.lngDeg
).to_csv(
    'submission.csv', index=False
)

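The above values score a 7. As I understand the metric, it takes the 50th and 95th percentile distance errors for each phone, averages the two, and then takes the mean across phones, so a 7 corresponds to an error of roughly 7 meters by that measure.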


## Training Error

In [None]:
import numpy as np
from tqdm.notebook import tqdm

In [None]:
truths = (data_path / 'train').rglob('ground_truth.csv')  # returns a generator

df_list = []
cols = ['collectionName', 'phoneName', 'millisSinceGpsEpoch', 'latDeg',
       'lngDeg']

# total=73 only sets the progress-bar length (a generator has no len())
for t in tqdm(truths, total=73):
    df_phone = pd.read_csv(t, usecols=cols)
    df_list.append(df_phone)
df_truth = pd.concat(df_list, ignore_index=True)

df_basepreds = pd.read_csv(data_path / 'baseline_locations_train.csv', usecols=cols)
df_all = df_truth.merge(df_basepreds, how='inner', on=cols[:3], suffixes=('_truth', '_basepred'))
display(df_all[:5])


In [None]:
def calc_haversine(lat1, lon1, lat2, lon2):
    """Calculates the great circle distance between two points
    on the earth. Inputs are array-like and specified in decimal degrees;
    the result is in meters.
    """
    RADIUS = 6_367_000  # approximate Earth radius in meters
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2)
    dist = 2 * RADIUS * np.arcsin(np.sqrt(a))
    return dist
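
Before trusting the distances, it may be worth a quick sanity check of the haversine function. The sketch below restates the function so that it runs on its own, then checks two cases that can be worked out by hand: identical points are 0 m apart, and one degree of latitude along a meridian is an arc of length `RADIUS * pi / 180`. The coordinates are arbitrary illustrative values, not taken from the competition data.

```python
import numpy as np

RADIUS = 6_367_000  # meters; same value as in the cell above


def calc_haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; inputs in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2)
    return 2 * RADIUS * np.arcsin(np.sqrt(a))


# Identical points should be zero meters apart.
same_point = calc_haversine(37.0, -122.0, 37.0, -122.0)

# Moving one degree north along a meridian traces an arc of RADIUS * (pi / 180).
one_degree = calc_haversine(37.0, -122.0, 38.0, -122.0)
expected_one_degree = RADIUS * np.pi / 180

# Array inputs are handled element-wise, as with the DataFrame columns.
pair = calc_haversine(
    np.array([37.0, 37.0]), np.array([-122.0, -122.0]),
    np.array([37.0, 38.0]), np.array([-122.0, -122.0]),
)

print(same_point, one_degree, expected_one_degree, pair)
```

If `one_degree` and `expected_one_degree` agree, the degree-to-radian conversion and the radius units are at least consistent.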

In [None]:
df_all['dist'] = calc_haversine(df_all.latDeg_truth, df_all.lngDeg_truth, 
    df_all.latDeg_basepred, df_all.lngDeg_basepred)

df_all.dist.mean()


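With a mean error of < 4m on the training data, the baseline predictions look like a good starting point. Note that the actual eval metric will differ somewhat, since it is built from the 50th and 95th percentile errors per phone rather than a plain mean over all rows.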
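
To get a number closer to the leaderboard score, the percentile-based metric can be approximated on the training data. The sketch below assumes the metric is the mean, across phones, of the average of each phone's 50th and 95th percentile distance error, which is my reading of the competition's evaluation description. The function name `competition_score` and the toy frame are hypothetical; the column names follow the `df_all` frame used above.

```python
import numpy as np
import pandas as pd


def competition_score(df, dist_col="dist"):
    """Mean over phones of the average of each phone's P50 and P95 error.

    Assumes a phone is identified by (collectionName, phoneName).
    """
    per_phone = (
        df.groupby(["collectionName", "phoneName"])[dist_col]
        .quantile([0.50, 0.95])  # Series indexed by (collection, phone, quantile)
        .unstack()               # one row per phone, columns 0.50 and 0.95
    )
    return per_phone.mean(axis=1).mean()


# Toy data (made up): two phones with different error distributions.
toy = pd.DataFrame({
    "collectionName": ["c1"] * 5 + ["c2"] * 3,
    "phoneName": ["A"] * 5 + ["B"] * 3,
    "dist": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 10.0, 10.0],
})

score = competition_score(toy)
plain_mean = toy["dist"].mean()
print(score, plain_mean)
```

In the notebook, the same function could be called as `competition_score(df_all)` once the `dist` column exists. Because the 95th percentile emphasizes each phone's worst errors, this score will generally sit above the plain mean.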