## **Overview**

One of the keys to this competition is **reducing multipath error.**

>**[Multipath](https://gssc.esa.int/navipedia/index.php/Multipath)**
>
>The interference by multipath is generated when a signal arrives, by different ways, at the antenna (see figure 1). Its principal cause is the antenna closeness to the reflecting structures, and it is important when the signal comes from the satellite with low elevation.

According to [this article](https://www.kaggle.com/t88take/gsdc-eda-error-when-stopping), large positioning errors attributed to multipath appear when the car is stopped (speed is zero).

So I built a model that uses IMU data to determine whether the car is moving. It reached an accuracy of about 94% on a held-out drive, and I am publishing the code here.

I hope this will be useful for everyone participating in the competition!!!

![image](https://user-images.githubusercontent.com/47235292/127402718-0e8b4075-322a-41a4-86a7-e321e86bd2e5.png)

**References**

@T88's notebook https://www.kaggle.com/t88take/gsdc-eda-error-when-stopping

@museas's notebook https://www.kaggle.com/museas/estimating-the-direction-with-a-magnetic-sensor

Data Overview https://www.kaggle.com/c/google-smartphone-decimeter-challenge/data

## LightGBM with IMU data

----------------------------
Please note that the code is quite messy m(__)m

----------------------------

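As **training data**, two of the downtown drives in the train set (`2021-04-29-US-SJC-2` and `2021-03-10-US-SVL-1`) were used.

As **test data**, one of the remaining downtown drives in the train set (`2021-04-28-US-SJC-1`) was used, so ground truth is available for evaluation.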

For the input features, I used the raw (uncalibrated) IMU accelerometer, gyroscope and magnetometer readings (`UncalAccel`, `UncalGyro`, `UncalMag`). No preprocessing was applied to any feature, which may come as a surprise.
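A minimal sketch of how each ground-truth epoch becomes one classification sample, assuming the `tag_v = 0.5` m/s threshold that is set later in the notebook: the label is 1 (stopped) when the ground-truth `speedMps` is below the threshold and 0 otherwise, and the raw IMU channels are the features. The toy values are invented, and only two of the nine feature columns are shown.

```python
import pandas as pd

TAG_V = 0.5  # speed threshold in m/s (same role as tag_v in the notebook)

# hypothetical ground-truth rows with IMU readings already attached (toy values)
df = pd.DataFrame({
    "speedMps": [0.0, 0.3, 0.5, 4.2, 12.8],
    "x_f_acce": [9.79, 9.81, 9.80, 9.62, 9.95],
    "y_f_acce": [0.02, 0.05, -0.10, 0.41, -0.73],
})

# vectorised form of the make_tag rule: 1 = stopped, 0 = moving
df["tag"] = (df["speedMps"] < TAG_V).astype(int)

# every raw IMU column is a feature; no scaling or filtering is applied
features = [c for c in df.columns if c.startswith(("x_f_", "y_f_", "z_f_"))]
X, y = df[features], df["tag"]
print(y.tolist())  # [1, 1, 0, 0, 0]
```

Note that a speed of exactly 0.5 m/s counts as moving, matching the `>=` branch of `make_tag`.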

In [1]:
import pandas as pd
import pathlib
import numpy as np

### Method: Make train data
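For reference, each `*_GnssLog.txt` file mixes several record types in one CSV-like text: a line starting with `#` followed by a record name (such as `UncalAccel`) declares that record's columns, and each data line starts with the record name. The sketch below parses a tiny hand-written sample the same way `gnss_log_to_dataframes` does; the sample is heavily shortened, its values are invented, and `parse_log` is a hypothetical helper name.

```python
import io
import pandas as pd

# hypothetical, shortened log excerpt (invented values)
SAMPLE_LOG = """# Version: v3.0.0.1
# UncalAccel,utcTimeMillis,UncalAccelXMps2,UncalAccelYMps2,UncalAccelZMps2
# Fix,Provider,LatitudeDegrees,LongitudeDegrees
UncalAccel,1619740800123,0.12,9.79,0.35
UncalAccel,1619740800143,0.10,9.82,0.31
Fix,gps,37.3349,-121.8908
"""

def parse_log(text, sections):
    headers = {}
    rows = {name: [] for name in sections}
    for line in io.StringIO(text):
        is_header = line.startswith("#")
        fields = line.strip("#").strip().split(",")
        if fields[0] not in sections:
            continue  # notes, version lines, unknown records
        if is_header:
            headers[fields[0]] = fields[1:]  # column names for this record type
        else:
            rows[fields[0]].append(fields[1:])  # data row without the record name
    frames = {}
    for name in sections:
        df = pd.DataFrame(rows[name], columns=headers.get(name))
        # convert numeric-looking columns; leave text columns (e.g. Provider) as strings
        for col in df.columns:
            try:
                df[col] = pd.to_numeric(df[col])
            except (ValueError, TypeError):
                pass
        frames[name] = df
    return frames

frames = parse_log(SAMPLE_LOG, {"UncalAccel", "Fix"})
print(frames["UncalAccel"])
```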

In [2]:
#  making ground truth file
def make_gt(path, collectionName, phoneName):
    # ground_truth
    p = pathlib.Path(path)
    gt_files = list(p.glob('train/*/*/ground_truth.csv'))

    gts = []
    for gt_file in gt_files:
        gts.append(pd.read_csv(gt_file))
    ground_truth = pd.concat(gts)
    
    # baseline
    cols = ['collectionName', 'phoneName', 'millisSinceGpsEpoch', 'latDeg', 'lngDeg']
    baseline = pd.read_csv(path + '/baseline_locations_train.csv', usecols=cols)
    ground_truth = ground_truth.merge(baseline, how='inner', on=cols[:3], suffixes=('_gt', '_bs'))
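    # keep whole seconds only, so the ground truth can be matched with the IMU
    # timestamps built in add_IMU (the column name is kept, but the unit is now s)
    ground_truth["millisSinceGpsEpoch"] = ground_truth["millisSinceGpsEpoch"]//1000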
    if (collectionName is None) or (phoneName is None):
        return ground_truth
    else:
        return ground_truth[(ground_truth['collectionName'] == collectionName) & (ground_truth['phoneName'] == phoneName)]
    

def make_tag(df, tag_v):
    # tag = 1: (nearly) stopped, ground-truth speed below tag_v [m/s]; tag = 0: moving
    df.loc[df['speedMps'] < tag_v, 'tag'] = 1
    df.loc[df['speedMps'] >= tag_v, 'tag'] = 0
    return df


# loading gnss file
def gnss_log_to_dataframes(path):
    print('Loading ' + path, flush=True)
    gnss_section_names = {'Raw', 'UncalAccel', 'UncalGyro', 'UncalMag', 'Fix', 'Status', 'OrientationDeg'}
    with open(path) as f_open:
        datalines = f_open.readlines()

    datas = {k: [] for k in gnss_section_names}
    gnss_map = {k: [] for k in gnss_section_names}
    for dataline in datalines:
        is_header = dataline.startswith('#')
        dataline = dataline.strip('#').strip().split(',')
        # header lines of known records define their columns; other '#' lines
        # (notes, version numbers, etc.) and unknown records are skipped
        if dataline[0] not in gnss_section_names:
            continue
        if is_header:
            gnss_map[dataline[0]] = dataline[1:]
        else:
            datas[dataline[0]].append(dataline[1:])
    results = dict()
    for k, v in datas.items():
        results[k] = pd.DataFrame(v, columns=gnss_map[k])
    # pandas doesn't properly infer types from these lists by default
    for k, df in results.items():
        for col in df.columns:
            if col == 'CodeType':
                continue
            try:
                results[k][col] = pd.to_numeric(results[k][col])
            except (ValueError, TypeError):
                pass  # leave non-numeric columns as strings
    return results


# offset between the Unix epoch (1970-01-01) and the GPS epoch (1980-01-06), in ms
GPS_EPOCH_OFFSET_MS = 315964800000
# GPS time is ahead of UTC by 18 leap seconds (since 2017)
LEAP_SECONDS = 18


def add_IMU(df, INPUT, cname, pname):
    path = INPUT + "/train/" + cname + "/" + pname + "/" + pname + "_GnssLog.txt"
    gnss_dfs = gnss_log_to_dataframes(path)
    acce_df = gnss_dfs["UncalAccel"]
    magn_df = gnss_dfs["UncalMag"]
    gyro_df = gnss_dfs["UncalGyro"]

    # UTC milliseconds -> whole GPS seconds, matching the ground-truth key from make_gt
    for imu_df in (acce_df, magn_df, gyro_df):
        imu_df["millisSinceGpsEpoch"] = (imu_df["utcTimeMillis"] - GPS_EPOCH_OFFSET_MS) // 1000 + LEAP_SECONDS

    # acce (note: the axis mapping differs between sensors)
    acce_df["x_f_acce"] = acce_df["UncalAccelZMps2"]
    acce_df["y_f_acce"] = acce_df["UncalAccelXMps2"]
    acce_df["z_f_acce"] = acce_df["UncalAccelYMps2"]
    # magn
    magn_df["x_f_magn"] = magn_df["UncalMagZMicroT"]
    magn_df["y_f_magn"] = magn_df["UncalMagYMicroT"]
    magn_df["z_f_magn"] = magn_df["UncalMagXMicroT"]
    # gyro
    gyro_df["x_f_gyro"] = gyro_df["UncalGyroXRadPerSec"]
    gyro_df["y_f_gyro"] = gyro_df["UncalGyroYRadPerSec"]
    gyro_df["z_f_gyro"] = gyro_df["UncalGyroZRadPerSec"]

    base_cols = ["collectionName", "phoneName", "millisSinceGpsEpoch", "latDeg_gt", "lngDeg_gt",
                 "latDeg_bs", "lngDeg_bs", "heightAboveWgs84EllipsoidM", "speedMps"]
    df = df[base_cols].sort_values('millisSinceGpsEpoch')
    # attach the IMU reading nearest in time to every ground-truth epoch
    for imu_df, name in ((acce_df, "acce"), (magn_df, "magn"), (gyro_df, "gyro")):
        feat_cols = ["x_f_" + name, "y_f_" + name, "z_f_" + name]
        df = pd.merge_asof(df, imu_df[["millisSinceGpsEpoch"] + feat_cols].sort_values('millisSinceGpsEpoch'),
                           on='millisSinceGpsEpoch', direction='nearest')
    return df

def make_train(INPUT, train_cname, tag_v):
    # make ground_truth file
    gt = make_gt(INPUT, None, None)
    dfs = []
    for cname in train_cname:
        phone_list = gt[gt['collectionName'] == cname]['phoneName'].drop_duplicates()
        for pname in phone_list:
            df = gt[(gt['collectionName'] == cname) & (gt['phoneName'] == pname)]
            dfs.append(add_IMU(df, INPUT, cname, pname))
    # concatenate once, with a fresh index (merge_asof restarts at 0 for every phone)
    train_df = pd.concat(dfs, ignore_index=True)
    # make tag
    train_df = make_tag(train_df, tag_v)
    return train_df
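The time alignment in `add_IMU` can be checked in isolation. This sketch, on made-up timestamps, converts Unix UTC milliseconds to whole GPS seconds (subtract the 315964800000 ms between the Unix epoch and the GPS epoch of 1980-01-06, divide by 1000, then add the 18 leap seconds by which GPS time led UTC in 2021) and attaches the nearest IMU sample to each ground-truth epoch with `pd.merge_asof(..., direction='nearest')`. The helper name `utc_millis_to_gps_seconds` is hypothetical.

```python
import pandas as pd

GPS_EPOCH_OFFSET_MS = 315_964_800_000  # 1980-01-06 minus 1970-01-01, in milliseconds
LEAP_SECONDS = 18                      # GPS - UTC offset in 2021

def utc_millis_to_gps_seconds(utc_millis):
    """Unix UTC milliseconds -> whole seconds since the GPS epoch."""
    return (utc_millis - GPS_EPOCH_OFFSET_MS) // 1000 + LEAP_SECONDS

# made-up IMU samples (one feature column only, for brevity)
imu = pd.DataFrame({"utcTimeMillis": [1619740800200, 1619740801700, 1619740804300],
                    "x_f_acce": [0.1, 0.2, 0.3]})
imu["millisSinceGpsEpoch"] = utc_millis_to_gps_seconds(imu["utcTimeMillis"])

# made-up ground-truth epochs, already truncated to seconds as in make_gt
gt = pd.DataFrame({"millisSinceGpsEpoch": [1303776018, 1303776019, 1303776020]})

# both sides must be sorted on the key; each gt row receives the closest IMU row
merged = pd.merge_asof(gt.sort_values("millisSinceGpsEpoch"),
                       imu[["millisSinceGpsEpoch", "x_f_acce"]].sort_values("millisSinceGpsEpoch"),
                       on="millisSinceGpsEpoch", direction="nearest")
print(merged["x_f_acce"].tolist())  # [0.1, 0.2, 0.2]
```

Because the key is truncated to whole seconds, a real log has many IMU samples per key and `nearest` keeps just one of them; averaging the samples within each second would be an alternative design, not what the notebook does.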

### Method: Model (LightGBM)

In [3]:
import lightgbm as lgb
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score


def lgbm(train, test, col, lgb_params):
    model = lgb.LGBMClassifier(**lgb_params)
    model.fit(train[col], train['tag'])
    preds = model.predict(test[col])
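    # note: preds is passed first, so rows are predicted labels and columns are
    # true labels (sklearn's confusion_matrix expects (y_true, y_pred))
    print('confusion matrix :  \n', confusion_matrix(preds, test['tag']))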
    print('accuracy score : ', accuracy_score(preds, test['tag']))
    return preds

### Method: Confirm Score

In [4]:
def get_train_score(df):
    # calc_distance_error: baseline vs ground truth, in metres
    df['err'] = calc_haversine(df.latDeg_bs, df.lngDeg_bs,
                               df.latDeg_gt, df.lngDeg_gt)
    # calc_evaluate_score: mean of p50 and p95 per phone, averaged over phones
    df['phone'] = df['collectionName'] + '_' + df['phoneName']
    res = df.groupby('phone')['err'].agg([percentile50, percentile95])
    res['p50_p95_mean'] = (res['percentile50'] + res['percentile95']) / 2
    score = res['p50_p95_mean'].mean()
    return score


def percentile50(x):
    return np.percentile(x, 50)


def percentile95(x):
    return np.percentile(x, 95)


def calc_haversine(lat1, lon1, lat2, lon2):
    """Calculates the great circle distance between two points
    on the earth. Inputs are array-like and specified in decimal degrees.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(a**0.5)
    dist = 6_367_000 * c
    return dist
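`get_train_score` computes the score as follows: for each phone (drive + device), average the 50th and 95th percentiles of the horizontal error, then average over phones. A small self-contained check on invented error values, reusing the notebook's haversine formula and its 6,367 km Earth radius (`score` is a hypothetical helper taking a dict of per-phone errors):

```python
import numpy as np

def calc_haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (same formula and radius as the notebook)."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return 6_367_000 * 2 * np.arcsin(np.sqrt(a))

def score(err_by_phone):
    """Mean over phones of (p50 + p95) / 2 of the distance error."""
    per_phone = [(np.percentile(e, 50) + np.percentile(e, 95)) / 2 for e in err_by_phone.values()]
    return float(np.mean(per_phone))

# one degree of latitude is about 111.1 km with this radius
print(round(float(calc_haversine(0.0, 0.0, 1.0, 0.0))))  # 111125

# invented per-phone errors in metres
errors = {"driveA_Pixel4": np.arange(1, 101, dtype=float),  # 1, 2, ..., 100
          "driveA_SamsungS20Ultra": np.full(50, 4.0)}        # constant 4 m
print(score(errors))  # approximately 38.3875
```

For the first phone, p50 = 50.5 m and p95 = 95.05 m (NumPy's default linear interpolation), giving 72.775 m; the second phone gives 4 m, so the score is their mean.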

### Initial values

Please adjust the parameters as you like.

In [5]:
INPUT = '../input/google-smartphone-decimeter-challenge'

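train_cname = ['2021-04-29-US-SJC-2', '2021-03-10-US-SVL-1']  # drives used for training
test_cname = ['2021-04-28-US-SJC-1']                           # held-out drive for evaluation
tag_v = 0.5  # speed threshold [m/s]: speedMps < tag_v -> tag 1 (stopped), else 0 (moving)
# raw IMU features: accelerometer, magnetometer and gyroscope, 3 axes each
col = ["x_f_acce", "y_f_acce", "z_f_acce", "x_f_magn", "y_f_magn", "z_f_magn", "x_f_gyro", "y_f_gyro", "z_f_gyro"]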

# parameter
lgb_params = {
    'num_leaves': 90,
    'n_estimators': 125,
}

### Main

In [6]:
# make train&test
train_df = make_train(INPUT, train_cname, tag_v)
test_df = make_train(INPUT, test_cname, tag_v)

Loading ../input/google-smartphone-decimeter-challenge/train/2021-04-29-US-SJC-2/Pixel4/Pixel4_GnssLog.txt
Loading ../input/google-smartphone-decimeter-challenge/train/2021-04-29-US-SJC-2/SamsungS20Ultra/SamsungS20Ultra_GnssLog.txt
Loading ../input/google-smartphone-decimeter-challenge/train/2021-03-10-US-SVL-1/Pixel4XL/Pixel4XL_GnssLog.txt
Loading ../input/google-smartphone-decimeter-challenge/train/2021-03-10-US-SVL-1/SamsungS20Ultra/SamsungS20Ultra_GnssLog.txt
Loading ../input/google-smartphone-decimeter-challenge/train/2021-04-28-US-SJC-1/Pixel4/Pixel4_GnssLog.txt
Loading ../input/google-smartphone-decimeter-challenge/train/2021-04-28-US-SJC-1/SamsungS20Ultra/SamsungS20Ultra_GnssLog.txt


In [7]:
# prediction with light gbm
test_df['preds'] = lgbm(train_df, test_df, col, lgb_params)

confusion matrix :  
 [[2611  153]
 [  75 1258]]
accuracy score :  0.9443495240419819
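Because `lgbm` passes `preds` first, the printed matrix has predicted labels on the rows and true labels on the columns (scikit-learn's `confusion_matrix` expects `y_true` first). Label 1 means stopped (`speedMps < tag_v`) and 0 means moving. A small sketch that recomputes accuracy, plus recall and precision for the stopped class, from the numbers printed above:

```python
import numpy as np

# printed matrix: rows = predicted label, columns = true label (0 = moving, 1 = stopped)
cm = np.array([[2611, 153],
               [75, 1258]])

accuracy = np.trace(cm) / cm.sum()             # correct predictions / all epochs
stopped_recall = cm[1, 1] / cm[:, 1].sum()     # share of true stops that were detected
stopped_precision = cm[1, 1] / cm[1, :].sum()  # share of predicted stops that were real
print(round(accuracy, 4), round(stopped_recall, 4), round(stopped_precision, 4))
```

This gives an accuracy of 3869 / 4097 ≈ 0.9443 (the score printed above), a stopped-class recall of 1258 / 1411 ≈ 0.892 and a stopped-class precision of 1258 / 1333 ≈ 0.944, all measured on this single held-out drive.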


### Visualize data

Visualizing whether each baseline position is predicted as stopped (`preds` = 1) or moving (`preds` = 0).

Looking at the figure, the baseline positions appear to be disturbed by multipath where the car is predicted to be stationary😄

In [8]:
import plotly.express as px
fig = px.scatter_mapbox(test_df,
                    # Here, plotly gets the (lat, lon) coordinates of the baseline
                    lat="latDeg_bs",
                    lon="lngDeg_bs",
                    text='phoneName',

                    # Here, plotly colors each point by the prediction (1 = stopped, 0 = moving)
                    color="preds",

                    zoom=14.5,
                    center={"lat":37.334, "lon":-121.89},
                    height=600,
                    width=800)
fig.update_layout(mapbox_style='open-street-map')  # tile style that needs no Mapbox token
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.update_layout(title_text="GPS traffic")
fig.show()
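The visual impression can also be checked numerically by computing the baseline error (haversine distance between `latDeg_bs/lngDeg_bs` and `latDeg_gt/lngDeg_gt`) and comparing it between epochs predicted as stopped and as moving. The sketch below runs on a tiny made-up frame with the same column names; `haversine_m` and `error_by_prediction` are hypothetical helpers, and on the real data one would call `error_by_prediction(test_df)`.

```python
import numpy as np
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2, radius=6_367_000):
    """Great-circle distance in metres (same radius as the notebook)."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return radius * 2 * np.arcsin(np.sqrt(a))

def error_by_prediction(df):
    """Median and 95th-percentile baseline error (m) for predicted moving (0) vs stopped (1)."""
    err = haversine_m(df["latDeg_bs"], df["lngDeg_bs"], df["latDeg_gt"], df["lngDeg_gt"])
    return err.groupby(df["preds"]).quantile([0.5, 0.95]).unstack()

# made-up rows: the baseline drifts further from the truth where preds == 1
toy = pd.DataFrame({
    "latDeg_gt": [37.3340, 37.3341, 37.3342, 37.3343],
    "lngDeg_gt": [-121.8900] * 4,
    "latDeg_bs": [37.33401, 37.33411, 37.33450, 37.33470],
    "lngDeg_bs": [-121.8900] * 4,
    "preds": [0, 0, 1, 1],
})
print(error_by_prediction(toy))
```

If the multipath reading of the map is right, the stopped row (`preds` = 1) should show clearly larger errors than the moving row on the real test drive as well.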