**INTRODUCTION**

RTKLIB is an open-source software library used for (among other things) calculating GNSS solutions from raw observation data.  It was originally written by Tomoji Takasu of the Tokyo University of Marine Science and Technology, but there are now multiple forks available, including the demo5 fork which I maintain. When used to generate PPK (post-processing kinematic) solutions, it has two advantages over the baseline solutions provided by Google.  First, it uses the carrier phase observations (ADR) as well as the pseudorange observations.  The carrier phase observations are more difficult to use but have smaller errors than the pseudorange observations.  Second, the PPK solutions are differential, relative to a nearby known base location, rather than absolute like the Google solutions.  The differential solution allows us to difference raw observations between the rover and base, which effectively cancels most of the satellite orbital, clock, and atmospheric errors, resulting in more accurate solutions.  Typically, PPK solutions also use integer ambiguity resolution to further increase accuracy, but in this case I am using only the float solutions, since the quality of the smartphone observations makes ambiguity resolution extremely challenging.
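To make the differencing idea concrete, here is a minimal numerical sketch. It is not RTKLIB code: the ranges, clock biases, and error magnitudes are made-up values, and measurement noise and multipath (which do not cancel) are left out. It only shows that errors common to rover and base drop out of the rover-minus-base difference, and that differencing again between satellites removes the receiver clock terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true geometric ranges (m) from four satellites to each receiver
range_rover = np.array([20_100_000.0, 21_500_000.0, 22_300_000.0, 23_050_000.0])
range_base = range_rover + np.array([1200.0, -800.0, 350.0, -150.0])

# Errors seen by both receivers: satellite orbit/clock plus atmospheric delay (m)
common_err = rng.normal(0.0, 5.0, size=4)

# Receiver clock biases (m): one per receiver, identical for all satellites
clk_rover, clk_base = 153.2, -47.9

obs_rover = range_rover + common_err + clk_rover
obs_base = range_base + common_err + clk_base

# Single difference (rover - base) removes the satellite-dependent common errors
sd = obs_rover - obs_base
# Double difference (relative to the first satellite) also removes receiver clocks
dd = sd[1:] - sd[0]

# Compare with the same differences formed from the error-free ranges
true_sd = range_rover - range_base
true_dd = true_sd[1:] - true_sd[0]
max_residual = np.max(np.abs(dd - true_dd))
print('largest residual after double differencing (m):', max_residual)
```

In practice the cancellation is only approximate, because the atmospheric errors at the rover and base become less alike as the baseline gets longer.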

This notebook is based on the version of the "Getting Started with RTKLIB" notebook which I shared at the end of last year's competition, but it is updated to run with this year's data.  It will generate a score of 1.803 on the public leaderboard.  I have also made some changes to make the code compatible with Linux as well as Windows.  In both cases, you will need to download this code to your computer and run it there.  I have successfully tested it on Windows 11 and WSL2 (Windows Subsystem for Linux 2). In general, the top of each file will have a set of input parameters.  Unless your folder names and paths are identical to mine, you will often need to update these before running.

The folder structure I use in this solution is:

    GSDC_2023
        config
        data
            test
            train
        python
            android_rinex
        rtklib

It will be easier to follow these instructions if you use the same folder structure.

My hope is to provide a platform which will allow competitors to jump right into extending the existing GNSS theory rather than having to build a solution from scratch. In addition to the C version of RTKLIB I describe in this notebook, I have also created rtklib-py, an all python subset of RTKLIB for PPK solutions. The python code runs somewhat slower than the C code but it does make for an easier development platform. This notebook will only cover working with the C code version of RTKLIB but I describe how to use the python version in the "Getting Started with rtklib-py" notebook available in the notebook section of last year's competition. 

**Step 1: Retrieve base observation and satellite navigation files**

Since these are differential solutions, we will need raw observation measurements from a nearby base station for each data set.  Fortunately, these are available from the U.S. National Geodetic Survey (NGS) website.  We will also need satellite orbital data for each data set for the GPS, GLONASS, and Galileo constellations.  These are available from multiple websites.  I chose to retrieve them from the CDDIS site, in part because these files include Galileo navigation data as well as GPS and GLONASS, and because last year they appeared to be more complete than the data from other sources.

To download the CDDIS files yourself, you will need to set up a free account and then create a .netrc file in your user home directory as described at https://cddis.nasa.gov/Data_and_Derived_Products/CreateNetrcFile.html.
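For reference, a .netrc entry is a single line of the form `machine <host> login <user> password <password>`. The sketch below builds such a line and optionally writes it to the home directory. The helper names are hypothetical, and the default host `urs.earthdata.nasa.gov` (the Earthdata login server) is my assumption, so please check it against the CDDIS page linked above.

```python
from pathlib import Path

def netrc_entry(username, password, machine="urs.earthdata.nasa.gov"):
    """Return one .netrc line for the given credentials (host is an assumption)."""
    return f"machine {machine} login {username} password {password}\n"

def write_netrc(username, password, path=None):
    """Write the entry to ~/.netrc (or `path`), replacing any existing file."""
    path = Path(path) if path is not None else Path.home() / ".netrc"
    path.write_text(netrc_entry(username, password))
    path.chmod(0o600)  # owner read/write only, since the file holds a password
    return path

# Show the line without touching the file system
print(netrc_entry("my_user", "my_password"), end="")
```

Note that `write_netrc` overwrites an existing file, so if you already keep other credentials in .netrc, add the line by hand instead.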

If you just want the base observation and navigation files necessary for the test and training data sets, I have zipped them up and included them in the input data for this notebook. Unzip them directly into the test or train folder.

This year there are data sets from the Bay Area and from the LA area, so we will need to select the appropriate base station for each data set and also use the correct location for that base station in the solution.

The code below simply retrieves the base and navigation data for the full day corresponding to the starting time of each data set.  This works fine for the test data set since all data sets start and end on the same UTC day, but in the training set there is one data set that starts on one UTC day and finishes on the next.  I will be demonstrating this exercise on the test data so will not worry about this issue, but if you are trying to retrieve this data for the training data I suggest just removing this one data set.
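If you would rather detect the day-boundary case than remove the data set, one option is to list every UTC day a drive touches. The sketch below is only an illustration: the function name is hypothetical, it assumes the first five fields of the folder name are the UTC start time (year, month, day, hour, minute), and the drive duration has to be supplied by you since it is not part of the folder name. The second folder name in the example is made up.

```python
from datetime import datetime, timedelta

def utc_days_spanned(dataset, duration_min):
    """Return the (year, day-of-year) pairs touched by a data set."""
    parts = dataset.split('-')
    # Assumed layout of the folder name: YYYY-MM-DD-hh-mm-...
    start = datetime(*(int(p) for p in parts[:5]))
    end = start + timedelta(minutes=duration_min)
    days = []
    day = start.date()
    while day <= end.date():
        days.append((day.year, day.timetuple().tm_yday))
        day += timedelta(days=1)
    return days

# A drive that stays within one UTC day
print(utc_days_spanned('2020-06-25-00-34-us-ca-mtv-sb-101', 45))
# A made-up drive that crosses UTC midnight and so needs two daily files
print(utc_days_spanned('2021-01-04-23-50-us-ca-example', 30))
```

When two days are returned, the base and navigation files for both days would need to be downloaded and passed to the solution.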

The observation files are doubly compressed.  They first need to be decompressed with gzip and then with crx2rnx.  This second step translates from compressed rinex to uncompressed rinex format.  This requires the CRX2RNX executable available for both Windows and Linux at https://terras.gsi.go.jp/ja/crx2rnx.html.  Put this file in the "rtklib" folder.


In [29]:
""" 
get_base_data.py - retrieve base observation and navigation data for the
    2023 GSDC competition 
"""

import os
from os.path import join
from datetime import datetime
import numpy as np
import requests
import gzip
from glob import glob
import subprocess

# Input parameters
datadir = '.'  # relative to python script
# List of CORS stations to use
stas = ['slac', 'vdcy', 'p222']  # Bay Area, LA, backup for Bay Area

# site to retrieve base observation data
obs_url_base = 'https://geodesy.noaa.gov/corsdata/rinex'  

# site to retrieve satellite navigation data
nav_url_base = 'https://cddis.nasa.gov/archive/gnss/data/daily' #/2021/342/21p/ 
nav_file_base = 'BRDM00DLR_S_' # 20213420000_01D_MN.rnx.gz
# Access to CDDIS navigation data requires registering for a free account and 
# setup of a .netrc file as described at 
# https://cddis.nasa.gov/Data_and_Derived_Products/CreateNetrcFile.html.  
# Make sure this file is in the user's home directory

# Make sure you have downloaded this executable before running this code
crx2rnx_bin = '../../rtklib/CRX2RNX' # relative to data directory

# Loop through data sets in the data directory
os.chdir(datadir)

for dataset in np.sort(os.listdir()):
    if not os.path.isdir(join(dataset)):
        continue
    print(dataset)
    ymd = dataset.split('-')
    doy = datetime(int(ymd[0]), int(ymd[1]), int(ymd[2])).timetuple().tm_yday # get day of year
    doy = str(doy).zfill(3)
    print(glob(join(dataset,'*.*o')))
    
    if len(glob(join(dataset,'*.*o'))) == 0:
        print('Retrieving base obs:', dataset)
        # get obs data
        i = 1 if '-lax-' in dataset else 0  # use different base for LA
        fname = stas[i] + doy + '0.' + ymd[0][2:4] + 'd.gz'
        url = '/'.join([obs_url_base, ymd[0], doy, stas[i], fname])
        try:
            obs = gzip.decompress(requests.get(url).content) # get obs and decompress
            # write obs data
            with open(join(dataset, fname[:-3]), "wb") as f:
                f.write(obs)
        except Exception:
            # try backup CORS station
            print('Try backup CORS:', dataset)
            try:
                # a backup station is only listed for the Bay Area; for LA the
                # index is out of range and the IndexError is reported below
                i += 2
                fname = stas[i] + doy + '0.' + ymd[0][2:4] + 'd.gz'
                url = '/'.join([obs_url_base, ymd[0], doy, stas[i], fname])
                obs = gzip.decompress(requests.get(url).content) # get obs and decompress
                # write obs data
                with open(join(dataset, fname[:-3]), "wb") as f:
                    f.write(obs)
            except Exception:
                print('Fail obs: %s' % dataset)
            
        # convert compact rinex to rinex
        crx_files = glob(join(dataset,'*.*d'))
        if len(crx_files) > 0:
            subprocess.call([crx2rnx_bin, '-f', crx_files[0]])
    
    # get nav data
    print('Things', len(glob(join(dataset,'*.rnx'))))
    if len(glob(join(dataset,'*.rnx'))) > 0:
        continue  # file already exists
    fname = nav_file_base + ymd[0] + doy + '0000_01D_MN' + '.rnx.gz'
    url = '/'.join([nav_url_base, ymd[0], doy, ymd[0][2:4]+'p', fname])
    try:
        nav = gzip.decompress(requests.get(url).content) # get nav and decompress
        # write nav data
        print('Retrieved nav:', fname[:-3])
        with open(join(dataset, fname[:-3]), "wb") as f:
            f.write(nav)
    except Exception:
        print('Fail nav: %s' % dataset)

2020-06-25-00-34-us-ca-mtv-sb-101
['2020-06-25-00-34-us-ca-mtv-sb-101/slac1770.20o']
Things 1
2020-07-08-22-28-us-ca
['2020-07-08-22-28-us-ca/slac1900.20o']
Things 1
2020-07-17-22-27-us-ca-mtv-sf-280
['2020-07-17-22-27-us-ca-mtv-sf-280/slac1990.20o']
Things 1
2020-07-17-23-13-us-ca-sf-mtv-280
['2020-07-17-23-13-us-ca-sf-mtv-280/slac1990.20o']
Things 1
2020-08-04-00-19-us-ca-sb-mtv-101
['2020-08-04-00-19-us-ca-sb-mtv-101/slac2170.20o']
Things 1
2020-08-04-00-20-us-ca-sb-mtv-101
['2020-08-04-00-20-us-ca-sb-mtv-101/slac2170.20o']
Things 1
2020-08-13-21-41-us-ca-mtv-sf-280
['2020-08-13-21-41-us-ca-mtv-sf-280/slac2260.20o']
Things 1
2020-08-13-21-42-us-ca-mtv-sf-280
['2020-08-13-21-42-us-ca-mtv-sf-280/slac2260.20o']
Things 1
2020-12-10-22-17-us-ca-sjc-c
['2020-12-10-22-17-us-ca-sjc-c/slac3450.20o']
Things 1
2020-12-10-22-52-us-ca-sjc-c
['2020-12-10-22-52-us-ca-sjc-c/slac3450.20o']
Things 1
2021-01-04-21-50-us-ca-e1highway280driveroutea
['2021-01-04-21-50-us-ca-e1highway280driveroutea/slac00

**Step 2: Download android_rinex library and create RTKLIB config file**

You will need the android_rinex library for converting the raw Android observation files to rinex format.  RTKLIB post-processing solutions require that the input files be in the rinex format.  You will need to use my fork of this library which is available at the address shown in the code below. Make sure you have the most recent version, which was updated on 8/1/22.

Put this in the GSDC_2023/python/android_rinex folder.  There is currently a path issue when running in multi-processing mode.  To avoid this, you will need to copy the files from the android_rinex/src folder into the GSDC_2023/python folder.

You will also need a configuration file for the RTKLIB PPK solutions. Copy the config file below to the GSDC_2023/config folder with a file name of gsdc_2023_config1.conf

In [10]:
# # gsdc_2023_config1.conf - config file for RTKLIB PPK solution

# pos1-posmode       =kinematic  # (0:single,1:dgps,2:kinematic,3:static,4:static-start,5:movingbase,6:fixed,7:ppp-kine,8:ppp-static,9:ppp-fixed)
# pos1-frequency     =l1+l2+l5   # (1:l1,2:l1+l2,3:l1+l2+l5,4:l1+l2+l5+l6)
# pos1-soltype       =combined-nophasereset # (0:forward,1:backward,2:combined,3:combined-nophasereset)
# pos1-elmask        =5          # (deg)
# pos1-snrmask_r     =on         # (0:off,1:on)
# pos1-snrmask_b     =off        # (0:off,1:on)
# pos1-snrmask_L1    =28,28,28,28,28,28,28,28,28
# pos1-snrmask_L2    =34,34,34,34,34,34,34,34,34
# pos1-snrmask_L5    =20,20,20,20,20,20,20,20,20
# pos1-dynamics      =on         # (0:off,1:on)
# pos1-tidecorr      =off        # (0:off,1:on,2:otl)
# pos1-ionoopt       =brdc       # (0:off,1:brdc,2:sbas,3:dual-freq,4:est-stec,5:ionex-tec,6:qzs-brdc)
# pos1-tropopt       =saas       # (0:off,1:saas,2:sbas,3:est-ztd,4:est-ztdgrad)
# pos1-sateph        =brdc       # (0:brdc,1:precise,2:brdc+sbas,3:brdc+ssrapc,4:brdc+ssrcom)
# pos1-exclsats      =           # (prn ...)
# pos1-navsys        =13         # (1:gps+2:sbas+4:glo+8:gal+16:qzs+32:bds+64:navic)
# pos2-armode        =off        # (0:off,1:continuous,2:instantaneous,3:fix-and-hold)
# pos2-gloarmode     =off        # (0:off,1:on,2:autocal,3:fix-and-hold)
# pos2-bdsarmode     =off        # (0:off,1:on)
# pos2-arelmask      =15         # (deg)
# pos2-arminfix      =10
# pos2-armaxiter     =1
# pos2-elmaskhold    =15         # (deg)
# pos2-aroutcnt      =1
# pos2-maxage        =30         # (s)
# pos2-syncsol       =off        # (0:off,1:on)
# pos2-slipthres     =0.1        # (m)
# pos2-dopthres      =10         # (m)
# pos2-rejionno      =5          # (m)
# pos2-rejcode       =10         # (m)
# pos2-niter         =1
# pos2-baselen       =0          # (m)
# pos2-basesig       =0          # (m)
# out-solformat      =llh        # (0:llh,1:xyz,2:enu,3:nmea)
# out-outhead        =on         # (0:off,1:on)
# out-outopt         =on         # (0:off,1:on)
# out-outvel         =off        # (0:off,1:on)
# out-timesys        =gpst       # (0:gpst,1:utc,2:jst)
# out-timeform       =tow        # (0:tow,1:hms)
# out-timendec       =3
# out-degform        =deg        # (0:deg,1:dms)
# out-fieldsep       =
# out-outsingle      =off        # (0:off,1:on)
# out-maxsolstd      =0          # (m)
# out-height         =ellipsoidal # (0:ellipsoidal,1:geodetic)
# out-geoid          =internal   # (0:internal,1:egm96,2:egm08_2.5,3:egm08_1,4:gsi2000)
# out-solstatic      =all        # (0:all,1:single)
# out-outstat        =residual   # (0:off,1:state,2:residual)
# stats-eratio1      =200
# stats-eratio2      =300
# stats-eratio5      =25
# stats-errphase     =0.005      # (m)
# stats-errphaseel   =0          # (m)
# stats-errphasebl   =0          # (m/10km)
# stats-snrmax       =45         # (dB.Hz)
# stats-errsnr       =0.005      # (m)
# stats-errrcv       =0          # ( )
# stats-stdbias      =30         # (m)
# stats-stdiono      =0.03       # (m)
# stats-stdtrop      =0.3        # (m)
# stats-prnaccelh    =0.5        # (m/s^2)
# stats-prnaccelv    =0.1        # (m/s^2)
# stats-prnbias      =0.001      # (m)
# stats-prniono      =0.01       # (m)
# stats-prntrop      =0.001      # (m)
# stats-prnpos       =0          # (m)
# stats-clkstab      =0          # (s/s)
# ant1-postype       =llh        # (0:llh,1:xyz,2:single,3:posfile,4:rinexhead,5:rtcm,6:raw)
# ant2-postype       =posfile    # (0:llh,1:xyz,2:single,3:posfile,4:rinexhead,5:rtcm,6:raw)
# ant2-maxaveep      =1
# ant2-initrst       =on         # (0:off,1:on)
# misc-timeinterp    =off        # (0:off,1:on)
# file-satantfile    =
# file-rcvantfile    =
# file-staposfile    =../../../../config/bases.sta
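The `pos1-navsys` value in the config above is a bitmask, using the bit values listed in the comment on that line. As a small illustration (the helper names here are hypothetical, not part of RTKLIB), the value 13 can be decoded and re-encoded like this:

```python
# Bit values copied from the comment on the pos1-navsys line of the config
NAVSYS_BITS = {1: 'gps', 2: 'sbas', 4: 'glo', 8: 'gal',
               16: 'qzs', 32: 'bds', 64: 'navic'}

def decode_navsys(mask):
    """List the constellations enabled by a pos1-navsys bitmask."""
    return [name for bit, name in NAVSYS_BITS.items() if mask & bit]

def encode_navsys(names):
    """Build the bitmask for a collection of constellation names."""
    lookup = {name: bit for bit, name in NAVSYS_BITS.items()}
    return sum(lookup[n] for n in set(names))

print(decode_navsys(13))                     # ['gps', 'glo', 'gal']
print(encode_navsys(['gps', 'glo', 'gal']))  # 13
```

So 13 = 1 + 4 + 8 enables GPS, GLONASS, and Galileo, which matches the constellations for which navigation data was retrieved in Step 1.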

**Step 3: Setup base station locations**

Since we are dealing with multiple base stations, we need a separate file containing the different base locations.  Create a file named bases.sta in the GSDC_2023/config folder and copy the lines below into this file.  RTKLIB will use the first four characters of the base station file name to select the correct location from this list.  Note that if you don't use the exact same file name and folder name as I used, you will need to modify the "file-staposfile" parameter in the config file above.
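To illustrate the lookup described above, here is a small sketch that parses the station list and picks the entry matching the first four characters of a base observation file name. This is not RTKLIB's own code: the function names are hypothetical, and skipping lines that start with `%` or `#` and matching names case-insensitively are assumptions of this sketch.

```python
import os

# Mirrors the active entries of the bases.sta contents given in this step
STA_TEXT = """%  LATITUDE(DEG) LONGITUDE(DEG)    HEIGHT(M)   NAME
37.41651904  -122.20426828  63.778  SLAC
34.17856659  -118.22000501  318.230  VDCY
"""

def parse_sta(text):
    """Return {NAME: (lat_deg, lon_deg, height_m)} from station-list text."""
    stations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in '%#':
            continue  # skip blank lines and comment lines
        lat, lon, hgt, name = line.split()[:4]
        stations[name.upper()] = (float(lat), float(lon), float(hgt))
    return stations

def base_position(obs_file_name, stations):
    """Look up a station by the first four characters of the obs file name."""
    key = os.path.basename(obs_file_name)[:4].upper()
    return stations[key]

stations = parse_sta(STA_TEXT)
print(base_position('slac1770.20o', stations))
```

A check like this can be handy before a long run, since a base file whose station is missing from the list will not get the intended base location.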

The precise base station locations are continuously changing by small amounts due to tectonic plate movement, but I have ignored the relative movement between data sets.  The locations below were calculated for roughly the middle of 2022 using the base velocities specified in the coordinate files available for each base station.
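Moving a station position to a different epoch with a constant velocity is a simple linear extrapolation: position = reference position + velocity × (target epoch − reference epoch). A minimal sketch follows, with a hypothetical function name and made-up numbers rather than values from an actual coordinate file:

```python
import numpy as np

def propagate_position(xyz_ref, vel_m_per_yr, epoch_ref, epoch_target):
    """Linearly move an XYZ position (m) from epoch_ref to epoch_target (decimal years)."""
    xyz_ref = np.asarray(xyz_ref, dtype=float)
    vel = np.asarray(vel_m_per_yr, dtype=float)
    return xyz_ref + vel * (epoch_target - epoch_ref)

# Made-up reference position and velocity, for illustration only
xyz_2010 = [-2700000.000, -4290000.000, 3850000.000]
vel = [-0.020, 0.025, 0.010]  # m/year

xyz_2022 = propagate_position(xyz_2010, vel, 2010.0, 2022.5)
print(xyz_2022)
```

With velocities of a few centimeters per year, the shift over a couple of years is small compared to the errors in the smartphone solutions, which is why a single epoch for all data sets is a reasonable simplification.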

In [None]:
# %  LATITUDE(DEG) LONGITUDE(DEG)    HEIGHT(M)   NAME
# 37.41651904  -122.20426828  63.778  SLAC
# 34.17856659  -118.22000501  318.230  VDCY
# # 37.53924080  -122.08326860  53.605  P222


**Step 4: Download RTKLIB code**

You can download the RTKLIB executables and source code for the version of RTKLIB that I have optimized for smartphone solutions at https://github.com/rtklibexplorer/RTKLIB/releases/tag/gsdc_2022_v1.0. 

You can download the Windows executables directly from here and put them in the GSDC_2023/rtklib folder.  If you are running in Linux, you will need to build your own executables from the source code. This is described in a blog post at https://rtklibexplorer.wordpress.com/2020/12/18/building-rtklib-code-in-linux/.

Note that if you would like to build the Windows executables yourself, the Windows instructions describe using the Embarcadero compiler which is required for the GUI apps.  If you are just compiling the rnx2rtkp app, you can compile it with the Visual Studio compiler using the project file in the \app\consapp\rnx2rtkp\msc folder.

**Step 5: Convert the raw observation files and run PPK solutions**

As configured in the header below, this code will convert the raw Android files to RINEX format and run the RTKLIB PPK solutions for the test set.  Note that these will be float solutions; we are not attempting to resolve the integer ambiguities, since the quality of the smartphone observations is very low.

In the main code at the bottom of the file, the execution can be set up as either sequential or multiprocessing by commenting or uncommenting the appropriate lines, both for the file conversion, and for running the solutions.  It is easier to debug when run sequentially but is much slower.  I recommend running each step sequentially until you are convinced it's working, then switch it to multiprocessing.  
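If you prefer a flag to commenting and uncommenting lines, the two execution modes can be wrapped in one helper. This is only a sketch of the pattern, with hypothetical names (`run_all`, `demo_task`); the script itself uses the comment/uncomment approach described above.

```python
from multiprocessing import Pool

def run_all(func, arg_list, parallel=False):
    """Call func(*args) for every tuple in arg_list, sequentially or in a pool."""
    if parallel:
        # Pool() defaults to one worker process per CPU
        with Pool() as pool:
            return pool.starmap(func, arg_list)
    # sequential mode: slower, but errors and breakpoints behave normally
    return [func(*args) for args in arg_list]

def demo_task(folder, tag):
    """Stand-in for convert_rnx or run_rtklib; just builds a string."""
    return folder + tag

if __name__ == '__main__':
    jobs = [('pixel4', '_rtklib'), ('pixel5', '_rtklib')]
    print(run_all(demo_task, jobs, parallel=False))
```

The function passed to the pool must be defined at module level so that it can be pickled, and on Windows the `if __name__ == '__main__':` guard is required when using multiprocessing.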

The solution files will all be tagged with the "soltag_rtklib" parameter defined in the header so you can use this to keep separate the results from multiple runs.  They will be in the "supplemental" folders inside each phone folder.

Note that the parameters in the header are configured to overwrite all rinex files and solution files.  If you want to just rerun a subset of the data, set one or both of these parameters to False and then the code will only run when the output file is missing.

This code is set up to run either the C version of RTKLIB or the python version or both.  In this notebook, I am only addressing the C version; please see my other notebook if you would like to run the python code.

Debugging hint:  If the code runs without error but does not produce any solution files then the error is very likely occurring during the call to the rtklib executable since any errors that occur in that code are not fed back to the python code.  The easiest way to debug this is to place a breakpoint in the "run_rtklib" function while in sequential execution mode, open a console window, change the directory to the contents of the "folder" variable, then copy and paste the contents of the "rtkcmd_debug" variable into the console window and run it.  Most likely you will find that one of the input files is missing or in the wrong location.
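An alternative to the breakpoint approach is to capture the output of the external command while debugging instead of discarding it. The sketch below shows the general pattern with `subprocess.run`; the helper name is hypothetical, and a short Python child process stands in for the rnx2rtkp call. I have not verified here which problems rnx2rtkp reports through its return code or stderr, so treat this as a starting point.

```python
import subprocess
import sys

def run_and_report(cmd, cwd=None):
    """Run a command, capture its output, and print details if it looks like a failure."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0 or result.stderr:
        print('Command:', ' '.join(cmd))
        print('Return code:', result.returncode)
        print('stderr:', result.stderr.strip())
    return result

# Stand-in for the RTKLIB call: a child process that fails with a message
fake_cmd = [sys.executable, '-c',
            "import sys; sys.stderr.write('no such file'); sys.exit(2)"]
res = run_and_report(fake_cmd)
```

Passing `cwd=folder` has the same effect as the `os.chdir(folder)` call in "run_rtklib", but without changing the working directory of the Python process itself.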

In [31]:
"""
run_ppk_multi.py - convert raw android files to rinex and run PPK solutions for GSDC_2023
data set with RTKLIB and/or rtklib-py.   
"""

import sys
import numpy as np
if 'rtklib-py/src' not in sys.path:
    sys.path.append('rtklib-py/src')
if 'android_rinex/src' not in sys.path:
    sys.path.append('android_rinex/src')

import os, shutil
from os.path import join, isdir, isfile, abspath
from glob import glob
from multiprocessing import Pool
import subprocess
import gnsslogger_to_rnx as rnx  # from android_rinex/src, also copied into the python folder
from time import time

# set run parameters
maxepoch = None # max number of epochs, used for debug, None = no limit

# Set solution choices
ENABLE_PY = False        # Use RTKLIB-PY to generate solutions 
ENABLE_RTKLIB = True     # Use RTKLIB to generate solutions
OVERWRITE_RINEX = True  # overwrite existing rinex files
OVERWRITE_SOL = True    # overwrite existing solution files

# specify location of input folder and files
datadir = '../data/test'   # relative to python script
# base and nav file locations are relative to obs files
basefiles = '../*0.2*o' # rinex2, use this for rtklib only
#basefiles = '../base.obs' # rinex3, use this for python only
navfiles = '../BRDM*MN.rnx' # navigation files with wild cards

# Setup for RTKLIB,  paths relative to python script
binpath_rtklib  = '../rtklib/rnx2rtkp'
cfgfile_rtklib = '../config/gsdc_2023_config1.conf'
soltag_rtklib = '_rtklib' # postfix for solution file names

# Setup for rtklib-py - not supported in this notebook
#cfgfile = '../config/ppk_phone_0510.py'
soltag_py = '_py0510'  # postfix for python solution file names

# convert relative paths to absolute paths
datadir = abspath(datadir)
binpath_rtklib = abspath(binpath_rtklib)
cfgfile_rtklib = abspath(cfgfile_rtklib)

# Select phones to process
# all phones
# PHONES = ['pixel4', 'pixel4xl', 'pixel5', 'pixel5a', 'pixel6pro', 'pixel7pro',
#           'mi8', 'xiaomimi8',
#           'sm-g988b', 'sm-g955f', 'sm-s908b', 'sm-a226b', 'sm-a600t',
#           'sm-a505g', 'sm-a325f', 'sm-a217m', 'sm-a205u', 'sm-a505u', 
#           'samsungs22ultra', 'samsunga325g', 'samsunga32', 'samsung21ultra']
# phones in test set
PHONES = ['pixel4', 'pixel4xl', 'pixel5', 'pixel6pro', 'pixel7pro',
          'mi8', 'xiaomimi8',
          'sm-g988b', 'sm-s908b', 'sm-a325f', 'sm-a505u', 'sm-a205u',
          'samsunga325g', 'samsunga32']

# These are only for rtklib-py, see the bases.sta file described above for RTKLIB base locations
BASE_POS = {'slac' : [-2703116.3527, -4291766.8501, 3854248.1361],  # WGS84 XYZ coordinates
            'vdcy' : [-2497836.8748, -4654543.0665, 3563029.0635],
            'p222' : [-2689640.5799, -4290437.1653, 3865051.0923]}

# input structure for rinex conversion
class args:
    def __init__(self):
        # Input parameters for conversion to rinex
        self.slip_mask = 0 # overwritten below
        self.fix_bias = True
        self.timeadj = 1e-7
        self.pseudorange_bias = 0
        self.filter_mode = 'sync'
        # Optional header values for rinex files
        self.marker_name = ''
        self.observer = ''
        self.agency = ''
        self.receiver_number = ''
        self.receiver_type = ''
        self.receiver_version = ''
        self.antenna_number = ''
        self.antenna_type = ''

# Copy and read config file
if ENABLE_PY:
    shutil.copyfile(cfgfile, '__ppk_config.py')
    import __ppk_config as cfg
    import rinex as rn
    import rtkcmn as gn
    from rtkpos import rtkinit
    from postpos import procpos, savesol

# function to convert single rinex file
def convert_rnx(folder, rawFile, rovFile, slipMask):
    os.chdir(folder)
    argsIn = args()
    argsIn.input_log = rawFile
    argsIn.output = os.path.basename(rovFile)
    argsIn.slip_mask = slipMask
    rnx.convert2rnx(argsIn)

# function to run single RTKLIB-Py solution
def run_ppk(folder, rovfile, basefile, navfile, solfile):
    # init solution
    os.chdir(folder)
    gn.tracelevel(0)
    nav = rtkinit(cfg)
    nav.maxepoch = maxepoch
    print(folder)

    # load rover obs
    rov = rn.rnx_decode(cfg)
    print('    Reading rover obs...')
    if nav.filtertype == 'backward':
        maxobs = None   # load all obs for backwards
    else:
        maxobs = maxepoch
    rov.decode_obsfile(nav, rovfile, maxobs)

    # load base obs and location
    base = rn.rnx_decode(cfg)
    print('   Reading base obs...')
    base.decode_obsfile(nav, basefile, None)
    
    # determine base location from original base obs file name
    if len(BASE_POS) > 1:
        baseName = glob('../*.2*o')[0][-12:-8]
        nav.rb[0:3]  = BASE_POS[baseName]
    elif nav.rb[0] == 0:
        nav.rb = base.pos # from obs file
        
    # load nav data from rover obs
    print('   Reading nav data...')
    rov.decode_nav(navfile, nav)

    # calculate solution
    print('    Calculating solution...')
    sol = procpos(nav, rov, base)

    # save solution to file
    savesol(sol, solfile)
    return rovfile

# function to run single RTKLIB solution
def run_rtklib(binpath_rtklib, cfgfile_rtklib, folder, rovfile, basefile, 
               navfile, solfile):
    # create command to run solution
    rtkcmd = ['%s' % binpath_rtklib, '-x', '0', '-y', '2', '-k', cfgfile_rtklib,
              '-o', solfile, rovfile, basefile, navfile]
    
    # use this command line for debug from console, run from path in folder variable
    rtkcmd_debug = '%s -x 0 -y 2 -k %s -o %s %s %s %s' % (binpath_rtklib, cfgfile_rtklib,
              solfile, rovfile, basefile, navfile)
    
    # run command
    os.chdir(folder)
    subprocess.run(rtkcmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)   

####### Start of main code ##########################

def main():

    # get list of data sets in data path
    datasets = np.sort(os.listdir(datadir))

    # loop through data set folders
    rinexIn = []
    ppkIn = []
    rtklibIn = []
    for dataset in datasets:
        for phone in PHONES:
            # skip if no folder for this phone
            folder = join(datadir, dataset, phone)
            if not isdir(folder):  
                continue
            os.chdir(folder)
            rawFile = join('supplemental', 'gnss_log.txt')
            rovFile = join('supplemental', 'gnss_log.obs')

            rinex = False
            # check if need rinex conversion
            if OVERWRITE_RINEX or not isfile(rovFile):
                # generate list of input parameters for each rinex conversion
                if phone[:7] == 'samsung': # cycle slip flags for Samsung phones (currently masked)
                    slipMask = 0 # 1 to unmask receiver cycle slips
                else:
                    slipMask = 0 
                rinexIn.append((folder, rawFile, rovFile, slipMask))
                print(rawFile, '->', rovFile) 
                rinex = True
            
            # check if need to create PPK solution
            try:
                baseFile = glob(basefiles)[0]
                navFile = glob(navfiles)[0]
                solFile = rovFile[:-4] + soltag_py + '.pos'
                solFile_rtklib = rovFile[:-4] + soltag_rtklib + '.pos'
            except:
                print(folder,'  Error: Missing file')
                continue
            if ENABLE_PY and (OVERWRITE_SOL == True or len(glob(solFile)) == 0 
                              or rinex == True):
                # generate list of input/output files for each python ppk solution
                print('PY: ', join(dataset, phone))
                ppkIn.append((folder, rovFile, baseFile, navFile, solFile))
            if ENABLE_RTKLIB and (OVERWRITE_SOL == True or 
                        len(glob(solFile_rtklib)) == 0 or rinex == True):
                # generate list of input/output files for each rtklib ppk solution
                print('RTKLIB: ', join(dataset, phone))
                rtklibIn.append((binpath_rtklib, cfgfile_rtklib,
                        folder, rovFile, baseFile, navFile, solFile_rtklib))

    if len(rinexIn) > 0:
        print('\nConvert rinex files...')
        # generate rinx obs files in parallel, does not give error messages
        #with Pool() as pool: # defaults to using cpu_count for number of processes
        #    res = pool.starmap(convert_rnx, rinexIn)
        # run sequentially, use for debug
        for input in rinexIn:
            convert_rnx(input[0],input[1],input[2],input[3])

    if ENABLE_PY and len(ppkIn) > 0:
        print('Calculate rtklib-py solutions...')
        # run PPK solutions in parallel, does not give error messages
        #with Pool() as pool: # defaults to using cpu_count for number of processes
        #    res = pool.starmap(run_ppk, ppkIn)
        # run sequentially, use for debug
        for input in ppkIn:
            run_ppk(input[0],input[1],input[2],input[3],input[4])

    if ENABLE_RTKLIB and len(rtklibIn) > 0:
        print('Calculate RTKLIB solutions...')
        # run PPK solutions in parallel, does not give error messages
        #with Pool() as pool: # defaults to using cpu_count for number of processes
        #    res = pool.starmap(run_rtklib, rtklibIn)
        # run sequentially, use for debug
        for input in rtklibIn:
            run_rtklib(input[0],input[1],input[2],input[3],input[4],input[5],input[6])

if __name__ == '__main__':
    t0 = time()
    main()
    print('Runtime=%.1f' % (time() - t0))


**Step 6: Combine RTKLIB solutions into a single .csv file**

The code below will read in all the individual RTKLIB solution files and create a single .csv file in the correct format for submitting to Kaggle.  The time stamps in the RTKLIB solutions may not exactly match the time stamps in the original raw data and may be missing some data points or solution files, so the RTKLIB solution points are interpolated onto the time stamps in a sample submission file. If the solution file is missing, the data from the sample submission file is used instead.  I recommend downloading the current best scoring notebook result from the other notebooks and renaming it to create this file. Make sure this file is named "best_submission.csv" and is in the data folder.  The file I used is in the input data and is the result from the baseline notebook from Chirag Chauhan.
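The two operations at the heart of this step are converting the RTKLIB time stamps (GPS week and time of week) to Unix milliseconds and interpolating the solution onto the reference time stamps. The sketch below illustrates both with made-up sample values. The offset 315964782 is the same constant used in the script (the GPS epoch in Unix seconds minus 18 leap seconds); the function name is hypothetical, and rounding rather than truncating to integer milliseconds is a choice made for this sketch.

```python
import numpy as np

GPS_TO_UTC = 315964782  # seconds: GPS epoch in Unix time, less 18 leap seconds

def gps_to_unix_ms(week, tow):
    """Convert GPS week and time of week (s) to Unix time in milliseconds."""
    week = np.asarray(week, dtype=float)
    tow = np.asarray(tow, dtype=float)
    secs = week * 7 * 24 * 3600 + tow + GPS_TO_UTC
    return np.round(1000 * secs).astype(np.int64)

# GPS week 2111, 345600 s into the week
print(gps_to_unix_ms(2111, 345600.0))

# Interpolate a solution with a missing epoch onto the reference time stamps
ref_ms = np.array([1000, 2000, 3000, 4000])
sol_ms = np.array([1000, 3000, 4000])    # the sample at 2000 ms is missing
sol_lat = np.array([37.0, 37.2, 37.3])
lat_on_ref = np.interp(ref_ms, sol_ms, sol_lat)
print(lat_on_ref)
```

Keep in mind that `np.interp` holds the first or last value for reference time stamps that fall outside the range of the solution, so long gaps at the start or end of a drive are filled with a constant position.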

This will work for the test data, but for the training data you will need to generate a reference file for the correct timestamps from the ground truth data.  This is described in the training data section below.

Only solutions with the same tag will be included so make sure you use the same tag (SOL_TAG) here as you did when creating the solutions in the previous step.

The output file name will include the test set and date and will be in the datapath folder.

In [None]:
""" create_baseline_csv_from_pos.py -  Create csv file from PPK solution files using timestamps in reference file
"""

import os
from os.path import join, isfile
import numpy as np
from datetime import date

########### Input parameters ###############################

DATA_SET = 'test'
SOL_TAG = '_rtklib'
datapath = '../data' # relative to python script
rovfile = 'gnss_log'
hdrlen = 25    # 25 for RTKLIB, 1 for RTKLIB-py

outThresh = 100   # max horizontal accuracy estimate
# phones in test set
PHONES = ['pixel4', 'pixel4xl', 'pixel5', 'pixel6pro', 'pixel7pro',
          'mi8', 'xiaomimi8',
          'sm-g988b', 'sm-s908b', 'sm-a325f', 'sm-a505u', 'sm-a205u',
          'samsunga325g', 'samsunga32']
# PHONES = []  # use all phones

# Also make sure the appropriate reference file is in the datapath
#  test: best_submission.csv - best available sample submission
# train: ground_truths_train.csv - created with create_ground_truths.py

############################################################

GPS_TO_UTC = 315964782  # second

def create_csv(datapath, DATA_SET, SOL_TAG):
    # get timestamps from existing baseline file
    datapath = os.path.abspath(datapath)
    os.chdir(datapath)
    if DATA_SET[:5] == 'train':
        baseline_file = 'ground_truths_' + DATA_SET + '.csv'
    else: # 'test'
        baseline_file = 'best_submission.csv'
    # read data from baseline file
    base_txt = np.genfromtxt(baseline_file, delimiter=',',invalid_raise=False, 
                             skip_header=1, dtype=str)
    msecs_base = base_txt[:,1].astype(np.int64)
    phones_base = base_txt[:,0]
    pos_base = base_txt[:,2:4].astype(float) # baseline positions
    
    # open output file
    fout =open('locations_' + DATA_SET + '_' + date.today().strftime("%m_%d") + '.csv','w')
    fout.write('tripId,UnixTimeMillis,LatitudeDegrees,LongitudeDegrees\n')
    
    # get list of data sets in data path
    os.chdir(join(datapath, DATA_SET))
    trips = np.sort(os.listdir())
    
    # loop through data set folders
    ix_b, npts = [], 0
    for trip in trips:
        if isfile(trip):
            continue
        phones = os.listdir(trip)
        # loop through phone folders
        for phone in phones:
            if isinstance(phone, bytearray):
                phone = phone.decode('utf-8')
            # check for valid folder and file
            folder = join(trip, phone)
            if isfile(folder):
                continue
            if PHONES != [] and phone not in PHONES:
                continue
            trip_phone = trip + '/' + phone
            #print(trip_phone)
    
            ix_b = np.where(phones_base == trip_phone)[0]
            sol_path = join(folder, 'supplemental', rovfile + SOL_TAG + '.pos')
            fields = []
            if isfile(sol_path):
                # parse solution file
                fields = np.genfromtxt(sol_path, invalid_raise=False, skip_header=hdrlen)
            if len(fields) > 1:
                if int(fields[0,1]) > int(fields[-1,1]): # invert if backwards solution
                    fields = fields[::-1]
                pos = fields[:,2:5]
                qs = fields[:,5].astype(int)
                nss = fields[:,6].astype(int)
                acc = fields[:,7:10]
                msecs = (1000 * (fields[:,0] * 7 * 24 * 3600 + fields[:,1])).astype(np.int64)
                msecs += GPS_TO_UTC * 1000
            # if no data, use baseline data
            if not isfile(sol_path) or len(fields) == 0:
                print('Warning: data substitution: ', sol_path)
                msecs = msecs_base[ix_b].copy()
                pos = np.zeros((len(msecs), 3))
                acc = np.zeros((len(msecs), 3))  # separate array so writes to pos do not leak into acc
                pos[:,:2] = pos_base[ix_b].copy()
                qs = nss = np.zeros(len(msecs))
           
            # interpolate to baseline timestamps to fill in missing samples
            llhs = []; stds = []
            for j in range(3):
                llhs.append(np.interp(msecs_base[ix_b], msecs, pos[:,j]))
                stds.append(np.interp(msecs_base[ix_b], msecs, acc[:,j]))
            qsi = np.interp(msecs_base[ix_b], msecs, qs)
            nssi = np.interp(msecs_base[ix_b], msecs, nss)
    
            # write results to combined file
            for i in range(len(ix_b)):
                fout.write('%s,%d,%.12f,%.12f,%.2f,%.0f,%.0f,%.3f,%.3f,%.3f\n' % 
                        (trip_phone, msecs_base[ix_b[i]], llhs[0][i], llhs[1][i],
                         llhs[2][i], qsi[i], nssi[i], stds[0][i], stds[1][i], 
                         stds[2][i]))
            # count the solution epochs parsed for this trip/phone (once, not per output row)
            npts += len(fields)
    
    fout.close()
    return npts

if __name__ == '__main__':
    create_csv(datapath, DATA_SET, SOL_TAG)
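
The timestamp conversion inside `create_csv` is easy to get wrong, so here is a minimal standalone sketch of it. It assumes, as the code above does, that the first two columns of the solution file are GPS week and time of week in seconds. The names `gps_to_unix_ms` and `SECS_PER_WEEK` are hypothetical and only for illustration, and rounding to the nearest millisecond (instead of truncating as above) is my own substitution to avoid floating-point results landing 1 ms low.

```python
import numpy as np

# Unix time of the GPS epoch (1980-01-06, 315964800 s) minus 18 leap seconds
GPS_TO_UTC = 315964782  # seconds
SECS_PER_WEEK = 7 * 24 * 3600  # hypothetical helper constant

def gps_to_unix_ms(week, tow):
    """Convert GPS week and time of week (s) to Unix time in milliseconds."""
    week = np.asarray(week, dtype=float)
    tow = np.asarray(tow, dtype=float)
    # round rather than truncate (assumption: differs from the code above)
    gps_ms = np.round(1000 * (week * SECS_PER_WEEK + tow)).astype(np.int64)
    return gps_ms + GPS_TO_UTC * 1000

# two epochs: the start of GPS week 2216 and 0.3 s later
print(gps_to_unix_ms([2216, 2216], [0.0, 0.3]))
```

The resulting values can be compared directly against the `UnixTimeMillis` column of the baseline file.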

**Step 7:  Creating final submission and filtering out problematic RTKLIB solution points**

Unfortunately, the RTKLIB solutions are quite poor for a couple of data sets in the test data and for several more in the training data. These need further investigation, but for now we will replace any solution points in those data sets that have high estimated errors with the corresponding values from the "best_submission.csv" file mentioned in the previous section.  

Note that the previous step generates a file with the current date in its name, so you will need to update the LOCATIONS_FILE name below to match.

In [None]:
"""
create_submission.py - convert baseline file into submission file
"""

import numpy as np
import os

LOCATIONS_FILE = 'locations_test_09_20.csv'
OUT_FILE = 'submit_0920.csv'
# specify data locations
datapath = '../data'  # relative to python script
max_hstd = 0.5  # max horizontal std (m); points at or above this are replaced with baseline

lowQualityRides = [
    '2022-06-28-20-56-us-ca-sjc-r/samsunga32',
    '2022-10-06-20-46-us-ca-sjc-r/sm-a205u']

datapath = os.path.abspath(datapath)
os.chdir(datapath)

# load baseline data 
baseline_file = 'best_submission.csv'
base_txt = np.genfromtxt(baseline_file, delimiter=',',invalid_raise=False, 
                         skip_header=1, dtype=str)
msecs_base = base_txt[:,1].astype(np.int64)
phones_base = base_txt[:,0]
pos_base = base_txt[:,2:4].astype(float)

# load test data
d = np.genfromtxt(LOCATIONS_FILE, delimiter=',',invalid_raise=False, skip_header=1, dtype=str)
stds = d[:,7:10].astype(float)
hstds = np.sqrt(stds[:,0]**2 + stds[:,1]**2)

        
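# merge low quality rides with Google baseline
# NOTE: the row indices in ix are applied to both d and pos_base, so this
# assumes LOCATIONS_FILE has the same rows, in the same order, as best_submission.csv
for trip_phone in np.unique(d[:,0]):
    if trip_phone in lowQualityRides:
        ixt = np.where(d[:,0] == trip_phone)[0]
        #ix = ixt[np.where(d[ixt,5] != '2')[0]]
        # points whose horizontal std is at or above the threshold
        ix = ixt[np.where(hstds[ixt] >= max_hstd)[0]]
        d[ix,2:4] = pos_base[ix,0:2]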

# save results to file
with open(OUT_FILE, 'w') as fout:
    fout.write('tripId,UnixTimeMillis,LatitudeDegrees,LongitudeDegrees\n')
    for i in range(len(d)):
        # write one submission row per epoch
        fout.write('%s, %s, %3.12f, %3.12f\n' % (d[i,0], d[i,1], float(d[i,2]), float(d[i,3])))
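
The merge above indexes the baseline array with row numbers taken from the locations file, which works only when both files list the same rows in the same order. If that is not guaranteed (for example, when only a subset of phones was processed), a more defensive option is to match rows by trip ID and timestamp. The sketch below is one way to do that on small synthetic arrays. The function name `merge_with_baseline` and its arguments are hypothetical and not part of the scripts above.

```python
import numpy as np

def merge_with_baseline(trip_ids, msecs, latlon, hstds,
                        base_ids, base_msecs, base_latlon,
                        low_quality, max_hstd=0.5):
    """Replace high-error points of low quality rides with baseline positions,
    matching rows by (tripId, UnixTimeMillis) instead of by row index."""
    # lookup table from (trip, timestamp) to baseline lat/lon
    base_lookup = {(t, int(m)): p
                   for t, m, p in zip(base_ids, base_msecs, base_latlon)}
    out = np.array(latlon, dtype=float)  # work on a copy
    for i, (t, m) in enumerate(zip(trip_ids, msecs)):
        if t in low_quality and hstds[i] >= max_hstd:
            key = (t, int(m))
            if key in base_lookup:  # leave the point alone if no match exists
                out[i] = base_lookup[key]
    return out

# synthetic data: the baseline rows are deliberately in a different order
trip_ids = ['a/p1', 'a/p1', 'b/p2']
msecs = [1000, 2000, 1000]
latlon = [[1.0, 2.0], [1.1, 2.1], [3.0, 4.0]]
hstds = [0.1, 0.9, 0.9]
base_ids = ['b/p2', 'a/p1', 'a/p1']
base_msecs = [1000, 1000, 2000]
base_latlon = [[30.0, 40.0], [10.0, 20.0], [11.0, 21.0]]

merged = merge_with_baseline(trip_ids, msecs, latlon, hstds,
                             base_ids, base_msecs, base_latlon,
                             low_quality={'a/p1'})
print(merged)
```

Only the second row is replaced here: the first row is below the threshold, and the third belongs to a ride that is not in the low quality list.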

**Step 8: Submit CSV file to Kaggle**

You can now submit the csv file created in the previous step to Kaggle.  This should give you a score of 1.803 meters on the public leaderboard.

**To run this code on the training data**

For the most part, you can use the same code to generate solutions for the training data simply by changing all references above from the test folder to the train folder.  However, you will need to run the code below first to create the reference file used to combine the solution files in step 6.

In [None]:
"""
create_groundtruth_csv.py - create csv file from all ground truth files
"""

import os
from os.path import join, isfile


datapath = '../data/train' # relative to python script
# Select phones to process
# all phones
# PHONES = ['pixel4', 'pixel4xl', 'pixel5', 'pixel5a', 'pixel6pro', 'pixel7pro',
#           'mi8', 'xiaomimi8',
#           'sm-g988b', 'sm-g955f', 'sm-s908b', 'sm-a226b', 'sm-a600t',
#           'sm-a505g', 'sm-a325f', 'sm-a217m', 'sm-a205u', 'sm-a505u', 
#           'samsungs22ultra', 'samsunga325g', 'samsunga32', 'samsung21ultra']

# just phones in test set
PHONES = ['pixel4', 'pixel4xl', 'pixel5', 'pixel6pro', 'pixel7pro',
          'mi8', 'xiaomimi8',
          'sm-g988b', 'sm-s908b', 'sm-a325f', 'sm-a505u', 
          'samsunga325g', 'samsunga32']
GPS_TO_UTC = 315964782  # second

# open output file
datapath = os.path.abspath(datapath)
os.chdir(datapath)
fout =open('../ground_truths_train.csv','w')
fout.write('tripId,UnixTimeMillis,LatitudeDegrees,LongitudeDegrees, Height, Heading\n')

# get list of data sets in data path
datasets = sorted(os.listdir(datapath))

# loop through data set folders
for dataset in datasets:
    if isfile(dataset):
        continue
    try:
        phones = sorted(PHONES)
    except NameError:  # PHONES not defined: use every phone folder found
        phones = os.listdir(join(datapath,dataset))
    for phone in phones:
        folder = join(datapath, dataset, phone)
        if isfile(folder):
            continue
        
        csv_file = join(folder, 'ground_truth.csv')
        if not isfile(csv_file):
            continue

        # parse ground truth file
        with open(csv_file) as f:
            lines = f.readlines()[1:]
        flag = 0
        for line in lines:
            if len(line) <= 1:
                continue
            d = line.split(',')
            t = float(d[8]) # get time stamp
            if flag == 0:            
                print('%20s,%16s' % (dataset, phone))
                flag = 1
            # write results to combined file
            fout.write('%s/%s,%.0f,%s,%s,%s, %s\n' % ((dataset, phone, t, d[2],
                                                      d[3], d[4][:7], d[7][:5])))
        
fout.close()
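
Once the ground truth file exists, a natural next step on the training data is to measure how far each solution point is from the truth. The sketch below computes the horizontal distance with the haversine formula on a spherical Earth, which is an approximation and not necessarily the exact calculation used for the leaderboard score. The function name `horizontal_error_m` and the sample coordinates are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)

def horizontal_error_m(lat1, lon1, lat2, lon2):
    """Haversine distance in meters between lat/lon points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# synthetic example: two solution points compared with ground truth
sol = np.array([[37.0, -122.0], [37.00001, -122.0]])
truth = np.array([[37.0, -122.0], [37.0, -122.0]])
err = horizontal_error_m(sol[:, 0], sol[:, 1], truth[:, 0], truth[:, 1])
print(err)
```

With real data, `sol` would come from the latitude and longitude columns of the locations file and `truth` from the matching rows of the ground truth file.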

**Final thoughts**

The intent of this notebook is not to provide a fully optimized solution, but only to get you started with RTKLIB and demonstrate some of its capability.  

Following these instructions will provide an improved baseline solution file which can be post-processed with filtering, map-matching, etc. to give you a good jump on the competition. This alone, however, will probably not be enough to win the competition.  To do that, I believe you will need to improve the RTKLIB solution itself.  Some of this can be done by modifying the configuration file.  More dramatic changes will require modifying the code.  More information on the configuration file and the code algorithms is available in the [demo5 RTKLIB Users Manual](https://rtkexplorer.com/pdfs/manual_demo5.pdf), particularly section 3.5 and Appendix F for configuration, and Appendix E for the core algorithms.    

There are instructions for compiling the code in Windows or Linux on my blog (https://rtklibexplorer.wordpress.com/). Note that the Windows instructions use the Embarcadero compiler, which is required for the GUI apps, but if you are just compiling the rnx2rtkp app, you can compile it with the Visual Studio compiler using the project file in the \app\consapp\rnx2rtkp\msc folder.  More involved changes to the code may be done more easily in rtklib-py, the python version of RTKLIB.

I'm happy to answer any questions regarding RTKLIB.  I just ask that, to follow the rules of the competition, you ask your questions in the discussion group here so that the answers are available to all of the competitors.  

If you find any errors or omissions in this code, please let me know and I will update it.

More details of the optimizations I have made to RTKLIB for smartphone observations are described in these links:

* [RTKLIBexplorer blog post: Google Smartphone Decimeter Challenge](https://rtklibexplorer.wordpress.com/2022/01/10/google-smartphone-decimeter-challenge/)

* [Optimizing the Use of RTKLIB for Smartphone-Based GNSS Measurements](https://www.mdpi.com/1424-8220/22/10/3825)

* [3rd Place Winner: 2022 Smartphone Decimeter Challenge: An RTKLIB Open-Source Based Solution](https://www.ion.org/publications/abstract.cfm?articleID=18376)