# Simple GNSS data preprocessing and processing

Gareth Funning, University of California, Riverside

Once you have collected some GNSS data, you will probably want to process it. Before you do that, you will need to convert the data into a processable format, calculate some important metadata parameters, and edit the data file headers with useful information. Sometimes this takes longer than processing the data!

To turn our data files into positions, we can make use of the NGS's spiffy, web-based OPUS processor. If you process several files spanning multiple years, you can turn that into a velocity! And then you really are getting somewhere! 

## 0. Dependencies

This notebook needs these imports to function! Make sure you're in the right conda environment!

In [1]:
from os.path import splitext
import numpy as np

## 1. Pre-processing your GNSS data

The first thing to do is to convert your data to something we can process! The data we collect tend to be recorded in proprietary binary formats, and so we will want to convert them to something more readable $-$ the RINEX format.


### 1.1 RINEX

The **R**eceiver **IN**dependent **EX**change format (RINEX for short) is an ASCII-based format used for archiving and processing GNSS data. There are different versions of the format for different applications $-$ navigational data, meteorological data, ionosphere data $-$ but today we will use the version for GNSS *observations*.

Included with this notebook are several past versions of RINEX files for GNSS data collected at site VERS (Versity), the NGS benchmark on the UCR campus. We will use these later on to produce a deformation time series for the site. For now, we will use them to examine the file format and contents.

First, let's look at the files we have, and the naming convention.

In [2]:
# this will list all files with filenames ending with an 'o' (GNSS observation files)

!ls -l *o

ls: cannot access '*o': No such file or directory


These files all follow the same naming convention:

SSSSDDDN.YYo

- SSSS $-$ four-character site code
- DDD $-$ day of year of start of data acquisition
- N $-$ number of the acquisition on that GNSS receiver on that day (counting from zero)
- YY $-$ year
- o $-$ o is for 'observation'
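
As a minimal sketch, the naming convention can be unpacked in Python. The helper name `parse_rinex2_name` is hypothetical, and the two-digit-year pivot (80 to 99 read as the 1900s, everything else as the 2000s) is an assumption of this example.

```python
import re
from datetime import date, timedelta

# SSSSDDDN.YYo: site code, day of year, acquisition number, two-digit year
RINEX2_NAME = re.compile(
    r"^(?P<site>\w{4})(?P<doy>\d{3})(?P<session>\w)\.(?P<yy>\d{2})o$",
    re.IGNORECASE,
)

def parse_rinex2_name(filename):
    """Split a short RINEX observation file name into its parts (hypothetical helper)."""
    match = RINEX2_NAME.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not look like SSSSDDDN.YYo")
    yy = int(match["yy"])
    year = 1900 + yy if yy >= 80 else 2000 + yy   # assumed pivot for two-digit years
    doy = int(match["doy"])
    start = date(year, 1, 1) + timedelta(days=doy - 1)
    return {
        "site": match["site"].upper(),
        "doy": doy,
        "session": match["session"],
        "year": year,
        "date": start,
    }

info = parse_rinex2_name("b1291960.25o")
print(info["site"], info["doy"], info["date"])
```

Converting the day of year back to a calendar date is a quick way to check a file name against the survey log sheet.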

Let's look at the contents of one of these files:

In [3]:
# insert the raw file name -kmr
# it should look like complete nonsense -- we will change it in this notebook
!head -40 35323331.t01

Y��A�ڹ$�j��� �$ð:�^��q��x�=��4�KQ��Y�6ԒDC�  c4����^���lf���43`�CA���k�f� �۠���I$���c�,�N�HS(�	~d^
j�~(|CD6�i$�cv�V_`�e��A��~Ps�l`Q
$�I'p+j�j�Y�{Ə8Ѫ�1��*��$�I$�u_�I*~���L{���N�I��o�#sI;R����گx;�;C$64��=�Lc'`$�I$�k��a�c#nsV�YO�$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I?��3<l�g3�I$�I$����$�:��dճ�P�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$���s3���s>)$�I$�M�0/��I#���mY!>�o���3����x�>Հ��\oZ�63⌖���Q���_Ə8Ѫ�1��=��I$�I"���IS�"�I�dӒIؤ"\I���wv��N��ϡ1^��T�{g�N¦�2/��LK���"c'`w�4�2��g�i^�&jH�V���,����2G.$����$��4�
W0��s�1Q��$��߻�V�*Yd��"�@[�I��I$�H�5u5h�N4�Fݟ�O�O��I$�I����J�ܽ� �#[I$�)�8�M[=�A�q"�h���x;�y�1RNW�a�QBĽ��2EI�7b��4��kw�@�Ċ������=����D�7i&%u��J�+�'#�R--q$�0��I��I$���j�dCF*@Q��Mo���6I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$�I$��{Wd�\���7p*"$�I�Ձ�Є!m��m��x5lY�`s$�v/�I"�M$�*N$�Lw�I'}rI �%$�Ay$�u�I$�
P�, �������oLb=}��J�x�J R�o�4����?��c%!�P^;��c��G�3+|�q$����� ���]ܾ�Ҽ�`��,0q$��k��d׬��

A RINEX observation file has two parts:
1. A header of 21 lines containing important metadata, such as the site code, who collected the data, the receiver and antenna type (and their serial numbers), the approximate site location, the antenna height and details of the data (the start and end times, sample interval, and which observations are recorded $-$ such as which carrier phases and which pseudoranges).
2. Data, with time and date stamps for each epoch, as well as the details of the observables for each visible satellite.
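
In RINEX 2, each header line carries its content in columns 1 to 60 and a label in columns 61 to 80, and the header closes with an `END OF HEADER` record. A minimal sketch of reading the header on that basis follows; the function name `read_rinex_header` and the sample lines are made up for illustration.

```python
def read_rinex_header(lines):
    """Collect header records as {label: [content, ...]} up to END OF HEADER."""
    header = {}
    for line in lines:
        content, label = line[:60], line[60:80].strip()
        if label == "END OF HEADER":
            break
        header.setdefault(label, []).append(content.rstrip())
    return header

# Illustrative header lines, padded so that the label starts in column 61
sample = [
    "B129".ljust(60) + "MARKER NAME",
    "1441045101".ljust(20) + "TRM57971.00".ljust(40) + "ANT # / TYPE",
    "        1.2498        0.0000        0.0000".ljust(60) + "ANTENNA: DELTA H/E/N",
    "".ljust(60) + "END OF HEADER",
]

header = read_rinex_header(sample)
marker = header["MARKER NAME"][0]
delta_h = float(header["ANTENNA: DELTA H/E/N"][0].split()[0])
print(marker, delta_h)
```

To use it on a real file, pass the open file object, for example `read_rinex_header(open('test.o'))`, since iterating over a file yields its lines one at a time.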

A RINEX data file containing this information can be processed by several different software packages. But how do we make one?

### 1.2 Converting RAW data to RINEX

The first step will be to pull the data from the GNSS receiver $-$ and how you do this depends a lot on the receiver type. Some receivers (such as the Trimble NetRS and NetR9, and the Septentrio PolaRx5) log data to internal storage, and require you to connect to a web server on the receiver to access the data. Others, such as the Trimble R7 and Topcon GB-1000, log data to a CompactFlash card within the receiver, which can then be pulled out and read with a card reader at the end of the survey (which is usually much easier to do, especially when out in the field).

Assuming we are using a Trimble R7, the model that makes up the majority of the GNSS receivers I have access to at UCR, the data will be recorded on the receiver in a proprietary binary format ('t01' format). I have included one of these files as well, collected in July 2024 at VERS, to practice with. Let's have a look!

In [4]:
# let's look at all the files that end with .t01
!ls -l *.t01

-rw-r--r-- 1 katie katie 85499 Jul 24 12:13 35323331.t01


EarthScope (UNAVCO, as-was) is a good source of information on how to convert these proprietary data files to RINEX. For the Trimble data we are using, it is a two-step process $-$ first convert the 't01' file to a 'dat' file, and then convert the 'dat' file to RINEX.

The conversion program we need for the first step is called runpkr00, and it is a compiled binary executable that only runs on certain hardware (Linux PCs, Windows PCs, Intel Macs, Solaris Unix machines).

The runpkr00 executables can be downloaded from this legacy UNAVCO Knowledge Base webpage: https://kb.unavco.org/article/trimble-runpkr00-latest-versions-744.html

You should download the version that you need (version 5.40 is good enough for our purposes). If you are running this on Linux (or emulated Linux through WSL2), make sure you copy the `runpkr00` executable to the directory containing this notebook and the t01 file.

In [5]:
# run the process for the raw data
!ls -l runpkr00

-rwxr-xr-x 1 katie katie 1897487 Jul 24 12:24 runpkr00


In order to convert the file, we need to run `runpkr00` on the data file. Consulting the [Knowledge Base](https://kb.unavco.org/article/trimble-runpkr00-latest-versions-744.html) again, the correct syntax is something like: `runpkr00 -g -d filename.T01`

So let's do that...

In [6]:
# insert YOUR FILE that you would like to process
rawfile='35323331.t01'  # if you have your own file, feel free to substitute it here

print('./runpkr00 -g -d ' + rawfile)  # this is the command you are running

!./runpkr00 -g -d $rawfile

./runpkr00 -g -d 35323331.t01


What did that do? Let's look for 'dat' files in the directory:

In [7]:
# let's see the .dat file it should have just created
!ls -l *.dat

-rw-rw-r-- 1 katie katie 334134 Jul 24 12:26 35323331.dat


And now we should have a 'dat' file!

Next we need to convert this to a 'proper' RINEX format, and for that we can use the long-lived (and no longer supported) `teqc` converter, a legacy UNAVCO code. teqc also has a legacy website: https://www.unavco.org/software/data-processing/teqc/teqc.html

Note again that not all computer platforms are supported, but Linux PCs are. If the command does not work on your machine, try using it on a supported platform, such as a Linux PC.

Once again, get the version of the executable that runs on your own machine, and copy it to your working directory.

The syntax for running `teqc` is pretty straightforward, something like: `teqc filename.dat > rinexfile.o`

We can try that...

In [8]:
# some data format futzing
# this is just practice for the wrong gps week -- use the website to figure out the real gps week
filename, extension = splitext(rawfile)
datfile = filename+'.dat'

# and let's run it
print('./teqc ' + datfile + ' > test.o')

!./teqc $datfile > test.o

./teqc 35323331.dat > test.o
? Error ? translation of '35323331.dat' may have started with GPS week 2376 rather than 1351
	(try using '-week 1351' option)
! Notice ! '35323331.dat': GPS week initially set= 1351


We get an error message! Something about a possibly incorrect 'GPS week'! What does that mean?

Well, GPS week is a charming and fairly arcane way of measuring the passing of time. It refers to the number of weeks since the first GPS epoch $-$ which was midnight UTC on January 6, 1980. GPS week is transmitted with the GPS signal as a 10-bit number, meaning that every $2^{10}$ weeks (1024 weeks), the counter resets to zero, and the GPS receiver gets confused about the date.

This is the GPS equivalent of the Y2K bug, and if anything it is a more irritating problem as it is: 1) less well known, 2) happens more frequently (1024 weeks is 19.6 years), and 3) not really fixed. It has caused issues, particularly with electronic hardware that uses GPS for timing $-$ see https://en.wikipedia.org/wiki/GPS_week_number_rollover for some examples, and a description of the problem.
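
The arithmetic is simple enough to sketch in Python, as an alternative to an online converter. The function names here are hypothetical, the example date is chosen only for illustration, and the sketch ignores the small leap-second offset between GPS time and UTC, so it should not be trusted within a few seconds of a week boundary.

```python
from datetime import date

GPS_EPOCH = date(1980, 1, 6)   # first day of GPS week 0
ROLLOVER = 2 ** 10             # a 10-bit counter wraps every 1024 weeks

def gps_week(day):
    """Full (unwrapped) GPS week containing the given calendar date."""
    return (day - GPS_EPOCH).days // 7

def broadcast_week(day):
    """Week number as a 10-bit counter would report it (0 to 1023)."""
    return gps_week(day) % ROLLOVER

survey_day = date(2025, 7, 15)   # illustrative date
print(gps_week(survey_day), broadcast_week(survey_day))
```

For this date the full week is 2375, but the 10-bit counter reads 327. Software that assumes only one rollover has happened would interpret that as week 327 + 1024 = 1351, which is the week teqc initially chose in the output above.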

If we look at the dates in the RINEX file we just made, we can see it might have some issues:

In [9]:
# let's look at the file you just created
!head -40 test.o

     2.11           OBSERVATION DATA    G (GPS)             RINEX VERSION / TYPE
teqc  2019Feb25                         20250724 19:26:17UTCPGM / RUN BY / DATE
Linux 2.6.32-573.12.1.x86_64|x86_64|gcc -static|Linux 64|=+ COMMENT
BIT 2 OF LLI FLAGS DATA COLLECTED UNDER A/S CONDITION       COMMENT
35323331                                                    MARKER NAME
3331                                                        MARKER NUMBER
-Unknown-           -Unknown-                               OBSERVER / AGENCY
0220413532          TRIMBLE R7          2.32                REC # / TYPE / VERS
-Unknown-           TRM39105.00     NONE                    ANT # / TYPE
 -2376964.3405 -4662095.2733  3635503.3157                  APPROX POSITION XYZ
        0.0000        0.0000        0.0000                  ANTENNA: DELTA H/E/N
     1     1                                                WAVELENGTH FACT L1/2
     7    L1    L2    C1    P1    P2    S1    S2            # / TYPES OF OBSERV
    

The date/time strings are something like `04 12  1  1 18 30.0000000` $-$ 2004/12/01 at 01:18:30. Not July 2024. Indeed, if you compare the dates, they are wrong by just under 20 years. (That should sound suspicious...)
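
A small sketch makes the suspicion concrete. It reads the epoch string quoted above and shifts it by one full rollover of 1024 weeks; the helper name `parse_rinex2_epoch` and the two-digit-year pivot (80 to 99 read as the 1900s) are assumptions of this example.

```python
from datetime import datetime, timedelta

def parse_rinex2_epoch(record):
    """Read the date and time at the start of a RINEX 2 epoch record."""
    fields = record.split()
    yy, month, day, hour, minute = (int(value) for value in fields[:5])
    seconds = float(fields[5])
    year = 1900 + yy if yy >= 80 else 2000 + yy   # assumed pivot for two-digit years
    # Add the seconds as a timedelta so fractional values never overflow a field
    return datetime(year, month, day, hour, minute) + timedelta(seconds=seconds)

epoch = parse_rinex2_epoch(" 04 12  1  1 18 30.0000000")
shifted = epoch + timedelta(weeks=1024)   # undo one week-number rollover
print(epoch, "->", shifted)
```

Adding 1024 weeks to 1 December 2004 lands on 17 July 2024, which is the kind of date we were expecting.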

We can consult the [log sheet for the data](VERS.pdf) to see what the actual acquisition date was.

We can also use a [GPS week date converter](http://sopac-old.ucsd.edu/convertDate.shtml) to figure out what the GPS week actually was when the data were acquired.

In [10]:
# what was the GPS week?
# set this to the correct gps week
gpsweek=2375

# let's try again!
print('./teqc -week {0:d} {1:s} > test.o'.format(gpsweek,datfile))
!./teqc -week $gpsweek $datfile > test.o


./teqc -week 2375 35323331.dat > test.o
! Notice ! NAVSTAR GPS SV G10 in '35323331.dat': ToC 2005 Nov 29 22:00:00.000 not in 2025 Jul 15 20:09:00.000 to 6075 Dec 31 23:59:59.999 by +/- 140 min
! Notice ! NAVSTAR GPS SV G32 in '35323331.dat': ToC 2005 Nov 29 22:00:00.000 not in 2025 Jul 15 20:09:00.000 to 6075 Dec 31 23:59:59.999 by +/- 140 min
! Notice ! NAVSTAR GPS SV G23 in '35323331.dat': ToC 2005 Nov 29 22:00:00.000 not in 2025 Jul 15 20:09:00.000 to 6075 Dec 31 23:59:59.999 by +/- 140 min
! Notice ! NAVSTAR GPS SV G27 in '35323331.dat': ToC 2005 Nov 29 22:00:00.000 not in 2025 Jul 15 20:09:00.000 to 6075 Dec 31 23:59:59.999 by +/- 140 min
! Notice ! NAVSTAR GPS SV G08 in '35323331.dat': ToC 2005 Nov 29 22:00:00.000 not in 2025 Jul 15 20:09:00.000 to 6075 Dec 31 23:59:59.999 by +/- 140 min
! Notice ! NAVSTAR GPS SV G18 in '35323331.dat': ToC 2005 Nov 29 22:00:00.000 not in 2025 Jul 15 20:09:00.000 to 6075 Dec 31 23:59:59.999 by +/- 140 min
! Notice ! NAVSTAR GPS SV G24 in '35323331

Lots of complaining this time, but did it make a difference? We can look at the file again to see...

In [11]:
# now the data should be right! yay
!head -40 test.o

     2.11           OBSERVATION DATA    G (GPS)             RINEX VERSION / TYPE
teqc  2019Feb25                         20250724 19:27:14UTCPGM / RUN BY / DATE
Linux 2.6.32-573.12.1.x86_64|x86_64|gcc -static|Linux 64|=+ COMMENT
BIT 2 OF LLI FLAGS DATA COLLECTED UNDER A/S CONDITION       COMMENT
35323331                                                    MARKER NAME
3331                                                        MARKER NUMBER
-Unknown-           -Unknown-                               OBSERVER / AGENCY
0220413532          TRIMBLE R7          2.32                REC # / TYPE / VERS
-Unknown-           TRM39105.00     NONE                    ANT # / TYPE
 -2376964.3405 -4662095.2733  3635503.3157                  APPROX POSITION XYZ
        0.0000        0.0000        0.0000                  ANTENNA: DELTA H/E/N
     1     1                                                WAVELENGTH FACT L1/2
     7    L1    L2    C1    P1    P2    S1    S2            # / TYPES OF OBSERV
    

Those dates look a lot more plausible, so it looks like the data were interpreted correctly this time. We have a RINEX file! But we are still missing a lot of important details $-$ such as the metadata for the acquisition. So let's get on with that...

### 1.3 Estimating antenna height from slant height

When collecting GNSS data, some of the key metadata we collect are the heights of the antennas during data collection. In a typical tripod setup, using GNSS antennas with ground planes, these will usually be slant heights from the station marker to the base of the ground plane, measured using a measuring stick.  

Our protocol is to make three measurements of slant height at different parts of the antenna at equipment set-up and three more at take-down. Assuming that the equipment is not significantly disturbed during the data take, the average of these six measurements is a reasonable estimate of the slant height of the antenna during that data take. But is slant height what we need for data processing? *What information on height is included in the RINEX file?*

Each GNSS antenna has a 'reference point' $-$ the point to which all measurements are referenced. For common antennas like those in the Trimble Zephyr Geodetic family, this is typically the base of the antenna. Note that this is not what we measure when setting up our equipment...

![diagram of antenna height](./antenna_height.png)

So how do we turn our slant height measurements to the base of an antenna ground plane into a measurement of the vertical height of the antenna reference point? Pythagoras, with some light subtraction...

If $h$ is the desired vertical height of the antenna reference point above the benchmark, $R$ is the measured average slant height to the base of the antenna ground plane, $r$ is the antenna radius and $t$ is the vertical distance between the base of the ground plane and the antenna reference point, then...

$$h = \sqrt{(R^2-r^2)}-t$$

Seems straightforward! Where do we get the information for $r$ and $t$, I hear you ask? Well, there are a couple of options: 1) the details are included on a sticker attached to the antenna, and 2) if you know the antenna manufacturer, type and part number you can look it up at the NGS's data repository on GNSS antennas, here: https://www.ngs.noaa.gov/ANTCAL/ (look under "Browse Antenna Information")

In [12]:
# so let's calculate the height, then!

# pre-survey slant height
slant_height_pre=np.mean([130.55,130.55,130.65])/100  # include all three measurements here 
# post-survey slant height
slant_height_post=np.mean([130.40,130.40,130.60])/100  # include all three measurements here 

# antenna radius (from looking up the antenna information)
r = 16.981/100

# vertical distance between base of the ground plane and the antenna reference point
t = (8.546-4.111)/100

# and some calculatin'
R=(slant_height_pre+slant_height_post)/2
h=np.sqrt(R**2-r**2)-t

print('slant height: {0:5.3f} m, vertical height of antenna reference point {1:5.3f} m'.format(R,h))

slant height: 1.305 m, vertical height of antenna reference point 1.250 m


### 1.4 Updating RINEX metadata with teqc

teqc also allows you to edit the metadata of your RINEX file, which, if you look at the header of our recently converted file, is very necessary! You can look at the manual for more details:
https://www.unavco.org/software/data-processing/teqc/doc/UNAVCO_Teqc_Tutorial.pdf 

Or, you can run `teqc +help` and get a very long list! 

Some key options we might want to use:

* -O.at \<antenna type\> $-$ give the NGS antenna code here
* -O.an \<antenna number\> $-$ give the antenna serial number here
* -O.mo \<site code\> $-$ give the four-character benchmark code here
* -O.mn \<monument number\> $-$ if you know the benchmark number, give it here (else give '' to blank it out)
* -O.pe \<h e n\> $-$ give the antenna height (h), plus east and north offsets (0 0) here
* -O.o \<operators\> $-$ give the names of the people who collected this data
* -O.ag \<agency\> $-$ give the name of the agency that collected this data (UCR?)
* -O.r \<runner of teqc\> $-$ give your name here, since you are running teqc!
* -tbin 1d \<prefix\> $-$ if necessary, divide your data into daily files, named using the prefix you give (the lowercase site code, in our case)


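To run this 'editing' mode of teqc, the syntax is:
`teqc <options> <site_code> input_rinex`

Here the site code comes directly after `-tbin 1d`, as the last of the options, and sets the prefix of the output file names.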

In [19]:
# set up teqc to run with all of the options
# now this will rename everything to have the proper header!!

# site information (4 character code)
site_code = "B129"

# operator information (from the logsheet)
operator_names = "K Baraggiotta"
operator_inst = "University of California, Riverside"
my_name = "Katie B"

# antenna information (again, from the logsheet)
antenna_type = "TRM57971.00"
antenna_sn = "1441045101"

# print out the command in case you need to run it somewhere else
option_str="-O.at {0:s} -O.an {1:s} -O.mo {2:s} -O.pe {3:f} 0 0 -O.o '{4:s}' -O.ag '{5:s}' -O.r '{6:s}' -O.mn '' -tbin 1d {7:s} test.o".format(antenna_type,antenna_sn,site_code,h,operator_names,operator_inst,my_name,site_code.lower())

print("./teqc {0:s}".format(option_str))

!./teqc $option_str

./teqc -O.at TRM57971.00 -O.an 1441045101 -O.mo B129 -O.pe 1.249807 0 0 -O.o 'K Baraggiotta' -O.ag 'University of California, Riverside' -O.r 'Katie B' -O.mn '' -tbin 1d b129 test.o
teqc:  creating file 'b1291960.25o' ...
teqc:  creating file 'b1291970.25o' ...


And if all went well, you should have a new, and properly named file, based on the site code you gave! Let's have a look at it!

In [20]:
!ls *o

b1291960.25o  b1291970.25o  test.o


In [27]:
!head -40 b1291970.25o

     2.11           OBSERVATION DATA    G (GPS)             RINEX VERSION / TYPE
teqc  2019Feb25     Katie B             20250724 19:48:16UTCPGM / RUN BY / DATE
Linux 2.6.32-573.12.1.x86_64|x86_64|gcc -static|Linux 64|=+ COMMENT
BIT 2 OF LLI FLAGS DATA COLLECTED UNDER A/S CONDITION       COMMENT
B129                                                        MARKER NAME
                                                            MARKER NUMBER
K Baraggiotta       University of California, Riverside     OBSERVER / AGENCY
0220413532          TRIMBLE R7          2.32                REC # / TYPE / VERS
1441045101          TRM57971.00                             ANT # / TYPE
 -2376964.3405 -4662095.2733  3635503.3157                  APPROX POSITION XYZ
        1.2498        0.0000        0.0000                  ANTENNA: DELTA H/E/N
     1     1                                                WAVELENGTH FACT L1/2
     7    L1    L2    C1    P1    P2    S1    S2            # / TYPES OF OBSERV
 SNR

## 2. Prep your own data now!

If you have collected your own data, now is a good opportunity to prepare it, using all of the steps $-$ extracting the file from the receiver, converting it to dat format, converting that to RINEX format, using the survey log sheet information to estimate the vertical antenna height, and editing the RINEX headers with the necessary information.

You may want to make copies of the various code snippets used above to make a comprehensive code cell that does all of the steps for you $-$ then you will have something you can use in the future!
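
One possible starting point, as a sketch rather than a finished tool: a function that assembles the three commands used above as strings, so you can inspect them before running anything. The function name `build_commands` and its arguments are hypothetical. It uses `shlex.join` (Python 3.8 or later) to quote each argument, so names containing spaces or apostrophes survive the shell.

```python
import shlex
from os.path import splitext

def build_commands(rawfile, gpsweek, site_code, antenna_type, antenna_sn,
                   height_m, operators, agency, runner):
    """Return the runpkr00 and teqc commands from this notebook as strings."""
    datfile = splitext(rawfile)[0] + ".dat"
    to_dat = ["./runpkr00", "-g", "-d", rawfile]
    to_rinex = ["./teqc", "-week", str(gpsweek), datfile]
    edit = ["./teqc", "-O.at", antenna_type, "-O.an", antenna_sn,
            "-O.mo", site_code, "-O.pe", f"{height_m:.4f}", "0", "0",
            "-O.o", operators, "-O.ag", agency, "-O.r", runner,
            "-O.mn", "", "-tbin", "1d", site_code.lower(), "test.o"]
    return [
        shlex.join(to_dat),
        shlex.join(to_rinex) + " > test.o",   # redirect the RINEX output to a file
        shlex.join(edit),
    ]

cmds = build_commands("35323331.t01", 2375, "B129", "TRM57971.00", "1441045101",
                      1.249807, "K Baraggiotta",
                      "University of California, Riverside", "Katie B")
for cmd in cmds:
    print(cmd)
```

Each string can then be run from a notebook cell with `!{cmd}`, or with `subprocess.run(cmd, shell=True, check=True)` in a plain Python script.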

In [None]:
# compile the code snippets here



## 3. Processing your data with OPUS

OPUS (**O**nline **P**osition **U**ser **S**ervice; https://www.ngs.noaa.gov/OPUS/) is a web-based tool operated by the National Geodetic Survey that processes RINEX format GNSS data that you upload to the website and returns a position by email. It is remarkably easy to use, and even if you want to do your own 'proper' processing using a package like GAMIT/GLOBK or GIPSY later on, you will get some sense of the quality of your data from the output.

OPUS processes data using a [double-difference approach](https://geodesy.noaa.gov/OPUS/about.jsp), choosing three nearby continuous GNSS stations to form double differences, and trilaterating a location relative to those stations. Its results are not as robust as you would obtain from forming a greater number of double-differences from more stations, and it does not work as well in areas where there is a low density of continuous stations, but for southern California, it still works well.

To process data with OPUS, you will need three things:

* a RINEX file with between 2 and 48 hours of data
* the NGS antenna code for the antenna used (unfortunately it does not read that from the headers like some software will do)
* the height of the antenna above the benchmark (unfortunately it does not read that either)

Of course, for the data we have just preprocessed, we know all of those things. (And for properly prepared RINEX files from other people, the necessary information is in the headers.)
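
Before uploading, it can be worth checking a file against the three requirements listed above. The sketch below is one way to do that. The function name `opus_problems` and the example times are hypothetical, and the checks reflect only the requirements in the list, not everything OPUS itself validates.

```python
from datetime import datetime, timedelta

MIN_SPAN = timedelta(hours=2)    # shortest data span from the list above
MAX_SPAN = timedelta(hours=48)   # longest data span from the list above

def opus_problems(first_epoch, last_epoch, antenna_code, height_m):
    """Return a list of reasons a submission is not ready (empty if none)."""
    problems = []
    span = last_epoch - first_epoch
    if not (MIN_SPAN <= span <= MAX_SPAN):
        problems.append(f"data span of {span} is outside 2 to 48 hours")
    if not antenna_code.strip():
        problems.append("antenna code is missing")
    if height_m < 0:
        problems.append("antenna height is negative")
    return problems

start = datetime(2025, 7, 15, 20, 9)   # illustrative start time
ready = opus_problems(start, start + timedelta(hours=6), "TRM57971.00", 1.2498)
not_ready = opus_problems(start, start + timedelta(minutes=45), "", 1.2498)
print(ready)
print(not_ready)
```

The first and last epochs can be read from the `TIME OF FIRST OBS` header record and the final epoch line of the RINEX file.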

So, to process your data, simply go to the website, select your RINEX file, select the antenna type, enter the vertical height, and enter your email address. You can optionally choose the format of the output (I usually choose the version that produces an XML file as well, as it may be easier to ingest the results later on).

Then, hit the 'Upload to Static' button, and wait for the email. It should be done in a few minutes. 

## 4. Estimate the velocity at VERS

Along with the July 2024 data, and the data you may have collected, I have provided RINEX files of other surveys of VERS. By processing these, and compiling date and east, north and up information, you can build a time series of the change in coordinate of the station, and use that to estimate a velocity for the site $-$ either by making a spreadsheet of the values, or by doing it in a more Pythonic way by building a Pandas table, and using an XML reader to populate it with your processed positions. 

(I can give some hints about how to do this in Python, if you are interested.)

My suggested order of tasks: process all the data, plot the changes in position over time (maybe subtract the position of the earliest measurement from the others?), fit a trend line to each component and see what you get. Can you plot your estimated velocity on a map (e.g. using psvelo in GMT or `pygmt.Figure.velo` in PyGMT)? Does it make sense?
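
As a hint for the trend-fitting step, here is a minimal sketch using `np.polyfit` to fit a straight line to one position component. The function name `fit_velocity` is hypothetical and the numbers are synthetic, made up purely to show the mechanics; they are not real positions for VERS.

```python
import numpy as np

def fit_velocity(decimal_years, positions_m):
    """Least-squares linear rate of one component, in mm/yr."""
    t = np.asarray(decimal_years, dtype=float)
    d = np.asarray(positions_m, dtype=float)
    d = d - d[0]                     # displacement relative to the earliest survey
    slope, _intercept = np.polyfit(t - t[0], d, 1)
    return slope * 1000.0            # metres per year to millimetres per year

# Synthetic east positions moving at exactly -20 mm/yr (illustration only)
years = np.array([2016.5, 2018.5, 2020.5, 2022.5, 2024.5])
east = 100.0 - 0.020 * (years - years[0])

east_rate = fit_velocity(years, east)
print(f"east velocity: {east_rate:.1f} mm/yr")
```

Run the same function on the north and up components to get the full velocity. With real data, the scatter of the residuals about the line gives a first impression of the uncertainty.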