In [2]:
%matplotlib widget
import pandas as pd
import pw_helpers

# Introduction to pandas

The observations were made by a GNSS receiver placed on the roof of a building at CNES (Toulouse) and recorded in `RINEX` format (Receiver Independent Exchange Format). The RINEX file was then post-processed by `PRX` (processed RINEX), a program developed by GNSS enthusiasts, which computes all the parameters required to determine a GNSS position. The result was saved as a file in `CSV` format (comma-separated values).

We are going to import this file into a data structure provided by the `pandas` package: the `pandas.DataFrame`.

In [3]:
# load data
data_prx = pw_helpers.prx_csv_to_pandas("./data/TLSE00FRA_R_20230010100_10S_01S_MO.csv")

# show the first 5 lines
data_prx.head()

Unnamed: 0,time_of_reception_in_receiver_time,constellation,prn,observation_code,code_observation_m,doppler_observation_hz,carrier_observation_m,lli,cn0_dbhz,satellite_position_m,satellite_velocity_mps,satellite_clock_bias_m,satellite_clock_bias_drift_mps,sagnac_effect_m,relativistic_clock_effect_m,group_delay_m,iono_delay_m,tropo_delay_m,approximate_antenna_position_m
0,2023-01-01 01:00:00,C,5,2I,39902330.0,-19.172,39902330.0,,35.6,"[21806994.53795092, 36070439.48847678, -208299...","[0.96315908, -4.77085589, -99.57796611]",75937.584534,0.000464,-39.968915,-0.375844,0.0,4.039967,10.571269,"[4627852.352, 119640.5351, 4372994.4909]"
1,2023-01-01 01:00:00,C,5,6I,39902320.0,,39902320.0,,37.4,"[21806994.53795092, 36070439.48847678, -208299...","[0.96315908, -4.77085589, -99.57796611]",75937.584534,0.000464,-39.968915,-0.375844,0.0,6.11848,10.571269,"[4627852.352, 119640.5351, 4372994.4909]"
2,2023-01-01 01:00:00,C,5,7I,39902320.0,,39902320.0,,39.6,"[21806994.53795092, 36070439.48847678, -208299...","[0.96315908, -4.77085589, -99.57796611]",75937.584534,0.000464,-39.968915,-0.375844,-2.78807,6.756517,10.571269,"[4627852.352, 119640.5351, 4372994.4909]"
3,2023-01-01 01:00:00,C,6,2I,40000440.0,1162.184,40000440.0,,35.4,"[-511620.65870109, 34136312.02260914, 24903968...","[284.82316595, -1247.61084045, 1702.83076064]",62851.435192,-0.001442,-38.441219,2.183579,2.578215,3.704125,8.299848,"[4627852.352, 119640.5351, 4372994.4909]"
4,2023-01-01 01:00:00,C,6,6I,40000430.0,,40000430.0,,36.4,"[-511620.65870109, 34136312.02260914, 24903968...","[284.82316595, -1247.61084045, 1702.83076064]",62851.435192,-0.001442,-38.441219,2.183579,0.0,5.609851,8.299848,"[4627852.352, 119640.5351, 4372994.4909]"


The resulting `pd.DataFrame` contains a data table.  
The `pd.DataFrame` has an index (the first column), which here is just an integer.  
Each row contains a single GNSS observation and all the associated data. You can see the list of all the available parameters as column headers.

In [4]:
data_prx.columns

Index(['time_of_reception_in_receiver_time', 'constellation', 'prn',
       'observation_code', 'code_observation_m', 'doppler_observation_hz',
       'carrier_observation_m', 'lli', 'cn0_dbhz', 'satellite_position_m',
       'satellite_velocity_mps', 'satellite_clock_bias_m',
       'satellite_clock_bias_drift_mps', 'sagnac_effect_m',
       'relativistic_clock_effect_m', 'group_delay_m', 'iono_delay_m',
       'tropo_delay_m', 'approximate_antenna_position_m'],
      dtype='object')


We are now going to learn how to use `pandas` features to
- access the data,
- filter it,
- plot it.

## Accessing data
To access a row by its position, you can use the `.iloc[index_val]` indexer.
Try to access:
- the first line of `data_prx`
- the first two lines of `data_prx`

Note: remember that Python uses zero-based indexing, so the first row is at position 0.
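
As a small illustration of positional access, here is a sketch on a made-up three-row table (`toy` and its values are hypothetical, not the PRX data). A single position returns one row as a `Series`, while a slice returns a `DataFrame`.

```python
import pandas as pd

# Hypothetical toy table, used only for illustration (not the PRX data)
toy = pd.DataFrame(
    {
        "constellation": ["C", "C", "G"],
        "prn": [5, 6, 12],
        "cn0_dbhz": [35.6, 35.4, 47.0],
    }
)

first_row = toy.iloc[0]    # a single row, returned as a pandas.Series
first_two = toy.iloc[0:2]  # positions 0 and 1 (slice end excluded), as a DataFrame

print(type(first_row).__name__, first_row.prn)
print(type(first_two).__name__, len(first_two))
```

Note that the slice `0:2` follows the usual Python convention: the end position is not included.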

In [5]:
# Complete this code
# first_line = 
first_line = data_prx.iloc[0]

# Check the result: the first row should be BeiDou ('C') satellite 5
if first_line.constellation == 'C' and first_line.prn == 5:
    print("Success!!\n\n", first_line)
else:
    print("Try again...")

Success!!

 time_of_reception_in_receiver_time                                  2023-01-01 01:00:00
constellation                                                                         C
prn                                                                                   5
observation_code                                                                     2I
code_observation_m                                                         39902331.273
doppler_observation_hz                                                          -19.172
carrier_observation_m                                                   39902329.096107
lli                                                                                 NaN
cn0_dbhz                                                                           35.6
satellite_position_m                  [21806994.53795092, 36070439.48847678, -208299...
satellite_velocity_mps                          [0.96315908, -4.77085589, -99.57796611]
satellite_clock_bias_

To access a particular column, you can call it either
- with brackets: `data_prx["code_observation_m"]`
- or with a dot: `data_prx.code_observation_m`

This will return another `pandas` structure, called the `pandas.Series`, sharing the same index.
A `pandas.Series` is a one-dimensional labeled array; you can think of it as a single column of a `DataFrame`.
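
The short sketch below, on a hypothetical toy table, checks that both notations return the same `Series` and that this `Series` keeps the index of the `DataFrame`. One caveat worth knowing: the dot notation only works when the column name is a valid Python identifier and does not clash with an existing `DataFrame` attribute, so the bracket notation is the safer choice in general.

```python
import pandas as pd

# Hypothetical toy table, for illustration only
toy = pd.DataFrame({"prn": [5, 5, 6], "cn0_dbhz": [35.6, 37.4, 35.4]})

by_bracket = toy["prn"]  # works for any column name
by_dot = toy.prn         # works only for identifier-like, non-clashing names

print(type(by_bracket).__name__)            # the type of a single column
print(by_bracket.equals(by_dot))            # do both notations agree?
print(by_bracket.index.equals(toy.index))   # is the index shared?
```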

Using `.head()`, display the first 5 values of the `prn` column of `data_prx`.

In [10]:
# Complete this code
# Both notations return the same Series
prn_series = data_prx.prn      # dot notation
prn_series = data_prx["prn"]   # bracket notation

prn_series.head()

0    5
1    5
2    5
3    6
4    6
Name: prn, dtype: int64

You can also access several columns by putting a `list` of column names between brackets.
This will return another `DataFrame` with several columns.
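
To make the difference between the two bracket forms concrete, here is a sketch on a hypothetical toy table: a `list` of names always gives a `DataFrame` (even with a single name), with columns in the order of the list, whereas a bare name gives a `Series`.

```python
import pandas as pd

# Hypothetical toy table, for illustration only
toy = pd.DataFrame(
    {
        "constellation": ["C", "C", "G"],
        "prn": [5, 6, 12],
        "cn0_dbhz": [35.6, 35.4, 47.0],
    }
)

subset = toy[["prn", "constellation"]]  # list of names -> DataFrame
one_col_frame = toy[["prn"]]            # one-element list -> still a DataFrame
one_col_series = toy["prn"]             # bare name -> Series

print(list(subset.columns))
print(type(one_col_frame).__name__, type(one_col_series).__name__)
```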

Using `.head()`, display the first 5 values of the `prn` and `constellation` columns of `data_prx`.

In [9]:
data_prx[["prn","constellation"]].head()

Unnamed: 0,prn,constellation
0,5,C
1,5,C
2,5,C
3,6,C
4,6,C


## Conditional filtering
`pandas` lets you quickly filter rows with the `.loc[condition]` indexer, where `condition` is a boolean `Series` with one `True`/`False` value per row.

For example, you can keep only the observations with a high C/N0 thanks to the following line:  
`data_high_snr = data_prx.loc[ data_prx.cn0_dbhz > 40 ]`

Try this command and display the number of observations before and after filtering.
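
It can help to look at the condition on its own before using it. In the sketch below (hypothetical toy values, not the PRX data), the comparison produces a boolean `Series`, and `.loc` keeps only the rows where it is `True`.

```python
import pandas as pd

# Hypothetical toy table, for illustration only
toy = pd.DataFrame({"prn": [5, 6, 12, 14], "cn0_dbhz": [35.6, 41.2, 47.0, 39.9]})

mask = toy.cn0_dbhz > 40  # boolean Series: one True/False per row
high = toy.loc[mask]      # keeps only the rows where the mask is True

print(mask.tolist())
print(f"{len(toy)} rows before filtering, {len(high)} rows after")
```

The filtered table keeps the original index labels of the selected rows, so its index is no longer a contiguous range.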

In [13]:
data_high_snr = data_prx.loc[ data_prx.cn0_dbhz > 45 ]

print(f"There are {len(data_prx)} observations in the original dataset")
print(f"There are {len(data_high_snr)} observations with a C/N0 above 45 dB-Hz")

There are 1896 observations in the original dataset
There are 868 observations with a C/N0 above 45 dB-Hz


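You can combine several conditions when filtering with `.loc[(condition_1) & (condition_2)]`. Use the element-wise operators `&` (and), `|` (or) and `~` (not) rather than the Python keywords `and`, `or` and `not`, and wrap each condition in parentheses.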
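
A minimal sketch of combining conditions, on a hypothetical toy table. With `pandas`, conditions on columns are combined element-wise with `&`, `|` and `~`; the Python keyword `and` raises a `ValueError` when applied to a `Series`, because the truth value of a whole `Series` is ambiguous. Each comparison needs its own parentheses, since `&` and `|` bind more tightly than `==` or `>`.

```python
import pandas as pd

# Hypothetical toy table, for illustration only
toy = pd.DataFrame(
    {
        "constellation": ["G", "G", "E", "C"],
        "observation_code": ["1C", "2W", "1C", "2I"],
        "cn0_dbhz": [45.0, 38.0, 42.0, 35.6],
    }
)

# Conditions can be stored in variables to keep the filter readable
is_gps = toy.constellation == "G"
is_1c = toy.observation_code == "1C"

gps_and_1c = toy.loc[is_gps & is_1c]                                      # both true
gps_or_strong = toy.loc[(toy.constellation == "G") | (toy.cn0_dbhz > 40)]  # either true
not_gps = toy.loc[~is_gps]                                                # negation

print(len(gps_and_1c), len(gps_or_strong), len(not_gps))
```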

Extract only the GPS L1 C/A observations (the corresponding `constellation` is `"G"` and the corresponding `observation_code` is `"1C"`).

In [None]:
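# Keep the rows that are both GPS ("G") and L1 C/A ("1C")
data_gps_c1c = data_prx.loc[(data_prx.constellation == "G") & (data_prx.observation_code == "1C")]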

- Creation of a filtered dataset
- Operations on a `DataFrame`: compute the satellite orbit radius
- Plotting:
  - on a single `Series`
  - on a `DataFrame`, specifying `x`, `y`, `hue`
  - several plots on the same figure: add the mean value over one day

Date : introduce pd.Timestamp. Filter on single epoch or date range.
Compute number of visible satellites at particular epochs
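
A possible sketch for this part, on a hypothetical toy table. It assumes that the time column has a datetime dtype (here built with `pd.to_datetime`; whether `pw_helpers.prx_csv_to_pandas` already parses the dates is not confirmed), and it counts a satellite as one distinct (`constellation`, `prn`) pair, since one satellite produces several rows per epoch (one per observation code).

```python
import pandas as pd

# Hypothetical toy table, for illustration only
toy = pd.DataFrame(
    {
        "time_of_reception_in_receiver_time": pd.to_datetime(
            [
                "2023-01-01 01:00:00",
                "2023-01-01 01:00:00",
                "2023-01-01 01:00:00",
                "2023-01-01 01:00:01",
                "2023-01-01 01:00:01",
            ]
        ),
        "constellation": ["C", "C", "G", "C", "G"],
        "prn": [5, 5, 12, 5, 12],
    }
)
t = toy.time_of_reception_in_receiver_time

# Filter on a single epoch
epoch = pd.Timestamp("2023-01-01 01:00:00")
at_epoch = toy.loc[t == epoch]

# Filter on a date range (both bounds included by default)
start = pd.Timestamp("2023-01-01 01:00:00")
end = pd.Timestamp("2023-01-01 01:00:01")
in_range = toy.loc[t.between(start, end)]

# Number of distinct satellites per epoch
n_sats = (
    toy.drop_duplicates(["time_of_reception_in_receiver_time", "constellation", "prn"])
    .groupby("time_of_reception_in_receiver_time")
    .size()
)
print(n_sats)
```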

Add a column to an existing DataFrame
https://www.geeksforgeeks.org/different-ways-to-iterate-over-rows-in-pandas-dataframe/
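
As a sketch of adding a column without iterating over rows, the example below computes an orbit radius as the norm of the satellite position vector. It assumes that each cell of `satellite_position_m` holds three numbers (x, y, z) expressed in an Earth-centered frame, so that the norm is the distance to the Earth's center; the toy values and the column name `orbit_radius_m` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table, for illustration only
toy = pd.DataFrame(
    {
        "prn": [5, 6],
        "satellite_position_m": [
            np.array([3.0e7, 4.0e7, 0.0]),
            np.array([0.0, 0.0, 2.6e7]),
        ],
    }
)

# Stack the per-row vectors into one (n_rows, 3) array
positions = np.vstack(toy["satellite_position_m"].tolist())

# Assigning to a new column name adds the column to the DataFrame
toy["orbit_radius_m"] = np.linalg.norm(positions, axis=1)

print(toy[["prn", "orbit_radius_m"]])
```

Working on the stacked array keeps the computation vectorized, which is usually much faster than looping over the rows of the `DataFrame`.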