> "We have uploaded some data from our recent Spaarnwoude trial for you to get a first look at.
> 
> 
> All survey results are comprised of two files: files ending with **_VCOG** are the **Vehicle Center of Gravity**, i.e., the point reported is the center of the measurement frame on the cart. From this center we can take the algorithm's calculated offset in X,Y,Z and using the cable yaw from the algorithm we can determine where the cable is positioned in the real world.
> 
> In some of the VCOG or CCOG files there might be a column called **cable lock**. This is an integer value based on the variance of the measurements and serves to guide as an indicator of whether or not the cable tracker is able to find a good solution.
> 
> The **CCOG** files are the **Cable's Real World Position** as output by NavAQ (a hydrographic survey recording suite that we use for fieldwork). So it's essentially cart X,Y and Z with cable offset X, cable offset Y and cable offset Z applied.
> 
> **The Problem:** The Spaarnwoude data has a problem; the GPS is a twin-receiver GPS that is capable of outputting heading. This is a new unit for us and it was mounted perpendicular to the direction of travel. The heading is reported from the secondary antenna towards the primary antenna but this was something we got mixed in our initial setup, so in order to correct for the fact that the heading orientation is wrong with regard to the cart's travel direction, **we applied a -90 degree correction, which should have been a 90 degree correction.**
> 
> Annoying thing is that we can correct this in the hydrographic survey software and see the correct results, but we can't export them right now... Time permitting, I might just draft up the algebra to do the warp myself.
> 
> Unfortunately, with the heading being wrong, experiments 1 and 2 are wrong. These were lines where we did a cart run over the cable's actual position. So these should have recorded the cable's real world position with 5-10cm accuracy but since the heading is wrong, the real world position is wrong as well at the moment."
>
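
The "warp" mentioned in the message is, in essence, a 2D rotation of the cable offset about the cart position. The sketch below is only an illustration of that algebra, not the NavAQ procedure: it assumes a counter-clockwise angle convention, and the function names, cart position and offset values are all hypothetical. It shows that applying -90 degrees where +90 degrees was needed leaves the offset rotated by 180 degrees, i.e. mirrored through the cart position.

```python
import numpy as np


def rotate_offset(dx, dy, angle_deg):
    """Rotate a 2D offset (dx, dy) counter-clockwise by angle_deg (assumed convention)."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    return c * dx - s * dy, s * dx + c * dy


def cable_world_position(cart_x, cart_y, off_x, off_y, correction_deg):
    """Cart position plus the rotated cable offset (hypothetical helper)."""
    rx, ry = rotate_offset(off_x, off_y, correction_deg)
    return cart_x + rx, cart_y + ry


# Illustrative values only, not taken from the trial files
cart = (1000.0, 2000.0)
offset = (1.0, 0.5)

wrong = cable_world_position(*cart, *offset, correction_deg=-90.0)
fixed = cable_world_position(*cart, *offset, correction_deg=+90.0)

# The two results sit on opposite sides of the cart
print("with -90:", wrong)
print("with +90:", fixed)
```

Under these assumptions, an already exported cable position could be corrected by reflecting it through the cart position recorded at the same timestamp. The real sign and axis conventions of the GPS unit would need to be confirmed first.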


In [14]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from pathlib import Path
from omegaconf import OmegaConf
# -------------------------------------------------
# LOAD CONFIG
# -------------------------------------------------
config_path = "config.yaml"
config = OmegaConf.load(config_path)
folder_path = Path(config.folder_path)

# Output directory
output_dir = Path("data/renamed/")
output_dir.mkdir(parents=True, exist_ok=True)

# -------------------------------------------------
# FIND ALL VCOG FILES
# -------------------------------------------------
# Sort so that df1..df5 follow the experiment numbering (glob order is not guaranteed)
vcog_files = sorted(folder_path.glob("*VCoG*.csv"))
print(f"🔍 Found {len(vcog_files)} VCOG files")

for i, file in enumerate(vcog_files, 1):
    #print(f"⚙ loading → {file.name}")
    df = pd.read_csv(file, header=None)
    globals()[f"df{i}_VC"] = df
    print(f"✅ loaded → {file.name} with name df{i}_VC")

# -------------------------------------------------
# FIND ALL CCOG FILES
# -------------------------------------------------
# Sort so that df1..df5 follow the experiment numbering (glob order is not guaranteed)
ccog_files = sorted(folder_path.glob("*CCoG*.csv"))
print(f"🔍 Found {len(ccog_files)} CCOG files")

for i, file in enumerate(ccog_files, 1):
    #print(f"⚙ loading → {file.name}")
    df = pd.read_csv(file, header=None)
    globals()[f"df{i}_CC"] = df
    print(f"✅ loaded → {file.name} with name df{i}_CC")


🔍 Found 5 VCOG files
✅ loaded → Exp_1_VCoG_TOC.csv with name df1_VC
✅ loaded → Exp_2_VCoG_TOC_EW.csv with name df2_VC
✅ loaded → Exp_3_VCoG_2m_OL_Ncable_WE.csv with name df3_VC
✅ loaded → Exp_4_VCoG_4m_OL_Ncable_EW.csv with name df4_VC
✅ loaded → Exp_5_VCoG_6m_OL_Ncable_WE.csv with name df5_VC
🔍 Found 5 CCOG files
✅ loaded → Exp_1_CCoG_TOC.csv with name df1_CC
✅ loaded → Exp_2_CCoG_TOC_EW.csv with name df2_CC
✅ loaded → Exp_3_CCoG_2m_OL_Ncable_WE.csv with name df3_CC
✅ loaded → Exp_4_CCoG_4m_OL_Ncable_EW.csv with name df4_CC
✅ loaded → Exp_5_CCoG_6m_OL_Ncable_WE.csv with name df5_CC


## Analysis VCoG Files

In [21]:
df2_VC, df4_VC

(                0           1          2           3      4     5    6    7  \
 0    1.759746e+09  121557.031  619511.00  5806174.96  41.47 -89.7  7.0  3.4   
 1    1.759746e+09  121557.235  619511.00  5806174.96  41.47 -89.7  7.0  3.4   
 2    1.759746e+09  121557.451  619511.00  5806174.96  41.47 -89.7  7.0  3.4   
 3    1.759746e+09  121557.655  619511.00  5806174.96  41.47 -89.7  7.0  3.4   
 4    1.759746e+09  121557.857  619511.00  5806174.96  41.47 -89.7  7.0  3.4   
 ..            ...         ...        ...         ...    ...   ...  ...  ...   
 325  1.759746e+09  121702.025  619466.12  5806176.02  41.50 -84.3  6.9  3.3   
 326  1.759746e+09  121702.229  619466.12  5806176.02  41.50 -84.3  6.9  3.3   
 327  1.759746e+09  121702.432  619466.12  5806176.02  41.50 -84.3  6.9  3.3   
 328  1.759746e+09  121702.634  619466.12  5806176.02  41.50 -84.3  6.9  3.3   
 329  1.759746e+09  121702.836  619466.12  5806176.02  41.50 -84.3  6.9  3.3   
 
       8  9  
 0   NaN  1  
 1   NaN  

In [22]:
# DataFrame.info() prints directly and returns None, so call it once per frame
for df in (df1_VC, df2_VC, df3_VC, df4_VC, df5_VC):
    df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3926 entries, 0 to 3925
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       3926 non-null   float64
 1   1       3926 non-null   float64
 2   2       3926 non-null   float64
 3   3       3926 non-null   float64
 4   4       3926 non-null   float64
 5   5       3926 non-null   float64
 6   6       3926 non-null   float64
 7   7       3926 non-null   float64
 8   8       0 non-null      float64
 9   9       3926 non-null   int64  
dtypes: float64(9), int64(1)
memory usage: 306.8 KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 330 entries, 0 to 329
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       330 non-null    float64
 1   1       330 non-null    float64
 2   2       330 non-null    float64
 3   3       330 non-null    float64
 4   4       330 non-null    float64
 5   5       330 non-null    floa

In [24]:
(df1_VC.describe(), df2_VC.describe(), df3_VC.describe(), df4_VC.describe(), df5_VC.describe())

(                  0              1              2             3            4  \
 count  3.926000e+03    3926.000000    3926.000000  3.926000e+03  3926.000000   
 mean   1.759745e+09  120685.378097  619483.169521  5.806124e+06    41.426877   
 std    2.266968e+02     378.407687      42.859000  3.841919e+01     0.082066   
 min    1.759745e+09  120030.777000  619430.170000  5.806074e+06    41.200000   
 25%    1.759745e+09  120347.016750  619440.892500  5.806092e+06    41.380000   
 50%    1.759745e+09  120703.256000  619474.185000  5.806113e+06    41.420000   
 75%    1.759745e+09  121019.486500  619536.080000  5.806173e+06    41.480000   
 max    1.759746e+09  121335.722000  619545.620000  5.806179e+06    41.650000   
 
                  5            6            7    8            9  
 count  3926.000000  3926.000000  3926.000000  0.0  3926.000000  
 mean    -12.773867     5.107539     3.422873  NaN     0.987774  
 std     106.746653     1.422777     1.219275  NaN     0.109908  
 min 

In [64]:
def plot_distributions_plotly(df, df_name="df"):
    """
    Generates and displays an interactive grid of histograms for each non-constant
    numerical column in a DataFrame using Plotly graph objects and subplots.
    """
    import plotly.graph_objs as go
    from plotly.subplots import make_subplots
    import numpy as np

    numeric_cols = df.select_dtypes(include=np.number).columns
    cols_to_plot = [col for col in numeric_cols if df[col].nunique() > 1]

    n_plots = len(cols_to_plot)
    if n_plots == 0:
        print(f"No non-constant numeric columns to plot in {df_name}")
        return
    # At most 3 plots per row, and enough rows to hold every column
    n_cols = min(3, n_plots)
    n_rows = -(-n_plots // n_cols)  # ceiling division
    fig = make_subplots(rows=n_rows, cols=n_cols, subplot_titles=[str(col) for col in cols_to_plot])

    for idx, col in enumerate(cols_to_plot):
        row = idx // n_cols + 1
        col_idx = idx % n_cols + 1
        fig.add_trace(
            go.Histogram(x=df[col], name=str(col)),
            row=row, col=col_idx
        )

    fig.update_layout(
        height=400 * n_rows,
        width=400 * n_cols,
        title_text=f"Distributions of Numeric Columns in {df_name}",
        showlegend=False,
        template='plotly_white'
    )
    fig.show()


# 2. Call the function
plot_distributions_plotly(df1_VC, df_name="df1_VC")

## Cable Lock 

From the DataFrames above, the cable lock must be the last column (column 9 in the VCoG files): it is the only integer column in every dataset, and its value is almost always 1 (mean of about 0.99 in df1_VC).
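
The same check can be done programmatically instead of by reading the `info()` output. The following is a minimal sketch on a made-up frame; `integer_columns` is a hypothetical helper and the values are illustrative, not taken from the trial files.

```python
import pandas as pd


def integer_columns(df):
    """Return the labels of all columns with an integer dtype."""
    return [c for c in df.columns if pd.api.types.is_integer_dtype(df[c])]


# Tiny illustrative frame: two float columns and one integer flag column
demo = pd.DataFrame({0: [1.5, 2.5, 3.5], 1: [0.1, 0.2, 0.3], 2: [1, 1, 0]})

int_cols = integer_columns(demo)
lock_counts = demo[int_cols[0]].value_counts()

print(int_cols)
print(lock_counts)
```

On the real data, calling `integer_columns(df1_VC)` and then `value_counts()` on the resulting column would show how often the lock value differs from 1.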

## Vehicle's X, Y, Z


From the description and experiment_log.csv, we can identify the X, Y and Z coordinates of the vehicle.

### Z 
The trial took place in the Netherlands, on presumably flat terrain, so the cart's height should barely change between the start and end of a run. The height column should therefore be near-constant in every DataFrame, which can be checked in the distribution plots from the section above.

In those plots, column 4 has the narrowest range (41.20 to 41.65 in df1_VC). Let's verify this on the other DataFrames as well.
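
As a numeric complement to the histograms, the spread of each column can be ranked directly. This is a rough sketch with a hypothetical helper, `column_ranges`, run on a made-up frame whose magnitudes only loosely resemble the cart data; the column names `x`, `y`, `z` are placeholders.

```python
import pandas as pd


def column_ranges(df):
    """Return max - min per numeric column, ascending, skipping constant or empty columns."""
    numeric = df.select_dtypes(include="number")
    spans = numeric.max() - numeric.min()
    return spans[spans > 0].sort_values()


# Illustrative values only
demo = pd.DataFrame({
    "x": [619430.2, 619480.5, 619545.6],
    "y": [5806074.0, 5806120.0, 5806179.0],
    "z": [41.20, 41.42, 41.65],
})

spans = column_ranges(demo)
print(spans)
```

The first entry of `spans` is the candidate height column. Other slowly varying columns can also show a small range, so the result is a hint to be checked against the plots rather than a proof.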

In [67]:
# Only plot the distribution of column 4 (the fifth column) for each VCoG dataframe
plot_distributions_plotly(df1_VC[[4]], df_name="df1_VC")
plot_distributions_plotly(df2_VC[[4]], df_name="df2_VC")
plot_distributions_plotly(df3_VC[[4]], df_name="df3_VC")
plot_distributions_plotly(df4_VC[[4]], df_name="df4_VC")
plot_distributions_plotly(df5_VC[[4]], df_name="df5_VC")

### X, Y

From the experiment log, each successive offset line (experiments 3 to 5) runs 2 m further north than the previous one.
Experiments 2 and 4 both run East to West and are 4 m apart. Likewise, experiments 3 and 5 are 4 m apart and both run West to East.
![Alt text](data/description.png)

Hence, if we subtract the paired DataFrames (df4 - df2, or df5 - df3), the column whose difference is close to 4 must be the Y (northing) axis. The X difference should stay small and roughly constant.


In [72]:
df5_VC- df3_VC, df4_VC- df2_VC

(           0        1     2     3     4    5    6    7   8    9
 0    308.976  508.976  0.35  3.76 -0.05  3.1  0.0  0.0 NaN  0.0
 1    308.979  508.979  0.35  3.75 -0.06  3.2  0.0  0.0 NaN  0.0
 2    308.979  508.979  0.35  3.75 -0.06  3.3  0.0  0.0 NaN  0.0
 3    308.976  508.976  0.35  3.75 -0.06  3.0  0.0  0.0 NaN  0.0
 4    308.973  508.973  0.35  3.75 -0.06  3.3  0.0  0.0 NaN  0.0
 ..       ...      ...   ...   ...   ...  ...  ...  ...  ..  ...
 501      NaN      NaN   NaN   NaN   NaN  NaN  NaN  NaN NaN  NaN
 502      NaN      NaN   NaN   NaN   NaN  NaN  NaN  NaN NaN  NaN
 503      NaN      NaN   NaN   NaN   NaN  NaN  NaN  NaN NaN  NaN
 504      NaN      NaN   NaN   NaN   NaN  NaN  NaN  NaN NaN  NaN
 505      NaN      NaN   NaN   NaN   NaN  NaN  NaN  NaN NaN  NaN
 
 [506 rows x 10 columns],
             0         1     2     3     4    5    6    7   8    9
 0    3376.191  9656.191  0.86  3.75 -0.04 -5.8 -7.0 -3.4 NaN  0.0
 1    3376.191  9656.191  0.86  3.75 -0.04 -5.8 -7.0 -3.4 

Column 2 is therefore X and column 3 is Y: the column 3 difference stays close to 4 (about 3.75) in both pairs, while the column 2 difference stays below 1.
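
Because the paired runs have different lengths, a plain subtraction leaves NaN rows at the end. A more robust summary is the median difference over the overlapping rows. The sketch below uses a hypothetical helper, `median_difference`, and made-up numbers. It also assumes that row i of one run corresponds roughly to row i of the other, which holds only if both runs started at a similar place and moved at a similar speed.

```python
import pandas as pd


def median_difference(df_a, df_b, cols):
    """Median of (df_a - df_b) per column, using only the rows both frames share."""
    n = min(len(df_a), len(df_b))
    a = df_a[cols].iloc[:n].reset_index(drop=True)
    b = df_b[cols].iloc[:n].reset_index(drop=True)
    return (a - b).median()


# Illustrative frames: the "north" run is one row longer than the "south" run
south = pd.DataFrame({2: [100.0, 101.0, 102.0], 3: [50.0, 50.0, 50.1]})
north = pd.DataFrame({2: [100.3, 101.4, 102.35, 103.0], 3: [53.75, 53.76, 53.85, 53.9]})

medians = median_difference(north, south, cols=[2, 3])
print(medians)
```

With the real data this would be called as `median_difference(df5_VC, df3_VC, cols=[2, 3])`, and the column whose median is near 4 would be the northing.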

## Cable Yaw (Angle)

From the distribution plots, cable yaw is column 5: its values cluster around 90, -90 and 0 degrees.

In [76]:
# Only plot the distribution of column 5 (the yaw candidate) for each VCoG dataframe
i=5
plot_distributions_plotly(df1_VC[[i]], df_name="df1_VC")
plot_distributions_plotly(df2_VC[[i]], df_name="df2_VC")
plot_distributions_plotly(df3_VC[[i]], df_name="df3_VC")
plot_distributions_plotly(df4_VC[[i]], df_name="df4_VC")
plot_distributions_plotly(df5_VC[[i]], df_name="df5_VC")

## Calculated Cable Location X, Y

Columns 6 and 7 are the calculated cable offsets in X and Y, measured from the cart center.

## Analysis of CCOG

In [119]:
df5_CC

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,1.759749e+09,131436.547,619465.4043,5.806177e+06,40.3795,5.22,0.21,-0.99,0.03,-1.00,-0.03,1
1,1.759749e+09,131436.752,619465.3934,5.806177e+06,40.3602,4.98,0.16,-1.00,0.03,-1.00,-0.02,1
2,1.759749e+09,131436.953,619465.4871,5.806177e+06,40.3399,5.20,0.30,-1.02,0.05,-1.00,-0.02,1
3,1.759749e+09,131437.156,619465.5339,5.806177e+06,40.3239,5.16,0.33,-1.04,0.06,-1.00,-0.04,1
4,1.759749e+09,131437.356,619465.2793,5.806177e+06,40.2638,5.36,0.12,-1.10,0.02,-1.00,-0.03,1
...,...,...,...,...,...,...,...,...,...,...,...,...
422,1.759749e+09,131601.003,619512.0160,5.806176e+06,40.5291,4.98,0.63,-0.88,-0.12,0.99,0.04,1
423,1.759749e+09,131601.205,619511.4420,5.806176e+06,40.5582,4.81,0.03,-0.85,-0.00,1.00,0.02,1
424,1.759749e+09,131601.409,619511.1631,5.806176e+06,40.5604,4.99,-0.22,-0.85,0.05,1.00,0.02,1
425,1.759749e+09,131601.625,619511.7547,5.806176e+06,40.5076,4.92,0.36,-0.90,-0.07,1.00,0.04,1


Let's compare the two equivalent experiments discussed above. The X and Y readings should be similar, while the offset should increase from nearly 0 to 4.

In [122]:
df5_CC- df5_VC

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,0.0,0.0,-0.7657,-5.1725,-0.9905,-95.48,0.21,-0.99,,-2.00,,
1,0.0,0.0,-0.7766,-4.9251,-0.9998,-95.82,0.16,-1.00,,-2.00,,
2,0.0,0.0,-0.6829,-5.1616,-1.0201,-95.60,0.30,-1.02,,-2.00,,
3,0.0,0.0,-0.6361,-5.1297,-1.0361,-95.54,0.33,-1.04,,-2.00,,
4,0.0,0.0,-0.8907,-5.2896,-1.0962,-95.44,0.12,-1.10,,-2.00,,
...,...,...,...,...,...,...,...,...,...,...,...,...
422,0.0,0.0,-0.0640,-5.0167,-0.8809,-93.02,0.63,-0.88,,-0.01,,
423,0.0,0.0,-0.6380,-4.7725,-0.8518,-93.19,0.03,-0.85,,0.00,,
424,0.0,0.0,-0.9169,-4.9100,-0.8496,-93.01,-0.22,-0.85,,0.00,,
425,0.0,0.0,-0.3253,-4.9181,-0.9024,-93.08,0.36,-0.90,,0.00,,
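
Note that the CCoG frame has 12 columns and the VCoG frame has 10, so a plain subtraction pairs columns purely by label, and the result for columns 5 and above mixes unrelated quantities. A narrower comparison restricted to the position columns might look like the sketch below. It assumes columns 2 to 4 hold X, Y and Z in both file types; `position_delta` is a hypothetical helper and the demo values are made up.

```python
import pandas as pd

POSITION_COLS = [2, 3, 4]  # assumed to be X, Y, Z in both file types


def position_delta(cc, vc, cols=POSITION_COLS):
    """CCoG minus VCoG for the position columns only, matched row by row."""
    if len(cc) != len(vc):
        raise ValueError("frames must have the same number of rows")
    delta = cc[cols].to_numpy() - vc[cols].to_numpy()
    return pd.DataFrame(delta, columns=["dX", "dY", "dZ"])


# Illustrative single-row frames with deliberately different column counts
cc_demo = pd.DataFrame([[0.0, 0.0, 10.0, 20.0, 40.0, 5.0, 0.2]])
vc_demo = pd.DataFrame([[0.0, 0.0, 11.0, 25.0, 41.0, 100.0]])

delta = position_delta(cc_demo, vc_demo)
print(delta)
```

Matching row by row is reasonable here only because the two files of one experiment share the same timestamps, as the zero differences in columns 0 and 1 suggest.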


In [116]:
df5_VC

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,1.759749e+09,131436.547,619466.17,5806182.39,41.37,100.7,0.0,0.0,,1
1,1.759749e+09,131436.752,619466.17,5806182.38,41.36,100.8,0.0,0.0,,1
2,1.759749e+09,131436.953,619466.17,5806182.38,41.36,100.8,0.0,0.0,,1
3,1.759749e+09,131437.156,619466.17,5806182.38,41.36,100.7,0.0,0.0,,1
4,1.759749e+09,131437.356,619466.17,5806182.38,41.36,100.8,0.0,0.0,,1
...,...,...,...,...,...,...,...,...,...,...
422,1.759749e+09,131601.003,619512.08,5806180.59,41.41,98.0,0.0,0.0,,1
423,1.759749e+09,131601.205,619512.08,5806180.59,41.41,98.0,0.0,0.0,,1
424,1.759749e+09,131601.409,619512.08,5806180.59,41.41,98.0,0.0,0.0,,1
425,1.759749e+09,131601.625,619512.08,5806180.59,41.41,98.0,0.0,0.0,,1


## X, Y, Z, offset
Columns 2, 3 and 4 are X, Y and Z: they are similar in both experiments, indicating the same starting and ending positions. Column 5 is the offset from the cable's location, which should ideally increase by 2 m per experiment from experiment 2 to experiment 5.
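
One way to test the reading of column 5 is to compare it with the horizontal distance between the cart and cable positions. The sketch below uses the first five rows of the `df5_CC - df5_VC` output and the matching column 5 values of `df5_CC`, both copied from the tables above. The interpretation of column 5 as a horizontal distance is an assumption that this check is meant to probe.

```python
import numpy as np

# CCoG minus VCoG in columns 2 and 3 (first five rows of experiment 5)
dx = np.array([-0.7657, -0.7766, -0.6829, -0.6361, -0.8907])
dy = np.array([-5.1725, -4.9251, -5.1616, -5.1297, -5.2896])

# Column 5 of df5_CC for the same rows
col5 = np.array([5.22, 4.98, 5.20, 5.16, 5.36])

# Horizontal cart-to-cable distance from the position difference
horizontal = np.hypot(dx, dy)
max_gap = np.max(np.abs(horizontal - col5))

print(np.round(horizontal, 3))
print("largest gap to column 5:", round(float(max_gap), 4))
```

For these rows the two quantities agree to within about a centimetre, which supports treating column 5 as the horizontal offset distance. Five rows are a small sample, so the comparison is worth repeating over the full frames.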

In [None]:
df3_CC, df5_CC

(               0           1            2             3        4     5     6   \
 0    1.759749e+09  130927.571  619465.5536  5.806176e+06  40.5361  2.22  0.03   
 1    1.759749e+09  130927.773  619465.5603  5.806176e+06  40.5404  2.20  0.03   
 2    1.759749e+09  130927.974  619465.5275  5.806176e+06  40.5367  2.21 -0.00   
 3    1.759749e+09  130928.180  619465.5472  5.806176e+06  40.5334  2.21  0.02   
 4    1.759749e+09  130928.383  619465.5801  5.806176e+06  40.5396  2.21  0.05   
 ..            ...         ...          ...           ...      ...   ...   ...   
 501  1.759749e+09  131107.819  619512.1414  5.806175e+06  40.5131  1.80  0.07   
 502  1.759749e+09  131108.022  619512.1371  5.806175e+06  40.5204  1.80  0.06   
 503  1.759749e+09  131108.224  619512.1293  5.806175e+06  40.5124  1.80  0.06   
 504  1.759749e+09  131108.429  619512.1236  5.806175e+06  40.5191  1.80  0.05   
 505  1.759749e+09  131108.633  619512.1322  5.806175e+06  40.5141  1.81  0.06   
 
        7     

In [126]:
# file: column_mappings.py

columns_names_vcog = {
    0: 'Timestamp_Epoch',
    1: 'Timestamp_Seconds',
    2: 'Cart_X',
    3: 'Cart_Y',
    4: 'Cart_Z',
    5: 'Yaw_angle',
    6: 'Cable_X',
    7: 'Cable_Y',
    8: 'Unused',
    9: 'Cable_Lock'
}

columns_names_ccog = {
    0: 'Timestamp_Epoch',
    1: 'Timestamp_Seconds',
    2: 'Cable_X',
    3: 'Cable_Y',
    4: 'Cable_Z',
    5: 'Cable_offset_distance',
    6: 'Offsets_1',
    7: 'Offsets_2',
    8: 'Offsets_3',
    9: 'Offsets_4',
    10: 'Offsets_5',
    11: 'Cable_Lock'
}
import json

# Note: json.dump writes the integer keys as strings ("0", "1", ...)
with open("data/columns_vcog.json", "w") as f:
    json.dump(columns_names_vcog, f, indent=4)

with open("data/columns_ccog.json", "w") as f:
    json.dump(columns_names_ccog, f, indent=4)
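
When the saved mappings are read back, the keys arrive as strings and no longer match the integer column labels produced by `pd.read_csv(..., header=None)`. A minimal sketch of the round trip follows; `load_column_names` is a hypothetical helper, and the JSON is built in memory here rather than read from `data/columns_vcog.json`.

```python
import json

import pandas as pd


def load_column_names(json_text):
    """Parse a saved mapping and convert its string keys back to integers."""
    return {int(key): value for key, value in json.loads(json_text).items()}


# Stand-in for the file contents: a shortened version of the VCoG mapping
saved = json.dumps({0: "Timestamp_Epoch", 2: "Cart_X", 3: "Cart_Y"})
names = load_column_names(saved)

# A headerless frame has integer column labels 0..n-1
demo = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]])
renamed = demo.rename(columns=names)

print(list(renamed.columns))
```

Columns without an entry in the mapping keep their integer label, which makes gaps in the mapping easy to spot.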
