## Documentation of Point Transformer modifications

#### Gazebo Data Structure
05.02.2025 
- 9988 clouds w/poses
- gazebo_pc_record_full_12_42_labeled_1024
    - clouds/
    - poses/

    Up/Downsampling used:
    - clouds with < 1024 points: random duplication with slight Gaussian noise
    - clouds with > 1024 points: farthest point sampling (FPS; takes a long time)
    - not removing any clouds
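A minimal numpy sketch of the up/downsampling described above. The function names, `noise_std=0.01`, and the seeds are illustrative assumptions, not taken from the actual preprocessing script.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    selected[0] = rng.integers(n)
    # Squared distance from every point to its nearest selected point so far.
    min_dist = np.full(n, np.inf)
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = np.argmax(min_dist)
    return points[selected]

def resample_cloud(points, n_target=1024, noise_std=0.01, seed=0):
    """Return exactly n_target points (noise_std is a hypothetical value)."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    if n == n_target:
        return points.copy()
    if n > n_target:
        return farthest_point_sampling(points, n_target, seed)
    # Too few points: duplicate random points and jitter the copies slightly.
    idx = rng.integers(n, size=n_target - n)
    jitter = rng.normal(0.0, noise_std, size=(n_target - n, 3))
    return np.vstack([points, points[idx] + jitter])
```

The FPS loop is O(N * n_target), which is consistent with it taking a long time on large clouds.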

#### 1. train_pose.py
Dataloader: SimNet or ScanNet

#### 2. pointtransformer_pose.py
```self.radius = 1.5```
```input_dim```: change based on number of one-hot encoded classes used

Note: ```uniform=False``` means our .ply files are already downsampled to 1024 points.


#### 3. Translation-only prediction: _t.py
- use train_pose_t.py, test_pose_t.py, and pointtransformer_pose_t.py

Model Training+Inference Process: 
0. ```$ conda activate point-transformer```
1. train_pose.py
    - pointtransformer_pose.py: contains PoseLoss()
        - quaternion_to_rotation_matrix() assumes wxyz format as of 12.02.25
2. test_pose.py
    - manually specify best model saved during training to use for inference
    - writes results as .json in XYZW format
3. overlay_pose_estimate.py (MAC)
    - reads the .json and, for each cloud file used in the training set (for which the .json stores the pose prediction), overlays the cloud on the STL model for visual evaluation.

## Evaluation+Improvements:
- Rotate cloud by prediction, then see how much they overlap (using some metric like average distance of model points to transformed cloud (ADD/ADD-S))
- Need more input (number of scans, labels, etc)?
- Use the ground truth model somehow? i.e. dense ship cloud?
- Experiment with MSG radius
- Data augmentation beyond sensor pose variation
- Examine what the Point Transformer portion learns. How is it representing scans?
- Study how point distribution affects prediction (e.g. stern deck is denser than port)
- Try maybe an ICP approach for refinement?
    - For this, first plot the input cloud in the space that the model reads it in. The idea is we want to see whether the raw clouds are "perturbed" enough by default, (since they're in the sensor frame), or whether the model actually does learn most of the rotation in the SE3 space.
- Big IDEA for now: review how the input scans are represented to the model, how+what does it learn about the scan that lets it be roughly transformed "correctly" to the ship in pyvista.
- Try this with the 4.0 val error model. How much do more epochs improve the predictions?
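Sketch of the average-distance metric from the first bullet, in the form usually written as ADD, with ADD-S as the closest-point variant. Assumptions: it compares the ship model points under the ground truth and predicted poses, and all function names are illustrative.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform to an [N, 3] array of points."""
    return points @ R.T + t

def add_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding model points under both poses."""
    gt = transform(model_points, R_gt, t_gt)
    pred = transform(model_points, R_pred, t_pred)
    return np.linalg.norm(gt - pred, axis=1).mean()

def adds_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each predicted point to its closest gt point."""
    gt = transform(model_points, R_gt, t_gt)
    pred = transform(model_points, R_pred, t_pred)
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)  # [N, N]
    return d.min(axis=1).mean()
```

ADD-S builds an N x N distance matrix, so for a dense ship cloud the model points would need subsampling first.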

#### Initial Observations:
- Is translation struggling more than rotation? (either too low or too high)
- 008054.txt is quite off in the yaw direction

#### 13.02:
- [x] Predict position first
    - Don't forget about unit sphere scaling + centroid and how that affects it!
- [ ] read more about decoupling rot and trans. does this allow the model to learn each better? i.e. would I try to learn rotation around origin? This again fits into the unit sphere situation i have right now.
- [ ] Even distribution of scans in dataset (i.e. reduce bias towards close scans)
    - Also means limit range of scan (i.e max furthest point from sensor)
- [ ] visualize top K 
- [x] quickly NEXT, train by concatenating scale, centroid?
    - no real impact on result
- [ ] try quaternion loss
- [ ] NEXT, try training without unit-sphere scaling (just centroid)
    - might need to experiment with adjusting radii if not unit-sphere, or concatenating global scale? think...
- [x] try with trained data (not much of a difference)
- [ ] use ICP (geometric registration problem)? after transformer gets approximate pose
    - "In order to improve the registration performance, features on point clouds are also introduced for matching."
- [ ] loss function predicting angle-axis, but does this cause issue with conversion?
    - try different loss function (see GDR-Net), other parameterization besides angle axis
- [ ] then try keypoint prediction
    - plot the predicted keypoints after unit sphere scaling applied to them to ensure 

Question: should validation be used during training and fed back? Or just printed?
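One common arrangement, sketched under assumptions: the validation loss is never backpropagated. It is only used to decide which checkpoint to keep and when to stop. The `patience` value and the helper name are hypothetical.

```python
def select_checkpoint(val_losses, patience=5):
    """Return (best_epoch, stop_epoch) from a list of per-epoch validation losses.

    best_epoch: epoch with the lowest validation loss seen before stopping.
    stop_epoch: first epoch where the loss has not improved for `patience`
                epochs, or None if training would run to the end.
    """
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None
```

The same signal could also drive the lr scheduler, but it still would not enter the gradient.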


QUATERNION FORMAT: WXYZ or XYZW: 12.02.25
- gt stored as WXYZ quaternion
- model learns angle-axis
    - within loss function, it's converted to rotation matrix to compare with ground truth (also converted to matrix)
- NOTE: during inference (test_pose.py), we write pred_quat as XYZW to results.json!!
    -  now writes ground truth gt_quat to be XYZW to results.json
- on Mac, in overlay_pose_estimate.py, reads from results.json which has all XYZW format.
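Small sketch of the two conventions, since mixing them up silently produces wrong rotations. The helper names are illustrative and are not the ones in pointtransformer_pose.py or test_pose.py.

```python
import numpy as np

def wxyz_to_xyzw(q):
    """[w, x, y, z] -> [x, y, z, w] (format written to results.json)."""
    return np.roll(np.asarray(q, dtype=float), -1, axis=-1)

def xyzw_to_wxyz(q):
    """[x, y, z, w] -> [w, x, y, z] (format the gt is stored in)."""
    return np.roll(np.asarray(q, dtype=float), 1, axis=-1)

def quat_wxyz_to_matrix(q):
    """Rotation matrix from a WXYZ quaternion (normalized first)."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])
```

Quick sanity check: a 90 degree yaw is `[cos(45°), 0, 0, sin(45°)]` in WXYZ and should map the x axis onto the y axis.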

<!-- TODO: ground truth should be as XYZW too -->

### TODOs
- Add plotting information to thesis document. Explain how i'm evaluating the results.

- Tried directly regressing pose (translation residual), but still not perfect. Before working with a larger/better dataset and dataset fine-tuning, keep it how it is, and try:

    a) TRY THIS: Once R,t are predicted, apply them to the cloud and compute the distance to the ground truth model. Use this as part of the loss (where the best alignment has the lowest loss).
        - This way the model can also compare the actual cloud error vs just the rotation error. Since we keep the model in unnormalized space, this may be a workaround.

    b) TRY THIS: predict keypoints, then compute the pose from those keypoints (PnP-style algorithm).

    c) Still try to get centroid only (i.e. no scale) to work.

Once the "Transformer" is as good as possible, then:
    a) Explore pose refinement (some ICP with ground truth ship model (since we have it for CAD and real ship))

- keep everything in normalized for training, but maybe give it some information in addition to this info?? Just appending before pose head didn't do anything.
    - IDEA: can we make the "label" be the scale? So each scan cloud has a 4th entry which is the scale computed for that cloud? But then again, that's the same issue as not having a normalized input vector to begin with (i.e. all numbers are [-1,1] but 4th is large)
- Clip scan cloud ranges, and experiment with query_ball_point radius (in case < nsample nearest points are found) 
- Think what ways we can use the Point Transformer output? It's basically just a good cloud feature representation.
- Explore pose refinement (some ICP with ground truth ship model (since we have it for CAD and real ship))
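To make the `query_ball_point` concern concrete: a numpy sketch of a ball query that pads with the first neighbor when fewer than `nsample` points fall inside the radius (the padding behaviour of common PointNet++-style implementations). This is an assumption about how the project's version behaves, not a copy of it.

```python
import numpy as np

def query_ball_point(radius, nsample, points, centers):
    """Return [S, nsample] neighbor indices into `points` for each center.

    Assumes nsample <= len(points) and that every center has at least one
    point within `radius` (true when centers are sampled from `points`).
    """
    n = points.shape[0]
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)  # [S, N]
    idx = np.tile(np.arange(n), (centers.shape[0], 1))
    idx[d2 > radius ** 2] = n                 # sentinel for "outside the ball"
    idx = np.sort(idx, axis=1)[:, :nsample]   # in-ball indices come first
    first = np.repeat(idx[:, :1], nsample, axis=1)
    mask = idx == n
    idx[mask] = first[mask]                   # pad with the first neighbor
    return idx
```

With a small radius on a sparse scan, most of the `nsample` slots end up as copies of one point, so the local feature sees much less geometry than the config suggests.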

- ONLY THEN: try real data
    - will allow model to learn with real noise

### Next Approach - Feb 21
1. Try different ICP models between scan cloud and ground truth point cloud at inference time.
    - [x] GICP (Open3D)
    - [ ] Go-ICP (global one)
    - [ ] small_gicp
2. Incorporate alignment error into model training architecture 
    - [ ] Apply R,t to cloud and compute alignment error: (_D. 3D Rotation Regression. PoseCNN paper_)
        - [ ] error using point-to-point. Compute error btw both clouds and add it to the loss function.
        - [ ] error comparing scan and global feature. Compute feature vector of transformed cloud and ground truth cloud, where alignment of the two have lower loss (key: what does the "feature difference" look like?).
    - [ ] If not, try PointNetLK to iteratively refine R,t instead of predicting PoseLoss
    - See Deep Closest Point, PCRNet, Go-ICP, DeepVCP/DeepCLR,
    - Explore whether point-to-point or feature-to-feature is better (since we're not registering identical scans)
3. Other ideas:
    - __Resolve scale problem:__
        - concatenate scale to each point in cloud. Challenge: not normalized either
            - Sol: normalize scale dataset wide? Yes, but save the mean-std scales computed on the training dataset to apply to the scan at inference.
        - make scale prediction MLP in addition to R,t (_Disentangled 6D Pose Loss. GDR-Net paper_)
            - doesn't force the feature extraction layers to also learn scale, since it's all tied into the final estimate.
        - maybe alignment/chamfer solves it. Challenge: may be costly to compute.
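Sketch of the "normalize scale dataset wide" idea: fit mean/std on the training scales once, then append the standardized scale as a 4th column at both train and inference time. Function names and the dict layout are illustrative assumptions.

```python
import numpy as np

def fit_scale_stats(train_scales):
    """Compute mean/std of the per-cloud scales on the training set only."""
    s = np.asarray(train_scales, dtype=float)
    return {"mean": float(s.mean()), "std": float(s.std())}

def append_normalized_scale(points_unit, scale, stats):
    """Append the standardized scale as a constant 4th feature: [N, 3] -> [N, 4]."""
    z = (scale - stats["mean"]) / stats["std"]
    col = np.full((points_unit.shape[0], 1), z)
    return np.hstack([points_unit, col])
```

The stats would have to be saved next to the model checkpoint so inference uses exactly the same normalization.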


The BIG IDEA: we keep Point Transformer in unit sphere space since it can learn features fairly well, but then we a) improve refinement to be robust (deep learning registration options) and/or b) let the model learn with alignment error (in some combination of the two). __Features are more robust than points__

#### QUICK IDEAS 
25.Feb:
- add scale to 4th col point feature (with avg normalized across dataset)
- try alignment loss function (option 1, option 2)
- "intelligent" ICP

## Test List - Point Transformer

#### NOTE: 16.02-19.02 Using squared L2 norm



#### ------------------------ BEST RUN ------------------------
_2025-02-20_21-50_:
```
{'num_points': 1024, 'batch_size': 11, 'use_labels': False, 'optimizer': 'RangerVA', 'lr': 0.001, 'decay_rate': 1e-06, 'epochs': 60, 'dropout': 0.4, 'M': 4, 'K': 64, 'd_m': 512, 'alpha': 10, 'beta': 1, 'radius_max_points': 32, 'radius': 0.2, 'unit_sphere': True}

Using L2 Norm, 12 Dataset
```
#### ------------------------------------------------------------

#### Test 16.02.25 (1): add scale term to pose head input (_t only)
NOTE!!: initially forgot to multiply `predicted_translation` by `scale`

Result dir: pose_estimation/2025-02-16_15-59
Observation: loss not much better than original approach

Changes: 

- pointtransformer_pose_t.py
    ```
    self.translation_mlp = nn.Sequential(
        nn.Linear(dim_flatten + 1, 512),  # +1 input for the appended scale term
        # ... remaining layers (not shown)
    )
    ```

- within PointTransformer.forward():
    ```
    # Flatten the feature vector for MLP heads
    global_features = torch.flatten(embedding, start_dim=1)  # [B, dim_flatten]
    if scale.dim() == 1:
        scale = scale.unsqueeze(1)  # Expands shape from [B] to [B, 1]
    # Predict translation residual (normalized space)
    translation_input = torch.cat([global_features, scale], dim=1)
    ```

#### Test 16.02.25 (2): Hybrid - Only Centroid, No Scale (INCOMPLETE)


Set config.unit_sphere = False. The issue currently is that conv2d dimensions for the ball query search aren't working. 

#### Test 19.02.25 (1): Regular training run
Details: 
- ```predicted_translation = predicted_translation_residual * scale + centroid``` applied to model output in PoseLoss()
- 'alpha': 20, 'beta': 2, 'radius_max_points': 32, 'radius': 0.2

Result dir: 2025-02-19_15-10

Observation: decent R, t prediction
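For reference, a numpy sketch of the unit-sphere normalization and of the residual-to-metric conversion used in this run (`residual * scale + centroid`). The function names are illustrative; the real code lives in the dataloader and PoseLoss().

```python
import numpy as np

def normalize_unit_sphere(points):
    """Center on the centroid and scale so the farthest point has norm 1."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale, centroid, scale

def denormalize_translation(residual, centroid, scale):
    """Map a translation residual predicted in unit-sphere space back to meters."""
    return residual * scale + centroid
```

The target the network regresses is then `(t_gt - centroid) / scale`, which stays small regardless of how far the sensor is from the ship.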

#### Test 19.02.25 (2): Learning absolute translation (not residual)
Details: ```predicted_translation = predicted_translation_residual```

Result dir: 2025-02-19_15-20

Observation: terrible translation prediction

#### Test 19.02.25 (3): Regular training run
Details: 'radius_max_points': 32, 'radius': 0.1,

Result dir: 2025-02-19_20-57 (100 epoch: 2025-02-19_23-03, 'radius': 0.2)

Observation: regular

#### Test 19.02.25 (4): Regular training run SimNet2
Details: same as 2025-02-19_20-57, except on SimNet2 (i.e. 15.41 dataset)

Result dir: 2025-02-19_21-27 (100 epoch: 2025-02-19_23-05, 'radius': 0.2)

Observation: better than fewer epochs, but still not much different... a bit worse visually than 12.42 dataset.

#### Test 21.02.25 (1/2): L2 Loss (not squared L2 Loss)
Details: corrected L2 Loss with two different learning rates. Note, training uses lr scheduler.

Result dir: 2025-02-20_21-49 (for 'lr': 0.0005), 2025-02-20_21-50 (for 'lr': 0.001)

Observation: translation a bit better, but rotation not
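Why the norm choice matters, as a small numpy sketch. Assumption: both variants average over the batch; the actual reduction in PoseLoss() may differ.

```python
import numpy as np

def l2_loss(pred, gt):
    """Mean Euclidean distance per sample."""
    return np.linalg.norm(pred - gt, axis=1).mean()

def squared_l2_loss(pred, gt):
    """Mean squared Euclidean distance per sample."""
    return (np.linalg.norm(pred - gt, axis=1) ** 2).mean()
```

For a 0.1 error the squared version gives 0.01, for a 3.0 error it gives 9.0. So squared L2 barely penalizes small residuals and is dominated by a few bad samples, which could be one reason translation improved slightly after the switch.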

#### Test 21.02.25 (3):'radius_max_points': 16,'radius': 0.1
Details: tried using default params as specified in papers

Result dir: 2025-02-21_12-33

Observation: quite a lot worse than 32,'radius': 0.2. Likely a property of the clouds that are captured.

#### Test 25.02.25 (1):'lr scheduler step_size': 15, dataset = merged 12,15 (SimNet15)
Details: reduced lr scheduler step size

Dataset: using merged datasets `gazebo_pc_record_full_12_42_1024` and `gazebo_pc_record_full_15_41_1024`.

Result dir: 2025-02-25_14-01 (on MAC, 02-25_16-35)

Observation: similar to best
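What the step size changes, assuming a StepLR-style schedule (`lr = base_lr * gamma ** (epoch // step_size)`). The `gamma=0.5` value is hypothetical; the notes don't record it.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.5):
    """Learning rate at a given epoch under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

# With 60 epochs and step_size=15 the rate is cut three times (epochs 15, 30, 45);
# with step_size=30 it would be cut only once.
schedule_15 = [step_lr(0.001, e, 15) for e in (0, 15, 30, 45, 59)]
schedule_30 = [step_lr(0.001, e, 30) for e in (0, 15, 30, 45, 59)]
```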

#### Test 25.02.25 (2):Scale MLP layer
Dataset: 12

Details: lr scheduler step size=15, added scale_mlp(),  and model now predicts scale 

Changes:
- In PoseLoss()
    ```
        loss_scale = F.mse_loss(pred_scale, gt_scale)
        total_loss = (self.alpha * loss_t) + (self.beta * loss_r) + loss_scale
    ```

- in PointTransformer.forward()
    ```
        predicted_scale = self.scale_mlp(global_features)
        scale = scale.unsqueeze(1)  # Expands shape from (B,) to (B,1)
        predicted_translation = predicted_translation_residual * predicted_scale + centroid

    ```

Result dir: 2025-02-25_13-35 (on MAC, 02-25_16-00)

Observation: similar to best
TODO: try with normalized scale instead

#### Test 27.02.25 (1): Larger+more MLP layers
Dataset: 12

Details: lr scheduler step size=15, made pose MLP heads start at 4096, but should also try 1024

Result dir: 
- 4096:
    - 2025-02-27_12-06
- 1024:
    - 2025-02-27_12-11 (on MAC: 2025-02-27_13-32)

Observation: slightly worse than 16-00 and 16-35

#### Test 03.03.25 (1): New Dataset (far, normal)
Dataset: SimNet_far

Details: lidar samples=512, using truncated normal dist.

Result dir: 
- 2025-03-02_14-49

Observation: No real difference

#### Test 03.03.25 (2): New Dataset (close, normal)
Dataset: SimNet_close

Details: lidar samples=512, using truncated normal dist. reduced theta for C points so we don't cover side of ship as much. 

Result dir: 
- 2025-03-03_12-02 (60 epochs) (on MAC: 2025-03-03_13-48)
- 2025-03-03_14-21 (120 epochs) (on MAC: 2025-03-03_19-35)

Observation: 
- 60 epochs: roll usually better, since flight deck is large flat surface (easy to align)
    - GICP using closest point, which doesn't help translation alignment (14/500)
    - bad one: 13/500
- 120 epochs: marginally better

#### Test 03.03.25 (3): Merged Dataset (close+far, normal)
Dataset: SimNet_close + SimNet_far

Details:  

Result dir: 
- 2025-03-03_14-23 (on MAC: 2025-03-04_09-54)

Observation: 

## Test List - KPLoss

#### Test 14.03.25 (1): first key point prediction attempt (SCALE??). 
Dataset: SimNet_close

Details: DataLoader keypoints global scale, PointTransformer forward() *scale+centroid when return pred_keypoints

Result dir: 
- 2025-03-14_15-23 (on MAC: 2025-03-14_16-06)

Observation: best so far, can't reproduce?

#### Test 14.03.25 (2): global scale for keypoints
Dataset: SimNet_close

Details: DataLoader keypoints -centroid/scale. PointTransformer forward() *scale+centroid before loss

Result dir: 
- 2025-03-14_16-53 (on MAC: )

Observation: not decreasing

#### Test 14.03.25 (3): unit scale for keypoints
Dataset: SimNet_close

Details: DataLoader normalized keypoints, points. PointTransformer forward() doesn't scale back before loss computation (i.e. stay in unit scale)

Result dir: 
-  (on MAC: )

Observation: not decreasing

#### Test 19.03.25 (1): unit scale for keypoints (actually correct)
Dataset: SimNet_close

Details: Not using scale, centroid. all normalized for now.

Result dir: 
-  (on MAC: 2025-03-19_11-04)

Observation: shape preserved interestingly, not pose alignment??

#### Test 19.03.25 (2): unit scale for keypoints, stern+doghouse kp
Dataset: SimNet_close

Details: loss function with 1 1 0.3 1.2 2.5 for each of 5 terms (with weird "size loss" term)

Result dir: 
-  2025-03-19_12-28 (on MAC: 2025-03-19_14-37)

Observation: best keypoint prediction so far, but not rectangular

#### Test 23.03.25 (1): just MLP for keypoint prediction, different weights. doghouse kp
Dataset: SimNet_close

Details: total loss around 0.002
- self.alpha = 1  # Class (cross entropy) scaling (not used really)
- self.beta = 3   # Smooth L1 scaling for regression
- self.delta = 5 # rotation loss scaling
- self.epsilon = 6 # center loss scaling

Result dir: 
-  2025-03-23_14-07 (on MAC: 2025-03-23_14-43)

Observation: lower with huber loss (smooth_l1_loss) than MSE loss
- rot_loss=0.000x, but keypoint and center loss both around 0.002. Predictions are "around" the doghouse, but not good at all, or rectangular.
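Caveat on comparing the two loss values, as a numpy sketch. It uses the standard smooth L1 definition with `beta=1.0` (quadratic below beta, linear above); whether the training code uses that default is an assumption.

```python
import numpy as np

def smooth_l1(pred, gt, beta=1.0):
    """Smooth L1 / Huber-style loss: quadratic for |d| < beta, linear otherwise."""
    d = np.abs(pred - gt)
    return np.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta).mean()

def mse(pred, gt):
    """Mean squared error."""
    return ((pred - gt) ** 2).mean()
```

For residuals below `beta` (which is the regime here, with losses around 0.002), smooth L1 is exactly half the MSE. So a lower number under smooth_l1_loss does not by itself mean the keypoints are closer; the two are only comparable after accounting for that factor.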

KPLoss: "This is the correct architectural match for your problem. The classification-style output (flatten + MLP) is fundamentally mismatched to keypoint prediction because it discards spatial structure that the Point Transformer is designed to preserve."

## Test List - Decoder Attention Loss

#### Test 02.04.25 (1): keypoints predicted using Decoder idea (with per-point learning like in partseg)
Dataset: SimNet_close, keypoints_st_dg_few

Details:
    'alpha': 2,
    'beta': 5,
    'gamma': 3,
    'delta': 1,
    'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
-  2025-04-02_17-11 (on MAC: 2025-04-02_19-14)
- TODO: accidentally deleted 2025-04-02_17-11 result dir. Rerun if desired.

Observation: Best keypoint prediction yet. Box shape + orientation very close to ground truth.

#### Test 03.04.25 (1): keypoints predicted using Decoder idea (with per-point learning like in partseg). Same as above, but with 140 keypoints
Dataset: SimNet_close, keypoints (140 doghouse)

Details:
    'alpha': 2,
    'beta': 5,
    'gamma': 3,
    'delta': 1,
    'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
-  2025-04-03_11-34 (on MAC: 2025-04-03_13-35)

Observation: Orientation not as good as 40 prediction, but shape preserved fairly well. Can play with hyperparams? But I think there are just too many points, and the doghouse may not be well represented in the scans (compared to more)

#### Test 03.04.25 (2): same as Test 02.04.25 (1), but with different config hyperparams 
Dataset: SimNet_close, keypoints (140 doghouse)

Details: 
    'alpha': 4,
    'beta': 6,
    'gamma': 5,
    'delta': 1,
    'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
-  2025-04-03_11-58 (on MAC:)

Observation: similar to first, but a bit worse? TBD
        

#### REDO FIRST (i.e. Test 02.04.25 (1))

except following changes:
step_size=30
100 epochs

Result Dir: 2025-04-03_13-56 (On MAC: 2025-04-04_15-19)

Observation: BEST SO FAR (Keypoint alignment when reprojected in plot_kp_pred.py)

### FIX: z coordinate of cad_keypoints now moved -0.5m
Issue was that the points in ```stl_keypoints_cfg``` (which is parsed by keypoint_label_points.py) weren't shifted down. Thus, the resulting ```cad_keypoints_40_cfg_st_dg_few.txt``` was also not shifted down (to match the gazebo model being 0.5m below the z=0 plane where the sensor origin is set).

#### Test 08.04.25 (1): with pose_loss and centroid loss (new)
Dataset: SimNet_close, keypoints_st_dg_few (40)

Details: 
    'alpha': 2,
    'beta': 3,
    'gamma': 5,
    'delta': 0.0005,
    'epsilon': 1,
alignment loss: smooth_l1_loss

NOTE: __beta is now centroid Loss__

Result dir: 
- 2025-04-08_16-41 (on MAC: 2025-04-08_17-28)

Observation: rot perfect, translation not

#### Attempt 2:
            'alpha': 2.5,
            'beta': 10,
            'gamma': 3.5,
            'delta': 0.005,
            'epsilon': 1,

Result Dir: 2025-04-08_17-49

#### Test 08.04.25 (2): with pose_loss and procrustes loss (old)
Dataset: SimNet_close, keypoints_st_dg_few (40)

Details: 
    'alpha': 2,
    'beta': 4,
    'gamma': 3.5,
    'delta': 0.0005,
    'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
- 2025-04-08_16-45 (on MAC: 2025-04-08_17-32)

Observation:

GitHub: see commit mentioning procrustes

#### Test 09.04.25 (1): with pose_loss and centroid loss, corrected cad kp frames (see Pose Estimation from LiDAR chat)
Dataset: SimNet_close, keypoints_st_dg_few (40)

Details: 
    'alpha': 2.5,
    'beta': 10,
    'gamma': 3.5,
    'delta': 0.005,
    'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
- 2025-04-09_14-40 (on MAC: )

Observation: 

#### Test 09.04.25 (2): like (1), but without pose_loss
Dataset: SimNet_close, keypoints_st_dg_few (40)

Details: 
    'alpha': 2.5,
    'beta': 10,
    'gamma': 3.5,
    'delta': 0.0,
    'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
- 2025-04-09_14-58 (on MAC: )

Observation: 

#### Test 10.04.25 (1): kabsch, centroid loss
Dataset: SimNet_close, keypoints_st_dg_few (40)

Details: 
            'alpha': 4,
            'beta': 2,
            'gamma': 2,
            'delta': 0,
            'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
- 2025-04-10_18-09 (on MAC: 2025-04-10_19-48)

Observation: bad R, t (issue after keypoints predicted, but keypoints look close)
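Since the keypoints look close but R, t come out wrong, the Kabsch solve is the step to check. A minimal numpy sketch, assuming `source` are the CAD keypoints (ship frame) and `target` the predicted keypoints (sensor frame) in matching order; names are illustrative.

```python
import numpy as np

def kabsch(source, target):
    """Least-squares rigid transform with target ~= source @ R.T + t."""
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (source - src_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Translation must use the rotated source centroid, not the raw one.
    t = tgt_c - R @ src_c
    return R, t
```

Two easy mistakes: computing `t = tgt_c - src_c` (ignores the rotation of the centroid) and swapping source/target, which returns the inverse transform.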

#### Test 10.04.25 (2): kabsch, procrustes loss (no pose_loss delta)
Dataset: SimNet_close, keypoints_st_dg_few (40)

Details: 
            'alpha': 2,
            'beta': 4,
            'gamma': 5,
            'delta': 0.0,
            'epsilon': 1,
alignment loss: smooth_l1_loss

Result dir: 
- 2025-04-10_19-57 (on MAC: 2025-04-11_11-26)

Observation: 

### R,t pred in results.json working
Everything working in training (corrected Kabsch for translation)

            'alpha': 2,
            'beta': 4,
            'gamma': 5,
            'delta': 0.0001,
            'epsilon': 1,
2025-04-11_19-24 (On MAC: 2025-04-11_20-31)
Training crashed at epoch 19 (val loss went to nan)

            'alpha': 5,
            'beta': 4,
            'gamma': 7,
            'delta': 0.0,
            'epsilon': 1,
2025-04-11_19-50 (On MAC: 2025-04-13_12-27 (NO GICP) AND 2025-04-16_13-57 (WITH GICP)) 
__BEST ONE__

MAKE SURE TO UPDATE test_pose.py hyperparams to match those used for the selected best_model.pth

## REAL DATA

### Real Dataset: ScanNet
2025-05-05_16-44 (inference ON MAC: 2025-05-06_15-31)
- using simple keypoint + pairwise loss
- code saved under train_pose_real.py
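One plausible reading of "keypoint + pairwise loss", sketched in numpy: a per-keypoint position term plus a term that matches the distances between all keypoint pairs (shape preservation). This is an assumption about the loss in train_pose_real.py, and the weights `w_kp`, `w_pair` are hypothetical.

```python
import numpy as np

def pairwise_distances(kp):
    """[K, 3] keypoints -> [K, K] matrix of Euclidean distances."""
    diff = kp[:, None, :] - kp[None, :, :]
    return np.linalg.norm(diff, axis=2)

def keypoint_pairwise_loss(pred, gt, w_kp=1.0, w_pair=1.0):
    """Position error plus mismatch in inter-keypoint distances."""
    kp_term = np.linalg.norm(pred - gt, axis=1).mean()
    pair_term = np.abs(pairwise_distances(pred) - pairwise_distances(gt)).mean()
    return w_kp * kp_term + w_pair * pair_term
```

The pairwise term is invariant to rigid motion, so it only constrains the shape of the predicted keypoint set; the position term is what ties it to the ship.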

### Sim Dataset: SimNet_close
2025-05-05_15-52
- using simple keypoints + pairwise loss

### TRAINED ON SIMULATED, INFERENCE ON REAL
On MAC: pose_est_inference/2025-05-08_11-56 (using 2025-05-05_15-52/best_model.pth for test_pose.py, but ScanNet Dataset)
- Observations: 
    - Only changed .pth for test_pose.py. When running on Mac (```results/trans_rot_error.py``` and ```results/viz_results_json```, only changed results.json file), results were worse. However, I didn't do any pre-training or check whether I need to change anything else (like using STL model somewhere instead of yp cloud?), etc. TODO for later if he wants me to show that we can train with simulation ENTIRELY, and run inference on real scans (which also have noise that simulated training wasn't exposed to).
    - With lots of noise like water and people, the model struggles since those points aren't in the simulated training data. Since I don't do any localization of the ship in the environment for now, this is likely where things fail. 
    - Can also try augmenting

## GETTING READY TO WRITE:
ON MAC
- viz_results_json.py: plots the predicted and refined R,t using the results.json file as input for all points, scale/centroid, keypoints and predicted/refined poses. Basically visualizes the model inference results
    - FOR THESIS, change ```init_T_target_source``` to be ```np.eye(4)``` to show that the initial guess from Point Transformer is crucial. Basically shows incorrect alignments of GICP.
- sim_model_gt_w_noise.py: 
    - manually check single scan (in trans_error/ directory)
    - Note that "w_noise" isn't really applicable anymore in this file. Originally, I added noise to the gt keypoints since I didn't have predictions yet. Now, however, I read directly from the results.json copied from the server after test_pose.py runs.
        - Global methods all take very long (because they never reach the converged pose, especially for sparse side scans), so fast keypoint prediction and refinement is best.

### small_gicp observations:
- on MAC, takes 0.03sec
- side scans fail worse with keypoint prediction because doghouse/stern (which are the keypoint-defined regions) don't appear in the scan much. As such, it has little to work with.
    - direct pose regression may not have been much worse, but it's hard to visualize/understand and may be more sensitive to noise? TEST THIS OUT FIRST by going back in commit history? Or just check the saved model
        - check if keypoints has better translation error (x direction for example)? I notice that even a bad keypoint transformer prediction seems to align doghouse/net better

### Next Steps:
- try Simnet Close + Far datasets (need to generate keypoints for that again)

BIG DIFFERENCE SIM vs REAL Dataset: ship is 0.5m down in simulated

### FUTURE WORK:
- more keypoints (i.e. more ship regions). 
    - add classification to predict ship region in scan and THEN predict keypoints. This way model can predict specific ship keypoints based just on "what it sees".
- better real-world dataset (DLIO from UCLA)
    - concerned about real-world ground truth keypoint positions... need to see whether too much error is bad, or if the error still works with the refinement. mainly a dataset issue I presume.
    - HARD IDEA: instead of fixed keypoints, maybe just use "what's already there" and somehow make them dynamic keypoints? i.e. ground truths are hard and may not align too well with . Should use IMU instead? The initial "alignment" error in my manual frame correction continues for the remainder of the whole real-world flight.
- TO HANDLE REAL_WORLD NOISE, learn to segment the ship (i.e. isolate the ship from the scan better, then apply point-transformer to those points (like Frustum points paper does))
- LATER TODO: explore KeypointDETR paper which can get keypoints for other, "unknown" ship classes?
- SEMI-QUICK IMPROVEMENT: make sure all scans are "similarly scaled"... meaning no super far outlier (like single point at bow of ship) that skews the centroid and thus the SortNet query ball radius neighborhood search.

- challenges: normalizing for point transformer kills "global" translation scale and model has to infer it solely from scan cloud... which it can't do perfectly. Keypoints are more reliable

### Keypoint Training Instructions:
Current Keypoint Datasets:
- 40 keypoints
    - Details: 20 stern, 20 doghouse
    - Per-scan keypoints in sensor frame: 
    ```data/SimNet_close/gazebo_pc_record_os0_rev06-32_r8_seed42_r_C_normal_mu6_std5_filt1.2_thetaBC-ov3_thetaBF-ov5_rf4-7_labeled_1024/keypoints_st_dg_few/```
    - CAD keypoints in ship frame: ```data/cad_keypoints_40_cfg_st_dg_few.txt```
- 140 keypoints
    - Details: 140 doghouse
    - Per-scan keypoints in sensor frame:
    ```/home/karlsimon/point-transformer/data/SimNet_close/gazebo_pc_record_os0_rev06-32_r8_seed42_r_C_normal_mu6_std5_filt1.2_thetaBC-ov3_thetaBF-ov5_rf4-7_labeled_1024/keypoints```
    - CAD keypoints in ship frame: ```data/cad_keypoints_140```
- To change the keypoints used for training

Code Changes:
train_pose.py/test_pose.py:
- config['num_keypoints']
- cad_keypoint_file

SimNetDataLoader:
- keypoints_dir


### Keypoint Training Ideas:
- Use dimensions from the ground truth point cloud somehow as part of the keypoint losses? I.e. examine again how the keypoints learned, and whether supervision from the ground truth model would help at all? See "Keypoint Prediction for 6DoF" Chat for history.

- TODO: try with MSE instead of Huber loss

## SIM-TO-REAL Training: see config.txt next to train results for hyperparams

### Sim-To-Real Research Goal (I):
- Train with SimNet simulation data, and test with ScanNet_train.

Train: 2025-05-15_17-03 
Test: 3 GICP iters in test_pose.py (On MAC: 2025-05-16_11-54)

Test with ScanNet_test: On MAC: 2025-05-16_14-01

### Sim+Real Training (II):
Pre-train using (I) (i.e. just SimNet). Continue training with ScanNet_train (only 15% of training data used though). Then test with ScanNet_test.

Pre-Train: 2025-05-16_13-26 (On MAC: 2025-05-16_13-59)

### Real Training (III)
Trained on ScanNet_train ONLY (no simulation). Tested on ScanNet_test.

2025-05-15_17-14 (On MAC: 2025-05-16_12-24)


### TODOS:

21.03: add more keypoints, change weights of loss, try with only doghouse

IDEA: fixed box of known size, and its vertices must be placed on the ship. The keypoints are constrained to be this size (how to define this constraint)??
- somehow use the area of the box formed by the key points (IoU?) using a graph-based approach? This way the points that are predicted are constrained
- The ball query still is picking up different features depending on whether the scan cloud is more spread out or not. when scaled and centered to unit sphere, i.e. 0.1m may correspond to different ranges in different scans.
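To put numbers on the last bullet: after unit-sphere scaling, a fixed ball-query radius covers `radius * scale` meters, where `scale` is the farthest point's distance from the centroid. Sketch with a hypothetical helper name.

```python
import numpy as np

def metric_radius(points, radius_unit=0.2):
    """Physical size (in the cloud's units) of a unit-sphere ball-query radius."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return radius_unit * scale

# Two toy scans: one spanning 20 m, one spanning 100 m.
compact_scan = np.array([[-10.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
spread_scan = np.array([[-50.0, 0.0, 0.0], [50.0, 0.0, 0.0]])
r_compact = metric_radius(compact_scan)   # 0.2 * 10 m
r_spread = metric_radius(spread_scan)     # 0.2 * 50 m
```

So the same `'radius': 0.2` groups a 2 m neighborhood in one scan and a 10 m neighborhood in the other, and a single far outlier is enough to inflate `scale`.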

On MAC: stl_keypoints_increase.py will take ship configuration and add points between the "sparse" bounding boxes. This writes to ship_structure_subdivided.json, which can be copied to stl_keypoints_cfg (making sure to paste the json results after "ship_structure = " since the *_increase.py file writes to the json only starting with the '{').

## Test List - GICP

#### ------------------------ BEST RUN ------------------------

#### Using _2025-02-20_21-50_ (i.e. 2025-02-21_12-22 on MAC after running test_pose.py):
- Uses TransformationEstimationForGeneralizedICP with max_iteration=1

- Under `model_ouput/2025-02-21_12-22`
    - fitness_rmse_15_fit_rmse.txt and refined_results fitness_rmse_15_fit_rmse.json
#### ------------------------------------------------------------


## Test List - Key Point Prediction

Adding keypoints to SimNetDataLoader with one-hot label depending on the region that the keypoints enclose. Normalized the same way as the scan.

Architecture: Point Transformer to learn features:  


## Pose Estimation Refinement: Two Training Strategies

### Option 1: Point-Level Alignment (Chamfer Distance)
#### Overview:
- After predicting the pose (R, t), apply it to transform the scan cloud.
- Compute alignment loss between the transformed scan and the ground truth ship cloud.
- Loss penalizes misalignment, improving pose predictions over training.

#### Steps:
1. Extract scan features using Point Transformer.
2. Predict pose (R, t) using Pose MLP.
3. Apply the predicted transformation to the scan cloud.
4. Compute Chamfer Distance between transformed scan and ship cloud.
5. Backpropagate alignment loss to refine pose predictions.

#### Loss Function:
- Pose loss (direct R, t error).
- Chamfer Distance loss (point-wise alignment error).
- Total loss combines both.

#### Key Benefits:
- Simple to implement and integrates well into direct pose regression.
- Improves translation prediction, especially with unit scaling issues.
- Does not require a learned feature vector for the ship model.
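A brute-force numpy sketch of the Option 1 loss, assuming squared-distance Chamfer and a simple weighted sum with the pose loss. The `weight` parameter, the `one_sided` flag, and the function names are illustrative; a training version would need a differentiable (PyTorch) equivalent.

```python
import numpy as np

def chamfer_distance(a, b, one_sided=False):
    """Chamfer distance between [N, 3] and [M, 3] clouds (squared distances)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)  # [N, M]
    a_to_b = d2.min(axis=1).mean()
    if one_sided:
        return a_to_b
    return a_to_b + d2.min(axis=0).mean()

def alignment_loss(scan, ship, R, t, pose_loss, weight=1.0, one_sided=True):
    """Pose loss plus Chamfer term on the scan transformed by the predicted pose."""
    transformed = scan @ R.T + t
    return pose_loss + weight * chamfer_distance(transformed, ship, one_sided)
```

Because a scan only covers part of the ship, the symmetric form also penalizes every ship point that the scan never saw. The scan-to-ship direction alone (`one_sided=True`) may be the better fit here. The N x M distance matrix is also where the cost concern comes from.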

---

### Option 2: Feature-Based Alignment (Global Feature Comparison)
#### Overview:
- Instead of point-wise alignment, compare feature vectors between the transformed scan and the ship.
- Forces the network to learn feature spaces where aligned scans and ships are similar.

#### Steps:
1. Extract scan features using Point Transformer.
2. Predict pose (R, t) using Pose MLP.
3. Apply the predicted transformation to the scan cloud.
4. Extract a global feature vector from the transformed scan.
5. Extract a fixed global feature vector for the ship cloud.
6. Compute alignment loss based on feature similarity.
7. Backpropagate to refine both feature learning and pose prediction.

#### Loss Function:
- Pose loss (direct R, t error).
- Feature distance loss (L2 difference between transformed scan and ship features).
- Total loss combines both.

#### Key Benefits:
- Helps the model generalize better by enforcing feature-based alignment.
- Encourages the Point Transformer to extract features that naturally align after transformation.
- Reduces the need for iterative refinement at inference.

---

### Summary of Differences
| Feature | Option 1: Chamfer Distance | Option 2: Feature Alignment |
|---------|---------------------------|----------------------------|
| **Alignment Type** | Point-wise (geometry-based) | Feature-space (embedding-based) |
| **Ship Model Representation** | Fixed point cloud (no learned features) | Learned feature representation |
| **Loss Supervision** | Chamfer Distance between point clouds | L2 difference between feature vectors |
| **Computational Complexity** | Moderate | Higher due to feature extraction for ship model |
| **Training Effect** | Directly improves pose prediction | Also improves feature extraction for better pose alignment |

Both approaches refine pose predictions by integrating alignment feedback into training. Option 1 is easier to implement, while Option 2 may provide better generalization.


### Overfitting?
"When the validation loss stops decreasing, while the training loss continues to decrease, your model starts overfitting. This means that the model starts sticking too much to the training set and looses its generalization power. "