<a href="https://colab.research.google.com/github/kuock0129/GPU-Accelerated-3D-Machine-Learning/blob/main/HW1/3DMLGPU_HW1_Part2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 3DMLGPU - HW1 (Part 2)
In the second part of the HW1 assignment, we continue our study of 3D ML datasets. We will work with objects directly in their mesh representation and also convert them into other representations, such as a sampled point cloud and a voxel grid.

Regarding Q&A: Put your answers, results, and analysis in a separate Word/PDF report and submit it along with this completed notebook. We expect brief, to-the-point answers to the questions asked in this notebook.

Note: You may not be able to run everything in one sitting. We recommend connecting your Google Drive so that you can periodically save model weights, visualizations, and results. Complete one section at a time.


## Objaverse dataset
Objaverse is a very large collection of 3D objects (over 800,000 models in Objaverse 1.0). Because of resource and time constraints, this assignment uses a manageable subset of object categories rather than the entire dataset. We will use the Objaverse Python API (via Hugging Face) to fetch object metadata and data. First, let's install and import the libraries needed to access Objaverse and process 3D models (including point cloud sampling).

In [None]:
!pip install -q objaverse trimesh

In [None]:
import numpy as np
import trimesh
import torch

Now, we use the Objaverse API to load the dataset annotations. Objaverse provides a mapping of LVIS categories (a set of ~1200 object classes from an image dataset) to lists of object UIDs (unique identifiers for each 3D model). We will use this to understand the class coverage of Objaverse and to pick a subset of classes for our training.

In [None]:
import objaverse

# Load all object UIDs (just to know how many objects are in Objaverse)
uids = objaverse.load_uids()
print(f"Total number of objects in Objaverse: {len(uids)}")

# Load the LVIS category annotations (may take a few minutes)
lvis_annotations = objaverse.load_lvis_annotations()
print(f"Total number of LVIS categories in Objaverse: {len(lvis_annotations)}")

# Example: list a few categories and the number of objects in each
categories = list(lvis_annotations.keys())
print("Sample categories and object counts:")
for cat in categories[:5]:
    print(f"  {cat}: {len(lvis_annotations[cat])} objects")


## Task-4, Part-1 (10 points)
Select any 2 categories from `lvis_annotations`, visualize a sample point cloud and a voxel grid from each, and answer the following questions:
- Which library did you use to visualize the point cloud?
- How many object categories are there in Objaverse (LVIS annotations)? Are the categories balanced?
- What file format are the models stored in?
- Any other insights?
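
One way to approach the balance question is to look at the distribution of objects per category. The sketch below is only an illustration: `summarize_category_balance` is a hypothetical helper, and `toy_annotations` is a made-up stand-in with the same shape as `lvis_annotations` (category name mapped to a list of UIDs).

```python
from statistics import median

def summarize_category_balance(annotations):
    """Return simple balance statistics for a {category: [uid, ...]} mapping."""
    counts = {cat: len(uids) for cat, uids in annotations.items()}
    sizes = sorted(counts.values())
    return {
        "num_categories": len(counts),
        "min": sizes[0],
        "median": median(sizes),
        "max": sizes[-1],
        "largest": max(counts, key=counts.get),
        "smallest": min(counts, key=counts.get),
        # Ratio of the largest to the smallest class; 1.0 means perfectly balanced.
        "imbalance_ratio": sizes[-1] / max(sizes[0], 1),
    }

# Toy stand-in for lvis_annotations (hypothetical UIDs, not real Objaverse data).
toy_annotations = {
    "chair": ["a", "b", "c", "d", "e", "f"],
    "lamp": ["g", "h", "i"],
    "mug": ["j"],
}
stats = summarize_category_balance(toy_annotations)
print(stats)
```

Passing the real `lvis_annotations` dictionary instead of the toy one should give the numbers needed for the report; a large `imbalance_ratio` suggests the categories are far from balanced.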


In [None]:
# [TODO] Your code here: select 2 categories, download one sample object from each,
# convert it to a point cloud and a voxel grid, and visualize both.

# HINT:
# - Choose a category, e.g., category_name = "chair" (keys are case-sensitive; ensure it exists in lvis_annotations).
# - Pick one UID from lvis_annotations[category_name].
# - Download the object: paths = objaverse.load_objects([sample_uid]); path = paths[sample_uid]
# - Load the mesh (e.g., using trimesh): import trimesh; mesh = trimesh.load(path)
#   (If the mesh is a scene with multiple parts, you may need to merge them into one mesh.
#    e.g., mesh = trimesh.util.concatenate(list(mesh.geometry.values())) if mesh is a scene.)
# - Sample points: pts = mesh.sample(NUM_POINTS)  # e.g., NUM_POINTS = 1024
# - Visualize pts in 3D: you can use plotly.graph_objects.Scatter3d, or matplotlib 3D, etc.

# [TODO] Your code here

[TODO]: Your analysis here

## Task-4, Part 2 (13 points): Data Preparation

Now we will prepare a dataset of point clouds and labels from Objaverse for training our models. Choose a subset of object categories for the classification task: select 10 categories that each contain a large number of objects, so that every class has sufficient training data, and explain the reasoning behind your selection. List your chosen categories. Using `lvis_annotations`, gather all object UIDs for those categories. Next, split the objects in each chosen category into a training set and a test/validation set. A common split is 80% for training and 20% for testing, but you may choose another appropriate split. Split within each category so that every category appears in both sets, and make sure that no test object also appears in the training set. For each object in the training and test sets:

1. Download the object file using objaverse.load_objects.

2. Load the mesh and sample a point cloud of a fixed size (e.g., 1024 or 2048 points) from the surface of the mesh. You should also apply preprocessing such as normalizing the point cloud (e.g., centering at the origin and scaling to unit sphere) so that all objects have a similar scale and position.

3. Convert the sampled point cloud into a dense voxel grid representation.

4. For a few object instances, generate 2D visualizations of the object in all three representations: mesh, point cloud, and voxel grid. Present the figures side by side in one plot, with appropriate captions/labels.
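
Steps 2 and 3 can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not a required implementation: the helper names `normalize_to_unit_sphere` and `voxelize` are hypothetical, the grid resolution of 32 is an arbitrary choice, and a synthetic Gaussian cloud stands in for the output of `mesh.sample(...)`.

```python
import numpy as np

def normalize_to_unit_sphere(points):
    """Center an (N, 3) point cloud at the origin and scale it into the unit sphere."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1).max()
    # Guard against degenerate clouds where every point coincides.
    return centered / radius if radius > 0 else centered

def voxelize(points, resolution=32):
    """Convert points lying in [-1, 1]^3 into a dense boolean occupancy grid."""
    # Map [-1, 1] to [0, resolution) and clip so points at +1 land in the last cell.
    idx = np.floor((points + 1.0) / 2.0 * resolution).astype(np.int64)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution, resolution, resolution), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=2.0, size=(1024, 3))  # synthetic stand-in for sampled surface points
pts = normalize_to_unit_sphere(raw)
grid = voxelize(pts, resolution=32)
print(grid.shape, int(grid.sum()), "occupied voxels")
```

Because the cloud is normalized first, the same fixed grid bounds work for every object, which is one practical reason to normalize before voxelizing.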

(Optional but recommended): You may also compute point normals if needed, but for this assignment we will primarily use just the coordinates (XYZ) as input features. PointNet/PointNet++ can use normals as additional features, but handling normals is not required here.

(Optional but recommended) Data augmentation: Consider applying random transformations to the point clouds for training robustness – for example, random rotations about the upright axis, slight jitter/noise to points, etc.
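
The two augmentations mentioned above could look roughly like the following. This is a sketch under assumptions: `augment_point_cloud` is a hypothetical helper, the upright axis is assumed to be Y (`up_axis=1`), which you should verify for your objects, and the jitter scale and clip values are illustrative defaults rather than tuned settings.

```python
import numpy as np

def augment_point_cloud(points, rng, up_axis=1, jitter_sigma=0.01, jitter_clip=0.05):
    """Rotate an (N, 3) cloud by a random angle about `up_axis`, then add clipped Gaussian jitter."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    # Build a rotation that leaves `up_axis` fixed and rotates the other two axes.
    a, b = [i for i in range(3) if i != up_axis]
    rot = np.eye(3)
    rot[a, a], rot[a, b] = c, -s
    rot[b, a], rot[b, b] = s, c
    rotated = points @ rot.T
    # Small per-point noise, clipped so no point moves more than `jitter_clip` per axis.
    noise = np.clip(rng.normal(0.0, jitter_sigma, size=points.shape), -jitter_clip, jitter_clip)
    return (rotated + noise).astype(np.float32)

rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(1024, 3))  # synthetic stand-in for a normalized point cloud
augmented = augment_point_cloud(cloud, rng)
print(augmented.shape, augmented.dtype)
```

Augmentation like this is normally applied only to training samples, and freshly on each epoch, so that the test set stays fixed and comparable across runs.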


In [None]:
# [TODO] Your code here: Prepare the data
# Steps:
# 1. Select subset of categories
selected_categories = []  # [TODO] fill in, e.g., "chair", "table", "lamp", ... (each must be a key of lvis_annotations)

# 2. Collect UIDs for each selected category
selected_uids = {cat: list(lvis_annotations[cat]) for cat in selected_categories}  # copy so shuffling does not alter lvis_annotations

# 3. Split into train/test lists for each category
train_uids = []
test_uids = []
train_labels = []
test_labels = []

# [TODO] your code here
# delete the below commented portion
# for label, cat in enumerate(selected_categories):
#     uids_list = selected_uids[cat]
#     # Shuffle and split
#     import random
#     random.shuffle(uids_list)
#     split_idx = int(0.8 * len(uids_list))
#     train_ids_cat = uids_list[:split_idx]
#     test_ids_cat = uids_list[split_idx:]
#     # Add to global lists with label
#     train_uids.extend(train_ids_cat)
#     test_uids.extend(test_ids_cat)
#     train_labels.extend([label] * len(train_ids_cat))
#     test_labels.extend([label] * len(test_ids_cat))

print(f"Total training samples: {len(train_uids)}; Total test samples: {len(test_uids)}")

# 4. Download and process training objects
NUM_POINTS = 1024  # number of points to sample per object
train_points = []
# [TODO] your code here
# for uid in train_uids:
#     path = objaverse.load_objects([uid])[uid]
#     mesh = trimesh.load(path)
#     # If the object is a scene with multiple geometries, merge them into a single mesh
#     if isinstance(mesh, trimesh.Scene):
#         mesh = trimesh.util.concatenate([trimesh.Trimesh(vertices=g.vertices, faces=g.faces)
#                                          for g in mesh.geometry.values()])
#     # Sample points uniformly from the surface
#     points = mesh.sample(NUM_POINTS)
#     # Normalize the point cloud (center and scale)
#     points = points - points.mean(axis=0)
#     points = points / np.max(np.linalg.norm(points, axis=1))
#     train_points.append(points.astype(np.float32))

# 5. Download and process test objects (similar to training, but no need for augmentation)
test_points = []
# [TODO] your code here
# for uid in test_uids:
#     path = objaverse.load_objects([uid])[uid]
#     mesh = trimesh.load(path)
#     if isinstance(mesh, trimesh.Scene):
#         mesh = trimesh.util.concatenate([trimesh.Trimesh(vertices=g.vertices, faces=g.faces)
#                                          for g in mesh.geometry.values()])
#     points = mesh.sample(NUM_POINTS)
#     points = points - points.mean(axis=0)
#     points = points / np.max(np.linalg.norm(points, axis=1))
#     test_points.append(points.astype(np.float32))

# Convert to PyTorch tensors and create DataLoaders
train_points = torch.tensor(np.array(train_points))  # shape: (N_train, NUM_POINTS, 3)
train_labels = torch.tensor(np.array(train_labels))
test_points = torch.tensor(np.array(test_points))
test_labels = torch.tensor(np.array(test_labels))

train_dataset = torch.utils.data.TensorDataset(train_points, train_labels)
test_dataset = torch.utils.data.TensorDataset(test_points, test_labels)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

print("Data preparation done.")
print(f"Train loader batches: {len(train_loader)}; Test loader batches: {len(test_loader)}")


Questions (analysis): Now answer the following based on your data preparation:

1. Which categories did you select for training, and how many samples does each have? (Provide the count per category in your training set.)

2. Describe the preprocessing steps you applied to the raw 3D data. Why is it important to normalize the point clouds (centering and scaling) before training?

3. Did you apply any data augmentations to the training point clouds? If yes, what types and why? If not, what augmentations could be beneficial for point cloud data?

4. Discuss any challenges you encountered in converting Objaverse models to point clouds (for example, issues with certain file types, very complex models, etc., and how you handled them).

## Task-5, Part 1 (15 points): Fine-tuning PointNet
We will now load a pre-trained PointNet model and fine-tune it on our Objaverse subset. In this part, we focus on object classification.


In [None]:
# Clone the PointNet/PointNet++ repository (if not already done)
!git clone https://github.com/yanx27/Pointnet_Pointnet2_pytorch.git
%cd Pointnet_Pointnet2_pytorch


Now, let's set up the PointNet model for classification. If you have weights from a previous training run (e.g., a model trained on ModelNet40 or ShapeNet), load them. If not, you can initialize from scratch.
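
One caveat when loading weights: a checkpoint trained with a different number of classes (for example, 40) has a final layer whose shape does not match a 10-class model, and a strict `load_state_dict` raises a `RuntimeError` in that case. A common workaround, sketched below with toy networks, is to copy only the tensors whose names and shapes match and leave the rest at their fresh initialization. `load_matching_weights` and `make_net` are hypothetical names; some checkpoints also wrap the weights inside a larger dictionary, so inspect the loaded object's keys before using it.

```python
import torch
import torch.nn as nn

def load_matching_weights(model, state_dict):
    """Copy only tensors whose name and shape match the model; return the skipped keys."""
    own = model.state_dict()
    compatible = {k: v for k, v in state_dict.items()
                  if k in own and v.shape == own[k].shape}
    skipped = sorted(set(state_dict) - set(compatible))
    # strict=False tolerates the keys that were deliberately left out.
    model.load_state_dict(compatible, strict=False)
    return skipped

def make_net(num_classes):
    """Toy stand-in for a classifier whose last layer depends on the class count."""
    return nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, num_classes))

pretrained = make_net(40)  # plays the role of a 40-class pre-trained model
finetune = make_net(10)    # plays the role of our 10-class model
skipped = load_matching_weights(finetune, pretrained.state_dict())
print("Skipped (left at random init):", skipped)
```

The skipped keys correspond to the classification head, which has to be learned from scratch on the new categories anyway.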

In [None]:
import torch
from models import pointnet_cls

# Initialize PointNet model for classification with the number of classes in our subset
num_classes = len(selected_categories)
pointnet_model = pointnet_cls.get_model(num_classes, normal_channel=False)  # normal_channel=False since we are not using normals
pointnet_model = pointnet_model.cuda()  # move to GPU (requires a CUDA-enabled runtime)

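# Load pre-trained weights (if available). A checkpoint trained with a different number of
# classes raises a RuntimeError here, because the final layer shapes do not match.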
pretrained_path = 'path/to/pretrained/pointnet_model.pth'  # TODO: update with actual path if you have it
try:
    pointnet_model.load_state_dict(torch.load(pretrained_path))
    print("Loaded pre-trained PointNet weights.")
except FileNotFoundError:
    print("Pre-trained weights not found, proceeding with random initialization.")


Before training, let's prepare our training loop. We will use an optimizer (e.g., Adam) and a loss function (cross-entropy for classification). We will also log training progress to TensorBoard for visualization of loss/accuracy curves.
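
As a rough sketch of what one training epoch can look like, the example below uses a small toy model instead of the real PointNet so that it runs on its own. Several things here are assumptions: `train_one_epoch` and `ToyPointClassifier` are hypothetical names, the model is assumed to take channel-first input of shape (B, 3, N) and to return a tuple of (log-probabilities, auxiliary output), and any extra regularization term the real model's loss may include is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer, device):
    """Run one pass over `loader`; return (mean loss, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for points, labels in loader:
        points = points.transpose(2, 1).to(device)  # (B, N, 3) -> (B, 3, N)
        labels = labels.to(device)
        optimizer.zero_grad()
        log_probs, _ = model(points)          # assumed output: (log-probabilities, auxiliary)
        loss = F.nll_loss(log_probs, labels)  # NLL on log-softmax outputs equals cross-entropy
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        correct += (log_probs.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen

class ToyPointClassifier(nn.Module):
    """Hypothetical stand-in: shared per-point layer, max-pool over points, linear head."""
    def __init__(self, num_classes):
        super().__init__()
        self.conv = nn.Conv1d(3, 32, 1)
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        feat = torch.relu(self.conv(x)).max(dim=2).values
        return F.log_softmax(self.head(feat), dim=1), feat

toy_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)
toy_data = torch.utils.data.TensorDataset(torch.randn(64, 128, 3), torch.randint(0, 4, (64,)))
toy_loader = torch.utils.data.DataLoader(toy_data, batch_size=16, shuffle=True)
toy_model = ToyPointClassifier(4).to(toy_device)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
toy_loss, toy_acc = train_one_epoch(toy_model, toy_loader, toy_optimizer, toy_device)
print(f"loss={toy_loss:.4f} acc={toy_acc:.4f}")
```

Returning the mean loss and accuracy from the function keeps the per-epoch logging code separate from the optimization step.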

In [None]:
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter

# Set up optimizer and loss
optimizer = optim.Adam(pointnet_model.parameters(), lr=0.001)
num_epochs = 5  # you can increase this for better results if time permits

# TensorBoard writer for logging
writer = SummaryWriter(log_dir="logs/pointnet_finetune")

# Training loop
for epoch in range(num_epochs):
    pointnet_model.train()
    # [TODO] your training code here

    # Evaluate on test set
    pointnet_model.eval()
    # [TODO] your test set eval code here

    # Log to TensorBoard
    writer.add_scalar('PointNet/Loss', train_loss, epoch)
    writer.add_scalar('PointNet/Train_Acc', train_acc, epoch)
    writer.add_scalar('PointNet/Test_Acc', test_acc, epoch)
    print(f"Epoch {epoch+1}/{num_epochs}, Train loss: {train_loss:.4f}, Train acc: {train_acc:.4f}, Test acc: {test_acc:.4f}")


After training, let's evaluate the fine-tuned model on the test set and compute the overall accuracy:

In [None]:
pointnet_model.eval()
correct = 0
total = 0
all_preds = []
all_labels = []
with torch.no_grad():
    for points, labels in test_loader:
        points = points.transpose(2, 1).cuda()  # (B, N, 3) -> (B, 3, N)
        labels = labels.cuda()
        preds, _ = pointnet_model(points)       # the second output is not needed for prediction
        predicted = preds.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        all_preds.extend(predicted.cpu().numpy().tolist())
        all_labels.extend(labels.cpu().numpy().tolist())

pointnet_test_accuracy = correct / total
print(f"PointNet test accuracy on Objaverse subset: {pointnet_test_accuracy:.4f}")


(To monitor training progress, run the following in a separate cell to launch TensorBoard: `%load_ext tensorboard`, then `%tensorboard --logdir logs`.)

Questions:

1. What test accuracy did you achieve after fine-tuning PointNet on your Objaverse subset?

2. Is the model overfitting or underfitting? (Hint: compare training vs test accuracy, and also consider the trend of the loss curves)

3. If you started with pre-trained weights, how did the fine-tuning progress compare to what you would expect from training from scratch? (If you didn't have pre-trained weights and trained from scratch, comment on how quickly the model learned and whether more epochs would likely improve results.)
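
For question 2, a quick numeric check is the gap between training and test accuracy at the end of training. The helper below is only a rough heuristic: `diagnose_fit` is a hypothetical name, both thresholds are arbitrary illustrative values, and the accuracy histories are made up. The loss curves should still be inspected alongside it.

```python
def diagnose_fit(train_acc, test_acc, gap_threshold=0.10, low_acc_threshold=0.60):
    """Label a run from its final-epoch accuracies using two rough, hypothetical thresholds."""
    gap = train_acc[-1] - test_acc[-1]
    if gap > gap_threshold:
        # Training accuracy far above test accuracy: the model memorizes training data.
        return "likely overfitting", gap
    if train_acc[-1] < low_acc_threshold:
        # Even the training set is fit poorly: the model has not learned enough yet.
        return "likely underfitting", gap
    return "no clear sign of either", gap

# Made-up accuracy histories (one value per epoch), for illustration only.
train_hist = [0.55, 0.72, 0.85, 0.93, 0.97]
test_hist = [0.52, 0.66, 0.71, 0.72, 0.71]
verdict, gap = diagnose_fit(train_hist, test_hist)
print(verdict, round(gap, 2))
```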

## Task-5, Part 2 (15 points): Fine-tuning PointNet++

Now we will fine-tune PointNet++ on the same data and compare its performance to PointNet. Set up the PointNet++ model for classification. The repository we cloned implements PointNet++ with both single-scale grouping (SSG) and multi-scale grouping (MSG); we will use SSG for simplicity. As with PointNet above, we will load a pre-trained PointNet++ model if one is available (for example, one trained on ModelNet40).

In [None]:
from models import pointnet2_cls_ssg

# Initialize PointNet++ (SSG) model
pointnet2_model = pointnet2_cls_ssg.get_model(num_classes, normal_channel=False)
pointnet2_model = pointnet2_model.cuda()

# Load pre-trained weights if available
pretrained_path = 'path/to/pretrained/pointnet2_model.pth'  # TODO: update if available
try:
    pointnet2_model.load_state_dict(torch.load(pretrained_path))
    print("Loaded pre-trained PointNet++ weights.")
except FileNotFoundError:
    print("Pre-trained weights not found for PointNet++, using random init.")


Now train the PointNet++ model on the Objaverse subset, using a training loop similar to the one above. (Note: PointNet++ performs extra point sampling and neighborhood grouping in every forward pass and may train more slowly than PointNet. Consider using fewer epochs or a lower learning rate for fine-tuning.)

In [None]:
optimizer = optim.Adam(pointnet2_model.parameters(), lr=0.001)
writer2 = SummaryWriter(log_dir="logs/pointnet2_finetune")
num_epochs = 5  # adjust as needed

for epoch in range(num_epochs):
    # [TODO] your code here
    # train loop

    # Evaluate on test set

    # writer2.add_scalar('PointNet++/Loss', train_loss, epoch)
    # writer2.add_scalar('PointNet++/Train_Acc', train_acc, epoch)
    # writer2.add_scalar('PointNet++/Test_Acc', test_acc, epoch)
    # print(f"[PointNet++] Epoch {epoch+1}/{num_epochs}, Train loss: {train_loss:.4f}, Train acc: {train_acc:.4f}, Test acc: {test_acc:.4f}")
    pass  # placeholder so the cell parses; remove once the loop body is implemented


After fine-tuning, evaluate PointNet++ on the test set:

In [None]:
pointnet2_model.eval()
correct = 0; total = 0
# [TODO] your code here
# all_preds2 = []; all_labels2 = []
# with torch.no_grad():
#     for data in test_loader:
#         points, labels = data
#         points = points.transpose(2,1).cuda()
#         labels = labels.cuda()
#         preds, _ = pointnet2_model(points)  # forward returns a tuple; the first element holds the class scores
#         _, predicted = torch.max(preds.data, 1)
#         total += labels.size(0)
#         correct += (predicted == labels).sum().item()
#         all_preds2.extend(predicted.cpu().numpy().tolist())
#         all_labels2.extend(labels.cpu().numpy().tolist())

pointnet2_test_accuracy = correct / total
print(f"PointNet++ test accuracy on Objaverse subset: {pointnet2_test_accuracy:.4f}")


Questions:

1. What test accuracy did PointNet++ achieve after fine-tuning?
Compare PointNet++ and PointNet results: which model performed better on the Objaverse subset, and by how much (in terms of accuracy or other metrics)?

2. Why do you think PointNet++ might perform differently than PointNet on this task? Consider the architectural differences (hierarchical local feature learning in PointNet++ vs. global feature in PointNet) and the complexity of the shapes.

3. Did PointNet++ show any signs of overfitting or underfitting? Compare its training vs test performance, and possibly how quickly it converged relative to PointNet.
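
To make the architectural difference in question 2 more concrete: PointNet++ builds its hierarchy by repeatedly picking a subset of well-spread centroid points with farthest point sampling and then grouping neighbors around each centroid. The NumPy sketch below illustrates only the sampling step. It is a simplified illustration and not the repository's implementation; `farthest_point_sampling` is a hypothetical name and the six-point input is made up.

```python
import numpy as np

def farthest_point_sampling(points, num_samples, start_index=0):
    """Greedily pick indices so that each new point is farthest from those already chosen."""
    n = points.shape[0]
    selected = np.empty(num_samples, dtype=np.int64)
    # Squared distance from every point to its nearest selected centroid so far.
    min_dist = np.full(n, np.inf)
    current = start_index
    for i in range(num_samples):
        selected[i] = current
        dist = np.sum((points - points[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        current = int(np.argmax(min_dist))
    return selected

# Four corners of a unit square plus two near-duplicates of the first corner.
square = np.array([[0, 0, 0], [0.01, 0, 0], [0, 0.01, 0],
                   [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=np.float64)
idx = farthest_point_sampling(square, 4)
print(sorted(idx.tolist()))
```

The sampler selects the four corners and skips the near-duplicates, which shows how the centroids cover the shape evenly before local features are computed around them.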

## Task-6 (15 points): Visualize results and evaluate metrics



Now that we have two models (PointNet and PointNet++) fine-tuned on our Objaverse dataset, let's visualize and quantify their performance further.

1. Confusion Matrix: Compute the confusion matrix for the best model on the test set. Plot or display the confusion matrix, with proper labels for the categories on axes.

2. Precision, Recall, F1-score: Using the test predictions, compute the precision, recall, and F1-score for each category, as well as the overall or average values. You can use sklearn.metrics.classification_report or calculate manually from the confusion matrix.

3. Training Curves: If you logged data to TensorBoard, you should have seen the training and validation accuracy curves. Summarize what the curves showed: did the models' performance plateau early or keep improving? Did one model converge faster than the other?

4. Sample Predictions: (Optional) Pick a few test examples and visualize their point clouds, noting the model's predicted label vs. the true label. Are there any interesting cases of misclassification? (For example, an object that was predicted as a different class - why might the model have confused them?)
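
For item 2, the per-class metrics can also be derived by hand from the confusion matrix, which is a useful cross-check on library output. The sketch below assumes rows are true classes and columns are predicted classes; `per_class_metrics` is a hypothetical helper and `toy_cm` is a made-up 3-class matrix.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F1 per class from a confusion matrix (rows=true, cols=pred)."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how many true samples each class has
    # Report 0 where the denominator is 0 (class never predicted or never present).
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

toy_cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 1, 9]]
precision, recall, f1 = per_class_metrics(toy_cm)
accuracy = np.trace(np.asarray(toy_cm)) / np.sum(toy_cm)
print("precision:", precision.round(3))
print("recall:   ", recall.round(3))
print("f1:       ", f1.round(3))
print("accuracy: ", accuracy)
```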


Complete the code to compute the confusion matrix and classification metrics:


In [None]:
from sklearn.metrics import confusion_matrix, classification_report

# Choose the model to evaluate (PointNet++ vs PointNet) based on which you want to analyze
y_true = all_labels2  # true labels from test set (for PointNet++ in this example)
y_pred = all_preds2   # predicted labels from test set (PointNet++)
labels = list(range(num_classes))

# Confusion matrix
cm = confusion_matrix(y_true, y_pred, labels=labels)
print("Confusion Matrix (rows=true, cols=pred):")
print(cm)

# Classification report (precision, recall, F1 per class)
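target_names = list(selected_categories)
# Pass labels explicitly so the report still lines up with target_names
# when a class is missing from the test predictions.
report = classification_report(y_true, y_pred, labels=labels, target_names=target_names, zero_division=0)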
print("\nClassification Report:")
print(report)


(You may visualize the confusion matrix as a heatmap for clarity if you wish, using matplotlib or seaborn. Ensure the axes are labeled with the category names.)

Questions:

1. Include the confusion matrix or describe it: which categories are most often confused with each other, based on the confusion matrix? Does this make sense (are those categories visually or geometrically similar)?

2. Report the precision, recall, and F1-score for each class. Did the model perform evenly across all classes, or are some classes much better/worse? Provide possible explanations (e.g., class imbalance, shape complexity, etc.).

3. Discuss the training curves from TensorBoard: did the training and test accuracy diverge (sign of overfitting) or track closely? Did you notice one model converging faster?

4. Provide 1–2 examples of model predictions (if you examined some): were there any misclassifications that stood out, and why do you think the model made those errors?

## Task-7 (15 points): Training from scratch vs. fine-tuning (analysis)

In this final part, reflect on the differences between training a model from scratch and fine-tuning a pre-trained model for 3D classification.

1. If you fine-tuned from a pre-trained model, how did it benefit your training? Consider factors like initial accuracy on first epoch, speed of convergence, and final accuracy achieved, compared to what you would expect if starting from random initialization.

2. If you have time, you can perform a small experiment: take the PointNet++ model and train it from scratch (random initialization) on the Objaverse subset for the same number of epochs, and compare the learning curve and final accuracy to the fine-tuned version. (This is optional; if you don’t run a full experiment, answer conceptually.)

3. Why is fine-tuning particularly useful when working with a very large dataset like Objaverse or when the number of training samples per class is limited?

4. Compare the two models (PointNet vs PointNet++): which one would you choose if you had a limited computational budget, and which one if you needed the highest accuracy? Explain your reasoning in terms of model complexity (number of parameters, etc.) and performance.
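
When discussing model complexity in question 4, it helps to measure the parameter counts rather than assume them. The helper below is a small sketch; `count_parameters` is a hypothetical name and the toy network stands in for the real models.

```python
import torch.nn as nn

def count_parameters(model):
    """Return (total, trainable) parameter counts for an nn.Module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Toy stand-in network; the first layer is frozen to show the difference between the two counts.
toy = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))
for p in toy[0].parameters():
    p.requires_grad = False
print(count_parameters(toy))
```

Calling `count_parameters(pointnet_model)` and `count_parameters(pointnet2_model)` gives the numbers for the comparison. Parameter count alone does not determine training time, so it is worth also timing one epoch for each model.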
