Questions about applying SPG to our own point cloud data #11

Closed
sycmio opened this issue Mar 13, 2018 · 19 comments

sycmio commented Mar 13, 2018

Hello,
Thank you very much for your great work! It really impressed us, and we want to try your code on our own large-scale outdoor point cloud data. I have some questions:

  1. In which folder should I put your Semantic3D pre-trained model? It seems to be mentioned in the readme.
  2. Currently there is no RGB information in our data. We plan to just add 0 0 0 after each point's position to match the Semantic3D data format. Is that OK?
  3. If we want to directly test your pre-trained model on our new data, do we still have to download all the training data from Semantic3D and train for one epoch before testing? I suppose this is what happens with your current command line instructions.

Looking forward to your reply!

loicland commented Mar 14, 2018

Hi, thank you for your interest in our project.

  1. It doesn't matter where you put it as long as you pass the correct path as the --odir argument. I would recommend something like /superpointgraph/results/your_dataset. Please note that the pretrained model (model.pth.tar) is in a format directly readable by pytorch and need not be untarred.

  2. You can indeed put 0 0 0, but I am afraid it will hurt the performance a lot, since the pretrained PointNet mixes the color information directly with the position to create embeddings. If you want to use the Semantic3D ground truth to train our model, I would recommend training from scratch on semantic3d_train while withholding the RGB values; it should take a couple of hours at most. Download the Semantic3D training set, and then run

python partition/partition_Semantic3D.py --SEMA3D_PATH $SEMA3D_DIR

to preprocess the data set. Then

CUDA_VISIBLE_DEVICES=0 python learning/main.py --dataset sema3d --SEMA3D_PATH $SEMA3D_DIR \
--db_train_name trainval --epochs 500 --lr_steps '[350, 400, 450]' --test_nth_epoch 100 \
--model_config 'gru_10,f_8' --ptn_nfeat_stn 8 --pc_attribs "xyzelpsv" --nworkers 2 \
 --odir "results/sema3d/sema_3d_model"

to train on Semantic3D with no ~~label~~ color. Then run your test on your own dataset (with 0 0 0 color) using

CUDA_VISIBLE_DEVICES=0 python learning/main.py --dataset your_data_set --YOUR_DATA_SET_PATH $YOUR_DATA_SET_DIR \
--epochs -1 --model_config 'gru_10,f_8' --pc_attribs "xyzelpsv" --ptn_nfeat_stn 8 --nworkers 2 \
--odir "results/sema3d/sema_3d_model" --resume RESUME

Note that in order for this command to work you should first have partitioned your data set using an adapted partition function, and have a preprocessing function for your dataset, see issue #6. As long as the format of your data is the same as Semantic3D or ply, the changes will be minimal (mostly changing directory names); a rough sketch of such a reader is given after this list. I will soon code a function to help launch the code on new datasets, but not before a couple of weeks.

  3. If you really want to directly try our model on your data without retraining from scratch, you don't need to download Semantic3D (set --epochs to -1). But again, I really advise against it.
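
For illustration, here is a rough sketch (not the repository's actual reader in partition/partition_Semantic3D.py) of the kind of minimal change meant above: load an x y z [intensity r g b] text file and pad the missing color columns with zeros so it matches the Semantic3D layout. The file name and column layout are assumptions; adapt them to your data.

```python
# Illustrative sketch only -- not the code in partition/partition_Semantic3D.py.
# Loads a space-separated "x y z [intensity r g b]" file and pads missing
# color columns with zeros so the array matches the Semantic3D layout.
import numpy as np

def load_cloud(txt_path):
    pts = np.loadtxt(txt_path, dtype=np.float32)        # one point per row
    xyz = pts[:, :3]
    if pts.shape[1] >= 7:                                # intensity + rgb present
        rgb = pts[:, 4:7]
    else:                                                # no color: pad with zeros
        rgb = np.zeros((pts.shape[0], 3), dtype=np.float32)
    return xyz, rgb

# xyz, rgb = load_cloud("my_scan.txt")   # "my_scan.txt" is a made-up name
```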

Let us know if you encounter problems in this endeavor.

loic

sycmio commented Mar 14, 2018

Hi loic,

Thank you for your prompt reply! I will give it a try in the near future. There are two things I would like to make sure I understand correctly:

  1. Do I still need to change scripts in the partition and preprocessing parts if our data format and segmentation classes are exactly the same as Semantic3D? I understand that if we want a better result we need to modify your scripts and train from scratch, but currently we are just exploring different methods and making a comparison. I think in this case we just need to change the --SEMA3D_PATH parameter to our own data folder. Is that correct?

  2. What do you mean by "train on semantic with no labels"? I suppose the training data of Semantic3D contains the semantic label of each point.

Best,
Yongchi

@loicland

Hi,

1 - If you have the exact same format and you just want to do inference, it should work, yes. You will still be required to partition, then preprocess, then test (with --epochs -1). If your folders are named differently than 'train', 'test_reduced', and 'test_full' you should adapt the first lines of get_datasets in learning/sema3d_dataset.py, but that should be straightforward (see the sketch at the end of this comment).

Again, it will likely perform badly since the embeddings intrinsically depend on color.

2 - I made a typo, I meant no color.
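
To make point 1 concrete, a purely illustrative sketch of the folder-name change (the real get_datasets in learning/sema3d_dataset.py is structured differently; the folder and variable names here are made up):

```python
# Hypothetical illustration -- not the actual get_datasets() from
# learning/sema3d_dataset.py. The point is only that the hard-coded
# 'train' / 'test_reduced' / 'test_full' folders become your own names.
import glob, os

def list_split_files(root, train_dir="my_train", test_dir="my_test"):
    train_files = sorted(glob.glob(os.path.join(root, "features", train_dir, "*.h5")))
    test_files = sorted(glob.glob(os.path.join(root, "features", test_dir, "*.h5")))
    return train_files, test_files
```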

sycmio commented Mar 14, 2018

Hello loic,

Thanks for your answer! I tried your code and got it to run successfully. Currently I follow the dataset structure of Semantic3D (although I only have one txt file, I copied it three times and put the copies into train/test_full/test_reduced). I only changed the first two lines in get_datasets to match our txt file name. One question I have now is where I can find the segmentation result for the whole point cloud. I ran write_Semantic3d and it indeed produces a _pred.ply. However, the point number is drastically reduced (originally we have one million points, but there are only 300 points in _pred.ply). Is it possible to get the semantic label for each original point?

loicland commented Mar 15, 2018

Hi,

good to know that it somehow worked!

if you run learning/main.py with --epochs -1 it will produce the file predictions_val.h5 in the folder given by --odir, which contains the prediction for each superpoint (of which you apparently have 300, which is a bit low).

if you run:

python partition/write_Semantic3D.py --SEMA3D_PATH $SEMA3D_DIR --odir "results/sema3d/folder_containing_predictions_val.h5" --db_test_name testred

it will produce a .labels file in /labels with the labels for each point in the original point cloud (so here, 1 000 000), as specified by semantic3d.com.
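
A quick, unofficial way to sanity-check that exported file (this assumes the .labels file contains one integer label per line, aligned with the point order of the original .txt, as on semantic3d.com; the file name below is made up):

```python
# Unofficial sanity check of a .labels export: count points per class.
# Assumes one integer label per line, same order as the original .txt.
import numpy as np

labels = np.loadtxt("labels/my_scan.labels", dtype=np.int64)  # made-up file name
print("points:", labels.size)
for cls, count in zip(*np.unique(labels, return_counts=True)):
    print(f"class {cls}: {count} points")
```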

if you want to visualize the results (on a pruned point cloud), use:

python ./partition/visualize.py --dataset sema3d --ROOT_PATH $SEMA3D_DIR --res_file "results/sema3d/folder_containing_predictions_val.h5" --file_path 'test_reduced/name_of_your_file' --output_type ifpr

with output_type: 'i' for the original input cloud, 'f' for the geometric features, 'p' for the partition (I highly suggest you run this one to make sure the partition worked), and 'r' for the results file.

if you don't want to work with a pruned cloud, set --voxel_width 0 in your partition command.

EDIT: I realize that you might have done that already, in which case I think the problem is that you overpruned the initial data when partitioning. Is your scan of very small extent?

For now, you could try to fix it by relaunching your partition with --voxel_width 0 to prevent pruning. I will also soon change /partition/visualize.py so that it can output the result on the original file and not the pruned one, probably next week.

sycmio commented Mar 15, 2018

Yes, I have already got the _pred.ply file (which only contains 300 points) in my root/data folder after running python partition/write_Semantic3d.py --SEMA3D_PATH $SEMA3D_DIR --odir "results/sema3d/trainval_best" --db_test_name testfull, and I also got a label file (which contains 1 million labels, one for each point) under the root/labels folder. I also tried to relaunch the partition with python learning/sema3d_dataset.py --SEMA3D_PATH $SEMA3D_DIR --voxel_width 0, but got an error: unrecognized arguments: --voxel_width 0. Am I doing something wrong here?
Again, thanks for your patience and help!

Update: Another thing that confuses me is that after running write_Semantic3d.py I get one _pred.ply under $SEMA3D_DIR/data/test_full/, and after running visualize.py I get four .ply files under $SEMA3D_DIR/clouds/test_full/. All of these 5 ply files (1 under $SEMA3D_DIR/data/test_full/ and 4 under $SEMA3D_DIR/clouds/test_full/) contain 300 superpoints. Could you please tell me the difference between them?

@loicland

Hi,

_rgb.ply is the input point cloud after pruning
_GT.ply is the point cloud labelled with the ground truth
_geof.ply is a color coding of the computed geometric features
_partition.ply represents the partition, with a random color for each component
_pred.ply is the prediction file

All of these files are for the pruned dataset. It seems like your point cloud only contains 300 points after subsampling with a 5 cm grid, so I assume it is very small?

--voxel_width is an argument of partition_Semantic3D.py:

python partition/partition_Semantic3D.py --SEMA3D_PATH $SEMA3D_DIR --voxel_width 0

sycmio commented Mar 20, 2018

Yes, I just chose a small part of our point cloud to test. Could you please tell me what the color of the points in _pred.ply means? Are you assigning each label a different color (e.g. [255 0 0] for man-made terrain)?
Also, I will try to visualize the label file that I get and report the performance on our dataset. Thank you very much!

@loicland

SPG is designed for semantic segmentation of large scenes, and is not really well suited for objects or small scenes. You could directly use PointNet for that.

See Figure 3 in the paper, or the function get_color_from_label in /partition/provider.py
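
If you want to go from _pred.ply colors back to class ids programmatically, a hedged sketch; the placeholder RGB triplets below must be replaced by the real values from get_color_from_label in partition/provider.py:

```python
# Sketch only: invert a label -> color table to recover class ids from the
# colors stored in _pred.ply. Fill COLOR_OF_LABEL with the real triplets
# from partition/provider.py:get_color_from_label(); these are placeholders.
import numpy as np

COLOR_OF_LABEL = {
    1: (200, 200, 200),   # placeholder, e.g. man-made terrain
    2: (0, 200, 0),       # placeholder, e.g. natural terrain
}
LABEL_OF_COLOR = {rgb: lab for lab, rgb in COLOR_OF_LABEL.items()}

def labels_from_colors(colors):            # colors: (N, 3) uint8 array
    return np.array([LABEL_OF_COLOR.get(tuple(int(v) for v in c), -1) for c in colors])
```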

maximiliangoettgens commented Mar 20, 2018

Hi,

First of all: @loicland : Great work! Really appreciate the effort.

I was following the exact same approach as @sycmio to try to feed data from the KITTI odometry dataset (LiDAR) to SPG. I registered sets of 50 single LiDAR shots in order to get a 'scene' with a denser, richer representation of the environment - especially for a more 'rounded' representation of objects, using shots from different view angles as the car proceeds through the street. (sample scene in link below)

I trained SPG as instructed in your comment above withholding RGB.
1.1) Please confirm that pc_attribs 'xyzelpsv' means xyz = position; lpsv = linearity, planarity, scattering, verticality according to your paper; e = elevation from the ground plane (?)

1.2) Do you make use of the fourth value in the Sema3D data? I guess it is intensity. In KITTI I have reflectance values, but they do not seem comparable and are on a completely different scale, so I withheld these (set all to 0) for now.

2.1) With a model trained on those attributes I get lower performance on the Semantic3D test_full dataset, but still reasonably good accuracy - can you please specify what exactly you mean by 'embeddings intrinsically depend on color'? Is this relevant to generating the SPG or to PointNet training? As far as I know, PointNet only optionally includes color.

2.2) With that same trained model I get extremely bad results on KITTI scenes - almost everything is classified as Building, whereas some Trees are classified correctly (result .ply file in link below), but they are the minority - all other classes are missing entirely.

3.1) Partitioning works reasonably well on KITTI data. Generally, the partitioned areas are too large and sometimes overlap different classes, but overall I guess it is quite useful (maybe with some parameter tuning) (partitioned sample also in link below).

3.2) Partitioning quality degraded dramatically when the data was subsampled using a voxel-grid filter (I tried different sizes).

At this point I am trying to investigate what causes this performance drop - some ideas are:

  1. Different point densities in Sema3D vs. KITTI
  2. Problem in detection of ground plane and therefore elevation measurements
  3. Overall scene height as an issue, since the velodyne 64 (especially when set up as in the KITTI dataset) provides data only up to a height of ~3-4 m

Since the dependency on RGB appears not to be a showstopper and the partitioning shows fair results, I am hopeful that the framework will adapt to KITTI. Any comment is greatly appreciated.

Best,
Max

PS: Regarding my scenes: as mentioned, it is 50 shots registered into one scene - no downsampling or voxel-grid filter, since this harmed the partitioning performance a lot. (EDIT: Just to be clear: I didn't use a voxel-grid filter before partitioning with the SPG framework - the filter in the partitioning script of SPG is still set to 5 cm.) Attached is the original input PC (with intensity and RGB set to 0) as a .txt file and all intermediate/end results from SPG.
( https://drive.google.com/open?id=1L6pNT1KCc4SNiadptwkkX6CvFUIcOoop )

loicland commented Mar 20, 2018

Hi Maximilian,

Very interesting stuff here.

I trained SPG as instructed in your comment above withholding RGB.

So you retrain on Semantic without RGB from scratch right? No fine tuning on your data set though?

1.1) Yes for xyzlpsv. e is actually the z coordinate divided by 100 (see line 91 of learning/sema3d_dataset.py). In Semantic3D, the sensor is at (0,0,0). Looking at your example, the z value seems very different (goes from -50 to 50? in feet?). The problem most likely stems from this. More on this below.

1.2) We do not use the intensity, as it was very noisy on semantic3d.

2.1) By 'embeddings intrinsically depend on color' I mean that I discourage directly applying models trained on Semantic3D with RGB to data without RGB. I encourage training a model from scratch on the Semantic3D dataset without RGB. It seems to me that this is what you did?

The original PointNet only optionally includes color, but ours does use it, as well as the lpsv values. This can be altered, though, by using --pc_attribs xyzelpsv.

2.2) The bad performance is IMO due to the elevation, and possibly the scale as well. Do you know what units x, y and z correspond to? There are some artifacts below the road which might throw off the scaling. A quick and dirty fix would be to rescale your z so that the road is around 0 and the average height of buildings is around 0.2 (= 20 m / 100).

Ideally I should implement a RANSAC based groundplane extraction + smart normalization. Will try to do it next week.
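
Until that exists in the repo, a rough, generic RANSAC ground-plane sketch (plain numpy, nothing SPG-specific) that could be used to put the road near z = 0 before the /100 normalization; the array name in the usage comment is made up:

```python
# Generic RANSAC ground-plane fit (not SPG code): find the dominant
# near-horizontal plane, then shift z so the ground sits around 0.
import numpy as np

def fit_ground_plane(xyz, n_iter=200, dist_thresh=0.10, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers, best = 0, None
    for _ in range(n_iter):
        p0, p1, p2 = xyz[rng.choice(len(xyz), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue
        n /= norm
        if abs(n[2]) < 0.9:                  # keep only near-horizontal planes
            continue
        inliers = np.count_nonzero(np.abs((xyz - p0) @ n) < dist_thresh)
        if inliers > best_inliers:
            best_inliers, best = inliers, (n, p0)
    return best                               # (normal, point on plane) or None

# Usage sketch (cloud_xyz: (N, 3) float array):
#   normal, p0 = fit_ground_plane(cloud_xyz)
#   cloud_xyz[:, 2] -= p0[2]                  # crude: road roughly at z = 0
#   elevation = cloud_xyz[:, 2] / 100.0       # same /100 as sema3d_dataset.py l.91
```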

3.1) Partitioning seems okay-ish, but I would try decreasing the --reg_strength parameter; try 0.3-0.5 maybe.

3.2) Partitioning might not mesh well with velodyne 64

At this point I am trying to investigate what causes this performance drop - some ideas are:
Different point densities in Sema3D vs. KITTI

Could be, but the subsampling of superpoints should mostly mitigate that

Problem in detection of ground plane and therefore elevation measurements
Overall Scene Height as an issue since the velodyne 64 (especially when set up like in KITTI dataset) provides only data up to an height of ~3-4m

Yes, the problem most likely stems from the normalization of z. And yes, also the lack of a ground plane detection algorithm.

Let me know if it gets better once you rescale z properly and decrease the reg_strength!

loic

maximiliangoettgens commented Mar 21, 2018

Hi Loic,

Very interesting stuff here.

indeed interesting! I think we are on a good track here generalizing SPG to a (much more common) velodyne LiDAR. Further steps to think of might be more GPU support in partitioning to make the code run faster for e.g. robotics applications.

So you retrain on Semantic without RGB from scratch right? No fine tuning on your data set though?

That is correct, Sir.

1.1) As far as I know, the scale is 1 m = 1 unit. However, the data was in a camera coordinate frame, so the Z-axis that you were inspecting was actually pointing along the street. With your explanation of the height dependencies, the poor performance makes perfect sense. If you look at the _geof.ply on Google Drive, it actually recognizes planes normal to the driving direction as ground planes. At this point: can you explain in detail the exact color coding of the _geof.ply files?

1.2) OK

1.3) Yes, that is what I did.

2.2) I think the problem stems in the first place from the wrong coordinate frame, as mentioned above, yes. I can try to further increase the performance by rescaling, but I guess more potential lies in the other ideas mentioned for now.

Ideally I should implement a RANSAC based groundplane extraction + smart normalization. Will try to do it next week.

Cool.

3.1) I will try playing around with the reg_strength (had no time yet but seems promising)

3.2) One may observe that the partitioning works better in areas further from the center. So possible reasons for now might be:

  • Still the higher point density in these areas
  • More noise / much less sharp edges

I will try to tune the registration algorithm even further to reduce noise and play around with how many PCs I register (maybe 10 instead of 50 works better)

Now some good news: with just the transformation of the coordinate frame (rotation into the global frame + manually setting the ground plane to z = 0) and removal of artifacts below the ground plane, we have a working baseline for optimization (see picture + files). As for the partitioning, the segmentation performs much better in sparse regions (see pictures). I think the large low-vegetation area in the middle is due to the high noise / poor segmentation mentioned in 3.2. Unfortunately, the segmentation of cars and road performs particularly badly. Any thoughts?

I will keep tuning parameters for now and get in touch once I have any further questions/results. I guess there is high potential in this endeavour. Thank you very much for the help up to this point, though.

Best,
Max

Files: https://drive.google.com/open?id=19iYOlx5zKMHQPkkvjciREOIisZRpcUgZ

(attached screenshots: dense_region_1, sparse_region_1, sparse_region_2, sparse_region_3)

loicland commented Mar 21, 2018

Further steps to think of might be more GPU support in partitioning to make the code run faster for e.g. robotics applications.

We are working on CPU-parallelizing the cut pursuit algorithm, but it should take a few months at least. Running it on a GPU might be possible but tricky, and beyond my abilities. Other partition algorithms could be used, however.

the data was in a camera coordinate frame so the Z-axis that you were inspecting was actually pointing along the street.

Aah, well there it is then. The z axis is treated specially in many different steps of the algorithm: computation of the verticality feature, of the elevation, of the superedge features, as the rotation axis of the SPG random rotation augmentation scheme, etc. It will impact the feature computation, the partition, the embedding and the edge filters, i.e. everything. To be honest I am surprised it was working at all!

You absolutely should switch columns 2 and 3 in your .txt, or in your data reading function.
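
In case it saves someone a script, swapping the two coordinate columns is a one-liner in numpy (this assumes a plain "x y z ..." text file; indices are 0-based, so the file's 2nd and 3rd columns are indices 1 and 2, and the file names are made up):

```python
# Swap the y and z columns of a plain "x y z ..." text cloud.
# 0-based indices 1 and 2 correspond to the file's 2nd and 3rd columns.
import numpy as np

pts = np.loadtxt("my_scan.txt")              # made-up file name
pts[:, [1, 2]] = pts[:, [2, 1]]              # y <-> z
np.savetxt("my_scan_zup.txt", pts, fmt="%.4f")
```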

At this point: Can you explain in detail the exact color coding of the _geof.ply files?

Yes, I will complete the help on the subject. Red = linearity, Green = planarity, Blue = verticality. Consequently, the road should be lime green in _geof.ply, and not the walls; I should have noticed that.
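
For reference, such a color coding can be produced from eigenvalue-based features along these lines (common textbook definitions, not necessarily the exact formulas used in partition/; the verticality in the paper is computed differently, so the last channel here is only a crude proxy):

```python
# Rough illustration of an eigenvalue-based geof color coding:
# R = linearity, G = planarity, B = a crude verticality proxy.
# Not the exact formulas of the SPG partition code.
import numpy as np

def geof_color(neighborhood_xyz):
    cov = np.cov(neighborhood_xyz.T)           # 3x3 covariance of the neighborhood
    w, v = np.linalg.eigh(cov)                  # eigenvalues in ascending order
    l3, l2, l1 = w                              # l1 >= l2 >= l3
    linearity = (l1 - l2) / max(l1, 1e-9)
    planarity = (l2 - l3) / max(l1, 1e-9)
    verticality = 1.0 - abs(v[2, 0])            # 1 - |n_z|: walls high, ground low
    return np.clip([linearity, planarity, verticality], 0, 1) * 255
```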

Now some good news: With just the transformation of the coordinate frame (rotation into global frame + manually set ground plane to z=0) and removal of artetacts below ground plane we have a working baseline for optimization (see picture + files).

So you switched y and z already? Did you divide the elevation by 100 as well, as in sema3d_dataset.py l.91? Can you post the _geof.ply file so I can check the geometric features are as they should be?

As for the partitioning, the performance of segmentation performs much better in sparse regions. (see pictures) I think the large low vegetation area in the middle is due to high noise / poor segmentation mentioned in 3.2

Maybe you should try voxelizing again. Semantic has 5 cm voxelization! Or you could fine tune the model if you have some ground truth available.

Unfortunately the segmentation of cars and road performs particularly bad. Any thoughts?

I will be able to tell you more if I can see the _geof.ply and the _partition.ply

maximiliangoettgens commented Mar 21, 2018

Hey Loic,

We are working to CPU-parallelizing the cut pursuit algorithm, but it should take a few month at least. Running it on a GPU might be possible but tricky and beyond my abilities. Other partition algorithm could be used however.

Nice to hear; let's not get too much into detail here - just one last question: what is the most time-consuming step in your experience? I guess it is the nearest neighbour search? I found some works on GPU-optimized NNS already - maybe worth a shot.

Aah well there it is then. The z axis is particularized in many different step of the algorithm

Yes, definitely. To be honest, I just forgot about that transformation while dealing with tons of conversions and registrations.

You absolutely should switch the rows 2 and 3 in your .txt, or in your data reading function.

Yes, I did that already.

Red = linearity, Green = planarity, Red = Verticality.

I guess you mean Blue = Verticality? Please confirm.

So you switched y and z already? Did you divide the elevation by 100 as well as in sema3d_dataset.py l91?

I switched the axes, yes, but did not do the division by 100 yet. Can you explain in more detail what the benefit of that scaling is (does PointNet take elements on unit scale as an input?)? As you can see, my buildings are mostly just around 5 m high due to the limitations of the velodyne LiDAR - if I understood correctly, you would try rescaling with parameter 25 (5/25 = 0.2) in line 91?

Can you post the geof.ply file so I can check the geof are as should be?

The files are already attached in the comment above (Google Drive link). Does the link work for you?

Maybe you should try voxelizing again. Semantic has 5 cm voxelization!

What's the thought behind this? Do you mean the raw Semantic3D has 5 cm voxelization before being fed into the SPG framework, or while partitioning within the framework already?

Or you could fine tune the model if you have some ground truth available.

Unfortunately, I don't.

It would be nice if you could leave some thoughts about the _geof and _partition!

Best,
Max

sycmio commented Mar 21, 2018

Note from @loicland : moved to #15

sycmio commented Mar 21, 2018

@loicland Sure. I have already opened a new issue #15.

loicland commented Mar 21, 2018

Hi @maximiliangoettgens,

What is the most time consuming step according to your experience? I guess it is the nearest neighbour search? I found some works on GPU optimized NNS already - maybe worth a shot.

We are about to release a new version of the article with extended studies of the computation times. The computation of the nearest neighbors and the Voronoi neighborhood does take a significant part of the time, and is not optimized at all. Improving it would be great!

The pruning and feature computation are quite fast; the partition is slowish but will be 10x faster once parallelized. If speed is really an issue, sub-sampling works great, and it tends to increase the accuracy as well by decreasing the geometric and radiometric noise.

I guess you mean Blue = Verticality? Please confirm.

Confirmed and corrected above.

I switched the axes, yes, but did not do the division by 100, yet. Can you explain in more detail what is the benefit of that scaling(PointNet takes elements on unit scale as an input?) ?

Our PointNet can potentially take more than just xyz and rgb: elevation and lpsv as well.

As you can see, my buildings are mostly just around 5m high due to the limitations of the velodyne LiDAR - if I understood correctly, you would try rescaling with parameter 25 (5/25 = 0.2) in line 91 ?

Semantic3D has more range in height, so we divided by 100. It might not have been the smartest choice, but the value at line 91 should be the same when you train on Semantic3D and infer on your cloud.

What's the thought behind this? Do you mean the raw Semantic3D has 5cm voxelization before being fed into SPG framework or while partitioning within the framework already?

No, our code starts by pruning (with the --voxel_width argument). Since you didn't change it and its default is 5 cm for partition_Semantic3D.py, you can see that your files are reduced by almost a factor of 3 (6 million to 2 million points) due to the 5 cm pruning. So it's all good.
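
For intuition, the pruning is essentially a voxel-grid subsampling; here is a toy numpy version keeping one point per occupied 5 cm voxel (the repository's implementation also aggregates colors and labels per voxel, which this sketch skips; the array name in the usage comment is made up):

```python
# Toy voxel-grid pruning: keep one point per occupied voxel.
# The real pruning in partition/ also aggregates colors and labels per voxel.
import numpy as np

def voxel_prune(xyz, voxel_width=0.05):
    keys = np.floor((xyz - xyz.min(axis=0)) / voxel_width).astype(np.int64)
    _, keep = np.unique(keys, axis=0, return_index=True)
    return xyz[np.sort(keep)]

# pruned = voxel_prune(cloud_xyz, 0.05)   # ~5 cm grid, like the partition_Semantic3D.py default
```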

It would be nice if you leave some thoughts about the _geof and _partition!

So I've been looking at it in detail. If you check a _geof.ply file from Semantic3D and from your cloud, you can see they are quite different. Since Semantic3D uses a highly precise fixed LiDAR, its acquisition is much more precise. As a result, the road is very "flat", whereas yours is almost 40 cm "deep". Same for the buildings. Consequently, the planarity of roads and façades is too low, the scattering too high, and the verticality too high/low respectively. Hence, the algorithm thinks it is seeing small volumetric, slightly vertical objects everywhere: bushes, the yellow class.

Without any kind of ground truth for fine-tuning, the only thing I can think of would be to only use 'xyze' for the PointNets (--pc_attribs xyze --ptn_nfeat_stn 4) and hope for the best. I am running this right now, will let you know how it goes.

EDIT: after a quick and dirty test, it works slightly better, see the ply file. However, we have the following mistakes, which will be hard to overcome:

a) confusion between road and grass (no color + volumetric road = it thinks it's grass)
b) confusion between cars and bushes (effect of the fuzzy acquisition)

for a) you could just merge the classes road and grass directly in the training on Semantic3D (a sketch of this is given below). For the rest, I think only fine-tuning will help.
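
A hedged sketch of that class merge when preparing the Semantic3D labels (this assumes the usual Semantic3D numbering with 1 = man-made terrain and 2 = natural terrain; double-check the ids before using):

```python
# Sketch: merge 'natural terrain' into 'man-made terrain' before training,
# assuming the usual Semantic3D ids 1 (man-made) and 2 (natural). Verify first.
import numpy as np

def merge_ground_classes(labels):
    labels = np.asarray(labels).copy()
    labels[labels == 2] = 1            # natural terrain -> man-made terrain
    return labels
```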

of interest: the partition makes the quick annotation of data sets easier.

@maximiliangoettgens

Super-quick update on this one since I have to leave now: I tried without registering point clouds at all (raw LiDAR shots) and it seems that the caveat is in the registration step (where the blurriness comes into play) - crisp ground detection and fair segmentation. The background on the registration was that PointNet seemed very robust to uniformly distributed point dropout, but its performance dropped a lot when removing faces (e.g. the LiDAR view-angle problem - only one face of an object visible). However, I think for this particular framework that does not play much of a role (not sure though). Attached is a trial on a single LiDAR shot - no more parameters optimized: https://drive.google.com/open?id=1JwpblgPuXixvtnESPSFzwQUKW3rQZCUI. Not to mention that the performance is super fast on so few points (full pipeline incl. visualization ~20 sec on an i5 4670k).

@loicland

Great. You could try to increase --k_nn_adj in partition.py to avoid the "line effect" in the periphery.
