This is a working example for training and evaluating a RocNet model from a LiDAR dataset.
Some prerequisites need specific versions, constrained by Open3D and by the Python versions supported by the cluster:

- Python 3.11: required by Open3D 0.18
- CUDA: this example (and also `setup.bat`/`setup.sh`) assumes CUDA 11.8
- `open3d==0.18.0`: the stable version during RocNet development
- `numpy==1.26.4`: required for compatibility with Open3D 0.18 (you'll get some fun silent crashes if you use a newer version with Open3D 0.18)
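Because the numpy/Open3D mismatch fails silently, it can be worth verifying the environment before running anything. This optional snippet is not part of the repo — just a sketch that checks the interpreter and package versions against the pins above:

```python
import sys
from importlib.metadata import version, PackageNotFoundError

# Version pins from the prerequisites above.
REQUIRED_PYTHON = (3, 11)
REQUIRED_PACKAGES = {"open3d": "0.18.0", "numpy": "1.26.4"}

def check_versions(python_version=sys.version_info, required=REQUIRED_PACKAGES):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(f"need Python {REQUIRED_PYTHON}, got {tuple(python_version[:3])}")
    for pkg, want in required.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg} is not installed")
            continue
        if got != want:
            problems.append(f"need {pkg}=={want}, got {got}")
    return problems

if __name__ == "__main__":
    for problem in check_versions():
        print(problem)
```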
1. Install Python 3.11 (here)
2. Install CUDA 11.8 (here)
3. Acquire this repository: `git clone --depth 1 https://altitude.otago.ac.nz/rocnet/rocnet-example.git`
4. Run `setup.bat` (windows/cmd) or `setup.sh` (linux or windows/git-bash)
5. Download example-data.zip (approx. 1.3GB): a set of source `.laz` files, a dataset which supports voxel grid resolutions up to 128, and a training run (including model weights) for a model that uses 64-grid inputs. It contains three subfolders:
   - `laz` - a collection of `.laz` files
   - `dataset` - a dataset of tiles created from the `.laz` files
   - `weights` - a training config file, and an example training run with a set of model weights and training progress snapshots
6. Copy the `data` folder inside `example-data.zip` to `rocnet/example/data`
To use the example scripts, make sure that the virtual environment created during setup is active, and then invoke the train/test/examine scripts like this:
```shell
# Plot loss curve of training run, print some info about the resulting model
python examine_training_run.py ./data/weights/

# Load some tiles from the test dataset, encode/decode them, print compression ratio
# and some metrics of lossiness, and visualise the original and recovered tiles.
# Use the N and B keys to cycle through the example tiles
python test_tile.py ./data/weights

# Load the file(s) specified in test.toml, encode/decode them, print compression
# ratio and some metrics of lossiness, and visualise the original and recovered
python test_file.py ./data/weights --visualise

# Start a new training run with the configuration in ./data/weights/train.toml
python train.py ./data/weights
```

The usage pattern for any script goes like this (using train.py as an example):
1. Activate the python environment
2. Run `python train.py $TGT_WORK_DIR`
   - To run on a SLURM login node you'll need to run `export OMP_NUM_THREADS=1` first
3. If `$TGT_WORK_DIR/train.toml` exists, the training run and model config are loaded from that file and a training run is started.
4. If `$TGT_WORK_DIR/train.toml` does not exist, it will be created and populated with default values; the script will then exit and prompt you to modify the default values and re-run it
The usage pattern above is the same for the other scripts, but they load different config files depending on what they do:

- `examine_dataset.py` expects `train.toml`
- `tile.py` expects `tile.toml`
- `test_file.py`, `test_tile.py` and `examine_training_run.py` expect `test.toml`
Default values for config are described in the rocnet package, except for `tile.toml`, which is defined in `tile.py`.
Class, function, and module documentation is in the docstrings.
All scripts accept a `--help` argument which provides brief invocation instructions.
These instructions will get you something like the example data, but perhaps with a larger input dataset for training and tiling.
1. Acquire LiDAR data (e.g. from opentopography)
2. Run `python tile.py $TGT_OUT_DIR` to create a dataset in `$TGT_OUT_DIR`. The script will exit, and you should edit `$TGT_OUT_DIR/tile.toml`:
   - `input_dir` should point to the folder containing the laz files acquired in step 1 (e.g. `./data/laz/`)
   - `grid_dim` and `vox_size` should be chosen so that most of the scan fits within the height of `grid_dim`
   - `vox_size` should be chosen so that continuous surfaces produce continuous 'shells' of occupied voxels
   - `grid_dim` must be a power of two
   - Ensure that the relevant transforms are added (especially for smaller `.laz` scans)
   - Set `clean: true` if you need to clean outliers and noisy points from the input data (e.g. if you didn't use the 'exclude noise' option when acquiring the data)
   - See the note about `transforms` in `tile.py` and decide if you need to add any
3. Re-run `python tile.py $TGT_OUT_DIR` to create the tiled dataset (this will probably take some time)
4. Run `python train.py $TGT_WORK_DIR`, then edit `train.toml` so that:
   - `dataset_path` points to the dataset folder
   - `grid_dim` is a power of two, and less than or equal to the dataset `grid_dim`
   - You may need to change `max_samples` and/or `batch_size` depending on your hardware, dataset size, `grid_dim`, and model config (in the `[model]` section of the config file)
5. Re-run `python train.py $TGT_WORK_DIR` to start a training run. It'll create a `train_<TIMESTAMP>` folder which will contain a bunch of stuff, including the log file and snapshots of the model weights and model loss values during training.
   - If you hit out-of-memory errors, check the log file to see how much memory is being used, and modify `batch_size`, `max_samples`, or some of the model parameters to reduce memory consumption.
6. After training is finished, you can use:
   - `python examine_training_run.py $TGT_WORK_DIR` (repeating the usage pattern of editing `test.toml`) to see the loss graph
   - `python test_tile.py $TGT_WORK_DIR` to visualise individual original and encoded/recovered tiles
   - `python test_file.py $TGT_WORK_DIR --visualise` to compute some lossiness and compression ratio metrics, and to visualise the original file and the encoded/recovered file (with the file(s) specified in `test.toml`)
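The train.py steps above edit `train.toml`; a sketch of what an edited file might look like. Only keys mentioned above are shown, and every value is illustrative — generate the real defaults with `train.py` and adjust from there:

```toml
# Illustrative values only; the real defaults come from the rocnet package
dataset_path = "./data/dataset"  # folder created by tile.py
grid_dim = 64                    # power of two, <= the dataset grid_dim
max_samples = 10000              # reduce if you hit out-of-memory errors
batch_size = 32                  # likewise limited by GPU memory

[model]
# model parameters; see the generated defaults for the full set
```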
Set up the necessary software, pyenv, and get the dataset:

- Get this repository
- Check which versions of python and CUDA are installed, and check that the pytorch `index_url` in `setup.sh` is correct for your version of CUDA
- Run `setup.sh` to get all the prerequisites
- Transfer a dataset to the cluster (`$DATASET_DIR`)
To create a rocnet `$TGT_WORK_DIR`, follow the General Usage instructions and edit the resulting `train.toml`. Copy `slurm-template.sh` there, rename it to `slurm.sh`, and fill in all placeholders denoted by `{{double-curly-braces}}`.

To start a training run, `cd $TGT_WORK_DIR` and then `sbatch slurm.sh`, which will produce a slurm log file at `slurm-{job-num}.out` in `$TGT_WORK_DIR`. Use `squeue --me` to see the status of the job, how long it's been running, etc.
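The real template ships with this repo as `slurm-template.sh`. As a rough illustration only — the directives, values, placeholder names, and venv path below are assumptions, not the template's actual contents — a filled-in `slurm.sh` typically has this shape:

```shell
#!/bin/bash
#SBATCH --job-name={{run-name}}
#SBATCH --output=slurm-%j.out
#SBATCH --time=12:00:00
#SBATCH --gres=gpu:1

# Avoid thread oversubscription (see the OMP_NUM_THREADS note in General Usage)
export OMP_NUM_THREADS=1

# Activate the environment created by setup.sh, then start training
source .venv/bin/activate
python train.py {{tgt-work-dir}}
```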
