Parallel Domain Educational Suite

The Parallel Domain platform contains a rich and diverse set of tools for generating and using synthetic data to train machine learning models in the autonomous vehicle and drone spaces. Data Lab lets us configure and capture scenarios that provide valuable training data for a given use case, while the PD-SDK makes it easy to load and interact with this output, alongside well-known public datasets, in a unified development space of Python objects, ready to be loaded into training pipelines and encoded into our chosen ontology and file schema. The PD-SDK also includes tools to evaluate our datasets at a statistical level, helping to guide the tuning of data generation parameters toward the most potent training data for our use case.

The recommended workflow when using this repository is to first follow the Data Lab educational notebooks, which will empower the user to produce Parallel Domain data for themselves and give introductory exposure to the PD-SDK in the process. Then, once the user is comfortable generating datasets, they can dig deeper into the features offered by the PD-SDK to get the most out of their data quickly by becoming familiar with the structure of the Python objects and the methods they offer, including comparing their data quantitatively against their real-world data by calculating statistics relevant to their use case (e.g. classwise pixel distributions for semantic segmentation model training).
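As a concrete illustration of the kind of statistic mentioned above, the sketch below computes a classwise pixel distribution from semantic segmentation masks using plain NumPy. It is a minimal, library-agnostic example; the mask arrays and class IDs are made-up illustrations, not PD-SDK API.

```python
import numpy as np

def classwise_pixel_distribution(masks, num_classes):
    """Fraction of pixels belonging to each class across a set of
    semantic segmentation masks (HxW arrays of integer class IDs,
    assumed to lie in the range [0, num_classes))."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        # bincount tallies occurrences of each class ID in the mask
        counts += np.bincount(mask.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Toy example: two 4x4 masks with hypothetical class IDs 0..2
rng = np.random.default_rng(0)
masks = [rng.integers(0, 3, size=(4, 4)) for _ in range(2)]
print(classwise_pixel_distribution(masks, num_classes=3))
```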

Data Lab Notebooks

This section contains notebooks demonstrating all things Data Lab, organized by use case. New users can follow the guided tour outlined on the README page to quickly familiarize themselves with the functionality of Data Lab, starting simple and building in use case complexity along the way. By the end of the series, the user should be comfortable using Data Lab to produce data tailored to their own use case, with full control over the environment as well as the placement and behavior of the ego and other agents within it. To begin, we utilize generators that guide the behavior of the native Parallel Domain simulation stack, then expand on this by mixing in agents that obey custom simulation behaviors constructed by the user (the general idea is sketched below). These concepts then unlock more advanced usage of Data Lab, including the powerful generative AI capabilities of Reactor, which allows us to place custom objects into our scenes via textual prompting, providing valuable training examples of difficult-to-record agent types (e.g. pedestrians with strollers or using mobility devices).
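The custom-behavior concept can be pictured as follows. This is a purely illustrative, self-contained sketch of an agent whose behavior is advanced step by step by a simulation loop; none of these class or method names are actual Data Lab API, which is covered in the notebooks themselves.

```python
from dataclasses import dataclass

# Purely illustrative stand-in for the Data Lab notion of a custom
# agent behavior that the simulation advances once per frame.
@dataclass
class Pose:
    x: float = 0.0
    y: float = 0.0

class ConstantVelocityBehavior:
    """Hypothetical behavior: move the agent forward at a fixed speed."""
    def __init__(self, speed_mps: float):
        self.speed_mps = speed_mps

    def update(self, pose: Pose, dt: float) -> Pose:
        return Pose(x=pose.x + self.speed_mps * dt, y=pose.y)

# Toy simulation loop stepping the behavior at 10 Hz for 5 frames
pose, behavior = Pose(), ConstantVelocityBehavior(speed_mps=2.0)
for frame in range(5):
    pose = behavior.update(pose, dt=0.1)
    print(frame, pose)
```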

PD-SDK Notebooks

The PD-SDK provides an intuitive interface for loading and interacting with machine learning datasets, but achieving full mastery over its rich and diverse features is easier with a guided tour, which once again reduces onboarding time and streamlines workflows. This section aims to do just that by shifting the focus from dataset generation to interacting with premade datasets using the PD-SDK for various use cases. We will look at using decoders to load datasets into Python objects, the features offered by these objects, and how we can use those features to work with our data easily and effectively. We will then look at the PD-SDK's capabilities for visualizing simulation files as well as rendered data and associated statistics. Finally, we review the use of encoders, which allow us to manipulate the ontology and annotation rules of our data so that there is no label-space domain gap between our synthetic and real data, and to write the data back out to disk in a file schema of our choice, which can be desirable for easy loading into established training pipelines.
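To give a feel for the decoder workflow described above, here is a minimal sketch built around the PD-SDK's decode_dataset helper. The dataset path and format string are placeholders, and the exact attribute and method names (scene_names, frame_ids, camera_frames, class_ids) are assumptions to be checked against the PD-SDK documentation and notebooks.

```python
from paralleldomain.decoding.helper import decode_dataset
from paralleldomain.model.annotation import AnnotationTypes

# Placeholder path and format; PD-SDK ships decoders for several
# dataset formats (DGP shown here as an assumed example)
dataset = decode_dataset(dataset_path="/path/to/dataset", dataset_format="dgp")

# Datasets decompose into scenes, scenes into frames, and frames into
# per-sensor frames carrying images and annotations
scene = dataset.get_scene(scene_name=next(iter(dataset.scene_names)))
frame = scene.get_frame(frame_id=next(iter(scene.frame_ids)))
camera_frame = next(iter(frame.camera_frames))

rgb = camera_frame.image.rgb  # rendered pixels as a NumPy array
semseg = camera_frame.get_annotations(annotation_type=AnnotationTypes.SemanticSegmentation2D)
print(rgb.shape, semseg.class_ids.shape)
```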


Usage Instructions


For first install and when getting updates:

  1. Install Git LFS (Large File Storage) by following the instructions here: Git LFS Installation Docs
  2. Clone this repo locally and navigate to it in a terminal
  3. Create a fresh Python virtual environment (venv) for working with this repo by running python -m venv ./venv
  4. Activate your venv in your terminal by running one of the following:
    • Windows PowerShell: ./venv/Scripts/activate.ps1
    • Linux: source ./venv/bin/activate
  5. Install requirements to your venv by running pip install . in your terminal

For each use:

  1. Navigate to this repo in a terminal
  2. Activate your venv:
    • Windows PowerShell: ./venv/Scripts/activate.ps1
    • Linux: source ./venv/bin/activate
  3. Ensure the PYTHONPATH environment variable is set to an empty string:
    • Windows Powershell: $env:PYTHONPATH=""
    • Linux: export PYTHONPATH=""
  4. Run jupyter notebook
    • Note that this launches a classic Jupyter Notebook server and a browser window. Optionally, you can use JupyterLab or VS Code for more advanced notebook features
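After setup, a quick check like the one below confirms the environment is wired up correctly. It assumes the requirements installed by pip install . include the paralleldomain package (the PD-SDK); adjust if this repo's requirements differ.

```python
# Run inside the activated venv (e.g. in a notebook cell) to confirm
# the install succeeded; prints where the package was installed
import paralleldomain
print(paralleldomain.__file__)
```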
