This project is no longer under active maintenance. It is read-only, but you can still clone or fork the repo. Check here for further info. Please contact innereye_info@service.microsoft.com if you run into trouble with the "Archived" state of the repo.
InnerEye-CreateDataset contains tools to convert medical datasets in DICOM-RT format to NIFTI. Datasets converted using this tool can be consumed directly by InnerEye-DeepLearning.
The core features of this tool include:
- Resampling of the dataset to a common voxel size.
- Renaming of ground truth structures
- Making the structures in a dataset mutually exclusive (this is required by some loss functions in InnerEye-DeepLearning)
- Creating empty structures if they are missing from the dataset
- Discarding subjects that do not have all the required structures
- Augmenting the dataset by combining multiple structures into one, via set operations (intersection, union)
- Removing parts of structures that are lower/higher than other structures in terms of their z coordinate
- Computing statistics of a dataset, to identify outliers and possible annotation errors
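To make some of the structure operations above concrete, here is a minimal Python sketch. It is not the tool's implementation: it assumes each structure is modelled as a set of (x, y, z) voxel indices, and the function names, the structure names, and the priority-based rule for resolving overlaps are illustrative assumptions only.

```python
# Illustrative sketch only: structures are modelled as sets of (x, y, z) voxel indices.
# Function names and the overlap rule are assumptions, not the tool's actual API.

def make_mutually_exclusive(structures, priority):
    """Assign each voxel to the first structure in `priority` that contains it."""
    claimed = set()
    result = {}
    for name in priority:
        result[name] = structures[name] - claimed
        claimed |= result[name]
    return result


def remove_above(structure, reference):
    """Drop voxels whose z index is higher than every voxel of `reference`."""
    if not reference:
        return set(structure)
    z_max = max(z for _, _, z in reference)
    return {v for v in structure if v[2] <= z_max}


# Hypothetical example structures that overlap in two voxels.
structures = {
    "bladder": {(0, 0, 0), (0, 0, 1), (0, 0, 2)},
    "prostate": {(0, 0, 1), (0, 0, 2), (0, 0, 3)},
}
union = structures["bladder"] | structures["prostate"]
intersection = structures["bladder"] & structures["prostate"]
exclusive = make_mutually_exclusive(structures, priority=["bladder", "prostate"])
trimmed = remove_above(structures["prostate"], structures["bladder"])
```

In this sketch, voxels shared by both structures stay with whichever structure comes first in `priority`, so the resulting structures never overlap.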
Get the installer from Git for Windows
The installer will prompt you to "Select Components". Make sure that you tick the boxes for:
- Git LFS (Large File Support).
- Git Credential Manager for Windows.
After the installation, open a command prompt or the Git Bash:
- Run `git lfs install` to set up the hooks in git.
- Run `git config --global core.autocrlf true` to ensure that line endings are working as expected.
Clone the InnerEye-CreateDataset repository on your machine by running `git lfs clone --recursive https://github.com/microsoft/InnerEye-CreateDataset`
You need an installation of Visual Studio 2019. If you have an existing installation, start the Visual Studio Installer, click on "More..." -> "Modify".
In the "Workloads" section, the following items need to be selected:
- .NET Development
- Desktop development with C++
In the "Individual Components" section, make sure the following are ticked:
- .NET:
  - .NET 6.0 Runtime
  - .NET Core 3.1 Runtime (Long Term Support)
  - Everything with .NET Framework 4.6.2 (and all higher framework versions for good measure)
  - .NET SDK
- Compilers, build tools, and runtimes:
  - .NET Compiler Platform SDK
  - C++ 2019 Redistributable Update
  - MSVC v142 - VS 2019 C++ x64/x86 build tools (Latest)
  - C++ CMake tools for Windows
- Debugging and testing:
  - C++ AddressSanitizer
  - C++ profiling tools
- Development activities:
  - C++ core features
  - F# language support
- SDKs, libraries and frameworks:
  - C++ ATL for latest v142 build tools (x86 & x64)
  - Windows 10.0.19041.0
In addition to the components listed above, others may be installed as part of the selected workloads.
Then open the `Source\projects\CreateDataset.sln` solution.
You will see a dialog box suggesting that you upgrade two C++ projects to the latest toolset. Choose NOT to upgrade.
Make sure that the required nuget package sources are available for the solution:
- Open Tools->NuGet Package Manager->Package Manager Settings
- Choose NuGet Package Manager->Package Sources
- Add the following sources to the list, if they are not there:
- Select the above sources, and deselect others
Verify that all projects loaded correctly.
- In the Visual Studio menu, make sure that "Test" / "Test Settings" / "Default Processor Architecture" is set to x64.
- Build the solution ("Build" -> "Build Solution"). If it fails, build again.
To run tests: After the build, tests should be visible in the Test Explorer.
To use the tool, you will need a DICOM-RT dataset with the ground truth scans and RT-struct files describing the ground truth segmentations. The files for each subject should be in a separate folder. Within each subject folder, the tool also searches all subdirectories for files.
Now, create a parent folder called, for example, `datasets` and place your DICOM-RT dataset folder inside. The folder structure should resemble the following:
* datasets
  * DICOM-RT dataset
    * subject 1
      * DICOM files for subject 1
    * subject 2
      * DICOM files for subject 2
    * ...
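Before running the conversion, it can help to check that the folder layout matches the structure described above. The following is a minimal Python sketch, not part of InnerEye-CreateDataset. The function name `summarize_subjects` is hypothetical, and the sketch assumes that DICOM files carry a `.dcm` extension, which may not hold for every dataset.

```python
# Illustrative sketch: list subject folders and count DICOM files beneath each one.
# Assumption: DICOM files carry a ".dcm" extension; real datasets may not.
from pathlib import Path


def summarize_subjects(dataset_dir):
    """Return {subject folder name: number of .dcm files found recursively}."""
    summary = {}
    for subject in sorted(Path(dataset_dir).iterdir()):
        if subject.is_dir():
            # rglob searches all subdirectories of the subject folder as well.
            summary[subject.name] = sum(
                1
                for f in subject.rglob("*")
                if f.is_file() and f.suffix.lower() == ".dcm"
            )
    return summary
```

Calling `summarize_subjects(r"datasets\DICOM-RT dataset")` returns one entry per subject folder. A subject with a count of zero is worth a closer look before conversion.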
The simplest form of the command to run is
InnerEye.CreateDataset.Runner.exe dataset --datasetRootDirectory=<path to directory holding all datasets> --niftiDatasetDirectory=<name of the folder to write to> --dicomDatasetDirectory=<name of dataset to be converted>
- `datasetRootDirectory` is the path to a folder that holds one or more datasets.
- `dicomDatasetDirectory` is the name of the folder, in `datasetRootDirectory`, with the DICOM-RT dataset.
- `niftiDatasetDirectory` is the name of the folder to which the NIFTI dataset should be written. This folder will be created in `datasetRootDirectory`.
- One common switch is the `geoNorm` switch, which normalizes the dataset voxel sizes. It takes the sizes in millimeters for the x, y, and z dimensions, for example `--geoNorm 1;1;2`.
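When the conversion is launched from a script, the command can be assembled as an argument list. The following Python sketch is one way to do that. The helper names `format_geo_norm` and `build_dataset_command` and the example paths are hypothetical; the flags themselves are only the ones described in this README, written in the same form as shown above.

```python
# Illustrative sketch: build the argument list for the `dataset` command shown above.
# The helper names are hypothetical; the flags are the ones documented in this README.
import subprocess


def format_geo_norm(x_mm, y_mm, z_mm):
    """Format voxel sizes in millimeters as the `x;y;z` value used by --geoNorm."""
    return ";".join(f"{v:g}" for v in (x_mm, y_mm, z_mm))


def build_dataset_command(root, dicom_name, nifti_name, geo_norm=None,
                          exe="InnerEye.CreateDataset.Runner.exe"):
    """Return the command as a list of arguments, suitable for subprocess.run."""
    args = [
        exe,
        "dataset",
        f"--datasetRootDirectory={root}",
        f"--niftiDatasetDirectory={nifti_name}",
        f"--dicomDatasetDirectory={dicom_name}",
    ]
    if geo_norm is not None:
        args += ["--geoNorm", format_geo_norm(*geo_norm)]
    return args


# Example paths and folder names are placeholders.
command = build_dataset_command(
    r"C:\datasets", "DICOM-RT dataset", "nifti-dataset", geo_norm=(1, 1, 2)
)
# To actually run the conversion (requires the built executable):
# subprocess.run(command, check=True)
```

Passing the arguments as a list means no shell is involved, so the semicolons in the `geoNorm` value and any spaces in folder names need no extra quoting.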
A description of the major commandline options that control the dataset creation can be found here.
To analyse a dataset, run
InnerEye.CreateDataset.Runner.exe analyze --datasetFolder=<full path to the NIFTI dataset folder to analyse>
This will create a folder called `statistics` inside the dataset folder with several csv files containing dataset statistics.
A detailed explanation of the csv files is available here.
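For a quick look at what the analysis produced, the csv files can be listed with their headers. The following Python sketch makes no assumption about file or column names. It does assume that the files are comma-separated, UTF-8 encoded, and start with a header row; the function name `summarize_statistics` is hypothetical.

```python
# Illustrative sketch: list the csv files in a statistics folder with their headers.
# No file or column names are assumed; they are read from whatever the tool wrote.
# Assumptions: comma-separated, UTF-8 encoded, first row is a header.
import csv
from pathlib import Path


def summarize_statistics(dataset_folder):
    """Return {csv file name: (header row, number of data rows)}."""
    summary = {}
    for csv_path in sorted((Path(dataset_folder) / "statistics").glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        header = rows[0] if rows else []
        summary[csv_path.name] = (header, max(len(rows) - 1, 0))
    return summary
```

The returned dictionary gives, for each csv file, its column names and the number of data rows.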
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.