Synthetic Data
- Synthetic Data Sets as Gold Standard
- Sets with regular shapes and Gaussian noise
- Sets with irregular shapes and Poisson noise
- Sets with a single large irregular shape and Poisson noise
Every set of algorithms has to be validated, either against a hand-labelled gold standard or against synthetic data, where the ground truth is known.
A series of synthetic data sets that reproduce different behaviour characteristics of migrating neutrophils was generated in MATLAB. Each data set consisted of six artificial neutrophils that travelled for 98 time frames along paths with different tortuosity, times to activation and proximity to other neutrophils.
Numerous data sets of neutrophils in zebrafish were carefully observed before setting the characteristics. Six trajectories were manually determined by setting the (row, column) positions of the centroids at every time point for 98 time frames. Each trajectory was designed to represent a different neutrophil behaviour: some trajectories were strongly oriented and moved a uniform distance between time frames, whilst others were less uniform and moved at varying velocities; some were tortuous whilst others were straight. The trajectories of cells 1 and 2 collided several times in the second half of the time frames, whilst cells 3 and 4 collided at the beginning of the movement. Cell 6 migrated without meandering and then stopped at the end (which represents the wound area of an inflammation-based experiment), whilst cell 5 presented a delayed activation. The x,y,t trajectories can be downloaded in MATLAB format from --> HERE <--
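As an illustration of how a trajectory stored as one centroid per time frame can be characterised, the sketch below computes the step lengths between frames and a meandering ratio (net displacement divided by total path length). This ratio is only one common proxy for tortuosity and is an assumption here, not necessarily the measure used for these data sets. The function names and the two example tracks are hypothetical and are not the trajectories distributed with the data.

```python
import math

def step_lengths(track):
    """Euclidean distance between consecutive (row, column) centroids."""
    return [math.dist(a, b) for a, b in zip(track, track[1:])]

def meandering_ratio(track):
    """Net displacement divided by total path length.

    Close to 1 for a straight track, closer to 0 for a tortuous one.
    A track that never moves returns 0.0 by convention.
    """
    path_length = sum(step_lengths(track))
    if path_length == 0:
        return 0.0
    return math.dist(track[0], track[-1]) / path_length

# Hypothetical tracks over 98 time frames (not the published trajectories).
n_frames = 98
straight = [(10, 10 + 2 * t) for t in range(n_frames)]
zigzag = [(10 + (3 if t % 2 else 0), 10 + 2 * t) for t in range(n_frames)]

print("straight:", meandering_ratio(straight))
print("zigzag:  ", meandering_ratio(zigzag))
```

A uniform list of step lengths corresponds to the "uniform distance between time frames" behaviour, whereas a run of zero-length steps at the start of a track would correspond to a delayed activation.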
Each time frame consisted of a z-stack of 11 slices, each with 275 x 275 pixels, in which the neutrophils were formed by Gaussian distributions of higher intensity than the background. The shape and orientation of the Gaussians varied according to the displacement of the artificial neutrophils, i.e. they were round when the cells were static and elongated when in movement. The tracks with the Gaussians were saved as the gold standard. Five different data sets were then generated by adding varying levels of white Gaussian noise, so that the intensity distributions of the neutrophils and the background became increasingly similar. This is reflected by the decreasing values of the Bhattacharyya Distance (1.61, 1.25, 1, 0.66, 0.45) as defined by Coleman 1979.
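To make the link between noise level and class separability concrete, the following minimal sketch evaluates the Bhattacharyya distance between two univariate Gaussian distributions, using the standard closed form D = (1/4)(mu1 - mu2)^2 / (s1^2 + s2^2) + (1/2) ln((s1^2 + s2^2) / (2 s1 s2)). It assumes that neutrophil and background intensities can each be approximated by a single Gaussian, which may differ from how the distances quoted above were obtained. The intensity statistics and noise levels are hypothetical and are not intended to reproduce those values.

```python
import math

def bhattacharyya_gaussian(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between two univariate normal distributions."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    spread_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term

# Hypothetical intensity statistics (assumptions, not taken from the data sets).
cell_mean, cell_sd = 0.8, 0.1
background_mean, background_sd = 0.2, 0.1

def distance_after_noise(noise_sd):
    """Distance after adding independent zero-mean white Gaussian noise.

    Independent noise adds its variance to both classes and leaves
    the means unchanged.
    """
    noisy_cell_sd = math.hypot(cell_sd, noise_sd)
    noisy_background_sd = math.hypot(background_sd, noise_sd)
    return bhattacharyya_gaussian(
        cell_mean, noisy_cell_sd, background_mean, noisy_background_sd
    )

noise_levels = [0.0, 0.1, 0.2, 0.3, 0.4]  # hypothetical noise standard deviations
distances = [distance_after_noise(n) for n in noise_levels]
for n, d in zip(noise_levels, distances):
    print(f"noise sd {n:.1f}: distance {d:.3f}")
```

Because the noise widens both distributions without moving their means, the distance shrinks as the noise level grows, which is the behaviour the five data sets are designed to cover.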
The data sets are large zipped files (approx. 600 MB) hosted on Zenodo. If you use them separately, please cite their individual DOIs.