Dataset generator for LaTeX with bounding boxes

kakainet/TexSet

TexSet

Generate your own dataset

Preparation

  • Modify config/job.json
{
    "max-depth": 4, # maximum depth of elements
    "parts": 4, # number of threads
    "samples-in-part": 10, # number of elements generated by one thread when level augmentation is off
    "level-augmentation": true, # repeat the same job for depth = 0, 1, 2, ..., max-depth - 1, each depth handled by the next thread
    "deeper-chance": 1 # chance that the recursion building a generating tree goes deeper; always bounded by max-depth
}
  • Modify config/aug_job.json if you need augmentation
{
    "threads": 12,
    "samples-lvl-percent": { # share of the samples from each level that will be augmented
        "1": 0.2,
        "2": 0.1,
        "3": 0.05
    }
}
  • Modify config/rescale.json
{
    "1": 0.5, # Expressions with depth 1 are treated as if they were 50% of the original size.
    "2": 0.35, # Expressions with depth 2 are treated as if they were 35% of the original size.
    "3": 0.2, # ...
    "4": 0.15 # Leaves are treated as if they were 15% of the original size.
}
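
The `#` annotations in the three listings above are explanatory only; standard JSON has no comment syntax. Assuming the toolkit reads these files with an ordinary JSON parser (the README does not say), a minimal sketch for writing comment-free versions could look like the following. The helper name `write_configs` is hypothetical and not part of TexSet; the values are copied from the listings above.

```python
import json
from pathlib import Path

# Values copied from the README listings; adjust them to your needs.
JOB = {
    "max-depth": 4,
    "parts": 4,
    "samples-in-part": 10,
    "level-augmentation": True,
    "deeper-chance": 1,
}
AUG_JOB = {
    "threads": 12,
    "samples-lvl-percent": {"1": 0.2, "2": 0.1, "3": 0.05},
}
RESCALE = {"1": 0.5, "2": 0.35, "3": 0.2, "4": 0.15}


def write_configs(config_dir="config"):
    """Write the three config files as plain JSON (hypothetical helper)."""
    out = Path(config_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = (
        ("job.json", JOB),
        ("aug_job.json", AUG_JOB),
        ("rescale.json", RESCALE),
    )
    for name, data in files:
        # json.dumps emits valid JSON, so no '#' comments end up in the file.
        (out / name).write_text(json.dumps(data, indent=4) + "\n")
    return out
```

Calling `write_configs()` from the repository root would overwrite `config/job.json`, `config/aug_job.json` and `config/rescale.json`, so keep a copy of the originals if you have edited them by hand.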

Generation step

python3 toolkit/generate_dataset.py --job config/job.json --aug-job config/aug_job.json 

WARNING: parts * ([level-augmentation ? (max-depth)! : 1]) threads will be created.
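
To get a feel for the warning before launching a job, the thread count can be estimated from `config/job.json`. The formula is ambiguous as written: `(max-depth)!` may mean a factorial, while the `level-augmentation` comment (one job per depth 0, 1, ..., max-depth - 1) suggests a factor of `max-depth`. The sketch below computes both readings and does not claim either is what the toolkit actually does; `thread_counts` is a hypothetical helper, not part of TexSet.

```python
import math


def thread_counts(job):
    """Return the thread count under both readings of the README warning."""
    parts = job["parts"]
    if not job["level-augmentation"]:
        # Without level augmentation both readings give `parts` threads.
        return {"literal_factorial": parts, "one_job_per_depth": parts}
    depth = job["max-depth"]
    return {
        # Reading 1 (assumption): '!' in the warning is a factorial.
        "literal_factorial": parts * math.factorial(depth),
        # Reading 2 (assumption): one job per depth level 0 .. max-depth - 1.
        "one_job_per_depth": parts * depth,
    }


example_job = {"max-depth": 4, "parts": 4, "level-augmentation": True}
print(thread_counts(example_job))
```

With the example values from `config/job.json` (`parts` = 4, `max-depth` = 4), the two readings give 96 and 16 threads respectively, so it is worth starting with small values and checking the actual count on your machine.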
