Chromosoma

Just define your data DNA and generate your dataset.

High Level Overview

Input

Users only need to define the schema model configuration in a JSON file. Each field of the schema needs a name, a data type, and a set of rules.

The following is an example `schema.json` definition:

{
   "instances": 10,

   "output": "result",

   "format": "csv",

   "fields":[
      {
         "name":"name",
         "dataType":"string",
         "rules":[
            {
                "type": "set",
                "values": ["dave","simon"],
                "distribution": 1.0
            }
         ]
      },
      {
         "name":"age",
         "dataType":"int",
         "rules":[
            {
               "type":"set",
               "values":[
                  100
               ],
               "distribution":0.1
            },
            {
               "type":"range",
               "min":10,
               "max":99,
               "distribution":0.9
            }
         ]
      },
      {
         "name":"budget",
         "dataType":"decimal",
         "rules":[
            {
               "type":"set",
               "values":[
                  100
               ],
               "distribution":0.5
            },
            {
               "type":"range",
               "min":1,
               "max":10,
               "distribution":0.5
            }
         ]
      },
      {
         "name":"married",
         "dataType":"boolean",
         "rules":[
            {
               "type":"boolean",
               "false":0.0,
               "true":1.0
            }
         ]
      }
   ]
}

Output

result.csv

dave,64,1.3272667719937015,true
dave,66,100.0,true
simon,16,7.887171701724464,true
simon,100,100.0,true
dave,50,4.378826132850798,true
simon,48,100.0,true
simon,24,1.2484780989173947,true
simon,100,100.0,true
dave,37,100.0,true
dave,48,100.0,true
simon,81,9.032302178134143,true
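The CSV shown above has no header row, so it can help to pair each column with the field name from the schema when reading it back. The following is a minimal Python sketch, not part of Chromosoma. It assumes that columns appear in the same order as the `fields` array, which the sample output suggests but the project does not state. The `label_rows` helper is hypothetical.

```python
import csv
import io

def label_rows(schema, csv_text):
    """Pair each CSV value with the field name declared at the same position in the schema."""
    # Assumption: column order matches the order of schema["fields"].
    names = [field["name"] for field in schema["fields"]]
    reader = csv.reader(io.StringIO(csv_text))
    # Values stay as strings; converting them by dataType is left out of this sketch.
    return [dict(zip(names, row)) for row in reader if row]

schema = {
    "fields": [
        {"name": "name"},
        {"name": "age"},
        {"name": "budget"},
        {"name": "married"},
    ]
}
sample = "dave,64,1.3272667719937015,true\nsimon,100,100.0,true\n"
rows = label_rows(schema, sample)
print(rows[0])
```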

Supported Field Types

string
int
decimal
boolean
date (TODO)

Rules

Every field comes with a set of rules, and every rule comes with a distribution. The generation engine uses the distributions you define to decide how often each rule is applied when modeling your data. The sum of the rule distributions for a single field should be equal to 1.
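Before running the generator, a schema can be checked so that the rule weights of each field add up to 1. The sketch below is a hypothetical Python helper, not a Chromosoma feature. It assumes that a boolean rule contributes its `false` and `true` values as weights, since that rule type has no `distribution` key in the examples.

```python
import math

def check_distributions(schema, tolerance=1e-9):
    """Return (field name, total) pairs for fields whose weights do not sum to 1."""
    problems = []
    for field in schema["fields"]:
        total = 0.0
        for rule in field["rules"]:
            if rule["type"] == "boolean":
                # Assumption: the false/true values act as the weights of a boolean rule.
                total += rule["false"] + rule["true"]
            else:
                total += rule["distribution"]
        if not math.isclose(total, 1.0, abs_tol=tolerance):
            problems.append((field["name"], total))
    return problems

good = {"fields": [{"name": "age", "rules": [
    {"type": "set", "values": [100], "distribution": 0.1},
    {"type": "range", "min": 10, "max": 99, "distribution": 0.9},
]}]}
bad = {"fields": [{"name": "budget", "rules": [
    {"type": "set", "values": [100], "distribution": 0.1},
    {"type": "range", "min": 1, "max": 10, "distribution": 0.5},
]}]}
print(check_distributions(good))
print(check_distributions(bad))
```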

Supported string rules

String set: the string to be generated is randomly selected from the `values` set

{
  "name": "first name",
  "dataType": "string",
  "rules": [
      {
        "type": "set",
        "values": ["dave","simon"],
        "distribution": 1.0
      }
    ]
}

In the example above, all the first names will be equal to dave or simon.

Supported int rules

Integer set: the integer to be generated is randomly selected from the `values` set
Range: the integer to be generated is randomly selected between `min` and `max`

{
     "name":"age",
     "dataType":"int",
     "rules":[
        {
           "type":"set",
           "values":[
              100
           ],
           "distribution":0.1
        },
        {
           "type":"range",
           "min":10,
           "max":99,
           "distribution":0.9
        }
     ]
}

In the example above, ~10% of your ages will be equal to 100 and ~90% of your ages will be between 10 and 99.
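One way to picture how such rules could be applied is weighted selection: pick a rule according to its distribution, then draw a value from that rule. The following Python sketch illustrates the idea only. It is not Chromosoma's actual engine, the `generate_int` function is hypothetical, and treating both range bounds as inclusive is an assumption.

```python
import random

def generate_int(rules, rng):
    """Pick one rule weighted by its distribution, then draw a value from it."""
    weights = [rule["distribution"] for rule in rules]
    rule = rng.choices(rules, weights=weights, k=1)[0]
    if rule["type"] == "set":
        return rng.choice(rule["values"])
    if rule["type"] == "range":
        # Assumption: both bounds are inclusive; the README does not say.
        return rng.randint(rule["min"], rule["max"])
    raise ValueError(f"unsupported rule type: {rule['type']}")

age_rules = [
    {"type": "set", "values": [100], "distribution": 0.1},
    {"type": "range", "min": 10, "max": 99, "distribution": 0.9},
]
rng = random.Random(42)  # fixed seed so repeated runs give the same sample
ages = [generate_int(age_rules, rng) for _ in range(10_000)]
share_of_100 = ages.count(100) / len(ages)
print(round(share_of_100, 2))
```

With a sample this large, the share of ages equal to 100 should land close to the configured 0.1, while every other value falls within the range.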

Supported decimal rules

Same as integer rules (specialisation will be implemented soon).

Supported boolean rules

Boolean: just define the false and true distribution values

{
     "name":"married",
     "dataType":"boolean",
     "rules":[
        {
           "type":"boolean",
           "false":0.0,
           "true":1.0
        }
     ]
}

In the example above, all the married rows will be equal to true.
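A boolean rule can be read as the probability of emitting each value. The Python sketch below shows one way to sample from such a rule. It is an illustration under that assumption rather than the project's implementation, and `generate_boolean` is a hypothetical name.

```python
import random

def generate_boolean(rule, rng):
    """Return True with the probability stored under the rule's "true" key."""
    # rng.random() lies in [0.0, 1.0), so a weight of 1.0 always yields True
    # and a weight of 0.0 always yields False.
    return rng.random() < rule["true"]

married_rule = {"type": "boolean", "false": 0.0, "true": 1.0}
rng = random.Random(7)
married = [generate_boolean(married_rule, rng) for _ in range(1_000)]
print(all(married))
```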

Supported Output Format

CSV (with , separator)
AVRO
JSON (TODO)
JDBC (TODO)
REST (TODO)

Generation Engine Modeling

How to run

Standalone is currently the only supported mode.

A more interactive running mode will be developed soon.

git clone https://github.com/holydrinker/chromosoma.git
cd chromosoma
sbt assembly
java -jar target/scala-2.12/chromosoma-assembly-0.1.0.jar <path-to-schema>.json
