Configurable library for generating random and yet realistic datasets, compliant with a provided Avro schema and supporting JSON, Avro and YAML as output formats.

logaritex/data-generator

Data Generator

Documentation: www.logaritex.com/data-generator.

The DataGenerator library uses annotated Apache Avro schemas to generate random yet realistic datasets, with JSON, Avro and YAML as supported output formats.

The Avro Schemas can be annotated with Data Faker and SpEL expressions to adapt the generated content to any particular use-case or data model.

Data Generator also lets you configure dependencies between fields, either within a single schema or across different schemas.

Quick Start

Add the data-generator dependency to your project:

<dependency>
  <groupId>com.logaritex.data</groupId>
  <artifactId>data-generator</artifactId>
  <version>0.0.3-SNAPSHOT</version>
</dependency>

Create an Avro Schema with Data Faker and/or SpEL expressions in the doc attribute to hint at the desired field content:

namespace: io.simple.clicksteram
type: record
name: User
fields:
  - name: id
    type: string
    doc: "#{id_number.valid}"   # (1)
  - name: sendAt
    type:
      type: long
      logicalType: timestamp-millis
    doc: "[[T(System).currentTimeMillis()]]" # (2)
  - name: fullName
    type: string
    doc: "#{name.fullName}"
  - name: email
    type: string
    doc: "#{internet.emailAddress}"
  - name: age
    type: int
    doc: "#{number.number_between '8','80'}"
  1. Generate realistic random IDs using Data Faker's IdNumber provider.
  2. Generate a timestamp (now), using Spring Expression Language (SpEL) to call the Java static method java.lang.System.currentTimeMillis().
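
The schema above uses two annotation styles in the doc attribute: `#{...}` for Data Faker expressions and `[[...]]` for SpEL expressions. As a rough illustration of that convention only, the sketch below classifies a doc value by its delimiters. The class `DocExpressions`, its `Kind` enum, and the `classify` and `body` methods are hypothetical names invented for this example; this is not the library's own parser, and how the library treats mixed or plain-text doc values is not covered here.

```java
public class DocExpressions {

    /** Kinds of doc annotation seen in the example schema (hypothetical enum). */
    enum Kind { FAKER, SPEL, PLAIN }

    /** Classify a doc value by its delimiters: #{...} versus [[...]]. */
    static Kind classify(String doc) {
        if (doc == null) return Kind.PLAIN;
        String s = doc.trim();
        if (s.startsWith("[[") && s.endsWith("]]")) return Kind.SPEL;
        if (s.startsWith("#{") && s.endsWith("}")) return Kind.FAKER;
        return Kind.PLAIN;
    }

    /** Strip the delimiters and return the inner expression text. */
    static String body(String doc) {
        String s = doc.trim();
        switch (classify(s)) {
            case SPEL:  return s.substring(2, s.length() - 2);
            case FAKER: return s.substring(2, s.length() - 1);
            default:    return s;
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "#{id_number.valid}",
            "[[T(System).currentTimeMillis()]]",
            "#{number.number_between '8','80'}",
            "free-form description"
        };
        for (String d : docs) {
            System.out.println(classify(d) + " -> " + body(d));
        }
    }
}
```

Keeping the two delimiter styles distinct is what allows a single doc string to be routed either to a fake-data provider or to an expression evaluator.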

Run the DataGenerator with the user.yaml schema to generate a few data instances:

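import java.util.Iterator;
import org.apache.avro.generic.GenericData;
// DataGenerator and DataUtil are provided by the data-generator dependency.

Iterator<GenericData.Record> iterator =
    new DataGenerator(
        DataUtil.uriToSchema("file:/user.yaml"), // (1)
        3) // (2)
    .iterator();

// Print each generated record.
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}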
  1. Initialize the generator with the user.yaml schema.
  2. Number of instances to generate.
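
Because the generator is consumed through a plain java.util.Iterator, standard Java idioms apply to it. As a sketch, the helper below drains any iterator into a List by way of a Stream. The class name `IteratorDrain` and the stand-in iterator of strings are hypothetical, chosen so the example runs without the library on the classpath; the same helper should accept the Iterator<GenericData.Record> from the snippet above, though that has not been verified against the library.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IteratorDrain {

    /** Wrap an Iterator in a sequential Stream (elements are pulled lazily). */
    static <T> Stream<T> stream(Iterator<T> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    /** Collect every remaining element of the iterator into a List. */
    static <T> List<T> drain(Iterator<T> it) {
        return stream(it).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the generator's iterator: three placeholder records.
        Iterator<String> records = List.of("record-1", "record-2", "record-3").iterator();
        List<String> all = drain(records);
        System.out.println(all.size() + " records: " + all);
    }
}
```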

The result should look like this (the values are random and will differ between runs):

{ 
  "id": "263-73-3809", 
  "sendAt": 1645529931141, 
  "fullName": "Mohammed Goldner V", 
  "email": "joeann.glover@hotmail.com", 
  "age": 78
},
{ 
  "id": "360-46-4449", 
  "sendAt": 1645529931181, 
  "fullName": "Ms. Winston Gutmann", 
  "email": "louanne.kunze@yahoo.com", 
  "age": 13
}
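
The sendAt field uses the Avro timestamp-millis logical type, which stores a long counting milliseconds since the Unix epoch (UTC). A small sketch of decoding the first sample value with java.time follows; the class name `SendAtDecoder` is hypothetical and only the JDK is used.

```java
import java.time.Instant;

public class SendAtDecoder {

    /** Convert an Avro timestamp-millis value to a java.time.Instant. */
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        long sendAt = 1645529931141L; // first "sendAt" value from the sample output
        System.out.println(toInstant(sendAt)); // prints 2022-02-22T11:38:51.141Z
    }
}
```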

Next, follow the step-by-step user guide.

Features

For full documentation visit logaritex/data-generator.

Build:

./mvnw clean install -DskipTests
