plaitpy - a fake data modeler
plaitpy is a program for generating fake data from composable yaml templates.

The idea behind plaitpy is that it should be easy to model fake data that has an interesting shape. Currently, many fake data generators model their data as a collection of independent and identically distributed (IID) variables; with plaitpy we can stitch those variables together into a more coherent model.

some example uses for plaitpy are:

  • generating mock application data in test environments
  • validating the usefulness of statistical techniques
  • creating synthetic datasets for performance tuning databases


features

  • declarative syntax
  • use basic faker.rb fields with #{} interpolators
  • sample and join data from CSV files
  • lambda expressions, switch and mixture fields
  • nested and composable templates
  • static variables and hidden fields

an example template

# a person generator
define:
  min_age: 10
  minor_age: 13
  working_age: 18

fields:
  age:
    random: gauss(25, 5)
    # minimum age is $min_age
    finalize: max($min_age, value)

  gender:
    mixture:
      - value: M
      - value: F

  name: "#{}"

  job:
    value: "#{job.title}"
    onlyif: this.age > $working_age

  address:
    template: address/usa.yaml

  phone: # add a phone if the person is older than the minor age
    template: device/phone.yaml
    onlyif: this.age > ${minor_age}

  # we model our height as a gaussian that varies based on
  # age and gender
  height:
    lambda: this._base_height * this._age_factor

  _base_height:
    switch:
      - onlyif: this.gender == "F"
        random: gauss(60, 5)
      - onlyif: this.gender == "M"
        random: gauss(70, 5)

  _age_factor:
    switch:
      - onlyif: this.age < 15
        lambda: 1 - (20 - (this.age + 5)) / 20
      - default:
        value: 1
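
To make the dependencies in the template above concrete, here is a rough plain-Python sketch of the same person model. It is not plaitpy's implementation: the function name `gen_person`, the equal weighting of the two gender values, and the placeholder job title are assumptions made for illustration only.

```python
import random

MIN_AGE = 10       # mirrors min_age in the template
WORKING_AGE = 18   # mirrors working_age in the template


def gen_person(rng):
    """Generate one record, computing each field after the fields it reads."""
    # "random" followed by "finalize": draw from a gaussian, then clamp
    age = max(MIN_AGE, rng.gauss(25, 5))

    # "mixture": pick one of the listed values (equal weights are assumed here)
    gender = rng.choice(["M", "F"])

    # "switch": the base height distribution depends on gender
    base_height = rng.gauss(60, 5) if gender == "F" else rng.gauss(70, 5)

    # "switch" with a default: scale height down for ages under 15
    age_factor = 1 - (20 - (age + 5)) / 20 if age < 15 else 1

    # the two helper values are left out of the record, like underscore fields
    record = {"age": age, "gender": gender, "height": base_height * age_factor}

    # "onlyif": the job field only exists above the working age
    if age > WORKING_AGE:
        record["job"] = "placeholder job title"  # stands in for a faker lookup
    return record


rng = random.Random(0)  # seeded so repeated runs give the same records
people = [gen_person(rng) for _ in range(1000)]
```

The point of the sketch is that height is not sampled on its own: it is derived from gender and age, which is what separates this kind of model from a set of IID columns.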

how it's different

some specific examples of what plaitpy can do:

  • generate proportional populations using census data and CSVs
  • create realistic zipcodes by state, city or region (also using CSVs)
  • create a taxi trip dataset with a cost model based on geodistance
  • add seasonal patterns (daily, weekly, etc) to data
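
As an illustration of the taxi trip item above, the sketch below shows in plain Python what a cost model based on geodistance might look like. It does not use plaitpy and does not reproduce any of its templates; the bounding box, fare constants, and function names are all hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def gen_trip(rng, base_fare=2.5, per_km=1.6):
    """Generate one trip whose cost depends on the pickup-to-dropoff distance."""
    # hypothetical bounding box for pickup and dropoff points
    pickup = (rng.uniform(40.70, 40.80), rng.uniform(-74.02, -73.93))
    dropoff = (rng.uniform(40.70, 40.80), rng.uniform(-74.02, -73.93))
    distance = haversine_km(*pickup, *dropoff)

    # multiplicative noise so equal distances do not always cost the same;
    # clamped at zero so the cost never drops below the base fare
    noise = max(rng.gauss(1.0, 0.05), 0.0)
    cost = base_fare + per_km * distance * noise
    return {"pickup": pickup, "dropoff": dropoff, "distance_km": distance, "cost": cost}


rng = random.Random(1)  # seeded for repeatable output
trips = [gen_trip(rng) for _ in range(100)]
```

The cost field is a function of two other generated fields plus noise, so the resulting dataset has a relationship between distance and price that a query or model can actually recover.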



installation

# install with python
pip install plaitpy

# or with pypy
pypy-pip install plaitpy

cloning the repo for development

git clone

# get the fakerb repo
git submodule init
git submodule update

generating records from command line

specify a template as a yaml file, then generate records from that yaml file.

# a simple example (if cloning repo)
python templates/timestamp/uniform.yaml

# if is installed via pip templates/timestamp/uniform.yaml

generating records from API

import plaitpy

# load a template from a yaml file
t = plaitpy.Template("templates/timestamp/uniform.yaml")

# generate a single record, then a batch of 10
print(t.gen_record())
print(t.gen_records(10))

looking up faker fields

plaitpy also simplifies looking up faker fields:

# list faker namespaces --list
# lookup faker namespaces --lookup name

# lookup faker keys
# (-ll is short for --lookup) --ll name.suffix


yaml file commands

  • see docs/


  • see docs/
  • also see templates/ dir


  • see docs/

future direction

Currently, plaitpy models independent markov processes. Modeling processes that can interact with each other still needs investigation.

If you have ideas on features to add, open an issue. Feedback is appreciated!