Set up logging so that error messages from the mocktail library are visible and debug messages are suppressed.

In [6]:
import json
import logging

from mocktail import mocktail, providers, serializers
from mocktail.fields.int import Int
from mocktail.fields.str import Str
from mocktail.providers import single_int

logging.basicConfig(level=logging.ERROR, format="~ %(message)s")

A data model can be defined as a plain Python class. Let's start with a simple one.

In [7]:
class User1:
    username: str
    age: int
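
For context, the annotations on a plain class like this one are available at runtime, which is presumably how a library can discover the fields it has to fill. The sketch below is a hypothetical stand-in and not mocktail's implementation: `fake_records` and `GENERATORS` are names invented here for illustration, and the value ranges are assumptions.

```python
import random
import string
from typing import get_type_hints


class User1:
    username: str
    age: int


# Hypothetical mapping from an annotation type to a random value factory.
GENERATORS = {
    str: lambda rng: "".join(rng.choices(string.ascii_letters, k=10)),
    int: lambda rng: rng.randint(-(2**63), 2**63 - 1),
}


def fake_records(model, count, seed=None):
    """Build `count` dicts with one random value per annotated field."""
    rng = random.Random(seed)
    hints = get_type_hints(model)  # {'username': str, 'age': int}
    return [
        {name: GENERATORS[tp](rng) for name, tp in hints.items()}
        for _ in range(count)
    ]


records = fake_records(User1, 3, seed=1)
print(records)
```

Unconstrained 64-bit integers and random letter strings are type-correct, but they are not yet meaningful as ages or usernames.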

To generate data, call the `mocktail` function with the model class and the number of records. Let's create 5 users.

In [8]:
users = mocktail(User1, 5)
print(json.dumps(users, indent=2))

[
  {
    "username": "lshiyTZhjT",
    "age": -7824922703287259689
  },
  {
    "username": "bizOlAvrvu",
    "age": 8054835154187599207
  },
  {
    "username": "HaajYLSGOV",
    "age": 3016212116900990693
  },
  {
    "username": "KUIiJjmrPE",
    "age": 8320385039784821487
  },
  {
    "username": "EEoIqxcsDN",
    "age": 3862723318852378904
  }
]


This returns the records as a list of dictionaries. The values are valid strings and integers, but the `username` and `age` values look odd. Let's customize the `age` field to make it a little more realistic.

In [9]:
class User2:
    username: str
    age = Int(min_value=0, max_value=100)

In [10]:
users = mocktail(User2, 5)
print(json.dumps(users, indent=2))

[
  {
    "age": 86,
    "username": "AvjxoORZBz"
  },
  {
    "age": 11,
    "username": "rgqDsJBncy"
  },
  {
    "age": 55,
    "username": "TvrgzCBcJr"
  },
  {
    "age": 34,
    "username": "YXuXlZBWha"
  },
  {
    "age": 5,
    "username": "OrYjtcayHR"
  }
]


But what about `username`? Can we do better?
Sure. Let's introduce the concept of providers. Providers are classes that generate the actual values.

In [11]:
class User3:
    username = Str(provider=providers.Username())
    age = Int(provider=providers.Age())

In [12]:
users = mocktail(User3, 5)
print(json.dumps(users, indent=2))

[
  {
    "username": "xblackwell",
    "age": 98
  },
  {
    "username": "allenkrystal",
    "age": 56
  },
  {
    "username": "lisa00",
    "age": 85
  },
  {
    "username": "nicole66",
    "age": 44
  },
  {
    "username": "davidhoffman",
    "age": 50
  }
]


Now realistic data is generated.

In technical terms, a provider is a `Generator[T, None, None] | Iterator[T]`. More of them are available. Let's add more fields to the user model.

In [13]:
class User4:
    first_name = Str(provider=providers.FirstName())
    last_name = Str(provider=providers.LastName())
    address = Str(provider=providers.Address())
    country = Str(provider=providers.Country())
    username = Str(provider=providers.Username())
    age = Int(provider=providers.Age())
    
print(json.dumps(mocktail(User4, 10), indent=2))

[
  {
    "first_name": "Anne",
    "last_name": "Stewart",
    "address": "92233 Carlos Springs\nHansenview, MT 30752",
    "country": "Angola",
    "username": "sandrajenkins",
    "age": 93
  },
  {
    "first_name": "Brittany",
    "last_name": "Miller",
    "address": "03406 Marc Lodge Apt. 829\nPort Vanessaville, VA 82269",
    "country": "Croatia",
    "username": "eric25",
    "age": 64
  },
  {
    "first_name": "Christine",
    "last_name": "Nelson",
    "address": "300 Huynh Rapid\nRyanport, NV 22191",
    "country": "Malaysia",
    "username": "tnewman",
    "age": 75
  },
  {
    "first_name": "Robert",
    "last_name": "Brown",
    "address": "632 Victoria Landing\nAnthonyborough, MO 77943",
    "country": "Holy See (Vatican City State)",
    "username": "obrandt",
    "age": 18
  },
  {
    "first_name": "Emily",
    "last_name": "Greene",
    "address": "394 Joseph Skyway\nNew Steven, AR 29979",
    "country": "Hungary",
    "username": "fowleramy",
    "age": 55
  },
 

In [14]:
users = mocktail(User4, 5)
print(json.dumps(users, indent=2))

[
  {
    "first_name": "Susan",
    "last_name": "Avery",
    "address": "17634 Elizabeth Field\nDavidstad, CO 61564",
    "country": "Croatia",
    "username": "matthewweaver",
    "age": 76
  },
  {
    "first_name": "David",
    "last_name": "Garner",
    "address": "637 Scott Rest\nJamesburgh, AK 68812",
    "country": "Bouvet Island (Bouvetoya)",
    "username": "xclark",
    "age": 24
  },
  {
    "first_name": "Jennifer",
    "last_name": "Wells",
    "address": "31689 Jessica Villages\nCainview, PR 85766",
    "country": "Christmas Island",
    "username": "qclark",
    "age": 68
  },
  {
    "first_name": "Leroy",
    "last_name": "Stone",
    "address": "8159 James Walk\nPort Heatherville, MD 67334",
    "country": "Bhutan",
    "username": "kgreen",
    "age": 89
  },
  {
    "first_name": "Kenneth",
    "last_name": "Hall",
    "address": "2622 Melinda Freeway Apt. 431\nKathyborough, NV 48404",
    "country": "Indonesia",
    "username": "barry77",
    "age": 99
  }
]
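
Since a provider is described above as a generator or iterator, a custom one could in principle be written as a plain generator function. The sketch below uses only the standard library; `working_age` is a hypothetical name, and the 18 to 65 range is an arbitrary choice for illustration.

```python
import itertools
import random
from typing import Iterator, Optional


def working_age(seed: Optional[int] = None) -> Iterator[int]:
    """Hypothetical provider: yields ages between 18 and 65 forever."""
    rng = random.Random(seed)
    while True:
        yield rng.randint(18, 65)


# Take the first five values from the infinite generator.
sample = list(itertools.islice(working_age(seed=42), 5))
print(sample)
```

If mocktail accepts any iterator, as the type above suggests, this might be passed as `Int(provider=working_age())`. That usage is an assumption and is not verified here.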


Another concept is serializers. Serializers let you customize the output format.
To create a CSV string, tell mocktail to use the `CsvString` serializer:

In [15]:
print(mocktail(User4, 5, serializer=serializers.CsvString()))

"first_name","last_name","address","country","username","age"
"Julia","Parker","013 Samantha Lodge
South Christopherberg, AL 40872","Sri Lanka","leonardcharlene",87
"Erin","Hanson","4526 Yang Crest Apt. 824
North Jessica, PR 40123","Jersey","danielle78",12
"Anna","Davis","08091 Robert Ridges
Michaelmouth, MI 10192","Timor-Leste","vmcdonald",63
"Gary","Carpenter","USNV Randall
FPO AA 63590","French Polynesia","qhardy",89
"Colin","King","255 Nguyen Center
East Jacobtown, IL 37714","Mali","yorkjo",20
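
Note that each address contains a line break, so one record spans two physical lines of output. That is valid CSV as long as the field is quoted, but it means the text cannot simply be split on newlines. A minimal sketch of reading such output back with the standard `csv` module, using two records copied from the output above:

```python
import csv
import io

# Two records copied from the output above; each address spans two lines.
csv_text = (
    '"first_name","last_name","address","country","username","age"\n'
    '"Julia","Parker","013 Samantha Lodge\n'
    'South Christopherberg, AL 40872","Sri Lanka","leonardcharlene",87\n'
    '"Erin","Hanson","4526 Yang Crest Apt. 824\n'
    'North Jessica, PR 40123","Jersey","danielle78",12\n'
)

# newline="" leaves line breaks untouched so the csv module can handle
# the ones that sit inside quoted fields.
rows = list(csv.DictReader(io.StringIO(csv_text, newline="")))

print(len(rows))
print(repr(rows[0]["address"]))
```

The reader returns every value as a string, so `age` would need an explicit `int(...)` conversion before numeric comparisons.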



You can also use the `CsvFile` serializer to write the generated data to a CSV file. The example below creates `test_output.csv` in the current directory; you can also pass a `pathlib.Path` pointing to the location of the file.

In [16]:
csv_file_serializer = serializers.CsvFile('test_output.csv')
mocktail(User4, 5, serializer=csv_file_serializer)

PosixPath('test_output.csv')

If your tests need an inline SQL CTE containing the mock data, you can use the `SqlCte` serializer. When creating the serializer instance, pass the alias for the generated CTE.

In [22]:
from mocktail.serializers._sql import SqlCte


class User5:
    username = Str(provider=providers.Username())
    first_name = Str(provider=providers.FirstName())
    last_name = Str(provider=providers.LastName())
    age = Int(provider=providers.Age())
    
print(mocktail(User5, 5, serializer=SqlCte("UserModel")))

WITH "UserModel" AS 
(SELECT username AS username, first_name AS first_name, last_name AS last_name, age AS age 
FROM (VALUES ('nunezbrandi', 'Jessica', 'Malone', 63), ('browngabrielle', 'Corey', 'Hughes', 61), ('amandaestes', 'Kristi', 'Garrison', 73), ('jacksonjoseph', 'Lindsay', 'Warner', 2), ('matthewmaynard', 'Joshua', 'Todd', 65)))
 SELECT "UserModel".username, "UserModel".first_name, "UserModel".last_name, "UserModel".age 
FROM "UserModel"


You can expect something like this (formatted for readability):
```sql
WITH "UserModel" AS (
    SELECT username AS username, first_name AS first_name, last_name AS last_name, age AS age 
    FROM (
        VALUES ('bsmith', 'Ryan', 'Smith', 18), ('jjenkins', 'Patricia', 'Sanchez', 89), ('derrick95', 'Kelly', 'Bradley', 21), ('joshua30', 'Colleen', 'Eaton', 93), ('agonzalez', 'John', 'Hendricks', 30)
    )
) 
SELECT "UserModel".username, "UserModel".first_name, "UserModel".last_name, "UserModel".age 
FROM "UserModel"
```
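
To illustrate how an inline CTE can serve as test data, here is a minimal sketch that runs one against an in-memory SQLite database. The query is hand-written in SQLite syntax and is not mocktail output: the dialect mocktail targets is not stated here, and the generated text may need adjusting for a given database. The rows are the sample values shown above.

```python
import sqlite3

# Hand-written CTE in SQLite syntax, using the sample rows shown above.
# SQLite names VALUES columns column1, column2, ..., so the column names
# are declared on the CTE itself instead of in an inner SELECT.
query = """
WITH "UserModel"(username, first_name, last_name, age) AS (
    VALUES ('bsmith', 'Ryan', 'Smith', 18),
           ('jjenkins', 'Patricia', 'Sanchez', 89),
           ('derrick95', 'Kelly', 'Bradley', 21),
           ('joshua30', 'Colleen', 'Eaton', 93),
           ('agonzalez', 'John', 'Hendricks', 30)
)
SELECT username, age FROM "UserModel" WHERE age >= 65 ORDER BY age
"""

connection = sqlite3.connect(":memory:")
seniors = connection.execute(query).fetchall()
connection.close()
print(seniors)
```

Because the data lives inside the query itself, the test needs no tables, fixtures, or cleanup.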

Other serializers are not implemented yet, but support for additional data file formats is planned.

What about testing edge cases in the data, such as an age below 0?

In [23]:
class User6:
    age = Int(provider=[
        (0.20, single_int(-1)), 
        (0.80, providers.Age())
    ])
    name = Str(provider=providers.FullName())

In this model, around 20% of the `age` values will be `-1` (which is obviously invalid for an age). With this data you can test how your code behaves when invalid data is seeded.
Here we create 1000 records and print the number of records where `age` is `-1`, which should be around 20% of them.

In [24]:
test_users = mocktail(User6, 1000)

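# Count outside the f-string: reusing double quotes inside a double-quoted
# f-string is a syntax error before Python 3.12.
invalid_count = sum(1 for user in test_users if user["age"] == -1)
print(f"Number of users with age equal to -1: {invalid_count}")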

Number of users with age equal to -1: 191


In the last model, you saw a new way to define providers: a list of tuples. In each tuple, the first element is a number between 0 and 1 giving the probability of using the provider instance given as the second element. This lets you introduce invalid data into your models for testing purposes.

Another feature you saw is the `single_int` provider, which, as the name suggests, returns the integer passed as its first argument every time it is called.
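
The weighted selection can be pictured with the standard library alone. The sketch below is an illustration under stated assumptions, not mocktail's implementation: `weighted` and `age_provider` are hypothetical names, `itertools.repeat(-1)` stands in for `single_int(-1)`, and the 0 to 100 age range is assumed.

```python
import itertools
import random
from typing import Iterator, List, Tuple


def age_provider(rng: random.Random) -> Iterator[int]:
    """Hypothetical stand-in for providers.Age(): ages from 0 to 100."""
    while True:
        yield rng.randint(0, 100)


def weighted(
    choices: List[Tuple[float, Iterator[int]]], rng: random.Random
) -> Iterator[int]:
    """Pick one provider per value, with probability given by its weight."""
    weights = [weight for weight, _ in choices]
    sources = [source for _, source in choices]
    while True:
        yield next(rng.choices(sources, weights=weights, k=1)[0])


rng = random.Random(7)
mixed = weighted([(0.20, itertools.repeat(-1)), (0.80, age_provider(rng))], rng)
ages = list(itertools.islice(mixed, 1000))
invalid = sum(1 for age in ages if age == -1)
print(f"{invalid} of {len(ages)} ages are -1")
```

Because the choice is random for each value, the share of `-1` values only approximates 20%, which matches the 191 out of 1000 seen above.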

Obviously, there are many more possibilities that are not ready yet. Stay tuned for updates.

Happy coding :)