Set up the logging configuration so that we see error messages from the mocktail library and ignore debug messages.

In [31]:
import json
import logging

from mocktail import mocktail, providers, serializers
from mocktail.fields.int import Int
from mocktail.fields.str import Str
from mocktail.providers import single_int

logging.basicConfig(level=logging.ERROR, format="~ %(message)s")

A data model can be defined as a plain Python class. Let's start with a simple one.

In [32]:
class User1:
    username: str
    age: int
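
For intuition, here is a minimal sketch of how a class that carries only type annotations can drive data generation. It does not use mocktail and is not its implementation; `random_value` and `fake_records` are hypothetical helpers, and the value ranges are arbitrary assumptions.

```python
import random
import string
from typing import get_type_hints


class User1:
    username: str
    age: int


def random_value(annotation: type) -> object:
    """Hypothetical helper: return a random value for a supported annotation."""
    if annotation is str:
        return "".join(random.choices(string.ascii_letters, k=10))
    if annotation is int:
        return random.randint(-1000, 1000)  # arbitrary range, chosen for illustration
    raise TypeError(f"unsupported annotation: {annotation!r}")


def fake_records(model: type, count: int) -> list[dict]:
    """Hypothetical helper: build `count` dictionaries from the model's annotations."""
    hints = get_type_hints(model)  # {"username": str, "age": int}
    return [
        {name: random_value(tp) for name, tp in hints.items()}
        for _ in range(count)
    ]


records = fake_records(User1, 3)
print(records)
```

Unconstrained values like these are valid for their type but rarely realistic.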

To generate data, call the `mocktail` function with the model class and the number of records. Let's create 5 users.

In [33]:
users = mocktail(User1, 5)
print(json.dumps(users, indent=2))

This creates the data records as a list of dictionaries. While the values are valid strings and integers, the `username` and `age` values look odd. Let's customize the `age` field to make it a little more realistic.

In [36]:
class User2:
    username: str
    age = Int(min_value=0, max_value=100)

In [37]:
users = mocktail(User2, 5)
print(json.dumps(users, indent=2))

[
  {
    "age": 64,
    "username": "fErJqWacCH"
  },
  {
    "age": 27,
    "username": "FULHKCpAXg"
  },
  {
    "age": 60,
    "username": "bmWLdyziwb"
  },
  {
    "age": 4,
    "username": "AuRyngZTaz"
  },
  {
    "age": 8,
    "username": "BRiZFSPldb"
  }
]


But what about `username`? Can we do better?
Sure, let's introduce the concept of providers. Providers are classes that generate the actual values.

In [38]:
class User3:
    username = Str(provider=providers.Username())
    age = Int(provider=providers.Age())

[
  {
    "username": "robin99",
    "age": 12
  },
  {
    "username": "fmarquez",
    "age": 75
  },
  {
    "username": "ybradford",
    "age": 31
  },
  {
    "username": "ruth82",
    "age": 35
  },
  {
    "username": "aaronkelly",
    "age": 23
  }
]


In [None]:
users = mocktail(User3, 5)
print(json.dumps(users, indent=2))

Now we can see that realistic data is generated.

In technical terms, a provider is a `Generator[T, None, None] | Iterator[T]`. More providers are available, so let's add more fields to the user.

In [49]:
class User4:
    first_name = Str(provider=providers.FirstName())
    last_name = Str(provider=providers.LastName())
    address = Str(provider=providers.Address())
    country = Str(provider=providers.Country())
    username = Str(provider=providers.Username())
    age = Int(provider=providers.Age())
    
print(json.dumps(mocktail(User4, 10), indent=2))

[
  {
    "first_name": "Stacy",
    "last_name": "Parsons",
    "address": "Unit 7904 Box 4380\nDPO AE 01026",
    "country": "Slovenia",
    "username": "dwaynehensley",
    "age": 12
  },
  {
    "first_name": "Travis",
    "last_name": "Singh",
    "address": "75557 Robert Lock Suite 176\nNew Michelechester, MA 15641",
    "country": "Congo",
    "username": "whiteshawn",
    "age": 73
  },
  {
    "first_name": "Thomas",
    "last_name": "Aguilar",
    "address": "PSC 5766, Box 9803\nAPO AP 09919",
    "country": "Martinique",
    "username": "tylercraig",
    "age": 58
  },
  {
    "first_name": "Benjamin",
    "last_name": "Espinoza",
    "address": "092 Thomas Isle Suite 601\nSouth Gabriellehaven, CT 25210",
    "country": "Saint Martin",
    "username": "paynedavid",
    "age": 40
  },
  {
    "first_name": "Cody",
    "last_name": "Morales",
    "address": "722 Melissa Lights Apt. 950\nFloresborough, MH 79424",
    "country": "Aruba",
    "username": "jon71",
    "age": 52
  

In [None]:
users = mocktail(User4, 5)
print(json.dumps(users, indent=2))
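
The description of a provider as a `Generator[T, None, None] | Iterator[T]` can be illustrated with a plain Python generator. This sketch is independent of mocktail: `even_age` is a hypothetical provider, and whether the library accepts a bare generator like this in `provider=` is an assumption this notebook does not verify.

```python
import itertools
import random
from typing import Iterator


def even_age() -> Iterator[int]:
    """Hypothetical provider: an endless stream of even ages from 0 to 100."""
    while True:
        yield random.randrange(0, 101, 2)


provider = even_age()
# Pull five values, the way a consumer would pull one value per record.
sample = list(itertools.islice(provider, 5))
print(sample)
```

Because the stream is infinite, the consumer decides how many values to take.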

Another concept is serializers. Serializers let you customize the output format.
To create a CSV string, tell mocktail to use the `CsvString` serializer like this:

In [50]:
print(mocktail(User4, 5, serializer=serializers.CsvString()))

"first_name","last_name","address","country","username","age"
"Pamela","Green","710 Parker Summit
Port Michaelstad, ME 20576","Ecuador","davidgoodman",5
"John","Lucas","398 Allison Stravenue
Lake Crystal, MT 99783","Nicaragua","richardedwards",17
"Shelly","Clark","7507 Roberts Corners
South Alisonland, HI 08998","Denmark","danielholmes",33
"Christopher","Mitchell","Unit 1232 Box 0443
DPO AP 65967","Guyana","uballard",84
"Richard","Stephens","034 Parker Gardens
Lake Keithfurt, NJ 48335","Djibouti","kbradshaw",6
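
The quoting style in the output above (strings quoted, integers left bare) matches what the standard `csv` module produces with `QUOTE_NONNUMERIC`. The sketch below reproduces that format from a list of dictionaries; it only illustrates the format and makes no claim about how `CsvString` is implemented. The sample rows are made up for the example.

```python
import csv
import io

rows = [
    {"username": "bsmith", "age": 18},
    {"username": "jjenkins", "age": 89},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=list(rows[0]),
    quoting=csv.QUOTE_NONNUMERIC,  # quote every non-numeric field
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
# "username","age"
# "bsmith",18
# "jjenkins",89
```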



You can also use the `CsvFile` serializer to write the generated data to a CSV file. This creates a `test_output.csv` file in the same directory; alternatively, you can pass a `pathlib.Path` pointing to the location of the file.

In [60]:
csv_file_serializer = serializers.CsvFile('test_output.csv')
mocktail(User4, 5, serializer=csv_file_serializer)

test_output.csv


If you have tests that require an inline SQL CTE containing the mock data, you can use the `SqlCte` serializer. When creating the serializer instance, you need to pass an alias for the generated CTE.

In [43]:
class User5:
    username = Str(provider=providers.Username())
    first_name = Str(provider=providers.FirstName())
    last_name = Str(provider=providers.LastName())
    age = Int(provider=providers.Age())
    
print(mocktail(User5, 5, serializer=serializers.SqlCte("UserModel")))

WITH "UserModel" AS 
(SELECT username AS username, first_name AS first_name, last_name AS last_name, age AS age 
FROM (VALUES ('bsmith', 'Ryan', 'Smith', 18), ('jjenkins', 'Patricia', 'Sanchez', 89), ('derrick95', 'Kelly', 'Bradley', 21), ('joshua30', 'Colleen', 'Eaton', 93), ('agonzalez', 'John', 'Hendricks', 30)))
 SELECT "UserModel".username, "UserModel".first_name, "UserModel".last_name, "UserModel".age 
FROM "UserModel"


You can expect something like this:
```sql
WITH "UserModel" AS (
    SELECT username AS username, first_name AS first_name, last_name AS last_name, age AS age 
    FROM (
        VALUES ('bsmith', 'Ryan', 'Smith', 18), ('jjenkins', 'Patricia', 'Sanchez', 89), ('derrick95', 'Kelly', 'Bradley', 21), ('joshua30', 'Colleen', 'Eaton', 93), ('agonzalez', 'John', 'Hendricks', 30)
    )
) 
SELECT "UserModel".username, "UserModel".first_name, "UserModel".last_name, "UserModel".age 
FROM "UserModel"
```
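
To show why an inline CTE is handy in tests, here is a small sketch that queries mock rows without creating a table. It uses the standard `sqlite3` module with SQL written by hand for SQLite, which names the CTE columns in the `WITH` clause; this differs from the `FROM (VALUES ...)` form in the serializer output above, and that output is not assumed to run unchanged on SQLite.

```python
import sqlite3

# Hand-written CTE in SQLite's dialect; the rows are illustrative.
query = """
WITH "UserModel"(username, age) AS (
    VALUES ('bsmith', 18), ('jjenkins', 89), ('derrick95', 21)
)
SELECT username FROM "UserModel" WHERE age >= 21 ORDER BY username
"""

conn = sqlite3.connect(":memory:")
try:
    adults = conn.execute(query).fetchall()
finally:
    conn.close()

print(adults)  # [('derrick95',), ('jjenkins',)]
```

The query under test sees the mock data as if it were a real table, so no fixtures or schema setup are needed.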

Other serializers are not implemented yet, but you can expect support for more data file formats.

What about testing edge cases in the data, such as an age less than 0?

In [46]:
class User6:
    age = Int(provider=[
        (0.20, single_int(-1)), 
        (0.80, providers.Age())
    ])
    name = Str(provider=providers.FullName())

In this model, around 20% of the values in the `age` field will be `-1` (which is obviously invalid for an age). With this data you can test how your code behaves when invalid data is seeded.
Here we create 1000 records and print the number of records where `age=-1`, which should be around 20%.

In [47]:
test_users = mocktail(User6, 1000)

# Count the records whose age is the seeded invalid value.
invalid_count = sum(1 for user in test_users if user["age"] == -1)
print(f"Number of users with age equal to -1: {invalid_count}")

Number of users with age equal to -1: 102


In the last model, you saw a new way to define providers: a list of tuples. In each tuple, the first element is a decimal between 0 and 1 that represents the likelihood of using the provider instance given as the second element. This lets you introduce invalid data into your models for testing purposes.

Another feature you saw is the `single_int` provider, which, as the name suggests, uses the integer given as the first argument each time this provider is called.
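
The weighted selection described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not mocktail's implementation: `weighted` is a hypothetical helper, `itertools.repeat(-1)` stands in for `single_int(-1)`, and a uniform 0 to 100 generator stands in for `providers.Age()`.

```python
import itertools
import random
from typing import Iterator, Sequence


def weighted(
    options: Sequence[tuple[float, Iterator[int]]], rng: random.Random
) -> Iterator[int]:
    """Hypothetical helper: for each value, pick a source according to its weight."""
    weights = [weight for weight, _ in options]
    sources = [source for _, source in options]
    while True:
        (source,) = rng.choices(sources, weights=weights, k=1)
        yield next(source)


rng = random.Random(42)  # seeded so the run is repeatable
constant_minus_one = itertools.repeat(-1)  # stand-in for single_int(-1)
uniform_age = (rng.randint(0, 100) for _ in itertools.count())  # stand-in for Age()

ages = weighted([(0.20, constant_minus_one), (0.80, uniform_age)], rng)
sample = list(itertools.islice(ages, 1000))

invalid = sample.count(-1)
print(f"{invalid} of {len(sample)} values are -1")
```

With 1000 draws and a weight of 0.20, the count of `-1` values should land near 200, varying from run to run when the seed changes.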

There are many more possibilities that are not ready yet. Stay tuned for updates.

Happy coding :)