# Test Data Generation

- generating good test data can be challenging
- use the Hypothesis library - https://hypothesis.readthedocs.io/en/latest/index.html
- Hypothesis is a Python library for writing unit tests that automatically generates meaningful test data
    - helps uncover edge cases in your code that you would not have thought to look for
    - works with both the `pytest` and `unittest` libraries
- Hypothesis provides property-based testing
- a property-based test checks a property of the code that should always be true, whatever the input
- a whole range of inputs can be described and tested within a single test, rather than writing a separate test (with hard-coded inputs) for every value that you want to check
- lets you do **fuzz testing**
    - an automated software testing method that injects invalid, malformed, or unexpected inputs into a system to reveal software defects and security vulnerabilities
- more detailed examples: [https://semaphoreci.com/blog/property-based-testing-python-hypothesis-pytest](https://semaphoreci.com/blog/property-based-testing-python-hypothesis-pytest)
- install the Hypothesis library:

```bash
pip install hypothesis
```

- see the docs for what data you can generate and how: [https://hypothesis.readthedocs.io/en/latest/data.html#](https://hypothesis.readthedocs.io/en/latest/data.html#)
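
As a first taste of the idea, the sketch below states a single property (encoding a dictionary to JSON and decoding it again gives back the original) and lets Hypothesis generate the inputs. The use of `json` and the name `test_json_round_trip` are illustrative assumptions, not part of the notes above.

```python
import json

from hypothesis import given, strategies as st


# Property: a JSON round trip returns the original dictionary.
# Keys are arbitrary text and values arbitrary integers, both generated by Hypothesis.
@given(st.dictionaries(st.text(), st.integers()))
def test_json_round_trip(d):
    assert json.loads(json.dumps(d)) == d


# Calling the decorated function runs it against many generated dictionaries.
test_json_round_trip()
```

One test function stands in for the many hard-coded cases you would otherwise write by hand.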

In [1]:
! pip install hypothesis

Defaulting to user installation because normal site-packages is not writeable


In [2]:
def add(nums:list[int]) -> int:
    s: int = 0
    for n in nums:
        s += n
    return s

In [3]:
# typical unit testing: hard-coded inputs are passed to the function and checked against expected outputs
assert add([1, 2, 3]) == 6, '1 2 3 did NOT add to 6'
assert add([1, 3, -1, 0, -1]) == 2, '1, 3, -1, 0, -1 did NOT add to 2'
print('all tests done...')

all tests done...


In [4]:
# see settings docs: https://hypothesis.readthedocs.io/en/latest/settings.html
from hypothesis import given, settings, Verbosity
import hypothesis.strategies as some

In [5]:
# by default Hypothesis runs up to 100 generated examples (here, random lists of integers)
@given(some.lists(some.integers()))
def test_add(nums):
    #print(nums) # uncomment it to see what nums are generated
    assert add(nums) == sum(nums)

In [6]:
test_add()

In [7]:
# more examples...
@given(some.integers(), some.integers())
# settings control the number of examples, the example database, randomization, etc.
@settings(max_examples=100, verbosity=Verbosity.verbose, derandomize=True) 
def test_ints_are_commutative(x, y):
    assert x + y == y + x

In [8]:
test_ints_are_commutative()

Trying example: test_ints_are_commutative(
    x=0,
    y=0,
)
Trying example: test_ints_are_commutative(
    x=-1153,
    y=0,
)
Trying example: test_ints_are_commutative(
    x=-1153,
    y=25546,
)
Trying example: test_ints_are_commutative(
    x=-4626724384397176481,
    y=0,
)
Trying example: test_ints_are_commutative(
    x=-4626724384397176481,
    y=20354,
)
Trying example: test_ints_are_commutative(
    x=10264,
    y=0,
)
Trying example: test_ints_are_commutative(
    x=10264,
    y=-741279336,
)
Trying example: test_ints_are_commutative(
    x=17605,
    y=0,
)
Trying example: test_ints_are_commutative(
    x=17605,
    y=-62,
)
Trying example: test_ints_are_commutative(
    x=18915,
    y=0,
)
Trying example: test_ints_are_commutative(
    x=18915,
    y=-3375,
)
Trying example: test_ints_are_commutative(
    x=18915,
    y=-3375,
)
Trying example: test_ints_are_commutative(
    x=18915,
    y=18915,
)
Trying example: test_ints_are_commutative(
    x=-51,
    y=127,
)
...

In [9]:
# pass strategies by keyword so each one is bound to a named argument
@given(x=some.integers(), y=some.integers())
def test_ints_cancel(x, y):
    assert (x + y) - y == x

In [10]:
test_ints_cancel()

In [11]:
# generate lists of arbitrary length (usually between 0 and
# 100 elements) whose elements are integers.
@given(some.lists(some.integers()))
def test_reversing_twice_gives_same_list(xs):
    ys = list(xs)
    ys.reverse()
    ys.reverse()
    assert ys == xs 

In [12]:
test_reversing_twice_gives_same_list()

In [13]:
@given(some.tuples(some.booleans(), some.text()))
def test_look_tuples_work_too(t):
    # A tuple is generated as the one you provided, 
    # with the corresponding types in those positions.
    assert len(t) == 2
    assert isinstance(t[0], bool)
    assert isinstance(t[1], str)

In [14]:
# generate even numbers between 10 and 20
# use min_value and max_value or map method
@given(some.integers(min_value=5, max_value=10).map(lambda x: x*2))
def test_somefunc(num):
    print(num)
    # assert something about the function under test using num

In [15]:
test_somefunc()

10
20
12
14
18
16
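
A possible alternative to `map`, sketched below, is to draw directly from the target range and discard unwanted values with `filter`. The test name and the `seen` set are illustrative assumptions. `map` is usually the better choice because nothing is thrown away, but `filter` reads more directly when the condition is hard to construct.

```python
from hypothesis import given, strategies as st

seen = set()  # collects the generated values so they can be inspected afterwards


# Draw integers from 10..20 and keep only the even ones.
@given(st.integers(min_value=10, max_value=20).filter(lambda x: x % 2 == 0))
def test_even_in_range(num):
    seen.add(num)
    assert 10 <= num <= 20
    assert num % 2 == 0


test_even_in_range()
print(sorted(seen))
```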


In [16]:
# can compose types...
# list with at most 100 integers with min value of 1
@given(some.lists(some.integers(min_value=1), max_size=100))
def test_func1(nums):
    print(nums)
    

In [17]:
test_func1()

[]
[1]
[5487336869382857238]
[2764]
[59]
[5620]
[4]
[20955]
[23703]
[12879]
[3161955261355021408]
[2941, 25667, 29028]
[3716, 25667, 29028]
[3716, 25667, 67, 29028]
[3716, 25667, 67, 17628, 1, 37, 169414159211292165057177715228503479645, 11778, 30096]
[3716, 25667, 67, 37, 1, 37, 169414159211292165057177715228503479645, 11778, 30096]
[3716, 25667, 67, 37, 1, 37, 169414159211292165057177715228503479645, 11778, 12434, 83]
[770, 67, 67, 37, 1, 37, 169414159211292165057177715228503479645, 11778, 12434, 83]
[10996, 2244, 10586]
[10996, 2244, 10586, 32689, 1840, 77]
[10996, 2244, 10586, 32689, 8, 333]
[10996, 2244, 10586, 32689, 8, 8]
[10996, 2244, 10586, 2142241668]
[10996, 2244, 10586, 2142241668]
[2244, 2244, 10586, 2142241668]
[12066, 5559, 4098, 12904, 8457, 22917, 23942, 29636, 3908, 626125932, 3444, 7, 25953, 14, 10048, 2299510075982910757]
[12066, 5559, 4098, 12904, 8457, 22917, 23942, 14, 29636, 3908, 626125932, 3444, 7, 25953, 14, 10048, 2299510075982910757]
...

In [18]:
# define a function that takes an integer value between 1 and 10 and returns the square root of the value
def int_sqrt(n: int) -> float:
    # is this the correct implementation?
    return n**0.5

In [19]:
def test_int_sqrt():
    assert int_sqrt(9) == 3, 'sqrt(9) != 3'
    assert int_sqrt(4) == 2, 'sqrt(4) != 2'
    #assert int_sqrt(100) == 10, 'sqrt(100) != 10'
    # any problem here...?
    print('all tests PASS...')

In [20]:
test_int_sqrt()

all tests PASS...


In [21]:
# property-based testing using hypothesis
from dataclasses import dataclass
import hypothesis.strategies as st

@dataclass
class TestData:
    int_value: st.SearchStrategy[int]

# generating correct input data range
test_data = TestData(int_value=st.integers(min_value=1, max_value=10))

In [22]:
@given(st.data())
def test_int_sqrt(data: st.DataObject):
    import math

    an_int = data.draw(test_data.int_value)
    root = int_sqrt(an_int)
    # TODO: uncomment to see the test data
    #print(an_int, root) 

    assert isinstance(an_int, int)
    assert root == math.sqrt(an_int)
    print('all answer correct')

In [23]:
test_int_sqrt()

all answer correct
all answer correct
all answer correct
all answer correct
all answer correct
all answer correct
all answer correct
all answer correct
all answer correct
all answer correct


In [24]:
# what if you pass a string, a negative number, 0, a float, or a value larger than 10...?

# let's test negative values
@given(some.integers(min_value=-100000, max_value=-1))
def test_int_sqrt_negative(n: int):
    # this should throw AssertionError, but does it...?
    try:
        #print(n)
        root = int_sqrt(n)
    except AssertionError:
        # this must be printed... to pass the test
        print('assertion error thrown...PASS')
    else:
        print('FAIL')

In [25]:
test_int_sqrt_negative()

FAIL
FAIL
FAIL
...
FAIL
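
Why every negative input prints FAIL: in Python 3, raising a negative number to the power 0.5 does not raise an exception. It quietly returns a complex number, so the `except AssertionError` branch is never reached. A quick check, independent of Hypothesis:

```python
# Python 3: a negative base with a fractional exponent yields a complex result.
root = (-4) ** 0.5

print(type(root).__name__)            # complex
print(abs(root.imag - 2.0) < 1e-9)    # True: the imaginary part is (approximately) 2
```

So `int_sqrt(-4)` returns a value of the wrong type instead of rejecting the input, which is exactly the kind of gap the hard-coded tests never exposed.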


In [27]:
# let's test for larger than 10 values
@given(some.integers(min_value=11, max_value=100))
def test_int_sqrt_larger_positives(n: int):
    # this should throw AssertionError, but does it...?
    try:
        #print(n)
        root = int_sqrt(n)
    except AssertionError:
        # this must be printed... to pass the test
        print('assertion error thrown...PASS')
    else:
        print('FAIL')

In [28]:
test_int_sqrt_larger_positives()

FAIL
FAIL
FAIL
...
FAIL


In [29]:
# let's test with float values
@given(some.floats())
def test_int_sqrt_floats(n: float):
    # this should throw AssertionError, but does it...?
    try:
        #print(n)
        root = int_sqrt(n)
    except AssertionError:
        # this must be printed... to pass the test
        print('assertion error thrown...PASS')
    else:
        print('FAIL')

In [30]:
test_int_sqrt_floats()

FAIL
FAIL
FAIL
...
FAIL


In [34]:
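# let's test with some strings
# note: with the unfixed int_sqrt above, n**0.5 raises TypeError for a str, which the
# except clause below does not catch; PASS is printed only once int_sqrt asserts on its input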
@given(some.text())
def test_int_sqrt_strings(n: str):
    # this should throw AssertionError, but does it...?
    try:
        #print(n)
        root = int_sqrt(n)
    except AssertionError:
        # this must be printed... to pass the test
        print('assertion error thrown...PASS')
    else:
        print('FAIL')

In [35]:
test_int_sqrt_strings()

assertion error thrown...PASS
assertion error thrown...PASS
assertion error thrown...PASS
...

## Fix int_sqrt( ) so all property-based tests PASS

- since the tests expect an `AssertionError`, use `assert` statements inside `int_sqrt( )` to enforce the required properties of the input data
    - `assert num >= 1 and num <= 10`
    - ...
- alternatively, raise your own custom error for invalid data and have the tests check for that error instead
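
One possible fix is sketched below; it is a sketch under stated assumptions, not the only valid answer. The choice to reject `bool` and the helper `raises_assertion` are assumptions made for illustration. Note that `assert` statements are stripped when Python runs with `-O`, which is one reason to prefer a custom exception in production code.

```python
def int_sqrt(n: int) -> float:
    # Reject non-integers first; bool is a subclass of int, so exclude it explicitly.
    assert isinstance(n, int) and not isinstance(n, bool), 'n must be an int'
    # Enforce the documented input range.
    assert 1 <= n <= 10, 'n must be between 1 and 10'
    return n ** 0.5


def raises_assertion(value) -> bool:
    # Hypothetical helper: True if int_sqrt rejects the value with AssertionError.
    try:
        int_sqrt(value)
    except AssertionError:
        return True
    return False


print(int_sqrt(9))  # 3.0
print([raises_assertion(v) for v in (-4, 0, 11, 2.5, 'nine')])  # [True, True, True, True, True]
```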

## Property-based testing demo

- see `src/unittesting/inventory` folder
    - a simple order processing and stock control system
    - borrowed from the book "The Pragmatic Programmer" by David Thomas and Andrew Hunt
- two classes:
    - `Warehouse` and `Order` in two separate modules
- run the provided test modules in this order:

```bash
pytest test_order.py
pytest test_order1.py # <-- this property-based test using hypothesis will find an error
pytest test_warehouse.py
pytest test_order_fixed.py
```

- the test modules perform several property-based tests
- test data is generated automatically by `hypothesis`
- Hypothesis finds the data that causes tests to fail
    - use that data to write a separate, explicit `unittest` case, which becomes your regression test
    - since the data is generated randomly, there is no guarantee that the same data will be generated again
- property-based tests often surprise you!
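
Besides writing a separate explicit test, Hypothesis offers the `@example` decorator for pinning specific inputs so they are tried on every run, alongside the generated ones. The sketch below applies it to the `add` function from earlier; the pinned inputs are hypothetical stand-ins for data that once made a test fail.

```python
from hypothesis import example, given, strategies as st


def add(nums: list[int]) -> int:
    total = 0
    for n in nums:
        total += n
    return total


@given(st.lists(st.integers()))
@example([])          # hypothetical input that failed in an earlier run
@example([0, -1])     # another hypothetical pinned case
def test_add_with_pinned_examples(nums):
    assert add(nums) == sum(nums)


# The pinned examples are always included, regardless of what is generated.
test_add_with_pinned_examples()
```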

### Regression test
- a software testing technique that re-runs functional and non-functional tests to ensure that an application still works as intended after any code changes, updates, revisions, improvements, or optimizations
- typically focuses on the subset of tests that target the changed code or new feature
- change the `int_sqrt( )` function to accept values from 0 to 100
- see if the existing tests pass
    - do you need new property-based tests?
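
A minimal sketch of what this exercise might look like, assuming `int_sqrt( )` validates its input with `assert` as suggested earlier. The widened function body and both test names are assumptions. Note that the earlier test expecting an `AssertionError` for values 11 to 100 no longer describes the intended behaviour, so it would need updating along with the code.

```python
import math

from hypothesis import given, strategies as st


def int_sqrt(n: int) -> float:
    # Hypothetical widened version: accepts integers from 0 to 100.
    assert isinstance(n, int), 'n must be an int'
    assert 0 <= n <= 100, 'n must be between 0 and 100'
    return n ** 0.5


# Property for the new valid range: squaring the root gives back n.
@given(st.integers(min_value=0, max_value=100))
def test_int_sqrt_squares_back(n):
    root = int_sqrt(n)
    assert root >= 0
    assert math.isclose(root * root, n, abs_tol=1e-9)


# Property for the new invalid range: anything above 100 is rejected.
@given(st.integers(min_value=101))
def test_int_sqrt_rejects_large(n):
    try:
        int_sqrt(n)
    except AssertionError:
        rejected = True
    else:
        rejected = False
    assert rejected, f'{n} should have been rejected'


test_int_sqrt_squares_back()
test_int_sqrt_rejects_large()
```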