# Generating Code _from_ Your Tests

Generating unit tests from existing code is a common request, but in _test-driven development_ you write a test _first_, then the code that makes it _pass_. So, let's try generating the code from the tests instead.

## Setting Up

Like most of the other recipes, we will make inference calls against a model, IBM Granite 20b Code Instruct 8k in this case, that is hosted remotely on [Replicate](https://replicate.com/) in the [ibm-granite](https://replicate.com/ibm-granite) organization.

The notebook depends on the Granite [utils](https://github.com/ibm-granite-community/utils) package for integration with LLMs using the [Langchain](https://www.langchain.com/) framework.

> **TIP:** See the [Getting Started with Replicate](https://github.com/ibm-granite-community/granite-kitchen/blob/main/recipes/Getting_Started/Getting_Started_with_Replicate.ipynb) notebook in the [granite-kitchen](https://github.com/ibm-granite-community/granite-kitchen) repo for more information about using Replicate.

### Install the required Langchain and Replicate packages

Include a granite-community package with some simple utility functions.

In [None]:
!pip install git+https://github.com/ibm-granite-community/utils \
    "langchain_community<0.3.0" \
    replicate

In [None]:
from ibm_granite_community.notebook_utils import get_env_var

## Generate the Code from a Hypothesis _Property-Based_ Test Suite

We will use a testing library called [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) that helps you think about the _properties_ the unit under test must satisfy. We will define some Hypothesis tests in a string below, then save it to a file so we can execute the tests. They cover a `Rational` class (for rational numbers) that doesn't exist yet. Finally, we will use Granite to generate an implementation of `Rational` that hopefully allows the tests to pass.  

The next cell installs `hypothesis`.

In [None]:
!pip install 'hypothesis[cli]'

Next, we define our Hypothesis tests as a string and then save the string to a file so we can run the tests later. (Note that the outer string uses `'''` so that the `"""` docstrings in the test code can be nested inside it.)

> **Note:** This test file is adapted from [this GitHub project](https://github.com/deanwampler/tdd-hypothesis-example).
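
As a small illustration of that quoting approach, the sketch below nests a `"""` docstring inside a `'''` string and uses the standard-library `ast` module to confirm that the string is valid Python before it would be written to a file. The function `double` is made up for this example and is not part of the recipe.

```python
import ast

# Outer string uses ''' so the inner """ docstring needs no escaping.
source = '''
def double(x):
    """Return twice x."""
    return 2 * x
'''

# ast.parse raises SyntaxError if the string is not valid Python.
tree = ast.parse(source)

# Collect the names of the functions defined in the source string.
function_names = [
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
]
print(function_names)
```

The same check could be applied to a longer source string, such as a test suite, before saving it.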

In [None]:
tests = '''
# Example unit tests using Hypothesis for property-based testing.
# Adapted from: https://github.com/deanwampler/tdd-hypothesis-example
# Hypothesis website: https://hypothesis.readthedocs.io/en/latest/

from hypothesis import given, strategies as st
import unittest
from rational import Rational
from math import gcd

class TestRational(unittest.TestCase):
    """
    Test the features implemented currently by Rational.
    Add new tests for Rational arithmetic operations, like multiplication and addition,
    watch the test fail, then implement the feature and ensure the test now passes.
    See also other properties described in the Rational Wikipedia page:
    https://en.wikipedia.org/wiki/Rational_number
    
    Also, try adding a second way to construct Rationals that accepts a string
    argument, "M/N". (Now you really have to think about handling input errors!) 
    What are the requirements for valid strings, e.g., for "M" and "N"?
    If an invalid string is provided, how should the error be handled?
    """

    # Disallow zero for the denominator!

    nonzero_integers = st.integers().filter(lambda i: i != 0)

    @given(st.integers(), nonzero_integers)
    def test_init_takes_numerator_denominator(self, numer, denom):
        """
        A "relatively-trivial" test, but note that the returned
        numerator and denominator will be divided by their greatest
        common divisor.
        """
        rat = Rational(numer, denom)
        divisor = gcd(numer, denom)
        self.assertEqual(numer // divisor, rat.numerator)
        self.assertEqual(denom // divisor, rat.denominator)

    @given(st.integers())
    def test_zero_denominator_raises(self, numer):
        """
        Don't allow zero for the denominator!!
        """
        with self.assertRaises(ValueError):
            rat = Rational(numer, 0)

    @given(st.integers(), nonzero_integers)
    def test_a_rational_equals_itself(self, numer, denom):
        """
        This test passes without adding a custom __eq__ method. 
        Without the __eq__ method, would this test actually use
        "logical" instance equality or just locations in memory?
        """
        rat = Rational(numer, denom)
        self.assertEqual(rat, rat)

    @given(st.integers(), nonzero_integers)
    def test_identical_rationals_are_equal(self, numer, denom):
        """
        Would this one pass if we deleted (or commented out) our custom __eq__ method? 
        Try it!
        """
        rat1 = Rational(numer, denom)
        rat2 = Rational(numer, denom)
        self.assertEqual(rat1, rat2)

    @given(st.integers(), nonzero_integers, nonzero_integers)
    def test_equality_for_two_rationals_with_num_and_dom_that_are_multiples_of_each_other(self, numer, denom, multiple):
        """
        Rule: a/b == c/d iff ad == bc
        Since a*M/b*M == a/b, then a*M/b*M == c/d
        """
        rat1 = Rational(numer*multiple, denom*multiple)
        rat2 = Rational(numer, denom)
        self.assertEqual(rat1, rat2)

    @given(st.integers(), nonzero_integers, st.integers(), nonzero_integers)
    def test_two_non_identical_rationals_are_not_equal_to_each_other(self, numer1, denom1, numer2, denom2):
        """
        Rule: a/b == c/d iff ad == bc
        This is a better test, because it randomly generates different instances.
        However, the test has to check for the case where the two values happen to be
        equivalent!
        """
        rat1 = Rational(numer1, denom1)
        rat2 = Rational(numer2, denom2)
        if numer1*denom2 == numer2*denom1:
            self.assertEqual(rat1, rat2)
        else:
            self.assertNotEqual(rat1, rat2)

if __name__ == "__main__":
    unittest.main()
'''

In [None]:
with open("test_rational.py", mode="w+") as f:  # save to a file.
    f.write(tests)

To explain the syntax briefly, the `@given` _decorators_ tell `hypothesis` to generate example values that will be passed as arguments to the test methods. 

For example, `test_init_takes_numerator_denominator` takes three parameters: `self`, an integer numerator, and a nonzero integer denominator. By default, Hypothesis runs the test up to 100 times, each time passing a freshly generated value for every argument; it does not try every combination of the generated values. This eliminates the need to write a set of good examples yourself. The assertions in the test verify that the expected logic holds for every generated example.

See the [hypothesis](https://hypothesis.readthedocs.io/en/latest/) documentation for more details.
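
To make the idea concrete, here is a rough, standard-library-only sketch of what a property-based run does conceptually: generate many inputs and assert a property for each one. It is not how Hypothesis is implemented (Hypothesis also searches for edge cases and shrinks failing examples), and the names `reduce_fraction` and `check_property` are hypothetical helpers invented for this illustration.

```python
import random
from math import gcd


def reduce_fraction(numer, denom):
    """Hypothetical helper: divide both parts by their greatest common divisor."""
    if denom == 0:
        raise ValueError("denominator must be nonzero")
    divisor = gcd(numer, denom)
    return numer // divisor, denom // divisor


def check_property(runs=100, seed=0):
    """Check the properties once per generated example; return the number of runs."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(runs):
        numer = rng.randint(-1000, 1000)
        # A nonzero denominator: a positive magnitude with a random sign.
        denom = rng.randint(1, 1000) * rng.choice([-1, 1])
        reduced_numer, reduced_denom = reduce_fraction(numer, denom)
        # Property: reduction preserves the value (a/b == c/d iff a*d == b*c).
        assert numer * reduced_denom == denom * reduced_numer
        # Property: the reduced parts share no common factor.
        assert gcd(reduced_numer, reduced_denom) == 1
        checked += 1
    return checked


print(check_property())
```

With Hypothesis, the loop and the input generation are replaced by the `@given` decorator and its strategies, so the test body contains only the assertions.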

## Try Generating an Implementation of `Rational`

Let's now try to generate an implementation of `Rational` that passes the tests.

First, we define a default _system prompt_ we'll pass as part of the inference call.

In [None]:
default_system_prompt = """
Role: Python Code Generator.
User Input: Python tests, written using the hypothesis test library, https://hypothesis.readthedocs.io/en/latest/.
Output: The Python code that implements the functionality in the Python hypothesis tests, so the tests pass. DO NOT print the tests in the output. DO NOT print markdown or other separate documentation. DO print the Python code that makes the tests pass. DO add class and method documentation comments explaining what the code does, the arguments passed, etc.
Validity: Only valid Python code is generated, which allows the input Python hypothesis tests to pass.
"""

Next, we define the model to use and a dictionary of parameters to pass to the `Replicate` constructor.

In [None]:
from langchain_community.llms import Replicate

In [None]:
model_id = "ibm-granite/granite-20b-code-instruct-8k"

input_parameters = {
    "top_k": 60,
    "top_p": 0.3,
    "max_tokens": 2000,
    "min_tokens": 0,
    "temperature": 0.3,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "system_prompt": default_system_prompt,
}

Now we create a `Replicate` instance, which makes a call to the Replicate service to authenticate.

> **TIP:** If you get an authentication or similar error in the next cell, see the suggestions mentioned above about using Replicate.

In [None]:
granite_via_replicate = Replicate(
            model=model_id,
            model_kwargs=input_parameters,
            replicate_api_token=get_env_var('REPLICATE_API_TOKEN'),
        )

### Perform Inference

Finally, we invoke the model to generate `Rational` from the test code.

In [None]:
prompt = f"""
Here is the Hypothesis test code:
```Python
{tests}
```

Print Python code that makes the test code pass.
"""

replicate_response = granite_via_replicate.invoke(prompt)

print(f"Granite response from Replicate: {replicate_response}")

You can find a good implementation of `Rational` in [`rational-good-example.py`, in the GitHub repo](https://github.com/ibm-granite-community/granite-code-cookbook/blob/main/recipes/Code_Gen_from_Tests/rational-good-example.py).

How closely does the Granite output match this code? Does it appear to satisfy the requirements for rationals in the [Wikipedia article](https://en.wikipedia.org/wiki/Rational_number)? 

If important logic is missing, try changing the `prompt`, which is currently very generic:
1. Make the prompt more specific about the properties of rational numbers.
2. Add the link to the Wikipedia page mentioned above (which is also mentioned in the test code comments).

Try any changes to the prompt one at a time to see their relative impact. Also, it's useful to run the query several times for each prompt change to see how the results vary.

### Do the Tests Pass?

Let's run the tests! Execute the next cell to create a `rational` directory and the files we need:

In [None]:
!rm -rf rational
!mkdir -p rational
!echo "from .rational import Rational" > rational/__init__.py
!ls -al rational

Note that the test code expects to find a `Rational` type in the `rational` package, which is why we did some of the steps in the previous cell.
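
For readers who prefer not to rely on shell commands, the same package layout can be sketched in Python. This is an illustrative alternative, not part of the original recipe: `make_rational_package` is a hypothetical helper, the class body is a placeholder, and the package is built in a temporary directory so nothing in the working directory is touched.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_rational_package(root: Path, class_source: str) -> Path:
    """Create root/rational/ with an __init__.py that re-exports Rational."""
    package_dir = root / "rational"
    package_dir.mkdir(parents=True, exist_ok=True)
    (package_dir / "__init__.py").write_text("from .rational import Rational\n")
    (package_dir / "rational.py").write_text(class_source)
    return package_dir


# Placeholder source standing in for the generated class.
placeholder_source = "class Rational:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    make_rational_package(Path(tmp), placeholder_source)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        package = importlib.import_module("rational")
        # The re-export in __init__.py is what makes this attribute available.
        imported_name = package.Rational.__name__
    finally:
        # Undo the temporary changes to the import system.
        sys.path.remove(tmp)
        sys.modules.pop("rational", None)
        sys.modules.pop("rational.rational", None)

print(imported_name)
```

The re-export in `__init__.py` is what lets the tests write `from rational import Rational` instead of `from rational.rational import Rational`.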

Now copy the generated `Rational` class definition from the output above and paste it between the `"""` quotes in the next cell. _If the generated code contains `"""` docstrings_, either delete them, replace them with `'''` (triple single quotes), or add a `\` in front of the first of the three `"` _in every case_. (The tests above avoided this problem by using `'''` for the outer string.)

**Don't** include the test code, any markdown, or other text that was part of the generated output!

**Do** include appropriate `import` statements, e.g., you may see that the generated code calls `gcd` (_greatest common divisor_), which would require this import `from math import gcd`. (If you aren't sure why `gcd` is useful, see the Wikipedia article linked above about the properties of rational numbers.)

Is the indentation correct in the code? Make sure each level is properly indented.
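
If copying by hand becomes tedious, the cleanup steps above could be partly automated. The sketch below assumes the model wraps its code in a Markdown fence, which may or may not be the case for a given response. `extract_python_code` and `is_valid_python` are hypothetical helpers, not part of the Granite utilities.

```python
import ast
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence in this snippet


def extract_python_code(response: str) -> str:
    """Return the first fenced code block in response, or the whole text if unfenced."""
    pattern = re.compile(FENCE + r"[A-Za-z]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(response)
    code = match.group(1) if match else response
    return code.strip() + "\n"


def is_valid_python(source: str) -> bool:
    """Return True if source parses; indentation errors count as syntax errors."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


sample_response = (
    "Here is the code:\n"
    + FENCE + "python\n"
    + "class Rational:\n    pass\n"
    + FENCE + "\nDone."
)
code = extract_python_code(sample_response)
print(code)
print(is_valid_python(code))
```

A successful parse only shows that the code is syntactically valid. Whether it is correct is still for the tests to decide.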

In [None]:
rational = """
"""

Finally, create a new `rational/rational.py` file to hold your generated `Rational` code.

In [None]:
with open("rational/rational.py", mode="w+") as f:  # save to a file.
    f.write(rational)

In [None]:
!ls -al rational

We're finally ready to run the tests, by running the following cell.

In [None]:
!python ./test_rational.py

Did the tests pass? If not, what can you change in the generated `Rational` code to make them pass? For example, compare the generated code to the [`rational-good-example.py` implementation in the GitHub repo](https://github.com/ibm-granite-community/granite-code-cookbook/blob/main/recipes/Code_Gen_from_Tests/rational-good-example.py) we mentioned above. Can you modify the prompt to generate these improvements?

For comparison and convenience, here is `rational-good-example.py`, stripped of comments. If you try using this code instead for the definition of the `rational` variable above, the tests should pass.

```python
rational = '''
from math import gcd

class Rational:
    def __init__(self, numerator, denominator):
        if denominator == 0:
            raise ValueError("Cannot create a Rational with a zero denominator.")

        divisor = gcd(numerator, denominator)
        self.numerator = numerator // divisor
        self.denominator = denominator // divisor

    def __str__(self):
        return f"{self.numerator}/{self.denominator}"

    def __eq__(self, other):
        return self.numerator * other.denominator == self.denominator * other.numerator
'''
```
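
One detail to watch for when comparing generated code with this implementation is the sign of the denominator. The tests expect `numer // gcd` and `denom // gcd`, so a negative denominator stays negative. Python's standard-library `fractions.Fraction` instead moves the sign to the numerator. The sketch below contrasts the two conventions; `expected_by_tests` is a hypothetical helper that mirrors the arithmetic in `test_init_takes_numerator_denominator`.

```python
from fractions import Fraction
from math import gcd


def expected_by_tests(numer, denom):
    """Mirror the values the first test expects for numerator and denominator."""
    divisor = gcd(numer, denom)
    return numer // divisor, denom // divisor


frac = Fraction(3, -6)
standard_library_parts = (frac.numerator, frac.denominator)
tested_parts = expected_by_tests(3, -6)

print(standard_library_parts)  # (-1, 2)
print(tested_parts)            # (1, -2)

# Both pairs describe the same value: a/b == c/d iff a*d == b*c.
same_value = (
    standard_library_parts[0] * tested_parts[1]
    == standard_library_parts[1] * tested_parts[0]
)
print(same_value)
```

A generated `Rational` that normalizes the sign the way `Fraction` does represents the same numbers but would fail the first test as written, so this is one possible cause to check if that test fails.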

## For Additional Practice

1. Read more about [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) and _property-based_ testing, a powerful technique for structuring your tests, and designing the corresponding code, in a rigorous way, and not just for mathematical types, like `Rational`.
2. Add new tests to `test_rational.py` for other operators, like `*`, `/`, `+`, `-`, `<`, `<=`, `>`, `>=`, then regenerate `Rational` and see if the implementations of these operators are properly generated. Keep in mind that `Rational(N*numerator, N*denominator)`, for some integer `N`, numerator `numerator`, and denominator `denominator`, is always "rationalized" to `Rational(numerator, denominator)`.
3. Try writing a new test suite for a `Complex` number type and see how well an implementation for it is generated.
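
As a starting point for item 2, the arithmetic rules that new tests would need to encode can be sketched with plain `(numerator, denominator)` tuples and checked against the standard library's `fractions.Fraction`. The helper names below are hypothetical, and this is only one way to express the rules, not the implementation the model is expected to produce.

```python
from fractions import Fraction
from math import gcd


def normalize(numer, denom):
    """Divide both parts by their greatest common divisor."""
    if denom == 0:
        raise ValueError("denominator must be nonzero")
    divisor = gcd(numer, denom)
    return numer // divisor, denom // divisor


def multiply(a, b):
    """a/b * c/d == (a*c) / (b*d)"""
    return normalize(a[0] * b[0], a[1] * b[1])


def add(a, b):
    """a/b + c/d == (a*d + c*b) / (b*d)"""
    return normalize(a[0] * b[1] + b[0] * a[1], a[1] * b[1])


def as_fraction(pair):
    """Convert a pair to a Fraction so results can be compared by value."""
    return Fraction(pair[0], pair[1])


half = normalize(2, 4)
third = normalize(1, 3)
product = multiply(half, third)
total = add(half, third)

print(product, total)  # (1, 6) (5, 6)
```

A property-based test for `*` or `+` could assert the same cross-multiplication rules over generated numerators and nonzero denominators.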