# Generating Code _from_ Your Tests

Generating unit tests from existing code is a common request, but in _test-driven development_ you write a test _first_, then the code that makes it _pass_. So let's try the reverse: generating the code from the tests.

## Setting Up

Like most of the other recipes, this one makes inference calls against a remotely hosted model, in this case IBM Granite 20b Code Instruct 8k, which is available on [Replicate](https://replicate.com/) in the [ibm-granite](https://replicate.com/ibm-granite) organization.

The notebook depends on the Granite [utils](https://github.com/ibm-granite-community/utils) package for integration with LLMs using the [Langchain](https://www.langchain.com/) framework.

> **TIP:** See the [Getting Started with Replicate](https://github.com/ibm-granite-community/granite-kitchen/blob/main/recipes/Getting_Started/Getting_Started_with_Replicate.ipynb) notebook in the [granite-kitchen](https://github.com/ibm-granite-community/granite-kitchen) repo for more information about using Replicate.

### Install the required Langchain and Replicate packages

The install command also pulls in the granite-community `utils` package, which provides some simple utility functions.

In [None]:
!pip install git+https://github.com/ibm-granite-community/utils \
    "langchain_community<0.3.0" \
    replicate

In [None]:
from ibm_granite_community.notebook_utils import set_env_var, get_env_var

## Generate the Code from a Hypothesis _Property-Based_ Test Suite

We will use a testing library called [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) that helps you think about the _properties_ the unit under test must satisfy. We will load from a file some Hypothesis tests for a non-existent `Rational` class (for rational numbers). Then we will use Granite to generate an implementation of `Rational` that hopefully allows the tests to pass.  

The next two cells install `hypothesis` and load the first set of tests for `Rational` from a file.

In [None]:
!pip install 'hypothesis[cli]'

Load our Hypothesis tests as a string.
This test file is adapted from this GitHub project: 

https://github.com/deanwampler/tdd-hypothesis-example

In [None]:
with open("test_rational.py") as f:
    tests = f.read()
print(tests)

To explain the syntax briefly, the `@given` _decorators_ tell `hypothesis` to generate example values that will be passed as arguments to the test methods. 

For example, `test_init_takes_numerator_denominator` takes three parameters: `self`, an integer numerator, and a nonzero denominator. By default, Hypothesis runs the test with up to 100 generated examples, each supplying one value for every argument, so you don't need to come up with a set of good examples yourself. The assertions in the test verify that the expected logic holds for each example.

See the [hypothesis](https://hypothesis.readthedocs.io/en/latest/) documentation for more details.
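
To build intuition for what `@given` automates, here is a hand-rolled sketch that uses only the standard library. It is not how Hypothesis is implemented (Hypothesis also hunts for edge cases and shrinks failing inputs to minimal ones). The helpers `random_nonzero_int` and `check_property` are hypothetical names invented for this illustration, and `fractions.Fraction` stands in for the `Rational` class that does not exist yet.

```python
import random
from fractions import Fraction  # stand-in for the not-yet-written Rational


def random_nonzero_int(rng, low=-1000, high=1000):
    """Hypothetical helper: draw an integer in [low, high], skipping zero."""
    value = 0
    while value == 0:
        value = rng.randint(low, high)
    return value


def check_property(prop, runs=100, seed=0):
    """Call prop(numerator, denominator) with `runs` randomly drawn pairs.

    Returns the number of pairs checked. An AssertionError raised by
    `prop` propagates to the caller, much as a failing test would.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        numerator = rng.randint(-1000, 1000)
        denominator = random_nonzero_int(rng)
        prop(numerator, denominator)
    return runs


def scaling_does_not_change_value(numerator, denominator):
    # Property: multiplying both parts by the same nonzero factor
    # yields an equal rational number.
    assert Fraction(3 * numerator, 3 * denominator) == Fraction(numerator, denominator)


checked = check_property(scaling_does_not_change_value)
print(f"property held for {checked} generated pairs")
```

The point of the sketch is the shape of the test: you state a property that must hold for all valid inputs, and the framework supplies the inputs.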

## Try Generating an Implementation of `Rational`

Let's now try to generate an implementation of `Rational` that passes the tests.

First, we define a default _system prompt_ we'll pass as part of the inference call.

In [None]:
default_system_prompt = """
Role: Python Code Generator.
User Input: Python tests, written using the hypothesis test library, https://hypothesis.readthedocs.io/en/latest/.
Output: The Python code that implements the functionality in the Python hypothesis tests, so the tests pass. DO NOT print the tests in the output. DO NOT print markdown or other separate documentation. DO print the Python code that makes the tests pass. DO add class and method documentation comments explaining what the code does, the arguments passed, etc.
Validity: Only valid Python code is generated, which allows the input Python hypothesis tests to pass.
"""

Next, we define the model to use and a dictionary of parameters to pass to the `Replicate` constructor.

In [None]:
from langchain_community.llms import Replicate

In [None]:
model_id="ibm-granite/granite-20b-code-instruct-8k"
 
input_parameters = {
    "top_k": 60,
    "top_p": 0.3,
    "max_tokens": 2000,
    "min_tokens": 0,
    "temperature": 0.3,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    # Use the system prompt defined in the earlier cell.
    "system_prompt": default_system_prompt,
}

Now we create a `Replicate` instance, which makes a call to the Replicate service to authenticate.

> **TIP:** If you get an authentication or similar error in the next cell, see the suggestions mentioned above about using Replicate.

In [None]:
granite_via_replicate = Replicate(
            model=model_id,
            model_kwargs=input_parameters,
            replicate_api_token=get_env_var('REPLICATE_API_TOKEN'),
        )

### Perform Inference

Finally, we invoke the model to generate `Rational` from the test code.

In [None]:
prompt = f"""
Here is the Hypothesis test code:
```Python
{tests}
```

Print Python code that makes the test code pass.
"""

replicate_response = granite_via_replicate.invoke(prompt)

print(f"Granite response from Replicate: {replicate_response}")

You can find a good implementation of `Rational` in `./rational/rational-good-example.py`. 

How closely does the Granite output match this code? Does it appear to satisfy the requirements for rationals in the [Wikipedia article](https://en.wikipedia.org/wiki/Rational_number)? 

If important logic is missing, try changing the `prompt`, which is currently very generic:
1. Make the prompt more specific about the properties of rational numbers.
2. Add the link to the Wikipedia page mentioned above (which is also mentioned in the test code comments).

Try any changes to the prompt one at a time to see their relative impact. Also, it's useful to run the query several times for each prompt change to see how the results vary.
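
One way to compare several runs of the same prompt is sketched below. `sample_responses` is a hypothetical helper, not part of Langchain or Replicate; it accepts any callable that maps a prompt string to a response string, so in this notebook you could pass `granite_via_replicate.invoke`. A stand-in generator is used here so the sketch runs without calling a model.

```python
from collections import Counter


def sample_responses(generate, prompt, runs=3):
    """Call generate(prompt) `runs` times and count the distinct outputs."""
    return Counter(generate(prompt).strip() for _ in range(runs))


def fake_generate(prompt):
    # Stand-in for a real model call; always returns the same text.
    return "class Rational:\n    pass\n"


counts = sample_responses(fake_generate, "Print Python code ...", runs=3)
for response, count in counts.most_common():
    print(f"{count} of 3 runs produced:\n{response}\n")
```

With a real model and a nonzero temperature, expect more than one distinct output; how much they differ is a rough signal of how tightly the prompt constrains the result.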

### Do the Tests Pass?

Copy the generated `Rational` class definition from the output above and paste it into the `rational/rational.py` file, which currently contains just comments.

Don't include the test code or any markdown or other text included in the generated output!
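
If the model wraps its answer in a markdown code fence despite the system prompt, a small helper can pull out just the code. This is a minimal sketch: `extract_python_code` is a hypothetical helper, and it assumes any fenced blocks in the response contain only the code you want. Always review the result before pasting it into `rational/rational.py`.

```python
import re

TICKS = "`" * 3  # a markdown code fence marker
# Matches a fenced block, with or without a language tag, and captures its body.
FENCE = re.compile(TICKS + r"[A-Za-z]*[ \t]*\n(.*?)" + TICKS, re.DOTALL)


def extract_python_code(response):
    """Return the fenced code in `response`, or the whole text if unfenced."""
    blocks = FENCE.findall(response)
    if blocks:
        return "\n\n".join(block.strip("\n") for block in blocks) + "\n"
    return response.strip("\n") + "\n"


sample = (
    f"Here is the code:\n{TICKS}python\n"
    "class Rational:\n    pass\n"
    f"{TICKS}\nHope this helps!"
)
print(extract_python_code(sample))
```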

Do include appropriate `import` statements, e.g., you may see that the generated code calls `gcd` (_greatest common divisor_), which would require this import `from math import gcd`. (If you aren't sure why `gcd` is useful, see the Wikipedia article linked above about the properties of rational numbers.)
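
To see why `gcd` matters, here is a minimal sketch of reducing a fraction to lowest terms. The function name `normalize` and the convention of keeping the denominator positive are assumptions for illustration; the generated `Rational` and the tests in `test_rational.py` may use a different convention.

```python
from math import gcd


def normalize(numerator, denominator):
    """Reduce a fraction to lowest terms with a positive denominator."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    divisor = gcd(numerator, denominator)  # always non-negative
    if denominator < 0:
        divisor = -divisor  # moves the sign onto the numerator
    return numerator // divisor, denominator // divisor


print(normalize(6, 8))   # (3, 4)
print(normalize(6, -8))  # (-3, 4)
print(normalize(0, 5))   # (0, 1)
```

Storing the reduced form means two rationals are equal exactly when their numerators and denominators match, which keeps equality checks simple.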

Is the indentation correct in the code? Make sure each level is properly indented.
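
A quick way to catch indentation and other syntax problems before running the tests is to parse the code with the standard library's `ast` module. This is a sketch; `syntax_problem` is a hypothetical helper, and passing this check only means the code parses, not that it is correct.

```python
import ast


def syntax_problem(source):
    """Return None if `source` parses as Python, else a short error message."""
    try:
        ast.parse(source)
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return f"line {err.lineno}: {err.msg}"
    return None


good = "class Rational:\n    def __init__(self, n, d):\n        self.n, self.d = n, d\n"
bad = "class Rational:\ndef __init__(self, n, d):\n    pass\n"

print(syntax_problem(good))  # None
print(syntax_problem(bad))   # reports the line where parsing failed
```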

Then, run the following cell, which will run the test code we loaded above.

In [None]:
!python ./test_rational.py

Did the tests pass? If not, what can you change in the generated `Rational` code to make them pass? (For example, compare the generated code to the implementation in `./rational/rational-good-example.py`.) Can you modify the prompt to generate these improvements?

## For Additional Practice

1. Read more about [Hypothesis](https://hypothesis.readthedocs.io/en/latest/) and _property-based_ testing, a powerful technique for structuring your tests, and designing the corresponding code, in a rigorous way, and not just for mathematical types, like `Rational`.
2. Add new tests to `test_rational.py` for other operators, like `*`, `/`, `+`, `-`, `<`, `<=`, `>`, `>=`, then regenerate `Rational` and see if the implementations of these operators are properly generated. Keep in mind that `Rational(N*numerator, N*denominator)`, for any nonzero integer `N`, numerator `numerator`, and denominator `denominator`, should always be "rationalized" (reduced) to the same value as `Rational(numerator, denominator)`.
3. Try writing a new test suite for a `Complex` number type and see how well an implementation for it is generated.