# Converting text and code into questions and answers
This notebook loads a JSONL file that contains a list of dictionaries in the following format:
```
{
    "imports":"from skimage.io import imread\nimport stackview",
    "text":"You can load and show an image like this",
    "code":"image = imread('blobs.tif')\nstackview.insight(image)"
}
```

It will turn this information into a list of dictionaries in this format:
```
{
    "question":"How can I open and visualize blobs.tif?",
    "answer":"You can do this like this:\n\n```python\nfrom skimage.io import imread\nimport stackview\n\nimage = imread('blobs.tif')\nstackview.insight(image)\n```"
}
```

This is done using a language model. Executing this notebook for about 70 entries costs about $0.10.
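
To make the target format concrete, here is a minimal sketch of how the `answer` string could be assembled from an explanation, the imports, and the code. The helper name `format_answer` is hypothetical and is not part of this notebook or of bia_bob; in the notebook itself, the explanation and the final code come from the language model.

```python
FENCE = "`" * 3  # three backticks, built programmatically for readability


def format_answer(explanation: str, imports: str, code: str) -> str:
    """Combine an explanation with imports and code into a markdown answer.

    Hypothetical helper for illustration only.
    """
    # imports first, then a blank line, then the actual code
    snippet = f"{imports.strip()}\n\n{code.strip()}".strip()
    return f"{explanation.strip()}\n\n{FENCE}python\n{snippet}\n{FENCE}"


entry = {
    "imports": "from skimage.io import imread\nimport stackview",
    "text": "You can load and show an image like this",
    "code": "image = imread('blobs.tif')\nstackview.insight(image)",
}

answer = format_answer("You can do this like this:", entry["imports"], entry["code"])
print(answer)
```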

In [1]:
import json
import os
import time

import ipywidgets as widgets
import openai
from IPython.display import display

from bia_bob import bob
from bia_bob._utilities import filter_out_blacklist, save_jsonl_file, load_jsonl_file

In [3]:
training_data = load_jsonl_file("imports_text_code_selected.jsonl")
training_data[0]

{'imports': 'import numpy as np\nfrom matplotlib import pyplot as plt\n\nimage1 = np.ones((3,5))\nimage1\n\nimage2 = np.random.random((3,5))\nimage2',
 'text': '\n\nNow we take the average over columns, which means along the first axis or ```axis=0```:',
 'code': 'np.mean(image2, axis=0)'}
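
The internals of `load_jsonl_file` and `save_jsonl_file` are not shown in this notebook. As a rough sketch, assuming the file holds one JSON object per line, equivalent helpers could look like the following. The names `read_jsonl` and `write_jsonl` are hypothetical and may differ from what bia_bob actually does.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write a list of dictionaries to a file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSONL file into a list of dictionaries, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# round-trip a small example through a temporary file
records = [
    {"imports": "import numpy as np", "text": "Compute a mean", "code": "np.mean([1, 2, 3])"}
]
path = os.path.join(tempfile.mkdtemp(), "example.jsonl")
write_jsonl(records, path)
loaded = read_jsonl(path)
print(loaded[0]["code"])
```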

In [4]:
def prompt(message:str, model="gpt-3.5-turbo"):
    """A prompt helper function that sends a message to OpenAI
    and returns only the text response.
    """
    # Note: openai.ChatCompletion is the legacy interface of openai<1.0
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": message}]
    )
    return response['choices'][0]['message']['content']

In [5]:
def make_question_answer(imports, text, code):
    question = prompt("Reformulate the following so that it becomes a question asking for Python code.\n\n" + text)
    refined_code = prompt(f"""
Add necessary import statements to Python code so that the code works. 
Do not add import statements that are not necessary.
These are the available import statements:
```
{imports}
```

And this is the code:
```
{code}
```

Respond with code only.
""")
    # remove markdown fences the model may have wrapped around the code
    refined_code = refined_code.replace("```python", "").replace("```", "").strip()

    explanation = prompt(f"""Explain the following code briefly, but do not explain import statements:
```python
{refined_code}
```
""")

    answer = f"""
{explanation}

```python
{refined_code}
```
""" 

    return question, answer

q, a = make_question_answer(training_data[0]['imports'], training_data[0]['text'], training_data[0]['code'])

print(f"""
Q: {q}

A: {a}
""")


Q: How can we calculate the average over columns using Python code, specifically along the first axis or ```axis=0```?

A: 
Without any import statements, there is no additional code to explain.

```python
No additional import statements are necessary for the code to work.
```
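
In the output above, the model replied with a sentence instead of code, and that sentence ended up inside the code block of the answer. One possible safeguard, sketched here under the assumption that valid answers should always parse as Python, is to extract the fenced code from the response and check it with `ast.parse` before using it. The function names `extract_code` and `is_valid_python` are hypothetical and not part of the notebook.

```python
import ast
import re

FENCE = "`" * 3  # three backticks, built programmatically for readability
FENCED_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the content of the first fenced block, or the whole response if none."""
    match = FENCED_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


def is_valid_python(code: str) -> bool:
    """Check whether the text parses as Python source (it is not executed)."""
    if not code:
        return False
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


good = FENCE + "python\nimport numpy as np\nnp.mean(x, axis=0)\n" + FENCE
bad = "No additional import statements are necessary for the code to work."

print(is_valid_python(extract_code(good)))
print(is_valid_python(extract_code(bad)))
```

Entries that fail such a check could be retried or skipped rather than written to the output file. Note that parsing only catches responses that are not Python at all; it does not verify that the code runs.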




In [6]:
questions_answers = []
for i, entry in enumerate(training_data):
    print(i)
    q,a = make_question_answer(entry['imports'], entry['text'], entry['code'])
    questions_answers.append({
        "question":q,
        "answer":a,
    })

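    # save after every entry so that progress survives an interruption
    save_jsonl_file(questions_answers, "questions_answers.jsonl")

    # pause between requests to stay within the OpenAI API rate limits
    time.sleep(10)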

0
1
2
...
69


Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600)
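
The run above ended with a timeout. A common way to make such a loop more robust is to retry a failed call a few times with increasing waiting times. The following is a minimal sketch of that idea; `call_with_retries` is a hypothetical helper, and which exception class to catch for OpenAI timeouts depends on the installed `openai` version, so it is left as a parameter here.

```python
import time


def call_with_retries(func, *args, retries=3, base_delay=1.0,
                      exceptions=(Exception,), **kwargs):
    """Call func and retry with exponential backoff if it raises.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the error after the last attempt.
    """
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


# demonstration with a stand-in function that fails twice before succeeding
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retries(flaky, retries=5, base_delay=0.01,
                           exceptions=(TimeoutError,))
print(result, calls["n"])
```

In this notebook, the calls to `prompt` could be wrapped in the same way. Since the results are saved after every entry, the entries processed before a failure should already be in `questions_answers.jsonl`.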