# Develop extraction queries with ChatGPT

In [11]:
import os

from dotenv import load_dotenv
from common.llm_protocol import ChatGPT

load_dotenv()

# initialize modules
llm_helper = ChatGPT(os.getenv("OPENAI_API_KEY"))

# Develop Runnable Query Code
Use the following principles:
- Use few-shot examples
- Set input text apart with delimiters
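
The two principles above can be sketched as a small prompt builder. This is only an illustration: `build_prompt`, `FEW_SHOT_EXAMPLES`, and the shortened instruction text are hypothetical and are not part of the notebook or of `common.llm_protocol`. The function name is delimited with triple backticks and the library name with triple dashes, and worked examples come before the actual request.

```python
# Hypothetical sketch: few-shot examples plus delimiters around the inputs.
TICKS = "`" * 3   # delimiter around the function name
DASHES = "-" * 3  # delimiter around the library name

# (function, library, expected code) triples used as few-shot examples.
FEW_SHOT_EXAMPLES = [
    ("all", "numpy",
     "import numpy as np\narray1 = np.array([1, 2, 3, 4, 5])\nresult = np.all(array1)"),
    ("sin", "numpy",
     "import numpy as np\narray1 = np.array([1, 2, 3, 4, 5])\nresult = np.sin(array1)"),
]


def build_prompt(function_name, library_name, examples=FEW_SHOT_EXAMPLES):
    """Assemble instruction, few-shot examples, and the final open request."""
    parts = [
        "Write minimal working code for the function enclosed in triple "
        "backticks from the library enclosed in triple dashes.",
        "",
    ]
    for ex_func, ex_lib, ex_code in examples:
        parts += [
            f"Function Name: {TICKS}{ex_func}{TICKS}",
            f"Library Name: {DASHES}{ex_lib}{DASHES}",
            "Code:",
            ex_code,
            "",
        ]
    # The request repeats the example layout and ends at "Code:" so the
    # model continues with code only.
    parts += [
        f"Function Name: {TICKS}{function_name}{TICKS}",
        f"Library Name: {DASHES}{library_name}{DASHES}",
        "Code:",
    ]
    return "\n".join(parts)


prompt = build_prompt("tan", "numpy")
print(prompt)
```

Keeping the examples in a list makes it easy to add or swap few-shot cases without editing the template string.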

In [6]:
RUNABLE_CODE_QUERY = """
Act as a python developer. Be concise and provide only code and no explanation. 
Write a sample code for the {function_name} function from the {library_name} python library. 
The function must only be called once. 
If an array is needed for the function, label it as array1. 
If more than one array is needed then increment the count. 
If a constant is needed, then label it as constant1. 
If more than one constant is needed, increment the count.
"""

results = llm_helper.run_query(RUNABLE_CODE_QUERY.format(function_name="random.choice", library_name="numpy"))
print(results)
                               

NameError: name 'llm_helper' is not defined

In [5]:
RUNABLE_CODE_QUERY.format(function_name="random.choice", library_name="numpy")

NameError: name 'RUNABLE_CODE_QUERY' is not defined

* Start with functions that only require a single array input

In [7]:
RUNABLE_CODE_QUERY = """
Act as a Python developer, be concise, and format the answer as code only.
Write minimal working code for the function enclosed in triple backticks from the library enclosed in triple dashes.
You will be penalized for any variable that is defined but not used.

Function Name: ```{function_name}```
Library Name: ---{library_name}---

Follow the following rules:
1. Add all required imports
2. Define the input array as numpy array1 = [1,2,3,4,5] 
3. If the function requires any constants, then define them as constant1 = 1, constant2 = 2
4. Run the function and store results in a variable named result

Examples:

Function Name: ```all```
Library Name: ---numpy---
Code:
import numpy as np
array1 = np.array([1, 2, 3, 4, 5])
result = np.all(array1)

Function Name: ```sin```
Library Name: ---numpy---
Code:
import numpy as np
array1 = np.array([1, 2, 3, 4, 5])
result = np.sin(array1)

Function Name: ```{function_name}```
Library Name: ---{library_name}---
Code:
"""

unit_tests = [
    ["random.choice", "numpy"],
    ["fft.fft", "numpy"],
    ["tan", "numpy"],
    ["degrees", "numpy"],
    ["floor", "numpy"],
    ["sum", "numpy"],
]

for (func, lib) in unit_tests:
    result = llm_helper.run_query(RUNABLE_CODE_QUERY.format(function_name=func, library_name=lib))
    print(result)

NameError: name 'llm_helper' is not defined
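
Because rule 4 of the prompt asks for the output to be stored in a variable named `result`, each response can be checked mechanically. The following is a minimal sketch, assuming the response is plain Python source with no surrounding code fences. `run_generated_code` is a hypothetical helper, not something the notebook defines. It uses `exec`, so it should only be run on code you are willing to execute locally.

```python
def run_generated_code(code):
    """Execute generated code in a fresh namespace.

    Returns (True, value of `result`) on success, or (False, reason) if the
    code raises or never defines `result`.
    """
    namespace = {}
    try:
        exec(code, namespace)  # caution: runs arbitrary code
    except Exception as exc:
        return False, repr(exc)
    if "result" not in namespace:
        return False, "no variable named 'result' was defined"
    return True, namespace["result"]


# Stand-in for an LLM response; it avoids third-party imports on purpose.
sample = "array1 = [1, 2, 3, 4, 5]\nresult = sum(array1)"
ok, value = run_generated_code(sample)
print(ok, value)  # True 15

# A response that ignores rule 4 is reported as a failure.
print(run_generated_code("array1 = [1, 2, 3]"))
```

A check like this turns each entry of `unit_tests` into a pass/fail signal instead of output that has to be read by eye.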

# Alternative Code Query

In [21]:
ALTERNATE_CODE_QUERY = """
Find five alternative python functions that can be used instead of the function enclosed by triple backticks.
Each alternative function has to exist and be a real function.
Function Name: ```{library_name}.{function_name}```

Follow the following steps:
1. Describe the purpose of the original function.
2. Generate a list of five alternative functions that can be used instead of the original function.
3. Provide the output as CSV with the following columns:
- function_name: print original function name ```{library_name}.{function_name}```
- alternative_library_name: unabbreviated alternative library name
- alternative_function_name: full alternative function name and path
"""

unit_tests = [
    ["random.choice", "numpy"],
    ["fft.fft", "numpy"],
    ["tan", "numpy"],
    ["degrees", "numpy"],
    ["floor", "numpy"],
    ["sum", "numpy"],
]

for (func, lib) in unit_tests:
    print(func, lib)
    result = llm_helper.run_query(ALTERNATE_CODE_QUERY.format(function_name=func, library_name=lib))
    print(result)

random.choice numpy
1. Purpose of the original function: 
The function ```numpy.random.choice``` randomly selects elements from a given array or list with or without replacement.

2. List of five alternative functions:
- ```random.sample``` from the ```random``` library
- ```choice``` from the ```random``` library
- ```choices``` from the ```random``` library
- ```sample``` from the ```numpy.random``` library
- ```randint``` from the ```numpy.random``` library

3. Output as CSV:
```
function_name,alternative_library_name,alternative_function_name
numpy.random.choice,random,random.sample
numpy.random.choice,random,random.choice
numpy.random.choice,random,random.choices
numpy.random.choice,numpy.random,numpy.random.sample
numpy.random.choice,numpy.random,numpy.random.randint
```
fft.fft numpy
1. The purpose of the original function ```numpy.fft.fft``` is to compute the one-dimensional discrete Fourier Transform.

2. List of five alternative functions:
- Function Name: ```scipy.fft.fft```

In [16]:
pandas.util._random.choice

NameError: name 'pandas' is not defined
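
The prompt requires that each alternative "has to exist and be a real function", and the manual check above shows how a suggested name can be tested. One way to automate that check is to try to resolve each dotted name from the CSV output. This is a sketch under assumptions: `resolve_dotted_name` is a hypothetical helper, it imports the modules it probes, and it only shows that a name resolves in the current environment, not that the function is a true equivalent.

```python
import importlib


def resolve_dotted_name(path):
    """Return the object named by a dotted path, or None if it cannot be found."""
    parts = path.split(".")
    # Try the longest importable module prefix first, then walk the
    # remaining parts as attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except (ImportError, ValueError):
            continue
        try:
            for attr in parts[i:]:
                obj = getattr(obj, attr)
        except AttributeError:
            return None
        return obj
    return None


# Standard-library names only, so the example runs without extra packages.
for name in ["random.sample", "random.choices", "random.no_such_function"]:
    print(name, callable(resolve_dotted_name(name)))
# random.sample True
# random.choices True
# random.no_such_function False
```

Names that fail to resolve can be dropped from the CSV before any further comparison.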

In [10]:
print(ALTERNATE_CODE_QUERY.format(function_name=func, library_name=lib))




Give five alternatives to the function enclosed triple backticks from the library enclosed in tripple dashes.
Each alternative must be exactly the same.
Function Name: ```random.choice```
Library Name: ---numpy---

Provide the output as CSV with the following columns:
- function_name: full function name and path of ```random.choice```
- library_name: unabbreviated library name of ---numpy---
- alternative_library_name: unabbreviated alternative library name
- alternative_function_name: full alternative function name and path



In [14]:
import random

array1 = [1,2,3,4,5]
random.choices(array1)

[1]

* Then move to functions that require multiple arrays


In [36]:
import random
k = [1,2,3,4,5]
random.shuffle(k)
k

[2, 5, 4, 1, 3]

In [None]:
RUNABLE_CODE_QUERY = """
Act as a Python developer, be concise, and format the answer as code only.
Write minimal working code for the function enclosed in triple backticks from the library enclosed in triple dashes.
You will be penalized for any variable that is defined but not used.

Function Name: ```{function_name}```
Library Name: ---{library_name}---

Follow the following rules:
1. Add all required imports
2. Define the minimum number of required numeric arrays of shape (1,5) and name them array1, array2, etc.
3. Define the minimum number of required constants and name them constant1, constant2, etc.
4. Run the function and store results in a variable named result

Examples:


Function Name: ```all```
Library Name: ---numpy---
Code:
import numpy as np
array1 = np.array([1, 2, 3, 4, 5])
result = np.all(array1)

Function Name: ```dot```
Library Name: ---numpy---
Code:
import numpy as np
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([6, 7, 8, 9, 10])
result = np.dot(array1, array2)

Function Name: ```sin```
Library Name: ---numpy---
Code:
import numpy as np
array1 = np.array([1, 2, 3, 4, 5])
result = np.sin(array1)

Function Name: ```{function_name}```
Library Name: ---{library_name}---
Code:
"""

In [3]:
import numpy as np

array1 = np.array([1, 2, 3])
result = np.sin(array1)

In [20]:
import numpy as np
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([6, 7, 8, 9, 10])
result = np.random.choice(array1, size=3, replace=False)