# This notebook tests the Alpaca repo pipeline with Gemini

In [1]:
import fire
import os
import tqdm
import random
import time

# import numpy as np
from dotenv import load_dotenv
import google.generativeai as genai
from rouge_score import rouge_scorer

from utils_gemini import (
    seed_instruction_data_loader,
    machine_instruction_data_loader,
    encode_prompt,
    GeminiGenerationArguments,
    gemini_completion,
    post_process_gpt3_response,
)

In [2]:
load_dotenv()

genai.configure(api_key=os.environ["GEMINI_API"])
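
If the key is missing from `.env`, `os.environ["GEMINI_API"]` raises a bare `KeyError`. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper, not part of `utils_gemini` or `google.generativeai`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Hypothetical message; adjust it to wherever the key is stored.
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value
```

With this in place, the cell above could call `genai.configure(api_key=require_env("GEMINI_API"))`.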

In [4]:
seed_tasks_path="./seed_tasks.jsonl"
num_instructions_to_generate=10  # small test run; the full run targets 20K samples
model_name="gemini-1.5-flash"  # TODO: change to pro
num_prompt_instructions=3
request_batch_size=5
temperature=1.0
top_p=1.0
num_cpus=16

In [5]:
request_idx = 0
seed_instruction_data = seed_instruction_data_loader(seed_tasks_path)
machine_instruction_data = machine_instruction_data_loader()

# use ROUGE-L to score the similarity between instructions
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)

progress_bar = tqdm.tqdm(total=num_instructions_to_generate)
if machine_instruction_data:
    progress_bar.update(len(machine_instruction_data))
    

Loaded 21 human-written seed instructions


  0%|          | 0/10 [00:00<?, ?it/s]

In [6]:
all_instructions = [d["instruction"] for d in seed_instruction_data] + [
    d["instruction"] for d in machine_instruction_data
]
all_instruction_tokens = [
    scorer._tokenizer.tokenize(inst) for inst in all_instructions
]
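
The token lists built above are what an Alpaca-style pipeline compares each new instruction against to reject near-duplicates. The sketch below shows the idea with a pure-Python ROUGE-L F-measure based on the longest common subsequence. It is a simplified stand-in for `rouge_scorer`, not its implementation: the whitespace tokenizer, the function names and the `0.7` threshold are all assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(reference, candidate):
    """ROUGE-L F-measure between two token lists (0.0 when nothing overlaps)."""
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def is_near_duplicate(new_instruction, existing_token_lists, threshold=0.7):
    """True if the new instruction is too similar to any existing one."""
    # Assumption: lowercase whitespace tokens, unlike the scorer's own tokenizer.
    new_tokens = new_instruction.lower().split()
    return any(
        rouge_l_f(tokens, new_tokens) > threshold for tokens in existing_token_lists
    )
```

In the notebook's terms, `is_near_duplicate(candidate, all_instruction_tokens)` would decide whether a generated instruction is kept.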

In [7]:
request_idx += 1

batch_inputs = []
for _ in range(request_batch_size):
    # only sampling from the seed tasks
    prompt_instructions = random.sample(
        seed_instruction_data, num_prompt_instructions
    )
    prompt = encode_prompt(prompt_instructions)
    batch_inputs.append(prompt)

In [9]:
len(batch_inputs)

5

In [8]:
print(batch_inputs[0])

You are asked to come up with a set of 20 diverse code generation task instructions. These task instructions will be given to a GPT model and we will evaluate the GPT model for completing the instructions.

Here are the requirements:
1. Try not to repeat the verb for each instruction to maximize diversity.
2. The language used for the instruction also should be diverse. For example, you should combine questions with imperative instrucitons.
3. The type of instructions should be diverse. The list should include diverse types of programming tasks like open-ended generation, classification, editing, optimization etc.
2. A GPT language model should be able to complete the instruction. For example, do not ask the assistant to create any visual or audio output. For another example, do not ask the assistant to wake you up at 5pm or set a reminder because it cannot perform any action.
3. The instructions should be in English.
4. The instructions should at least 1 to 2 sentences long. Either an

In [10]:
decoding_args = GeminiGenerationArguments(
    temperature=temperature,
    candidate_count=1,
    max_output_tokens=3072,
    top_p=top_p,
    stop_sequences=["\n20", "20."],
)

request_start = time.time()
print(f"Calling {model_name}...")
results = gemini_completion(
    prompts=batch_inputs,
    model_name=model_name,
    batch_size=request_batch_size,
    decoding_args=decoding_args,
)

request_duration = time.time() - request_start
print(f"request took - {request_duration}")

Calling gemini-1.5-flash...



prompt_batches: 100%|██████████| 5/5 [01:32<00:00, 18.53s/it]

request took - 92.63910031318665





In [11]:
len(results)

5

In [13]:
print(results[0]['text'])

###
4. Instruction:  Develop a JavaScript function that determines whether a given string is a palindrome.  Consider case-insensitive comparisons and ignore spaces.
4. Input: "A man, a plan, a canal: Panama"
4. Output: 
```javascript
function isPalindrome(str) {
  str = str.toLowerCase().replace(/[^a-z0-9]/g, "");
  return str === str.split("").reverse().join("");
}

console.log(isPalindrome("A man, a plan, a canal: Panama")); // true
```
###
5. Instruction:  Craft a short Python script to calculate the factorial of a non-negative integer.  Handle potential errors gracefully.
5. Input: <noinput>
5. Output:
```python
import math

def factorial(n):
  if not isinstance(n, int) or n < 0:
    raise ValueError("Input must be a non-negative integer.")
  return math.factorial(n)

try:
  print(factorial(5))  # Output: 120
  print(factorial(-1)) # Raises ValueError
except ValueError as e:
  print(f"Error: {e}")
```
###
6. Instruction:  Could you provide a Java method that efficiently searches fo

In [None]:
process_start = time.time()
instruction_data = []

In [15]:
print(results[1]['text'])

###
4. Instruction: Craft a Python function that calculates the factorial of a given non-negative integer.  The function should handle edge cases gracefully, such as inputting zero or a negative number.
4. Input:
<noinput>
4. Output:
```python
def factorial(n):
  """Calculates the factorial of a non-negative integer."""
  if n < 0:
    return "Factorial is not defined for negative numbers."
  elif n == 0:
    return 1
  else:
    result = 1
    for i in range(1, n + 1):
      result *= i
    return result

```
###
5. Instruction:  Develop a Java method that reverses a given string.
5. Input:
<noinput>
5. Output:
```java
public class StringReverser {
    public static String reverseString(String str) {
        return new StringBuilder(str).reverse().toString();
    }
}
```
###
6. Instruction:  Could you devise a SQL query to retrieve all customers from a database table named 'Customers' who live in 'California'?
6. Input:
<noinput>
6. Output:
```sql
SELECT * FROM Customers WHERE State =

In [26]:
# try post-processing a single response
import re

num_prompt_instructions=3
response = results[0]

In [42]:
raw_instructions = f"{num_prompt_instructions+1}. Instruction:" + response["text"]
raw_instructions = re.split("###", raw_instructions)

raw_instructions

['4. Instruction:',
 '\n4. Instruction:  Develop a JavaScript function that determines whether a given string is a palindrome.  Consider case-insensitive comparisons and ignore spaces.\n4. Input: "A man, a plan, a canal: Panama"\n4. Output: \n```javascript\nfunction isPalindrome(str) {\n  str = str.toLowerCase().replace(/[^a-z0-9]/g, "");\n  return str === str.split("").reverse().join("");\n}\n\nconsole.log(isPalindrome("A man, a plan, a canal: Panama")); // true\n```\n',
 '\n5. Instruction:  Craft a short Python script to calculate the factorial of a non-negative integer.  Handle potential errors gracefully.\n5. Input: <noinput>\n5. Output:\n```python\nimport math\n\ndef factorial(n):\n  if not isinstance(n, int) or n < 0:\n    raise ValueError("Input must be a non-negative integer.")\n  return math.factorial(n)\n\ntry:\n  print(factorial(5))  # Output: 120\n  print(factorial(-1)) # Raises ValueError\nexcept ValueError as e:\n  print(f"Error: {e}")\n```\n',
 '\n6. Instruction:  Could 

In [43]:
raw_instructions[1]

'\n4. Instruction:  Develop a JavaScript function that determines whether a given string is a palindrome.  Consider case-insensitive comparisons and ignore spaces.\n4. Input: "A man, a plan, a canal: Panama"\n4. Output: \n```javascript\nfunction isPalindrome(str) {\n  str = str.toLowerCase().replace(/[^a-z0-9]/g, "");\n  return str === str.split("").reverse().join("");\n}\n\nconsole.log(isPalindrome("A man, a plan, a canal: Panama")); // true\n```\n'

In [44]:
# drop the first element (the bare "4. Instruction:" prefix) and the last element (possibly cut off mid-generation)
raw_instructions = raw_instructions[1:-1]

In [47]:
len(raw_instructions)

16

In [48]:
idx = 0
inst = raw_instructions[0]

In [50]:
idx += num_prompt_instructions + 1
# raw f-string so that \. and \s reach the regex engine without invalid-escape warnings
splitted_data = re.split(rf"{idx}\.\s+(Instruction|Input|Output):", inst)

In [51]:
splitted_data

['\n',
 'Instruction',
 '  Develop a JavaScript function that determines whether a given string is a palindrome.  Consider case-insensitive comparisons and ignore spaces.\n',
 'Input',
 ' "A man, a plan, a canal: Panama"\n',
 'Output',
 ' \n```javascript\nfunction isPalindrome(str) {\n  str = str.toLowerCase().replace(/[^a-z0-9]/g, "");\n  return str === str.split("").reverse().join("");\n}\n\nconsole.log(isPalindrome("A man, a plan, a canal: Panama")); // true\n```\n']

In [53]:
inst = splitted_data[2].strip()
input = splitted_data[4].strip()
input = "" if input.lower() == "<noinput>" else input
output = splitted_data[6].strip()

In [55]:
inst, input, output

('Develop a JavaScript function that determines whether a given string is a palindrome.  Consider case-insensitive comparisons and ignore spaces.',
 '"A man, a plan, a canal: Panama"',
 '```javascript\nfunction isPalindrome(str) {\n  str = str.toLowerCase().replace(/[^a-z0-9]/g, "");\n  return str === str.split("").reverse().join("");\n}\n\nconsole.log(isPalindrome("A man, a plan, a canal: Panama")); // true\n```')
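
The manual steps above (split on `###`, drop the unusable chunks, split each chunk on its numbered field labels, strip, map `<noinput>` to an empty string) can be collected into one function. This is a sketch, not the `post_process_gpt3_response` helper imported earlier: `parse_response`, `first_index` and `drop_last` are hypothetical names, and it assumes that tasks are numbered consecutively and that the final chunk may be truncated.

```python
import re


def parse_response(text, first_index=4, drop_last=True):
    """Parse a '###'-separated completion into instruction records.

    first_index is the number of the first generated task
    (num_prompt_instructions + 1 in the notebook).
    """
    # Keep only non-empty chunks; the text before the first '###' is empty.
    chunks = [c for c in re.split("###", text) if c.strip()]
    if drop_last and chunks:
        chunks = chunks[:-1]  # assumption: the final chunk may be cut off
    records = []
    for offset, chunk in enumerate(chunks):
        idx = first_index + offset
        parts = re.split(rf"{idx}\.\s+(Instruction|Input|Output):", chunk)
        if len(parts) != 7:
            continue  # skip chunks that do not match the expected layout
        input_text = parts[4].strip()
        if input_text.lower() == "<noinput>":
            input_text = ""
        records.append(
            {
                "instruction": parts[2].strip(),
                "input": input_text,
                "output": parts[6].strip(),
            }
        )
    return records
```

Calling `parse_response(results[0]["text"])` would return a list of dictionaries with the same three fields extracted by hand above.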

In [2]:
import json

seed_tasks_path="./prompts/seed_tasks.json"
# reads the file line by line as JSON Lines; this raises the error shown below
with open(seed_tasks_path, "r") as f:
    seed_tasks = [json.loads(seed) for seed in f]

JSONDecodeError: Expecting value: line 2 column 1 (char 2)
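
The message `Expecting value: line 2 column 1 (char 2)` is what `json.loads` reports when it is given a line holding only `[` and a newline. That suggests, though the file is not shown here, that `seed_tasks.json` is a pretty-printed JSON array rather than a JSON Lines file. A minimal sketch of a loader that accepts either layout follows; `load_seed_tasks` is a hypothetical helper name.

```python
import json


def load_seed_tasks(path):
    """Load seed tasks from either a JSON array file or a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    if text.lstrip().startswith("["):
        return json.loads(text)  # the whole file is one JSON array
    # Otherwise treat every non-empty line as its own JSON object.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Under that assumption, `seed_tasks = load_seed_tasks(seed_tasks_path)` would work for both `./seed_tasks.jsonl` and `./prompts/seed_tasks.json`.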