<a href="https://colab.research.google.com/github/sourcesync/kagglex_gemma/blob/gw%2Finitial/colab/Jorge_vertexai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This notebook demonstrates
* Authenticating a Colab notebook to a GCP account (with a specific project ID and region)
* Prompting Gemini models via Vertex AI
* Generating question-answer (Q-A) pairs from a PDF sent along with the prompt


# Install required packages

In [36]:
%%time
!pip install --upgrade google-cloud-aiplatform
# google-colab is preinstalled in Colab runtimes, so it does not need to be installed from PyPI
!pip install -q loguru

CPU times: user 141 ms, sys: 23.9 ms, total: 165 ms
Wall time: 11.4 s


# Import required packages

In [37]:
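from google.colab import userdata  # Colab secrets (not used below)
from google.colab import auth  # Authenticates the Colab runtime to Google Cloud
from google.cloud import aiplatform  # Vertex AI SDK; the vertexai module used below ships in the same package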

# Authenticate to a GCP account

In [38]:
%%time
PROJECT_ID = "occamai-205217"  # Set this to a valid project ID in your Google Cloud account
REGION = 'us-central1'  # Set this to the GCP region you want to use
auth.authenticate_user(project_id=PROJECT_ID)  # The first run may prompt you to log in to your Google Cloud account
!gcloud config get core/account  # Confirms authentication succeeded by echoing the signed-in account email
!gcloud config get-value project  # Verify the project by echoing the project config
!gcloud config set compute/region {REGION}  # Set the default region
!gcloud config get-value compute/region  # Verify the region by echoing the region config

George.Williams@gmail.com
occamai-205217
Updated property [compute/region].
us-central1
CPU times: user 106 ms, sys: 16.3 ms, total: 122 ms
Wall time: 6.61 s
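`PROJECT_ID` is hard-coded above, and the `userdata` import goes unused. A minimal sketch, assuming you store the ID as a Colab secret named `PROJECT_ID` (a hypothetical name) or in the `GOOGLE_CLOUD_PROJECT` environment variable, of reading it instead and sanity-checking its format before calling `auth.authenticate_user`. The format check follows Google Cloud's project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The helper names are illustrative, not part of any library.

```python
import os
import re

# Google Cloud project ID rules: 6-30 chars, lowercase letters/digits/hyphens,
# starts with a letter, does not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id matches the Google Cloud project ID format."""
    return _PROJECT_ID_RE.fullmatch(project_id) is not None


def get_project_id(default: str = "") -> str:
    """Read the project ID from a Colab secret, then an env var, then a default.

    "PROJECT_ID" is a hypothetical secret name; create it in Colab's Secrets panel.
    """
    try:
        from google.colab import userdata  # only importable inside Colab

        value = userdata.get("PROJECT_ID")
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook
        pass
    return os.environ.get("GOOGLE_CLOUD_PROJECT", default)
```

In the authentication cell, `PROJECT_ID = get_project_id(default="occamai-205217")` followed by `assert is_valid_project_id(PROJECT_ID)` could then replace the hard-coded assignment, so a typo fails fast instead of surfacing later as a permissions error.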


# Send a prompt and a PDF to Gemini 1.5

In [39]:
%%time
import json

import vertexai
from vertexai.generative_models import GenerativeModel, Part
from loguru import logger


def multiturn_generate_content():
    # JORGE - vertexai.init(project="generated-mote-434518-u9", location="us-central1")
    vertexai.init(project=PROJECT_ID, location=REGION)
    model = GenerativeModel(
        "gemini-1.5-flash-002",
        system_instruction=[LLM_instructions],
    )

    # Output file. It is opened in append mode, so rows from earlier runs are kept
    # and count toward dataset_size; delete it first to start from scratch.
    file_path = "output.jsonl"

    dataset_size = 0
    iteration = 0
    max_iterations = 10  # Safety cap so a run of unparseable replies cannot loop forever
    while dataset_size < 100 and iteration < max_iterations:
        # Start a fresh chat and send the PDF plus the prompt to the Gemini API
        chat = model.start_chat()
        response = chat.send_message(
            [document, prompt],
            generation_config=generation_config,
        )

        # Postprocessing: strip the Markdown code fence (```jsonl ... ```) and split into lines
        q_a_data = (
            response.text.replace("```jsonl", "").replace("```", "").strip().splitlines()
        )
        iteration += 1
        # logger.info(f'Iteration {iteration}, Amount of q-a sets: {len(q_a_data)} ')

        # Append each Q-A pair to the file as one JSON object per line
        with open(file_path, "a", encoding="utf-8") as f:
            for item in q_a_data:
                # logger.debug(f'item: {item}')
                try:
                    json_item = json.loads(item)
                except json.JSONDecodeError:
                    # Skip blank or truncated lines (e.g., when the reply hits max_output_tokens)
                    continue
                # Write the parsed object; dumping the raw string would double-encode it
                json.dump(json_item, f, ensure_ascii=False)
                f.write("\n")

        # Reopen the file in read mode to count the lines written so far
        with open(file_path, "r", encoding="utf-8") as f:
            dataset_size = sum(1 for _ in f)

        logger.info(f"Current dataset size: {dataset_size}")

    # Log the full response object, including its metadata
    # logger.info(f"\n MetaData:\n{response}")

    # Log the last Gemini response
    logger.info(f"\n Gemini Output:\n{response.text}")

    # TODO:
    #   How do I convert the string data to a JSONL file?
    #     - Split on each newline or "{", convert each line to a JSON object, then write it to a file
    #     - What about the text before the JSON? Could use replace() or strip()
    #   How do I get a full set of 1000 Q-A pairs?
    #     - The problem is the output token limit. Must loop over calls and append to the JSONL file
    #     - Check the length of the JSONL file, or store the pairs in a dictionary and convert at the end
    #   Prompt size limit
    #     - Needle-in-a-haystack problem: LLMs tend to weight the beginning and end of a long context most
    #     - Chunk the PDF into chapters and generate 200 Q-A pairs per chapter
    #     - Append them to the JSONL file (20 chapters = 4000 Q-A pairs)
    #
    return response


# Load the PDF and wrap its bytes as a Part for the request
with open("/content/deflection-and-stiffness.pdf", "rb") as pdf_file:
    document = Part.from_data(
        data=pdf_file.read(),
        mime_type="application/pdf",
    )
prompt = """Create a diverse question-answer dataset on deflection and stiffness, covering basic to complex concepts. The dataset will train an engineering assistant bot, ensuring the questions reflect real-world scenarios engineers typically encounter. Generate 50 sets"""
LLM_instructions = """Generate high-quality question-answer sets from input files (PDFs, CSV, text) for finetuning a model following these steps:

1. **Input Processing**:
  - Extract key concepts from text, PDFs, or data files.
  - Focus on important themes, facts, and trends.

2. **Question Formulation**:
  - Create clear, concise, and diverse questions (fact-based, inferential, etc.).
  - Ensure each question is relevant to core ideas in the input.

3. **Answer Generation**:
  - Provide accurate, concise answers that align with the context of the input.
  - Avoid generic or overly long responses.

4. **Quality Control**:
  - Ensure full coverage of key points.
  - Review for consistency, and eliminate redundant questions.

5. **Output Format (JSONL)**:
  - Use the format:
   ```
   {"question": "Question", "answer": "Answer"}
   ```
  - Each Q-A pair should be on a new line.

6. **Special Cases**:
  - Extract key insights from tables, diagrams, and multilingual text.
  - Generate appropriate Q-A pairs based on these elements.

7. **Refinement**:
  - Review and refine Q-A sets for improvement in complexity and coverage."""

generation_config = {
    "max_output_tokens": 8192,
    "temperature": 1.6,
    "top_p": 0.95,
}


data = multiturn_generate_content()

[32m2024-10-05 02:26:49.687[0m | [1mINFO    [0m | [36m__main__[0m:[36mmultiturn_generate_content[0m:[36m50[0m - [1mCurrent dataset size: 315[0m
[32m2024-10-05 02:26:49.690[0m | [1mINFO    [0m | [36m__main__[0m:[36mmultiturn_generate_content[0m:[36m56[0m - [1m
 Gemini Ouput:
```jsonl
{"question": "What is the fundamental relationship between force and deflection in a linear spring?", "answer": "Force is directly proportional to deflection: F = ky, where k is the spring constant."}
{"question": "A cantilever beam is subjected to a point load at its free end.  How does the deflection at the free end change as the load increases?", "answer": "The deflection increases proportionally to the applied load, assuming the elastic limit isn't exceeded."}
{"question": "Explain the concept of 'spring rate' and provide its unit in the English system.", "answer": "Spring rate (k) is the change in force required per unit deflection.  The unit is lb/in or lbf/in."}
{"question": "A

CPU times: user 346 ms, sys: 60.7 ms, total: 407 ms
Wall time: 47.9 s
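Appending straight from each reply to `output.jsonl` keeps duplicate questions across iterations and relies on every line being valid JSON. The TODO in the cell suggests storing pairs in a dictionary and converting at the end instead. A minimal sketch of that approach using only the standard library: it strips the Markdown code fence, keeps only lines that parse as JSON objects with string `question` and `answer` fields, and drops repeated questions. Treating two questions as duplicates when they match after lowercasing and collapsing whitespace is an assumption; the function names are hypothetical and not part of the Vertex AI SDK.

```python
import json
import re

# Matches an opening fence such as ```jsonl or ```json, or a bare closing fence
_FENCE_RE = re.compile(r"^```[a-zA-Z]*\s*$")


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so near-identical questions compare equal."""
    return " ".join(question.lower().split())


def parse_qa_lines(text: str) -> list[dict]:
    """Extract well-formed {"question", "answer"} objects from raw model output."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or _FENCE_RE.match(line):
            continue  # skip blank lines and Markdown code fences
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if (
            isinstance(obj, dict)
            and isinstance(obj.get("question"), str)
            and isinstance(obj.get("answer"), str)
        ):
            pairs.append({"question": obj["question"], "answer": obj["answer"]})
    return pairs


def merge_unique(existing: dict[str, dict], new_pairs: list[dict]) -> int:
    """Add pairs whose normalized question is unseen; return how many were added."""
    added = 0
    for pair in new_pairs:
        key = normalize(pair["question"])
        if key not in existing:
            existing[key] = pair
            added += 1
    return added


def write_jsonl(pairs: dict[str, dict], path: str) -> None:
    """Write every stored pair as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs.values():
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```

Inside the generation loop, `merge_unique(dataset, parse_qa_lines(response.text))` would replace the append step, `while len(dataset) < 100` would serve as the stop condition, and a single `write_jsonl(dataset, "output.jsonl")` after the loop would produce the file. Because the file is rewritten once rather than appended to, rows from earlier runs no longer inflate the count.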
