# **LLMs for I-Os: A Functionalities and Applications Master Class - APIs**

In this notebook, we will go over how to interact with LLM APIs. Although we focus on OpenAI, code for Anthropic and Gemini is also included.

Before we start, you will need an API key from OpenAI (and from Anthropic and Google if you want to run those sections). Please follow these steps:


1.   Go to: https://platform.openai.com/docs/overview
2.   Register an account
3.   Create an API key (you may be prompted to create one during registration). Save it in a secure place, since the full key is only shown once.
4.   In Google Colab, you can store the key in the *Secrets* section of the left sidebar so it can be loaded with `userdata.get`.
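The cells below read keys with Colab's `userdata.get`, which only works inside Colab. A minimal sketch for running the same code elsewhere, assuming you export each key as an environment variable with the same name as its Colab secret; `get_api_key` is a hypothetical helper, not part of any library:

```python
import os


def get_api_key(secret_name: str) -> str:
    """Return an API key from Colab Secrets if available, else from an environment variable.

    Assumption: outside Colab, the key is exported as an environment variable
    whose name matches the Colab secret name.
    """
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(secret_name)
    except ImportError:
        key = os.environ.get(secret_name)  # fallback for local runs
    if not key:
        raise RuntimeError(f"API key '{secret_name}' not found in Colab Secrets or the environment")
    return key
```

For example, `client = OpenAI(api_key=get_api_key("key"))` would work both in Colab and locally.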



# **Interacting with LLM API**

In [1]:
# Installing packages
!pip install openai
!pip install anthropic
!pip install google-generativeai

Collecting anthropic
  Downloading anthropic-0.49.0-py3-none-any.whl.metadata (24 kB)
Downloading anthropic-0.49.0-py3-none-any.whl (243 kB)
Installing collected packages: anthropic
Successfully installed anthropic-0.49.0


In [2]:
# Loading general package
from pathlib import Path  # Useful if your data lives in a different folder
import pandas as pd

In [3]:
# Defining path
project_path = Path.cwd().parent

## **Interacting with OPENAI API**

In [4]:
# Loading library
from openai import OpenAI
from google.colab import userdata
import re
import os
import json

In [5]:
# Initializing client
os.environ['OPENAI_API_KEY'] = userdata.get('key')  # 'key' is the name of the Colab secret holding your OpenAI API key
client = OpenAI(
    api_key=os.environ.get('OPENAI_API_KEY'))

In [6]:
# Prompt Creation
prompt = f"""
Create a situational judgement test (SJT) item to assess the following skill:

# Skill
- Active Listening — Giving full attention to what other people are saying, taking time to understand the points being made, asking questions as appropriate, and not interrupting at inappropriate times.

# Output Format
Provide the response in JSON format with the following structure:

```json
{{
  "question": "<Insert SJT question here>",
  "options": {{
    "A": "<Insert Option A>",
    "B": "<Insert Option B>",
    "C": "<Insert Option C>",
    "D": "<Insert Option D>"
  }},
  "correct_answer": "<Insert Correct Answer Letter (A, B, C, or D)>",
  "rationale": "<Insert detailed rationale explaining why the correct answer is the best choice>"
}}
```
"""

In [7]:
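# Calling OPENAI API
response = client.chat.completions.create(
    model="chatgpt-4o-latest",  # Alias that points to the GPT-4o version currently used in ChatGPT
    messages=[{"role": "system", "content": prompt}],
    response_format={"type": "json_object"},  # JSON mode: output is valid JSON (not necessarily your schema); the prompt must mention "JSON"
)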

In [8]:
# Inspecting output
response_text = response.choices[0].message.content
print(response_text)

{
  "question": "You are in a team meeting where a colleague, Maria, is explaining a new proposal that she has spent a lot of time developing. As she is presenting, you realize you have a suggestion that might improve one of her points. What should you do?",
  "options": {
    "A": "Wait until Maria finishes her explanation, then share your suggestion and ask follow-up questions to clarify her perspective.",
    "B": "Interrupt Maria politely to quickly offer your suggestion before you forget it.",
    "C": "Start discussing your suggestion with another team member quietly while Maria is still presenting.",
    "D": "Tune out and check your phone quickly since the topic doesn't seem directly relevant to your role."
  },
  "correct_answer": "A",
  "rationale": "Option A demonstrates active listening by allowing the speaker to fully express their ideas without interruption, showing respect and attentiveness. Waiting until Maria finishes ensures you understand her entire point before resp

In [9]:
# Convert response_text to dictionary
response_dict = json.loads(response_text)

# Create DataFrame
result_df = pd.DataFrame([{
    "Question": response_dict["question"],
    "Option A": response_dict["options"]["A"],
    "Option B": response_dict["options"]["B"],
    "Option C": response_dict["options"]["C"],
    "Option D": response_dict["options"]["D"],
    "Correct Answer": response_dict["correct_answer"],
    "Rationale": response_dict["rationale"]
}])
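The same dictionary-to-row code is repeated for every provider in this notebook, and nothing checks that the model actually returned four options and a valid answer letter. A minimal sketch of a shared helper, assuming the `question` / `options` / `correct_answer` / `rationale` keys requested in the prompts; `flatten_sjt_item` is a hypothetical name:

```python
# Hypothetical helper: flattens one SJT item (as parsed from the model's JSON) into a table row
EXPECTED_OPTIONS = ("A", "B", "C", "D")


def flatten_sjt_item(item: dict) -> dict:
    """Return one row with the column names used in this notebook.

    Raises ValueError if an option is missing or the correct answer
    is not one of the option letters.
    """
    options = item["options"]
    missing = [letter for letter in EXPECTED_OPTIONS if letter not in options]
    if missing:
        raise ValueError(f"Missing options: {missing}")

    # Normalize answers such as " b " to "B"; anything else (e.g. "Option B") is rejected
    answer = str(item["correct_answer"]).strip().upper()
    if answer not in EXPECTED_OPTIONS:
        raise ValueError(f"Unexpected correct_answer: {item['correct_answer']!r}")

    row = {"Question": item["question"]}
    row.update({f"Option {letter}": options[letter] for letter in EXPECTED_OPTIONS})
    row["Correct Answer"] = answer
    row["Rationale"] = item["rationale"]
    return row
```

With it, the cell above reduces to `result_df = pd.DataFrame([flatten_sjt_item(response_dict)])`, and a malformed item fails loudly instead of producing a silently wrong row.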

In [10]:
result_df

| | Question | Option A | Option B | Option C | Option D | Correct Answer | Rationale |
|---|---|---|---|---|---|---|---|
| 0 | You are in a team meeting where a colleague, M... | Wait until Maria finishes her explanation, the... | Interrupt Maria politely to quickly offer your... | Start discussing your suggestion with another ... | Tune out and check your phone quickly since th... | A | Option A demonstrates active listening by allo... |


## **Interacting with GEMINI API**

In [11]:
# Loading library
import google.generativeai as genai

In [12]:
# Initializing environment
os.environ['Gemini'] = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=os.environ.get('Gemini'))

In [13]:
# Gemini prompting
prompt = """Create a situational judgement test (SJT) item to assess the following skill.

Skill:
- Active Listening — Giving full attention to what other people are saying, taking time to understand the points being made, asking questions as appropriate, and not interrupting at inappropriate times.

Use this JSON schema:

SJT_Item = {
  "question": str,
  "options": {
    "A": str,
    "B": str,
    "C": str,
    "D": str
  },
  "correct_answer": str,
  "rationale": str
}

Return: SJT_Item
"""

In [14]:
# Calling model
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(prompt)

In [15]:
# Getting response text
response_text = response.text

# Extract JSON from response_text
cleaned_json_text = re.sub(r"```json|```", "", response_text).strip()

# Parse JSON into a dictionary
response_dict = json.loads(cleaned_json_text)

# Convert to Pandas DataFrame
result_df = pd.DataFrame([{
    "Question": response_dict["question"],
    "Option A": response_dict["options"]["A"],
    "Option B": response_dict["options"]["B"],
    "Option C": response_dict["options"]["C"],
    "Option D": response_dict["options"]["D"],
    "Correct Answer": response_dict["correct_answer"],
    "Rationale": response_dict["rationale"]
}])

In [16]:
result_df

| | Question | Option A | Option B | Option C | Option D | Correct Answer | Rationale |
|---|---|---|---|---|---|---|---|
| 0 | You are in a meeting with your team, discussin... | Interrupt Mark and politely remind him that Sa... | Let Mark continue to offer his suggestions and... | Discreetly signal to Sarah that she should hur... | Ignore Mark's interruption and focus solely on... | A | Option A demonstrates active listening by prio... |
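Stripping the Markdown fences with a regex works when Gemini returns only a fenced block, but fails if it adds a sentence before or after the JSON. A minimal sketch of a slightly sturdier parser, assuming the response contains exactly one top-level JSON object; `parse_json_object` is a hypothetical helper:

```python
import json


def parse_json_object(text: str) -> dict:
    """Parse the outermost {...} span in a model response.

    Tolerates Markdown fences or a short preamble around the JSON, but assumes
    the response contains exactly one top-level JSON object.
    """
    start = text.find("{")   # first opening brace
    end = text.rfind("}")    # last closing brace
    if start == -1 or end < start:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start:end + 1])
```

Where the SDK supports it, asking for JSON directly is sturdier still; for example, `generate_content` in `google-generativeai` accepts `generation_config={"response_mime_type": "application/json"}` for Gemini 1.5 models (check the SDK documentation for your version).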


## **Interacting with Anthropic API**

In [17]:
# Loading library
import anthropic

In [18]:
# Initializing client
os.environ['Claude'] = userdata.get('claude_key')

client = anthropic.Anthropic(
    api_key = os.environ['Claude'],
)

In [19]:
# Getting response (reuses the `prompt` defined in the Gemini section)
message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": prompt}
    ],
)

In [20]:
response_text = message.content[0].text
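Claude's reply is a list of content blocks, and `max_tokens=1024` caps its length; if the cap is hit, the JSON is cut off and `json.loads` fails with a confusing error. A minimal sketch, assuming the `stop_reason` and `content` fields of the Messages API response (`message_text` is a hypothetical helper):

```python
def message_text(message) -> str:
    """Concatenate the text blocks of an Anthropic Messages API response.

    Raises RuntimeError when generation stopped at max_tokens, since the
    JSON is then most likely truncated.
    """
    if message.stop_reason == "max_tokens":
        raise RuntimeError("Response hit max_tokens; increase max_tokens and retry")
    # Keep only text blocks (other block types, e.g. tool use, have no .text)
    return "".join(block.text for block in message.content if block.type == "text")
```

Used as `response_text = message_text(message)`, this replaces the `message.content[0].text` lookup.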

In [21]:
response_text

'{\n  "question": "You are in a team meeting where a colleague is presenting their concerns about a recent project. While they are speaking, you notice they seem hesitant and are frequently pausing. What would be the most effective way to demonstrate active listening in this situation?",\n  \n  "options": {\n    "A": "Immediately jump in with solutions whenever they pause to show you understand their concerns",\n    "B": "Maintain eye contact, nod occasionally, and wait until they finish before asking clarifying questions",\n    "C": "Take detailed notes and focus on your notepad to ensure you don\'t miss any important points",\n    "D": "Summarize their points while they\'re speaking to demonstrate you\'re following along"\n  },\n  \n  "correct_answer": "B",\n  \n  "rationale": "Option B best demonstrates active listening because it shows respect for the speaker by allowing them to complete their thoughts without interruption, while still showing engagement through non-verbal cues (ey

In [22]:
# Parse JSON into a dictionary
response_dict = json.loads(response_text)

# Convert to Pandas DataFrame
result_df = pd.DataFrame([{
    "Question": response_dict["question"],
    "Option A": response_dict["options"]["A"],
    "Option B": response_dict["options"]["B"],
    "Option C": response_dict["options"]["C"],
    "Option D": response_dict["options"]["D"],
    "Correct Answer": response_dict["correct_answer"],
    "Rationale": response_dict["rationale"]
}])

In [23]:
result_df

| | Question | Option A | Option B | Option C | Option D | Correct Answer | Rationale |
|---|---|---|---|---|---|---|---|
| 0 | You are in a team meeting where a colleague is... | Immediately jump in with solutions whenever th... | Maintain eye contact, nod occasionally, and wa... | Take detailed notes and focus on your notepad ... | Summarize their points while they're speaking ... | B | Option B best demonstrates active listening be... |


# **How to run your own local LLM**

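Before running a local LLM, make sure a GPU is enabled in Google Colab. Here's how to do it:


1.   In the top-right bar, click the upside-down triangle icon (or open the *Runtime* menu)
2.   Click *Change runtime type*
3.   Select *L4 GPU*
4.   Click *Save*

Changing the runtime type restarts the session, so re-run any setup cells you need afterwards.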

In [24]:
# Installing packages
!pip install vllm
!pip install bitsandbytes



In [57]:
import torch
import os
from vllm import LLM, SamplingParams
from google.colab import drive
import pandas as pd
from tqdm import tqdm
import re
import ast

In [26]:
# Defining model parameter

### See this list for all models - we recommend bnb (bitsandbytes) models since they are quantized to 4 bits and therefore much smaller
### Depending on your VRAM availability, you might have to use a smaller model
### Rule of thumb: VRAM (in GB) should exceed the number of parameters (in billions), e.g. ~50 GB of VRAM for a 48B model
### https://docs.unsloth.ai/get-started/all-our-models
model_id = "unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit"
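The rule of thumb above can be made a bit more concrete: weight memory is roughly the number of parameters times the bytes per parameter. A minimal sketch of that arithmetic (`weight_memory_gb` is our own name); it ignores the KV cache, activations and CUDA graphs, and real checkpoints can exceed it: the vLLM log below reports about 13.4 GiB of weights for this 14B 4-bit model, so treat the result as a lower bound.

```python
def weight_memory_gb(n_params_billion: float, bits_per_param: float) -> float:
    """Rough lower bound on the memory needed just to hold the weights, in GB (10**9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return n_params_billion * bytes_per_param  # billions of params x bytes per param = GB


for bits in (16, 8, 4):
    print(f"14B model at {bits}-bit: ~{weight_memory_gb(14, bits):.1f} GB")
```

This prints roughly 28, 14 and 7 GB for 16-, 8- and 4-bit weights, which is why quantized checkpoints are the practical choice on a single L4.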

In [27]:
# Loading model

### While the model loads, vLLM logs how many requests of max_model_len tokens it can serve concurrently (the "Maximum concurrency" line).
llm = LLM(model=model_id,
          dtype=torch.bfloat16,
          quantization="bitsandbytes",
          load_format="bitsandbytes",
          max_model_len=3000,  # Maximum tokens per request (prompt + output)
          tensor_parallel_size=torch.cuda.device_count(),  # Split the model across all visible GPUs
          )

INFO 03-28 17:33:12 [config.py:585] This model supports multiple tasks: {'score', 'reward', 'embed', 'generate', 'classify'}. Defaulting to 'generate'.
INFO 03-28 17:33:14 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2) with config: model='unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit', speculative_config=None, tokenizer='unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=3000, download_dir=None, load_format=LoadFormat.BITSANDBYTES, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, coll

INFO 03-28 17:33:18 [cuda.py:291] Using Flash Attention backend.
INFO 03-28 17:33:18 [parallel_state.py:954] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 03-28 17:33:18 [model_runner.py:1110] Starting to load model unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit...
INFO 03-28 17:33:19 [loader.py:1155] Loading weights with BitsAndBytes quantization. May take a while ...
INFO 03-28 17:33:20 [weight_utils.py:265] Using model weights format ['*.safetensors']


INFO 03-28 17:34:01 [weight_utils.py:281] Time spent downloading weights for unsloth/DeepSeek-R1-Distill-Qwen-14B-unsloth-bnb-4bit: 41.279350 seconds


INFO 03-28 17:34:12 [model_runner.py:1146] Model loading took 13.3900 GB and 53.667233 seconds
INFO 03-28 17:34:15 [worker.py:267] Memory profiling takes 2.52 seconds
INFO 03-28 17:34:15 [worker.py:267] the current vLLM instance can use total_gpu_memory (22.16GiB) x gpu_memory_utilization (0.90) = 19.94GiB
INFO 03-28 17:34:15 [worker.py:267] model weights take 13.39GiB; non_torch_memory takes 0.04GiB; PyTorch activation peak memory takes 1.42GiB; the rest of the memory reserved for KV Cache is 5.09GiB.
INFO 03-28 17:34:16 [executor_base.py:111] # cuda blocks: 1738, # CPU blocks: 1365
INFO 03-28 17:34:16 [executor_base.py:116] Maximum concurrency for 3000 tokens per request: 9.27x
INFO 03-28 17:34:19 [model_runner.py:1442] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decre

Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:47<00:00,  1.35s/it]

INFO 03-28 17:35:06 [model_runner.py:1570] Graph capturing finished in 47 secs, took 0.87 GiB
INFO 03-28 17:35:06 [llm_engine.py:447] init engine (profile, create kv cache, warmup model) took 53.91 seconds
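The "Maximum concurrency" figure in the log follows from the KV-cache size: the number of GPU blocks times the tokens per block, divided by `max_model_len`. A minimal sketch of that arithmetic, assuming a block size of 16 tokens (vLLM's usual default on CUDA; it is not printed in this log, but it reproduces the reported value):

```python
def max_concurrency(num_gpu_blocks: int, max_model_len: int, block_size: int = 16) -> float:
    """How many requests of max_model_len tokens fit in the KV cache at the same time."""
    total_kv_tokens = num_gpu_blocks * block_size  # each KV-cache block stores block_size tokens
    return total_kv_tokens / max_model_len


print(round(max_concurrency(1738, 3000), 2))  # 9.27, matching the log line above
```

Doubling `max_model_len` halves this number, so keep it no larger than your prompts plus expected output actually need.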





In [59]:
# Setting sampling param
sampling_params = SamplingParams(temperature=0.7,
                                 max_tokens=3000,  # Cap on generated tokens; prompt + output must still fit in max_model_len (3000)
                                 #top_p=1,
                                 #presence_penalty=0,
                                 #frequency_penalty=0,
                                 )

In [60]:
# Running individual call
## Prompt
prompt = f"""Create a situational judgement test (SJT) item to assess the following skill.

Skill:
- Active Listening — Giving full attention to what other people are saying, taking time to understand the points being made, asking questions as appropriate, and not interrupting at inappropriate times.

Use this JSON schema:

SJT_Item = {{
  "question": str,
  "options": {{
    "A": str,
    "B": str,
    "C": str,
    "D": str
  }},
  "correct_answer": str,
  "rationale": str
}}

Return: SJT_Item
"""

In [61]:
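# Generate response
## Note: llm.generate() sends the raw string without the model's chat template; vLLM's llm.chat() applies it
outputs = llm.generate(prompt, sampling_params)

# Extracting response (first prompt, first completion)
text_output = outputs[0].outputs[0].text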

Processed prompts: 100%|██████████| 1/1 [00:35<00:00, 35.76s/it, est. speed input: 3.47 toks/s, output: 16.78 toks/s]


In [62]:
# Helper function to process json
def extract_json_from_output(llm_output):
    """Parse the first JSON object found in an LLM response.

    Tries, in order: a fenced ``` block (with or without the "json" tag),
    a block wrapped in triple single quotes, and finally the outermost {...} span.
    Raises ValueError if none of these can be parsed.
    """
    # ===== Pattern 1: ```json ... ``` (the "json" tag is optional)
    pattern_backticks = r"```(?:json)?(.*?)```"
    match = re.search(pattern_backticks, llm_output, re.DOTALL)
    if match:
        json_str = match.group(1).strip()
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            pass  # Try other patterns if this fails

    # ===== Pattern 2: ''' ... '''
    pattern_single_quotes = r"'''(.*?)'''"
    match = re.search(pattern_single_quotes, llm_output, re.DOTALL)
    if match:
        json_str = match.group(1).strip()
        # Clean escaped characters (literal "\n" and "\'" sequences emitted by the model)
        json_str = json_str.replace("\\n", "\n").replace("\\'", "'")
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            try:
                # Try Python dict parsing (handles single-quoted keys/values)
                data = ast.literal_eval(json_str)
                return data
            except Exception:
                pass

    # ===== Pattern 3: Raw JSON-like block (greedy: first "{" to last "}")
    pattern_curly = r"({.*})"
    match = re.search(pattern_curly, llm_output, re.DOTALL)
    if match:
        json_str = match.group(1).strip()
        try:
            return json.loads(json_str)
        except json.JSONDecodeError:
            try:
                data = ast.literal_eval(json_str)
                return data
            except Exception:
                pass

    # ===== If none of the patterns worked
    raise ValueError("No valid JSON block found in the LLM output.")

def sjt_item_to_df(sjt_dict):
    # Turn the parsed JSON into a one-row DataFrame
    # Accept both {"SJT_Item": {...}} (as requested in the batch prompt) and a bare item dict
    item = sjt_dict.get("SJT_Item", sjt_dict)
    df = pd.DataFrame([{
        "Question": item["question"],
        "Option A": item["options"]["A"],
        "Option B": item["options"]["B"],
        "Option C": item["options"]["C"],
        "Option D": item["options"]["D"],
        "Correct Answer": item["correct_answer"],
        "Rationale": item["rationale"]
    }])

    return df
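DeepSeek-R1 distilled models usually write a reasoning trace before the answer, closed by a `</think>` tag (the opening tag may be missing, since some chat templates insert it). Because Pattern 3 above grabs everything from the first `{` to the last `}`, braces inside that reasoning can break parsing. A minimal sketch, assuming the trace, if present, ends with `</think>`; `strip_reasoning` is a hypothetical helper you would call before `extract_json_from_output`:

```python
def strip_reasoning(llm_output: str, end_tag: str = "</think>") -> str:
    """Return only the text after the last reasoning end tag (unchanged if the tag is absent)."""
    _, sep, answer = llm_output.rpartition(end_tag)
    return answer.strip() if sep else llm_output.strip()
```

For example: `sjt_item = extract_json_from_output(strip_reasoning(text_output))`.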

In [63]:
sjt_item = extract_json_from_output(text_output)
result_df = sjt_item_to_df(sjt_item)

In [64]:
# Inspecting df
result_df

| | Question | Option A | Option B | Option C | Option D | Correct Answer | Rationale |
|---|---|---|---|---|---|---|---|
| 0 | You are in a team meeting where a junior emplo... | You say, 'Thank you for sharing, [Name]. I app... | You check your email while they're presenting ... | You interrupt them mid-presentation to suggest... | You say, 'I don't think this will work because... | A | Option A demonstrates active listening by givi... |


## **Running multiple requests**

In [65]:
# Creating a DataFrame of skills and their definitions
skills = [
    {"skill": "Coordination", "definition": "Adjusting actions in relation to others' actions."},
    {"skill": "Instructing", "definition": "Teaching others how to do something."},
    {"skill": "Negotiation", "definition": "Bringing others together and trying to reconcile differences."},
    {"skill": "Active Learning", "definition": "Understanding the implications of new information for both current and future problem-solving and decision-making."},
    {"skill": "Active Listening", "definition": "Giving full attention to what other people are saying, taking time to understand the points being made, asking questions as appropriate, and not interrupting at inappropriate times."}
]

# Create DataFrame
skill_df = pd.DataFrame(skills)

In [66]:
# Creating result_df
result_df = pd.DataFrame(columns=[
        "Question",
        "Option A",
        "Option B",
        "Option C",
        "Option D",
        "Correct Answer",
        "Rationale"
    ])
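The batch loop below writes its prompt as an f-string inside the loop body, so every line after the first is sent to the model with eight leading spaces, and every JSON brace has to be doubled. A minimal alternative sketch, assuming the same prompt wording: `build_sjt_prompt` and `EXAMPLE_ITEM` are hypothetical names, the example JSON is generated with `json.dumps` (so no brace escaping is needed), and `textwrap.dedent` removes the indentation.

```python
import json
import textwrap

# Hypothetical example of the expected output shape (mirrors the schema in the batch prompt)
EXAMPLE_ITEM = {
    "SJT_Item": {
        "question": "Your question here",
        "options": {"A": "Option A", "B": "Option B", "C": "Option C", "D": "Option D"},
        "correct_answer": "A",
        "rationale": "Your rationale here",
    }
}


def build_sjt_prompt(skill: str, definition: str) -> str:
    """Render the batch SJT prompt without the loop's leading indentation."""
    example = json.dumps(EXAMPLE_ITEM, indent=2)
    template = textwrap.dedent("""\
        Create a situational judgement test (SJT) item to assess the following skill and its definition.
        Skill:
        {skill}
        Definition:
        {definition}

        # Output:
        Provide the SJT item in strict JSON format, wrapped inside triple single quotes ('''). Do not include any explanation, comments, or additional text. Only return the JSON block.

        Example:
        '''
        {example}
        '''
        """)
    # Only {skill}, {definition} and {example} are placeholders; braces inside the values are left alone
    return template.format(skill=skill, definition=definition, example=example)
```

Inside the loop, `prompts.append(build_sjt_prompt(row['skill'], row['definition']))` would replace the inline f-string.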

In [67]:
# Running batch call
## If you have a DataFrame listing the skills you want to assess, load it as skill_df and run this code
## Defining batch size (number of prompts passed to vLLM in each generate() call)
BATCH_SIZE = 30
n_batches = (len(skill_df) + BATCH_SIZE - 1) // BATCH_SIZE  # Ceiling division

## Collecting outputs that could not be parsed, so they can be inspected or re-run later
failed = []

## Looping through batches
for i in tqdm(range(0, len(skill_df), BATCH_SIZE), total=n_batches):
    batch_df = skill_df.iloc[i: i + BATCH_SIZE]

    # Creating batch prompts
    prompts = []
    for _, row in batch_df.iterrows():
        skill = row['skill']
        definition = row['definition']
        prompt = f"""Create a situational judgement test (SJT) item to assess the following skill and its definition.
        Skill:
        {skill}
        Definition:
        {definition}

        # Output:
        Provide the SJT item in strict JSON format, wrapped inside triple single quotes ('''). Do not include any explanation, comments, or additional text. Only return the JSON block.

        Example:
        '''
        {{
        "SJT_Item": {{
            "question": "Your question here",
            "options": {{
            "A": "Option A",
            "B": "Option B",
            "C": "Option C",
            "D": "Option D"
            }},
            "correct_answer": "A",
            "rationale": "Your rationale here"
        }}
        }}
        '''
        """

        prompts.append(prompt)

    # Generating responses in batch using vLLM (outputs come back in the same order as the prompts)
    outputs = llm.generate(prompts, sampling_params)

    # Parsing each response and appending it to the main result DataFrame
    for skill_name, output in zip(batch_df['skill'], outputs):
        response = output.outputs[0].text
        try:
            sjt_item = extract_json_from_output(response)
            holder = sjt_item_to_df(sjt_item)
        except (ValueError, KeyError, TypeError, AttributeError):
            # Skip malformed generations instead of aborting the whole run
            failed.append({"skill": skill_name, "raw_output": response})
            continue

        result_df = pd.concat([result_df, holder], ignore_index=True)

Processed prompts: 100%|██████████| 5/5 [00:49<00:00,  9.81s/it, est. speed input: 17.86 toks/s, output: 20.68 toks/s]
100%|██████████| 1/1 [00:49<00:00, 49.06s/it]


In [68]:
result_df

| | Question | Option A | Option B | Option C | Option D | Correct Answer | Rationale |
|---|---|---|---|---|---|---|---|
| 0 | You and your team are working on a group proje... | You ignore the issue and hope it gets resolved... | You individually finish the incomplete tasks t... | You communicate with your team members to unde... | You reprimand your team members for not workin... | C | The correct answer is C because coordinating w... |
| 1 | You are training a new employee on how to use ... | Continue with the demonstration and assume the... | Pause the demonstration, ask the employee if t... | Speed up the demonstration to cover all the st... | Assign them a task to practice on their own wi... | B | The best way to address the situation is to pa... |
| 2 | You are part of a team working on a project. Y... | Call a team meeting to discuss everyone's pers... | Take the aggressive approach yourself to show ... | Let the team members decide on their own witho... | Suggest a compromise that balances both approa... | A | Calling a team meeting to discuss everyone's p... |
| 3 | You are working on a project and receive new d... | Ignore the new data and continue with the orig... | Review the new data to understand its implicat... | Share the new data with the team and delegate ... | Report the new data to your supervisor and wai... | B | Active learning involves understanding the imp... |
| 4 | You are in a meeting with a colleague who is e... | Interrupt them to ask clarifying questions as ... | Wait until they finish speaking, then ask a fe... | Nod occasionally and say 'I see' to show you'r... | Listen silently without giving any feedback un... | B | The best way to respond is to wait until your ... |
