### Introduction to LLMs

In [112]:
import os
from dotenv import load_dotenv

import requests
from IPython.display import display, Markdown

In [41]:
load_dotenv(override=True)

True

In [126]:
api_key = os.getenv('OPENAI_API_KEY')

Calling the OpenAI Chat Completions endpoint directly over HTTP:

In [14]:
headers = {'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'}

payload = {
    'model': 'gpt-5-nano',
    'messages': [{'role': 'user', 'content': 'Hi, it is nice to meet my AI buddy'}],
}

In [17]:
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload
)
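For comparison, here is a minimal sketch that builds the same POST request using only the standard library. `build_chat_request` is a hypothetical helper: it assembles the URL, the JSON body and the `Authorization` and `Content-Type` headers, but does not send anything, so no real API key is needed.

```python
import json
import urllib.request


def build_chat_request(api_key: str, model: str, user_text: str,
                       url: str = "https://api.openai.com/v1/chat/completions") -> urllib.request.Request:
    """Hypothetical helper: build (not send) a Chat Completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # the JSON body, as bytes
        headers={
            "Authorization": f"Bearer {api_key}",  # bearer-token authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("sk-placeholder", "gpt-5-nano", "Hi, it is nice to meet my AI buddy")
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a real key and network access.
```

`urllib` stores header names capitalized (`Content-type`), which is harmless because HTTP header names are case-insensitive.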

In [20]:
response.json()

{'id': 'chatcmpl-D2cFN7XY0ioNDYnNbFA4QQXvGmeBb',
 'object': 'chat.completion',
 'created': 1769516049,
 'model': 'gpt-5-nano-2025-08-07',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'Nice to meet you too! I’m here to help with whatever you need—explain concepts, brainstorm ideas, write or edit, plan things, debug code, learn a new topic, or just chat.\n\nWhat would you like to start with? A quick interests check (what you enjoy or want to learn), a specific task, or a playful activity? If you’re not sure, I can suggest a few options:\n- Explain a concept you’re curious about in simple terms\n- Help draft an email or document\n- Plan a study or workout routine\n- Brainstorm ideas for a project or story\n- Tell you a joke or a fun fact\n\nTell me what you’re hoping for, and we’ll dive in.',
    'refusal': None,
    'annotations': []},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 16,
  'completion_tokens': 540,
  'total_tokens': 556,
  'prompt

In [22]:
response.json()['choices'][0]['message']['content']

'Nice to meet you too! I’m here to help with whatever you need—explain concepts, brainstorm ideas, write or edit, plan things, debug code, learn a new topic, or just chat.\n\nWhat would you like to start with? A quick interests check (what you enjoy or want to learn), a specific task, or a playful activity? If you’re not sure, I can suggest a few options:\n- Explain a concept you’re curious about in simple terms\n- Help draft an email or document\n- Plan a study or workout routine\n- Brainstorm ideas for a project or story\n- Tell you a joke or a fun fact\n\nTell me what you’re hoping for, and we’ll dive in.'
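Indexing `['choices'][0]['message']['content']` directly raises a `KeyError` when the request fails, because an error response carries an `error` object instead of `choices`. A minimal sketch of a more defensive extraction, assuming that response shape; `extract_reply` is a hypothetical helper and the sample body is trimmed from the output above.

```python
from typing import Any


def extract_reply(body: dict[str, Any]) -> str:
    """Hypothetical helper: return the assistant text from a parsed Chat Completions body."""
    # Failed requests carry an "error" object instead of "choices"
    if "error" in body:
        err = body["error"]
        detail = err.get("message", err) if isinstance(err, dict) else err
        raise RuntimeError(f"API error: {detail}")
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # content may be None, e.g. when the model returns a tool call instead of text
    return choices[0]["message"].get("content") or ""


sample = {"choices": [{"index": 0,
                       "message": {"role": "assistant", "content": "Nice to meet you too!"},
                       "finish_reason": "stop"}]}
print(extract_reply(sample))  # Nice to meet you too!
```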

### Python LLM Libraries

#### Using the OpenAI Python library

In [23]:
from openai import OpenAI

In [31]:
# By default the client reads the API key from the OPENAI_API_KEY environment variable
# and uses https://api.openai.com/v1 as the base URL (overridable via OPENAI_BASE_URL)
openai = OpenAI()

In [29]:
response = openai.chat.completions.create(model='gpt-5-nano', messages=[{'role':'user', 'content': 'Hi, it is nice to meet my AI buddy'}])
response.choices[0].message.content

'Nice to meet you too! I’m glad to be your AI buddy.\n\nI can chat, answer questions, brainstorm ideas, help with writing or coding, plan things, learn new topics, and play quick games. If you tell me your interests, I’ll tailor our chats.\n\nWhat would you like to do first? A quick Q&A, a brain-storm, a writing or coding task, a mini game, or something else?'

#### Using Google Gemini

In [44]:
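# Gemini exposes an OpenAI-compatible endpoint, so the same client class works
# once the Google API key and Gemini's base URL are passed explicitly
goog_api_key = os.getenv('GOOGLE_API_KEY')

client = OpenAI(
    api_key=goog_api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)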

In [47]:
response = client.chat.completions.create(model='gemini-2.5-flash', messages=[{'role':'user', 'content': 'Hi, it is nice to meet my AI buddy'}])

In [54]:
print(response.choices[0].message.content)

Hello there! It's great to meet you too!

I'm ready and eager to assist you. What can I do for you today, or what would you like to chat about?


#### Using Ollama

Advantages:
* No API charges: open-weight models run on your own machine.
* Privacy: prompts and data stay local instead of being sent to a third-party API.

Disadvantages:
* Local models are generally less capable than frontier models.

In [59]:
requests.get('http://localhost:11434').content

b'Ollama is running'
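The same liveness check can be written with only the standard library, returning `False` instead of raising when nothing is listening. A minimal sketch; `is_ollama_running` is a hypothetical helper, and it assumes Ollama's root endpoint replies with the text shown above.

```python
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if the server at base_url answers like Ollama's root endpoint."""
    # Local server: ignore any HTTP(S)_PROXY settings from the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url, timeout=timeout) as resp:
            return b"Ollama is running" in resp.read()
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts are OSError subclasses
        return False


print(is_ollama_running())
```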

In [60]:
ollama_base_url = 'http://localhost:11434/v1'

# Ollama ignores the API key, but the OpenAI client requires a non-empty value
ollama_client = OpenAI(
    api_key='ollama',
    base_url=ollama_base_url
)

In [65]:
response = ollama_client.chat.completions.create(model='deepseek-r1:8b', messages=[{'role':'user', 'content': 'Hi, what is square root of 10'}])

In [66]:
print(response.choices[0].message.content)

Okay, the square root of 10 (written as √10) is approximately **3.1623**.

It is an **irrational number**, which means it cannot be expressed exactly as a fraction and has a non-terminating, non-repeating decimal expansion.
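The three clients above differ only in the API key and base URL passed to `OpenAI(...)`. A minimal sketch that keeps those settings in one place; `PROVIDERS` and `client_kwargs` are hypothetical names, the Gemini and Ollama URLs are the ones used above, and `https://api.openai.com/v1` is the OpenAI client's default.

```python
import os
from collections.abc import Mapping

# Hypothetical settings table: base URL per provider, plus the environment
# variable that holds its key (None: no real key needed).
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
    "gemini": {"base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
               "key_env": "GOOGLE_API_KEY"},
    "ollama": {"base_url": "http://localhost:11434/v1", "key_env": None},
}


def client_kwargs(provider: str, env: Mapping[str, str] = os.environ) -> dict:
    """Return keyword arguments for OpenAI(**kwargs) for the named provider."""
    cfg = PROVIDERS[provider]
    if cfg["key_env"] is None:
        # Placeholder key: Ollama does not check it
        return {"api_key": "ollama", "base_url": cfg["base_url"]}
    api_key = env.get(cfg["key_env"])
    if not api_key:
        raise RuntimeError(f"{cfg['key_env']} is not set")
    return {"api_key": api_key, "base_url": cfg["base_url"]}


print(client_kwargs("ollama"))  # {'api_key': 'ollama', 'base_url': 'http://localhost:11434/v1'}
```

With the `openai` package installed, a client would then be created as `OpenAI(**client_kwargs("gemini"))`.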


Rendering the response as Markdown in Jupyter:

In [77]:
display(Markdown(response.choices[0].message.content))

Okay, the square root of 10 (written as √10) is approximately **3.1623**.

It is an **irrational number**, which means it cannot be expressed exactly as a fraction and has a non-terminating, non-repeating decimal expansion.

#### Using LangChain

In [79]:
from langchain_openai import ChatOpenAI



In [97]:
# Gemini via its OpenAI-compatible endpoint
client_gemini = ChatOpenAI(openai_api_key=goog_api_key, model='gemini-2.5-flash', base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
# OpenAI; the API key is read from OPENAI_API_KEY
client_openai = ChatOpenAI(model='gpt-5-nano')

In [98]:
response = client_gemini.invoke([{'role':'user', 'content': 'Hi, what is square root of 10'}])
response_2 = client_openai.invoke([{'role':'user', 'content': 'Hi, what is square root of 10'}])

In [103]:
display(Markdown(response.content))
display(Markdown(response_2.content))

The square root of 10 is approximately **3.162**.

It's an irrational number, which means its decimal representation goes on forever without repeating.

You can verify this because:
*   $3^2 = 9$
*   $4^2 = 16$
Since 10 is between 9 and 16, its square root is between 3 and 4.

The square root of 10 is approximately 3.1622776601683795. It’s an irrational number. If you want it rounded, 3.1623 works (to four decimals).

#### Using LiteLLM

In [106]:
from litellm import completion

In [109]:
# LiteLLM model names take the form "provider/model"
messages = [{'role':'user', 'content': 'Hi, what is square root of 10'}]
response = completion(model="openai/gpt-5-nano", messages=messages)

In [110]:
display(Markdown(response.choices[0].message.content))

The square root of 10 is irrational and is approximately 3.1622776601683795.  
If you want it rounded: about 3.1623 (to 4 decimals).

In [111]:
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
# response_cost is reported in USD; multiply by 100 to get cents
print(f"Total cost: {response._hidden_params['response_cost']*100:.4f} cents")

Input tokens: 15
Output tokens: 305
Total tokens: 320
Total cost: 0.0123 cents
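The reported cost is token counts multiplied by per-token prices. A minimal sketch, assuming gpt-5-nano prices of $0.05 per million input tokens and $0.40 per million output tokens (assumptions here; check the current pricing page), reproduces the figure above. For reasoning models such as gpt-5-nano, `completion_tokens` also counts hidden reasoning tokens, which are billed as output; that likely explains why a two-sentence answer used 305 output tokens.

```python
# Assumed prices in USD per 1M tokens (illustrative; verify against current pricing)
INPUT_USD_PER_M = 0.05
OUTPUT_USD_PER_M = 0.40


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      input_per_m: float = INPUT_USD_PER_M,
                      output_per_m: float = OUTPUT_USD_PER_M) -> float:
    """Estimate request cost in USD from token counts and per-million-token prices."""
    # completion_tokens includes any reasoning tokens, which are billed as output
    return (prompt_tokens * input_per_m + completion_tokens * output_per_m) / 1_000_000


# 15 * 0.05 + 305 * 0.40 = 122.75 USD per million tokens -> 0.00012275 USD
usd = estimate_cost_usd(15, 305)
print(f"Total cost: {usd * 100:.4f} cents")  # Total cost: 0.0123 cents
```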


### Chat Streaming in Gradio

Streaming lets the user read the reply in real time, as it is generated, instead of waiting for the complete response.

In [115]:
import gradio as gr

In [125]:
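def chat(message, history):
    """
    Debugging helper: echo back the history Gradio passes in, plus the new message.
    """
    # Keep only role/content; Gradio may attach extra keys (e.g. metadata) to each entry
    history = [{"role":h["role"], "content":h["content"]} for h in history]
    return f'{history + [{"role": "user", "content": message}]}\n'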


gr.ChatInterface(fn=chat).launch()

* Running on local URL:  http://127.0.0.1:7864
* To create a public link, set `share=True` in `launch()`.




In [116]:
system_message = "You are a helpful assistant in a clothes store. You should try to gently encourage \
the customer to try items that are on sale. Hats are 60% off, and most other items are 50% off. \
For example, if the customer says 'I'm looking to buy a hat', \
you could reply something like, 'Wonderful - we have lots of hats - including several that are part of our sales event.' \
Encourage the customer to buy hats if they are unsure what to get."

system_message += "\nIf the customer asks for shoes, you should respond that shoes are not on sale today, \
but remind the customer to look at hats!"
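The per-turn prompt assembly in the next cell (base system prompt, an extra instruction when the user mentions belts, the trimmed history, then the new message) can be factored into a pure function and tested without calling a model. A minimal sketch; `build_messages` and `BELT_NOTE` are hypothetical names, and the sample history is made up.

```python
# Hypothetical constant: the extra instruction appended when the user mentions belts
BELT_NOTE = (" The store does not sell belts; if you are asked for belts, "
             "be sure to point out other items on sale.")


def build_messages(system_message: str, history: list[dict], message: str) -> list[dict]:
    """Hypothetical helper: assemble the Chat Completions messages for one turn."""
    system = system_message + (BELT_NOTE if "belt" in message.lower() else "")
    # Keep only role/content from each history entry
    trimmed = [{"role": h["role"], "content": h["content"]} for h in history]
    return [{"role": "system", "content": system}] + trimmed + [{"role": "user", "content": message}]


demo = build_messages("You are a helpful assistant.",
                      [{"role": "user", "content": "Hi", "metadata": None}],
                      "Do you sell belts?")
print(demo[0]["content"])
```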

In [127]:
MODEL = "gpt-5-nano"

def chat(message, history):
    # Keep only role/content from Gradio's history entries
    history = [{"role":h["role"], "content":h["content"]} for h in history]
    relevant_system_message = system_message
    # Add a context-specific instruction when the user mentions belts
    if 'belt' in message.lower():
        relevant_system_message += " The store does not sell belts; if you are asked for belts, be sure to point out other items on sale."
    
    messages = [{"role": "system", "content": relevant_system_message}] + history + [{"role": "user", "content": message}]

    # stream=True returns an iterator of chunks instead of one complete response
    stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)

    # Gradio shows each yielded value as the full reply so far, so yield the accumulated text
    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        yield response
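Gradio's `ChatInterface` displays each value a generator yields as the whole reply so far, which is why `chat` yields the accumulated `response` rather than each delta. A minimal offline sketch of that loop; `fake_stream` is a made-up stand-in that only mimics the `chunk.choices[0].delta.content` shape, not the real OpenAI chunk objects.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Made-up stand-in for a streaming response: yields chunk-like objects."""
    for p in pieces:
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=p))])


def accumulate(stream):
    """Yield the full text so far after every chunk, as chat() does for Gradio."""
    response = ""
    for chunk in stream:
        # delta.content can be None (e.g. on the final chunk), hence `or ""`
        response += chunk.choices[0].delta.content or ""
        yield response


print(list(accumulate(fake_stream(["Hats ", "are ", None, "60% off!"]))))
# ['Hats ', 'Hats are ', 'Hats are ', 'Hats are 60% off!']
```

Yielding only the latest delta instead would make the chat window show just the newest fragment.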

In [129]:
gr.ChatInterface(fn=chat).launch()

* Running on local URL:  http://127.0.0.1:7865
* To create a public link, set `share=True` in `launch()`.


