<a href="https://colab.research.google.com/github/sufiyansayyed19/LLM_Learning/blob/main/W2D2_2_Gradio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Level 4: State & Memory (`gr.State`)

### The Problem: "The Amnesia Issue"
In a web app, every time you click a button, the Python function runs, finishes, and **destroys all its local variables**.

If you use a global variable (e.g., `history = []` at the top of your script), **every user** who opens your app will share that same variable. If User A chats, User B sees it. That is a security nightmare.
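
The leak is easy to reproduce without Gradio. The sketch below is plain Python, not Gradio code: `chat` is a hypothetical handler, and the two calls stand in for two different visitors being served by the same server process.

```python
# Module-level list: created once when the server process starts,
# then shared by every request that process handles.
history = []

def chat(user_name, message):
    """Hypothetical handler that stores each message in the shared global list."""
    history.append(f"{user_name}: {message}")
    # Returning a copy of the full history exposes every user's messages.
    return list(history)

seen_by_a = chat("User A", "my secret plan")
seen_by_b = chat("User B", "hello")

# User B's view includes User A's message.
print(seen_by_b)
```

Because `history` belongs to the process rather than to a visitor, nothing separates one user's data from another's.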

### The Solution: `gr.State()`
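`gr.State()` is a hidden component that holds a value **per user session**. Gradio keeps the value on the server, scoped to a single browser session: it is private to that specific user, persists between clicks, and is reset when the page is refreshed.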

### The Project: "The Never-Ending Story" 📖
We will build an app where the AI adds one sentence to a story every time you click "Next", referencing everything that happened before.

In [1]:
!pip install -q gradio litellm

In [6]:
import gradio as gr
from litellm import completion
import os
from google.colab import userdata

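# 1. Set up the API key from Colab secrets
try:
    os.environ['OPENROUTER_API_KEY'] = userdata.get('OPEN_ROUTE_API_KEY')
except Exception as err:
    # Don't fail silently: a missing key would otherwise surface later as an opaque auth error
    print(f"Could not load the OpenRouter API key from Colab secrets: {err}")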

# --- 2. THE LOGIC ---
def add_story_segment(current_story_state):
    # 'current_story_state' comes from gr.State
    # It acts like a memory bank

    # Check if empty
    if not current_story_state:
        prompt = "Start a story about a futuristic Cyberpunk city in one short sentence."
        current_story_state = []
    else:
        # Create context from previous sentences
        context = " ".join(current_story_state)
        prompt = f"Here is the story so far: '{context}'. Write the next ONE short sentence of the story."

    # Call LLM
    response = completion(
        model="openrouter/openai/gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    new_sentence = response.choices[0].message.content
    # Debug logging: state before the update
    print(f"current_story_state before adding new sentence: {current_story_state}")
    # Update State
    current_story_state.append(new_sentence)

    # Debug logging: state after the update
    print(f"current_story_state after adding new sentence: {current_story_state}")
    # Format for Display (Join with newlines)
    display_text = "\n\n".join(current_story_state)

    # Return TWO things:
    # 1. The visible text for the screen
    # 2. The hidden list for the State
    return display_text, current_story_state

# --- 3. THE UI ---
with gr.Blocks(theme="soft") as demo:
    gr.Markdown("# 📖 The Infinite Story Generator")
    gr.Markdown("Click the button to generate the next line of the story.")

    # VISIBLE OUTPUT
    story_display = gr.Textbox(label="The Story So Far", lines=10)

    # BUTTON
    btn_next = gr.Button("✍️ Write Next Sentence")

    # HIDDEN MEMORY (The Magic Part)
    # We initialize it as an empty list []
    memory_bank = gr.State([])

    # WIRING
    # Input: The hidden memory_bank
    # Outputs: The visible textbox AND the hidden memory_bank (to update it)
    btn_next.click(
        fn=add_story_segment,
        inputs=memory_bank,
        outputs=[story_display, memory_bank]
    )

In [7]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="pydantic")

demo.launch(debug=True)

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://4848ff73f9352f2c0c.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


current_story_state before adding new sentence: []
current_story_state after adding new sentence: ['Neon lights flickered through the smoggy haze of Neo-Tokyo, where the line between humanity and machinery blurred beneath the towering megacorporate spires.']
current_story_state before adding new sentence: ['Neon lights flickered through the smoggy haze of Neo-Tokyo, where the line between humanity and machinery blurred beneath the towering megacorporate spires.']
current_story_state after adding new sentence: ['Neon lights flickered through the smoggy haze of Neo-Tokyo, where the line between humanity and machinery blurred beneath the towering megacorporate spires.', 'In this concrete jungle, a rogue hacker named Kairo navigated the digital shadows, seeking the truth buried beneath layers of corporate deceit.']
current_story_state before adding new sentence: ['Neon lights flickered through the smoggy haze of Neo-Tokyo, where the line between humanity and machinery blurred beneath 



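### 🔑 Key Takeaway
Notice the `.click()` wiring: `memory_bank` appears in `inputs` and again in `outputs`:
`outputs=[story_display, memory_bank]`

The function reads the current state, appends to it, and returns it alongside the display text. We are outputting to the **Screen** and back into the **State**. This is the standard "Read-Modify-Write" loop for Gradio apps.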
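
To see the loop in isolation, here is a minimal offline sketch. It makes two assumptions for illustration: a placeholder sentence stands in for the LLM call, and the state is passed by hand, whereas in a real app Gradio supplies and stores it for each session. `add_segment` is a hypothetical name, not part of the app above.

```python
def add_segment(story_state):
    """Same shape as a gr.State handler: state in, (display text, state) out."""
    # Read: copy the incoming state so the caller's list is not mutated.
    story = list(story_state)
    # Modify: a placeholder stands in for the LLM-generated sentence.
    story.append(f"Sentence {len(story) + 1}.")
    # Write: return the visible text and the updated state together.
    return "\n\n".join(story), story

# Two independent sessions, each starting from its own empty state.
session_a, session_b = [], []

_, session_a = add_segment(session_a)          # User A clicks once
display_a, session_a = add_segment(session_a)  # User A clicks again
display_b, session_b = add_segment(session_b)  # User B clicks once

print(display_a)
print(display_b)
```

Each session only ever sees the state that was returned to it on the previous click, which is the isolation that a global variable cannot provide.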

## Level 5: Streaming & Markdown

### The Problem: "The Waiting Game"
By default, Gradio waits for the function to **finish completely** before sending *any* data to the UI. For an LLM writing a long essay, the app appears frozen until the full response arrives, which can take several seconds.

### The Solution: Generators (`yield`)
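Instead of `return`, we use Python's `yield` keyword, which turns the function into a generator. Gradio updates the output component every time the function yields, so the text appears in chunks as it is produced.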
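
The pattern can be tried offline before involving an LLM. In this sketch, `fake_token_stream` is a hypothetical stand-in for a streaming response, and `stream_reply` shows the accumulate-then-yield shape. Each yielded value replaces what the output component shows, so the generator yields the text so far rather than only the newest piece.

```python
def fake_token_stream():
    """Hypothetical stand-in for a streaming LLM response."""
    for token in ["def ", "add(a, b):", "\n    ", "return a + b"]:
        yield token

def stream_reply():
    partial = ""
    for token in fake_token_stream():
        partial += token
        # Yield the accumulated text; execution pauses here until
        # the consumer asks for the next value.
        yield partial

# Collect every intermediate value the UI would have displayed.
snapshots = list(stream_reply())
for snapshot in snapshots:
    print(repr(snapshot))
```

Yielding only `token` instead of `partial` would make the display flicker through isolated fragments instead of growing steadily.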

### The Project: "The Live Coder" 👨‍💻
We will build a bot that writes Python code. We will use **Markdown** rendering so the code looks beautiful (syntax highlighting), and **Streaming** so it looks like it's being typed live.


In [9]:
import gradio as gr
from litellm import completion
import os

# --- 1. THE LOGIC (GENERATOR) ---
def stream_code_solution(user_request):

    messages = [
        {"role": "system", "content": "You are an expert Python Coder. Respond in Markdown. Always use code blocks."},
        {"role": "user", "content": user_request}
    ]

    # A. Turn on Streaming in LiteLLM
    response_stream = completion(
        model="openrouter/openai/gpt-4o-mini",
        messages=messages,
        stream=True # <--- CRITICAL
    )

    partial_message = ""

    # B. Loop through the chunks
    for chunk in response_stream:
        content = chunk.choices[0].delta.content or ""
        partial_message += content

        # C. YIELD, DON'T RETURN
        # This pushes the accumulated text to the UI, then pauses until the next chunk arrives
        yield partial_message

# --- 2. THE UI ---
with gr.Blocks(theme="monochrome") as demo:
    gr.Markdown("# ⚡ Live Coding Assistant")

    with gr.Row():
        inp = gr.Textbox(label="What code do you need?", placeholder="e.g. Write a Snake game in Python")
        btn = gr.Button("Generate Code", variant="primary")

    # We use 'gr.Markdown' instead of 'gr.Textbox' for the output
    # This renders bold text, headers, and code snippets properly!
    out = gr.Markdown(label="Live Output")

    # WIRING
    btn.click(stream_code_solution, inp, out)

In [10]:
demo.launch(debug=True)



