## 1) Prereqs

- OS: macOS, Windows, or Linux

- Python: 3.9+

- Hardware: 8–16 GB RAM recommended. A GPU helps, but CPU is fine for small models.

## 2.1) Install Ollama

In [None]:
# macOS (Homebrew). On Windows or Linux, use the installer from ollama.com instead.
brew install ollama

## 2.2) Start the server

The server usually starts automatically. If it does not, run the following command:

In [None]:
ollama serve

This command starts the server, and the server must stay up while we work.
Keep ollama serve running in this terminal window and open a second terminal window for the next steps.
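
Before moving on, it can help to confirm from the second terminal that something is answering on the server address. The sketch below uses only the Python standard library; it assumes the server replies to a plain GET on its root URL at localhost:11434 (the address used later in this guide). The function name is hypothetical.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, and similar failures.
        return False


print("Ollama reachable:", ollama_is_running())
```

If this prints False, check that the terminal running ollama serve is still open.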

## 2.3) Pull a chat model
The pull command downloads the model: the file (or set of files) that contains everything a trained AI model needs to run.<br>
It contains:
- Weights / Parameters: Numbers that the model learned during training (the "brain" of the AI).

- Configuration / Architecture: How the model is built (layers, attention heads, hidden units, etc.).

- Vocabulary / Tokenizer info: Helps the model understand words, sentences, or tokens.


In [None]:
ollama pull llama3:8b
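
Once the pull finishes, you may want to confirm the model is available locally. The sketch below checks a parsed model listing for a given name. It assumes the listing is the JSON returned by the server's /api/tags endpoint and that it has a "models" list whose entries carry a "name" field; the sample dictionary is hypothetical and only stands in for a real response.

```python
def model_is_pulled(tags, name):
    """Return True if `name` appears in a parsed model listing."""
    return any(m.get("name") == name for m in tags.get("models", []))


# Hypothetical listing, shaped like the assumed /api/tags response.
sample_tags = {"models": [{"name": "llama3:8b"}, {"name": "mistral:latest"}]}

print(model_is_pulled(sample_tags, "llama3:8b"))
print(model_is_pulled(sample_tags, "gemma:2b"))
```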

## Popular Chat Models You Could Use
1. OpenAI GPT (e.g., GPT-4, GPT-3.5)

- Pros: Best accuracy, handles complex queries, huge ecosystem (ChatGPT, API).

- Cons: Paid (metered by tokens), requires internet (cloud-based).

2. Anthropic Claude (Claude 3)

- Pros: Strong reasoning, long context windows, safer outputs.

- Cons: Paid, cloud-based only, no local running.

3. Google Gemini (Gemini 1.5)

- Pros: Good with multimodal (text + images), very large context sizes.

- Cons: Paid, cloud-based, newer ecosystem.

4. Mistral / Mixtral

- Pros: Open-source, lightweight, can run locally (good for experiments), gives speed and efficiency on modest hardware.

- Cons: Smaller than GPT-4, less consistent on complex reasoning.

5. LLaMA (Large Language Model Meta AI, developed by Meta AI)

- Pros: Open-source, widely used, and available in multiple sizes (7B, 13B, 70B). Pick a smaller size for a modest machine, or a larger one when you need higher accuracy and can afford more compute.

- Cons: Needs proper serving (Ollama, LM Studio, Hugging Face, etc.).

6. Ollama (Framework that handles running the model for us)

- Lets us run open-source models locally.
- Handles GPU/CPU optimization, easy API (localhost:11434).
- Perfect for quick prototyping without paying API fees.
- Easy setup: just run ollama serve and the API is ready.
- Privacy: IT queries stay on your laptop and are not sent to the cloud.
- Supports many models: you can swap in llama2, mistral, gemma, etc.
- Great for our requirement: since we want to test 5–10 IT support scenarios, a lightweight local model is more than enough. We use the LLaMA 3 model (from Meta).
- Our version is 8B (8 billion parameters → a "small/medium" size that runs on many modern laptops).
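
To see why an 8B model fits the 8–16 GB RAM recommendation, a rough estimate of the weight size helps: parameters times bits per weight, divided by 8 to get bytes. The bit widths below are illustrative assumptions (this guide does not state which precision the pulled model uses), and the estimate ignores the context cache and runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit weights: about {weight_memory_gb(8e9, bits):.0f} GB")
```

Lower-precision weights are what make a model of this size practical on a laptop.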

## 2.4) Quick terminal test
This test is optional, because our goal is to use the model through a program or browser, like the upcoming Python chatbot. <br>
Press Ctrl+D to exit the chat.

In [None]:
ollama run llama3:8b

## 3) Create a Helpdesk "Persona" (Modelfile)
Save the code in 3.1 as Helpdesk.Modelfile <br>
With 'pull' we downloaded the base model from Ollama.

It contains the pretrained AI brain: it can answer general questions, do reasoning, etc.

But it doesn't know our specific IT helpdesk context (like company policies, email systems, or common IT issues), so on its own it isn't enough.

A Modelfile adds that specialization. This is not fine-tuning (the weights are not retrained); it wraps the base model with a fixed system prompt and parameters, giving us a custom version of LLaMA set up for IT support.

## 3.1) Modelfile

In [None]:
# Start from the base model
FROM llama3:8b
SYSTEM """
You are an IT Helpdesk assistant. Write clear, step-by-step instructions.
Prioritize safety and least-privilege actions. Ask 1 clarifying question if needed,
but provide an initial checklist first. Keep answers short and actionable.
If the issue is risky or needs admin rights, warn the user.
"""
# The block above is the system prompt: it stays fixed, defines the role, and acts as the rules.
# The users' actual questions are user prompts, which change every time. Together they keep the chatbot in its role.
PARAMETER temperature 0.3
# temperature 0.3 controls how creative vs. focused the chatbot is.
# Low temperature (0.1–0.3): the bot is serious, consistent, and less random. Good for IT Helpdesk (clear, step-by-step answers).
# High temperature (0.7–1.0): the bot is more creative, varied, and chatty. Good for marketing, brainstorming, or writing stories.
# Temperature = creativity knob. Lower = predictable, to the point. Higher = more variety, less predictable.
PARAMETER num_ctx 4096
# num_ctx 4096 = the model's working memory in tokens, about 3,000 words.
# It lets our chatbot remember the current conversation and instructions without losing track.
# If the chat goes beyond this limit, the oldest messages get dropped and the model may "forget" them.
# Bigger context = longer conversations or bigger documents. Smaller context = less memory, but less RAM/VRAM use and faster runs.
# With num_ctx 8192 or 16384 (if the model supports a window that large), you could feed in manuals, FAQs, or logs for richer answers.
# Rough token examples: "Restart" = 1 token. "the" = 1 token. "com" + "puter" = 2 tokens. "." = 1 token.
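
To make the temperature setting concrete, here is a minimal sketch of the usual idea behind it: the model's raw scores for candidate tokens are divided by the temperature before being turned into probabilities. This illustrates the general technique, not Ollama's internal code, and the scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.3, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature {t}:", [round(p, 3) for p in probs])
```

At 0.3 almost all the probability lands on the top-scoring token, which is why low temperatures give consistent, checklist-style answers.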

## 3.2) Create the model

In [None]:
ollama create helpdesk -f Helpdesk.Modelfile
# This command creates a new Ollama model called helpdesk using the instructions and configuration defined in Helpdesk.Modelfile. After this, you can run this helpdesk model locally to answer IT support questions or simulate a helpdesk agent.

## 3.3) Sanity check

In [None]:
ollama run helpdesk
# This command confirms the model runs before we move on to more detailed testing or deployment. The chatbot opens in the terminal window, ready for input.
# We don't want to run it in the terminal, though, so we create a basic chat client in the next step.

## 4) Basic Python Chat Client (through local REST API)
This calls http://localhost:11434/api/chat and streams the reply.<br>
Ollama can turn your model into a service on your own computer that you can talk to using normal web requests (like how websites work). Think of it as setting up a mini website for your AI model, but it's private and only you can use it on your computer.<br>
Save this file as chat_client.py

In [None]:
# file: chat_client.py

# 1. Imports: requests lets Python talk to websites and web-based APIs like Ollama.
#    It simplifies sending and receiving HTTP requests (JSON, headers, status codes, etc.).
#    json parses each streamed chunk.
import json

import requests


# 2. This function chats with Ollama and returns the reply text.
def chat_with_ollama(messages, model="helpdesk", stream=True):
    # messages: the conversation so far (what we want to say to the AI)
    # model: which AI "brain" to use (in our case, helpdesk)
    # stream: True = live, chunk by chunk; False = all at once

    # 3. Set the server address and the data to send.
    url = "http://localhost:11434/api/chat"  # at this URL, the AI is listening on our computer
    payload = {  # a Python dictionary: a box holding everything the AI needs
        "model": model,
        "messages": messages,
        "stream": stream,
    }

    # 4. Send the request, i.e. our box of information, to the AI.
    with requests.post(url, json=payload, stream=stream) as r:
        r.raise_for_status()  # raises an exception on an HTTP error status

        # 5. Streaming response: small pieces arrive one by one, like a real conversation.
        if stream:
            full = ""
            for line in r.iter_lines():
                if not line:
                    continue
                # Each line is a JSON chunk like:
                # {"message":{"role":"assistant","content":"..."},"done":false}
                try:
                    obj = json.loads(line.decode("utf-8"))
                except json.JSONDecodeError:
                    continue  # skip a malformed line instead of stopping the reply
                chunk = obj.get("message", {}).get("content", "")
                print(chunk, end="", flush=True)
                full += chunk
                if obj.get("done"):
                    break
            print()
            return full

        # 6. Non-streaming response: the whole reply arrives as one JSON object.
        data = r.json()
        content = data["message"]["content"]
        print(content)
        return content


# 7. Main block. The system message tells the AI how to behave and the user message
#    is what the user wants help with. The AI replies according to both.
if __name__ == "__main__":
    msgs = [
        {"role": "system", "content": "Follow the helpdesk rules."},
        {"role": "user", "content": "My email isn't sending. Help!"},
    ]
    chat_with_ollama(msgs)
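
To see what the streaming loop does without a running server, the sketch below applies the same parsing to a few hand-written lines. The sample chunks are hypothetical and follow the shape shown in the client's comments; collect_stream is a name introduced here for illustration.

```python
import json


def collect_stream(lines):
    """Join the "content" pieces from an iterable of JSON byte lines."""
    full = ""
    for line in lines:
        if not line:
            continue  # the stream can contain empty keep-alive lines
        obj = json.loads(line.decode("utf-8"))
        full += obj.get("message", {}).get("content", "")
        if obj.get("done"):
            break
    return full


# Hypothetical chunks standing in for a real streamed reply.
sample_lines = [
    b'{"message":{"role":"assistant","content":"Check "},"done":false}',
    b"",
    b'{"message":{"role":"assistant","content":"your outbox."},"done":false}',
    b'{"message":{"role":"assistant","content":""},"done":true}',
]

print(collect_stream(sample_lines))
```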


## 5.1) Simple Web UI (Using Streamlit)
Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science projects. <br>
Save this file as app.py

In [None]:
# file: app.py
# 1. Imports
import streamlit as st #streamlit makes web app UI
import requests # requests sends messages to your local Ollama REST API.
import json #json handles the JSON data coming back from the model.

# 2. Page Setup
st.set_page_config(page_title="IT Helpdesk Chatbot for IT 7133", page_icon="💬") # Sets the browser tab's title and emoji icon.
st.title("💬 IT Helpdesk (Ollama)") # Shows the heading at the top of the page.

# 3. Sidebar Controls
model = st.sidebar.text_input("Model", value="helpdesk") #Lets you choose which model to use (default: helpdesk).
temperature = st.sidebar.slider("Temperature", 0.0, 1.0, 0.3, 0.1) #Lets you adjust the temperature (0.0 to 1.0, default 0.3, step 0.1).

# 4. Chat history setup
# Keeps track of the whole conversation (system, user, assistant).
# Starts with a system role = "You are an IT Helpdesk assistant."
if "history" not in st.session_state:
    st.session_state.history = [{"role":"system","content":"You are an IT Helpdesk assistant."}]

# 5. Display past messages
# Loops through old messages and shows them in the chat UI.
for m in st.session_state.history:
    if m["role"] != "system":
        with st.chat_message(m["role"]):
            st.markdown(m["content"])

# 6. Handle new user input
# lets user type a new message and add it to the conversation history
user_msg = st.chat_input("Describe your IT issue...")
if user_msg:
    st.session_state.history.append({"role":"user","content":user_msg})
    with st.chat_message("user"):
        st.markdown(user_msg)

# 7. Send request to Ollama
# Sends the full chat history to the local Ollama API.
# Includes chosen model + temperature.
# Uses streaming, so responses arrive chunk by chunk.
    with st.chat_message("assistant"):
        url = "http://localhost:11434/api/chat"
        payload = {
            "model": model,
            "messages": st.session_state.history,
            "options": {"temperature": temperature},
            "stream": True
        }
        resp = requests.post(url, json=payload, stream=True)

# 8. Stream and display model's reply
# As the model replies, each chunk is displayed in real-time.
# full collects the entire response.

        full = ""
        spot = st.empty()
        for line in resp.iter_lines():
            if not line:
                continue
            try:
                obj = json.loads(line.decode("utf-8"))
                chunk = obj.get("message", {}).get("content", "")
                full += chunk
                spot.markdown(full)
                if obj.get("done"):
                    break
            except Exception:
                pass

# 9. Save assistant's reply to history
# Adds the model's final reply to the conversation history.
# Ensures the chatbot "remembers" for the next turn.
        st.session_state.history.append({"role":"assistant","content":full})

# In short, this code makes a web-based IT Helpdesk chatbot:
# You open it in your browser.
# Type your IT issue.
# The message goes to your local Ollama model (helpdesk) via REST API.
# The model streams its reply back in real-time.
# Streamlit shows the conversation history like a chat app.
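
Because the app sends the full history on every turn, a long chat can eventually exceed the num_ctx 4096 window set in the Modelfile. One possible approach, sketched below, is to drop the oldest non-system messages before sending. The token estimate is deliberately crude (about 4 tokens per 3 words, matching the "4096 tokens is about 3,000 words" rule of thumb above), and both function names are hypothetical.

```python
def estimate_tokens(text):
    """Very rough token estimate: about 4 tokens for every 3 words."""
    return max(1, round(len(text.split()) * 4 / 3))


def trim_history(history, budget):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Always keep the system prompt and at least the newest message.
    while len(rest) > 1 and total(system + rest) > budget:
        rest.pop(0)  # forget the oldest turn first
    return system + rest
```

In app.py this could be applied to st.session_state.history when building the payload, with a budget somewhat below 4096 to leave room for the reply.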

## 5.2) Run the file

In [None]:
pip install streamlit requests
streamlit run app.py

The file will run and the chatbot will open in the browser. <br>
Closing the browser tab doesn't stop it; the server keeps running in the background. Press Ctrl+C in the terminal to stop the server.

## Safety Tips

- Temperature: start at 0.2–0.4 for consistent, checklist-style answers.

- Refuse risky steps: bake warnings into the System prompt (already done).

- Timebox outputs: "Keep it under 10 steps unless critical."

- Ask 1 clarifying question only (prevents endless loops).
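
Some of these tips can also be enforced in the request itself rather than only in the prompt. The sketch below builds a request body like the one in app.py, with a low temperature and a cap on reply length. The num_predict option name is an assumption about the server's generation options, and the value 400 is an arbitrary illustration; check both against your Ollama version.

```python
def build_payload(messages, model="helpdesk", temperature=0.3, max_tokens=400):
    """Build a chat request body with conservative generation options."""
    return {
        "model": model,
        "messages": messages,
        "stream": True,
        "options": {
            "temperature": temperature,
            "num_predict": max_tokens,  # assumed option name for the output-token cap
        },
    }


payload = build_payload([{"role": "user", "content": "My VPN keeps dropping."}])
print(payload["options"])
```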

## Make It Feel "IT-Desk Real"

**Triage first: network? account? device? scope?** <br>
Triage = figure out what type of problem the user has before giving solutions.<br>
Ask yourself:
- Is it a network problem? (Wi-Fi, VPN)
- Is it an account problem? (login, password)
- Is it a device problem? (laptop, phone, printer)
- What is the scope? (affects one user, a department, or the whole company)
- This ensures the chatbot responds appropriately instead of giving generic answers.
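
The triage questions above could be approximated in code before the message even reaches the model. This is a toy keyword matcher, not a real classifier; the keyword lists are hypothetical and would need tuning for your environment.

```python
# Hypothetical keyword lists, based on the examples in the triage questions.
TRIAGE_KEYWORDS = {
    "network": ("wi-fi", "wifi", "vpn", "internet"),
    "account": ("login", "password", "locked out"),
    "device": ("laptop", "phone", "printer"),
}


def triage(issue):
    """Return the first category whose keywords appear in the issue text."""
    text = issue.lower()
    for category, keywords in TRIAGE_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "unknown"


print(triage("I forgot my password"))
print(triage("The VPN keeps dropping"))
```

The result could be added to the system message (for example, "Likely category: account") so the model starts from the right checklist.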

**Least privilege: user steps first; admin steps only if necessary**

- Always start with actions a normal user can do.
- Only suggest admin-level actions (like changing system settings or server configs) if absolutely needed.
- Helps prevent mistakes and keeps advice safe and practical.

**Escalation hooks: "If step X fails, open a ticket with code Y and attach logs Z."**

- A real IT helpdesk has escalation procedures, and the chatbot should guide the user through them.
- If a fix fails, don't panic; follow a predefined escalation process.
- Include ticket codes, logs, and screenshots for the next support tier.
- This makes the bot behave like a real IT team member.
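
One way to keep escalation wording consistent is to generate it from a small helper instead of relying on the model to phrase it. The sketch below is only an illustration; the ticket code and file names are hypothetical placeholders.

```python
def escalation_note(failed_step, ticket_code, attachments):
    """Format a one-line escalation instruction for the user."""
    files = ", ".join(attachments) if attachments else "no attachments"
    return (
        f"If step {failed_step} fails, open a ticket with code {ticket_code} "
        f"and attach: {files}."
    )


print(escalation_note(3, "NET-VPN", ["vpn.log", "screenshot.png"]))
```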

**Environment variables: point the bot to your help articles later (see RAG below)**

- You can give the chatbot links or references to your internal knowledge base (like help articles).
- Environment variables or configuration settings let the bot dynamically fetch relevant info instead of hardcoding everything.
- RAG (Retrieval-Augmented Generation): a method where the model retrieves relevant documents and uses them to answer questions more accurately.
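
A minimal sketch of the idea follows. It reads the knowledge-base folder from an environment variable and picks the article that shares the most words with the question. The variable name HELPDESK_KB_DIR, the ./kb default, and the .txt layout are all assumptions, and word overlap is a stand-in for a real retriever (embeddings or a search index).

```python
import os
from pathlib import Path

# Hypothetical variable name; falls back to a local folder when unset.
KB_DIR = os.environ.get("HELPDESK_KB_DIR", "./kb")


def load_articles(kb_dir):
    """Read every .txt file in kb_dir into a {title: text} dictionary."""
    folder = Path(kb_dir)
    if not folder.is_dir():
        return {}
    return {p.stem: p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.txt"))}


def retrieve(question, articles, top_k=1):
    """Rank articles by how many question words they share (a toy retriever)."""
    words = set(question.lower().split())
    scored = sorted(
        articles.items(),
        key=lambda item: len(words & set(item[1].lower().split())),
        reverse=True,
    )
    return [title for title, _ in scored[:top_k]]


def build_prompt(question, articles):
    """Prepend the best-matching article to the user's question."""
    title = retrieve(question, articles)[0]
    return f"Reference ({title}): {articles[title]}\n\nQuestion: {question}"


# Hypothetical in-memory articles, used here instead of load_articles(KB_DIR).
demo_articles = {
    "vpn": "restart the vpn client and reconnect",
    "email": "check the outbox and the smtp settings",
}
print(build_prompt("my vpn will not reconnect", demo_articles))
```

The resulting text would be sent as the user message, so the model answers from your own articles rather than from general knowledge alone.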

## Nice Upgrades

- RAG (knowledge base): add your FAQs/KB PDFs and do retrieval → pass summaries into the prompt.

- Ticketing integration: push a summary to Jira/ServiceNow via webhooks.

- Logging: store conversations + scores from eval.py to a CSV for tracking improvements.

- Custom models: create multiple Modelfiles (e.g., helpdesk_windows, helpdesk_mac).
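
For the custom-models idea, the Modelfile text can be generated rather than copied by hand. The sketch below reuses the same directives as Helpdesk.Modelfile (FROM, SYSTEM, PARAMETER); the persona prompts are hypothetical examples.

```python
def render_modelfile(system_prompt, base="llama3:8b", temperature=0.3, num_ctx=4096):
    """Return Modelfile text using the same directives as Helpdesk.Modelfile."""
    return (
        f"FROM {base}\n"
        f'SYSTEM """\n{system_prompt.strip()}\n"""\n'
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
    )


# Hypothetical persona prompts; adjust them to your environment.
personas = {
    "helpdesk_windows": "You are an IT Helpdesk assistant for Windows users.",
    "helpdesk_mac": "You are an IT Helpdesk assistant for macOS users.",
}

for name, prompt in personas.items():
    print(f"--- {name}.Modelfile ---")
    print(render_modelfile(prompt))
```

Each output can be saved under the file name shown and registered the same way as in step 3.2, for example ollama create helpdesk_windows -f helpdesk_windows.Modelfile.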