<a href="https://colab.research.google.com/github/soltaniehha/Big-Data-Analytics-for-Business/blob/master/13-K8s/Run_an_LLM_Locally.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Leveraging GenAI Workflows

This notebook showcases a simple workflow for running an open-weight Large Language Model (LLM), here Llama, locally to analyze flight data and generate insights. It culminates in a basic interactive application built with Gradio.

First, we install the `transformers` and `gradio` packages (`-q` keeps pip's output quiet and `-U` upgrades them if they are already installed):

In [1]:
!pip install -qU transformers gradio

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.0/44.0 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.0/12.0 MB[0m [31m147.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m21.6/21.6 MB[0m [31m123.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m55.4/55.4 kB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
[?25h

### Logging Into Hugging Face in Colab (Using Secrets)

To download and load Llama in a notebook environment, you must first authenticate with your Hugging Face account.

1. Create a Hugging Face token at: https://huggingface.co/settings/tokens  
   – Give it a name and copy the `hf_...` token.

2. In Colab, open the left sidebar → **Secrets** → **Add secret**.  
   – Name it `HF_TOKEN` and paste the token.

3. If the model is gated, open its Hugging Face page (for this example, https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), **expand the license agreement**, fill in the required contact information at the bottom, and click **“I Agree.”** You can load the model once your access request has been granted.

4. In your notebook, read the secret with `userdata.get("HF_TOKEN")` and log in.
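The cell below reads the token from Colab's secret store. If you later run the same code outside Colab, a common fallback is to read the token from an `HF_TOKEN` environment variable instead (`huggingface_hub` also picks up that variable on its own). A minimal sketch, where the helper name `resolve_hf_token` is hypothetical; it only looks the token up and leaves the `login` call to you:

```python
import os
from typing import Optional


def resolve_hf_token(secret_name: str = "HF_TOKEN") -> Optional[str]:
    """Return a Hugging Face token from Colab secrets, else from the environment.

    Hypothetical helper: tries google.colab.userdata first (importable only
    inside Colab), then falls back to an environment variable of the same name.
    """
    try:
        from google.colab import userdata  # only available inside Colab
        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        # Not running in Colab, the secret is missing, or the notebook
        # has not been granted access to it.
        pass
    return os.environ.get(secret_name)
```

In the notebook you would then call `login(token=resolve_hf_token())`.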


In [2]:
from huggingface_hub import login
from google.colab import userdata

# Read the token stored as a Colab secret and authenticate this session with Hugging Face
login(token=userdata.get('HF_TOKEN'))

The next cell downloads Llama 3.2 1B Instruct, a lightweight, instruction-tuned version of Llama, together with its tokenizer. This is the model we will use for our simple data analysis task.

In [3]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

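As a rough check on download size and RAM, the memory needed to hold a model's weights is about the parameter count times the bytes per parameter. Assuming roughly 1.24 billion parameters for Llama 3.2 1B (an approximate figure), 16-bit weights come to about 2.5 GB, consistent with the ~2.47 GB `model.safetensors` file for this model, while 32-bit floats need twice that. Activations, the KV cache, and framework overhead come on top. A small sketch of the arithmetic:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 1.24e9  # approximate parameter count for Llama 3.2 1B (assumption)

# Bytes per parameter for a few common numeric formats
for dtype, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    print(f"{dtype:>8}: ~{weight_memory_gb(N_PARAMS, nbytes):.2f} GB")
```

This prints roughly 4.96, 2.48, and 1.24 GB, which is why even a "small" LLM needs a few gigabytes of RAM or GPU memory.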

In [4]:
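messages = [{"role": "user", "content": "Who are you?"}]

# Format the conversation with Llama's chat template and tokenize it into PyTorch tensors
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers next
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))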

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."<|eot_id|>
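The printed reply ends with `<|eot_id|>`, Llama 3's end-of-turn token, because special tokens are decoded as text by default; passing `skip_special_tokens=True` to `tokenizer.decode` drops them. The slice-and-stop logic behind a reply, sketched in plain Python on hypothetical token IDs, also shows how to tell whether a reply ended naturally or was cut off by `max_new_tokens`:

```python
from typing import List, Sequence, Tuple


def extract_reply(
    output_ids: Sequence[int], prompt_len: int, stop_ids: Sequence[int]
) -> Tuple[List[int], bool]:
    """Split one generated sequence into the reply tokens and a 'finished' flag.

    output_ids: prompt tokens followed by newly generated tokens.
    prompt_len: number of prompt tokens at the front of output_ids.
    stop_ids: end-of-turn / end-of-text token IDs.
    Returns (reply_ids, finished): finished is True if a stop token was
    generated, False if generation ran out of tokens first.
    """
    new_ids = list(output_ids[prompt_len:])  # drop the echoed prompt
    for i, tok in enumerate(new_ids):
        if tok in stop_ids:
            return new_ids[:i], True  # reply ends just before the stop token
    return new_ids, False  # no stop token: likely truncated by the token limit


# Hypothetical IDs: 3 prompt tokens, a 2-token reply, then a stop token (99)
reply, finished = extract_reply([1, 2, 3, 10, 11, 99], prompt_len=3, stop_ids=[99])
print(reply, finished)  # [10, 11] True
```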


Now that the model is loaded, we prepare the data we want to analyze: a year of aggregated flight data by destination and origin country, from a previous lecture.

In [6]:
import pandas as pd

path = '/content/2015-summary.csv'  # file uploaded to the notebook instance; adjust if your path differs
summary = pd.read_csv(path)
# Sort routes from most to least frequent and keep the five busiest as a small sample
summary = summary.sort_values("count", ascending=False)
test = summary.head()

In [7]:
test

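|     | DEST_COUNTRY_NAME | ORIGIN_COUNTRY_NAME | count  |
|-----|-------------------|---------------------|--------|
| 81  | United States     | United States       | 370002 |
| 214 | United States     | Canada              | 8483   |
| 115 | Canada            | United States       | 8399   |
| 139 | United States     | Mexico              | 7187   |
| 43  | Mexico            | United States       | 7140   |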


Above, we load the aggregated data, sort the routes by their `count` column in descending order, and keep the five busiest in `test`. Next, we pass this table to the loaded model and ask it to generate a brief report.
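Note that the busiest route in `test` is domestic (United States to United States), even though the prompt in the next cell describes the table as international routes and leaves it to the model to pick those out. One option is to filter out domestic rows in pandas first, so the model only sees the rows you mean. A minimal sketch, where the helper name `top_international_routes` is hypothetical and the column names are the ones shown in the table above:

```python
import pandas as pd


def top_international_routes(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n busiest routes whose origin and destination countries differ."""
    international = df[df["DEST_COUNTRY_NAME"] != df["ORIGIN_COUNTRY_NAME"]]
    return international.nlargest(n, "count")  # sorted by count, descending


# Usage with the five rows shown above (the full file has many more routes)
sample = pd.DataFrame(
    {
        "DEST_COUNTRY_NAME": ["United States", "United States", "Canada", "United States", "Mexico"],
        "ORIGIN_COUNTRY_NAME": ["United States", "Canada", "United States", "Mexico", "United States"],
        "count": [370002, 8483, 8399, 7187, 7140],
    }
)
print(top_international_routes(sample, n=3).to_string(index=False))
```

In the notebook, `test = top_international_routes(summary)` would replace `summary.head()`.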

In [8]:
messages = [
    {"role": "user", "content": f"""
    The following includes some of the most common international flight routes within our airline. Please only list the international routes and briefly suggest what this means for our team's short term operations: {test}"""}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=250)
# Compute the prompt length outside the f-string: nested double quotes inside an f-string need Python 3.12+
prompt_len = inputs["input_ids"].shape[-1]
print(f"\n{tokenizer.decode(outputs[0][prompt_len:])}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Based on the provided international flight routes, here are some potential implications for your team's short-term operations:

1. **Increased demand for US flights**: The presence of 370,000+ passengers from the US to Canada (count: 8483) and 7187 passengers from the US to Mexico (count: 139) indicates a higher demand for US flights. This may require your team to increase the frequency of US flights, consider adding new destinations, or adjusting the schedule of existing US flights.

2. **Need for more Canada flights**: The presence of 8399 passengers from Canada to the US (count: 115) suggests that there is a demand for Canada flights. Your team may need to increase the frequency of Canada flights, consider adding new destinations, or adjust the schedule of existing Canada flights.

3. **Mexico market opportunities**: The presence of 7140 passengers from Mexico to the US (count: 43) indicates a potential market opportunity for Mexico. Your team may need to explore new routes or freq

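The model produces a readable, well-structured report from our prompt, but a closer look shows that its details are unreliable. It attributes the 370,002 count of the domestic United States route to US-Canada traffic, reverses the direction of routes (for example, describing the row with origin Canada and destination United States as "from the US to Canada"), and quotes the DataFrame's row index (139, 115, 43) as if it were a count. The response is also cut off mid-sentence, most likely because generation stopped at the `max_new_tokens=250` limit. Output from a small model like this one needs careful prompting and a human check before it informs decisions. Beyond accuracy, running analyses by re-executing notebook cells is not very scalable; to make the model easier to use, we need to house it in something different, starting with a simple interactive Gradio application.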
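One likely source of the index confusion is that interpolating `{test}` into the prompt embeds the DataFrame's default text representation, which includes the row index as an unlabeled first column. Formatting the table with `to_string(index=False)`, as the Gradio cell below already does, removes that column, and stating what each column means may help with direction. A minimal sketch, where the prompt builder `build_report_prompt` and its wording are hypothetical:

```python
import pandas as pd


def build_report_prompt(df: pd.DataFrame) -> str:
    """Embed a route table in a prompt without the DataFrame's row index."""
    table = df.to_string(index=False)  # drop the index so it cannot be mistaken for data
    return (
        "The following table lists some of the most common flight routes for our airline.\n"
        "Each row gives the destination country, the origin country, and the count.\n\n"
        f"{table}\n\n"
        "Please list only the international routes and briefly suggest what this "
        "means for our team's short-term operations."
    )


routes = pd.DataFrame(
    {"DEST_COUNTRY_NAME": ["United States"], "ORIGIN_COUNTRY_NAME": ["Canada"], "count": [8483]},
    index=[214],  # a non-default index, like the one left behind by sort_values
)
print(build_report_prompt(routes))
```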

### Preparing the interactive Gradio application

In [11]:
import gradio as gr

# Global chat history: list of {"role": "...", "content": "..."}
chat_history = []

def predict_flight_insights(user_question):
    # Access global variables from previous cells
    global summary, tokenizer, model, chat_history

    # Construct the prompt for the current user turn
    full_prompt = f"""Given the following flight data:
{summary.to_string(index=False)}

User Query: {user_question}"""

    # Build messages: previous turns + current user message
    messages = chat_history + [
        {"role": "user", "content": full_prompt}
    ]

    # Tokenize with chat template
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    # Generate response from the model
    outputs = model.generate(**inputs, max_new_tokens=500)
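    # skip_special_tokens=True drops markers such as <|eot_id|> so they are not stored in chat_history as text
    response_text = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    ).strip()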

    # Update chat history with this turn
    chat_history.append({"role": "user", "content": full_prompt})
    chat_history.append({"role": "assistant", "content": response_text})

    return response_text

# Create the Gradio interface
iface = gr.Interface(
    fn=predict_flight_insights,
    inputs=gr.Textbox(lines=5, label="Enter your question about the flight data:"),
    outputs=gr.Textbox(lines=10, label="Model Response:"),
    title="Flight Data Insight Generator",
    description="Ask questions about the provided flight routes and get insights from the Llama model."
)

# Launch the app
iface.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://0d5fa4454a07c1c76f.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
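Two design points in `predict_flight_insights` are worth noting. Each turn stores `full_prompt`, including the entire `summary` table, in `chat_history`, so the table is repeated in the context once per question and prompts grow quickly. And because `chat_history` is a module-level list, everyone using the public share link writes into the same conversation. A minimal sketch of an alternative message layout, assuming the table is sent once as a system message and only the user's questions and the model's answers are kept as history (the helper name `build_messages` is hypothetical):

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(data_text: str, history: List[Message], question: str) -> List[Message]:
    """Assemble chat messages with the data table sent once, as a system message.

    history holds only prior {"role": "user"/"assistant", "content": ...} turns,
    with the raw user questions rather than the data-laden prompt.
    """
    system = {
        "role": "system",
        "content": "You answer questions about this flight data:\n" + data_text,
    }
    return [system, *history, {"role": "user", "content": question}]


# Usage with a tiny stand-in table and one earlier exchange
table = "DEST_COUNTRY_NAME ORIGIN_COUNTRY_NAME count\nUnited States Canada 8483"
history = [
    {"role": "user", "content": "Which route is busiest?"},
    {"role": "assistant", "content": "(previous model answer)"},
]
messages = build_messages(table, history, "How does that compare to Mexico?")
# After generating a reply, append only the raw question and the answer to history.
```

For per-user history, Gradio's `gr.ChatInterface` passes each session's chat history into your function, which is one way to replace the global list.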


