# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: LLM Inference</span>
    
## 🗒️ This notebook is divided into the following sections:
1. Connect to the Hopsworks AI Lakehouse.
2. Retrieve the feature view and predictor.
3. Load the LLM.
4. Configure LangChain and the context manager.
5. Ask questions.

In [1]:
!pip install -r requirements.txt --quiet

In [2]:
import joblib

from functions.llm_chain import (
    load_model, 
    get_llm_chain, 
    generate_response,
)

## Connect to Hopsworks

In [3]:
# Log in to Hopsworks and get the project's feature store

import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

2024-09-19 13:54:24,439 INFO: Python Engine initialized.

Logged in to project, explore it here https://hopsworks0.logicalclocks.com/p/119


## Get Departures Feature View and Predictor

In [4]:
# Retrieve the 'departures_agg' feature view
feature_view = fs.get_feature_view(
    name='departures_agg',
    version=1,
)

# Initialize batch scoring against training dataset version 1
feature_view.init_batch_scoring(training_dataset_version=1)

In [5]:
# Retrieve model serving
ms = project.get_model_serving()

# Retrieve the late-departure predictor deployment
model_deployment = ms.get_deployment("latedeparturemodel")

## ⬇️ LLM Loading

In [6]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()




Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


2024-09-19 13:59:09,864 INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]




In [7]:
!nvidia-smi

Thu Sep 19 13:59:22 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla V100S-PCIE-32GB          On  |   00000000:00:05.0 Off |                    0 |
| N/A   44C    P0             41W /  250W |   28964MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
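
The loader log above mentions a 90% memory budget on device 0, and the `nvidia-smi` reading can be compared against it with a little arithmetic. The sketch below only reuses the two figures printed in the table. It assumes nothing else is running on the GPU, so treat it as a rough sanity check rather than an exact accounting of the model footprint.

```python
# Figures copied from the nvidia-smi table above (Memory-Usage column).
used_mib = 28964
total_mib = 32768

# Share of device memory in use and what is left for activations/buffers.
used_fraction = used_mib / total_mib
free_mib = total_mib - used_mib

print(f"Used: {used_fraction:.1%} of device memory")
print(f"Free: {free_mib} MiB")
```

The used share stays just under the 90% budget reported by the loader, leaving a few GiB of headroom for generation.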
                                                

## ⛓️ LangChain

In [8]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    tokenizer,
    model_llm,
)

## 🧬 Model Inference


In [9]:
QUESTION = "Hi! Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Thursday, 2024-09-19
📖 

Hello! I am a route planner assistant that can help you with analyzing historical public transport departures in Stockholm. I can provide information about the frequency and number of late departures to help you plan your journey better.


In [10]:
QUESTION = "Are there expected late departures today?"

response = generate_response(
    QUESTION,
    feature_view,
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.07s) 
Scheduled time: 2024-09-19
Departure key: 9295-2024-09-19T04:34:57+02:00
🗓️ Today's date: Thursday, 2024-09-19
📖 {'lateness_probability': 0.00020051002502441406, 'scheduled_time': '2024-09-19'}

Yes, there is a possibility of late departures today. However, the probability is quite low at 0.02005%. It is still recommended to leave a little earlier to account for any unforeseen delays.
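
The percentage quoted in the answer can be checked against the `lateness_probability` value shown in the context line. The helper below, `as_percent`, is a hypothetical name introduced only for this check and is not part of the notebook's `functions` package.

```python
def as_percent(probability: float, digits: int = 5) -> str:
    """Format a probability in [0, 1] as a percentage string."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return f"{probability * 100:.{digits}f}%"


# Value copied from the context dictionary printed above.
lateness_probability = 0.00020051002502441406
print(as_percent(lateness_probability))  # 0.02005%
```

The result matches the figure in the model's answer, so the LLM converted the raw probability correctly here.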


In [None]:
QUESTION = "Were there expected late departures yesterday?"

response = generate_response(
    QUESTION,
    feature_view,
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

In [11]:
QUESTION = "How many late departures are from 2024-09-10 till 2024-09-19?"

response = generate_response(
    QUESTION,
    feature_view,
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.69s) 
🗓️ Today's date: Thursday, 2024-09-19
📖 Departures information between 2024-09-10 and 2024-09-19:
Date: 2024-09-18 02:31:00+00:00; Expected: 2024-09-18 02:31:00+00:00; Number of issues: 0.0; Number of late departures: 0.0; Number of deviations: 0.0; Deviations severity: 0.0;
Date: 2024-09-18 02:53:30+00:00; Expected: 2024-09-18 02:53:43+00:00; Number of issues: 0.0; Number of late departures: 0.0; Number of deviations: 0.0; Deviations severity: 0.0;
Date: 2024-09-18 09:04:00+00:00; Expected: 2024-09-18 09:05:16+00:00; Number of issues: 0.0; Number of late departures: 0.0; Number of deviations: 0.0; Deviations severity: 0.0;
Date: 2024-09-18 09:06:00+00:00; Expected: 2024-09-18 09:09:52+00:00; Number of issues: 0.0; Number of late departures: 0.0; Number of deviations: 0.0; Deviations severity: 0.0;
Date: 2024-09-18 21:53:00+00:00; Expected: 2024-09-18 21:54:21+00:00; Number of issues: 0.0; Number of l
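
Because the context is plain text, an answer to a counting question like this one can be cross-checked by summing the relevant field directly. The sketch below is one possible way to do that, assuming the `Number of late departures: <value>;` layout seen in the rows above. The function name `total_late_departures` is hypothetical, and only two of the printed rows are reused as sample input.

```python
import re

# Two context rows copied verbatim from the output above.
context = """\
Date: 2024-09-18 02:31:00+00:00; Expected: 2024-09-18 02:31:00+00:00; Number of issues: 0.0; Number of late departures: 0.0; Number of deviations: 0.0; Deviations severity: 0.0;
Date: 2024-09-18 09:06:00+00:00; Expected: 2024-09-18 09:09:52+00:00; Number of issues: 0.0; Number of late departures: 0.0; Number of deviations: 0.0; Deviations severity: 0.0;
"""

# Capture the numeric value of the late-departures field on each row.
LATE_PATTERN = re.compile(r"Number of late departures: ([0-9.]+);")


def total_late_departures(text: str) -> float:
    """Sum the late-departure counts found in a context string."""
    return sum((float(value) for value in LATE_PATTERN.findall(text)), 0.0)


print(total_late_departures(context))  # 0.0
```

Rows that are cut off before the field's closing semicolon simply do not match the pattern, so a truncated context yields a partial sum rather than an error.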

In [12]:
QUESTION = "How many departure issues occurred last month?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.58s) 
🗓️ Today's date: Thursday, 2024-09-19
📖 Not information found about Departures on this date range

Last month, there were no reported departure issues on the public transport in Stockholm.
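
The "no information found" context means the answer above rests on an empty result, not on rows that reported zero issues. How `generate_response` turns "last month" into a date range is not shown in this notebook. The sketch below illustrates one plausible interpretation, the previous calendar month, using only the standard library. `previous_month_range` is a hypothetical helper name.

```python
from datetime import date, timedelta


def previous_month_range(today: date) -> tuple[date, date]:
    """Return the first and last day of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    first_of_previous = last_of_previous.replace(day=1)
    return first_of_previous, last_of_previous


# Date taken from the "Today's date" line in the output above.
start, end = previous_month_range(date(2024, 9, 19))
print(start, end)  # 2024-08-01 2024-08-31
```

Under this interpretation the query would cover all of August 2024, a period for which the feature view returned no rows.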


In [None]:
QUESTION = "How many departure deviations were planned for last month?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

In [None]:
QUESTION = "How long will it take me to commute if I leave now?"

response = generate_response(
    QUESTION,
    feature_view,
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

In [None]:
QUESTION = "At what time should I leave if I want to reach my office before 10:00?"

response = generate_response(
    QUESTION,
    feature_view,
    model_deployment,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)