## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
# !pip install -r requirements.txt --quiet

In [2]:
import joblib

from functions.llm_chain import (
    load_model, 
    get_llm_chain, 
    generate_response,
)

## <span style="color:#ff5f27;"> 🔮 Connect to Hopsworks Feature Store </span>

In [3]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5242


## <span style="color:#ff5f27;"> ⚙️ Feature View Retrieval</span>

In [4]:
# Retrieve the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1,
)

# Initialize batch scoring against training dataset version 1
feature_view.init_batch_scoring(1)

## <span style="color:#ff5f27;">🪝 Retrieve AirQuality Model from Model Registry</span>

In [5]:
# Retrieve the model registry
mr = project.get_model_registry()

# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Connected. Call `.close()` to terminate connection gracefully.
Downloading model artifact (0 dirs, 6 files)... DONE

In [6]:
# Load the XGBoost regressor model and label encoder from the saved model directory
model_air_quality = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
encoder = joblib.load(saved_model_dir + "/label_encoder.pkl")

# Display the retrieved XGBoost regressor model
model_air_quality

## <span style='color:#ff5f27'>⬇️ LLM Loading </span>

In [7]:
# Load the LLM and its corresponding tokenizer.
model_llm, tokenizer = load_model()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


2024-05-14 20:00:29,253 INFO: We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).





## <span style='color:#ff5f27'>⛓️ LangChain </span>

In [8]:
# Create and configure a language model chain.
llm_chain = get_llm_chain(
    tokenizer,
    model_llm,
)



## <span style='color:#ff5f27'>🧬 Model Inference </span>


In [9]:
QUESTION = "Who are you?"

response = generate_response(
    QUESTION,
    feature_view,
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Tuesday, 2024-05-14
📖 
I am an expert in air quality analysis.
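Every question in this section passes the same six objects to `generate_response`, so the repeated arguments could be bound once. The following is a minimal sketch of that idea, not part of `functions.llm_chain`: `make_ask` and `fake_generate_response` are hypothetical names, and the stand-in function only mimics the call shape seen in the cells here so that the example runs without a model or a feature store.

```python
from typing import Any, Callable


def make_ask(generate: Callable[..., str], *fixed_args: Any, **fixed_kwargs: Any) -> Callable[[str], str]:
    """Return a one-argument function that forwards a question plus the fixed arguments."""
    def ask(question: str) -> str:
        return generate(question, *fixed_args, **fixed_kwargs)
    return ask


# Hypothetical stand-in for generate_response: question first, then the shared objects.
def fake_generate_response(question: str, *resources: Any, verbose: bool = False) -> str:
    return f"{question!r} answered with {len(resources)} shared resources"


# Placeholder strings stand in for feature_view, model_air_quality, encoder,
# model_llm, tokenizer and llm_chain.
ask = make_ask(
    fake_generate_response,
    "feature_view", "model_air_quality", "encoder",
    "model_llm", "tokenizer", "llm_chain",
    verbose=True,
)

print(ask("Who are you?"))
# prints: 'Who are you?' answered with 6 shared resources
```

In the notebook itself, the same wrapper would presumably be built as `make_ask(generate_response, feature_view, model_air_quality, encoder, model_llm, tokenizer, llm_chain, verbose=True)`.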


In [10]:
QUESTION1 = "What was the air quality from 2024-01-10 till 2024-01-14 in New York?"

response1 = generate_response(
    QUESTION1, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response1)

Finished: Reading data from Hopsworks, using ArrowFlight (8.25s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1
The air quality in New York from January 10th to January 14th was generally moderate. The measurements show that the air quality fluctuated during this period, with a high of 10.8 on January 12th and lows of 5.1 and 5.9 on different days. Overall, it's a good time to be outside and enjoy the fresh air, but you may want to avoid strenuous outdoor activities on the 12th.


In [11]:
QUESTION11 = "When and what was the maximum air quality from 2024-01-10 till 2024-01-14 in New York?"

response11 = generate_response(
    QUESTION11, 
    feature_view, 
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response11)

Finished: Reading data from Hopsworks, using ArrowFlight (8.58s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1
The maximum air quality during that period in New York was on January 12th with an air quality of 10.8. This level is considered to be unhealthy for sensitive groups, and it is advisable to limit outdoor activities.


In [12]:
QUESTION12 = "When and what was the minimum air quality from 2024-01-10 till 2024-01-14 in New York?"

response12 = generate_response(
    QUESTION12, 
    feature_view,  
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response12)

Finished: Reading data from Hopsworks, using ArrowFlight (7.79s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for New York:
Date: 2024-01-10; Air Quality: 7.2
Date: 2024-01-11; Air Quality: 5.9
Date: 2024-01-12; Air Quality: 10.8
Date: 2024-01-13; Air Quality: 5.9
Date: 2024-01-14; Air Quality: 5.1
The minimum air quality during that period in New York was on January 14th, with an air quality of 5.1. This indicates that the air quality on that day was quite good, and it would be safe for you to go for a walk or engage in outdoor activities.
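The maximum and minimum named in the two responses above can be cross-checked against the printed measurements with plain Python. This is a minimal sketch with the values copied by hand from the output; the `measurements` dictionary is an illustrative name and not something the notebook defines.

```python
# Measurements for New York, copied from the cell output above.
measurements = {
    "2024-01-10": 7.2,
    "2024-01-11": 5.9,
    "2024-01-12": 10.8,
    "2024-01-13": 5.9,
    "2024-01-14": 5.1,
}

# Pick the dates whose values are largest and smallest.
max_date = max(measurements, key=measurements.get)
min_date = min(measurements, key=measurements.get)

print(f"max: {measurements[max_date]} on {max_date}")
print(f"min: {measurements[min_date]} on {min_date}")
# prints:
# max: 10.8 on 2024-01-12
# min: 5.1 on 2024-01-14
```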


In [13]:
QUESTION2 = "What was the air quality yesterday in London?"

response2 = generate_response(
    QUESTION2,
    feature_view,
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response2)

Finished: Reading data from Hopsworks, using ArrowFlight (7.90s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for London:
Date: 2024-05-13; Air Quality: 10.5
Yesterday, the air quality in London was safe for most people. However, it might have been slightly uncomfortable for those with respiratory issues.


In [14]:
QUESTION = "What was the air quality like last week in London?"

response = generate_response(
    QUESTION,
    feature_view, 
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

Finished: Reading data from Hopsworks, using ArrowFlight (8.33s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for London:
Date: 2024-05-07; Air Quality: 14.2
Date: 2024-05-08; Air Quality: 15.1
Date: 2024-05-09; Air Quality: 23.4
Date: 2024-05-10; Air Quality: 26.2
Date: 2024-05-11; Air Quality: 23.1
Date: 2024-05-12; Air Quality: 16.5
Date: 2024-05-13; Air Quality: 10.5
Date: 2024-05-14; Air Quality: 5.9
Last week in London, the air quality was generally moderate to good. The readings for the days you provided show that the air quality was improving over the week, with levels ranging from 5.9 on May 14th to 14.2 on May 7th. Overall, the air quality was safe for most activities, but it would be advisable to check for any local advisories before engaging in outdoor activities.


In [15]:
QUESTION3 = "What will the air quality be like in London on 2024-05-20?"

response3 = generate_response(
    QUESTION3, 
    feature_view, 
    model_air_quality,
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response3)

Finished: Reading data from Hopsworks, using ArrowFlight (7.92s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for London:
Date: 2024-05-14; Air Quality: 5.9
Date: 2024-05-15; Air Quality: 10.88
Date: 2024-05-16; Air Quality: 11.99
Date: 2024-05-17; Air Quality: 11.6
Date: 2024-05-18; Air Quality: 11.56
Date: 2024-05-19; Air Quality: 11.52
Date: 2024-05-20; Air Quality: 11.52
The air quality in London on 2024-05-20 is expected to be at a moderate level, with an Air Quality index of 11.52. This is within the safe range, but it might not be the best day for outdoor activities, especially if you have respiratory issues. It would be advisable to keep an eye on the air quality and possibly choose a different day for more strenuous activities.


In [16]:
QUESTION4 = "What will the air quality be like in Chicago tomorrow?"

response4 = generate_response(
    QUESTION4, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response4)

Finished: Reading data from Hopsworks, using ArrowFlight (7.62s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for Chicago:
Date: 2024-05-14; Air Quality: 15.0
Date: 2024-05-15; Air Quality: 8.76
Tomorrow, the air quality in Chicago is expected to be significantly better than today. The air quality measurement for tomorrow, based on our data, is 8.76. This level indicates that the air quality is considered good, and it is safe for outdoor activities such as walking or cycling.


In [17]:
QUESTION5 = "What will the air quality be like in London next Sunday?"

response5 = generate_response(
    QUESTION5, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer, 
    llm_chain,
    verbose=True,
)

print(response5)

Finished: Reading data from Hopsworks, using ArrowFlight (7.82s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for London:
Date: 2024-05-14; Air Quality: 5.9
Date: 2024-05-15; Air Quality: 10.88
Date: 2024-05-16; Air Quality: 11.99
Date: 2024-05-17; Air Quality: 11.6
Date: 2024-05-18; Air Quality: 11.56
Date: 2024-05-19; Air Quality: 11.52
Based on the air quality measurements for London, next Sunday, 2024-05-19, the air quality is expected to be at 11.52. This level falls within the moderate range, which means it is safe for most people to go outside, but those with respiratory issues may want to limit their exposure. It is advisable to check for any local alerts or updates before planning any outdoor activities.
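Relative expressions such as "yesterday", "tomorrow" and "next Sunday" have to be resolved against today's date before any data can be fetched. How `generate_response` does this internally is not shown in this notebook; the sketch below only illustrates the date arithmetic with the standard library. `next_weekday` is a hypothetical helper, and it assumes "next Sunday" means the first Sunday strictly after today.

```python
from datetime import date, timedelta


def next_weekday(today: date, weekday: int) -> date:
    """Return the first date strictly after `today` that falls on `weekday` (Monday=0 ... Sunday=6)."""
    days_ahead = (weekday - today.weekday()) % 7
    if days_ahead == 0:  # today is already that weekday, so jump a full week
        days_ahead = 7
    return today + timedelta(days=days_ahead)


today = date(2024, 5, 14)  # the date printed in the outputs above (a Tuesday)
yesterday = today - timedelta(days=1)
tomorrow = today + timedelta(days=1)
next_sunday = next_weekday(today, 6)

print(yesterday, tomorrow, next_sunday)
# prints: 2024-05-13 2024-05-15 2024-05-19
```

Under that assumption the result agrees with the date used in the response above, 2024-05-19.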


In [18]:
QUESTION7 = "What will the air quality be like on May 18 in London?"

response7 = generate_response(
    QUESTION7,
    feature_view,
    model_air_quality,
    encoder,
    model_llm,
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response7)

Finished: Reading data from Hopsworks, using ArrowFlight (7.65s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for London:
Date: 2024-05-14; Air Quality: 5.9
Date: 2024-05-15; Air Quality: 10.88
Date: 2024-05-16; Air Quality: 11.99
Date: 2024-05-17; Air Quality: 11.6
Date: 2024-05-18; Air Quality: 11.56
The air quality on May 18 in London is expected to be slightly unhealthy for sensitive groups, with a reading of 11.56. While it may not be ideal for everyone, those with respiratory issues should take extra precautions. It is still generally safe for most people to go outside, but you may want to limit prolonged exposure and consider using a mask.


In [19]:
QUESTION = "Can you please explain different PM2_5 air quality levels?"

response = generate_response(
    QUESTION, 
    feature_view, 
    model_air_quality, 
    encoder,
    model_llm, 
    tokenizer,
    llm_chain,
    verbose=True,
)

print(response)

🗓️ Today's date: Tuesday, 2024-05-14
📖 




Sure, I'd be happy to explain the different PM2.5 air quality levels. PM2.5 refers to particulate matter with a diameter of 2.5 micrometers or less. It's a measure of the concentration of these tiny particles in the air.

Here are the general air quality categories based on PM2.5 levels:

1. Good (0-12 µg/m³): At this level, the air quality is considered to be safe and suitable for all populations, including those who are sensitive to air pollution.

2. Moderate (12-35 µg/m³): The air quality is generally safe, but people who are sensitive to air pollution may experience mild discomfort. It's usually safe for most activities, including outdoor exercise.

3. Unhealthy for Sensitive Groups (35-55 µg/m³): People with lung or heart conditions, children, and the elderly may experience health effects. It's generally safe for most people, but sensitive groups should avoid prolonged outdoor exertion.

4. Unhealthy (55-150 µg/m³): Everyone may experience health effects, including respiratory sy
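The category ranges listed in the response above can be sketched as a small lookup. This is an illustrative helper, not part of `functions.llm_chain`, and the names `PM25_UPPER_BOUNDS`, `PM25_LABELS` and `pm25_category` are hypothetical. The listed ranges share their endpoints, so the sketch assumes a value exactly on a boundary belongs to the lower category. Because the response is cut off after the fourth category, anything above 150 µg/m³ is reported as outside the listed ranges rather than guessed.

```python
from bisect import bisect_left

# Upper bounds (µg/m³) and labels exactly as listed in the response above.
# Assumption: a value exactly on a boundary belongs to the lower category.
PM25_UPPER_BOUNDS = [12.0, 35.0, 55.0, 150.0]
PM25_LABELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
]


def pm25_category(value: float) -> str:
    """Map a PM2.5 concentration to one of the categories listed above."""
    if value < 0:
        raise ValueError("PM2.5 concentration cannot be negative")
    index = bisect_left(PM25_UPPER_BOUNDS, value)
    if index == len(PM25_LABELS):
        return "Above the ranges listed"
    return PM25_LABELS[index]


for reading in (5.1, 10.8, 26.2):
    print(reading, pm25_category(reading))
# prints:
# 5.1 Good
# 10.8 Good
# 26.2 Moderate
```

A lookup like this gives a deterministic label to compare against the wording of the generated answers.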

---

## <span style='color:#ff5f27'>🧬 Inference with OpenAI </span>


In [20]:
from openai import OpenAI
import os
import getpass

from functions.llm_chain import generate_response_openai

In [21]:
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass.getpass('🔑 Enter your OpenAI API key: ')

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)

🔑 Enter your OpenAI API key:  ···················································


In [22]:
QUESTION = "What was the air quality like last week in London?"

response = generate_response_openai(   
    QUESTION,
    feature_view,
    model_air_quality,
    encoder,
    client,
    verbose=True,
)
print(response)

2024-05-14 20:28:16,300 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Finished: Reading data from Hopsworks, using ArrowFlight (8.38s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for London:
Date: 2024-05-06; Air Quality: 16.4
Date: 2024-05-07; Air Quality: 14.2
Date: 2024-05-08; Air Quality: 15.1
Date: 2024-05-09; Air Quality: 23.4
Date: 2024-05-10; Air Quality: 26.2
Date: 2024-05-11; Air Quality: 23.1
Date: 2024-05-12; Air Quality: 16.5
Date: 2024-05-13; Air Quality: 10.5
2024-05-14 20:28:40,843 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Last week in London, the air quality varied, starting at a good level of 16.4 on the 6th of May, indicating it was quite safe for outdoor activities. It slightly improved further on the 7th with a level of 14.2, and remained fairly stable and good on the 8th at 15.1, suggesting that conditions were conducive for spending time outside. However, ther

In [23]:
QUESTION4 = "What will the air quality be like in Chicago tomorrow?"

response4 = generate_response_openai(
    QUESTION4,
    feature_view,
    model_air_quality,
    encoder,
    client,
    verbose=True,
)

print(response4)

2024-05-14 20:28:42,202 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Finished: Reading data from Hopsworks, using ArrowFlight (7.96s) 
🗓️ Today's date: Tuesday, 2024-05-14
📖 Air Quality Measurements for Chicago:
Date: 2024-05-14; Air Quality: 15.0
Date: 2024-05-15; Air Quality: 8.76
2024-05-14 20:28:57,762 INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The air quality in Chicago tomorrow will be excellent, with a reading of 8.76. It will be a wonderful day to enjoy outdoor activities, such as going for a walk or a bike ride, as the air will be very clean and healthy to breathe.


---