In [21]:
from dotenv import load_dotenv
load_dotenv()


True

In [33]:
with open("RealCostOfHS2.txt", "r", encoding="UTF-8") as file:
    source = file.read()

In [35]:
query = "What is HS2 and Why is it a big deal?"

In [34]:
# pip install fastembed scikit-learn openai numpy
from fastembed import TextEmbedding
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from openai import OpenAI

# SETUP
client = OpenAI()
# documents = [
#     "LlamaIndex is a framework for connecting data to LLMs.",
#     "FastEmbed is a high-performance embedding generation library by Qdrant.",
#     "Qdrant is a vector database written in Rust."
# ]

documents = source 

# 1. EMBED (Local & Free)
# FastEmbed returns a generator, so we convert to list
embed_model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
doc_embeddings = list(embed_model.embed(documents)) # List of numpy arrays

# 2. RETRIEVE (Manual Math with Sklearn)
query = "What is HS2 and why is it a big deal?"
query_embedding = list(embed_model.embed([query]))[0]

# Calculate similarity between Query and ALL documents
# We must stack the list of arrays into a single matrix for sklearn
scores = cosine_similarity([query_embedding], np.stack(doc_embeddings))[0]

# Find the index of the highest score
best_doc_index = np.argmax(scores)
retrieved_doc = documents[best_doc_index]

print(f"Retrieved: {retrieved_doc} (Score: {scores[best_doc_index]:.4f})")

# 3. GENERATE (OpenAI)
prompt = f"Context: {retrieved_doc}\nQuestion: {query}\nAnswer:"
response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": prompt}]
)
print("AI Answer:", response.choices[0].message.content)

Retrieved: 
 (Score: 0.6781)


2025-12-19 17:11:23,697 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


AI Answer: Short version
HS2 (High Speed 2) is the UK government’s major new high‑speed rail programme, originally planned to link London with major northern cities by building new high‑speed lines. It’s a big deal because it’s one of the largest and most expensive infrastructure projects in modern UK history and touches transport capacity, regional economic strategy, the environment, public spending priorities and politics.

What HS2 is
- A purpose‑built high‑speed railway intended to run from London into the Midlands and north of England. Trains would run on new track at much higher speeds than existing lines.
- Designed to increase north–south rail capacity, cut journey times between major city pairs, and free up space on existing routes for more local and freight services.
- Planned in phases: Phase 1 (London – Birmingham), Phase 2a (Birmingham – Crewe) and Phase 2b (further north to Manchester and an eastern leg toward Leeds). (The original multi‑phase route was modified by govern

In [37]:
# pip install llama-index llama-index-llms-openai
# import os
from llama_index.core import VectorStoreIndex, Document

# SETUP
# os.environ["OPENAI_API_KEY"] = "sk-..."
# documents = [Document(text="LlamaIndex connects data to LLMs.")]
documents = [Document(text=source)]

# 1. INDEX (Auto-Embeds via OpenAI API)
# LlamaIndex handles the API calls, batching, and vector storage automatically.
index = VectorStoreIndex.from_documents(documents)

# 2. RETRIEVE & GENERATE
query_engine = index.as_query_engine()
response = query_engine.query(query)

print(response)

2025-12-19 17:12:13,347 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


HS2, or High Speed 2, is a major railway construction project in the UK. It's a big deal due to its scale, cost, and impact on communities and the environment. The project has been criticized for its initial "costs plus" contracts, which led to overspending as the government guaranteed to cover almost all unforeseen expenditure. The project is also proving expensive due to its aspiration to be faster than other European high-speed railways, requiring costly engineering solutions. Additionally, the human costs are significant, with homes being demolished, farms divided, and local wildlife disrupted. Despite these challenges, HS2 is undergoing a "reset" to become more cost-effective and productive.


In [38]:
# pip install llama-index llama-index-embeddings-fastembed
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.embeddings.fastembed import FastEmbedEmbedding


# --- THE MAGIC SWITCH ---
# We globally configure LlamaIndex to use FastEmbed running locally on your CPU
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

# documents = [
#     Document(text="FastEmbed runs locally and saves API costs."),
#     Document(text="LlamaIndex orchestrates the retrieval flow.")
# ]

documents = [Document(text=source)]

# 1. INDEX (Local & Free)
# This no longer calls OpenAI. It runs on your laptop.
index = VectorStoreIndex.from_documents(documents)

# 2. RETRIEVE & GENERATE
# The retrieval uses local vectors; the final answer uses GPT-4
query_engine = index.as_query_engine()
response = query_engine.query(query)

print(response)

2025-12-19 17:13:02,238 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


HS2, or High Speed 2, is a major railway construction project in the UK. It's a big deal due to its scale, cost, and impact on communities and the environment. The project has faced criticism for its "costs plus" contracts, which led to overspending, and its aspiration to be faster than other European high-speed railways, which necessitated expensive engineering. The project has also had significant human and environmental costs, including the demolition of homes, division of farms, and disruption of wildlife habitats. Despite these challenges, HS2 is undergoing a "reset" to become more cost-effective and productive, and to secure better commercial deals with suppliers.


In [None]:
import os
import qdrant_client
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings,
)
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.llms.openai import OpenAI

# 1. SETUP: LLM & Embeddings
# We use OpenAI for generation (requires key) but FastEmbed for embeddings (free/local)
os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY')
Settings.llm = OpenAI(model="gpt-4")
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

# 2. CONNECT: Qdrant
# Note: For local docker use host="localhost". For in-memory (testing) use location=":memory:"
client = qdrant_client.QdrantClient(location=":memory:") 

# 3. STORAGE: Configure LlamaIndex to use Qdrant
vector_store = QdrantVectorStore(client=client, collection_name="HS2")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# 4. INDEX: Load, Embed (Locally), and Store (in Qdrant)
# # Create a dummy file if you don't have one
# if not os.path.exists("data"):
#     os.makedirs("data")
#     with open("data/test.txt", "w") as f:
#         f.write("uv is an extremely fast Python package installer written in Rust.")

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)

# 5. QUERY
query_engine = index.as_query_engine()
response = query_engine.query(query)
print(response)

2025-12-19 17:18:15,221 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


HS2, or High Speed 2, is a high-speed railway under construction in England. It's a significant project due to its scale and impact. The route must be as straight as possible for high-speed travel, requiring complex engineering solutions to navigate the intricate landscape of southern England. The line includes 52 major viaducts and five tunnels, totaling more than 40 miles. The project involves multiple consortiums, including British building giants and European companies with high-speed rail experience. The construction has a substantial impact on the surrounding areas, with many homes bought by HS2 and significant changes to the landscape. The project has faced criticism and resistance from some local communities and environmental campaigners. However, it's also seen as a symbol of economic growth and modern infrastructure development.
