In [18]:
import os
from dotenv import load_dotenv

load_dotenv("./../../All_LLM_tutorial/.env")

True
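
The `True` above only reports that a `.env` file was found and loaded; it does not confirm that the key the OpenAI integrations read (`OPENAI_API_KEY`) is actually defined. A minimal check, assuming that variable name is the only one this notebook needs (`missing_env_vars` is a hypothetical helper, not part of python-dotenv or LlamaIndex), might look like this:

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumption: OPENAI_API_KEY is the only variable required by the cells below.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv(...)` surfaces a wrong path or an empty key before any embedding call is made.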

In [19]:
#### Example
from llama_index.core import SimpleDirectoryReader

In [20]:
reader = SimpleDirectoryReader(input_dir="articles")
docs = reader.load_data()

In [21]:
print(f"Count of Techcrunch articles: {len(docs)}")
print(docs[0])

Count of Techcrunch articles: 21
Doc ID: 6dbe3af2-ddea-4f46-9f81-c84494a30af0
Text: Signaling that investments in the supply chain sector remain
robust, Pando, a startup developing fulfillment management
technologies, today announced that it raised $30 million in a Series B
round, bringing its total raised to $45 million.  Iron Pillar and
Uncorrelated Ventures led the round, with participation from existing
investors Nexus Vent...


In [22]:
from llama_index.core import VectorStoreIndex

# 1. Build a VectorStoreIndex directly from the documents
index = VectorStoreIndex.from_documents(docs, show_progress=True)

Parsing nodes: 100%|██████████| 21/21 [00:00<00:00, 494.87it/s]
Generating embeddings: 100%|██████████| 50/50 [00:08<00:00,  5.71it/s]


In [23]:
# 2. Build a VectorStoreIndex with an explicitly chosen splitter (chunk_size, chunk_overlap) and embedding model

from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex


embed_model = OpenAIEmbedding()

node_parser = SentenceSplitter(chunk_size=256, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(docs)
index = VectorStoreIndex(nodes, embed_model=embed_model, show_progress=True)

Generating embeddings: 100%|██████████| 202/202 [00:04<00:00, 47.10it/s]
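
With `chunk_size=256` the same 21 articles produce 202 nodes instead of the 50 from the earlier cell, because smaller windows mean more chunks. To make `chunk_size` and `chunk_overlap` concrete, here is a rough sketch of fixed-size windowing with overlap. It is not how `SentenceSplitter` is implemented (that class counts tokens with a tokenizer and tries to respect sentence boundaries); the `split_with_overlap` helper and the fake `w0, w1, ...` tokens are assumptions made purely for illustration.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts chunk_size - chunk_overlap tokens after the previous
    one, so consecutive windows share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Hypothetical input: 600 placeholder tokens standing in for one article.
tokens = [f"w{i}" for i in range(600)]
chunks = split_with_overlap(tokens, chunk_size=256, chunk_overlap=20)
print(len(chunks), [len(c) for c in chunks])  # 3 [256, 256, 128]
```

The last 20 tokens of each chunk reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from either side.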


In [25]:
# 'similarity_top_k' is the number of chunks to return, ranked by similarity to the query (highest first).

base_retriever = index.as_retriever(similarity_top_k=5)
source_nodes = base_retriever.retrieve("What is the CMA generative ai?")

In [34]:
# Inspect the retrieved nodes: similarity score and chunk text

for node in source_nodes:
    # print(node.metadata)
    print("---------------------------------------------")
    print(f"Score: {node.score:.3f}")
    print(node.get_content())
    print("---------------------------------------------\n\n")

---------------------------------------------
Score: 0.862
The expectation is that the CMA’s Digital Markets Unit, up and running since 2021 in shadow form, will (finally) gain legislative powers in the coming years to apply pro-active “pro-competition” rules which are tailored to platforms that are deemed to have “strategic market status” (SMS). So we can speculate that providers of powerful foundational AI models may, down the line, be judged to have SMS — meaning they could expect to face bespoke rules on how they must operate vis-a-vis rivals and consumers in the U.K. market.

The U.K.’s data protection watchdog, the ICO, also has its eye on generative AI. It’s another existing oversight body which the government has tasked with paying special mind to AI under its plan for context-specific guidance to steer development of the tech through the application of existing laws.

---------------------------------------------


---------------------------------------------
Score: 0.860
Wel
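
The scores printed above rank chunks by how close their embeddings are to the query embedding. As a rough sketch of that ranking step, assuming cosine similarity (which, as far as I know, is the default for the in-memory vector store), the following pure-Python example scores a few hand-made 3-dimensional vectors and keeps the top `k`. The vectors, chunk ids, and the `top_k` helper are invented for illustration; real embeddings have far more dimensions and come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the k (score, chunk_id) pairs with the highest similarity, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings, not produced by any real model.
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.8, 0.6, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, toy_vectors, k=2)
for score, chunk_id in results:
    print(f"Score: {score:.3f}  {chunk_id}")
```

Here `chunk_a` and `chunk_b` come back first because they point in nearly the same direction as the query; raising `k` returns more, lower-scoring chunks, just as a larger `similarity_top_k` does.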

In [40]:
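from llama_index.llms.openai import OpenAI

# No llm is passed here, so the query engine falls back to the default LLM.
# similarity_top_k=2 sends only the two most similar chunks to the LLM.
query_engine = index.as_query_engine(streaming=True, similarity_top_k=2)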

In [41]:
response = query_engine.query("What is the CMA generative ai?")
response.print_response_stream()

The CMA generative AI refers to generative artificial intelligence models that are being reviewed by the Competition and Markets Authority (CMA) in the UK. These models include large language models and generative AI that have the potential to transform various aspects of activities conducted by individuals and businesses.

In [38]:
# Alternatively, pass an explicit LLM to the query engine
llm = OpenAI(model="gpt-4o-mini", temperature=0)
query_engine = index.as_query_engine(
    streaming=True,
    similarity_top_k=2,
    llm=llm,
)

In [39]:
response = query_engine.query("What is the CMA generative ai?")
response.print_response_stream()

The CMA generative AI refers to the initial review conducted by the U.K.'s Competition and Markets Authority (CMA) focusing on AI foundational models, including large language models and generative AI technologies. This review aims to assess competition and consumer protection aspects related to the development and use of these models, with the goal of understanding their evolution and establishing guiding principles for their future use. The CMA plans to publish the findings of this review in early September, following a period for stakeholders to submit their responses.