This loads environment variables from a .env file into the current process's environment, so your code can read secrets such as API keys with os.getenv() instead of hard-coding them.

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True
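To make the idea concrete, here is a minimal sketch of what loading a .env file involves: read KEY=VALUE lines and copy them into os.environ without overwriting variables that already exist. This is a simplified illustration, not python-dotenv's actual implementation; the helper name load_env_file and the variable DEMO_API_KEY are made up for the example.

```python
import os
import tempfile

def load_env_file(path, override=False):
    """Parse KEY=VALUE lines from path and copy them into os.environ.

    Hypothetical helper for illustration only; real .env parsing
    (quoting, escapes, variable expansion) is more involved.
    """
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")  # drop surrounding quotes
            # Existing variables are kept unless override is requested.
            if override or key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded

# Demo with a throwaway file and a made-up variable name.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write("# demo settings\nDEMO_API_KEY='abc123'\n")
    env_path = tmp.name

os.environ.pop("DEMO_API_KEY", None)
loaded = load_env_file(env_path)
os.remove(env_path)
print(os.getenv("DEMO_API_KEY"))
```

Keeping existing variables by default means a value exported in the shell takes precedence over the one in the file, which is usually what you want when deploying.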

This checks whether OPENAI_API_KEY is set in the environment and prints a message if it is. os.getenv() returns None for a missing variable, whereas os.environ['OPENAI_API_KEY'] would raise a KeyError.

In [2]:
if os.getenv("OPENAI_API_KEY"):
    print("API Key is set.")

API Key is set.


This imports the ChatOpenAI class from LangChain’s OpenAI integration, which allows you to interact with OpenAI chat models (like GPT) inside your Python application.

In [4]:
from langchain_openai import ChatOpenAI


In [5]:
llm = ChatOpenAI(model="gpt-3.5-turbo",temperature=0)

This rebinds llm to a ChatOpenAI instance that uses the "gpt-5-nano" model, replacing the "gpt-3.5-turbo" instance from the previous cell. Setting temperature to 0 is intended to make responses more deterministic and consistent.

In [6]:
llm = ChatOpenAI(model="gpt-5-nano",temperature=0)

In [None]:
llm.invoke("What is AI? Tell me in one line")

This sends the prompt to the language model and stores the response in result; result.content extracts and displays the generated text from the model’s reply.

In [None]:
result = llm.invoke("What is AI? Tell me in one line")
result.content

### RAG Implementation With Your Own Text Data

### Step 1 : Preparing a Document from Your Text

This imports the Document class from LangChain, which is used to represent text data along with optional metadata for processing in LLM or RAG workflows.

In [None]:
from langchain_core.documents import Document

In [None]:
my_text = """Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.[1]

High-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., language models and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."[2][3]

Various subfields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, perception, and support for robotics.[a] To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.[4] Some companies, such as OpenAI, Google DeepMind and Meta,[5] aim to create artificial general intelligence (AGI) – AI that can complete virtually any cognitive task at least as well as a human.
"""

This creates a LangChain Document object where page_content stores your AI text, and wraps it inside a list (docs) so it can be used in RAG pipelines like text splitting, embedding, and retrieval.

In [None]:
### Create a LangChain Document from my_text above

docs = [Document(page_content=my_text)]
docs

In [None]:
### or, with metadata attached

docs = [Document(page_content=my_text, metadata={"source": "ABC", "documentID": "doc1"})]
docs

### Step 2 : Splitting the documents into chunks

This splits the document into chunks of at most 500 characters, with up to 50 characters of overlap between consecutive chunks, which makes embedding and retrieval easier. Each chunk keeps a copy of the parent document's metadata.

In [None]:
## It divides the document into multiple chunks and copies the document's metadata to each chunk

from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
chunks
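How chunk_size and chunk_overlap interact can be sketched with a plain sliding window. This is a deliberately simplified stand-in: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, so its real chunk boundaries will differ. The function name chunk_text is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 12  # 120 characters
pieces = chunk_text(sample, chunk_size=50, chunk_overlap=10)
print([len(p) for p in pieces])  # [50, 50, 40]
```

The last 10 characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully visible in at least one chunk.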

### Step 3 : Creating Embeddings for the chunks

This loads the OpenAI embedding model and converts the query “What is AI?” into a numerical vector representation, which can be used for similarity search in RAG systems.

In [None]:
from langchain_openai import OpenAIEmbeddings

In [None]:
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

In [None]:
embedding_model.embed_query("What is AI?")

### Step 4 : Create and store embeddings in a vector store

Chroma.from_documents() stores your document chunks in a Chroma vector database and converts them into embeddings for similarity search as part of the same call.
The loop in the last cell of this step generates an embedding vector for each chunk by hand and collects the vectors in a list. It is shown only for illustration, because Chroma.from_documents() already handles this internally.

In [None]:
from langchain_community.vectorstores import Chroma

In [None]:
vectorstore = Chroma.from_documents(documents=chunks,embedding=embedding_model)

In [None]:
vectors = []
for doc in chunks:
    # embed_documents returns one vector per input text, so take the first
    vector = embedding_model.embed_documents([doc.page_content])[0]
    vectors.append(vector)

### Step 5 : Semantic Search

This searches the vector database for the top 3 most similar document chunks related to the query “What is AI” and returns the most relevant text sections based on embedding similarity.

In [None]:
vectorstore.similarity_search("What is AI",k=3)
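The ranking behind a similarity search can be sketched in a few lines of plain Python. The example below scores toy 3-dimensional vectors with cosine similarity and returns the indices of the top k. Cosine similarity is only one common choice; the distance metric a Chroma collection actually uses depends on its configuration, and real embeddings have far more dimensions. The names cosine_similarity and top_k are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k vectors most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" standing in for chunk vectors.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.8, 0.2, 0.0]

ranked = top_k(query_vec, doc_vecs, k=3)
print(ranked)  # [1, 0, 2]
```

A vector store does the same job at scale, typically with an index that avoids comparing the query against every stored vector.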

### Talk to LLM

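The retrieved chunks are passed to the model by joining their text and placing it in the prompt.

In [None]:
context = vectorstore.similarity_search("What is AI",k=3)
context_text = "\n\n".join(doc.page_content for doc in context)
llm.invoke(f"Answer using only this context:\n{context_text}\n\nQuestion: What is AI? Tell me in one line")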

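In [None]:
# or, keep the response and read its text
context = vectorstore.similarity_search("What is AI",k=3)
context_text = "\n\n".join(doc.page_content for doc in context)
response = llm.invoke(f"Answer using only this context:\n{context_text}\n\nQuestion: What is AI? Tell me in one line")
response.content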
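One way to hand retrieved chunks to the model is to put their text into the prompt itself. The sketch below wraps that in a small reusable function. FakeDoc is a hypothetical stand-in with the same page_content and metadata attribute names as a LangChain Document, so the example runs without any API key; build_rag_prompt and the prompt wording are likewise assumptions, not a LangChain API.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_rag_prompt(question, docs):
    """Number each retrieved chunk and place it ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

docs = [
    FakeDoc("AI is the capability of machines to perform tasks associated with human intelligence."),
    FakeDoc("AI research includes learning, reasoning, and perception."),
]
prompt = build_rag_prompt("What is AI? Tell me in one line", docs)
print(prompt)
```

With the objects from this notebook, the string returned by build_rag_prompt("What is AI? Tell me in one line", context) could be passed to llm.invoke() in place of the bare question.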