### RAG Citation: Enhancing AI-Generated Content with Automatic Citations (A Non-LLM Approach)
#### 3. Example-LangChain

#### 1. Reading a text file

In [1]:
## Data Ingestion
from langchain_community.document_loaders import TextLoader
loader=TextLoader("speech.txt")
text_documents=loader.load()
# text_documents

#### 2. Add your OpenAI key

* Create a `.env` file in the working directory.
* Add your key to it: `OPENAI_API_KEY='sk-****'`

In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

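# load_dotenv() has already copied the key into the environment.
# Fail early with a clear message instead of a TypeError if it is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")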

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split into chunks of at most 1000 characters; neighbouring chunks may share up to 200 characters.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(text_documents)
documents[:5]

[Document(metadata={'source': 'speech.txt'}, page_content='I am honored to be with you today at your commencement from one of the finest universities in the world. I never graduated from college. Truth be told, this is the closest I’ve ever gotten to a college graduation. Today I want to tell you three stories from my life. That’s it. No big deal. Just three stories.\n\nThe first story is about connecting the dots.\n\nI dropped out of Reed College after the first 6 months, but then stayed around as a drop-in for another 18 months or so before I really quit. So why did I drop out?'),
 Document(metadata={'source': 'speech.txt'}, page_content='It started before I was born. My biological mother was a young, unwed college graduate student, and she decided to put me up for adoption. She felt very strongly that I should be adopted by college graduates, so everything was all set for me to be adopted at birth by a lawyer and his wife. Except that when I popped out they decided at the last minut
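
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a simplification and not how `RecursiveCharacterTextSplitter` is implemented: the real splitter prefers to break on separators such as paragraph and line breaks, so its chunks vary in length. The helper name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that share chunk_overlap characters.

    Illustration only: a hypothetical helper, not the LangChain implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, which is the same idea as the 200-character overlap used above: a sentence that straddles a chunk boundary still appears whole in at least one chunk.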

#### 3. Ingesting into a vector DB

In [4]:

## Vector Embedding And Vector Store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(documents,OpenAIEmbeddings())

In [5]:
query = "How long did it take for Apple to grow from a garage operation to a $2 billion company?"

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 1})
relevant_docs = retriever.invoke(query)

In [6]:
relevant_docs

[Document(metadata={'source': 'speech.txt'}, page_content='My second story is about love and loss.\n\nI was lucky — I found what I loved to do early in life. Woz and I started Apple in my parents’ garage when I was 20. We worked hard, and in 10 years Apple had grown from just the two of us in a garage into a $2 billion company with over 4,000 employees. We had just released our finest creation — the Macintosh — a year earlier, and I had just turned 30. And then I got fired. How can you get fired from a company you started? Well, as Apple grew we hired someone who I thought was very talented to run the company with me, and for the first year or so things went well. But then our visions of the future began to diverge and eventually we had a falling out. When we did, our Board of Directors sided with him. So at 30 I was out. And very publicly out. What had been the focus of my entire adult life was gone, and it was devastating.')]
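
Conceptually, a similarity retriever with `k=1` embeds the query and returns the stored chunk whose vector is closest to it. The sketch below illustrates that idea with hand-made 3-dimensional vectors and cosine similarity. The vectors, the chunk labels, and the helpers `cosine_similarity` and `top_k` are made up for illustration; Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 1):
    """Return the k (text, score) pairs with the highest similarity to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings": real ones come from OpenAIEmbeddings and have many more dimensions.
toy_store = [
    ("chunk about connecting the dots", [0.9, 0.1, 0.0]),
    ("chunk about love and loss (Apple in a garage)", [0.1, 0.9, 0.2]),
    ("an unrelated chunk", [0.0, 0.2, 0.9]),
]
toy_query = [0.2, 0.8, 0.1]  # stands in for the embedded question

best = top_k(toy_query, toy_store, k=1)
print(best[0][0])
```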

#### 4. LangChain RAG code

In [7]:
# !pip install langchainhub

In [8]:
# Import ChatOpenAI from langchain_openai; the langchain_community version is deprecated.
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini")


In [9]:
from langchain import hub
prompt = hub.pull("empler-ai/rag-prompt")

In [10]:
writer = prompt | llm | StrOutputParser()

In [11]:
content=[doc.page_content for doc in relevant_docs]
context = "\n\n".join(content)

In [12]:
answer=writer.invoke({"question":query,"context":context})

In [13]:
answer

'It took Apple 10 years to grow from a garage operation to a $2 billion company.'

#### 5. RagCitation

In [14]:
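import uuid


def generate_uuid():
    """Return a random UUID4 string, used as the source ID of a context document."""
    return str(uuid.uuid4())


# Rebuild `context` as a list of dicts for rag_citation.
# Note: this rebinds `context`, which held the joined string passed to the LLM above.
context = []
for document in content:
    context.append(
        {
            "source_id": generate_uuid(),
            "document": document,
            "meta": [],
        }
    )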

In [15]:
from rag_citation import CiteItem, Inference

inference = Inference(spacy_model="md", embedding_model="md")
cite_item = CiteItem(answer=answer, context=context)

In [16]:
output=inference(cite_item)

100%|██████████| 4/4 [00:02<00:00,  1.80it/s]


In [17]:
output.citation

[{'answer_sentences': 'It took Apple 10 years to grow from a garage operation to a $2 billion company.',
  'cite_document': [{'document': 'We worked hard, and in 10 years Apple had grown from just the two of us in a garage into a $2 billion company with over 4,000 employees.',
    'source_id': '65940ca0-a470-400c-bf57-2264b7c6427a',
    'entity': [{'word': 'Apple', 'entity_name': 'ORG'},
     {'word': '10 years', 'entity_name': 'DATE'},
     {'word': '$2 billion', 'entity_name': 'MONEY'}],
    'meta': []}]}]

In [18]:
output.missing_word

[]

In [19]:
output.hallucination

False
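
The citation list is plain Python data, so it is easy to post-process. As a sketch, the snippet below turns a list shaped like the `output.citation` value shown above into an answer with numbered footnotes. The `citation` literal is copied from that output (with the `entity` list shortened to empty for brevity); the helper `format_with_footnotes` is hypothetical and not part of `rag_citation`.

```python
def format_with_footnotes(citation: list[dict]) -> str:
    """Render answer sentences with [n] markers and one footnote per cited passage.

    Hypothetical helper; assumes the dict layout shown in the citation output above.
    """
    body, notes = [], []
    for item in citation:
        markers = []
        for cited in item["cite_document"]:
            number = len(notes) + 1
            passage = cited["document"]
            source_id = cited["source_id"]
            notes.append(f'[{number}] "{passage}" (source_id: {source_id})')
            markers.append(f"[{number}]")
        suffix = " " + "".join(markers) if markers else ""
        body.append(item["answer_sentences"] + suffix)

    lines = body
    if notes:
        lines = body + [""] + notes  # blank line between the answer and its footnotes
    return "\n".join(lines)


# Copied from the citation output above; "entity" is left empty here for brevity.
citation = [
    {
        "answer_sentences": "It took Apple 10 years to grow from a garage operation to a $2 billion company.",
        "cite_document": [
            {
                "document": "We worked hard, and in 10 years Apple had grown from just the two of us in a garage into a $2 billion company with over 4,000 employees.",
                "source_id": "65940ca0-a470-400c-bf57-2264b7c6427a",
                "entity": [],
                "meta": [],
            }
        ],
    }
]

rendered = format_with_footnotes(citation)
print(rendered)
```

In the notebook itself, `format_with_footnotes(output.citation)` would be the equivalent call, assuming `output.citation` keeps the structure shown above.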