<h1 style="padding: 25px 25px; background-color: lightblue; font-family: Sans-Serif; color:black; text-align: center">
Querying PDF With LangChain and Cassandra</h1>

### Import packages:

In [None]:
# LangChain components to use
from langchain.vectorstores.cassandra import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings

# Support for dataset retrieval with Hugging Face
from datasets import load_dataset

# With CassIO, the engine powering the Astra DB integration in LangChain, you will also initialize the DB connection:
import cassio

# Read PDF
from PyPDF2 import PdfReader

#### Provide your secrets:

Set the `OPENAI_API_KEY`, `ASTRA_DB_APPLICATION_TOKEN`, and `ASTRA_DB_ID` environment variables to your OpenAI API key and Astra DB connection details. The cell below reads them:

In [2]:
import os

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ASTRA_DB_APPLICATION_TOKEN = os.getenv("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_ID = os.getenv("ASTRA_DB_ID")
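
Because `os.getenv` returns `None` for a variable that is not set, a missing secret would only surface later as a connection or authentication error. A minimal sketch of an early check follows; the helper name `missing_secrets` is hypothetical and not part of any library used here.

```python
import os

# Variable names taken from the cell above; the helper itself is hypothetical.
REQUIRED_VARS = ("OPENAI_API_KEY", "ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID")


def missing_secrets(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_secrets()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Running this before the connection cell makes it clear which variable still needs to be set.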

Provide the path of the PDF file:

In [3]:
pdfreader = PdfReader('The Importance of Paragraph Writing.pdf')

Read the text from the PDF:

In [4]:
# Concatenate the text of every page, skipping pages with no extractable text
raw_text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        raw_text += content
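
Text extracted from a PDF often carries non-breaking spaces (`\xa0`) and long runs of padding spaces. The sketch below shows one optional way to tidy that whitespace before splitting. The function `normalize_whitespace` is a hypothetical helper, it is not applied anywhere in this notebook, and it does not repair words that the extractor broke apart.

```python
import re


def normalize_whitespace(text):
    """Replace non-breaking spaces and collapse runs of spaces and tabs."""
    text = text.replace("\xa0", " ")
    # Collapse horizontal whitespace but keep newlines,
    # since the splitter later uses "\n" as its separator.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop single spaces left dangling around line breaks.
    text = re.sub(r" ?\n ?", "\n", text)
    return text.strip()


sample = "Article \xa0\xa0 in\xa0\xa0Journal  \nVolume 03 - Issue 07, 2020  \n"
print(repr(normalize_whitespace(sample)))
```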

In [5]:
raw_text

'See discussions, st ats, and author pr ofiles f or this public ation at : https://www .researchgate.ne t/public ation/343449753\nThe Importance of Paragraph Writing: An Introduction\nArticle \xa0\xa0 in\xa0\xa0HuS S Int ernational Journal of R esearch in Humanities and Social Scienc es · August 2020\nCITATIONS\n11READS\n61,667\n2 author s, including:\nOmid W ali\nNang arhar Univ ersity\n9 PUBLICA TIONS \xa0\xa0\xa011 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nAll c ontent f ollo wing this p age was uplo aded b y Omid W ali on 05 A ugust 2020.\nThe user has r equest ed enhanc ement of the do wnlo aded file.International Journal of Latest Research in Humanities and Social Science (IJLRHSS)  \nVolume 03 - Issue 07, 2020  \nwww.ijlrhss.com || PP. 44-50 \n44 | Page                                                                                                                        www.ijlrhss.com   \nThe Importance of Paragraph Writing:  An Introduction  \n \nOmid Wali  \nPh.D. Research Scholar

Initialize the connection to your database:

In [6]:
cassio.init(token=ASTRA_DB_APPLICATION_TOKEN, database_id=ASTRA_DB_ID)

Create the LangChain embedding and LLM objects for later use:

In [None]:
llm = OpenAI(openai_api_key=OPENAI_API_KEY)
embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

Create your LangChain vector store backed by Astra DB:

In [8]:
astra_vector_store = Cassandra(
    embedding=embedding,
    table_name="LangchainVectorStore",
    session=None,
    keyspace=None,
)

In [9]:
from langchain.text_splitter import CharacterTextSplitter
# Split the text into overlapping chunks with CharacterTextSplitter so that each chunk stays within the model's token limit
text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 800,
    chunk_overlap  = 200,
    length_function = len,
)
texts = text_splitter.split_text(raw_text)
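
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified, pure-Python sketch of separator-based chunking with overlap. It is an illustration under stated assumptions, not LangChain's implementation, and `split_with_overlap` is a hypothetical name. The text is split on the separator, pieces are merged greedily up to `chunk_size` characters, and each new chunk starts with up to `chunk_overlap` characters of trailing pieces from the previous one.

```python
def split_with_overlap(text, separator="\n", chunk_size=800, chunk_overlap=200):
    """Greedy, simplified chunker: merge separator-delimited pieces into chunks.

    A single piece longer than chunk_size is kept whole, so a chunk can
    exceed chunk_size in that case.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        # Close the current chunk if adding this piece would exceed chunk_size.
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Keep trailing pieces as overlap: at most chunk_overlap characters,
            # and only while the new piece still fits next to them.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


demo = "aaaa\nbbbb\ncccc\ndddd"
for chunk in split_with_overlap(demo, chunk_size=9, chunk_overlap=4):
    print(repr(chunk))
```

In the demo, each chunk repeats the last line of the previous chunk, which is the role the 200-character overlap plays for the 800-character chunks above: context that straddles a chunk boundary stays retrievable from either side.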

In [10]:
texts[:50]

['See discussions, st ats, and author pr ofiles f or this public ation at : https://www .researchgate.ne t/public ation/343449753\nThe Importance of Paragraph Writing: An Introduction\nArticle \xa0\xa0 in\xa0\xa0HuS S Int ernational Journal of R esearch in Humanities and Social Scienc es · August 2020\nCITATIONS\n11READS\n61,667\n2 author s, including:\nOmid W ali\nNang arhar Univ ersity\n9 PUBLICA TIONS \xa0\xa0\xa011 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nAll c ontent f ollo wing this p age was uplo aded b y Omid W ali on 05 A ugust 2020.\nThe user has r equest ed enhanc ement of the do wnlo aded file.International Journal of Latest Research in Humanities and Social Science (IJLRHSS)  \nVolume 03 - Issue 07, 2020  \nwww.ijlrhss.com || PP. 44-50',
 'Volume 03 - Issue 07, 2020  \nwww.ijlrhss.com || PP. 44-50 \n44 | Page                                                                                                                        www.ijlrhss.com   \nThe Importance of Paragraph Wri

### Load the text chunks into the vector store



In [11]:
astra_vector_store.add_texts(texts[:50])

print("Inserted %i text chunks." % len(texts[:50]))

astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

Inserted 42 text chunks.


### Run the Q&A cycle

Run the cell and ask a question, or type `quit` to stop. You can also stop execution with the "▪" button on the top toolbar.

Here are some suggested questions:
- How to write a good paragraph?
- What are the kinds of paragraphs?

In [12]:
first_question = True
while True:
    if first_question:
        query_text = input("\nEnter your question (or type 'quit' to exit): ").strip()
    else:
        query_text = input("\nWhat's your next question (or type 'quit' to exit): ").strip()

    if query_text.lower() == "quit":
        break

    if query_text == "":
        continue

    first_question = False

    print("\nQUESTION: \"%s\"" % query_text)
    answer = astra_vector_index.query(query_text, llm=llm).strip()
    print("ANSWER: \"%s\"\n" % answer)

    print("FIRST DOCUMENTS BY RELEVANCE:")
    for doc, score in astra_vector_store.similarity_search_with_score(query_text, k=4):
        print("    [%0.4f] \"%s ...\"" % (score, doc.page_content[:84]))


QUESTION: "How to write a good topic sentence?"
ANSWER: "A good topic sentence should have descriptive adjectives before the noun, use words like "several" and "many", or include numbers. These features can help the writer develop their paragraph and provide enough information for the reader."

FIRST DOCUMENTS BY RELEVANCE:
    [0.9371] "Example: The University of Georgia is the first public chartered university in the s ..."
    [0.9326] "discuss each one of them briefl y (Boardman & Frydenberg, 2008 ).  
 
2.2. The Topic ..."
    [0.9296] " You write a topic sentence  
 You eliminate irrelevant ideas  
 You make an outl ..."
    [0.9228] "understanding, here is a diagram  of English style organization  (Boardman, 2008).   ..."

QUESTION: "How to write atention-grabbing sentences?"
ANSWER: "To write attention-grabbing sentences, start by using descriptive adjectives before nouns. You can also use words like "several" or "many" to add interest, or incorporate numbers into your sent