# Question your document using LangChain

I created a PDF document named 'highway' and saved it on my local drive. The text of the document is:

"I went on my bucket-list road trip all the way to the Arctic Ocean via the Alaska Highway and the Dempster Highway. I live
in Edmonton and always wanted to go there, as Canada is the only country that has a public road to the
Arctic Ocean. I left on May 29, 2024 and returned home on June 12, 2024. I went to Whitehorse via the Stewart-Cassiar
Highway, and on my way back I took the Alaska Highway from Whitehorse."

Below, we use LangChain to ask questions about this document. This is only a small demo; in real-world use, the same
approach can be a powerful way to get answers from large documents.
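For a one-paragraph PDF, the whole extracted text can be passed to the model as the context. For large documents, a common approach is to split the text into overlapping chunks and send only the most relevant ones to the question-answering model. The sketch below illustrates that idea under stated assumptions: `chunk_text` and `top_chunks` are hypothetical helpers (not part of LangChain or transformers), chunk sizes are counted in words rather than model tokens, and relevance is a naive keyword-overlap score standing in for the embedding-based retrieval that production systems usually use. LangChain also ships its own text splitters (for example `RecursiveCharacterTextSplitter`) that can replace `chunk_text`.

```python
import re


def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping windows of `chunk_size` words (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


def top_chunks(chunks, question, k=2):
    """Return the k chunks sharing the most words with the question (naive keyword overlap)."""
    question_words = set(re.findall(r"\w+", question.lower()))

    def overlap_score(chunk):
        return len(question_words & set(re.findall(r"\w+", chunk.lower())))

    return sorted(chunks, key=overlap_score, reverse=True)[:k]
```

With the notebook code that follows, the selected chunks could be joined and used as the context, e.g. `answer_question_llm(" ".join(top_chunks(chunk_text(context), question)), question)`. The overlap keeps a sentence that straddles a chunk boundary intact in at least one chunk.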

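# Import required libraries

import warnings
warnings.filterwarnings("ignore")  # hide Python warnings (e.g. deprecation notices) in the notebook output

from transformers import pipeline
# LangChain wrapper for Hugging Face pipelines
# (in older LangChain releases: from langchain.llms import HuggingFacePipeline)
from langchain_community.llms import HuggingFacePipeline
import PyPDF2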

# Function to read a PDF and extract its text
def read_pdf(file_path):
    with open(file_path, "rb") as file:
        pdf_reader = PyPDF2.PdfReader(file)
        # extract_text() may give an empty result for image-only pages;
        # join pages with a newline so words at page breaks do not run together
        pages = [page.extract_text() or "" for page in pdf_reader.pages]
    return "\n".join(pages)

# Load the extractive question-answering model once, rather than on every call
hf_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

# LangChain wrapper around the same pipeline. It is not used for the answer below:
# HuggingFacePipeline targets text-generation-style tasks, so the
# question-answering call goes straight to the Hugging Face pipeline.
wrapped_llm = HuggingFacePipeline(pipeline=hf_pipeline)

# Function to answer a question about the given context
def answer_question_llm(context, question):
    # The QA pipeline returns the span of `context` that best answers `question`,
    # as a dict with "answer", "score", "start" and "end" keys
    answer = hf_pipeline(question=question, context=context)
    return answer["answer"]

# Specify the path to your PDF file
pdf_path = r"C:\Users\Fawad\highway.pdf"

# Read the PDF file
context = read_pdf(pdf_path)

# Example question
question = "where do I live?"

# Get the answer
answer = answer_question_llm(context, question)
print(f"Answer: {answer}")

Answer: Edmonton
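An extractive QA pipeline always returns some span of the context, even when the document does not actually contain the answer, so it can help to look at the confidence score and at where the span sits in the text. The sketch below assumes `result` is the full dictionary returned by the Hugging Face question-answering pipeline (with `"answer"`, `"score"`, `"start"` and `"end"` keys) rather than just `answer['answer']`. `format_answer` is a hypothetical helper, and the `min_score` cut-off of 0.3 is an arbitrary assumption to tune for your own documents.

```python
def format_answer(context, result, min_score=0.3, window=30):
    """Show the answer span inside its surrounding text, or None if the score is too low.

    Assumes `result` has "score", "start" and "end" keys, as returned by a
    transformers question-answering pipeline. `min_score` is a hypothetical cut-off.
    """
    if result["score"] < min_score:
        return None  # treat low-confidence spans as "not found"
    start, end = result["start"], result["end"]
    before = context[max(0, start - window):start]  # text just before the answer
    after = context[end:end + window]                # text just after the answer
    return f"...{before}[{context[start:end]}]{after}..."
```

Because `start` and `end` are character offsets into the context, the bracketed text lets you check that the model picked a sensible passage, which matters more as the source document grows.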
