
<div class="markdown-google-sans">
<h2>ChatGPT-PDF</h2>
</div>

This chatbot answers frequently asked questions (FAQs) using OpenAI models and LangChain. Its knowledge base is a PDF document containing the FAQ content. Given a user's query, the chatbot retrieves the most relevant passages from the PDF and uses them to generate an answer.

<div class="markdown-google-sans">

## **Install Dependencies**
</div>



In [None]:
pip install -r requirements.txt

<div class="markdown-google-sans">

## **Train Dataset**
</div>

1.   Data Collection

> The project began with collecting a PDF file containing frequently asked questions and their corresponding answers. This PDF file served as the knowledge base for the chatbot.

2.   PDF Processing

> The text was extracted from the PDF file and converted into a structured format that the chatbot could work with, using a library such as PyPDF2 or another PDF processing tool.
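
As a rough illustration of the PDF processing step, the sketch below extracts text page by page and tidies the whitespace. It assumes PyPDF2 2.x or later (`PdfReader`, `reader.pages`, `page.extract_text()`). The helper names `normalize_text` and `extract_pdf_text` are hypothetical and are not taken from the project's code.

```python
import re


def normalize_text(raw):
    """Collapse runs of spaces/tabs and drop blank lines from extracted text."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


def extract_pdf_text(path):
    """Read every page of the PDF at `path` and return its cleaned text."""
    # Imported here so that normalize_text can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty value for image-only pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    return normalize_text("\n".join(pages))
```

For example, `extract_pdf_text("./data/faq.pdf")` would return the FAQ text as a single string, where the file name is only a placeholder.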





<div class="markdown-google-sans">

## **Training the Model**
</div>


1.   Open the **app.py** file in your code editor.
2.   Locate the section where the OpenAI API key needs to be added.
3.   Insert your OpenAI API key in the designated place. If you haven't obtained an API key yet, make sure to sign up on the OpenAI platform and acquire your API key.
4.   Save the **app.py** file.
5.   You are now ready to initiate the training process.

**Note:** Ensure that you keep your API key confidential and do not share it publicly.
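
One way to keep the key out of the source file is to read it from an environment variable instead of writing it into **app.py**. The following is a minimal sketch of that alternative, not the project's own approach. The helper name `get_openai_api_key` is hypothetical.

```python
import os


def get_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    # Treat a missing value or the unedited placeholder as "not configured".
    if not key or key == "OPENAI_API_KEY":
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAI API key."
        )
    return key
```

With this helper, the key is set once in the shell (for example `export OPENAI_API_KEY=...` on Linux or macOS) and never appears in version control.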



1.   Open your terminal or command prompt.
2.   Execute the following command:

In [None]:
python3 app.py



3.   To train the model, execute app.py. This script will utilize your training dataset. Upon completion, the trained model will be ready for deployment.




In [None]:
import os

from langchain.document_loaders import DirectoryLoader
from PyPDF2 import PdfReader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

from langchain.chains import RetrievalQA

# Placeholder: replace the string below with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = 'OPENAI_API_KEY'

docsearch = None
chain = None
rqa = None
directory = './data'

def load_docs(directory):
    # Load every document found in the given directory
    loader = DirectoryLoader(directory)
    print(loader)
    documents = loader.load()
    return documents

def loadPDF():
    print("LOAD PDF")

    raw_text = ''
    documents = load_docs(directory)
    for document in documents:
        if document.page_content:
            raw_text +=  document.page_content
    print("raw_text : ", raw_text)

    text_splitter = CharacterTextSplitter(
        separator = "\n",
        chunk_size = 1000,
        chunk_overlap  = 200, #striding over the text
        length_function = len,
    )
    texts = text_splitter.split_text(raw_text)


    embeddings = OpenAIEmbeddings()

    global docsearch
    docsearch = FAISS.from_texts(texts, embeddings)
    global chain
    chain = load_qa_chain(OpenAI(), chain_type="stuff") # we are going to stuff all the docs in at once


    retriever = docsearch.as_retriever(search_type="similarity", search_kwargs={"k":4})

    global rqa
    rqa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever, return_source_documents=True)

    print("PDF LOADED")
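
To make the `chunk_size` and `chunk_overlap` settings above more concrete, here is a simplified pure-Python sketch of splitting text on a separator and packing the pieces into overlapping chunks. It illustrates the general idea only and is not LangChain's actual `CharacterTextSplitter` implementation. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Pack separator-delimited pieces into chunks of at most chunk_size
    characters, carrying up to chunk_overlap trailing characters forward."""
    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = []  # pieces belonging to the chunk being built

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The chunk is full: emit it, then keep only a tail as overlap.
            chunks.append(separator.join(current))
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        # A single piece longer than chunk_size is kept whole in this sketch.
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks
```

Neighbouring chunks share some text, so an answer that straddles a chunk boundary is less likely to be cut in half before the chunks are embedded.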


4.   Once the training process is complete, the API will be accessible on port 3001, allowing users to interact with the chatbot.

    Link: http://localhost:3001/

# Example

1.   Which languages are supported?
2.   Is it supported in desktop offline mode?
3.   Which solutions are currently offered?
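
A question like these could be sent to the retrieval chain from Python roughly as sketched below. The sketch assumes the legacy LangChain `RetrievalQA` interface implied by the imports in this notebook, where the chain is called with a `{"query": ...}` dictionary and, with `return_source_documents=True`, returns the keys `result` and `source_documents`. The helper name `ask` is hypothetical.

```python
def ask(qa, question):
    """Send one question to a RetrievalQA-style callable and summarise the reply."""
    response = qa({"query": question})
    answer = response.get("result", "").strip()
    sources = response.get("source_documents", [])
    return {
        "question": question,
        "answer": answer,
        "num_sources": len(sources),  # how many retrieved chunks backed the answer
    }
```

After `loadPDF()` has run, `ask(rqa, "Which languages are supported?")` would return the answer text together with the number of retrieved source chunks.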

# Implementation Demo link
  
1.   link: http://122.169.118.18:3001/

