# RAG - Q&A Application

## Import the API key

In [2]:
import os
from dotenv import load_dotenv, find_dotenv 

load_dotenv(find_dotenv(), override=True)

API_OPENAI = os.getenv("API_OAI_KEY")
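
`os.getenv` returns `None` when the variable is missing, so a misspelled name in the `.env` file would only surface later, when the first API call fails. A minimal sketch of a fail-fast check is shown here; `require_env` is a hypothetical helper introduced for illustration, not part of `python-dotenv` or of this notebook.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`; raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value

# Hypothetical usage, mirroring the variable name used in the cell above:
# API_OPENAI = require_env("API_OAI_KEY")
```

Raising early keeps the error message close to its cause instead of burying it in a later authentication failure.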

## Loading PDF

### Creating a loading function

In [9]:
def load_document(file):
    import os
    _, extension = os.path.splitext(file)
    if extension == ".pdf":
        from langchain.document_loaders import PyPDFLoader  # requires the `pypdf` package to be installed
        print(f"Loading {file}...")
        loader = PyPDFLoader(file)
    elif extension == ".docx":
        from langchain.document_loaders import Docx2txtLoader  # requires the `docx2txt` package to be installed
        print(f"Loading {file}...")
        loader = Docx2txtLoader(file)
    else:
        # Unsupported extension: return early so `loader` is never used undefined
        print(f"Document format {extension} is not supported!")
        return None
    data = loader.load()
    print("Done ... ")
    return data
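
The extension check in `load_document` can also be written as a lookup table, which makes it easier to add formats and to handle upper-case extensions such as `.PDF`. The sketch below only maps extensions to loader class names and does not import LangChain; `LOADER_NAMES` and `pick_loader_name` are hypothetical names used for illustration.

```python
import os

# Hypothetical lookup table: file extension -> name of the LangChain loader class
LOADER_NAMES = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
}

def pick_loader_name(file):
    """Return the loader class name for `file`, or None if the extension is unsupported."""
    _, extension = os.path.splitext(file)
    # Lower-case the extension so "REPORT.DOCX" is treated like "report.docx"
    return LOADER_NAMES.get(extension.lower())

print(pick_loader_name("us_constitution.pdf"))  # PyPDFLoader
print(pick_loader_name("REPORT.DOCX"))          # Docx2txtLoader
print(pick_loader_name("notes.txt"))            # None
```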

### Running the code

In [5]:
data = load_document(file="us_constitution.pdf")
print(f"You have {len(data)} pages in your data")

Loading us_constitution.pdf...
Done ... 
You have 41 pages in your data


In [10]:
data = load_document(file="the_great_gatsby.docx")
print(f"You have {len(data)} pages in your data")

Loading the_great_gatsby.docx...
Done ... 
You have 1 pages in your data


## External source, e.g., Wikipedia

In [12]:
# pip install wikipedia -q
def load_from_wikipedia(query, lang="en", load_max_docs=2):
    from langchain.document_loaders import WikipediaLoader
    loader = WikipediaLoader(query=query, lang=lang, load_max_docs=load_max_docs)
    data = loader.load()
    return data

In [13]:
data = load_from_wikipedia("GPT4")
print(data[0].page_content)

Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model trained and created by OpenAI and the fourth in its series of GPT foundation models. It was launched on March 14, 2023, and made publicly available via the paid chatbot product ChatGPT Plus, via OpenAI's API, and via the free chatbot Microsoft Copilot.  As a transformer-based model, GPT-4 uses a paradigm where pre-training using both public data and "data licensed from third-party providers" is used to predict the next token. After this step, the model was then fine-tuned with reinforcement learning feedback from humans and AI for human alignment and policy compliance.: 2 
Observers reported that the iteration of ChatGPT using GPT-4 was an improvement on the previous iteration based on GPT-3.5, with the caveat that GPT-4 retains some of the problems with earlier revisions. GPT-4, equipped with vision capabilities (GPT-4V), is capable of taking images as input on ChatGPT. OpenAI has not revealed technical 