# Rememberizer

>[Rememberizer](https://rememberizer.ai/) is a knowledge enhancement service for AI applications created by SkyDeck AI Inc.

This notebook shows how to retrieve documents from `Rememberizer` as `Document` objects, the format used by downstream components.

# Preparation

You will need an API key, which you can get after creating a common knowledge at [https://rememberizer.ai](https://rememberizer.ai/). Once you have the key, either set it as the environment variable `REMEMBERIZER_API_KEY` or pass it as `rememberizer_api_key` when initializing `RememberizerRetriever`.

`RememberizerRetriever` has these arguments:
- optional `top_k_results`: default=10. Use it to limit the number of returned documents.
- optional `rememberizer_api_key`: needed only if the environment variable `REMEMBERIZER_API_KEY` is not set.
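
The two ways of supplying the key can be pictured with a small pre-flight check. This is only a sketch of the lookup described above, not the library's code: `resolve_api_key` is a hypothetical helper, and the precedence it uses (explicit argument first, then the environment variable) is an assumption.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "REMEMBERIZER_API_KEY"


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the API key to use, or raise if none is available.

    Hypothetical helper, not part of the library. Assumption: a key
    passed explicitly wins over the environment variable.
    """
    if explicit:
        return explicit
    key = env.get(ENV_VAR, "")
    if not key:
        raise ValueError(
            f"Set {ENV_VAR} or pass rememberizer_api_key explicitly."
        )
    return key
```

Running a check like this before creating the retriever surfaces a missing key as a clear error up front rather than at the first query.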

`get_relevant_documents()` has one argument, `query`: free text that is used to find documents in the common knowledge of `Rememberizer.ai`.

# Examples

## Basic usage

In [1]:
# Setup API key
from getpass import getpass

REMEMBERIZER_API_KEY = getpass()

In [2]:
import os
from langchain.retrievers import RememberizerRetriever

os.environ["REMEMBERIZER_API_KEY"] = REMEMBERIZER_API_KEY
retriever = RememberizerRetriever(top_k_results=5)

In [3]:
docs = retriever.get_relevant_documents(query="How does Large Language Models works?")

In [4]:
docs[0].metadata  # meta-information of the Document

{'id': 13646493,
 'document_id': '17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP',
 'name': 'What is a large language model (LLM)_ _ Cloudflare.pdf',
 'type': 'application/pdf',
 'path': '/langchain/What is a large language model (LLM)_ _ Cloudflare.pdf',
 'url': 'https://drive.google.com/file/d/17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP/view',
 'size': 337089,
 'created_time': '',
 'modified_time': '',
 'integration': {'id': 347, 'integration_type': 'google_drive'}}

In [5]:
print(docs[0].page_content[:400])  # first 400 characters of the Document's content

before, or contextualized in new ways. on some level they " understand " semantics in that they can associate words and concepts by their meaning, having seen them grouped together in that way millions or billions of times. how developers can quickly start building their own llms to build llm applications, developers need easy access to multiple data sets, and they need places for those data sets 
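
The metadata shown for `docs[0]` above can be used to list where each retrieved document came from. The following is a minimal sketch: `summarize_sources` is a hypothetical helper, and it assumes every document exposes a `metadata` dict with the same keys as in the output above (`name`, `url`, `integration`). A `SimpleNamespace` stands in for a retrieved document so the snippet runs without an API key.

```python
from types import SimpleNamespace


def summarize_sources(docs):
    """Return one 'name [integration] url' line per document.

    Hypothetical helper; missing metadata keys fall back to placeholders.
    """
    lines = []
    for doc in docs:
        meta = doc.metadata
        # 'integration' is a nested dict in the sample output above.
        integration = (meta.get("integration") or {}).get(
            "integration_type", "unknown"
        )
        name = meta.get("name", "<unnamed>")
        url = meta.get("url", "")
        lines.append(f"{name} [{integration}] {url}".rstrip())
    return lines


# Stand-in for a retrieved Document, using fields from the output above.
example = SimpleNamespace(
    metadata={
        "name": "What is a large language model (LLM)_ _ Cloudflare.pdf",
        "url": "https://drive.google.com/file/d/17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP/view",
        "integration": {"id": 347, "integration_type": "google_drive"},
    }
)

for line in summarize_sources([example]):
    print(line)
```

With real results, the same call would be `summarize_sources(docs)`.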


## Usage in a chain

In [6]:
OPENAI_API_KEY = getpass()

In [7]:
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

In [8]:
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model_name="gpt-3.5-turbo") 
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)

In [10]:
questions = [
    "What is RAG?",
    "How does Large Language Models works?",
]
chat_history = []

for question in questions:
    result = qa.invoke({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")

-> **Question**: What is RAG? 

**Answer**: RAG stands for Retrieval-Augmented Generation. It is an AI framework that retrieves facts from an external knowledge base to enhance the responses generated by large language models, such as LLMs. This helps ensure the information provided is accurate and up-to-date, as well as allows users to understand the generative process of the models. 

-> **Question**: How does Large Language Models works? 

**Answer**: Large Language Models (LLMs) work by analyzing massive datasets of language, typically gathered from the internet, to comprehend and generate human language text. LLMs are built on machine learning, specifically using a type of neural network called transformer models. These models use a mathematical technique called self-attention to understand context in human language, allowing them to interpret and generate text even in new or vague contexts. LLMs are trained via deep learning, where they learn to recognize distinctions between cha