<a href="https://colab.research.google.com/github/kamazoun/promptAI/blob/main/project_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## First, let's make sure that we have the dependencies we need by installing them from requirements.txt

In [None]:
!pip install -r requirements.txt -q

## Now that we have the packages installed, let's import them and load the API keys that we will use throughout the project.

In [None]:
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv(), override=True)

True
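The `True` above only indicates that `load_dotenv` set at least one variable; it does not confirm that every key used later is present. A minimal sketch of such a check follows. `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here, and `OPENAI_API_KEY` is assumed to be the variable the LangChain OpenAI wrapper reads, while `PINECONE_API_KEY` and `PINECONE_ENV` are the names this notebook uses further down.

```python
import os

# Assumed list of variables this notebook relies on
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV")

def missing_keys(env, names=REQUIRED_KEYS):
    """Return the names that are absent or empty in the given mapping."""
    return [name for name in names if not env.get(name)]

missing = missing_keys(os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after loading the `.env` file surfaces a missing key early instead of as an authentication error in a later cell.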

## Let's get some answers from the OpenAI API through LangChain

In [None]:
# Import the OpenAI wrapper
from langchain.llms import OpenAI

llm = OpenAI(model_name='text-davinci-003', temperature=0.7, max_tokens=128)
query = 'Can you tell me what is the original paper of bitcoin?'
llm(query)


'\n\nThe original paper of Bitcoin is titled "Bitcoin: A Peer-to-Peer Electronic Cash System" and was published in 2008 by Satoshi Nakamoto. It can be found at https://bitcoin.org/bitcoin.pdf.'

## Good. We will use the link that the Davinci engine provided us to download the original Bitcoin paper and build a question answering app that will help us understand it.
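The notebook does not show the download step itself. One possible way to do it with only the standard library is sketched below; `download_if_missing` is a hypothetical helper, and the local file name `bitcoin.pdf` is taken from the cell that loads the paper later on.

```python
import os
from urllib.request import urlretrieve

def download_if_missing(url, path):
    """Download url to path unless the file already exists; return the path."""
    if not os.path.exists(path):
        urlretrieve(url, path)
    return path
```

Calling `download_if_missing('https://bitcoin.org/bitcoin.pdf', 'bitcoin.pdf')` once would place the paper next to the notebook, and repeated runs would skip the network request.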

### First let's write a function that will load the content of the Bitcoin PDF paper.

In [None]:
from langchain.document_loaders import PyPDFLoader

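def load_pdf(path):
    '''
        Loads a PDF file and returns its pages as a list of LangChain documents
    '''
    _, extension = os.path.splitext(path)

    # Compare case-insensitively so that files such as 'PAPER.PDF' are accepted
    if extension.lower() != '.pdf':
        raise ValueError('Please provide a PDF file')

    loader = PyPDFLoader(path)
    return loader.load()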


### Let's test the function by getting the content of the PDF and storing it in a variable

In [None]:
data = load_pdf('bitcoin.pdf')

print(f'Your document contains {len(data)} pages')

Your document contains 9 pages


### Now, let's write some functions to help us break the document content into chunks and embed it into a Pinecone vector store

In [None]:
import pinecone
import tiktoken
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
pinecone.init(api_key=os.environ.get('PINECONE_API_KEY'), environment=os.environ.get('PINECONE_ENV'))

def chunk_data(data):
    '''
        Breaks the data into chunks
    '''
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=256, chunk_overlap=0)
    return text_splitter.split_documents(data)


def embedding_cost(texts):
    '''
        Prints the estimated cost of embedding the chunks,
        assuming a price of 0.0004 USD per 1,000 tokens
    '''
    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum(len(enc.encode(page.page_content)) for page in texts)
    print(f'Embedding Cost in USD: {total_tokens / 1000 * 0.0004:.6f}')


def insert_embedding(index_name, chunks):
    '''
        Creates a Pinecone index for the embeddings
    '''
    pinecone.create_index(index_name, dimension=1536, metric='cosine')
    vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)
    return vector_store


def fetch_embeddings(index_name):
    '''
        Fetches the vector store from a Pinecone index that should already exist.
        Returns None if the index does not exist.
    '''
    if index_name in pinecone.list_indexes():
        print(f'Index {index_name} already exists. Loading embeddings ... ', end='')
        vector_store = Pinecone.from_existing_index(index_name, embeddings)
        print('Done')
        return vector_store
    print(f'Index {index_name} does not exist.')
    return None


def delete_pinecone_index(index_name='all'):
    '''
        Deletes the Pinecone index with the given name.
        If no name is given, deletes all indexes.
    '''
    if index_name == 'all':
        indexes = pinecone.list_indexes()
        print('Deleting all indexes ... ')
        for index in indexes:
            pinecone.delete_index(index)
    else:
        print(f'Deleting index {index_name} ...', end='')
        pinecone.delete_index(index_name)
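To make the `chunk_size` and `chunk_overlap` parameters of `chunk_data` more concrete, here is a simplified sketch of character-window chunking. It is an illustration only, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter also tries to break on separators such as paragraph breaks and spaces. `fixed_size_chunks` is a hypothetical name.

```python
def fixed_size_chunks(text, chunk_size=256, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError('chunk_overlap must be >= 0 and smaller than chunk_size')
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = 'a' * 600
print([len(c) for c in fixed_size_chunks(sample)])           # [256, 256, 88]
print([len(c) for c in fixed_size_chunks(sample, 256, 56)])  # [256, 256, 200]
```

With an overlap of 0, as in `chunk_data`, a sentence that straddles a boundary is cut in two; a non-zero overlap repeats the end of one chunk at the start of the next, at the price of embedding a few more tokens.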






### Let's get the data chunks and calculate the embedding costs

In [None]:
chunks = chunk_data(data)
print(f'The data has been broken into {len(chunks)} chunks')

In [None]:
embedding_cost(chunks)

Embedding Cost in USD: 0.002017
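The figure above is plain arithmetic: the token count divided by 1,000, multiplied by the price per 1,000 tokens. At the 0.0004 USD rate hard-coded in `embedding_cost`, 0.002017 USD corresponds to roughly 5,040 tokens for the whole paper. The sketch below separates the arithmetic from the tokenizer so it can be checked by hand; `embedding_cost_usd` is a hypothetical helper, and the default rate is simply the one used in this notebook, which may differ from current pricing.

```python
def embedding_cost_usd(total_tokens, usd_per_1k_tokens=0.0004):
    """Cost in USD of embedding total_tokens at a per-1,000-token price."""
    return total_tokens / 1000 * usd_per_1k_tokens

print(f'{embedding_cost_usd(5000):.6f}')  # 0.002000
```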


### Let's create the vector store

In [None]:
# The chosen Pinecone index name
pinecone_index = 'bitcoin-paper'

# Now we create the vector store
vector_store = insert_embedding(pinecone_index, chunks)
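Running the cell above a second time is likely to fail, because `insert_embedding` always tries to create the index. One way to make the step repeatable is to load the index when it exists and create it otherwise. The sketch below is a hypothetical helper, `get_or_create_store`, that takes the three operations as arguments so it does not need a live Pinecone account; in this notebook they would be `pinecone.list_indexes`, `fetch_embeddings` and `insert_embedding`.

```python
def get_or_create_store(index_name, chunks, list_indexes, load_existing, create_new):
    """Load the vector store if the index exists, otherwise create it.

    list_indexes()           -> iterable of existing index names
    load_existing(name)      -> vector store for an existing index
    create_new(name, chunks) -> vector store for a newly created index
    """
    if index_name in list_indexes():
        return load_existing(index_name)
    return create_new(index_name, chunks)

# Possible use in this notebook (assumes the helpers defined earlier):
# vector_store = get_or_create_store(
#     pinecone_index, chunks,
#     pinecone.list_indexes, fetch_embeddings, insert_embedding)
```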

### We can verify in our Pinecone account that the index has been created with the correct number of vectors

## Now let's build the actual question answering function

In [None]:
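def ask_with_memory(vector_store, question, chat_history=None):
    '''
        Retrieves the chunks most similar to the question from the vector store,
        then asks the chat model to answer from them, taking the previous
        (question, answer) pairs in chat_history into account.
        Returns the chain result and the updated chat history.
    '''
    # Avoid a mutable default argument, which would be shared between calls
    if chat_history is None:
        chat_history = []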
    from langchain.chains import ConversationalRetrievalChain
    from langchain.chat_models import ChatOpenAI

    llm = ChatOpenAI(temperature=1)
    retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': 3})

    crc = ConversationalRetrievalChain.from_llm(llm, retriever)
    result = crc({'question': question, 'chat_history': chat_history})
    chat_history.append((question, result['answer']))

    return result, chat_history
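The retriever is configured with `search_type='similarity'` and `k=3`, and the index was created with `metric='cosine'`. Conceptually, the question is embedded and the three stored vectors with the highest cosine similarity are returned. The sketch below illustrates that idea in plain Python with made-up two-dimensional vectors; it is not how Pinecone implements the search, and `cosine_similarity`, `top_k` and the sample data are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, stored, k=3):
    """Return the k (text, score) pairs most similar to query_vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "index": chunk labels with invented 2-D embeddings
stored = [
    ('proof-of-work', [1.0, 0.0]),
    ('timestamp server', [0.7, 0.7]),
    ('privacy', [0.0, 1.0]),
    ('incentive', [-1.0, 0.0]),
]
for text, score in top_k([1.0, 0.1], stored, k=3):
    print(f'{text}: {score:.3f}')
```

Only the chunks returned by this step reach the chat model, so a broad question can receive a "not enough context" answer when the three closest 256-character chunks happen to be uninformative.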

### Let's build a loop to ask questions continuously

In [None]:
i = 1
chat_history = []

print("Write Quit or Exit to quit")
while True:
    q = input(f"Question #{i} ")
    i += 1
    # Strip surrounding whitespace so that ' quit ' is also recognised
    if q.strip().lower() in ["quit", "exit"]:
        print("Quitting")
        break
    result, _ = ask_with_memory(vector_store, q, chat_history)
    print(result['answer'])
    print("----------------------------------------------------------------------")

Write Quit or Exit to quit


Question #1 What does the document talk about?


The given context does not provide enough information to determine what the document is specifically about.
----------------------------------------------------------------------


Question #2 Can you sum up the goal of bitcoin?


The goal of Bitcoin, as outlined in the context, is to be a purely peer-to-peer version of electronic cash. It aims to provide a decentralized system for online transactions without the need for a central authority to issue currency. The goal is to create a digital currency system that is secure, efficient, and free from inflation.
----------------------------------------------------------------------


Question #3 What are its main components?


(Translated from the French response) There is not enough context to answer this question. Perhaps you are referring to a specific text or document that we do not have here. Could you provide more details or clarify your question?
----------------------------------------------------------------------


Question #4 does the context contain explanation on how to build a bitcoin network?


(Translated from the French response) Yes, the context gives a general explanation of how a Bitcoin network is built. It mentions that the Bitcoin network uses proof-of-work to record a public history of transactions, which quickly becomes practically impossible for an attacker to modify if the nodes are honest. However, specific details on building a Bitcoin network are not provided in this context.
----------------------------------------------------------------------


Question #5 quit


Quitting
