# LangChain
Learning about LangChain

In [1]:
# Load the OpenAI API key
import os
from dotenv import load_dotenv

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
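
`os.getenv` returns `None` when the variable is missing, so a misconfigured `.env` file only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value


# In this notebook the call would be (after load_dotenv()):
# openai_api_key = require_env("OPENAI_API_KEY")
```

Raising immediately keeps the error close to its cause instead of inside the first model call.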

## Example
A LangChain code example

In [2]:
from langchain_openai import ChatOpenAI

# Initialize the OpenAI model
model = ChatOpenAI(
    openai_api_key=openai_api_key,  # Your OpenAI API key
    model="gpt-4",                  # Model name, e.g., "gpt-4o", "gpt-4o-mini"
    temperature=0.7,                # Controls creativity of the output; 0.0 for deterministic, higher for more diverse outputs
    max_tokens=150,                 # Maximum number of tokens to generate
    n=1,                            # Number of completions to generate
    stop=None,                      # Stop sequence; can be a string or list of strings where the API should stop generating further tokens
    presence_penalty=0.6,           # Penalizes new tokens based on their presence in the text so far; helps avoid repetition
    frequency_penalty=0.5           # Penalizes new tokens based on their frequency in the text so far; reduces repetition
)

# Define the prompt
prompt = "Explain the basics of neural networks in simple terms."

# Generate the response
response = model.invoke(prompt)

# Display the result
print(response.content)

Neural networks are computer systems that are designed to mimic the way the human brain learns and processes information. 

At its most basic, a neural network takes in inputs, which it then processes in hidden layers using weights that are adjusted during training. Then it gives an output. The weights are adjusted to find patterns in data.

These networks consist of three main parts: the input layer, hidden layers, and output layer. The input layer receives various forms of information from the outside world; this is the data that the network will learn about. The hidden layers process this input by weighing its importance (i.e., assigning numerical values), and then create connections between various inputs based on these values. Finally, the output layer provides the result – what the network
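
The output above stops mid-sentence, which is consistent with the `max_tokens=150` cap set on the model. A minimal sketch for detecting this case, assuming the response metadata carries a `finish_reason` field whose value is `"length"` when the token limit was hit (as OpenAI chat completions report it); `was_truncated` is a hypothetical helper.

```python
def was_truncated(response_metadata: dict) -> bool:
    """Return True when the model stopped because it ran out of tokens."""
    return response_metadata.get("finish_reason") == "length"


# With the objects from the cell above this would be called as:
# was_truncated(response.response_metadata)
print(was_truncated({"finish_reason": "length"}))  # True
print(was_truncated({"finish_reason": "stop"}))    # False
```

If the check returns `True`, raising `max_tokens` or asking for a shorter answer in the prompt are the usual remedies.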


## Installing libraries

!pip install langchain-community unstructured langchain-openai openai faiss-cpu

!pip install msoffcrypto-tool    # the error raised by the Excel loader's load() asked for this package
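
Since one dependency only showed up as a runtime error, it can help to check the imports up front. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and the import names in `required` are assumptions about how each pip package is imported (for example, `faiss-cpu` as `faiss` and `msoffcrypto-tool` as `msoffcrypto`).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names for the packages installed above
required = [
    "langchain_community",
    "unstructured",
    "langchain_openai",
    "openai",
    "faiss",
    "msoffcrypto",
]
print("Missing:", missing_modules(required))
```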

## Excel

In [2]:
from langchain_community.document_loaders import UnstructuredExcelLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from IPython.display import display, Markdown

In [36]:
excel_path = "6.langchain/Reviews.xlsx"

In [29]:
# Load the Excel data
# mode="elements" splits the data into individual components
loader = UnstructuredExcelLoader(excel_path, mode="elements")
docs = loader.load()

In [None]:
# Show the first 5 elements
docs[:5]

In [34]:
# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(docs)
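
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-width chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter first tries to break on separators such as paragraph breaks, newlines, and spaces); `sliding_chunks` is a hypothetical helper that only illustrates how the two parameters interact.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the tail of the previous one, so a review that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.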

In [None]:
# Show the first 5 chunks
chunks[:5]

In [11]:
# Embeddings with the OpenAI embedding model
embeddings = OpenAIEmbeddings(
    openai_api_key=openai_api_key,
    model="text-embedding-3-large"
)

# Build the FAISS index
db_faiss = FAISS.from_documents(chunks, embeddings)
db_faiss

<langchain_community.vectorstores.faiss.FAISS at 0x709044c0acf0>

In [30]:
# First attempt at a retrieval system
query = "give me my worst reviews"

# Retrieve the context. LangChain's FAISS store defaults to L2 (Euclidean) distance,
# not cosine distance, so a lower score means a closer match
docs_faiss = db_faiss.similarity_search_with_score(query, k = 5)
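
A note on the scores returned by `similarity_search_with_score`: with the default settings, LangChain's FAISS store uses an L2-based distance, so lower is better. For unit-length vectors (which, to my understanding, OpenAI embeddings are), L2 and cosine rank results identically because d² = 2 − 2·cos. The sketch below checks that identity on toy 2-D vectors; all names and values are illustrative only.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.hypot(*v)
    return [x / norm for x in v]


query_vec = normalize([1.0, 0.0])
doc_vecs = {"close": normalize([0.9, 0.1]), "far": normalize([0.1, 0.9])}

for name, vec in doc_vecs.items():
    d = l2_distance(query_vec, vec)
    c = cosine_similarity(query_vec, vec)
    # For unit vectors, squared L2 distance equals 2 - 2 * cosine similarity
    print(name, round(d, 4), round(c, 4), math.isclose(d ** 2, 2 - 2 * c))
```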

In [None]:
# Show the 5 retrieved (document, score) pairs
docs_faiss[:5]

In [25]:
# Inspect the top result in docs_faiss
docs_faiss[0][0].page_content

'2021-12-20 08:30:29+00:00 5 Master Time Series Analysis and Forecasting with Python 2024 Leo Tišljarić 2021-12-18 16:56:12+00:00 5 Master Time Series Analysis and Forecasting with Python 2024 Nikhitha 2021-12-18 10:51:14+00:00 4.5 Econometrics and Statistics for Business in R & Python Konstantinos Tsiamasiotis 2021-12-18 04:59:38+00:00 5 Econometrics and Statistics for Business in R & Python Angeliki Kafritsa 2021-12-17 08:37:51+00:00 5 Econometrics and Statistics for Business in R & Python Rosie Liu 2021-12-16 22:21:46+00:00 4 Econometrics and Statistics for Business in R & Python Renuka Gusain 2021-12-16 11:10:09+00:00 4.5 Master Time Series Analysis and Forecasting with Python 2024 Yhasreen Abrahim 2021-12-16 10:33:42+00:00 3 content too simple and basic time series XGBoost for Business: Machine Learning Course in Python & R Jacob Lackey 2021-12-16 04:00:47+00:00 3 Too many spelling errors and grammatical mistakes on the powerpoint slides Econometrics and Statistics for Business 

In [22]:
context_text = "\n\n".join([doc.page_content for doc, _score in docs_faiss])
context_text

"2021-12-20 08:30:29+00:00 5 Master Time Series Analysis and Forecasting with Python 2024 Leo Tišljarić 2021-12-18 16:56:12+00:00 5 Master Time Series Analysis and Forecasting with Python 2024 Nikhitha 2021-12-18 10:51:14+00:00 4.5 Econometrics and Statistics for Business in R & Python Konstantinos Tsiamasiotis 2021-12-18 04:59:38+00:00 5 Econometrics and Statistics for Business in R & Python Angeliki Kafritsa 2021-12-17 08:37:51+00:00 5 Econometrics and Statistics for Business in R & Python Rosie Liu 2021-12-16 22:21:46+00:00 4 Econometrics and Statistics for Business in R & Python Renuka Gusain 2021-12-16 11:10:09+00:00 4.5 Master Time Series Analysis and Forecasting with Python 2024 Yhasreen Abrahim 2021-12-16 10:33:42+00:00 3 content too simple and basic time series XGBoost for Business: Machine Learning Course in Python & R Jacob Lackey 2021-12-16 04:00:47+00:00 3 Too many spelling errors and grammatical mistakes on the powerpoint slides Econometrics and Statistics for Business 

In [26]:
# Build a simple prompt for the RAG system
prompt = f"""
based on this context {context_text}
please answer this question {query}
if you don't know the answer just say you don't know.
"""
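
The prompt above places the retrieved text and the question inline, which can make it hard for the model to tell where the context ends. One possible variation, sketched under the assumption that explicit delimiters help: `build_prompt` is a hypothetical helper, and the wording and the `---` markers are illustrative choices, not a LangChain convention.

```python
def build_prompt(context_text: str, query: str) -> str:
    """Assemble a RAG prompt with the context clearly separated from the question."""
    return (
        "Answer the question using only the context between the --- markers.\n"
        "If the context does not contain the answer, say you don't know.\n\n"
        "Context:\n"
        "---\n"
        f"{context_text}\n"
        "---\n\n"
        f"Question: {query.strip()}\n"
    )


print(build_prompt("review A\n\nreview B", "give me my worst reviews"))
```

The resulting string can be passed to `model.invoke` in the same way as the f-string prompt.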

In [31]:
# Call the OpenAI API through LangChain
model = ChatOpenAI(
    openai_api_key=openai_api_key,
    model="gpt-4o-mini",
    temperature=0
)
response_text = model.invoke(prompt)

In [32]:
# Display the results
display(Markdown(response_text.content))

Based on the provided context, here are the worst reviews:

1. **Loïc Legros (2023-11-08)**: Rating: 1
   - Comments: "Not a good course :\n1- some python code are not up to date and doesn't work.\n2- the course doesn't add any value compare to the simple reading of either wikipedia article or documentation of python module used. Considered this course as a audio version of those.\n3- instructor doesn't really know python and code could easily be improved."

2. **Anonymized User (2021-12-16)**: Rating: 3
   - Comments: "content too simple and basic time series."

3. **Anonymized User (2021-12-16)**: Rating: 3
   - Comments: "Too many spelling errors and grammatical mistakes on the powerpoint slides."

4. **Udit Gupta (2022-06-19)**: Rating: 3
   - Comments: "Forecasting Models & Time Series Analysis for Business in R."

5. **Advik Keshary (2022-01-16)**: Rating: 3.5
   - Comments: "Econometrics and Statistics for Business in R & Python."

These reviews reflect lower ratings and negative feedback regarding course content, quality, and instructor effectiveness.

In [33]:
# Prepare the unstructured data
def prepare_excel(file_path):
    """Load an Excel file, split it into chunks, and return a FAISS index over them."""
    # Load the data
    loader = UnstructuredExcelLoader(file_path, mode="elements")
    docs = loader.load()

    # Split the text into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=2000,
        chunk_overlap=200
    )
    chunks = text_splitter.split_documents(docs)

    # Prepare the embeddings
    embeddings = OpenAIEmbeddings(
        openai_api_key=openai_api_key,
        model="text-embedding-3-large"
    )

    # Build the FAISS index
    db_faiss = FAISS.from_documents(chunks, embeddings)

    return db_faiss

In [35]:
# Define the retrieve-and-generate (RAG) function
def ask(db, query, k):
    """Retrieve the k closest chunks for `query` and ask the LLM to answer from them."""
    # Retrieve the context
    docs_faiss = db.similarity_search_with_score(query, k=k)
    context_text = "\n\n".join([doc.page_content for doc, _score in docs_faiss])

    # Define the prompt
    prompt = f"""
    based on this context {context_text}
    please answer this question {query}
    if you don't know the answer just say you don't know.
    """

    # Call the LLM
    model = ChatOpenAI(
        openai_api_key=openai_api_key,
        model="gpt-4o-mini",
        temperature=0
    )
    return model.invoke(prompt)

In [37]:
# Prepare the Excel data
db_excel = prepare_excel(excel_path)

In [38]:
# Define the query
query = """
Analyse the reviews, choose the ones with the worst comments, and transcribe them
"""

In [40]:
# Ask the question
response = ask(db_excel, query, 5)

In [42]:
# Show the response
display(Markdown(response.content))

Based on the provided context, here are the reviews with the worst comments transcribed:

1. **Loïc Legros** (2023-11-08 17:02:11+00:00) - Rating: 1
   - "Not a good course :\n1- some python code are not up to date and doesn't work.\n2- the course doesn't add any value compare to the simple reading of either wikipedia article or documentation of python module used. Considered this course as a audio version of those.\n3- instructor doesn't really know python and code could easily be improved."

2. **Anish** (2023-05-20 16:11:20+00:00) - Rating: 2
   - "Please speak clear English, so that we can understand you courses. I am not able to understand some words while you are speaking. This is the basic things for any courses. Audience should understand what you speak."

3. **Yestantis Dobson** (2024-03-12 18:34:04+00:00) - Rating: 2
   - "I think it needs to provide opportunities to practice using the formula."

4. **Kalagana Venkata Rao** (2023-06-18 11:20:37+00:00) - Rating: 2.5
   - No specific comment provided, but the rating indicates dissatisfaction.

These reviews highlight significant issues with course content, instructor clarity, and overall value.