<a href="https://colab.research.google.com/github/petersonchiquetto/LangChain-DataAgents/blob/main/LangChain_DataAgents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tools Used and Potential Applications in the Project

This project builds data-driven language agents on three tools: **LangChain**, **Groq**, and **Pandas**. The sections below describe each tool and how it could be applied within the project's scope.

---

## **LangChain**

### What is it?
LangChain is a framework for building applications on top of large language models (LLMs). It provides modules for connecting components such as memory, external tools, and agents.

### Relevant Features for the Project:
- **Call Chaining:** Allows different workflow steps (e.g., natural language interpretation and operation execution) to be connected logically.
- **Agents:** Provides the ability to create custom agents that interpret natural language commands and translate them into specific actions.
- **Memory:** Enables the storage of context from previous interactions, which is useful for maintaining a fluid conversation with the agent.
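
To make call chaining concrete, here is a minimal plain-Python sketch of the idea: each step is a function, and the chain feeds one step's output into the next. This is not LangChain's API. The names `chain`, `build_prompt`, `fake_llm`, and `postprocess` are hypothetical, and the model is a stub so the example runs offline.

```python
from functools import reduce
from typing import Callable

Step = Callable[[str], str]

def chain(*steps: Step) -> Step:
    """Compose steps left to right: the output of one feeds the next."""
    def run(value: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run

def build_prompt(question: str) -> str:
    # Step 1: wrap the user's question in an instruction for the model.
    return f"Write Pandas code for a DataFrame named df. Task: {question}"

def fake_llm(prompt: str) -> str:
    # Step 2 (stub): a real chain would call a language model here.
    return f"MODEL OUTPUT FOR: {prompt}  "

def postprocess(text: str) -> str:
    # Step 3: tidy the raw model output.
    return text.strip()

pipeline = chain(build_prompt, fake_llm, postprocess)
result = pipeline("average delivery time")
print(result)
```

In a real project, the stub would be replaced by an actual model call and the composition would be handled by the framework rather than by `reduce`.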

### Potential Applications:
1. **Query Automation:** Create agents that run exploratory analyses on datasets in response to natural language queries.
2. **Customization:** Adjust the agent's behavior to meet specific needs, such as financial or scientific analysis.
3. **Scalability:** Facilitate integration with other APIs and tools to expand the project's capabilities.

---

## **Groq**

### What is it?
Groq is a high-performance inference platform for running large language models (LLMs). It is optimized for low-latency responses; the quality of the answers depends on the model being served.

### Relevant Features for the Project:
- **Access to LLMs:** Allows integration with models such as Llama 3 to interpret natural language queries.
- **Performance:** Suited to interactive applications that need fast responses.

### Potential Applications:
1. **Natural Language Interpretation:** Enable the agent to understand questions and generate detailed responses.
2. **Response Generation:** Provide explanations or analyses based on the data, such as correlations or trends.
3. **Model Customization:** Adapt the engine to cater to specific application domains, such as sales, operations, or healthcare analysis.

---

## **Pandas**

### What is it?
Pandas is a widely used library for data manipulation and analysis in Python. It offers data structures such as **DataFrames**, which are highly efficient for working with tabular data.

### Relevant Features for the Project:
- **Data Manipulation:** Enables efficient loading, cleaning, and transformation of data.
- **Statistical Analysis:** Provides methods for calculations like mean, median, variance, and correlation.
- **Visualization:** Integrates with visualization libraries to create graphs and tables.
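
As a small illustration of the statistical methods listed above, the sketch below computes mean, median, variance, and correlation on a five-row sample. The values are copied from the first rows of the delivery dataset used later in this notebook; `sample` is a hypothetical name, and five rows are far too few to say anything about the full dataset.

```python
import pandas as pd

# Five rows copied from the head of the delivery dataset.
sample = pd.DataFrame({
    "anos_experiencia_agente": [8, 10, 16, 8, 11],
    "tempo_entrega": [120, 165, 130, 105, 150],
})

mean_time = sample["tempo_entrega"].mean()
median_time = sample["tempo_entrega"].median()
var_time = sample["tempo_entrega"].var()  # sample variance (ddof=1 by default)

# Series.corr uses the Pearson coefficient unless another method is requested.
corr = sample["anos_experiencia_agente"].corr(sample["tempo_entrega"])

print(mean_time, median_time, var_time, round(corr, 3))
```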

### Potential Applications:
1. **Data Exploration:** Perform basic operations like filtering columns, calculating statistics, or identifying outliers.
2. **Agent Support:** Provide analysis results to the agent, complementing the responses to user queries.
3. **Custom Transformations:** Allow the agent to perform adjustments to the data, such as creating derived columns or applying complex filters.
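
The following sketch shows what a derived column and a compound filter might look like on the same kind of data. The column `espera_min` and the variable names are hypothetical, and the subtraction assumes that order and pickup happen on the same day.

```python
import pandas as pd

# Five rows copied from the head of the delivery dataset.
sample = pd.DataFrame({
    "hora_pedido": ["11:30:00", "19:45:00", "08:30:00", "18:00:00", "13:30:00"],
    "hora_retirada": ["11:45:00", "19:50:00", "08:45:00", "18:10:00", "13:45:00"],
    "trafego": ["Alto", "Congestionamento", "Baixo", "Medio", "Alto"],
    "tempo_entrega": [120, 165, 130, 105, 150],
})

# Derived column: minutes elapsed between order time and pickup time.
pedido = pd.to_datetime(sample["hora_pedido"], format="%H:%M:%S")
retirada = pd.to_datetime(sample["hora_retirada"], format="%H:%M:%S")
sample["espera_min"] = (retirada - pedido).dt.total_seconds() / 60

# Compound filter: high-traffic deliveries with tempo_entrega above 100.
slow_high_traffic = sample[
    (sample["trafego"] == "Alto") & (sample["tempo_entrega"] > 100)
]

print(sample[["hora_pedido", "hora_retirada", "espera_min"]])
print(slow_high_traffic)
```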

---

## **Integration of Tools in the Project**

Together, the three tools cover the full path from question to answer. A typical workflow looks like this:

1. **The user asks a natural language question** about the dataset.
2. **Groq** processes the query, using the Llama 3 language model to understand the user's intent.
3. **LangChain** translates the intent into concrete actions, such as performing a statistical analysis on the DataFrame.
4. **Pandas** executes the required operation on the data and returns the results to the agent.
5. **The agent responds to the user** with the processed information, explaining the results in an understandable manner.
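
The five steps above can be sketched end to end with a stubbed model, so the flow runs without an API key. This is an illustration under assumptions, not how LangChain implements its agents: `fake_llm`, `run_generated_code`, and `answer` are hypothetical helpers, and the stub always returns the same line of Pandas code.

```python
import pandas as pd

def fake_llm(question: str) -> str:
    """Stand-in for the Groq-hosted model: returns Pandas code as text."""
    return "result = df['anos_experiencia_agente'].corr(df['tempo_entrega'])"

def run_generated_code(code: str, df: pd.DataFrame):
    """Execute generated code with `df` in scope and return its `result`."""
    scope = {"df": df}
    exec(code, scope)  # Unsafe for untrusted model output; acceptable in a local sketch.
    return scope["result"]

def answer(question: str, df: pd.DataFrame) -> str:
    code = fake_llm(question)             # steps 2-3: intent -> concrete action
    value = run_generated_code(code, df)  # step 4: Pandas does the work
    return f"{question} -> {value:.2f}"   # step 5: report back to the user

# Five rows copied from the head of the delivery dataset.
df = pd.DataFrame({
    "anos_experiencia_agente": [8, 10, 16, 8, 11],
    "tempo_entrega": [120, 165, 130, 105, 150],
})

reply = answer("Correlation between experience and delivery time?", df)
print(reply)
```

Executing model-generated code with `exec` is the riskiest part of this design; a production agent would need sandboxing or a restricted set of allowed operations.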

---

## **Examples of Future Applications**

1. **Financial Analysis:** Build a system where users can query financial data with questions like "What was the average profit over the past 5 years?"
2. **Healthcare Monitoring:** Apply the agents to analyze medical data and answer questions like "Which patients are at higher risk for diabetes?"
3. **Industrial Process Management:** Use the agents to identify bottlenecks in production by analyzing KPIs.

---

In [None]:
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/vqrca/agentes_langchain/refs/heads/main/Dados/dados_entregas.csv")
df.head()

| Unnamed: 0 | ID_pedido | anos_experiencia_agente | classificacao_agente | latitude_loja | longitude_loja | latitude_entrega | longitude_entrega | data_pedido | hora_pedido | hora_retirada | clima | trafego | veiculo | area | categoria_produto | tempo_entrega |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ialx566343618 | 8 | 4.9 | 22.745049 | 75.892471 | 22.765049 | 75.912471 | 2022-03-19 | 11:30:00 | 11:45:00 | Ensolarado | Alto | Motocicleta | Urbano | Roupas | 120 |
| 1 | akqg208421122 | 10 | 4.5 | 12.913041 | 77.683237 | 13.043041 | 77.813237 | 2022-03-25 | 19:45:00 | 19:50:00 | Chuvoso | Congestionamento | Scooter | Metropolitano | Eletronicos | 165 |
| 2 | njpu434582536 | 16 | 4.4 | 12.914264 | 77.6784 | 12.924264 | 77.6884 | 2022-03-19 | 08:30:00 | 08:45:00 | Tempestade | Baixo | Motocicleta | Urbano | Esportes | 130 |
| 3 | rjto796129700 | 8 | 4.7 | 11.003669 | 76.976494 | 11.053669 | 77.026494 | 2022-04-05 | 18:00:00 | 18:10:00 | Ensolarado | Medio | Motocicleta | Metropolitano | Cosmeticos | 105 |
| 4 | zguw716275638 | 11 | 4.6 | 12.972793 | 80.249982 | 13.012793 | 80.289982 | 2022-03-26 | 13:30:00 | 13:45:00 | Nublado | Alto | Scooter | Metropolitano | Brinquedos | 150 |


In [None]:
%pip install -q langchain-groq

In [None]:
from google.colab import userdata
GROQ_API = userdata.get('GROQ_API')
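
The cell above only works inside Colab. When running the notebook elsewhere, one option is to read the key from an environment variable instead. The sketch below assumes the variable is also named `GROQ_API`; `load_groq_key` is a hypothetical helper, not part of any library.

```python
import os

def load_groq_key(env_var: str = "GROQ_API") -> str:
    """Return the API key stored in an environment variable, or fail clearly."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return key
```

Outside Colab, `GROQ_API = load_groq_key()` could then stand in for the `userdata.get` call.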

In [None]:
from langchain_groq import ChatGroq

In [None]:
llm = ChatGroq(temperature=0, groq_api_key=GROQ_API, model_name='llama3-70b-8192')

In [None]:
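# The prompt is written in Portuguese. In English: "I have a dataframe called 'df'
# with the columns 'anos_experiencia_agente' and 'tempo_entrega'. Write Python code
# using the Pandas library to compute the correlation between the two columns.
# Return the Markdown for the Python code snippet and nothing else."
ai_msg = llm.invoke(
    """
    Eu tenho um dataframe chamado 'df' com as colunas 'anos_experiencia_agente' e 'tempo_entrega'.
    Escreva o código Python com a biblioteca Pandas para calcular a correlação entre as duas colunas.
    Retorne o Markdown para o trecho de código Python e nada mais.
    """
)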

In [None]:
print(ai_msg.content)

```
correlacao = df['anos_experiencia_agente'].corr(df['tempo_entrega'])
print("A correlação entre 'anos_experiencia_agente' e 'tempo_entrega' é de {:.2f}".format(correlacao))
```
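
The model wraps its answer in a Markdown fence, so the code has to be separated from the fence before it can be run. The notebook does this by copying the lines into the next cell; a small helper could do the same programmatically. `extract_code` and `FENCED_BLOCK` are hypothetical names, and the sketch assumes the reply contains at most one fenced block of interest.

```python
import re

# Matches a fenced block: three backticks, an optional language tag, the body,
# then three closing backticks.
FENCED_BLOCK = re.compile(r"`{3}[ \t]*\w*[ \t]*\n(.*?)`{3}", re.DOTALL)

def extract_code(markdown: str) -> str:
    """Return the body of the first fenced block, or the whole text if none is fenced."""
    match = FENCED_BLOCK.search(markdown)
    return (match.group(1) if match else markdown).strip()

# Build a sample reply without writing a literal fence inside this example.
fence = "`" * 3
reply = f"{fence}python\ncorrelacao = df['a'].corr(df['b'])\nprint(correlacao)\n{fence}"

code = extract_code(reply)
print(code)
```

With the objects from this notebook, the call would be `extract_code(ai_msg.content)`.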


In [None]:
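# Pearson correlation between agent experience (years) and delivery time.
correlacao = df['anos_experiencia_agente'].corr(df['tempo_entrega'])
# The message below reads: "The correlation between ... and ... is <value>".
print("A correlação entre 'anos_experiencia_agente' e 'tempo_entrega' é de {:.2f}".format(correlacao))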

A correlação entre 'anos_experiencia_agente' e 'tempo_entrega' é de -0.25
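
A value of -0.25 indicates a weak negative linear association: in this dataset, more experienced agents tend to have slightly lower delivery times. `Series.corr` computes the Pearson coefficient by default, and the sketch below spells out that calculation in plain Python on toy data. `pearson` is a hypothetical helper written for illustration; it is not how Pandas implements the method internally.

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations; the 1/(n-1) factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)

r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing relationship
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing relationship
print(r_up, r_down)
```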
