<a href="https://colab.research.google.com/github/peremartra/Large-Language-Model-Notebooks-Course/blob/main/3-LangChain/3_3_Data_Analyst_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div>
    <h1>Large Language Models Projects</h1>
    <h3>Apply and Implement Strategies for Large Language Models</h3>
    <h2>3.3-Create a Data Analyst Assistant using an LLM Agent.</h2>
</div>

by [Pere Martra](https://www.linkedin.com/in/pere-martra/)
________
Model: OpenAI

Colab Environment: CPU.

Keys:
* Agents
* LLMAgent

Related article: [Create Your Own Data Analyst Assistant With Langchain Agents](https://pub.towardsai.net/create-your-own-data-analyst-assistant-with-langchain-agents-722f1cdcdd7e?sk=7add94d79b9ca41ccbbc60828d7f60f1)
_______


# Create an LLMAgent with LangChain

We will create an agent using LangChain and the OpenAI API to analyze data stored in CSV files.

This agent will have the ability to identify relationships between variables, clean the data, select appropriate models, and execute them to generate future predictions.

In essence, it will function as a data scientist assistant, streamlining our day-to-day tasks.

## Installing and importing libraries

In [None]:
!pip install -q langchain==0.1.2
!pip install -q langchain_experimental==0.0.49
!pip install -q langchain-openai==0.0.2

We use the **os** library to store environment variables such as OPENAI_API_KEY.

Get your OpenAI API key: https://platform.openai.com/

In [None]:
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key: ")
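If you re-run the notebook often, you may prefer not to type the key every time. The following is a minimal sketch, not part of the original notebook: `ensure_api_key` is a hypothetical helper that prompts only when the variable is missing, and its `prompt_fn` parameter is an assumption added so the function can be exercised without a terminal.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the value of var_name, prompting for it only when it is unset or empty."""
    if not os.environ.get(var_name):
        # getpass hides the typed key so it does not end up in the cell output.
        os.environ[var_name] = prompt_fn(f"{var_name}: ")
    return os.environ[var_name]
```

Calling `ensure_api_key()` at the top of the notebook could then stand in for the direct `getpass` assignment.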

## Loading the Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!pip install kaggle

In [None]:
import os
#This directory should contain your kaggle.json file with your key
os.environ['KAGGLE_CONFIG_DIR'] = '/content/drive/MyDrive/kaggle'
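A missing `kaggle.json` is a common reason for the download step to fail, so it can help to check for it first. The sketch below assumes only what the comment above states, namely that the file lives in the directory named by `KAGGLE_CONFIG_DIR`; `kaggle_credentials_present` is a hypothetical helper name.

```python
import os
from pathlib import Path


def kaggle_credentials_present(config_dir=None):
    """Return True if kaggle.json exists in config_dir (default: $KAGGLE_CONFIG_DIR)."""
    config_dir = config_dir or os.environ.get("KAGGLE_CONFIG_DIR", "")
    # An empty setting means there is no directory to look in.
    return bool(config_dir) and (Path(config_dir) / "kaggle.json").is_file()
```

For example, `kaggle_credentials_present()` should return `True` before you run the download cell.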

In [None]:
!kaggle datasets download -d goyaladi/climate-insights-dataset

In [None]:
import zipfile

# Define the path to your zip file
file_path = '/content/climate-insights-dataset.zip'
with zipfile.ZipFile(file_path, 'r') as zip_ref:
    zip_ref.extractall('/content/drive/MyDrive/kaggle')
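If you only need the CSV files from the archive, you can extract them selectively instead of unpacking everything. This is a minimal sketch using only the standard library; `extract_csv_files` is a hypothetical helper, and the paths you pass to it are up to you.

```python
import zipfile
from pathlib import Path


def extract_csv_files(zip_path, target_dir):
    """Extract only the .csv members of zip_path into target_dir and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for name in zip_ref.namelist():
            if name.lower().endswith(".csv"):
                # extract() returns the path of the file it wrote.
                extracted.append(Path(zip_ref.extract(name, target)))
    return extracted
```

The returned list also tells you the exact file names inside the archive, which is handy when setting `csv_file` in the next step.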


We are using a Kaggle dataset:
https://www.kaggle.com/datasets/goyaladi/climate-insights-dataset

You can download the CSV. Feel free to use whichever dataset interests you most, or your own data.




In [None]:
import pandas as pd
csv_file='/content/drive/MyDrive/kaggle/climate_change_data.csv'
#creating the document with Pandas.
document = pd.read_csv(csv_file)

In [None]:
document.head(5)

In [None]:
#If you want to use your own CSV, uncomment and execute this cell
#from google.colab import files

#def load_csv_file():
#  """Loads a CSV file into a Pandas dataframe."""

#  uploaded_file = files.upload()
#  file_path = next(iter(uploaded_file))
#  document = pd.read_csv(file_path)
#  return document

#if __name__ == "__main__":
#  document = load_csv_file()
#  print(document)

# Creating the Agent
This is the easiest agent we can create with LangChain: we only need to import **create_pandas_dataframe_agent**.

Time to create our little assistant, and a single call is all we need.

We do not specify a model, so the default of each LangChain **OpenAI** class is used. We do, however, set the **temperature** parameter to 0 so that the model is not imaginative. This works much better when we want the model to issue commands to the different libraries it can use.



In [None]:
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAI

In [None]:
sm_ds_OAI = create_pandas_dataframe_agent(
    OpenAI(temperature=0),
    document,
    verbose=True
)

In [None]:
sm_ds_Chat = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0),
    document,
    handle_parsing_errors=True,
    verbose=True
)
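Conceptually, the agent works in a loop: the model writes a short Python snippet, a tool executes it against the dataframe (exposed under the name `df`), and the result is fed back to the model until it can answer. The sketch below imitates only the execution step on a tiny hand-made dataframe. `run_snippet` and the sample columns are hypothetical and no LLM is involved; it is an illustration, not LangChain's implementation.

```python
import pandas as pd


def run_snippet(snippet, df):
    """Evaluate one Python expression with the dataframe available as `df`."""
    # eval is acceptable for a toy example; never use it on untrusted input.
    return eval(snippet, {"pd": pd}, {"df": df})


sample = pd.DataFrame({"Country": ["Malta", "Portugal", "Malta"],
                       "Temperature": [20.0, 18.0, 22.0]})

# The kind of expression a model might produce for "how many rows are there?"
row_count = run_snippet("len(df)", sample)
# ...and for "what is the average temperature in Malta?"
malta_mean = run_snippet("df[df['Country'] == 'Malta']['Temperature'].mean()", sample)
print(row_count, malta_mean)
```

Because the real agent executes code written by the model, it is sensible to run it only in an environment, such as a disposable Colab session, where you are comfortable with that.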

We are going to test two different models. The recommendation is to use the agent created with the OpenAI class, but judge for yourself.

## First Question.

In [None]:
sm_ds_OAI.invoke("Analyze this data, and write a brief explanation around 100 words.")

In [None]:
document.info()

The description of the data produced by the agent is accurate.

In [None]:
sm_ds_Chat.invoke("""Analyze this data, and write a brief explanation around 100 words. """)

The second Agent is unable to solve this question.

## Second Question.

In [None]:
sm_ds_OAI.run("Do you think it is possible to forecast the temperature?")


The model thinks it is possible to forecast the temperature, but difficult because of the weak correlation between the variables.

I don't know why the model decided to create a bar chart.
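The weak-correlation claim is easy to check by hand rather than taking the agent's word for it. The sketch below computes the Pearson correlation of each numeric column with a target column. The sample values and column names are made up for illustration; with the notebook's data you would pass `document` and whichever column holds the temperature.

```python
import pandas as pd

# Hand-made stand-in data; the column names are assumptions, not taken from the dataset.
sample = pd.DataFrame({
    "Temperature": [14.1, 15.3, 13.8, 16.0, 15.1, 14.6],
    "CO2 Emissions": [400.2, 398.7, 405.1, 401.3, 399.9, 403.4],
    "Humidity": [55.0, 61.2, 48.9, 70.3, 52.4, 66.1],
})


def correlations_with(df, target):
    """Pearson correlation of every other numeric column with `target`, strongest first."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Order by absolute value so strong negative correlations are not buried.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


print(correlations_with(sample, "Temperature"))
```

Values close to 0 for every column would support the agent's remark that the other variables carry little information about the temperature.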

In [None]:
sm_ds_Chat.run("Do you think it is possible to forecast the temperature?")

The second agent, created with ChatOpenAI, has a different opinion.

## Third question

In [None]:
sm_ds_OAI.run("""
Can you create a line graph containing the annual average co2 emissions over the years?
""")

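To judge the chart the agent draws, it helps to know what the underlying computation looks like in plain pandas. The following is a rough sketch on made-up data: the `Date` and `CO2 Emissions` column names are assumptions, so adjust them to match the output of `document.head(5)`, and `annual_average` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Date": ["2000-01-15", "2000-07-01", "2001-03-10", "2001-11-20"],
    "CO2 Emissions": [400.0, 410.0, 420.0, 440.0],
})


def annual_average(df, value_col, date_col="Date"):
    """Mean of value_col per calendar year, indexed by year."""
    years = pd.to_datetime(df[date_col]).dt.year
    return df.groupby(years)[value_col].mean()


yearly = annual_average(sample, "CO2 Emissions")
print(yearly)
```

With matplotlib installed, `yearly.plot()` would draw the corresponding line graph.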

## Fourth question

In [None]:
sm_ds_OAI.run("""
Create a line graph with seaborn containing the annual average co2 emissions in Portugal over the years.
""")

## Last Question.

In [None]:
sm_ds_OAI.invoke("""Select a forecasting model to forecast the temperature.
Use this kind of model to forecast the average temperature
for year in Port Maryberg in Malta for the next 5 years.
Write the temperatures forecasted in a table.""")
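Whatever model the agent picks, it is worth comparing its table against a simple baseline. The sketch below fits a straight line to yearly averages by ordinary least squares and extends it five years ahead. It is a deliberately naive baseline with invented numbers, not the model the agent chose, and `linear_trend_forecast` is a hypothetical helper.

```python
def linear_trend_forecast(years, values, horizon=5):
    """Fit values = intercept + slope * year by least squares and extrapolate."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last = max(years)
    # One forecast per future year, keyed by year.
    return {last + k: intercept + slope * (last + k) for k in range(1, horizon + 1)}


# Invented yearly averages, used only to show the call.
forecast = linear_trend_forecast([2018, 2019, 2020, 2021, 2022],
                                 [20.0, 20.5, 21.0, 21.5, 22.0])
for year, temperature in forecast.items():
    print(year, round(temperature, 2))
```

If the agent's forecast differs wildly from a baseline like this, that is a cue to read its reasoning in the verbose output more carefully.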

# Conclusions

This is one of the most powerful and, at the same time, easiest-to-use agents. We have seen how, with just a few lines of code, we get an agent capable of following our instructions to analyze, clean, and generate charts from our data. Not only that, but it has also been able to draw conclusions and even decide which algorithm was best for forecasting the data.

The world of agents is just beginning, and many players are entering the field, such as Hugging Face, Microsoft, or Google. Their capabilities are not only growing with new tools but also with new language models.

**It's a revolution that we cannot afford to miss and will change many things.**