<a href="https://colab.research.google.com/github/peremartra/Large-Language-Model-Notebooks-Course/blob/main/3-LangChain/LangChain_Agent_create_Data_Scientist_Assistant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">
<h1><a href="https://github.com/peremartra/Large-Language-Model-Notebooks-Course">LLM Hands On Course</a></h1>
    <h3>Understand And Apply Large Language Models</h3>
    <p>by <b>Pere Martra</b></p>
    <h2>Create a Data Analyst Agent with LangChain </h2>
</div>

<br>

<div align="center">
    &nbsp;
    <a target="_blank" href="https://www.linkedin.com/in/pere-martra/"><img src="https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social"></a>
    
</div>

<br>
<hr>

# Create an LLM Agent with LangChain

We are going to create an agent with LangChain that, using the OpenAI API, will be able to analyze the data contained in a CSV file.

It will be able to find relationships between variables, clean the data, search for a model, and execute it to make future predictions.

In summary, it will act as a Data Scientist Assistant, helping us in our day-to-day tasks.

## Installing and importing libraries

In [None]:
!pip install -q langchain==0.1.2
!pip install -q langchain_experimental==0.0.49
!pip install -q langchain-openai==0.0.2

We use the **os** library to store environment variables, such as OPENAI_API_KEY.

Get your OpenAI API key: https://platform.openai.com/

In [None]:
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key: ")
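If the cell above is re-run often, it may be convenient to ask for the key only when it is not already set. A minimal sketch of that idea; `ensure_env_var` is a hypothetical helper name introduced here, not part of any library:

```python
import os
from getpass import getpass


def ensure_env_var(name, prompt=getpass):
    """Return os.environ[name], asking for it via `prompt` only if it is unset or empty."""
    if not os.environ.get(name):
        # The value is typed interactively and never written into the notebook.
        os.environ[name] = prompt(f"{name}: ")
    return os.environ[name]
```

Calling `ensure_env_var("OPENAI_API_KEY")` would then prompt once per session and reuse the stored value afterwards.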

## Loading the Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!pip install kaggle

In [None]:
import os
#This directory should contain your kaggle.json file with your key
os.environ['KAGGLE_CONFIG_DIR'] = '/content/drive/MyDrive/kaggle'
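Before calling the Kaggle command line tool, we can check that the credentials file is really where we said it is. A small sketch, assuming the usual `kaggle.json` layout with a `username` and a `key` entry; `kaggle_credentials_ok` is a hypothetical helper name:

```python
import json
from pathlib import Path


def kaggle_credentials_ok(config_dir):
    """Return True if config_dir holds a kaggle.json with 'username' and 'key' entries."""
    cred_file = Path(config_dir) / "kaggle.json"
    if not cred_file.is_file():
        return False
    try:
        creds = json.loads(cred_file.read_text())
    except json.JSONDecodeError:
        # The file exists but is not valid JSON.
        return False
    return isinstance(creds, dict) and bool(creds.get("username")) and bool(creds.get("key"))
```

In this notebook it would be called as `kaggle_credentials_ok(os.environ['KAGGLE_CONFIG_DIR'])`.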

In [None]:
!kaggle datasets download -d goyaladi/climate-insights-dataset

In [None]:

import zipfile

# Define the path to your zip file
file_path = '/content/climate-insights-dataset.zip'
with zipfile.ZipFile(file_path, 'r') as zip_ref:
    zip_ref.extractall('/content/drive/MyDrive/kaggle')
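The cell above extracts everything in the archive. If only the CSV files are needed, the members can be filtered first. A sketch of that variation; `extract_csvs` is a hypothetical helper name:

```python
import zipfile


def extract_csvs(zip_path, target_dir):
    """Extract only the .csv members of a zip archive and return their names."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # namelist() returns every member path stored in the archive.
        csv_names = [n for n in zip_ref.namelist() if n.lower().endswith(".csv")]
        zip_ref.extractall(target_dir, members=csv_names)
    return csv_names
```

The returned list is also a quick way to find out the exact file name to pass to `pd.read_csv`.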


We are using a Kaggle dataset:
https://www.kaggle.com/datasets/goyaladi/climate-insights-dataset

You can download the CSV. Feel free to use the dataset you are most interested in, or your own data.




In [None]:
import pandas as pd
csv_file='/content/drive/MyDrive/kaggle/climate_change_data.csv'
#creating the document with Pandas.
document = pd.read_csv(csv_file)

In [None]:
document.head(5)

In [None]:
#If you want to use your own CSV, just uncomment and execute this cell
#from google.colab import files

#def load_csv_file():
#  """Loads a CSV file into a Pandas dataframe."""

#  uploaded_file = files.upload()
#  file_path = next(iter(uploaded_file))
#  document = pd.read_csv(file_path)
#  return document

#if __name__ == "__main__":
#  document = load_csv_file()
#  print(document)

# Creating the Agent
This is the easiest agent we can create with LangChain: we only need to import **create_pandas_dataframe_agent**.

Time to create our little assistant, and it takes only one call.

We do not specify a model, so the **OpenAI** wrapper uses its default. However, we set its **temperature** parameter to 0 so that it is not imaginative. This works much better when we want the model to issue commands to the different libraries it can use.
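To get an intuition for why a low temperature makes the model less imaginative, here is a toy sketch of how temperature reshapes the probabilities of the next token. It uses the standard softmax-with-temperature formulation on made-up scores; it is an illustration, not code from the OpenAI API:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Temperature 0 is treated as the greedy limit: all mass on the top score.
        top = max(logits)
        winners = [1.0 if x == top else 0.0 for x in logits]
        return [w / sum(winners) for w in winners]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature goes down, the most likely token takes a larger share of the probability, so the output becomes more repeatable.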



In [None]:
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAI

In [None]:
sm_ds_OAI = create_pandas_dataframe_agent(
    OpenAI(temperature=0),
    document,
    verbose=True
)
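With `verbose=True` the agent prints its intermediate steps: the model proposes some Python to run against the DataFrame, the result comes back as an observation, and the cycle repeats until the model gives a final answer. The loop can be sketched in a few lines. This is a toy illustration, not LangChain's implementation: `fake_model` is a scripted stand-in for the LLM, and a plain list of dictionaries plays the role of the DataFrame.

```python
def fake_model(transcript):
    """Stand-in for the LLM: decides the next step from the transcript so far."""
    if not any(line.startswith("Observation:") for line in transcript):
        return ("python", "len(df)")
    last_observation = transcript[-1].split(": ", 1)[1]
    return ("final", f"The dataset has {last_observation} rows.")


def run_toy_agent(question, model, data, max_steps=5):
    """Run a simplified action/observation loop until the model answers."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        kind, payload = model(transcript)
        if kind == "final":
            transcript.append(f"Final Answer: {payload}")
            return payload, transcript
        # The "tool": evaluate the proposed expression with the data bound to `df`.
        observation = eval(payload, {"__builtins__": {"len": len}}, {"df": data})
        transcript.append(f"Action: {payload}")
        transcript.append(f"Observation: {observation}")
    return None, transcript


answer, steps = run_toy_agent("How many rows?", fake_model, [{"a": 1}, {"a": 2}, {"a": 3}])
print("\n".join(steps))
```

Because the real agent executes code written by the model, it is a good idea to run it only in a disposable environment such as Colab.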

## First question

In [None]:
sm_ds_OAI.invoke("Analyze this data, and write a brief explanation around 100 words.")

In [None]:
document.info()

The description of the data made by the agent is accurate.

## Second question

In [None]:
sm_ds_OAI.invoke("Do you think it is possible to forecast the temperature?")


The model thinks that it is possible to forecast the temperature, but difficult because of the weak correlation between variables.

I don't know why the model decided to create a bar chart.
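The weak correlation the model mentions can be checked by hand with the Pearson correlation coefficient. A minimal sketch in plain Python on made-up numbers; on the notebook's DataFrame, `document.corr(numeric_only=True)` should give the full correlation matrix, assuming a pandas version that supports `numeric_only`:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, only to show the call.
print(pearson([1, 2, 3, 4], [10, 12, 11, 13]))
```

Values close to 1 or -1 indicate a strong linear relationship, while values close to 0 indicate a weak one.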

## Third question
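
The request in this section needs the CO2 emissions averaged per year before they can be plotted. That aggregation step can be sketched in plain Python. The date format and the sample values below are assumptions for illustration, not taken from the dataset, and `annual_average` is a hypothetical helper name:

```python
from collections import defaultdict
from datetime import datetime


def annual_average(records):
    """Average the values per calendar year.

    records: iterable of (date_string, value) pairs whose dates start with 'YYYY-MM-DD'.
    Returns a dict {year: mean value}, ordered by year.
    """
    totals = defaultdict(lambda: [0.0, 0])  # year -> [running sum, count]
    for date_string, value in records:
        year = datetime.strptime(date_string[:10], "%Y-%m-%d").year
        totals[year][0] += value
        totals[year][1] += 1
    return {year: s / n for year, (s, n) in sorted(totals.items())}


sample = [("2000-01-01", 400.0), ("2000-06-01", 402.0), ("2001-03-15", 410.0)]
print(annual_average(sample))
```

The agent is expected to do the equivalent with pandas and then hand the yearly averages to a plotting library.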

In [None]:
sm_ds_OAI.invoke("""
Can you create a line graph containing the annual average co2 emissions over the years?
""")


## Fourth question

In [None]:
sm_ds_OAI.invoke("""
Create a line graph with seaborn containing the annual average co2 emissions in Portugal over the years.
""")

## Last question

In [None]:
sm_ds_OAI.invoke("""Select a forecasting model to forecast the temperature.
Use this kind of model to forecast the average temperature
per year in Port Maryberg in Malta for the next 5 years.
Write the forecasted temperatures in a table.""")

Note how the agent is capable of installing the necessary software for the actions it needs to perform.
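
Whatever model the agent selects, it helps to have a simple baseline to compare its numbers against. One of the simplest options is a linear trend fitted by least squares, sketched here in plain Python. This is an illustrative assumption, not the model the agent actually used, and the sample temperatures are made up:

```python
def fit_linear_trend(years, values):
    """Least-squares fit of values = slope * year + intercept."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least 2 points and equally long sequences")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical, the slope is undefined")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, values, horizon=5):
    """Extrapolate the fitted trend for the `horizon` years after the last one."""
    slope, intercept = fit_linear_trend(years, values)
    last = max(years)
    return {last + k: slope * (last + k) + intercept for k in range(1, horizon + 1)}


# Hypothetical yearly average temperatures.
for year, temp in forecast([2018, 2019, 2020, 2021], [14.0, 14.5, 15.0, 15.5]).items():
    print(year, round(temp, 2))
```

If the agent's forecast is far from a baseline like this one, it is worth looking at the verbose trace to see what it did.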

# Conclusions

This is one of the most powerful and, at the same time, easiest-to-use agents. We have seen how, with just a few lines of code, we get an agent capable of following our instructions to analyze, clean, and generate charts from our data. Not only that, but it has also been able to draw conclusions and even decide which algorithm was best for forecasting the data.

The world of agents is just beginning, and many players are entering the field, such as Hugging Face, Microsoft, or Google. Their capabilities are not only growing with new tools but also with new language models.

**It's a revolution that we cannot afford to miss and will change many things.**