<a href="https://colab.research.google.com/github/jacksonkarel/etl-langchain/blob/main/langchain_karl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install langchain
!pip install openai
!pip install google-search-results

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
import json
import os

import numpy as np
import pandas as pd

# Read the API keys from a local JSON file and expose them as environment variables.
with open('keys.json') as f:
    keys = json.load(f)
os.environ["OPENAI_API_KEY"] = keys['openai']
os.environ["SERPAPI_API_KEY"] = keys['serp']
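
The cell above expects a `keys.json` file next to the notebook with `openai` and `serp` entries. A minimal sketch of a loader that fails early when an entry is missing; the helper name `load_api_keys`, the `ENV_NAMES` mapping, and the error message are illustrative assumptions, not part of the notebook:

```python
import json
import os

# Hypothetical mapping from keys.json entries to the environment variables
# that the notebook sets by hand.
ENV_NAMES = {"openai": "OPENAI_API_KEY", "serp": "SERPAPI_API_KEY"}


def load_api_keys(path="keys.json"):
    """Read API keys from a JSON file and export them as environment variables."""
    with open(path) as f:
        keys = json.load(f)
    # Report every missing entry at once instead of failing on the first lookup.
    missing = [name for name in ENV_NAMES if name not in keys]
    if missing:
        raise KeyError(f"{path} is missing entries: {missing}")
    for name, env_name in ENV_NAMES.items():
        os.environ[env_name] = keys[name]
    return keys
```

Calling `load_api_keys()` with no argument would read the same `keys.json` the notebook uses.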

In [4]:
from langchain.agents import Tool
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain import LLMChain, OpenAI, SerpAPIWrapper
from langchain.utilities import PythonREPL
from langchain.agents import initialize_agent

In [11]:
python_repl = PythonREPL()

tools = [
    Tool(
        name="Python REPL",
        func=python_repl.run,
        description="A Python shell. Use this to execute python commands. Input should be a valid python command. Always print the output of the python command."
    ),
]

memory = ConversationBufferMemory(memory_key="chat_history")

llm = OpenAI(temperature=0)
agent_chain = initialize_agent(tools, llm, agent="conversational-react-description", verbose=True, memory=memory)

In [12]:
# Named `prompt` so the built-in `input` function is not shadowed.
prompt = """Execute the following steps:

1. Using python execution, load `dataset.csv`

2. Using python execution, iterate over the columns of the dataset printing the column name and datatype

3. Based on the data types, write python code for preprocessing and normalizing the dataset using sklearn
"""

agent_chain.run(input=prompt)



> Entering new AgentExecutor chain...

Thought: Do I need to use a tool? Yes
Action: Python REPL
Action Input: 
import pandas as pd

df = pd.read_csv('dataset.csv')

for col in df.columns:
    print(col, df[col].dtype)

from sklearn import preprocessing

for col in df.columns:
    if df[col].dtype == 'object':
        df[col] = preprocessing.LabelEncoder().fit_transform(df[col])
    else:
        df[col] = preprocessing.MinMaxScaler().fit_transform(df[col].values.reshape(-1,1))

Observation: 
Thought: Do I need to use a tool? No
AI: The dataset has been preprocessed and normalized using sklearn.

> Finished chain.


'The dataset has been preprocessed and normalized using sklearn.'
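
The code the agent generated above applies `LabelEncoder` to every object column and `MinMaxScaler` to every other column, including any numeric label column. scikit-learn documents `LabelEncoder` as an encoder for target values; for input features, `OrdinalEncoder` or `OneHotEncoder` is the usual choice. A hedged sketch of the same dtype-based dispatch with `ColumnTransformer`, using a small made-up frame (the column name `color` and all values are assumptions for illustration):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Made-up stand-in for dataset.csv; not the notebook's data.
df = pd.DataFrame({
    "color": ["red", "white", "red"],
    "alcohol": [12.0, 13.0, 14.0],
    "proline": [500.0, 1000.0, 1500.0],
})

# Split the columns by dtype, as the generated loop does.
numeric = [c for c in df.columns if is_numeric_dtype(df[c])]
categorical = [c for c in df.columns if c not in numeric]

# One transformer per column group; the fitted object can be reused on new data.
preprocess = ColumnTransformer([
    ("categorical", OrdinalEncoder(), categorical),
    ("numeric", MinMaxScaler(), numeric),
])
values = preprocess.fit_transform(df)

# Output columns follow the transformer order: categorical first, then numeric.
processed = pd.DataFrame(values, columns=categorical + numeric)
print(processed)
```

Keeping the fitted `preprocess` object around, rather than creating a fresh encoder per column, means the same scaling can later be applied to unseen rows with `preprocess.transform`.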

In [8]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent

In [23]:
tools = load_tools(["python_repl"], llm=llm)

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Named `prompt` so the built-in `input` function is not shadowed.
prompt = """Execute the following steps:

1. Using python execution, load `dataset.csv`

2. Using python execution, iterate over the columns of the dataset printing the column name and datatype

3. Identify which columns are feature columns and which columns are label columns

4. Split the dataset into features and labels

5. Based on the data types, write python code for preprocessing and normalizing the features using sklearn

6. Add labels back to the processed dataframe

7. Save the processed data as `dataset_processed.csv`
"""

agent.run(prompt)



> Entering new AgentExecutor chain...
I need to use the Python REPL to execute the steps
Action: Python REPL
Action Input: 
import pandas as pd

dataset = pd.read_csv('dataset.csv')

for col in dataset.columns:
    print(col, dataset[col].dtype)

Observation: alcohol float64
malic_acid float64
ash float64
alcalinity_of_ash float64
magnesium float64
total_phenols float64
flavanoids float64
nonflavanoid_phenols float64
proanthocyanins float64
color_intensity float64
hue float64
od280/od315_of_diluted_wines float64
proline float64
target int64

Thought: I need to identify which columns are feature columns and which columns are label columns
Action: Python REPL
Action Input: 
feature_columns = dataset.columns[:-1]
label_column = dataset.columns[-1]

Observation: 
Thought: I need to split the dataset into features and labels
Action: Python REPL
Action Input: 
features = dataset[feature_columns]
label

'The final answer is that the dataset has been preprocessed and normalized using sklearn, and saved as `dataset_processed.csv`.'
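
The agent's code for steps 4 to 7 is cut off in the log above. A minimal sketch of what those steps could look like, assuming (as the agent did) that the last column is the label. The rows are made up; only the column names `alcohol`, `proline`, and `target` come from the dtype listing:

```python
import os
import tempfile

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up stand-in rows; not the notebook's data.
dataset = pd.DataFrame({
    "alcohol": [12.0, 13.0, 14.0],
    "proline": [500.0, 1000.0, 1500.0],
    "target": [0, 1, 2],
})

# Steps 3-4: treat the last column as the label and split it off.
feature_columns = dataset.columns[:-1]
label_column = dataset.columns[-1]
features = dataset[feature_columns]
labels = dataset[label_column]

# Step 5: scale every feature to [0, 1]; the labels are left untouched.
scaled = MinMaxScaler().fit_transform(features)
processed = pd.DataFrame(scaled, columns=feature_columns, index=dataset.index)

# Step 6: add the labels back to the processed frame.
processed[label_column] = labels

# Step 7: save without the index column. A temporary directory is used here
# so the sketch does not overwrite the notebook's dataset_processed.csv.
out_path = os.path.join(tempfile.mkdtemp(), "dataset_processed.csv")
processed.to_csv(out_path, index=False)
```

Passing `index=False` keeps the row index out of the CSV, so reloading the file does not produce an extra unnamed column that a later `drop('target', axis=1)` would leave in the features.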

In [26]:
# Named `prompt` so the built-in `input` function is not shadowed.
prompt = """Execute the following steps:

1. Using python execution, load `dataset_processed.csv`

2. Split the dataset into train and test sets

3. Select a machine learning model from the sklearn library

4. Fit the sklearn model to the training data

5. Use the trained model to predict on the test dataset

6. Evaluate the predictions using an appropriate metric. Print the result

7. Save the trained model to disk
"""

agent.run(prompt)



> Entering new AgentExecutor chain...
I need to use the Python REPL to execute the steps
Action: Python REPL
Action Input: 
import pandas as pd
dataset = pd.read_csv('dataset_processed.csv')
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(dataset.drop('target', axis=1), dataset['target'], test_size=0.2, random_state=42)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
model.save('trained_model.sav')
Observation: 'LogisticRegression' object has no attribute 'save'
Thought: I need to use a different method to save the model
Action: Python REPL
Action Input: 
import pickle
pickle.dump(model, open('trained_model.sav', 'wb'))
Observation: 
Thought:

'The trained model can be saved to disk using the pickle library.'
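
As the error in the log shows, scikit-learn estimators have no `save` method, and the agent fell back to `pickle`. A hedged sketch of the whole train, evaluate, and save sequence follows. It assumes scikit-learn's bundled wine data as a stand-in for `dataset_processed.csv`, rescales the features inside a pipeline, and writes to a temporary directory. None of these choices come from the notebook:

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled wine data stands in for dataset_processed.csv.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline so it is saved together with the model.
model = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)

# Save and reload with context managers so the file handles are closed.
model_path = os.path.join(tempfile.mkdtemp(), "trained_model.sav")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions.
print((restored.predict(X_test) == predictions).all())
```

Pickle files can execute arbitrary code when loaded, so only unpickle model files from a source you trust.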