<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Load environment variables

from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

False
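
The `False` above indicates that `load_dotenv` did not load any variables from a `.env` file. A minimal sketch of a sanity check to run before the cells below, assuming the keys are named `OPENAI_API_KEY` (the variable LangChain's OpenAI wrappers read by default) plus `PINECONE_API_KEY` and `PINECONE_ENV` (both read later in this notebook). The helper `missing_env_vars` is hypothetical and not part of any library:

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumed variable names; adjust them to match your own setup.
REQUIRED = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If anything is reported as missing, the later cells that call OpenAI or Pinecone will likely fail with an authentication error.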

In [2]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")
llm("explain large language models in one sentence")

'\n\nLarge language models are complex machine learning models built from large datasets of natural language data and used to generate text, recognize speech, and perform other language-related tasks.'

In [3]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [4]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data ")
]
response = chat(messages)

print(response.content)

Sure, here is an example Python script that trains a simple neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
X = np.random.rand(1000, 10)
y = np.sum(X, axis=1)

# Define the neural network architecture
model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))
model.add(Dense(1, activation='linear'))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model on the simulated data
model.fit(X, y, epochs=50, batch_size=32)

# Evaluate the model on new data
X_test = np.random.rand(100, 10)
y_test = np.sum(X_test, axis=1)
mse = model.evaluate(X_test, y_test)
print("Mean squared error:", mse)
```

This script generates a simulated dataset with 1000 samples, where each sample has 10 features. The target variable is the sum of the features for each sample. The neural network has one hidden layer with 32 uni

In [5]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [6]:
# Run LLM with PromptTemplate

llm(prompt.format(concept="autoencoder"))

'\nAutoencoders are neural networks that learn to compress data using an encoding layer and then decode the data back to its original form using a decoding layer. They are typically used for dimensionality reduction, denoising, and feature learning.'
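
Conceptually, `prompt.format` fills the `{concept}` placeholder in the template string before the text is sent to the model. A rough plain-Python equivalent using `str.format`, shown only as an illustration of the substitution step and not as LangChain's implementation:

```python
# Same template text as in the cell above.
template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""

# Substitute the placeholder, much as prompt.format(concept=...) does.
filled = template.format(concept="autoencoder")
print(filled)
```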

In [7]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))


An autoencoder is a type of artificial neural network used to learn efficient data coding in an unsupervised manner. It consists of an encoder and a decoder which maps an input to a compressed representation and then reconstructs the input from this representation. Autoencoders are used for dimensionality reduction, feature learning, and generative modeling.


In [8]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [30]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)

> Entering new SimpleSequentialChain chain...

An autoencoder is a type of artificial neural network that is used to learn a compressed representation of input data. It works by learning a series of weights that map an input vector to a hidden representation, and then from the hidden representation back to the original input vector, thus reconstructing the input vector from the hidden representation.

An autoencoder is a type of machine learning, which is a way for computers to learn by themselves. It is used to learn how to compress data, which means to take a lot of information and make it smaller. This is useful because it can help make computers work faster, as well as help make data easier to store. 

To do this, an autoencoder uses something called an artificial neural network. This is a type of program that can “think” like a human brain. It can take in a lot of information and figure out what is important and what is not. 

The autoencod
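
The piping behaviour of the sequential chain can be sketched without any LLM calls. In the sketch below, `run_sequential`, `describe`, and `simplify` are hypothetical stand-ins: the two functions play the role of `chain` and `chain_two`, and each step's output becomes the next step's input. This illustrates the idea only and is not how `SimpleSequentialChain` is implemented:

```python
def run_sequential(steps, first_input):
    """Feed each step's output into the next step and return the final output."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM chains; real chains would call the model.
def describe(concept):
    return f"A short description of {concept}."


def simplify(description):
    return f"Explained for a five-year-old: {description}"


result = run_sequential([describe, simplify], "autoencoder")
print(result)
```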

In [35]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])
texts

[Document(page_content='An autoencoder is a type of machine learning, which is a way for computers to learn by themselves.', metadata={}),
 Document(page_content='It is used to learn how to compress data, which means to take a lot of information and make it', metadata={}),
 Document(page_content='smaller. This is useful because it can help make computers work faster, as well as help make data', metadata={}),
 Document(page_content='easier to store.', metadata={}),
 Document(page_content='To do this, an autoencoder uses something called an artificial neural network. This is a type of', metadata={}),
 Document(page_content='program that can “think” like a human brain. It can take in a lot of information and figure out what', metadata={}),
 Document(page_content='is important and what is not.', metadata={}),
 Document(page_content='The autoencoder takes in a bunch of information, called an input vector. It then looks for patterns', metadata={}),
 Document(page_content='in the information.

In [36]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

'An autoencoder is a type of machine learning, which is a way for computers to learn by themselves.'
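
To make the effect of `chunk_size` concrete, here is a simplified sketch that greedily packs words into chunks of at most `chunk_size` characters with no overlap. The function `split_into_chunks` is hypothetical. The real `RecursiveCharacterTextSplitter` also tries paragraph and line separators before falling back to spaces, which is presumably why short chunks such as 'easier to store.' appear in the output above, so this sketch will not reproduce its output exactly:

```python
def split_into_chunks(text, chunk_size=100):
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole rather than cut.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = (
    "An autoencoder is a type of machine learning, which is a way for computers "
    "to learn by themselves. It is used to learn how to compress data."
)
for chunk in split_into_chunks(sample, chunk_size=100):
    print(len(chunk), repr(chunk))
```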

In [37]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model_name="ada")

In [38]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)

[-0.032400242618663484, 0.02961954813553015, 0.012451332853294165, 0.013707795025690132, 0.036355007029591484, 0.01485096925874316, -0.025499999235399546, -0.00615357414394351, 0.020062196773389693, 0.0035402362301041913, -0.042678512151226845, -0.014037358416159945, 0.015067245641192595, 0.03312116451437665, 0.0249850560885445, -0.01296627600014921, 0.04239014488305768, -0.011555330697432215, 0.021071485937272018, 0.0064728389694994415, 0.027106623790738214, 0.005278170514893168, 0.06335864144590064, -0.016910742777485895, 0.03748788269773063, 0.03396566918905637, 0.007399737285764314, 0.047745555100466235, -0.0015886005533402108, -0.009665488156381324, -0.027662763059893906, 0.00663762113039563, 0.02484087245445992, 0.021627625206427713, 0.044491115455423634, 0.02683885365239681, 0.03544870724013822, -0.011781906576118096, -0.031287964080352094, -0.013707795025690132, 0.004601019848370406, -0.014037358416159945, -0.009310177422304248, 0.023502019900107782, -0.01162742344579707, -0.02

In [39]:
# Import and initialize Pinecone client

import os
import pinecone
from langchain.vectorstores import Pinecone


pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),
    environment=os.getenv('PINECONE_ENV')
)
pinecone.list_indexes()

['example-index']

In [40]:
# Upload vectors to Pinecone

index_name = "example-index"
# Create the index first if it does not exist yet; its dimension must match the embedding size
# pinecone.create_index("example-index", dimension=1024)
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [41]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)

[Document(page_content='To do this, an autoencoder uses something called an artificial neural network. This is a type of', metadata={}), Document(page_content='An autoencoder is a type of machine learning, which is a way for computers to learn by themselves.', metadata={}), Document(page_content='is important and what is not.', metadata={}), Document(page_content='The autoencoder then takes the hidden representation and turns it back into the original input', metadata={})]
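
A vector similarity search ranks stored vectors by how close they are to the query vector. The sketch below uses cosine similarity on toy 2-D vectors. Both the metric and the helper names `cosine_similarity` and `top_k` are assumptions for illustration; the metric Pinecone actually uses depends on how the index was created:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D "embeddings"; real embeddings have hundreds or thousands of dimensions.
docs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k([1.0, 0.1], docs))
```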


In [42]:
# Import Python REPL tool and instantiate Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [49]:
# Execute the Python agent

agent_executor.run("Find the roots(zeros) of the quadratic function 3 * x**2 + 2*x -1")

> Entering new AgentExecutor chain...
I need to solve the equation 3 * x**2 + 2*x -1
Action: Python REPL
Action Input: import numpy as np
Observation:
Thought: I can use the numpy function to solve the equation
Action: Python REPL
Action Input: np.roots([3,2,-1])
Observation:
Thought: The output of the function is the roots of the equation
Final Answer: The roots of the equation 3 * x**2 + 2*x -1 are -1 and 0.33333

> Finished chain.


'The roots of the equation 3 * x**2 + 2*x -1 are -1 and 0.33333'
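
Both `Observation:` lines in the trace above are empty, so the final answer does not appear to come from printed tool output. It is worth checking independently. A minimal sketch using the quadratic formula with only the standard library; `quadratic_roots` is a hypothetical helper and assumes `a` is non-zero:

```python
import math


def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c, smallest first (assumes a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    sqrt_disc = math.sqrt(disc)
    return sorted([(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)])


roots = quadratic_roots(3, 2, -1)
print(roots)
```

The two values agree with the agent's answer of -1 and roughly 0.33333.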