<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [60]:
# Load environment variables

from dotenv import load_dotenv,find_dotenv
load_dotenv(find_dotenv())
# load_dotenv('.env')

True
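`load_dotenv(find_dotenv())` looks for a `.env` file by walking up the directory tree. It then copies the file's `KEY=VALUE` pairs into `os.environ`, and by default it does not override variables that are already set. The `True` above is its return value when variables were loaded. The sketch below is a deliberately simplified, hypothetical parser (`parse_env_lines` and `load_env_lines` are not part of python-dotenv) that only illustrates the idea. The real library also handles `export` prefixes, multiline values, and variable expansion.

```python
import os


def parse_env_lines(lines):
    """Hypothetical, simplified .env parser: plain KEY=VALUE lines only."""
    values = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and lines without an "="
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one layer of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_lines(lines, override=False):
    """Copy parsed values into os.environ, keeping existing ones unless override=True."""
    parsed = parse_env_lines(lines)
    for key, value in parsed.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return bool(parsed)


example = [
    "# API keys used in this notebook",
    "OPENAI_API_KEY=sk-placeholder",
    'PINECONE_ENV="example-env"',
]
print(parse_env_lines(example))  # {'OPENAI_API_KEY': 'sk-placeholder', 'PINECONE_ENV': 'example-env'}
```

Keeping existing variables by default means a key exported in the shell takes precedence over the one in `.env`.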

In [None]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")
llm("explain large language models in one sentence")

In [61]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [62]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data"),
]
response = chat(messages)

print(response.content)

Sure! Here's an example Python script that trains a simple neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(1000, 10)
y = np.random.randint(2, size=(1000, 1))

# Define the model
model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=32)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f'Loss: {loss}, Accuracy: {accuracy}')
```

In this script, we first generate simulated data using `numpy.random.rand` and `numpy.random.randint` functions. We create a neural network model using the `Sequential` class from Keras and add two dense layers. The first layer has 32 units and uses the ReLU a
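Each LangChain message type corresponds to a role in OpenAI's chat-completions format: `SystemMessage` maps to `"system"`, `HumanMessage` to `"user"`, and `AIMessage` to `"assistant"`. The stand-alone sketch below uses small hypothetical dataclasses (`SystemMsg`, `HumanMsg`, `AIMsg`) instead of the real `langchain.schema` classes. The different names keep them from shadowing the notebook's imports. It only shows the shape of the request payload and is not LangChain's own conversion code.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for langchain.schema's message classes
@dataclass
class SystemMsg:
    content: str


@dataclass
class HumanMsg:
    content: str


@dataclass
class AIMsg:
    content: str


# Role names used by OpenAI's chat-completions API
ROLE_BY_TYPE = {SystemMsg: "system", HumanMsg: "user", AIMsg: "assistant"}


def to_openai_messages(messages):
    """Convert message objects into the role/content dicts the chat API expects."""
    return [{"role": ROLE_BY_TYPE[type(m)], "content": m.content} for m in messages]


sketch_messages = [
    SystemMsg("You are an expert data scientist"),
    HumanMsg("Write a Python script that trains a neural network on simulated data"),
]
payload = {
    "model": "gpt-3.5-turbo",
    "temperature": 0.3,
    "messages": to_openai_messages(sketch_messages),
}
print(payload["messages"][0])  # {'role': 'system', 'content': 'You are an expert data scientist'}
```

To continue the conversation, the model's reply would be appended as an assistant message (an `AIMessage` in LangChain) before the next human message.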

In [63]:
# Import PromptTemplate and define a prompt template with one input variable

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)
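With its default f-string format, `PromptTemplate` fills `{concept}` using the placeholder syntax of Python's `str.format`. `input_variables` should list exactly the placeholders that appear in the template. The plain-Python sketch below uses `string.Formatter` to extract the placeholders and then renders a shortened copy of the template. The `template_variables` helper is hypothetical and not part of the LangChain API.

```python
from string import Formatter


def template_variables(template):
    """Return placeholder names in a str.format-style template, in order of first use."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


sketch_template = (
    "You are an expert data scientist with an expertise in building deep learning models.\n"
    "Explain the concept of {concept} in a couple of lines"
)

variables = template_variables(sketch_template)
print(variables)  # ['concept']

# Rendering is keyword substitution; omitting a variable raises KeyError
print(sketch_template.format(concept="autoencoder"))
```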

In [64]:
# Run LLM with PromptTemplate

llm(prompt.format(concept="autoencoder"))

'\nAutoencoders are a type of artificial neural network that are used to learn compact representations of data, typically for dimensionality reduction and feature extraction. They are composed of an encoder and a decoder, which work together to learn an efficient representation of the input data.'

In [65]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))


Autoencoders are a type of neural network that learn to encode data into a latent representation and then decode it back to its original form. They can be used for dimensionality reduction, data denoising, and feature learning.
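Conceptually, `LLMChain.run` formats the prompt with the input and passes the resulting string to the LLM. That is why `chain.run("autoencoder")` behaves like `llm(prompt.format(concept="autoencoder"))` from the earlier cell. The sketch below is hypothetical: `MiniChain` and `fake_llm` are illustrations, not LangChain classes, and a stub LLM lets the code run offline.

```python
from typing import Callable


class MiniChain:
    """Hypothetical minimal chain: fill a template, then call an LLM function."""

    def __init__(self, llm: Callable[[str], str], template: str, input_key: str):
        self.llm = llm
        self.template = template
        self.input_key = input_key

    def run(self, value: str) -> str:
        # Step 1: format the prompt with the single input variable
        prompt_text = self.template.format(**{self.input_key: value})
        # Step 2: send the formatted prompt to the model
        return self.llm(prompt_text)


def fake_llm(prompt_text: str) -> str:
    # Stub that echoes the prompt so the example runs without an API key
    return f"[completion for: {prompt_text}]"


mini_chain = MiniChain(fake_llm, "Explain the concept of {concept} in a couple of lines", "concept")
print(mini_chain.run("autoencoder"))
# [completion for: Explain the concept of autoencoder in a couple of lines]
```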


In [66]:
# Define a second prompt and a chain that uses it

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [67]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)



> Entering new SimpleSequentialChain chain...

An autoencoder is a type of artificial neural network that takes an input and attempts to reconstruct it at the output layer. It works by compressing the input into a hidden layer, and then reconstructing the input at the output layer by learning the weights of the model. Autoencoders are used for unsupervised learning, for feature extraction and for dimensionality reduction.


An autoencoder is like a magician. It takes an input, which can be anything from a picture to a number, and it attempts to make it disappear. It does this by compressing the input into a hidden layer, shrinking it down so that it’s much smaller. Then, the autoencoder tries to make the input reappear. It uses a special kind of smartness, called learning, to figure out how to do this. To learn, the autoencoder looks at the input and the output and changes the weights of the connections between them. 

Autoencoders are used when 
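`SimpleSequentialChain` expects every chain to take a single input and return a single output. It feeds each output directly into the next chain, so `chain_two` receives the first chain's whole description even though its prompt names the variable `ml_concept`. With `verbose=True`, the intermediate outputs are printed, as above. A minimal sketch of that piping, with hypothetical stub steps in place of the two LLM chains:

```python
from functools import reduce


def run_sequential(steps, first_input):
    """Hypothetical sketch: pipe one string through single-input, single-output steps."""
    return reduce(lambda text, step: step(text), steps, first_input)


# Stub steps standing in for `chain` and `chain_two`
def describe(concept):
    return f"Description of {concept}"


def explain_to_child(description):
    return f"ELI5 version of: {description}"


print(run_sequential([describe, explain_to_child], "autoencoder"))
# ELI5 version of: Description of autoencoder
```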

In [68]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])


In [69]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

'An autoencoder is like a magician. It takes an input, which can be anything from a picture to a'
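`RecursiveCharacterTextSplitter` tries separators from coarsest to finest. By default these are paragraph breaks, line breaks, spaces, and finally individual characters, and it keeps chunks at or under `chunk_size` characters. The function below is a simplified, hypothetical re-implementation that assumes `chunk_overlap=0`. LangChain's own merge logic differs in details such as overlap, whitespace handling, and the length function. On the sample sentence (taken from the explanation above), it produces the same first chunk as `texts[0]`.

```python
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive splitter (hypothetical; not LangChain's exact algorithm)."""
    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for piece in pieces:
        # Try to append the next piece to the chunk being built
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


sample = (
    "An autoencoder is like a magician. It takes an input, which can be anything "
    "from a picture to a number, and it attempts to make it disappear."
)
chunks = recursive_split(sample, chunk_size=100)
print(chunks[0])  # An autoencoder is like a magician. It takes an input, which can be anything from a picture to a
```

Because words are packed greedily, a chunk ends at the last whole word that fits. That is why `texts[0]` stops at "to a".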

In [70]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model_name="ada")

In [71]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)

[-0.020977719222853308, 0.023827294418455335, 0.014561124470314105, 0.011903542359068018, 0.025908899139278786, 0.005840617362116563, -0.004756869854187625, -0.001205006291921359, 0.036316920880750836, 0.010448440210788053, -0.03738803499767372, -0.019350834621859294, 0.01979544989152908, 0.05626394352467466, 0.04716955602924748, -0.027566098290751746, 0.049473465590938036, 0.004511826306493519, 0.006598482986735495, -0.0002805681496021442, 0.018077619776453027, -0.019451883123455794, 0.07667578927594239, -0.016278952722744967, 0.058810369490196815, 0.02825322996425313, 0.009458163032497174, 0.031810146533675844, -0.003834799948813084, -0.04438060993460335, -0.01785531307294073, 0.02237219227017537, 0.0337300717892997, 0.020088490546158924, 0.014561124470314105, 0.020149119647116825, 0.0337907008902576, 0.021826528498909085, -0.038984605048188785, -0.014662172971910603, 0.030355042522750685, -0.027060853920124063, 0.03435657249919799, 0.02066446886790416, -0.03649880818362453, -0.00017
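An embedding is just a list of floats, so comparing two chunks reduces to vector arithmetic. The `langchain-quickstart` index created further down uses `metric="cosine"`. The sketch below computes cosine similarity in plain Python on made-up 3-dimensional vectors. Real embeddings from `embed_query` have `len(query_result)` entries, and every vector in an index must have that same length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```

Cosine similarity ignores vector length, so `[1, 2]` and `[2, 4]` score (up to rounding) 1.0.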

In [72]:
# Import and initialize Pinecone client

import os
import pinecone
from langchain.vectorstores import Pinecone


pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),
    environment=os.getenv('PINECONE_ENV')
)


# Check that the variables were loaded without printing the secret key itself
print("api_key set: ", bool(os.getenv('PINECONE_API_KEY')))
print("environment: ", os.getenv('PINECONE_ENV'))

api_key set:  True
environment:  asia-southeast1-gcp-free
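`os.getenv` returns `None` for a missing variable, so a typo in `.env` would only surface later as an authentication error from Pinecone. A small fail-fast check makes the problem explicit, and masking keeps the key out of notebook output. The `require_env` and `mask` helpers below are hypothetical and not part of the pinecone client or python-dotenv.

```python
import os


def require_env(*names):
    """Return the values of the given environment variables, or raise if any is missing or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[name] for name in names]


def mask(secret, visible=4):
    """Hide all but the last few characters of a secret (hide everything if it is short)."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Usage, with PINECONE_API_KEY and PINECONE_ENV set:
# api_key, environment = require_env("PINECONE_API_KEY", "PINECONE_ENV")
# print("api_key:", mask(api_key), "environment:", environment)
```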


# Pinecone Quick Start 
https://docs.pinecone.io/docs/quickstart


In [73]:

pinecone.create_index("quickstart", dimension=8, metric="euclidean")



In [74]:
pinecone.list_indexes()


['quickstart']

In [75]:
index = pinecone.Index("quickstart")


In [76]:
# Upsert sample data (5 8-dimensional vectors)
index.upsert([
    ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
    ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]),
    ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]),
    ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]),
    ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
])

ForbiddenException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '0', 'date': 'Fri, 30 Jun 2023 22:15:09 GMT', 'server': 'envoy'})


In [79]:
index.describe_index_stats()


{'dimension': 8,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

In [80]:
index.query(
  vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
  top_k=3,
  include_values=True
)

{'matches': [], 'namespace': ''}
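Because the upsert failed with a 403, nothing was written to the index. That is most likely why `describe_index_stats()` reports `total_vector_count: 0` and the query returns no matches. For comparison, the sketch below computes locally what an exact top-3 Euclidean search over the five sample vectors would rank: C first, then B and D at (essentially) equal distance. It is a brute-force illustration only, not how Pinecone searches internally, and `brute_force_query` is a hypothetical helper.

```python
import math

# The same five sample vectors as in the upsert cell above
vectors = {
    "A": [0.1] * 8,
    "B": [0.2] * 8,
    "C": [0.3] * 8,
    "D": [0.4] * 8,
    "E": [0.5] * 8,
}


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_query(vectors, query, top_k=3):
    """Exact nearest neighbours by Euclidean distance (smaller = closer)."""
    ranked = sorted(vectors.items(), key=lambda item: euclidean(item[1], query))
    return [(vid, euclidean(vec, query)) for vid, vec in ranked[:top_k]]


for vid, dist in brute_force_query(vectors, [0.3] * 8):
    print(vid, round(dist, 4))
```

Brute force compares the query with every stored vector. That is fine for five vectors, but vector databases rely on indexing to avoid it at scale.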

In [81]:
pinecone.delete_index("quickstart")


# Things that Didn't work

In [102]:
# If you are running the notebook again, delete the index first
# pinecone.delete_index("langchain-quickstart")

In [93]:
index_name = "langchain-quickstart"
# The index dimension must equal the length of the vectors produced by `embeddings`
DIMENSIONS = len(query_result)

pinecone.create_index(
    name=index_name,
    metric="cosine",
    dimension=DIMENSIONS,
)


In [94]:
# Upload vectors to Pinecone
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [95]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)

[Document(page_content='An autoencoder is like a magician. It takes an input, which can be anything from a picture to a', metadata={}), Document(page_content='learn, the autoencoder looks at the input and the output and changes the weights of the connections', metadata={}), Document(page_content='So, if you want to shrink a picture, you could ask an autoencoder to do it for you. First, the', metadata={}), Document(page_content='Autoencoders are used when you want to do something but you don’t have any instructions. This is', metadata={})]
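`Pinecone.from_documents` embeds every chunk and upserts the vectors. `similarity_search` then embeds the query and returns the closest chunks (four by default, matching the four `Document`s above). The offline sketch below mimics that flow with a hypothetical bag-of-words `toy_embed` in place of `OpenAIEmbeddings`, and `ToyVectorStore` is likewise illustrative. Its rankings will not match OpenAI's, since it only matches exact words ("magical" does not match "magician"). It just shows the embed, store, and rank mechanics.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts):
        # "Upsert": store each text alongside its vector
        self.rows = [(text, toy_embed(text)) for text in texts]

    def similarity_search(self, query, k=4):
        # Embed the query and rank stored texts by similarity, highest first
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "An autoencoder is like a magician.",
    "It compresses the input into a hidden layer.",
    "Then it tries to make the input reappear.",
])
print(store.similarity_search("What is magical about an autoencoder?", k=1))
# ['An autoencoder is like a magician.']
```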


In [100]:
# Import the Python REPL tool and create a Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [101]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) of the quadratic function 3*x**2 + 2*x - 1")



> Entering new AgentExecutor chain...
I need to solve a quadratic equation
Action: Python REPL
Action Input: import numpy as np
Observation: 
Thought: I can use the numpy function to solve the equation
Action: Python REPL
Action Input: np.roots([3,2,-1])
Observation: 
Thought: I now know the final answer
Final Answer: (-1.0, 0.3333333333333333)

> Finished chain.


'(-1.0, 0.3333333333333333)'
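Both Observation lines above are empty. The Python REPL tool appears to return only what the executed code prints, and `np.roots([3,2,-1])` was evaluated without `print`, so the agent's final answer was not read from a tool result. The answer happens to be correct, which is easy to confirm with the quadratic formula x = (-b ± sqrt(b^2 - 4ac)) / (2a). A plain-Python check follows; `quadratic_roots` is an illustrative helper and assumes a ≠ 0.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 (a != 0), sorted ascending; empty if none."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    sqrt_disc = math.sqrt(disc)
    # A set removes the duplicate when disc == 0 (double root)
    return sorted({(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)})


print(quadratic_roots(3, 2, -1))  # [-1.0, 0.3333333333333333]
```

For 3x^2 + 2x - 1 the discriminant is 4 + 12 = 16, giving (-2 ± 4) / 6, that is -1 and 1/3, matching the agent's answer.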