# LangChain Demo

Demonstrate the following in as few lines as possible:

* Large Language Model ("LLM") Wrappers
* Prompt Templates
* Chains - Take a prompt template and LLM and create an interface
  * including sequential chains
* Autonomous Agents

We start with OpenAI LLMs.

In [1]:
# Load environment
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True
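The cell above loads variables from a `.env` file into the process environment. Below is a minimal sketch of a sanity check, assuming the file defines `OPENAI_API_KEY` (the variable LangChain's OpenAI wrappers read by default) along with the `PINECONE_API_KEY` and `PINECONE_ENV` names used later in this notebook. `missing_keys` is a hypothetical helper, not part of `python-dotenv`.

```python
import os

# Names assumed to be needed by this notebook; adjust to your own setup.
REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV"]


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running a check like this before the first API call tends to give a clearer message than the authentication error raised later.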

## Large Language Model wrapper

In [2]:
from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")
llm("explain large language models in one sentence")

'\n\nLarge language models are pre-trained models that use a large body of text to learn the statistical properties of language, typically for use in natural language processing tasks.'

In [3]:
from langchain.schema import (
    AIMessage, 
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [4]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a python script that trains a neural network on simulated data")
]
response=chat(messages)

In [5]:
print(response.content, end='\n')

Sure, here's an example script that trains a simple neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate some simulated data
X = np.random.rand(1000, 10)
y = np.sum(X, axis=1)

# Define the neural network architecture
model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))
model.add(Dense(1, activation='linear'))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model on the simulated data
model.fit(X, y, epochs=100, batch_size=32, validation_split=0.2)
```

In this script, we first generate some simulated data `X` and `y`. `X` is a matrix of 1000 rows and 10 columns, where each element is a random number between 0 and 1. `y` is a vector of length 1000, where each element is the sum of the corresponding row in `X`.

We then define the neural network architecture using the `Sequential` class from Keras. Our networ

## Prompt Templates

In [6]:
from langchain import PromptTemplate

template = """
You are an expert data scientist with expertise in building deep learning models. Explain the concept of {concept} in a couple of lines.
"""


prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [7]:
prompt

PromptTemplate(input_variables=['concept'], output_parser=None, partial_variables={}, template='\nYou are an expert data scientist with expertise in building deep learning models. Explain the concept of {concept} in a couple of lines.\n', template_format='f-string', validate_template=True)
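The repr shows `template_format='f-string'`, meaning placeholders such as `{concept}` use Python's brace syntax. As a rough sketch of what `prompt.format(...)` produces, here is the same substitution done with plain `str.format`. It illustrates the idea only and is not LangChain's implementation; `format_prompt` is a hypothetical helper and the template is shortened for brevity.

```python
# Plain-Python stand-in for the prompt template above (shortened).
template = "Explain the concept of {concept} in a couple of lines."


def format_prompt(template: str, **variables: str) -> str:
    """Fill the {placeholders} in the template with the given values."""
    return template.format(**variables)


print(format_prompt(template, concept="regularization"))
# Explain the concept of regularization in a couple of lines.
```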

In [8]:
llm(prompt.format(concept="regularization"))

'\nRegularization is a technique used to reduce the complexity of a model by penalizing large weights. This helps to prevent overfitting and allow the model to generalize better to unseen data.'

In [9]:
llm(prompt.format(concept="autoencoder"))

'\nAn autoencoder is a type of artificial neural network that is used to learn efficient data representations, or compressions, in an unsupervised manner. It works by compressing the input data into a lower-dimensional representation, and then reconstructing the output data from the lower-dimensional representation. Autoencoders are useful for learning meaningful features from large and complex datasets.'

## Sequential Chains

In [10]:
from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("autoencoder"))


An autoencoder is a type of artificial neural network that is used to learn a compressed representation of data. It works by having an encoder that compresses the input data into a latent space and a decoder that reconstructs the original data from the latent space.


In [11]:
second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words."
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [12]:
from langchain.chains import SimpleSequentialChain

overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

explanation = overall_chain.run("autoencoder")
print(explanation)



> Entering new SimpleSequentialChain chain...

Autoencoders are a type of neural network that use an encoder-decoder architecture to compress data into a smaller representation, often called a latent variable, for efficient storage. The encoder compresses the data into a lower-dimensional representation, and the decoder reconstructs the original data from the compressed representation. Autoencoders can be used for dimensionality reduction, denoising, and feature learning.

Autoencoders are a type of computer program that can help make things simpler. Imagine that you have a big box full of stuff that you want to store in a smaller space. Autoencoders can help you do that.

First, the autoencoder takes all the stuff in the box and compresses it into a smaller representation. This smaller representation is like a code that can be used to store the data in a more efficient way.
 
Second, the autoencoder will take this code and use it to reconstruct
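
Conceptually, `SimpleSequentialChain` feeds each step's single output into the next step as its single input. The sketch below mimics that flow with ordinary functions. `fake_llm`, `make_step`, and `run_sequential` are made-up names for illustration; `fake_llm` just echoes its prompt so the example runs without an API key.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: echoes the prompt it received."""
    return f"<answer to: {prompt}>"


def make_step(template: str, var: str) -> Callable[[str], str]:
    """Build one step: format the template, then pass the prompt to the model."""
    def step(value: str) -> str:
        return fake_llm(template.format(**{var: value}))
    return step


def run_sequential(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Run the steps in order, feeding each output into the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


steps = [
    make_step("Explain the concept of {concept}.", "concept"),
    make_step("Explain {ml_concept} to a five-year-old.", "ml_concept"),
]
result = run_sequential(steps, "autoencoder")
print(result)
```

The second step never sees the original input "autoencoder", only the text produced by the first step, which matches the two colored blocks in the verbose output above.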

### Create embedding from LLM output

In [13]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])

In [14]:
texts

[Document(page_content='Autoencoders are a type of computer program that can help make things simpler. Imagine that you have', metadata={}),
 Document(page_content='a big box full of stuff that you want to store in a smaller space. Autoencoders can help you do', metadata={}),
 Document(page_content='that.', metadata={}),
 Document(page_content='First, the autoencoder takes all the stuff in the box and compresses it into a smaller', metadata={}),
 Document(page_content='representation. This smaller representation is like a code that can be used to store the data in a', metadata={}),
 Document(page_content='more efficient way.', metadata={}),
 Document(page_content='Second, the autoencoder will take this code and use it to reconstruct the original data. This means', metadata={}),
 Document(page_content='that the data can be stored in a much smaller space, and it can be easily retrieved later when', metadata={}),
 Document(page_content='needed.', metadata={}),
 Document(page_content='To m

In [15]:
texts[0].page_content

'Autoencoders are a type of computer program that can help make things simpler. Imagine that you have'
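
With `chunk_size=100` and `chunk_overlap=0`, each chunk holds at most 100 characters and the breaks fall on whitespace. The following is a simplified greedy word-packing sketch of that idea, not the actual `RecursiveCharacterTextSplitter` algorithm (which works through a list of separators recursively). `split_words` is a hypothetical helper.

```python
def split_words(text: str, chunk_size: int = 100) -> list:
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Autoencoders are a type of computer program that can help make things simpler. "
    "Imagine that you have a big box full of stuff that you want to store in a smaller space."
)
for chunk in split_words(sample):
    print(len(chunk), repr(chunk))
```

On this sample the first chunk comes out identical to `texts[0].page_content` above, because that sentence fragment is exactly 100 characters long.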

In [16]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="ada")

In [17]:
query_result = embeddings.embed_query(texts[0].page_content)
query_result

[-0.02284177392721176,
 -0.0026856556069105864,
 0.0054192692041397095,
 -0.004466957878321409,
 0.003641392569988966,
 0.027829419821500778,
 -0.022923987358808517,
 -0.039901167154312134,
 -0.012893887236714363,
 -0.016566108912229538,
 0.005569994915276766,
 0.020375356078147888,
 -0.012160812504589558,
 -0.0008469749009236693,
 -0.0007146618445403874,
 0.014195607975125313,
 0.00857765506953001,
 0.02189631387591362,
 -0.010530236177146435,
 -0.02882969006896019,
 -0.023321354761719704,
 0.012818524613976479,
 0.0038435019087046385,
 -0.015661755576729774,
 0.00821454357355833,
 0.0068922690115869045,
 0.01037265919148922,
 -0.03836650773882866,
 -0.017813021317124367,
 -0.009509412571787834,
 0.020937150344252586,
 -0.02481490932404995,
 -0.022978797554969788,
 -0.031460534781217575,
 -0.03951750323176384,
 -0.008824296295642853,
 0.006375006400048733,
 -0.02310211770236492,
 0.029185950756072998,
 0.0026154310908168554,
 0.039654526859521866,
 -0.006186599377542734,
 -0.006063278

### Store vector representations in Pinecone

In [18]:
import os
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(
    api_key=os.environ.get("PINECONE_API_KEY"),
    environment=os.environ.get("PINECONE_ENV")
)




In [19]:
index_name = "langchain-demo01"


# if index_name not in pinecone.list_indexes():
#     pinecone.create_index(index_name, dimension=1024, metric='cosine')

In [20]:
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [21]:
query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

In [22]:
result

[Document(page_content='eyes, and the pattern of its stripes.', metadata={}),
 Document(page_content='needed.', metadata={}),
 Document(page_content='that.', metadata={}),
 Document(page_content='the number of dimensions in a dataset. This is useful for finding patterns in data that would be', metadata={})]
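
`similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. Below is a minimal sketch of that ranking, assuming the index uses the cosine metric as in the commented-out `create_index` call. The three-dimensional vectors are invented toy values (real embeddings have far more dimensions), and `top_k` is a hypothetical helper rather than a Pinecone or LangChain call.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=2):
    """Return the k document texts whose vectors are most similar to the query."""
    scored = sorted(
        docs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# Toy "index": text -> made-up embedding vector.
toy_index = {
    "compresses the input": [0.9, 0.1, 0.0],
    "reconstructs the data": [0.7, 0.6, 0.1],
    "that.": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, toy_index))
```

Very short chunks such as `'that.'` carry little meaning of their own, which may be one reason the results above look unrelated to the query; a larger `chunk_size` could be worth trying.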

## Agents

In [23]:
from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonAstREPLTool
from langchain.python import PythonREPL
from langchain.llms.openai import OpenAI

In [24]:
agent_executor = create_python_agent(
    llm = OpenAI(temperature=0, max_tokens=1000),
    tool = PythonAstREPLTool(),
    verbose=True
)

In [25]:
agent_executor.run("Find the roots (zeros) of the polynomial equation x^3 - 2x^2 + 3x - 4 = 0")



> Entering new AgentExecutor chain...
I need to solve a polynomial equation
Action: python_repl_ast
Action Input: from sympy import Symbol, solve
Observation: 
Thought: I can use the solve function to find the roots
Action: python_repl_ast
Action Input: x = Symbol('x')
solve(x**3 - 2*x**2 + 3*x - 4, x)
Observation: [2/3 + (-1/2 - sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3) - 5/(9*(-1/2 - sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3)), 2/3 - 5/(9*(-1/2 + sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3)) + (-1/2 + sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3), -5/(9*(35/27 + 5*sqrt(6)/9)**(1/3)) + 2/3 + (35/27 + 5*sqrt(6)/9)**(1/3)]
Thought: I now know the final answer
Final Answer: 2/3 + (-1/2 - sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3) - 5/(9*(-1/2 - sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3)), 2/3 - 5/(9*(-1/2 + sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3)) + (-1/2 + sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3), -5/(9

'2/3 + (-1/2 - sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3) - 5/(9*(-1/2 - sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3)), 2/3 - 5/(9*(-1/2 + sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3)) + (-1/2 + sqrt(3)*I/2)*(35/27 + 5*sqrt(6)/9)**(1/3), -5/(9*(35/27 + 5*sqrt(6)/9)**(1/3)) + 2/3 + (35/27 + 5*sqrt(6)/9)**(1/3)'
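
The closed-form roots returned by SymPy are hard to read. As a sanity check, the sketch below evaluates the three expressions from the agent's answer numerically, using only the standard library, and prints how close each one comes to making the polynomial vanish. The names `p`, `c`, `w1`, and `w2` are introduced here for illustration.

```python
import math


def p(x):
    """The polynomial from the question: x^3 - 2x^2 + 3x - 4."""
    return x**3 - 2 * x**2 + 3 * x - 4


# Shared sub-expressions from the agent's answer.
c = (35 / 27 + 5 * math.sqrt(6) / 9) ** (1 / 3)   # real cube root
w1 = complex(-0.5, -math.sqrt(3) / 2)             # -1/2 - sqrt(3)*I/2
w2 = complex(-0.5, math.sqrt(3) / 2)              # -1/2 + sqrt(3)*I/2

roots = [
    2 / 3 + w1 * c - 5 / (9 * w1 * c),
    2 / 3 - 5 / (9 * w2 * c) + w2 * c,
    -5 / (9 * c) + 2 / 3 + c,
]

for r in roots:
    # abs(p(r)) should be zero up to floating-point error.
    print(r, abs(p(r)))
```

The first two roots are a complex-conjugate pair and the third is the single real root, roughly 1.65.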