<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Import dependencies

import os
import sys
import json
import dotenv
import langchain

In [None]:
# Load environment variables

from dotenv import load_dotenv,find_dotenv
load_dotenv(find_dotenv())

True

In [17]:
# Run basic query with OpenAI wrapper

from langchain_openai import OpenAI

openai_api_key = os.getenv("OPENAI_API_KEY")
llm = OpenAI(openai_api_key=openai_api_key, model="gpt-3.5-turbo-instruct", temperature=0.3)
prompt = "explain large language models in one sentence"
llm.invoke(prompt)

'\n\nLarge language models are advanced artificial intelligence systems that are trained on vast amounts of text data and can generate human-like language and perform various natural language processing tasks.'

In [20]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain_openai import ChatOpenAI

In [21]:
# Query a chat model such as GPT-3.5-turbo or GPT-4

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data")
]
response = chat.invoke(messages)

print(response.content, end='\n')

Sure, here is an example Python script that trains a simple neural network on simulated data using the popular deep learning library TensorFlow:

```python
import numpy as np
import tensorflow as tf

# Generate simulated data
np.random.seed(0)
X = np.random.rand(100, 2)
y = np.random.randint(0, 2, 100)

# Define the neural network architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=50, batch_size=32)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f'Loss: {loss}, Accuracy: {accuracy}')
```

In this script, we first generate some simulated data with 2 features and binary labels. We then define a simple neural network with 2 hidden layers and train it on the simulated data for 50 epochs. Finally, we evaluate th

In [22]:
# Import prompt and define PromptTemplate

from langchain.prompts import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [23]:
# Run LLM with PromptTemplate

llm.invoke(prompt.format(concept="autoencoder"))

'\nAn autoencoder is a type of neural network that is used for unsupervised learning. It is designed to learn a compressed representation of the input data by encoding it into a lower-dimensional latent space and then reconstructing the original input from this compressed representation. This allows for dimensionality reduction and can also be used for tasks such as data denoising and anomaly detection.'
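
To inspect the exact text the model receives, the substitution step can be reproduced in plain Python. This is a minimal sketch that assumes the template uses f-string-style `{placeholders}`; `render_prompt` is a hypothetical helper for illustration and not part of LangChain.

```python
# Plain-Python sketch of the substitution step; no LangChain or API call involved.
# render_prompt is a hypothetical helper, not a LangChain function.
template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""

def render_prompt(template: str, **variables: str) -> str:
    """Fill the {placeholders} in the template with the given variables."""
    return template.format(**variables)

rendered = render_prompt(template, concept="autoencoder")
print(rendered)
```

Printing the rendered prompt before calling the model is a cheap way to catch a misspelled or missing input variable.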

In [24]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))


An autoencoder is a type of neural network that is trained to reconstruct its input data. It consists of an encoder, which compresses the input data into a lower-dimensional representation, and a decoder, which reconstructs the original input from the compressed representation. This allows the model to learn efficient representations of the data and can be used for tasks such as data compression, denoising, and anomaly detection.


In [25]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [28]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)



> Entering new SimpleSequentialChain chain...

An autoencoder is a type of neural network that is trained to reconstruct its input data. It consists of an encoder that compresses the input data into a lower-dimensional representation, and a decoder that reconstructs the original data from the compressed representation. This technique is often used for dimensionality reduction and feature extraction in unsupervised learning tasks.

An autoencoder is like a magic box that can take a big picture and make it smaller, but still keep all the important parts. Then, it can take the smaller picture and make it big again, just like the original! 

Imagine you have a big puzzle with lots of pieces. You want to make the puzzle smaller, but you don't want to lose any pieces. So, you put the puzzle in a special box that can shrink it down. But, this box is also special because it remembers how to put the puzzle back together. So, when you want to see the big 
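
The piping behaviour of the sequential chain (each step receives the previous step's output as its only input) can be sketched without any API calls. The two functions below are hypothetical stand-ins for the LLM-backed chains, and `run_sequential` is an illustrative name, not LangChain's implementation.

```python
from typing import Callable, List

# A "step" maps one input string to one output string, like a single-input chain.
Step = Callable[[str], str]

def run_sequential(steps: List[Step], first_input: str) -> str:
    """Feed the output of each step into the next one and return the last output."""
    value = first_input
    for step in steps:
        value = step(value)
    return value

# Hypothetical stand-ins for the two LLM-backed chains (no model is called).
def describe(concept: str) -> str:
    return f"A technical description of {concept}."

def simplify(description: str) -> str:
    return f"Explained for a five-year-old: {description}"

result = run_sequential([describe, simplify], "autoencoder")
print(result)
```

Only the first input is supplied by the caller; everything after that is determined by the order of the steps.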

In [29]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])


In [32]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

'An autoencoder is like a magic box that can take a big picture and make it smaller, but still keep'
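
The idea behind `chunk_size=100` with no overlap can be illustrated with a much simpler splitter. This is a sketch under stated assumptions: it only breaks on whitespace and packs words greedily, whereas RecursiveCharacterTextSplitter tries a list of separators in turn. `split_on_spaces` is a hypothetical helper.

```python
from typing import List

def split_on_spaces(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole as its own (oversized) chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

sample = (
    "An autoencoder is like a magic box that can take a big picture and make it "
    "smaller, but still keep all the important parts. Then, it can take the "
    "smaller picture and make it big again, just like the original!"
)
chunks = split_on_spaces(sample, chunk_size=100)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

With zero overlap, joining the chunks with single spaces gives back the original words in order, which is a handy sanity check.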

In [40]:
# Import the Python REPL tool and instantiate a Python agent

from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool
from langchain_openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [41]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x -1")



> Entering new AgentExecutor chain...


Python REPL can execute arbitrary code. Use with caution.


I can use the quadratic formula to find the roots of a quadratic function.
Action: Python_REPL
Action Input: from math import sqrt
Observation: 
Thought: I need to import the sqrt function from the math library to use it.
Action: Python_REPL
Action Input: sqrt(2**2 - 4*3*-1)
Observation: 
Thought: I can use the quadratic formula to find the roots of a quadratic function.
Action: Python_REPL
Action Input: (-2 + sqrt(2**2 - 4*3*-1)) / (2*3)
Observation: 
Thought: I can use the quadratic formula to find the roots of a quadratic function.
Action: Python_REPL
Action Input: (-2 - sqrt(2**2 - 4*3*-1)) / (2*3)
Observation: 
Thought: I now know the final answer
Final Answer: 0.3333333333333333, -1.0

> Finished chain.


'0.3333333333333333, -1.0'
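
The agent's answer can be checked directly. The sketch below applies the same quadratic formula the agent used in its actions; `quadratic_roots` is a hypothetical helper written for this check.

```python
from math import sqrt
from typing import Tuple

def quadratic_roots(a: float, b: float, c: float) -> Tuple[float, float]:
    """Return the real roots of a*x**2 + b*x + c, the '+' branch first."""
    discriminant = b**2 - 4*a*c
    if discriminant < 0:
        raise ValueError("no real roots")
    root = sqrt(discriminant)
    return (-b + root) / (2*a), (-b - root) / (2*a)

# 3*x**2 + 2*x - 1: discriminant = 2**2 - 4*3*(-1) = 16
roots = quadratic_roots(3, 2, -1)
print(roots)  # (0.3333333333333333, -1.0)
```

Both values agree with the agent's final answer.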

In [None]:
# Import and instantiate OpenAI embeddings (the cells from here on are less essential)

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
#embeddings = OpenAIEmbeddings(model_name="ada")

In [36]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)

[-0.006759706884622574, -0.0032328299712389708, 0.01926516555249691, 0.016872229054570198, 0.001901167561300099, 0.021576983854174614, -0.0074153984896838665, -0.03509639948606491, -0.00507315993309021, -0.054996974766254425, 0.016804631799459457, 0.02598431333899498, 0.017480602487921715, -0.010599220171570778, 0.0076519884169101715, -0.005843766499310732, 0.0037550171837210655, 0.027687760069966316, -0.0030029998160898685, -0.022388149052858353, -0.00995704811066389, 0.012647411786019802, -0.02036023698747158, -0.011930882930755615, 0.011302229948341846, -0.002889774739742279, 0.013221986591815948, -0.023888804018497467, -0.0009142503840848804, -0.01032207254320383, 0.01620977744460106, -0.015750117599964142, -0.03606979548931122, -0.04128829017281532, -0.03509639948606491, -0.009436550550162792, -0.000538241642061621, -0.020617106929421425, 0.03469081595540047, 0.008307679556310177, 0.026903633028268814, 0.013086793012917042, -0.017696913331747055, -0.012944838963449001, 0.023699533

In [None]:
# Import and initialize Pinecone client
# Note: pinecone.init() belongs to the pinecone-client 2.x API and was removed in 3.0

import os
import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),
    environment=os.getenv('PINECONE_ENV')
)

In [None]:
# Upload vectors to Pinecone

index_name = "langchain-quickstart"
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [None]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)
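
What a vector similarity search does can be sketched in a few lines of plain Python. This assumes cosine similarity as the metric (the metric actually used depends on how the index is configured) and uses made-up 3-dimensional vectors in place of real embeddings; `cosine_similarity` and `top_k` are hypothetical helpers.

```python
from math import sqrt
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors standing in for embedded text chunks (values are made up).
stored = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query_vec = [1.0, 0.05, 0.0]

best = top_k(query_vec, stored, k=2)
print(best)
```

A vector database performs the same ranking, typically with approximate nearest-neighbour indexing rather than an exhaustive scan.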