
In [26]:
# Load environment variables

import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [27]:
# Instantiate a completion-style LLM from an Azure OpenAI deployment and send it a prompt

from langchain.llms import AzureOpenAI
llm = AzureOpenAI(deployment_name=os.getenv('AOAI_DEPL_TEXT_DAVINCI_003'))
llm("Tell me a joke")

'\n\nQ: What did the fish say when it hit the wall?\nA: Dam!'

In [26]:
# Import the chat message schema and AzureChatOpenAI in order to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import AzureChatOpenAI

In [27]:
chat = AzureChatOpenAI(deployment_name=os.getenv('AOAI_DEPL_GPT_35_TURBO'), temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data ")
]
response = chat(messages)

print(response.content)

Sure, here's an example Python script that trains a simple neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
X = np.random.rand(1000, 10)
y = np.random.randint(2, size=(1000, 1))

# Define the model architecture
model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
```

In this script, we first generate a simulated dataset with 1000 samples and 10 features. The target variable is binary, with values of either 0 or 1. 

We then define a simple neural network with one hidden layer of 32 neurons and an output layer with one neuron. We use the ReLU activation function for the hidden layer and the sigmoid activation 

In [28]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [29]:
# Run LLM with PromptTemplate

llm(prompt.format(concept="autoencoder"))

'\nAn autoencoder is a type of artificial neural network used for unsupervised learning. It is trained to reproduce its input at its output, and typically uses some form of dimensionality reduction in the intermediate representation. This enables the model to learn a compressed representation of the data which can then be used for various tasks such as anomaly detection.'
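
For intuition, filling a prompt template amounts to named string substitution. The sketch below reproduces the prompt text with Python's built-in `str.format` rather than LangChain; the helper name `render_prompt` is hypothetical and only illustrates the idea.

```python
# Minimal sketch: what filling a prompt template amounts to, using only str.format.
# `render_prompt` is a hypothetical helper, not part of LangChain.
TEMPLATE = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""


def render_prompt(template: str, **variables: str) -> str:
    """Substitute named placeholders; raises KeyError if a variable is missing."""
    return template.format(**variables)


rendered = render_prompt(TEMPLATE, concept="autoencoder")
print(rendered)
```

The rendered string is what gets sent to the model; declaring the input variables up front mainly lets a missing value fail early.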

In [None]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))

In [None]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [None]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)
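
Conceptually, a simple sequential chain passes each step's output to the next step as its only input. The sketch below shows that data flow with plain functions; `run_sequential`, `describe`, and `simplify` are hypothetical stand-ins and no model is called.

```python
from typing import Callable, List


def run_sequential(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Feed each step's output to the next step and return the final output."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM chains; they only build strings.
def describe(concept: str) -> str:
    return f"description of {concept}"


def simplify(description: str) -> str:
    return f"simple explanation of: {description}"


result = run_sequential([describe, simplify], "autoencoder")
print(result)
```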

In [31]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
)

texts = text_splitter.create_documents([explanation])
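
To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width sliding window. It is not the recursive, separator-aware algorithm that `RecursiveCharacterTextSplitter` uses; the function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "abcdefghij" * 25  # 250 characters of toy text
chunks = sliding_window_chunks(sample, chunk_size=100, chunk_overlap=20)
print([len(c) for c in chunks])
```

The overlap keeps context that straddles a chunk boundary available in both neighbouring chunks, at the cost of some duplicated text.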


In [32]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

"An autoencoder is a special type of computer program. It is like a robot that can look at data and learn how to compress it and remember it. It can do this without any help from humans.\n\nIn order to understand how an autoencoder works, let's imagine a robot that has a big box of Lego blocks. The robot has to figure out a way to put all the Lego blocks into the smallest possible box. To do this, it has to first figure out how to arrange the blocks so that they fit in the box. This process is called “encoding”. \n\nWhen the robot has arranged the blocks, the box is filled up and the robot has “compressed” the data. The robot then has to figure out how to take the blocks out of the box and put them back together to make the same thing as before. This process is called “decoding”."

In [29]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model=os.getenv('AOAI_DEPL_TEXT_EMBEDDING_ADA_002'), chunk_size=1)

In [33]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)
print("Embedding dimensions:", len(query_result))

[-0.029727820307016373, -0.012920662760734558, -0.0008087478927336633, -0.022237952798604965, 0.004623255226761103, 0.0015515412669628859, -0.0016110612777993083, -0.012154946103692055, -0.015520238317549229, -0.03943118453025818, 0.005121936090290546, 0.033588577061891556, -0.0100443996489048, -0.0067241499200463295, 0.002136284951120615, 0.001475934754125774, 0.012586063705384731, 0.00269126845523715, 0.013963066972792149, -0.01437488105148077, -0.015147032216191292, 0.027025289833545685, 0.001989897806197405, -0.04105270281434059, -0.015134163200855255, -0.015430154278874397, 0.0304227564483881, -0.03801557421684265, 0.018750404939055443, -0.016794288530945778, 0.01921369507908821, -0.023524872958660126, -0.004938550293445587, -0.030319802463054657, -0.030602924525737762, -0.01369281392544508, 0.012997877784073353, -0.020114537328481674, 0.02602149359881878, 0.009182164445519447, 0.021903354674577713, 0.023859471082687378, -0.010327521711587906, -0.0037642368115484715, -0.0019705940

In [30]:
# Connect to an Azure Search index and use it as the vector store

from langchain.vectorstores.azuresearch import AzureSearch

vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.getenv('SEARCH_API_BASE'),
    azure_search_key=os.getenv('SEARCH_API_KEY'),
    index_name=os.getenv('SEARCH_API_INDEX_NAME'),
    embedding_function=embeddings.embed_query,
)


In [31]:
# Load a text file, split it into chunks, and add the chunks to the vector store

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader("./sharks.txt", encoding="utf-8")

documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# add_documents returns the IDs of the indexed chunks
vector_store.add_documents(documents=docs)

['MWYyNjA2YjQtNmE5ZC00NjczLTljMWEtNDgwZGQzZmQ1NjQ1',
 'Njc2MjgyYTYtOTc0Ni00MDEwLWJlNWEtNTBjNjAyOTk3Y2Y2',
 'OTBjZTY4Y2UtZTQzZi00Yjc0LWE1ZTgtZDRjMWYyMzk4YmQ4',
 'YTY4ZjFiMzQtNmQxMi00ZDFjLWFhODUtZDgxN2MxNGQ2ZThl',
 'MDE1YzE1MDgtNTljYS00M2JjLThmMTYtNDkyOGFiYTk3ODhl',
 'Zjk1MWFmMTAtZTgwZC00YmQ5LThmZDYtZTIwMWY1OWIyODFj',
 'NGI1ODBmMzItMjg0My00NjhiLThlNGQtMjE0YzdmNWI0N2I4']

In [38]:
# Do a simple vector similarity search

docs = vector_store.similarity_search(
    query="I'm hungry. What do you suggest to eat?",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)


Several species are apex predators, which are organisms that are at the top of their food chain. 
Select examples include the tiger shark, blue shark, great white shark, mako shark, thresher shark, and hammerhead shark.
Sharks are caught by humans for shark meat or shark fin soup. 
Many shark populations are threatened by human activities.
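
A similarity search ranks stored chunks by how close their embedding vectors are to the query's embedding. The sketch below illustrates this with cosine similarity on toy three-dimensional vectors; the vectors, labels, and the `top_k` helper are made up for illustration, and the scoring the search service actually applies depends on how the index is configured.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values, not produced by a real model).
toy_index = {
    "food": [0.9, 0.1, 0.0],
    "habitat": [0.1, 0.9, 0.0],
    "anatomy": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

ranked = top_k(query_vector, toy_index, k=2)
print([name for name, _ in ranked])
```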


In [None]:
# Import Python REPL tool and instantiate Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms.openai import AzureOpenAI

agent_executor = create_python_agent(
    llm=AzureOpenAI(temperature=0, max_tokens=1000, deployment_name=os.getenv('AOAI_DEPL_TEXT_DAVINCI_003')),
    tool=PythonREPLTool(),
    verbose=True
)

In [None]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x -1")
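
Since the agent writes and runs its own code, its answer can vary between runs. An independent check with the quadratic formula is a quick way to validate it; the helper name `quadratic_roots` below is hypothetical.

```python
import cmath
from typing import Tuple


def quadratic_roots(a: float, b: float, c: float) -> Tuple[complex, complex]:
    """Roots of a*x**2 + b*x + c via the quadratic formula (complex-safe)."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


# 3 * x**2 + 2*x - 1: the discriminant is 2**2 - 4*3*(-1) = 16
r1, r2 = quadratic_roots(3, 2, -1)
print(r1.real, r2.real)
```

For this function the roots are 1/3 and -1, so the agent's final answer should match those values.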