# 0. Installing libraries

In [1]:
# !pip install python-dotenv

In [2]:
# !pip install -U langchain langchain-core

In [3]:
# !pip install pinecone-client

In [4]:
# !pip install -U langchain-openai

In [5]:
# !pip install sentence-transformers

In [6]:
# !pip install -U langchain-huggingface

In [7]:
# !pip install -U langchain_experimental

# 1. Importing libraries

In [8]:
import os
import time
from dotenv import load_dotenv
from pinecone import Pinecone as PineconeClient, ServerlessSpec

from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
# PromptTemplate is no longer exported from the top-level langchain package
from langchain_core.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool
from langchain_experimental.utilities.python import PythonREPL
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

In [9]:
start = time.time()

# 2. Using LangChain with Together.ai

In [10]:
# Load from .env file
load_dotenv()

# Create the LangChain ChatOpenAI instance using Together's endpoint
llm = ChatOpenAI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",  # You can use other Together-supported models too
    openai_api_base="https://api.together.xyz/v1",
    openai_api_key=os.environ["TOGETHERAI_API_KEY"], # Use the API key from the .env file
    temperature=0.7
)

# Send a message
response = llm.invoke([HumanMessage(content="Explain large language models in one sentence.")])

# Print the result
print(response.content)

 Large language models are artificial intelligence models that have been trained on vast amounts of text data and can generate human-like text based on the input they receive.
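
The cell above reads `TOGETHERAI_API_KEY` with `os.environ[...]`, which raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier lookup follows; `require_env` and `DEMO_API_KEY` are hypothetical names used only for illustration and are not part of python-dotenv or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value


# Demo with a made-up variable so no real credential is needed
os.environ["DEMO_API_KEY"] = "not-a-real-key"
print(require_env("DEMO_API_KEY"))
```

In the notebook this could be called as `require_env("TOGETHERAI_API_KEY")` right after `load_dotenv()`.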


# 3. Priming

In [11]:
chat = ChatOpenAI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
    openai_api_base="https://api.together.xyz/v1",
    openai_api_key=os.environ["TOGETHERAI_API_KEY"],
    temperature=0.3
)

messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data")
]

response = chat.invoke(messages)

print(response.content)

 Sure, I'd be happy to help you with that! Here's an example Python script that trains a simple neural network using the Keras library on some simulated data. This script generates random input data and corresponding output labels, trains a neural network to map the inputs to the outputs, and then evaluates the network's performance.
```python
# Import necessary libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Set random seed for reproducibility
np.random.seed(42)

# Generate some simulated input data and corresponding output labels
# In this case, we'll generate 1000 data points with 10 input features each,
# and a simple linear relationship between the inputs and outputs.
# In practice, you would typically load your data from a file or database.
num_data_points = 1000
input_dim = 10
output_dim = 1

# Generate random input data
X = np.random.rand(num_data_points, input_dim)

# Generate corresponding output labels based on a simple line
```

# 4. Creating and using prompt templates

In [12]:
template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines.
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template
)

prompt

PromptTemplate(input_variables=['concept'], template='\nYou are an expert data scientist with an expertise in building deep learning models.\nExplain the concept of {concept} in a couple of lines.\n')

In [13]:
# Format the prompt and wrap in HumanMessage
formatted_prompt = prompt.format(concept="regularization")

response = llm.invoke([HumanMessage(content=formatted_prompt)])

print(response.content)

 Sure, I'd be happy to explain the concept of regularization in a few lines!

Regularization is a technique used in machine learning, including deep learning, to prevent overfitting of models to training data. It works by adding a penalty term to the loss function, which discourages the model from learning overly complex relationships between the features and the target variable. This penalty term typically takes the form of the L1 or L2 norm of the model's weights, which encourages the weights to be small and sparse, thereby reducing the model's capacity and preventing overfitting. Regularization can help improve the generalization performance of the model and lead to better performance on unseen data.
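
For a template with simple named placeholders like this one, `prompt.format(concept=...)` should produce the same string as Python's built-in `str.format`. A plain-Python sketch of that substitution, with no LangChain dependency:

```python
template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines.
"""

# Substitute the placeholder the same way the prompt template does for simple variables
formatted = template.format(concept="regularization")
print(formatted)
```

The template object adds value on top of this by declaring its input variables up front and by plugging into chains, as the next section shows.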


In [14]:
formatted_prompt = prompt.format(concept="autoencoder")
response = llm.invoke([HumanMessage(content=formatted_prompt)])

print(response.content)


 An autoencoder is a type of artificial neural network used for learning efficient codings of input data. It consists of two parts: an encoder, which maps the input data to a lower-dimensional representation, and a decoder, which maps this lower-dimensional representation back to the original data space. The goal of an autoencoder is to minimize the difference between the input and the reconstructed output, thereby learning a compact and informative representation of the input data. Autoencoders have many applications, including dimensionality reduction, anomaly detection, and generative modeling.


# 5. Chaining

In [15]:
# Compose the prompt and the model with the pipe operator (this builds a RunnableSequence)
chain = prompt | llm

# Run the chain
response = chain.invoke({"concept": "autoencoder"})

# Print the result
print(response.content)

 An autoencoder is a type of artificial neural network used for learning efficient codings of input data. It's comprised of two parts: an encoder, which maps the input to a lower-dimensional representation, and a decoder, which maps this representation back to the original dimension. The goal of an autoencoder is to minimize the difference between the input and the output, thereby learning a compact and informative representation of the input data. Autoencoders have found applications in various domains, including dimensionality reduction, anomaly detection, and generative modeling.
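
To make the `prompt | llm` syntax less opaque, here is a toy sketch of how a pipe operator can compose two steps. `Step`, `fill_prompt`, and `fake_llm` are hypothetical stand-ins written for this example; this is not LangChain's implementation and no model is called.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` returns a new Step that runs a, then feeds its output into b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the prompt template and the chat model
fill_prompt = Step(lambda inputs: f"Explain the concept of {inputs['concept']}.")
fake_llm = Step(lambda text: f"[model answer to: {text}]")

toy_chain = fill_prompt | fake_llm
print(toy_chain.invoke({"concept": "autoencoder"}))
```

The point of the design is that each step only needs an `invoke` method, so any number of steps can be chained left to right.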


In [16]:
second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Take the concept description of {ml_concept} and explain it to me like I'm five in 500 words"
)

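# Run the first prompt through the model, then pass its answer to the second prompt as {ml_concept}
chain_two = prompt | llm | (lambda message: {"ml_concept": message.content}) | second_prompt | llm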

response = chain_two.invoke({"concept": "autoencoder"})
explanation = response.content

print(explanation)

 Sure, I'd be happy to explain autoencoders in a simple way!

Imagine you have a big box of lego blocks of different colors and shapes. Now, your task is to build a small, exact replica of this box using only some of the lego blocks. You can only look at the big box and try to memorize the contents, but you can't open it or take out the blocks. Then, you go to a separate table and try to build the replica using the limited number of blocks you have. This is essentially what an autoencoder does!

In data science terms, an autoencoder is a type of neural network that is trained to copy its input to its output, but with a twist. The network has a narrow "bottleneck" in the middle that reduces the number of features or dimensions of the input data. This forces the network to learn a compact and efficient representation of the input data, just like how you had to build a small replica of the big box using fewer lego blocks.

The autoencoder consists of two parts: the encoder and the decoder

# 6. Document chunking

In [17]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap = 0
)

texts = text_splitter.create_documents([explanation])

texts

[Document(page_content="Sure, I'd be happy to explain autoencoders in a simple way!"),
 Document(page_content='Imagine you have a big box of lego blocks of different colors and shapes. Now, your task is to'),
 Document(page_content='build a small, exact replica of this box using only some of the lego blocks. You can only look at'),
 Document(page_content="the big box and try to memorize the contents, but you can't open it or take out the blocks. Then,"),
 Document(page_content='you go to a separate table and try to build the replica using the limited number of blocks you'),
 Document(page_content='have. This is essentially what an autoencoder does!'),
 Document(page_content='In data science terms, an autoencoder is a type of neural network that is trained to copy its input'),
 Document(page_content='to its output, but with a twist. The network has a narrow "bottleneck" in the middle that reduces'),
 Document(page_content='the number of features or dimensions of the input data. This for

In [18]:
texts[0].page_content

"Sure, I'd be happy to explain autoencoders in a simple way!"
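
The chunks above are cut at word boundaries and never exceed `chunk_size=100` characters. The sketch below reproduces only that word-level packing with `chunk_overlap=0`; it is a simplification, since `RecursiveCharacterTextSplitter` also tries paragraph and line separators before falling back to spaces. `split_into_chunks` is a hypothetical helper name.

```python
def split_into_chunks(text: str, chunk_size: int = 100) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of up to chunk_size characters.

    A single word longer than chunk_size is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # The next word would overflow the chunk, so close it and start a new one
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Imagine you have a big box of lego blocks of different colors and shapes. "
    "Now, your task is to build a small, exact replica of this box using only "
    "some of the lego blocks."
)
for chunk in split_into_chunks(sample):
    print(len(chunk), chunk)
```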

# 7. Creating embeddings

In [19]:
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

query_result = embeddings.embed_query(texts[0].page_content)
query_result

[-0.1283787339925766,
 -0.060270845890045166,
 -0.044977813959121704,
 0.007588156033307314,
 -0.035884466022253036,
 0.024218657985329628,
 -0.049276966601610184,
 -0.043137647211551666,
 -0.05208056420087814,
 -0.013494600541889668,
 0.039489977061748505,
 0.01161170657724142,
 -0.002269306220114231,
 0.028034480288624763,
 -0.06034579873085022,
 0.028864778578281403,
 -0.06871556490659714,
 0.07362587004899979,
 -0.12378135323524475,
 -0.0477871298789978,
 0.03414962440729141,
 -0.02835344523191452,
 -0.0286081675440073,
 0.0028957908507436514,
 0.031387828290462494,
 0.012274770997464657,
 0.011112072505056858,
 0.04263579845428467,
 0.07507694512605667,
 -0.03126320615410805,
 0.030870916321873665,
 0.024923639371991158,
 0.07557026296854019,
 0.10439079999923706,
 -0.044207628816366196,
 0.031903624534606934,
 -0.05027369037270546,
 0.056193914264440536,
 -0.045568469911813736,
 -0.03324221819639206,
 -0.012522927485406399,
 -0.01810121163725853,
 -0.00423078378662467,
 0.0074923
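
Each chunk becomes a vector like the one above (384 numbers for `all-MiniLM-L6-v2`), and the Pinecone index in the next section compares vectors with `metric="cosine"`. A minimal standard-library sketch of that measure, using tiny made-up vectors rather than real embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Tiny made-up vectors; real embeddings from this model have 384 dimensions
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Values near 1 mean the two texts point in nearly the same direction in embedding space; values near 0 mean they are unrelated.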

# 8. Storing embeddings in Pinecone

In [20]:
# Initialize Pinecone SDK client
pc = PineconeClient(
    api_key=os.environ.get("PINECONE_API_KEY")
)

# Create or use an index
index_name = "langchain-quickstart"

# Only if index doesn't already exist
# pc.create_index(
#     index_name,
#     dimension=384,
#     metric="cosine",
#     spec=ServerlessSpec(cloud="aws", region="us-east-1")
# )

# Now connect LangChain to Pinecone via vector store
vectorstore = Pinecone.from_documents(
    documents=texts,               # your LangChain Document[] list
    embedding=embeddings,
    index_name=index_name
)

# Do a search
query = "What is magical about an autoencoder?"

result = vectorstore.similarity_search(query)
result

[Document(page_content='idea is that the autoencoder learns to keep only the most important features of the input data in'),
 Document(page_content='An autoencoder is a type of machine learning model that can learn to compress and then reconstruct'),
 Document(page_content='One cool thing about autoencoders is that they can be used for things like image compression,'),
 Document(page_content='One cool thing about autoencoders is that they can be used for things like image compression,')]
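
The last two hits above are identical. One likely cause (an assumption; the notebook does not confirm it) is that `from_documents` was run more than once against the same index, inserting the same chunks again. A small sketch of filtering duplicates on the client side; `Doc` is a hypothetical stand-in for a LangChain `Document` and `dedupe_by_content` is a helper written for this example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document; only page_content is modelled."""
    page_content: str


def dedupe_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


hits = [
    Doc("One cool thing about autoencoders"),
    Doc("An autoencoder is a type of model"),
    Doc("One cool thing about autoencoders"),
]
print([d.page_content for d in dedupe_by_content(hits)])
```

Because the function only reads `page_content`, it should also work on the list returned by `similarity_search`.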

# 9. Creating and using an agent for reasoning

In [21]:
# Together.ai LLM setup
llm = ChatOpenAI(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    openai_api_key=os.getenv("TOGETHERAI_API_KEY"),
    openai_api_base="https://api.together.xyz/v1",
    temperature=0,
    max_tokens=1000
)

# Tool setup
tools = [PythonREPLTool()]

# Prompt setup
prompt = ChatPromptTemplate.from_messages([
    SystemMessage(content="You are a helpful assistant that uses Python for computation."),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create agent and executor
agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run task
result = agent_executor.invoke({
    "input": "Find the roots (zeroes) of the quadratic function 3 * x**2 + 2*x - 1"
})
print(result)

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


 To find the roots (zeroes) of the quadratic function3 * x**2 +2*x -1, we can use the quadratic formula:x = [-b ± sqrt(b**2 - 4ac)] / 2a

where a = 3, b = 2, and c = -1.

Substituting these values into the formula, we get:

x = [-2 ± sqrt(2**2 - 4(3)(-1))] / 2(3)

Simplifying, we get:

x = [-2 ± sqrt(4 + 12)] / 6

x = [-2 ± sqrt(16)] / 6

x = [-2 ± 4] / 6

So the two roots (zeroes) of the quadratic function are:

x = (-2 + 4) / 6 = 0

and

x = (-2 - 4) / 6 = -2/3

> Finished chain.
{'input': 'Find the roots (zeroes) of the quadratic function 3 * x**2 + 2*x - 1', 'output': ' To find the roots (zeroes) of the quadratic function3 * x**2 +2*x -1, we can use the quadratic formula:x = [-b ± sqrt(b**2 - 4ac)] / 2a\n\nwhere a = 3, b = 2, and c = -1.\n\nSubstituting these values into the formula, we get:\n\nx = [-2 ± sqrt(2**2 - 4(3)(-1))] / 2(3)\n\nSimplifying, we get:\n\nx = [-2 ± sqrt(4 + 12)] / 6\n\nx = [-2 ± sqrt(16)] / 6\n\nx = [-2 ± 4] / 6\n\nSo the two roots (ze
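
The model sets up the quadratic formula correctly but slips in the final arithmetic: (-2 + 4) / 6 is 1/3, not 0, and (-2 - 4) / 6 is -1, not -2/3. The verbose trace also appears to contain no Python REPL invocation, so the answer was written by the model rather than computed by the tool. A short standard-library check of the roots; `quadratic_roots` is a helper written for this example.

```python
import math


def quadratic_roots(a: float, b: float, c: float) -> tuple[float, float]:
    """Real roots of a*x**2 + b*x + c via the quadratic formula."""
    discriminant = b * b - 4 * a * c
    if a == 0 or discriminant < 0:
        raise ValueError("expects a != 0 and a non-negative discriminant")
    sqrt_disc = math.sqrt(discriminant)
    return (-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a)


roots = quadratic_roots(3, 2, -1)
print(roots)

# Plug each root back into the polynomial; both values should be close to zero
for x in roots:
    print(x, 3 * x**2 + 2 * x - 1)
```

Substituting the roots back into the polynomial is a cheap sanity check that is worth running on any numeric answer an agent returns.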

In [22]:
end = time.time()

# 10. Calculating code execution time

In [23]:
print(f"Total runtime: {round(end - start)} seconds")

Total runtime: 66 seconds
