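python-dotenv (imported as dotenv) is a library for reading key-value pairs from .env files
load_dotenv reads the .env file and loads its variables into the process environment (os.environ)

find_dotenv searches for the nearest .env file, starting in the current directory and moving up through the parent directories until it finds one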

In [None]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
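What these two helpers do can be pictured with a minimal pure-Python sketch. find_env_file and load_env_file are hypothetical names, not python-dotenv's API, and the real parser handles much more (e.g. `export` prefixes, escapes, variable interpolation). The sketch does copy one real default: existing environment variables are not overwritten unless override=True.

```python
import os
import tempfile
from pathlib import Path


def find_env_file(start: Path, filename: str = ".env"):
    """Return the first `filename` found in `start` or any parent directory, else None."""
    for directory in [start, *start.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


def load_env_file(path: Path, override: bool = False) -> dict:
    """Parse simple KEY=VALUE lines and copy them into os.environ."""
    parsed = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and lines without '='
        key, value = (part.strip() for part in line.split("=", 1))
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop one pair of matching surrounding quotes
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value  # keep existing variables unless override=True
    return parsed


# Demo: a .env in a temporary "project root", searched for from a nested subfolder
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".env").write_text('# demo values\nDEMO_API_KEY="abc123"\nDEMO_REGION=us-east-1\n')
    nested = root / "notebooks" / "week1"
    nested.mkdir(parents=True)
    found = find_env_file(nested)
    found_in_root = found == root / ".env"
    parsed = load_env_file(found)

print(found_in_root, parsed)
```

The search walks upward so a notebook in a subfolder still finds the project's .env at the repository root.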

### 1. LLM Wrappers

Note: temperature determines how random or deterministic the LLM's response is:
lower temperature -> the model picks the most likely next word (almost) every time -> focused, repeatable answers
higher temperature -> less likely words get picked more often -> more varied, creative responses
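A common way to picture this is a temperature-scaled softmax: the model's raw scores for each candidate word are divided by the temperature before being turned into probabilities. This is an illustration of the idea with made-up scores, not OpenAI's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw next-word scores (logits) into probabilities, scaled by temperature."""
    scaled = [z / temperature for z in logits]
    biggest = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words
logits = [2.0, 1.0, 0.1]
low = softmax_with_temperature(logits, 0.2)   # sharp: almost all probability on the top word
high = softmax_with_temperature(logits, 2.0)  # flatter: unlikely words get a real chance
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

As the temperature approaches 0 the distribution collapses onto the single most likely word, which is why temperature=0 gives near-deterministic output.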

In [None]:
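from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")  # gpt-4o-mini is a chat model, so use the chat wrapper
res = llm.invoke("explain large language models in one sentence")  # returns an AIMessage
print(res.content)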

Issue: OpenAI API billing isn't set up yet, so these calls don't work yet

Every time you call a chat model (e.g. ChatOpenAI), the response is automatically wrapped in an AIMessage object.
This is the standard container LangChain uses to store chat responses.
Basically the data of the response: the generated text (content) plus metadata such as token usage and the model name.

```python
AIMessage(
    content="Cats can make over 100 different sounds, while dogs only make about 10!",
    additional_kwargs={},
    response_metadata={
        'token_usage': {
            'completion_tokens': 24,
            'prompt_tokens': 15,
            'total_tokens': 39
        },
        'model_name': 'gpt-4o-mini'
    }
)
```

In [None]:
from langchain_core.messages import (
    AIMessage, # response from AI
    HumanMessage, # message from user
    SystemMessage # how the AI should behave - contextual prompt
)
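Under the hood, a chat wrapper turns these message objects into role/content pairs for the chat API: system messages become "system", human messages "user" and AI messages "assistant". The sketch below mimics that conversion with plain tuples; ROLE_FOR_TYPE and to_chat_payload are hypothetical helpers, not LangChain's internals.

```python
# Plain-Python stand-in for a chat history: (message type, text) pairs
ROLE_FOR_TYPE = {
    "SystemMessage": "system",   # how the AI should behave
    "HumanMessage": "user",      # what the user says
    "AIMessage": "assistant",    # what the model answered earlier
}


def to_chat_payload(history):
    """Convert (message type, text) pairs into role/content dicts like the OpenAI chat API uses."""
    return [{"role": ROLE_FOR_TYPE[kind], "content": text} for kind, text in history]


history = [
    ("SystemMessage", "You are an expert scientist"),
    ("HumanMessage", "What is a neural network?"),
    ("AIMessage", "A model made of layers of weighted connections."),
    ("HumanMessage", "Give me a simple example."),
]
payload = to_chat_payload(history)
for message in payload:
    print(message["role"], "->", message["content"])
```

Including earlier AIMessages in the list is how a conversation keeps its context: the model itself is stateless and sees the whole history on every call.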

In [None]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)  # chat model wrapper: takes a list of messages
messages = [ # input messages
    SystemMessage(content="You are an expert scientist"),
    HumanMessage(content="Write a python script that trains a neural network on simulated data.")
]
response = chat.invoke(messages)  # send to the model, get an AIMessage back
print(response.content)

### 2. Prompt Templates

In [None]:
from langchain_core.prompts import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines.
"""

prompt = PromptTemplate(
    input_variables=["concept"],  # placeholders the template expects
    template=template,  # the template string containing the {concept} placeholder
)
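PromptTemplate's default template format is "f-string", so prompt.format(concept=...) fills {concept} the same way Python's str.format does. A sketch of just that substitution step without LangChain, assuming the template contains no other curly braces:

```python
# Same template text as above, using a {concept} placeholder
template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines.
"""

filled = template.format(concept="regularization")  # plain str.format fills the placeholder
print(filled)
```

The template is reusable: only the value passed for concept changes between calls.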

In [None]:
prompt

In [None]:
llm.invoke(prompt.format(concept="regularization"))

### 3. Chains

A chain in LangChain is a pipeline that links prompts, LLMs, and other components (like retrievers or output parsers) together into a single callable workflow.

In this case it's kinda like a composite function f(g(x)):

- the outer function is the LLM 
- the inner function is the prompt template
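The f(g(x)) picture can be written out directly in plain Python. fake_llm is a hypothetical stand-in that just echoes the prompt; a real chain would call the model at that step.

```python
def prompt_template(concept: str) -> str:
    """g(x): fill the prompt template with the user's input."""
    return f"Explain the concept of {concept} in a couple of lines."


def fake_llm(prompt_text: str) -> str:
    """f(x): hypothetical stand-in for the LLM; a real one would call the model API."""
    return f"[model answer to: {prompt_text}]"


def compose(f, g):
    """Return the chained function x -> f(g(x))."""
    return lambda x: f(g(x))


toy_chain = compose(fake_llm, prompt_template)
print(toy_chain("overfitting"))
```

The caller only supplies the concept; the template step and the model step are hidden inside the composed function, just as chain.run("overfitting") does below.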

In [None]:
from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt) # create chain

# Run chain only specifying the concept
print(chain.run("overfitting"))

Or we can build a sequential chain:

output of chain 1 = input of chain 2

In [None]:
second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Take the concept description of {ml_concept} and explain it to me like I'm five, in 500 words."
)
chain2 = LLMChain(llm=llm, prompt=second_prompt)

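SimpleSequentialChain passes the output of the first chain (the concept description) as the input of the second chain and returns only the final output (the "explain like I'm five" version). With verbose=True, the intermediate output of the first chain is also printed.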

In [None]:
from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain2], verbose=True)

# Run the overall chain, only specifying the concept of chain 1
explanation = overall_chain.run("autoencoder")
print(explanation)
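The behaviour of SimpleSequentialChain can be sketched in plain Python: each step's output becomes the next step's input, intermediate values are only printed when verbose, and only the last value is returned. run_sequential, describe and eli5 are hypothetical stand-ins; the real class also checks that every chain has exactly one input and one output.

```python
def run_sequential(steps, first_input, verbose=False):
    """Feed each step's output into the next step and return only the final output."""
    value = first_input
    for i, step in enumerate(steps, start=1):
        value = step(value)
        if verbose:
            print(f"[step {i}] {value}")  # intermediate outputs are printed, not returned
    return value


def describe(concept: str) -> str:
    """Hypothetical stand-in for chain 1 (describe the concept)."""
    return f"An {concept} is a network that learns to compress and rebuild its input."


def eli5(description: str) -> str:
    """Hypothetical stand-in for chain 2 (explain it like I'm five)."""
    return f"Simple version of: {description}"


final = run_sequential([describe, eli5], "autoencoder", verbose=True)
```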

### 4. Embeddings & Vector Stores

1. Split text into chunks - Chunking

This is what a Document looks like:
- page_content -> the actual content
- metadata -> a dictionary of tags about the data (similar to a JSON object)

```python
from langchain_core.documents import Document

doc = Document(
    page_content="Paris is the capital of France.",
    metadata={"source": "wikipedia", "category": "geography"}
)
```

Default separators for RecursiveCharacterTextSplitter, tried in this order (paragraphs, then lines, then words, then single characters):
["\n\n", "\n", " ", ""]
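The idea behind the recursive splitter: split on the coarsest separator that occurs, merge small neighbouring pieces back together while they fit, and only fall back to the next, finer separator for pieces that are still too long. recursive_split is a hypothetical, simplified helper; LangChain's implementation additionally keeps the separators by default, supports chunk_overlap and measures length with a configurable function.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (separators are dropped, no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []

    # Use the first separator that occurs in the text; "" (or none left) means a hard cut.
    sep, finer = "", ()
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break
    if sep == "":
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for part in text.split(sep):
        if not part:
            continue
        if len(part) > chunk_size:
            # Too big on its own: flush the current chunk, then split this part with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(part, chunk_size, finer))
            continue
        joined = part if not current else current + sep + part
        if len(joined) <= chunk_size:
            current = joined  # greedily merge small neighbouring pieces
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks


sample_text = "Autoencoders compress data.\n\nThey learn a latent code.\n\nDecoders reconstruct it."
chunks = recursive_split(sample_text, chunk_size=30)
print(chunks)
```

Paragraph boundaries win when they are enough, so chunks tend to keep whole sentences together instead of being cut mid-word.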

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100, # maximum size of each chunk
    chunk_overlap=0, # how many characters are repeated between chunks (so context flows across boundaries)
)

texts = text_splitter.create_documents([explanation]) # texts is a list of Document objects
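chunk_overlap is easiest to see with a plain fixed-size window rather than the recursive algorithm: each new chunk starts chunk_size - chunk_overlap characters after the previous one, so neighbouring chunks share their boundary characters. split_with_overlap is a hypothetical helper for illustration only.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-size character windows where consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return pieces


windows = split_with_overlap("abcdefghijkl", chunk_size=5, chunk_overlap=2)
print(windows)
```

With chunk_overlap=0, as in the cell above, the windows simply tile the text with no repeated characters.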

2. Turn into an embedding (a vector)

In [None]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

query_result embeds only the first chunk so we can inspect the resulting vector (a quick test/debug step)

In [None]:
query_result = embeddings.embed_query(texts[0].page_content) # get embedding for the first chunk
query_result

- from pinecone import Pinecone → official Pinecone client → lets you connect to Pinecone and create/manage indexes.
- from langchain_pinecone import PineconeVectorStore → LangChain wrapper → lets LangChain talk to Pinecone easily (add texts, search, etc.).

In [None]:
import os
from pinecone import Pinecone # official client for the vector database
from langchain_pinecone import PineconeVectorStore # LangChain wrapper

# init: the current client replaces the old pinecone.init(api_key=..., environment=...)
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY")) # use pc to create/manage indexes, e.g. pc.list_indexes()

1. texts → a list of Document objects (chunks of text you split earlier).
2. embeddings → your embedding model (e.g. OpenAIEmbeddings() or HuggingFaceEmbeddings()).
3. index_name="langchain-quickstart" → the name of the Pinecone index you already created.
4. 👉 PineconeVectorStore.from_documents(...) does 3 things at once:
    - Creates embeddings for each document.
    - Stores them in Pinecone under that index.
    - Returns a LangChain VectorStore object (search) that you can now query.

In [None]:
index_name = "langchain-quickstart" # name of the Pinecone index (the storage space for these vectors)

# PineconeVectorStore picks up PINECONE_API_KEY from the environment
search = PineconeVectorStore.from_documents(texts, embeddings, index_name=index_name) # embed + store the chunks, return a vector store

#### Similarity Search based on prompts
1. Query → turned into an embedding vector.
2. Vector → sent to Pinecone index.
3. Pinecone → finds nearest stored vectors (similar chunks).
4. LangChain → wraps them back as Documents with text + metadata.
5. 👉 result = list of the most relevant chunks.
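Steps 3 and 4 boil down to "score every stored vector against the query vector and keep the best k". A brute-force sketch with cosine similarity and made-up 3-dimensional vectors (real text-embedding-3-small vectors have 1536 dimensions by default). Pinecone uses whichever metric the index was created with (cosine, dot product or Euclidean) and approximate nearest-neighbour indexing instead of comparing against every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs whose vectors are most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (made up for illustration)
store = [
    ("Autoencoders compress data into a latent code.", [0.9, 0.1, 0.0]),
    ("Paris is the capital of France.", [0.0, 0.2, 0.9]),
    ("Decoders reconstruct the input from the code.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of a question about autoencoders
matches = top_k(query_vec, store, k=2)
for text, score in matches:
    print(f"{score:.3f}  {text}")
```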


Pinecone doesn’t generate answers — it just stores your document embeddings and returns the most relevant chunks.

In [None]:
query = "What is magical about autoencoders?"

result = search.similarity_search(query) # search for similar chunks to the query

In [None]:
result

### 5. Trying out AI agent tools from LangChain

##### REPL = Read–Eval–Print Loop 
It’s an interactive programming shell that:
1. Reads your input (e.g. 2 + 2)
2. Evaluates it (runs the code)
3. Prints the result (4)
4. Loops back for the next command
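The four REPL steps can be sketched as a toy loop over a list of commands. mini_repl is a hypothetical helper: it tries each command as an expression with eval and falls back to exec for statements such as assignments. Like PythonREPLTool, eval/exec run arbitrary code, so this kind of tool should only be given trusted input or run in a sandbox.

```python
def mini_repl(commands):
    """Toy read-eval-print loop over a list of commands; returns what it printed."""
    namespace = {}  # variables persist between commands, like in a real REPL
    printed = []
    for command in commands:                  # 4. loop to the next command
        source = command.strip()              # 1. read the input
        try:
            result = eval(source, namespace)  # 2. evaluate an expression...
        except SyntaxError:
            exec(source, namespace)           # ...or execute a statement (e.g. x = 10)
            continue
        if result is not None:
            print(repr(result))               # 3. print the result
            printed.append(repr(result))
    return printed


outputs = mini_repl(["2 + 2", "x = 10", "x * 3"])
```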

In [8]:
# Agent creation
from langchain_experimental.agents.agent_toolkits import create_python_agent # updated location


# Python REPL tool + utility
from langchain_experimental.tools import PythonREPLTool
from langchain_experimental.utilities import PythonREPL

# LLM wrapper
from langchain_openai import OpenAI

agent_executor can run Python code through the REPL tool: the LLM writes code, the tool executes it, and the output is fed back to the LLM


Note: in LangChain (and in many other Python libraries), verbose=True just means:

👉 print extra details about what’s happening under the hood.

In [None]:
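from langchain_openai import ChatOpenAI  # gpt-4o-mini is a chat model, so use the chat wrapper

agent_executor = create_python_agent(
    llm=ChatOpenAI(temperature=0, max_tokens=1000, model="gpt-4o-mini"),
    tool=PythonREPLTool(python_repl=PythonREPL()),  # tool fields must be passed as keyword arguments
    verbose=True,
)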

Then I can give it a task and it will solve it for me, e.g.:

In [None]:
agent_executor.run("Find the roots (zeros) of the quadratic function f(x) = 2x^2 - 4x - 6")

The LLM writes a Python script, the REPL tool runs it, and the agent returns the result (for this function the correct roots are x = 3 and x = -1).
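For reference, this is the kind of script the agent might write for the task, using the quadratic formula. The code an LLM actually generates will differ from run to run; quadratic_roots is a hypothetical helper, handy for checking the agent's answer.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return ()  # no real roots
    sqrt_d = math.sqrt(discriminant)
    return ((-b + sqrt_d) / (2 * a), (-b - sqrt_d) / (2 * a))


# f(x) = 2x^2 - 4x - 6
roots = quadratic_roots(2, -4, -6)
print(roots)  # (3.0, -1.0)
```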