## Transitioning from the Old Chain Classes to the New Pipe-Based Operator

## 1. Understanding `Runnables`
- `Runnables` are self-contained units of work.
- They can be executed in isolation or combined for complex operations.
- They provide flexibility in execution (sync, async, parallel).
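
To make the idea concrete, here is a minimal plain-Python sketch of a runnable-like unit that can be invoked alone, run over a batch, or composed with `|`. `MiniRunnable` is a hypothetical toy class written for illustration; it is not how LangChain implements `Runnable`.

```python
from typing import Any, Callable, List


class MiniRunnable:
    """Toy stand-in for a Runnable (hypothetical, for illustration only)."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        # Run this unit of work on a single input.
        return self.func(value)

    def batch(self, values: List[Any]) -> List[Any]:
        # Run the same unit of work on several inputs, one after another.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "MiniRunnable") -> "MiniRunnable":
        # `a | b` builds a new unit that feeds a's output into b.
        return MiniRunnable(lambda value: other.invoke(self.invoke(value)))


strip = MiniRunnable(str.strip)
upper = MiniRunnable(str.upper)
pipeline = strip | upper

print(pipeline.invoke("  hello "))     # HELLO
print(pipeline.batch([" a ", " b "]))  # ['A', 'B']
```

The `__or__` method is what lets two units be chained with the pipe operator, which is the same surface syntax the notebook uses later with `prompt | llm`.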

## 2. `RunnableParallel`
- Executes tasks concurrently.
- Useful for performance enhancement in scenarios where tasks can run independently.
- Syntax example:
    ```python
    from langchain_core.runnables import RunnableParallel
    ```

## 3. `RunnablePassthrough`
- A simple `Runnable` that passes inputs directly to outputs without modification.
- Helpful for debugging or chaining in pipelines.
- Example use case:
    ```python
    from langchain_core.runnables import RunnablePassthrough
    passthrough = RunnablePassthrough()
    result = passthrough.invoke("input data")  # returns "input data" unchanged
    ```

## 4. `RunnableLambda`
- Allows quick, inline definitions of small, custom functions.
- Example:
    ```python
    from langchain_core.runnables import RunnableLambda
    lambda_op = RunnableLambda(lambda x: x * 2)
    result = lambda_op.invoke(5)  # Output: 10
    ```

## 5. Assign Functions
- `assign` adds new keys to the data flowing through a chain while keeping the existing ones.
- Useful in data pipelines to update intermediate values.

## 6. Performance Improvement (Inference Speed)
- Focus on optimizing the inference speed by leveraging parallel execution.
- Use `RunnableParallel` or batching techniques.
- Consider optimizing data pipelines by removing unnecessary steps.

## 7. Async Invoke
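- `ainvoke` runs a chain asynchronously; it must be awaited, and it improves throughput when several calls can overlap.
- Syntax example:
    ```python
    async def async_operation(chain, input_data):
        # `ainvoke` is the async counterpart of `invoke` and must be awaited
        result = await chain.ainvoke(input_data)
        return result
    ```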

## 8. Batch Support
- Handles multiple inputs at once to improve performance.
- Can be combined with `RunnableParallel` for parallel batch execution.

## 9. Async Batch Execution
- Combines asynchronous execution with batch processing for high-performance tasks.
- Reduces overall execution time for larger datasets.
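
The gain comes from overlapping waits. Below is a minimal sketch with plain `asyncio`, where the hypothetical `fake_llm_call` simulates a slow network call with `asyncio.sleep`; it is not a LangChain API.

```python
import asyncio
import time


async def fake_llm_call(question: str) -> str:
    # Hypothetical stand-in for a slow, I/O-bound model call.
    await asyncio.sleep(0.2)
    return f"answer to: {question}"


async def answer_all(questions):
    # Start every call at once and wait for all of them together.
    return await asyncio.gather(*(fake_llm_call(q) for q in questions))


questions = ["q1", "q2", "q3"]
start = time.perf_counter()
answers = asyncio.run(answer_all(questions))
elapsed = time.perf_counter() - start

print(answers)
print(f"elapsed: {elapsed:.2f}s")  # close to 0.2s rather than 0.6s
```

Inside a notebook, where an event loop is already running, use `await answer_all(questions)` instead of `asyncio.run(...)`.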

## 10. Using `itemgetter` with `LCEL`
- `itemgetter` (from Python's `operator` module) extracts specific items from collections.
- When combined with `LCEL` (LangChain Expression Language), it can streamline complex operations.

## 11. Bind Tools
- `bind` attaches fixed arguments, such as tool definitions, to a `Runnable` so they are passed on every call.
- With tools bound, the model can respond with a structured tool call instead of plain text.

## 12. Stream Support
- Keep your pipelines more responsive by incorporating stream support for data.
- This allows continuous data processing and near real-time outputs.
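
Consuming a stream means handling chunks as they arrive instead of waiting for the full response. Here is a minimal sketch using a plain generator; `fake_token_stream` is a hypothetical stand-in for a model's streaming output.

```python
from typing import Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    # Hypothetical stand-in: yield the response one word at a time.
    for word in text.split():
        yield word + " "


received = []
for chunk in fake_token_stream("streaming keeps the interface responsive"):
    print(chunk, end="", flush=True)  # show each chunk immediately
    received.append(chunk)
print()

full_text = "".join(received).strip()
```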
  


In [1]:
import os

In [2]:
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY")
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY")
os.environ["LANGSMITH_TRACING_V2"] = os.getenv("LANGSMITH_TRACING_V2", "true")
os.environ["LANGSMITH_ENDPOINT"] = os.getenv("LANGSMITH_ENDPOINT")
os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY")
os.environ["LANGSMITH_PROJECT"] = "Rag with Conversation"



In [3]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")


In [4]:
'''from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
from langchain_groq import ChatGroq
import os
llm=ChatGroq(model_name="Gemma2-9b-It")'''

# A simple chain (the old chaining approach)

In [5]:
template= 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

In [6]:
from langchain_core.prompts import PromptTemplate

In [7]:
prompt = PromptTemplate(template=template,input_variables=["skill"])

In [8]:
print(prompt)

input_variables=['skill'] input_types={} partial_variables={} template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'


In [9]:
from langchain.chains import LLMChain

In [10]:
llm_chain = LLMChain(prompt=prompt,llm=llm)


In [11]:
print(llm_chain.run('Data Science'))


Okay, great! Data Science is a vast and exciting field. Here are my top 5 things to learn to get a solid foundation and start making progress, focusing on practicality and building a portfolio:

**1. Python Programming (with a focus on Data Science Libraries):**

*   **Why:** Python is the dominant language in data science due to its versatility, extensive libraries, and large community.
*   **What to Learn:**
    *   **Fundamentals:** Variables, data types (lists, dictionaries, tuples, sets), control flow (if/else, loops), functions, object-oriented programming (classes, objects).
    *   **Key Libraries:**
        *   **NumPy:**  For numerical computing, array manipulation, and linear algebra.  Essential for efficient data handling.
        *   **Pandas:**  For data manipulation and analysis, using DataFrames.  Crucial for cleaning, transforming, and exploring data.
        *   **Matplotlib:** For basic data visualization.  Learn to create plots, charts, and histograms.
        *   *

In [12]:
print(llm_chain.run({'skill':'Data Science'}))

Okay, great! Data Science is a vast and exciting field. Focusing on these 5 areas will give you a solid foundation and make you a competitive candidate:

**Top 5 Things to Learn for Data Science:**

1.  **Programming (Python or R):**
    *   **Why:** This is the bedrock of data science. You'll use it for data manipulation, analysis, visualization, and model building.
    *   **Focus on:**
        *   **Python:**
            *   **Fundamental Syntax:** Variables, data types, control flow (loops, conditionals), functions.
            *   **Key Libraries:**
                *   **NumPy:**  Numerical computing, arrays, linear algebra.
                *   **Pandas:** Data manipulation, data frames, data cleaning.
                *   **Scikit-learn:** Machine learning algorithms (regression, classification, clustering), model evaluation.
                *   **Matplotlib & Seaborn:** Data visualization.
            *   **R:**
                *   **Fundamental Syntax:** Variables, data types, c

# This is an implementation using LCEL

In [13]:
llm

ChatGoogleGenerativeAI(model='models/gemini-2.0-flash', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x000001BD4EAD7C50>, default_metadata=())

In [14]:
prompt

PromptTemplate(input_variables=['skill'], input_types={}, partial_variables={}, template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n')

In [15]:
chain = prompt | llm

In [16]:
print(chain.invoke({'skill':'Big Data'}))

content="Okay, that's great! Big Data is a huge and exciting field. Here are my top 5 things to learn, focusing on getting you a solid foundation and practical skills:\n\n**1.  Fundamentals of Distributed Computing and Data Processing:**\n\n*   **Why it's important:** This is the bedrock of Big Data. You need to understand how data is processed and stored across multiple machines.\n*   **What to learn:**\n    *   **Distributed Systems Concepts:** Understand concepts like fault tolerance, data consistency, scalability, and parallel processing.  Learn about the CAP theorem.\n    *   **MapReduce:**  Even though newer frameworks are popular, understand the MapReduce paradigm. It's a foundational concept for parallel data processing.  Know how it works, its limitations, and its advantages.\n    *   **Distributed File Systems (HDFS):** Learn how data is stored and accessed in a distributed environment.  Understand concepts like data replication, data locality, and namenodes/datanodes.\n*   *

In [17]:
from langchain_core.output_parsers import StrOutputParser

In [18]:
parser = StrOutputParser()

In [19]:
chain = prompt | llm | parser

In [20]:
print(chain.invoke({'skill':'Machine Learning'}))

Okay, great! Machine Learning is a fascinating and rapidly evolving field. Here are my top 5 suggestions for things to learn, focusing on a solid foundation and practical application:

1.  **Python Programming Fundamentals:**

    *   **Why:** Python is the dominant language in ML.  You'll use it for everything from data manipulation to model building and deployment.
    *   **Key Concepts:**
        *   **Data Structures:** Lists, dictionaries, tuples, sets.  Understanding how to use these effectively is crucial for handling data.
        *   **Control Flow:** `if/else` statements, `for` and `while` loops.  Essential for logic and automation.
        *   **Functions:** Defining and using functions for code reusability and organization.
        *   **Object-Oriented Programming (OOP) Basics:**  Classes and objects, inheritance (a basic understanding is helpful, you don't need to be an OOP expert right away).
        *   **File I/O:** Reading and writing data to files (e.g., CSV, text f

# Let's discuss the Runnables

In [21]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

In [22]:
chain = RunnablePassthrough()

In [23]:
chain.invoke('Welcome to the notebook!')

'Welcome to the notebook!'

In [24]:
chain = RunnablePassthrough() | RunnablePassthrough() | RunnablePassthrough()

In [25]:
chain.invoke('Welcome to my notebook!')

'Welcome to my notebook!'

In [26]:
def string_upper(text):
    return text.upper()

In [27]:
chain = RunnablePassthrough() | RunnableLambda(string_upper)

In [28]:
chain.invoke('Welcome to my notebook!')

'WELCOME TO MY NOTEBOOK!'

In [29]:

string_upper.invoke('Welcome to my notebook!')

AttributeError: 'function' object has no attribute 'invoke'

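- This fails because `string_upper` is a plain Python function, not a `Runnable`, so it has no `invoke` method. Wrapping it in `RunnableLambda` makes it invokable.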

In [49]:
chain = RunnableLambda(string_upper)

In [50]:
chain.invoke('Welcome to my notebook!')

'WELCOME TO MY NOTEBOOK!'

In [51]:
chain = RunnableParallel({'x':RunnablePassthrough(),'y':RunnablePassthrough()})

In [52]:
chain.invoke("Joydeb")

{'x': 'Joydeb', 'y': 'Joydeb'}

In [53]:
chain.invoke({'name': 'Joydeb','notebook': "Joydeb's notebook"})

{'x': {'name': 'Joydeb', 'notebook': "Joydeb's notebook"},
 'y': {'name': 'Joydeb', 'notebook': "Joydeb's notebook"}}

In [54]:
lambda x: x['notebook']

<function __main__.<lambda>(x)>

In [55]:
chain = RunnableParallel({'x':RunnablePassthrough(),'notebook':lambda x: x['notebook']})

In [57]:
chain.invoke({'name': 'Joydeb','notebook': "Joydeb's notebook"})

{'x': {'name': 'Joydeb', 'notebook': "Joydeb's notebook"},
 'notebook': "Joydeb's notebook"}

In [58]:
def fetch_website(input: dict):
    output = input.get('Website','Not found')
    return output

In [59]:
mydict={'name': 'Joydeb','notebook': "Joydeb's notebook"}

In [60]:
mydict.get("website","Not found")

'Not found'

In [61]:
chain = RunnableParallel({'Website':RunnablePassthrough() | RunnableLambda(fetch_website),
                          'notebook':lambda z: z['notebook']})

In [62]:
chain.invoke({'name': 'Joydeb','notebook': "Joydeb's notebook"})

{'Website': 'Not found', 'notebook': "Joydeb's notebook"}

In [63]:
chain.invoke({'name': 'Joydeb','notebook': "Joydeb's notebook", "website": "joydebai.com"})

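{'Website': 'Not found', 'notebook': "Joydeb's notebook"}

- The result is still `'Not found'` because `fetch_website` reads the key `'Website'`, while the input uses `'website'`.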
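
Dictionary lookups are case-sensitive, so a helper like `fetch_website` only finds a key spelled exactly as it expects. A minimal sketch of a case-insensitive variant follows; `fetch_website_any_case` is a hypothetical function, not part of the original notebook.

```python
def fetch_website_any_case(data: dict) -> str:
    # Lower-case every key so 'Website', 'website' and 'WEBSITE' all match.
    normalized = {str(key).lower(): value for key, value in data.items()}
    return normalized.get("website", "Not found")


found = fetch_website_any_case({"name": "Joydeb", "website": "joydebai.com"})
missing = fetch_website_any_case({"name": "Joydeb"})

print(found)    # joydebai.com
print(missing)  # Not found
```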

In [64]:
def extra_func(input):
    return 'Happy Learning'

In [65]:
chain = RunnableParallel({'x' : RunnablePassthrough()}).assign(extra=RunnableLambda(extra_func))

In [66]:
chain = RunnableParallel({'x' : RunnablePassthrough()}).assign(y=RunnableLambda(extra_func))

In [67]:
chain.invoke('Hello')

{'x': 'Hello', 'y': 'Happy Learning'}

In [69]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

### Reading the txt files from doc directory

loader = DirectoryLoader('./doc', glob="./*.txt", loader_cls=TextLoader)
docs = loader.load()

### Creating Chunks using RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    length_function=len
)
new_docs = text_splitter.split_documents(documents=docs)
doc_strings = [doc.page_content for doc in new_docs]

### BGE Embeddings

'''from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-base-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
'''

### Creating Retriever using Vector DB

db = Chroma.from_documents(new_docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 4})

In [70]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)


In [71]:
retrieval_chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
    )

In [72]:
question ="what is llama3? can you highlight 3 important points?"

In [73]:
retrieval_chain.invoke(question)

'Based on the provided documents:\n\nLLaMA 3 (Large Language Model Meta AI 3) is:\n\n*   Widely used for research and development.\n*   Designed to be competitive.\n*   Released in 2024.'

### Synchronous vs Asynchronous

In [79]:
import time

start_time = time.time()

result = retrieval_chain.invoke(question)

print('Time taken:',time.time() - start_time)

Time taken: 1.5644686222076416



In [80]:
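start_time = time.time()

# `ainvoke` returns a coroutine; without `await` only its creation is timed
result = await retrieval_chain.ainvoke(question)

print('Time taken:',time.time() - start_time)

- The original run called `retrieval_chain.ainvoke(question)` without `await` and printed `Time taken: 0.0`: the chain had not actually executed, so that figure is not comparable with the synchronous timing above.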
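
Calling a coroutine function without `await` returns a coroutine object immediately, which is why timing such a call reports roughly zero seconds. Here is a minimal standard-library sketch of the difference; `slow_task` is a hypothetical stand-in for an async chain call.

```python
import asyncio
import time


async def slow_task() -> str:
    # Hypothetical stand-in for an async chain call.
    await asyncio.sleep(0.2)
    return "done"


async def main():
    start = time.perf_counter()
    pending = slow_task()                 # no await: nothing has run yet
    created_in = time.perf_counter() - start

    start = time.perf_counter()
    result = await pending                # the work happens here
    awaited_in = time.perf_counter() - start
    return created_in, awaited_in, result


created_in, awaited_in, result = asyncio.run(main())
print(f"create coroutine: {created_in:.4f}s, await it: {awaited_in:.4f}s, result: {result}")
```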


In [81]:
start_time = time.time()

batch_output = retrieval_chain.batch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 3.4914324283599854


In [82]:
batch_output

['LLaMA 3 (Large Language Model Meta AI 3) is the.',
 'Based on the provided context, here are three main properties of LLaMA 3:\n\n1.  It is widely used for research and development.\n2.  LLaMA 3 models are designed to be competitive.\n3.  It has 8B and 70B parameters.']

In [83]:
start_time = time.time()

batch_output = await retrieval_chain.abatch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 3.8755598068237305


In [84]:
batch_output

['LLaMA 3 (Large Language Model Meta AI 3) is the.',
 'Based on the provided context, 3 main properties are:\n\n1.  8B and 70B parameters.\n2.  Widely used for research, development.\n3.  Designed to be competitive.']

In [85]:
my_dict = {'name': 'Joydeb','notebook': "Joydeb's notebook", "website": "joydebai.com"}
my_dict

{'name': 'Joydeb', 'notebook': "Joydeb's notebook", 'website': 'joydebai.com'}

In [88]:
from operator import itemgetter

website = itemgetter('website')

In [89]:
website(my_dict)

'joydebai.com'
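
`itemgetter` is not limited to a single dictionary key. Below is a short standard-library sketch; the sample data is made up.

```python
from operator import itemgetter

profile = {"name": "Joydeb", "website": "joydebai.com"}

# Several keys at once return a tuple.
name_and_site = itemgetter("name", "website")(profile)
print(name_and_site)  # ('Joydeb', 'joydebai.com')

# It also works on sequences, using positions instead of keys.
second = itemgetter(1)(["a", "b", "c"])
print(second)  # b

# A missing key raises KeyError, unlike dict.get with a default.
try:
    itemgetter("email")(profile)
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)  # KeyError raised: True
```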

In [90]:
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer in the following language: {language}
"""
prompt = PromptTemplate.from_template(template)


In [91]:
retrieval_chain = (
    RunnableParallel({"context": itemgetter('question') | retriever,
                       "question": itemgetter('question'),
                       "language": itemgetter('language')
                       })
    | prompt
    | llm
    | StrOutputParser()
    )

In [92]:
### itemgetter('question') looks up a key, so the chain input must be a dict containing 'question' and 'language'

response = retrieval_chain.invoke({'question': "what is llama3?",
                        'language': "Spanish"})

print(response)

LLaMA 3 (Large Language Model Meta AI 3) es un modelo de lenguaje grande.


In [93]:
template = 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

prompt = PromptTemplate.from_template(template=template)

chain = prompt | llm

In [95]:
for s in chain.stream({'skill':'Big Data'}):
    print(s.content,end='')

Okay, great! Big Data is a vast and exciting field. Here are my top 5 suggestions for things to learn, focusing on practicality and building a solid foundation:

1.  **Core Concepts & Hadoop Ecosystem:**

    *   **Why:**  Understanding the *why* behind Big Data is crucial.  Learn about the problems it solves: Volume, Velocity, Variety, Veracity, and Value (the 5 V's). You need to understand the challenges of processing massive datasets and the distributed computing paradigm.
    *   **What to Learn:**
        *   **Distributed Computing Principles:** Master-Slave architecture, MapReduce paradigm, fault tolerance, data locality.
        *   **Hadoop Distributed File System (HDFS):**  Learn how data is stored and accessed in a distributed manner.  Understand blocks, replication, and NameNode/DataNode architecture.
        *   **MapReduce:**  Learn the fundamental programming model for processing data in parallel.  Understand the Map, Shuffle, and Reduce phases. Even if you don't write M

In [96]:
import json
from langchain_core.messages import ToolMessage
from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool

@tool
def multiply(first_number: int, second_number: int):
    """Multiplies two numbers together."""
    return first_number * second_number

model_with_tools = llm.bind(tools=[convert_to_openai_tool(multiply)])

In [97]:
response = model_with_tools.invoke('What is 35 * 46?')

In [98]:
response

AIMessage(content='', additional_kwargs={'function_call': {'name': 'multiply', 'arguments': '{"second_number": 46.0, "first_number": 35.0}'}}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}, id='run--41ffe1d8-12ab-496e-9d59-2900f4235a0b-0', tool_calls=[{'name': 'multiply', 'args': {'second_number': 46.0, 'first_number': 35.0}, 'id': '22fac5a8-b82c-42f5-8272-ac986100adf9', 'type': 'tool_call'}], usage_metadata={'input_tokens': 32, 'output_tokens': 9, 'total_tokens': 41, 'input_token_details': {'cache_read': 0}})