<a href="https://colab.research.google.com/github/sunnysavita10/Generative-AI-Indepth-Basic-to-Advance/blob/main/LCEL(Langchain_Expression_Language).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Transitioning from the Old Chain Classes to the New Pipe-Based Operator

## 1. Understanding `Runnables`
- `Runnables` are self-contained units of work.
- Can be executed in isolation or combined for complex operations.
- Provides flexibility in execution (sync, async, parallel).

## 2. `RunnableParallel`
- Executes tasks concurrently.
- Useful for performance enhancement in scenarios where tasks can run independently.
- Syntax example:
    ```python
    from langchain_core.runnables import RunnableParallel
    ```

## 3. `RunnablePassthrough`
- A simple `Runnable` that passes inputs directly to outputs without modification.
- Helpful for debugging or chaining in pipelines.
- Example use case:
    ```python
    from langchain_core.runnables import RunnablePassthrough
    passthrough = RunnablePassthrough()
    result = passthrough.invoke("some input")  # Output: 'some input'
    ```

## 4. `RunnableLambda`
- Allows quick, inline definitions of small, custom functions.
- Example:
    ```python
    from langchain_core.runnables import RunnableLambda
    lambda_op = RunnableLambda(lambda x: x * 2)
    result = lambda_op.invoke(5)  # Output: 10
    ```

## 5. Assign Functions
- `.assign(...)` adds new keys to the dictionary flowing through a chain while keeping the existing ones.
- Useful in data pipelines to attach computed or intermediate values.

## 6. Performance Improvement (Inference Speed)
- Focus on optimizing the inference speed by leveraging parallel execution.
- Use `RunnableParallel` or batching techniques.
- Consider optimizing data pipelines by removing unnecessary steps.

## 7. Async Invoke
- Executes operations asynchronously, improving the overall throughput of the system.
- Syntax example:
    ```python
    async def async_operation(chain, inputs):
        # `ainvoke` is the async counterpart of `invoke` and must be awaited
        result = await chain.ainvoke(inputs)
        return result
    ```

## 8. Batch Support
- Handles multiple inputs at once to improve performance.
- Can be combined with `RunnableParallel` for parallel batch execution.

## 9. Async Batch Execution
- Combines asynchronous execution with batch processing for high-performance tasks.
- Reduces overall execution time for larger datasets.

## 10. Using `itemgetter` with `LCEL`
- `itemgetter` (from Python's `operator` module) extracts specific items from collections.
- When combined with `LCEL` (LangChain Expression Language), it can streamline complex operations.

## 11. Bind Tools
- `bind` attaches fixed arguments, such as tool definitions, to a `Runnable` so they are sent with every call.
- This lets a model inside a pipeline see the available tools without passing them on each invocation.

## 12. Stream Support
- Keep your pipelines more responsive by incorporating stream support for data.
- This allows continuous data processing and near real-time outputs.
  


In [None]:
!pip install -q langchain_google_genai langchain_community langchain langchain_huggingface langchain_groq

In [None]:
from google.colab import userdata
GROQ_API_KEY=userdata.get('GROQ_API_KEY')
import os
os.environ["GROQ_API_KEY"]=GROQ_API_KEY

In [None]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
import os
os.environ["GOOGLE_API_KEY"]=GOOGLE_API_KEY

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.0-pro")

In [None]:
'''from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
from langchain_groq import ChatGroq
import os
llm=ChatGroq(model_name="Gemma2-9b-It")'''


# A simple chain (old chaining concept)

In [None]:
template= 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

In [None]:
from langchain_core.prompts import PromptTemplate

In [None]:
prompt = PromptTemplate(template=template,input_variables=["skill"])

In [None]:
print(prompt)

input_variables=['skill'] input_types={} partial_variables={} template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'


In [None]:
from langchain.chains import LLMChain

In [None]:
llm_chain = LLMChain(prompt=prompt,llm=llm)

In [None]:
print(llm_chain.run('Langchain'))

**Top 5 Things to Learn in Langchain:**

1. **Fundamentals of Blockchain Technology:** Understand the underlying concepts of blockchain, including decentralization, consensus mechanisms, and smart contracts.
2. **Langchain Architecture and Syntax:** Familiarize yourself with the unique architecture and syntax of Langchain, which combines the features of blockchain and natural language processing.
3. **Langchain Programming:** Learn how to develop Langchain programs, including creating smart contracts, defining data models, and handling transactions.
4. **Langchain Toolset and Ecosystem:** Explore the various tools and libraries available for Langchain development, such as the Langchain IDE and the Langchain community.
5. **Applications and Use Cases:** Understand the potential applications of Langchain in various industries, including finance, supply chain management, and healthcare.


In [None]:
prompt | llm

PromptTemplate(input_variables=['skill'], input_types={}, partial_variables={}, template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n')
| ChatGoogleGenerativeAI(model='models/gemini-1.0-pro', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7876db7b8400>, async_client=<google.ai.generativelanguage_v1beta.services.generative_service.async_client.GenerativeServiceAsyncClient object at 0x7876db7b9420>, default_metadata=())

In [None]:
print(llm_chain.run({'skill':'Data Science'}))

**Top 5 Things to Learn for Data Science:**

1. **Programming Languages:** Python (primary), R (secondary)
   - Python is widely used for data analysis, machine learning, and deep learning.
   - R is a specialized language for statistical analysis and data visualization.

2. **Data Management and Manipulation:** SQL, NoSQL, Pandas, NumPy
   - SQL for querying and managing relational databases.
   - NoSQL for handling non-relational data.
   - Pandas and NumPy for data manipulation and analysis in Python.

3. **Statistics and Probability:** Probability distributions, hypothesis testing, regression analysis
   - Understanding statistical concepts is crucial for analyzing and interpreting data.
   - Hypothesis testing helps in making data-driven decisions.

4. **Machine Learning Algorithms:** Supervised (e.g., regression, classification), unsupervised (e.g., clustering, dimensionality reduction)
   - Supervised algorithms learn from labeled data to make predictions.
   - Unsupervised algo

# An implementation using LCEL

In [None]:
llm

ChatGoogleGenerativeAI(model='models/gemini-1.0-pro', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7876db7b8400>, async_client=<google.ai.generativelanguage_v1beta.services.generative_service.async_client.GenerativeServiceAsyncClient object at 0x7876db7b9420>, default_metadata=())

In [None]:
prompt

PromptTemplate(input_variables=['skill'], input_types={}, partial_variables={}, template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n')

In [None]:
chain = prompt | llm

In [None]:
print(chain.invoke({'skill':'Big Data'}))

**Top 5 Essential Concepts for Big Data Learning:**

1. **Data Engineering:** Understanding the processes involved in collecting, storing, processing, and managing vast volumes of data. This includes data pipelines, data lakes, and data warehouses.

2. **Hadoop Ecosystem:** Mastering the Hadoop framework and its components, such as HDFS (Hadoop Distributed File System), MapReduce, YARN, and Hive, is crucial for handling large-scale data analysis.

3. **Big Data Analytics:** Developing expertise in analyzing and extracting insights from massive datasets using techniques like machine learning, artificial intelligence, and statistical modeling.

4. **Cloud Computing:** Leveraging cloud platforms like AWS, Azure, or GCP to store, process, and analyze large amounts of data efficiently and cost-effectively.

5. **Data Visualization:** Effectively communicating data insights and findings through visual representations, dashboards, and interactive visualizations. This helps stakeholders unders

In [None]:
from langchain_core.output_parsers import StrOutputParser

In [None]:
parser = StrOutputParser()

In [None]:
chain = prompt | llm | parser

In [None]:
print(chain.invoke({'skill':'Machine Learning'}))



# Let's discuss the Runnables

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough , RunnableLambda

In [None]:
chain = RunnablePassthrough()

In [None]:
chain.invoke('Welcome to my channel')

'Welcome to my channel'

In [None]:
chain = RunnablePassthrough() | RunnablePassthrough() | RunnablePassthrough()

In [None]:
chain.invoke('Welcome to my youtube channel')

'Welcome to my youtube channel'

In [None]:
def string_upper(input):
  return input.upper()

In [None]:
chain = RunnablePassthrough() | RunnableLambda(string_upper)

In [None]:
chain.invoke('Welcome to my youtube channel')

'WELCOME TO MY YOUTUBE CHANNEL'

In [None]:
# string_upper.invoke('Welcome to my youtube channel')

In [None]:
chain = RunnableLambda(string_upper)

In [None]:
chain.invoke('Welcome to my  youtube channel')

'WELCOME TO MY  YOUTUBE CHANNEL'

In [None]:
chain = RunnableParallel({'x':RunnablePassthrough(),'y':RunnablePassthrough()})

In [None]:
chain.invoke("Venky")

{'x': 'Venky', 'y': 'Venky'}

In [None]:
chain.invoke({'Youtube': '@venkateshn','Blog': "Venky's blog"})

{'x': {'Youtube': '@venkateshn', 'Blog': "Venky's blog"},
 'y': {'Youtube': '@venkateshn', 'Blog': "Venky's blog"}}

In [None]:
lambda x: x['Blog']

<function __main__.<lambda>(x)>

In [None]:
chain = RunnableParallel({'x':RunnablePassthrough(),'Blog':lambda x: x['Blog']})

In [None]:
chain.invoke({'Youtube': '@venkateshn','Blog': "Venky's blog"})

{'x': {'Youtube': '@venkateshn', 'Blog': "Venky's blog"},
 'Blog': "Venky's blog"}

In [None]:
def fetch_website(input: dict):
    output = input.get('Website','Not found')
    return output

In [None]:
mydict={'Youtube': '@venkateshn','Blog': "Venkys blog"}

In [None]:
mydict.get("website","Not found")

'Not found'

In [None]:
chain = RunnableParallel({'Website':RunnablePassthrough() | RunnableLambda(fetch_website),
                          'Blog':lambda z: z['Blog']})

In [None]:
chain.invoke({'Youtube': '@venkateshn','Blog': "Venky's blog"})

{'Website': 'Not found', 'Blog': "Venky's blog"}

In [None]:
chain.invoke({'Youtube': '@venkateshn','Blog': "Venky's blog" , 'Website' : 'venkateshn.com'})

{'Website': 'venkateshn.com', 'Blog': "Venky's blog"}

In [None]:
def extra_func(input):
    return 'Happy Learning'

In [None]:
chain = RunnableParallel({'x' : RunnablePassthrough()}).assign(extra=RunnableLambda(extra_func))

In [None]:
chain.invoke('Hello')

{'x': 'Hello', 'extra': 'Happy Learning'}

In [None]:
!pip install -q chromadb

In [None]:
!pip install -q pypdf

In [None]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

### Reading the PDF file from the source directory

# loader = DirectoryLoader('/content/source', glob="./*.pdf", loader_cls=TextLoader)
# docs = loader.load()

# loader = DirectoryLoader('/content/source/', glob='*.pdf', loader_cls=TextLoader)
# docs = loader.load()

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/source/Transformer.pdf")
docs = loader.load()

### Creating Chunks using RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    length_function=len
)
new_docs = text_splitter.split_documents(documents=docs)
doc_strings = [doc.page_content for doc in new_docs]

### BGE Embeddings (optional alternative, disabled)

'''from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-base-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
'''

### Creating Retriever using Vector DB

db = Chroma.from_documents(new_docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 4})

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)


In [None]:
retrieval_chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
    )

In [None]:
question ="Explain Scaled Dot-Product Attention?"

In [None]:
# prompt | llm
retrieval_chain.invoke(question)

'The provided context does not contain an explanation of Scaled Dot-Product Attention.'

In [None]:
retrieval_chain.invoke(question)

'The provided context does not include an explanation of Scaled Dot-Product Attention, so I cannot answer this question from the provided context.'

In [None]:
import time

start_time = time.time()

result = retrieval_chain.invoke(question)

print('Time taken:',time.time() - start_time)

Time taken: 0.997051477432251


In [None]:
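start_time = time.time()

# Without `await`, `ainvoke` only creates a coroutine and returns immediately,
# so the time printed below measures coroutine creation, not the chain run.
# To actually run the chain: result = await retrieval_chain.ainvoke(question)
result = retrieval_chain.ainvoke(question)

print('Time taken:',time.time() - start_time)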

Time taken: 0.00014853477478027344


In [None]:
start_time = time.time()

batch_output = retrieval_chain.batch([
                        "Explain self-attention layers",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 1.2810289859771729


In [None]:
batch_output

['The provided context does not contain an explanation of self-attention layers.',
 'The provided context does not mention anything about 3 main properties, so I cannot highlight them.']

In [None]:
start_time = time.time()

batch_output = await retrieval_chain.abatch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 1.1528196334838867


In [None]:
batch_output

['The provided context does not mention llama3, so I cannot answer this question from the provided context.',
 'This context does not mention anything about 3 main properties, so I cannot extract the requested data from the provided context.']

In [None]:
my_dict = {'Youtube': '@sunnysavita10','Blog': "sunny's blog" , 'Website' : 'sunnysavita.com'}
my_dict

{'Youtube': '@sunnysavita10',
 'Blog': "sunny's blog",
 'Website': 'sunnysavita.com'}

In [None]:
from operator import itemgetter

website = itemgetter('Website')

In [None]:
website(my_dict)

'sunnysavita.com'

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer in the following language: {language}
"""
prompt = PromptTemplate.from_template(template)


In [None]:
retrieval_chain = (
    RunnableParallel({"context": itemgetter('question') | retriever,
                       "question": itemgetter('question'),
                       "language": itemgetter('language')
                       })
    | prompt
    | llm
    | StrOutputParser()
    )

In [None]:
### itemgetter('question') looks up a key, so the chain input must be a dict (or another mapping)

response = retrieval_chain.invoke({'question': "what is llama3?",
                        'language': "German"})

print(response)

Die Frage kann aus dem bereitgestellten Kontext nicht beantwortet werden.


In [None]:
template = 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

prompt = PromptTemplate.from_template(template=template)

chain = prompt | llm

In [None]:
for s in chain.stream({'skill':'Big Data'}):
    print(s.content,end='')

**Top 5 Things to Learn for Big Data:**

1. **Big Data Fundamentals:** Understand concepts like 4Vs (Volume, Velocity, Variety, Value), data architectures (Hadoop, Spark, etc.), and data management techniques (ETL, ELT).

2. **Data Processing and Analytics:** Learn programming languages like Python or Java to process large datasets, perform data analysis, and extract insights using tools like Hadoop MapReduce, Spark, and Hive.

3. **Cloud Computing:** Familiarize yourself with cloud platforms like AWS, Azure, or GCP for storing, processing, and analyzing big data. Understand concepts like data lakes, data warehouses, and serverless computing.

4. **Machine Learning and Artificial Intelligence (AI):** Explore techniques for applying machine learning and AI to big data to build predictive models, detect patterns, and enhance decision-making.

5. **Data Visualization and Communication:** Learn techniques for effectively visualizing and communicating insights derived from big data using da

In [None]:
import json
from langchain_core.messages import ToolMessage
from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool

@tool
def multiply(first_number: int, second_number: int):
    """Multiplies two numbers together."""
    return first_number * second_number

model_with_tools = llm.bind(tools=[convert_to_openai_tool(multiply)])

In [None]:
response = model_with_tools.invoke('What is 35 * 46?')

ChatGoogleGenerativeAIError: Invalid argument provided to Gemini: 400 * GenerateContentRequest.tools[0].function_declarations[0].name: Invalid function name. Must start with a letter or an underscore. Must be alphameric (a-z, A-Z, 0-9), underscores (_), dots (.) or dashes (-), with a maximum length of 64.


In [None]:
response

'Die Frage kann aus dem bereitgestellten Kontext nicht beantwortet werden.'