<a href="https://colab.research.google.com/github/sunnysavita10/Generative-AI-Indepth-Basic-to-Advance/blob/main/LCEL(Langchain_Expression_Language).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Transitioning from the Old Chain Classes to the New Pipe-Based Operator

## 1. Understanding `Runnables`
- `Runnables` are self-contained units of work.
- They can be executed in isolation or combined into larger pipelines.
- They offer flexibility in how they are executed (sync, async, parallel).
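To make the idea of combining units of work concrete, here is a plain-Python sketch of how a pipe operator can chain steps. It is a toy model, not LangChain's implementation: the `Step` class and its methods are hypothetical names used only for illustration.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        # Run this unit of work on a single input.
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


strip = Step(str.strip)
upper = Step(str.upper)
exclaim = Step(lambda text: text + "!")

pipeline = strip | upper | exclaim
pipeline_result = pipeline.invoke("  hello lcel ")
print(pipeline_result)  # HELLO LCEL!
```

Each step still works on its own (`upper.invoke("a")`), and the combined pipeline is itself a step, which is what makes this style of composition convenient.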

## 2. `RunnableParallel`
- Executes tasks concurrently.
- Useful for performance enhancement in scenarios where tasks can run independently.
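- Syntax example:
    ```python
    from langchain_core.runnables import RunnableParallel, RunnablePassthrough
    parallel = RunnableParallel({"x": RunnablePassthrough(), "y": RunnablePassthrough()})
    ```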
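The fan-out idea can be sketched without LangChain: the same input goes to several independent branches, and the results come back as a dictionary keyed by branch name. The helper `run_parallel` below is a hypothetical illustration, and the use of a thread pool is an assumption about one way independent branches could be run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def run_parallel(branches: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Send the same input to every branch at once and collect a dict of results."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(func, value) for name, func in branches.items()}
        return {name: future.result() for name, future in futures.items()}


branches = {
    "upper": str.upper,
    "length": len,
    "original": lambda text: text,  # plays the role of a passthrough
}
parallel_result = run_parallel(branches, "Sunny")
print(parallel_result)  # {'upper': 'SUNNY', 'length': 5, 'original': 'Sunny'}
```

Because no branch depends on another, the total time is roughly that of the slowest branch rather than the sum of all of them.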

## 3. `RunnablePassthrough`
- A simple `Runnable` that passes inputs directly to outputs without modification.
- Helpful for debugging or chaining in pipelines.
- Example use case:
    ```python
    from langchain_core.runnables import RunnablePassthrough
    passthrough = RunnablePassthrough()
    result = passthrough.invoke("some input")  # returns "some input" unchanged
    ```

## 4. `RunnableLambda`
- Wraps an ordinary Python function or lambda so that it can be used as a step in a chain.
- Example:
    ```python
    from langchain_core.runnables import RunnableLambda
    lambda_op = RunnableLambda(lambda x: x * 2)
    result = lambda_op.invoke(5)  # Output: 10
    ```

## 5. Assign Functions
- `.assign()` adds new keys to a dictionary output while keeping the existing ones.
- Useful in data pipelines for computing intermediate values alongside the original input.
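As a rough mental model, assigning can be thought of as merging newly computed keys into the existing dictionary. The `assign` function below is a hypothetical plain-Python sketch of that behaviour, not the library's code.

```python
from typing import Any, Callable, Dict


def assign(data: Dict[str, Any], **new_fields: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Return a copy of `data` with extra keys computed from the original dict."""
    extra = {name: func(data) for name, func in new_fields.items()}
    return {**data, **extra}


record = {"x": "Hello"}
assigned = assign(record, y=lambda d: "Happy Learning", length=lambda d: len(d["x"]))
print(assigned)  # {'x': 'Hello', 'y': 'Happy Learning', 'length': 5}
```

The original `record` is left untouched, so earlier steps in a pipeline never see values added later.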

## 6. Performance Improvement (Inference Speed)
- Focus on optimizing the inference speed by leveraging parallel execution.
- Use `RunnableParallel` or batching techniques.
- Consider optimizing data pipelines by removing unnecessary steps.

## 7. Async Invoke
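- Runs a chain asynchronously with `ainvoke`, so other work can proceed while it waits on I/O, improving the overall throughput of the system.
- Syntax example:
    ```python
    async def async_operation(chain, question):
        # ainvoke returns a coroutine, so it must be awaited
        result = await chain.ainvoke(question)
        return result
    ```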

## 8. Batch Support
- Handles multiple inputs at once to improve performance.
- Can be combined with `RunnableParallel` for parallel batch execution.
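Batching applies one operation to many inputs. The sketch below shows the idea in plain Python with a thread pool; `batch` and `fake_model` are hypothetical names, and `fake_model` merely stands in for a slow model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def batch(func: Callable[[str], str], inputs: List[str], max_workers: int = 4) -> List[str]:
    """Apply `func` to every input using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


def fake_model(question: str) -> str:
    # Hypothetical stand-in for a slow model call.
    return f"answer to: {question}"


batch_result = batch(fake_model, ["what is llama3?", "what is LCEL?"])
print(batch_result)  # ['answer to: what is llama3?', 'answer to: what is LCEL?']
```

`pool.map` returns results in the order of the inputs, so each answer can be matched back to its question by position.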

## 9. Async Batch Execution
- Combines asynchronous execution with batch processing for high-performance tasks.
- Reduces overall execution time for larger datasets.
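A minimal sketch of asynchronous batching using only `asyncio` is shown below. `fake_async_model` and `abatch` are hypothetical names, and the short sleep only simulates waiting on a network call.

```python
import asyncio
from typing import List


async def fake_async_model(question: str) -> str:
    # Hypothetical stand-in for a non-blocking model call.
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


async def abatch(questions: List[str]) -> List[str]:
    # gather schedules all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_async_model(q) for q in questions))


abatch_result = asyncio.run(abatch(["what is llama3?", "what is LCEL?"]))
print(abatch_result)  # ['answer to: what is llama3?', 'answer to: what is LCEL?']
```

Inside a notebook an event loop is already running, so `await abatch([...])` would be used there instead of `asyncio.run(...)`.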

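## 10. Using `itemgetter` with `LCEL`
- `itemgetter` (from Python's `operator` module) extracts specific items from a collection, such as one key from a dictionary.
- When combined with `LCEL` (LangChain Expression Language), it can route individual fields of the input to different steps of a chain.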

## 11. Bind Tools
- `bind` attaches fixed arguments, such as tool definitions, to a model or another `Runnable`.
- Every later call to the bound `Runnable` then receives those arguments automatically.
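Binding is similar in spirit to pre-filling keyword arguments with `functools.partial`. The sketch below uses that analogy; `fake_model` and its `temperature` and `stop` parameters are hypothetical and only illustrate the pattern.

```python
from functools import partial


def fake_model(prompt: str, temperature: float = 1.0, stop: str = "") -> str:
    # Hypothetical stand-in for a model call that accepts extra keyword arguments.
    text = f"[t={temperature}] {prompt}"
    return text.split(stop)[0] if stop else text


# Binding fixes some keyword arguments once; later calls only supply the input.
bound_model = partial(fake_model, temperature=0.0, stop="?")
bound_result = bound_model("What is 35 * 46? Show your work.")
print(bound_result)  # [t=0.0] What is 35 * 46
```

The caller of `bound_model` no longer needs to know about the extra arguments, which is the same convenience that binding tools to a model provides.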

## 12. Stream Support
- `stream` yields the output chunk by chunk instead of waiting for the full response, which keeps pipelines responsive.
- This allows continuous data processing and near real-time outputs.
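Streaming can be pictured as a generator that hands over pieces of the answer as they become available. In the sketch below, `fake_stream` is a hypothetical stand-in for a streaming model and splits the text into words purely for illustration.

```python
import time
from typing import Iterator


def fake_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Yield a response word by word, the way a streaming model yields chunks."""
    for word in text.split():
        time.sleep(delay)  # simulate generation latency
        yield word + " "


chunks = []
for chunk in fake_stream("Top 5 concepts for Big Data"):
    chunks.append(chunk)
    print(chunk, end="", flush=True)  # show each piece as soon as it arrives

streamed_text = "".join(chunks).strip()
```

The consumer starts printing after the first chunk arrives, so perceived latency depends on the first piece rather than on the whole response.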
  


In [None]:
!pip install langchain_google_genai
!pip install langchain_community
!pip install langchain
!pip install langchain_huggingface
!pip install langchain_groq


In [None]:
from google.colab import userdata
GROQ_API_KEY=userdata.get('GROQ_API_KEY')
import os
os.environ["GROQ_API_KEY"]=GROQ_API_KEY

In [None]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
import os
os.environ["GOOGLE_API_KEY"]=GOOGLE_API_KEY

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.0-pro")

In [None]:
'''from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
from langchain_groq import ChatGroq
import os
llm=ChatGroq(model_name="Gemma2-9b-It")'''


# A simple chain (the old chaining approach)

In [None]:
template= 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

In [None]:
from langchain_core.prompts import PromptTemplate

In [None]:
prompt = PromptTemplate(template=template,input_variables=["skill"])

In [None]:
print(prompt)

input_variables=['skill'] input_types={} partial_variables={} template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'


In [None]:
from langchain.chains import LLMChain

In [None]:
llm_chain = LLMChain(prompt=prompt,llm=llm)

In [None]:
print(llm_chain.run('Data Science'))

**Top 5 Essential Skills for Data Science Learners:**

1. **Programming Languages:** Python and R are the most popular programming languages for data science. Focus on mastering data manipulation, analysis, and visualization capabilities.

2. **Statistics and Probability:** Understand fundamental statistical concepts such as probability distributions, hypothesis testing, and regression analysis. These skills are crucial for data analysis and modeling.

3. **Machine Learning Algorithms:** Explore different machine learning algorithms, including supervised (e.g., linear regression, decision trees) and unsupervised (e.g., clustering, dimensionality reduction) techniques.

4. **Data Manipulation and Visualization:** Learn to efficiently import, clean, manipulate, and visualize data using libraries like pandas, NumPy, and Matplotlib. Effective data visualization is key for communicating insights.

5. **Cloud Computing:** Familiarize yourself with cloud platforms like AWS, Azure, or Google C

In [None]:
print(llm_chain.run({'skill':'Data Science'}))

**Top 5 Things to Learn for Data Science:**

1. **Programming Language:** Proficiency in a programming language is essential for data manipulation, analysis, and visualization. Python and R are the most popular choices for data science.

2. **Data Manipulation and Analysis:** Understand how to clean, transform, and explore data using tools like Pandas (Python) or dplyr (R). Learn statistical concepts like descriptive statistics, hypothesis testing, and regression analysis.

3. **Machine Learning:** Study supervised and unsupervised learning algorithms, such as linear regression, logistic regression, decision trees, and clustering. Understand the principles of model selection, evaluation, and deployment.

4. **Data Visualization:** Learn to create informative and visually appealing visualizations using libraries like Matplotlib (Python) or ggplot2 (R). This enables effective communication of insights and facilitates data exploration.

5. **Cloud Computing:** Familiarize yourself with cl

# The same chain implemented using LCEL

In [None]:
llm

ChatGoogleGenerativeAI(model='models/gemini-1.0-pro', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7b10ed2405e0>, async_client=<google.ai.generativelanguage_v1beta.services.generative_service.async_client.GenerativeServiceAsyncClient object at 0x7b10ed2418d0>, default_metadata=())

In [None]:
prompt

PromptTemplate(input_variables=['skill'], input_types={}, partial_variables={}, template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n')

In [None]:
chain = prompt | llm

In [None]:
print(chain.invoke({'skill':'Big Data'}))

content='**Top 5 Essential Concepts for Big Data:**\n\n1. **Big Data Frameworks:** Understand the fundamentals of Hadoop, Spark, and other popular frameworks used for processing and managing large datasets.\n2. **Data Ingestion and Storage:** Learn the techniques for importing and storing data from various sources, including structured, semi-structured, and unstructured data.\n3. **Data Analysis and Visualization:** Explore methods for analyzing big data, including statistical techniques, machine learning algorithms, and data visualization tools.\n4. **Data Management and Security:** Understand the principles of data management, data governance, and data security in a big data environment.\n5. **Cloud Computing for Big Data:** Familiarize yourself with the role of cloud platforms, such as AWS, Azure, and GCP, in supporting big data storage, processing, and analysis.' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': '

In [None]:
from langchain_core.output_parsers import StrOutputParser

In [None]:
parser = StrOutputParser()

In [None]:
chain = prompt | llm | parser

In [None]:
print(chain.invoke({'skill':'Machine Learning'}))

**Top 5 Essential Concepts to Learn in Machine Learning:**

1. **Supervised Learning:** Understanding how algorithms learn from labeled data and predict outcomes. This includes concepts like linear regression, logistic regression, and decision trees.

2. **Unsupervised Learning:** Exploring algorithms that learn from unlabeled data to find patterns and structures. This encompasses methods like clustering, dimensionality reduction, and anomaly detection.

3. **Model Evaluation:** Mastering techniques to assess the performance of machine learning models, such as accuracy, precision, recall, and confusion matrices.

4. **Feature Engineering:** Learning how to transform and select the most informative features from raw data to improve model performance. This involves data cleaning, normalization, and feature selection techniques.

5. **Model Tuning and Optimization:** Understanding how to adjust model parameters and hyperparameters to enhance its predictive capabilities. This includes meth

# Let's discuss the runnables

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough , RunnableLambda

In [None]:
chain = RunnablePassthrough()

In [None]:
chain.invoke('Welcome to this youtube channel')

'Welcome to this youtube channel'

In [None]:
chain = RunnablePassthrough() | RunnablePassthrough() | RunnablePassthrough()

In [None]:
chain.invoke('Welcome to my sunny"s youtube channel')

'Welcome to my sunny"s youtube channel'

In [None]:
def string_upper(text):
    return text.upper()

In [None]:
chain = RunnablePassthrough() | RunnableLambda(string_upper)

In [None]:
chain.invoke('Welcome to my sunny"s youtube channel')

'WELCOME TO MY SUNNY"S YOUTUBE CHANNEL'

In [None]:
# A plain function is not a Runnable, so it has no .invoke method; wrap it in RunnableLambda first
string_upper.invoke('Welcome to my sunny"s youtube channel')

AttributeError: 'function' object has no attribute 'invoke'

In [None]:
chain = RunnableLambda(string_upper)

In [None]:
chain.invoke('Welcome to my sunny"s youtube channel')

'WELCOME TO MY SUNNY"S YOUTUBE CHANNEL'

In [None]:
chain = RunnableParallel({'x':RunnablePassthrough(),'y':RunnablePassthrough()})

In [None]:
chain.invoke("Sunny")

{'x': 'Sunny', 'y': 'Sunny'}

In [None]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog"})

{'x': {'Youtube': '@sunnysavita10', 'Blog': "Sunny's blog"},
 'y': {'Youtube': '@sunnysavita10', 'Blog': "Sunny's blog"}}

In [None]:
lambda x: x['Blog']

In [None]:
chain = RunnableParallel({'x':RunnablePassthrough(),'Blog':lambda x: x['Blog']})

In [None]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog"})

{'x': {'Youtube': '@sunnysavita10', 'Blog': "Sunny's blog"},
 'Blog': "Sunny's blog"}

In [None]:
def fetch_website(input: dict):
    output = input.get('Website','Not found')
    return output

In [None]:
mydict={'Youtube': '@sunnysavita10','Blog': "Sunny's blog"}

In [None]:
mydict.get("Website","Not found")

'Not found'

In [None]:
chain = RunnableParallel({'Website':RunnablePassthrough() | RunnableLambda(fetch_website),
                          'Blog':lambda z: z['Blog']})

In [None]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog"})

{'Website': 'Not found', 'Blog': "Sunny's blog"}

In [None]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog" , 'Website' : 'sunnysavita.com'})

{'Website': 'sunnysavita.com', 'Blog': "Sunny's blog"}

In [None]:
def extra_func(input):
    return 'Happy Learning'

In [None]:
chain = RunnableParallel({'x' : RunnablePassthrough()}).assign(extra=RunnableLambda(extra_func))

In [None]:
chain = RunnableParallel({'x' : RunnablePassthrough()}).assign(y=RunnableLambda(extra_func))

In [None]:
chain.invoke('Hello')

{'x': 'Hello', 'y': 'Happy Learning'}

In [None]:
!pip install chromadb


In [None]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

### Reading the txt files from source directory

loader = DirectoryLoader('./source', glob="./*.txt", loader_cls=TextLoader)
docs = loader.load()

### Creating Chunks using RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    length_function=len
)
new_docs = text_splitter.split_documents(documents=docs)
doc_strings = [doc.page_content for doc in new_docs]

### BGE Embeddings

'''from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-base-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
'''

### Creating Retriever using Vector DB

db = Chroma.from_documents(new_docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 4})

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)


In [None]:
retrieval_chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
    )

In [None]:
question ="what is llama3? can you highlight 3 important points?"

In [None]:
retrieval_chain.invoke(question)

'- Llama 3 is a large language model developed by Meta AI.\n- It was released in April 2024.\n- It is used in various applications, including language translation, question answering, and text generation.'

In [None]:
import time

start_time = time.time()

result = retrieval_chain.invoke(question)

print('Time taken:',time.time() - start_time)

Time taken: 1.6216003894805908




In [None]:
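start_time = time.time()

# Note: without `await`, ainvoke only returns a coroutine object and the chain does not run,
# so the time printed below measures coroutine creation, not the retrieval call.
# Use `result = await retrieval_chain.ainvoke(question)` to actually execute the chain.
result = retrieval_chain.ainvoke(question)

print('Time taken:',time.time() - start_time)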

Time taken: 0.00014090538024902344
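The near-zero timing above comes from the difference between creating a coroutine and awaiting it. The following self-contained sketch shows that difference using only `asyncio`. `slow_call` is a hypothetical stand-in for an async chain call.

```python
import asyncio
import time


async def slow_call() -> str:
    # Hypothetical stand-in for an async chain call that takes a while.
    await asyncio.sleep(0.2)
    return "done"


async def main() -> tuple:
    start = time.perf_counter()
    pending = slow_call()          # only creates a coroutine object; nothing runs yet
    created_in = time.perf_counter() - start

    start = time.perf_counter()
    value = await pending          # the work actually happens here
    awaited_in = time.perf_counter() - start
    return created_in, awaited_in, value


created_in, awaited_in, value = asyncio.run(main())
print(f"create: {created_in:.5f}s, await: {awaited_in:.5f}s, value: {value}")
```

In a notebook, top-level `await` is available, so the timing code can simply wrap `await retrieval_chain.ainvoke(question)` instead of using `asyncio.run`.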


In [None]:
start_time = time.time()

batch_output = retrieval_chain.batch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 0.9684896469116211


In [None]:
batch_output

['Llama 3 is a family of large language models developed by Meta AI.',
 'The provided context does not mention anything about the main properties, so I cannot extract the requested data from the provided context.']

In [None]:
start_time = time.time()

batch_output = await retrieval_chain.abatch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 0.9242796897888184


In [None]:
batch_output

In [None]:
my_dict = {'Youtube': '@sunnysavita10','Blog': "sunny's blog" , 'Website' : 'sunnysavita.com'}
my_dict

{'Youtube': '@sunnysavita10',
 'Blog': "sunny's blog",
 'Website': 'sunnysavita.com'}

In [None]:
from operator import itemgetter

website = itemgetter('Website')

In [None]:
website(my_dict)

'sunnysavita.com'
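A few more properties of `itemgetter` are worth knowing before using it inside a chain. The short example below uses only the standard library; the `profile` dictionary mirrors the one above and the `"Twitter"` key is a hypothetical missing key.

```python
from operator import itemgetter

profile = {"Youtube": "@sunnysavita10", "Blog": "sunny's blog", "Website": "sunnysavita.com"}

# Several keys at once return a tuple, in the order requested.
youtube_and_site = itemgetter("Youtube", "Website")(profile)
print(youtube_and_site)  # ('@sunnysavita10', 'sunnysavita.com')

# itemgetter also works on sequences, using positions instead of keys.
second = itemgetter(1)(["a", "b", "c"])
print(second)  # b

# A missing key raises KeyError, unlike dict.get with a default.
try:
    itemgetter("Twitter")(profile)
except KeyError as err:
    missing = err.args[0]
print(missing)  # Twitter
```

Because a missing key raises an error rather than returning a default, the input passed to a chain built with `itemgetter` needs to contain every key the chain asks for.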

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer in the following language: {language}
"""
prompt = PromptTemplate.from_template(template)


In [None]:
retrieval_chain = (
    RunnableParallel({"context": itemgetter('question') | retriever,
                       "question": itemgetter('question'),
                       "language": itemgetter('language')
                       })
    | prompt
    | llm
    | StrOutputParser()
    )

In [None]:
### itemgetter('question') looks up a key, so the chain input must be a dict containing 'question' and 'language'

response = retrieval_chain.invoke({'question': "what is llama3?",
                        'language': "Spanish"})

print(response)

Llama (Large Language Model Meta AI) es una familia


In [None]:
template = 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

prompt = PromptTemplate.from_template(template=template)

chain = prompt | llm

In [None]:
for s in chain.stream({'skill':'Big Data'}):
    print(s.content,end='')

**Top 5 Must-Learn Concepts for Big Data:**

1. **Data Management and Storage:** Understand various data storage technologies like Hadoop Distributed File System (HDFS), NoSQL databases (e.g., MongoDB, Cassandra), and data warehousing concepts.

2. **Data Processing and Analysis:** Learn about frameworks like Apache Spark, MapReduce, and Hadoop for processing and analyzing large datasets. Explore data mining, machine learning, and statistical techniques for extracting insights.

3. **Cloud Computing:** Familiarize yourself with cloud platforms like AWS, Azure, and GCP that offer scalable and cost-effective solutions for storing, processing, and analyzing big data.

4. **Data Visualization and Communication:** Gain expertise in tools like Tableau, Power BI, and QlikView to effectively communicate data insights to stakeholders. Learn about data visualization best practices and storytelling techniques.

5. **Data Governance and Security:** Understand data governance principles, data quali

In [None]:
import json
from langchain_core.messages import ToolMessage
from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool

@tool
def multiply(first_number: int, second_number: int):
    """Multiplies two numbers together."""
    return first_number * second_number

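# bind attaches the tool schema to the model, so every call made through model_with_tools includes it
model_with_tools = llm.bind(tools=[convert_to_openai_tool(multiply)])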

In [None]:
response = model_with_tools.invoke('What is 35 * 46?')

In [None]:
response