## Installing Libraries and Connecting to LLMs

In [None]:
!pip install -qU  \
  python-dotenv \
  langchain \
  langchain-community \
  openai \
  langchain-openai

In [None]:
import os

# Set your OpenAI API key as an environment variable.
# Paste the key between the quotes, or load it from a .env file with python-dotenv.
os.environ['OPENAI_API_KEY'] = ''

In [None]:
openai_api_key = os.getenv('OPENAI_API_KEY')
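
Because the cell above leaves the key blank until you fill it in, a small guard can catch a missing or empty value before any model call is made. This is only a sketch: `require_env` is a hypothetical helper name, not part of LangChain or the OpenAI SDK, and the demo uses a throwaway variable name.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail loudly if it is unset or blank.

    Hypothetical helper for this notebook, not a library function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to a .env file first")
    return value


# Demonstration with a throwaway variable so no real key is touched
os.environ["DEMO_API_KEY"] = "sk-demo"
demo_key = require_env("DEMO_API_KEY")
print(demo_key)
```

In the notebook itself, the same check would be `require_env('OPENAI_API_KEY')`.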

In [None]:
# Connect to OpenAI

from langchain_openai import ChatOpenAI
llm_gpt4 = ChatOpenAI(model="gpt-4")

In [None]:
# Verify that you can use the LLM
llm_gpt4.invoke("What is a large language model?").content

'A large language model is an artificial intelligence model that has been trained on a vast amount of text data. It uses this data to generate human-like text based on the input it is given. These models, such as OpenAI\'s GPT-3, are capable of completing tasks that require a deep understanding of language, like translation, answering questions, creating written content, summarization, and more. They are called "large" because they have a high number of parameters, often in the billions, allowing them to capture more information and produce more accurate results.'

## Basic Prompt Engineering

In [None]:
# Basic request using system and human/user message

system_prompt="""
You explain things to people like they are five-year-olds.
"""
user_prompt="""
What is a large language model?
"""

from langchain_core.messages import HumanMessage, SystemMessage
import textwrap

messages = [
    SystemMessage(content=system_prompt),
    HumanMessage(content=user_prompt),
]

In [None]:
response=llm_gpt4.invoke(messages)
answer = textwrap.fill(response.content, width=100)

In [None]:
print(answer)

Okay, imagine you have a really big toy robot. This robot has been taught to understand and use
human language. You can ask it questions or tell it to write a story, and it will try its best to do
it. It learned how to do this by reading lots and lots of books, websites, and other stuff people
wrote. We call this big language-knowing toy robot a "large language model". It's like a super smart
parrot that can repeat things it has learned, but also try to make new sentences based on what it
knows. However, just like a parrot, it doesn't really understand what it's saying, it's just really
good at copying humans.


### Prompt Template from LangChain

In [None]:
from langchain_core.prompts import PromptTemplate

In [None]:
# Create a simple prompt template

prompt_template = """
You are a helpful assistant that explains AI topics. Given the following input:
{topic}
Provide an explanation of the given topic.
"""

# Create the prompt from the prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template,
)
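
To see what the template contributes before any model is involved, the sketch below performs the same kind of `{topic}` substitution with Python's built-in `str.format`. It is a plain-Python stand-in for illustration, not LangChain's implementation; `render` and `demo_template` are hypothetical names.

```python
# Same template text as in the cell above, under a separate name
demo_template = """
You are a helpful assistant that explains AI topics. Given the following input:
{topic}
Provide an explanation of the given topic.
"""


def render(template: str, **variables: str) -> str:
    """Fill {placeholders} in the template (hypothetical helper, plain str.format)."""
    return template.format(**variables)


rendered = render(demo_template, topic="What is a large language model?")
print(rendered)
```

The printed text is what the model would receive as its prompt once the placeholder has been filled in.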

### Composing the Chain

In [None]:
# Assemble the chain using the pipe operator
chain = prompt | llm_gpt4

In [None]:
chain.invoke({"topic":"What is a large language model"}).content

'A large language model is a type of artificial intelligence model that has been trained on a vast amount of text data. These models, like OpenAI\'s GPT-3 or Google\'s BERT, are designed to generate human-like text based on the input they are given.\n\nThe "large" in large language model refers to the size of the model in terms of the number of parameters it has. These models can have billions or even trillions of parameters, allowing them to capture a wide range of nuances in the data they were trained on.\n\nThe models are capable of understanding context, completing sentences, generating whole paragraphs, and even writing an essay on a given topic. Their applications span across various fields such as content creation, dialogue systems, translation, and more. However, they also have limitations and can sometimes produce outputs that are biased, nonsensical, or inappropriate.'

## Chain to Transcribe YouTube Videos

In [None]:
!pip install --upgrade --quiet youtube-transcript-api

### Loaders from LangChain

In [None]:
# Import the YouTube loader from the langchain-community package

from langchain_community.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    "https://youtu.be/h04DwdAkNZ4?si=C7MPK1mqvkBzUAAR", add_video_info=False
)

In [None]:
# Load the video transcript as documents
docs=loader.load()

In [None]:
docs

[Document(page_content="uh Hey guys so recently I have written this uh this article or you can call it a tutorial on how to create open source AI applications using Lang chain uh so we need to understand that uh LMS are not just enough to create your to build your eii applications you need to have a proper toolkit or a framework and uh um eii Frameworks like Lang chain and Lama andex uh uh help us build these a applications seamlessly uh so uh uh what is langin it's basically an open open source framework for eiml data Engineers to develop uh sophisticated Eid driven applications powered by llm and uh basically langin facilitates uh the integration of uh language models with all the required components including um external databases like vector databases logic reasoning apis and Etc so so these all are required to enance the capabilities of llm powered applications so Lan chain and then llama index they both provide uh provide this this toolkit so basically langin has six modules and 

### Chain to Summarise a YouTube Video Given a Transcript

In [None]:
transcript=docs[0].page_content

In [None]:
# We can now use the transcript in a chain
prompt_template = """
You are a helpful assistant that explains YT videos. Given the following video transcript:
{video_transcript}
Give a summary.
"""

# Create the prompt
prompt = PromptTemplate(
    input_variables=["video_transcript"],
    template=prompt_template,
)

In [None]:
chain = prompt | llm_gpt4

In [None]:
# Note that we can just feed the chain the docs without extracting the content as text

chain.invoke({"video_transcript":docs}).content

'The video is a tutorial on how to create open source AI applications using Lang chain. The speaker explains that Lang chain is an open-source framework for data engineers to develop AI-driven applications powered by language models. It facilitates the integration of language models with all required components, including external databases and logic reasoning APIs. The speaker demonstrates how to use Lang chain to split a publicly available PDF into chunks and store them into a vector database. They also show how to ask a query and retrieve the most relevant response. The speaker uses a Single Store notebook feature and mentions the need to create a workspace and a database. They provide a step-by-step guide on how to install the required libraries, load the PDF, split it into chunks, store the contents into a database, and ask a query.'
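
Passing `docs` directly works because the template converts the value to a string, but that string is the Python representation of the whole list, metadata included. The sketch below shows the difference using a stand-in `Doc` dataclass (hypothetical, so the example runs without LangChain); the real `Document` representation may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a loaded document (hypothetical; not LangChain's Document class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


demo_docs = [Doc("uh Hey guys so recently I have written this article", {"source": "h04DwdAkNZ4"})]

template = "Given the following video transcript:\n{video_transcript}\nGive a summary."

# Formatting the list inserts its repr, wrapper and metadata included
from_list = template.format(video_transcript=demo_docs)

# Formatting the extracted text inserts only the transcript
from_text = template.format(video_transcript=demo_docs[0].page_content)

print(from_list)
print(from_text)
```

Since the transcript is already stored in `transcript`, calling `chain.invoke({"video_transcript": transcript})` sends only the text to the model.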

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain

In [None]:
# The create_stuff_documents_chain takes a list of docs and formats them all into a prompt

prompt_template = """
You are a helpful assistant that explains AI topics. Given the following context:
{context}
Summarize what LangChain can do.
"""

# Create the prompt
prompt = PromptTemplate(
    input_variables=["context"],
    template=prompt_template,
)

chain = create_stuff_documents_chain(llm_gpt4, prompt)

In [None]:
#docs

In [None]:
chain.invoke({"context": docs})

'LangChain is an open-source framework designed to aid data engineers in developing sophisticated AI-driven applications. It facilitates the integration of language models with various components, including external databases, logic reasoning APIs, and more to enhance the capabilities of AI applications. LangChain includes six modules - models, chains, prompts, indexes, memory, and agents - that aid in the seamless construction of AI applications. It allows users to load and split PDFs, set up databases to store content, create embeddings, and insert them into the database. Users can then ask queries and retrieve the most relevant response.'
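
Conceptually, "stuffing" means concatenating the text of every document and placing the result in the `{context}` slot. The sketch below mimics that step in plain Python, assuming each document contributes only its `page_content` and that documents are joined with a blank line. `stuff` and `Doc` are hypothetical stand-ins, not the library's code.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a loaded document (hypothetical; not LangChain's Document class)."""
    page_content: str


def stuff(docs: list[Doc], template: str, separator: str = "\n\n") -> str:
    """Join the text of all docs and substitute it for {context} in the template."""
    context = separator.join(doc.page_content for doc in docs)
    return template.format(context=context)


stuff_template = "Given the following context:\n{context}\nSummarize what LangChain can do."
demo_docs = [Doc("LangChain has six modules."), Doc("It integrates LLMs with vector databases.")]

stuffed_prompt = stuff(demo_docs, stuff_template)
print(stuffed_prompt)
```

Because everything lands in one prompt, this approach only works while the combined text fits in the model's context window.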

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

summarize_prompt_template = """
You are a helpful assistant that summarizes AI concepts:
{context}
Summarize the context
"""

summarize_prompt = PromptTemplate.from_template(summarize_prompt_template)

In [None]:
summarize_prompt

PromptTemplate(input_variables=['context'], template='\nYou are a helpful assistant that summarizes AI concepts:\n{context}\nSummarize the context\n')

In [None]:
output_parser = StrOutputParser()

chain = summarize_prompt | llm_gpt4 | output_parser

chain.invoke({"context": "What is LangChain?"})

'LangChain is an Artificial Intelligence (AI) project based on blockchain technology. Its primary aim is to develop a decentralized translation solution. The project utilizes AI and the power of community contributions to facilitate accurate and efficient translation services. The features of blockchain like transparency, security, and incentives (through tokens) are used to encourage contributors to improve the AI translation models. LangChain aims to disrupt the traditional translation industry by providing a more affordable, quicker, and reliable translation service.'

In [None]:
# Verify the type of the chain
print(type(chain)) # Should print <class 'langchain_core.runnables.base.RunnableSequence'>

<class 'langchain_core.runnables.base.RunnableSequence'>


In [None]:
# Inject Python functions into a chain with RunnableLambda
from langchain_core.runnables import RunnableLambda

summarize_chain = summarize_prompt | llm_gpt4 | output_parser

# Define a custom lambda function and wrap it in RunnableLambda
length_lambda = RunnableLambda(lambda summary: f"Summary length: {len(summary)} characters")

lambda_chain = summarize_chain | length_lambda

lambda_chain.invoke({"context": "What is LangChain?"})

'Summary length: 1429 characters'

In [None]:
print(type(lambda_chain.steps[-1])) # Should print <class 'langchain_core.runnables.base.RunnableLambda'>

<class 'langchain_core.runnables.base.RunnableLambda'>


In [None]:
# Use a function in a chain without wrapping it in RunnableLambda yourself (the pipe operator coerces it)
chain_with_function = summarize_chain |  (lambda summary: f"Summary length: {len(summary)} characters")

In [None]:
print(type(chain_with_function.steps[-1]))

<class 'langchain_core.runnables.base.RunnableLambda'>


In [None]:
chain_with_function.invoke({"context": "What is LangChain?"})

'Summary length: 679 characters'
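
A bare lambda works here because the pipe operator wraps plain callables for you, as the `RunnableLambda` type printed above shows. The following is a minimal plain-Python sketch of that idea, assuming a toy `Step` class (hypothetical, and far simpler than LangChain's runnables) whose `__or__` coerces functions into steps. The prompt and model are replaced by stand-in functions, so no API call is made.

```python
from typing import Any, Callable


class Step:
    """Toy pipeline step (hypothetical stand-in, not LangChain's Runnable)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: Any) -> "Step":
        # Coerce plain callables into a Step, then compose left to right
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))


# Stand-ins for the prompt and the model
fake_prompt = Step(lambda inputs: f"Summarize: {inputs['context']}")
fake_llm = Step(lambda prompt_text: prompt_text.upper())

toy_chain = fake_prompt | fake_llm | (lambda summary: f"Summary length: {len(summary)} characters")
toy_result = toy_chain.invoke({"context": "What is LangChain?"})
print(toy_result)
```

Each `|` returns a new step that feeds the left side's output into the right side, which is why the final lambda receives the model's text rather than the original input dictionary.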

### Text Splitters from LangChain (Chunking Data)

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)

In [None]:
docs_split = text_splitter.split_documents(docs)

In [None]:
docs_split

[Document(page_content='uh Hey guys so recently I have written this uh this article or you can call it a tutorial on how to', metadata={'source': 'h04DwdAkNZ4'}),
 Document(page_content='tutorial on how to create open source AI applications using Lang chain uh so we need to understand', metadata={'source': 'h04DwdAkNZ4'}),
 Document(page_content='need to understand that uh LMS are not just enough to create your to build your eii applications', metadata={'source': 'h04DwdAkNZ4'}),
 Document(page_content='eii applications you need to have a proper toolkit or a framework and uh um eii Frameworks like', metadata={'source': 'h04DwdAkNZ4'}),
 Document(page_content='eii Frameworks like Lang chain and Lama andex uh uh help us build these a applications seamlessly', metadata={'source': 'h04DwdAkNZ4'}),
 Document(page_content="seamlessly uh so uh uh what is langin it's basically an open open source framework for eiml data", metadata={'source': 'h04DwdAkNZ4'}),
 Document(page_content='for eiml da
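
The overlap visible in the output, where each chunk starts with the tail of the previous one, comes from `chunk_overlap`. The sketch below shows the idea with a fixed-size sliding window over characters. It is deliberately simpler than `RecursiveCharacterTextSplitter`, which prefers to break on separators such as whitespace; `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

A larger overlap reduces the chance that a sentence is cut in a way that loses its meaning, at the cost of storing more duplicated text.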

We can extend this tutorial to create a simple RAG setup using SingleStore as a vector database.