In [None]:
!pip install langchain lark chromadb pypdf google-cloud-aiplatform google-auth > /dev/null

[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.31.0 which is incompatible.[0m[31m
[0m

There are many PDFs and papers in below location, which will be used for embedding and further querying it.

https://github.com/insightbuilder/python_de_learners_data/tree/main/resources

In [None]:
#Loading the documents from langchain resources folder

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_pdf(path_pdf):
  get_text = PyPDFLoader(path_pdf)
  
  get_pages = get_text.load()

  final_text = []

  shredder = RecursiveCharacterTextSplitter(chunk_size=350,
                                            chunk_overlap=20,
                                            length_function=len) 
  
  final_shred = shredder.split_documents(get_pages)

  return final_shred


In [None]:
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS']="/content/generativeaitrial-trialLC.json"

In [None]:
from langchain.schema import Document
from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = VertexAIEmbeddings()

In [None]:
persist_directory = '/content/content/lc_documentdb'

In [None]:
!unzip lc_documentdb.zip

Archive:  lc_documentdb.zip
   creating: content/lc_documentdb/
  inflating: content/lc_documentdb/chroma-embeddings.parquet  
   creating: content/lc_documentdb/index/
  inflating: content/lc_documentdb/index/uuid_to_id_b973adbd-91f7-4c95-b071-c5aabdbd5ab7.pkl  
  inflating: content/lc_documentdb/index/id_to_uuid_b973adbd-91f7-4c95-b071-c5aabdbd5ab7.pkl  
  inflating: content/lc_documentdb/index/index_b973adbd-91f7-4c95-b071-c5aabdbd5ab7.bin  
  inflating: content/lc_documentdb/index/index_metadata_b973adbd-91f7-4c95-b071-c5aabdbd5ab7.pkl  
  inflating: content/lc_documentdb/chroma-collections.parquet  


In [None]:
vectordb = Chroma(persist_directory=persist_directory, 
                  embedding_function=embeddings)

In [None]:
db_retriever = vectordb.as_retriever()
db_retriever.get_relevant_documents("langchain concepts")

[Document(page_content='5/31/23, 6:13 AM Concepts — \x00\x00 LangChain 0.0.186\nhttps://python.langchain.com/en/stable/getting_started/concepts.html 1/3Concepts\nContents\nChain of Thought\nAction Plan Generation\nReAct\nSelf-ask\nPrompt Chaining\nMemetic Proxy\nSelf Consistency\nInception\nMemPrompt\nThese are concepts and terminology commonly used when developing LLM applications. It', metadata={'source': '/content/Concepts.pdf', 'page': 0}),
 Document(page_content='This is the Python specific portion of the documentation. For a purely conceptual guide to\nLangChain, see here. For the JavaScript documentation, see here.\nGetting Started\nHow to get started using LangChain to create an Language Model application.\nQuickstart Guide\nConcepts and terminology .\nConcepts and terminology', metadata={'source': '/content/WelcometoLangChain.pdf', 'page': 0}),
 Document(page_content='contains reference to external papers or sources where the concept was first introduced, as\nwell as to places

In [None]:
from langchain.llms import VertexAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = VertexAI(temperature=0)


prompt_template = """Use the context below to write a 400 word blog post about the topic below:
    Context: {context}
    Topic: {topic}
    Blog post:"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", 
                                               "topic"]
)

chain = LLMChain(llm=llm, prompt=PROMPT)

In [None]:
def generate_blog_post(topic):
    docs = vectordb.similarity_search(topic, k=4)
    inputs = [{"context": doc.page_content, 
               "topic": topic} for doc in docs]
    gen = chain.apply(inputs)
    return gen

In [None]:
output = generate_blog_post("Langchain Concepts")

In [None]:
print(output[0]['text'])

## Langchain Concepts

Langchain is a large language model (LLM) that can be used for a variety of tasks, including text generation, summarization, and question answering. It is built on the Transformer architecture, which is a type of neural network that is designed to process sequential data.

One of the key features of Langchain is its ability to generate text in a coherent and informative way. This is due to the model's large size and its ability to learn from a massive dataset of text. Langchain can also be used to generate text in different styles, such as creative writing, technical writing,


In [None]:
print(output[2]['text'])

Chain of Thought (CoT) is a prompting technique used to encourage the model to generate text that is coherent and follows a logical progression. It is based on the idea that human language is generated by a series of steps, or thoughts, that build on each other to create a coherent narrative.

The CoT technique works by providing the model with a series of prompts that guide it through the process of generating text. The first prompt is typically a topic or theme, and the subsequent prompts are used to provide additional information or context. The model is then allowed to generate text based on the prompts, and the output is typically more coherent


In [None]:
len(output[1]['text'].split(' '))

103

In [None]:
stat_template = """Extract the word frequencies
from the context below and then write a 280 character 
youtube comment on the topic:
    Context: {context}
    Topic: {topic}
    Blog post:"""

yt_stat = PromptTemplate(
    template=stat_template, input_variables=["context", 
                                               "topic"]
)

chain_yt = LLMChain(llm=llm, 
                    prompt=yt_stat)

In [None]:
def generate_yt(topic):
    docs = vectordb.similarity_search(topic, k=4)
    inputs = [{"context": doc.page_content, 
               "topic": topic} for doc in docs]
    gen = chain_yt.apply(inputs)
    return gen

In [31]:
stat_out = generate_yt("Langchain Concepts")

In [35]:
print(stat_out[3]['text'])

The most frequent words in the context are:
    1. LangChain
    2. module
    3. provide
    4. interface
    5. application

Here is a 280 character youtube comment on the topic:

LangChain is a new language model that is designed to be modular and extensible. This means that it can be used to build a variety of different applications, from chatbots to text generators. LangChain provides standard interfaces for each module, which makes it easy to use and extend.
