<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Environment Setup
- Create a Hugging Face (HF) account
- Optionally, create an OpenAI account
- Install Python packages - **transformers, datasets, openai, sentencepiece, torch, numpy, pandas,** matplotlib, seaborn, tqdm, scikit-learn, nltk, spacy, gensim, pytorch-lightning, wandb, sentence-transformers, faiss-cpu, annoy, mpld3, wordcloud, plotly, streamlit, pyngrok, fastapi, uvicorn, pydantic, fastapi-utils, aiofiles, python-multipart, **python-dotenv**, fastapi-chameleon, fastapi-pagination, fastapi-pagination[custom]
    - the later cells also import **langchain** and `pinecone` (pip package pinecone-client); the `openai.Completion` call needs an `openai` release older than 1.0
- Add the following to .env
    - OPENAI_API_KEY=your OpenAI API key
    - HUGGINGFACEHUB_API_TOKEN=your HF access token
    - PINECONE_API_KEY and PINECONE_ENV (only needed for the Pinecone cells)


In [2]:
import pandas as pd
import numpy as np

# Load environment variables
from dotenv import load_dotenv,find_dotenv,dotenv_values
load_dotenv(find_dotenv())
[ k for k,_ in dotenv_values().items() ] # sanity check

['OPENAI_API_KEY', 'HUGGINGFACEHUB_API_TOKEN']

In [3]:
!cat .env | sed -e "s/=.*/=\*\*\*\*\*/g" # sanity check

OPENAI_API_KEY=*****
HUGGINGFACEHUB_API_TOKEN=*****

## A very simple language model

In [4]:
from collections import Counter
import numpy as np

class BrainDeadLanguageModel:
    '''
    A unigram language model that 
    - learns the probability of each word in a corpus and 
    - computes the probability of a sentence as the product of the probabilities of its constituent words.
    - predicts the next word in a sentence.
    '''
    def __init__(self, corpus):
        # Count the frequency of each word in the corpus
        self.word_counts = Counter(corpus.split())

        # Compute the total number of words in the corpus
        self.total_words = sum(self.word_counts.values())

        # Compute the probability of each word
        self.word_probs = {word: count / self.total_words for word, count in self.word_counts.items()}

    def probability(self, sentence):
        # Compute the probability of a sentence as the product of the probabilities of its constituent words
        words = sentence.split()
        return np.prod([self.word_probs.get(word, 1e-10) for word in words])
    
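    def predict_next_word(self, sentence):
        # Predict the next word. In a unigram model P(sentence + w) = P(sentence) * P(w),
        # so the context is ignored and this always returns the most frequent word.
        words = sentence.split()
        return max(self.word_probs, key=lambda word: self.probability(' '.join(words + [word])))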
    
## Whole corpus is a single sentence
ll = BrainDeadLanguageModel('the quick brown fox jumps over the lazy dog')

display(
    "=== Sentence Probabilities ===",
    ll.probability('the quick brown fox jumps over the lazy dog'),
    ll.probability('the quick brown fox jumps over the dog lazy cat'),
    ll.probability('the quick brown fox jumps over the'),
    ll.probability('the'),
    ll.probability('moon is blue'),
    "=== Predictions ===",
    ll.predict_next_word('the quick brown fox jumps over'),
    ll.predict_next_word('the quick brown'),
    "=== Word Probabilities ===",
    ll.word_probs,
    )


'=== Sentence Probabilities ==='

1.0324699166852783e-08

1.0324699166852784e-18

8.363006325150756e-07

0.2222222222222222

1e-30

'=== Predictions ==='

'the'

'the'

'=== Word Probabilities ==='

{'the': 0.2222222222222222,
 'quick': 0.1111111111111111,
 'brown': 0.1111111111111111,
 'fox': 0.1111111111111111,
 'jumps': 0.1111111111111111,
 'over': 0.1111111111111111,
 'lazy': 0.1111111111111111,
 'dog': 0.1111111111111111}
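
Because `probability()` is a product of independent word probabilities, appending any word multiplies the sentence probability by that word's own probability, so `predict_next_word` ignores the context and always returns `'the'`. A minimal sketch of the next step up, a bigram model that conditions on the previous word (the class name `BigramLanguageModel` is made up for this illustration):

```python
from collections import Counter, defaultdict


class BigramLanguageModel:
    """Minimal bigram LM: P(next | previous) estimated from adjacent word pairs."""

    def __init__(self, corpus):
        words = corpus.split()
        # follow_counts[prev][nxt] = how often `nxt` directly follows `prev`
        self.follow_counts = defaultdict(Counter)
        for prev, nxt in zip(words, words[1:]):
            self.follow_counts[prev][nxt] += 1

    def next_word_probs(self, prev):
        # Normalise the counts of words seen after `prev` into probabilities
        counts = self.follow_counts.get(prev, Counter())
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()} if total else {}

    def predict_next_word(self, sentence):
        # Condition on the last word only; None if that word never had a successor
        words = sentence.split()
        probs = self.next_word_probs(words[-1]) if words else {}
        return max(probs, key=probs.get) if probs else None


bg = BigramLanguageModel('the quick brown fox jumps over the lazy dog')
print(bg.predict_next_word('the quick brown fox jumps over'))  # the
print(bg.predict_next_word('the quick brown'))                 # fox
print(bg.next_word_probs('the'))                               # {'quick': 0.5, 'lazy': 0.5}
```

On this nine-word corpus most words are seen only once, so the bigram model mostly replays the training sentence; it is still a language model, just one with a one-word memory.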

### Why LLMs

- The model above is a language model (LM) because it can predict the next word, albeit very poorly.
  
- Large language models (LLMs) are trained on huge amounts of text; they predict the next word far more accurately and have many other capabilities.
- Language models have been built on different architectures such as RNNs, LSTMs, GRUs, and Transformers.
  
- **Transformers** are the dominant architecture for today's LLMs.
- Organizations that build or host LLMs include Google, Meta (Facebook), Microsoft, OpenAI, and Hugging Face. Examples include GPT-4, BERT, T5, and Llama.
- LLMs (and even some smaller LMs) are the basis of many NLP tasks, such as
    - text generation, 
    - summarization, 
    - question answering, 
    - sentiment analysis, etc.

## Hugging Face
- https://huggingface.co/models
- is a repository (the Hugging Face Hub) of pretrained models for NLP and image tasks
- has Python libraries, notably `transformers`, for using these models, training your own models, and fine-tuning pretrained models
- hosts models from Google, Meta (Facebook), Microsoft, and others

### Transformers on Hugging Face - Google's FLAN as an example

**FLAN** stands for **Fine-tuned LAnguage Net**. It is a language model developed by Google that uses instruction fine-tuning: the model is trained on a large set of varied instructions, each giving a simple and intuitive description of a task. The instruction-tuning phase of FLAN takes only a small number of updates compared with the large amount of computation involved in pre-training the model. The `google/flan-t5-base` checkpoint used below applies this instruction-tuning approach to Google's T5 model.

[Introducing FLAN: More generalizable Language Models with Instruction Tuning, Google AI Blog, 2021](https://ai.googleblog.com/2021/10/introducing-flan-more-generalizable.html)



In [6]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
# Notes on these classes (explanation generated by GitHub Copilot):
# They are part of the Hugging Face Transformers library, a popular open-source
# library for natural language processing (NLP) tasks such as text classification,
# question answering, and machine translation.
#
# `T5Tokenizer` is the SentencePiece-based tokenizer for T5, an encoder-decoder
# transformer model that can be fine-tuned for a variety of NLP tasks.
#
# `T5ForConditionalGeneration` is the T5 model class with a language-modeling head;
# `from_pretrained` loads pre-trained weights that can be used for text generation
# tasks such as summarization, translation, and text completion.


flan_tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
flan_model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")


### Question Answering with FLAN

In [185]:
input_text = "The quick brown fox jumps over what?"
input_ids = flan_tokenizer(input_text, return_tensors="pt").input_ids
outputs = flan_model.generate(input_ids, max_length=40, num_beams=15)  # optionally: early_stopping=False
 
display(
    input_text , 
    # input_ids,
    # outputs , 
    flan_tokenizer.decode(outputs[0]),
    )

'The quick brown fox jumps over what?'

'<pad>teddy bears teddy bears teddy bears teddy bears teddy bears teddy bears te'

### Detour into Embeddings

In [186]:
pd.DataFrame(['']).assign( 
    prompt=input_text,
    ___ = '',
    encoded_prompt=str(flan_tokenizer(input_text, return_tensors="pt").input_ids), 
    decode_of_the_encoded_prompt=flan_tokenizer.decode(input_ids[0]),
    __ = '',
    encoded_outputs=str(outputs),
    # encoded_output0=str(outputs[0]),
    decoded_outputs=flan_tokenizer.decode(outputs[0])
).T.iloc[1:].style.set_properties(**{'text-align': 'left'})

| | 0 |
|---|---|
| prompt | The quick brown fox jumps over what? |
| `___` | |
| encoded_prompt | tensor([[ 37, 1704, 4216, 3, 20400, 4418, 7, 147, 125, 58,  1]]) |
| decode_of_the_encoded_prompt | The quick brown fox jumps over what? |
| `__` | |
| encoded_outputs | tensor([[ 0, 3, 17, 15, 8155, 4595, 7, 3, 17, 15, 8155, 4595,  7, 3, 17, 15, 8155, 4595, 7, 3, 17, 15, 8155, 4595,  7, 3, 17, 15, 8155, 4595, 7, 3, 17, 15, 8155, 4595,  7, 3, 17, 15]]) |
| decoded_outputs | teddy bears teddy bears teddy bears teddy bears teddy bears teddy bears te |
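
The decoded output above is stuck in a loop, a common failure of beam search. `generate()` accepts a `no_repeat_ngram_size` argument that bans any token which would repeat an n-gram already in the output. A minimal pure-Python sketch of that rule (the helper `banned_next_tokens` is hypothetical, not the Transformers implementation), applied to the first ten output ids shown above:

```python
def banned_next_tokens(ids, n):
    """Token ids that would recreate an n-gram already present in `ids`.

    Sketch of the no-repeat-n-gram rule: if the last n-1 tokens occurred
    earlier, every token that followed them there is banned as the next token.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if len(ids) < n:
        return set()  # no complete n-gram generated yet
    prefix = tuple(ids[len(ids) - n + 1:])  # last n-1 tokens (empty when n == 1)
    banned = set()
    for i in range(len(ids) - n + 1):
        if tuple(ids[i:i + n - 1]) == prefix:
            banned.add(ids[i + n - 1])
    return banned


# First ten ids of the looping "teddy bears" output above
ids = [0, 3, 17, 15, 8155, 4595, 7, 3, 17, 15]
print(banned_next_tokens(ids, 3))  # {8155}
```

In the notebook the corresponding experiment would be `flan_model.generate(input_ids, max_length=40, num_beams=15, no_repeat_ngram_size=3)`; whether that yields a sensible answer for this prompt is not shown here.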


### Playing with max_length

In [187]:
pd.DataFrame([ ( l, flan_tokenizer.decode(
    (enc:=flan_model.generate (
        flan_tokenizer(
            cap:="Complete this sentence. the quick brown fox jumps over the  ",
            return_tensors="pt"
        ).input_ids,
        max_length=l,
    ))[0]
), (enc[0].numpy().tolist() )) for l in [ 2, 5, 10, 40 ]
], columns=['length','output_text','output_encodings']).style.set_properties(
    **{'text-align': 'left'}
).set_caption(
    f'<h3>flan_model.generate() output for different lengths</h3>'\
        f'<br> for <b>{cap}</b>'
)



| | length | output_text | output_encodings |
|---|---|---|---|
| 0 | 2 | | [0, 3] |
| 1 | 5 | cliffs and | [0, 3, 12591, 7, 11] |
| 2 | 10 | cliffs and jumps over the | [0, 3, 12591, 7, 11, 4418, 7, 147, 8, 3] |
| 3 | 40 | cliffs and jumps over the cliffs. | [0, 3, 12591, 7, 11, 4418, 7, 147, 8, 3, 12591, 7, 5, 1] |
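
The encodings show how `max_length` works for T5-style encoder-decoder models: generation starts from the decoder start token (id 0, T5's pad token), `max_length` counts that start token, and decoding stops early at the end-of-sequence token (id 1). A toy greedy loop under those assumptions, replaying the 40-token row as a fake model (`greedy_decode` and `fake_model` are illustrative names), reproduces all four rows:

```python
def greedy_decode(next_token, max_length, decoder_start_id=0, eos_id=1):
    """Toy greedy loop showing how max_length bounds encoder-decoder generation.

    `next_token(ids)` is a stand-in for the model: it returns the most likely
    next token id given the ids generated so far.
    """
    ids = [decoder_start_id]          # the start token counts toward max_length
    while len(ids) < max_length:
        tok = next_token(ids)
        ids.append(tok)
        if tok == eos_id:             # stop early at end-of-sequence
            break
    return ids


# Replay the max_length=40 row from the table above as a fake "model"
reference = [0, 3, 12591, 7, 11, 4418, 7, 147, 8, 3, 12591, 7, 5, 1]

def fake_model(ids):
    return reference[len(ids)]

for length in (2, 5, 10, 40):
    print(length, greedy_decode(fake_model, length))
```

This also explains the empty text for `max_length=2`: after the start token there is room for only one more token (id 3), which decodes to nothing visible on its own.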


### Explore QA, Sentiment Analysis, Text Generation, etc. with FLAN

In [188]:
pd.DataFrame([(prompt, '---' if prompt[0] == '#' else flan_tokenizer.decode(
    flan_model.generate( flan_tokenizer( prompt, return_tensors="pt").input_ids, 
                        max_length=200,
                        )[0]
)) for prompt in [
    ## QA
    '## QA ----------------------',
    "The capital of Ghana is ",
    "What is the capital of Ghana?",
    "Complete this sentence. the quick brown fox jumps over the  ",
    "Complete the following sentence. the quick brown fox jumps over the  ",
    
    ## Sentiment Analysis  
    '## Sentiment Analysis ---------',
    "What is sentiment of the following text? I enjoy friendly chats with my friends",
    "What is sentiment of the following text? You are tough",
    "What is sentiment of the following text? You are tough to work with  ",
    "What is sentiment of the following text? You are a tough and strong person",
    
    ## Translation
    '## Translation -----------------',
    "Translate the following text to French: Ride the bicycle carefully", 
    "Translate the following text to Tamil: I am going to the market",

    ## Math
    '## Math -----------------------',
    "What is the sum of 2 and 3?",
    "What is the sum of 2 and 3? What is the sum of 4 and 5?",
    ]
]).style.set_properties(**{'text-align': 'left' }).set_caption("Flan - QA, Sentiment Analysis, Translation")


| | 0 | 1 |
|---|---|---|
| 0 | ## QA ---------------------- | --- |
| 1 | The capital of Ghana is | Ghana |
| 2 | What is the capital of Ghana? | kuala lumpur |
| 3 | Complete this sentence. the quick brown fox jumps over the | cliffs and jumps over the cliffs. |
| 4 | Complete the following sentence. the quick brown fox jumps over the | cliffs. |
| 5 | ## Sentiment Analysis --------- | --- |
| 6 | What is sentiment of the following text? I enjoy friendly chats with my friends | positive |
| 7 | What is sentiment of the following text? You are tough | positive |
| 8 | What is sentiment of the following text? You are tough to work with | negative |
| 9 | What is sentiment of the following text? You are a tough and strong person | positive |


In [189]:
pd.DataFrame([(prompt, '---' if prompt[0] == '#' else flan_tokenizer.decode(
    flan_model.generate( flan_tokenizer( prompt, return_tensors="pt").input_ids, 
                        max_length=200,
                        )[0]
)) for prompt in [
    # Text Generation
    '## Text Generation -----------------------',
    "A short story in 10 words or less",
    "Tell me short story in less than twenty words about a crow",
    "Compose a letter to your friend about your vacation",
    "Compose a twenty word letter to your friend about your vacation",
    "Compose a letter to your friend about your vacation in the mountains",
    ]
]).style.set_properties(**{'text-align': 'left' })


Unnamed: 0,0,1
0,## Text Generation -----------------------,---
1,A short story in 10 words or less,A young man is attempting to find his way home from a party. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his way home because of a traffic accident. He is unable to find his
2,Tell me short story in less than twenty words about a crow,A crow is a crow that lives in a crow nest. It is a crow that lives in a crow nest. It is a crow that lives in a crow nest. It is a crow that lives in a crow nest. It is a crow that lives in a crow nest. It is a crow that lives in a crow nest.
3,Compose a letter to your friend about your vacation,Greetings from the East Coast. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach. I'm going to the beach
4,Compose a twenty word letter to your friend about your vacation,i'm going to the beach
5,Compose a letter to your friend about your vacation in the mountains,Greetings from the mountains. I'm glad you enjoyed your vacation. I'm looking forward to your next one.


`Models are sensitive to prompts.`

### Fun with translations

In [44]:
langs = [ "german", "french", "spanish", "italian", "portugese", "russian", 
         "tamil", "hindi", "bengali", "japanese", "chinese", "korean", "hebrew",
         "தமிழ்", "हिन्दी", "বাংলা", "日本語", "中文", "한국어", "עברית"]
acc=[]
for lang in langs:
    input_text = f"translate English to {lang}: How old are you my dear friend?"
    input_ids = flan_tokenizer(input_text, return_tensors="pt").input_ids
    outputs = flan_model.generate(input_ids, max_length=40, )# (num_beams=4, early_stopping=not True)
    acc.append( [ lang, flan_tokenizer.decode(outputs[0]), len(outputs)] )

pd.DataFrame(acc, columns=["lang", "translation", "num_outputs"]).style.set_properties(**{'text-align': 'left' })


| | lang | translation | num_outputs |
|---|---|---|---|
| 0 | german | Wie er er ich meine lieben Freund? | 1 |
| 1 | french | mesure tu tu ami, ami? | 1 |
| 2 | spanish | Cuánto a o es t a mi amiga amigo? | 1 |
| 3 | italian | Mio amiamo a tutti i miei amiamo? | 1 |
| 4 | portugese | Quanto ano teve o meu amigo? | 1 |
| 5 | russian | а асти а а адному артнера? | 1 |
| 6 | tamil | ? | 1 |
| 7 | hindi | ? | 1 |
| 8 | bengali | ? | 1 |
| 9 | japanese | | 1 |


## OpenAI 
- has models to generate text, answer questions, summarize text, translate text, and more.
- has a playground where you can try out the API.
- has a Python package that you can install with pip.
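- is paid, but you can get 12,500 tokens for free for initial testing 
  - that's roughly 9,400 words or 50,000 characters (at about 4 characters, or 3/4 of an English word, per token)
  - to be used within 30 days.
- model names include davinci, curie, babbage, ada, and others (OpenAI has since deprecated these GPT-3 models, along with the `text-davinci-003` model used below).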
- You can see the full list on the OpenAI playground. 
  - https://beta.openai.com/playground and 
  - [OpenAI Models](https://platform.openai.com/docs/models/overview)
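
For quick budgeting, OpenAI's rule of thumb for English text is that one token is about 4 characters, or about 3/4 of a word. A small sketch of that arithmetic (`estimate_text_size` is a hypothetical helper; real counts depend on the text and the tokenizer):

```python
# Rule of thumb for English text (approximate; actual counts depend on the text)
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_text_size(tokens):
    """Approximate number of English words and characters in `tokens` tokens."""
    return {"words": tokens * WORDS_PER_TOKEN, "characters": tokens * CHARS_PER_TOKEN}


print(estimate_text_size(12_500))  # {'words': 9375.0, 'characters': 50000}
```

For exact counts, the `tiktoken` package tokenizes text the way OpenAI models do, e.g. `len(tiktoken.encoding_for_model("text-davinci-003").encode(text))`.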


In [190]:
import openai

def predict_next_word(prompt, max_tokens=5):
    response = openai.Completion.create(    # <=== This is the OpenAI API call (pre-1.0 `openai` package)
        model="text-davinci-003",
        prompt=prompt,
        temperature=1,
        max_tokens=max_tokens,
        # top_p=1,
        # frequency_penalty=0,
        # presence_penalty=0,
        # stop=["\n"]
    )
    display( [c.text for c in response.choices ])
    return (response.choices[0].text)

prompts = [
    "The quick brown fox jumps over the "
    , 'Complete the following sentence. "The quick brown fox jumps over the "  '
    , 'The quick brown fox jumps over the lazy dog. The next sentence is: '
]

pd.DataFrame( 
    { prompt: [ predict_next_word(prompt) ] for prompt in prompts} 
).T.style.set_caption("Predict next word using OpenAI GPT-3 text-davinci-003 . Burning some personal pennies.")


['\nlazy dog.']

['\nlazy dog']

['\n\nHe quickly ran']

| | 0 |
|---|---|
| The quick brown fox jumps over the | lazy dog. |
| Complete the following sentence. "The quick brown fox jumps over the " | lazy dog |
| The quick brown fox jumps over the lazy dog. The next sentence is: | He quickly ran |


## LangChain - an AI orchestrator
- is free and open source.
- is a wrapper around OpenAI and other model providers that **supposedly** makes them easy to use from Python.
- is a Python package that you can install with pip.
- [LangChain Quickstart](https://python.langchain.com/docs/get_started/quickstart)

### LangChain Quickstart with OpenAI

In [191]:
# Run basic query with OpenAI wrapper
from langchain.llms import OpenAI, Replicate
oai_dv3_llm = OpenAI(model_name="text-davinci-003")
# rep_llama2_llm = Replicate(model_name="llama2")
fox_prompt = "The quick brown fox jumps over the "
fox_response =  oai_dv3_llm( fox_prompt, temperature=0.1)

display(fox_prompt, fox_response)

'The quick brown fox jumps over the '

'\n\nlazy dog'

#### Hacks to memoize the API calls to OpenAI

In [192]:
from functools import cache

@cache
def get_llm(model_name): 
    print(f"Loading {model_name}")
    return OpenAI(model_name=model_name)

@cache
def llm(model_name, prompt, **kwargs):
    print(f"model_name={model_name}, prompt={prompt}") 
    return get_llm(model_name)(prompt, **kwargs)
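
The memoization relies on `functools.cache`, which keys results on the positional and keyword arguments, so every kwarg value must be hashable, and a repeated identical call never reaches the API (even with `temperature` > 0 you get the first sample back). A self-contained sketch with a fake LLM standing in for OpenAI (`fake_llm` and `calls` are illustrative names):

```python
from functools import cache

calls = []  # every call that actually reaches the (fake) model


@cache
def fake_llm(model_name, prompt, **kwargs):
    """Stand-in for an LLM call; records the call and echoes its inputs."""
    calls.append((model_name, prompt, kwargs))
    return f"{model_name} | {prompt} | {sorted(kwargs.items())}"


fake_llm("text-davinci-003", "hi")
fake_llm("text-davinci-003", "hi")                   # identical call: served from the cache
fake_llm("text-davinci-003", "hi", temperature=0.4)  # different kwargs: new call
print(len(calls), fake_llm.cache_info())  # 2 CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
```

Note that the Python docs warn that calls differing only in keyword-argument order may be cached as separate entries.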



### LangChain QA with OpenAI

In [194]:
mol_prompt = "What is the meaning of life?"
dv3_model_name = "text-davinci-003"

# for prompt in [mol_prompt, fox_prompt] :
for prompt in ['Do not answer lazy dog. Complete this sentence: This quick brown fox jumps over the ...'] :
    display(pd.DataFrame( [
        [ 
            str(kwargs), 
            llm(dv3_model_name, prompt, **kwargs)   # <=== 
        ]
        for kwargs in [
            {},
            {'max_tokens': 10},
            {'max_tokens': 40, 'temperature': 0.4},
            {'max_tokens': 400, 'temperature': 0.8}
        ]
    ], columns=['llm_params', f'{dv3_model_name} : {prompt} : llm_output']
    ).style.set_properties(**{'text-align': 'left' }))


model_name=text-davinci-003, prompt=Do not answer lazy dog. Complete this sentence: This quick brown fox jumps over the ...
model_name=text-davinci-003, prompt=Do not answer lazy dog. Complete this sentence: This quick brown fox jumps over the ...
model_name=text-davinci-003, prompt=Do not answer lazy dog. Complete this sentence: This quick brown fox jumps over the ...
model_name=text-davinci-003, prompt=Do not answer lazy dog. Complete this sentence: This quick brown fox jumps over the ...


| | llm_params | text-davinci-003 : Do not answer lazy dog. Complete this sentence: This quick brown fox jumps over the ... : llm_output |
|---|---|---|
| 0 | {} | lazy dog |
| 1 | {'max_tokens': 10} | lazy dog. |
| 2 | {'max_tokens': 40, 'temperature': 0.4} | lazy dog. |
| 3 | {'max_tokens': 400, 'temperature': 0.8} | lazy dog |


### LangChain Quickstart with HuggingFace + OpenAI

In [195]:
from langchain.llms import HuggingFaceHub
# repo_id = "google/flan-t5-xl"  # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options
flan_t5_base_repo_id = "google/flan-t5-base"  # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options

hf_llm = HuggingFaceHub(                                 # <===
    repo_id=flan_t5_base_repo_id, 
    model_kwargs={"temperature": .8, "max_length": 64}, client=None
)

langs = [ 'German', 'Italian', 'Marathi', 'Tamil', 'Sanskrit', 'Hindi', 'Kannada']
acc=[]
the_text = "How old are you my dear friend?"             # <===
for lang in langs[:]:
    prompt = f"translate to {lang}: {the_text}"
    acc.append (   [
            lang,
            hf_llm(prompt) ,                             # <===
            llm("text-davinci-003", prompt) ,            # <===
        ]
    )
    
pd.DataFrame( acc, columns=['lang', f'{flan_t5_base_repo_id}', f'oai/{dv3_model_name}']
            ).style.set_properties(**{'text-align': 'left' }
            ).set_caption(f"Translation of '{the_text}'")

model_name=text-davinci-003, prompt=translate to German: How old are you my dear friend?
model_name=text-davinci-003, prompt=translate to Italian: How old are you my dear friend?
model_name=text-davinci-003, prompt=translate to Marathi: How old are you my dear friend?
model_name=text-davinci-003, prompt=translate to Tamil: How old are you my dear friend?
model_name=text-davinci-003, prompt=translate to Sanskrit: How old are you my dear friend?
model_name=text-davinci-003, prompt=translate to Hindi: How old are you my dear friend?
model_name=text-davinci-003, prompt=translate to Kannada: How old are you my dear friend?


| | lang | google/flan-t5-base | oai/text-davinci-003 |
|---|---|---|---|
| 0 | German | Wie er ich meine lieben Freund lebt? | Wie alt bist du mein lieber Freund? |
| 1 | Italian | Mio amiamo a tutti i miei amigli? | Quanti anni hai caro amico mio? |
| 2 | Marathi | ? | तू मीरा मित्रा, तुमच्यावर किती वय आहे? |
| 3 | Tamil | ? | உங்களுக்கு என் நண்பர் என்றால் என்னை எத்தனை வயது ஆகும்? |
| 4 | Sanskrit | ? | कित्येक्षु वर्षाः त्वम् मम प्रिय मित्रः ? |
| 5 | Hindi | ? | तुम मेरे प्रिय दोस्त कितने वर्ष के हो? |
| 6 | Kannada | ? | ನಿಮ್ಮ ಪ್ರಿಯ ಸ್ನೇಹಿತರು, ನಿಮ್ಮ ವಯಸ್ಸು ಎಂದುಕೊಂಡು? |


## LangChain - Building blocks

In [122]:
import langchain
import re
def starts_with_capital(s): return re.match(r'^[A-Z]', s) is not None  # class names are CapWords by convention
pd.concat(
[pd.DataFrame (
    np.array ( [x for x in dir(lcx) if not x.startswith('_') and starts_with_capital(x) ] ),
    columns=[str(lcx.__name__).split('.')[-1]]
) for lcx in [ 
    langchain.llms,
    langchain.chat_models, 
    langchain.embeddings, 
    langchain.prompts,
    langchain.vectorstores,
    langchain.document_loaders,
    langchain.text_splitter,
    langchain.chains,
    langchain.utils,
    ]] , axis=1
).fillna('-').style


| | llms | chat_models | embeddings | prompts | vectorstores | document_loaders | text_splitter | chains | utils |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AI21 | AzureChatOpenAI | AlephAlphaAsymmetricSemanticEmbedding | AIMessagePromptTemplate | AlibabaCloudOpenSearch | AZLyricsLoader | ABC | APIChain | Any |
| 1 | AlephAlpha | ChatAnthropic | AlephAlphaSymmetricSemanticEmbedding | BaseChatPromptTemplate | AlibabaCloudOpenSearchSettings | AcreomLoader | AbstractSet | AnalyzeDocumentChain | Callable |
| 2 | AmazonAPIGateway | ChatGooglePalm | Any | BasePromptTemplate | AnalyticDB | AirbyteJSONLoader | Any | ChatVectorDBChain | Dict |
| 3 | Anthropic | ChatOpenAI | BedrockEmbeddings | ChatMessagePromptTemplate | Annoy | AirtableLoader | BaseDocumentTransformer | ConstitutionalChain | HTTPError |
| 4 | Anyscale | ChatVertexAI | CohereEmbeddings | ChatPromptTemplate | AtlasDB | ApifyDatasetLoader | Callable | ConversationChain | List |
| 5 | Aviary | FakeListChatModel | DashScopeEmbeddings | FewShotPromptTemplate | AwaDB | ArxivLoader | CharacterTextSplitter | ConversationalRetrievalChain | Optional |
| 6 | AzureMLOnlineEndpoint | PromptLayerChatOpenAI | DeepInfraEmbeddings | FewShotPromptWithTemplates | AzureSearch | AzureBlobStorageContainerLoader | Collection | FlareChain | Response |
| 7 | AzureOpenAI | - | ElasticsearchEmbeddings | HumanMessagePromptTemplate | Cassandra | AzureBlobStorageFileLoader | Dict | GraphCypherQAChain | Tuple |
| 8 | Banana | - | EmbaasEmbeddings | LengthBasedExampleSelector | Chroma | BSHTMLLoader | Document | GraphQAChain | - |
| 9 | BaseLLM | - | FakeEmbeddings | MaxMarginalRelevanceExampleSelector | Clarifai | BibtexLoader | Enum | HypotheticalDocumentEmbedder | - |
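
The capital-letter filter above also picks up names that the LangChain modules merely import, such as `Any`, `Dict`, `Callable` or `HTTPError` in the `utils` column. A sketch of a stricter filter using the standard-library `inspect` module, keeping only classes whose `__module__` belongs to the same top-level package (the helper `public_classes` and the `fakepkg` module are hypothetical):

```python
import collections
import inspect
import types
import typing


def public_classes(module):
    """Sorted names of public classes defined by `module`'s own top-level package."""
    package = module.__name__.split(".")[0]
    names = []
    for name in dir(module):
        if name.startswith("_"):
            continue
        obj = getattr(module, name)
        origin = getattr(obj, "__module__", None) or ""
        if inspect.isclass(obj) and origin.split(".")[0] == package:
            names.append(name)
    return sorted(names)


# Usage on a throwaway module mixing a local class with re-exported names
fake = types.ModuleType("fakepkg.sub")         # hypothetical module name
class Local: ...
Local.__module__ = "fakepkg.sub"                # pretend it was defined in fakepkg.sub
fake.Local = Local
fake.OrderedDict = collections.OrderedDict      # class from another package
fake.Dict = typing.Dict                         # typing alias, not a class
print(public_classes(fake))                     # ['Local']
print("deque" in public_classes(collections))   # True
```

Applied to, say, `langchain.utils`, it should drop the re-exported typing helpers; this has not been checked against the LangChain version used in this notebook.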


## LangChain Prompts
- Are conveniences for composing prompts (I have yet to fully grasp the concept)

In [53]:
# import schema for chat messages and ChatOpenAI in order to query chatmodels GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,  
    HumanMessage, 
    SystemMessage
)

In [160]:
messages = [
    SystemMessage(content="You are an expert in Kalidasa's works."),
    HumanMessage(content="Name a work by Kalidasa and quote a verse in devanagari script."),
]
display(pd.DataFrame([[type(m).__name__, m.content] for m in messages]).style)


| | 0 | 1 |
|---|---|---|
| 0 | SystemMessage | You are an expert in Kalidasa's works. |
| 1 | HumanMessage | Name a work by Kalidasa and quote a verse in devanagari script. |


In [164]:
from langchain.chat_models import ( 
    ChatOpenAI, ChatAnthropic, AzureChatOpenAI, 
    ChatGooglePalm ,ChatVertexAI, FakeListChatModel, 
    PromptLayerChatOpenAI 
)

chatters = {}
okemoji, ngemoji = '✅', '❌'

for  m in ( 
    ChatOpenAI, ChatAnthropic, AzureChatOpenAI, 
    ChatGooglePalm ,ChatVertexAI, FakeListChatModel, 
    # PromptLayerChatOpenAI 
) :
    try :
        chat = m(client=None, temperature=0.3) 
        print(f"{okemoji} Successfully loaded {m}")
        chatters[m.__name__] = chat
    except Exception as e:
        # print(f"\n{ngemoji} Failed to load {m} with {e}\n")
        print(f"\n{ngemoji} Failed to load {m}\n")
        pass

messages = [
    SystemMessage(content="You are an expert in Kalidasa's works."),
    HumanMessage(content="Name a work by Kalidasa and quote a verse in devanagari script."),
]

for name, chat in chatters.items():
    print(f"\n{name} : {chat.__class__}")
    try:
        display(pd.DataFrame([[type(m).__name__, m.content] for m in messages]).style)
        nn_response = chat(messages)  # <=== this is the call to the chatbot
        print(nn_response.content, end='\n')
        print("----------\n")
        # Follow-up turn: resend the history plus the model's reply and a new question
        # (about the rivers Ganga, Godavari and Kaveri)
        m2 = messages + [nn_response] + [
            HumanMessage(content="How about a different verse on river गङ्गा and गोदावरी and कावेरी?")
        ]
        display(pd.DataFrame([[type(m).__name__, m.content] for m in m2]).style)
        nn_response2 = chat(m2)     # <=== this is the call to the chatbot
        print(nn_response2.content, end='\n')
        print("===========\n")

    except Exception as e:
        print(f"\n{ngemoji} Failed to chat with {name}: {e}\n")

✅ Successfully loaded <class 'langchain.chat_models.openai.ChatOpenAI'>

❌ Failed to load <class 'langchain.chat_models.anthropic.ChatAnthropic'>


❌ Failed to load <class 'langchain.chat_models.azure_openai.AzureChatOpenAI'>


❌ Failed to load <class 'langchain.chat_models.google_palm.ChatGooglePalm'>


❌ Failed to load <class 'langchain.chat_models.vertexai.ChatVertexAI'>


❌ Failed to load <class 'langchain.chat_models.fake.FakeListChatModel'>


ChatOpenAI : <class 'langchain.chat_models.openai.ChatOpenAI'>


| | 0 | 1 |
|---|---|---|
| 0 | SystemMessage | You are an expert in Kalidasa's works. |
| 1 | HumanMessage | Name a work by Kalidasa and quote a verse in devanagari script. |


One of Kalidasa's most famous works is "Meghaduta" (The Cloud Messenger). Here is a verse from Meghaduta in Devanagari script:

अशोकवनिका नगरजनपदे नगरजनपदे
विद्युद्दलन्ति विद्युद्दलितदिवाकरस्य दिवाकरस्य।
स्वर्णवर्णवद्युत्पुलकविभवः स्वर्णवर्णवद्युत्पुलकविभवः
विद्योतमानः स्वगतविधिर्विदितः स्वगतविधिर्विदितः॥

Translation:
In the city, where the grove of Ashoka trees is,
The sun, though hidden by clouds, sends forth its rays.
Its golden hue, like the glow of a newly risen sun,
Reveals its presence, known to those who greet it.

Note: Devanagari script may not be supported on all devices.
----------



Unnamed: 0,0,1
0,SystemMessage,You are an expert in Kalidasa's works.
1,HumanMessage,Name a work by Kalidasa and quote a verse in devanagari script.
2,AIMessage,"One of Kalidasa's most famous works is ""Meghaduta"" (The Cloud Messenger). Here is a verse from Meghaduta in Devanagari script: अशोकवनिका नगरजनपदे नगरजनपदे विद्युद्दलन्ति विद्युद्दलितदिवाकरस्य दिवाकरस्य। स्वर्णवर्णवद्युत्पुलकविभवः स्वर्णवर्णवद्युत्पुलकविभवः विद्योतमानः स्वगतविधिर्विदितः स्वगतविधिर्विदितः॥ Translation: In the city, where the grove of Ashoka trees is, The sun, though hidden by clouds, sends forth its rays. Its golden hue, like the glow of a newly risen sun, Reveals its presence, known to those who greet it. Note: Devanagari script may not be supported on all devices."
3,HumanMessage,How about a different verse on river गङ्गा and गोदावरी and कावेरी?


Certainly! Here is a verse from Kalidasa's "Raghuvamsha" that mentions the rivers Ganga, Godavari, and Kaveri in Devanagari script:

गङ्गायां गोदावर्यां च कावेर्यां च महानदीम्।
तीर्थानि विश्रामयन्ति ये तुल्यानि मनोहराणि॥

Translation:
The holy rivers Ganga, Godavari, and Kaveri,
Where pilgrims find solace, are equally enchanting.

Note: Devanagari script may not be supported on all devices.
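
Chat models keep no state between calls: the follow-up above works because `m2` resends the whole history plus the model's previous reply. When LangChain's `ChatOpenAI` talks to OpenAI, `SystemMessage`, `HumanMessage` and `AIMessage` become the roles `system`, `user` and `assistant`. A dependency-free sketch of that payload, using a hypothetical `Message` dataclass in place of the LangChain classes:

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's SystemMessage/HumanMessage/AIMessage."""
    role: str      # "system", "user" or "assistant" in OpenAI's chat format
    content: str


def to_openai_payload(messages):
    # Chat APIs take the full conversation as a list of {"role", "content"} dicts
    return [{"role": m.role, "content": m.content} for m in messages]


history = [
    Message("system", "You are an expert in Kalidasa's works."),
    Message("user", "Name a work by Kalidasa and quote a verse in devanagari script."),
]
reply = Message("assistant", "(first model reply)")
# The follow-up turn resends everything: history + model reply + new question
follow_up = history + [reply, Message("user", "How about a different verse on river गङ्गा and गोदावरी and कावेरी?")]
print([m["role"] for m in to_openai_payload(follow_up)])  # ['system', 'user', 'assistant', 'user']
```

Because the whole history is resent, each turn costs more tokens than the last.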



In [None]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)
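
With its default f-string format, `PromptTemplate` fills placeholders much like Python's own `str.format`. A pure-Python sketch (not LangChain's code) that lists a template's variables with `string.Formatter().parse` and fills them in; `demo_template` holds essentially the same text as `template` above, under a new name so the notebook's `template` is untouched:

```python
from string import Formatter

demo_template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""


def template_variables(text):
    """Sorted, de-duplicated names of the {placeholders} in a str.format-style template."""
    return sorted({field for _, field, _, _ in Formatter().parse(text) if field})


print(template_variables(demo_template))          # ['concept']
print(demo_template.format(concept="autoencoder"))
```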

In [None]:
# Run LLM with PromptTemplate

pd.DataFrame([ (
        _llm, 
        _llm(prompt.format(concept="autoencoder"))
    ) for _llm in [oai_dv3_llm, hf_llm] ]
).style.set_properties(**{'text-align': 'left' })

In [None]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=hf_llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))

In [None]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=oai_dv3_llm, prompt=second_prompt)

In [None]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)
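
`SimpleSequentialChain` passes the single string output of each chain as the single input of the next. The same idea in plain Python, with hypothetical functions standing in for the two LLM chains (a sketch of the concept, not LangChain's implementation):

```python
def run_sequential(steps, first_input, verbose=False):
    """Feed each step's single string output into the next step; return the last output."""
    text = first_input
    for i, step in enumerate(steps):
        text = step(text)
        if verbose:
            print(f"step {i}: {text!r}")
    return text


# Hypothetical stand-ins for `chain` and `chain_two`
def explain(concept):
    return f"An {concept} compresses data and reconstructs it."

def eli5(description):
    return f"Like I'm five: {description.lower()}"


print(run_sequential([explain, eli5], "autoencoder", verbose=True))
```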

In [None]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap  = 0,
)

texts = text_splitter.create_documents([explanation])
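
`chunk_size` and `chunk_overlap` are measured in characters here. A simplified sketch of what they mean: fixed-size windows that advance by `chunk_size - chunk_overlap` characters. `RecursiveCharacterTextSplitter` is smarter than this, since it first tries to break on separators (paragraph breaks, newlines, spaces) so chunks rarely end mid-word; `chunk_text` is a hypothetical helper:

```python
def chunk_text(text, chunk_size=100, chunk_overlap=0):
    """Fixed-size character windows that advance by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous chunk's overlap
    end = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, end, step)]


print(chunk_text("abcdefghij", chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```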


In [None]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

In [None]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [None]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(len(query_result), query_result[:5])  # vector length (1536 for text-embedding-ada-002) and its first few values

In [None]:
# Import and initialize Pinecone client

import os
import pinecone
from langchain.vectorstores import Pinecone


pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),  
    environment=os.getenv('PINECONE_ENV')  
)

In [None]:
# Upload vectors to Pinecone

index_name = "langchain-quickstart"
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [None]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)
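
Under the hood, `similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it. A brute-force NumPy sketch using cosine similarity, a common choice of metric (a Pinecone index is configured with its own metric and searches an approximate nearest-neighbour index rather than scanning every vector). `top_k_cosine` and the 2-D toy vectors are illustrative only; real ada embeddings have many more dimensions:

```python
import numpy as np


def top_k_cosine(query_vec, doc_vecs, k=4):
    """Indices and cosine similarities of the k rows of doc_vecs most similar to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]       # highest similarity first
    return order, sims[order]


# Toy 2-D "embeddings" for three chunks and one query
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, scores = top_k_cosine([1.0, 0.1], docs, k=2)
print(idx, scores.round(3))  # [0 2] [0.995 0.774]
```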

In [None]:
# Import Python REPL tool and instantiate Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.python import PythonREPL
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [None]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x -1")
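
It helps to know the answer the agent should find: 3*x**2 + 2*x - 1 = (3*x - 1)*(x + 1), so the roots are x = 1/3 and x = -1. A quadratic-formula check in plain Python (`quadratic_roots` is a hypothetical helper, assuming a != 0 and returning real roots only):

```python
import math


def quadratic_roots(a, b, c):
    """Sorted real roots of a*x**2 + b*x + c = 0 (assumes a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                      # no real roots
    r = math.sqrt(disc)
    return sorted({(-b - r) / (2 * a), (-b + r) / (2 * a)})  # set removes a double root


print(quadratic_roots(3, 2, -1))  # [-1.0, 0.3333333333333333]
```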

In [None]:
import torch
import torch.nn as nn

# Define the vocabulary size and embedding dimension
vocab_size = 5000
embedding_dim = 10

# Define the embedding layer
embedding = nn.Embedding(vocab_size, embedding_dim)

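# Define the input text in Devanagari script: "यह" ("this").
# The longer example "यह एक उदाहरण है" ("This is an example") also works.
input_text = "यह"

# Convert the input text to a tensor of indices: naive character-level "tokenization"
# that uses each character's Unicode code point as its index. Devanagari code points
# (U+0900-U+097F) are all below vocab_size=5000, so every index is in range.
input_indices = torch.tensor([ord(c) for c in input_text])
print(input_indices.shape, input_indices.min(), input_indices.max())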

# Embed the input text using the embedding layer
embedded_text = embedding(input_indices)

print(embedded_text)
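
`nn.Embedding` is a lookup table: its `weight` has shape `(num_embeddings, embedding_dim)`, and embedding an index returns that row. The same lookup in NumPy, with a random matrix standing in for the learned weights (an assumption for illustration; the values differ from the PyTorch cell):

```python
import numpy as np

vocab_size, embedding_dim = 5000, 10
rng = np.random.default_rng(0)
weight = rng.normal(size=(vocab_size, embedding_dim))  # stands in for embedding.weight

input_text = "यह"
input_indices = np.array([ord(c) for c in input_text])  # Unicode code points as indices
embedded_text = weight[input_indices]                   # one row of `weight` per index
print(input_indices, embedded_text.shape)  # [2351 2361] (2, 10)
```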


In [14]:
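from transformers import AutoTokenizer
import transformers
import torch

# meta-llama/Llama-2-7b-chat-hf is a gated repository: access must be requested on its
# Hugging Face model page, and you must be authenticated (e.g. `huggingface-cli login`).
# In float32 the 7B parameters alone need roughly 28 GB of memory (7e9 x 4 bytes).
model = "meta-llama/Llama-2-7b-chat-hf"
# model = "google/flan-t5-base"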

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    # torch_dtype=torch.float16,
    torch_dtype=torch.float32,
    device_map="auto",
)

sequences = pipeline(
    'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

ValueError: Could not load model meta-llama/Llama-2-7b-chat-hf with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>).