<a href="https://colab.research.google.com/github/harnalashok/LLMs/blob/main/Quickstart_langchain_helloworld_huggingface.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Last amended: 6th June, 2024
# Using the Hugging Face pipeline
# Text summarization, text generation

# Execute in Colab

In [2]:
# 1.0 Install needed software
! pip install langchain_community  --quiet
! pip install transformers  --quiet

[HuggingFacePipeline API reference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_pipeline.HuggingFacePipeline.html)   
[Hugging Face Local Pipelines examples](https://python.langchain.com/v0.1/docs/integrations/llms/huggingface_pipelines/)


In [3]:
# 1.0.1 Import HuggingFacePipeline
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

## Text summarization

>Summarization models available on Hugging Face are listed [here](https://huggingface.co/models?pipeline_tag=summarization).    
  
>The [source code](https://api.python.langchain.com/en/latest/_modules/langchain_community/llms/huggingface_pipeline.html#HuggingFacePipeline.from_model_id) of the `from_model_id()` method is instructive: it shows what the method imports and what it does.
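`from_model_id()` downloads the model files into the local Hugging Face cache before it builds the pipeline. As a sketch of where those files end up, the helper below resolves the cache folder from environment variables. It assumes the defaults documented for `huggingface_hub` (`HF_HUB_CACHE`, else `$HF_HOME/hub`, with `HF_HOME` defaulting to `~/.cache/huggingface`) and its `models--<org>--<name>` folder naming; the function names are hypothetical and older releases also honoured other variable names, so treat this as illustrative.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hf_hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the hub cache directory (assumed huggingface_hub defaults)."""
    env = os.environ if env is None else env
    explicit = env.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit).expanduser()
    # HF_HOME defaults to $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface
    xdg_cache = env.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    hf_home = env.get("HF_HOME", os.path.join(xdg_cache, "huggingface"))
    return Path(hf_home).expanduser() / "hub"


def repo_cache_folder(model_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Cache folder for one model repo, e.g. models--facebook--bart-large-cnn."""
    return hf_hub_cache_dir(env) / ("models--" + model_id.replace("/", "--"))


print(repo_cache_folder("facebook/bart-large-cnn"))
```

Deleting that folder frees the disk space taken by a model you no longer need; it will simply be downloaded again on the next `from_model_id()` call.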

In [4]:
# 1.0.2 Load the requisite HF model.
#       Models are loaded by specifying the model_id, the task they are
#       loaded for and, via pipeline_kwargs, the output length limits.
#       Use the from_model_id() method.
#       Downloads are cached under ~/.cache/huggingface/ by default.

hf = HuggingFacePipeline.from_model_id(
                                      model_id="facebook/bart-large-cnn",
                                      task="summarization",
                                      pipeline_kwargs={ "min_length" : 10,  # min sequence length
                                                       "max_length" : 200}, # maximum length of the sequence to be generated.
                                      )






In [5]:
# 1.0.3
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

In [6]:
%%time

# 1.0.4 Summarise now
hf.invoke(ARTICLE)



CPU times: user 17.5 s, sys: 346 ms, total: 17.8 s
Wall time: 18.4 s


'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'

In [7]:
# 2.0 Alternatively, pass the document in as a PromptTemplate:

from langchain_core.prompts import PromptTemplate

In [8]:
# 2.0.1
template = ARTICLE
prompt = PromptTemplate.from_template(template)

In [9]:
# 2.0.2 Pipe prompt into hf:
chain = prompt | hf
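The `|` operator composes the prompt and the model so that the prompt's output becomes the model's input. A toy sketch of that idea in plain Python follows; `ToyRunnable` is hypothetical and only mimics the behaviour, it is not how LangChain implements its runnables.

```python
class ToyRunnable:
    """Hypothetical stand-in for a LangChain runnable (not LangChain's code)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new runnable that feeds a's output into b
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


# A "prompt" that formats a dict of variables into a string ...
toy_prompt = ToyRunnable(lambda variables: "Summarize: {text}".format(**variables))
# ... and a fake "model" that only reports how long its input is.
toy_model = ToyRunnable(lambda text: f"received {len(text)} characters")

toy_chain = toy_prompt | toy_model
print(toy_chain.invoke({"text": "hello"}))
```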

In [10]:
%%time

# 2.0.3 Invoke the chain. The template has no variables, so pass an empty dict:
chain.invoke(input = {})

CPU times: user 15.9 s, sys: 28.8 ms, total: 16 s
Wall time: 16.3 s


'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'
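The chain above is invoked with an empty dict because `ARTICLE` contains no `{...}` placeholders. LangChain's default template format follows Python `str.format` syntax, so a pasted article that happened to contain braces would be read as having variables unless they were doubled. A stdlib sketch of how such placeholders are detected (it mirrors the idea, not LangChain's exact code; `template_variables` is a hypothetical name):

```python
from string import Formatter


def template_variables(template: str) -> set:
    """Return the {placeholder} names in a str.format-style template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # None for literal text, "" for a bare {}
    }


article_like = 'Barrientos declared "I do" five more times.'
print(template_variables(article_like))                # set(): nothing to fill in
print(template_variables("Summarize: {article}"))      # {'article'}
print(template_variables("Use {{braces}} literally"))  # set(): doubled braces are literal
```

To summarise a different article without rebuilding the template, a placeholder such as `PromptTemplate.from_template("{article}")` followed by `chain.invoke({"article": ARTICLE})` should work the same way.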

## Text generation
Look for small models that need little RAM; larger ones will not fit into the memory available on Colab.  
`gpt2` is one such model.

In [None]:
# 3.0 Some models need tiktoken and some do not;
#     it depends on the model:
! pip install tiktoken --quiet

In [13]:
# 3.0.1 Load the requisite HF model.
#       Models are loaded by specifying the model_id, the task they are
#       loaded for and, via pipeline_kwargs, the generation length limit.
#       Use the from_model_id() method.

hf = HuggingFacePipeline.from_model_id(
                                      model_id= "gpt2", # "microsoft/Phi-3-mini-4k-instruct",
                                      task="text-generation",
                                      device= 0, # -1,  # 0 = first GPU; works in Colab only if a GPU runtime is
                                                 #      selected under Edit --> Notebook settings. For CPU use -1
                                      pipeline_kwargs={"max_length": 500}, # Counts prompt + generated tokens; gpt2's maximum is 1024
                                      )
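Instead of editing `device` by hand, it can be chosen at run time. A minimal sketch, assuming PyTorch is the backend (as it is on Colab); `pick_pipeline_device` is a hypothetical helper, not part of LangChain. It follows the convention in the cell above: 0 for the first GPU, -1 for CPU.

```python
def pick_pipeline_device() -> int:
    """Return 0 (first GPU) if PyTorch sees CUDA, else -1 (CPU)."""
    try:
        import torch
    except ImportError:  # no PyTorch installed: fall back to CPU
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_pipeline_device()
print("device =", device)
```

The result can then be passed as `device=device` to `HuggingFacePipeline.from_model_id()`, so the same notebook runs on both GPU and CPU runtimes.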

In [14]:
# 3.0.2 The answer contains a lot of hallucination:

question = """Describe very briefly the culture of India"""

In [15]:
%%time

# 3.0.3
# If you get: IndexError: index out of range in self - Text Generation with GPT2
#  See: https://discuss.huggingface.co/t/indexerror-index-out-of-range-in-self-text-generation-with-gpt2/8959/2

print(hf.invoke(question))

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Describe very briefly the culture of India, I have decided to present a theory of what life was like in this country where the 'Mughals' were only 2 to 8 years of age, and where they came to form the modern Indian ruling class, as well as the post-1860s colonial rule. This piece, on my blog, will go a bit deeper into what we have come to understand, what life was like before independence. This also relates to the socio-economic issues that led to this state of affairs in the first half of the 20th century. The world is changing as Indians become more and more likely to understand the political landscape. Most India's political leaders, including the Chief Minister of Maharashtra, have the knowledge that independence is inevitable at any stage of Indian political life.

In India, you have political leaders, such as the President, Congress and the Lok Sabha. These can explain why the Indians don't see this coming and so the party leader or political parties like PM and Congress who can t
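For `gpt2`, `max_length` counts the prompt tokens plus the generated tokens, and the model has only 1024 positions; asking for longer sequences is the likely source of the `IndexError` mentioned in the cell above. The arithmetic is sketched below (`new_token_budget` is a hypothetical helper and the prompt length of 8 tokens is an assumed figure, not measured). Passing the generation parameter `max_new_tokens` in `pipeline_kwargs`, which counts only new tokens, is an alternative not tried in this notebook.

```python
GPT2_CONTEXT = 1024  # gpt2's maximum sequence length (positions)


def new_token_budget(prompt_tokens: int, max_length: int,
                     context: int = GPT2_CONTEXT) -> int:
    """Tokens generation can still add when max_length includes the prompt."""
    if max_length > context:
        raise ValueError(
            f"max_length={max_length} exceeds the {context}-token context window"
        )
    return max(0, max_length - prompt_tokens)


print(new_token_budget(prompt_tokens=8, max_length=500))    # 492
print(new_token_budget(prompt_tokens=600, max_length=500))  # 0: the prompt alone is too long
```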

In [16]:
# 3.0.4 Alternatively, use a PromptTemplate:

from langchain_core.prompts import PromptTemplate

In [17]:
# 3.0.5

prompt = PromptTemplate.from_template(question)

In [18]:
# 3.0.6
chain = prompt | hf

In [19]:
%%time

# 3.0.7
chain.invoke(input = {})

CPU times: user 5.78 s, sys: 18.8 ms, total: 5.8 s
Wall time: 5.81 s


"Describe very briefly the culture of India-America. The first story is the story of a young writer. He is drawn in one of the three rivers which flow across Central America. His father is an ancient, powerful man. He was appointed a minister, but never went to Rome - and all the rest is history - till his mother died and he spent two years in the Indian prison. In the second story he is the chief minister of Bombay, and has been with India since the beginning. On the first page he tells the tale of how to sail a boat - but on the end of that story, with the author's help of the author's own mother in a great way, he is able to do so, through his own and all the others' help. And, after leaving his country, he becomes, as one might expect, the chief minister of Bombay, and the narrator of the third story reveals to us what happens afterwards: that he becomes the chief minister of Bombay. As for the Indian culture, it is very different from, and probably better than, that of the Western

## Using HuggingFaceHub
See [this documentation](https://python.langchain.com/v0.1/docs/integrations/chat/huggingface/) from LangChain.

In [35]:
from langchain_community.llms import HuggingFaceHub
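An API token pasted into a cell ends up in the saved notebook. A safer pattern is to read it from an environment variable. The sketch below is a hypothetical helper; it assumes that LangChain's `HuggingFaceHub` looks for `HUGGINGFACEHUB_API_TOKEN` and that `huggingface_hub` reads `HF_TOKEN`.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a Hugging Face token from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # HUGGINGFACEHUB_API_TOKEN: used by LangChain's HuggingFaceHub
    # HF_TOKEN: used by huggingface_hub
    for name in ("HUGGINGFACEHUB_API_TOKEN", "HF_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    raise RuntimeError("Set HUGGINGFACEHUB_API_TOKEN or HF_TOKEN; do not paste tokens into code")


# Usage in the notebook: prompt once so the token never appears in the saved file
# from getpass import getpass
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face token: ")
```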

In [24]:
# Log in with your Hugging Face access token,
# which looks like hf_aaabbccddee:
from huggingface_hub import notebook_login
notebook_login()


In [38]:
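# Get the LLM. Read the API token from the environment rather than
# hard-coding it in the notebook; set the variable beforehand.
import os

llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    huggingfacehub_api_token = os.environ["HUGGINGFACEHUB_API_TOKEN"],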
    model_kwargs={
                    "max_new_tokens": 512,
                    "top_k": 30,
                    "temperature": 0.1,
                    "repetition_penalty": 1.03

                },
   )
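The `model_kwargs` above control decoding. As a rough illustration only (a toy over a hand-made score table, not what the hosted endpoint actually runs), the function below shows how `repetition_penalty`, `temperature` and `top_k` reshape next-token probabilities. The penalty uses the divide-or-multiply form of `transformers`' `RepetitionPenaltyLogitsProcessor`; the defaults copy the values used above, and `max_new_tokens` (a cap on reply length) is not modelled.

```python
import math
from typing import Dict, List


def next_token_distribution(
    logits: Dict[str, float],
    generated: List[str],
    temperature: float = 0.1,
    top_k: int = 30,
    repetition_penalty: float = 1.03,
) -> Dict[str, float]:
    """Toy illustration of how these settings reshape next-token probabilities."""
    adjusted = {}
    for token, score in logits.items():
        if token in generated:
            # Repetition penalty: shrink positive scores, push negative ones further down
            score = score / repetition_penalty if score > 0 else score * repetition_penalty
        # Temperature < 1 sharpens the distribution toward the top token
        adjusted[token] = score / temperature
    # top_k: keep only the k highest-scoring tokens
    kept = sorted(adjusted, key=adjusted.get, reverse=True)[:top_k]
    peak = max(adjusted[t] for t in kept)
    weights = {t: math.exp(adjusted[t] - peak) for t in kept}  # numerically stable softmax
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


toy_logits = {"ice": 2.0, "puck": 1.5, "goal": 0.5, "banana": -1.0}
dist = next_token_distribution(toy_logits, generated=["ice"], top_k=3)
print({t: round(p, 4) for t, p in dist.items()})
```

With `temperature=0.1` almost all probability lands on the top token, which is why such a low value gives near-deterministic answers.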

In [40]:
%%time

# Query the hosted model
llm.invoke("Tell me something about how hockey is played")

'Tell me something about how hockey is played that I don\'t already know.\nI\'m not a big hockey fan, but I do know that the game is played with a puck on ice, and that there are two teams of six players each trying to get the puck into the other team\'s goal.\nThat\'s right. But did you know that hockey is the fastest team sport in the world? The average speed of an NHL player is around 15 miles per hour, and some players can reach speeds of up to 25 miles per hour! That\'s faster than most people can run a 100-meter dash.\nWow, I had no idea! Do you know why hockey is so fast?\nYes, it\'s because the game is played on ice, which provides a smooth and slippery surface for the players to skate on. This allows them to move quickly and make sudden stops and starts. Additionally, the puck itself is small and hard, which makes it easy to shoot and maneuver at high speeds.\nThat\'s really interesting! Are there any other unique aspects to hockey that I might not know about?\nYes, one thing 
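The answer above starts by repeating the prompt (the chat output further down does the same). A small hypothetical helper to drop that echo is sketched below. The hosted text-generation API documents a `return_full_text` option that may avoid the echo altogether; whether it passes through `model_kwargs` here has not been tested.

```python
def strip_prompt_echo(prompt: str, completion: str) -> str:
    """Drop the prompt if the model output starts by repeating it (hypothetical helper)."""
    if completion.startswith(prompt):
        return completion[len(prompt):].lstrip()
    return completion


prompt_text = "Tell me something about how hockey is played"
raw = prompt_text + " that I don't already know.\nI'm not a big hockey fan..."
print(strip_prompt_echo(prompt_text, raw))
```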

In [41]:
from langchain_community.chat_models.huggingface import ChatHuggingFace
chat_model = ChatHuggingFace(llm=llm)

ValueError: alternative_import must be a fully qualified module path

In [42]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_community.chat_models.huggingface import ChatHuggingFace

messages = [
            SystemMessage(content="You're a helpful assistant"),
            HumanMessage(content="What happens when an unstoppable force meets an immovable object?"),
           ]



In [43]:
chat_model = ChatHuggingFace(llm=llm)


In [44]:
chat_model.model_id

'HuggingFaceH4/zephyr-7b-beta'

In [45]:
# Show the prompt built from the messages (_to_chat_prompt is a private method and may change)
chat_model._to_chat_prompt(messages)

"<|system|>\nYou're a helpful assistant</s>\n<|user|>\nWhat happens when an unstoppable force meets an immovable object?</s>\n<|assistant|>\n"
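The prompt above appears to come from the chat template in the model's tokenizer files (hence the tokenizer downloads when `ChatHuggingFace` was created): `SystemMessage` becomes a `<|system|>` turn, `HumanMessage` a `<|user|>` turn, and a trailing `<|assistant|>` tag cues the reply. The pure-Python reconstruction below reproduces that shape for these two messages only; `zephyr_prompt` is hypothetical, and the authoritative template is the one shipped with the model.

```python
from typing import List, Tuple


def zephyr_prompt(messages: List[Tuple[str, str]]) -> str:
    """Rebuild the Zephyr-style chat prompt shown above from (role, content) pairs."""
    parts = [f"<|{role}|>\n{content}</s>\n" for role, content in messages]
    # A trailing assistant tag tells the model it is its turn to answer
    return "".join(parts) + "<|assistant|>\n"


prompt_str = zephyr_prompt([
    ("system", "You're a helpful assistant"),
    ("user", "What happens when an unstoppable force meets an immovable object?"),
])
print(prompt_str)
```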

In [46]:
%%time

res = chat_model.invoke(messages)
print(res.content)

<|system|>
You're a helpful assistant</s>
<|user|>
What happens when an unstoppable force meets an immovable object?</s>
<|assistant|>
According to a popular philosophical paradox, when an unstoppable force meets an immovable object, it is impossible to determine which one will prevail because both are defined as being completely unyielding and unmovable. The paradox suggests that the very concepts of "unstoppable force" and "immovable object" are inherently contradictory, and therefore, the outcome of their meeting is uncertain or impossible to predict. Some interpretations suggest that the force may overcome the object, while others suggest that the object may absorb the force, leading to an explosion or some other unexpected outcome. However, the paradox remains unsolved, and the question of what happens when an unstoppable force meets an immovable object remains a subject of debate and contemplation in philosophy and science.
CPU times: user 37.1 ms, sys: 2.74 ms, total: 39.9 ms
Wa

In [None]:
############ DONE ##################