# Hugging Face with LangChain

Announcement Link: https://huggingface.co/blog/langchain


In [None]:
## Libraries Required
!pip install langchain-huggingface
## For API Calls
!pip install huggingface_hub
!pip install transformers
!pip install accelerate
!pip install bitsandbytes
!pip install langchain



## HuggingFaceEndpoint
## How to Access Hugging Face Models via the API
There are two ways to use this class. You can specify the model with the repo_id parameter, or point the class at a dedicated endpoint with the endpoint_url parameter. With repo_id, requests go through the serverless API, which is particularly beneficial to people with Pro accounts or an Enterprise Hub subscription. Regular users still get a fair number of requests by setting their HF token in the environment where the code runs.


In [None]:
from langchain_huggingface import HuggingFaceEndpoint

In [None]:
import os
# sec_key must already hold your Hugging Face access token.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = sec_key
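
The cells above import the class and set a token but never call the endpoint. As a minimal sketch of the repo_id route, the helpers below resolve a token from the environment and build a `HuggingFaceEndpoint` on demand. The helper names (`resolve_hf_token`, `build_endpoint`), the choice of `microsoft/Phi-3-mini-4k-instruct` as the repo, and the generation settings are illustrative assumptions; whether a given model is actually served through the serverless API depends on your account and the hosting provider.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first Hugging Face token found in the environment, if any."""
    # Both variable names are checked; which one is set depends on your setup.
    for name in ("HUGGINGFACEHUB_API_TOKEN", "HF_TOKEN"):
        token = env.get(name)
        if token:
            return token
    return None


def build_endpoint(repo_id: str = "microsoft/Phi-3-mini-4k-instruct"):
    """Create a HuggingFaceEndpoint for repo_id (hypothetical helper)."""
    # Imported lazily so the token helper works without the package installed.
    from langchain_huggingface import HuggingFaceEndpoint

    token = resolve_hf_token()
    if token is None:
        raise RuntimeError("Set HUGGINGFACEHUB_API_TOKEN or HF_TOKEN first.")
    return HuggingFaceEndpoint(
        repo_id=repo_id,
        max_new_tokens=128,  # assumed value, tune for your use case
        temperature=0.7,     # assumed value
        huggingfacehub_api_token=token,
    )
```

With a valid token in place, `build_endpoint().invoke("What is machine learning?")` would send the prompt over the network and return the generated text.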

## HuggingFacePipeline
Within transformers, the Pipeline is the most versatile tool in the Hugging Face toolbox. Because LangChain is designed primarily to address RAG and agent use cases, the scope of the pipeline here is reduced to the following text-centric tasks: "text-generation", "text2text-generation", "summarization", and "translation".
Models can be loaded directly with the from_model_id method, or by passing an existing transformers pipeline object to the class.


In [None]:
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
model_id = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)



In [None]:
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100)
hf = HuggingFacePipeline(pipeline=pipe)

Device set to use cpu


In [None]:
hf

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x78fb57765a90>, model_id='gpt2')

In [None]:
hf.invoke("What is machine learning")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'What is machine learning and what does it mean?\n\nMachine learning is an open science, with no rules, no expectations. It is a system of learning and prediction. It has to be a good, predictable, and flexible way to learn.\n\nMachine learning is the study of human behavior. It involves taking a set of data and using it to improve the behavior or behavior of humans. If you get a given data set, you can improve how well you do it. If you get a set of data'
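
The `pad_token_id` message above appears because GPT-2 defines no padding token, so generation falls back to the end-of-sequence id (50256 in the log). A minimal sketch of passing that id explicitly when the pipeline is built, which should avoid the message. `generation_kwargs` and `build_quiet_pipeline` are hypothetical helper names, and the sketch assumes that extra keyword arguments given to the text-generation pipeline are forwarded to generation, as `max_new_tokens` is in the cell above.

```python
def generation_kwargs(eos_token_id: int, max_new_tokens: int = 100) -> dict:
    """Keyword arguments for the pipeline, with padding set to the EOS id."""
    return {"max_new_tokens": max_new_tokens, "pad_token_id": eos_token_id}


def build_quiet_pipeline(model_id: str = "gpt2"):
    """Build a text-generation pipeline that sets pad_token_id up front."""
    # Imported lazily: transformers is only needed when the pipeline is built.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        **generation_kwargs(tokenizer.eos_token_id),
    )
```

The returned object can be wrapped exactly as before with `HuggingFacePipeline(pipeline=build_quiet_pipeline())`.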

In [None]:
## Use HuggingFacePipeline with a GPU
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device_map="auto",  # lets the accelerate library place the model on the available devices
    pipeline_kwargs={"max_new_tokens": 100},
)

Device set to use cpu
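
The output above reports cpu, so no GPU was visible in that run. A hedged sketch of choosing the device arguments before calling from_model_id: `device_kwargs` is a hypothetical helper, it assumes PyTorch is the backend, and it relies on the pipeline convention that `device=-1` means CPU.

```python
from typing import Optional


def device_kwargs(cuda_available: Optional[bool] = None) -> dict:
    """Pick device arguments for HuggingFacePipeline.from_model_id."""
    if cuda_available is None:
        try:
            import torch  # optional dependency, only probed when needed
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    # Pass only one of device / device_map, never both.
    return {"device_map": "auto"} if cuda_available else {"device": -1}
```

The result can be unpacked into the call, for example `HuggingFacePipeline.from_model_id(model_id="gpt2", task="text-generation", pipeline_kwargs={"max_new_tokens": 100}, **device_kwargs())`.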


In [None]:
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

In [None]:
chain = prompt | gpu_llm

In [None]:
question = "What is artificial intelligence?"
chain.invoke({"question": question})

"Question: What is artificial intelligence?\n\nAnswer: Let's think step by step. How many people do you know who've been in a job interview with an AI engineer, or a computer scientist who's been on a job interview with an AI engineer? You know, I've got a friend who was in a job interview with an AI engineer, and she's been in a job interview with an AI engineer for over 5 years now. And I have a colleague who was on a job interview with an AI engineer, and she's been in a job interview with an AI engineer for"

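The chain output above begins with the rendered prompt, because the text-generation pipeline returns the prompt followed by the continuation. The sketch below shows what the template renders to and one way to drop the echoed prefix. Plain `str.format` stands in for `PromptTemplate.format` here, which gives the same string for this simple template; `render_prompt` and `strip_echoed_prompt` are hypothetical helpers.

```python
TEMPLATE = """Question: {question}

Answer: Let's think step by step."""


def render_prompt(question: str) -> str:
    """Fill the template the same way the chain does before calling the model."""
    return TEMPLATE.format(question=question)


def strip_echoed_prompt(generated: str, prompt_text: str) -> str:
    """Remove the prompt from the start of the generated text, if present."""
    if generated.startswith(prompt_text):
        return generated[len(prompt_text):].lstrip()
    return generated


prompt_text = render_prompt("What is artificial intelligence?")
print(prompt_text)
```

Applied to a chain result, `strip_echoed_prompt(result, prompt_text)` leaves only the model's continuation.
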
## microsoft/Phi-3-mini-4k-instruct

In [None]:
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 100,
        "top_k": 50,
        "temperature": 0.1,
    },
)
llm.invoke("Hugging Face is")


Device set to use cpu


'Hugging Face is a platform that provides access to a wide range of pre-trained models and tools for natural language processing (NLP) tasks. One of the popular models on Hugging Face is the BERT (Bidirectional Encoder Representations from Transformers) model, which can be used for various NLP tasks such as text classification, sentiment analysis, and question answering.\n\nTo use the BERT model for text classification, we need to follow these steps:\n\n1'
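
In transformers, temperature and top_k only influence generation when sampling is enabled; without `do_sample=True` the pipeline decodes greedily and those settings have no effect. A small sketch of building pipeline_kwargs with that flag made explicit. `sampling_kwargs` is a hypothetical helper, not part of either library.

```python
def sampling_kwargs(max_new_tokens: int = 100,
                    temperature: float = 0.1,
                    top_k: int = 50,
                    do_sample: bool = True) -> dict:
    """Build pipeline_kwargs, including sampling knobs only when sampling."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    if do_sample:
        # temperature and top_k are sampling parameters.
        kwargs.update({"temperature": temperature, "top_k": top_k})
    return kwargs
```

Passing `pipeline_kwargs=sampling_kwargs()` to from_model_id keeps the values used in the cell above while turning sampling on, so repeated runs may produce different text.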

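## meta-llama/Meta-Llama-3-8B-Instruct

In [None]:
import os
# HF_Key must hold a token from an account that has been granted access to this gated model.
os.environ["HF_TOKEN"] = HF_Key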

In [None]:
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 100,
        "top_k": 50,
        "temperature": 0.1,
    },
)
llm.invoke("Hugging Face is")


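
The download log for this run lists four weight shards of 4.98, 5.00, 4.92, and 1.17 GB. A quick sketch of the arithmetic for checking whether the model fits in memory before loading it on CPU. The helper names are hypothetical, and the check covers the weight files only.

```python
# Shard sizes in GB, as reported by the download progress for
# meta-llama/Meta-Llama-3-8B-Instruct in this run.
SHARD_SIZES_GB = [4.98, 5.00, 4.92, 1.17]


def total_weights_gb(shards=SHARD_SIZES_GB) -> float:
    """Total size of the weight files, rounded to two decimals."""
    return round(sum(shards), 2)


def fits_in_memory(available_gb: float, shards=SHARD_SIZES_GB) -> bool:
    """Rough check: the weights alone must fit; generation needs extra room."""
    return total_weights_gb(shards) <= available_gb


print(total_weights_gb())  # 16.07
```

On a hypothetical machine with 12 GB of RAM, `fits_in_memory(12.0)` returns False, which suggests using a smaller model or a hosted endpoint instead of a local pipeline.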