# Hugging Face Local Pipelines

Hugging Face models can be run locally through the `HuggingFacePipeline` class.

The [Hugging Face Model Hub](https://huggingface.co/models) hosts over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.

These models can be called from LangChain either through this local pipeline wrapper or through their hosted inference endpoints, using the `HuggingFaceHub` class. For more information on the hosted pipelines, see the [HuggingFaceHub](huggingface_hub.ipynb) notebook.

To use this wrapper, you need the `transformers` Python [package installed](https://pypi.org/project/transformers/).

In [None]:
!pip install transformers > /dev/null

### Load the model

In [None]:
from langchain import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(model_id="bigscience/bloom-1b7", task="text-generation", model_kwargs={"temperature":0, "max_length":64})

### Accelerate Model Inference With DeepSpeed

DeepSpeed can speed up inference by more than 2x for a wide variety of models. To use DeepSpeed, you need CUDA installed, along with a PyTorch build compiled against the same CUDA version. Using Docker and nvidia-docker with the official CUDA Docker images can help set up the environment.
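Before installing DeepSpeed, it can help to confirm which CUDA version the installed PyTorch build targets. The following is a minimal sketch; `cuda_environment_report` is a hypothetical helper name, not part of LangChain or DeepSpeed, and it only reports what PyTorch itself exposes through `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib.util


def cuda_environment_report():
    """Summarize whether PyTorch is installed and which CUDA build it targets."""
    report = {
        "torch_installed": False,
        "torch_cuda_build": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment
        return report

    import torch

    report["torch_installed"] = True
    # CUDA version PyTorch was compiled against (None for CPU-only builds)
    report["torch_cuda_build"] = torch.version.cuda
    # True only if a usable GPU and driver are visible at runtime
    report["cuda_available"] = torch.cuda.is_available()
    return report


print(cuda_environment_report())
```

If `torch_cuda_build` does not match the CUDA toolkit installed on the machine, reinstalling PyTorch with a matching build is usually the first thing to try.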

In [None]:
!pip install deepspeed > /dev/null

#### DeepSpeed Arguments
You can control DeepSpeed with an optional argument called `deepspeed_args`.

The argument takes a dictionary with two optional keys, `dtype` and `max_tokens`.

Valid values for `dtype` are `fp32`, `fp16`, and `int8`. Defaults to `fp16`.

Valid values for `max_tokens` are any integer less than the model's maximum context window. Smaller values use less memory. Defaults to the maximum size for the given model.

Passing any value, including an empty dictionary, activates DeepSpeed.
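The rules above can be sketched as a small validation helper. This is an illustration only: `build_deepspeed_args` and its `model_max_context` parameter are hypothetical names introduced here, and the check that `max_tokens` is positive is an added assumption. The helper builds the dictionary you would pass as `deepspeed_args`.

```python
VALID_DTYPES = {"fp32", "fp16", "int8"}


def build_deepspeed_args(dtype="fp16", max_tokens=None, model_max_context=None):
    """Hypothetical helper: build a dict suitable for `deepspeed_args`."""
    if dtype not in VALID_DTYPES:
        raise ValueError(f"dtype must be one of {sorted(VALID_DTYPES)}, got {dtype!r}")

    args = {"dtype": dtype}
    if max_tokens is not None:
        # Assumption: a token budget only makes sense as a positive integer
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        # max_tokens must stay below the model's context window, when known
        if model_max_context is not None and max_tokens >= model_max_context:
            raise ValueError("max_tokens must be less than the model's context window")
        args["max_tokens"] = max_tokens
    return args


print(build_deepspeed_args())
print(build_deepspeed_args("int8", max_tokens=512, model_max_context=2048))
```

Leaving `max_tokens` out of the dictionary keeps the default described above, where the maximum size for the model is used.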

In [None]:
from langchain import HuggingFacePipeline
import torch
# torch_dtype loads the weights in half precision; deepspeed_args activates DeepSpeed
llm = HuggingFacePipeline.from_model_id(model_id="EleutherAI/gpt-j-6b", task="text-generation", model_kwargs={"torch_dtype": torch.float16}, deepspeed_args={"dtype": "fp16"})

### Integrate the model in an LLMChain

In [None]:
from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What is electroencephalography?"

print(llm_chain.run(question))
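
To see the text that reaches the model, the template can be filled in with plain Python string formatting. This is a minimal sketch that assumes the template uses ordinary `str.format` placeholders; `render_prompt` is a hypothetical helper, not part of LangChain.

```python
template = """Question: {question}

Answer: Let's think step by step."""


def render_prompt(question):
    """Fill the {question} placeholder the same way str.format would."""
    return template.format(question=question)


print(render_prompt("What is electroencephalography?"))
```

The model continues the text after "Let's think step by step.", which is why the template ends mid-answer.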