6 changes: 5 additions & 1 deletion README.md
@@ -93,9 +93,13 @@ python3.10 -m venv qeff_env
source qeff_env/bin/activate
pip install -U pip

# Clone and Install the QEfficient repository from the mainline branch
pip install git+https://github.com/quic/efficient-transformers

# Clone and Install the QEfficient repository from a specific branch, tag, or commit by appending @ref
# Release branch (e.g., release/v1.20.0):
pip install "git+https://github.com/quic/efficient-transformers@release/v1.20.0"

# Or build a wheel package using the commands below.
pip install build wheel
python -m build --wheel --outdir dist
22 changes: 22 additions & 0 deletions docs/source/quick_start.md
@@ -221,4 +221,26 @@ Benchmark the model on Cloud AI 100, run the infer API to print tokens and tok/s
tokenizer = AutoTokenizer.from_pretrained(model_name)
qeff_model.generate(prompts=["My name is"], tokenizer=tokenizer)
```

### Local Model Execution
If the model and tokenizer are already downloaded, you can load them directly from a local path.

```python
from QEfficient import QEFFAutoModelForCausalLM
from transformers import AutoTokenizer

# Local path to the downloaded model. You can find downloaded HF models in:
# - Default location: ~/.cache/huggingface/hub/models--{model_name}/snapshots/{snapshot_id}/
local_model_repo = "~/.cache/huggingface/hub/models--gpt2/snapshots/607a30d783dfa663caf39e06633721c8d4cfcd7e"

# Load model from local path
model = QEFFAutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=local_model_repo)

model.compile(num_cores=16)

# Load tokenizer from the same local path
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=local_model_repo)

model.generate(prompts=["Hi there!!"], tokenizer=tokenizer)
```
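The cache path used above follows the Hugging Face hub layout: the repo id's `/` separators become `--` in the `models--…` folder name, and the leading `~` must be expanded to the user's home directory before the path is usable. A minimal sketch of building such a path (the `hf_cache_snapshot_dir` helper is illustrative, not part of QEfficient or `huggingface_hub`):

```python
import os

def hf_cache_snapshot_dir(repo_id: str, snapshot_id: str,
                          cache_root: str = "~/.cache/huggingface/hub") -> str:
    """Build the local snapshot path the Hugging Face hub cache uses.

    A repo id's "/" separators become "--" in the cache folder name,
    e.g. "gpt2" -> "models--gpt2", "org/model" -> "models--org--model".
    """
    folder = "models--" + repo_id.replace("/", "--")
    path = os.path.join(cache_root, folder, "snapshots", snapshot_id)
    # Expand "~" so the returned path can be passed straight to from_pretrained.
    return os.path.expanduser(path)

local_model_repo = hf_cache_snapshot_dir(
    "gpt2", "607a30d783dfa663caf39e06633721c8d4cfcd7e")
```

If you let `huggingface_hub` manage downloads, its `snapshot_download` function returns this resolved path for you, which avoids hard-coding the snapshot hash.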
End-to-end demo examples for various models are available in the [**notebooks**](https://github.com/quic/efficient-transformers/tree/main/notebooks) directory. Please check them out.