<a href="https://colab.research.google.com/github/harnalashok/LLMs/blob/main/Set_HF_download_folder_and_use_huggingface_cli_in_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Last amended: 8th June, 2024
# Objectives:
#            a. How to download HF models to a specific folder
#            b. How to set environment variable: HF_HOME
#            c. How to use huggingface-cli to download repo
#            d. How to use huggingface-cli to download a specific file from repo
#            e. How to search for gguf file and download it.
#

See this StackOverflow [reference](https://stackoverflow.com/questions/63312859/how-to-change-huggingface-transformers-default-cache-directory)

In [1]:
# 0.0 No need to install transformers
#     Comes preinstalled in colab:

from transformers import AutoTokenizer, AutoModelForMaskedLM

## 1. Specify cache_dir as parameter

In [9]:
# 0.1 Specify download cache folder
#     It will be created, if it does not exist:

cache_dir = "/content/mydir"

In [10]:
# 0.2 Download tokenizer and model
#

tokenizer = AutoTokenizer.from_pretrained("roberta-base",
                                          cache_dir=cache_dir
                                          )

model = AutoModelForMaskedLM.from_pretrained("roberta-base",
                                             cache_dir=cache_dir
                                             )

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

In [12]:
# 0.3 Alternatively, pass the folder path directly
#     as a string (a relative path is also accepted):

tokenizer = AutoTokenizer.from_pretrained("roberta-base",
                                          cache_dir="new_cache_dir/"
                                          )

model = AutoModelForMaskedLM.from_pretrained("roberta-base",
                                             cache_dir="new_cache_dir/"
                                             )
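To confirm where the files landed, the folder passed as `cache_dir` can be inspected. The sketch below uses a hypothetical helper, `cached_repo_ids` (not part of any library), and assumes the `models--<owner>--<name>` folder naming that appears in the download logs further down in this notebook.

```python
import os

def cached_repo_ids(cache_dir):
    """Return repo ids inferred from 'models--...' folders in cache_dir.

    Hypothetical helper: assumes each model repo is cached in a folder
    named 'models--<owner>--<name>' (or 'models--<name>').
    """
    if not os.path.isdir(cache_dir):
        return []
    repo_ids = []
    for entry in sorted(os.listdir(cache_dir)):
        full_path = os.path.join(cache_dir, entry)
        if os.path.isdir(full_path) and entry.startswith("models--"):
            # Strip the prefix and turn the '--' separator back into '/'
            repo_ids.append(entry[len("models--"):].replace("--", "/"))
    return repo_ids

# Prints an empty list if the folder does not exist yet
print(cached_repo_ids("/content/mydir"))
```

After the `roberta-base` download above, the list would be expected to contain `roberta-base`.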

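In this notebook, setting environment variables did not change the download location. Neither of the following attempts worked here: `!export` runs in a temporary subshell and does not affect the Python process, and `os.environ` was set only after `transformers` had already been imported, which is the likely reason it had no effect.

## 2. Set environment variables
Both attempts below fail in this session.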

In [6]:
# 1.0 Set environment variables with shell export
#     (each ! line runs in its own subshell, so these do not persist)
!export HF_HOME=/content/hf/misc
!export HF_DATASETS_CACHE=/content/hf/datasets
!export TRANSFORMERS_CACHE=/content/hf/models
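Each `!` command in a notebook runs in its own short-lived shell, so an `export` made there is gone when the command returns. A minimal sketch of that behaviour using `subprocess` follows; `DEMO_HF_VAR` is a made-up variable name used only for illustration, and a POSIX shell (as on Colab) is assumed.

```python
import os
import subprocess

# Make sure the demo variable is not already set in this process
os.environ.pop("DEMO_HF_VAR", None)

# Roughly what a '!export ...' line does: run export in a child shell
subprocess.run("export DEMO_HF_VAR=/content/hf/misc", shell=True, check=True)

# The child shell has exited; the Python process is unchanged
print(os.environ.get("DEMO_HF_VAR"))   # None

# Setting os.environ changes the current process and later child processes
os.environ["DEMO_HF_VAR"] = "/content/hf/misc"
result = subprocess.run("echo $DEMO_HF_VAR", shell=True,
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())           # /content/hf/misc
```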

In [2]:
# 1.0.1 Set environment variables
import os
os.environ["HF_HOME"] = "/content/hf/misc"
os.environ["HF_DATASETS_CACHE"] = "/content/hf/datasets"
os.environ["TRANSFORMERS_CACHE"] = "/content/hf/models"

In [3]:
# 1.0.2 Create folders:

!mkdir -p /content/hf/misc
!mkdir -p /content/hf/datasets
!mkdir -p /content/hf/models

In [27]:
# 1.0.3 Check
print(os.environ["HF_HOME"])

/content/hf/misc


In [7]:
# 1.0.4 But downloaded models are still saved to:
# /root/.cache/huggingface/hub/

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
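A likely explanation for the files still landing in the default cache: the Hugging Face libraries read `HF_HOME` when they are first imported, and here `transformers` was imported (cell 0.0) before `os.environ` was set. The sketch below uses a hypothetical helper, `expected_hub_cache`, that approximates how the hub folder is usually resolved (`$HF_HUB_CACHE`, else `$HF_HOME/hub`, else `~/.cache/huggingface/hub`). It is a checking aid under those assumptions, not the library's own code.

```python
import os

def expected_hub_cache(env=None):
    """Approximate the default hub cache folder from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return os.path.expanduser(env["HF_HUB_CACHE"])
    xdg_cache = env.get("XDG_CACHE_HOME") or os.path.join("~", ".cache")
    hf_home = env.get("HF_HOME") or os.path.join(xdg_cache, "huggingface")
    return os.path.join(os.path.expanduser(hf_home), "hub")

# With HF_HOME set, models would be expected under <HF_HOME>/hub
print(expected_hub_cache({"HF_HOME": "/content/hf/misc"}))  # /content/hf/misc/hub

# With nothing set, the result is the default location in the home folder
print(expected_hub_cache({}))
```

For the variable to take effect, it should be set at the very top of the notebook (or the runtime restarted) before the first `import transformers`.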


## 3. Downloading model repo using cli    
For `huggingface-cli` see [this link](https://huggingface.co/docs/huggingface_hub/en/guides/cli)

In [24]:
# 2.0.0 Set environment variables
import os
os.environ["HF_HOME"] = "/content/mymodels/"

The following command downloads **all files** in the Hugging Face repo.

In [25]:
# 2.0.1 Specify model repo name:
#       Files are downloaded to the
#       cache folder under HF_HOME.
#       All files in the repo are downloaded:

! huggingface-cli download bert-base-uncased

Fetching 16 files:   0% 0/16 [00:00<?, ?it/s]Downloading 'flax_model.msgpack' to '/content/mymodels/hub/models--bert-base-uncased/blobs/ea201fabe466ef7182f1f687fb5be4b62a73d3a78883f11264ff7f682cdb54bf.incomplete'
Downloading 'coreml/fill-mask/float32_model.mlpackage/Data/com.apple.CoreML/model.mlmodel' to '/content/mymodels/hub/models--bert-base-uncased/blobs/bd3e35c1681371542bd98f96b299be1832d89dbf.incomplete'
Downloading 'config.json' to '/content/mymodels/hub/models--bert-base-uncased/blobs/45a2321a7ecfdaaf60a6c1fd7f5463994cc8907d.incomplete'
Downloading 'README.md' to '/content/mymodels/hub/models--bert-base-uncased/blobs/40a2aaca31dd005eb5f6ffad07b5ffed0a31d1f6.incomplete'
Downloading 'coreml/fill-mask/float32_model.mlpackage/Manifest.json' to '/content/mymodels/hub/models--bert-base-uncased/blobs/c1c37cd58b9eb000ddbb7ca90f04b893a33e50c8.incomplete'
Downloading '.gitattributes' to '/content/mymodels/hub/models--bert-base-uncased/blobs/505a7adf8be9e5fdf06aabbfbe9046e6c811f91b.incom

In [26]:
# 2.0.2 Check size of downloaded files:

! du -sh /content/mymodels/   # 3.3g

3.3G	/content/mymodels/
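The same check can be done from Python. The sketch below defines a hypothetical helper, `folder_size_bytes`, that sums file sizes with `os.walk`. It skips symlinks on the assumption that some cache entries are links to the real files and should not be counted twice. The figure can differ slightly from `du`, which reports disk blocks rather than file lengths.

```python
import os

def folder_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total

# Reports 0.0 GiB if the folder does not exist
size_gib = folder_size_bytes("/content/mymodels/") / 1024**3
print(f"{size_gib:.1f} GiB")
```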


In [27]:
# 2.0.3 Delete the folder:

! rm -r -f /content/mymodels/

## 4. Download a specific GGUF model file

Some libraries, such as `llama-cpp-python`, need model files in `gguf` format. To find GGUF models, open the Hugging Face models page, choose `Libraries --> GGUF` in the filters at the top, then pick a task under `Tasks` (for example `Text Generation`) and select a model.
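The same filters can be expressed as a URL. The sketch below builds one with the standard library; the query parameter names `library` and `pipeline_tag` are an assumption based on how the models page currently encodes its filters, and `gguf_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

def gguf_search_url(task="text-generation"):
    """Build a models-page URL filtered to GGUF repos for one task."""
    query = urlencode({"library": "gguf", "pipeline_tag": task})
    return f"https://huggingface.co/models?{query}"

print(gguf_search_url())
# https://huggingface.co/models?library=gguf&pipeline_tag=text-generation
```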

In [31]:
# 2.1 Download another repo.
#     No file is specified.
#     Complete repo is downloaded (52gb)

! huggingface-cli download TheBloke/zephyr-7B-beta-GGUF

Streaming output truncated to the last 5000 lines.
zephyr-7b-beta.Q6_K.gguf:  40% 2.39G/5.94G [02:27<01:00, 59.1MB/s]
zephyr-7b-beta.Q8_0.gguf:  27% 2.10G/7.70G [01:42<01:46, 52.7MB/s]
zephyr-7b-beta.Q5_K_S.gguf:  58% 2.88G/5.00G [03:21<00:29, 70.5MB/s]
zephyr-7b-beta.Q8_0.gguf:  27% 2.11G/7.70G [01:42<01:38, 56.5MB/s]
zephyr-7b-beta.Q6_K.gguf:  40% 2.40G/5.94G [02:28<01:01, 57.3MB/s]
zephyr-7b-beta.Q5_K_M.gguf:  57% 2.94G/5.13G [03:28<00:42, 51.5MB/s]
zephyr-7b-beta.Q5_K_S.gguf:  58% 2.89G/5.00G [03:21<00:32, 64.4MB/s]
zephyr-7b-beta.Q6_K.gguf:  41% 2.41G/5.94G [02:28<01:01, 57.5MB/s]
zephyr-7b-beta.Q8_0.gguf:  28% 2.12G/7.70G [01:42<01:39, 55.9MB/s]
zephyr-7b-beta.Q5_K_M.gguf:  57% 2.95G/5.13G [03:28<00:42, 51.8MB/s]
zephyr-7b-beta.Q5_K_S.gguf:  58% 2.90G/5.00G [03:22<00:34, 60.8MB/s]
zephyr-7b-beta.Q6_K.gguf:  41% 2.42G/5

In [32]:
# 2.1.1 Check size of all downloaded files:

! du -sh /content/mymodels/   # 52G

52G	/content/mymodels/


In [33]:
# 2.1.2 Delete the folder:

! rm -r -f /content/mymodels/

In [34]:
# 2.2 Download only a specific file from this repo
#     to the cache folder set earlier through HF_HOME:

! huggingface-cli download TheBloke/zephyr-7B-beta-GGUF zephyr-7b-beta.Q4_K_M.gguf

Downloading 'zephyr-7b-beta.Q4_K_M.gguf' to '/content/mymodels/hub/models--TheBloke--zephyr-7B-beta-GGUF/blobs/503580dce392c6e64669ad21a77023ba2a17baa0c381250fb67c11ba6406a85e.incomplete'
zephyr-7b-beta.Q4_K_M.gguf: 100% 4.37G/4.37G [00:43<00:00, 100MB/s] 
Download complete. Moving file to /content/mymodels/hub/models--TheBloke--zephyr-7B-beta-GGUF/blobs/503580dce392c6e64669ad21a77023ba2a17baa0c381250fb67c11ba6406a85e
/content/mymodels/hub/models--TheBloke--zephyr-7B-beta-GGUF/snapshots/e4714d14e9652aa9658fa937732cceadc63ac42e/zephyr-7b-beta.Q4_K_M.gguf


In [35]:
# 2.2.1 Check size of all downloaded files:

! du -sh /content/mymodels/   # 4.1g

4.1G	/content/mymodels/


**However**, this method places the downloaded `gguf` file deep inside nested folders under the cache directory (`HF_HOME`). Here is the full path of this `gguf` file:    

`/content/mymodels/hub/models--TheBloke--zephyr-7B-beta-GGUF/snapshots/e4714d14e9652aa9658fa937732cceadc63ac42e/zephyr-7b-beta.Q4_K_M.gguf`     

We will use this file path next.
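Rather than copying the long path by hand, it can be looked up. The sketch below uses a hypothetical helper, `find_cached_file`, and assumes the `hub/models--<owner>--<name>/snapshots/<revision>/` layout visible in the path above.

```python
import glob
import os

def find_cached_file(hf_home, repo_id, filename):
    """Return snapshot paths for filename inside the cache of repo_id."""
    # 'TheBloke/zephyr-7B-beta-GGUF' -> 'models--TheBloke--zephyr-7B-beta-GGUF'
    repo_folder = "models--" + repo_id.replace("/", "--")
    pattern = os.path.join(hf_home, "hub", repo_folder,
                           "snapshots", "*", filename)
    return sorted(glob.glob(pattern))

# Empty list if nothing has been downloaded to this folder
matches = find_cached_file("/content/mymodels",
                           "TheBloke/zephyr-7B-beta-GGUF",
                           "zephyr-7b-beta.Q4_K_M.gguf")
print(matches)
```

If the file is present, `matches[0]` can be passed wherever the full path is needed.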

## Testing the model file with `llama-cpp-python`

In [37]:
# 3.0 Install library:
! pip install llama-cpp-python --quiet

     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.2/50.2 MB 6.3 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 3.3 MB/s eta 0:00:00
  Building wheel for llama-cpp-python (pyproject.toml) ... done


In [38]:
# 3.1 Import libraries:
import llama_cpp
from llama_cpp import Llama

In [49]:
# 3.2 Get the model:
#     This path is VERY LONG:

modelPath = "/content/mymodels/hub/models--TheBloke--zephyr-7B-beta-GGUF/snapshots/e4714d14e9652aa9658fa937732cceadc63ac42e/zephyr-7b-beta.Q4_K_M.gguf"
model = llama_cpp.Llama(model_path=modelPath)


llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/mymodels/hub/models--TheBloke--zephyr-7B-beta-GGUF/snapshots/e4714d14e9652aa9658fa937732cceadc63ac42e/zephyr-7b-beta.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = huggingfaceh4_zephyr-7b-beta
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 ll

In [None]:
# 3.3 Use model to generate text:

print(model("The quick brown fox jumps ", stop=["."])["choices"][0]["text"])

### Download to a local folder (no nesting)
Pass `--local-dir` to write the file directly into the given folder instead of the nested cache layout under `HF_HOME`.

In [46]:
# 3.4 Download the file straight into a local directory (--local-dir)
#     rather than into the nested cache used earlier:

! huggingface-cli download TheBloke/zephyr-7B-beta-GGUF zephyr-7b-beta.Q4_K_M.gguf --local-dir /content/zephyrx

/content/zephyrx/zephyr-7b-beta.Q4_K_M.gguf
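When several files need downloading, the command line can be assembled in Python and run with `subprocess`. The sketch below uses a hypothetical helper, `build_download_cmd`, and only the arguments already used in this notebook (`repo_id`, an optional file name, and `--local-dir`).

```python
import shlex

def build_download_cmd(repo_id, filename=None, local_dir=None):
    """Return the huggingface-cli download command as an argument list."""
    cmd = ["huggingface-cli", "download", repo_id]
    if filename:
        cmd.append(filename)
    if local_dir:
        cmd += ["--local-dir", local_dir]
    return cmd

cmd = build_download_cmd("TheBloke/zephyr-7B-beta-GGUF",
                         "zephyr-7b-beta.Q4_K_M.gguf",
                         "/content/zephyrx")
print(shlex.join(cmd))
# huggingface-cli download TheBloke/zephyr-7B-beta-GGUF zephyr-7b-beta.Q4_K_M.gguf --local-dir /content/zephyrx

# To actually run it from Python: subprocess.run(cmd, check=True)
```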


In [47]:
# 3.5 Use it now:

modelPath = "/content/zephyrx/zephyr-7b-beta.Q4_K_M.gguf"
model = llama_cpp.Llama(model_path=modelPath)


llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/zephyrx/zephyr-7b-beta.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = huggingfaceh4_zephyr-7b-beta
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 

In [48]:
# 3.6 Predict next few words:

print(model("The quick brown fox jumps ", stop=["."])["choices"][0]["text"])


llama_print_timings:        load time =    8348.83 ms
llama_print_timings:      sample time =      19.00 ms /     7 runs   (    2.71 ms per token,   368.48 tokens per second)
llama_print_timings: prompt eval time =    8348.45 ms /     9 tokens (  927.61 ms per token,     1.08 tokens per second)
llama_print_timings:        eval time =    7556.03 ms /     6 runs   ( 1259.34 ms per token,     0.79 tokens per second)
llama_print_timings:       total time =   15941.39 ms /    15 tokens


10 over the lazy dog


## `huggingface-cli` help

In [50]:
! huggingface-cli --help

usage: huggingface-cli <command> [<args>]

positional arguments:
  {env,login,whoami,logout,repo,upload,download,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache,tag}
                        huggingface-cli command helpers
    env                 Print information about the environment.
    login               Log in using a token from huggingface.co/settings/tokens
    whoami              Find out which huggingface.co account you are logged in as.
    logout              Log out
    repo                {create} Commands to interact with your huggingface.co repos.
    upload              Upload a file or a folder to a repo on the Hub
    download            Download files from the Hub
    lfs-enable-largefiles
                        Configure your repository to enable upload of files > 5GB.
    scan-cache          Scan cache directory.
    delete-cache        Delete revisions from the cache directory.
    tag                 (create, list, delete) tags for a repo in 

In [51]:
!huggingface-cli download --help

usage: huggingface-cli <command> [<args>] download [-h] [--repo-type {model,dataset,space}]
                                                   [--revision REVISION] [--include [INCLUDE ...]]
                                                   [--exclude [EXCLUDE ...]]
                                                   [--cache-dir CACHE_DIR] [--local-dir LOCAL_DIR]
                                                   [--local-dir-use-symlinks {auto,True,False}]
                                                   [--force-download] [--resume-download]
                                                   [--token TOKEN] [--quiet]
                                                   repo_id [filenames ...]

positional arguments:
  repo_id               ID of the repo to download from (e.g. `username/repo-name`).
  filenames             Files to download (e.g. `config.json`, `data/metadata.jsonl`).

options:
  -h, --help            show this help message and exit
  --repo-type {model,dataset,space
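The `--include` and `--exclude` options listed above take file-name patterns. The sketch below previews which files a pattern would select, assuming shell-style wildcards as implemented by Python's `fnmatch`. `select_files` is a hypothetical helper, and the file names are the ones seen in the download log earlier in this notebook.

```python
from fnmatch import fnmatch

# File names taken from the download log earlier in this notebook
repo_files = [
    "zephyr-7b-beta.Q4_K_M.gguf",
    "zephyr-7b-beta.Q5_K_S.gguf",
    "zephyr-7b-beta.Q5_K_M.gguf",
    "zephyr-7b-beta.Q6_K.gguf",
    "zephyr-7b-beta.Q8_0.gguf",
]

def select_files(files, include=None, exclude=None):
    """Keep files matching any include pattern and no exclude pattern."""
    selected = []
    for name in files:
        if include and not any(fnmatch(name, p) for p in include):
            continue
        if exclude and any(fnmatch(name, p) for p in exclude):
            continue
        selected.append(name)
    return selected

print(select_files(repo_files, include=["*Q5_K_*"]))
# ['zephyr-7b-beta.Q5_K_S.gguf', 'zephyr-7b-beta.Q5_K_M.gguf']

print(select_files(repo_files, include=["*.gguf"], exclude=["*Q8_0*"]))
```

Previewing the selection this way helps avoid pulling an entire multi-gigabyte repo by accident.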

In [None]:
########### DONE ############