## HF Log-In

In [1]:
!huggingface-cli login


    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: fineGrained).
The token `llm_course` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-a
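The interactive prompt above suits a terminal or notebook, but a script can pass the token directly. Below is a minimal sketch that assumes the token is stored in an environment variable named `HF_TOKEN`; the helper name `login_from_env` is hypothetical, and it relies on `huggingface_hub.login`. Separately, the git-credential warning above appears because no git credential helper is configured; one common way to define one is `git config --global credential.helper store`.

```python
import os


def login_from_env(env_var: str = "HF_TOKEN") -> bool:
    """Log in non-interactively if a token is present in the environment.

    Hypothetical helper: returns True if a login was attempted,
    False if the environment variable is unset or empty.
    """
    token = os.environ.get(env_var)
    if not token:
        # Nothing to do: public models and datasets still load without auth.
        return False
    # Imported lazily so the helper can be defined without huggingface_hub installed.
    from huggingface_hub import login

    # add_to_git_credential=False skips storing the token as a git credential,
    # which is the step that triggered the helper warning above.
    login(token=token, add_to_git_credential=False)
    return True
```

Calling `login_from_env()` at the top of a script is then a no-op when no token is configured.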

In [2]:
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import notebook_login

# ===============================
# 1. Log in to the Hugging Face Hub
# ===============================
# In a notebook environment (in a terminal, use `huggingface-cli login` instead)
# notebook_login()

# ===============================
# 2. Load the model & tokenizer
# ===============================
checkpoint = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

# ===============================
# 3. Input sentences
# ===============================
sentences = [
    "How are you?",
    "I'm fine, thank you!"
]

# ===============================
# 4. Tokenization
#    - padding: pad all sentences to the same length
#    - truncation: cut inputs that exceed the model's maximum length
#    - return_tensors: return PyTorch tensors
# ===============================
encoded_inputs = tokenizer(
    sentences,
    padding=True,
    truncation=True,
    return_tensors="pt"
)

print("Entire encoded inputs:")
print(encoded_inputs)
print("=" * 90)

print("Tokenized inputs:")
for k, v in encoded_inputs.items():
    print(f"{k}: {v.shape}")
print("=" * 90)

# ===============================
# 5. Model inference
# ===============================
with torch.no_grad():
    outputs = model(**encoded_inputs)

# Contextual embedding for each token
last_hidden_state = outputs.last_hidden_state
print("\nModel output shape:", last_hidden_state.shape)

# ===============================
# 6. Save the model & tokenizer locally
# ===============================
save_dir = "my_bert_model"

model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

print(f"\nModel saved to `{save_dir}`")

# ===============================
# 7. Reload the saved model (sanity check)
# ===============================
model = AutoModel.from_pretrained(save_dir)
tokenizer = AutoTokenizer.from_pretrained(save_dir)

print("Model reloaded successfully")

# ===============================
# 8. Upload to the Hugging Face Hub
# ===============================
# Uncomment the lines below to upload to the Hub
# model.push_to_hub("kims-awesome-model")
# tokenizer.push_to_hub("kims-awesome-model")

print("\nReady to push model to Hugging Face Hub!")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Entire encoded inputs:
{'input_ids': tensor([[ 101, 1731, 1132, 1128,  136,  102,    0,    0,    0,    0],
        [ 101,  146,  112,  182, 2503,  117, 6243, 1128,  106,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Tokenized inputs:
input_ids: torch.Size([2, 10])
token_type_ids: torch.Size([2, 10])
attention_mask: torch.Size([2, 10])

Model output shape: torch.Size([2, 10, 768])

Model saved to `my_bert_model`
Model reloaded successfully

Ready to push model to Hugging Face Hub!
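The printed `input_ids` show that the shorter sentence was right-padded with ID 0 (`[PAD]`), and that `attention_mask` marks those padded positions with 0 so the model ignores them. Below is a minimal pure-Python sketch of that padding step, reusing the unpadded IDs from the output above. The helper `pad_batch` is hypothetical; it only covers single-sentence inputs (so every `token_type_ids` entry is 0) and omits truncation, which the real tokenizer handles before adding the special tokens.

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad token-ID lists to a common length and build the matching masks.

    Hypothetical helper mirroring what `padding=True` produced above.
    """
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, token_type_ids, attention_mask = [], [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # Single-segment input: every position belongs to segment 0.
        token_type_ids.append([0] * max_len)
        # 1 for real tokens, 0 for padding.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


# Unpadded IDs from the output above: 101 = [CLS], 102 = [SEP] in BERT's vocabulary.
batch = [
    [101, 1731, 1132, 1128, 136, 102],                     # "How are you?"
    [101, 146, 112, 182, 2503, 117, 6243, 1128, 106, 102],  # "I'm fine, thank you!"
]
padded = pad_batch(batch)
print(padded["input_ids"][0])       # [101, 1731, 1132, 1128, 136, 102, 0, 0, 0, 0]
print(padded["attention_mask"][0])  # [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

This also explains the `[2, 10]` shapes printed above: two sentences, each padded to the 10 tokens of the longer one.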


## Push To Hub

In [3]:
model.push_to_hub("kims-awesome-model")
tokenizer.push_to_hub("kims-awesome-model")

CommitInfo(commit_url='https://huggingface.co/kim586w/kims-awesome-model/commit/b12598202e8196c6911a250198aeb84321de1481', commit_message='Upload tokenizer', commit_description='', oid='b12598202e8196c6911a250198aeb84321de1481', pr_url=None, repo_url=RepoUrl('https://huggingface.co/kim586w/kims-awesome-model', endpoint='https://huggingface.co', repo_type='model', repo_id='kim586w/kims-awesome-model'), pr_revision=None, pr_num=None)