# Hugging Face LLM利用演習

Hugging Faceで提供されている大規模言語モデル（LLM）の利用方法について記載されたノートブックです。

## 目次
1. 環境構築
2. Transformersライブラリの基本
3. テキスト生成モデルの利用
4. 多言語モデルの利用
5. ファインチューニング
6. Hugging Face Hubとの連携

## 1. 環境構築

まずは必要なライブラリをインストールします。

In [11]:
# 必要なライブラリのインストール
!pip install transformers datasets evaluate sentencepiece scikit-learn

Collecting scikit-learn
  Using cached scikit_learn-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting scipy>=1.6.0 (from scikit-learn)
  Downloading scipy-1.15.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting joblib>=1.2.0 (from scikit-learn)
  Using cached joblib-1.5.0-py3-none-any.whl.metadata (5.6 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Using cached scikit_learn-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Using cached joblib-1.5.0-py3-none-any.whl (307 kB)
Downloading scipy-1.15.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.7 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m37.7/37.7 MB[0m [31m10.3 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hUsing cached threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolc

In [2]:
# PyTorchをインストール
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Looking in indexes: https://download.pytorch.org/whl/cpu
Collecting torch
  Downloading https://download.pytorch.org/whl/cpu/torch-2.7.0%2Bcpu-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (27 kB)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cpu/torchvision-0.22.0%2Bcpu-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (6.1 kB)
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cpu/torchaudio-2.7.0%2Bcpu-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (6.6 kB)
Collecting sympy>=1.13.3 (from torch)
  Downloading https://download.pytorch.org/whl/sympy-1.13.3-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch)
  Downloading https://download.pytorch.org/whl/networkx-3.3-py3-none-any.whl.metadata (5.1 kB)
Collecting jinja2 (from torch)
  Downloading https://download.pytorch.org/whl/Jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting pillow!=8.3.*,>=5.3.0 (from torchvision)
  Downloading https://download.pytorch.org/whl/pillow-11.

In [3]:
# オプション: 進捗バーの表示のためにtqdmをインストール
!pip install tqdm



インストールしたライブラリのバージョンを確認します。

In [4]:
import transformers
import torch
import datasets
import os

# GPUを無効化する
os.environ["CUDA_VISIBLE_DEVICES"] = ""
torch.cuda.is_available = lambda: False

print(f"Transformers version: {transformers.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Datasets version: {datasets.__version__}")
print(f"GPU available: {torch.cuda.is_available()}")
print(f"Using device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

  from .autonotebook import tqdm as notebook_tqdm


Transformers version: 4.51.3
PyTorch version: 2.7.0+cpu
Datasets version: 3.6.0
GPU available: False
Using device: cpu


## 2. Transformersライブラリの基本

Hugging Face Transformersライブラリの基本的な使い方を学びます。

In [None]:
from transformers import pipeline

# pipelineの使用例（テキスト分類）
classifier = pipeline("sentiment-analysis", device="cpu")
result = classifier("I love using Hugging Face models!")
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9992625117301941}]


### 基本的なパイプライン

Transformersライブラリには様々なタスク用のパイプラインが用意されています。

In [6]:
# テキスト生成
generator = pipeline("text-generation", model="distilgpt2", device="cpu")
text = generator("Hugging Face is", max_length=30, num_return_sequences=2)
print(text)

# 質問応答
qa = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad", device="cpu")
context = "Hugging Face was founded in 2016 and is based in New York and Paris."
result = qa(question="Where is Hugging Face based?", context=context)
print(result)

# 要約
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device="cpu")
long_text = """
Hugging Face is an AI community and platform where researchers, data scientists,
machine learning engineers, and developers can collaborate on machine learning projects.
It provides tools for building, training, and deploying machine learning models.
The company also maintains a popular repository of pre-trained models that can be used
for a wide range of tasks including natural language processing, computer vision, and audio processing.
"""
summary = summarizer(long_text, max_length=50, min_length=10, batch_size=1)  # バッチサイズを小さく
print(summary)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hugging Face is a new feature in the "Skins of the Web."\n\n\nThe feature allows you to hide the subject in a photo'}, {'generated_text': 'Hugging Face is the first book ever written by Jonathan Wilshere.\n\nThe cover story was given by one of the best minds in London'}]


Device set to use cpu


{'score': 0.9182092547416687, 'start': 49, 'end': 67, 'answer': 'New York and Paris'}


Device set to use cpu


[{'summary_text': 'Hugging Face is an AI community and platform where researchers, data scientists,machine learning engineers, and developers can collaborate on machine learning projects. It provides tools for building, training, and deploying machine learning models.'}]


## 3. テキスト生成モデルの利用

特に大規模言語モデル（LLM）を使ったテキスト生成に焦点を当てます。

In [7]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# 軽量なGPT-2モデルを使用
model_name = "distilgpt2"

# トークナイザーとモデルのロード
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cpu")

# テキスト生成の詳細なコントロール
prompt = "In a world where AI has become ubiquitous, "
inputs = tokenizer(prompt, return_tensors="pt").to("cpu")

# 生成パラメータを設定
output = model.generate(
    inputs["input_ids"],
    max_length=100,
    num_return_sequences=1,
    temperature=0.7,  # 創造性の制御（高いほど多様な出力）
    top_p=0.9,        # 核サンプリング
    no_repeat_ngram_size=2  # 同じフレーズの繰り返しを防止
)

# デコードして表示
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


In a world where AI has become ubiquitous, vernacular, and the world of AI is becoming increasingly difficult to understand.

The world is now a globalized world, with the rise of the Internet, the proliferation of social media, social networks, etc. The world has been transformed into a digital world. It is a new world with a lot of new technologies, new technology, a whole new way of thinking, more than just a few new ideas. And it is not just the


### より効率的な方法

モデルを実行する際の最適化手法を示します。

In [8]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# 小型の言語モデル「gpt2」を使用
model_name = "gpt2"

try:
    # トークナイザーのロード
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    # モデルのロード
    model = AutoModelForCausalLM.from_pretrained(
        model_name, 
        device_map="cpu",
        load_in_8bit=True 
    )
    
    # プロンプトの設定
    prompt = """Write a short story about a robot that learns to feel emotions.
    Robot: """
    
    # 入力のトークン化
    inputs = tokenizer(prompt, return_tensors="pt")
    
    # テキスト生成
    with torch.no_grad():
        output = model.generate(
            inputs["input_ids"],
            max_length=200,  # 短くして処理を高速化
            temperature=0.8,
            top_p=0.95,
            do_sample=True
        )
    
    # 生成されたテキストのデコード
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print(generated_text)
    
except Exception as e:
    print(f"Error loading or using the model: {e}")
    print("\nTrying with simpler approach...")
    
    # 量子化なしの標準的なアプローチ
    model = AutoModelForCausalLM.from_pretrained(model_name).to("cpu")
    
    prompt = "Write a short story about a robot that learns to feel emotions.\nRobot: "
    inputs = tokenizer(prompt, return_tensors="pt")
    
    with torch.no_grad():
        output = model.generate(
            inputs["input_ids"],
            max_length=150, 
            temperature=0.8,
            top_p=0.95,
            do_sample=True
        )
    
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print(generated_text)

Error loading or using the model: Using a `device_map` or `tp_plan` requires `accelerate`. You can install it with `pip install accelerate`

Trying with simpler approach...


Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Write a short story about a robot that learns to feel emotions.
Robot:  I'm a robot, my life is full of joy. I'm happy for you, and I love you, and I want to be your friend.
Robot:  I'm a robot, my life is full of joy. I'm happy for you, and I love you, and I want to be your friend.
Robot:  I'm a robot, my life is full of joy. I'm happy for you, and I love you, and I want to be your friend.
Robot:  I'm a robot, my life is full of joy. I'm happy for you, and I


## 4. 多言語モデルの利用

日本語を含む多言語モデルを使用する方法を紹介します。

In [9]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# 多言語対応モデル（日本語含む）- 小型のものを選択
model_name = "cyberagent/open-calm-small"

try:
    # トークナイザーとモデルのロード
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to("cpu")
    
    # 日本語のプロンプト
    prompt = "日本の四季について短い文章を書いてください。\n"
    
    # 入力のトークン化
    inputs = tokenizer(prompt, return_tensors="pt")
    
    # テキスト生成
    with torch.no_grad():
        output = model.generate(
            inputs["input_ids"],
            max_length=150,  # 短くして処理を高速化
            temperature=0.7,
            top_p=0.9,
            do_sample=True
        )
    
    # 生成されたテキストのデコード
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print(generated_text)
    
except Exception as e:
    print(f"Error loading or using the model: {e}")
    print("\nTrying with a different multilingual model...")
    
    # 代替として別の多言語モデルを使用
    from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
    
    alt_model_name = "facebook/mbart-large-50"
    tokenizer = MBart50TokenizerFast.from_pretrained(alt_model_name)
    model = MBartForConditionalGeneration.from_pretrained(alt_model_name).to("cpu")
    
    # 英語から日本語への翻訳例
    tokenizer.src_lang = "en_XX"
    encoded = tokenizer("The four seasons in Japan are beautiful.", return_tensors="pt")
    
    generated_tokens = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.lang_code_to_id["ja_XX"]
    )
    
    translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
    print("翻訳結果:", translation)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


日本の四季について短い文章を書いてください。

1. 四季

2. 季節

3. 季節感

4. 季節感

5. 季節感

6. 季節感

7. 季節感

8. 季節感

9. 季節感

10. 季節感

11. 季節感

12. 季節感

13. 季節感

14. 季節感

15. 季節感

16. 季節感

17. 季節感

18. 季節感

19. 季節感

20. 季節感

21


## 5. ファインチューニング

既存のモデルを特定のタスクに適応させるためのファインチューニングを紹介します。

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from tqdm import tqdm
import torch.nn.functional as F

# 軽量なモデルを選択
model_name = "distilbert-base-uncased"
dataset_name = "sst2"  # 感情分析データセット

# データセットのロード
dataset = load_dataset(dataset_name)
train_dataset = dataset["train"].select(range(1000))  # 訓練データを1000件に制限
eval_dataset = dataset["validation"].select(range(200))  # 評価データを200件に制限

# トークナイザーとモデルのロード
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2).to("cpu")

# データの前処理
def tokenize_function(examples):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True, max_length=128)

# データセットをトークン化
train_tokenized = train_dataset.map(tokenize_function, batched=True)
eval_tokenized = eval_dataset.map(tokenize_function, batched=True)

# PyTorchデータセットに変換
train_tokenized.set_format("torch", columns=["input_ids", "attention_mask", "label"])
eval_tokenized.set_format("torch", columns=["input_ids", "attention_mask", "label"])

# データローダーの作成
train_dataloader = DataLoader(train_tokenized, batch_size=8, shuffle=True)
eval_dataloader = DataLoader(eval_tokenized, batch_size=16)

# オプティマイザーの設定
optimizer = AdamW(model.parameters(), lr=5e-5)

# シンプルな訓練ループ
num_epochs = 1
device = torch.device("cpu")

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    
    # 訓練ループ
    for batch in tqdm(train_dataloader, desc=f"Training Epoch {epoch+1}"):
        # ラベルを取り出す
        labels = batch.pop("label").to(device)
        # 残りの入力をモデルに渡す
        inputs = {k: v.to(device) for k, v in batch.items()}
        
        # モデルの順伝播
        outputs = model(**inputs)
        
        # 損失計算
        loss = F.cross_entropy(outputs.logits, labels)
        total_loss += loss.item()
        
        # 逆伝播と最適化
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    
    avg_train_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch+1} - Average training loss: {avg_train_loss:.4f}")
    
    # 評価ループ
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for batch in tqdm(eval_dataloader, desc="Evaluating"):
            # ラベルを取り出す
            labels = batch.pop("label").to(device)
            # 残りの入力をモデルに渡す
            inputs = {k: v.to(device) for k, v in batch.items()}
            
            # 予測
            outputs = model(**inputs)
            predictions = torch.argmax(outputs.logits, dim=-1)
            
            # 正解数をカウント
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    
    accuracy = correct / total
    print(f"Validation Accuracy: {accuracy:.4f}")

# モデルの保存
model.save_pretrained("./finetuned-sentiment-model")
tokenizer.save_pretrained("./finetuned-sentiment-model")

# ファインチューニングしたモデルで予測
test_text = "I really enjoyed this movie, it was fantastic!"
inputs = tokenizer(test_text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
print(f"Predicted class: {predicted_class}")  # 1=positive、0=negative

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|██████████| 200/200 [00:00<00:00, 16168.94 examples/s]
Training Epoch 1: 100%|██████████| 125/125 [00:56<00:00,  2.23it/s]


Epoch 1 - Average training loss: 0.4873


Evaluating: 100%|██████████| 13/13 [00:02<00:00,  5.66it/s]


Validation Accuracy: 0.7700
Predicted class: 1


### ファインチューニング済みモデルの使用

In [21]:
# ファインチューニングしたモデルをロード
fine_tuned_model = AutoModelForSequenceClassification.from_pretrained("./finetuned-sentiment-model")
fine_tuned_tokenizer = AutoTokenizer.from_pretrained("./finetuned-sentiment-model")

# モデルを使って予測
from transformers import pipeline

sentiment_analysis = pipeline("sentiment-analysis", model=fine_tuned_model, tokenizer=fine_tuned_tokenizer, device="cpu")

# 予測の実行
test_sentences = [
    "I really enjoyed this movie. The plot was excellent!",
    "The service at this restaurant was terrible and the food was bland."
]

results = sentiment_analysis(test_sentences)
for sentence, result in zip(test_sentences, results):
    print(f"Sentence: {sentence}")
    print(f"Sentiment: {result['label']}, Score: {result['score']:.4f}\n")

Device set to use cpu


Sentence: I really enjoyed this movie. The plot was excellent!
Sentiment: LABEL_1, Score: 0.8711

Sentence: The service at this restaurant was terrible and the food was bland.
Sentiment: LABEL_0, Score: 0.9589



## 6. Hugging Face Hubとの連携

Hugging Face Hubから既存のモデルを利用する方法を示します。

In [23]:
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
import torch

# 1. 感情分析モデル 
try:
    print("感情分析モデルを使用した例：")
    sentiment_analyzer = pipeline("sentiment-analysis", 
                                model="distilbert-base-uncased-finetuned-sst-2-english", 
                                device="cpu")
    
    text = "I really enjoyed this movie!"
    result = sentiment_analyzer(text)
    print(f"テキスト: '{text}'")
    print(f"感情分析結果: {result[0]['label']}, スコア: {result[0]['score']:.4f}")
    print("-" * 50)
except Exception as e:
    print(f"感情分析モデルのロードエラー: {e}")

# 2. テキスト生成モデル 
try:
    print("\nテキスト生成モデルを使用した例：")
    model_name = "distilgpt2"
    
    # モデルとトークナイザーをロード
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    
    # テキスト生成パイプライン
    generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device="cpu")
    
    # テキスト生成
    prompt = "Artificial intelligence will"
    result = generator(prompt, max_length=30, num_return_sequences=1)
    
    print(f"プロンプト: '{prompt}'")
    print(f"生成テキスト: '{result[0]['generated_text']}'")
    print("-" * 50)
except Exception as e:
    print(f"テキスト生成モデルのロードエラー: {e}")

# 3. 穴埋めモデル 
try:
    print("\n穴埋めモデルを使用した例：")
    unmasker = pipeline('fill-mask', model='distilbert-base-uncased', device="cpu")
    
    text = "The goal of artificial intelligence is to [MASK] human tasks."
    results = unmasker(text)
    
    print(f"元のテキスト: '{text}'")
    print("予測された穴埋め:")
    for i, result in enumerate(results[:3], 1):
        print(f"{i}. '{result['sequence']}' (スコア: {result['score']:.4f})")
    print("-" * 50)
except Exception as e:
    print(f"穴埋めモデルのロードエラー: {e}")

# 4. 代替として、すでにダウンロード済みのモデルを使用する方法
print("\nローカルにダウンロード済みのモデルを使用する方法:")
print("""
# ダウンロード済みのモデルを使用
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "./finetuned-sentiment-model"  # ローカルのパス
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# モデルを使用
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
result = classifier("I love this product!")
print(result)
""")

感情分析モデルを使用した例：


Device set to use cpu


テキスト: 'I really enjoyed this movie!'
感情分析結果: POSITIVE, スコア: 0.9999
--------------------------------------------------

テキスト生成モデルを使用した例：


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


プロンプト: 'Artificial intelligence will'
生成テキスト: 'Artificial intelligence will soon become even easier to understand than artificial intelligence. AI can become the next smart person that I am. It will be able to'
--------------------------------------------------

穴埋めモデルを使用した例：


Device set to use cpu


元のテキスト: 'The goal of artificial intelligence is to [MASK] human tasks.'
予測された穴埋め:
1. 'the goal of artificial intelligence is to perform human tasks.' (スコア: 0.3227)
2. 'the goal of artificial intelligence is to accomplish human tasks.' (スコア: 0.1006)
3. 'the goal of artificial intelligence is to solve human tasks.' (スコア: 0.0902)
--------------------------------------------------

ローカルにダウンロード済みのモデルを使用する方法:

# ダウンロード済みのモデルを使用
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "./finetuned-sentiment-model"  # ローカルのパス
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# モデルを使用
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
result = classifier("I love this product!")
print(result)



## まとめ

多くのHugging Faceはモデルを簡単に使用できることが分かったかと思います。大きなモデルではなく小さなモデルを選んでも、バッチサイズやシーケンス長を調整することで、限られたリソースでも十分な性能を発揮できます。

さらに学習を深めたい場合は、[Hugging Face Hub](https://huggingface.co/models)で「distil」や「small」などのキーワードで検索して、軽量モデルを探してみることをお勧めします。