# 09wk-2: Understanding `model` inputs, practicing `model` usage

최규빈  
2024-11-09

<a href="https://colab.research.google.com/github/guebin/MP2024/blob/main/posts/09wk-2.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" style="text-align: left"></a>

# 1. Lecture video

<https://youtu.be/playlist?list=PLQqh36zP38-xsRgsXpLEUNWClxLi0N9Mk&si=Lssxzw70RRT1adiJ>

# 2. Imports

In [2]:
import transformers
import datasets
import huggingface_hub
import torch
import torchvision
import pytorchvideo.data
import PIL
import tarfile
import mp2024pkg as mp


# 3. Understanding model inputs

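`-` Try the following methods, in this order:

-   Method 1: check the signature shown by `model.forward?`
-   Method 2: check the usage examples shown by `model.forward?`
-   Method 3: consult external material on the internet (official
    documentation, official tutorials, trustworthy blogs, ChatGPT, etc.)
-   Method 4: read through all of the code with `model.forward??` \<- don't do this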

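`-` There is no guaranteed way to find out what form a model's input
must take.

-   Methods 1, 2, and 3 depend on the goodwill of others (someone has to have documented the model well).
-   Method 4 is practically impossible.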
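
Method 1 can also be done programmatically. The sketch below uses the standard-library `inspect.signature` on a stand-in class. `ToyModel` and its parameters are hypothetical, chosen only to resemble a Hugging Face `forward` signature so that the example runs without downloading anything.

```python
import inspect
from typing import Optional


class ToyModel:
    """Hypothetical stand-in for a model; not a real transformers class."""

    def forward(
        self,
        input_ids: Optional[list] = None,
        attention_mask: Optional[list] = None,
        labels: Optional[list] = None,
    ):
        return {"input_ids": input_ids, "labels": labels}


model = ToyModel()

# inspect.signature reads the parameter names and defaults of a callable,
# the same information that `model.forward?` prints under "Signature:".
signature = inspect.signature(model.forward)
param_names = list(signature.parameters)
print(param_names)

# Parameters whose default is None are optional inputs.
optional = [
    name for name, p in signature.parameters.items() if p.default is None
]
print(optional)
```

The same two lines (`inspect.signature(...)` and `.parameters`) should work on a real model's `forward`, since it is an ordinary Python method.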

`# Example 1` – text classification

In [3]:
model1 = transformers.AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=2
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

*Basic model information (config)*

In [4]:
model1.config

-   max_position_embeddings: 512

*Understanding the model input*

In [13]:
model1.forward? 

Signature:
model1.forward(
    input_ids: Optional[torch.Tensor] = None,
    attention_mask: Optional[torch.Tensor] = None,
    head_mask: Optional[torch.Tensor] = None,
    inputs_embeds: Optional[torch.Tensor] = None,
    labels: Optional[torch.LongTensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[transformers.modeling_outputs.SequenceClassifierOutput, Tuple[torch.Tensor, ...]]
Docstring:
The [`DistilBertForSequenceClassification`] forward method, overrides the `__call__` special method.

<Tip>

Although the recipe for forward pass needs to be defined within this function, one should call the [`Module`]
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.

</Tip>

Args:
    input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
        Indices of input sequence 

*Usage example 1 – keyword arguments, with loss*

In [36]:
model1(
    input_ids = torch.tensor([[1,2,3,4], [2,3,4,5]]),
    labels = torch.tensor([0,0])
)

*Usage example 2 – `**dictionary`, with loss*

In [47]:
model1_input = dict(
    input_ids = torch.tensor([[1,2,3,4], [2,3,4,5]]),
    labels = torch.tensor([0,0])
)
model1(**model1_input)

*Usage example 3 – keyword arguments, without loss*

In [49]:
model1(
    input_ids = torch.tensor([[1,2,3,4], [2,3,4,5]])
)

*Usage example 4 – `**dictionary`, without loss*

In [51]:
model1_input = dict(
    input_ids = torch.tensor([[1,2,3,4], [2,3,4,5]])
)
model1(**model1_input)

*Usage example 5 – simplest form, without loss*

In [52]:
model1(torch.tensor([[1,2,3,4], [2,3,4,5]]))

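> In usage examples 1 to 5, `model1.forward()` can be used in place of `model1()`. The results are the same here, although the docstring above recommends calling the model instance, since that also runs the pre- and post-processing steps.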
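
The five calling patterns differ only in how the arguments reach `forward`. The sketch below uses a hypothetical `toy_forward` function (not the real model) to show that listing keyword arguments and unpacking a dictionary with `**` are equivalent, and that a loss can only come back when `labels` is supplied. The "prediction" and "loss" are dummies that exist only to make this point.

```python
def toy_forward(input_ids=None, labels=None):
    """Hypothetical stand-in for a model's forward; not the real computation."""
    # Dummy "prediction": parity of the sum of each row of token ids.
    preds = [sum(row) % 2 for row in input_ids]
    output = {"preds": preds}
    if labels is not None:
        # A loss is only computed when labels are provided.
        wrong = sum(p != y for p, y in zip(preds, labels))
        output["loss"] = wrong / len(labels)
    return output


ids = [[1, 2, 3, 4], [2, 3, 4, 5]]

# Patterns 1 and 2: keyword arguments vs. dictionary unpacking, with labels.
out1 = toy_forward(input_ids=ids, labels=[0, 0])
out2 = toy_forward(**dict(input_ids=ids, labels=[0, 0]))

# Patterns 3 and 4: the same two styles, without labels.
out3 = toy_forward(input_ids=ids)
out4 = toy_forward(**dict(input_ids=ids))

# Pattern 5: positional; works because input_ids is the first parameter.
out5 = toy_forward(ids)

print("loss" in out1, "loss" in out3)
```

Pattern 5 depends on the first parameter of `forward` being the main input, which is why checking the signature first is worthwhile.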

`#`

`# Example 2` – image classification

In [56]:
model2 = transformers.AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=3 # arbitrarily set to 3; no particular reason
)

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

*Basic model information (config)*

In [62]:
model2.config

-   image_size: 224

In [63]:
mp.tab(model2.config)

In [65]:
model2.config.num_channels

*Understanding the model input*

In [68]:
torch.randn(2,3,64,64)

*Usage example 1 – keyword arguments, with loss*

In [78]:
torch.random.manual_seed(42)
model2(
    pixel_values = torch.randn(2,3,224,224),
    labels = torch.tensor([0,1])
)

*Usage example 2 – `**dictionary`, with loss*

In [79]:
torch.random.manual_seed(42)
model2_input = dict(
    pixel_values = torch.randn(2,3,224,224),
    labels = torch.tensor([0,1])
)
model2(**model2_input)

*Usage example 3 – keyword arguments, without loss*

In [80]:
torch.random.manual_seed(42)
model2(
    pixel_values = torch.randn(2,3,224,224)
)

*Usage example 4 – `**dictionary`, without loss*

In [81]:
torch.random.manual_seed(42)
model2_input = dict(
    pixel_values = torch.randn(2,3,224,224),
)
model2(**model2_input)

*Usage example 5 – simplest form, without loss*

In [82]:
torch.random.manual_seed(42)
model2(torch.randn(2,3,224,224))
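
The config values above (`image_size: 224` and `model2.config.num_channels`) determine the shape `pixel_values` should have: `(batch, channels, height, width)`. The check below sketches that rule in plain Python. `check_pixel_values_shape` is a hypothetical helper, and `num_channels=3` is an assumption taken from the usage examples, since the cell output is not shown in this note.

```python
def check_pixel_values_shape(shape, num_channels=3, image_size=224):
    """Return a list of problems with a (batch, channels, height, width) shape."""
    if len(shape) != 4:
        return [f"expected 4 dimensions, got {len(shape)}"]
    problems = []
    _, channels, height, width = shape
    if channels != num_channels:
        problems.append(f"channels={channels}, expected {num_channels}")
    if (height, width) != (image_size, image_size):
        problems.append(
            f"size={height}x{width}, expected {image_size}x{image_size}"
        )
    return problems


ok = check_pixel_values_shape((2, 3, 224, 224))   # matches the config
bad = check_pixel_values_shape((2, 3, 64, 64))    # wrong height and width
print(ok)
print(bad)
```

The batch dimension is deliberately left unchecked, because any batch size is acceptable.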

`#`

`# Example 3` – video classification

In [83]:
model3 = transformers.VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
)

Some weights of VideoMAEForVideoClassification were not initialized from the model checkpoint at MCG-NJU/videomae-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

*Basic model information (config)*

In [85]:
model3.config

-   image_size: 224
-   num_frames: 16

*Understanding the model input*

In [86]:
model3.forward?

Signature:
model3.forward(
    pixel_values: Optional[torch.Tensor] = None,
    head_mask: Optional[torch.Tensor] = None,
    labels: Optional[torch.Tensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    return_dict: Optional[bool] = None,
) -> Union[Tuple, transformers.modeling_outputs.ImageClassifierOutput]
Docstring:
The [`VideoMAEForVideoClassification`] forward method, overrides the `__call__` special method.

<Tip>

Although the recipe for forward pass needs to be defined within this function, one should call the [`Module`]
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.

</Tip>

Args:
    pixel_values (`torch.FloatTensor` of shape `(batch_size, num_frames, num_channels, height, width)`):
        Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See
        [`VideoMAEImageProcessor.__call__`] for de

*Usage example 1 – keyword arguments, with loss*

In [None]:
torch.random.manual_seed(42)
model3(
    pixel_values = torch.randn(4,16,3,224,224),
    labels = torch.tensor([0,1,0,1])
)

*Usage example 2 – `**dictionary`, with loss*

In [88]:
torch.random.manual_seed(42)
model3_input = dict(
    pixel_values = torch.randn(4,16,3,224,224),
    labels = torch.tensor([0,1,0,1])
)
model3(**model3_input)

*Usage example 3 – keyword arguments, without loss*

In [89]:
torch.random.manual_seed(42)
model3(
    pixel_values = torch.randn(4,16,3,224,224)
)

*Usage example 4 – `**dictionary`, without loss*

In [90]:
torch.random.manual_seed(42)
model3_input = dict(
    pixel_values = torch.randn(4,16,3,224,224),
)
model3(**model3_input)

*Usage example 5 – simplest form, without loss*

In [92]:
torch.random.manual_seed(42)
model3(torch.randn(4,16,3,224,224))

`#`

# 4. Model usage practice

## A. Text

In [2]:
model1 = transformers.AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=2
)
tokenizer = transformers.AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`# Example 1` – imdb

In [3]:
imdb = datasets.load_dataset('imdb')

In [4]:
d = imdb['train'].select(range(3))
d

`(Solution 1)`

*Failure*

In [5]:
model1.forward(torch.tensor(tokenizer(d['text'])['input_ids']))

*Cause analysis*

In [6]:
mp.show_list(
    tokenizer(d['text'])['input_ids']
)

Level 1 - Type: list, Length: 3, Content: [[101, 1045, 12524, 1045, ... // ... , 7987, 1013, 1028, 102]]
     Level 2 - Type: list, Length: 363, Content: [101, 1045, 12524, 1045,  ... // ... 7, 1037, 5436, 1012, 102]
     Level 2 - Type: list, Length: 304, Content: [101, 1000, 1045, 2572, 8 ... // ... 5, 1055, 4230, 1012, 102]
     Level 2 - Type: list, Length: 133, Content: [101, 2065, 2069, 2000, 4 ... // ... 6, 7987, 1013, 1028, 102]

In [7]:
mp.show_list(
    tokenizer(d['text'], padding=True)['input_ids']
)

Level 1 - Type: list, Length: 3, Content: [[101, 1045, 12524, 1045, ... // ...  0, 0, 0, 0, 0, 0, 0, 0]]
     Level 2 - Type: list, Length: 363, Content: [101, 1045, 12524, 1045,  ... // ... 7, 1037, 5436, 1012, 102]
     Level 2 - Type: list, Length: 363, Content: [101, 1000, 1045, 2572, 8 ... // ... , 0, 0, 0, 0, 0, 0, 0, 0]
     Level 2 - Type: list, Length: 363, Content: [101, 2065, 2069, 2000, 4 ... // ... , 0, 0, 0, 0, 0, 0, 0, 0]

*Success*

In [40]:
model1(torch.tensor(tokenizer(d['text'], padding=True)['input_ids']))
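
The failure comes from ragged lists (lengths 363, 304, and 133), which cannot form a rectangular tensor. `padding=True` fixes this by extending every sequence to the longest length with the pad id `0`, as the second output above shows. Below is a plain-Python sketch of the idea, not the tokenizer's actual implementation. `pad_to_longest` is a hypothetical helper and the token ids are made up.

```python
def pad_to_longest(sequences, pad_id=0):
    """Right-pad every sequence with pad_id up to the longest length."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (longest - len(seq)) for seq in sequences]


# Three made-up token-id lists of different lengths.
ragged = [[101, 7, 8, 9, 102], [101, 7, 102], [101, 102]]

padded = pad_to_longest(ragged)
lengths = [len(seq) for seq in padded]
print(padded)
print(lengths)
```

Once every row has the same length, the nested list can be turned into a tensor, which is what the successful call relies on.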

`(Solution 2)`

In [47]:
model1(tokenizer(d['text'], padding=True, return_tensors="pt")['input_ids'])

`#`

`# Example 2` – emotion

In [48]:
emotion = datasets.load_dataset('emotion')

In [49]:
d = emotion['train'].select(range(3))
d

`(Solution)`

In [56]:
model1(torch.tensor(tokenizer(d['text'],padding=True)['input_ids']))

`#`

`# Example 3` – MBTI

In [9]:
# download the mbti_1.csv file
!wget https://raw.githubusercontent.com/guebin/MP2024/refs/heads/main/posts/mbti_1.csv

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

--2024-11-10 01:07:26--  https://raw.githubusercontent.com/guebin/MP2024/refs/heads/main/posts/mbti_1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 62856486 (60M) [text/plain]
Saving to: ‘mbti_1.csv’


2024-11-10 01:07:31 (108 MB/s) - ‘mbti_1.csv’ saved [62856486/62856486]


In [99]:
d = datasets.Dataset.from_csv("mbti_1.csv").select(range(3))
d

`(Solution 1)`

*Failure*

In [17]:
model1(torch.tensor(tokenizer(d['posts'],padding=True)['input_ids']))

*Cause analysis*

In [21]:
mp.show_list(
    tokenizer(d['posts'],padding=True)['input_ids']
)

Level 1 - Type: list, Length: 3, Content: [[101, 1005, 8299, 1024,  ... // ...  0, 0, 0, 0, 0, 0, 0, 0]]
     Level 2 - Type: list, Length: 2102, Content: [101, 1005, 8299, 1024, 1 ... // ... , 0, 0, 0, 0, 0, 0, 0, 0]
     Level 2 - Type: list, Length: 2102, Content: [101, 1005, 1045, 1005, 1 ... // ... 2, 1012, 1012, 1005, 102]
     Level 2 - Type: list, Length: 2102, Content: [101, 1005, 2204, 2028, 1 ... // ... , 0, 0, 0, 0, 0, 0, 0, 0]

In [22]:
mp.show_list(
    tokenizer(d['posts'],truncation=True)['input_ids']
)

Level 1 - Type: list, Length: 3, Content: [[101, 1005, 8299, 1024,  ... // ... , 7834, 1012, 2077, 102]]
     Level 2 - Type: list, Length: 512, Content: [101, 1005, 8299, 1024, 1 ... // ... 1, 4127, 2017, 2215, 102]
     Level 2 - Type: list, Length: 512, Content: [101, 1005, 1045, 1005, 1 ... // ... 6, 2600, 3259, 2028, 102]
     Level 2 - Type: list, Length: 512, Content: [101, 1005, 2204, 2028, 1 ... // ... 0, 7834, 1012, 2077, 102]

*Success*

In [31]:
model1(torch.tensor(tokenizer(d['posts'],truncation=True)['input_ids']))
#model1(tokenizer(d['posts'],truncation=True,return_tensors="pt")['input_ids'])
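
In this example `padding=True` alone is not enough: the longest post tokenizes to 2102 ids, well beyond the `max_position_embeddings: 512` noted earlier. `truncation=True` cuts each sequence to 512 while keeping the closing `102` token, as the output above shows. Because all three posts exceed the limit, they all end up with length 512 and no padding is needed. Below is a sketch of the idea with a hypothetical `truncate` helper, not the tokenizer's implementation. The ids in between are made up.

```python
def truncate(ids, max_length=512, sep_id=102):
    """Cut ids to max_length, keeping the final separator token."""
    if len(ids) <= max_length:
        return list(ids)
    # Keep the first max_length - 1 ids and re-append the separator.
    return list(ids[: max_length - 1]) + [sep_id]


# 2102 made-up ids, the same length as the longest post above.
long_ids = [101] + list(range(1000, 3100)) + [102]
short_ids = [101, 2000, 2001, 102]

cut = truncate(long_ids)
print(len(long_ids), len(cut), cut[0], cut[-1])
print(truncate(short_ids) == short_ids)
```

Truncation discards everything past the limit, so it trades information for compatibility with the pretrained model.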

`(Solution 2)` *– changing the model configuration (the technique used in
Quiz 5, in the example that changed the model's number of frames to 4)*

*Loading the config of distilbert/distilbert-base-uncased*

In [80]:
config = transformers.AutoConfig.from_pretrained(
    "distilbert/distilbert-base-uncased"
)
config

*Changing a config value*

In [84]:
config.max_position_embeddings = 2200

*Building a model from the config*

In [86]:
model1_large = transformers.AutoModelForSequenceClassification.from_config(
    config=config
)

*Using the model*

In [88]:
model1_large(torch.tensor(tokenizer(d['posts'],padding=True)['input_ids']))

In [102]:
model1_large(**tokenizer(d['posts'],padding=True,return_tensors="pt"))
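
The length limit exists because the model looks up one learned vector per position, and that lookup table has `max_position_embeddings` rows. A position index beyond the table has nothing to look up. The sketch below imitates this with a plain list. `ToyPositionTable` is hypothetical and is not the DistilBERT code. Note also that `from_config` builds a model with freshly initialized weights rather than loading the pretrained ones, so `model1_large` is fine for checking that the shapes work but would need training before its predictions mean anything.

```python
class ToyPositionTable:
    """Hypothetical stand-in for a learned position-embedding table."""

    def __init__(self, max_position_embeddings):
        # One (dummy) vector per position the model was built to handle.
        self.rows = [[float(i)] for i in range(max_position_embeddings)]

    def lookup(self, seq_len):
        # Positions 0 .. seq_len-1 each need their own row.
        return [self.rows[pos] for pos in range(seq_len)]


small = ToyPositionTable(512)    # like the original config
large = ToyPositionTable(2200)   # like the modified config

vectors = large.lookup(2102)     # fits, since 2102 <= 2200
try:
    small.lookup(2102)           # position 512 has no row
    failed = False
except IndexError:
    failed = True

print(len(vectors), failed)
```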

`#`

`# Example 4` – sms_spam

In [89]:
sms_spam = datasets.load_dataset('sms_spam')['train'].train_test_split(test_size=0.2, seed=42)
sms_spam

In [90]:
d = sms_spam['train'].select(range(3))
d

`(Solution)`

In [97]:
model1(**tokenizer(d['sms'],padding=True,return_tensors="pt"))
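
Unlike the earlier calls, which passed only `input_ids`, `**tokenizer(...)` forwards every field the tokenizer returns. For this tokenizer that includes an `attention_mask`, which marks real tokens with 1 and padding with 0 so that the model can ignore the padded positions. The sketch below shows how such a mask relates to padded ids. `attention_mask_from_padded` is a hypothetical helper that assumes pad id 0, as in the outputs above. The real tokenizer derives the mask from the original lengths rather than by comparing ids.

```python
def attention_mask_from_padded(padded_ids, pad_id=0):
    """Return 1 for real tokens and 0 for padding, row by row."""
    return [[0 if tok == pad_id else 1 for tok in row] for row in padded_ids]


# Made-up padded token ids: the second row has two pad positions.
padded_ids = [[101, 7, 8, 9, 102], [101, 7, 102, 0, 0]]

mask = attention_mask_from_padded(padded_ids)

# The dictionary mirrors what gets unpacked into the model call.
batch = {"input_ids": padded_ids, "attention_mask": mask}
print(batch["attention_mask"])
```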

`#`

## B. Image

In [107]:
model2 = transformers.AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=3 # arbitrarily set to 3; no particular reason
)

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`# Example 1` – food101

In [173]:
d = datasets.load_dataset("food101", split="train[:4]")
d

`(Preliminary study)` – Can the transforms provided by
`torchvision.transforms` process a whole batch at once?

In [174]:
to_tensor = torchvision.transforms.ToTensor()

In [175]:
to_tensor(d['image'][0])

In [176]:
to_tensor(d['image'])

`(Solution)`

In [177]:
compose = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Resize((224,224))
])

In [178]:
torch.stack(list(map(compose,d['image'])),axis=0).shape

In [181]:
model2.forward(
    torch.stack(list(map(compose,d['image'])),axis=0)
)
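
`to_tensor` handles one image at a time, so passing the whole list `d['image']` is expected to fail. The solution applies the transform to each image with `map` and then stacks the results into one batch. The sketch below tracks only shapes to show the pattern. `ToyImage` and the two shape functions are hypothetical stand-ins for PIL images and the torchvision transforms, and the image sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class ToyImage:
    """Hypothetical stand-in for a PIL image; only the size is kept."""
    width: int
    height: int


def to_shape(img):
    # Single-item transform: accepts exactly one image, not a list.
    if not isinstance(img, ToyImage):
        raise TypeError(f"expected a single image, got {type(img).__name__}")
    return (3, img.height, img.width)


def resize_shape(shape, size):
    # Resizing keeps the channel count and replaces height and width.
    return (shape[0], size[0], size[1])


def compose(img):
    return resize_shape(to_shape(img), (224, 224))


images = [ToyImage(512, 384), ToyImage(300, 300), ToyImage(640, 480), ToyImage(224, 224)]

try:
    to_shape(images)          # the whole list at once
    batch_failed = False
except TypeError:
    batch_failed = True

shapes = list(map(compose, images))        # one image at a time
batch_shape = (len(shapes),) + shapes[0]   # what stacking would produce
print(batch_failed, batch_shape)
```

Resizing before stacking matters: the per-image shapes must agree before they can be combined into one batch.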

`#`

`# Example 2` – beans

In [183]:
beans = datasets.load_dataset('beans')
d = beans['train'].select(range(4))
d

`(Solution)`

In [187]:
model2(torch.stack(list(map(compose,d['image'])),axis=0))

`#`

## C. Video

In [3]:
model3 = transformers.VideoMAEForVideoClassification.from_pretrained(
    "MCG-NJU/videomae-base",
)

Some weights of VideoMAEForVideoClassification were not initialized from the model checkpoint at MCG-NJU/videomae-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

`# Example 1` – UCF101_subset

In [4]:
file_path = huggingface_hub.hf_hub_download(
    repo_id="sayakpaul/ucf101-subset",
    filename="UCF101_subset.tar.gz",
    repo_type="dataset"
)
# file_path holds, as a string, the path and file name of the downloaded archive.
with tarfile.open(file_path) as t:
    t.extractall("./data") # "." here means the current directory

In [5]:
mp.tree("./data")

└── UCF101_subset
    ├── test
    │   ├── ApplyEyeMakeup
    │   │   ├── UCF101
    │   │   ├── v_ApplyEyeMakeup_g03_c01.avi
    │   │   └── ...
    │   │   └── v_ApplyEyeMakeup_g23_c06.avi
    │   ├── ApplyLipstick
    │   │   ├── UCF101
    │   │   ├── v_ApplyLipstick_g14_c01.avi
    │   │   └── ...
    │   │   └── v_ApplyLipstick_g16_c04.avi
    │   └── ...
    │   └── BenchPress
    │       ├── UCF101
    │       ├── v_BenchPress_g05_c02.avi
    │       └── ...
    │       └── v_BenchPress_g25_c06.avi
    ├── train
    │   ├── ApplyEyeMakeup
    │   │   ├── UCF101
    │   │   ├── v_ApplyEyeMakeup_g02_c03.avi
    │   │   └── ...
    │   │   └── v_ApplyEyeMakeup_g25_c07.avi
    │   ├── ApplyLipstick
    │   │   ├── UCF101
    │   │   ├── v_ApplyLipstick_g01_c02.avi
    │   │   └── ...
    │   │   └── v_ApplyLipstick_g24_c05.avi
    │   └── ...
    │   └── BenchPress
    │       ├── UCF101
    │       ├── v_BenchPress_g01_c05.avi
    │       └── ...
    │       └── v_BenchPress_g24_c

In [6]:
video_path = "./data/UCF101_subset/test/BenchPress/v_BenchPress_g05_c02.avi"
video = pytorchvideo.data.encoded_video.EncodedVideo.from_path(video_path).get_clip(0, float('inf'))['video']
video.shape

`(Solution)`

In [12]:
model3(
    #video.permute(1,0,2,3)[:16,:,:224,:224].unsqueeze(0)
    #video.permute(1,0,2,3)[:16,:,:224,:224].reshape(1,16,3,224,224)
    torch.stack([video.permute(1,0,2,3)[:16,:,:224,:224]])
)
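
According to the docstring shown earlier, the model expects `(batch_size, num_frames, num_channels, height, width)`. The sketch below follows the shape through each step of the solution in plain Python. It assumes the loaded clip is laid out as `(channels, frames, height, width)`, and the concrete numbers in `video_shape` are made up, since the output of `video.shape` is not shown here. `permute_shape` and `crop_shape` are hypothetical helpers.

```python
def permute_shape(shape, order):
    """Shape after reordering the axes, as tensor.permute(*order) would."""
    return tuple(shape[i] for i in order)


def crop_shape(shape, limits):
    """Shape after slicing each axis with [:limit]; None means keep all."""
    return tuple(
        dim if lim is None else min(dim, lim)
        for dim, lim in zip(shape, limits)
    )


# Assumed layout (channels, frames, height, width); numbers are illustrative.
video_shape = (3, 100, 240, 320)

frames_first = permute_shape(video_shape, (1, 0, 2, 3))    # frames before channels
clipped = crop_shape(frames_first, (16, None, 224, 224))   # [:16, :, :224, :224]
batched = (1,) + clipped                                   # add the batch axis

print(frames_first)
print(clipped)
print(batched)
```

The three alternatives in the cell (`unsqueeze(0)`, `reshape(1,16,3,224,224)`, and `torch.stack([...])`) all add the same leading batch axis. The slice keeps the first 16 frames and the top-left 224×224 region, which satisfies the shape requirement but is not a resize or a uniform frame sampling.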