### CBOW (Continuous Bag-of-Words)

- Trains by predicting the center word from its surrounding (context) words.
- Each context word's one-hot vector is projected through the embedding layer to obtain its embedding vector, and these embeddings are combined by element-wise addition. A linear transformation then maps the sum to a vector of the same size as the center word's one-hot vector (the vocabulary size), and the loss is computed against the center word's one-hot vector.
- Example: "A cute puppy is walking in the park." with window size 2
  - Input (context words): "A", "cute", "is", "walking"
  - Output (center word): "puppy"
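
The windowing in the example above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `cbow_pairs` is a hypothetical helper, whitespace splitting stands in for a real tokenizer, and only centers with a full window on both sides are kept.

```python
def cbow_pairs(tokens, window_size=2):
    """Return (context_words, center_word) pairs for CBOW-style training."""
    pairs = []
    for i, center in enumerate(tokens):
        # Assumption: skip centers that lack a full window on either side.
        if i - window_size >= 0 and i + window_size < len(tokens):
            context = tokens[i - window_size:i] + tokens[i + 1:i + window_size + 1]
            pairs.append((context, center))
    return pairs


tokens = "A cute puppy is walking in the park".split()
pairs = cbow_pairs(tokens)
print(pairs[0])    # (['A', 'cute', 'is', 'walking'], 'puppy')
print(len(pairs))  # 4
```

Each pair becomes one training sample: the context words are the input and the center word is the target.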

### Skip-gram

- Trains by predicting the surrounding (context) words from the center word.
- The center word's one-hot vector is projected through the embedding layer to obtain its embedding vector. A linear transformation then maps that vector to the same size as the one-hot vectors of the context words (the vocabulary size), and the loss is computed separately against each context word's one-hot vector.
- Example: "A cute puppy is walking in the park." with window size 2
  - Input (center word): "puppy"
  - Output (context words): "A", "cute", "is", "walking"
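
For Skip-gram, the same window produces one (center, context) sample per context word. The sketch below is a plain-Python illustration, not the notebook's implementation: `skipgram_pairs` is a hypothetical helper, whitespace splitting stands in for a real tokenizer, and only centers with a full window on both sides are kept.

```python
def skipgram_pairs(tokens, window_size=2):
    """Return (center_word, context_word) pairs for Skip-gram-style training."""
    pairs = []
    for i, center in enumerate(tokens):
        # Assumption: skip centers that lack a full window on either side.
        if i - window_size >= 0 and i + window_size < len(tokens):
            context = tokens[i - window_size:i] + tokens[i + 1:i + window_size + 1]
            pairs.extend((center, word) for word in context)
    return pairs


tokens = "A cute puppy is walking in the park".split()
pairs = skipgram_pairs(tokens)
print(pairs[:4])
# [('puppy', 'A'), ('puppy', 'cute'), ('puppy', 'is'), ('puppy', 'walking')]
print(len(pairs))  # 16
```

With window size 2, every valid center contributes four samples, so Skip-gram sees four times as many samples as CBOW on the same text.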

## **2. Word2Vec**
1. Convert the given words into a form that the word2vec model can consume.
2. Implement the CBOW and Skip-gram models.
3. Train the models and inspect the results.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### **Import required packages**

In [4]:
!pip install konlpy



In [5]:
from tqdm import tqdm
from konlpy.tag import Okt
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from collections import defaultdict

import torch
import copy
import numpy as np

### **Data preprocessing**

Inspect the data and preprocess it into the format Word2Vec expects.  
The training data is the same as in exercise 1, and we assume the following words for testing.

In [6]:
train_data = [
  "정말 맛있습니다. 추천합니다.",
  "기대했던 것보단 별로였네요.",
  "다 좋은데 가격이 너무 비싸서 다시 가고 싶다는 생각이 안 드네요.",
  "완전 최고입니다! 재방문 의사 있습니다.",
  "음식도 서비스도 다 만족스러웠습니다.",
  "위생 상태가 좀 별로였습니다. 좀 더 개선되기를 바랍니다.",
  "맛도 좋았고 직원분들 서비스도 너무 친절했습니다.",
  "기념일에 방문했는데 음식도 분위기도 서비스도 다 좋았습니다.",
  "전반적으로 음식이 너무 짰습니다. 저는 별로였네요.",
  "위생에 조금 더 신경 썼으면 좋겠습니다. 조금 불쾌했습니다."       
]

test_words = ["음식", "맛", "서비스", "위생", "가격"]

Tokenization and vocabulary construction are similar to the previous exercise.

In [7]:
tokenizer = Okt()

In [8]:
def make_tokenized(data):
  tokenized = []
  for sent in tqdm(data):
    tokens = tokenizer.morphs(sent, stem=True)
    tokenized.append(tokens)

  return tokenized

In [9]:
train_tokenized = make_tokenized(train_data)

100%|██████████| 10/10 [00:05<00:00,  1.81it/s]


In [10]:
word_count = defaultdict(int)

for tokens in tqdm(train_tokenized):
  for token in tokens:
    word_count[token] += 1

100%|██████████| 10/10 [00:00<00:00, 13516.93it/s]


In [11]:
word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
print(list(word_count))

[('.', 14), ('도', 7), ('이다', 4), ('좋다', 4), ('별로', 3), ('다', 3), ('이', 3), ('너무', 3), ('음식', 3), ('서비스', 3), ('하다', 2), ('방문', 2), ('위생', 2), ('좀', 2), ('더', 2), ('에', 2), ('조금', 2), ('정말', 1), ('맛있다', 1), ('추천', 1), ('기대하다', 1), ('것', 1), ('보단', 1), ('가격', 1), ('비싸다', 1), ('다시', 1), ('가다', 1), ('싶다', 1), ('생각', 1), ('안', 1), ('드네', 1), ('요', 1), ('완전', 1), ('최고', 1), ('!', 1), ('재', 1), ('의사', 1), ('있다', 1), ('만족스럽다', 1), ('상태', 1), ('가', 1), ('개선', 1), ('되다', 1), ('기르다', 1), ('바라다', 1), ('맛', 1), ('직원', 1), ('분들', 1), ('친절하다', 1), ('기념일', 1), ('분위기', 1), ('전반', 1), ('적', 1), ('으로', 1), ('짜다', 1), ('저', 1), ('는', 1), ('신경', 1), ('써다', 1), ('불쾌하다', 1)]


In [12]:
w2i = {}
for pair in tqdm(word_count):
  if pair[0] not in w2i:
    w2i[pair[0]] = len(w2i)

100%|██████████| 60/60 [00:00<00:00, 97202.87it/s]


In [13]:
print(train_tokenized)
print(w2i)

[['정말', '맛있다', '.', '추천', '하다', '.'], ['기대하다', '것', '보단', '별로', '이다', '.'], ['다', '좋다', '가격', '이', '너무', '비싸다', '다시', '가다', '싶다', '생각', '이', '안', '드네', '요', '.'], ['완전', '최고', '이다', '!', '재', '방문', '의사', '있다', '.'], ['음식', '도', '서비스', '도', '다', '만족스럽다', '.'], ['위생', '상태', '가', '좀', '별로', '이다', '.', '좀', '더', '개선', '되다', '기르다', '바라다', '.'], ['맛', '도', '좋다', '직원', '분들', '서비스', '도', '너무', '친절하다', '.'], ['기념일', '에', '방문', '하다', '음식', '도', '분위기', '도', '서비스', '도', '다', '좋다', '.'], ['전반', '적', '으로', '음식', '이', '너무', '짜다', '.', '저', '는', '별로', '이다', '.'], ['위생', '에', '조금', '더', '신경', '써다', '좋다', '.', '조금', '불쾌하다', '.']]
{'.': 0, '도': 1, '이다': 2, '좋다': 3, '별로': 4, '다': 5, '이': 6, '너무': 7, '음식': 8, '서비스': 9, '하다': 10, '방문': 11, '위생': 12, '좀': 13, '더': 14, '에': 15, '조금': 16, '정말': 17, '맛있다': 18, '추천': 19, '기대하다': 20, '것': 21, '보단': 22, '가격': 23, '비싸다': 24, '다시': 25, '가다': 26, '싶다': 27, '생각': 28, '안': 29, '드네': 30, '요': 31, '완전': 32, '최고': 33, '!': 34, '재': 35, '의사': 36, '있다': 37, '만족스럽다': 38, '상태

To build the actual model inputs, we define `Dataset` classes.

In [14]:
class CBOWDataset(Dataset):
  def __init__(self, train_tokenized, window_size=2):
    self.x = []
    self.y = []

    for tokens in tqdm(train_tokenized):
      token_ids = [w2i[token] for token in tokens]
      for i, token_id in enumerate(token_ids):
        # Keep only centers that have a full window on both sides.
        if i-window_size >= 0 and i+window_size < len(token_ids):
          self.x.append(token_ids[i-window_size:i] + token_ids[i+1:i+window_size+1])
          self.y.append(token_id)

    self.x = torch.LongTensor(self.x)  # (number of samples, 2 * window_size)
    self.y = torch.LongTensor(self.y)  # (number of samples)

  def __len__(self):
    return self.x.shape[0]

  def __getitem__(self, idx):
    return self.x[idx], self.y[idx]

In [15]:
class SkipGramDataset(Dataset):
  def __init__(self, train_tokenized, window_size=2):
    self.x = []
    self.y = []

    for tokens in tqdm(train_tokenized):
      token_ids = [w2i[token] for token in tokens]
      for i, token_id in enumerate(token_ids):
        # Keep only centers that have a full window on both sides.
        if i-window_size >= 0 and i+window_size < len(token_ids):
          # One (center, context) sample per context word.
          self.y += (token_ids[i-window_size:i] + token_ids[i+1:i+window_size+1])
          self.x += [token_id] * 2 * window_size

    self.x = torch.LongTensor(self.x)  # (number of samples)
    self.y = torch.LongTensor(self.y)  # (number of samples)

  def __len__(self):
    return self.x.shape[0]

  def __getitem__(self, idx):
    return self.x[idx], self.y[idx]

Create a `Dataset` object for each model.

In [16]:
cbow_set = CBOWDataset(train_tokenized)
skipgram_set = SkipGramDataset(train_tokenized)
print(list(skipgram_set))

100%|██████████| 10/10 [00:00<00:00, 37249.59it/s]
100%|██████████| 10/10 [00:00<00:00, 19963.37it/s]

[(tensor(0), tensor(17)), (tensor(0), tensor(18)), (tensor(0), tensor(19)), (tensor(0), tensor(10)), (tensor(19), tensor(18)), (tensor(19), tensor(0)), (tensor(19), tensor(10)), (tensor(19), tensor(0)), (tensor(22), tensor(20)), (tensor(22), tensor(21)), (tensor(22), tensor(4)), (tensor(22), tensor(2)), (tensor(4), tensor(21)), (tensor(4), tensor(22)), (tensor(4), tensor(2)), (tensor(4), tensor(0)), (tensor(23), tensor(5)), (tensor(23), tensor(3)), (tensor(23), tensor(6)), (tensor(23), tensor(7)), (tensor(6), tensor(3)), (tensor(6), tensor(23)), (tensor(6), tensor(7)), (tensor(6), tensor(24)), (tensor(7), tensor(23)), (tensor(7), tensor(6)), (tensor(7), tensor(24)), (tensor(7), tensor(25)), (tensor(24), tensor(6)), (tensor(24), tensor(7)), (tensor(24), tensor(25)), (tensor(24), tensor(26)), (tensor(25), tensor(7)), (tensor(25), tensor(24)), (tensor(25), tensor(26)), (tensor(25), tensor(27)), (tensor(26), tensor(24)), (tensor(26), tensor(25)), (tensor(26), tensor(27)), (tensor(26), tens
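
The id pairs above are easier to check once they are mapped back to words. The sketch below is a minimal illustration: `i2w` is a hypothetical inverse vocabulary (the notebook does not define one), and `w2i` here is only the subset of the printed mapping needed for the first four pairs.

```python
# Subset of the w2i mapping printed earlier (only the ids needed here).
w2i = {".": 0, "하다": 10, "정말": 17, "맛있다": 18, "추천": 19}
i2w = {idx: word for word, idx in w2i.items()}  # hypothetical inverse vocabulary

# First four (center, context) id pairs from the Skip-gram output above.
pairs = [(0, 17), (0, 18), (0, 19), (0, 10)]
decoded = [(i2w[center], i2w[context]) for center, context in pairs]
print(decoded)
# [('.', '정말'), ('.', '맛있다'), ('.', '추천'), ('.', '하다')]
```

These correspond to the first training sentence, where the first period is the only-but-one center with a full window. When decoding items taken directly from `skipgram_set`, each id is a tensor, so it would need `.item()` before the dictionary lookup.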




### **Model class implementation**

Implement the two Word2Vec models in turn.  


*   `self.embedding`: a layer that embeds a one-hot vector of size `vocab_size` into `dim` dimensions.
*   `self.linear`: a layer that maps the embedding vector back to size `vocab_size`.
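
The two layers can be checked on toy sizes. The sketch below assumes a hypothetical vocabulary of 5 words and 3 embedding dimensions, and shows that an `nn.Embedding` lookup by id gives the same result as multiplying the one-hot vector by the embedding weight matrix, after which `nn.Linear` maps back to vocabulary-sized logits.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
vocab_size, dim = 5, 3  # toy sizes, chosen only for illustration
embedding = nn.Embedding(vocab_size, dim)
linear = nn.Linear(dim, vocab_size)

ids = torch.tensor([2, 0])                                # two word ids
lookup = embedding(ids)                                   # (2, dim)
one_hot = F.one_hot(ids, num_classes=vocab_size).float()  # (2, vocab_size)
projected = one_hot @ embedding.weight                    # (2, dim)

print(torch.allclose(lookup, projected))  # True
logits = linear(lookup)
print(logits.shape)  # torch.Size([2, 5])
```

This is why the models take integer ids rather than explicit one-hot vectors: the lookup is equivalent and avoids building the sparse input.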


In [18]:
class CBOW(nn.Module):
  def __init__(self, vocab_size, dim):
    super(CBOW, self).__init__()
    self.embedding = nn.Embedding(vocab_size, dim, sparse=True)
    self.linear = nn.Linear(dim, vocab_size)

  # B: batch size, W: window size, d_w: word embedding size, V: vocab size
  def forward(self, x):  # x: (B, 2W)
    embeddings = self.embedding(x)  # (B, 2W, d_w)
    embeddings = torch.sum(embeddings, dim=1)  # (B, d_w)
    output = self.linear(embeddings)  # (B, V)
    return output

In [19]:
class SkipGram(nn.Module):
  def __init__(self, vocab_size, dim):
    super(SkipGram, self).__init__()
    self.embedding = nn.Embedding(vocab_size, dim, sparse=True)
    self.linear = nn.Linear(dim, vocab_size)

  # B: batch size, W: window size, d_w: word embedding size, V: vocab size
  def forward(self, x): # x: (B)
    embeddings = self.embedding(x)  # (B, d_w)
    output = self.linear(embeddings)  # (B, V)
    return output

Instantiate the two models.

In [20]:
cbow = CBOW(vocab_size=len(w2i), dim=256)
skipgram = SkipGram(vocab_size=len(w2i), dim=256)

### **Model training**

Set the hyperparameters as follows and create the `DataLoader` objects.

In [21]:
batch_size = 4
learning_rate = 5e-4
num_epochs = 5
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

cbow_loader = DataLoader(cbow_set, batch_size=batch_size)
skipgram_loader = DataLoader(skipgram_set, batch_size=batch_size)
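
The number of batches per epoch follows from the sentence lengths and the window size. The sketch below is a back-of-the-envelope check, assuming the token counts of the ten tokenized sentences printed earlier and the rule that only centers with a full window are kept.

```python
import math

# Token counts of the 10 tokenized training sentences shown earlier.
sentence_lengths = [6, 6, 15, 9, 7, 14, 10, 13, 13, 11]
window_size, batch_size = 2, 4

# A sentence of n tokens has n - 2 * window_size centers with a full window.
cbow_samples = sum(max(0, n - 2 * window_size) for n in sentence_lengths)
# Skip-gram emits one sample per context word of each center.
skipgram_samples = cbow_samples * 2 * window_size

cbow_batches = math.ceil(cbow_samples / batch_size)
skipgram_batches = math.ceil(skipgram_samples / batch_size)
print(cbow_samples, cbow_batches)          # 64 16
print(skipgram_samples, skipgram_batches)  # 256 64
```

These counts agree with the 16 and 64 steps that the progress bars report during training.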

First, train the CBOW model.

In [22]:
cbow.train()
cbow = cbow.to(device)
optim = torch.optim.SGD(cbow.parameters(), lr=learning_rate)
loss_function = nn.CrossEntropyLoss()

for e in range(1, num_epochs+1):
  print("#" * 50)
  print(f"Epoch: {e}")
  for batch in tqdm(cbow_loader):
    x, y = batch
    x, y = x.to(device), y.to(device)  # (B, 2W), (B)
    output = cbow(x)  # (B, V)
 
    optim.zero_grad()
    loss = loss_function(output, y)
    loss.backward()
    optim.step()

    print(f"Train loss: {loss.item()}")

print("Finished.")

100%|██████████| 16/16 [00:00<00:00, 82.78it/s]
  0%|          | 0/16 [00:00<?, ?it/s]

##################################################
Epoch: 1
Train loss: 5.32407283782959
Train loss: 3.5595760345458984
Train loss: 4.547629356384277
Train loss: 3.913658380508423
Train loss: 4.479578018188477
Train loss: 4.342905044555664
Train loss: 4.667276382446289
Train loss: 5.308061599731445
Train loss: 4.54544734954834
Train loss: 4.497493743896484
Train loss: 4.771795272827148
Train loss: 4.435040473937988
Train loss: 3.8879635334014893
Train loss: 4.878297805786133
Train loss: 5.111123085021973
Train loss: 3.9713449478149414
##################################################
Epoch: 2
Train loss: 5.13508415222168


100%|██████████| 16/16 [00:00<00:00, 616.28it/s]
100%|██████████| 16/16 [00:00<00:00, 631.56it/s]
  0%|          | 0/16 [00:00<?, ?it/s]

Train loss: 3.434051990509033
Train loss: 4.425656795501709
Train loss: 3.808619499206543
Train loss: 4.3524346351623535
Train loss: 4.063481330871582
Train loss: 4.488699436187744
Train loss: 5.173338890075684
Train loss: 4.425926208496094
Train loss: 4.331645488739014
Train loss: 4.580134868621826
Train loss: 4.057640552520752
Train loss: 3.766841411590576
Train loss: 4.770680904388428
Train loss: 4.937209129333496
Train loss: 3.846787452697754
##################################################
Epoch: 3
Train loss: 4.948372840881348
Train loss: 3.311044931411743
Train loss: 4.306021690368652
Train loss: 3.7045955657958984
Train loss: 4.22697639465332
Train loss: 3.8030028343200684
Train loss: 4.312788963317871
Train loss: 5.040936470031738
Train loss: 4.311140060424805
Train loss: 4.172423839569092
Train loss: 4.3983259201049805
Train loss: 3.7057337760925293
Train loss: 3.647672414779663
Train loss: 4.6649017333984375
Train loss: 4.767360687255859
Train loss: 3.7263190746307373
####

100%|██████████| 16/16 [00:00<00:00, 55.27it/s]
100%|██████████| 16/16 [00:00<00:00, 670.48it/s]

Train loss: 3.1906325817108154
Train loss: 4.188723087310791
Train loss: 3.6016454696655273
Train loss: 4.103211879730225
Train loss: 3.561896324157715
Train loss: 4.139711856842041
Train loss: 4.910799980163574
Train loss: 4.200994968414307
Train loss: 4.020262718200684
Train loss: 4.226686477661133
Train loss: 3.381533145904541
Train loss: 3.530536413192749
Train loss: 4.560973167419434
Train loss: 4.601377010345459
Train loss: 3.6096858978271484
##################################################
Epoch: 5
Train loss: 4.582165718078613
Train loss: 3.072896718978882
Train loss: 4.0737624168396
Train loss: 3.4998278617858887
Train loss: 3.9811534881591797
Train loss: 3.340017318725586
Train loss: 3.969665050506592
Train loss: 4.782879829406738
Train loss: 4.0954179763793945
Train loss: 3.875405788421631
Train loss: 4.065690517425537
Train loss: 3.087124824523926
Train loss: 3.4155163764953613
Train loss: 4.45890998840332
Train loss: 4.439147472381592
Train loss: 3.4966869354248047
Finis
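
A useful reference point for reading these loss values is the cross-entropy of a model that predicts a uniform distribution over the vocabulary, which is $-\ln(1/V) = \ln V$. The sketch below computes it, assuming the vocabulary size of 60 reported when `w2i` was built.

```python
import math

vocab_size = 60  # size of w2i in this notebook
uniform_loss = math.log(vocab_size)  # cross-entropy of a uniform guess
print(round(uniform_loss, 4))  # 4.0943
```

Losses that stay around 4 after five epochs therefore suggest the model is still close to chance, which is plausible given the very small dataset and the learning rate of 5e-4.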




Next, train the Skip-gram model.

In [23]:
skipgram.train()
skipgram = skipgram.to(device)
optim = torch.optim.SGD(skipgram.parameters(), lr=learning_rate)
loss_function = nn.CrossEntropyLoss()

for e in range(1, num_epochs+1):
  print("#" * 50)
  print(f"Epoch: {e}")
  for batch in tqdm(skipgram_loader):
    x, y = batch
    x, y = x.to(device), y.to(device)  # (B), (B)
    output = skipgram(x)  # (B, V)

    optim.zero_grad()
    loss = loss_function(output, y)
    loss.backward()
    optim.step()

    print(f"Train loss: {loss.item()}")

print("Finished.")

100%|██████████| 64/64 [00:00<00:00, 720.61it/s]
100%|██████████| 64/64 [00:00<00:00, 725.44it/s]
  0%|          | 0/64 [00:00<?, ?it/s]

##################################################
Epoch: 1
Train loss: 4.501903533935547
Train loss: 3.8635504245758057
Train loss: 4.141275405883789
Train loss: 4.496247291564941
Train loss: 4.316728115081787
Train loss: 4.261194229125977
Train loss: 4.368272304534912
Train loss: 4.098947525024414
Train loss: 4.2619476318359375
Train loss: 4.528131008148193
Train loss: 3.8184611797332764
Train loss: 4.191086769104004
Train loss: 4.1431474685668945
Train loss: 4.233884334564209
Train loss: 4.2071638107299805
Train loss: 4.326584815979004
Train loss: 4.152448654174805
Train loss: 4.223211288452148
Train loss: 4.215305328369141
Train loss: 4.092764854431152
Train loss: 4.458646297454834
Train loss: 4.170133113861084
Train loss: 4.835692405700684
Train loss: 4.005480766296387
Train loss: 4.282678127288818
Train loss: 4.412858963012695
Train loss: 4.538447380065918
Train loss: 4.131497383117676
Train loss: 4.506120204925537
Train loss: 4.707592487335205
Train loss: 4.128485202789307
Train

100%|██████████| 64/64 [00:00<00:00, 735.38it/s]
100%|██████████| 64/64 [00:00<00:00, 731.86it/s]
  0%|          | 0/64 [00:00<?, ?it/s]

Train loss: 3.7473230361938477
Train loss: 4.131361484527588
Train loss: 4.077559947967529
Train loss: 4.163684844970703
Train loss: 4.140625953674316
Train loss: 4.2675628662109375
Train loss: 4.088098049163818
Train loss: 4.158448219299316
Train loss: 4.158718109130859
Train loss: 4.036550521850586
Train loss: 4.260710716247559
Train loss: 4.012176513671875
Train loss: 4.713059425354004
Train loss: 3.927565813064575
Train loss: 4.192531108856201
Train loss: 4.283690452575684
Train loss: 4.451066970825195
Train loss: 4.077882289886475
Train loss: 4.418991565704346
Train loss: 4.647434234619141
Train loss: 4.080622673034668
Train loss: 4.4856343269348145
Train loss: 4.154275894165039
Train loss: 4.194477558135986
Train loss: 4.0761213302612305
Train loss: 3.8117876052856445
Train loss: 3.996396064758301
Train loss: 4.241329669952393
Train loss: 4.0757598876953125
Train loss: 3.9732844829559326
Train loss: 4.829377174377441
Train loss: 3.76967191696167
Train loss: 4.384131908416748
Trai

100%|██████████| 64/64 [00:00<00:00, 723.32it/s]

Train loss: 4.364875316619873
Train loss: 4.026218891143799
Train loss: 4.334148406982422
Train loss: 4.588531494140625
Train loss: 4.033250331878662
Train loss: 4.423529148101807
Train loss: 4.096147537231445
Train loss: 4.135798454284668
Train loss: 4.018080234527588
Train loss: 3.750169277191162
Train loss: 3.8858566284179688
Train loss: 4.163915634155273
Train loss: 4.005611896514893
Train loss: 3.9156172275543213
Train loss: 4.743786811828613
Train loss: 3.729802370071411
Train loss: 4.266858100891113
Train loss: 3.712614059448242
Train loss: 3.682309150695801
Train loss: 4.183540344238281
Train loss: 4.037846565246582
Train loss: 4.537623405456543
Train loss: 4.0490827560424805
Train loss: 4.123320579528809
Train loss: 3.825652837753296
Train loss: 3.8139736652374268
Train loss: 3.47247314453125
Train loss: 4.06852388381958
Train loss: 3.898646831512451
Train loss: 4.29521369934082
Train loss: 4.3852858543396
Train loss: 4.028070449829102
Train loss: 4.1642889976501465
Train loss




### **Test**

Use each trained model to inspect the word embeddings of the test words.

In [24]:
for word in test_words:
  input_id = torch.LongTensor([w2i[word]]).to(device)
  emb = cbow.embedding(input_id)

  print(f"Word: {word}")
  print(emb.squeeze(0))

Word: 음식
tensor([ 2.1512e-01,  1.6459e+00, -4.5536e-02,  9.7926e-02, -5.3946e-01,
         1.3987e-01, -7.0558e-01,  1.5987e+00, -6.7485e-01, -5.0270e-01,
         3.2398e-02,  1.5469e+00, -6.8049e-01, -6.7504e-02, -9.9103e-02,
        -2.6982e-01,  6.8192e-01, -7.6911e-02, -5.3510e-01,  8.4853e-01,
        -6.4414e-01, -6.9495e-01, -2.3576e+00, -1.9592e+00, -1.0910e+00,
        -7.1867e-01, -4.6458e-01,  1.2677e+00,  3.3500e-01,  1.0732e+00,
        -1.7693e+00, -7.0309e-01,  7.8530e-01,  5.1830e-01,  1.4247e-01,
        -2.3663e-01, -1.3117e+00,  2.2243e-01, -1.4052e+00,  2.1612e+00,
         2.4065e+00,  3.4722e-01, -7.3143e-01, -5.9588e-01, -1.2722e+00,
        -4.9789e-01,  8.7165e-01, -1.4515e-01,  2.5813e-01,  3.6628e-01,
         1.5090e+00,  7.1159e-01, -1.2334e+00,  2.0455e+00, -7.1456e-01,
         2.8921e-01, -7.4926e-01, -7.6920e-01, -2.3242e-01, -5.0443e-01,
        -9.7596e-01, -9.8172e-01,  1.9162e-02,  1.3211e+00, -2.1197e-01,
         9.3251e-01,  1.1003e+00, -1.2776e

In [25]:
for word in test_words:
  input_id = torch.LongTensor([w2i[word]]).to(device)
  emb = skipgram.embedding(input_id)

  print(f"Word: {word}")
  print(max(emb.squeeze(0)))

Word: 음식
tensor(2.5135, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 맛
tensor(3.0978, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 서비스
tensor(2.1603, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 위생
tensor(3.1853, device='cuda:0', grad_fn=<UnbindBackward>)
Word: 가격
tensor(2.4698, device='cuda:0', grad_fn=<UnbindBackward>)
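
Raw embedding values are hard to interpret on their own; one common way to inspect them is to compare pairs of words by cosine similarity. The sketch below is an illustration under stated assumptions: `toy_w2i` and `toy_embedding` are hypothetical stand-ins for `w2i` and a trained `cbow.embedding` or `skipgram.embedding`, and the weights here are random, so the printed similarities carry no meaning.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
toy_w2i = {"음식": 0, "맛": 1, "서비스": 2}      # hypothetical toy vocabulary
toy_embedding = nn.Embedding(len(toy_w2i), 8)  # stands in for a trained embedding


def cosine(word_a, word_b):
    """Cosine similarity between the embeddings of two words."""
    ids = torch.tensor([toy_w2i[word_a], toy_w2i[word_b]])
    with torch.no_grad():  # inspection only, no gradients needed
        vec_a, vec_b = toy_embedding(ids)
    return F.cosine_similarity(vec_a, vec_b, dim=0).item()


print(cosine("음식", "음식"))  # 1.0 up to floating-point error
print(cosine("음식", "맛"))
```

With the notebook's models, the same function could be pointed at `cbow.embedding` and `w2i`, provided the ids are moved to the same device as the model.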

