# Transformer

## Transformer Paper: Attention Is All You Need

Paper published by Google in 2017: *Attention Is All You Need*.

![all you need is attention](./img/2024-05-07-09-48-26.png)

### Transformer Model Architecture

![transformer architecture](./img/2024-05-07-09-51-07.png)

## Embeddings: Representing Words as Vectors

```python
# Install gensim
%pip install gensim
```

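```python
import gensim.downloader

# GloVe vectors trained on Wikipedia + Gigaword, 50 dimensions per word
model = gensim.downloader.load('glove-wiki-gigaword-50')
```

```python
import gensim.downloader

# Each vector in this model has 100 dimensions (this rebinds `model`, replacing the 50-dimensional one)
model = gensim.downloader.load('glove-wiki-gigaword-100')
```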

```python
# Number of words in the model's vocabulary
len(model)
```

```
400000
```

![words vectors](./img/2024-05-07-10-14-28.png)

```python
# In 'glove-wiki-gigaword-100', every word vector has 100 dimensions
# (for comparison, GPT-3's embedding vectors have 12,288 dimensions)
model['queen']
```

```
array([-0.50045 , -0.70826 ,  0.55388 ,  0.673   ,  0.22486 ,  0.60281 ,
       -0.26194 ,  0.73872 , -0.65383 , -0.21606 , -0.33806 ,  0.24498 ,
       -0.51497 ,  0.8568  , -0.37199 , -0.58824 ,  0.30637 , -0.30668 ,
       -0.2187  ,  0.78369 , -0.61944 , -0.54925 ,  0.43067 , -0.027348,
        0.97574 ,  0.46169 ,  0.11486 , -0.99842 ,  1.0661  , -0.20819 ,
        0.53158 ,  0.40922 ,  1.0406  ,  0.24943 ,  0.18709 ,  0.41528 ,
       -0.95408 ,  0.36822 , -0.37948 , -0.6802  , -0.14578 , -0.20113 ,
        0.17113 , -0.55705 ,  0.7191  ,  0.070014, -0.23637 ,  0.49534 ,
        1.1576  , -0.05078 ,  0.25731 , -0.091052,  1.2663  ,  1.1047  ,
       -0.51584 , -2.0033  , -0.64821 ,  0.16417 ,  0.32935 ,  0.048484,
        0.18997 ,  0.66116 ,  0.080882,  0.3364  ,  0.22758 ,  0.1462  ,
       -0.51005 ,  0.63777 ,  0.47299 , -0.3282  ,  0.083899, -0.78547 ,
        0.099148,  0.039176,  0.27893 ,  0.11747 ,  0.57862 ,  0.043639,
       -0.15965 , -0.35304 , -0.048965, -0.32461 , ...
```

![queen embedding](./img/2024-05-07-09-56-57.png)
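Conceptually, `model['queen']` is a table lookup: the model stores one big matrix with one row per word, plus a mapping from each word to its row index (in gensim 4 these roughly correspond to `model.vectors` and `model.key_to_index`). The minimal sketch below assumes a hypothetical 4-word vocabulary and uses random numbers in place of trained values; `embed` is an illustrative helper, not a gensim function.

```python
import numpy as np

# Hypothetical toy vocabulary; the real 'glove-wiki-gigaword-100'
# model has 400,000 words and 100 dimensions per word.
vocab = ["king", "queen", "man", "woman"]
word_to_index = {word: i for i, word in enumerate(vocab)}

# Embedding matrix: one row per word, one column per dimension.
# Random values stand in for the values a real model learns in training.
rng = np.random.default_rng(seed=0)
embedding_dim = 3
embeddings = rng.normal(size=(len(vocab), embedding_dim))

def embed(word: str) -> np.ndarray:
    """Map the word to its row index, then return that row of the matrix."""
    return embeddings[word_to_index[word]]

print(embeddings.shape)      # (4, 3)
print(embed("queen").shape)  # (3,)
```

Because the lookup is just indexing, a word's vector is the same no matter which sentence it appears in; this is the limitation that attention addresses later in these notes.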

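- Every word vector is learned by training on a large text corpus
- The dimensions of a vector encode the word's semantic information
- However, the meaning of any single dimension is fuzzy and has no precise definition
- Once words are turned into vectors, we can do arithmetic on them and search for:
  - the most similar words
  - the least similar words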
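Searching for the most similar words amounts to scoring every other word by cosine similarity to the query vector and keeping the highest scores, which is what gensim's `most_similar` reports. A minimal sketch, assuming hand-picked 3-dimensional toy vectors (not real GloVe values) and an illustrative `most_similar` helper that only mimics the idea of the gensim method:

```python
import numpy as np

# Hypothetical 3-D toy vectors, hand-picked so that "tower" lies close
# to "building" and "spire" and far from "banana".
vectors = {
    "tower":    np.array([0.9, 0.8, 0.1]),
    "building": np.array([0.8, 0.9, 0.2]),
    "spire":    np.array([0.7, 0.6, 0.0]),
    "banana":   np.array([-0.2, 0.1, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by both lengths; ranges from -1 to 1."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(word: str, topn: int = 2):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [(w, cosine_similarity(query, v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

print(most_similar("tower"))
```

A real model scores all 400,000 words this way; normalizing all rows to unit length once makes each query a single matrix-vector product.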

```python
# Words closest in meaning to 'tower'
model.most_similar('tower')
```

```
[('towers', 0.8470372557640076),
 ('building', 0.725898027420044),
 ('dome', 0.6875219345092773),
 ('spire', 0.6807529926300049),
 ('gate', 0.671362578868866),
 ('skyscraper', 0.6699519753456116),
 ('roof', 0.6561244130134583),
 ('walls', 0.6556639075279236),
 ('built', 0.6550073623657227),
 ('buildings', 0.6522013545036316)]
```

![similar words](./img/2024-05-07-10-18-58.png)

```python
# Words least similar to 'queen'
model.most_similar(negative=['queen'])
```
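With only a `negative` word, gensim's `most_similar` searches for the words closest to the negated, unit-length `queen` vector. Since cosine similarity to `-q` is exactly the negative of cosine similarity to `q`, this yields the same ordering as sorting by *ascending* similarity to `queen`. A sketch of that equivalence, assuming made-up toy vectors and an illustrative `rank_by_similarity` helper:

```python
import numpy as np

# Hypothetical toy vectors (not real GloVe values).
vectors = {
    "queen":  np.array([0.9, 0.1, 0.3]),
    "king":   np.array([0.8, 0.2, 0.3]),
    "banana": np.array([-0.1, 0.9, 0.2]),
    "rust":   np.array([-0.8, -0.3, 0.1]),
}

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def rank_by_similarity(query: np.ndarray, exclude: set) -> list:
    """Return words sorted by cosine similarity to `query`, highest first."""
    q = unit(query)
    scores = {w: float(unit(v) @ q) for w, v in vectors.items() if w not in exclude}
    return sorted(scores, key=scores.get, reverse=True)

# Ascending similarity to "queen" (least similar first) ...
least_similar = rank_by_similarity(vectors["queen"], exclude={"queen"})[::-1]
# ... matches descending similarity to the negated "queen" vector.
negated = rank_by_similarity(-vectors["queen"], exclude={"queen"})
print(negated)  # ['rust', 'banana', 'king']
```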

![word vectors operation](./img/2024-05-07-10-29-36.png)

```python
# woman + king - man ≈ queen
model.most_similar(positive=['woman', 'king'], negative=['man'])
```

```
[('queen', 0.7698540687561035),
 ('monarch', 0.6843381524085999),
 ('throne', 0.6755736470222473),
 ('daughter', 0.6594556570053101),
 ('princess', 0.6520534157752991),
 ('prince', 0.6517034769058228),
 ('elizabeth', 0.6464517712593079),
 ('mother', 0.631171703338623),
 ('emperor', 0.6106470823287964),
 ('wife', 0.6098655462265015)]
```

![queen vector](./img/2024-05-07-10-33-42.png)
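The analogy query adds the `positive` vectors, subtracts the `negative` ones, and returns the words nearest to the result, skipping the input words themselves (which is why `king` and `woman` do not appear in the output above). The sketch below assumes 2-D toy vectors whose axes were deliberately chosen to mean "royalty" and "gender"; real GloVe dimensions have no such clean labels. Like gensim, it normalizes each input vector to unit length before combining them. The `analogy` helper is illustrative only.

```python
import numpy as np

# Hypothetical 2-D toy vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
vectors = {
    "king":   np.array([0.9,  0.8]),
    "queen":  np.array([0.9, -0.8]),
    "man":    np.array([0.1,  0.8]),
    "woman":  np.array([0.1, -0.8]),
    "prince": np.array([0.8,  0.9]),
}

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def analogy(positive: list, negative: list, topn: int = 1) -> list:
    """Sum positive unit vectors, subtract negative ones, rank the remaining words by cosine similarity."""
    target = sum(unit(vectors[w]) for w in positive) - sum(unit(vectors[w]) for w in negative)
    target = unit(target)
    candidates = [w for w in vectors if w not in positive + negative]
    ranked = sorted(candidates, key=lambda w: float(unit(vectors[w]) @ target), reverse=True)
    return ranked[:topn]

print(analogy(positive=["woman", "king"], negative=["man"]))  # ['queen']
```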

```python
# good + happy - bad - sad ≈ ?
model.most_similar(positive=['good', 'happy'], negative=['bad', 'sad'])
```

```
[('enjoy', 0.4552291929721832),
 ('chance', 0.4535176753997803),
 ('ready', 0.45224252343177795),
 ('opportunity', 0.4434261918067932),
 ('excellent', 0.4415234923362732),
 ('free', 0.44127118587493896),
 ('maintain', 0.440281480550766),
 ('comfortable', 0.4352276027202606),
 ('healthy', 0.43348386883735657),
 ('better', 0.43163517117500305)]
```

How do we measure the similarity between two vectors?

![dot product](./img/2024-05-07-10-50-24.png)

![negative similar](./img/2024-05-07-10-51-33.png)
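The dot product multiplies matching components and sums them. It is large and positive when two vectors point in similar directions, near zero when they are unrelated (orthogonal), and negative when they point in opposite directions. Dividing by both lengths gives the cosine similarity, which always lies between -1 and 1; the scores that `most_similar` returns are cosine similarities. A worked example with made-up 2-D vectors:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2} \, \sqrt{\sum_i b_i^2}}
$$

$$
\mathbf{a} = (1, 2),\ \mathbf{b} = (2, 1): \quad \mathbf{a} \cdot \mathbf{b} = 1 \cdot 2 + 2 \cdot 1 = 4, \quad \cos(\mathbf{a}, \mathbf{b}) = \frac{4}{\sqrt{5}\,\sqrt{5}} = 0.8
$$

$$
\mathbf{c} = (-1, -2): \quad \mathbf{a} \cdot \mathbf{c} = -1 - 4 = -5, \quad \cos(\mathbf{a}, \mathbf{c}) = \frac{-5}{\sqrt{5}\,\sqrt{5}} = -1
$$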

## Attention

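Limitations of embedding vectors:

- A word often has several different meanings, and all of them are packed into a single vector
- The vector carries no context, so it cannot tell those meanings apart
- For example:
  - `apple`
    - could be the fruit, or Apple the company
  - `python`
    - could be the animal, or the programming language

The attention mechanism lets each position in the input sequence draw information from the other positions, so a word's representation can absorb context from the words around it.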

![Attention Mechanism](./img/2024-05-07-11-08-10.png)
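The paper's core formula is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V: each query is compared with every key by a dot product, the scores are turned into weights that sum to 1, and the output is the weighted average of the values. The minimal NumPy sketch below assumes a single head, no masking, and arbitrary sizes with random inputs.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtracting the row max avoids overflow and leaves the result unchanged.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys): how well each query matches each key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Arbitrary sizes: 3 tokens, key dimension 4, value dimension 5.
rng = np.random.default_rng(seed=0)
n_tokens, d_k, d_v = 3, 4, 5
Q = rng.normal(size=(n_tokens, d_k))
K = rng.normal(size=(n_tokens, d_k))
V = rng.normal(size=(n_tokens, d_v))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)          # (3, 5)
print(weights.sum(axis=-1))  # each row sums to 1 (up to floating-point rounding)
```

Dividing by √d_k keeps the dot products from growing with the vector dimension, which would otherwise push the softmax toward a near one-hot output.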

### Self-Attention

![contextualized-embedding](./img/2024-05-07-11-16-05.png)
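In self-attention the queries, keys and values are all projections of the same sequence, so each output row is a contextualized embedding of one input word. The sketch below assumes random static embeddings and random projection matrices (both learned in a real Transformer) and a hypothetical `self_attention` helper; its only point is that `apple` enters with one fixed vector but comes out differently in a fruit context and a company context.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(seed=42)
d_model = 8

# Hypothetical static embeddings: "apple" has ONE vector regardless of context.
vocab = ["apple", "eat", "an", "stock", "price"]
static = {w: rng.normal(size=d_model) for w in vocab}

# Projection matrices (random stand-ins for learned weights),
# scaled by 1/sqrt(d_model) to keep the attention scores in a moderate range.
W_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_v = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

def self_attention(words: list) -> np.ndarray:
    """Queries, keys and values are all computed from the same sequence X."""
    X = np.stack([static[w] for w in words])  # (n, d_model)
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    weights = softmax(Q @ K.T / np.sqrt(d_model))
    return weights @ V                        # (n, d_model): one contextual vector per word

fruit = self_attention(["eat", "an", "apple"])
company = self_attention(["apple", "stock", "price"])

# Same input vector for "apple", different contextual outputs.
apple_in_fruit, apple_in_company = fruit[2], company[0]
print(np.allclose(apple_in_fruit, apple_in_company))  # False
```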

## Transformer Book

![transformer book](./img/2024-05-07-11-35-38.png)

Source code for the book: [GitHub repository](https://github.com/nlp-with-transformers/notebooks)

## Hugging Face

The Hugging Face [website](https://huggingface.co/) offers many AI tools and resources, including:

- AI models
- Datasets
- Code libraries
- Courses
- ...