Transformer-Knowledge

We test the performance of Andrej Karpathy's minGPT Transformer model on a simple knowledge task.

First we pretrain minGPT on Wikipedia text about notable individuals, then we finetune it on name-birthplace pairs of the form:

Q: Where was [person] born?
A: [place]

We then test its "knowledge" by asking it to predict the birthplaces of individuals in the same question-answer format as above.
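The finetuning examples can be rendered as plain strings. A minimal sketch of the format described above (illustrative only; the exact delimiters and padding are defined in dataset.py):

```python
def qa_example(person, place):
    # Illustrative formatting of one name-birthplace finetuning example;
    # the real dataset code may use different delimiters or padding.
    return f"Q: Where was {person} born?\nA: {place}\n"

print(qa_example("Khatchig Mouradian", "Lebanon"))
# Q: Where was Khatchig Mouradian born?
# A: Lebanon
```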

Attention

Two variants of self-attention are tested: standard masked multi-headed self-attention, and Synthesizer attention. The model using masked multi-headed self-attention achieves an accuracy of ~20%. The model using the Synthesizer variant, a form of attention that eschews pairwise dot products when computing attention weights, achieves ~17% accuracy.

Synthesizer attention: $Y_i = \mathrm{softmax}(\mathrm{ReLU}(X A_i + b_1) B_i + b_2)(X V_i)$
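A single Synthesizer head can be sketched directly from the formula above. This is a NumPy sketch with made-up toy shapes, not the repository's PyTorch implementation; the T×T attention scores are synthesized from each position's own representation, with no query-key dot products:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def synthesizer_head(X, A, b1, B, b2, V):
    """One Synthesizer (dense) head: Y = softmax(ReLU(X A + b1) B + b2)(X V).
    Scores come from a per-position MLP, not from QK^T dot products."""
    scores = np.maximum(X @ A + b1, 0.0) @ B + b2   # (T, T) attention logits
    return softmax(scores, axis=-1) @ (X @ V)       # (T, d_v) output

# Toy shapes: sequence length T, model dim d, hidden dim d_k, value dim d_v.
rng = np.random.default_rng(0)
T, d, d_k, d_v = 4, 8, 8, 8
X = rng.standard_normal((T, d))
Y = synthesizer_head(X,
                     rng.standard_normal((d, d_k)), np.zeros(d_k),
                     rng.standard_normal((d_k, T)), np.zeros(T),
                     rng.standard_normal((d, d_v)))
print(Y.shape)  # (4, 8)
```

For a causal language model, the T×T scores would additionally be masked above the diagonal before the softmax, just as in standard masked self-attention.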

Pretraining

For specifics, see the CharCorruptionDataset class in dataset.py. To pretrain on the Wikipedia text, each piece of text is randomly truncated and masked. For every such masked string $x$, the label $y$ is $x$ shifted left by one character (i.e. $y = x[1:]$), so that at each position the model is trying to predict the next character of the masked string $x$.

For example:
Original: Khatchig Mouradian. Khatchig Mouradian is a journalist, writer and translator born in Lebanon .
x: Khatchig Mouradian. Khatchig Mouradian is a jour⁇and tran⁇nalist, writer ⁇□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□
y: hatchig Mouradian. Khatchig Mouradian is a jour⁇and tran⁇nalist, writer ⁇□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□
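One way this corruption could be sketched, based on one plausible reading of the example above (truncate, cut out an internal span, and rebuild the string with mask characters and padding). The exact truncation bounds and layout are assumptions here; the authoritative rules live in CharCorruptionDataset in dataset.py:

```python
import random

MASK_CHAR = "\u2047"  # the ⁇ character in the example above
PAD_CHAR = "\u25A1"   # the □ padding character

def corrupt(doc, block_size, seed=0):
    """Rough sketch of char corruption: truncate the document to a random
    length, cut out a random internal span, and emit
    prefix [MASK] suffix [MASK] span, padded out to block_size."""
    rng = random.Random(seed)
    doc = doc[: rng.randint(4, block_size * 7 // 8)]  # random truncation
    a = rng.randint(1, len(doc) - 2)                  # span start
    b = rng.randint(a, len(doc) - 1)                  # span end
    prefix, span, suffix = doc[:a], doc[a:b], doc[b:]
    s = prefix + MASK_CHAR + suffix + MASK_CHAR + span
    s += PAD_CHAR * (block_size - len(s))
    return s[:-1], s[1:]  # x and its next-character labels y = x shifted by one

x, y = corrupt("Khatchig Mouradian is a journalist, writer and "
               "translator born in Lebanon.", 96)
```

By construction, `y` is simply the input sequence shifted one character to the left, so the standard next-token cross-entropy loss teaches the model to reconstruct the masked-out span.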
