This project investigates integrating a recurrent connection into a pre-trained transformer model (DistilGPT-2) using PyTorch. The main goal is to explore whether adding a recurrent connection to a purely attention-based model justifies the cost of additional parameters, longer training times, and other architectural constraints.
Before running the project, ensure the following dependencies are installed:
- torch >= 1.10
- tqdm
- numpy
- wandb
You can install these dependencies by using the following command:
pip install "torch>=1.10" tqdm numpy wandb
The project is structured as follows:
This section involves loading and preprocessing the data before feeding it to the model.
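As a sketch of the preprocessing step (the function name and the toy token stream here are illustrative, not the project's actual code), a long stream of token ids can be chunked into fixed-length blocks, with targets shifted by one position as is standard for autoregressive language modeling:

```python
import torch

def make_blocks(token_ids, block_size):
    """Split a stream of token ids into fixed-length training blocks.

    Inputs are the block itself; targets are the block shifted by one token.
    """
    n_blocks = (len(token_ids) - 1) // block_size
    ids = torch.tensor(token_ids[: n_blocks * block_size + 1])
    inputs = ids[:-1].view(n_blocks, block_size)
    targets = ids[1:].view(n_blocks, block_size)
    return inputs, targets

# toy stream of 21 token ids, block size 5 -> 4 (input, target) blocks
x, y = make_blocks(list(range(21)), block_size=5)
print(x.shape, y.shape)  # torch.Size([4, 5]) torch.Size([4, 5])
```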
The initial step is to build a basic model as a starting point for further enhancements. In this step, basic and multi-head self-attention layers are added on top of the classifier model.
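A minimal sketch of the multi-head self-attention layer added on top of the classifier, using PyTorch's built-in `nn.MultiheadAttention` (the embedding size and head count are illustrative):

```python
import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """Multi-head self-attention over a batch of token embeddings."""

    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        # self-attention: queries, keys, and values are all x
        out, _ = self.attn(x, x, x)
        return out

block = SelfAttentionBlock()
x = torch.randn(2, 10, 64)   # (batch, seq_len, embed_dim)
out = block(x)
print(out.shape)             # torch.Size([2, 10, 64])
```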
Next, a complete transformer is added on top of the classifier to introduce a more complex model.
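The full transformer stacked on the classifier can be sketched with PyTorch's standard encoder layers; this stands in for the project's own implementation, and all sizes below are illustrative:

```python
import torch
import torch.nn as nn

# a small stack of standard transformer layers (attention + feed-forward,
# with residual connections and layer norm inside each layer)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=64, nhead=4, dim_feedforward=128, batch_first=True
    ),
    num_layers=2,
)
x = torch.randn(2, 10, 64)   # (batch, seq_len, d_model)
h = encoder(x)
print(h.shape)               # torch.Size([2, 10, 64])
```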
A generator transformer is implemented in this section, enhancing the model's capabilities.
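The key ingredient of a generator (decoder-style) transformer is a causal mask, so each position attends only to earlier positions. A small sketch of such a mask, following the `attn_mask` convention of `nn.MultiheadAttention` (True marks positions that must not be attended to):

```python
import torch

def causal_mask(seq_len):
    """Boolean mask where True marks future positions that may not be
    attended to, per nn.MultiheadAttention's attn_mask convention."""
    return torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
    )

m = causal_mask(4)
print(m)  # upper triangle (strictly above the diagonal) is True
```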
The main focus of this project, a recurrent connection, is integrated into the generator transformer.
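The recurrent connection can be sketched as below. This is one plausible design (a GRU run over the transformer's hidden states, merged back with a residual connection), not necessarily the exact wiring used in the project, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

class RecurrentTransformerHead(nn.Module):
    """A transformer layer followed by a recurrent (GRU) pass over its
    hidden states, merged back via a residual connection."""

    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):
        h = self.block(x)      # attention-based hidden states
        r, _ = self.rnn(h)     # recurrent pass over the sequence
        return h + r           # residual merge of attention and recurrence

model = RecurrentTransformerHead()
x = torch.randn(2, 10, 64)
out = model(x)
print(out.shape)  # torch.Size([2, 10, 64])
```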
Finally, the results and metrics (loss, gradient-clipping norm, and perplexity) are tracked with Weights & Biases (wandb) for evaluation.
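The tracked metrics can be computed as sketched below; the model is a toy stand-in, and the `wandb.log` call is shown as a comment because it requires an active run:

```python
import math
import torch
import torch.nn as nn

# toy model and loss standing in for the actual language model
model = nn.Linear(8, 8)
logits = model(torch.randn(4, 8))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 8, (4,)))
loss.backward()

# clip_grad_norm_ returns the total gradient norm before clipping
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# perplexity is the exponential of the cross-entropy loss
perplexity = math.exp(loss.item())

# wandb.log({"loss": loss.item(), "grad_norm": grad_norm.item(),
#            "perplexity": perplexity})  # inside an active wandb run
print(loss.item(), grad_norm.item(), perplexity)
```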
This project aims to understand the impact of a recurrent connection in a pre-trained transformer model. By comparing the performance, training time, and architectural constraints, we can determine whether the addition of recurrent connections is beneficial in the context of attention-based models.