
Deep Learning with Transformers

PREFACE

The Turing Test, proposed by Alan Turing in his pivotal 1950 paper "Computing Machinery and Intelligence," suggests that a machine demonstrates intelligence when its responses are indistinguishable from those of a human. For decades, researchers endeavored to create systems that could emulate human speech and writing. Today, such systems have become commonplace. With the advent of deep learning, large language models like GPT-4 can generate text, answer essay questions, and even take exams in ways that often mirror human performance. Educators now grapple with students using AI to write their essays, and programmers frequently collaborate with AI in pair programming. A parallel trend has emerged in computer vision, where generative models such as Midjourney, DALL-E, and Stable Diffusion produce artworks and digital images that rival the clarity of 4K photography.

Astronaut feeding chickens (generated by Stable Diffusion)

The image above illustrates the capabilities of text-to-image generative models: given the description "Astronaut Feeding Chickens," the Stable Diffusion model produced an impressively realistic image. Although it took many years to progress from Turing's seminal paper to our current AI advancements, the pace of development has surged notably since 2012.

For decades, computer scientists sought to develop AI systems capable of human-like vision, speech processing, and text processing. While some of these efforts found success in various applications and industries, AI largely remained a niche field, hampered by its limited capabilities.

However, the landscape began to change in 2012. Alex Krizhevsky, a Ph.D. student at the University of Toronto, introduced an image recognition model named "AlexNet." This model, with its impressive accuracy, sparked a significant shift in artificial intelligence. The success of AlexNet was largely due to Nvidia's high-performance GPUs and the vast amount of data available online, from which the ImageNet dataset was created. At its core, AlexNet utilized a convolutional neural network, drawing inspiration from Yann LeCun's work in 1989.

Seeing the potential of this breakthrough, companies and academic institutions moved quickly to advance the field. The subsequent years saw the introduction of more powerful models, enhanced GPUs from Nvidia, and larger datasets sourced from the internet. Together, these factors drove rapid advances in areas like object detection, language processing, and speech recognition.

By 2016, a team at Microsoft Research Asia had developed an image recognition model called "ResNet" that surpassed human-level accuracy in image classification. Around the same time, significant strides were made in language and speech processing, enhancing tools such as Google Translate and Apple's Siri.

Yet by 2017, while deep learning had become commonplace, it faced notable challenges. Progress seemed to be slowing: new research claimed state-of-the-art results, but the improvements were often marginal, and expanding datasets or enlarging models no longer yielded significant gains. Moreover, these models, while excellent at their specific tasks, struggled to generalize beyond them and could be easily tripped up in real-world scenarios, sometimes with biased outcomes.

A potential solution emerged in 2017, when Ashish Vaswani and his colleagues at Google introduced a new deep learning architecture called the "Transformer" in their paper "Attention Is All You Need." This new approach promised to reshape the way we thought about deep learning, moving away from traditional convolutional and recurrent neural networks.
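As a brief preview of what later chapters unpack in full, the core of the Transformer is the scaled dot-product attention operation defined in that paper: given query, key, and value matrices $Q$, $K$, and $V$, with $d_k$ the dimensionality of the keys,

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

Every token attends to every other token in a single step, which is what allows the architecture to dispense with recurrence and convolution entirely.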

Now, in 2023, the fields of computer vision, language processing, and speech are dominated by transformer-based models. Well-known systems like GPT-4, Bard, LLaMA, Midjourney, and Stable Diffusion all utilize transformers, bringing us closer to realizing Alan Turing's vision for AI.

However, while transformers are powerful and increasingly popular, there is still much to learn about how they work: although they are conceptually simpler than the architectures they replaced, they are also relatively new. In this book, we aim to unpack the transformer architecture with clear illustrations, code examples, diagrams, and equations, without overwhelming readers with complex math. Intended for both researchers and developers, the book delves into various aspects of transformers, including their optimization, fine-tuning, and applications across different domains.

We hope this book will deepen your understanding of the incredible advancements made in AI in recent years.


Table of Contents

{% content-ref url="paradigms-of-deep-learning-research.md" %} paradigms-of-deep-learning-research.md {% endcontent-ref %}

{% content-ref url="sequence-modelling-with-transformer-encoder.md" %} sequence-modelling-with-transformer-encoder.md {% endcontent-ref %}

{% content-ref url="sequence-generation-with-transformer-decoder.md" %} sequence-generation-with-transformer-decoder.md {% endcontent-ref %}

{% content-ref url="sequence-to-sequence-generation-with-transformer-encoder-decoder.md" %} sequence-to-sequence-generation-with-transformer-encoder-decoder.md {% endcontent-ref %}

{% content-ref url="self-supervised-pretraining-of-transformers.md" %} self-supervised-pretraining-of-transformers.md {% endcontent-ref %}

{% content-ref url="speeding-up-transformers.md" %} speeding-up-transformers.md {% endcontent-ref %}

{% content-ref url="case-studies-of-transformer-models/" %} case-studies-of-transformer-models {% endcontent-ref %}

{% content-ref url="finetuning-large-language-models/" %} finetuning-large-language-models {% endcontent-ref %}

{% content-ref url="vision-transformers.md" %} vision-transformers.md {% endcontent-ref %}

{% content-ref url="speech-transformers.md" %} speech-transformers.md {% endcontent-ref %}
