# W1D1 Intro

## Overview



## Prerequisite knowledge

This is the first day, so we only assume prerequisites equivalent to having taken the Computational Neuroscience (CN) and Deep Learning (DL) courses. Today's tutorials delve into next-token prediction with transformers in PyTorch, and we will discuss attention in that context. You should be familiar with this material from DL W2D5 and W3D1. In the second tutorial, you will work with neural data (actually EMG), which you should be broadly familiar with from the CN course. Please review these precourse materials if necessary!


In [None]:
# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "neuromatch_cn",
            "user_key": "y1x3mpx5",
        },
    ).render()


feedback_prefix = "W1D1_Intro"

## Video

Hi everyone, and welcome to the opening session of our tutorial series at the Neuromatch Academy. My name is Samuele Bolotta, I am from Italy and I am the curriculum specialist in the NeuroAI Course. I have a master's in neuroscience and a master's in artificial intelligence. In terms of my personal life, my main hobbies right now are powerlifting and hiking. I am excited to be one of the educators guiding you through an exploration of NeuroAI. Importantly, I want to highlight that this was a collective effort. Our team of content creators, reviewers, production editors, and collaborators has worked hard to put together the educational content we'll be covering.

Today, we'll explore the concept of Generalization in Artificial Intelligence (AI). Generalization is the ability of AI systems to perform well on new, unseen data across various real-world contexts. Industrial applications of machine learning often use flexible architectures with weak inductive biases which are trained on large, diverse datasets.

During the Deep Learning Course, specifically on Week 2, Day 5 (W2D5), we discussed Transformers. Introduced by Vaswani et al. in 2017, Transformers are particularly noted for their high parallelizability, which facilitates efficient training on a large scale. This efficiency, in turn, has encouraged the development of modern accelerators like GPUs and TPUs which are tuned for transformers, thus creating a prime example of the so-called “hardware lottery”. Transformers are used for a wide range of AI tasks, from natural language processing, to vision, and to robotics. They start by modeling the input as a set of tokens; in natural language processing, tokens can be parts of words; in vision, image patches. These tokens then interact in multiple stages through an attention mechanism. Finally, the output of the network can be read in a variety of ways to perform classification, regression, or autoregressive generation.
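
To make the idea of tokens concrete, here is a minimal NumPy sketch of how an image could be cut into non-overlapping patches, each flattened into one token vector. The function name `image_to_patch_tokens`, the toy 8x8 image, and the patch size of 4 are illustrative assumptions, not part of the tutorial code. Real vision transformers usually also apply a learned linear projection to each patch.

```python
import numpy as np


def image_to_patch_tokens(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration only. Returns an array of shape
    (num_patches, patch_size * patch_size * C), one row per token.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("Image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Separate the patch grid axes from the within-patch pixel axes
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Reorder to (grid_row, grid_col, pixel_row, pixel_col, channel)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into a single token vector
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Toy image: 8x8 pixels with 3 channels (assumed sizes)
image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
tokens = image_to_patch_tokens(image, patch_size=4)
print(tokens.shape)  # 4 tokens, each of length 4 * 4 * 3
```

Each row of `tokens` then plays the same role that a word piece plays in natural language processing.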

Let’s talk about the inductive biases in transformers. Transformers display permutation equivariance: if the order of the input tokens is shuffled, the outputs are shuffled in exactly the same way but are otherwise unchanged. That might seem surprising, since transformers are often used for modelling sequences, like sentences, where order matters! Sensitivity to position is recovered by using position encoding, which tags each token with its position in a sequence or grid. Moreover, central to the Transformer architecture is the attention mechanism. This feature allows the model to selectively concentrate on various parts of the input data. This capability is critical for the model's overall performance, a topic we will delve deeper into on Day 4. High-capacity transformers have a remarkable ability to 'learn' inductive biases from data exposure and augmentations, enhancing their generalization capabilities. In practice, achieving generalization often involves training on a broad spectrum of data. The extensive datasets available today enable models like Transformers to be exposed to far more data than a single human or animal encounters in a lifetime.
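
The effect of position encoding on permutation equivariance can be checked numerically. The sketch below is a simplified illustration, assuming a single attention head in NumPy with random weights and a standard sinusoidal encoding. The names `self_attention` and `sinusoidal_positions` are hypothetical helpers, not the tutorial's implementation. Without position information, shuffling the input tokens simply shuffles the output rows; once positions are added, that symmetry is broken.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (no masking)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def sinusoidal_positions(n_tokens, d_model):
    """Sinusoidal position encoding; assumes d_model is even."""
    positions = np.arange(n_tokens)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / (10000 ** (dims / d_model))
    enc = np.zeros((n_tokens, d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8  # toy sizes chosen for illustration
tokens = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
perm = np.array([4, 2, 0, 3, 1])  # a fixed shuffle of the token order

# Without position encoding: permuting inputs permutes outputs identically
out = self_attention(tokens, w_q, w_k, w_v)
out_perm = self_attention(tokens[perm], w_q, w_k, w_v)
is_equivariant = np.allclose(out_perm, out[perm])

# With position encoding: the same shuffle now changes the result
pos = sinusoidal_positions(n_tokens, d_model)
out_pos = self_attention(tokens + pos, w_q, w_k, w_v)
out_pos_perm = self_attention(tokens[perm] + pos, w_q, w_k, w_v)
position_sensitive = not np.allclose(out_pos_perm, out_pos[perm])

print(f"Equivariant without positions: {is_equivariant}")
print(f"Order-sensitive with positions: {position_sensitive}")
```

Note that the position encoding is added after the shuffle, which mirrors what happens when the same words appear in a different order in a sentence.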

To contextualize these concepts, we will examine the problem of recognizing and generating handwritten text. This example illuminates how the architecture, training dataset, and data augmentations converge to empower a model to accomplish a complex task. Handwriting recognition serves as a case study for discussing how AI, neuroscience, and cognitive science conceptualize and tackle problems. We will dissect the transformer architecture, focusing on its tokenization, attention mechanisms, and classification heads. Additionally, we'll introduce scaling laws—crucial for grasping how model performance improves with larger datasets and increased computational resources. Expanding further, we'll explore transfer learning, augmentation techniques, and the use of synthetic data.
By the end of this session, you will have a broad overview of generalization in the context of AI, along with practical experience in applying these concepts. Let's dive in!


In [None]:
# @markdown
from ipywidgets import widgets
from IPython.display import YouTubeVideo
from IPython.display import IFrame
from IPython.display import display


class PlayVideo(IFrame):
  """IFrame wrapper for videos hosted on Bilibili or OSF."""
  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):
    self.id = id
    if source == 'Bilibili':
      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'
    elif source == 'Osf':
      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'
    else:
      # Fail early instead of reaching the call below with `src` unbound
      raise ValueError(f'Unsupported video source: {source}')
    super().__init__(src, width, height, **kwargs)


def display_videos(video_ids, W=400, H=300, fs=1):
  """Return one Output widget per (source, id) pair, each holding a video."""
  tab_contents = []
  for source, video_id in video_ids:
    out = widgets.Output()
    with out:
      if source == 'Youtube':
        video = YouTubeVideo(id=video_id, width=W,
                             height=H, fs=fs, rel=0)
        print(f'Video available at https://youtube.com/watch?v={video.id}')
      else:
        video = PlayVideo(id=video_id, source=source, width=W,
                          height=H, fs=fs, autoplay=False)
        if source == 'Bilibili':
          print(f'Video available at https://www.bilibili.com/video/{video.id}')
        elif source == 'Osf':
          print(f'Video available at https://osf.io/{video.id}')
      display(video)
    tab_contents.append(out)
  return tab_contents


video_ids = [('Youtube', 'W5o_HTsef0I'), ('Bilibili', 'BV1ho4y1C7Eo')]
tab_contents = display_videos(video_ids, W=854, H=480)
tabs = widgets.Tab()
tabs.children = tab_contents
for i in range(len(tab_contents)):
  tabs.set_title(i, video_ids[i][0])
display(tabs)

## Slides

In [None]:
# @markdown
from IPython.display import IFrame
link_id = "rbx2a"
print(f"If you want to download the slides: https://osf.io/download/{link_id}/")
IFrame(src=f"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render", width=854, height=480)

In [None]:
# @title Submit your feedback
content_review(f"{feedback_prefix}_Intro_Video")