# Spark of AI: How Transfer Learning Unlocked AI’s Potential

## Video

<https://youtu.be/6NuGEukBfcA>

Watch the [full video](https://youtu.be/6NuGEukBfcA)

------------------------------------------------------------------------

## Annotated Presentation

Below is an annotated version of the presentation, with timestamped
links to the relevant parts of the video for each slide.

### 1. The Spark of the AI Revolution

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_1.png"
alt="Slide 1" />
<figcaption aria-hidden="true">Slide 1</figcaption>
</figure>

([Timestamp: 00:00](https://youtu.be/6NuGEukBfcA&t=0s))

The presentation begins with the title slide, “The Spark of the AI
Revolution: Transfer Learning,” presented by Rajiv Shah from Snowflake.
This talk was originally given at the University of Cincinnati and
recorded later to share the insights with a broader audience.

Rajiv sets the stage by explaining that this is not a deep technical
dive into code, but rather a descriptive history and analysis of the
drivers behind the current AI boom. The goal is to explain how AI learns
and how individuals can start to interrogate and understand these
technologies in their own lives.

The core premise is that **Transfer Learning** is the catalyst that
shifted AI from academic curiosity to a revolutionary force. The talk
aims to bridge the gap for those unfamiliar with the underlying
mechanics of how models like ChatGPT came to be.

### 2. Sparks of AGI: Early Experiments

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_2.png"
alt="Slide 2" />
<figcaption aria-hidden="true">Slide 2</figcaption>
</figure>

([Timestamp: 01:00](https://youtu.be/6NuGEukBfcA&t=60s))

This slide illustrates an early experiment by researchers investigating
GPT-4. To see how the model’s capabilities were developing, they gave it
a concept and asked it to draw it using code (the paper used the TikZ
graphics language). The slide displays a progression of abstract animal
figures, showing how the model’s ability to represent concepts improved
over the course of its development.

This references the paper “Sparks of Artificial General Intelligence,”
which made significant waves in the tech community. It suggests that
these models were beginning to show signs of **Artificial General
Intelligence (AGI)**—reasoning capabilities that extend beyond narrow
tasks.

The visual progression from crude shapes to recognizable forms serves as
a metaphor for the rapid evolution of these models. It highlights the
mystery and potential power hidden within the training process of Large
Language Models (LLMs).

### 3. Extinction Level Threat?

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_3.png"
alt="Slide 3" />
<figcaption aria-hidden="true">Slide 3</figcaption>
</figure>

([Timestamp: 01:36](https://youtu.be/6NuGEukBfcA&t=96s))

The presentation addresses the extreme concerns surrounding the rapid
scaling of AI technologies. The slide features a dramatic image
reminiscent of the Terminator, referencing fears that unchecked AI
development could pose an **“extinction-level” threat** to humanity.

Rajiv notes that as these technologies scale, there is a segment of the
research and safety community worried about catastrophic outcomes. This
sets up a contrast between the theoretical existential risks and the
practical, everyday reality of how AI is currently being used.

This slide acknowledges the “hype and fear” cycle that dominates the
media narrative, validating the audience’s anxiety before pivoting to a
more grounded explanation of how the technology actually works.

### 4. The New AI Overlords

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_4.png"
alt="Slide 4" />
<figcaption aria-hidden="true">Slide 4</figcaption>
</figure>

([Timestamp: 01:42](https://youtu.be/6NuGEukBfcA&t=102s))

Shifting to a lighter tone, this slide highlights the widespread
adoption of AI by the younger generation. It cites a statistic that
**89% of students** have used ChatGPT for homework, humorously
suggesting that children have already “accepted our new AI overlords.”

The slide jokes that the remaining 11% are lying, implying that real
usage is close to universal. This reflects a fundamental shift in
education and information retrieval that has already taken place.

This context emphasizes that the AI revolution is not just a future
possibility but a current reality affecting how the next generation
learns and works. It underscores the urgency of understanding these
tools.

### 5. Fundamental Questions

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_5.png"
alt="Slide 5" />
<figcaption aria-hidden="true">Slide 5</figcaption>
</figure>

([Timestamp: 01:53](https://youtu.be/6NuGEukBfcA&t=113s))

This slide poses the central questions that the presentation will
answer: “What is AI doing?” and “How should you think about AI?” It
sets the agenda for the technical explanation that follows.

Rajiv transitions here from the societal impact of AI to the mechanics
of machine learning. He prepares the audience to look “under the hood”
to demystify the “magic” of tools like ChatGPT.

The goal is to move the audience from passive consumers of AI hype to
critical thinkers who understand the limitations and capabilities of the
technology based on how it is built.

### 6. How We Teach Computers

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_6.png"
alt="Slide 6" />
<figcaption aria-hidden="true">Slide 6</figcaption>
</figure>

([Timestamp: 02:02](https://youtu.be/6NuGEukBfcA&t=122s))

The presentation begins its technical explanation with a fundamental
question: **“How do we teach computers?”** The slide uses imagery of
blueprints and tools, likening the traditional process of building AI
models to craftsmanship.

This introduces the concept of **Supervised Learning** in a relatable
way. Before discussing neural networks, Rajiv grounds the audience in
traditional analytics, where humans explicitly guide the machine on what
to look for.

The focus here is on the human element in traditional machine
learning—the “artisan” who must carefully select inputs to get a desired
output.

### 7. Identifying Features

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_7.png"
alt="Slide 7" />
<figcaption aria-hidden="true">Slide 7</figcaption>
</figure>

([Timestamp: 02:11](https://youtu.be/6NuGEukBfcA&t=131s))

Using a real estate example, this slide explains the concept of
**Features** (or variables). To teach a computer to value a house, one
must identify specific characteristics like square footage, number of
bedrooms, or closet space.

Rajiv explains that we capture these characteristics and organize them
into a tabular format. This process is known as **Feature Engineering**,
where the data scientist decides which attributes are relevant for the
problem at hand.

This is the bedrock of traditional enterprise AI: converting real-world
objects into structured data points that a machine can process
mathematically.
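To make the idea of a feature table concrete, here is a minimal Python sketch. The `House` class, the three feature names, and every number are made-up illustrations, not data from the talk; the point is only that the data scientist chooses which columns the model will ever see.

```python
from dataclasses import dataclass


@dataclass
class House:
    """One listing, described by hand-picked features (toy example)."""
    square_feet: float
    bedrooms: int
    closet_square_feet: float
    sale_price: float  # the target we want to predict


# Made-up values for illustration only; not real listings.
houses = [
    House(square_feet=1200, bedrooms=2, closet_square_feet=20, sale_price=250_000),
    House(square_feet=1800, bedrooms=3, closet_square_feet=35, sale_price=340_000),
    House(square_feet=2400, bedrooms=4, closet_square_feet=50, sale_price=455_000),
]

# Feature engineering: a human decides which attributes the model gets to see.
FEATURE_NAMES = ["square_feet", "bedrooms", "closet_square_feet"]


def to_table(rows):
    """Return (feature rows, target values) as plain lists, one row per house."""
    X = [[getattr(h, name) for name in FEATURE_NAMES] for h in rows]
    y = [h.sale_price for h in rows]
    return X, y


X, y = to_table(houses)
for features, price in zip(X, y):
    print(features, "->", price)
```

Anything not listed in `FEATURE_NAMES` (a view, a school district) is invisible to the model, which is why the choice of features carries so much weight in this approach.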

### 8. Historical Data Patterns

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_8.png"
alt="Slide 8" />
<figcaption aria-hidden="true">Slide 8</figcaption>
</figure>

([Timestamp: 02:50](https://youtu.be/6NuGEukBfcA&t=170s))

This slide displays a scatter plot of “Sales Price” against “Square
Feet.” It illustrates how enterprises gather historical data and look
back at it for patterns and relationships.

Rajiv notes that much of traditional analytics is simply looking at this
historical data to understand what happened. However, the power of AI
lies in using this data for **forward-looking** purposes.

The visual clearly shows a trend: as square footage increases, the price
generally increases. This linear relationship is what the machine needs
to “learn.”

### 9. Learning the Model

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_9.png"
alt="Slide 9" />
<figcaption aria-hidden="true">Slide 9</figcaption>
</figure>

([Timestamp: 03:03](https://youtu.be/6NuGEukBfcA&t=183s))

Here, a line is drawn through the data points on the scatter plot. This
line represents the **Model**. Learning, in this context, is simply the
mathematical process of fitting this line to the historical data to
minimize error.

Rajiv explains that the model “understands the relationships” defined by
the data. Instead of a human manually writing rules, the algorithm finds
the best-fit trend based on the input features.

This simplifies the concept of training a model down to its essence:
finding a mathematical representation of a trend within a dataset.
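For a single feature, “fitting the line” has a closed-form answer: ordinary least squares, which minimizes the sum of squared errors. The sketch below is a minimal illustration under that assumption (in practice one would typically reach for a library such as scikit-learn’s `LinearRegression`), and the toy prices are invented so the arithmetic is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature).

    The slope is the covariance of x and y divided by the variance of x;
    the intercept makes the line pass through the point of means.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (made up): square feet vs. sale price.
square_feet = [1000, 1500, 2000, 2500, 3000]
prices = [210_000, 290_000, 405_000, 480_000, 610_000]

slope, intercept = fit_line(square_feet, prices)
print(f"price ~= {slope:.1f} * sqft + {intercept:,.0f}")
```

With these invented numbers the fit comes out to a slope of 198 dollars per square foot and an intercept of 3,000 dollars; that slope is the “relationship” the model has learned.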

### 10. Making Predictions

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_10.png"
alt="Slide 10" />
<figcaption aria-hidden="true">Slide 10</figcaption>
</figure>

([Timestamp: 03:16](https://youtu.be/6NuGEukBfcA&t=196s))

This slide demonstrates the utility of the trained model. When a “New
House” comes onto the market, the model uses the learned line to predict
its value based on its square footage.

This defines the **Inference** stage of machine learning. The model is
no longer learning; it is applying its “knowledge” (the line) to unseen
data to generate a prediction.

It highlights the portability of a model—once trained, it can be used to
make rapid assessments of new data points without human intervention.
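Inference is nothing more than evaluating the learned line at a new input. In this minimal sketch the slope and intercept are hypothetical numbers standing in for a model trained earlier; the function does no learning at all.

```python
def predict_price(slope, intercept, square_feet):
    """Inference: apply an already-learned line to a new input; nothing is updated."""
    return slope * square_feet + intercept


# Hypothetical coefficients standing in for the output of an earlier training run.
SLOPE = 198.0        # dollars per square foot
INTERCEPT = 3_000.0  # dollars

for sqft in [1750, 2200, 3100]:  # new listings the model has never seen
    print(f"{sqft} sq ft -> predicted ${predict_price(SLOPE, INTERCEPT, sqft):,.0f}")
```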

### 11. The Domain Limitation

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_11.png"
alt="Slide 11" />
<figcaption aria-hidden="true">Slide 11</figcaption>
</figure>

([Timestamp: 03:30](https://youtu.be/6NuGEukBfcA&t=210s))

The presentation introduces a critical limitation of traditional models.
The slide shows the model trained on San Francisco data being applied to
houses in South Carolina. The result is labeled “Poor Model.”

Rajiv explains that while you can technically take the model with you,
it will fail because the **underlying relationships** between features
(size) and targets (price) are different in different domains
(geographies).

This illustrates the concept of **Domain Shift** or lack of
generalization. A model is only as good as the data it was trained on,
and it assumes the future (or new location) looks exactly like the past.
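Domain shift can be shown numerically with a minimal sketch. The two datasets are invented and deliberately clean: the San Francisco set follows about 1,200 dollars per square foot and the South Carolina set about 200. A line fit on the first set is perfect on its own data and badly wrong on the second, even though the input feature is exactly the same.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, xs, ys):
    """Average size of the prediction error, in dollars."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Invented data: same feature, very different price per square foot.
sf_sqft, sf_price = [800, 1200, 1600, 2000], [960_000, 1_440_000, 1_920_000, 2_400_000]
sc_sqft, sc_price = [1200, 1800, 2400, 3000], [240_000, 360_000, 480_000, 600_000]

sf_model = fit_line(sf_sqft, sf_price)  # learned from San Francisco sales only
print("SF model on SF houses, MAE:", round(mean_absolute_error(sf_model, sf_sqft, sf_price)))
print("SF model on SC houses, MAE:", round(mean_absolute_error(sf_model, sc_sqft, sc_price)))
```

Nothing in the code is broken; the model is simply answering a question about a different market.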

### 12. The Thinking Emoji

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_12.png"
alt="Slide 12" />
<figcaption aria-hidden="true">Slide 12</figcaption>
</figure>

([Timestamp: 03:43](https://youtu.be/6NuGEukBfcA&t=223s))

This slide reinforces the previous point with a thinking emoji,
emphasizing the realization that the existing model is inadequate. The
“San Francisco Model” does not fit the “South Carolina Data.”

It serves as a visual pause to let the problem sink in: traditional
machine learning is brittle. It requires the data distribution to remain
constant.

Rajiv uses this to set up the labor-intensive nature of traditional
analytics, where models cannot simply be “transferred” across different
contexts.

### 13. Train New Model

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_13.png"
alt="Slide 13" />
<figcaption aria-hidden="true">Slide 13</figcaption>
</figure>

([Timestamp: 03:55](https://youtu.be/6NuGEukBfcA&t=235s))

The solution in the traditional paradigm is presented here: **“Train New
Model.”** To get accurate predictions for South Carolina, one must
collect local data and repeat the entire training process from scratch.

This highlights the “Never-Ending Battle” of enterprise analytics. Data
scientists are constantly retraining models for every specific region,
product line, or use case.

This sets the baseline for why Transfer Learning (introduced later) is
such a revolution. In the old way, knowledge was not portable; every
problem required a bespoke solution.
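In code, the traditional fix looks like a loop: one local dataset and one freshly trained model per region. This is a minimal sketch with invented numbers; every new region, product line, or use case would add another entry to collect data for and maintain.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# One invented training set per region; each requires its own local data collection.
regional_data = {
    "San Francisco": ([800, 1200, 1600, 2000], [960_000, 1_440_000, 1_920_000, 2_400_000]),
    "South Carolina": ([1200, 1800, 2400, 3000], [240_000, 360_000, 480_000, 600_000]),
}

# Artisan-style retraining: a separate model per region, trained from scratch,
# sharing no knowledge with the others.
models = {region: fit_line(xs, ys) for region, (xs, ys) in regional_data.items()}

for region, (slope, intercept) in models.items():
    print(f"{region}: about {slope:,.0f} dollars per sq ft (intercept {intercept:,.0f})")
```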

### 14. Artisan AI

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_14.png"
alt="Slide 14" />
<figcaption aria-hidden="true">Slide 14</figcaption>
</figure>

([Timestamp: 04:15](https://youtu.be/6NuGEukBfcA&t=255s))

Rajiv coins the term **“Artisan AI”** to describe this traditional
approach. The slide features an image of a craftsman, symbolizing that
these models are hand-built and rely heavily on human-crafted features.

This approach is slow and difficult to scale. Just as an artisan can
only produce a limited number of goods, a data science team using these
methods can only maintain a limited number of models.

It emphasizes that the intelligence in these systems comes largely from
the human who engineered the features, not the machine itself.

### 15. Enterprise AI Use Cases

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_15.png"
alt="Slide 15" />
<figcaption aria-hidden="true">Slide 15</figcaption>
</figure>

([Timestamp: 04:23](https://youtu.be/6NuGEukBfcA&t=263s))

This slide lists common Enterprise AI applications: Forecasting,
Pricing, Customer Churn, and Fraud. It notes that **80% of production
models** currently fall into this category.

Rajiv grounds the talk in the reality of today’s business world. Despite
the hype around Generative AI, most companies are still running on these
“Artisan” structured data models.

This distinction is crucial for understanding the market. There is “Old
AI” (highly effective, structured, labor-intensive) and “New AI”
(generative, unstructured, scalable), and they solve different problems.

### 16. The Computer Science Perspective

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_16.png"
alt="Slide 16" />
<figcaption aria-hidden="true">Slide 16</figcaption>
</figure>

([Timestamp: 04:41](https://youtu.be/6NuGEukBfcA&t=281s))

The presentation shifts from the enterprise view to the academic
Computer Science view. The slide asks, “How should we teach computers?”
signaling a move toward more advanced methodologies.

Rajiv indicates that computer scientists were trying to find ways to
move beyond the limitations of manual feature engineering. They wanted
machines to learn the features themselves.

This transition introduces the concept of **Deep Learning** and the move
toward processing unstructured data like audio, images, and text.

### 17. Frederick Jelinek’s Insight

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_17.png"
alt="Slide 17" />
<figcaption aria-hidden="true">Slide 17</figcaption>
</figure>

([Timestamp: 04:49](https://youtu.be/6NuGEukBfcA&t=289s))

This slide introduces a quote from Frederick Jelinek, a pioneer in
speech recognition: **“Every time I fire a linguist, the performance of
the speech recognizer goes up.”**

This provocative quote encapsulates a major shift in AI philosophy. It
suggests that human expertise (linguistics) often gets in the way of raw
data processing. Instead of hard-coding grammar rules, it is better to
let the model learn patterns directly from the data.

Rajiv asks the audience to “chew on that,” as it foreshadows the “Bitter
Lesson” of AI: massive compute and data often outperform human domain
expertise.

### 18. Computer Vision in 2010

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_18.png"
alt="Slide 18" />
<figcaption aria-hidden="true">Slide 18</figcaption>
</figure>

([Timestamp: 05:47](https://youtu.be/6NuGEukBfcA&t=347s))

The slide depicts the state of Computer Vision around 2010. It shows a
process of manual feature extraction (such as HOG, Histograms of
Oriented Gradients) used to identify shapes and edges.

Rajiv explains that even in vision, researchers were essentially doing
“Artisan AI.” They sat around thinking about how to mathematically
describe the shape of a car or a truck to a computer.

This illustrates that before the deep learning boom, computer vision was
stuck in the same “feature engineering” trap as tabular analytics.
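To give a flavor of what hand-designed vision features looked like, here is a heavily simplified, HOG-style descriptor in plain Python. It is a sketch, not the full algorithm: real HOG splits the image into many cells, interpolates votes between orientation bins, and normalizes over overlapping blocks. What survives is the core idea of summarizing an image patch as a histogram of edge directions, using rules a human designed rather than learned.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Simplified HOG-style descriptor for one cell of a grayscale image.

    image is a list of rows of pixel intensities. Each interior pixel votes
    for the direction of its brightness gradient, weighted by its strength.
    """
    height, width = len(image), len(image[0])
    bins = [0.0] * n_bins
    bin_width = 180.0 / n_bins  # unsigned orientations in [0, 180)
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # left-to-right change
            gy = image[r + 1][c] - image[r - 1][c]  # top-to-bottom change
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against float rounding that yields exactly 180.0.
            bins[int(angle // bin_width) % n_bins] += magnitude
    return bins


# A tiny image with a vertical edge: dark on the left, bright on the right.
vertical_edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
hist = orientation_histogram(vertical_edge)
print([round(v) for v in hist])
```

For the vertical edge, all of the gradient magnitude lands in the first (0-degree) bin, because brightness changes only from left to right. Describing a car or a truck this way meant deciding, by hand, which of these summaries would separate them.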

### 19. SVM Classification

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_19.png"
alt="Slide 19" />
<figcaption aria-hidden="true">Slide 19</figcaption>
</figure>

([Timestamp: 06:05](https://youtu.be/6NuGEukBfcA&t=365s))

Following feature extraction, this slide shows a **Support Vector
Machine (SVM)** classifier separating data points (cars vs. trucks).
This was the standard approach: extract features manually, then use a
simple algorithm to classify them.

This reinforces the previous point about the limitations of the time.
The intelligence was in the manual extraction, not the classification
model.

Rajiv mentions his own work at Caterpillar, noting that this was exactly
how they tried to separate images of machinery—a tedious and specific
process.
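The classic two-step pipeline (hand-crafted features, then a linear classifier) can be sketched in a few lines. This version trains a linear SVM by plain subgradient descent on the L2-regularized hinge loss; that optimizer, the learning-rate and regularization values, and the two already-standardized features are illustrative assumptions, not a reconstruction of any production system. A library class such as scikit-learn’s `LinearSVC` would normally do this job.

```python
def train_linear_svm(points, labels, epochs=100, learning_rate=0.1, reg=0.01):
    """Fit (w, b) for a linear SVM by subgradient descent on the regularized hinge loss.

    labels must be +1 or -1. Points are visited in a fixed order, so runs are repeatable.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Misclassified or inside the margin: step toward classifying x correctly.
                w = [wi - learning_rate * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Comfortably correct: only the regularizer shrinks w.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def classify(w, b, x):
    """+1 on one side of the learned boundary, -1 on the other."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical hand-engineered features, already standardized (made-up values).
# Label +1 = truck, -1 = car.
points = [(-2, -2), (2, 2), (-2, -1), (2, 1), (-1, -2), (1, 2)]
labels = [-1, 1, -1, 1, -1, 1]

w, b = train_linear_svm(points, labels)
print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
print("predictions:", [classify(w, b, x) for x in points])
```

The classifier itself is only a weighted sum and a threshold; whether it works depends almost entirely on whether a human found features that put cars and trucks on opposite sides of a line.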

### 20. Fei-Fei Li and Big Data

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_20.png"
alt="Slide 20" />
<figcaption aria-hidden="true">Slide 20</figcaption>
</figure>

([Timestamp: 06:15](https://youtu.be/6NuGEukBfcA&t=375s))

The slide introduces **Professor Fei-Fei Li**, a visionary in computer
vision. It features a collage of images, hinting at the need for scale.

Rajiv explains that Fei-Fei Li recognized that for computer vision to
advance, it needed to move away from tiny datasets (100-200 images) and
toward massive scale. She understood that deep learning required vast
amounts of data to generalize.

This marks the beginning of the “Big Data” era in AI, where the focus
shifted from better algorithms to better and larger datasets.

### 21. ImageNet

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_21.png"
alt="Slide 21" />
<figcaption aria-hidden="true">Slide 21</figcaption>
</figure>

([Timestamp: 06:41](https://youtu.be/6NuGEukBfcA&t=401s))

This slide details **ImageNet**, the dataset Fei-Fei Li helped create.
It contains more than **14 million images**, and the ImageNet challenge
used a subset labeled with **1000 classes**.

Rajiv highlights the sheer effort involved, noting the use of
**Mechanical Turk** to crowdsource the labeling of these images. He
calls this the “dirty secret” of AI—that it is powered by low-wage human
labor labeling data.

ImageNet became the benchmark that drove the AI revolution. It provided
the “fuel” necessary for neural networks to finally work.

### 22. AlexNet and GPUs

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_22.png"
alt="Slide 22" />
<figcaption aria-hidden="true">Slide 22</figcaption>
</figure>

([Timestamp: 07:44](https://youtu.be/6NuGEukBfcA&t=464s))

The presentation introduces **Alex Krizhevsky**, a graduate student
under Geoffrey Hinton. The slide mentions “AlexNet” and the use of GPUs
(Graphics Processing Units).

Rajiv tells the story of how Alex decided to use NVIDIA gaming cards to
train neural networks. Traditional CPUs were too slow for the math
required by deep learning.

This moment—combining the massive ImageNet dataset with the parallel
processing power of GPUs—was the “big bang” of modern AI.

### 23. AlexNet Training Details

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_23.png"
alt="Slide 23" />
<figcaption aria-hidden="true">Slide 23</figcaption>
</figure>

([Timestamp: 08:05](https://youtu.be/6NuGEukBfcA&t=485s))

This slide provides the technical specs of AlexNet: trained on **1.2
million images**, using **2 GPUs**, taking roughly **6 days**, with **60
million parameters**.

Rajiv emphasizes that while 6 days seems long, the result was a model
vastly superior to anything else. It proved that neural networks, which
had been theoretical for decades, were now practical.

The “60 million parameters” figure is a precursor to the “billions” and
“trillions” we see today, marking the start of the parameter scaling
race.
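The “60 million parameters” figure can be checked with simple arithmetic: a convolutional layer has one weight per (filter, input channel, kernel row, kernel column) plus one bias per filter, and a fully connected layer has one weight per input-output pair plus one bias per output. The sketch assumes the layer shapes of the single-GPU AlexNet variant used by torchvision’s `alexnet` model; the original two-GPU network split some layers across the cards, so its exact count differs slightly.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Square kernels: one weight per (filter, channel, row, col) plus one bias per filter."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def dense_params(in_features, out_features):
    """One weight per input/output pair plus one bias per output."""
    return in_features * out_features + out_features


# Layer shapes of the single-GPU AlexNet variant (assumed, as in torchvision).
layers = {
    "conv1": conv_params(3, 64, 11),
    "conv2": conv_params(64, 192, 5),
    "conv3": conv_params(192, 384, 3),
    "conv4": conv_params(384, 256, 3),
    "conv5": conv_params(256, 256, 3),
    "fc6": dense_params(256 * 6 * 6, 4096),
    "fc7": dense_params(4096, 4096),
    "fc8": dense_params(4096, 1000),  # one output per ImageNet class
}

total = sum(layers.values())
fully_connected = layers["fc6"] + layers["fc7"] + layers["fc8"]
print(f"total parameters: {total:,}")
print(f"share in fully connected layers: {fully_connected / total:.0%}")
```

The total, 61,100,840, lands right around the figure on the slide, and roughly 96% of it sits in the three fully connected layers at the end of the network.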

### 24. Crushing the Competition

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_24.png"
alt="Slide 24" />
<figcaption aria-hidden="true">Slide 24</figcaption>
</figure>

([Timestamp: 08:27](https://youtu.be/6NuGEukBfcA&t=507s))

A chart displays the results of the ImageNet Large Scale Visual
Recognition Challenge. It shows AlexNet achieving a significantly lower
error rate than the competitors.

Rajiv notes that the performance jump was so dramatic that by the
following year, **every competitor** had switched to using the AlexNet
architecture.

This visualizes the paradigm shift. The “Artisan” methods were instantly
obsolete, replaced by Deep Learning.

### 25. Feature Engineering vs. Deep Learning

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_25.png"
alt="Slide 25" />
<figcaption aria-hidden="true">Slide 25</figcaption>
</figure>

([Timestamp: 08:37](https://youtu.be/6NuGEukBfcA&t=517s))

Using a humorous meme format, this slide compares the “Old Way” (Feature
Engineering + SVM) with the “New Way” (AlexNet). The AlexNet side is
depicted as a powerful, overwhelming force.

This solidifies the takeaway: Deep Learning didn’t just improve upon the
old methods; it completely replaced them for unstructured data tasks
like vision.

It emphasizes that the model learned the features itself (edges,
textures, shapes) rather than having humans manually code them.

### 26. The 1000 Classes

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_26.png"
alt="Slide 26" />
<figcaption aria-hidden="true">Slide 26</figcaption>
</figure>

([Timestamp: 08:51](https://youtu.be/6NuGEukBfcA&t=531s))

This slide shows examples of the **1000 classes** in ImageNet, ranging
from specific dog breeds to everyday objects.

Rajiv explains that this model learned to identify a vast array of
things directly from raw pixels, building up from edges to textures,
shapes, and whole objects.

However, he sets up the next problem: What if you want to identify
something *not* in those 1000 classes?

### 27. The Hot Dog Problem

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_27.png"
alt="Slide 27" />
<figcaption aria-hidden="true">Slide 27</figcaption>
</figure>

([Timestamp: 09:01](https://youtu.be/6NuGEukBfcA&t=541s))

Referencing a famous scene from the show *Silicon Valley*, this slide
presents the specific challenge of classifying “Hot Dogs.”

Rajiv uses this to ask: How do you help a buddy with a startup who needs
to find hot dogs if “hot dog” isn’t one of the primary categories, or if
they need a specific *type* of hot dog? Do you have to start from
scratch?

This sets the stage for **Transfer Learning**, the way to avoid needing
14 million images every time you have a new problem.

### 28. Pre-Trained Models

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_28.png"
alt="Slide 28" />
<figcaption aria-hidden="true">Slide 28</figcaption>
</figure>

([Timestamp: 09:12](https://youtu.be/6NuGEukBfcA&t=552s))

The slide introduces the concept of a **Pre-trained Model**. This is the
model that has already learned the 1000 classes from ImageNet.

Rajiv explains that this model already “knows” how to see. It
understands edges, curves, and textures. This knowledge is contained in
the “weights” of the neural network.

The key idea is that we don’t need to relearn how to “see” every time we
want to identify a new object.

### 29. Transfer Learning Mechanics

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_29.png"
alt="Slide 29" />
<figcaption aria-hidden="true">Slide 29</figcaption>
</figure>

([Timestamp: 09:30](https://youtu.be/6NuGEukBfcA&t=570s))

This technical slide illustrates how **Transfer Learning** works. It
shows the layers of a neural network. We keep the early layers (which
know shapes and textures) and only retrain the final layers for the new
task (e.g., identifying boats).

Rajiv explains that we can transfer “most of that knowledge” and only
change a **small number of parameters** (less than 10%).

This is the revolution: You can build a world-class model with a *small*
amount of data by standing on the shoulders of the giant ImageNet model.
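The bookkeeping behind “only change a small number of parameters” can be sketched as follows. It assumes an AlexNet-style network (per-layer weight-plus-bias counts from the single-GPU torchvision variant) whose 1000-class output layer is swapped for a hypothetical 2-class head; in a framework such as PyTorch, freezing corresponds to setting `requires_grad = False` on the reused parameters so the optimizer never changes them.

```python
def dense_params(in_features, out_features):
    """One weight per input/output pair plus one bias per output."""
    return in_features * out_features + out_features


# Pretrained layers reused as-is (parameter counts, AlexNet-style; assumed shapes).
backbone = {
    "conv1": 23_296, "conv2": 307_392, "conv3": 663_936,
    "conv4": 884_992, "conv5": 590_080,
    "fc6": 37_752_832, "fc7": 16_781_312,
}
# Hypothetical new head replacing the 1000-class layer, e.g. "boat" vs. "not boat".
new_head = {"fc8_new": dense_params(4096, 2)}

trainable = {name: False for name in backbone}        # frozen: keeps its "vision"
trainable.update({name: True for name in new_head})   # only the head is learned

counts = {**backbone, **new_head}
n_trainable = sum(n for name, n in counts.items() if trainable[name])
n_total = sum(counts.values())
print(f"trainable: {n_trainable:,} of {n_total:,} ({n_trainable / n_total:.3%})")
```

Under these assumptions, retraining only the new head touches about 0.014% of the weights. Unfreezing `fc7` as well would raise that to roughly 29%, so how many layers to retrain is itself a design choice that trades data requirements against flexibility.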

### 30. The Revolution

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_30.png"
alt="Slide 30" />
<figcaption aria-hidden="true">Slide 30</figcaption>
</figure>

([Timestamp: 09:54](https://youtu.be/6NuGEukBfcA&t=594s))

A graph titled “Transfer Learning Revolution” shows the dramatic
improvement in accuracy when using transfer learning versus training
from scratch. It includes a quote from **Andrew Ng** stating that
transfer learning will be the next driver of commercial success.

Rajiv emphasizes that this capability allowed startups and companies to
build powerful AI without needing Google-sized datasets. It democratized
access to high-performance computer vision.

This wraps up the vision section of the talk, establishing Transfer
Learning as the “Spark.”

### 31. The Implications

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_31.png"
alt="Slide 31" />
<figcaption aria-hidden="true">Slide 31</figcaption>
</figure>

([Timestamp: 10:11](https://youtu.be/6NuGEukBfcA&t=611s))

The slide shows a YouTube video thumbnail from 2016 featuring Geoffrey
Hinton. This transitions the talk to the societal and professional
implications of this technology.

Rajiv prepares to share a famous prediction by Hinton regarding the
medical field, specifically radiology. It signals a shift from “how it
works” to “what it does to jobs.”

### 32. The Coyote Moment

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_32.png"
alt="Slide 32" />
<figcaption aria-hidden="true">Slide 32</figcaption>
</figure>

([Timestamp: 10:19](https://youtu.be/6NuGEukBfcA&t=619s))

The slide displays a webpage for the University of Cincinnati Radiology
Fellows. Rajiv quotes Hinton: **“Radiologists are like the coyote that’s
already over the edge of the cliff but hasn’t yet looked down.”**

Hinton suggested that people should stop training radiologists because
deep learning would soon interpret medical images better than they
could. Rajiv humorously notes that since he was speaking *at* the
University of Cincinnati, he had to show the “coyotes” in the audience.

This highlights the tension between AI capabilities and human expertise,
a recurring theme in the presentation.

### 33. NLP: The Academic View

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_33.png"
alt="Slide 33" />
<figcaption aria-hidden="true">Slide 33</figcaption>
</figure>

([Timestamp: 10:42](https://youtu.be/6NuGEukBfcA&t=642s))

The presentation switches domains from Computer Vision to **Natural
Language Processing (NLP)**. The slide depicts a traditional academic
setting, representing the text researchers.

Rajiv explains that while Computer Vision was having its revolution with
AlexNet, the text folks were still doing things the “Old Way”—crafting
features and rules for language.

They saw the success in vision and wondered how to replicate it for
text, but language initially proved harder to model than images.

### 34. Traditional NLP Tasks

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_34.png"
alt="Slide 34" />
<figcaption aria-hidden="true">Slide 34</figcaption>
</figure>

([Timestamp: 11:04](https://youtu.be/6NuGEukBfcA&t=664s))

This slide lists various NLP tasks: Classification, Information
Extraction, and Sentiment Analysis.

Rajiv notes that traditionally, each of these was a **separate
discipline**. You built a specific model for sentiment, a different one
for translation, and another for summarization. There was no “one model
to rule them all.”

This fragmentation made NLP difficult and resource-intensive, as
knowledge didn’t transfer between tasks.

### 35. The GLUE Benchmark

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_35.png"
alt="Slide 35" />
<figcaption aria-hidden="true">Slide 35</figcaption>
</figure>

([Timestamp: 11:25](https://youtu.be/6NuGEukBfcA&t=685s))

The slide introduces the **GLUE Benchmark** (General Language
Understanding Evaluation). This was a collection of different text tasks
put together to measure general language ability.

Rajiv explains this was an attempt to push the field toward
general-purpose models. Researchers wanted a single metric to see if a
model could understand language broadly, not just solve one specific
trick.

### 36. The Transformer Architecture

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_36.png"
alt="Slide 36" />
<figcaption aria-hidden="true">Slide 36</figcaption>
</figure>

([Timestamp: 11:37](https://youtu.be/6NuGEukBfcA&t=697s))

This slide marks the turning point for text: the introduction of the
**Transformer** architecture by Google researchers in 2017 (the
“Attention Is All You Need” paper).

Rajiv highlights that this architecture was not only more accurate
(higher BLEU scores on machine translation) but, crucially, more
efficient to train.

The Transformer allowed for parallel processing of text, unlike previous
sequential models (RNNs/LSTMs), unlocking the ability to train on
massive datasets.
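The core operation is scaled dot-product attention, `softmax(Q Kᵀ / sqrt(d)) V`, as written in the “Attention Is All You Need” paper. The sketch below is plain Python for readability, not how any real library implements it; it leaves out the learned projection matrices and multiple heads, and the token vectors are made up. The comment inside the function marks why this parallelizes where an RNN does not.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one output row per query."""
    d = len(keys[0])
    outputs = []
    # Each query's output depends only on Q, K and V, never on another position's
    # output, so all positions can be computed at once (one matrix multiply on a GPU).
    # An RNN instead has to wait for the hidden state of the previous token.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Toy 2-dimensional token vectors (made up). In a real Transformer, Q, K and V come
# from learned projections of the token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens):  # self-attention: Q = K = V
    print([round(x, 3) for x in row])
```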

### 37. Lower Training Costs

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_37.png"
alt="Slide 37" />
<figcaption aria-hidden="true">Slide 37</figcaption>
</figure>

([Timestamp: 11:46](https://youtu.be/6NuGEukBfcA&t=706s))

The slide emphasizes the **Training Cost** reduction associated with
Transformers.

Rajiv points out that because the architecture used less processing
power per unit of data, researchers immediately asked: “What happens if
we give it *more* processing?”

This efficiency paradox—making something cheaper allows you to do vastly
more of it—sparked the scaling era of LLMs.

### 38. Exponential Growth

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_38.png"
alt="Slide 38" />
<figcaption aria-hidden="true">Slide 38</figcaption>
</figure>

([Timestamp: 11:56](https://youtu.be/6NuGEukBfcA&t=716s))

A graph demonstrates the exponential growth in the size of Transformer
models (measured in parameters) over just a few years, with the curve
climbing almost vertically.

Rajiv explains that this scaling—simply making the models bigger and
feeding them more data—led to the performance of GPT-4.

This visualizes the “Scale” aspect of modern AI. We haven’t necessarily
changed the architecture since 2017; we’ve just made it significantly
larger.

### 39. GPT-4 and Images

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_39.png"
alt="Slide 39" />
<figcaption aria-hidden="true">Slide 39</figcaption>
</figure>

([Timestamp: 12:11](https://youtu.be/6NuGEukBfcA&t=731s))

The presentation circles back to the GPT-4-generated drawings from
Slide 2.

Rajiv connects the Transformer architecture and scaling directly to
these “Sparks of AGI.” The ability to reason and draw emerged from
simply predicting the next word at a massive scale.

### 40. The Era of ChatGPT

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_40.png"
alt="Slide 40" />
<figcaption aria-hidden="true">Slide 40</figcaption>
</figure>

([Timestamp: 12:16](https://youtu.be/6NuGEukBfcA&t=736s))

The slide displays the ChatGPT logo, symbolizing the current era where
these technical advancements reached the public consciousness.

Rajiv sets up the next section of the talk: explaining exactly **how** a
model like ChatGPT is trained. He moves from history to the “Recipe.”

### 41. The Learning Process

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_41.png"
alt="Slide 41" />
<figcaption aria-hidden="true">Slide 41</figcaption>
</figure>

([Timestamp: 12:20](https://youtu.be/6NuGEukBfcA&t=740s))

A diagram outlines the stages that produce ChatGPT. It previews the
three steps Rajiv will cover: Pre-training, Fine-tuning, and Alignment.

This roadmap helps the audience understand that ChatGPT isn’t just one
static thing; it’s the result of a multi-stage pipeline involving
different types of learning.

### 42. Recipe Step 1: Foundation Model

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_42.png"
alt="Slide 42" />
<figcaption aria-hidden="true">Slide 42</figcaption>
</figure>

([Timestamp: 12:27](https://youtu.be/6NuGEukBfcA&t=747s))

The first step identified is the **“Foundation Model”** (or Base Model).

Rajiv explains that the core capability of these models is **Next Word
Prediction**. Before it can answer questions or be helpful, it must
simply learn the statistical structure of language.

### 43. Predictive Keyboards

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_43.png"
alt="Slide 43" />
<figcaption aria-hidden="true">Slide 43</figcaption>
</figure>

([Timestamp: 12:31](https://youtu.be/6NuGEukBfcA&t=751s))

To make the concept relatable, the slide compares LLMs to the
**predictive text** feature on a smartphone keyboard.

Rajiv notes that while predictive text on your phone is simple, scaling
the same idea up to training data drawn from the entire internet makes
it incredibly powerful. It grounds the “magic” of AI in a familiar user
experience.
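To make the keyboard analogy concrete, here is a minimal sketch of a
word-level bigram “predictive keyboard”: it counts which word follows
which in a tiny corpus and suggests the most frequent followers. The
corpus and the helper names (`train_bigrams`, `suggest`) are hypothetical
illustrations, not anything shown in the talk; an LLM replaces these raw
counts with a neural network trained on vastly more text and a much
longer context.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    followers: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def suggest(followers: dict[str, Counter], word: str, k: int = 3) -> list[str]:
    """Return up to k most frequent next words, like a keyboard's suggestion bar."""
    return [w for w, _ in followers[word.lower()].most_common(k)]


# Hypothetical toy corpus (a nod to the Homer Simpson example later in the talk).
corpus = "i want a donut . i want a beer . i want to sleep . i love a donut"
model = train_bigrams(corpus)
print(suggest(model, "want"))  # ['a', 'to']
print(suggest(model, "a"))     # ['donut', 'beer']
```

The design choice is deliberately naive: the model only ever sees one
previous word, which is exactly the limitation that larger context
windows and Transformers remove.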

### 44. Next Token Prediction

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_44.png"
alt="Slide 44" />
<figcaption aria-hidden="true">Slide 44</figcaption>
</figure>

([Timestamp: 12:40](https://youtu.be/6NuGEukBfcA&t=760s))

This technical slide defines **“Next Token Prediction.”** It explains
that the model looks at a sequence of text and calculates the
probability of what comes next.

Rajiv emphasizes that this is a hard statistical problem. There are many
possibilities for the next word, and the model must learn to weigh them
based on context.
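In practice a model outputs a raw score (a logit) for every token in its
vocabulary, and a softmax turns those scores into probabilities. The
sketch below uses four made-up candidate tokens and scores purely for
illustration; real vocabularies contain tens of thousands of tokens, and
the `temperature` knob is a common decoding setting rather than
something described on the slide.

```python
import math
import random


def softmax(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Turn raw scores into a probability distribution over candidate tokens."""
    scaled = {tok: s / temperature for tok, s in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores a model might assign after the context "The cat sat on the"
logits = {"mat": 3.0, "sofa": 2.0, "roof": 1.0, "moon": -1.0}
probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>5}: {p:.3f}")

# Greedy decoding takes the most likely token; sampling draws from the distribution.
greedy = max(probs, key=probs.get)
sampled = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print("greedy:", greedy, "| sampled:", sampled)
```

Lowering the temperature sharpens the distribution toward the top
choice; raising it spreads probability onto less likely words, which is
one reason the same prompt can produce different outputs.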

### 45. The Homer Simpson Challenge

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_45.png"
alt="Slide 45" />
<figcaption aria-hidden="true">Slide 45</figcaption>
</figure>

([Timestamp: 13:05](https://youtu.be/6NuGEukBfcA&t=785s))

Rajiv introduces a specific experiment: Training a Transformer to speak
like **Homer Simpson**. He mentions using 7MB of Simpsons scripts (~7
million tokens).

This serves as a concrete example to show how training data size affects
model performance.
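The “7MB ≈ 7 million tokens” figure is consistent with character-level
tokenization, where each character (one byte of plain ASCII text) is a
token. Assuming that setup (the talk does not spell out the tokenizer),
a minimal character tokenizer looks like this; the sample line is a
hypothetical stand-in for the Simpsons scripts.

```python
def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    """Map each distinct character to an integer id (and back)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text: str, stoi: dict[str, int]) -> list[int]:
    """One token id per character."""
    return [stoi[ch] for ch in text]


def decode(ids: list[int], itos: dict[int, str]) -> str:
    """Inverse of encode."""
    return "".join(itos[i] for i in ids)


sample = "Homer: Mmm... donuts."
stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
print(len(sample.encode("utf-8")), "bytes ->", len(ids), "tokens")
print(ids[:10])
assert decode(ids, itos) == sample  # the round trip is lossless

# At one token per byte, 7 MB of ASCII text is on the order of 7 million tokens.
```

Character-level tokens keep the vocabulary tiny, but the model must
learn spelling before it can learn words, which fits the random
characters seen in the smallest training run.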

### 46. 4 Million Tokens

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_46.png"
alt="Slide 46" />
<figcaption aria-hidden="true">Slide 46</figcaption>
</figure>

([Timestamp: 13:32](https://youtu.be/6NuGEukBfcA&t=812s))

The slide shows the output of the model when trained on only **4 Million
tokens**. The text is “nonsensical and random.”

Rajiv demonstrates that with insufficient data, the model hasn’t learned
grammar or structure yet; it is essentially emitting random characters.

### 47. 16 Million Tokens

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_47.png"
alt="Slide 47" />
<figcaption aria-hidden="true">Slide 47</figcaption>
</figure>

([Timestamp: 13:37](https://youtu.be/6NuGEukBfcA&t=817s))

At **16 Million tokens**, the output improves slightly. It contains
random words and incorrect grammar, but it’s recognizable as language.

This illustrates an early phase in which the model starts to pick up
basic syntax but still lacks semantic meaning.

### 48. 64 Million Tokens

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_48.png"
alt="Slide 48" />
<figcaption aria-hidden="true">Slide 48</figcaption>
</figure>

([Timestamp: 13:39](https://youtu.be/6NuGEukBfcA&t=819s))

With **64 Million tokens**, the model generates text that is “close to a
proper sentence” and sounds vaguely like Homer Simpson.

Rajiv uses this progression to show that these models are statistical
engines: with enough data, they effectively mimic the patterns of the
training set.

### 49. GPT-2 Specifications

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_49.png"
alt="Slide 49" />
<figcaption aria-hidden="true">Slide 49</figcaption>
</figure>

([Timestamp: 13:54](https://youtu.be/6NuGEukBfcA&t=834s))

The slide details **GPT-2** (released in 2019), which had **1.5 Billion
parameters**.

Rajiv recalls that when GPT-2 came out, he wasn’t excited because it was
just a “creative storytelling model.” It wasn’t factually accurate. He
wants the audience to remember that at their core, these models are just
predicting the next word, not checking facts.

### 50. Llama 3.1 and Scale

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_50.png"
alt="Slide 50" />
<figcaption aria-hidden="true">Slide 50</figcaption>
</figure>

([Timestamp: 14:27](https://youtu.be/6NuGEukBfcA&t=867s))

Updating the timeline, this slide shows **Llama 3.1**. It highlights the
training data: **15 Trillion Tokens** and the compute: **40 Million GPU
Hours**.

Rajiv emphasizes that 15 trillion tokens is an “unfathomable amount of
information.” The scale has increased 10,000x since GPT-2.

This underscores the energy and compute intensity of modern AI—it
requires massive infrastructure.
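To put these “unfathomable” numbers in perspective, here is some
back-of-the-envelope arithmetic using only the figures quoted on the
slides. Converting GPU hours to GPU-years assumes a 365.25-day year, and
the comparison with the 64M-token Homer Simpson run from the earlier
slides is just an illustrative yardstick.

```python
# Figures as quoted on the slides
llama_tokens = 15e12       # Llama 3.1 training tokens
llama_gpu_hours = 40e6     # Llama 3.1 GPU hours
homer_tokens = 64e6        # largest Homer Simpson run from the earlier slides

hours_per_year = 24 * 365.25
gpu_years = llama_gpu_hours / hours_per_year
print(f"{gpu_years:,.0f} GPU-years")  # prints "4,563 GPU-years"

ratio = llama_tokens / homer_tokens
print(f"{ratio:,.0f}x the tokens of the 64M-token Homer model")  # 234,375x
```

In other words, a single GPU would need thousands of years to do the
work, which is why training runs of this size are spread across very
large clusters.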

### 51. Hallucinations

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_51.png"
alt="Slide 51" />
<figcaption aria-hidden="true">Slide 51</figcaption>
</figure>

([Timestamp: 15:15](https://youtu.be/6NuGEukBfcA&t=915s))

This slide addresses **Hallucinations**. It uses an example of asking
for the “Capital of Mars.” The model will confidently invent an answer.

Rajiv argues that “hallucination” isn’t the right metaphor because the
model isn’t malfunctioning. It is doing exactly what it was designed to
do: predict the most likely next word. It has no concept of “truth,”
only statistical likelihood.
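The point can be seen in a toy decoding step: the model always produces
a probability distribution, and decoding always picks something from it.
The prompt and probabilities below are invented for illustration (no
real model was queried); a refusal such as “unknown” only wins if
training has made it the most likely continuation.

```python
# Hypothetical next-token probabilities after the prompt "The capital of Mars is".
next_token_probs = {
    "Olympus": 0.31,
    "New": 0.22,
    "Ares": 0.18,
    "unknown": 0.04,
    # ... the rest of the vocabulary shares the remaining probability mass
}


def greedy_next(probs: dict[str, float]) -> str:
    """Pick the single most likely token; nothing here checks whether it is true."""
    return max(probs, key=probs.get)


print("The capital of Mars is", greedy_next(next_token_probs))
```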

### 52. GPT-2 Failure on Sentiment

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_52.png"
alt="Slide 52" />
<figcaption aria-hidden="true">Slide 52</figcaption>
</figure>

([Timestamp: 16:17](https://youtu.be/6NuGEukBfcA&t=977s))

Rajiv shows an example of trying to use the base GPT-2 model for a
specific task: **Customer Sentiment**. When prompted, the model just
continues the story instead of classifying the sentiment.

This illustrates that Base Models are creative but **not useful for
following instructions**. They don’t know they are supposed to solve a
problem; they just want to write text.

### 53. Recipe Step 2: Instruction Fine-Tuned

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_53.png"
alt="Slide 53" />
<figcaption aria-hidden="true">Slide 53</figcaption>
</figure>

([Timestamp: 16:38](https://youtu.be/6NuGEukBfcA&t=998s))

This introduces the second step in the ChatGPT recipe: **“Instruction
Fine-Tuned Model.”**

Rajiv explains that to make the model useful, we must teach it to follow
orders. This is done via Transfer Learning—taking the base model and
training it further on examples of instructions and answers.

### 54. Fine-Tuning for Sentiment

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_54.png"
alt="Slide 54" />
<figcaption aria-hidden="true">Slide 54</figcaption>
</figure>

([Timestamp: 16:46](https://youtu.be/6NuGEukBfcA&t=1006s))

The slide shows the process of fine-tuning the language model
specifically for **Sentiment Analysis**.

By showing the model examples of “Sentence -\> Sentiment,” we can tweak
the parameters so it learns to perform classification rather than just
storytelling.
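Concretely, “Sentence -\> Sentiment” examples are usually rewritten as
text the model can continue, so fine-tuning is still next-token
prediction, just on a narrower kind of text. The template and the field
names (`prompt`, `completion`) below are illustrative assumptions, not
the format used in the talk; during training the loss is often computed
only on the completion tokens.

```python
# Hypothetical labeled examples; a real run would use thousands.
examples = [
    ("The delivery was fast and the food was great.", "positive"),
    ("I waited an hour and my order was wrong.", "negative"),
]

TEMPLATE = "Classify the sentiment of this review.\nReview: {text}\nSentiment:"


def to_training_pair(text: str, label: str) -> dict[str, str]:
    """Turn one labeled example into a prompt/completion pair for fine-tuning."""
    return {"prompt": TEMPLATE.format(text=text), "completion": " " + label}


dataset = [to_training_pair(t, y) for t, y in examples]
print(dataset[0]["prompt"] + dataset[0]["completion"])
```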

### 55. Multi-Task Fine-Tuning

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_55.png"
alt="Slide 55" />
<figcaption aria-hidden="true">Slide 55</figcaption>
</figure>

([Timestamp: 17:22](https://youtu.be/6NuGEukBfcA&t=1042s))

Rajiv expands the concept. We don’t just fine-tune for one task; we
fine-tune for **Topic Classification** as well.

The key insight is that **one model** can now solve multiple problems.
Unlike the “Old NLP” where you needed separate models, the LLM can swap
between tasks based on the instruction.

### 56. Translation Task

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_56.png"
alt="Slide 56" />
<figcaption aria-hidden="true">Slide 56</figcaption>
</figure>

([Timestamp: 17:24](https://youtu.be/6NuGEukBfcA&t=1044s))

The slide adds **Translation** to the mix, using about 10,000 examples.

This reinforces the “General Purpose” nature of LLMs. They are Swiss
Army knives for text.
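A minimal sketch of how such a multi-task dataset might be assembled:
every example carries its task instruction in the prompt, and all tasks
are pooled and shuffled so one model sees them together. The
instructions, examples, and the `build_mixture` helper are hypothetical;
real instruction-tuning sets span many more tasks and examples (the
slide mentions about 10,000 for translation alone).

```python
import random

# Hypothetical examples for three tasks, each tagged with a plain-language instruction.
tasks = {
    "sentiment": ("Classify the sentiment (positive/negative).",
                  [("Loved it!", "positive"), ("Never again.", "negative")]),
    "topic": ("Classify the topic (sports/politics/tech).",
              [("The Bengals won on Sunday.", "sports"),
               ("A new GPU was announced.", "tech")]),
    "translation": ("Translate English to French.",
                    [("Good morning", "Bonjour"), ("Thank you", "Merci")]),
}


def build_mixture(tasks, seed: int = 0) -> list[dict[str, str]]:
    """Pool every task into one shuffled dataset so a single model learns them all."""
    rows = [
        {"prompt": f"{instruction}\nInput: {text}\nOutput:", "completion": f" {answer}"}
        for instruction, pairs in tasks.values()
        for text, answer in pairs
    ]
    random.Random(seed).shuffle(rows)  # fixed seed keeps the mixture reproducible
    return rows


mixture = build_mixture(tasks)
print(len(mixture), "examples across", len(tasks), "tasks")
print(mixture[0]["prompt"])
```

Because the instruction travels with each example, the same model can
switch tasks at inference time simply by reading a different
instruction.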

### 57. Generalization to New Tasks

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_57.png"
alt="Slide 57" />
<figcaption aria-hidden="true">Slide 57</figcaption>
</figure>

([Timestamp: 17:30](https://youtu.be/6NuGEukBfcA&t=1050s))

Rajiv poses a challenge: What happens if you give the model a task it
**hasn’t** seen before?

The slide indicates the model will try to solve it. This is the
breakthrough of **Generalization**. Because it understands language so
well, it can interpolate and attempt tasks it wasn’t explicitly trained
on.

### 58. Practical Applications

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_58.png"
alt="Slide 58" />
<figcaption aria-hidden="true">Slide 58</figcaption>
</figure>

([Timestamp: 17:55](https://youtu.be/6NuGEukBfcA&t=1075s))

This slide showcases the wide array of use cases: Code explanation,
Creative writing, Information extraction, etc.

Rajiv explains that these capabilities exist because we have “trained
these models to follow instructions.” This is why we can talk to them
via **Prompts**.

### 59. Zero Shot Learning

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_59.png"
alt="Slide 59" />
<figcaption aria-hidden="true">Slide 59</figcaption>
</figure>

([Timestamp: 18:19](https://youtu.be/6NuGEukBfcA&t=1099s))

The slide introduces **“Zero shot learning”** and **“Prompting.”**

This is the ability to get a result without showing the model any
examples (zero shots). Rajiv notes that there is a “whole language”
around prompting, but fundamentally, it’s just giving the model the
instruction we trained it to expect.
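To make the terminology concrete, here is the difference between a
zero-shot prompt and a few-shot prompt for the sentiment task, written
as plain string templates. The wording is a hypothetical example;
either string would then be sent to an instruction-tuned model through
whatever API you use.

```python
def zero_shot_prompt(review: str) -> str:
    """Instruction only: no solved examples are shown to the model."""
    return ("Classify the sentiment of this review as positive or negative.\n"
            f"Review: {review}\nSentiment:")


def few_shot_prompt(review: str, shots: list[tuple[str, str]]) -> str:
    """Same instruction, preceded by a handful of solved examples ("shots")."""
    demos = "".join(f"Review: {r}\nSentiment: {s}\n\n" for r, s in shots)
    return ("Classify the sentiment of each review as positive or negative.\n\n"
            f"{demos}Review: {review}\nSentiment:")


print(zero_shot_prompt("The battery died after two days."))
print("---")
print(few_shot_prompt("The battery died after two days.",
                      [("Works perfectly.", "positive"), ("Arrived broken.", "negative")]))
```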

### 60. Weeks vs. Days

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_60.png"
alt="Slide 60" />
<figcaption aria-hidden="true">Slide 60</figcaption>
</figure>

([Timestamp: 18:41](https://youtu.be/6NuGEukBfcA&t=1121s))

A comparison slide contrasts “Training a ML Model (weeks)” with
“Prompting a LLM (days).”

Rajiv highlights the efficiency shift. In the old days, solving a
sentiment problem meant weeks of data collection and training. Now, a
first prompt takes minutes to write, and a working solution can be
ready in days. This is a massive productivity boost for NLP tasks.

### 61. Reasoning and Planning

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_61.png"
alt="Slide 61" />
<figcaption aria-hidden="true">Slide 61</figcaption>
</figure>

([Timestamp: 19:25](https://youtu.be/6NuGEukBfcA&t=1165s))

The presentation pivots to the limitations of LLMs, specifically
regarding **Reasoning and Planning**. The slide shows a “Block Stacking”
puzzle.

Rajiv explains that stacking blocks requires planning several steps
ahead. It is not a one-step prediction problem; it requires maintaining
a state of the world in memory.

### 62. Mystery World Failure

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_62.png"
alt="Slide 62" />
<figcaption aria-hidden="true">Slide 62</figcaption>
</figure>

([Timestamp: 20:40](https://youtu.be/6NuGEukBfcA&t=1240s))

The slide introduces **“Mystery World,”** a variation of the block
problem where the names of the blocks are changed to random words.

While a human (even a 4-year-old) understands that changing the names
doesn’t change the physics of stacking, **GPT-4 fails** (3% accuracy).
Rajiv explains that the model gets distracted by the creative aspect of
the words and loses the logical thread. This suggests that these models
struggle with abstract reasoning.
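For contrast, a classical symbolic planner handles both the block
puzzle and its Mystery World variant trivially: it keeps an explicit
state of the world and searches over moves, and because block names are
opaque labels, renaming them cannot change the answer. The
breadth-first search below is a minimal sketch of that idea (it is not
how an LLM works, nor the benchmark code behind the 3% figure), and the
made-up names `zorp`, `quib`, and `flem` stand in for Mystery World’s
random words.

```python
from collections import deque

# A state is a frozenset of stacks; each stack is a tuple listed bottom -> top.


def moves(state):
    """Yield (action, next_state) for every legal single-block move."""
    stacks = list(state)
    for i, src in enumerate(stacks):
        block = src[-1]  # only the top block of a stack can move
        rest = [s for j, s in enumerate(stacks) if j != i]
        remaining = [src[:-1]] if len(src) > 1 else []
        # Put the block on the table (pointless if it is already alone there).
        if len(src) > 1:
            yield (block, "table"), frozenset(rest + remaining + [(block,)])
        # Put the block on top of another stack.
        for k, dst in enumerate(rest):
            others = [s for m, s in enumerate(rest) if m != k]
            yield (block, dst[-1]), frozenset(others + remaining + [dst + (block,)])


def plan(start, goal):
    """Breadth-first search: returns a shortest list of moves, or None."""
    start, goal = frozenset(start), frozenset(goal)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action, nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


def rename(stacks, mapping):
    """Swap block names without touching the structure of the world."""
    return [tuple(mapping[b] for b in s) for s in stacks]


# C sits on A, B is on the table; goal is a single tower C (bottom), B, A (top).
start, goal = [("A", "C"), ("B",)], [("C", "B", "A")]
print(plan(start, goal))  # [('C', 'table'), ('B', 'C'), ('A', 'B')]

mystery = {"A": "zorp", "B": "quib", "C": "flem"}
print(plan(rename(start, mystery), rename(goal, mystery)))
# [('flem', 'table'), ('quib', 'flem'), ('zorp', 'quib')]
```

The planner’s answer has the same shape under any renaming, which is
exactly the invariance the Mystery World experiment tests for.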

### 63. Recipe Step 3: Aligned Model

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_63.png"
alt="Slide 63" />
<figcaption aria-hidden="true">Slide 63</figcaption>
</figure>

([Timestamp: 21:56](https://youtu.be/6NuGEukBfcA&t=1316s))

The final step in the recipe is the **“Aligned Model.”**

Rajiv introduces the need for safety and helpfulness. A model that
follows instructions perfectly might follow *bad* instructions. We need
to align it with human values.

### 64. Galactica: Science LLM

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_64.png"
alt="Slide 64" />
<figcaption aria-hidden="true">Slide 64</figcaption>
</figure>

([Timestamp: 22:00](https://youtu.be/6NuGEukBfcA&t=1320s))

The slide presents **Galactica**, a model released by Meta focused on
science.

Rajiv describes the intent: a helpful assistant for researchers to write
code, summarize papers, and generate scientific content. It was meant to
be a specialized tool.

### 65. Galactica Output

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_65.png"
alt="Slide 65" />
<figcaption aria-hidden="true">Slide 65</figcaption>
</figure>

([Timestamp: 22:20](https://youtu.be/6NuGEukBfcA&t=1340s))

An example of Galactica’s output shows it generating technical content.

Rajiv highlights the potential utility. It looked like a powerful tool
for accelerating scientific discovery.

### 66. Galactica Pulled

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_66.png"
alt="Slide 66" />
<figcaption aria-hidden="true">Slide 66</figcaption>
</figure>

([Timestamp: 22:47](https://youtu.be/6NuGEukBfcA&t=1367s))

The slide reveals that Meta **pulled the model** shortly after release.

Rajiv explains why: users found they could ask it for the “benefits of
eating crushed glass” or “benefits of suicide,” and the model would
happily generate a scientific-sounding justification. It lacked a safety
layer. This incident underscored the necessity of **Red Teaming** and
alignment before release.

### 67. Learning What is Helpful

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_67.png"
alt="Slide 67" />
<figcaption aria-hidden="true">Slide 67</figcaption>
</figure>

([Timestamp: 23:59](https://youtu.be/6NuGEukBfcA&t=1439s))

To explain how we define “helpful,” Rajiv shows a **Stack Overflow**
question.

He notes that defining “helpful” mathematically is difficult. Unlike
“square footage,” helpfulness is subjective and nuanced.

### 68. Technical Answer

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_68.png"
alt="Slide 68" />
<figcaption aria-hidden="true">Slide 68</figcaption>
</figure>

([Timestamp: 24:05](https://youtu.be/6NuGEukBfcA&t=1445s))

The slide shows a detailed technical answer.

Rajiv points out that trying to create a “feature list” for what makes
this answer helpful is nearly impossible. We can’t write a rule-based
program to detect helpfulness.

### 69. The Dating App Analogy

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_69.png"
alt="Slide 69" />
<figcaption aria-hidden="true">Slide 69</figcaption>
</figure>

([Timestamp: 24:26](https://youtu.be/6NuGEukBfcA&t=1466s))

Rajiv uses a humorous **Dating App** analogy. He compares the “Old Way”
(filling out long compatibility forms/features) with the “New Way”
(Swiping).

He explains that **Swiping** is a way of capturing human preferences
without asking the user to explicitly define them. This is how we teach
AI what is helpful.

### 70. Collect Human Feedback

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_70.png"
alt="Slide 70" />
<figcaption aria-hidden="true">Slide 70</figcaption>
</figure>

([Timestamp: 24:55](https://youtu.be/6NuGEukBfcA&t=1495s))

The slide details the process: **“Collect Human Feedback.”**

We present the model with two options and ask a human, “Which is
better?” By collecting thousands of these “swipes,” we build a dataset
of human preference.
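A standard way to turn these pairwise “swipes” into something a
computer can optimize is a reward model trained with the Bradley-Terry
objective: it learns a score so that the preferred answer tends to score
higher than the rejected one. The sketch below is a deliberately tiny
stand-in that assumes hypothetical hand-made feature vectors instead of
raw text; production reward models are themselves fine-tuned language
models, which is what lets them learn “helpfulness” without anyone
writing the feature list.

```python
import math

# Each comparison: (features of the preferred answer, features of the rejected answer).
# Hypothetical features, e.g. [answers_question, politeness, rambling].
comparisons = [
    ([1.0, 0.8, 0.2], [0.0, 0.9, 0.1]),
    ([1.0, 0.2, 0.5], [0.2, 0.1, 0.9]),
    ([0.9, 0.9, 0.3], [0.9, 0.1, 0.3]),
]


def reward(w, x):
    """Linear reward score for one answer's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(pairs, lr=0.5, epochs=200):
    """Fit w so that reward(preferred) > reward(rejected).

    Bradley-Terry: P(preferred beats rejected) = sigmoid(r_pref - r_rej).
    We minimize -log of that probability with plain gradient descent.
    """
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for good, bad in pairs:
            margin = reward(w, good) - reward(w, bad)
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of -log p with respect to w is -(1 - p) * (good - bad).
            for i in range(len(w)):
                w[i] += lr * (1.0 - p) * (good[i] - bad[i])
    return w


w = train_reward_model(comparisons)
print([round(wi, 2) for wi in w])
print(all(reward(w, g) > reward(w, b) for g, b in comparisons))
```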

### 71. RLHF (Reinforcement Learning from Human Feedback)

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_71.png"
alt="Slide 71" />
<figcaption aria-hidden="true">Slide 71</figcaption>
</figure>

([Timestamp: 25:05](https://youtu.be/6NuGEukBfcA&t=1505s))

This slide introduces the technical term: **RLHF**.

Rajiv explains this is the layer that turns a raw instruction-following
model into a safe, helpful product like ChatGPT. It is an active
curation process, similar to curating an Instagram feed.
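For readers who want the math, one commonly used formulation of this
step (a sketch of the general idea, not necessarily the exact objective
behind ChatGPT) optimizes the model $\pi_\theta$ to earn high reward
from a learned reward model $r_\phi$, while a KL penalty weighted by
$\beta$ keeps it close to the instruction-tuned reference model
$\pi_{\text{ref}}$:

$$
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
\big[\, r_\phi(x, y) \,\big]
\;-\;
\beta\, \mathbb{E}_{x \sim \mathcal{D}}
\Big[ \mathrm{KL}\big( \pi_\theta(\cdot \mid x) \,\big\|\, \pi_{\text{ref}}(\cdot \mid x) \big) \Big]
$$

The KL term is what stops the model from drifting into strange text
that merely games the reward model.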

### 72. The Makeover Example

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_72.png"
alt="Slide 72" />
<figcaption aria-hidden="true">Slide 72</figcaption>
</figure>

([Timestamp: 25:27](https://youtu.be/6NuGEukBfcA&t=1527s))

A “Before and After” makeover image illustrates the effect of RLHF.

The “Before” is the raw model (messy, potentially harmful). The “After”
is the aligned model (polished, safe, presentable).

### 73. Tuning Responses

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_73.png"
alt="Slide 73" />
<figcaption aria-hidden="true">Slide 73</figcaption>
</figure>

([Timestamp: 25:44](https://youtu.be/6NuGEukBfcA&t=1544s))

The slide shows different ways an AI can answer a question:
**Sycophantic** (sucking up to the user), **Baseline Truthful** (blunt),
or **Helpful Truthful**.

Rajiv notes we can train models to have specific personalities. We can
make them polite, or we can make them “kiss your butt” if the user wants
validation.

### 74. AI Conversations

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_74.png"
alt="Slide 74" />
<figcaption aria-hidden="true">Slide 74</figcaption>
</figure>

([Timestamp: 26:17](https://youtu.be/6NuGEukBfcA&t=1577s))

This slide references **Character.ai** and the trend of people spending
hours talking to AI personas.

Rajiv mentions research showing people sometimes **prefer AI doctors**
over human ones because the AI is patient, listens, and is polite (due
to alignment). This suggests a future where AI handles high-touch
conversational roles.

### 75. The Full Recipe

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_75.png"
alt="Slide 75" />
<figcaption aria-hidden="true">Slide 75</figcaption>
</figure>

([Timestamp: 27:19](https://youtu.be/6NuGEukBfcA&t=1639s))

The presentation summarizes the full pipeline: **Foundation Model -\>
Instruction Fine-Tuned -\> Aligned Model**.

This visual recap cements the three-stage process in the audience’s
mind.

### 76. Learning Mechanisms Recap

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_76.png"
alt="Slide 76" />
<figcaption aria-hidden="true">Slide 76</figcaption>
</figure>

([Timestamp: 27:23](https://youtu.be/6NuGEukBfcA&t=1643s))

Rajiv maps the learning mechanisms to the stages:

1. **Next Word Prediction** (Foundation)
2. **Multi-task Training** (Instruction)
3. **Human Preferences** (Alignment)

He reiterates that understanding these three mechanics helps explain why
the models behave the way they do (hallucinations, ability to code,
politeness).

### 77. Key Takeaways

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_77.png"
alt="Slide 77" />
<figcaption aria-hidden="true">Slide 77</figcaption>
</figure>

([Timestamp: 27:43](https://youtu.be/6NuGEukBfcA&t=1663s))

The presentation transitions to the conclusion with three main
takeaways:

1. **Measure Twice**
2. **Respect Scale**
3. **Critical Thinking**

Rajiv notes in the video that he skimmed these in the original talk, but
the slides provide the detail for how to work effectively with AI.

### 78. Measure Twice (Benchmarks)

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_78.png"
alt="Slide 78" />
<figcaption aria-hidden="true">Slide 78</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

This slide displays a collage of AI benchmarks (MMLU, HumanEval, etc.).

The concept “Measure Twice” emphasizes that because AI models are
probabilistic and prone to hallucination, we cannot trust them blindly.
We must rely on rigorous benchmarking to understand their capabilities
and failures before deployment.

### 79. Targets for Evaluation

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_79.png"
alt="Slide 79" />
<figcaption aria-hidden="true">Slide 79</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

This slide likely elaborates on the need for clear **“targets”** or
ground truth when evaluating models.

You cannot improve what you cannot measure. In the context of “Prompt
Engineering,” this means you shouldn’t just tweak prompts randomly; you
need a systematic way to measure if a prompt change actually improved
the output.
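A minimal sketch of what “systematic” can mean in practice: fix a
labeled target set once, then score every prompt variant against it
with the same metric. The two `prompt_*` functions are hypothetical
placeholders (simple rules) standing in for calls to a model with two
different prompts, and the four-example target set is far too small for
real decisions; it only shows the shape of the loop.

```python
from typing import Callable

# A small labeled "target" set; in practice this would be larger and drawn from real data.
eval_set = [
    ("The food was amazing.", "positive"),
    ("Terrible service, never coming back.", "negative"),
    ("Decent, but overpriced.", "negative"),
    ("Best pizza in Cincinnati!", "positive"),
]


def accuracy(predict: Callable[[str], str], data) -> float:
    """Exact-match accuracy of a predictor against ground-truth labels."""
    hits = sum(predict(text).strip().lower() == label for text, label in data)
    return hits / len(data)


# Hypothetical stand-ins for "call the LLM with prompt A / prompt B".
def prompt_a(text: str) -> str:
    return "positive" if "!" in text else "negative"


def prompt_b(text: str) -> str:
    negative_words = ("terrible", "overpriced")
    return "negative" if any(w in text.lower() for w in negative_words) else "positive"


for name, fn in [("prompt A", prompt_a), ("prompt B", prompt_b)]:
    print(f"{name}: {accuracy(fn, eval_set):.0%}")
```

Swapping a placeholder for a real model call leaves the rest of the
loop unchanged, so every prompt edit gets judged on the same target set.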

### 80. Respect Scale

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_80.png"
alt="Slide 80" />
<figcaption aria-hidden="true">Slide 80</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

This slide illustrates the exponential growth in **single-chip inference
performance**.

“Respect Scale” refers to the lesson that betting against hardware and
data scaling is usually a losing bet. The capabilities of these models
grow faster than our intuition expects.

### 81. The Scaling Lesson (Humans)

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_81.png"
alt="Slide 81" />
<figcaption aria-hidden="true">Slide 81</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

This slide likely discusses how human expertise fits into the scaling
laws. As technology scales, the role of the human shifts from doing the
work to evaluating the work.

### 82. The Plateau

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_82.png"
alt="Slide 82" />
<figcaption aria-hidden="true">Slide 82</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

A visual showing that human contribution or specific “hacks” tend to
**plateau**, whereas general-purpose methods that leverage scale (like
Transformers) continue to improve.

This reinforces the “Bitter Lesson”: specialized, hand-crafted solutions
eventually lose to general methods that can consume more compute.

### 83. AlexNet vs Transformers

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_83.png"
alt="Slide 83" />
<figcaption aria-hidden="true">Slide 83</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

A comparison between **AlexNet** (the start of the deep learning era)
and **Transformers** (the current era).

It highlights the massive increase: **10,000x more data** and **1,000x
more compute**. This illustrates that the fundamental driver of progress
has been scale.

### 84. The Bitter Lesson

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_84.png"
alt="Slide 84" />
<figcaption aria-hidden="true">Slide 84</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

This slide explicitly references Rich Sutton’s **“The Bitter Lesson.”**

The lesson is that researchers often try to build their knowledge into
the system (like Jelinek’s linguists), but in the long run, the only
thing that matters is leveraging computation. AI succeeds when we stop
trying to teach it *how* to think and just give it enough power to learn
on its own.

### 85. Text to SQL

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_85.png"
alt="Slide 85" />
<figcaption aria-hidden="true">Slide 85</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

The slide examines **Text to SQL**, a common enterprise use case. It
compares AI performance to human experts.

It notes that while AI is good, humans still achieve higher exact match
accuracy. This nuances the “Respect Scale” argument—for high-precision
tasks, human oversight is still required.
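To see why the choice of metric matters here, the sketch below
contrasts exact (string) match with execution match, which runs both
queries and compares their results. The toy `orders` table and both
queries are hypothetical, and the slide does not say which metric
variant it used; the example only shows that a correct query can still
miss on exact match, so exact-match numbers are a strict measure.

```python
import sqlite3


def _normalize(query: str) -> str:
    """Lowercase, drop semicolons, and collapse whitespace."""
    return " ".join(query.lower().replace(";", "").split())


def exact_match(pred: str, gold: str) -> bool:
    """String comparison after light normalization."""
    return _normalize(pred) == _normalize(gold)


def execution_match(pred: str, gold: str, conn: sqlite3.Connection) -> bool:
    """Compare the rows each query returns (order-insensitive)."""
    return sorted(conn.execute(pred).fetchall()) == sorted(conn.execute(gold).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "east", 10.0), (2, "west", 25.0), (3, "east", 5.0)])

gold = "SELECT region, SUM(amount) FROM orders GROUP BY region;"
pred = "select o.region, sum(o.amount) from orders as o group by o.region"

print("exact match:", exact_match(pred, gold))                # False: different text
print("execution match:", execution_match(pred, gold, conn))  # True: same rows
```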

### 86. Critical Thinking

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_86.png"
alt="Slide 86" />
<figcaption aria-hidden="true">Slide 86</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

The final takeaway is **“Critical Thinking.”**

In an age where AI can generate convincing but false information, human
judgment becomes the most valuable skill. We must critically evaluate
the outputs of these models.

### 87. Predictions and Concerns

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_87.png"
alt="Slide 87" />
<figcaption aria-hidden="true">Slide 87</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

This slide recaps expert predictions, ranging from job displacement to
existential threats.

It serves as a reminder that even experts disagree on the timeline and
impact, reinforcing the need for individual critical thinking rather
than blind faith in pundits.

### 88. Practical Limits: Bezos and Alexa

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_88.png"
alt="Slide 88" />
<figcaption aria-hidden="true">Slide 88</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

A humorous slide showing **Jeff Bezos** and **Alexa**. It likely
references an instance where Alexa failed to understand a simple context
despite Amazon’s massive resources.

This illustrates the **“Practical Limits of Learning.”** Despite the
hype, current AI still struggles with basic context that humans find
trivial.

### 89. Autonomous Driving Limits

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_89.png"
alt="Slide 89" />
<figcaption aria-hidden="true">Slide 89</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

Images of an autonomous driving interface and a car accident.

This points out that in high-stakes physical environments, “99%
accuracy” isn’t enough. The “long tail” of edge cases remains a massive
hurdle for AI.
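The arithmetic behind “99% isn’t enough” is simple compounding. Under
the simplifying assumption that every decision is independent and
equally accurate (real driving errors are neither), the chance of at
least one failure over $n$ decisions is $1 - 0.99^n$:

```python
def p_at_least_one_failure(per_step_accuracy: float, steps: int) -> float:
    """Chance of one or more failures over independent steps: 1 - accuracy**steps."""
    return 1.0 - per_step_accuracy ** steps


for steps in (1, 10, 100, 1000):
    p = p_at_least_one_failure(0.99, steps)
    print(f"{steps:>5} decisions at 99%: {p:.1%} chance of at least one failure")
```

After only 100 such decisions, a failure is more likely than not, and a
drive involves far more than 100 decisions.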

### 90. Chatbot Failures

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_90.png"
alt="Slide 90" />
<figcaption aria-hidden="true">Slide 90</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

Examples of chatbots failing simple math or making **“legally binding
offers”** (referencing the Air Canada chatbot lawsuit).

This warns against deploying these models in critical business flows
without guardrails. They can confidently make costly mistakes.

### 91. Interaction Principles

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_91.png"
alt="Slide 91" />
<figcaption aria-hidden="true">Slide 91</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

This slide summarizes the three principles for interacting with AI:
**“Measure twice,” “Respect scale,”** and **“Think critically.”**

It acts as the final instructional slide, giving the audience a mantra
for navigating the AI landscape.

### 92. Evolution of Generative Capabilities

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_92.png"
alt="Slide 92" />
<figcaption aria-hidden="true">Slide 92</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

The slide shows a series of **Unicorn images** generated by GPT-4 over
time.

This visualizes the rapid improvement in generative capabilities. Just
as the “Sparks of AGI” images improved, the fidelity of these outputs
continues to evolve, reminding us that we are looking at a moving
target.

### 93. Conclusion

<figure>
<img
src="https://rajivshah.com/blog/images/spark-of-ai-transfer-learning/slide_93.png"
alt="Slide 93" />
<figcaption aria-hidden="true">Slide 93</figcaption>
</figure>

(\[Timestamp: End of Transcript\])

The final slide concludes the presentation with Rajiv Shah’s name and
affiliation (Snowflake).

It wraps up the narrative: from the spark of Transfer Learning to the
fire of the Generative AI revolution, offering a practical, technical,
and critical perspective on the technology shaping our future.

------------------------------------------------------------------------

*This annotated presentation was generated from the talk using
AI-assisted tools. Each slide includes timestamps and detailed
explanations.*