# Week 1.2: Foundations of AI Quiz

This notebook contains 20 multiple-choice questions covering fundamental concepts in AI and Machine Learning.

---

## Question 1

Which conclusion most accurately aligns with the implications of the No Free Lunch theorem in machine learning?

A. Every algorithm achieves optimal performance given sufficient training data.  
B. Model performance generalizes across tasks if architecture complexity is high.  
C. No algorithm is universally superior; performance depends entirely on problem-specific data and assumptions.  
D. Complex algorithms like deep neural networks outperform traditional models in all contexts.

**Answer: C**

**Rationale:** Averaged over all possible problems, no algorithm outperforms any other, so a method succeeds only when its inductive biases match the specific problem and data.

## Question 2

Which statement best describes the relationship between AI and machine learning (ML)?

A. AI is the broad field of building intelligent systems; ML is the core set of algorithms that enable those systems to learn from data.  
B. ML is a subset of AI that only uses neural networks.  
C. AI and ML are synonymous and interchangeable terms.  
D. ML refers only to data preprocessing, while AI refers to model building.

**Answer: A**

**Rationale:** AI encompasses all approaches to creating intelligent behavior; ML specifically refers to the data-driven learning methods at its core.

## Question 3

Which mathematical formulation is most commonly minimized in supervised regression tasks?

A. Sum of absolute differences between predictions and targets  
B. Cross-entropy between predicted and true class distributions  
C. Sum of squared errors between predictions and targets  
D. Hinge loss over margin violations

**Answer: C**

**Rationale:** The least-squares criterion—minimizing sum of squared errors—has been the foundational regression loss since Gauss and Legendre.
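
As a hedged illustration of the least-squares criterion, the sketch below fits a line to a small made-up dataset using the closed-form solution for simple linear regression. The names `fit_line` and `sum_squared_errors` and the data are assumptions for illustration, not part of the quiz.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for one input variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between predictions and targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, sum_squared_errors(xs, ys, slope, intercept))
```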

## Question 4

What does the difference between the model's prediction $\hat{y}$ and the ground truth $y$ represent?

A. Bias  
B. Noise  
C. Loss  
D. Activation

**Answer: C**

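**Rationale:** $\hat{y} - y$ is the model's error (residual) on a single example. A loss function such as squared error turns it into a non-negative penalty, so of the options listed it corresponds most closely to the loss.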

## Question 5

Why is the error term squared in the Mean Squared Error (MSE) loss function?

A. To make the error positive and penalize larger errors more  
B. To normalize the error over data points  
C. To avoid bias in predictions  
D. To satisfy gradient-descent assumptions

**Answer: A**

**Rationale:** Squaring makes every term non-negative, so errors of opposite sign cannot cancel, and it penalizes large deviations disproportionately, which makes MSE sensitive to outliers.
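
The effect of squaring can be checked numerically. In this minimal sketch, two hypothetical prediction sets have the same mean absolute error, but the one that concentrates its error in a single large miss has a much higher MSE. The helper names and the numbers are illustrative assumptions.

```python
def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def mae(preds, targets):
    """Mean absolute error, shown only for comparison."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


targets = [1.0, 2.0, 3.0, 4.0]
small_errors = [1.5, 2.5, 3.5, 4.5]  # every prediction is off by 0.5
one_outlier = [1.0, 2.0, 3.0, 6.0]   # a single prediction is off by 2.0

print("MAE:", mae(small_errors, targets), mae(one_outlier, targets))
print("MSE:", mse(small_errors, targets), mse(one_outlier, targets))
```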

## Question 6

What does the neural network training process fundamentally aim to minimize?

A. The number of neurons  
B. The difference between activation and input  
C. The loss function  
D. The gradient direction

**Answer: C**

**Rationale:** Training adjusts weights via gradient-based optimization to minimize aggregate prediction error (the loss).
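
A minimal sketch of gradient-based loss minimization, assuming a one-parameter model $\hat{y} = wx$ and an MSE loss. The data, learning rate, and step count are arbitrary choices for illustration. Real networks have many parameters and obtain gradients via backpropagation.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the MSE loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated from y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0               # initial weight
learning_rate = 0.05  # assumed step size
losses = []
for step in range(100):
    losses.append(mse_loss(w, xs, ys))
    w -= learning_rate * mse_grad(w, xs, ys)  # step against the gradient

print(f"learned w = {w:.4f}, first loss = {losses[0]:.4f}, last loss = {losses[-1]:.2e}")
```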

## Question 7

What does the Universal Approximation Theorem claim about neural networks?

A. They can exactly recover ground truth for any dataset  
B. They are always better than traditional models  
C. A single-hidden-layer network can approximate any continuous function given sufficient neurons  
D. They require multiple layers to approximate any function

**Answer: C**

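**Rationale:** Even one hidden layer with enough units and a suitable non-linear activation can approximate any continuous function on a compact domain to arbitrary accuracy. The theorem guarantees that such weights exist, not that training will find them.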

## Question 8

What role does the activation function play in a neural network?

A. It normalizes inputs  
B. It adds linearity to the model  
C. It introduces non-linearity to allow learning complex relationships  
D. It reduces the model's capacity

**Answer: C**

**Rationale:** Non-linear activations (ReLU, tanh, sigmoid) enable networks to model complex, non-linear mappings.
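
To see why the non-linearity matters, the sketch below uses scalar "layers" with made-up weights. Two stacked linear layers collapse into a single linear layer, while inserting a ReLU between them produces a function that no single linear layer can reproduce. All names and values are assumptions for illustration.

```python
def linear(x, w, b):
    """A scalar linear layer: w * x + b."""
    return w * x + b


def relu(x):
    return max(0.0, x)


def two_linear_layers(x):
    # Second layer applied directly to the first layer's output.
    return linear(linear(x, 2.0, 1.0), -3.0, 0.5)


def collapsed(x):
    # Algebraically, -3 * (2x + 1) + 0.5 = -6x - 2.5: still one linear layer.
    return linear(x, -6.0, -2.5)


def with_relu(x):
    # Same weights, but with a ReLU between the layers.
    return linear(relu(linear(x, 2.0, 1.0)), -3.0, 0.5)


for x in (-2.0, -1.0, 0.0, 1.0):
    print(x, two_linear_layers(x), collapsed(x), with_relu(x))
```

With the ReLU in place the output is flat for inputs where the first layer is negative and sloped elsewhere, which is the kind of bend a purely linear stack cannot express.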

## Question 9

Why is the term "regression" historically used in predictive modeling?

A. Because models regress in performance over time  
B. Due to the phenomenon of regression toward the mean observed by Galton  
C. Because it implies non-linear prediction  
D. It refers to a type of regularization

**Answer: B**

**Rationale:** Galton observed that quantitative traits tend to regress toward the population average, inspiring the name.

## Question 10

Which of the following is not typically a purpose of the loss function?

A. Measuring prediction accuracy  
B. Guiding parameter updates  
C. Ensuring binary classification  
D. Quantifying model error

**Answer: C**

**Rationale:** Loss functions quantify error and guide learning; classification type is determined by model/output design, not the loss.

## Question 11

What is the primary function of a neuron in a neural network?

A. To perform a linear transformation followed by a nonlinear transformation  
B. To store data for future use  
C. To execute complex mathematical operations  
D. To transmit data between different networks

**Answer: A**

**Rationale:** A neuron computes a weighted sum plus bias (linear) then applies a nonlinear activation, enabling complex function approximation.
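
The two steps described above can be sketched as follows. The function name `neuron`, the default `tanh` activation, and the example weights are assumptions for illustration.

```python
import math


def neuron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum plus bias (linear step), then a non-linear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical inputs and weights: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
output = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(output)
```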

## Question 12

What is the main reason for using a sigmoid activation function?

A. To introduce nonlinearity into the model  
B. To ensure the model is always accurate  
C. To simplify computation  
D. To increase processing speed

**Answer: A**

**Rationale:** Sigmoid squashes outputs into the open interval (0, 1) and introduces nonlinearity, which is critical for learning complex patterns.
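
A minimal sketch of the sigmoid, $\sigma(z) = 1 / (1 + e^{-z})$, showing how it maps real inputs to values between 0 and 1. The two-branch form is one common way to avoid overflow in `math.exp` for large negative inputs. The sample inputs are arbitrary.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # safe: z is negative, so exp(z) is at most 1
    return exp_z / (1.0 + exp_z)


values = [sigmoid(z) for z in (-5.0, -1.0, 0.0, 1.0, 5.0)]
print(values)
```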

## Question 13

What is the primary purpose of the loss function in machine learning?

A. To quantify the error between predicted and actual values  
B. To increase the speed of the learning process  
C. To store data for future reference  
D. To ensure the model is always correct

**Answer: A**

**Rationale:** The loss provides a scalar measure of error that guides parameter updates. In neural networks, backpropagation computes the gradients of this loss.

## Question 14

What does the term 'regression' refer to in ML?

A. A model that predicts a continuous number  
B. A model that classifies data  
C. A model that reduces data size  
D. A model that increases prediction accuracy

**Answer: A**

**Rationale:** Regression predicts continuous outcomes (e.g., price, temperature), unlike classification's discrete labels.

## Question 15

What type of activation function behaves like an electronic diode (output = input if positive, else zero)?

A. ReLU  
B. Tanh  
C. GELU  
D. ELU

**Answer: A**

**Rationale:** ReLU(x) = max(0, x) is computationally simple and mitigates the vanishing-gradient problem because its gradient is exactly 1 for positive inputs.
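
The gradient behavior can be illustrated with a deliberately simplified sketch: it multiplies only the activation derivatives across a hypothetical chain of 10 layers and ignores the weights, which real backpropagation would also include. The depth and the evaluation points are assumptions.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of ReLU; taken as 0 at x = 0 by convention.
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # peaks at 0.25 when x = 0


depth = 10
relu_chain = relu_grad(2.0) ** depth        # stays at 1.0 for positive inputs
sigmoid_chain = sigmoid_grad(0.0) ** depth  # best case for sigmoid, still tiny

print(relu(-3.0), relu(2.5))
print(relu_chain, sigmoid_chain)
```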

## Question 16

How did deep learning contribute to AlphaFold's success?

A. It directly classified protein types  
B. It encoded biological rules manually  
C. It learned protein folding geometry from known structures and iteratively refined its predictions  
D. It generated new amino acids

**Answer: C**

**Rationale:** AlphaFold was trained with supervised learning on experimentally determined protein structures, not reinforcement learning, and it refines a predicted structure by feeding its output back through the network over several passes.

## Question 17

What distinguishes "reasoning machines" from earlier language models?

A. They use larger training datasets  
B. They no longer hallucinate facts  
C. They can internally generalize logic, validate hypotheses, and create reproducible outputs  
D. They require no training whatsoever

**Answer: C**

**Rationale:** True reasoning systems can validate and hypothesize internally, unlike models that only predict next tokens.

## Question 18

What is the main advantage of neural networks for unstructured data?

A. They can effectively handle images, text, and audio  
B. They require less computational power  
C. They are easier to implement than other algorithms  
D. They always provide more accurate results

**Answer: A**

**Rationale:** Deep architectures learn hierarchical features automatically from unstructured modalities, reducing manual feature engineering.

## Question 19

In supervised learning, what does the residual ($\hat{y} - y$) encompass?

A. Only the model's systematic error (bias)  
B. Only the irreducible noise in data  
C. The total error (model error + noise) on that example  
D. The gradient used for weight updates

**Answer: C**

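**Rationale:** The residual is the observed difference on one example and reflects both model error and noise. It is the expected squared error, not a single residual, that decomposes into squared bias, variance, and irreducible noise.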

## Question 20

According to the bias-variance decomposition, what is the irreducible noise ($\sigma^2$)?

A. The model's training error  
B. The variance of predictions across datasets  
C. The squared bias of the estimator  
D. The error component due to inherent randomness in data

**Answer: D**

**Rationale:** $\sigma^2$ represents the irreducible error from random factors in the data generation process that no model can eliminate.
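
For reference, the decomposition can be written out under assumptions the quiz does not state: targets are generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, the noise is independent of the training set, and $\hat{y}$ is the prediction at a fixed input $x$ from a model trained on a randomly drawn dataset. The symbols $f$ and $\varepsilon$ are introduced here for illustration.

$$
\mathbb{E}\big[(\hat{y} - y)^2\big]
= \underbrace{\big(\mathbb{E}[\hat{y}] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Only the first two terms depend on the model, which is why $\sigma^2$ sets a floor on the expected error.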

---

## Quiz Summary

This quiz covers fundamental concepts in AI and Machine Learning including:

- No Free Lunch Theorem
- AI vs Machine Learning relationship
- Loss functions and optimization
- Neural network fundamentals
- Activation functions
- Universal Approximation Theorem
- Bias-variance decomposition
- Recent advances in AI (AlphaFold, reasoning machines)

**Total Questions: 20**

```python
# You can add code here for quiz analysis, scoring, or interactive features.
# For example:

# Quiz answers for reference
answers = ['C', 'A', 'C', 'C', 'A', 'C', 'C', 'C', 'B', 'C',
           'A', 'A', 'A', 'A', 'A', 'C', 'C', 'A', 'C', 'D']

print(f"Quiz contains {len(answers)} questions")
print(f"Answer key: {answers}")
```
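
One possible way to extend this cell with scoring is sketched below. The function `score_quiz` and the sample responses are hypothetical. The key is repeated as `answer_key` so that the block runs on its own.

```python
def score_quiz(responses, answer_key):
    """Return (number correct, list of 1-based question numbers answered wrong)."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    wrong = [
        number
        for number, (given, expected) in enumerate(zip(responses, answer_key), start=1)
        if given.strip().upper() != expected
    ]
    return len(answer_key) - len(wrong), wrong


# Same key as in the notebook cell, repeated so this block is self-contained.
answer_key = ['C', 'A', 'C', 'C', 'A', 'C', 'C', 'C', 'B', 'C',
              'A', 'A', 'A', 'A', 'A', 'C', 'C', 'A', 'C', 'D']

# Hypothetical submission that answers 'A' to every question.
responses = ['A'] * len(answer_key)
correct, wrong = score_quiz(responses, answer_key)
print(f"Score: {correct}/{len(answer_key)}")
print(f"Review questions: {wrong}")
```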