## Concept 1: The Project Goal 
## The Problem: "Distribution Shift"
Imagine you spend months learning to drive in a sunny city with perfect roads (this is your Training Data). You become a great driver. Then you take your driving test. Suddenly it starts snowing and the road is bumpy (this is your Test Data). Because the conditions (the distribution) are different, you fail the test. A standard AI model fails for the same reason: it is "frozen" after training, so it cannot adapt to the snow.

## The Solution: Test-Time Training (TTT)
Now imagine that during the driving test, you are allowed to practice on the snowy road for 5 minutes before the examiner scores you. You don't know the answers to the test, but you can "get a feel" for the slippery road.


- TTT gives the model a quick practice session on the specific test image right before it makes a prediction.

- Goal: We want to create a model that updates itself slightly for every single test image to handle unexpected changes.

## Concept 2: What is a Neural Network?
Think of a Neural Network as a super-complex filter.

Input: You feed it an image (a grid of numbers representing colors).

Layers: The image passes through many layers. Each layer looks for specific shapes.

- First layers see edges and lines.

- Middle layers see curves and eyes.

- Last layers see "Cat" or "Dog".

Output: It gives you a probability (e.g., "80% Cat").

Training is simply tuning the knobs (called parameters or weights) of these filters so they get the answer right.
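
To make "tuning the knobs" concrete, here is a minimal sketch in plain Python. It assumes a toy one-layer network with two made-up input features and two classes (0 = Cat, 1 = Dog). The function names, the feature values, and the learning rate are all illustrative choices, not part of the project.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, features):
    """One score per class (dot product with that class's weights), then softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)

def train_step(weights, features, label, lr=0.5):
    """One gradient-descent step on the cross-entropy loss.

    For softmax + cross-entropy, the gradient with respect to score k
    is (p_k - 1) for the correct class and p_k for the others.
    """
    probs = forward(weights, features)
    return [
        [w - lr * (probs[k] - (1.0 if k == label else 0.0)) * x
         for w, x in zip(row, features)]
        for k, row in enumerate(weights)
    ]

features = [1.0, 0.5]                 # hypothetical stand-in for an image
weights = [[0.0, 0.0], [0.0, 0.0]]    # untrained knobs: the model has no opinion yet
before = forward(weights, features)[0]  # probability of "Cat" before training

for _ in range(20):
    weights = train_step(weights, features, label=0)  # the true label is "Cat"

after = forward(weights, features)[0]   # probability of "Cat" after training
```

With all weights at zero the model starts undecided between the two classes. Each step nudges the weights so that the probability of the correct class goes up.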

## Concept 3: What is "Self-Supervised Learning"?
We need to adapt the model on test data without labels. If the model sees a photo of a "Dog" in the snow, it doesn't know it's a dog (that's what we are trying to predict!). So, how can it "practice" or "learn" from it?

We use a Self-Supervised Task (a fake task).

The Rotation Trick:

- Take the image of the dog.

- Rotate it by a known angle (for example, 90 degrees).

- Ask the network: "How much did I rotate this?"

The network knows the answer because we did the rotation.

To guess the rotation correctly, the network has to "understand" the image (e.g., "sky is usually up, grass is down"). Practicing this task helps it adapt to the new style of image (snowy dog) without needing to know it's a dog.
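
The rotation trick can be sketched in a few lines of plain Python. This assumes the image is a small 2D grid (a list of rows) and that label `k` means "rotated clockwise by k × 90°". The rotation direction and the helper names `rotate90` and `make_rotation_batch` are illustrative choices.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_batch(image):
    """Return four (rotated_image, label) pairs for 0, 90, 180 and 270 degrees.

    The label is free: we know it because we applied the rotation ourselves.
    """
    batch = []
    current = [list(row) for row in image]  # copy so the input is not modified
    for label in range(4):
        batch.append((current, label))
        current = rotate90(current)
    return batch

tiny_image = [[1, 2],
              [3, 4]]
batch = make_rotation_batch(tiny_image)
for rotated, label in batch:
    print(label * 90, rotated)
```

No human labeling is needed at any point, which is why this task can run on test images.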

## Concept 4: What is "Test-Time Training"?
A standard AI works like this:

1. Train (Days/Weeks) -> 2. Freeze -> 3. Test (Milliseconds).

TTT works like this:

1. Train (Days/Weeks).

2. Test Phase starts.

3. Image X arrives.
    - Create 4 copies of Image X (rotated 0°, 90°, 180°, 270°).
    - Ask the model to guess the rotations.
    - Update the model slightly based on its errors in guessing the rotations. The only thing that changes is the numbers inside the model (the weights).
    - Now the model is "warmed up" on Image X.

4. Final Prediction: Ask the warmed-up model: "Is this a dog?"

5. Reset the model for the next image.
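
The control flow of steps 3 to 5 can be sketched as below. This is only the skeleton: `ttt_predict` is a hypothetical helper name, and the three functions passed into it are placeholders so the sketch runs. The "model" here is a dictionary holding a single number, and the "image" is a single number too, so nothing in it is a real network or a real gradient step.

```python
import copy

def ttt_predict(trained_model, image, make_batch, adapt_step, classify, steps=1):
    """Adapt a throwaway copy of the model on one test image, then classify it."""
    model = copy.deepcopy(trained_model)  # the original is never touched, so the reset is free
    batch = make_batch(image)             # rotated copies plus their known labels
    for _ in range(steps):
        adapt_step(model, batch)          # slight weight update from the rotation errors
    return classify(model, image)         # final prediction from the warmed-up copy

# --- Placeholder pieces, present only so the sketch runs ---
def make_batch(image):
    return [(image, k) for k in range(4)]   # stands in for the four rotated copies

def adapt_step(model, batch):
    model["weight"] += 0.1 * len(batch)     # stands in for a real gradient step

def classify(model, image):
    return "dog" if model["weight"] * image > 0.2 else "cat"

trained = {"weight": 0.0}
answer = ttt_predict(trained, image=1.0, make_batch=make_batch,
                     adapt_step=adapt_step, classify=classify)
print(answer, trained)
```

Working on a deep copy is one simple way to get the reset in step 5: the next image always starts from the same trained weights. Saving and restoring the weights in place would be an equivalent alternative.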

## The Global Pipeline
Based on the diagram in the TER subject file, here is our architecture:

Shared Encoder (The Body): The part of the network that looks at shapes/colors.

Main Head (The Classifier): A small part at the end that predicts "Dog/Cat" (Used for the main task).

Self-Supervised Head (The Helper): A small part at the end that predicts "0°/90°/180°/270°" (Used for TTT).

The Workflow: At test time, we set the Main Head aside for a moment and use the Helper Head to update the Body. Once the Body is updated, we use the Main Head to get the final answer.
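
A minimal sketch of this three-part layout in plain Python follows. To keep the gradient short enough to write by hand, it makes strong simplifying assumptions: the encoder is a single linear layer (a real one would be a deep convolutional network), the input is a short list of numbers standing in for an image, and the weights are random rather than trained. The class name `TwoHeadModel` and its methods are hypothetical.

```python
import math
import random

def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class TwoHeadModel:
    def __init__(self, n_in, n_hidden, n_classes, n_rotations=4, seed=0):
        rng = random.Random(seed)
        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.encoder = init(n_hidden, n_in)           # Shared Encoder (the Body)
        self.main_head = init(n_classes, n_hidden)    # Main Head (the Classifier)
        self.rot_head = init(n_rotations, n_hidden)   # Self-Supervised Head (the Helper)

    def features(self, x):
        return matvec(self.encoder, x)

    def classify(self, x):
        return softmax(matvec(self.main_head, self.features(x)))

    def rotation_probs(self, x):
        return softmax(matvec(self.rot_head, self.features(x)))

    def rotation_loss(self, x, label):
        return -math.log(self.rotation_probs(x)[label])

    def adapt_encoder(self, x, label, lr=0.1):
        """One gradient step on the rotation loss that changes only the encoder."""
        probs = self.rotation_probs(x)
        d_scores = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
        # Backpropagate through the helper head to the hidden features.
        d_hidden = [sum(self.rot_head[k][j] * d_scores[k] for k in range(len(d_scores)))
                    for j in range(len(self.encoder))]
        for j, row in enumerate(self.encoder):
            for i in range(len(row)):
                row[i] -= lr * d_hidden[j] * x[i]

model = TwoHeadModel(n_in=3, n_hidden=4, n_classes=2)
x = [0.2, -0.4, 0.9]        # made-up stand-in for one rotated copy
main_head_before = [row[:] for row in model.main_head]

loss_before = model.rotation_loss(x, label=1)
model.adapt_encoder(x, label=1)            # the Helper Head updates the Body
loss_after = model.rotation_loss(x, label=1)
class_probs = model.classify(x)            # the Main Head gives the final answer
```

Only `self.encoder` is modified in `adapt_encoder`, which mirrors the workflow above: the Helper Head supplies the error signal, the Body absorbs the update, and the Main Head is reused unchanged.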