The Core Math: Input * Weight + Bias

Inputs -> The numbers the neuron sees, e.g. enemy position, player position, enemy health, player health

Weights -> How important each input is.

Bias -> Shifts the activation threshold, so the neuron fires more or less easily.

Output = Sigmoid(Input1 * Weight1 + Input2 * Weight2 + ... + Bias)   (Sigmoid squashes the result into the range 0..1)

Fires = Output > 0.5   (true -> the neuron fires)
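Here is a minimal sketch of the formula above in plain Python, for one neuron with several inputs. The game-state inputs, weights and bias are made-up numbers for illustration. A trained network would learn its own weights.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus bias, then sigmoid."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    output = sigmoid(z)
    return output, output > 0.5   # fires when output > 0.5, i.e. when z > 0

# Hypothetical game state: [enemy position, player position, enemy health, player health]
inputs = [0.2, 0.8, 0.3, 0.9]
# Hypothetical hand-picked weights and bias
weights = [0.5, -0.5, -1.0, 1.0]
bias = -0.1

output, fires = neuron(inputs, weights, bias)
print(f"output = {output:.3f}, fires = {fires}")  # prints: output = 0.550, fires = True
```

Sigmoid(z) > 0.5 exactly when z > 0, so "fires" is really asking whether the weighted sum plus bias is positive.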

Q1. Teach an AI to double a number: give it the inputs [1, 2, 3, 4] and expect [2, 4, 6, 8].
At first it guesses wrong, but it learns from its mistakes.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

# Data (the teacher): the inputs and the answers we expect
inputs = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
targets = torch.tensor([[2.0], [4.0], [6.0], [8.0]])

# Model(Brain)
model = nn.Linear(1, 1) # 1 input and 1 output

print(f"Start weight: {model.weight.item():.6f}, Start bias: {model.bias.item():.6f}")

# Loss function (the grader): calculates how far each guess is from the target
# MSE = mean of (guess - target)^2
criterion = nn.MSELoss()

# Optimizer: the tool that adjusts the weight and bias to minimize the loss
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop: practice
print("\nTraining...")
# One epoch = one full pass over the data; here we train for 1000 epochs
for epoch in range(1000):
    # A. Reset gradients (PyTorch adds new gradients to old ones, so clear them first)
    optimizer.zero_grad()

    # B. Forward pass: make a guess
    output = model(inputs)

    # C. Calculate loss
    loss = criterion(output, targets) # Assign loss value

    # D. Backward pass: calculate the gradient of the loss with respect to the weights and biases
    loss.backward()

    # E. Update the weights and biases
    optimizer.step()

    # F. Print the loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch} loss: {loss.item():.6f}")

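# Test the model on an input it never saw during training
test_input = torch.tensor([[10.0]])
with torch.no_grad():  # testing only, so no gradient tracking needed
    prediction = model(test_input).item()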

print("\n--- RESULTS ---")
print(f"Final Weight (should be ~2.0): {model.weight.item():.4f}")
print(f"Final Bias (should be ~0.0): {model.bias.item():.4f}")
print(f"Prediction for input 10.0: {prediction:.4f}")

Start weight: 0.632358, Start bias: -0.713767

Training...
Epoch 0 loss: 19.418684
Epoch 100 loss: 0.004552
Epoch 200 loss: 0.002499
Epoch 300 loss: 0.001372
Epoch 400 loss: 0.000753
Epoch 500 loss: 0.000414
Epoch 600 loss: 0.000227
Epoch 700 loss: 0.000125
Epoch 800 loss: 0.000068
Epoch 900 loss: 0.000038

--- RESULTS ---
Final Weight (should be ~2.0): 2.0038
Final Bias (should be ~0.0): -0.0111
Prediction for input 10.0: 20.0267
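This sketch shows what loss.backward() and optimizer.step() do for this model. It runs the same doubling task with plain-Python gradient descent on the MSE loss. Starting at w = 0, b = 0 is an assumption; PyTorch starts from random values.

```python
# Plain-Python gradient descent for output = w * x + b with MSE loss
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, b = 0.0, 0.0   # assumed starting point (PyTorch initializes randomly)
lr = 0.01         # same learning rate as the SGD optimizer above
n = len(xs)

for epoch in range(1000):
    preds = [w * x + b for x in xs]              # forward pass: make a guess
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n        # MSE loss
    # Gradients of the MSE (what loss.backward() computes):
    # dL/dw = mean(2 * error * x), dL/db = mean(2 * error)
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(2 * e for e in errors) / n
    # Step against the gradient (what optimizer.step() does for plain SGD)
    w -= lr * grad_w
    b -= lr * grad_b
    if epoch % 100 == 0:
        print(f"Epoch {epoch} loss: {loss:.6f}")

print(f"w = {w:.4f}, b = {b:.4f}, prediction for 10: {w * 10 + b:.4f}")
```

At this learning rate the bias converges slowly. After 1000 epochs, b is still a few hundredths away from 0 while w is close to 2. More epochs close the gap.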


About Hidden Layers and Activation Functions
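Why the activation function matters: without it, stacking layers gains nothing. A linear layer applied to the output of a linear layer is still a single linear layer. This 1-D sketch uses made-up weights to compare two stacked linear layers, the single layer they collapse into, and the same stack with ReLU in between.

```python
def relu(x):
    """ReLU: keep positive values, turn negative values into 0."""
    return max(0.0, x)

# Hypothetical weights for two stacked 1-input, 1-output layers
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

def two_linear(x):
    """Layer 2 applied directly to layer 1 (no activation)."""
    return w2 * (w1 * x + b1) + b2

def collapsed(x):
    """The same function written as ONE linear layer."""
    return (w2 * w1) * x + (w2 * b1 + b2)

def with_relu(x):
    """Same layers with ReLU in between: no longer a single straight line."""
    return w2 * relu(w1 * x + b1) + b2

xs = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]
for x in xs:
    print(f"x={x:5.1f}  two_linear={two_linear(x):6.2f}  "
          f"collapsed={collapsed(x):6.2f}  with_relu={with_relu(x):6.2f}")
```

The first two columns always match. The ReLU version is flat for small x and slopes down for larger x, which a single linear layer cannot do.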

Q2. Build an XOR gate.
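Why does XOR need a hidden layer at all? A single neuron thresholds a linear score w1*a + w2*b + bias. For any weights, the two XOR-true inputs (0,1) and (1,0) have the same total score as the two XOR-false inputs (0,0) and (1,1): w1 + w2 + 2*bias. If both true inputs were above a threshold and both false inputs at or below it, the true total would be strictly larger, a contradiction. The sketch below checks this numerically on random weights. The sampling ranges and seed are arbitrary choices.

```python
import random

# XOR truth table: (a, b) -> target
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def linear_score(a, b, w1, w2, bias):
    """Score of a single neuron before the activation."""
    return w1 * a + w2 * b + bias

def separates_xor(w1, w2, bias):
    """True if score > 0 (sigmoid > 0.5) exactly on the XOR-true inputs."""
    return all((linear_score(a, b, w1, w2, bias) > 0) == bool(t)
               for (a, b), t in xor_table.items())

random.seed(0)
found = False
for _ in range(100_000):
    w1, w2, bias = (random.uniform(-10, 10) for _ in range(3))
    # Both pairs always sum to w1 + w2 + 2*bias
    pos = linear_score(0, 1, w1, w2, bias) + linear_score(1, 0, w1, w2, bias)
    neg = linear_score(0, 0, w1, w2, bias) + linear_score(1, 1, w1, w2, bias)
    assert abs(pos - neg) < 1e-9
    found = found or separates_xor(w1, w2, bias)

print("Random single neuron solved XOR:", found)  # prints False
```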

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# 1. DATA: XOR Problem
# Inputs: [A, B]
X = torch.tensor([[0,0], [0,1], [1,0], [1,1]], dtype=torch.float32)
# Targets: 1 if exactly one input is 1, otherwise 0
Y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

# 2. MODEL: Hidden Layer Network
class SimpleNet(nn.Module):
    # Multi-Layer Perceptron
    def __init__(self):
        # super().__init__() runs the constructor of the parent class, nn.Module.
        # It sets up PyTorch's module machinery (registering layers and parameters,
        # moving to a device, etc.). Skipping it raises an error as soon as you
        # assign a layer to self.
        super().__init__()
        # Layer 1: Takes 2 inputs -> Transforms to 2 hidden features
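        # It holds a 2 x 2 weight matrix and a bias vector of size 2
        # Note: with only 2 hidden neurons, training can occasionally get stuck
        # depending on the random start; re-running or adding hidden neurons usually helps
        self.hidden = nn.Linear(2, 2)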
        # Activation: ReLU, the non-linearity that lets the network learn patterns a straight line can't
        self.relu = nn.ReLU()
        # Layer 2: Takes 2 hidden features -> Decides 1 final output
        self.output = nn.Linear(2, 1)
        # Final Activation: Sigmoid (Squashes output between 0 and 1 for probability)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pass input through hidden layer
        x = self.hidden(x)
        # Apply ReLU: negative values become 0 (neuron "off"), which adds non-linearity
        x = self.relu(x)
        # Pass to output layer
        x = self.output(x)
        # Squash to 0-1 range
        x = self.sigmoid(x)
        return x

model = SimpleNet()
# Stochastic Gradient Descent updates the weights. lr=0.1 (learning rate) is the step size: how much the weights change after each iteration
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Mean Squared Error. This is the Penalty: The goal of the training is to minimize this value.
criterion = nn.MSELoss()

print("Training XOR Solver...")
# 3. TRAINING
for epoch in range(10000):          # Needs more practice than simple math
    optimizer.zero_grad()           # Clear previous calculation
    outputs = model(X)              # Get Current guesses
    loss = criterion(outputs, Y)    # Calculate Error
    loss.backward()                 # calculates the gradient of the loss function with respect to each weight by propagating the error backward through the network
    optimizer.step()                # It moves the weights in the opposite direction of the gradient to "descend" the error hill.
    
    if epoch % 1000 == 0:
        print(f"Epoch {epoch} Loss: {loss.item():.4f}")

# 4. TEST
print("\n--- RESULTS ---")
with torch.no_grad(): # Testing only: no gradient tracking, no learning
    test_outputs = model(X)
    predicted = (test_outputs > 0.5).float() # Convert probabilities to 0 or 1
    print(f"Inputs:\n{X}")
    print(f"Target:\n{Y}")
    print(f"AI Prediction:\n{predicted}")

Training XOR Solver...
Epoch 0 Loss: 0.2560
Epoch 1000 Loss: 0.1664
Epoch 2000 Loss: 0.0274
Epoch 3000 Loss: 0.0080
Epoch 4000 Loss: 0.0042
Epoch 5000 Loss: 0.0027
Epoch 6000 Loss: 0.0020
Epoch 7000 Loss: 0.0015
Epoch 8000 Loss: 0.0013
Epoch 9000 Loss: 0.0011

--- RESULTS ---
Inputs:
tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])
Target:
tensor([[0.],
        [1.],
        [1.],
        [0.]])
AI Prediction:
tensor([[0.],
        [1.],
        [1.],
        [0.]])
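To see what the hidden layer makes possible, here is one set of hand-picked weights for the same 2-2-1 shape (2 inputs, 2 ReLU hidden neurons, 1 output) that computes XOR exactly. These weights are an illustration chosen by hand, not the values the trained SimpleNet ends up with. The final sigmoid is left out so the outputs are plain 0s and 1s.

```python
def relu(x):
    return max(0.0, x)

def xor_net(a, b):
    # Hidden layer (2 neurons) with hand-picked weights and biases
    h1 = relu(1.0 * a + 1.0 * b + 0.0)   # how many inputs are on
    h2 = relu(1.0 * a + 1.0 * b - 1.0)   # only positive when BOTH inputs are on
    # Output layer: subtract twice the "both on" signal
    return 1.0 * h1 - 2.0 * h2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))
```

The second hidden neuron only switches on for (1, 1), and the output layer uses it to cancel the first neuron. This is exactly the kind of feature a single linear neuron cannot build.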


Hands-On: The "Game Loop" for AI
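The game loop below picks random actions. As a bridge to the first section, here is a sketch of a single neuron acting as a CartPole policy. It pushes toward the side the pole is falling. The weights are hand-set assumptions, not trained values. The sketch also assumes, as the sample run below suggests, that a positive pole angle means the pole leans right.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-set weights for the 4 CartPole inputs:
# [cart position, cart velocity, pole angle, pole angular velocity]
WEIGHTS = [0.0, 0.0, 1.0, 1.0]   # only look at how the pole is tilting and falling
BIAS = 0.0

def choose_action(state):
    """Single-neuron policy: 1 = push right, 0 = push left."""
    z = sum(s * w for s, w in zip(state, WEIGHTS)) + BIAS
    return 1 if sigmoid(z) > 0.5 else 0

# Two states copied from the sample run below
initial_state = [0.03585678, -0.04110261, -0.03893887, -0.01395973]
state_after_step_3 = [0.00922179, -0.81961626, -0.00640813, 1.1134328]

print(choose_action(initial_state))       # pole tilting left  -> prints 0 (push left)
print(choose_action(state_after_step_3))  # pole falling right -> prints 1 (push right)
```

To try it in the game, replace env.action_space.sample() with choose_action(state), and set state = new_state after each step.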

In [8]:
import gymnasium as gym

# 1. Setup the Game Environment
# render_mode="human" opens a window so you can watch
env = gym.make("CartPole-v1", render_mode="human")

# Reset the game to start
# 'state' is the Input for your Neural Network later
state, info = env.reset()

print("--- STARTING GAME ---")
print(f"Initial State (Input): {state}")
# For CartPole, the state is 4 numbers: [Cart Position, Cart Velocity, Pole Angle, Pole Angular Velocity]

total_reward = 0
for step in range(50):
    # 2. Pick a Random Action
    # In the future, your Neural Network will choose this!
    # 0 = Push Left, 1 = Push Right
    action = env.action_space.sample()
    
    # 3. Take the Action and see what happens
    # new_state: The new numbers after moving
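    # reward: Points for keeping the pole upright (+1 per step)
    # terminated: Did we lose? (Pole fell too far, or the cart left the track)
    # truncated: Did we hit the time limit? (500 steps for CartPole-v1)
    new_state, reward, terminated, truncated, info = env.step(action)
    
    total_reward += reward
    print(f"Step {step}: Action {action} -> Reward {reward} -> New State {new_state}")
    # The new state is the input for the next decision
    state = new_state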
    
    if terminated or truncated:
        print(f"!!! GAME OVER !!! Total Score: {total_reward}")
        # Reset to try again
        state, info = env.reset()
        total_reward = 0

env.close()

--- STARTING GAME ---
Initial State (Input): [ 0.03585678 -0.04110261 -0.03893887 -0.01395973]
Step 0: Action 0 -> Reward 1.0 -> New State [ 0.03503472 -0.23564512 -0.03921806  0.2661877 ]
Step 1: Action 0 -> Reward 1.0 -> New State [ 0.03032182 -0.43018603 -0.03389431  0.54624754]
Step 2: Action 0 -> Reward 1.0 -> New State [ 0.0217181  -0.62481576 -0.02296936  0.82806146]
Step 3: Action 0 -> Reward 1.0 -> New State [ 0.00922179 -0.81961626 -0.00640813  1.1134328 ]
Step 4: Action 1 -> Reward 1.0 -> New State [-0.00717054 -0.62441075  0.01586053  0.8187465 ]
Step 5: Action 0 -> Reward 1.0 -> New State [-0.01965875 -0.81974614  0.03223545  1.1163756 ]
Step 6: Action 0 -> Reward 1.0 -> New State [-0.03605368 -1.0152761   0.05456297  1.4189936 ]
Step 7: Action 1 -> Reward 1.0 -> New State [-0.0563592  -0.8208702   0.08294284  1.1438524 ]
Step 8: Action 1 -> Reward 1.0 -> New State [-0.0727766  -0.626924    0.10581989  0.8782904 ]
Step 9: Action 0 -> Reward 1.0 -> New State [-0.08531509 -0