# Environment Setup

In [None]:
# Pretty standard stuff here

!mkdir PongReinforcementLearning
# %cd (not !cd) so the directory change persists beyond a single subshell
%cd PongReinforcementLearning

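# Then, I set up a virtual environment (venv)
!python -m venv PongReinforcementLearningVENV
# Activation only affects the shell it runs in, and each "!" command gets its own subshell,
# so run this one from a terminal rather than from a notebook cell:
# source PongReinforcementLearningVENV/bin/activate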

# Make the venv recognizable to Jupyter Notebooks.
# This is the bridge that connects Jupyter to my isolated Python environment.
%pip install ipykernel
!python -m ipykernel install --user --name=PongReinforcementLearningVENV

# Time to fire up Jupyter Notebook (from a terminal, not from inside a cell).
# Make sure to select the new venv as the kernel.
# jupyter notebook

# Finally, installing some libs. I usually do these via the console, but Jupyter's %pip magic usually works just fine
%pip install pygame

# See if I can run an external Pygame window from a Jupyter notebook on macOS

In [2]:
import pygame
pygame.init()

# Create external window
win = pygame.display.set_mode((500, 500))

# Main game loop
run = True
while run:
    pygame.time.delay(100)
    
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            run = False
            
    # Draw a red square (game logic, e.g. moving the rectangle, would go here)
    pygame.draw.rect(win, (255, 0, 0), (250, 250, 50, 50))
    
    pygame.display.update()

pygame.quit()


pygame 2.5.1 (SDL 2.28.2, Python 3.10.9)
Hello from the pygame community. https://www.pygame.org/contribute.html


**Well, it runs, but shutdown isn't graceful. The window pops up and draws a glorious red square, but then simple window commands like "close" fail. I had to Force Quit, which also took down the Jupyter notebook kernel. This may wind up being a royal PITA, but I'll give it a shot for now. Worst case, I'll switch to a simple Python script run from the console.**

# Pong

In [1]:
import pygame
import random

# Initialize Pygame
pygame.init()

# Create a window
width, height = 800, 600  # Window dimensions
window = pygame.display.set_mode((width, height))
pygame.display.set_caption('Pong Game')

# Initialize paddle and ball attributes
paddle_width, paddle_height = 20, 100
ball_radius = 15

# Initial positions
left_paddle_pos = [50, height // 2 - paddle_height // 2]
right_paddle_pos = [width - 50 - paddle_width, height // 2 - paddle_height // 2]
ball_pos = [width // 2, height // 2]

# Ball velocity
ball_velocity = [random.choice([-4, 4]), random.choice([-4, 4])]

# Initialize scores
left_score = 0
right_score = 0

# Main game loop
run = True
while run:
    pygame.time.delay(30)
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            run = False

    # Create the state representation
    state = (left_paddle_pos[1], right_paddle_pos[1], ball_pos[0], ball_pos[1], ball_velocity[0], ball_velocity[1])

    # Handle paddle movement with boundary checks
    keys = pygame.key.get_pressed()
    if keys[pygame.K_w] and left_paddle_pos[1] > 0:
        left_paddle_pos[1] -= 5
    if keys[pygame.K_s] and left_paddle_pos[1] < height - paddle_height:
        left_paddle_pos[1] += 5
    if keys[pygame.K_UP] and right_paddle_pos[1] > 0:
        right_paddle_pos[1] -= 5
    if keys[pygame.K_DOWN] and right_paddle_pos[1] < height - paddle_height:
        right_paddle_pos[1] += 5

    # Update ball position
    ball_pos[0] += ball_velocity[0]
    ball_pos[1] += ball_velocity[1]

    # Collision detection with walls
    if ball_pos[1] <= 0 or ball_pos[1] >= height:
        ball_velocity[1] = -ball_velocity[1]

    # Collision detection with paddles
    if (left_paddle_pos[0] <= ball_pos[0] <= left_paddle_pos[0] + paddle_width and
        left_paddle_pos[1] <= ball_pos[1] <= left_paddle_pos[1] + paddle_height) or \
       (right_paddle_pos[0] <= ball_pos[0] <= right_paddle_pos[0] + paddle_width and
        right_paddle_pos[1] <= ball_pos[1] <= right_paddle_pos[1] + paddle_height):
        ball_velocity[0] = -ball_velocity[0]

    # Ball reset, scoring, and immediate feedback game-over condition
    if ball_pos[0] < 0:
        right_score += 1  # Right player scores
        ball_pos = [width // 2, height // 2]
        ball_velocity = [random.choice([-4, 4]), random.choice([-4, 4])]
        # Here's where I'll signal the end of an RL episode and update the agent
    elif ball_pos[0] > width:
        left_score += 1  # Left player scores
        ball_pos = [width // 2, height // 2]
        ball_velocity = [random.choice([-4, 4]), random.choice([-4, 4])]
        # Here's where I'll signal the end of an RL episode and update the agent

    # Draw paddles, ball, and scores
    window.fill((0, 0, 0))  # Clear screen
    pygame.draw.rect(window, (255, 255, 255), left_paddle_pos + [paddle_width, paddle_height])
    pygame.draw.rect(window, (255, 255, 255), right_paddle_pos + [paddle_width, paddle_height])
    pygame.draw.circle(window, (255, 255, 255), ball_pos, ball_radius)

    # Display scores
    font = pygame.font.SysFont(None, 36)
    score_display = font.render(f"{left_score} - {right_score}", True, (255, 255, 255))
    window.blit(score_display, (width // 2 - 20, 10))

    pygame.display.update()
    
pygame.quit()


pygame 2.5.1 (SDL 2.28.2, Python 3.10.9)
Hello from the pygame community. https://www.pygame.org/contribute.html


# Notes

## Implementing Game Mechanics for Pong

### 1. Initialize Pygame and Create Window
- Initialized Pygame and created an 800x600 window for the game.

### 2. Initialize Paddle and Ball Attributes
- Defined the dimensions of the paddles and the ball. Initialized their starting positions.

### 3. Paddle Movement
- Implemented keyboard controls for moving the paddles up and down.

### 4. Ball Movement and Collision Detection
- Added logic for ball movement and collision detection with the walls and paddles.
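
The collision checks in the game loop treat the ball as a single point at its center. A possible refinement, sketched here as an assumption rather than what the notebook currently does, is to account for the ball radius. The function names `bounce_off_walls` and `hits_paddle` are hypothetical; the default sizes are the ones used in the Pong cell.

```python
def bounce_off_walls(ball_y, vel_y, height=600, radius=15):
    """Return the vertical velocity after a possible top/bottom wall bounce.

    The velocity is only flipped when the ball is moving toward the wall,
    which avoids flipping twice while the ball is still overlapping it.
    """
    if ball_y - radius <= 0 and vel_y < 0:
        return -vel_y
    if ball_y + radius >= height and vel_y > 0:
        return -vel_y
    return vel_y


def hits_paddle(ball_x, ball_y, paddle_x, paddle_y,
                paddle_w=20, paddle_h=100, radius=15):
    """Circle-vs-rectangle test: True if the ball overlaps the paddle."""
    # Closest point on the paddle rectangle to the ball center
    nearest_x = max(paddle_x, min(ball_x, paddle_x + paddle_w))
    nearest_y = max(paddle_y, min(ball_y, paddle_y + paddle_h))
    dx, dy = ball_x - nearest_x, ball_y - nearest_y
    return dx * dx + dy * dy <= radius * radius


# Left paddle at x=50, y=250: a ball centered 15 px to the right of its edge just touches it
touching = hits_paddle(85, 300, 50, 250)
```

Whether this is worth it for the RL experiment is a judgment call: the point-based check is simpler and already produces playable rallies.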

### 5. Ball Reset and Scoring
- Implemented ball reset and scoring mechanics. The ball resets to the center after a point is scored.

### 6. Paddle Boundaries
- Added boundaries to prevent the paddles from moving out of the window.

### 7. Game Over Conditions
- Implemented an immediate-feedback end condition: each point ends the rally and resets the ball to the center, which serves as the end of an episode in RL terms.


## Defining RL Elements for Pong

### 1. State Representation
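- Decide how to represent the state of the game. The game loop currently builds a 6-tuple of raw values (both paddle y-positions, ball x and y, ball velocity x and y). Consider the trade-offs between granularity and computational complexity.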
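
One way to trade granularity for a smaller state space is to bucket the pixel values. This is a minimal sketch, assuming the same field order as the `state` tuple in the game loop; `discretize_state` and the 20-pixel bucket size are my own placeholders, not something settled yet.

```python
def discretize_state(state, bucket=20):
    """Map the raw 6-tuple to a coarse, hashable key.

    Positions are divided into `bucket`-pixel bins; velocities are reduced
    to their sign, since the game only ever uses -4 or +4.
    """
    left_y, right_y, ball_x, ball_y, vel_x, vel_y = state

    def sign(v):
        return (v > 0) - (v < 0)

    return (
        left_y // bucket,
        right_y // bucket,
        ball_x // bucket,
        ball_y // bucket,
        sign(vel_x),
        sign(vel_y),
    )


# Same layout as the `state` tuple built in the game loop
raw = (250, 250, 400, 300, 4, -4)
key = discretize_state(raw)  # (12, 12, 20, 15, 1, -1)
```

A larger bucket means fewer distinct states to learn but coarser control near the paddle edges.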

### 2. Action Space
- Define the set of actions the agent can take (e.g., move paddle up, move paddle down, stay still).
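
A small sketch of what a three-action space could look like. The `Action` enum and `apply_action` helper are hypothetical names; the speed and sizes mirror the constants in the Pong cell. Note that this version clamps the paddle to the window, which differs slightly from the boundary checks in the keyboard handling.

```python
from enum import IntEnum


class Action(IntEnum):
    STAY = 0
    UP = 1
    DOWN = 2


PADDLE_SPEED = 5     # pixels per frame, as in the keyboard handling
HEIGHT = 600         # window height
PADDLE_HEIGHT = 100


def apply_action(paddle_y, action):
    """Return the paddle's new y position after one frame, clamped to the window."""
    if action == Action.UP:
        paddle_y -= PADDLE_SPEED
    elif action == Action.DOWN:
        paddle_y += PADDLE_SPEED
    return max(0, min(HEIGHT - PADDLE_HEIGHT, paddle_y))


new_y = apply_action(250, Action.UP)  # 245
```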

### 3. Reward Structure
- Design the rewards the agent receives for various outcomes (e.g., +1 for scoring, -1 when the opponent scores).
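
The +1/-1 scheme maps directly onto the scoring branches in the game loop. A sketch, assuming the agent controls one side and that `reward_for` (a hypothetical helper) is called with the ball's x position after each update:

```python
WIDTH = 800  # window width, as in the Pong cell


def reward_for(ball_x, agent_side="right"):
    """Return +1 if the agent scored, -1 if it conceded, 0 while the rally continues."""
    if ball_x < 0:        # ball left the screen on the left: right player scores
        return 1 if agent_side == "right" else -1
    if ball_x > WIDTH:    # ball left the screen on the right: left player scores
        return 1 if agent_side == "left" else -1
    return 0
```

Rewards this sparse can make early learning slow. A small bonus for each paddle hit is a common alternative, but that would be a design change, not something decided here.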

### 4. Policy Initialization
- Initialize the agent's policy, which could be a Q-table, a neural network, or some other function mapping states to actions.
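
If the policy ends up being a Q-table, one simple option is a dictionary that creates zero-initialized rows on first access. This is only a sketch under that assumption; `make_q_table` and `greedy_action` are illustrative names.

```python
from collections import defaultdict

N_ACTIONS = 3  # stay, up, down


def make_q_table():
    """Q-table keyed by (hashable) state; unseen states start at 0.0 for every action."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def greedy_action(q_table, state):
    """Index of the highest-valued action; ties resolve to the lowest index."""
    values = q_table[state]
    return max(range(N_ACTIONS), key=values.__getitem__)


q_table = make_q_table()
state = (12, 12, 20, 15, 1, -1)   # e.g. a discretized state tuple
q_table[state][2] = 1.0           # pretend "down" has been learned to be good here
best = greedy_action(q_table, state)  # 2
```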

### 5. Learning Algorithm
- Choose and implement a learning algorithm (e.g., Q-learning, SARSA, Deep Q-Networks) to update the policy based on experience.
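
For reference, this is roughly what a single tabular Q-learning update looks like, should that be the algorithm chosen. The function name and the default `alpha` and `gamma` values are placeholders.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    On a terminal transition (`done`) there is no future value to bootstrap from.
    """
    current = q_table[state][action]
    future = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * future
    q_table[state][action] = current + alpha * (target - current)
    return q_table[state][action]


# Tiny hand-built table: two states, three actions each
q = {"s0": [0.0, 0.0, 0.0], "s1": [0.0, 1.0, 0.0]}
updated = q_learning_update(q, "s0", 2, 0.0, "s1", False, alpha=0.5, gamma=0.9)
# updated is 0.45: halfway from 0.0 toward the target 0.9 * 1.0
```

SARSA would differ only in the `future` term: it uses the value of the action actually taken next rather than the maximum.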

### 6. Exploration-Exploitation Strategy
- Decide on a strategy for balancing exploration (trying new actions) and exploitation (sticking with known good actions), such as ε-greedy.
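
A minimal ε-greedy sketch, with an optional decay schedule so exploration tapers off over training. The schedule parameters (`start`, `end`, `decay`) are arbitrary starting points, not tuned values.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.995):
    """Exponentially decay epsilon per episode, never going below `end`."""
    return max(end, start * decay ** episode)


# With epsilon = 0 the choice is purely greedy
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)  # 1
```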

### 7. Training Loop
- Implement the training loop, in which the agent interacts with the environment, updates its policy, and optionally logs metrics such as average reward over time.
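
To get a feel for the shape of the loop before wiring it into Pygame, here is a sketch against a tiny headless stand-in. `TinyPong` is entirely hypothetical (one paddle on a coarse 8x6 grid, +1 per hit, -1 and episode end on a miss) and is not the game above. The loop uses tabular Q-learning with a fixed ε, which is just one possible choice.

```python
import random
from collections import defaultdict


class TinyPong:
    """Hypothetical headless stand-in: one right-side paddle on a coarse grid."""
    W, H = 8, 6

    def __init__(self, rng):
        self.rng = rng

    def reset(self):
        self.ball = [self.W // 2, self.rng.randrange(self.H)]
        self.vel = [1, self.rng.choice([-1, 1])]
        self.paddle = self.H // 2
        return self._state()

    def _state(self):
        return (self.ball[0], self.ball[1], self.vel[0], self.vel[1], self.paddle)

    def step(self, action):
        # Actions: 0 = stay, 1 = up, 2 = down
        self.paddle = max(0, min(self.H - 1, self.paddle + (0, -1, 1)[action]))
        self.ball[0] += self.vel[0]
        self.ball[1] += self.vel[1]
        if self.ball[1] < 0 or self.ball[1] >= self.H:   # reflect off top/bottom
            self.vel[1] = -self.vel[1]
            self.ball[1] += 2 * self.vel[1]
        reward, done = 0, False
        if self.ball[0] == 0:                            # bounce off the left wall
            self.vel[0] = 1
        elif self.ball[0] == self.W - 1:                 # reached the paddle column
            if abs(self.ball[1] - self.paddle) <= 1:
                reward, self.vel[0] = 1, -1              # hit: bounce back
            else:
                reward, done = -1, True                  # miss: episode over
        return self._state(), reward, done


def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    env = TinyPong(rng)
    q = defaultdict(lambda: [0.0, 0.0, 0.0])
    returns = []
    for _ in range(episodes):
        state, total = env.reset(), 0.0
        for _step in range(max_steps):
            # Epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.randrange(3)
            else:
                action = max(range(3), key=q[state].__getitem__)
            next_state, reward, done = env.step(action)
            # Q-learning update
            future = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state, total = next_state, total + reward
            if done:
                break
        returns.append(total)   # per-episode return, for logging
    return q, returns


q, returns = train(episodes=200)
```

For the real game, `env.step` would be replaced by one iteration of the Pygame loop, with the scoring branches supplying `reward` and `done`.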

### 8. Evaluation Metrics
- Define metrics to evaluate the agent's performance (e.g., average reward, win rate).
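
Both example metrics are easy to compute from a list of per-episode rewards. A sketch, assuming one reward per episode (+1 for a win, -1 for a loss); the helper names are mine.

```python
def moving_average(values, window):
    """Trailing average over up to `window` most recent values (shorter at the start)."""
    if window <= 0:
        raise ValueError("window must be positive")
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


def win_rate(rewards):
    """Fraction of episodes with a positive reward; 0.0 for an empty list."""
    if not rewards:
        return 0.0
    return sum(1 for r in rewards if r > 0) / len(rewards)


episode_rewards = [1, -1, 1, 1]
smoothed = moving_average(episode_rewards, window=2)  # [1.0, 0.0, 0.0, 1.0]
rate = win_rate(episode_rewards)                      # 0.75
```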

### 9. Hyperparameter Tuning
- Experiment with different learning rates, discount factors, and other hyperparameters to optimize performance.
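
A plain grid search is probably enough to start with. The sketch below only enumerates the configurations; each one would then be passed to whatever training function ends up existing. The candidate values are placeholders, not recommendations.

```python
from itertools import product


def grid(search_space):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(search_space)
    for combo in product(*(search_space[name] for name in names)):
        yield dict(zip(names, combo))


# Placeholder candidate values
search_space = {
    "alpha": [0.05, 0.1, 0.5],    # learning rate
    "gamma": [0.9, 0.99],         # discount factor
    "epsilon": [0.05, 0.1],       # exploration rate
}

configs = list(grid(search_space))  # 3 * 2 * 2 = 12 combinations
```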

### 10. Testing and Validation
- Test the trained agent to see how well it performs and validate that it is learning effectively.
