# Input Pipeline
To observe the Geometry Dash (GD) screen, which serves as the environment, we need an input pipeline. The pipeline must both capture the GD screen and preprocess each frame before feeding it to our model.

### Why preprocess?
Suppose you take one frame from GD and then the next. Compared pixel by pixel, the two frames do not look the same numerically, yet intuitively we know that most of the content has only shifted by a small transformation. It can therefore be more effective to preprocess each image and extract its important features first.
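
To make this concrete, here is a minimal sketch using a synthetic frame rather than a real GD capture. The noisy 64x64 frame and the one-pixel scroll are assumptions chosen for illustration, and random noise is close to a worst case, so real frames will usually change less.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a grayscale game frame (64x64, values 0-255).
frame_1 = rng.integers(0, 256, size=(64, 64)).astype(np.float32)

# The "next frame": the same content scrolled one pixel to the left.
frame_2 = np.roll(frame_1, shift=-1, axis=1)

# Fraction of pixel positions whose value differs between the two frames.
changed = float(np.mean(frame_1 != frame_2))
print(f"fraction of pixels that changed: {changed:.2f}")
```

Almost every pixel value differs even though the scene is identical up to a shift, which is why a learned feature representation can be a better input than raw pixels.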

### What is frame stacking?
You need multiple frames to sense "movement." Think about a single frame of the game Pong: which way is the ball going? Frame stacking feeds the model several consecutive frames at once so that it can infer this motion.
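
The Pong intuition can be sketched with two tiny synthetic frames. The 5x10 grid, the single bright "ball" pixel, and the helper `ball_column` are all hypothetical and exist only for this illustration.

```python
import numpy as np

def ball_column(frame):
    """Return the column index of the single bright pixel (the 'ball')."""
    return int(np.argmax(frame.max(axis=0)))

# Two consecutive frames: the ball moves from column 3 to column 5.
frame_a = np.zeros((5, 10))
frame_a[2, 3] = 1.0
frame_b = np.zeros((5, 10))
frame_b[2, 5] = 1.0

# A single frame only gives a position. Stacking both gives a direction.
stacked = np.stack([frame_a, frame_b])  # shape (2, 5, 10)
velocity = ball_column(stacked[1]) - ball_column(stacked[0])
print(velocity)  # positive means the ball is moving right
```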

## Stream Game
We will use the virtual camera provided by OBS Studio, together with opencv-python, to read the game feed.

In [None]:
import cv2

VIDEO_IDX = 1 # index of the OBS virtual camera; adjust until it is found
# usually 0, unless you have another camera connected

In [None]:
cap = cv2.VideoCapture(VIDEO_IDX) # get camera 

if not cap.isOpened():
    print('cannot access OBS')
else:
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        cv2.imshow('GD Stream', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'): # press q to close the preview window
            break
    
    # release resources
    cap.release()
    cv2.destroyAllWindows()
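
One detail to keep in mind before passing captured frames to an image model: OpenCV returns frames as NumPy arrays in BGR channel order, while PIL and torchvision preprocessing expect RGB. The sketch below reverses the channel axis with plain NumPy, which is equivalent to `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`. The helper name `bgr_to_rgb` and the tiny fake frame are hypothetical.

```python
import numpy as np

def bgr_to_rgb(frame_bgr):
    """Reverse the channel axis of an (H, W, 3) BGR frame to get RGB."""
    return np.ascontiguousarray(frame_bgr[:, :, ::-1])

# Hypothetical stand-in for a frame from cap.read(): pure blue in BGR order.
fake_frame = np.zeros((2, 2, 3), dtype=np.uint8)
fake_frame[:, :, 0] = 255  # channel 0 is blue in BGR

rgb = bgr_to_rgb(fake_frame)
print(rgb[0, 0])  # blue now sits in the last (B) position of RGB
```

The resulting array can then be wrapped with `PIL.Image.fromarray` if a PIL image is needed downstream.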

## CNN Preprocessing
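We will use the pre-trained ResNet50 model from torchvision (PyTorch) as a feature extractor. Dropping its final classification layer leaves a 2048-dimensional embedding for each image.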

In [4]:
# Imports 
from torchvision.models import resnet50, ResNet50_Weights
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

In [5]:
# CNN (use ResNet50)

# initialize weights/preprocessing steps
weights = ResNet50_Weights.DEFAULT
preprocess = weights.transforms()

# get model
model = resnet50(weights=weights).to(device)
model.eval()

# get model minus last layer (get embeddings)
feature_extractor = torch.nn.Sequential(*(list(model.children())[:-1])).to(device)

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to C:\Users\Jerry Chen/.cache\torch\hub\checkpoints\resnet50-11ad3fa6.pth
100.0%


In [6]:
def get_img_embedding(img):
    # add batch dim
    img_transformed = preprocess(img).unsqueeze(0).to(device)

    with torch.no_grad():
        embedding = feature_extractor(img_transformed).view(1, -1)
    return embedding

In [13]:
from PIL import Image

In [18]:
img = Image.open('test.png').convert("RGB")
embed = get_img_embedding(img)
print(embed)
print(embed.shape)

tensor([[0.0000, 0.0000, 0.1878,  ..., 0.0123, 0.0346, 0.0000]],
       device='cuda:0')
torch.Size([1, 2048])


## Frame Stacker
Custom-built to accommodate any embedding model.

In [1]:
from collections import deque
import numpy as np

In [3]:
class FrameStacker:
    def __init__(self, stack_size=4, embedding_dim=2048):
        self.stack_size = stack_size
        self.embedding_dim = embedding_dim
        self.stack = deque(maxlen=stack_size)  # Fixed-size buffer

    def reset_stack(self, initial_embedding):
        self.stack.clear()
        for _ in range(self.stack_size): # fill with copies of first frame
            self.stack.append(initial_embedding)
        return self._get_stacked_embeddings()

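    def add_frame(self, embedding):
        self.stack.append(embedding)  # the oldest frame is dropped automatically
        return self._get_stacked_embeddings() # also returns current stacked embeddings

    def _get_stacked_embeddings(self):
        # expects NumPy arrays; convert torch tensors first, e.g. embedding.cpu().numpy()
        return np.concatenate(list(self.stack), axis=0)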
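
The buffer behavior that the stacker relies on can be sketched on its own with dummy data. The helper `fake_embedding` and its constant-valued arrays are hypothetical stand-ins for real `(1, 2048)` embeddings, converted to NumPy beforehand.

```python
from collections import deque

import numpy as np

stack_size, embedding_dim = 4, 2048
stack = deque(maxlen=stack_size)

def fake_embedding(i):
    """Hypothetical embedding for frame i: a (1, embedding_dim) array filled with i."""
    return np.full((1, embedding_dim), float(i), dtype=np.float32)

# Reset: fill the buffer with copies of the first frame.
for _ in range(stack_size):
    stack.append(fake_embedding(0))

# Add two new frames; the two oldest copies fall off the left end.
stack.append(fake_embedding(1))
stack.append(fake_embedding(2))

stacked = np.concatenate(list(stack), axis=0)
print(stacked.shape)  # (4, 2048)
print(stacked[:, 0])  # first value of each row, oldest frame first
```

With `(1, 2048)` inputs, concatenating along axis 0 yields one row per frame, ordered from oldest to newest.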