# Ideas borrowed from https://github.com/nuno-faria/tetris-ai/blob/master/README.md
## Including:
### - which board properties to track: lines about to be cleared, holes, bumpiness, and total height (I added the next-piece preview and swap piece, and wrote the code myself)
### - how to set up and run the training loop
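
A minimal sketch of how the board properties listed above could be computed, assuming the board is a list of rows (top row first) in which any non-zero cell is filled. The name `board_properties` and the small example board are hypothetical; they are not taken from this project or from the linked repository.

```python
def board_properties(board):
    """Return (holes, bumpiness, total_height) for a board given as rows, top row first.

    Any non-zero cell counts as filled.
    """
    n_rows = len(board)
    n_cols = len(board[0])
    heights = []
    holes = 0
    for col in range(n_cols):
        # Row index of the highest filled cell in this column (n_rows if the column is empty).
        top = next((r for r in range(n_rows) if board[r][col]), n_rows)
        heights.append(n_rows - top)
        # A hole is an empty cell with at least one filled cell above it.
        holes += sum(1 for r in range(top + 1, n_rows) if not board[r][col])
    # Bumpiness is the summed height difference between neighbouring columns.
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return holes, bumpiness, sum(heights)


example = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
print(board_properties(example))  # (1, 3, 8)
```

Here the column heights are 3, 2, 1, 2, and the only hole is the empty bottom-left cell.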

# RL Tetris Project Outline:
0) Clean up code
    - separate classes from main loop and RL components (easier to run 1 cell at the moment)
    - remove legacy comments
    - replace magic numbers (mainly coordinates) with named variables
    - make production ready
1) [x] Create figure generator
    - [x] Choose one of 7 blocks (O, I, S, Z, J, L, T)
        - [x] Figure out color for block
        - [x] List out rotations for each block
2) Generate board/figure
    - [x] Adjust general layout for board
        - [x] timer
        - [x] score display
        - [x] lines display
        - level display 
    - Implement buttons (not essential atm)
        - new game button (esc for now)
        - reset button
        - pause button
    - Implement debugger / replay buffer
        - step back through the last n = 5 blocks and game states (probably useful for debugging)
        - need to store the matrix, queue, swap piece, and time(?), then reset all of them; not sure how to deal with other saved states
    - [x] area for next blocks (show up to k=3)
    - [x] area for swapping blocks
3) Game logic
    - clean up timer for piece movement
    - [x] set up block shadow where piece will land (useful for computer mode)
    - Set up controls for piece movement
        - [x] human mode: button presses
        - [x] computer mode (render or not)
    - [x] make piece generator function
        - [x] go to next piece queue to display
        - [x] previous next piece gets bumped to board (above game)
    - [x] make movement functions
        - [x] left, right, soft drop, hard drop
        - [x] CW rotate, CCW rotate
        - [x] kick checks for rotations (only checked the example shown at https://tetris.fandom.com/wiki/SRS; HARD to check, I'm not a skilled Tetris player!)
    - update screen function
        - [x] move piece
        - check for silver/golden squares
        - check for line clears
            - [x] drop pieces above (already implemented in original code)
            - [x] adjust score (could be updated)
            - [x] adjust lines
            - [x] adjust level
            - play sound
           
4) RL Algorithm
    - review articles on making tetris and how to avoid getting stuck with delayed reward
    - research if possible to help model train by giving examples (showing how to clear lines, probably slow)
    - [x] create option for human input and computer with rendering (allow policy to be chosen, random or model)
        - [x] Make computer play random moves sampling from action space of allowable moves
        - Eventually swap to play best model moves and render
        - Develop algorithm to figure out 'all' (maybe most since getting every landing seems super difficult) landing spots and moves to get there
    - Create OpenAI gym tetris environment
    - Train model
        - Test different action spaces
            - Simple movements
            - complex movements
        - test different algorithms
            - double q
            - PPO
            - add more here...
        - test if adding heuristics as features improves convergence
    - PROBABLY MUCH MORE TO DO IN THIS SECTION
5) Streamlit App bonus
    - allow user to play vs computer
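
The landing-spot search mentioned under step 4 could start from something like the sketch below. It only covers straight hard drops of one fixed rotation, so it misses slides and spins under overhangs (the "maybe most" case in the outline). `hard_drop_landings`, the `(row, col)` offset format, and the tiny 4x4 board are assumptions made for illustration, not the project's actual data structures.

```python
def hard_drop_landings(board, cells):
    """List (column_shift, landing_row) pairs for dropping one fixed rotation straight down.

    board: list of rows, top row first, 0 means empty.
    cells: (row, col) offsets of the piece's blocks relative to its top-left corner.
    """
    n_rows, n_cols = len(board), len(board[0])
    width = max(c for _, c in cells) + 1

    def fits(row, shift):
        # Every block must stay inside the board and land on an empty cell.
        return all(
            row + r < n_rows and not board[row + r][shift + c]
            for r, c in cells
        )

    landings = []
    for shift in range(n_cols - width + 1):
        if not fits(0, shift):
            continue  # the piece cannot even enter the board in this column
        row = 0
        while fits(row + 1, shift):
            row += 1
        landings.append((shift, row))
    return landings


board = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
o_piece = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(hard_drop_landings(board, o_piece))  # [(0, 1), (1, 2), (2, 0)]
```

Running this once per rotation gives a first approximation of the reachable placements; each result could then be scored with the board properties from the top of the notebook.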


## To manually move around next pieces

In [1]:
import tetris_environment
tetris_env = tetris_environment.Tetris_Env

pygame 2.1.0 (SDL 2.0.16, Python 3.9.13)
Hello from the pygame community. https://www.pygame.org/contribute.html


In [3]:
game = tetris_env()
game.reset(render_mode = 'human');

In [1]:
# Things for testing:
# game.step(action_dict={'swap': 0, 'rotation': 0, 'shift': 0});
# game.do_naive_action(action = game.actions['swap'])
# game.game.get_properties()
# game.game.get_next_states()

# TO PLAY THE GAME: 

In [2]:
# game.play_game()

In [None]:
# SETTING UP NEURAL NETWORK TO PLAY GAME

import random
from collections import deque

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tqdm import tqdm

# Deep Q-learning Agent
#  code modified from https://keon.github.io/deep-q-learning/
# and  https://github.com/nuno-faria/tetris-ai/blob/master/README.md

class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.995    # discount rate
        self.epsilon = 1.0  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model
        model = Sequential()
        # print(f'input shape = {(self.state_size,)}')
        # print(f'output shape = {self.action_size}')

        model.add(Dense(256, input_shape=(self.state_size,), activation='relu'))
        model.add(Dense(256, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse',
                      optimizer=Adam(learning_rate=self.learning_rate))
        return model

    def memorize(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    # state here is a flat (1, state_size) array of values for the network, NOT the observation dictionary
    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])  # returns action

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = reward + self.gamma * \
                       np.amax(self.model.predict(next_state)[0])
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
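
Because `replay` multiplies `epsilon` by `epsilon_decay` once per call, it helps to know how many calls it takes before exploration bottoms out. The sketch below works that out for the values used in `__init__` (1.0, 0.01, 0.995); the helper names `replays_until_min` and `simulate` are made up for this example.

```python
import math


def replays_until_min(epsilon=1.0, epsilon_min=0.01, decay=0.995):
    """Closed-form count of multiplicative decay steps needed to reach epsilon_min."""
    return math.ceil(math.log(epsilon_min / epsilon) / math.log(decay))


def simulate(epsilon=1.0, epsilon_min=0.01, decay=0.995):
    """Count decay steps by repeating the same update rule used in replay()."""
    steps = 0
    while epsilon > epsilon_min:
        epsilon *= decay
        steps += 1
    return steps


print(replays_until_min(), simulate())
```

With these defaults both counts come out at roughly 920 calls, so a run of only a few episodes (with one `replay` call per episode) stays almost entirely random.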

In [None]:
# initialize gym environment and the agent
from tetris_environment import Tetris_Env

env = Tetris_Env()
state, _ = env.reset()
def convert_state_to_linear(state):
    # state is dictionary of observations
    input_features =  state["board"].flatten()
    input_features = np.append(input_features, [state["agent"]["x"], 
                                                state["agent"]["y"],
                                                state["agent"]["piece"],
                                                state["agent"]["rotation"],
                                                state["swap"],
                                                state["has_swapped"]
                                               ]
                              )

    return np.reshape(input_features, [1,len(input_features)])
state_size = convert_state_to_linear(state).shape[1]
# one network output per action
ACTIONS = ['no_op', 'left', 'right', 'down', 'cw', 'ccw', 'swap', 'hard']

action_size = len(ACTIONS)
agent = DQNAgent(state_size, action_size)


In [None]:



# Iterate the game
episodes = 2
for e in tqdm(range(episodes)):

    # reset state in the beginning of each game
    # env.reset returns observation, info
    state_observation, _ = env.reset()
    state = convert_state_to_linear(state_observation)

    # time_t represents each frame of the game
    # TODO: tune the maximum number of frames per episode
    # print(f'first state has shape {state.shape}')
    
    max_frames = 5000
    for time_t in range(max_frames):
        # Decide action
        action = agent.act(state)

        # Advance the game to the next frame based on the action.
        #  env.step(action) returns: observation, reward, self.game.state == 'gameover', False, info
        next_state_obs, reward, done, _, _ = env.step(action) # step makes env make the frame move ahead for us
        next_state = convert_state_to_linear(next_state_obs)

        # memorize the previous state, action, reward, and done
        agent.memorize(state, action, reward, next_state, done)

        # make next_state the new current state for the next frame.
        state = next_state

        # done becomes True when the game ends
        # (i.e. when game.state == 'gameover')
        if done:
            # print the score and break out of the loop
            print(f'episode: {e}/{episodes}, reward: {reward}, time = {time_t}')

            break

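    # train the agent with the experience of the episode
    # (random.sample raises ValueError if memory holds fewer than batch_size items)
    batch_size = 32
    if len(agent.memory) >= batch_size:
        agent.replay(batch_size)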

In [None]:
for row in [[0] * 10 for _ in range(20)]:
    print(row)

In [None]:
x = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 0, 5, 5, 0, 0, 0], 
     [0, 0, 0, 0, 5, 5, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 5, 5, 0, 0, 0], 
     [0, 0, 0, 6, 5, 5, 0, 0, 0, 0], 
     [7, 7, 6, 6, 6, 0, 0, 0, 0, 0], 
     [0, 7, 7, 0, 0, 0, 0, 0, 0, 0], 
     [0, 5, 0, 0, 0, 0, 0, 0, 1, 0], 
     [0, 5, 5, 0, 0, 0, 0, 0, 1, 0], 
     [0, 0, 5, 0, 0, 0, 0, 0, 1, 0], 
     [0, 0, 4, 4, 0, 5, 5, 0, 1, 0], 
     [0, 0, 4, 4, 5, 5, 0, 0, 5, 0], 
     [0, 0, 2, 2, 2, 0, 0, 0, 5, 5], 
     [0, 0, 0, 0, 2, 0, 0, 7, 7, 5], 
     [0, 0, 0, 0, 3, 4, 4, 0, 7, 7], 
     [0, 0, 0, 0, 3, 4, 4, 0, 2, 2], 
     [0, 0, 0, 1, 3, 3, 0, 0, 2, 0], 
     [0, 7, 0, 1, 0, 3, 0, 0, 2, 0], 
     [7, 7, 0, 1, 0, 3, 0, 4, 4, 0], 
     [7, 0, 0, 1, 0, 3, 3, 4, 4, 0]]
x = np.where(np.array(x) != 0, 1, 0)
[] + x.flatten().tolist()

In [None]:
# from: 
# https://github.com/nuno-faria/tetris-ai
# tetris_ai.py
from datetime import datetime

env = Tetris_Env(render_mode = 'rgb_array')

# when to stop training?
episodes = 2000
max_steps = None
epsilon_stop_episode = 1500
mem_size = 20000
discount = 0.95
batch_size = 512
epochs = 1
# render_every = 50
log_every = 50
replay_start_size = 2000
train_every = 1
n_neurons = [32, 32]
activations = ['relu', 'relu', 'linear']
# render_delay = None

agent = DQNAgent(env.get_state_size(),
                 n_neurons=n_neurons, activations=activations,
                 epsilon_stop_episode=epsilon_stop_episode, mem_size=mem_size,
                 discount=discount, replay_start_size=replay_start_size)

log_dir = f'logs/tetris-nn={str(n_neurons)}-mem={mem_size}-bs={batch_size}-e={epochs}-{datetime.now().strftime("%Y%m%d-%H%M%S")}'
log = CustomTensorBoard(log_dir=log_dir)

scores = []

for episode in tqdm(range(episodes)):
    # env.reset returns observation, info
    current_state, _ = env.reset()
    done = False
    steps = 0

In [None]:
from keras.models import Sequential, save_model, load_model
from keras.layers import Dense
from collections import deque
import numpy as np
import random

# Deep Q Learning Agent + Maximin
#
# This version provides only one value per input,
# which indicates the score expected in that state.
# This is because the algorithm will try to find the
# best final state among the possible next states,
# in contrast to the traditional way of finding the best
# action for a particular state.
class DQNAgent:

    '''Deep Q Learning Agent + Maximin
    Args:
        state_size (int): Size of the input domain
        mem_size (int): Size of the replay buffer
        discount (float): How important future rewards are compared to immediate ones, in [0,1]
        epsilon (float): Exploration rate (probability of choosing at random) at the start
        epsilon_min (float): Epsilon value at which the agent stops decreasing it
        epsilon_stop_episode (int): Episode at which the agent stops decreasing the exploration variable
        n_neurons (list(int)): List with the number of neurons in each inner layer
        activations (list): List with the activations used in each inner layer, as well as the output
        loss (obj): Loss function
        optimizer (obj): Optimizer used
        replay_start_size (int): Minimum number of stored transitions needed to train
    '''

    def __init__(self, state_size, mem_size=10000, discount=0.95,
                 epsilon=1, epsilon_min=0, epsilon_stop_episode=500,
                 n_neurons=[32,32], activations=['relu', 'relu', 'linear'],
                 loss='mse', optimizer='adam', replay_start_size=None):

        assert len(activations) == len(n_neurons) + 1

        self.state_size = state_size
        self.memory = deque(maxlen=mem_size)
        self.discount = discount
        self.epsilon = epsilon
        self.epsilon_min = epsilon_min
        self.epsilon_decay = (self.epsilon - self.epsilon_min) / (epsilon_stop_episode)
        self.n_neurons = n_neurons
        self.activations = activations
        self.loss = loss
        self.optimizer = optimizer
        if not replay_start_size:
            replay_start_size = mem_size / 2
        self.replay_start_size = replay_start_size
        self.model = self._build_model()


    def _build_model(self):
        '''Builds a Keras deep neural network model'''
        model = Sequential()
        model.add(Dense(self.n_neurons[0], input_dim=self.state_size, activation=self.activations[0]))

        for i in range(1, len(self.n_neurons)):
            model.add(Dense(self.n_neurons[i], activation=self.activations[i]))

        model.add(Dense(1, activation=self.activations[-1]))

        model.compile(loss=self.loss, optimizer=self.optimizer)
        
        return model


    def add_to_memory(self, current_state, next_state, reward, done):
        '''Adds a play to the replay memory buffer'''
        self.memory.append((current_state, next_state, reward, done))


    def random_value(self):
        '''Random score for a certain action'''
        return random.random()


    def predict_value(self, state):
        '''Predicts the score for a certain state'''
        return self.model.predict(state)[0]


    def act(self, state):
        '''Returns the expected score of a certain state'''
        state = np.reshape(state, [1, self.state_size])
        if random.random() <= self.epsilon:
            return self.random_value()
        else:
            return self.predict_value(state)


    def best_state(self, states):
        '''Returns the best state for a given collection of states'''
        max_value = None
        best_state = None

        if random.random() <= self.epsilon:
            return random.choice(list(states))

        else:
            for state in states:
                value = self.predict_value(np.reshape(state, [1, self.state_size]))
                # compare against None explicitly so a best value of 0 is not overwritten
                if max_value is None or value > max_value:
                    max_value = value
                    best_state = state

        return best_state


    def train(self, batch_size=32, epochs=3):
        '''Trains the agent'''
        n = len(self.memory)
    
        if n >= self.replay_start_size and n >= batch_size:

            batch = random.sample(self.memory, batch_size)

            # Get the expected score for the next states, in batch (better performance)
            next_states = np.array([x[1] for x in batch])
            next_qs = [x[0] for x in self.model.predict(next_states)]

            x = []
            y = []

            # Build xy structure to fit the model in batch (better performance)
            for i, (state, _, reward, done) in enumerate(batch):
                if not done:
                    # Partial Q formula
                    new_q = reward + self.discount * next_qs[i]
                else:
                    new_q = reward

                x.append(state)
                y.append(new_q)

            # Fit the model to the given values
            self.model.fit(np.array(x), np.array(y), batch_size=batch_size, epochs=epochs, verbose=0)

            # Update the exploration variable
            if self.epsilon > self.epsilon_min:
                self.epsilon -= self.epsilon_decay
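
The target built inside `train` can be checked by hand. The sketch below repeats the same rule outside Keras: the target is the reward alone for a terminal transition, and the reward plus the discounted predicted value of the next state otherwise. `q_targets`, the placeholder state names, and the sample numbers are made up for illustration.

```python
def q_targets(batch, next_values, discount=0.95):
    """Compute training targets for a batch of transitions.

    batch: list of (state, next_state, reward, done) tuples, as stored by add_to_memory.
    next_values: predicted value for each next_state, in the same order as batch.
    """
    targets = []
    for (_, _, reward, done), next_value in zip(batch, next_values):
        # Terminal transitions have no future value to bootstrap from.
        targets.append(reward if done else reward + discount * next_value)
    return targets


batch = [("s0", "s1", 1.0, False), ("s1", "s2", -2.0, True)]
print(q_targets(batch, next_values=[10.0, 99.0]))
```

The first target is about 1 + 0.95 * 10 = 10.5; the second ignores the predicted 99.0 because the episode ended, leaving -2.0.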

In [None]:
import numpy as np
np.amax([[1,2,3], [3,6,10]],1)

In [None]:
import matplotlib.pyplot as plt
x = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 0, 5, 5, 0, 0, 0], 
     [0, 0, 0, 0, 5, 5, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 
     [0, 0, 0, 0, 1, 5, 5, 0, 0, 0], 
     [0, 0, 0, 6, 5, 5, 0, 0, 0, 0], 
     [7, 7, 6, 6, 6, 0, 0, 0, 0, 0], 
     [0, 7, 7, 0, 0, 0, 0, 0, 0, 0], 
     [0, 5, 0, 0, 0, 0, 0, 0, 1, 0], 
     [0, 5, 5, 0, 0, 0, 0, 0, 1, 0], 
     [0, 0, 5, 0, 0, 0, 0, 0, 1, 0], 
     [0, 0, 4, 4, 0, 5, 5, 0, 1, 0], 
     [0, 0, 4, 4, 5, 5, 0, 0, 5, 0], 
     [0, 0, 2, 2, 2, 0, 0, 0, 5, 5], 
     [0, 0, 0, 0, 2, 0, 0, 7, 7, 5], 
     [0, 0, 0, 0, 3, 4, 4, 0, 7, 7], 
     [0, 0, 0, 0, 3, 4, 4, 0, 2, 2], 
     [0, 0, 0, 1, 3, 3, 0, 0, 2, 0], 
     [0, 7, 0, 1, 0, 3, 0, 0, 2, 0], 
     [7, 7, 0, 1, 0, 3, 0, 4, 4, 0], 
     [7, 0, 0, 1, 0, 3, 3, 4, 4, 0]]
# x = np.where(np.array(x) != 0, 1, 0)
x[:][:]

In [None]:
x[0]

In [None]:
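from gym import spaces  # assumption: the OpenAI Gym package; swap in gymnasium if that is what is installed

x = spaces.Box(low=0, high=1, 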
           shape=(24, 10), 
           dtype=int)

In [None]:
y = x.sample()
y, y.tolist()

In [None]:
[None]*3

In [None]:
2000000*(2000000-1)/2 - tot