# Assignment 10 - 705.603 - AWS Blackjack Optimization 
## Student Name: Ravindra Sadaphule

### Explanation:
The architecture of our agent's neural network is simple: three fully connected layers with ReLU activations. The input is the current state, and the output is a probability for each action. The agent selects the action with the highest probability.
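A plain-Python forward pass can illustrate the network described above. This is a minimal sketch, not the code in `blackjack_train.py` (which is not shown here). The hidden width of 64, the softmax output layer, the random weight initialization, and feeding raw (unscaled) observation values are all assumptions made for illustration.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    """Compute W @ x + b for one fully connected layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, vi) for vi in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(vi - m) for vi in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(state, layers):
    """Three dense layers with ReLU in between, then softmax over the actions."""
    h = relu(dense(state, layers[0]))
    h = relu(dense(h, layers[1]))
    return softmax(dense(h, layers[2]))

rng = random.Random(0)
# Hypothetical sizes: 3 state features -> 64 -> 64 -> 2 actions (stick, hit)
layers = [init_layer(3, 64, rng), init_layer(64, 64, rng), init_layer(64, 2, rng)]

state = [14, 10, 0]  # (player sum, dealer's face-up card, usable ace)
probs = forward(state, layers)
action = max(range(len(probs)), key=lambda a: probs[a])  # 0 = stick, 1 = hit
print(probs, action)
```

Greedy selection is simply the index of the largest probability. The untrained weights here make the actual choice arbitrary.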

Initially, we set epsilon (ε) to 0.99, which means the agent has a 99% chance of taking a random action on the first hand, for exploration. With a decay rate of 0.995, ε shrinks over time, so the agent increasingly exploits its prior experience when making decisions. We also set a lower bound of 0.02 on ε so that some exploration continues throughout training.

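Furthermore, we set the discount factor (γ) to 0.99, so a reward received a few steps later is discounted only slightly relative to an immediate one. Because a Blackjack hand lasts only a few decisions (and can end as soon as the cards are dealt), the return is dominated by the terminal win/loss reward. We set the learning rate (α) to 3e-4.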
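The following sketch shows how γ enters the return. It computes the standard discounted return for one hand, accumulating rewards backwards. The reward sequences are illustrative, not taken from training.

```python
def discounted_return(rewards, gamma):
    """G_0 = r_0 + gamma * r_1 + gamma^2 * r_2 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# The player hits twice and then wins: only the final step is rewarded.
print(discounted_return([0, 0, 1], gamma=0.99))  # ≈ 0.9801
# A hand that ends right after the deal has a single terminal reward.
print(discounted_return([1], gamma=0.99))        # 1.0
```

With γ = 0.99, a win two steps away is still worth about 0.98 of an immediate win.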

A critical aspect of any reinforcement learning algorithm is the reward function. We train with OpenAI Gym's Blackjack environment, which gives a reward of +1 for a win, -1 for a loss, and 0 for a draw. Because every hand is played for the same fixed stake, this reward does not let our agent vary its position (bet) size or learn about risk management.

To address this limitation, we multiply the reward by the raw probability derived from our agent's neural network output. This approach allows the agent to place larger bets on hands it deems more favorable and smaller bets on less favorable hands.
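Here is a minimal sketch of the probability-scaled reward described above. It assumes the "raw probability" is the network's probability for the action actually taken, which then acts as a bet size between 0 and 1. The function name is hypothetical.

```python
def scaled_reward(env_reward, action_probs, action):
    """Scale the environment's reward by the network's probability for the chosen action.

    The probability plays the role of a bet size: a confident action wins or loses more.
    """
    bet = action_probs[action]
    return env_reward * bet

probs = [0.2, 0.8]                    # hypothetical network output: [stick, hit]
print(scaled_reward(+1, probs, 1))    # confident win -> 0.8
print(scaled_reward(-1, probs, 0))    # low-confidence loss -> -0.2
print(scaled_reward(0, probs, 1))     # draw -> 0.0
```

A loss on a low-confidence action costs little, while a loss on a high-confidence action costs almost the full stake. This asymmetry is the intended risk-management signal.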


## Blackjack

### Environment Details

    ### Action Space
    There are two actions: stick (0), and hit (1).
    
    ### Observation Space
    Tuple(Discrete(32), Discrete(11), Discrete(2))
    The observation consists of a 3-tuple containing: 
        1. the player's current sum
        2. the value of the dealer's face-up card (1-10, where 1 is an ace)
        3. whether the player holds a usable ace (0 or 1).
        
    ### Rewards
    - win game: +1
    - lose game: -1
    - draw game: 0
    - win game with natural blackjack:
        +1.5 (if natural is True)
        +1 (if natural is False)
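The following sketch shows how the player's sum and the usable-ace flag in the observation are derived. It is written here for illustration and follows the rule that an ace counts as 11 only if that does not push the hand over 21. Cards are represented as in the environment (ace = 1, face cards = 10). The `observation` helper is hypothetical.

```python
def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21

def sum_hand(hand):
    """Player's current sum, counting one ace as 11 when that is usable."""
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)

def observation(player_hand, dealer_showing):
    """Build the 3-tuple (player sum, dealer's face-up card, usable ace as 0/1)."""
    return (sum_hand(player_hand), dealer_showing, int(usable_ace(player_hand)))

print(observation([1, 6], 10))      # (17, 10, 1): ace counted as 11
print(observation([1, 6, 10], 10))  # (17, 10, 0): ace must count as 1
```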

In [81]:
! pip install -U gym
! pip install -U torch
! pip install gym[toy_text]


In [82]:
!pip install numpy matplotlib boto3


In [83]:
# import libraries
import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
from IPython.display import display
from time import gmtime, strftime
from sagemaker.predictor import csv_serializer

# Define IAM role
role = get_execution_role()

In [84]:
print(role)

arn:aws:iam::966444174817:role/aws-sagemaker-role


In [85]:
bucket_name = 'ravindrasadaphule-s3-bucket2' # <--- CHANGE THIS VARIABLE TO A UNIQUE NAME FOR YOUR BUCKET
my_region = boto3.session.Session().region_name  # region of the current session
s3 = boto3.resource('s3')
try:
    # us-east-1 is the default region and must not be passed as a LocationConstraint
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)

S3 error:  An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.


# Import libraries and Create Class to Display Cards

In [86]:
!pip install gym


In [87]:
!pip install pygame


In [88]:
import sagemaker
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework
import boto3

In [89]:
# run in local mode?
local_mode = False

if local_mode:
    instance_type = "local"
else:
    instance_type = "ml.m4.4xlarge"

In [90]:
try:
    role = sagemaker.get_execution_role()
except:
    role = get_execution_role()

print("Using IAM role arn: {}".format(role))

Using IAM role arn: arn:aws:iam::966444174817:role/aws-sagemaker-role


### Train RLEstimator with DWL for Blackjack.
### The actual Training code is located in blackjack_train.py

In [92]:

# Specify S3 bucket and prefix
sagemaker_session = sagemaker.Session()
s3_bucket = sagemaker_session.default_bucket()
s3_output_path = f's3://{s3_bucket}/blackjack/'

# Create an RLEstimator
estimator = RLEstimator(entry_point="blackjack_train.py",
                        source_dir=".",
                        toolkit=RLToolkit.COACH,
                        toolkit_version='0.11.0',
                        framework=RLFramework.TENSORFLOW,
                        role=sagemaker.get_execution_role(),
                        instance_type="ml.m4.xlarge",
                        instance_count=1,
                        output_path=s3_output_path,
                        base_job_name="blackjack-custom",
                        hyperparameters={
                            "num_episodes": 500000,
                            "epsilon": 0.9,
                            "gamma": 0.1,
                            "alpha": 0.3,
                            "decay_rate": 0.005,
                        })

# Train the model
estimator.fit()

INFO:sagemaker:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker.image_uris:Defaulting to only available Python version: py3
INFO:sagemaker:Creating training-job with name: blackjack-custom-2023-04-07-04-16-46-579


2023-04-07 04:18:21 Starting - Starting the training job...
2023-04-07 04:18:36 Starting - Preparing the instances for training......
2023-04-07 04:19:56 Downloading - Downloading input data
2023-04-07 04:19:56 Training - Downloading the training image...
2023-04-07 04:20:17 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-04-07 04:20:28,170 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2023-04-07 04:20:28,174 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2023-04-07 04:20:53,745 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2023-04-07 04:20:53,760 sagemaker-containers INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {
        "sagemaker_estimator"

UnexpectedStatusException: Error for Training job blackjack-custom-2023-04-07-04-16-46-579: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/usr/bin/python blackjack_train.py --alpha 0.3 --decay_rate 0.005 --epsilon 0.9 --gamma 0.1 --num_episodes 500000", exit code: 1
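As the logged command above shows, SageMaker passes each entry in `hyperparameters` to the entry point as a `--name value` command-line flag. `blackjack_train.py` is not shown here, so the following is only a hypothetical sketch of how such a script could read those flags with `argparse`. The defaults simply mirror the values submitted in the estimator. `parse_known_args` is used so that any extra flag the script does not declare is ignored instead of causing argparse to exit.

```python
import argparse

def parse_hyperparameters(argv=None):
    """Parse the hyperparameters SageMaker passes to the entry point as --name value flags."""
    parser = argparse.ArgumentParser(description="Blackjack training (sketch)")
    parser.add_argument("--num_episodes", type=int, default=500000)
    parser.add_argument("--epsilon", type=float, default=0.9)
    parser.add_argument("--gamma", type=float, default=0.1)
    parser.add_argument("--alpha", type=float, default=0.3)
    parser.add_argument("--decay_rate", type=float, default=0.005)
    # parse_known_args collects undeclared flags instead of exiting with an error
    args, _unknown = parser.parse_known_args(argv)
    return args

# The argument list from the job's logged command line
args = parse_hyperparameters(
    ["--alpha", "0.3", "--decay_rate", "0.005", "--epsilon", "0.9",
     "--gamma", "0.1", "--num_episodes", "500000"]
)
print(args.num_episodes, args.epsilon, args.gamma, args.alpha, args.decay_rate)
```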

In [None]:

# Deploy the trained estimator to a SageMaker endpoint and get a Predictor
predictor = estimator.deploy(instance_type='ml.m4.xlarge',
                             initial_instance_count=1)



In [None]:
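# Hypothetical observation: (player sum, dealer's face-up card, usable ace).
# The payload format the endpoint accepts depends on the deployed model; this one is assumed.
data = [14, 10, 0]
response = predictor.predict(data)
print(response)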