# Spark Cluster Master node creation

### Importing Required Libraries ::
* __boto3__: Required to connect to AWS and perform AWS tasks
* __botocore__: Required to handle the exceptions related to boto3 tasks
* __paramiko__: Required to run commands inside EC2 instances
* __json__: To convert native Python dictionaries to strings that can be written to files
* __pickle__: To store configuration dictionary which will be used in next notebooks
* __datetime__, __pprint__, __os__, __sys__, __time__: General purpose use

In [1]:
import boto3, botocore, paramiko
from datetime import datetime
import pprint, os, sys, time, json, pickle
from botocore.exceptions import ClientError

### Reading current status and available details ::
* The user can provide specific settings through a configuration file in the provided format. If the user does not provide a configuration, or provides it in the wrong format, default values are used. Please check the **README.MD** file for the default values.

* Along with the user defined variables, we will extract details of Stack.

In [2]:
try:
    with open("cluster_config.json", "r") as config_file:
        user_config = json.load(config_file)

    region = user_config.get('Region', "us-east-1")
    wrk_spc_dir = user_config['WorkspaceDirectory']
    user = user_config.get('UserName', "root")
    cluster_instance_type = user_config.get('InstanceType', "t2.micro")
    cluster_key_pair_path = user_config['KeyPairPath']
    cluster_key_pair_name = user_config['KeyPairName']
    project_tag = user_config.get('ProjectTag', "SparkCluster")
    pickle_file = wrk_spc_dir + "/SparkClusterOnAWSEC2_" + user + "_CurrentStatus.pkl"
    if os.path.exists(pickle_file):
        with open(pickle_file, 'rb') as pickle_handle:
            user_config = pickle.load(pickle_handle)
        cluster_subnet_id = user_config['SubnetList'][0]
        cluster_security_group_list = [user_config['SecurityGroupId']]
        run_id = user_config['RunId']
    else:
        raise FileNotFoundError("Status file '" + pickle_file + "' is not available, which is unexpected. Please start running from 'cloudformation_stack_creation.ipynb' file.")
except Exception as e:
    print("Unexpected error while fetching available status: " + str(e))
    exit()
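
The configuration handling above can be sketched as a small function that merges the user's JSON over the defaults. This is only an illustrative sketch: `load_cluster_config`, `DEFAULTS` and `REQUIRED` are hypothetical names, the defaults are copied from the `.get()` fallbacks in the cell above, and the sample values are made up.

```python
import json
import os
import tempfile

# Defaults mirror the .get() fallbacks used in the cell above.
DEFAULTS = {
    "Region": "us-east-1",
    "UserName": "root",
    "InstanceType": "t2.micro",
    "ProjectTag": "SparkCluster",
}
# Keys read with [] in the cell above, so they have no fallback value.
REQUIRED = ("WorkspaceDirectory", "KeyPairPath", "KeyPairName")


def load_cluster_config(path):
    """Hypothetical helper: merge the user's JSON over the defaults."""
    with open(path, "r") as config_file:
        user_config = json.load(config_file)
    missing = [key for key in REQUIRED if key not in user_config]
    if missing:
        raise KeyError("Missing required configuration keys: " + ", ".join(missing))
    return {**DEFAULTS, **user_config}


# Example with made-up values written to a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "cluster_config.json")
    with open(sample_path, "w") as handle:
        json.dump(
            {
                "WorkspaceDirectory": tmp_dir,
                "KeyPairPath": "/path/to/key.pem",
                "KeyPairName": "my-key-pair",
                "InstanceType": "t2.medium",
            },
            handle,
        )
    sample_config = load_cluster_config(sample_path)

print(sample_config["Region"], sample_config["InstanceType"])
```

Checking the required keys up front gives a clearer message than a bare `KeyError` raised halfway through the cell.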

### Creating boto3 session, clients and resources ::

In [3]:
try:
    session = boto3.session.Session(region_name=region)
    ec2_client = session.client('ec2')
    ec2_resource = session.resource('ec2')
except ClientError as e:
    print("Unexpected error while creating boto3 session, client and resources: " + str(e))
    exit()

### Fetching latest Image id ::
This image ID will be used to create the Master node. The following configuration is already in place in the image:
* The Spark distribution is already present in the image
* All packages required to run pyspark are already installed in the image
* Jupyter notebook is configured
* The following commands must be executed before a Spark session/context can be created using this master node:

    _import findspark_

    _findspark.init('/home/ec2-user/spark-2.4.5-bin-hadoop2.7')_

    _import pyspark_

In [4]:
try:
    node_images_list = ec2_client.describe_images(
        Filters=[
            {
                'Name': 'tag:Project',
                'Values': [project_tag]
            },
            {
                'Name': 'state',
                'Values': ['available']
            }
        ]
    )
except ClientError as e:
    print("Unexpected error while fetching node images: " + str(e))
    exit()

try:
    node_image_createdates = [(datetime.strptime(img['CreationDate'][:-5], '%Y-%m-%dT%H:%M:%S'), img['ImageId']) for img in node_images_list['Images']]
    # Sort on the creation date (x[0]), not the image ID, so the newest image comes first
    latest_image_id = sorted(node_image_createdates, key=lambda x: x[0], reverse=True)[0][1]
    latest_image_id
except Exception as e:
    print("Unexpected error while extracting latest node image: " + str(e))
    exit()
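
To make the selection step concrete, here is a minimal sketch that picks the newest image by its `CreationDate` rather than by its image ID. The function name `pick_latest_image_id` and the sample records are assumptions for illustration; the records only imitate the shape of the `Images` entries returned by `describe_images`.

```python
from datetime import datetime


def pick_latest_image_id(images):
    """Return the ImageId with the most recent CreationDate, or None if empty."""
    if not images:
        return None
    latest = max(
        images,
        # Drop the trailing '.000Z' before parsing, as in the cell above.
        key=lambda img: datetime.strptime(img["CreationDate"][:-5], "%Y-%m-%dT%H:%M:%S"),
    )
    return latest["ImageId"]


# Made-up records shaped like the 'Images' entries of a describe_images response.
sample_images = [
    {"ImageId": "ami-0ccc", "CreationDate": "2020-03-01T10:00:00.000Z"},
    {"ImageId": "ami-0aaa", "CreationDate": "2020-05-17T08:30:00.000Z"},
    {"ImageId": "ami-0bbb", "CreationDate": "2020-04-02T12:15:00.000Z"},
]
print(pick_latest_image_id(sample_images))
```

With this sample data the newest image is `ami-0aaa`, while ordering by image ID would pick `ami-0ccc`. Returning `None` for an empty list also lets the caller report "no image found" explicitly.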

### Check for already running Master Node for current user ::
* Only one master node is allowed per user.
* If a master node is already running, the same node will be used as the master node of the current user.

In [5]:
try:
    instances = ec2_resource.instances.filter(
        Filters=[
            {
                'Name': 'instance-state-name',
                'Values': ['running']
            },
            {
                'Name': 'tag:Project',
                'Values': [project_tag]
            },
            {
                'Name': 'tag:User',
                'Values': [user]
            },
            {
                'Name': 'tag:NodeType',
                'Values': ['Master']
            }
        ]
    )
except ClientError as e:
    print("Unexpected error while looking for already running Master node EC2 instance for user-'" + user + "': " + str(e))
    exit()
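
The filter list above follows a regular pattern: one entry for the instance state and one `tag:<Key>` entry per tag. A small helper can build it from a dictionary. This is a sketch only; `build_instance_filters` is a hypothetical name and the tag values are examples.

```python
def build_instance_filters(state, tags):
    """Build an EC2 Filters list: one state filter plus one filter per tag."""
    filters = [{"Name": "instance-state-name", "Values": [state]}]
    for key, value in tags.items():
        filters.append({"Name": "tag:" + key, "Values": [value]})
    return filters


# Example values; in the notebook these come from project_tag and user.
master_filters = build_instance_filters(
    "running",
    {"Project": "SparkCluster", "User": "root", "NodeType": "Master"},
)
for entry in master_filters:
    print(entry)
```

The resulting list could be passed as `Filters=master_filters` to `ec2_resource.instances.filter`, in the same way as the literal list in the cell above.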

### Instantiating the EC2 for master node on AWS ::
* The __create_instances__ API of the EC2 resource is used to instantiate one EC2 node, which will be the Master node of our Spark cluster.
* The __instance type__, __key-pair__ name, __subnet ID__ and __security group list__ are provided as declared in the previous cells.

In [6]:
if list(instances):
    master_node_id = list(instances)[0].id
    print("One master node(InstanceId-'" + str(master_node_id) + "') is already running for user-'" + user + "'. It will be reused as only one master node is allowed per user.")
else:
    print("No master node is running for user-'" + user + "'. New node will be created.")
    try:
        resp = ec2_resource.create_instances(
            BlockDeviceMappings=[
                {
                    'DeviceName': '/dev/xvda',
                    'Ebs': {
                        'DeleteOnTermination': True
                    }
                },
            ],
            ImageId=latest_image_id,
            InstanceType=cluster_instance_type,
            KeyName=cluster_key_pair_name,
            MaxCount=1,
            MinCount=1,
            NetworkInterfaces=[
                {
                    'DeviceIndex': 0,
                    'SubnetId' : cluster_subnet_id,
                    'Groups': cluster_security_group_list,
                    'AssociatePublicIpAddress': True            
                }
            ],
            TagSpecifications=[
                {
                    'ResourceType': 'instance',
                    'Tags': [
                        {
                            'Key': 'Project',
                            'Value': project_tag
                        },
                        {
                            'Key': 'RunId',
                            'Value': run_id
                        },
                        {
                            'Key': 'User',
                            'Value': user
                        },
                        {
                            'Key': 'Name',
                            'Value': 'SparkClusterMaster_' + str(run_id)
                        },
                        {
                            'Key': 'NodeType',
                            'Value': 'Master'
                        }
                    ]
                }
            ]
        )
        master_node_id = resp[0].id
    except ClientError as e:
        print("Unexpected error while creating Spark Cluster Master node EC2 instance for user-'" + user + "': " + str(e))
        exit()

No master node is running for user-'ccbp-dev-user-saumalya'. New node will be created.
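
The `TagSpecifications` argument in the call above can also be generated from a plain dictionary, which keeps the tag set in one place. This is a minimal sketch: `build_tag_specifications` is a hypothetical helper, and the `run_id` and user values are made up.

```python
def build_tag_specifications(tags, resource_type="instance"):
    """Convert a {key: value} dict into the TagSpecifications structure."""
    return [
        {
            "ResourceType": resource_type,
            "Tags": [{"Key": key, "Value": str(value)} for key, value in tags.items()],
        }
    ]


run_id = "20200517083000"  # made-up run identifier
master_tags = build_tag_specifications(
    {
        "Project": "SparkCluster",
        "RunId": run_id,
        "User": "root",
        "Name": "SparkClusterMaster_" + str(run_id),
        "NodeType": "Master",
    }
)
print(master_tags[0]["Tags"][3])
```

Tag values must be strings, so the helper converts each value with `str()` before building the structure.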


### Fetching required information of the Master Node ::
* We need to probe the node a few times to check whether it is up before we can extract the information

In [7]:
try:
    probe_limit = 60  # 60 probes x 10 seconds = 10 minutes
    for _ in range(probe_limit):
        ec2_spark_cluster_master = ec2_client.describe_instances(InstanceIds=[master_node_id])
        if ec2_spark_cluster_master['Reservations'][0]['Instances'][0]['State']['Code'] == 16:
            spark_cluster_master_public_dns = ec2_spark_cluster_master['Reservations'][0]['Instances'][0]['PublicDnsName']
            spark_cluster_master_private_ip = ec2_spark_cluster_master['Reservations'][0]['Instances'][0]['PrivateIpAddress']
            break
        print("Requested EC2 node is still in " + ec2_spark_cluster_master['Reservations'][0]['Instances'][0]['State']['Name'] + " mode. Going to sleep for 10 seconds before next probing.")
        time.sleep(10)
    else:
        print("Requested EC2 node is not up after 10 mins, which is not expected. Please check the status in AWS console. Quitting process!")
        exit()
    print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
    print("Requested master node is up and running. Public DNS: '" + spark_cluster_master_public_dns + "'.")
    print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
except Exception as e:
    print("Unexpected error while extracting Spark Cluster Master node details: " + str(e))
    exit()

Requested EC2 node is still in pending mode. Going to sleep for 10 seconds before next probing.
Requested EC2 node is still in pending mode. Going to sleep for 10 seconds before next probing.
Requested EC2 node is still in pending mode. Going to sleep for 10 seconds before next probing.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Requested master node is up and running. Public DNS: 'ec2-54-225-22-151.compute-1.amazonaws.com'.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
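
The probing loop can be sketched as a reusable function that takes the describe call as a parameter, so it can be tried out without an AWS account. Everything here is illustrative: `wait_until_running` and `fake_describe_instances` are hypothetical names, and the DNS name, IP address and instance ID are made-up values.

```python
import time


def wait_until_running(describe_instances, instance_id, probe_limit=60,
                       delay_seconds=10, sleep=time.sleep):
    """Poll until the instance reports state code 16 (running).

    Returns the instance description, or None after probe_limit attempts.
    """
    for _ in range(probe_limit):
        response = describe_instances(InstanceIds=[instance_id])
        instance = response["Reservations"][0]["Instances"][0]
        if instance["State"]["Code"] == 16:
            return instance
        sleep(delay_seconds)
    return None


# Stand-in for ec2_client.describe_instances: pending twice, then running.
_states = iter([(0, "pending"), (0, "pending"), (16, "running")])


def fake_describe_instances(InstanceIds):
    code, name = next(_states)
    return {"Reservations": [{"Instances": [{
        "InstanceId": InstanceIds[0],
        "State": {"Code": code, "Name": name},
        "PublicDnsName": "ec2-203-0-113-10.compute-1.amazonaws.com" if code == 16 else "",
        "PrivateIpAddress": "10.0.0.12",
    }]}]}


naps = []  # records the requested sleeps instead of actually sleeping
master = wait_until_running(fake_describe_instances, "i-0123456789abcdef0", sleep=naps.append)
print(master["State"]["Name"], len(naps))
```

In the notebook, `ec2_client.describe_instances` would be passed in place of the stand-in. boto3 also provides a built-in `instance_running` waiter (`ec2_client.get_waiter('instance_running')`), which may be a simpler option when the per-probe status messages are not needed.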
