## Glue Setup with Python SDK (boto3)
This notebook will show how to set up some AWS resources using the Python SDK for AWS, boto3.

Boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html

---

#### Package Import

---

In [2]:
import boto3
import configparser

---

#### Loading Credentials from file

---

In [3]:
#AWS Credentials
aws_path = "/home/rambino/.aws/credentials"
aws_cred = configparser.ConfigParser()
aws_cred.read(aws_path)

['/home/rambino/.aws/credentials']
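`ConfigParser.read` returns the list of files it managed to parse, which is why the cell above echoes the path. It silently skips files it cannot open, so a wrong path only surfaces later as a `KeyError`. Below is a minimal sketch of a stricter loader, assuming the usual `[default]` profile layout. `load_aws_profile` and `REQUIRED_KEYS` are hypothetical names for illustration, not part of boto3.

```python
import configparser

# Keys the later cells read from the profile (assumed layout of the credentials file).
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def load_aws_profile(path, profile="default"):
    """Hypothetical helper: return one credentials profile as a plain dict."""
    parser = configparser.ConfigParser()
    # read() returns the files it parsed; an empty list means nothing was loaded.
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read credentials file: {path}")
    if profile not in parser:
        raise KeyError(f"Profile '{profile}' not found in {path}")
    section = parser[profile]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"Profile '{profile}' is missing: {', '.join(missing)}")
    return dict(section)
```

An alternative that avoids handling the keys at all is `boto3.Session(profile_name="default")`, which reads the same credentials file itself.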

---

#### Creating EC2 Client (used below for the VPC setup)

---

In [4]:
ec2 = boto3.client('ec2',
    region_name             = "us-east-1",
    aws_access_key_id       = aws_cred['default']['aws_access_key_id'],
    aws_secret_access_key   = aws_cred['default']['aws_secret_access_key']
)

---

#### Creating New S3 Bucket

---

In [6]:
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id       = aws_cred['default']['aws_access_key_id'],
    aws_secret_access_key   = aws_cred['default']['aws_secret_access_key']
)
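The same three keyword arguments are repeated for every client in this notebook. As a sketch, they could be built once and unpacked into each `boto3.client` call. `client_kwargs` is a hypothetical helper, and it assumes the credentials object supports `obj[profile][key]` lookups the way `aws_cred` does.

```python
def client_kwargs(credentials, region="us-east-1", profile="default"):
    """Hypothetical helper: shared keyword arguments for boto3.client(...)."""
    section = credentials[profile]
    return {
        "region_name": region,
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }
```

With this in place, a client would be created as `boto3.client("s3", **client_kwargs(aws_cred))`.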

In [7]:
bucketName = "glue-kstine-bucket"

s3_response = s3.create_bucket(
    Bucket = bucketName,
    CreateBucketConfiguration = {
        'LocationConstraint':'eu-central-1'
    }
)

s3_response

ClientError: An error occurred (AccessDenied) when calling the CreateBucket operation: Access Denied
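The `AccessDenied` error above most likely means the credentials in use are not allowed to call `s3:CreateBucket`, which has to be fixed in IAM rather than in this code. Separately, the region handling in `create_bucket` has one quirk worth encoding: `us-east-1` is the S3 default and is not accepted as a `LocationConstraint`. A minimal sketch of building the request arguments, where `bucket_request` is a hypothetical helper name:

```python
def bucket_request(bucket_name, region):
    """Hypothetical helper: keyword arguments for s3.create_bucket(...)."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default region and must be expressed by omitting the
    # configuration block; every other region goes into LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

The call would then read `s3.create_bucket(**bucket_request(bucketName, "eu-central-1"))`.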

---

#### Setting up VPC for Glue

---

Creating default VPC:

In [8]:
!aws ec2 create-default-vpc --profile default


An error occurred (DefaultVpcAlreadyExists) when calling the CreateDefaultVpc operation: A Default VPC already exists for this account in this region.


Getting route table ID and VPC ID:

[more info on route tables](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html)

In [9]:
rt_output = ec2.describe_route_tables()
route_table_id = rt_output['RouteTables'][0]['RouteTableId']
vpc_id = rt_output['RouteTables'][0]['VpcId']
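The cell above takes whichever route table `describe_route_tables` returns first, which is only guaranteed to be the right one if the account has a single VPC. A sketch that asks explicitly for the default VPC and its main route table, using the `is-default`, `vpc-id` and `association.main` filters; the function name is hypothetical:

```python
def find_default_vpc_and_main_route_table(ec2_client):
    """Hypothetical helper: return (vpc_id, route_table_id) for the default VPC."""
    vpcs = ec2_client.describe_vpcs(
        Filters=[{"Name": "is-default", "Values": ["true"]}]
    )["Vpcs"]
    if not vpcs:
        raise LookupError("No default VPC found in this region")
    vpc_id = vpcs[0]["VpcId"]

    # The main route table is the one used by subnets without an explicit association.
    tables = ec2_client.describe_route_tables(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "association.main", "Values": ["true"]},
        ]
    )["RouteTables"]
    if not tables:
        raise LookupError(f"No main route table found for {vpc_id}")
    return vpc_id, tables[0]["RouteTableId"]
```

Usage would be `vpc_id, route_table_id = find_default_vpc_and_main_route_table(ec2)`.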

---

#### Creating VPC Endpoint

---

In [10]:
#AWS VPCs have a defined set of 'services' - the service name we set up our endpoint with must come from this list.
paginator = ec2.get_paginator('describe_vpc_endpoint_services')
#Collect service names from every page, not just the first one.
services = [name for page in paginator.paginate() for name in page['ServiceNames']]

In [11]:
import re

pat = re.compile('.*s3.*')
print([x for x in services if pat.match(x)])

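#NOTE: Only 'us-east-1' service names appear because describe_vpc_endpoint_services lists the services available
#in the region the EC2 client was created for. 'us-east-1' was 'invalid' for the S3 bucket creation for a different reason:
#it is the S3 default region and cannot be passed as a LocationConstraint.
#A gateway endpoint only serves S3 traffic within its own region, so it does not carry traffic to a bucket in eu-central-1.
serviceName = 'com.amazonaws.us-east-1.s3'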

['com.amazonaws.s3-global.accesspoint', 'com.amazonaws.us-east-1.s3', 'com.amazonaws.us-east-1.s3', 'com.amazonaws.us-east-1.s3-outposts']
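`com.amazonaws.us-east-1.s3` shows up twice in the output above. A likely explanation is that S3 is offered both as a gateway and as an interface endpoint service under the same name. The same API response also carries a `ServiceDetails` list with a `ServiceType` per entry, which can confirm this. A sketch that groups the S3 entries by type; `s3_services_by_type` is a hypothetical helper and expects the pages produced by the paginator:

```python
def s3_services_by_type(pages):
    """Hypothetical helper: map S3-related service names to their endpoint types."""
    result = {}
    for page in pages:
        for detail in page.get("ServiceDetails", []):
            name = detail["ServiceName"]
            if "s3" not in name:
                continue
            # ServiceType is a list of dicts such as {"ServiceType": "Gateway"}.
            types = [entry["ServiceType"] for entry in detail.get("ServiceType", [])]
            result.setdefault(name, []).extend(types)
    return result
```

It could be called as `s3_services_by_type(ec2.get_paginator('describe_vpc_endpoint_services').paginate())`.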


In [12]:
response = ec2.create_vpc_endpoint(
    VpcId=vpc_id,
    ServiceName=serviceName, #Must be one of the service names listed above
    RouteTableIds=[route_table_id]
)

response
#Now, we have an endpoint which will allow the AWS Glue jobs to reach out to S3 and thereby access the data we have stored there.

ClientError: An error occurred (RouteAlreadyExists) when calling the CreateVpcEndpoint operation: route table rtb-00b37904147f08aad already has a route with destination-prefix-list-id pl-63a5400a
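`RouteAlreadyExists` indicates that the route table already has a route to the S3 prefix list, most likely from an endpoint created on an earlier run of this notebook. A sketch of a re-runnable version that looks for an existing endpoint first and only creates one if none is found. `ensure_gateway_endpoint` is a hypothetical name, and the sketch ignores pagination of `describe_vpc_endpoints`.

```python
def ensure_gateway_endpoint(ec2_client, vpc_id, service_name, route_table_id):
    """Hypothetical helper: return the ID of an existing or newly created endpoint."""
    existing = ec2_client.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": [service_name]},
        ]
    )["VpcEndpoints"]
    # Ignore endpoints that are being removed or are already gone.
    live = [
        endpoint for endpoint in existing
        if str(endpoint.get("State", "")).lower() not in ("deleting", "deleted")
    ]
    if live:
        return live[0]["VpcEndpointId"]

    created = ec2_client.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=service_name,
        RouteTableIds=[route_table_id],
    )
    return created["VpcEndpoint"]["VpcEndpointId"]
```

Note that an existing endpoint is returned as-is; the sketch does not check which route tables it is associated with.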

---

#### Creating IAM role for Glue to access S3

---

In [13]:
iam = boto3.client('iam',
    region_name             = "us-east-1",
    aws_access_key_id       = aws_cred['default']['aws_access_key_id'],
    aws_secret_access_key   = aws_cred['default']['aws_secret_access_key']
)

In [27]:
import json

iamRole = iam.create_role(
    Path="/",
    RoleName="Glue_General_Service_Role",
    Description="Allows glue to access general resources",
    AssumeRolePolicyDocument=json.dumps(
        {   
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": "sts:AssumeRole",
                    "Principal": {"Service": "glue.amazonaws.com"},
                }
            ],
        }
    )
)

iamRole

{'Role': {'Path': '/',
  'RoleName': 'Glue_General_Service_Role',
  'RoleId': 'AROA44VBDA5QZX3DN4WGA',
  'Arn': 'arn:aws:iam::886174844769:role/Glue_General_Service_Role',
  'CreateDate': datetime.datetime(2023, 8, 29, 18, 27, 40, tzinfo=tzutc()),
  'AssumeRolePolicyDocument': {'Version': '2012-10-17',
   'Statement': [{'Effect': 'Allow',
     'Action': 'sts:AssumeRole',
     'Principal': {'Service': 'glue.amazonaws.com'}}]}},
 'ResponseMetadata': {'RequestId': '281ad9bf-a9c2-423a-bce4-6991aceb76e8',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '281ad9bf-a9c2-423a-bce4-6991aceb76e8',
   'content-type': 'text/xml',
   'content-length': '802',
   'date': 'Tue, 29 Aug 2023 18:27:39 GMT'},
  'RetryAttempts': 0}}
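Running the `create_role` cell a second time fails because the role name is already taken. A sketch of a re-runnable variant that falls back to `get_role` when IAM reports `EntityAlreadyExists`. `ensure_glue_role` and `GLUE_TRUST_POLICY` are hypothetical names; the trust policy is the same one used above.

```python
import json

# Trust policy that lets the Glue service assume the role (same document as above).
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {"Service": "glue.amazonaws.com"},
        }
    ],
}


def ensure_glue_role(iam_client, role_name, description=""):
    """Hypothetical helper: create the role, or return it if it already exists."""
    try:
        response = iam_client.create_role(
            Path="/",
            RoleName=role_name,
            Description=description,
            AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
        )
    except iam_client.exceptions.EntityAlreadyExistsException:
        # The existing role is returned unchanged; its trust policy is not updated.
        response = iam_client.get_role(RoleName=role_name)
    return response["Role"]
```

Usage would be `role = ensure_glue_role(iam, "Glue_General_Service_Role", "Allows glue to access general resources")`.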

---

#### Give Glue access to previously-created S3 bucket

---

In [28]:
response = iam.put_role_policy(
    RoleName='Glue_General_Service_Role',
    PolicyName='S3Access',
    PolicyDocument=json.dumps(
        {
            "Version":"2012-10-17",
            "Statement": [
                {
                    "Sid": "ListObjectsInBucket",
                    "Effect": "Allow",
                    "Action": [ "s3:ListBucket" ],
                    "Resource": [ f"arn:aws:s3:::{bucketName}" ]
                },
                { 
                    "Sid": "AllObjectActions",
                    "Effect": "Allow",
                    "Action": "s3:*Object",
                    "Resource": [ f"arn:aws:s3:::{bucketName}/*" ]
                }
            ]
        }
    )
)

response

{'ResponseMetadata': {'RequestId': 'd210a91a-9ac6-4d13-be05-9a727f1f1e01',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'd210a91a-9ac6-4d13-be05-9a727f1f1e01',
   'content-type': 'text/xml',
   'content-length': '206',
   'date': 'Tue, 29 Aug 2023 18:28:02 GMT'},
  'RetryAttempts': 0}}
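The policy above needs two statements because `s3:ListBucket` applies to the bucket ARN itself, while object actions apply to the keys under `bucket/*`. The wildcard `s3:*Object` matches every action whose name ends in `Object`, which is broader than get, put and delete. A sketch that builds the same document with an explicit action list; `s3_access_policy` and its default actions are assumptions for illustration, not what the notebook ran.

```python
import json


def s3_access_policy(
    bucket_name,
    object_actions=("s3:GetObject", "s3:PutObject", "s3:DeleteObject"),
):
    """Hypothetical helper: bucket-scoped S3 policy with explicit object actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListObjectsInBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # Bucket-level action: the resource is the bucket ARN.
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                "Sid": "AllObjectActions",
                "Effect": "Allow",
                "Action": list(object_actions),
                # Object-level actions: the resource is every key in the bucket.
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


policy_document = json.dumps(s3_access_policy("glue-kstine-bucket"))
```

`policy_document` can be passed as `PolicyDocument` to `iam.put_role_policy` in place of the inline dictionary.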

---

#### Applying general Glue Policy
This policy gives Glue access to the general-purpose permissions it needs to do its work.
These permissions have to be granted explicitly because a new IAM role starts with none; it can only do what its attached policies allow.


**Note:** A role is an identity that a service such as Glue can assume. Policies are sets of permissions that can be attached to roles.

---

In [29]:
#Loading very long policy from local file:
policy_path = "./glue_iam_role.json"
with open(policy_path) as file:
    policy_json = file.read()


#aws iam put-role-policy --role-name my-glue-service-role --policy-name GlueAccess --policy-document
response = iam.put_role_policy(
    RoleName='Glue_General_Service_Role',
    PolicyName='GlueAccess',
    PolicyDocument=policy_json
)
response


{'ResponseMetadata': {'RequestId': '17721d99-f711-4e3d-9fa6-a9ade3f94864',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '17721d99-f711-4e3d-9fa6-a9ade3f94864',
   'content-type': 'text/xml',
   'content-length': '206',
   'date': 'Tue, 29 Aug 2023 18:28:09 GMT'},
  'RetryAttempts': 0}}
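Instead of uploading a long inline policy from a local file, the role could be given the AWS managed policy `AWSGlueServiceRole` with `attach_role_policy`. Whether that managed policy matches the contents of `glue_iam_role.json` is an assumption, since the file is not shown here. A minimal sketch, where `attach_glue_managed_policy` is a hypothetical helper name:

```python
# ARN of the AWS managed policy for Glue service roles.
GLUE_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def attach_glue_managed_policy(iam_client, role_name):
    """Hypothetical helper: attach the managed policy and list what is attached."""
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=GLUE_MANAGED_POLICY_ARN,
    )
    # Read back the attached managed policies to confirm the result.
    attached = iam_client.list_attached_role_policies(RoleName=role_name)
    return [policy["PolicyName"] for policy in attached["AttachedPolicies"]]
```

Managed policies are attached by ARN and maintained by AWS, whereas `put_role_policy` embeds an inline copy that lives only on this role.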