## Glue Setup with Python SDK (boto3)
This notebook shows how to set up the AWS resources that AWS Glue needs (an S3 bucket, a VPC endpoint, and an IAM role) using boto3, the AWS SDK for Python.

Boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

---

#### Package Import

---

In [14]:
import boto3
import configparser

---

#### Loading Credentials from file

---

In [15]:
#AWS Credentials
aws_path = "/home/rambino/.aws/credentials"
aws_cred = configparser.ConfigParser()
aws_cred.read(aws_path)

['/home/rambino/.aws/credentials']
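
One thing to watch in the cell above: `ConfigParser.read` silently skips files it cannot open, so a wrong path only shows up later as a `KeyError` when the client is created. A minimal sketch of a more defensive loader follows; `load_profile` is a hypothetical helper name, not part of boto3.

```python
import configparser

def load_profile(path, profile):
    """Return the access key pair stored under `profile` in an AWS credentials file."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse; an empty list means the path was wrong.
    if not parser.read(path):
        raise FileNotFoundError(f"Could not read credentials file: {path}")
    if not parser.has_section(profile):
        raise KeyError(f"Profile '{profile}' not found in {path}")
    section = parser[profile]
    return {
        "aws_access_key_id": section["aws_access_key_id"],
        "aws_secret_access_key": section["aws_secret_access_key"],
    }
```

The returned dict can be unpacked straight into a client call, for example `boto3.client("s3", region_name="us-east-1", **load_profile(aws_path, "kevin_aws_account"))`. If the file sits in the default location, `boto3.Session(profile_name="kevin_aws_account")` avoids the manual parsing altogether.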

In [21]:
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id       = aws_cred['kevin_aws_account']['aws_access_key_id'],
    aws_secret_access_key   = aws_cred['kevin_aws_account']['aws_secret_access_key']
)

bucketName = "glue-kstine-bucket-udacity"

---

#### Creating New S3 Bucket

---

In [22]:
s3_response = s3.create_bucket(
    Bucket = bucketName,
    CreateBucketConfiguration = {
        'LocationConstraint':'eu-central-1'
    }
)

s3_response

{'ResponseMetadata': {'RequestId': '7GDN72Q57J7B41EK',
  'HostId': 'yFYNeWTmzuPOegM6s2uagpsqmMl0EXN4SnvDLZiROe05cLkDasScJdLqLReEgdrmGzRdX+u91j0=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'yFYNeWTmzuPOegM6s2uagpsqmMl0EXN4SnvDLZiROe05cLkDasScJdLqLReEgdrmGzRdX+u91j0=',
   'x-amz-request-id': '7GDN72Q57J7B41EK',
   'date': 'Thu, 31 Aug 2023 19:31:42 GMT',
   'location': 'http://glue-kstine-bucket-udacity.s3.amazonaws.com/',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'Location': 'http://glue-kstine-bucket-udacity.s3.amazonaws.com/'}
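
A detail of `create_bucket` that is easy to trip over: `us-east-1` is S3's default region and is not accepted as a `LocationConstraint`, so for that region the `CreateBucketConfiguration` argument has to be left out entirely. A small sketch of a helper that builds the right arguments for either case (the name `bucket_kwargs` is made up for illustration):

```python
def bucket_kwargs(bucket_name, region):
    """Build keyword arguments for s3.create_bucket for the given region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location; passing it as a LocationConstraint is rejected.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Example call: s3.create_bucket(**bucket_kwargs(bucketName, "eu-central-1"))
```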

---

#### Uploading test data to bucket

---

In [23]:
s3.upload_file(
    Filename="./customer-data.json",
    Bucket=bucketName,
    Key="customers/customer-data.json"
)


---

#### Setting up VPC for Glue

---

Creating default VPC:

In [24]:
!aws ec2 create-default-vpc --profile default


An error occurred (DefaultVpcAlreadyExists) when calling the CreateDefaultVpc operation: A Default VPC already exists for this account in this region.


Getting route table ID and VPC ID:

[more info on route tables](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html)

In [25]:
ec2 = boto3.client('ec2',
    region_name             = "us-east-1",
    aws_access_key_id       = aws_cred['kevin_aws_account']['aws_access_key_id'],
    aws_secret_access_key   = aws_cred['kevin_aws_account']['aws_secret_access_key']
)

In [26]:
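#Takes the first route table in the response. This assumes it belongs to the default VPC,
#which holds when the account has only one VPC in this region.
rt_output = ec2.describe_route_tables()
route_table_id = rt_output['RouteTables'][0]['RouteTableId']
vpc_id = rt_output['RouteTables'][0]['VpcId']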
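
Indexing `RouteTables[0]` only picks the intended table when there is a single VPC. As a hedged sketch, the helper below (hypothetical name `main_route_table_id`) selects the main route table of a specific VPC from the same `describe_route_tables` response, relying on the `Main` flag inside each table's `Associations` list.

```python
def main_route_table_id(rt_output, vpc_id):
    """Return the ID of the main route table for `vpc_id`, or None if there is none."""
    for table in rt_output.get("RouteTables", []):
        if table.get("VpcId") != vpc_id:
            continue
        # The main route table carries an association with Main set to True.
        if any(assoc.get("Main") for assoc in table.get("Associations", [])):
            return table["RouteTableId"]
    return None
```

The default VPC's ID itself can be looked up with `ec2.describe_vpcs(Filters=[{"Name": "is-default", "Values": ["true"]}])`.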

---

#### Creating VPC Endpoint

---

In [27]:
#A VPC endpoint must target one of the services AWS offers in the client's region, so we list the available service names first.
paginator = ec2.get_paginator('describe_vpc_endpoint_services')
#Collect the service names from every page of results, not just the first one.
services = [name for page in paginator.paginate() for name in page['ServiceNames']]

In [28]:
import re

pat = re.compile('.*s3.*')
print([x for x in services if pat.match(x)])

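#NOTE: I was surprised to find only 'us-east-1' services above. The reason is that the ec2 client was created for us-east-1,
#and each region lists its own endpoint services. 'us-east-1' was rejected as 'invalid' during bucket creation because it is
#S3's default region, which is selected by omitting CreateBucketConfiguration rather than by naming it.
#Caveat: a gateway endpoint only covers S3 traffic within its own region, so this one does not route traffic to the eu-central-1 bucket.
serviceName = 'com.amazonaws.us-east-1.s3'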

['com.amazonaws.s3-global.accesspoint', 'com.amazonaws.us-east-1.s3', 'com.amazonaws.us-east-1.s3', 'com.amazonaws.us-east-1.s3-outposts']
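
The duplicate `com.amazonaws.us-east-1.s3` entry in the output is most likely there because S3 is offered both as a gateway endpoint and as an interface endpoint under the same service name. The `ServiceDetails` list in the same response carries a `ServiceType` for each entry, so the two can be told apart. A sketch, with `gateway_services` as a made-up helper name that works on the pages returned by the paginator:

```python
def gateway_services(pages, keyword="s3"):
    """Collect names of gateway-type endpoint services whose name contains `keyword`."""
    names = []
    for page in pages:
        for detail in page.get("ServiceDetails", []):
            # Each entry lists its types as [{'ServiceType': 'Gateway'}, ...].
            types = {t.get("ServiceType") for t in detail.get("ServiceType", [])}
            if "Gateway" in types and keyword in detail.get("ServiceName", ""):
                names.append(detail["ServiceName"])
    return names

# Example call: gateway_services(ec2.get_paginator('describe_vpc_endpoint_services').paginate())
```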


In [29]:
response = ec2.create_vpc_endpoint(
    VpcId=vpc_id,
    ServiceName=serviceName,
    RouteTableIds=[route_table_id]
)

response
#Now, we have an endpoint which will allow the AWS Glue jobs to reach out to S3 and thereby access the data we have stored there.

{'VpcEndpoint': {'VpcEndpointId': 'vpce-033614953131f3cc7',
  'VpcEndpointType': 'Gateway',
  'VpcId': 'vpc-0c1eba71902c3b90f',
  'ServiceName': 'com.amazonaws.us-east-1.s3',
  'State': 'available',
  'PolicyDocument': '{"Version":"2008-10-17","Statement":[{"Effect":"Allow","Principal":"*","Action":"*","Resource":"*"}]}',
  'RouteTableIds': ['rtb-03a49da016f68bbbc'],
  'SubnetIds': [],
  'Groups': [],
  'PrivateDnsEnabled': False,
  'RequesterManaged': False,
  'NetworkInterfaceIds': [],
  'DnsEntries': [],
  'CreationTimestamp': datetime.datetime(2023, 8, 31, 19, 51, 44, tzinfo=tzutc()),
  'OwnerId': '544495716151'},
 'ResponseMetadata': {'RequestId': '71216589-bfb1-4b4d-b8d1-63ea4a46558d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '71216589-bfb1-4b4d-b8d1-63ea4a46558d',
   'cache-control': 'no-cache, no-store',
   'strict-transport-security': 'max-age=31536000; includeSubDomains',
   'vary': 'accept-encoding',
   'content-type': 'text/xml;charset=UTF-8',
   'tran
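
Re-running the creation cell is not guaranteed to be harmless, so it can help to check whether an endpoint for this VPC and service already exists before creating another. A minimal sketch, assuming an `ec2` client like the one above is passed in; `find_endpoint` is a hypothetical helper name.

```python
def find_endpoint(ec2_client, vpc_id, service_name):
    """Return the ID of an existing endpoint for this VPC and service, or None."""
    response = ec2_client.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": [service_name]},
        ]
    )
    endpoints = response.get("VpcEndpoints", [])
    return endpoints[0]["VpcEndpointId"] if endpoints else None

# Example call: find_endpoint(ec2, vpc_id, serviceName) or create a new endpoint if it returns None
```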

---

#### Creating IAM role for Glue to access S3

---

In [32]:
iam = boto3.client('iam',
    region_name             = "us-east-1",
    aws_access_key_id       = aws_cred['kevin_aws_account']['aws_access_key_id'],
    aws_secret_access_key   = aws_cred['kevin_aws_account']['aws_secret_access_key']
)

In [33]:
import json

iamRole = iam.create_role(
    Path="/",
    RoleName="Glue_General_Service_Role",
    Description="Allows glue to access general resources",
    AssumeRolePolicyDocument=json.dumps(
        {   
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": "sts:AssumeRole",
                    "Principal": {"Service": "glue.amazonaws.com"},
                }
            ],
        }
    )
)

iamRole

{'Role': {'Path': '/',
  'RoleName': 'Glue_General_Service_Role',
  'RoleId': 'AROAX5RTZI434WT326WPS',
  'Arn': 'arn:aws:iam::544495716151:role/Glue_General_Service_Role',
  'CreateDate': datetime.datetime(2023, 8, 31, 19, 52, 16, tzinfo=tzutc()),
  'AssumeRolePolicyDocument': {'Version': '2012-10-17',
   'Statement': [{'Effect': 'Allow',
     'Action': 'sts:AssumeRole',
     'Principal': {'Service': 'glue.amazonaws.com'}}]}},
 'ResponseMetadata': {'RequestId': 'e92dd858-4034-49aa-b870-9dc05bde3305',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'e92dd858-4034-49aa-b870-9dc05bde3305',
   'content-type': 'text/xml',
   'content-length': '802',
   'date': 'Thu, 31 Aug 2023 19:52:16 GMT'},
  'RetryAttempts': 0}}
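
`create_role` fails if a role with the same name already exists, which makes the cell above awkward to re-run. One possible way around this, sketched under the assumption that an `iam` client like the one above is passed in, is to look the role up first and create it only when it is missing. `ensure_role` is a made-up helper name.

```python
import json

def ensure_role(iam_client, role_name, service="glue.amazonaws.com"):
    """Return the ARN of `role_name`, creating it with a trust policy for `service` if needed."""
    try:
        return iam_client.get_role(RoleName=role_name)["Role"]["Arn"]
    except iam_client.exceptions.NoSuchEntityException:
        pass  # The role does not exist yet, so fall through and create it.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {"Service": service},
            }
        ],
    }
    created = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
    return created["Role"]["Arn"]
```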

---

#### Giving Glue access to the previously created S3 bucket

---

In [34]:
response = iam.put_role_policy(
    RoleName='Glue_General_Service_Role',
    PolicyName='S3Access',
    PolicyDocument=json.dumps(
        {
            "Version":"2012-10-17",
            "Statement": [
                {
                    "Sid": "ListObjectsInBucket",
                    "Effect": "Allow",
                    "Action": [ "s3:ListBucket" ],
                    "Resource": [ f"arn:aws:s3:::{bucketName}" ]
                },
                { 
                    "Sid": "AllObjectActions",
                    "Effect": "Allow",
                    "Action": "s3:*Object",
                    "Resource": [ f"arn:aws:s3:::{bucketName}/*" ]
                }
            ]
        }
    )
)

response

{'ResponseMetadata': {'RequestId': '0bb82e7e-54ec-444a-a0a3-bd903fcc9b14',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '0bb82e7e-54ec-444a-a0a3-bd903fcc9b14',
   'content-type': 'text/xml',
   'content-length': '206',
   'date': 'Thu, 31 Aug 2023 19:52:20 GMT'},
  'RetryAttempts': 0}}
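
The `s3:*Object` wildcard in the policy above matches every S3 action whose name ends in `Object`, such as `s3:GetObject`, `s3:PutObject`, and `s3:DeleteObject`. If a narrower grant is preferred, the actions can be spelled out. A sketch of a builder for the same two-statement policy follows; the function name and the default action list are illustrative choices, not something the notebook prescribes.

```python
def s3_access_policy(bucket_name, object_actions=("s3:GetObject", "s3:PutObject", "s3:DeleteObject")):
    """Build a policy document allowing listing of a bucket and selected actions on its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself, so the resource has no '/*' suffix.
                "Sid": "ListObjectsInBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Sid": "ObjectActions",
                "Effect": "Allow",
                "Action": list(object_actions),
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }
```

It would be passed to IAM as `PolicyDocument=json.dumps(s3_access_policy(bucketName))`.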

---

#### Applying general Glue Policy
This policy apparently grants Glue the general-purpose permissions it needs to operate.
They have to be granted explicitly, even though they are fairly standard, because IAM denies everything by default: a newly created role can do nothing until a policy allows it.

**Note:** A role is an identity that a service such as Glue can assume. A policy is a document listing permissions, and it takes effect once it is attached to a role.

---

In [35]:
#Loading very long policy from local file:
policy_path = "./glue_iam_role.json"
with open(policy_path) as file:
    policy_json = file.read()


#Equivalent AWS CLI command, for reference (it uses a different example role name):
#aws iam put-role-policy --role-name my-glue-service-role --policy-name GlueAccess --policy-document
response = iam.put_role_policy(
    RoleName='Glue_General_Service_Role',
    PolicyName='GlueAccess',
    PolicyDocument=policy_json
)
response


{'ResponseMetadata': {'RequestId': 'da4d62e6-01b9-49b5-8b0b-a65962ee40d1',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'da4d62e6-01b9-49b5-8b0b-a65962ee40d1',
   'content-type': 'text/xml',
   'content-length': '206',
   'date': 'Thu, 31 Aug 2023 19:52:30 GMT'},
  'RetryAttempts': 0}}
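
As an alternative to maintaining a local JSON copy, AWS publishes a managed policy named `AWSGlueServiceRole` that bundles standard Glue permissions and can be attached to the role directly. Whether its contents match `glue_iam_role.json` is an assumption that is not verified here. A sketch, with `attach_glue_managed_policy` as a made-up helper name and an `iam` client like the one above passed in:

```python
# ARN of the AWS-managed Glue service policy.
GLUE_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"

def attach_glue_managed_policy(iam_client, role_name):
    """Attach the AWS-managed Glue service policy to `role_name` and return the policy ARN."""
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=GLUE_MANAGED_POLICY_ARN)
    return GLUE_MANAGED_POLICY_ARN

# Example call: attach_glue_managed_policy(iam, 'Glue_General_Service_Role')
```

Unlike `put_role_policy`, which embeds an inline policy in the role, `attach_role_policy` links the role to a standalone policy that AWS keeps up to date.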