## Redshift Setup with Python SDK (boto3)
This notebook shows how to set up AWS resources (an IAM role, a VPC security group, a Redshift cluster, and an S3 bucket) using boto3, the AWS SDK for Python.

Boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift.html

---

#### Package Import

---

In [1]:
import boto3
import re
import configparser

---

#### Loading Credentials from file

---

In [2]:
config = configparser.ConfigParser()

#AWS access keys come from the 'udacity_course' profile in the shared credentials file:
with open("/home/rambino/.aws/credentials") as f:
    config.read_file(f)
aws_key           = config.get('udacity_course','aws_access_key_id')
aws_secret        = config.get('udacity_course','aws_secret_access_key')

#Redshift master user name and password live in a separate local config file:
with open("./redshift_credentials.cfg") as f:
    config.read_file(f)
redshift_user     = config.get('redshift_credentials','UN')
redshift_password = config.get('redshift_credentials','PW')
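
As a possible tidy-up of the loading step, the sketch below reads one section of an INI-style file into a dict using a fresh parser per file, so settings from the two files cannot mix. The helper name `load_section` is an assumption for illustration and is not part of the original notebook. Note that `configparser` lowercases option names by default, so `UN` and `PW` come back as `un` and `pw`.

```python
import configparser
from pathlib import Path


def load_section(path, section):
    """Return one section of an INI-style file as a plain dict."""
    parser = configparser.ConfigParser()
    # read_file() raises if the file is missing; read() would skip it silently.
    with open(Path(path).expanduser()) as handle:
        parser.read_file(handle)
    # Indexing raises KeyError when the section does not exist.
    return dict(parser[section])
```

With this helper, `load_section("./redshift_credentials.cfg", "redshift_credentials")["un"]` would give the same value as `redshift_user` above.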

---

#### Creating IAM role for Redshift

---

In [None]:
iam = boto3.client('iam',
    region_name             = "us-west-2",
    aws_access_key_id       = aws_key,
    aws_secret_access_key   = aws_secret
)

In [None]:
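#Create IAM role:

#The AssumeRolePolicyDocument below is the role's trust policy: it names who may assume the role.
#Allowing "sts:AssumeRole" for the principal "redshift.amazonaws.com" lets the Redshift service
#obtain temporary credentials for this role and act with whatever permissions are attached to it.

import json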

dwhRole = iam.create_role(
    Path = "/",
    RoleName =  "RedShift_Impersonation",
    Description = "Allows redshift to access S3",
    AssumeRolePolicyDocument=json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": 'sts:AssumeRole',
                    "Principal":{"Service": "redshift.amazonaws.com"}
                }
            ]
        }
    )
)

dwhRole
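
To make the structure of the trust policy easier to see, here is a minimal sketch that builds the same document as a Python dict before serialising it. The function name `service_trust_policy` is a hypothetical helper, not a boto3 API; the policy content mirrors the cell above.

```python
import json


def service_trust_policy(service="redshift.amazonaws.com"):
    """Build a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Who is allowed to assume the role: the named AWS service.
                "Principal": {"Service": service},
                # The STS action through which the role is assumed.
                "Action": "sts:AssumeRole",
            }
        ],
    }


# JSON string suitable for the AssumeRolePolicyDocument argument.
trust_policy_json = json.dumps(service_trust_policy())
```

The trust policy only says who can take on the role. What the role can actually do is decided separately by the policies attached to it, such as `AmazonS3ReadOnlyAccess` in the next cells.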

In [None]:
role = iam.get_role(RoleName = "RedShift_Impersonation") #Same spelling as used in create_role
role_arn = role['Role']['Arn']

In [None]:
#Attaching IAM policy to the role (which actually gives permissions):

attach_response = iam.attach_role_policy(
    RoleName = "RedShift_Impersonation",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
)

attach_response

---

#### Apply VPC Security Group rules to Redshift

---

The VPC is currently the AWS component I understand the least. From [what I've read](https://aws.amazon.com/vpc/features/), a VPC lets you control how network traffic reaches AWS resources like Redshift clusters, RDS databases, and EC2 instances. For example, can these services talk to each other? Can they be accessed by other applications?
It looks like the main form of access control is IP addresses: you can specify that only certain IP addresses can reach the resources you create in AWS.
What I don't yet understand is:
- Is this equally applicable to S3, Kinesis, SQS, Lambda, and other AWS tools? Or is there something specific about EC2, RDS, and Redshift which means the VPC applies to them? (e.g., these are accessible via regular HTTPS requests / they're potentially publicly accessible?)
  - [It looks like no](https://docs.aws.amazon.com/glue/latest/dg/vpc-endpoints-s3.html) - VPC is configurable as a firewall for S3 as well. Maybe the real question is instead whether we want (a) enhanced security, with AWS services only talking to each other WITHIN the private AWS network (no public IPs), or (b) if we do expose our resources to the public internet, to allow only some IP addresses to access them but not others.


In any case, it might be that the reason we're using VPC for this current Redshift setup is because the [official AWS documentation](https://aws.amazon.com/premiumsupport/knowledge-center/redshift-cluster-private-public/) says that users should set up a VPC security group in order to expose a Redshift port publicly.
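
To make the IP-based rules above concrete, here is a minimal sketch of how a CIDR block decides which source addresses a rule covers. It uses only Python's standard `ipaddress` module and does not talk to AWS. The helper name `rule_allows` and the example addresses (from ranges reserved for documentation) are illustrative assumptions.

```python
import ipaddress


def rule_allows(cidr, source_ip):
    """Return True if source_ip falls inside the CIDR block of an ingress rule."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)


# 0.0.0.0/0 covers every IPv4 address, so any client matches.
open_to_all = rule_allows("0.0.0.0/0", "203.0.113.7")

# A /32 block covers exactly one address.
same_host = rule_allows("203.0.113.7/32", "203.0.113.7")
other_host = rule_allows("203.0.113.7/32", "198.51.100.1")
```

Here `open_to_all` and `same_host` are `True`, while `other_host` is `False`.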

In [None]:
ec2 = boto3.client('ec2',
    region_name             = "us-west-2",
    aws_access_key_id       = aws_key,
    aws_secret_access_key   = aws_secret
)

In [None]:
#Create the security group first, so that the lookup in the next cell can find it:
response = ec2.create_security_group(
    Description = "Security Group for allowing all access to Redshift cluster",
    GroupName = "Redshift_secGroup"
)
response

In [None]:
#Look up the ID of the security group created above:
sec_groups = ec2.describe_security_groups(
    GroupNames = [
        'Redshift_secGroup'
    ]
)

redshift_sg_id = sec_groups['SecurityGroups'][0]['GroupId']
sec_groups

In [None]:
ingress_response = ec2.authorize_security_group_ingress(
    CidrIp = '0.0.0.0/0', #Allowing permission to access from any IP (i.e. the whole internet)
    FromPort = 5439, #Default port for Redshift
    ToPort = 5439,
    IpProtocol = 'tcp',
    GroupId = redshift_sg_id
)
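
Opening the port to `0.0.0.0/0` is convenient for a course exercise but exposes the cluster to every address. As a hedged alternative, the sketch below builds the arguments for a rule that admits a single IPv4 address, using the `IpPermissions` form of `authorize_security_group_ingress`. The helper name `single_ip_ingress` and the example address are assumptions for illustration.

```python
import ipaddress


def single_ip_ingress(group_id, source_ip, port=5439):
    """Build keyword arguments that open one TCP port to a single IPv4 address."""
    # IPv4Address raises ValueError on malformed input before anything reaches AWS.
    address = ipaddress.IPv4Address(source_ip)
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # /32 restricts the rule to exactly this one address.
                "IpRanges": [{"CidrIp": f"{address}/32"}],
            }
        ],
    }
```

It could then be used as `ec2.authorize_security_group_ingress(**single_ip_ingress(redshift_sg_id, "203.0.113.7"))`, with your own public IP in place of the placeholder address.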

---

#### Creating Redshift cluster

---

In [3]:
redshift = boto3.client('redshift',
    region_name             = "us-west-2",
    aws_access_key_id       = aws_key,
    aws_secret_access_key   = aws_secret
)

In [None]:
#WARNING! Running this cell WILL create a Redshift cluster. Be sure to delete it afterwards so it does not incur costs!!

#Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift.html#Redshift.Client.create_cluster
redshift_response = redshift.create_cluster(
    ClusterType = "multi-node",
    NodeType = 'dc2.large',
    NumberOfNodes = 4,
    DBName = "my_redshift_db",
    ClusterIdentifier = 'redshift-cluster-2',
    MasterUsername = redshift_user,
    MasterUserPassword = redshift_password,
    IamRoles = [role_arn],
    PubliclyAccessible = True,
    VpcSecurityGroupIds = [
        redshift_sg_id
    ]
)

redshift_response

In [4]:
clusters = redshift.describe_clusters()
redshift_endpoint = clusters['Clusters'][0]['Endpoint']
db_name = clusters['Clusters'][0]['DBName']
cluster_id = clusters['Clusters'][0]['ClusterIdentifier']
clusters
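
A cluster takes a while to start, and its `Endpoint` may not be populated until it is ready. The sketch below polls `describe_clusters` until `ClusterStatus` is `available`. The function name `wait_until_available` and its timing defaults are assumptions; the client is passed in, so the same logic can be exercised without AWS access. boto3 also offers a built-in waiter, `redshift.get_waiter('cluster_available')`, which may be the simpler choice in practice.

```python
import time


def wait_until_available(client, cluster_id, delay=30, max_attempts=40, sleep=time.sleep):
    """Poll describe_clusters until the cluster reports the status 'available'."""
    for _ in range(max_attempts):
        response = client.describe_clusters(ClusterIdentifier=cluster_id)
        cluster = response["Clusters"][0]
        if cluster["ClusterStatus"] == "available":
            return cluster
        # Not ready yet: pause before asking again.
        sleep(delay)
    raise TimeoutError(f"{cluster_id} not available after {max_attempts} checks")
```

For example, `wait_until_available(redshift, 'redshift-cluster-2')['Endpoint']` would return the endpoint once the cluster is up.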

In [None]:
response = redshift.delete_cluster(
    ClusterIdentifier = cluster_id,
    SkipFinalClusterSnapshot=True
)
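
Deleting the cluster stops the main cost, but the IAM role and security group created earlier remain. The following is a rough teardown sketch under stated assumptions: the function name `tear_down` and its parameters are hypothetical, the clients are passed in rather than created, and the ordering reflects my understanding that a security group cannot be deleted while a resource still uses it and that a role's managed policies must be detached before the role is deleted.

```python
def tear_down(redshift, iam, ec2, cluster_id, role_name, policy_arn, group_id):
    """Delete the cluster, then the role and security group it depended on."""
    redshift.delete_cluster(ClusterIdentifier=cluster_id, SkipFinalClusterSnapshot=True)
    # Block until the cluster is gone so the security group is no longer in use.
    redshift.get_waiter("cluster_deleted").wait(ClusterIdentifier=cluster_id)
    # Managed policies must be detached before the role itself can be deleted.
    iam.detach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    iam.delete_role(RoleName=role_name)
    ec2.delete_security_group(GroupId=group_id)
```

With the names used in this notebook, the call would look like `tear_down(redshift, iam, ec2, cluster_id, "RedShift_Impersonation", "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess", redshift_sg_id)`.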

---

#### Creating S3 Bucket

---

In [None]:
s3 = boto3.client('s3',
    region_name             = "us-west-2",
    aws_access_key_id       = aws_key,
    aws_secret_access_key   = aws_secret
)

In [None]:
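#S3 bucket names may not contain uppercase letters, which is why the original name
#"whyWontBucketWork-udacitycourse" was rejected as invalid. The lowercase version below is valid
#(bucket names must also be globally unique, so this exact name may already be taken):

s3_response = s3.create_bucket(
    Bucket = "whywontbucketwork-udacitycourse",
    CreateBucketConfiguration = {
        'LocationConstraint':'us-west-2'
    }
)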
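
The "invalid bucket name" error can be caught locally before calling AWS. The sketch below checks a subset of the S3 naming rules with a regular expression: 3 to 63 characters, only lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, and no consecutive dots. It is not a complete validator, and the names `BUCKET_NAME` and `looks_like_valid_bucket_name` are illustrative.

```python
import re

# Subset of the S3 naming rules: lowercase letters, digits, dots and hyphens,
# 3-63 characters long, beginning and ending with a letter or digit.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if name passes the basic S3 bucket naming checks above."""
    return BUCKET_NAME.fullmatch(name) is not None and ".." not in name


mixed_case = looks_like_valid_bucket_name("whyWontBucketWork-udacitycourse")
lower_case = looks_like_valid_bucket_name("whywontbucketwork-udacitycourse")
```

Here `mixed_case` is `False` because of the uppercase letters, while `lower_case` is `True`.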

In [None]:
s3_resource = boto3.resource('s3',
    aws_access_key_id       = aws_key,
    aws_secret_access_key   = aws_secret
)
bucket = s3_resource.Bucket("udacitybucket17") #Bucket I made manually previously

#Iterate over files in a bucket:
bucket_data = bucket.objects.all()
for file in bucket_data:
    print(file)

#Alternatively:
bucket_data = bucket.objects.filter(Prefix = "AWS_")
for file in bucket_data:
    print(file)

---

#### Attempt to connect to Redshift cluster:

---

At this point we have:
- Created a Redshift cluster with an IAM role whose sole policy is 'AmazonS3ReadOnlyAccess'
- Specified a security group which allows access to port 5439 from any IP address.

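One thing worth double-checking is that Redshift is actually using the security group we set up (and not the default security group). The `create_cluster` call above passes `VpcSecurityGroupIds`, so it should be; the `VpcSecurityGroups` entry in the `describe_clusters` response shows which groups are attached.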
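
One way to check which security groups a cluster really uses is to read them back from a `describe_clusters` response. The sketch below pulls the group IDs out of such a response. The helper name `attached_security_groups` is an assumption, and the sample response is a hand-written stand-in with made-up IDs, not real output.

```python
def attached_security_groups(describe_response, cluster_id):
    """Return the VPC security group IDs attached to one cluster."""
    for cluster in describe_response["Clusters"]:
        if cluster["ClusterIdentifier"] == cluster_id:
            return [g["VpcSecurityGroupId"] for g in cluster.get("VpcSecurityGroups", [])]
    raise KeyError(f"no cluster named {cluster_id!r} in response")


# Hand-written stand-in for a describe_clusters() response (IDs are invented).
sample_response = {
    "Clusters": [
        {
            "ClusterIdentifier": "redshift-cluster-2",
            "VpcSecurityGroups": [{"VpcSecurityGroupId": "sg-0123", "Status": "active"}],
        }
    ]
}
group_ids = attached_security_groups(sample_response, "redshift-cluster-2")
```

In the notebook, comparing `redshift_sg_id` against `attached_security_groups(clusters, cluster_id)` would confirm that the cluster uses our group.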

In [None]:
%load_ext sql

In [None]:
address = redshift_endpoint['Address']
port = redshift_endpoint['Port']
conn_string = f"postgresql://{redshift_user}:{redshift_password}@{address}:{port}/{db_name}"

%sql $conn_string
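
If the password contains characters such as `@`, `/` or `:`, the plain f-string above produces a URL that cannot be parsed correctly. A minimal sketch of a safer builder follows; it percent-encodes the user name and password with the standard library. The function name `redshift_url` and the example values are assumptions for illustration.

```python
from urllib.parse import quote


def redshift_url(user, password, host, port, database):
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    # safe="" makes quote() escape every reserved character, including "/".
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{database}"


example_url = redshift_url("dev", "p@ss/w:rd", "example.host", 5439, "my_redshift_db")
```

In the notebook this would be called as `redshift_url(redshift_user, redshift_password, address, port, db_name)`.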

In [63]:
%%sql 

select oid as database_id,
       datname as database_name,
       datallowconn as allow_connect
from pg_database
order by oid;

 * postgresql://dev:***@redshift-cluster-2.cakcgemszurv.us-west-2.redshift.amazonaws.com:5439/my_redshift_db



In [None]:
%%sql

SELECT current_database();