Back to main guide | Next


1. Provision a data lake, data lake administrator and data lake analyst

a. Launch CloudFormation Template

Launch the CloudFormation stack in one of the AWS regions listed below. Other regions are also supported.

We recommend launching the CloudFormation template as a user with administrator privileges.

Region | Launch
US East (N. Virginia) | Launch Solution in us-east-1
US West (Oregon) | Launch Solution in us-west-2

Accept all default values and click Next. On the last page, select the checkbox “I acknowledge that AWS CloudFormation might create IAM resources with custom names” and click Create Stack. Wait for the CloudFormation stack creation to complete.

The CloudFormation template creates the following resources:
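The same launch can be scripted instead of clicked through. The following is a minimal sketch using boto3, not the lab's official method. The stack name and template URL are hypothetical placeholders, since this guide only provides console launch links. The helper names `build_create_stack_request` and `launch_and_wait` are illustrative.

```python
def build_create_stack_request(stack_name, template_url):
    """Build the arguments for the CloudFormation CreateStack call.

    No Parameters are passed, which mirrors accepting all default
    values in the console wizard.
    """
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # API equivalent of ticking the "might create IAM resources
        # with custom names" acknowledgement checkbox.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def launch_and_wait(stack_name, template_url, region="us-east-1"):
    """Create the stack and block until creation has completed."""
    # Imported lazily so the request builder can be used without boto3.
    import boto3

    cfn = boto3.client("cloudformation", region_name=region)
    cfn.create_stack(**build_create_stack_request(stack_name, template_url))
    # The waiter polls until the stack reaches CREATE_COMPLETE and
    # raises an error if creation fails or rolls back.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

To use it, call `launch_and_wait("<your-stack-name>", "<template-url>")` with credentials that have administrator privileges, passing `region="us-west-2"` for the Oregon region.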

  • Data Lake Administrator user (dladmin)
  • Data Lake Analyst (dlanalyst)
  • S3 bucket with a sample patient dataset that contains duplicates. (Use this S3 bucket throughout the lab. The one shown in the screenshots is for reference only.)
  • Labelling file that will be used in Activity#8
  • Glue Development Endpoint
  • SageMaker Notebook instance with Spark ETL code
  • IAM Role for AWS Glue and Lake Formation
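To confirm that the stack actually produced these resources, you could list them programmatically. This is a rough sketch assuming boto3 and a stack name of your choosing. The function names are illustrative, and the exact resource types in the real template may differ from what you expect.

```python
from collections import Counter


def count_resource_types(stack_resources):
    """Count stack resources per CloudFormation type (e.g. AWS::S3::Bucket)."""
    return Counter(resource["ResourceType"] for resource in stack_resources)


def summarize_stack(stack_name, region="us-east-1"):
    """Fetch the stack's resources and return a per-type count."""
    # Imported lazily so count_resource_types works without boto3.
    import boto3

    cfn = boto3.client("cloudformation", region_name=region)
    response = cfn.describe_stack_resources(StackName=stack_name)
    return count_resource_types(response["StackResources"])
```

A result containing two `AWS::IAM::User` entries and one `AWS::S3::Bucket` entry, for instance, would be consistent with the dladmin and dlanalyst users and the sample dataset bucket listed above.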

NOTE: The password for the dladmin and dlanalyst users is set to "welcome".

b. Set up Data Lake Administrator

Navigate to the Lake Formation Dashboard from the AWS Management Console.

lakeformation-console

The first time you open the Lake Formation Dashboard page, you are prompted to create a Data Lake Administrator. Click “Add administrators”.

OR

i) Log in as an IAM Admin user.

ii) Go to Lake Formation Console → Admins and database creators → Data lake administrators → Grant.

iii) Click Add Administrators.

add admin

iv) Select the “dladmin” user as the Data Lake Administrator and click Save.

dladmin

v) Another recommended change is to go to Lake Formation Console → Data Catalog → Settings, uncheck both boxes as shown below, and click Save.

datacatalog
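Steps iv and v can also be expressed against the Lake Formation API. The sketch below uses boto3 and rests on two assumptions that this guide does not state: that the two checkboxes in Data Catalog settings are the "Use only IAM access control" defaults for new databases and new tables, and that clearing them corresponds to empty `CreateDatabaseDefaultPermissions` and `CreateTableDefaultPermissions` lists. The function names and the account ID in the ARN are placeholders.

```python
import copy


def with_admin_and_no_iam_defaults(settings, admin_arn):
    """Return a copy of DataLakeSettings with the admin added and the
    default permissions for new databases and tables cleared."""
    updated = copy.deepcopy(settings)

    # Step iv: register the user as a data lake administrator.
    admins = updated.setdefault("DataLakeAdmins", [])
    entry = {"DataLakePrincipalIdentifier": admin_arn}
    if entry not in admins:
        admins.append(entry)

    # Step v (assumption): empty lists are the API equivalent of
    # unchecking both boxes on the Data Catalog settings page.
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def apply_settings(admin_arn, region="us-east-1"):
    """Read the current settings, modify them, and write them back."""
    # Imported lazily so the pure helper above works without boto3.
    import boto3

    lf = boto3.client("lakeformation", region_name=region)
    current = lf.get_data_lake_settings()["DataLakeSettings"]
    lf.put_data_lake_settings(
        DataLakeSettings=with_admin_and_no_iam_defaults(current, admin_arn)
    )
```

The read-modify-write pattern is used because `put_data_lake_settings` replaces the whole settings object, so sending only the new administrator would drop any existing ones. A call might look like `apply_settings("arn:aws:iam::<account-id>:user/dladmin")`.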

After this step, you will not use this IAM user again. Instead, you will use the dladmin user as the Data Lake Administrator and the dlanalyst user as the data lake analyst/developer.

