5. Crawl and catalog Patient data in AWS Glue

FindMatches ML Transform that we would be using, operates on tables defined in the AWS Glue Data Catalog. Use AWS Glue crawlers to discover and catalog the patient data.

a) Login as dlanalyst

b) Navigate to Lake Formation Console → Data Catalog → Tables → Create Table using a crawler

Click on Get Started in case you are prompted.

c) Select Add tables using a crawler

d) Give Crawler name as patient-raw-ds-cr and click Next

e) Select Data stores on the next page

f) Select S3 in Choose Data Store dropdown

g) Select <S3Bucket>/patientdata/rawdata as Include path

h) Click Next

i) Select Choose an existing IAM role and select AWSGlueServiceRole-LF-MLLab

j) Click Next

k) Select Database as patientdb and click Next

l) Click Finish button, select the crawler and click Run Crawler

Once the crawler is run successfully, it would create the table under AWS Glue Data Catalog Database. As a next step, Data Lake User would want to query the data in a data lake through data catalog tables. In order to do this, they would need permission from Lake Formation. In the next step, we login as a Data Lake Administrator and give ‘dlanalyst’ the permission to Select, Insert, Update, Delete data from a table in AWS Glue Data Catalog.

Back to main guide | Next

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

activity5.md

activity5.md

5. Crawl and catalog Patient data in AWS Glue

Files

activity5.md

Latest commit

History

activity5.md

File metadata and controls

5. Crawl and catalog Patient data in AWS Glue