Skip to content

A repository of AWS command line code snippets and AWS service usage tips for data scientists, platform administrators and product managers.

License

Notifications You must be signed in to change notification settings

erikaduan/aws_notes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AWS usage notes - a comprehensive guide for AWS newbies

This repository contains a series of guides on how to set up and use AWS services required for data analysis and data science. The style conventions are:

  • Code, including AWS command line interface code, is indented in code blocks.
  • AWS console options are styled in bold italic text.
  • Resource names are styled in bold text.

AWS resource Topic Why this is important
🤠 Identity & Access Management (IAM) Create new AWS user groups, users and access policies IAM is needed to create and manage users and user groups, who often require different access permissions to different AWS resources. Platform governance and security policies tend to be managed via IAM, to ensure that different users have the appropriate level of access to cloud resources for their work requirements.
🪣 S3 bucket Manage S3 bucket permissions Data sets and data objects must be stored in a central location. S3 is the central data storage service in AWS and object storage permissions can be further finetuned using S3 bucket permissions.
📔 Sagemaker Enable Sagemaker IAM roles SageMaker supports data science work by providing a user-friendly integrated development environment (IDE) connected to EC2 instances and docker images for users to program in languages like Python and R. This is where data analysts and data scientists work to clean and analyse data and build statistical or machine learning models. To enable SageMaker functionality, SageMaker service permissions must be managed so that SageMaker can interact with all other required AWS services i.e. S3, Lambda and Glue.
📔 Sagemaker Introduction to SageMaker SageMaker provides users with at least two different ways of accessing a linux virtual environment for data science work; through Jupyter notebook instances or data science docker images via the SageMake Studio IDE. Notebook instances are useful for individual exploratory data science work whereas SageMaker Studio is more useful for production environment ML models requiring MLOps support. It is important to understand these differences to make an informed decision about where to host different types of data science projects.
📔 Sagemaker Manage R and Python environments

Tips on learning to use AWS

  • AWS provides management console (i.e. GUI) and command line options to perform operations. The command line interface, also called CloudShell, can be accessed at the top right panel via the >_ icon.
  • Create AWS services using shell scripts as this is the most reproducible deployment method (there's nothing wrong with clicking a lot of console buttons, it's just a reproducible practice to deploy and document your actions using shell scripts or code templates).
  • AWS resources can also be accessed using the Python software development kit (SDK) boto3. For data transformations, use the awswrangler Python SDK.

Other resources

About

A repository of AWS command line code snippets and AWS service usage tips for data scientists, platform administrators and product managers.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published