Skip to content
This repository was archived by the owner on May 19, 2025. It is now read-only.

intel/NLP-Workflow-with-AWS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

PROJECT NOT UNDER ACTIVE MANAGEMENT

This project will no longer be maintained by Intel.

Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.

Intel no longer accepts patches to this project.

If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Contact: webadmin@linux.intel.com

Cloud Training and Cloud Inference on Amazon Sagemaker/Elastic Kubernetes Service

This is a workflow to demonstrate how users utilize the Intel’s hardware (Cascade Lake or above) and related optimized software to perform cloud training and inference on Amazon Sagemaker platform. The main software packages used here are: HuggingFace, Amazon SageMaker, Intel Optimized TensorFlow and Intel Neural Compressor

Table of Contents

Implementation Details

This workflow ultilizes the infracsture provided by the Amazon. There are two versions provided by this workflow - EKS and SageMaker.

Architecture

Sagemaker:




EKS:

Model Spec

The uncased BERT base model is used to demonstrate this workflow.

bert-base-uncased-config = {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 128,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.21.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

Dataset

Microsoft Research Paraphrase Corpus is used as the dataset for training and testing.

Type Format Rows
Training Dataset HuggingFace Dataset 3668
Testing Dataset HuggingFace Dataset 1725

Getting Started

Prerequisites

Hardware

Name Description
CPU Intel CPU - Cascade Lake or above
Usable RAM 16 GB
Disk Size 256 GB

Software

Docker is required to start this workflow. You will also need AWS credentials and the related AWS CLI installed on the machine to push data/docker image to the Amazon ECR.

For setting up the AWS credential account, you may refer to the following link:
https://aws.amazon.com/account/

For configuring the AWS credentials, you may refer to the following link:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html

Installation

  1. Clone the repository
git clone https://github.com/intel/NLP-Workflow-with-AWS.git
cd NLP-Workflow-with-AWS
  1. Build the docker image to prepare the environment for running the Juypter notebooks.
cd scripts
sh build_main_image.sh

Usage

Use the built docker image (by build_main_image.sh) to run the Juypter notebooks. Execute the following command:

cd scripts
sh start_script.sh

Sagemaker/EKS Training

After starting the container (i.e. executing the step in 'Usage'), execute the following command in the interactive shell.

cd /root/notebooks
jupyter notebook --allow-root

Start the notebook that is named as training and eks/sagemaker.

Sagemaker/EKS Inference

After starting the container (i.e. executing the step in 'Usage'), execute the following command in the interactive shell.

cd /root/notebooks
jupyter notebook --allow-root

Start the notebook that is named as inference and eks/sagemaker.

License

License

About

No description, website, or topics provided.

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •