# End-To-End LLM 

---

## Overview  

The End-to-End LLM (Large Language Model) Bootcamp takes a real-world perspective and follows the data processing, development, and deployment pipeline. Attendees walk through preprocessing the SQuAD (Stanford Question Answering Dataset) dataset for a question answering task, training a BERT (Bidirectional Encoder Representations from Transformers) model on that dataset, and applying prompt learning strategies using NVIDIA® NeMo™ and NVIDIA Megatron, a transformer-based language model. Attendees will also learn to optimize an LLM using NVIDIA TensorRT™, an SDK for high-performance deep learning inference; to guardrail prompts and LLM responses using NeMo Guardrails; and to deploy the AI pipeline using NVIDIA Triton™ Inference Server, open-source software that standardizes AI model deployment and execution across every workload. Two activity notebooks are included to test your understanding of the material and solidify your experience in the Question Answering (QA) domain.
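As a rough illustration of the data processing step, the sketch below flattens a SQuAD v1.1-style JSON structure into one record per question. The sample data and the helper name `flatten_squad` are assumptions made for illustration; the lab notebooks may preprocess the dataset differently.

```python
# Tiny in-memory sample in the SQuAD v1.1 JSON layout.
# The content is illustrative only and is not taken from the real dataset.
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Bootcamp",
            "paragraphs": [
                {
                    "context": "The bootcamp has three labs.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "How many labs does the bootcamp have?",
                            "answers": [{"text": "three", "answer_start": 17}],
                        }
                    ],
                }
            ],
        }
    ],
}


def flatten_squad(squad):
    """Yield one flat record per question (hypothetical helper, not a NeMo API)."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                first = answers[0] if answers else None
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    # Use the first reference answer; empty values mark "no answer".
                    "answer_text": first["text"] if first else "",
                    "answer_start": first["answer_start"] if first else -1,
                }


records = list(flatten_squad(sample))
for record in records:
    print(record["question"], "->", record["answer_text"])
```

Keeping `answer_start` alongside the answer text makes it possible to verify that each answer span actually occurs in its context before training.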


### Why End-to-End LLM?

Solving real-world problems in the AI domain requires a set of tools (software stacks and frameworks), and the solution process typically follows the `data processing`, `development`, and `deployment` pattern. This material aims to:
- help AI hackathon participants learn and apply this knowledge to solve their tasks using NVIDIA software stacks and frameworks
- enable bootcamp attendees to solve real-world problems using an end-to-end approach (data processing --> development --> deployment)



The End-to-End LLM Bootcamp content contains three labs:
- Lab 1: Megatron-GPT
- Lab 2: TensorRT-LLM and Triton Deployment with Llama-2-7B Model
- Lab 3: NeMo Guardrails

The table of contents below walks you through the `Megatron-GPT` lab, and the included activities test your understanding of the concepts.

### Table of Contents

The following content will be covered:

**Lab 1: Megatron-GPT**
1. [NeMo Fundamentals](jupyter_notebook/nemo/NeMo_Primer.ipynb)
1. [Question Answering](jupyter_notebook/nemo/Question_Answering.ipynb)
1. [Question Answering Lab Activity](jupyter_notebook/nemo/Activity1.ipynb)
1. [Prompt Tuning/P-Tuning](jupyter_notebook/nemo/Multitask_Prompt_and_PTuning.ipynb) 
1. [Prompt Tuning/P-Tuning Lab Activity](jupyter_notebook/nemo/Activity2.ipynb)
1. [NeMo Megatron-GPT 1.3B: Language Model Inferencing](jupyter_notebook/nemo/demo.ipynb)



### Check your GPU

Execute the cell below to display information about the CUDA driver and the GPUs on the server by running the `nvidia-smi` command. To do this, give the cell focus (click on it with your mouse) and press `Ctrl-Enter`, or press the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell.

In [None]:
!nvidia-smi
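
If you prefer to run the same check from plain Python rather than a notebook shell escape, a minimal sketch follows. It assumes `nvidia-smi` is on the `PATH` when a GPU driver is installed; the helper name `query_gpus` is hypothetical and is not part of the bootcamp material.

```python
import shutil
import subprocess


def query_gpus():
    """Return one 'name, driver_version, memory.total' string per GPU, or [] if none is found."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, assume no usable GPU.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,  # a failing driver should not raise; treat it as "no GPU"
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = query_gpus()
print(gpus if gpus else "No NVIDIA GPU detected")
```

Returning an empty list instead of raising lets later cells decide whether to stop or continue when no GPU is available.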

### Tutorial Duration

The material will be presented as three labs totaling `8hrs: 45mins`, as follows:
- NeMo Megatron-GPT Lab: `4hrs: 30mins`
- TensorRT-LLM and Triton Deployment with Llama-2-7B Model Lab: `1hr: 10mins`
- NeMo Guardrails Lab: `3hrs: 05mins`

### Content Level
Beginner to Advanced

### Target Audience and Prerequisites
The target audience for these labs is researchers, graduate students, and developers who are interested in an end-to-end approach to solving LLM tasks using GPUs. Attendees are expected to have background knowledge of Python programming.

---
## Licensing

Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.