
Commit

programmah committed Feb 26, 2024
1 parent f9553e0 commit b09090e
Showing 43 changed files with 5,936 additions and 1,553 deletions.
4 changes: 2 additions & 2 deletions Deployment_Guide.md
Original file line number Diff line number Diff line change
@@ -51,7 +51,7 @@ flags:
When you are inside the container, launch jupyter lab:
`jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace`.

Open the browser at `http://localhost:8888` and click on the `Start_here.ipynb` notebook.
Open the browser at `http://localhost:8888` and click on the `Megatron-GPT.ipynb` notebook to start Lab 1, `TensorRT-LLM.ipynb` notebook to start Lab 2, and `NeMo-Guardrails.ipynb` notebook to start Lab 3.
As soon as you are done with the lab, shut down jupyter lab by selecting `File > Shut Down`, and shut down the container by typing `exit` or pressing `ctrl + d` in the terminal window.

Congratulations, you've successfully built and deployed an end-to-end LLM pipeline!
@@ -79,7 +79,7 @@ Congratulations, you've successfully built and deployed an end-to-end LLM pipeline!

The `-B` flag mounts local directories in the container filesystem and ensures changes are stored locally in the project folder. Open jupyter lab in the browser: http://localhost:8888

You may start working on the lab by clicking the `Start_Here.ipynb` notebook.
You may start working on the labs by clicking the `Megatron-GPT.ipynb` notebook to start Lab 1, `TensorRT-LLM.ipynb` notebook to start Lab 2, and `NeMo-Guardrails.ipynb` notebook to start Lab 3.

When you finish these notebooks, shut down jupyter lab by selecting `File > Shut Down` in the top left corner, then shut down the Singularity container by typing `exit` or pressing `ctrl + d` in the terminal window.

160 changes: 160 additions & 0 deletions workspace/LLM-Use-Case.ipynb
@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "290af7ce",
"metadata": {},
"source": [
"# End-To-End LLM \n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "9e3d6d34",
"metadata": {},
"source": [
"## Overview \n",
"\n",
"The End-to-End LLM (Large Language Model) Bootcamp is designed from a real-world perspective that follows the data processing, development, and deployment pipeline paradigm. Attendees walk through the workflow of preprocessing the SQuAD (Stanford Question Answering Dataset) dataset for a question-answering task, training a model on the dataset using BERT (Bidirectional Encoder Representations from Transformers), and executing a prompt learning strategy using NVIDIA® NeMo™ and a transformer-based language model, NVIDIA Megatron. Attendees will also learn to optimize an LLM using NVIDIA TensorRT™, an SDK for high-performance deep learning inference; guardrail prompts and responses from the LLM using NeMo Guardrails; and deploy the AI pipeline using NVIDIA Triton™ Inference Server, open-source software that standardizes AI model deployment and execution across every workload. Furthermore, two activity notebooks are included to test your understanding of the material and solidify your experience in the question-answering (QA) domain.\n",
"\n",
"\n",
"### Why End-to-End LLM?\n",
"\n",
"Solving real-world problems in the AI domain requires a set of tools (software stacks and frameworks), and the solution process typically follows the `data processing`, `development`, and `deployment` pattern. This material aims to:\n",
"- help AI hackathon participants learn and apply NVIDIA software stacks and frameworks to solve their tasks\n",
"- enable bootcamp attendees to solve real-world problems using an end-to-end approach (data processing --> development --> deployment)\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "53e5ed10",
"metadata": {},
"source": [
"The End-to-End LLM Bootcamp content contains four labs:\n",
"- Lab 1: Megatron-GPT\n",
"- Lab 2: TensorRT-LLM and Triton Deployment with Llama-2-7B Model\n",
"- Lab 3: NeMo Guardrails\n",
"- Lab 4: LLM Use Case\n",
"\n",
"\n",
"## Problem statement \n",
"\n",
"From financial services to eCommerce to telecom and health services, customer care teams receive a large volume of inquiries from customers or users of their products. Responding to every question is sometimes impossible or may lead to long waiting times in a face-to-face scenario. The solution is to develop a generative AI-based system that can efficiently and accurately respond to customers' inquiries using custom, non-static information/data.\n"
]
},
{
"cell_type": "markdown",
"id": "cf032bfa-89f9-444c-8121-5fcad5e3da01",
"metadata": {},
"source": [
"<img src=\"jupyter_notebook/llm-use-case/images/inference-visual-tensor-rt-llm.png\" height=\"800px\" width=\"800px\" />"
]
},
{
"cell_type": "markdown",
"id": "1fe956ba",
"metadata": {},
"source": [
"The table of contents below will walk you through a solution prototype using the `LLM Use Case` lab and the challenge included will test your understanding of the solution concept."
]
},
{
"cell_type": "markdown",
"id": "1c7058d1",
"metadata": {},
"source": [
"### Table of Contents\n",
"\n",
"The following contents will be covered:\n",
"\n",
"**Lab 4: LLM Use Case**\n",
"1. [Fine-tuning Llama 2 with Custom Data](jupyter_notebook/llm-use-case/llama-chat-finetune.ipynb)\n",
"1. [Building a TensorRT Engine with the Fine-tuned Model](jupyter_notebook/llm-use-case/trt-llama-chat.ipynb)\n",
"1. [Deploying the Fine-tuned Model Using Triton Inference Server](jupyter_notebook/llm-use-case/triton-llama.ipynb)\n",
"1. [Challenge](jupyter_notebook/llm-use-case/challenge.ipynb)\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "2c950398",
"metadata": {},
"source": [
"### Check your GPU\n",
"\n",
"Let's execute the cell below to display information about the CUDA driver and GPUs running on the server by running the `nvidia-smi` command. To do this, give the cell block below focus (click on it with your mouse) and hit `Ctrl-Enter`, or press the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7cfdf450",
"metadata": {},
"outputs": [],
"source": [
"!nvidia-smi"
]
},
{
"cell_type": "markdown",
"id": "5ed38116",
"metadata": {},
"source": [
"### Tutorial Duration\n",
"\n",
"The material is presented in four labs with a total duration of 12hrs: 15mins, as follows:\n",
"- NeMo Megatron-GPT Lab: `4hrs: 30mins`\n",
"- TensorRT-LLM and Triton Deployment with Llama 2 7B Model Lab: `1hr: 10mins`\n",
"- NeMo Guardrails Lab: `3hrs: 05mins`\n",
"- LLM Use Case Lab: `3hrs: 30mins`"
]
},
{
"cell_type": "markdown",
"id": "eca3da4a",
"metadata": {},
"source": [
"### Content Level\n",
"Beginner to Advanced\n",
"\n",
"### Target Audience and Prerequisites\n",
"The target audience for these labs is researchers, graduate students, and developers who are interested in an end-to-end approach to solving LLM tasks using GPUs. Attendees are expected to have background knowledge of Python programming."
]
},
{
"cell_type": "markdown",
"id": "ae541407",
"metadata": {},
"source": [
"---\n",
"## Licensing\n",
"\n",
"Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
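The Lab 4 pipeline fine-tunes Llama 2 chat on custom data before building a TensorRT engine and deploying it with Triton. As a rough illustration of what chat-formatted text looks like in that workflow, here is a minimal sketch of the widely used Llama 2 `[INST]`/`<<SYS>>` prompt template; the exact template is an assumption here, and the lab's own preprocessing notebook is authoritative:

```python
# Sketch of the Llama 2 chat prompt template (assumed format; verify
# against the lab's fine-tuning notebook before reusing).

def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system message and a user question in the Llama 2 chat
    template: <s>[INST] <<SYS>> ... <</SYS>> question [/INST]."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = build_llama2_prompt(
    "You are a helpful customer-care assistant.",
    "How do I reset my router?",
)
print(prompt)
```

At inference time, a deployed endpoint for the customer-care use case would receive strings shaped like this prompt, with the model's answer generated after the closing `[/INST]`.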
9 changes: 9 additions & 0 deletions workspace/data/filtered/README.md
@@ -0,0 +1,9 @@
This dataset is a subset of the Open Assistant dataset, which you can find here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main

This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 9,846 samples.

This dataset was used to train Guanaco with QLoRA.

For further information, please see the original dataset.

License: Apache 2.0
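The "highest-rated paths" selection described above can be sketched in plain Python: treat each conversation as a tree whose replies carry a rank, and greedily follow the best-ranked reply from the root to a leaf. This is an illustrative simplification (the node layout and the lowest-rank-is-best convention are assumptions), not the script actually used to build this subset:

```python
# Illustrative only: extract the top-ranked root-to-leaf path from a
# toy conversation tree. Node shape (assumed for this sketch):
# {"text": ..., "rank": ..., "replies": [...]}

def best_path(node):
    """Greedily follow the lowest-rank (best) reply at each level."""
    path = [node["text"]]
    while node["replies"]:
        node = min(node["replies"], key=lambda r: r["rank"])
        path.append(node["text"])
    return path

tree = {
    "text": "How do I brew coffee?",
    "rank": 0,
    "replies": [
        {"text": "Use a French press...", "rank": 0, "replies": []},
        {"text": "Buy instant coffee.", "rank": 1, "replies": []},
    ],
}
print(best_path(tree))  # ['How do I brew coffee?', 'Use a French press...']
```

In the real Open Assistant export, messages are flat rows linked by parent IDs, so a preprocessing step would first rebuild each conversation tree before applying a walk like this.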
