<img src="images/nvidia_header.png" style="margin-left: -30px; width: 300px; float: left;">

# Accelerating End-to-End Data Science Workflows #

## 00 - Introduction ##
Welcome to NVIDIA's Accelerating End-to-End Data Science Workflows course. This interactive lab offers an introduction to data science with a focus on speed and efficiency. It is designed to help participants tailor their own custom solutions.

**Learning Objectives**
<br>
This workshop covers:
* An overview of data science
* Various data science workflows
* How acceleration is achieved
* How to design operations to maximize GPU acceleration
* The implications of acceleration

### JupyterLab ###
For this hands-on lab, we use [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) to manage our environment. The [JupyterLab Interface](https://jupyterlab.readthedocs.io/en/stable/user/interface.html) is a dashboard that provides access to interactive IPython notebooks, as well as the folder structure of our environment and a terminal window into the Ubuntu operating system. The first view includes a **menu bar** at the top, a **file browser** in the **left sidebar**, and a **main work area** that is initially open to this "introduction" notebook.

<p><img src="images/jl_launcher.png" width=720></p>

* The file browser can be navigated just like any other file explorer. A double click on any of the items will open a new tab with its content. 
* The main work area includes tabbed views of open files that can be closed, moved, and edited as needed. 
* The notebooks, including this one, consist of a series of content and code **cells**. To execute code in a code cell, press `Shift+Enter` or the `Run` button in the menu bar above, while a cell is highlighted. Sometimes, a content cell will get switched to editing mode. Executing the cell with `Shift+Enter` or the `Run` button will switch it back to a readable form.
* To interrupt cell execution, click the `Stop` button in the menu bar or navigate to the `Kernel` menu, and select `Interrupt Kernel`. 
* We can use terminal commands in the notebook cells by prepending an exclamation point, or bang (`!`), to the command.
* We can create additional interactive cells by clicking the `+` button above, or by switching to command mode with `Esc` and using the keyboard shortcuts `a` (for new cell above) and `b` (for new cell below).
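
The `!` prefix is notebook syntax and is not valid in a plain Python script. As a rough sketch of what it does, the snippet below runs a shell command with the standard library's `subprocess` module and captures its output. The helper name `run_shell` is hypothetical and used only for illustration; it is not part of Jupyter.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run a command through the system shell and return its standard output."""
    result = subprocess.run(
        command,
        shell=True,           # let the shell parse the command, as `!` does
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=False,          # do not raise if the command exits with an error
    )
    return result.stdout


# Roughly what `!echo '...'` does in a notebook cell.
output = run_shell("echo 'This is another simple print statement.'")
print(output, end="")
```

In a notebook, the `!` form is usually more convenient because the output streams directly into the cell. The `subprocess` form is useful when the same command has to run from a script.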

<a name='e1'></a>
### Exercise #1 - Practice ###
**Instructions**: <br>
* Try executing the simple print statement in the first cell below.
* Then try executing the terminal command in the second cell.

In [1]:
# DO NOT CHANGE THIS CELL
# activate this cell by selecting it with the mouse or arrow keys then use the keyboard shortcut [Shift+Enter] to execute
print('This is just a simple print statement.')

This is just a simple print statement.


In [2]:
# DO NOT CHANGE THIS CELL
!echo 'This is another simple print statement.'

This is another simple print statement.


<a name='e2'></a>
### Exercise #2 - Available GPU Accelerators ###
The `nvidia-smi` (NVIDIA System Management Interface) command is a powerful utility for managing and monitoring NVIDIA GPU devices. It will print information about available GPUs, their current memory usage, and any processes currently utilizing them. 

**Instructions**: <br>
* Execute the below cell to learn about this environment's available GPUs. 

In [3]:
# DO NOT CHANGE THIS CELL
!nvidia-smi

Sat Jun  7 12:40:01 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   36C    P0    25W /  70W |    169MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   37C    P0    26W /  70W |    169MiB / 15360MiB |      0%      Default |
|       

**Note**: Currently, GPU memory usage is minimal, with no active processes utilizing the GPUs. Throughout our session, we'll employ this command to monitor memory consumption. When conducting GPU-based data analysis, it's advisable to maintain approximately 50% of GPU memory free, allowing for operations that may expand data stored on the device.
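
To make the 50% guideline concrete, here is a minimal sketch that computes the fraction of free memory per GPU. It assumes the CSV output produced by `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`, with values in MiB. The helper names `parse_gpu_memory` and `read_gpu_memory` are hypothetical, and the sample rows reuse the readings shown above rather than a live query.

```python
import shutil
import subprocess

# Query flags for nvidia-smi; the output is one CSV row per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Return {gpu_index: fraction_of_memory_free} from nvidia-smi CSV rows."""
    free = {}
    for row in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in row.split(","))
        free[int(index)] = 1.0 - float(used) / float(total)
    return free


def read_gpu_memory():
    """Query the live GPUs, or return None when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Sample rows mirroring the two Tesla T4 readings above (169 MiB of 15360 MiB).
sample = "0, 169, 15360\n1, 169, 15360\n"
for gpu, fraction in parse_gpu_memory(sample).items():
    status = "ok" if fraction >= 0.5 else "low"
    print(f"GPU {gpu}: {fraction:.1%} free ({status})")
```

On a machine with NVIDIA drivers installed, calling `read_gpu_memory()` returns the same dictionary for the live devices.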

<a name='e3'></a>
### Exercise #3 - Magic Commands ###
The Jupyter environment comes installed with *magic* commands, which can be recognized by the presence of `%` or `%%`. We will be using two magic commands liberally in this workshop: 
* `%time`: prints a timing summary for a single line of code
* `%%time`: prints a timing summary for an entire cell

**Instructions**: <br>
* Execute the first cell below to import `sleep` from the `time` library. 
* Execute the second cell to time a single line of code. 
* Execute the third cell to time the entire cell. 

In [4]:
# DO NOT CHANGE THIS CELL
from time import sleep

In [5]:
# DO NOT CHANGE THIS CELL
# %time only times one line
%time sleep(1) 
sleep(1)

CPU times: user 1.07 ms, sys: 470 μs, total: 1.54 ms
Wall time: 1 s


In [6]:
%%time
# DO NOT CHANGE THIS CELL
# %%time will time the entire cell
sleep(1)
sleep(1)

CPU times: user 1.23 ms, sys: 0 ns, total: 1.23 ms
Wall time: 2 s
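
The outputs above report CPU time in milliseconds and wall time in seconds. The gap arises because `sleep` waits without doing any computation. The sketch below reproduces that distinction with the standard library; it is not how `%time` is implemented internally, and the helper name `timed` is hypothetical.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # elapsed real-world time
    cpu_start = time.process_time()   # CPU time consumed by this process
    result = func(*args)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return result, wall_elapsed, cpu_elapsed


# Sleeping consumes almost no CPU time, so wall time dominates.
_, wall, cpu = timed(time.sleep, 0.2)
print(f"CPU time: {cpu * 1e3:.2f} ms, Wall time: {wall:.2f} s")
```

Wall time is typically the number that matters when comparing CPU and GPU code paths, since it reflects how long we actually wait for a result.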


<a name='e4'></a>
### Exercise #4 - Jupyter Kernels and GPU Memory ###
The compute backend for Jupyter is called the *kernel*. The Jupyter environment starts up a separate kernel for each new notebook. The many notebooks in this workshop are each intended to stand alone with regard to memory and computation. 

To ensure we have enough memory and compute for each notebook, we can clear the memory at the conclusion of each notebook in two ways: 
1. Shut down the kernel programmatically with `IPython.Application.instance().kernel.do_shutdown(True)`, or
2. Shut down the kernel through the *Running Terminals and Kernels* panel. 

**Instructions**: <br>
* Execute the below cell to shut down and restart the current kernel. 
* Shut down the current kernel through the *Running Terminals and Kernels* panel.

<p><img src="images/kernel_restart.png" width=720></p>

**Note**: Restarting the kernel from the *Kernel* menu will only clear the memory for *the current notebook's kernel*, while notebooks other than the one we're working on may still have memory allocated for *their unique kernels*. 

In [7]:
# DO NOT CHANGE THIS CELL
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

**Note**: Executing the code cell above will shut down the kernel and trigger a popup indicating that the kernel has restarted.

**Well Done!** Let's move to the [next notebook](1-01_section_overview.ipynb). 
