This repo contains my setups for various project environments. Please suggest more easy-to-use pipelines and DL dev tips.
- Add a Kaggle downloader for downloading open-source datasets.
- Add conda setups.
- MLC-LLM: https://betterprogramming.pub/frameworks-for-serving-llms-60b7f7b23407
This setup describes how to build an experimentation environment with Docker. The environment is based on NVIDIA NGC PyTorch Docker images.
There are several advantages to using this method:

- You don't have to set up nvcc, cuDNN, etc.
- You can fire up an env without messing up the Windows system. The only thing you need is Docker Desktop.
- The base images are developed and maintained by NVIDIA themselves.

The only things you need are a proper NVIDIA GPU driver and, depending on your system, the nvidia-container-toolkit. Here is a guide if you need to set up the nvidia-container-toolkit on Linux.
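If you are unsure whether the driver and toolkit are wired up correctly, a quick sanity check (the standard verification commands, nothing specific to this repo) looks like:

```bash
# Host driver works?
nvidia-smi

# Containers can see the GPU? (nvidia-smi is injected into the container by the toolkit)
docker run --rm --gpus all ubuntu nvidia-smi
```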
To use this setup, you have to:

- Specify all the required packages you need in `requirement.txt`.
- Replace the desired base image in `./exp_container/Dockerfile`.
- Give a proper name and tag to your image, for example `liux2/app-framework-experiment:exp`, and build the Docker image with `bash exp_container/build_docker.sh`.
- Change the flags in `./exp_container/env_docker.sh` based on your needs, and finally fire up the container with `bash exp_container/env_docker.sh` (a sketch of the equivalent raw commands is shown below).
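For reference, the two scripts roughly correspond to a `docker build` followed by a `docker run`. This is only a sketch of the equivalent commands; the mount path and port mapping are assumptions, and the actual flags live in the scripts above:

```bash
# Build the image from the experiment Dockerfile (name:tag from the example above)
docker build -t liux2/app-framework-experiment:exp -f exp_container/Dockerfile .

# Run it with GPU access, an interactive terminal, and the current dir mounted
docker run --gpus all -it --rm \
  -v "$(pwd)":/workspace \
  -p 8888:8888 \
  liux2/app-framework-experiment:exp
```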
Tips:

- The `-it` flag in the docker run command will start an interactive terminal for you. From there, you can start a Jupyter-lab environment with `bash exp_container/start_jupyter.sh`.
- The `-v` flag with the specified path will mount your current dir into the Docker container.
- After the Jupyter-lab env has been set up, you can use the Docker Jupyter kernel from VS Code if you prefer a local IDE setup.
- If you decide to migrate this env to another machine, use `docker save -o {{backup_file.tar}} {{image_name:tag}}` to save your image to a tar file with the name and tag preserved, and use `docker load -i {{backup_file.tar}}` to load it on the target machine (see the example below).
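A minimal end-to-end migration could look like this; the tarball name, image tag, and target host are placeholders:

```bash
# On the source machine: export the image with its name and tag preserved
docker save -o exp_backup.tar liux2/app-framework-experiment:exp

# Copy the tarball over (host and path are examples)
scp exp_backup.tar user@target-machine:/tmp/

# On the target machine: load the image back
docker load -i /tmp/exp_backup.tar
```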
An easier alternative is to use Docker Compose. You can set the necessary parameters in the docker-compose file and run with `docker compose up`, or `docker compose up -d` for detached mode. To be able to enter the container's terminal, set `stdin_open: true` and `tty: true` in the service definition, and enter with `docker compose exec {service_name} sh`. Here `{service_name}` should be the name you used for the service; `dev` is the name used in the docker-compose file. A sketch of such a compose file is shown below.
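This is a minimal, illustrative compose file along those lines; the image name, build context, mount path, and GPU reservation block are assumptions, not copied from the repo's actual file:

```yaml
services:
  dev:
    build:
      context: .
      dockerfile: exp_container/Dockerfile   # example build context
    image: liux2/app-framework-experiment:exp
    stdin_open: true   # keep STDIN open so an exec'd shell is usable
    tty: true          # allocate a pseudo-TTY
    volumes:
      - .:/workspace   # mount the project dir into the container
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]   # request all available GPUs
```

With this in place, `docker compose up -d` starts the service and `docker compose exec dev sh` drops you into it.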
This section provides various ways of serving LLM APIs.
This setup shows how to set up an OpenAI-style API for your desired LLM, based on FastChat.
To use this setup, you have to:

- Download the checkpoints from your LLM's source repo into `LLM_fastchat_api/`.
- Specify your LLM requirements in `requirements.txt`.
- Change any necessary params in the `docker-compose.yml`.
- Fire up the API with `docker compose up`, and shut it down with `docker compose down`.
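Once the API is up, any OpenAI-compatible client can talk to it. A quick curl check might look like the following; the port and model name depend on your `docker-compose.yml` and the checkpoint you downloaded, so treat them as placeholders:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "vicuna-7b-v1.5",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```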
Tips:
- You can change the backend to vLLM for a faster experience by following this tutorial.
Google Drive is an easy-to-use application for storing and sharing datasets. To download files from Google Drive to your server while exploiting its high bandwidth:
- Prepare your dataset as a compressed file, and click share file.
- Get the file ID by requesting a link; the ID is the long hash in the middle of a URL that looks like `https://drive.google.com/file/d/a-long-hash/view?usp=sharing`.
- Put the hash and the file name in `scripts/gd-downloader.sh`.
- Run with `bash scripts/gd-downloader.sh`.
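If you prefer not to use the repo script, the same download can be sketched with `gdown`; the file ID and output name are placeholders:

```bash
pip install -q gdown

# FILE_ID is the long hash from the share link
gdown "https://drive.google.com/uc?id=FILE_ID" -O dataset.tar.gz
```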
To download datasets from Kaggle, you need to:

- Go to your Kaggle account, get an API key in the API section, and download the JSON file.
- Open a terminal and run:

```bash
pip install -q kaggle
pip install -q kaggle-cli
mkdir -p ~/.kaggle
cp "your/path/to/kaggle.json" ~/.kaggle/
cat ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

# For competition datasets
kaggle competitions download -c dataset_name -p download_to_folder

# For other datasets
kaggle datasets download -d user/dataset_name -p download_to_folder
```
Replace:

- `your/path/to/kaggle.json` with your path to `kaggle.json` on your drive.
- `download_to_folder` with the folder where you'd like to store the downloaded dataset.
- `dataset_name` and/or `user/dataset_name` with the dataset you want to download.
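As a concrete example, downloading and unpacking the Titanic competition data (assuming you have joined the competition on the Kaggle website; the zip name follows the competition slug) could look like:

```bash
kaggle competitions download -c titanic -p data/
unzip -o data/titanic.zip -d data/titanic
```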
The Hugging Face `datasets` package provides an easy way to load datasets from the Hugging Face Hub.
The tutorials and use cases can be found on their homepage.
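As a quick illustration, loading a dataset from the Hub takes one call; the dataset name here is just an example:

```python
from datasets import load_dataset

# Download (and cache) a dataset from the Hub, then inspect one split
dataset = load_dataset("imdb")
print(dataset["train"][0])  # first training example as a dict
```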
This section collects tips for software development.
You can use `dotenv` to load environment variables from a file. It helps protect your keys and passwords from
leaking into your code. To install it in Python with pip, use `pip install python-dotenv`.
To use this package:

- Prepare a `.env` file; an example entry is `OPENAI_API_KEY = "sk-xxx"`.
- Example Python script usage:

```python
import os
from dotenv import load_dotenv

# Load variables from the env file into the process environment
env_path = "scripts/secrets.env"
load_dotenv(dotenv_path=env_path, verbose=True)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```

- Example Jupyter Notebook usage:

```python
import os
from dotenv import load_dotenv

# Load the env file via the IPython extension
%load_ext dotenv
%dotenv ./scripts/secrets.env
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```
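One related habit: keep the env file itself out of version control so the keys never get committed; the path below assumes the `scripts/secrets.env` layout used above:

```bash
# Make sure the secrets file is never committed
echo "scripts/secrets.env" >> .gitignore
```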