This repo contains my setups for various project environments. Please suggest more easy-to-use pipelines and DL dev tips.
- Add a Kaggle downloader for downloading open-source datasets.
- Add conda setups.
- MLC-LLM: https://betterprogramming.pub/frameworks-for-serving-llms-60b7f7b23407
This setup describes how to build an experimentation environment with Docker. The environment is based on NVIDIA NGC PyTorch Docker images.
There are several advantages to using this method:

- You don't have to set up nvcc, cuDNN, etc.
- You can fire up an env without messing up the Windows system. The only thing you need is Docker Desktop.
- The base images are developed and maintained by NVIDIA themselves.

The only things you need are a proper NVIDIA GPU driver and, depending on your system, the nvidia-container-toolkit. Here is a guide if you need to set up the nvidia-container-toolkit on Linux.
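If you are unsure whether the driver and toolkit are wired up correctly, a quick sanity check (the standard verification commands, nothing specific to this repo) looks like:

```bash
# Host driver works?
nvidia-smi

# Containers can see the GPU? (nvidia-smi is injected into the container by the toolkit)
docker run --rm --gpus all ubuntu nvidia-smi
```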
To use this setup, you have to:

- Specify all the required packages you need in `requirement.txt`.
- Replace the desired base image in `./exp_container/Dockerfile`.
- Give a proper name and tag to your image, for example `liux2/app-framework-experiment:exp`, and build the Docker image with `bash exp_container/build_docker.sh`.
- Change the flags in `./exp_container/env_docker.sh` based on your needs, and finally fire up the container with `bash exp_container/env_docker.sh` (a sketch of the equivalent raw commands is shown below).
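For reference, the two scripts roughly correspond to a `docker build` followed by a `docker run`. This is only a sketch of the equivalent commands; the mount path and port mapping are assumptions, and the actual flags live in the scripts above:

```bash
# Build the image from the experiment Dockerfile (name:tag from the example above)
docker build -t liux2/app-framework-experiment:exp -f exp_container/Dockerfile .

# Run it with GPU access, an interactive terminal, and the current dir mounted
docker run --gpus all -it --rm \
  -v "$(pwd)":/workspace \
  -p 8888:8888 \
  liux2/app-framework-experiment:exp
```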
Tips:

- The `-it` flag in the docker run command will start an interactive terminal for you. From there, you can start a Jupyter-lab environment with `bash exp_container/start_jupyter.sh`.
- The `-v` flag with the specified path will mount your current dir into the Docker container.
- After the Jupyter-lab env has been set up, you can use the Docker Jupyter kernel from VS Code if you prefer a local IDE setup.
- If you decide to migrate this env to another machine, use `docker save -o {{backup_file.tar}} {{image_name:tag}}` to save your image to a tar file with the name and tag preserved, and use `docker load -i {{backup_file.tar}}` to load it on the target machine (see the example below).
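A minimal end-to-end migration could look like this; the tarball name, image tag, and target host are placeholders:

```bash
# On the source machine: export the image with its name and tag preserved
docker save -o exp_backup.tar liux2/app-framework-experiment:exp

# Copy the tarball over (host and path are examples)
scp exp_backup.tar user@target-machine:/tmp/

# On the target machine: load the image back
docker load -i /tmp/exp_backup.tar
```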
An easier alternative is to use Docker Compose. You can set the necessary parameters in the docker-compose file and run with `docker compose up`, or `docker compose up -d` for detached mode. To be able to enter the container's terminal, set `stdin_open: true` and `tty: true` in the service definition, and enter with `docker compose exec {service_name} sh`. Here `{service_name}` should be the name you used for the service; `dev` is the name used in the docker-compose file. A sketch of such a compose file is shown below.
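This is a minimal, illustrative compose file along those lines; the image name, build context, mount path, and GPU reservation block are assumptions, not copied from the repo's actual file:

```yaml
services:
  dev:
    build:
      context: .
      dockerfile: exp_container/Dockerfile   # example build context
    image: liux2/app-framework-experiment:exp
    stdin_open: true   # keep STDIN open so an exec'd shell is usable
    tty: true          # allocate a pseudo-TTY
    volumes:
      - .:/workspace   # mount the project dir into the container
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]   # request all available GPUs
```

With this in place, `docker compose up -d` starts the service and `docker compose exec dev sh` drops you into it.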
This section provides various ways of serving LLM APIs.
This setup shows how to set up an OpenAI-style API for your desired LLM, based on FastChat.
To use this setup, you have to:

- Download the checkpoints from your LLM's source repo into `LLM_fastchat_api/`.
- Specify your LLM requirements in `requirements.txt`.
- Change any necessary params in the `docker-compose.yml`.
- Fire up the API with `docker compose up`, and shut it down with `docker compose down`.
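Once the API is up, any OpenAI-compatible client can talk to it. A quick curl check might look like the following; the port and model name depend on your `docker-compose.yml` and the checkpoint you downloaded, so treat them as placeholders:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "vicuna-7b-v1.5",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```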
Tips:
- You can change the backend to vLLM for a faster experience by following this tutorial.
Google Drive is an easy-to-use application for storing and sharing datasets. To download files from Google Drive to your server while exploiting its high bandwidth:
- Prepare your dataset as a compressed file, and click share file.
- Get the file ID by requesting a link; the ID is the long hash in the middle of a URL that looks like `https://drive.google.com/file/d/a-long-hash/view?usp=sharing`.
- Put the hash and the file name in `scripts/gd-downloader.sh`.
- Run with `bash scripts/gd-downloader.sh`.
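If you prefer not to use the repo script, the same download can be sketched with `gdown`; the file ID and output name are placeholders:

```bash
pip install -q gdown

# FILE_ID is the long hash from the share link
gdown "https://drive.google.com/uc?id=FILE_ID" -O dataset.tar.gz
```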
To download datasets from Kaggle, you need to:

- Go to your Kaggle account, get an API key in the API section, and download the JSON file.
- Open a terminal and run:

```bash
pip install -q kaggle
pip install -q kaggle-cli
mkdir -p ~/.kaggle
cp "your/path/to/kaggle.json" ~/.kaggle/
cat ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

# For competition datasets
kaggle competitions download -c dataset_name -p download_to_folder

# For other datasets
kaggle datasets download -d user/dataset_name -p download_to_folder
```
Replace:

- `your/path/to/kaggle.json` with your path to `kaggle.json` on your drive.
- `download_to_folder` with the folder where you'd like to store the downloaded dataset.
- `dataset_name` and/or `user/dataset_name` with the dataset you want to download.
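As a concrete example, downloading and unpacking the Titanic competition data (assuming you have joined the competition on the Kaggle website; the zip name follows the competition slug) could look like:

```bash
kaggle competitions download -c titanic -p data/
unzip -o data/titanic.zip -d data/titanic
```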
The Hugging Face `datasets` package provides an easy way to load datasets from the Hugging Face Hub.
The tutorials and use cases can be found on their homepage.
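As a quick illustration, loading a dataset from the Hub takes one call; the dataset name here is just an example:

```python
from datasets import load_dataset

# Download (and cache) a dataset from the Hub, then inspect one split
dataset = load_dataset("imdb")
print(dataset["train"][0])  # first training example as a dict
```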
This section collects tips for software development.
You can use `dotenv` to load environment variables from a file. It helps protect your keys and passwords from
leaking into your code. To install it in Python with pip, use `pip install python-dotenv`.
To use this package:

- Prepare a `.env` file; an example entry is `OPENAI_API_KEY = "sk-xxx"`.
- Example Python script usage:

```python
import os
from dotenv import load_dotenv

# Load variables from the env file into the process environment
env_path = "scripts/secrets.env"
load_dotenv(dotenv_path=env_path, verbose=True)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```

- Example Jupyter Notebook usage:

```python
import os
from dotenv import load_dotenv

# Load the env file via the IPython extension
%load_ext dotenv
%dotenv ./scripts/secrets.env
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
```
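One related habit: keep the env file itself out of version control so the keys never get committed; the path below assumes the `scripts/secrets.env` layout used above:

```bash
# Make sure the secrets file is never committed
echo "scripts/secrets.env" >> .gitignore
```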