# Tutorial 1: Getting started with Q-SPARC

In this series of tutorials, you will get a general idea of how to use `Q-SPARC`. 

This tutorial is an introduction to using a SCKAN knowledge database. The example database can be found in `tutorials/example`.

Before running the examples, make sure the required dependencies are installed:

`pip install -r requirements`


## 1. Setting up the environment

Create and activate a new Conda virtual environment with Python 3.11 to isolate project dependencies and avoid conflicts with other environments.

`conda create -n llm_env python=3.11 -c conda-forge`

`conda activate llm_env`
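
Once the environment is active, a quick check from Python can confirm that the interpreter is the one Conda just created. The following is a minimal sketch: the helper name `is_expected_python` is illustrative and not part of Q-SPARC, and the check relies on Conda setting the `CONDA_DEFAULT_ENV` variable on activation.

```python
import os
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 11)):
    """Return True when the major.minor version matches the expected one."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # CONDA_DEFAULT_ENV is set by `conda activate`; it is absent outside Conda.
    print("Active Conda environment:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))
    print("Interpreter:", sys.executable)
    print("Python 3.11:", is_expected_python())
```

If the last line prints `False`, the shell is probably still using a different environment.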


## 2. Download LLM parameters from Hugging Face

Use the Hugging Face CLI to download the Qwen3-32B model parameters and save them to a local directory of your choice. The `--resume-download` flag asks the CLI to continue an interrupted download rather than start over.

`huggingface-cli download --resume-download Qwen/Qwen3-32B --local-dir /YOUR_PATH/Documents/Code/LLMs/Qwen3-32B`

## 3. Deploy the model on a specific GPU

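Launch the vLLM server, loading the downloaded model onto the specified GPU (GPU 0 here) and exposing an HTTP API on port 8000. The flags below cap the number of concurrent sequences at 32, limit the context length to 4096 tokens, and let vLLM use up to 90% of the GPU's memory.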


`CUDA_VISIBLE_DEVICES=0 vllm serve /YOUR_PATH/Documents/Code/LLMs/Qwen3-32B --host 0.0.0.0 --port 8000 --dtype auto --max-num-seqs 32 --max-model-len 4096 --tensor-parallel-size 1 --trust-remote-code --gpu-memory-utilization 0.9`
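
To judge whether the model fits on a single GPU with `--gpu-memory-utilization 0.9`, a back-of-envelope estimate can help. The sketch below is a rough calculation, not vLLM's own memory accounting. It assumes roughly 32 billion parameters (read off the model name), 2 bytes per parameter (16-bit weights, which is what `--dtype auto` selects for an unquantized checkpoint), and a hypothetical 80 GiB GPU. It ignores activations and runtime overhead.

```python
GIB = 2**30  # bytes per GiB


def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def remaining_budget_gib(gpu_total_gib, utilization, weights_gib):
    """Memory left for the KV cache and activations within the vLLM budget."""
    return gpu_total_gib * utilization - weights_gib


if __name__ == "__main__":
    weights = weight_memory_gib(32e9)  # assumed parameter count
    left = remaining_budget_gib(80, 0.9, weights)  # hypothetical 80 GiB GPU
    print(f"weights ~ {weights:.1f} GiB, left for KV cache ~ {left:.1f} GiB")
```

Under these assumptions the weights take about 60 GiB, leaving roughly 12 GiB of the budget for the KV cache. A negative remainder suggests the model will not fit on one GPU as configured, in which case a larger `--tensor-parallel-size` across several GPUs may be needed.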

## 4. Check the LLM server status

Use `curl` to send a simple text-completion request to the local LLM server and verify that it is running and returns a response. The `model` field must match the model path passed to `vllm serve`.

`curl --noproxy "*" http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/YOUR_PATH/Documents/Code/LLMs/Qwen3-32B",
    "prompt": "What is the capital of New Zealand?",
    "max_tokens": 512
  }'`
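
The same check can be made from Python, which is convenient when the rest of the workflow is scripted. The following is a minimal sketch using only the standard library. The function names are illustrative and not part of Q-SPARC or vLLM, and the response parsing assumes the usual completions layout with a `choices` list whose entries carry a `text` field.

```python
import json
import urllib.request


def build_request(base_url, model, prompt, max_tokens=512):
    """Build a POST request for the /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_completion(response_body):
    """Extract the generated text of the first choice from a response dict."""
    return response_body["choices"][0]["text"]


def query_server(base_url, model, prompt, timeout=60):
    """Send the request while bypassing proxies, like curl --noproxy "*"."""
    # An empty ProxyHandler disables any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = build_request(base_url, model, prompt)
    with opener.open(request, timeout=timeout) as resp:
        return first_completion(json.load(resp))
```

With the server running, `query_server("http://localhost:8000", "/YOUR_PATH/Documents/Code/LLMs/Qwen3-32B", "What is the capital of New Zealand?")` should return the generated text; a connection error usually means the server has not finished loading the model.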
