# Tutorial: Using API on MM-Vet and LLaVA-Wild

### Preparation: Dataset
We will use two datasets: MM-Vet and LLaVA-Wild. 
- You can download MM-Vet [here](https://github.com/yuweihao/MM-Vet/releases/download/v1/mm-vet.zip).
- You can download LLaVA-Wild by following the instructions [here](https://github.com/haotian-liu/LLaVA/blob/main/docs/LLaVA_Bench.md).

We store the downloaded and unzipped datasets in the `dataset/mm-vet` folder and the `dataset/LLaVA_Wild` folder, respectively.
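
Before moving on, it can help to confirm that both folders are where the later steps expect them. The following is a minimal sketch, assuming the commands are run from the directory that contains `dataset/`; the helper name `missing_dataset_folders` is ours and is not part of the repository.

```python
from pathlib import Path

# Folder names taken from the text above; the root directory is an assumption.
EXPECTED_FOLDERS = ("dataset/mm-vet", "dataset/LLaVA_Wild")


def missing_dataset_folders(root="."):
    """Return the expected dataset folders that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


missing = missing_dataset_folders()
if missing:
    print("Missing dataset folders:", ", ".join(missing))
else:
    print("Both dataset folders are in place.")
```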

### Preparation: API
You can create the environment for API by following the instructions in `ReadMe.md`.

### Preparation: Inference Package
The inference LVLM is LLaVA1.5. 
We use the [sglang](https://github.com/sgl-project/sglang?tab=readme-ov-file#frontend-structured-generation-language-sglang) package to accelerate inference.
Run the following commands in your shell to create an environment for Sglang and install the necessary dependencies.

In [None]:
# 1. Create a conda environment
conda create --name sglang python=3.11 -y
# 2. Activate the environment
conda activate sglang
# 3. Clone the repo
git clone https://github.com/sgl-project/sglang.git
cd sglang
# 4. (Optional) The commit id of the Sglang repo we used is "83d2b30d759ec2e7e781d4da7d4c98c0b778b941". You can check out this commit with the following command.
git checkout 83d2b30d759ec2e7e781d4da7d4c98c0b778b941
# 5. Install Sglang
pip install --upgrade pip
pip install -e "python[all]"
# 6. (Optional) Install the dataset manager package. You may use your own code to import the dataset.
cd ../API/DatasetManager
pip install -e .

**Troubleshooting**

- **ModuleNotFoundError**: No module named 'vllm.transformers_utils.configs.qwen'

    vLLM removed qwen in v0.3.3. Downgrading the vllm package to 0.3.2 should solve this problem.

In [None]:
pip install vllm==0.3.2

### Preparation: Inference Script
We use `sglang_inference.py` to conduct inference, which can be run as follows.

In [None]:
cd .

python sglang_inference.py \
    --dataset mmvet \
    --batch_size 8 \
    --prompt_name masked \
    --image_folder /path/to/image_folder \
    --output_folder /path/to/output_folder \
    --exp_name mmvet_inference \
    --port_value 30000 

# --dataset: Dataset, e.g., mmvet, LLaVA_Wild.
# --batch_size: Batch size used in Sglang. Increase it to speed up inference.
# --prompt_name: Prompt name, e.g., masked, empty, sbs.
# --image_folder: Folder containing the masked images; only used when prompt_name is masked.
# --output_folder: Output folder.
# --exp_name: Experiment name, used as the file name of the output.
# --port_value: Port of the Sglang server.
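
To make the relationship between these flags concrete, here is a minimal sketch of how such a command line could be declared and checked with `argparse`. It is not the code of `sglang_inference.py`: the defaults, the function names `build_parser` and `check_args`, and the rule that a masked run must supply an image folder are assumptions based only on the flag descriptions above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(description="Sketch of the inference CLI")
    parser.add_argument("--dataset", required=True, help="e.g. mmvet or LLaVA_Wild")
    parser.add_argument("--batch_size", type=int, default=8)  # default is assumed
    parser.add_argument("--prompt_name", default="empty", help="e.g. masked, empty, sbs")
    parser.add_argument("--image_folder", default="", help="read only for masked prompts")
    parser.add_argument("--output_folder", required=True)
    parser.add_argument("--exp_name", required=True, help="file name of the output")
    parser.add_argument("--port_value", type=int, default=30000)  # default is assumed
    return parser


def check_args(args):
    """Raise ValueError if a masked run has no image folder (assumed rule)."""
    if args.prompt_name == "masked" and not args.image_folder:
        raise ValueError("--image_folder is required when --prompt_name is masked")
    return args


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
example = build_parser().parse_args([
    "--dataset", "mmvet",
    "--prompt_name", "masked",
    "--image_folder", "/path/to/image_folder",
    "--output_folder", "/path/to/output_folder",
    "--exp_name", "demo",
])
print(check_args(example))
```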


### API for MM-Vet

Run the following command in your shell to generate the masked images using API. The masked images will be stored in the folder `../results/APICLIP_mmvet_ViT-L-14-336_22/1_3_BICUBIC_0`.

In [None]:
cd API/API_CLIP

python main.py \
  --dataset mmvet \
  --range 0 300 \
  --model_name ViT-L-14-336 \
  --layer_index 22 \
  --batch_size 8 \
  --output_folder "../results" \
  --interpolate_method_name BICUBIC \
  --enhance_coe 1 \
  --kernel_size 3 \
  --grayscale 0
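
The folder names quoted in this tutorial appear to be assembled from the `main.py` flags. The sketch below rebuilds the path from those flags so it can be passed to `--image_folder` later. The pattern is inferred from the two folder names given in this tutorial, not read from `main.py`, and the helper name `masked_image_folder` is ours.

```python
from pathlib import PurePosixPath


def masked_image_folder(output_folder, dataset, model_name, layer_index,
                        enhance_coe, kernel_size, interpolate_method_name, grayscale):
    """Rebuild the masked-image folder path from the main.py flags.

    Assumed pattern (inferred from the examples in this tutorial):
    <output_folder>/APICLIP_<dataset>_<model_name>_<layer_index>/
        <enhance_coe>_<kernel_size>_<interpolate_method_name>_<grayscale>
    """
    run_name = f"APICLIP_{dataset}_{model_name}_{layer_index}"
    setting = f"{enhance_coe}_{kernel_size}_{interpolate_method_name}_{grayscale}"
    return str(PurePosixPath(output_folder) / run_name / setting)


# Settings used for MM-Vet in this tutorial.
print(masked_image_folder("../results", "mmvet", "ViT-L-14-336", 22,
                          1, 3, "BICUBIC", 0))
```

If the real script names its folders differently, use the path that `main.py` actually creates.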

Start a Sglang server.

In [None]:
CUDA_VISIBLE_DEVICES=0 python3 -m sglang.launch_server \
    --model-path liuhaotian/llava-v1.5-13b \
    --tokenizer-path llava-hf/llava-1.5-13b-hf \
    --port 30000
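
Loading a 13B model takes a while, so the server may not accept requests right after the launch command returns control. A minimal sketch for waiting until the port is reachable is shown below. It only checks that a TCP connection can be opened and does not call any Sglang API; the host, timeout, and the helper name `wait_for_port` are assumptions.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=30000, timeout=600.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_port(port=30000)` can be called before starting `sglang_inference.py`. An open port does not guarantee that the model has finished loading, so treat this as a coarse readiness check.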

Run inference by feeding the query and the original image directly into LLaVA1.5. The result will be stored in `../results/llava15_mmvet_empty.json`.

In [None]:
cd .

python sglang_inference.py \
    --dataset mmvet \
    --batch_size 8 \
    --prompt_name empty \
    --image_folder "" \
    --output_folder ../results \
    --exp_name llava15_mmvet_empty \
    --port_value 30000 

Run inference by feeding the query and the API-masked image into LLaVA1.5. The result will be stored in `../results/llava15_mmvet_api.json`.

In [None]:
cd .

python sglang_inference.py \
    --dataset mmvet \
    --batch_size 8 \
    --prompt_name masked \
    --image_folder "../results/APICLIP_mmvet_ViT-L-14-336_22/1_3_BICUBIC_0" \
    --output_folder ../results \
    --exp_name llava15_mmvet_api \
    --port_value 30000 

Modify the output file to meet the format requirements of the MM-Vet online evaluator. 
- `llava15_mmvet_empty.json` → `llava15_mmvet_empty_standard.json`
- `llava15_mmvet_api.json` → `llava15_mmvet_api_standard.json`
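
The conversion above can be sketched as follows. To our understanding, the MM-Vet evaluator expects a single JSON object that maps sample ids such as `v1_0` to the model's answers. The layout of the file written by `sglang_inference.py` is not described in this tutorial, so the sketch assumes a list of records with the hypothetical fields `question_id` and `answer`; adjust `id_key` and `answer_key` to the real field names.

```python
import json


def to_mmvet_format(records, id_key="question_id", answer_key="answer"):
    """Map records to the {"v1_<id>": answer} dictionary used by MM-Vet result files.

    `id_key` and `answer_key` are hypothetical field names, not taken from
    sglang_inference.py.
    """
    result = {}
    for record in records:
        key = str(record[id_key])
        if not key.startswith("v1_"):
            key = f"v1_{key}"
        result[key] = record[answer_key]
    return result


def convert_file(src_path, dst_path):
    """Read a list of records from src_path and write the MM-Vet style JSON."""
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(to_mmvet_format(records), f, indent=2, ensure_ascii=False)


demo = [{"question_id": 0, "answer": "A cat."}, {"question_id": "v1_1", "answer": "Two."}]
print(to_mmvet_format(demo))
```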

Submit the modified files to the [MM-Vet online evaluator](https://huggingface.co/spaces/whyu/MM-Vet_Evaluator) to obtain the final scores. For example, we got

|        | Without API | With API |
|:------:|:-----------:|:--------:|
| MM-Vet |     31.2    |   33.9   |

### API for LLaVA-Wild

Run the following command in your shell to generate the masked images using API. The masked images will be stored in the folder `../results/APICLIP_LLaVA_Wild_ViT-L-14-336_22/10_7_LANCZOS_200`.

In [None]:
cd API/API_CLIP

python main.py \
  --dataset LLaVA_Wild \
  --range 0 100 \
  --model_name ViT-L-14-336 \
  --layer_index 22 \
  --batch_size 8 \
  --output_folder "../results" \
  --interpolate_method_name LANCZOS \
  --enhance_coe 10 \
  --kernel_size 7 \
  --grayscale 200

Start a Sglang server.

In [None]:
CUDA_VISIBLE_DEVICES=0 python3 -m sglang.launch_server \
    --model-path liuhaotian/llava-v1.5-13b \
    --tokenizer-path llava-hf/llava-1.5-13b-hf \
    --port 30000

Run inference by feeding the query and the original image directly into LLaVA1.5. The result will be stored in `../results/llava15_llava_wild_empty.json`.

In [None]:
cd .

python sglang_inference.py \
    --dataset LLaVA_Wild \
    --batch_size 8 \
    --prompt_name empty \
    --image_folder "" \
    --output_folder ../results \
    --exp_name llava15_llava_wild_empty \
    --port_value 30000 

Run inference by feeding the query and the API-masked image into LLaVA1.5. The result will be stored in `../results/llava15_llava_wild_api.json`.

In [None]:
cd .

python sglang_inference.py \
    --dataset LLaVA_Wild \
    --batch_size 8 \
    --prompt_name masked \
    --image_folder "../results/APICLIP_LLaVA_Wild_ViT-L-14-336_22/10_7_LANCZOS_200" \
    --output_folder ../results \
    --exp_name llava15_llava_wild_api \
    --port_value 30000 

Modify the output files to meet the format requirements of the LLaVA evaluation pipeline.
- `llava15_llava_wild_empty.json` → `llava15_llava_wild_empty_standard.jsonl`
- `llava15_llava_wild_api.json` → `llava15_llava_wild_api_standard.jsonl`
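
The conversion above can be sketched as follows. To our understanding, the LLaVA review script reads answer files in JSON Lines form, one object per line with at least a `question_id` and a `text` field. The layout of the file written by `sglang_inference.py` is not described in this tutorial, so the sketch assumes a list of records with the hypothetical fields `question_id` and `answer`; adjust `id_key` and `answer_key` to the real field names.

```python
import json


def to_llava_jsonl_lines(records, id_key="question_id", answer_key="answer"):
    """Turn records into JSON Lines strings with `question_id` and `text` fields.

    `id_key` and `answer_key` are hypothetical field names, not taken from
    sglang_inference.py.
    """
    lines = []
    for record in records:
        row = {"question_id": record[id_key], "text": record[answer_key]}
        lines.append(json.dumps(row, ensure_ascii=False))
    return lines


def convert_file(src_path, dst_path):
    """Read a list of records from src_path and write one JSON object per line."""
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        for line in to_llava_jsonl_lines(records):
            f.write(line + "\n")


demo = [{"question_id": 0, "answer": "A mural of a dog."}]
print("\n".join(to_llava_jsonl_lines(demo)))
```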

Follow the evaluation [pipeline](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#llava-bench-in-the-wild) in the LLaVA repo to obtain the final score. For example, we got

|           | Without API | With API |
|:---------:|:-----------:|:--------:|
|LLaVA-Wild |     67.6    |   71.5   |