Merged
Changes from all commits
113 commits
920b578
Update bach_size for 8x80 GB GPU
anandhu-eng Aug 24, 2025
dc00b25
Update the hash and submodules for v5.0
anandhu-eng Aug 24, 2025
c15c6da
Update run-nvidia.sh
anandhu-eng Aug 24, 2025
51bf543
llama 2 model download for v5.1-dev
anandhu-eng Aug 24, 2025
13e52a3
Update seperator and force update the submodules
anandhu-eng Aug 24, 2025
da09a92
Update gpu all run opts for podman
anandhu-eng Aug 24, 2025
85e48ec
fix indentation
anandhu-eng Aug 24, 2025
2d2fe86
custom -> mlcommons
anandhu-eng Aug 24, 2025
a9b433b
Update meta.yaml
anandhu-eng Aug 24, 2025
1839a49
Test commit
anandhu-eng Sep 1, 2025
e2c28c1
populate ompi paths for trtllm
anandhu-eng Sep 1, 2025
af7859c
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 1, 2025
e88024e
test commit
anandhu-eng Sep 2, 2025
3f22241
Add customizability for tp size
anandhu-eng Sep 3, 2025
72b0931
add pp size parameter
anandhu-eng Sep 3, 2025
bdcb611
Add TP size
anandhu-eng Sep 3, 2025
9505664
Update meta.yaml
anandhu-eng Sep 3, 2025
8dabbe8
Update meta.yaml
anandhu-eng Sep 3, 2025
8730245
reset the python path as it clashes with the torch during build
anandhu-eng Sep 3, 2025
6a7f0ba
update cmake version for building ucxx library
anandhu-eng Sep 3, 2025
4baca94
Initialise git lfs while building trt llm
anandhu-eng Sep 3, 2025
9ed9f66
install git lfs
anandhu-eng Sep 3, 2025
71e671c
Update meta.yaml
anandhu-eng Sep 3, 2025
bffeb6f
add ccache library
anandhu-eng Sep 3, 2025
bc455c3
Add support for ccache
anandhu-eng Sep 3, 2025
3aeac6a
update version re
anandhu-eng Sep 3, 2025
3f056b3
Update meta.yaml
anandhu-eng Sep 3, 2025
6e47413
Update meta.yaml
anandhu-eng Sep 4, 2025
c21a8f0
Add support for PP size
anandhu-eng Sep 4, 2025
18d7496
create default value for tp and pp size
anandhu-eng Sep 4, 2025
28dc01c
Update run.sh
anandhu-eng Sep 4, 2025
92053f9
Update meta.yaml
anandhu-eng Sep 4, 2025
f0b3008
Add gzip extract command
anandhu-eng Sep 4, 2025
78a4cc8
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 4, 2025
215fd3e
Enable proper error handling
anandhu-eng Sep 4, 2025
46e1a63
Add dependency to download the llama2 dataset from mlcommons
anandhu-eng Sep 4, 2025
41c76eb
Update code for dataset linkage
anandhu-eng Sep 4, 2025
2071341
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 4, 2025
84fe118
Provide calibration dataset while quantizing llama2 70b model
anandhu-eng Sep 4, 2025
6948dd0
Update meta.yaml
anandhu-eng Sep 4, 2025
e801dea
update file paths
anandhu-eng Sep 4, 2025
59607ba
Update customize.py
anandhu-eng Sep 4, 2025
2da1447
Update run dir
anandhu-eng Sep 5, 2025
0431120
clean code
anandhu-eng Sep 5, 2025
e6da890
Update customize.py
anandhu-eng Sep 5, 2025
12ecb94
Update customize.py
anandhu-eng Sep 5, 2025
e7ae213
populate number of gpus to app-mlperf-inference-nvidia
anandhu-eng Sep 5, 2025
dc123a8
fix variation name
anandhu-eng Sep 5, 2025
6de0bf0
Add support for model precision
anandhu-eng Sep 5, 2025
1fc2766
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 5, 2025
6fade3c
add float 4 and float8 variation
anandhu-eng Sep 5, 2025
15ec758
Update customize.py
anandhu-eng Sep 5, 2025
7af4f53
Update meta.yaml
anandhu-eng Sep 5, 2025
4f54746
Update customize.py
anandhu-eng Sep 5, 2025
53d7cfd
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 5, 2025
e192a76
fix typo
anandhu-eng Sep 5, 2025
e53b9a8
Update customize.py
anandhu-eng Sep 5, 2025
6528999
Update customize.py
anandhu-eng Sep 5, 2025
27d9030
Update customize.py
anandhu-eng Sep 5, 2025
379ea7a
Update customize.py
anandhu-eng Sep 5, 2025
844450c
changes for testing
anandhu-eng Sep 5, 2025
bb9236c
Add calibration dataset dependency
anandhu-eng Sep 5, 2025
ee0d252
Empty the python path env variable
anandhu-eng Sep 6, 2025
38afc5e
Provide ompi path
anandhu-eng Sep 6, 2025
1adf8cf
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 6, 2025
37fa429
populate PREPROCESSED_DATA_DIR
anandhu-eng Sep 8, 2025
d26e0e4
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 8, 2025
3ad5ecc
Update meta.yaml
anandhu-eng Sep 8, 2025
2ca1e9e
test commit
anandhu-eng Sep 8, 2025
52b81fa
add name for nvidia scratch space to propagate the variations
anandhu-eng Sep 8, 2025
dd2631b
temperory commit
anandhu-eng Sep 8, 2025
bc11773
fix for accuracy check
anandhu-eng Sep 8, 2025
887868f
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 8, 2025
cc9ba90
Update run-nvidia.sh
anandhu-eng Sep 8, 2025
c791e41
Add preprocess step from nvidia
anandhu-eng Sep 9, 2025
c998510
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 9, 2025
6a162f5
skip sys utils if dataset taken from mlc storage
anandhu-eng Sep 9, 2025
2637b95
Merge branch 'anandhu-eng-patch-1' of https://github.com/mlcommons/ml…
anandhu-eng Sep 9, 2025
889ffd6
fix typo
anandhu-eng Sep 9, 2025
1d3157a
fix data path
anandhu-eng Sep 9, 2025
1a296ce
Update env variable
anandhu-eng Sep 9, 2025
3e6bbca
mount calibration dataset
anandhu-eng Sep 9, 2025
9cc3f5a
fix directory issue
anandhu-eng Sep 9, 2025
619734d
Update run.sh
anandhu-eng Sep 9, 2025
a192cd0
clean code
anandhu-eng Sep 9, 2025
11f4dde
revert default value for BUILD_TRTLLM
anandhu-eng Sep 9, 2025
5b30bd3
# -> ;
anandhu-eng Sep 9, 2025
9dcd96b
Update meta.yaml
anandhu-eng Sep 9, 2025
2ca11f8
Update dependency name
anandhu-eng Sep 9, 2025
07c8f3a
Update meta.yaml
anandhu-eng Sep 9, 2025
501e5b8
Update meta.yaml
anandhu-eng Sep 9, 2025
907ffff
Merge branch 'dev' into anandhu-eng-patch-1
anandhu-eng Sep 9, 2025
913cc96
Prevent hardcoding github repo owner
anandhu-eng Sep 9, 2025
f372f5e
Exclude windows-latest runs till wmic error is fixed
anandhu-eng Sep 10, 2025
990aea3
Support usage of hosted quantized llama2 models
anandhu-eng Sep 10, 2025
e6636e8
further polishing of code
anandhu-eng Sep 10, 2025
af1896c
Fix typo
anandhu-eng Sep 10, 2025
e3d0d22
preprocessing needs both calibration and validation files
anandhu-eng Sep 10, 2025
a45b497
fix typo
anandhu-eng Sep 10, 2025
f43bebb
fix typo
anandhu-eng Sep 10, 2025
0e05a53
point symlink one layer deeper to prevent duplicate folder
anandhu-eng Sep 10, 2025
7bc7642
revert to using cp instead of softlink
anandhu-eng Sep 10, 2025
55ab8f4
fix directory creation
anandhu-eng Sep 10, 2025
80faefc
wrap nvidia preprocessing under nvidia variation
anandhu-eng Sep 10, 2025
e39e190
update nvidia variation tags
anandhu-eng Sep 10, 2025
041a950
Updation of tags for calibration dataset
anandhu-eng Sep 10, 2025
db45516
Merge branch 'dev' into anandhu-eng-patch-1
arjunsuresh Sep 10, 2025
dbd4524
Update test-mlc-script-features.yml
arjunsuresh Sep 10, 2025
341ef52
fix typo
anandhu-eng Sep 10, 2025
af40703
turn windows run back on
anandhu-eng Sep 10, 2025
34875eb
fix typo
anandhu-eng Sep 10, 2025
5f1d7ce
[Automated Commit] Format Codebase [skip ci]
github-actions[bot] Sep 10, 2025
2233591
Fix typo
anandhu-eng Sep 10, 2025
66 changes: 56 additions & 10 deletions script/app-mlperf-inference-nvidia/customize.py
@@ -321,15 +321,30 @@ def preprocess(i):
# path to which the data file is present
target_data_path = os.path.join(
env['MLPERF_SCRATCH_PATH'],
'preprocessed_data',
'open_orca')
'data',
'llama2-70b')
# path to the dataset file
target_data_file_path = os.path.join(
env['MLPERF_SCRATCH_PATH'],
'data',
'llama2-70b',
'open_orca_gpt4_tokenized_llama.sampled_24576.pkl')

preprocessed_data_for_accuracy_checker = os.path.join(
env['MLPERF_SCRATCH_PATH'],
'preprocessed_data',
'open_orca',
'open_orca_gpt4_tokenized_llama.sampled_24576.pkl')

if not env.get('LLAMA2_PRE_QUANTIZED_CHECKPOINT_PATH'):
target_calibration_data_file_path = os.path.join(
env['MLPERF_SCRATCH_PATH'],
'data',
'llama2-70b',
'open_orca_gpt4_tokenized_llama.calibration_1000.pkl')

tmp_tp_size = env['MLC_NVIDIA_TP_SIZE']
tmp_pp_size = env['MLC_NVIDIA_PP_SIZE']
if tmp_tp_size == "1":
fp8_model_path = os.path.join(
env['MLPERF_SCRATCH_PATH'],
@@ -343,15 +358,35 @@ def preprocess(i):
'models',
'Llama2',
'fp8-quantized-ammo',
f'llama2-70b-chat-hf-tp{tmp_tp_size}pp1-fp8')
f'llama2-70b-chat-hf-tp{tmp_tp_size}pp{tmp_pp_size}-fp8')

# check the presence of validation dataset
if not os.path.exists(target_data_file_path):
if env.get('MLC_NVIDIA_LLAMA_DATASET_FILE_PATH', '') == '':
if env.get('MLC_DATASET_OPENORCA_PREPROCESSED_PATH', '') == '':
return {
'return': 1, 'error': 'Please specify the path to LLAMA2 dataset (pickle file)'}
'return': 1, 'error': 'Llama2 70B validation dataset not present.'}
if not os.path.exists(target_data_path):
cmds.append(f"mkdir {target_data_path}")
cmds.append(f"mkdir -p {target_data_path}")
cmds.append(
f"ln -sf {env['MLC_NVIDIA_LLAMA_DATASET_FILE_PATH']} {target_data_file_path}")
f"ln -sf {env['MLC_DATASET_OPENORCA_PREPROCESSED_PATH']} {target_data_file_path}")

# check the presence of calibration dataset
if not env.get('LLAMA2_PRE_QUANTIZED_CHECKPOINT_PATH'):
if not os.path.exists(target_calibration_data_file_path):
if env.get('MLC_DATASET_OPENORCA_CALIBRATION_PATH', '') == '':
return {
'return': 1, 'error': 'Llama2 70B calibration dataset not present.'}
if not os.path.exists(target_data_path):
cmds.append(f"mkdir -p {target_data_path}")
cmds.append(
f"ln -sf {env['MLC_DATASET_OPENORCA_CALIBRATION_PATH']} {target_calibration_data_file_path}")

if not os.path.exists(preprocessed_data_for_accuracy_checker):
cmds.append(
f"mkdir -p {os.path.dirname(preprocessed_data_for_accuracy_checker)}")
cmds.append(
f"ln -sf {env['MLC_DATASET_OPENORCA_PREPROCESSED_PATH']} {preprocessed_data_for_accuracy_checker}")

model_name = "llama2-70b"
model_path = fp8_model_path
@@ -550,6 +585,11 @@ def preprocess(i):
if gpu_inference_streams:
run_config += f" --gpu_inference_streams={gpu_inference_streams}"

model_precision = env.get(
'MLC_MLPERF_MODEL_PRECISION', '').replace('float', 'fp')
if model_precision:
run_config += f" --precision={model_precision}"

dla_copy_streams = env.get(
'MLC_MLPERF_NVIDIA_HARNESS_DLA_COPY_STREAMS')
if dla_copy_streams:
@@ -688,8 +728,12 @@ def preprocess(i):
run_config += f" --use_fp8"

if "llama2" in env["MLC_MODEL"]:
run_config += f" --fp8_quant_model_path={fp8_model_path}"
run_config += f" --tensor_parallelism={tmp_tp_size}"
run_config += f" --checkpoint_dir={fp8_model_path}"
if env.get('MLC_MLPERF_INFERENCE_POST_5_0'):
run_config += f" --trtllm_build_flags=tensor_parallelism:{tmp_tp_size},pipeline_parallelism:{tmp_pp_size}"
else:
run_config += f" --tensor_parallelism={tmp_tp_size}"
run_config += f" --pipeline_parallelism={tmp_pp_size}"

enable_sort = env.get('MLC_MLPERF_NVIDIA_HARNESS_ENABLE_SORT')
if enable_sort and not is_false(enable_sort):
@@ -757,9 +801,11 @@ def preprocess(i):
hpcx_paths.append("/opt/hpcx/ucx/lib")
if os.path.exists("/opt/hpcx/ucc/lib"):
hpcx_paths.append("/opt/hpcx/ucc/lib")
if os.path.exists("/opt/hpcx/ompi/lib"):
hpcx_paths.append("/opt/hpcx/ompi/lib")

env['+LD_LIBRARY_PATH'] = hpcx_paths + env['+LD_LIBRARY_PATH']

env['+PYTHONPATH'] = []
# print(env)

return {'return': 0}
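The dataset-staging logic added in this hunk repeats one pattern three times: check whether the file already sits in the MLPerf scratch layout, and if not, queue `mkdir -p` and `ln -sf` shell commands pointing at the host-side path from the environment. A standalone sketch of that pattern (the env keys come from the diff; the helper, paths, and usage are hypothetical, not the harness code itself):

```python
import os

def stage_dataset(env, cmds, src_key, target_dir, target_file):
    """If target_file is missing, queue shell commands that create
    target_dir and symlink the host-side dataset into place.
    Returns an error dict when the source env variable is unset."""
    if os.path.exists(target_file):
        return None  # already staged in the scratch space
    src = env.get(src_key, '')
    if src == '':
        return {'return': 1,
                'error': f'{src_key} not set and {target_file} is missing'}
    if not os.path.exists(target_dir):
        cmds.append(f"mkdir -p {target_dir}")
    cmds.append(f"ln -sf {src} {target_file}")
    return None

# Hypothetical usage mirroring the validation-dataset branch:
env = {'MLC_DATASET_OPENORCA_PREPROCESSED_PATH': '/host/open_orca.pkl'}
cmds = []
err = stage_dataset(env, cmds,
                    'MLC_DATASET_OPENORCA_PREPROCESSED_PATH',
                    '/nonexistent-scratch/data/llama2-70b',
                    '/nonexistent-scratch/data/llama2-70b/open_orca.pkl')
print(err, cmds)
```

Queuing commands instead of running them keeps the preprocess step side-effect free; the harness executes the accumulated `cmds` later in one shell invocation.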
10 changes: 8 additions & 2 deletions script/app-mlperf-inference-nvidia/meta.yaml
@@ -355,6 +355,8 @@ variations:
group: batchsize-format-change
v5.0+:
group: batchsize-format-change
env:
MLC_MLPERF_INFERENCE_POST_5_0: "yes"
v5.0:
base:
- v5.0+
@@ -1279,13 +1281,17 @@ variations:
MLC_MLPERF_NVIDIA_HARNESS_NUM_SORT_SEGMENTS: '2'
MLC_MLPERF_NVIDIA_HARNESS_SKIP_POSTPROCESS: True

gpu_memory.80,pre5.0,num-gpus.2,llama2-70b,offline,run_harness:
gpu_memory.80,pre5.0,num-gpus.2,llama2-70b_,offline,run_harness:
default_variations:
batch-size: batch_size.896

gpu_memory.80,v5.0+,num-gpus.2,llama2-70b,offline,run_harness:
gpu_memory.80,v5.0+,num-gpus.2,llama2-70b_,offline,run_harness:
default_variations:
batch-size: batch_size."llama2-70b:1024"

gpu_memory.80,v5.0+,num-gpus.8,llama2-70b_,offline,run_harness:
default_variations:
batch-size: batch_size."llama2-70b:4096"

gpu_memory.16,pre5.0,gptj_,offline,run_harness:
default_variations:
90 changes: 87 additions & 3 deletions script/app-mlperf-inference/meta.yaml
@@ -28,6 +28,8 @@ default_env:
MLC_TEST_QUERY_COUNT: '10'
MLC_MLPERF_QUANTIZATION: off
MLC_GET_PLATFORM_DETAILS: no
MLC_NVIDIA_TP_SIZE: "2"
MLC_NVIDIA_PP_SIZE: "1"

env:
MLC_MLPERF_PRINT_SUMMARY: "no"
@@ -62,8 +64,8 @@ input_mapping:
readme: MLC_MLPERF_README
debug: MLC_DEBUG_SCRIPT_BENCHMARK_PROGRAM
gpu_name: MLC_NVIDIA_GPU_NAME
nvidia_llama2_dataset_file_path: MLC_NVIDIA_LLAMA_DATASET_FILE_PATH
tp_size: MLC_NVIDIA_TP_SIZE
pp_size: MLC_NVIDIA_PP_SIZE
use_dataset_from_host: MLC_USE_DATASET_FROM_HOST

predeps: False
@@ -324,9 +326,21 @@ variations:
MLC_MLPERF_NVIDIA_SKIP_GPTJ:
- "yes"
- tags: get,ml-model,llama2-70b,_nvidia,_fp8
names:
- llama2-model
update_tags_from_env_with_prefix:
_tp-size.:
- MLC_NVIDIA_TP_SIZE
_pp-size.:
- MLC_NVIDIA_PP_SIZE
skip_if_env:
MLC_MLPERF_NVIDIA_SKIP_LLAMA2_70B:
- "yes"
- tags: get,dataset,preprocessed,openorca,_calibration,_mlcommons,_nvidia
skip_if_env:
MLC_MLPERF_NVIDIA_SKIP_LLAMA2_70B:
- "yes"
- tags: get,dataset,preprocessed,openorca,_validation,_mlcommons,_nvidia
skip_if_env:
MLC_MLPERF_NVIDIA_SKIP_LLAMA2_70B:
- "yes"
@@ -505,29 +519,65 @@ variations:
image_name: mlperf-inference-nvidia-v4.1-dev-llm
deps:
- tags: get,ml-model,llama2-70b,_nvidia,_fp8
names:
- llama2-model
update_tags_from_env_with_prefix:
_tp-size.:
- MLC_NVIDIA_TP_SIZE
_pp-size.:
- MLC_NVIDIA_PP_SIZE
- tags: get,dataset,preprocessed,openorca,_calibration,_mlcommons,_nvidia
- tags: get,dataset,preprocessed,openorca,_validation,_mlcommons,_nvidia
env:
BUILD_TRTLLM: 1

nvidia-original,r4.1_default,llama2-70b_:
docker:
deps:
- tags: get,ml-model,llama2-70b,_nvidia,_fp8
names:
- llama2-model
update_tags_from_env_with_prefix:
_tp-size.:
- MLC_NVIDIA_TP_SIZE
_pp-size.:
- MLC_NVIDIA_PP_SIZE
- tags: get,dataset,preprocessed,openorca,_calibration,_mlcommons,_nvidia
- tags: get,dataset,preprocessed,openorca,_validation,_mlcommons,_nvidia
env:
BUILD_TRTLLM: 1

nvidia-original,r5.0_default,llama2-70b_:
docker:
deps:
- tags: get,ml-model,llama2-70b,_nvidia,_fp8
names:
- llama2-model
update_tags_from_env_with_prefix:
_tp-size.:
- MLC_NVIDIA_TP_SIZE
_pp-size.:
- MLC_NVIDIA_PP_SIZE
- tags: get,dataset,preprocessed,openorca,_calibration,_mlcommons,_nvidia
- tags: get,dataset,preprocessed,openorca,_validation,_mlcommons,_nvidia

nvidia-original,r5.1-dev_default,llama2-70b_:
default_variations:
precision: float8
docker:
deps:
- tags: get,ml-model,llama2-70b,_nvidia,_fp8,_v5.0
names:
- llama2-model
update_tags_from_env_with_prefix:
_tp-size.:
- MLC_NVIDIA_TP_SIZE
_pp-size.:
- MLC_NVIDIA_PP_SIZE
- tags: get,dataset,preprocessed,openorca,_calibration,_mlcommons,_nvidia
- tags: get,dataset,preprocessed,openorca,_validation,_mlcommons,_nvidia
env:
BUILD_TRTLLM: 1

nvidia-original:
docker:
@@ -594,6 +644,8 @@ variations:
update_tags_from_env_with_prefix:
"_gpu_memory." :
- MLC_NVIDIA_GPU_MEMORY
"_num-gpus.":
- MLC_CUDA_NUM_DEVICES
update_tags_from_env:
- MLC_NVIDIA_HARNESS_GPU_VARIATION

@@ -1293,6 +1345,16 @@ variations:
MLC_USE_MODEL_FROM_HOST:
- 'yes'
tags: get,ml-model,llama2
names:
- llama2-model
- tags: get,dataset,preprocessed,openorca,_calibration,_mlcommons
enable_if_any_env:
MLC_USE_DATASET_FROM_HOST:
- 'yes'
- tags: get,dataset,preprocessed,openorca,_validation,_mlcommons
enable_if_any_env:
MLC_USE_DATASET_FROM_HOST:
- 'yes'

llama2-70b_,amd:
docker:
Expand All @@ -1306,6 +1368,8 @@ variations:
MLC_USE_MODEL_FROM_HOST:
- 'yes'
tags: get,ml-model,llama2,_amd,_pytorch
names:
- llama2-model

mixtral-8x7b:
group:
Expand Down Expand Up @@ -1830,6 +1894,12 @@ variations:
fp32:
alias: float32

fp4:
alias: float4

fp8:
alias: float8

float32:
group: precision
default: true
@@ -1842,6 +1912,16 @@ variations:
kilt-harness:
tags: _fp32

float4:
group: precision
env:
MLC_MLPERF_MODEL_PRECISION: float4

float8:
group: precision
env:
MLC_MLPERF_MODEL_PRECISION: float8

float16:
group: precision
env:
@@ -2128,10 +2208,10 @@ variations:
reproducibility
add_deps_recursive:
nvidia-inference-common-code:
tags: _custom,_v5.1-dev
tags: _mlcommons,_v5.1-dev
nvidia-inference-server:
version: r5.0
tags: _custom
tags: _mlcommons
nvidia-harness:
tags: _v5.0
intel-harness:
@@ -2285,6 +2365,9 @@ docker:
- "${{ GPTJ_CHECKPOINT_PATH }}:${{ GPTJ_CHECKPOINT_PATH }}"
- "${{ MLC_CRITEO_PREPROCESSED_PATH }}:${{ MLC_CRITEO_PREPROCESSED_PATH }}"
- "${{ LLAMA2_CHECKPOINT_PATH }}:${{ LLAMA2_CHECKPOINT_PATH }}"
- "${{ LLAMA2_PRE_QUANTIZED_CHECKPOINT_PATH }}:${{ LLAMA2_PRE_QUANTIZED_CHECKPOINT_PATH }}"
- "${{ MLC_DATASET_OPENORCA_PREPROCESSED_PATH }}:${{ MLC_DATASET_OPENORCA_PREPROCESSED_PATH }}"
- "${{ MLC_DATASET_OPENORCA_CALIBRATION_PATH }}:${{ MLC_DATASET_OPENORCA_CALIBRATION_PATH }}"
- "${{ MLC_NVIDIA_LLAMA_DATASET_FILE_PATH }}:${{ MLC_NVIDIA_LLAMA_DATASET_FILE_PATH }}"
- "${{ SDXL_CHECKPOINT_PATH }}:${{ SDXL_CHECKPOINT_PATH }}"
- "${{ MLC_DATASET_KITS19_PREPROCESSED_PATH }}:${{ MLC_DATASET_KITS19_PREPROCESSED_PATH }}"
@@ -2314,3 +2397,4 @@ docker:
intel_gptj_int8_model_path: MLC_MLPERF_INFERENCE_INTEL_GPTJ_INT8_MODEL_PATH
nvidia_llama2_dataset_file_path: MLC_NVIDIA_LLAMA_DATASET_FILE_PATH
tp_size: MLC_NVIDIA_TP_SIZE
pp_size: MLC_NVIDIA_PP_SIZE
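This meta.yaml adds `fp4`/`fp8` as aliases for the `float4`/`float8` precision variations, which set `MLC_MLPERF_MODEL_PRECISION`; customize.py then rewrites that value with `.replace('float', 'fp')` before passing `--precision` to the harness. The two steps compose like this (a hedged sketch; the variation names and env value come from the diff, the helper function is hypothetical):

```python
# Alias table mirroring the meta.yaml precision variations.
ALIASES = {'fp4': 'float4', 'fp8': 'float8', 'fp32': 'float32'}

def harness_precision(variation):
    """Resolve a precision alias to its canonical variation name, then
    convert it to the spelling used for --precision ('float8' -> 'fp8')."""
    canonical = ALIASES.get(variation, variation)
    return canonical.replace('float', 'fp')

print(harness_precision('float8'))  # fp8
print(harness_precision('fp4'))     # fp4
```

Keeping `float*` as the canonical variation names while the harness consumes the shorter `fp*` spelling lets the MLC-side tags stay consistent with the other precision groups (`float16`, `float32`) already in the file.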
12 changes: 12 additions & 0 deletions script/build-mlperf-inference-server-nvidia/customize.py
@@ -1,6 +1,7 @@
from mlc import utils
import os
import shutil
from utils import *


def preprocess(i):
@@ -18,6 +19,15 @@ def preprocess(i):
env['+LIBRARY_PATH'].append(os.path.join(
env['MLC_TENSORRT_INSTALL_PATH'], "lib"))

if is_true(env.get('BUILD_TRTLLM')):
hpcx_paths = []
if os.path.exists("/opt/hpcx/ucx/lib"):
hpcx_paths.append("/opt/hpcx/ucx/lib")
if os.path.exists("/opt/hpcx/ucc/lib"):
hpcx_paths.append("/opt/hpcx/ucc/lib")
if os.path.exists("/opt/hpcx/ompi/lib"):
hpcx_paths.append("/opt/hpcx/ompi/lib")

cxxflags = [
"-Wno-error=switch",
"-DDALI_1_15=1",
@@ -38,6 +48,8 @@ def preprocess(i):
env['+ CXXFLAGS'] = []

env['+ CXXFLAGS'] += cxxflags
env['+LD_LIBRARY_PATH'] = hpcx_paths + env['+LD_LIBRARY_PATH']
env['+PYTHONPATH'] = []
return {'return': 0}
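Both customize.py files now probe the same three HPC-X component directories and prepend whichever exist to `LD_LIBRARY_PATH`. A standalone sketch of that probe (the `/opt/hpcx` layout is the assumption the PR itself makes; the helper name and demo tree are hypothetical):

```python
import os
import tempfile

def hpcx_library_paths(root="/opt/hpcx"):
    """Return the ucx/ucc/ompi lib dirs under root that actually exist,
    in the order they are prepended to LD_LIBRARY_PATH."""
    candidates = (os.path.join(root, comp, "lib")
                  for comp in ("ucx", "ucc", "ompi"))
    return [p for p in candidates if os.path.exists(p)]

# Example against a throwaway tree with only the ucx component present:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "ucx", "lib"))
print(hpcx_library_paths(root))  # only <root>/ucx/lib
```

Probing with `os.path.exists` rather than hardcoding all three paths keeps the script working on images where only a subset of the HPC-X components is installed.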

