Skip to content

Conversation

zheliuyu
Copy link
Contributor

@zheliuyu zheliuyu commented Sep 16, 2025

What does this PR do?

Add support for npu kernelize.

Checklist Before Starting

  • Search for similar PRs.
  • Test layers kernelize.
  • Ruff lint the code.

Test

On GPU: pytest -v test_layer.py

========================================================================================================== test session starts ==========================================================================================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /root/miniconda3/bin/python
cachedir: .pytest_cache
rootdir: /root/kernels
configfile: pytest.ini
plugins: anyio-4.6.2.post1
collected 24 items                                                                                                                                                                                                                      

test_layer.py::test_arg_kinds PASSED                                                                                                                                                                                              [  4%]
test_layer.py::test_hub_forward[SiluAndMulWithKernel] PASSED                                                                                                                                                                      [  8%]
test_layer.py::test_hub_forward[SiluAndMulStringDevice] PASSED                                                                                                                                                                    [ 12%]
test_layer.py::test_hub_forward_rocm SKIPPED (skipping ROCm-only test on host without ROCm)                                                                                                                                       [ 16%]
test_layer.py::test_hub_forward_xpu SKIPPED (skipping XPU-only test on host without XPU)                                                                                                                                          [ 20%]
test_layer.py::test_hub_forward_npu SKIPPED (skipping NPU-only test on host without NPU)                                                                                                                                          [ 25%]
test_layer.py::test_rocm_kernel_mapping PASSED                                                                                                                                                                                    [ 29%]
test_layer.py::test_capability PASSED                                                                                                                                                                                             [ 33%]
test_layer.py::test_layer_fallback_works PASSED                                                                                                                                                                                   [ 37%]
test_layer.py::test_local_layer_repo PASSED                                                                                                                                                                                       [ 41%]
test_layer.py::test_torch_compile_layer_without_fallback[cuda-SiluAndMulWithKernel] PASSED                                                                                                                                        [ 45%]
test_layer.py::test_torch_compile_layer_without_fallback[cuda-SiluAndMulNoCompileKernel] PASSED                                                                                                                                   [ 50%]
test_layer.py::test_torch_compile_layer_with_fallback[cuda-SiluAndMulWithKernel] PASSED                                                                                                                                           [ 54%]
test_layer.py::test_torch_compile_layer_with_fallback[cuda-SiluAndMulNoCompileKernel] PASSED                                                                                                                                      [ 58%]
test_layer.py::test_mapping_contexts PASSED                                                                                                                                                                                       [ 62%]
test_layer.py::test_validate_kernel_layer PASSED                                                                                                                                                                                  [ 66%]
test_layer.py::test_invalid_mode_for_mapping_rejected PASSED                                                                                                                                                                      [ 70%]
test_layer.py::test_kernel_modes PASSED                                                                                                                                                                                           [ 75%]
test_layer.py::test_fallback_used_when_training PASSED                                                                                                                                                                            [ 79%]
test_layer.py::test_invalid_mode_rejected PASSED                                                                                                                                                                                  [ 83%]
test_layer.py::test_kernel_modes_inference PASSED                                                                                                                                                                                 [ 87%]
test_layer.py::test_kernel_modes_mixed PASSED                                                                                                                                                                                     [ 91%]
test_layer.py::test_kernel_modes_cross_fallback PASSED                                                                                                                                                                            [ 95%]
test_layer.py::test_layer_versions PASSED                                                                                                                                                                                         [100%]

=========================================================================================================== warnings summary ============================================================================================================
tests/test_layer.py::test_layer_fallback_works
  /root/kernels/src/kernels/layer.py:918: UserWarning: 
  No kernel mapping found for layer `SiluAndMulNonExisting`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================================== 21 passed, 3 skipped, 1 warning in 7.84s ================================================================================================

On NPU: pytest -v test_layer.py

================================================================================================================================================= test session starts =================================================================================================================================================
platform linux -- Python 3.10.8, pytest-8.3.2, pluggy-1.6.0 -- /root/miniconda3/bin/python
cachedir: .pytest_cache
rootdir: /root/test_kernels/kernels-main
configfile: pytest.ini
plugins: xdist-3.6.1, anyio-4.9.0
collected 24 items                                                                                                                                                                                                                                                                                                    

test_layer.py::test_arg_kinds PASSED                                                                                                                                                                                                                                                                            [  4%]
test_layer.py::test_hub_forward[SiluAndMulWithKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                    [  8%]
test_layer.py::test_hub_forward[SiluAndMulStringDevice] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                  [ 12%]
test_layer.py::test_hub_forward_rocm SKIPPED (skipping ROCm-only test on host without ROCm)                                                                                                                                                                                                                     [ 16%]
test_layer.py::test_hub_forward_xpu SKIPPED (skipping XPU-only test on host without XPU)                                                                                                                                                                                                                        [ 20%]
test_layer.py::test_hub_forward_npu PASSED                                                                                                                                                                                                                                                                      [ 25%]
test_layer.py::test_rocm_kernel_mapping SKIPPED (Skip on npu devices)                                                                                                                                                                                                                                           [ 29%]
test_layer.py::test_capability SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                                           [ 33%]
test_layer.py::test_layer_fallback_works PASSED                                                                                                                                                                                                                                                                 [ 37%]
test_layer.py::test_local_layer_repo PASSED                                                                                                                                                                                                                                                                     [ 41%]
test_layer.py::test_torch_compile_layer_without_fallback[cuda-SiluAndMulWithKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                      [ 45%]
test_layer.py::test_torch_compile_layer_without_fallback[cuda-SiluAndMulNoCompileKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                 [ 50%]
test_layer.py::test_torch_compile_layer_with_fallback[cuda-SiluAndMulWithKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                         [ 54%]
test_layer.py::test_torch_compile_layer_with_fallback[cuda-SiluAndMulNoCompileKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                    [ 58%]
test_layer.py::test_mapping_contexts SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                                     [ 62%]
test_layer.py::test_validate_kernel_layer PASSED                                                                                                                                                                                                                                                                [ 66%]
test_layer.py::test_invalid_mode_for_mapping_rejected SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                    [ 70%]
test_layer.py::test_kernel_modes SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                                         [ 75%]
test_layer.py::test_fallback_used_when_training SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                          [ 79%]
test_layer.py::test_invalid_mode_rejected PASSED                                                                                                                                                                                                                                                                [ 83%]
test_layer.py::test_kernel_modes_inference SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                               [ 87%]
test_layer.py::test_kernel_modes_mixed SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                                   [ 91%]
test_layer.py::test_kernel_modes_cross_fallback SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                                                          [ 95%]
test_layer.py::test_layer_versions PASSED                                                                                                                                                                                                                                                                       [100%]

================================================================================================================================================== warnings summary ===================================================================================================================================================
tests/test_layer.py::test_layer_fallback_works
  /root/test_kernels/kernels-main/src/kernels/layer.py:918: UserWarning: 
  No kernel mapping found for layer `SiluAndMulNonExisting`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================================================================================== 7 passed, 17 skipped, 1 warning in 11.58s ======================================================================================================================================

@zheliuyu zheliuyu marked this pull request as ready for review September 17, 2025 03:20
@zheliuyu
Copy link
Contributor Author

@danieldk Hi, we have added support for npu to the kernels.

This pr is ready for review. Thanks!

@danieldk
Copy link
Member

Thanks for the PR! Are you also planning to add support to kernel-builder (since it's the best way to build compliant kernels)?

Copy link
Member

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bunch of suggestions, also some questions about compatibility between versions, etc.

@zheliuyu zheliuyu force-pushed the main branch 3 times, most recently from 3d5866a to 357d531 Compare September 18, 2025 12:15
@zheliuyu
Copy link
Contributor Author

zheliuyu commented Sep 18, 2025

Thanks for the PR! Are you also planning to add support to kernel-builder (since it's the best way to build compliant kernels)?

Thanks for asking! Yes, we are actively working on adding npu support to kernel-builder.

@zheliuyu zheliuyu force-pushed the main branch 3 times, most recently from 166b775 to 6a58157 Compare September 19, 2025 06:55
Co-authored-by: Daniël de Kok <me@danieldk.eu>
@zheliuyu
Copy link
Contributor Author

On NPU: pytest -v ./kernels/tests/

=========================================================================================================================== test session starts ============================================================================================================================
platform linux -- Python 3.10.8, pytest-8.3.2, pluggy-1.6.0 -- /root/miniconda3/bin/python
cachedir: .pytest_cache
rootdir: /root/test_kernels/kernels
configfile: pytest.ini
plugins: xdist-3.6.1, anyio-4.9.0
collected 73 items                                                                                                                                                                                                                                                         

kernels/tests/test_basic.py::test_gelu_fast SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                   [  1%]
kernels/tests/test_basic.py::test_local_kernel SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                [  2%]
kernels/tests/test_basic.py::test_local_kernel_path_types SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                     [  4%]
kernels/tests/test_basic.py::test_relu_metal[dtype0] SKIPPED (skipping macOS-only test on non-macOS platform)                                                                                                                                                        [  5%]
kernels/tests/test_basic.py::test_relu_metal[dtype1] SKIPPED (skipping macOS-only test on non-macOS platform)                                                                                                                                                        [  6%]
kernels/tests/test_basic.py::test_has_kernel[kernel_exists0] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                  [  8%]
kernels/tests/test_basic.py::test_has_kernel[kernel_exists1] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                  [  9%]
kernels/tests/test_basic.py::test_has_kernel[kernel_exists2] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                  [ 10%]
kernels/tests/test_basic.py::test_has_kernel[kernel_exists3] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                  [ 12%]
kernels/tests/test_basic.py::test_version PASSED                                                                                                                                                                                                                     [ 13%]
kernels/tests/test_basic.py::test_universal_kernel SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                            [ 15%]
kernels/tests/test_benchmarks.py::test_gelu_small SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                             [ 16%]
kernels/tests/test_benchmarks.py::test_gelu_medium SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                            [ 17%]
kernels/tests/test_benchmarks.py::test_gelu_large SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                             [ 19%]
kernels/tests/test_doctest.py::test_func_docstring[get_kernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                [ 20%]
kernels/tests/test_doctest.py::test_func_docstring[get_local_kernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                          [ 21%]
kernels/tests/test_doctest.py::test_func_docstring[get_locked_kernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                         [ 23%]
kernels/tests/test_doctest.py::test_func_docstring[has_kernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                [ 24%]
kernels/tests/test_doctest.py::test_func_docstring[install_kernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                            [ 26%]
kernels/tests/test_doctest.py::test_func_docstring[kernelize] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                 [ 27%]
kernels/tests/test_doctest.py::test_func_docstring[load_kernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                               [ 28%]
kernels/tests/test_doctest.py::test_func_docstring[register_kernel_mapping] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                   [ 30%]
kernels/tests/test_doctest.py::test_func_docstring[replace_kernel_forward_from_hub] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                           [ 31%]
kernels/tests/test_doctest.py::test_func_docstring[use_kernel_forward_from_hub] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                               [ 32%]
kernels/tests/test_doctest.py::test_func_docstring[use_kernel_mapping] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                        [ 34%]
kernels/tests/test_doctest.py::test_class_docstring[CUDAProperties] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                           [ 35%]
kernels/tests/test_doctest.py::test_class_docstring[Device] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                   [ 36%]
kernels/tests/test_doctest.py::test_class_docstring[LayerRepository] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                          [ 38%]
kernels/tests/test_doctest.py::test_class_docstring[LocalLayerRepository] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                     [ 39%]
kernels/tests/test_doctest.py::test_class_docstring[LockedLayerRepository] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                    [ 41%]
kernels/tests/test_doctest.py::test_class_docstring[Mode] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                     [ 42%]
kernels/tests/test_doctest.py::test_member_docstring[CUDAProperties] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                          [ 43%]
kernels/tests/test_doctest.py::test_member_docstring[Device] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                  [ 45%]
kernels/tests/test_doctest.py::test_member_docstring[LayerRepository] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                         [ 46%]
kernels/tests/test_doctest.py::test_member_docstring[LocalLayerRepository] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                    [ 47%]
kernels/tests/test_interval_tree.py::test_find_smallest_interval_match_with_multiple_overlaps PASSED                                                                                                                                                                 [ 49%]
kernels/tests/test_interval_tree.py::test_find_single_match PASSED                                                                                                                                                                                                   [ 50%]
kernels/tests/test_interval_tree.py::test_no_match_outside_all_ranges PASSED                                                                                                                                                                                         [ 52%]
kernels/tests/test_interval_tree.py::test_no_match_in_gap_between_ranges PASSED                                                                                                                                                                                      [ 53%]
kernels/tests/test_interval_tree.py::test_boundary_conditions_start_and_end PASSED                                                                                                                                                                                   [ 54%]
kernels/tests/test_interval_tree.py::test_empty_tree PASSED                                                                                                                                                                                                          [ 56%]
kernels/tests/test_interval_tree.py::test_multiple_equally_specific_matches PASSED                                                                                                                                                                                   [ 57%]
kernels/tests/test_interval_tree.py::test_property_based_interval_tree PASSED                                                                                                                                                                                        [ 58%]
kernels/tests/test_interval_tree.py::test_property_based_edge_cases PASSED                                                                                                                                                                                           [ 60%]
kernels/tests/test_interval_tree.py::test_unique_intervals_override PASSED                                                                                                                                                                                           [ 61%]
kernels/tests/test_kernel_locking.py::test_download_all_hash_validation PASSED                                                                                                                                                                                       [ 63%]
kernels/tests/test_kernel_locking.py::test_load_locked SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                        [ 64%]
kernels/tests/test_kernel_locking.py::test_layer_locked SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                       [ 65%]
kernels/tests/test_kernel_upload.py::test_kernel_upload_deletes_as_expected SKIPPED (need --token option to run this test)                                                                                                                                           [ 67%]
kernels/tests/test_layer.py::test_arg_kinds PASSED                                                                                                                                                                                                                   [ 68%]
kernels/tests/test_layer.py::test_hub_forward[SiluAndMulWithKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                           [ 69%]
kernels/tests/test_layer.py::test_hub_forward[SiluAndMulStringDevice] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                         [ 71%]
kernels/tests/test_layer.py::test_hub_forward_rocm SKIPPED (skipping ROCm-only test on host without ROCm)                                                                                                                                                            [ 72%]
kernels/tests/test_layer.py::test_hub_forward_xpu SKIPPED (skipping XPU-only test on host without XPU)                                                                                                                                                               [ 73%]
kernels/tests/test_layer.py::test_hub_forward_npu PASSED                                                                                                                                                                                                             [ 75%]
kernels/tests/test_layer.py::test_rocm_kernel_mapping SKIPPED (Skip on npu devices)                                                                                                                                                                                  [ 76%]
kernels/tests/test_layer.py::test_capability SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                  [ 78%]
kernels/tests/test_layer.py::test_layer_fallback_works PASSED                                                                                                                                                                                                        [ 79%]
kernels/tests/test_layer.py::test_local_layer_repo PASSED                                                                                                                                                                                                            [ 80%]
kernels/tests/test_layer.py::test_torch_compile_layer_without_fallback[cuda-SiluAndMulWithKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                             [ 82%]
kernels/tests/test_layer.py::test_torch_compile_layer_without_fallback[cuda-SiluAndMulNoCompileKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                        [ 83%]
kernels/tests/test_layer.py::test_torch_compile_layer_with_fallback[cuda-SiluAndMulWithKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                [ 84%]
kernels/tests/test_layer.py::test_torch_compile_layer_with_fallback[cuda-SiluAndMulNoCompileKernel] SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                           [ 86%]
kernels/tests/test_layer.py::test_mapping_contexts SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                            [ 87%]
kernels/tests/test_layer.py::test_validate_kernel_layer PASSED                                                                                                                                                                                                       [ 89%]
kernels/tests/test_layer.py::test_invalid_mode_for_mapping_rejected SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                           [ 90%]
kernels/tests/test_layer.py::test_kernel_modes SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                                [ 91%]
kernels/tests/test_layer.py::test_fallback_used_when_training SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                 [ 93%]
kernels/tests/test_layer.py::test_invalid_mode_rejected PASSED                                                                                                                                                                                                       [ 94%]
kernels/tests/test_layer.py::test_kernel_modes_inference SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                      [ 95%]
kernels/tests/test_layer.py::test_kernel_modes_mixed SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                          [ 97%]
kernels/tests/test_layer.py::test_kernel_modes_cross_fallback SKIPPED (skipping CUDA-only test on host without CUDA)                                                                                                                                                 [ 98%]
kernels/tests/test_layer.py::test_layer_versions PASSED                                                                                                                                                                                                              [100%]

============================================================================================================================= warnings summary =============================================================================================================================
tests/test_layer.py::test_layer_fallback_works
  /root/test_kernels/kernels/src/kernels/layer.py:918: UserWarning: 
  No kernel mapping found for layer `SiluAndMulNonExisting`. Check if the layer name matches one of the kernels in the mapping or add the kernel you want to use to the mapping. Defaulting to original forward implementation.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================ 19 passed, 54 skipped, 1 warning in 17.58s ================================================================================================================

@zheliuyu
Copy link
Contributor Author

@danieldk
All review comments have been addressed, and the code and tests have been updated. Could you please take another look? Thank you very much.

@danieldk
Copy link
Member

Thanks for asking! Yes, we are actively working on adding npu support to kernel-builder

Nice! 🚀

I'll try to do another review pass today or Monday.

@zheliuyu
Copy link
Contributor Author

zheliuyu commented Sep 22, 2025

Supplement test by transformers

script:

from transformers import AutoModelForCausalLM, AutoTokenizer
import logging


logging.basicConfig(level=logging.DEBUG)

model_name = "Qwen/Qwen3-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    use_kernels=True,
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

output:

INFO:root:Using layer `RMSNorm` from repo `kernels-ext-npu/RMSNorm` (revision: main), layer `RMSNorm`
DEBUG:root:kernelize mode: Mode.INFERENCE, repo mode: Mode.FALLBACK

=========repeat x N=============

INFO:root:Using layer `RMSNorm` from repo `kernels-ext-npu/RMSNorm` (revision: main), layer `RMSNorm`
DEBUG:root:kernelize mode: Mode.INFERENCE, repo mode: Mode.FALLBACK

thinking content: <think>
Okay, the user wants a short introduction to a large language model. Let me start by recalling what a large language model is. It's a type of artificial intelligence that can understand and generate text, right? So I should mention its ability to process and generate text. 

I need to keep it concise. Maybe start with a definition, then highlight key features like understanding and generating text. Also, mention its applications, like language translation or customer service. 

Wait, should I include something about how it's trained? Like using massive datasets and deep learning. That's important for explaining its capabilities. Also, maybe touch on its versatility in different languages. 

Avoid jargon. Keep it simple. Let me check if I'm covering all aspects: definition, key features, applications, training methods, and versatility. That should cover it. Alright, time to put it all together in a friendly, informative way.
</think>
content: A large language model (LLM) is an AI system designed to understand and generate human language. It can comprehend context, understand diverse languages, and create text, making it useful for tasks like language translation, content creation, and customer support. Developed through massive datasets and deep learning, these models are versatile and capable of adapting to various tasks.

danieldk added a commit that referenced this pull request Sep 23, 2025
This change add support for Huawei Ascend NPUs. This is #146 with some formatting/typing fixes.

Co-authored-by: zheliuyu <15750543867@163.com>
Copy link
Member

@danieldk danieldk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! I have merged this through #155 (adds some formatting/typing fixes to pass CI).

@danieldk danieldk closed this Sep 23, 2025
@zheliuyu zheliuyu changed the title Add support for npu kernelize [ merged through #155] Add support for npu kernelize Sep 23, 2025
@zheliuyu zheliuyu changed the title [ merged through #155] Add support for npu kernelize [Merged through #155] Add support for npu kernelize Sep 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants