forked from ggml-org/llama.cpp
    
        
        - 
                Notifications
    You must be signed in to change notification settings 
- Fork 0
merge from zhouwg ggml-hexagon #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Merged
      
        
      
            l3utterfly
  merged 186 commits into
  l3utterfly:ggml-hexagon
from
jeffzhou2000:pr_to_upstream
  
      
      
   
  May 23, 2025 
      
    
                
     Merged
            
            merge from zhouwg ggml-hexagon #66
                    l3utterfly
  merged 186 commits into
  l3utterfly:ggml-hexagon
from
jeffzhou2000:pr_to_upstream
  
      
      
   
  May 23, 2025 
              
            Conversation
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
    * Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1 * Fixed lldb attach * Simplify by having the child do ggml_print_backtrace_symbols
ggml-ci
* wip llama 4 conversion * rm redundant __init__ * fix conversion * fix conversion * test impl * try this * reshape patch_embeddings_0 * fix view * rm ffn_post_norm * cgraph ok * f32 for pos embd * add image marker tokens * Llama4UnfoldConvolution * correct pixel shuffle * fix merge conflicts * correct * add debug_graph * logits matched, but it still preceives the image incorrectly * fix style * add image_grid_pinpoints * handle llama 4 preprocessing * rm load_image_size * rm unused line * fix * small fix 2 * add test & docs * fix llava-1.6 test * test: add notion of huge models * add comment * add warn about degraded quality
* sycl: reviewing and updating docs * Updates Runtime error codes * Improves OOM troubleshooting entry * Added a llama 3 sample * Updated supported models * Updated releases table
…rg#13482) * Remove mmap workaround on windows After some testing I found that mmap is supported on windows and for many GPUs on Linux. Therefore I remove the workaround for windows since it is not necessary. * Update llama-bench README SYCL backend introduced a workaround that allows execution of llama-bench also without specifying `--mmp 0` flag
* Update CANN model support status * Update of model support * update * update * update * fix format of CANN.md * fix format of CANN.md * fix format of CANN.md
* kv-cache : prepare for SWA ggml-ci * kv-cache : initial iSWA implementation ggml-ci * kv-cache : rework error recovery logic ggml-ci * models : fix Phi-3 SWA parameters ggml-ci * model : adjust Granite to rope factor changes ggml-ci * server : check if context can do shifts ggml-ci * iswa : for now, always enable shifts (experiment) ggml-ci * kv-cache : simplify SWA logic ggml-ci * kv-cache : apply defrag when we fail to find slots for the batch ggml-ci * llama : update docs about llama_decode ggml-ci * kv-cache : update warning logs when no space for the batch is available ggml-ci * llama : add llama_kv_self_seq_pos_min() * kv-cache : keep track of partial SWA computes and print warnings * server : disallow use cases involving partial SWA context ggml-ci * llama : add param to control SWA cache size ggml-ci * minor : clean-up ggml-ci
…to fix infinity values in output (ggml-org#13639)
* CUDA: skip fully masked-out KV in FA vec kernel
* Update mtmd-helper.cpp * Update tools/mtmd/mtmd-helper.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* small fixes * remove ifdef
…ITY op to accelerate D2D memory copy (ggml-org#13647) * musa: fix build warning (unused parameter) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: upgrade MUSA SDK version to rc4.0.1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/cpy.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* model : disable SWA for Phi models ggml-ci * model : update warning message * model : print warning only if n_swa > 0 * model : fix typo
* kv-cache : simplify the interface ggml-ci * context : revert llama_batch_allocr position change ggml-ci
* server : fix first message identification When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message. Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> * server : Fix checks for first role message for stream=True Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> --------- Signed-off-by: Dorin Geman <dorin.geman@docker.com> Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
* Add the endpoints /api/tags and /api/chat Add the endpoints /api/tags and /api/chat, and improved the model metadata response * Remove trailing whitespaces * Removed code that is not needed for copilot to work.
* ggml : add ggml_gelu_na (not approximated) * fix naming order * rename na --> erf * apply review suggesions * revert naming order
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* switch retrieval to llama_encode * enable --no-warmup for retrieval
…orks in a standard Android APP)
  
    Sign up for free
    to join this conversation on GitHub.
    Already have an account?
    Sign in to comment
  
      Labels
      
    Apple Metal
  
    build
  
    devops
  
    documentation
  Improvements or additions to documentation 
  
    examples
  
    ggml
  
    Nvidia GPU
  
    python
  
    script
  
    server
  
    SYCL
  
    testing
  
    Vulkan
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
No description provided.