🐛 Bug
👋 I've been diving deeper into the freezing issue noted in #1379 and am creating a new issue with better details and steps to reproduce.
When I load a llama-2-7b model (any of the default configured quantizations) and prompt the model with a large input:
- Android OS freezes and becomes unresponsive
- System application crashes and restarts itself
- The output of the LLM starts appearing in the MLC-Chat app after the OS becomes responsive again.
Additional information: the prompt is ~2700 characters, which gets truncated (`The prompt tokens are more than max_window_size, the input will be truncated.`). The truncation is not the root cause, but the large prompt makes the freeze easier to reproduce.
When timing the execution of the different functions in llm_chat.cc, I can see that the underlying issue comes from one call:

```
SampleTokenFromLogits executed in 65189.915079 ms
```
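For context, here is a minimal sketch of the kind of scope-based stopwatch that produces timings like the one above (hypothetical, not the exact instrumentation used):

```cpp
// Hypothetical instrumentation sketch: a scope-based stopwatch that prints
// the wall-clock time of the enclosing scope when it is destroyed.
#include <chrono>
#include <iostream>

struct ScopedTimer {
  explicit ScopedTimer(const char* name)
      : name_(name), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    auto end = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start_).count();
    std::cout << name_ << " executed in " << ms << " ms\n";
  }
  const char* name_;
  std::chrono::steady_clock::time_point start_;
};

int main() {
  // Placed at the top of a function such as SampleTokenFromLogits, this
  // prints the elapsed time when the function returns.
  ScopedTimer timer("SampleTokenFromLogits");
  // ... body of the function being timed ...
  return 0;
}
```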
Digging deeper into where it is being held up, the time is spent in:
```cpp
void UpdateLogitsOrProbOnCPUSync(NDArray logits_or_prob) {
  if (!logits_on_cpu_.defined()) {
    logits_on_cpu_ = logits_or_prob.CopyTo(DLDevice{kDLCPU, 0});
  } else {
    ICHECK_EQ(logits_on_cpu_->shape[0], logits_or_prob->shape[0])
        << "Expect size of logits remain unchanged";
    logits_on_cpu_.CopyFrom(logits_or_prob);
  }
  TVMSynchronize(device_.device_type, device_.device_id, nullptr);
}
```

When the logits are not yet on the CPU, the following line is the last one reached before everything halts:
```cpp
logits_on_cpu_ = logits_or_prob.CopyTo(DLDevice{kDLCPU, 0});
```
Will continue to look more into what the above line does, but any 👀 are highly appreciated!
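In case it helps with debugging: one hypothetical diagnostic (not part of llm_chat.cc) would be to time the device-to-host copy and the explicit synchronize separately. Since GPU kernel launches are typically asynchronous, the copy may simply be the first call that has to wait for all previously queued GPU work (e.g. the whole prefill) to finish, which would explain why this particular line appears to halt.

```cpp
// Hypothetical diagnostic: time the copy and the synchronize separately to
// see which call actually blocks. TimedCopyToCPU is not an existing function.
#include <chrono>
#include <iostream>
#include <tvm/runtime/c_runtime_api.h>
#include <tvm/runtime/ndarray.h>

using tvm::runtime::NDArray;

void TimedCopyToCPU(NDArray logits_or_prob, DLDevice device) {
  auto now = [] { return std::chrono::steady_clock::now(); };
  auto ms = [](auto a, auto b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
  };

  auto t0 = now();
  // Same call as in UpdateLogitsOrProbOnCPUSync.
  NDArray logits_on_cpu = logits_or_prob.CopyTo(DLDevice{kDLCPU, 0});
  auto t1 = now();
  TVMSynchronize(device.device_type, device.device_id, nullptr);
  auto t2 = now();

  std::cout << "CopyTo:         " << ms(t0, t1) << " ms\n"
            << "TVMSynchronize: " << ms(t1, t2) << " ms\n";
}
```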