Inference was killed due to memory (100 GB was used) #2116

Notice: To help resolve issues more efficiently, please raise issues following the template and include the relevant details.

## ❓ Questions and Help

### Before asking:
1. search the issues.
2. search the docs.



#### What is your question?
The audio file is 15 hours long. When I run inference with the code below, the process is killed after running out of memory (about 100 GB was in use). Is this behavior expected, or could it be a memory leak?
![image](https://github.com/user-attachments/assets/ad616442-5102-4c8b-900f-677dc3271991)
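
For scale, here is a rough estimate of how much memory the decoded audio alone would need. It assumes mono 16 kHz audio held as one float32 array; both the sample rate and the in-memory layout are assumptions, since I have not checked how FunASR loads the file. `waveform_bytes` is a hypothetical helper written only for this estimate.

```python
def waveform_bytes(hours: float, sample_rate: int = 16_000, bytes_per_sample: int = 4) -> int:
    """Size in bytes of a mono waveform held in memory as a single array.

    Assumptions: 16 kHz sample rate and 4-byte (float32) samples.
    """
    seconds = hours * 3600
    return int(seconds * sample_rate * bytes_per_sample)


size = waveform_bytes(15)
print(f"{size / 1e9:.2f} GB")  # 3.46 GB
```

Under those assumptions the raw waveform is about 3.5 GB, so the audio buffer by itself would not account for 100 GB of usage.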


#### Code

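```python3
from funasr import AutoModel

# audio_path, batch_size_s, and hotword are defined elsewhere in my script;
# audio_path points to the 15-hour recording.
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
)

res = model.generate(input=audio_path, batch_size_s=batch_size_s, hotword=hotword)
```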

#### What have you tried?

I have tried shorter audio files, and inference on those completes normally.
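
To tell normal growth apart from a leak, one option is to record how much the peak memory of the process grows for inputs of increasing length. The sketch below uses only the standard library `resource` module; `peak_rss_mb` and `measure` are hypothetical helper names, and the unit conversion assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import resource


def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB.

    Assumes Linux, where ru_maxrss is in kilobytes (macOS reports bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def measure(fn, *args, **kwargs):
    """Call fn and return (result, growth of peak RSS in MB during the call)."""
    before = peak_rss_mb()
    result = fn(*args, **kwargs)
    after = peak_rss_mb()
    # Peak RSS never decreases, so the difference is always >= 0.
    return result, after - before


# Intended use with the model from this issue (not executed here):
# res, grew_mb = measure(model.generate, input=audio_path,
#                        batch_size_s=batch_size_s, hotword=hotword)
# print(f"peak RSS grew by {grew_mb:.0f} MB")
```

If the growth scales roughly linearly with audio duration, the usage is more likely inherent to processing one long file; growth that is much faster than linear would point toward a leak or an expensive step that depends on total length.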

#### What's your environment?

 - OS (e.g., Linux): Ubuntu 22.04
 - FunASR Version (e.g., 1.0.0): 4294d2166ebcf560e9e2ccb5c8454fa4973f061d 
 - ModelScope Version (e.g., 1.11.0): 1.18.1
 - PyTorch Version (e.g., 2.0.0): 2.4.1 (relevant `pip list` entries below)
 ```
 pytorch-wpe              0.0.1
 torch                    2.4.1
 torch-complex            0.4.4
 torchaudio               2.4.1
 torchvision              0.19.1
 ```
 - How you installed funasr (`pip`, source): from source (git checkout), with `pip install -e .`
 - Python version: 3.10.14
 - GPU (e.g., V100M32): NVIDIA GeForce RTX 4090 (24 GB)
```shell
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
| 30%   49C    P2             220W / 450W |  10265MiB / 24564MiB |     75%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
```
