Patching code unsuitable for batch inference

Hello! The code that runs the entropy model when computing patch boundaries appears to have a bug.

I ran into the issue while testing a slight modification of `demo.py` for batch inference. I essentially just changed the line:
```
prompts = [prompt]
```
to
```
prompts = [prompt] * 10
```

I then ran `python3 demo.py "A BLT has"` after downloading the usual HF weights. Greedy decoding is enabled by default, so all 10 identical prompts should produce identical outputs, but I ended up with 10 very different generations.

The issue seems to originate in [this function](https://github.com/facebookresearch/blt/blob/4ae7a625940743c9438a20ee8a8d7fab898bcd69/bytelatent/data/patcher.py#L63) in `patcher.py`, specifically in these lines:
```
        max_length = getattr(entropy_model, "max_length", 8192)
        batch_numel = max_length * patching_batch_size
        splits = torch.split(tokens.flatten(), batch_numel)
```
The code does not appear to respect boundaries between sequences in a batch, except perhaps in a "soft" way via EOS tokens, and will flatten different sequences into the same chunk. Since `entropy_model.max_length` is not defined by default, the fallback of `8192` applies, and in this case all of the sequences land in the same chunk with nothing to separate them. As a result, the entropy model sees each repetition of the prompt with all earlier repetitions as context and assigns it lower entropy scores, so sequences later in the batch end up with larger and larger patches.

This is obviously a silly example, but in general one could imagine repeated instructions during batch inference being unreasonably squashed into gigantic patches.
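
To make the chunking behaviour concrete, here is a minimal plain-Python sketch of what I believe the flatten-then-split step does to a batch. It only emulates `tokens.flatten()` followed by `torch.split(..., batch_numel)` with lists; the helper `flatten_and_split`, the toy byte-level tokenization (no BOS/EOS), and `patching_batch_size = 1` are my own assumptions for illustration, not code from the repo.

```python
def flatten_and_split(batch, chunk_size):
    """Emulate tokens.flatten() followed by torch.split(flat, chunk_size)."""
    # Flattening concatenates the rows, so sequence boundaries are lost here.
    flat = [tok for row in batch for tok in row]
    return [flat[i:i + chunk_size] for i in range(0, len(flat), chunk_size)]


prompt = list(b"A BLT has")   # toy tokenization: raw byte values only
batch = [prompt] * 3          # the same prompt repeated, as in the modified demo.py

max_length = 8192             # fallback when entropy_model has no max_length
patching_batch_size = 1       # hypothetical value, chosen for illustration
batch_numel = max_length * patching_batch_size

chunks = flatten_and_split(batch, batch_numel)

# All three copies of the prompt end up back to back in a single chunk,
# so the second and third copies are scored with the earlier ones as context.
print(len(chunks), len(chunks[0]))
```

With these toy values the three 9-token rows fit into one chunk of 27 tokens, which is the situation I think the entropy model is put in.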

Is this unintentional (the shape annotations in this function match the actual code, so I'm unsure), and were the HF weights trained this way? This sort of thing makes more sense during training, when subsequent documents are on average less closely related to each other, but even there, this could change the distribution of patch sizes to one not seen even during single-sequence inference. Happy to write a PR once this is clarified.
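
For reference, one possible direction for such a PR, sketched in plain Python under the assumption that each sequence should be scored on its own: split every row into windows separately, so no window ever spans two sequences. The helper name `split_per_sequence` is hypothetical, and this ignores padding and how the windows would be batched for the entropy model.

```python
def split_per_sequence(batch, max_length):
    """Split each row into windows of at most max_length tokens.

    Returns (row_index, window) pairs so results can be mapped back
    to the sequence they came from. Windows never cross row boundaries.
    """
    windows = []
    for row_idx, row in enumerate(batch):
        for start in range(0, len(row), max_length):
            windows.append((row_idx, row[start:start + max_length]))
    return windows


# Tiny example: two sequences of different lengths, window size 3.
example = split_per_sequence([[1, 2, 3, 4, 5], [6, 7]], max_length=3)
print(example)
```

Whether this matches how the HF weights were trained is exactly the open question above, so this is only meant to show the intended boundary handling.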


Patching code unsuitable for batch inference #123
