No "batched" Inference #170

Open
michaelfeil opened this issue Jul 24, 2024 · 4 comments
Labels: bug (Something isn't working), priority-p1, triage

michaelfeil commented Jul 24, 2024

I noticed that a number of things here are incorrectly implemented.

classifier = pipeline("sentiment-analysis", device="cpu",
                model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")

def is_positive_dialogue_ending(file) -> bool:
    dialogue_ending = file.read()[-512:] # 512 chars != 512 tokens
    return classifier(dialogue_ending)[0]["label"] == "POSITIVE" # NOT performing batched inference, performs threaded, blocking single inference -> really bad once you are on GPU.

To perform batched inference, you would need to pass multiple sentences at once, with a batch size > 1, to the sentence-classification pipeline.

classifier = pipeline(
    "sentiment-analysis",
    device="cpu",
    # ideally the tokenizer should truncate from the LEFT, so the dialogue
    # *ending* survives truncation; I think this is not possible to set
    # directly on the pipeline
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

def are_positive_dialogue_endings(files: list) -> list[bool]:
    dialogue_endings = [file.read() for file in files]  # something like this
    return [result["label"] == "POSITIVE" for result in classifier(dialogue_endings)]  # something like this
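Incidentally, left-side truncation does seem achievable on the tokenizer itself rather than on the pipeline; this is an assumption about the transformers API (recent versions expose a `truncation_side` attribute on tokenizers). A minimal sketch:

```python
def build_left_truncating_classifier(
    model_name: str = "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
):
    """Build a sentiment pipeline whose tokenizer truncates from the LEFT,
    so the *end* of a long dialogue survives truncation.
    (Assumption: transformers >= 4.15, which supports `truncation_side`.)"""
    from transformers import AutoTokenizer, pipeline  # imported lazily

    tokenizer = AutoTokenizer.from_pretrained(model_name, truncation_side="left")
    return pipeline(
        "sentiment-analysis", model=model_name, tokenizer=tokenizer, device="cpu"
    )


def are_positive_dialogue_endings(classifier, texts: list) -> list:
    # one batched, token-level-truncated call instead of N single-item calls
    results = classifier(texts, truncation=True, batch_size=8)
    return [r["label"] == "POSITIVE" for r in results]


if __name__ == "__main__":
    clf = build_left_truncating_classifier()
    print(are_positive_dialogue_endings(clf, ["What a lovely ending!", "Everyone died."]))
```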

As a result, I launched https://github.com/michaelfeil/embed and https://github.com/michaelfeil/infinity for sentence classification. The backend queues and batches incoming requests, which allows more efficient batched execution. This is nice on CPU, but crucial on GPU!

from embed import BatchedInference
from concurrent.futures import Future

# Run any model
register = BatchedInference(
    model_id=[
        # classification models
        "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    ],
    # engine: `torch` or `optimum`
    engine="torch",
    # device: `cuda` (Nvidia/AMD) or `cpu`
    device="cpu",
)


def is_positive_dialogue_ending(file) -> bool:
    """Multiprocessing is not recommended (neither is it for transformers); threading is encouraged."""
    dialogue_ending = file.read()[-512:]  # 512 chars != 512 tokens
    future: "Future" = register.classify(
        model_id="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
        sentences=[dialogue_ending],
    )
    # best: defer resolving the future to a later stage
    return future.result()[0]["label"] == "POSITIVE"
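For intuition, the queue-and-batch pattern behind such a backend can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not the actual infinity/embed implementation:

```python
import queue
import threading
from concurrent.futures import Future


class MicroBatcher:
    """Toy sketch of queue-and-batch inference: callers submit single items
    and get Futures back; a worker thread drains the queue and runs the
    model once per batch. Error handling is omitted for brevity."""

    def __init__(self, batch_fn, max_batch_size: int = 8):
        self._batch_fn = batch_fn
        self._max = max_batch_size
        self._q: "queue.Queue" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item) -> Future:
        fut: Future = Future()
        self._q.put((item, fut))
        return fut

    def _run(self):
        while True:
            batch = [self._q.get()]            # block until one item arrives
            while len(batch) < self._max:      # greedily drain whatever is queued
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            results = self._batch_fn([item for item, _ in batch])  # ONE batched call
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)
```

Many concurrent `submit()` calls then collapse into a few batched model calls, which is exactly what a GPU wants.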
@shcheklein shcheklein added bug Something isn't working triage priority-p1 labels Jul 24, 2024
volkfox (Contributor) commented Jul 24, 2024

need to fix batching ASAP for many similar HuggingFace models to work

dmpetrov (Member) commented
This is related to batch_map #84. I prioritized this one.

@dmpetrov changed the title from “Inefficency - no "batched" Inference, and some major” to “No "batched" Inference, and some major” Jul 25, 2024
@dmpetrov changed the title from “No "batched" Inference, and some major” to “No "batched" Inference” Jul 25, 2024
shcheklein (Member) commented
@dberenbaum can this be closed now?

dberenbaum (Collaborator) commented
See the note from #191:

I think we should keep open #170. That request seems to be specifically about using futures to batch individual results using the existing .map without needing a separate .batch_map(). I think .batch_map() may be both simpler to implement and explain for now, but I think we could come back to the ideas in #170 in the future.
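For contrast, the `.batch_map()` idea can be sketched as an eager chunk-and-flatten helper. The signature here is hypothetical; the real DataChain API may differ:

```python
def batch_map(items: list, fn, batch_size: int = 8) -> list:
    """Hypothetical batch_map: call `fn` once per chunk of up to
    `batch_size` items and flatten the per-batch results, preserving order."""
    out = []
    for start in range(0, len(items), batch_size):
        out.extend(fn(items[start : start + batch_size]))
    return out
```

Compared with the futures approach in #170, this is simpler to implement and explain, but it batches only within one explicit call rather than across concurrent callers.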
