google/siglip-so400m-patch14-384 inference output mismatch with pipeline output #30951
I found that the pipeline uses a prompt template, which prepends "This is a photo of " to every candidate label. For example, when we pass in the label "2 cats", the pipeline converts it to "This is a photo of 2 cats". We can modify the manual approach to match this behaviour, and we'll get the same results:

```python
from PIL import Image
import requests
import torch
from transformers import AutoProcessor, AutoModel

model = AutoModel.from_pretrained("google/siglip-so400m-patch14-384")
processor = AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

labels = ["2 cats", "2 dogs"]
texts = [f"this is a photo of {label}" for label in labels]
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits_per_image = outputs.logits_per_image
probs = torch.sigmoid(logits_per_image)  # these are the probabilities
print(f"{probs[0][0]:.2%} that '{texts[0]}'")  # 50.89% that 'this is a photo of 2 cats'
```
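To make the prompting behaviour concrete, here is a minimal sketch of how the zero-shot-image-classification pipeline expands candidate labels into full sentences before tokenization. The template string `"This is a photo of {}."` is assumed here to be the pipeline's default `hypothesis_template` (it can be overridden via the pipeline's `hypothesis_template` argument):

```python
# Sketch: how the zero-shot-image-classification pipeline turns candidate
# labels into full sentences before tokenization.
# Assumes the default hypothesis_template "This is a photo of {}.".
hypothesis_template = "This is a photo of {}."
labels = ["2 cats", "2 dogs"]
texts = [hypothesis_template.format(label) for label in labels]
print(texts)  # ['This is a photo of 2 cats.', 'This is a photo of 2 dogs.']
```

Passing the already-templated sentences to the processor manually, as in the snippet above, reproduces what the pipeline does internally.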
Wow, that requires some docs...
Hi @aliencaocao, thanks for raising this issue! This is in the docs, though it isn't super obvious. If you'd like to open a PR to update the example for the pipeline to highlight this, I'd be very happy to review. @NielsRogge Could you fix the example on the siglip page, as you'll have permissions to open and merge the PR there?
Sure, I will do it soon. Thanks for pointing that out.
PR made.
System Info
transformers version: 4.41.0
- distributed_type: NO
- mixed_precision: fp16
- use_cpu: False
- debug: False
- num_processes: 1
- machine_rank: 0
- num_machines: 1
- gpu_ids: all
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- dynamo_config: {'dynamo_backend': 'INDUCTOR'}
Who can help?
@amyeroberts
Information

Tasks

- examples folder (such as GLUE/SQuAD, ...)

Reproduction
Using sample code from https://huggingface.co/google/siglip-so400m-patch14-384:
The output mismatches the pipeline approach's, and the difference is massive. The pipeline approach seems to give the right results, which also align with the Inference API.
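One thing that makes the raw numbers easy to misread: SigLIP applies a sigmoid to each image–text logit independently (unlike CLIP's softmax over all labels), so the per-label probabilities need not sum to 1. A small pure-Python sketch with hypothetical logit values:

```python
import math

def sigmoid(x: float) -> float:
    """Elementwise sigmoid, as SigLIP applies to each image-text logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for two candidate texts against one image.
logits = [0.4, -2.0]
probs = [sigmoid(z) for z in logits]  # each in (0, 1), independent of the others
print(sum(probs))  # need not equal 1.0, unlike a softmax over the labels
```

This is why a SigLIP probability around 50% for the best-matching label can still be the "right" answer, whereas a CLIP-style softmax would force the scores to compete.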
Expected behavior
Correct results from the manual approach.