Hello, in the inference.py you shared in #14, I see that the multi-modal input tokens for the LLM include a bbox token. However, I can't find where the bbox token is replaced, or where the image features obtained from CLIP and interpolation are used. Could you explain how this works? Thank you.