
Vision models support (WIP) #457

Merged: 12 commits merged into n3d1117:main on Dec 10, 2023

Conversation

gilcu3 (Contributor) commented Nov 8, 2023

This PR also depends on #453. It adds support for the current vision model from OpenAI. Feel free to try it and let me know if anything breaks.

Alpha162 commented Nov 9, 2023

I've been having good success with it so far, except that unlike the vision experience on chat.openai.com, it doesn't seem to have persistence when sending images. So if I ask it to describe a photo with, say, a car in it, the response is exactly as expected, but then a follow-on question like 'what colour is the car' fails, with it not knowing what I'm referencing.

I have the bot in a few group chats, and although the trigger is working so the bot doesn't respond to everything, in order for it to respond to images in the chat I've set 'IGNORE_GROUP_VISION=false'. It still honours the trigger for standard text queries, but it responds to every image sent in the chat without a trigger.

Amazing work getting it to this state so quickly, thank you :)
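For reference, the intended gating described above could look like the sketch below; the flag name comes from this thread, while the helper and its defaults are illustrative assumptions, not the bot's actual code.

    import os

    # Assumed default: images in groups are ignored unless explicitly enabled.
    IGNORE_GROUP_VISION = os.environ.get("IGNORE_GROUP_VISION", "true").lower() == "true"

    def should_handle_group_image(is_group_chat: bool, trigger_matched: bool) -> bool:
        if not is_group_chat:
            return True   # private chats are always handled
        if IGNORE_GROUP_VISION:
            return False  # vision disabled in group chats
        return trigger_matched  # honour the trigger, exactly as for text queries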

gilcu3 (Contributor, Author) commented Nov 9, 2023

@Alpha162 could you try again? I tried fixing both issues with the previous commits. Thanks for reporting

Alpha162

> @Alpha162 could you try again? I tried fixing both issues with the previous commits. Thanks for reporting

Knocked both issues out of the park, no issues 👍

rokipet commented Nov 12, 2023

I'm getting this error, how do I fix it?

2023-11-12 18:02:12,913 - root - ERROR - OpenAIHelper.interpret_image() got multiple values for argument 'prompt'
Traceback (most recent call last):
File "C:\Users\Administrator\Downloads\Bot Updated\chatgpt-telegram-bot-086f8447376b3faa27631bfe13b654fd54223757\bot\telegram_bot.py", line 514, in _execute
interpretation, tokens = await self.openai.interpret_image(chat_id, temp_file_png, prompt=prompt)

gilcu3 (Contributor, Author) commented Nov 12, 2023

> I'm getting this error, how do I fix it? 2023-11-12 18:02:12,913 - root - ERROR - OpenAIHelper.interpret_image() got multiple values for argument 'prompt'
> Traceback (most recent call last):
>   File "C:\Users\Administrator\Downloads\Bot Updated\chatgpt-telegram-bot-086f8447376b3faa27631bfe13b654fd54223757\bot\telegram_bot.py", line 514, in _execute
>     interpretation, tokens = await self.openai.interpret_image(chat_id, temp_file_png, prompt=prompt)

The only way I can think of that this error could happen is if the code calling that function has been changed somehow. How did you get there? Are you using an unmodified version of this branch?
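For context, this TypeError is Python's standard complaint when a call binds the same parameter both positionally and by keyword. A minimal, hypothetical repro, not the bot's actual code:

    class OpenAIHelper:
        async def interpret_image(self, chat_id, fileobj, prompt=None):
            """Signature sketch only; body omitted."""

    # A caller changed to pass the prompt positionally as well as by keyword,
    #     await helper.interpret_image(chat_id, temp_file_png, prompt, prompt=prompt)
    # binds `prompt` twice and raises:
    #     TypeError: OpenAIHelper.interpret_image() got multiple values for argument 'prompt'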

rokipet commented Nov 13, 2023

I tried to combine both codes, TTS and vision, and it is working; the only error, when I upload an image, is that one.

gilcu3 (Contributor, Author) commented Nov 13, 2023

> I tried to combine both codes, TTS and vision, and it is working; the only error, when I upload an image, is that one.

If that's the case, try using the develop branch in my fork, it has everything integrated and works for me.

rokipet commented Nov 14, 2023

Would you be able to add this?

https://platform.openai.com/docs/assistants/how-it-works
And add / all the kind of model right away?

gilcu3 (Contributor, Author) commented Nov 14, 2023

> Would you be able to add this?
>
> https://platform.openai.com/docs/assistants/how-it-works

Adding the Assistants API would certainly be a good feature. Still, that would be for another PR. For now I am waiting for @n3d1117 to handle all the new PRs first.

> And add / all the kind of model right away?

What do you mean by that?

SkySlider commented Nov 14, 2023

I have an issue when providing a custom instruction with the image (in one message):

 - root - ERROR - Can't parse entities: can't find end of the entity starting at byte offset 1045
Traceback (most recent call last):
  File "/root/chatgpt-telegram-bot/bot/telegram_bot.py", line 532, in _execute
    await update.effective_message.reply_text(
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_message.py", line 1074, in reply_text
    return await self.get_bot().send_message(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/ext/_extbot.py", line 2633, in send_messag
e
    return await super().send_message(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 381, in decorator
    result = await func(self, *args, **kwargs)  # skipcq: PYL-E1102
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 807, in send_message
    return await self._send_message(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/ext/_extbot.py", line 507, in _send_message
    result = await super()._send_message(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 559, in _send_message
    result = await self._post(
             ^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 469, in _post
    return await self._do_post(
           ^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/ext/_extbot.py", line 325, in _do_post
    return await super()._do_post(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/_bot.py", line 497, in _do_post
    return await request.post(
           ^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/request/_baserequest.py", line 168, in post
    result = await self._request_wrapper(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/chatgpt-telegram-bot/venv/lib/python3.11/site-packages/telegram/request/_baserequest.py", line 328, in _request_wrapper
    raise BadRequest(message)
telegram.error.BadRequest: Can't parse entities: can't find end of the entity starting at byte offset 1045

There are no issues when sending an image without a prompt/description. What I was trying to do was to create HTML code based on the layout pictured.

gilcu3 (Contributor, Author) commented Nov 14, 2023

> There are no issues when sending an image without a prompt/description. What I was trying to do was to create HTML code based on the layout pictured.

Luckily you mentioned the construction of HTML code, and I was able to reproduce the issue; I guess it is related to functions. I will try to fix it and get back here.

gilcu3 (Contributor, Author) commented Nov 14, 2023

@SkySlider The error should be fixed now. The culprit was that if the response from ChatGPT is bigger than vision_max_tokens, the message gets cut, possibly leaving Markdown syntax open in the middle, in which case Telegram fails to parse it. I solved it by following the behavior of the bot elsewhere: if a message fails, try to send it again without formatting.
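A minimal sketch of that fallback with python-telegram-bot (the bot's actual helper may differ):

    from telegram import Update
    from telegram.constants import ParseMode
    from telegram.error import BadRequest

    async def reply_with_fallback(update: Update, text: str) -> None:
        # Try Markdown first; a reply cut off mid-entity makes Telegram raise
        # BadRequest, in which case the same text is resent unformatted.
        try:
            await update.effective_message.reply_text(text, parse_mode=ParseMode.MARKDOWN)
        except BadRequest:
            await update.effective_message.reply_text(text)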

SkySlider

> @SkySlider The error should be fixed now. The culprit was that if the response from ChatGPT is bigger than vision_max_tokens, the message gets cut, possibly leaving Markdown syntax open in the middle, in which case Telegram fails to parse it. I solved it by following the behavior of the bot elsewhere: if a message fails, try to send it again without formatting.

All good now, appreciate it!

n3d1117 (Owner) commented Nov 16, 2023

This looks great, thanks @gilcu3! I will be testing #453, #456, #457 and #462 as soon as possible!

n3d1117 (Owner) commented Nov 18, 2023

Hi @gilcu3, #453 and #456 have been merged! 🎉
This one and #462 require some conflicts to be resolved in order to align them with the main branch.
I will take a look at them tomorrow (unless you feel like resolving them first). Thanks again!

gilcu3 (Contributor, Author) commented Nov 18, 2023

> Hi @gilcu3, #453 and #456 have been merged! 🎉 This one and #462 require some conflicts to be resolved in order to align them with the main branch. I will take a look at them tomorrow (unless you feel like resolving them first). Thanks again!

Thanks, I already merged the changes here. The only change made is that for interpreting images it now does not take the user-configured model, but the only model that can currently do this.

n3d1117 (Owner) commented Nov 19, 2023

Hi @gilcu3, I took some time to test this and I'm really liking it so far, thanks!
One question: could we support the auto option for fidelity of image understanding and set it as the default? I see you only added low and high; I'm guessing due to difficulties in counting tokens with the auto option?

gilcu3 (Contributor, Author) commented Nov 19, 2023

> Hi @gilcu3, I took some time to test this and I'm really liking it so far, thanks! One question: could we support the auto option for fidelity of image understanding and set it as the default? I see you only added low and high; I'm guessing due to difficulties in counting tokens with the auto option?

Hi, I really don't remember seeing that parameter before. I guess that if it is mentioned in the response, we could do the token counting easily. I will test and see if that's the case.
PS: Checked; the response does not say which detail parameter was used. One thing I did notice, though, is that the number of tokens is included in the response object, so I think we could do something much better, and the same could be done for the rest of the bot. I think this was added recently.

Things that are not yet supported, but could very well be: streaming the response, and adding the image itself to the conversation history (this seems to be what OpenAI considers appropriate).
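For illustration, the usage object mentioned above can be read straight off the chat completions response; a sketch using the openai Python client (function and variable names are assumptions):

    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    async def interpret(messages: list, max_tokens: int):
        response = await client.chat.completions.create(
            model="gpt-4-vision-preview",
            messages=messages,
            max_tokens=max_tokens,
        )
        # The API reports its own token counts, so no client-side estimate
        # (and no detail-dependent image-token arithmetic) is needed.
        return response.choices[0].message.content, response.usage.total_tokens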

k3it (Contributor) commented Nov 19, 2023

FWIW this pull is working beautifully for me. I really like that the image comment is part of the prompt. Good work!

gilcu3 (Contributor, Author) commented Nov 19, 2023

@n3d1117 after the last two commits, I am no longer doing the token count myself, so the default is now auto.

One thing that we may discuss later is the following: currently the image is not added to the history. This makes it possible to use other models that do support functions, and to use the vision model just for interpreting a single image. But then no follow-up questions about the images are possible, which is probably a nice feature of the vision model. The other variant is to use the model specified by the user for everything, and simply let the user know if it tries to do something the current model cannot handle.
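For reference, a sketch of the one-shot request described above, using the standard OpenAI vision message format with detail left to "auto" (placeholder inputs are assumptions); only the textual interpretation, not the image itself, then goes into the history:

    # Placeholder inputs; in the bot these come from the Telegram message.
    prompt = "Describe this image"
    image_b64 = "<base64-encoded PNG>"

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_b64}",
                        "detail": "auto",  # "low" and "high" are also accepted
                    },
                },
            ],
        }
    ]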

n3d1117 (Owner) commented Nov 20, 2023

Awesome @gilcu3, thanks!

Re: the image in the history, as far as I understand there are three paths:

  1. Only allow image processing and follow-up questions if the vision model is specified in the config (e.g. OPENAI_MODEL=gpt-4-vision-preview). The downside is that this model will also be used for everything else
  2. (current) Keep using the model defined by the user, but allow one-time image processing using the vision model. Follow-up questions about the image will not work
  3. Once an image has been received, add it to the history and from then on keep using the vision model, until the conversation expires/resets

What do you think would be best? I'm slightly in favor of either 3 or a configurable option to choose between 2 and 3.
Wondering if @k3it and @AlexHTW have any input on this?

k3it (Contributor) commented Nov 20, 2023

The current setup seems to work quite well, since it preserves the context of the image interpretation. This allows follow-ups to be handled by other models and plugins. In some cases I cut and pasted the same image with a different prompt as a comment, if for some reason I wasn't happy with the original interpretation.

A more general approach that could work for the vision and other models:

  • create a chat group with the bot
  • enable telegram group topics
  • keep separate context within each topic (including possibly image history)
  • a command or keyword to generate a response as a new topic. This would create a brand new topic tab and a new context

The topics support would probably require a lot of work to implement, though.
Just my $0.02 :)

gianlucaalfa (Contributor) commented Nov 21, 2023

> 3. Once an image has been received, add it to the history and from then on keep using the vision model, until the conversation expires/resets

Hello! What about something like solution 3, but also with a "preference" for the model? Currently only one model supports vision, but maybe in the future there will be more, so something to set the "preferred vision model" in the .env file would be necessary.

But I see another issue. "Once an image is received", it switches to the other model. So does that mean it loses the previous history after the model switch? Or is this solved by "add it to the history" like you said?

Thanks :)

gilcu3 (Contributor, Author) commented Nov 21, 2023

> @k3it: The topics support would probably require a lot of work to implement, though. Just my $0.02 :)

Interesting, I had not heard about topics support. But yeah, it is probably out of the scope of this PR; still good to keep in mind.

> @n3d1117: What do you think would be best? I'm slightly in favor of either 3 or a configurable option to choose between 2 and 3.

I can implement that; I only need the option name and an explanation to put in the README. For me the hardest part is how to explain these options to the users, and also which one should be the default (I am clearly inclined towards the current option :))

Jipok commented Nov 23, 2023

[image]
Do I understand correctly that if the bot responds to a message without a picture, then it does not "see" the previously sent images?

gilcu3 (Contributor, Author) commented Nov 23, 2023

> Do I understand correctly that if the bot responds to a message without a picture, then it does not "see" the previously sent images?

Yes, unless we implement option 1 or 3. The problem is that the image itself is not currently added to the history, as it cannot be used by non-vision models (which are the ones that do support functions).

n3d1117 (Owner) commented Nov 24, 2023

> I can implement that; I only need the option name and an explanation to put in the README. For me the hardest part is how to explain these options to the users, and also which one should be the default (I am clearly inclined towards the current option :))

Hi @gilcu3, what about ENABLE_VISION_FOLLOWUP_QUESTIONS to switch between options 2 and 3? My personal opinion is that it should be true by default 😃 but feel free to implement it your way.

While we're at it, should we maybe make the vision model configurable, in case OpenAI adds more in the future? I.e. something like VISION_MODEL=gpt-4-vision-preview instead of hardcoding it, as @gianlucaalfa was suggesting.
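A sketch of how those options might be read (the names are the ones proposed in this thread; the defaults are assumptions, not the merged implementation):

    import os

    enable_vision_followup = (
        os.environ.get("ENABLE_VISION_FOLLOWUP_QUESTIONS", "true").lower() == "true"
    )
    vision_model = os.environ.get("VISION_MODEL", "gpt-4-vision-preview")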

n3d1117 mentioned this pull request Nov 24, 2023
gilcu3 (Contributor, Author) commented Nov 25, 2023

@n3d1117 it was a bit harder than I expected, but I think it is done. One thing, though, is that we don't really have a good way to do a summary when there is an image in the history, and ChatGPT is probably not doing the best job... So we could remove the image just for that case, or leave it as it is, hoping it will be possible in the future :) Feel free to test it and let me know if it needs any fixes.
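One possible shape for the "remove the image just for that case" workaround: a sketch that filters image parts out of a history stored in the OpenAI multimodal message format (the helper name is hypothetical):

    def strip_images(history: list[dict]) -> list[dict]:
        # Drop image_url parts before asking for a summary, since the
        # summarising (non-vision) model cannot consume them.
        cleaned = []
        for message in history:
            content = message.get("content")
            if isinstance(content, list):  # multimodal: a list of content parts
                parts = [p for p in content if p.get("type") != "image_url"]
                cleaned.append({**message, "content": parts})
            else:
                cleaned.append(message)
        return cleaned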

iamjackg commented Dec 7, 2023

Anything left to do here, or are we just waiting for @n3d1117 to have a second to review this?

n3d1117 (Owner) commented Dec 10, 2023

Looks great to me, thanks again @gilcu3 and sorry for the long wait!

n3d1117 merged commit 05a7b5b into n3d1117:main on Dec 10, 2023
gilcu3 deleted the vision-support branch on December 11, 2023 14:20