
[Issue]: Autogen with vision models like GPT-4o creates HUGE spike in usage and bill #2827

@daniel-counto

Description

Describe the issue

I created three agents to read document images: black-and-white financial documents that are not very large (around 1000 x 2000 px or smaller). All of them use GPT-4o.

The flow is mostly linear, i.e. agent 1 -> agent 2 -> a final agent that summarizes the output.
However, for only the 400 images I uploaded, it has already cost me over USD 200, and the context usage is about 28+ million tokens!

I wonder whether this is because AutoGen inserts the image bytes into the prompt itself. If so, wouldn't the better approach be to upload the images somewhere and insert only the image URL into the prompts?
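For scale, OpenAI's published vision pricing formula gives a rough per-image token cost. The sketch below is an estimate based on that formula, not AutoGen code; the function name is mine:

```python
import math

def gpt4o_image_tokens(width, height, detail="high"):
    """Rough per-image token estimate based on OpenAI's published
    vision pricing formula."""
    if detail == "low":
        return 85  # flat cost regardless of resolution
    # 1. Scale the image to fit within a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # 2. Scale so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # 3. Count 512 px tiles: 170 tokens each, plus 85 base tokens.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return tiles * 170 + 85

print(gpt4o_image_tokens(1000, 2000))         # 1105 tokens at detail="high"
print(gpt4o_image_tokens(1000, 2000, "low"))  # 85 tokens at detail="low"
```

So a single 1000 x 2000 image costs roughly 1,100 tokens at the default "high" detail, versus a flat 85 at "low".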

Steps to reproduce

Step 1 - The agents are constructed as follows:


image_agent = MultimodalConversableAgent(
    name="image-content-extracter",
    max_consecutive_auto_reply=10,
    llm_config={"config_list": config_list_gpt4, "temperature": 0.05, "max_tokens": 1024, "cache_seed": None},
    human_input_mode="NEVER",
)

agent_1 = MultimodalConversableAgent(
    name="agent_1",
    system_message='''You are a helpful agent.
Look at the image, compare the extraction results against those extracted by image-content-extracter, and correct any mistakes found.''',
    max_consecutive_auto_reply=4,
    llm_config={"config_list": config_list_gpt4, "temperature": 0, "max_tokens": 1024, "cache_seed": None},
    human_input_mode="NEVER",
)

agent_2 = MultimodalConversableAgent(
    name="agent_2",
    system_message='''You are agent_1's assistant. You put the finalized results in JSON format.''',
    max_consecutive_auto_reply=2,
    llm_config={"config_list": config_list_gpt4, "temperature": 0, "max_tokens": 800, "response_format": {"type": "json_object"}, "cache_seed": None},
    human_input_mode="NEVER",
)

coder = autogen.AssistantAgent(
    name="coding_assistant",
    system_message="Helpful coding assistant.",
    llm_config={"config_list": config_list_gpt4, "temperature": 0.1, "max_tokens": 2048},
)

groupchat = autogen.GroupChat(agents=[user_proxy, image_agent, agent_1, agent_2], messages=[], max_round=5)
group_chat_manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gpt4_llm_config)

Step 2

user_prompt = "<this is a detailed prompt of about 1700 tokens>"

session = user_proxy.initiate_chat(
    group_chat_manager,
    message= user_prompt
)

Step 3

Execute the above multi-agent flow with about 500 images. Each is a standard invoice image.
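One likely driver of the bill is that every group-chat turn resends the entire conversation so far, including the image, so context usage per image grows roughly quadratically with the number of turns. A minimal sketch; the 2,800-tokens-per-turn figure (prompt plus image at high detail) and the round count are illustrative assumptions, not measured values:

```python
def context_tokens_per_image(per_turn_tokens, turns):
    """Each new turn resends the full history so far, so total
    context usage grows roughly quadratically with turn count."""
    total, history = 0, 0
    for _ in range(turns):
        history += per_turn_tokens  # the history gets one turn longer
        total += history            # and the whole history is billed again
    return total

# Assumed: ~1700-token prompt + ~1100-token image per turn, 5 rounds.
per_image = context_tokens_per_image(2800, 5)
print(per_image)        # 42000 context tokens for one image's chat
print(per_image * 400)  # 16800000 tokens across 400 images
```

Under these assumptions a single image's chat consumes about 42,000 context tokens, and 400 images land in the tens of millions of tokens, the same order of magnitude as the 28+ million reported above.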

Screenshots and logs

[Screenshot 2024-05-28 211431: single-day token usage, although only about 400 images were uploaded.]

Additional Information

The right way to send an image to the OpenAI API is not to embed it as a plain string, but to use this message shape:

{
    "role": "user",
    "content": [
        {"type": "text", "text": "How many bananas?"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/x-png;base64,{base64_image}", "detail": "low"},
        },
    ],
}

Please make the changes.


Labels

    0.2 (issues which are related to the pre-0.4 codebase)
    multimodal (language + vision, speech etc.)
    needs-triage
