Merge pull request #17 from ijwfly/feature/dalle-3
DALL-E 3 support with API usage calculation
Function calling refactoring
Readme update for easier installation
settings.py comments clarification (thanks to @yaroslavyaroslav for the idea and contribution)
ijwfly committed Dec 12, 2023
2 parents 2161852 + 68833d7 commit f509c80
Showing 18 changed files with 406 additions and 146 deletions.
39 changes: 28 additions & 11 deletions README.md
@@ -3,26 +3,33 @@
This GitHub repository contains the implementation of a telegram bot, designed to facilitate seamless interaction with GPT-3.5 and GPT-4, state-of-the-art language models by OpenAI.

🔥 **GPT-4 Turbo + Vision preview support (gpt-4-1106-preview + gpt-4-vision-preview)**
🔥 **DALL-E 3 Image generation support**

🔑 **Key Features**

1. **Dynamic Dialog Management**: The bot automatically manages the context of the conversation, eliminating the need for the user to manually reset the context using the /reset command. You still can reset dialog manually if needed.
2. **Automatic Context Summarization**: In case the context size exceeds the model's maximum limit, the bot automatically summarizes the context to ensure the continuity of the conversation.
3. **Functions Support**: You can embed functions within the bot. This allows the GPT to invoke these functions when needed, based on the context. The description of the function and its parameters are extracted from the function's docstring. See the `app/context/function_manager.py` file for more details.
4. **Sub-dialogue Mechanism**: "Chat Thread Isolation" feature, where if a message is replied to within the bot, only the corresponding message chain is considered as context. This adds an extra level of context control for the users.
5. **Voice Recognition**: The bot is capable of transcribing voice messages, allowing users to use speech as context or prompt for ChatGPT.
6. **API Usage Tracking**: The bot includes a function that tracks and provides information about the current month's usage of the OpenAI API. This allows users to monitor and manage their API usage costs.
7. **Model Support**: The bot supports both gpt-3.5-turbo and gpt-4 models with the capability to switch between them on-the-fly.
8. **Context Window Size Customization**: The bot provides a feature to customize the maximum context window size. This allows users to set the context size for gpt-3.5-turbo and gpt-4 models individually, enabling more granular control over usage costs. This feature is particularly useful for managing API usage and optimizing the balance between cost and performance.
9. **Access Control**: The bot includes a feature for access control. Each user is assigned a role (stranger, basic, advanced, admin), and depending on the role, they gain access to the bot. Role management is carried out through a messaging mechanism, with inline buttons sent to the admin for role changes.
1. **Model Support**: gpt-3.5-turbo, gpt-4, gpt-4-1106-preview, gpt-4-vision-preview.
2. **Image Generation**: You can ask the bot to generate images with the DALL-E 3 model, just like in the official ChatGPT app.
3. **Dynamic Dialog Management**: The bot automatically manages the context of the conversation, eliminating the need for the user to manually reset it with the /reset command. You can still reset the dialog manually if needed.
4. **Automatic Context Summarization**: If the context size exceeds the model's maximum limit, the bot automatically summarizes the context to ensure the continuity of the conversation.
5. **Function Calling Support**: You can embed functions within the bot, allowing GPT to invoke them when needed based on the context. See the `app/context/function_manager.py` file for more details, and the sketch after this list for an example.
6. **Sub-dialogue Mechanism**: When you reply to a message, the bot only looks at that specific conversation thread, making it easier to manage multiple discussions at once.
7. **Voice Recognition**: The bot is capable of transcribing voice messages, allowing users to use speech as context or prompt for ChatGPT.
8. **API Usage Tracking**: The bot includes a function that tracks and provides information about the usage of the OpenAI API. This allows users to monitor and manage their API usage costs.
9. **Context Window Size Customization**: You can set the maximum context window size for each model in the `app/context/context_manager.py` file. When the context size exceeds this limit, the bot automatically summarizes the context.
10. **Access Control**: The bot includes a feature for access control. Each user is assigned a role (stranger, basic, advanced, admin), and depending on the role, they gain access to the bot. Role management is carried out through a messaging mechanism, with inline buttons sent to the admin for role changes.
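
As an illustration of feature 5, here is a minimal sketch of a custom function built on the `OpenAIFunction` base class that this commit adds in `app/functions/base.py`. The `CurrentDate` class and its parameter are hypothetical, not part of the repository.

```python
import datetime
from typing import Optional

import pydantic

from app.functions.base import OpenAIFunction, OpenAIFunctionParams


class CurrentDateParams(OpenAIFunctionParams):
    # Parameter schema exposed to GPT; the description ends up in the function spec.
    date_format: str = pydantic.Field('%Y-%m-%d %H:%M', description='strftime format for the result')


class CurrentDate(OpenAIFunction):
    PARAMS_SCHEMA = CurrentDateParams

    async def run(self, params: CurrentDateParams) -> Optional[str]:
        # The returned string is passed back to GPT as the function result;
        # returning None would mean "nothing to add to the context".
        return datetime.datetime.now(datetime.timezone.utc).strftime(params.date_format)

    @classmethod
    def get_description(cls) -> str:
        return 'Returns the current UTC date and time.'
```

To expose such a class to the model, it would presumably be appended to the list in `FunctionManager.get_static_functions()` (or gated behind a user setting in `get_conditional_functions()`), the same way this commit registers `QueryWolframAlpha` and `GenerateImageDalle3`.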

🔧 **Installation**

To get this bot up and running, follow these steps:

1. Set the `TELEGRAM_BOT_TOKEN` and `OPENAI_TOKEN` variables in the `settings.py` file.
2. Set the `IMAGE_PROXY_URL` to your server IP / hostname in the `settings.py` file.
3. Run `docker-compose up -d` in the root directory of the project.
3. (optional) Set the `USER_ROLE_MANAGER_CHAT_ID` variable in the `settings.py` file to your telegram id. This is required for access control.
4. (optional) Set the `ENABLE_USER_ROLE_MANAGER_CHAT` variable in the `settings.py` file to `True`. This is required for access control.
5. (optional) Set the `USER_ROLE_*` variables in the `settings.py` file to desired roles.
6. Run `docker-compose up -d` in the root directory of the project.

If you've completed the optional steps, you will receive a management message with your Telegram ID and info when you send your first message to the bot. You can use this message to set your role to admin.
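
For orientation, here is a hedged sketch of the `settings.py` values referenced above. The placeholder values are illustrative, and the `USER_ROLE_*` names shown are only the ones that appear elsewhere in this commit; the actual file may represent roles differently (e.g. via a `UserRole` enum).

```python
# Sketch of the settings referenced in the installation steps above.
# All values are placeholders; adjust them to your own deployment.

TELEGRAM_BOT_TOKEN = '123456:ABC-your-telegram-bot-token'
OPENAI_TOKEN = 'sk-your-openai-api-key'

# Must be reachable from the public internet so OpenAI can fetch images
# for gpt-4-vision-preview requests.
IMAGE_PROXY_URL = 'http://your-server-hostname'

# Optional: access control (steps 3-5).
ENABLE_USER_ROLE_MANAGER_CHAT = True
USER_ROLE_MANAGER_CHAT_ID = 123456789  # your telegram id

# Minimum role required for individual features
# (roles: stranger, basic, advanced, admin).
USER_ROLE_CHOOSE_MODEL = 'advanced'
USER_ROLE_IMAGE_GENERATION = 'advanced'
USER_ROLE_STREAMING_ANSWERS = 'basic'
```

With `ENABLE_USER_ROLE_MANAGER_CHAT` enabled, role-change requests arrive in the chat identified by `USER_ROLE_MANAGER_CHAT_ID` as inline buttons, as described in the Access Control feature above.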

🤖 **Commands**
```
@@ -35,4 +42,14 @@ To get this bot up and running, follow these steps:
/gpt4vision - set model to gpt-4-vision-preview
/usage_all - show usage for all users
```
These commands will provide additional interaction control for the bot users.
These commands provide additional interaction control for bot users. Most settings are also available in the settings menu; the commands are just shortcuts for them.


⚠️ **Troubleshooting**

If you have any issues with the bot, please create an issue in this repository. I will try to help you as soon as possible.

Here are some typical issues and solutions:
- ```Error code: 400 - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error' ...}}``` - This error usually occurs when OpenAI cannot access the image. Make sure the `IMAGE_PROXY_URL` variable is set correctly to your server IP / hostname. You can also debug the setup by looking at the `chatgpttg.message` table in Postgres: it contains the message with the image URL, which you can try to open in your browser to check that it works.
- ```Error code: 400 - {'error': {'message': 'Invalid content type. image_url is only supported by certain models.', 'type': 'invalid_request_error' ...}}``` - This error usually occurs when there is an image in your context but the current model doesn't support vision. Switch the model to gpt-4-vision-preview or reset your context with the /reset command.
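
If you suspect the image proxy is the problem, a quick check like the sketch below can confirm whether an image is reachable from outside your network; the URL is a placeholder, so copy a real one from the `chatgpttg.message` table and run the script from a machine other than the server itself.

```python
# Minimal reachability check for an image served via IMAGE_PROXY_URL.
# The URL below is a placeholder; copy a real image URL from the
# chatgpttg.message table in Postgres.
import urllib.request

image_url = 'http://your-server-hostname/path-to-image'  # placeholder

try:
    with urllib.request.urlopen(image_url, timeout=10) as response:
        print(f'HTTP {response.status}, Content-Type: {response.headers.get("Content-Type")}')
except Exception as exc:
    # If this fails from outside your network, OpenAI will most likely fail too.
    print(f'Image is not reachable: {exc}')
```
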
13 changes: 10 additions & 3 deletions app/bot/message_processor.py
@@ -83,16 +83,23 @@ async def handle_gpt_response(self, chat_gpt_manager, context_manager, response_
if response_dialog_message.function_call:
function_name = response_dialog_message.function_call.name
function_args = response_dialog_message.function_call.arguments
function_response_raw = await function_storage.run_function(function_name, function_args)
function_class = function_storage.get_function_class(function_name)
function = function_class(self.user, self.db, context_manager, self.message)
function_response_raw = await function.run_str_args(function_args)

function_response = DialogUtils.prepare_function_response(function_name, function_response_raw)
function_response_message_id = -1
if self.user.function_call_verbose:
with suppress(BadRequest):
# TODO: split function call message if it's too long
function_response_text = f'Function call: {function_name}({function_args})\n\n{function_response_raw}'
function_response_text = f'Function call: {function_name}({function_args})\n\nResponse: {function_response_raw}'
function_response_tg_message = await send_telegram_message(self.message, function_response_text)
function_response_message_id = function_response_tg_message.message_id

if function_response_raw is None:
# None means there is no need to pass response to GPT or add it to context
return

function_response = DialogUtils.prepare_function_response(function_name, function_response_raw)
context_dialog_messages = await context_manager.add_message(function_response, function_response_message_id)
response_generator = await chat_gpt_manager.send_user_message(self.user, context_dialog_messages, is_cancelled)

4 changes: 2 additions & 2 deletions app/bot/scheduled_tasks.py
@@ -4,7 +4,7 @@
import logging

import settings
from app.bot.utils import get_completion_usage_response_all_users
from app.bot.utils import get_usage_response_all_users

FAIL_LIMIT = 5
WAIT_BETWEEN_RETRIES = 5
@@ -66,7 +66,7 @@ async def get_monthly_usage():

previous_month = datetime.datetime.now(settings.POSTGRES_TIMEZONE).replace(day=1) - datetime.timedelta(days=1)
previous_month = previous_month.date()
result = await get_completion_usage_response_all_users(db, previous_month)
result = await get_usage_response_all_users(db, previous_month)
await bot.send_message(
settings.USER_ROLE_MANAGER_CHAT_ID, result
)
4 changes: 3 additions & 1 deletion app/bot/settings_menu.py
@@ -97,8 +97,9 @@ def __init__(self, bot: Bot, dispatcher: Dispatcher, db: DB):
'current_model': VisibleOptionsSetting('current_model', GPT_MODELS_OPTIONS),
'current_model_preview': VisibleOptionsSetting('current_model', GPT_MODELS_OPTIONS_PREVIEW),
'gpt_mode': ChoiceSetting('GPT mode', 'gpt_mode', list(settings.gpt_mode.keys())),
'voice_as_prompt': OnOffSetting('Voice as prompt', 'voice_as_prompt'),
'use_functions': OnOffSetting('Use functions', 'use_functions'),
'image_generation': OnOffSetting('Image generation', 'image_generation'),
'voice_as_prompt': OnOffSetting('Voice as prompt', 'voice_as_prompt'),
'function_call_verbose': OnOffSetting('Verbose function calls', 'function_call_verbose'),
'streaming_answers': OnOffSetting('Streaming answers', 'streaming_answers'),
# 'auto_summarize': OnOffSetting('Auto summarize', 'auto_summarize'),
@@ -107,6 +108,7 @@ def __init__(self, bot: Bot, dispatcher: Dispatcher, db: DB):
self.minimum_required_roles = {
'current_model': settings.USER_ROLE_CHOOSE_MODEL,
'current_model_preview': settings.USER_ROLE_CHOOSE_MODEL,
'image_generation': settings.USER_ROLE_IMAGE_GENERATION,
'streaming_answers': settings.USER_ROLE_STREAMING_ANSWERS,
}
self.dispatcher.register_callback_query_handler(self.process_callback, lambda c: SETTINGS_PREFIX in c.data)
16 changes: 13 additions & 3 deletions app/bot/telegram_bot.py
@@ -12,9 +12,10 @@
from app.bot.settings_menu import Settings
from app.bot.user_middleware import UserMiddleware
from app.bot.user_role_manager import UserRoleManager
from app.bot.utils import (get_hide_button, get_completion_usage_response_all_users, TypingWorker)
from app.bot.utils import (get_hide_button, get_usage_response_all_users, TypingWorker)
from app.bot.utils import send_telegram_message
from app.openai_helpers.utils import calculate_completion_usage_price, calculate_whisper_usage_price, OpenAIAsync
from app.openai_helpers.utils import (calculate_completion_usage_price, calculate_whisper_usage_price, OpenAIAsync,
calculate_image_generation_usage_price)
from app.storage.db import DBFactory, User
from app.storage.user_role import check_access_conditions, UserRole
from app.openai_helpers.chatgpt import GptModel
@@ -113,6 +114,15 @@ async def get_usage(self, message: types.Message, user: User):
completion_usages = await self.db.get_user_current_month_completion_usage(user.id)
result = []
total = whisper_price

image_generation_usage = await self.db.get_user_current_month_image_generation_usage(user.id)
for usage in image_generation_usage:
price = calculate_image_generation_usage_price(
usage['model'], usage['resolution'], usage['usage_count']
)
total += price
result.append(f'*{usage["model"]}:* {usage["usage_count"]} images, {usage["resolution"]} resolution, ${price}')

for usage in completion_usages:
price = calculate_completion_usage_price(usage.prompt_tokens, usage.completion_tokens, usage.model)
total += price
@@ -138,7 +148,7 @@ async def get_usage_all_users(self, message: types.Message, user: User):
month = datetime.datetime.now(settings.POSTGRES_TIMEZONE) + relativedelta(months=month_offset)
month = month.date()

result = await get_completion_usage_response_all_users(self.db, month)
result = await get_usage_response_all_users(self.db, month)
await send_telegram_message(
message, result, reply_markup=get_hide_button()
)
19 changes: 17 additions & 2 deletions app/bot/utils.py
@@ -8,7 +8,8 @@
from aiogram import types
from aiogram.utils.exceptions import CantParseEntities

from app.openai_helpers.utils import calculate_completion_usage_price, calculate_whisper_usage_price
from app.openai_helpers.utils import (calculate_completion_usage_price, calculate_whisper_usage_price,
calculate_image_generation_usage_price)

TYPING_TIMEOUT = 180
TYPING_DELAY = 2
@@ -145,6 +146,15 @@ async def edit_telegram_message(message: types.Message, text: str, message_id, p
return await message.bot.edit_message_text(text, chat_id, message_id)


async def send_photo(message: types.Message, photo_bytes, caption=None, reply_markup=None):
if message.reply_to_message is None:
send_message = message.answer_photo
else:
send_message = message.reply_photo

return await send_message(photo_bytes, caption=caption, reply_markup=reply_markup)


def merge_dicts(dict_1, dict_2):
"""
This function merge two dicts containing strings using plus operator on each key
@@ -161,16 +171,21 @@ def merge_dicts(dict_1, dict_2):
return result


async def get_completion_usage_response_all_users(db, month_date: date = None) -> str:
async def get_usage_response_all_users(db, month_date: date = None) -> str:
completion_usages = await db.get_all_users_completion_usage(month_date)
whisper_usages = await db.get_all_users_whisper_usage(month_date)
image_generation_usages = await db.get_all_users_image_generation_usage(month_date)
result = []
for name, user_completion_usages in completion_usages.items():
user_usage_price = 0
for usage in user_completion_usages:
user_usage_price += calculate_completion_usage_price(
usage.prompt_tokens, usage.completion_tokens, usage.model
)
for usage in image_generation_usages.get(name, []):
user_usage_price += calculate_image_generation_usage_price(
usage['model'], usage['resolution'], usage['usage_count']
)
user_whisper_usage = whisper_usages.get(name, 0)
user_usage_price += calculate_whisper_usage_price(user_whisper_usage)
result.append((name, user_usage_price))
17 changes: 15 additions & 2 deletions app/context/function_manager.py
@@ -1,9 +1,12 @@
from typing import Optional

import settings
from app.functions.wolframalpha import query_wolframalpha
from app.functions.dalle_3 import GenerateImageDalle3
from app.functions.wolframalpha import QueryWolframAlpha
from app.openai_helpers.function_storage import FunctionStorage
from app.storage.db import DB, User
from app.storage.user_role import check_access_conditions
from settings import USER_ROLE_IMAGE_GENERATION


class FunctionManager:
@@ -17,7 +20,15 @@ def get_static_functions():
functions = []

if settings.ENABLE_WOLFRAMALPHA:
functions.append(query_wolframalpha)
functions.append(QueryWolframAlpha)

return functions

def get_conditional_functions(self):
functions = []

if self.user.image_generation and check_access_conditions(USER_ROLE_IMAGE_GENERATION, self.user.role):
functions.append(GenerateImageDalle3)

return functions

@@ -26,6 +37,8 @@ async def process_functions(self) -> Optional[FunctionStorage]:
return None

functions = self.get_static_functions()
functions += self.get_conditional_functions()

if not functions:
return None

59 changes: 59 additions & 0 deletions app/functions/base.py
@@ -0,0 +1,59 @@
from typing import Optional

import pydantic
from aiogram.types import Message
from abc import ABC, abstractmethod


class OpenAIFunctionParams(pydantic.BaseModel):
pass


class OpenAIFunction(ABC):
PARAMS_SCHEMA = OpenAIFunctionParams

def __init__(self, user, db, context_manager, message: Message):
self.user = user
self.db = db
self.context_manager = context_manager
self.message = message

@abstractmethod
async def run(self, params: OpenAIFunctionParams) -> Optional[str]:
pass

async def run_dict_args(self, params: dict):
try:
params = self.PARAMS_SCHEMA(**params)
except Exception as e:
return f"Parsing error: {e}"
return await self.run(params)

async def run_str_args(self, params: str):
try:
params = self.PARAMS_SCHEMA.parse_raw(params)
except Exception as e:
return f"Parsing error: {e}"
return await self.run(params)

@classmethod
@abstractmethod
def get_description(cls) -> str:
pass

@classmethod
def get_name(cls) -> str:
return cls.__name__

@classmethod
def get_params_schema(cls) -> dict:
params_schema = cls.PARAMS_SCHEMA.schema()
return params_schema

@classmethod
def get_system_prompt_addition(cls) -> Optional[str]:
"""
Returns text to add to system prompt when this function is added to context. You can use this to add
additional instructions about how to use this function.
"""
return None
