This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Multiple users, multiple conversations, multiple contexts #2942

Closed
AdamArutyunov opened this issue Aug 5, 2020 · 17 comments


@AdamArutyunov

So!

My task was to develop an interface (API) that allows many users to talk to ParlAI (Blender bot) and sets a unique persona for every user (so the bot talking to user 1 could be named Sarah, and the bot talking to user 2 could be named Jessica). I did it, but there is a problem with performance, and I do not know whether this solution is good from the point of view of the project architecture. Here is how I did it.

1. Entry point
I decided to use the websocket chat service as the entry point to the bot. But I ran into a problem: every new connection creates a new WebsocketAgent with a new random .sid, so the bot thinks it is another person and sends the standard message ("welcome, type begin..."). So I did a small hack: I pass the user ID inside the message and change the Agent's .sid every time (and do not generate a random .sid on connection). This is how it looks (socket.py):

def open(self):
    # Hack: do not generate a random uuid for self.sid here;
    # it will be set from the user_id of the first message instead.
    if self.sid not in self.subs.values():
        self.set_nodelay(True)

def on_message(self, message_text):
    logging.info('websocket message from client: {}'.format(message_text))
    message = json.loads(message_text)

    # Use the client-supplied user ID as this agent's sid
    self.sid = message.get('user_id')
    self.subs[self.sid] = self
    print(f"Current subscribers: {self.subs}")
    print("Changed sid to " + self.sid)
    message = {
        'text': message.get('text', ''),
        'payload': message.get('payload'),
        'sender': {'id': self.sid},
        'recipient': {'id': 0},
    }
    self.message_callback(message)

This hack works properly. If you know a better, simpler way to do this, please share it with me.

2. Config
I use InteractiveWorld from parlai/tasks/blended_skill_talk. Moreover, I copy-pasted MessengerOverworld and MessengerBotChatOnboardWorld from parlai/chat_service/tasks/chatbot/worlds.py. So my websocket config looks like this:

tasks:
  default:
    onboard_world: MessengerBotChatOnboardWorld
    task_world: InteractiveWorld
    timeout: 18000
    agents_required: 0
task_name: chatbot
world_module: parlai.tasks.blended_skill_talk.worlds
overworld: MessengerOverworld
max_workers: 3000
opt:
  debug: True
  models:
    blender_90M:
      model: transformer/generator
      model_file: zoo:blender/blender_90M/model
      interactive_mode: True
      no_cuda: True
  include_personas: False
  safety: None
  model: transformer/generator
  model_file: zoo:blender/blender_90M/model
  interactive_mode: True
  no_cuda: True
  datatype: valid
  display_partner_persona: True
additional_args:
  page_id: 1 # Configure Your Own Page

(There is some duplication at the end because the world creator does not recognize model and model_file inside the blender_90M sub-option, so I pasted them directly into opt.)

3. Worlds
I need to set all contexts manually, so in ParlAI/parlai/tasks/blended_skill_talk/worlds.py, in _load_personas, I inserted this override just before the return:

# No contexts, because user will set them manually
contexts = [['', '']]
return contexts

To use InteractiveWorld with websockets, I needed to implement a static generate_world function:

@staticmethod
def generate_world(opt, agents):
    # Check the config before doing the expensive agent creation
    if opt['models'] is None:
        raise RuntimeError("Model must be specified")
    agent = create_agent(opt, requireModelExists=True)  # bot agent
    agents.append(agent)
    return InteractiveWorld(opt, agents)

Moreover, I needed to change the parley function in ParlAI/parlai/tasks/interactive/worlds.py. First of all, I created a .first_time boolean attribute and do this:

if self.first_time:
    agents[0].observe(
        {
            'id': 'World',
            'text': 'Welcome to the AIDA chatbot. '
            'You are now paired with a bot — feel free to send a message. '
            'Type /done to finish the chat.',
        }
    )
    self.first_time = False
    return

Then we check whether "[DONE]" is in the user's message:

act_text = act.get('text', None)
acts[0] = act
if act_text and '[DONE]' in act_text:
    agents[0].observe(validate(Message({'text': 'Goodbye!', 'episode_done': True})))
    self.reset()
    return

An important thing! We check whether the user's message starts with "your persona:", and if it does, we send a context message to the bot:

if act_text and act_text.startswith('your persona:'):
    context_act = Message({'id': 'context', 'text': act_text, 'episode_done': False})
    agents[1].observe(validate(context_act))

So!

This all works as I expected, but there is a problem. Every time a new user writes to the bot, it creates a new Agent, a new task, and a new InteractiveWorld, and loads a new copy of the model; all of this takes about 12% of RAM (1.9 GB). So seven users can easily turn the server into a brick.

That is why I opened this issue. What is the best way to implement this feature using only one world?

I have an idea, but I do not know whether it is good or not. We keep using one Agent and one World, but along with the user ID we also pass the user's entire chat history and load all those messages into the context. Then we generate an answer and clear the world with the world.reset() function.

Is this idea good? And in that case, how should I use the Overworld and OnboardWorld?

You can look at my forked repository: https://github.com/AdamArutyunov/ParlAI

@stephenroller
Contributor

stephenroller commented Aug 5, 2020

Super cool, really amazing what you've done

You might need to create your own custom world to manage all this context. The world_module: parlai.tasks.blended_skill_talk.worlds line of the config is where that's being chosen from. You can define your own world and more carefully manage it all yourself.

One really important point: agent.clone() can be used to create a copy of an agent that reuses all the weights etc from the original (while maintaining separate dialogue history, etc). This is the secret to how we can launch hundreds of simultaneous worlds without overloading a server.

@stephenroller
Contributor

(that is, instead of using create_agent, make it a call to original_agent.clone(), where original_agent is something you instantiate on server initialization)
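The share()/clone() pattern can be sketched in plain Python. These are illustrative stand-in classes, not ParlAI's actual agent; the point is that a clone reuses the heavy model object from the original agent while keeping its own dialogue history:

```python
# Illustrative sketch of the share()/clone() pattern (stand-in classes,
# not ParlAI's real TransformerGeneratorAgent).

class SketchAgent:
    def __init__(self, opt, shared=None):
        self.opt = opt
        if shared is not None:
            # Clone path: reuse the already-loaded model
            self.model = shared['model']
        else:
            # Original path: stand-in for an expensive model load
            self.model = {'weights': [0.0] * 1000}
        self.history = []  # per-conversation state, never shared

    def share(self):
        return {'model': self.model}

    def clone(self):
        # Same shape as parlai.core.agents.Agent.clone():
        #   return type(self)(self.opt, self.share())
        return type(self)(self.opt, self.share())

original_agent = SketchAgent(opt=None)   # load once, at server startup
per_user_agent = original_agent.clone()  # cheap: shares the weights
assert per_user_agent.model is original_agent.model          # same model object
assert per_user_agent.history is not original_agent.history  # separate state
```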

@AdamArutyunov
Author

Thank you! I'll try to create one agent during initialization and clone that "pure" model every time a new user starts a conversation. Maybe I will post something important in this topic, so I ask you not to close it.

@stephenroller
Contributor

Just skimmed your branch quickly, very cool. I think others would love a telegram chat service being added to ParlAI upstream. Let me know if you're interested in generalizing some of your work.

@AdamArutyunov
Author

AdamArutyunov commented Aug 6, 2020

Ok! But I am developing this as a commercial order, so it is a question of confidentiality.

I created a "cache" dict and clone the agent if it has already been created:

@staticmethod
def generate_world(opt, agents):
    # Reuse the model weights: clone the cached agent instead of
    # loading the model from disk for every new conversation.
    if 'agent' in cache:
        agent = cache['agent'].clone()
    else:
        agent = create_agent(opt, requireModelExists=True)
        cache['agent'] = agent

    agents.append(agent)
    return InteractiveWorld(opt, agents)

It works! A new agent takes only 500 MB of RAM. But I am not sure: is this memory footprint good for a 90M model, or is there still room for optimization? Everything is relative, and I do not know whether this footprint is bad or good.

At the moment a server with 16 GB of RAM can store about 25-30 contexts, which is good, but still not enough for a big project.

@stephenroller
Contributor

Wow a new agent takes 500mb? I would expect more like.... 5mb.

@stephenroller
Contributor

(Also I would advise against BlenderBot for commercial purposes. There are a number of real issues in terms of safety, coherence, etc. It's very much research)

@AdamArutyunov
Author

AdamArutyunov commented Aug 7, 2020

Hmm, maybe the world takes 500 MB, not the agent? I have no idea.

Do I understand correctly that the agent for this model is called TransformerGeneratorAgent? It extends TorchGeneratorAgent and TorchAgent, but none of these classes overrides the clone() function from Agent, which contains only one line:

return type(self)(self.opt, self.share())

Is that correct? Maybe I should override clone() in one of these classes?

UPD: Indeed, the world hogs the memory, not the agent. I'm definitely sure; I ran some tests. I tried to call world.clone(), but it raises

World default had error TypeError("__init__() missing 2 required positional arguments: 'receiver_id' and 'task_id'")

But clone() does not take any parameters.
(And as far as I can tell from the World class, clone() does a full deepcopy, so there is no benefit.)

Is there a way to reduce the world's memory footprint?

@stephenroller
Contributor

You might be able to lower memory usage by using quantization. We don't have this implemented (yet), but the pytorch docs have a tutorial on it.
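As a sketch of what that tutorial covers, PyTorch's post-training dynamic quantization converts Linear layers to int8 at load time. This is hypothetical usage on a stand-in model, not something ParlAI wires up for you:

```python
import torch
import torch.nn as nn

# Stand-in for a loaded transformer; in practice this would be the
# model held inside the ParlAI agent.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

# Replace Linear layers with int8 dynamically-quantized versions,
# shrinking their weight storage roughly 4x versus float32.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement for inference
out = quantized(torch.randn(1, 512))
```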

@AdamArutyunov
Author

So, at the moment, the final solution is to create one world, pass all the user's messages as context, generate the answer, and fully reset the world. In the future I think this will be parallelized across multiple worlds, but for now I have problems converting async HTTP requests into synchronous chatting with the world.

@Ufukdogann

Ufukdogann commented Nov 22, 2020

@AdamArutyunov hello,

Currently, I am using web_browser to let people communicate with a chatbot on a website. However, I have to let multiple users use it at the same time, and unfortunately web_browser does not support multiple conversations at the same time: whenever one user closes the conversation, it closes for everyone.

I found your fork and read that you have solved that problem.

Can you let me know the commands, please? Which terminal commands should I run in order to see your work?

Best regards,

@AdamArutyunov
Author

@Ufukdogann ,

Hello!

My final solution is to pass the message history to the agent, let it observe every message in it, and reset the agent after every request. This trick lets me keep only one world and one agent; it increases queue time, but removes the memory limit. So with this solution I can serve an unlimited number of users.

All changes are currently available in the parlai/chat_services/services/websocket folder. There is a Flask API which can help you with the new abstraction layer. You can always look at the differences between my fork and the original repo.

However, I must warn you that this project is a commercial order, so you cannot use my repo in your projects.

@nikhil-iyer-97

Hi @AdamArutyunov, could you let me know how you pass the message history to the agent? Thanks!

@AdamArutyunov
Author

@nikhil-iyer-97 hello! Every agent implements the observe() method or inherits it from a parent class. So you just pass each message into an observe() call in a loop. I don't remember whether you should pass a string or a Message object (probably the latter), but you can always look at my code as an example of batch observations.
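A minimal sketch of that loop, with a stub standing in for the real ParlAI agent so the snippet runs on its own (the method names observe/act/reset match the real Agent interface, but the stub's behavior is invented for illustration):

```python
# Sketch: replay a stored history into an agent, take its reply,
# then reset it so the same agent can serve the next user.

class StubAgent:
    def __init__(self):
        self.history = []

    def observe(self, msg):
        self.history.append(msg['text'])

    def act(self):
        return {'text': f'reply after {len(self.history)} messages'}

    def reset(self):
        self.history = []

def respond(agent, history):
    # Replay every past message (as a Message-like dict, not a bare string)
    for text in history:
        agent.observe({'text': text, 'episode_done': False})
    reply = agent.act()
    agent.reset()  # wipe per-user state before the next request
    return reply['text']

agent = StubAgent()
print(respond(agent, ["Hello!", "Hello, how are you?", "I'm fine, and you?"]))
# -> reply after 3 messages
```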

@nikhil-iyer-97

@AdamArutyunov I see you have maintained the message history as a list of strings, which I guess helps you maintain the persona of the agent when you pass the history to it. What I could not find is how your model learns the user persona from the history each time you reset the agent. Please let me know about this. Thanks

@AdamArutyunov
Author

AdamArutyunov commented Dec 2, 2020

@nikhil-iyer-97 to be honest, I did not understand what "learns the user persona" exactly means. The message history should be passed to the API with every request. For example, the client must do something like this:

1. Send the message history with the last message: ["Hello!"]
2. Get the model's response: "Hello, how are you?"
3. Pass the model's response to the user
4. Get the user's reply: "I'm fine, and you?"
5. Send the message history with the last message: ["Hello!", "Hello, how are you?", "I'm fine, and you?"]
...

So, the "user persona" is determined by the entire previous message history.
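The client-side bookkeeping above can be sketched like this, where ask_bot is a hypothetical stand-in for the real HTTP/websocket call:

```python
# Sketch of the client: keep the full history locally and resend it
# with every request; the bot's reply also becomes part of the history.

def ask_bot(history):
    # Hypothetical placeholder for the real API call that sends
    # `history` and returns the model's reply.
    return f"bot reply #{len(history)}"

history = []

def send(user_text):
    history.append(user_text)
    reply = ask_bot(history)  # full history goes with every request
    history.append(reply)     # append the reply so context accumulates
    return reply

send("Hello!")              # history now: ["Hello!", "bot reply #1"]
send("I'm fine, and you?")  # history now holds 4 messages
```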

@nikhil-iyer-97

Thanks, I just wanted to know how your model or agent learns the user persona after you reset the agent (basically, how you make sure that when the user comes back online, they don't feel like they are talking to a new agent, but rather continue from where they left off). Since you reset it every time, I was confused.
