
Learn bot #58

Closed
vladimirmyshkovski opened this issue Jul 19, 2017 · 13 comments

Comments

@vladimirmyshkovski

I have a few questions about the training.

I understand how learning works with patterns like "learn * is *" or "my name is *".

I also understand that this data is only kept in memory, i.e. for as long as the application is running.

However, I would like this training to add to the existing files, or even create new ones.

I also thought that an XML database could be used to store this information.

Why haven't I seen any examples of using a database for this purpose?

But even if the bot learns to create files, or a database is used, the question remains how to sort this data into the right categories.

My knowledge of neural networks and deep learning is still very limited.
But I am constantly learning, and hope soon to be able to do relatively complicated things.

@keiffster
Owner

Hi,

The XML files contain the rules, which are parsed and loaded into memory to create the parse tree that then interprets the questions you ask the bot. There is no XML database as such; it's all in memory.
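As a rough illustration of "parsed and loaded into memory", here is a minimal Python sketch, not Program-Y's actual loader. It reads AIML categories with the standard-library `xml.etree.ElementTree` and stores each pattern in a tree of nested dicts, one level per word. The sample categories, the `load_aiml` name and the `__template__` marker key are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative AIML content (hypothetical categories)
AIML = """<aiml version="2.0">
  <category><pattern>HELLO</pattern><template>Hi there!</template></category>
  <category><pattern>MY NAME IS *</pattern><template>Nice to meet you.</template></category>
</aiml>"""

TEMPLATE = "__template__"  # key marking "a pattern ends here"

def load_aiml(xml_text):
    """Parse AIML categories into an in-memory tree of nested dicts, one level per word."""
    root = {}
    for category in ET.fromstring(xml_text).iter("category"):
        node = root
        for word in category.findtext("pattern").upper().split():
            node = node.setdefault(word, {})
        # findtext keeps plain text only; real templates may contain child tags
        node[TEMPLATE] = category.findtext("template")
    return root

brain = load_aiml(AIML)
print(brain["HELLO"][TEMPLATE])  # Hi there!
```

Nothing here touches disk after loading, which is why anything learned at runtime disappears when the process stops.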

So to teach the bot to answer new questions, you create your own .aiml files with the appropriate grammars and then reload the entire bot.

I am in the process of writing a tutorial for AIML, and there is some basic documentation in the wiki on how to get started.

I'm not sure what you are referring to with "sort these data by the necessary categories".

K

@vladimirmyshkovski
Author

@keiffster,
Concerning the "sort these data by the necessary categories", I meant the following:

If the user asks a question for which there is no answer, I want to work out which category (i.e. which AIML file) the data belongs to. The bot should then ask a leading question to find out exactly what the person means, so that the next time it receives the same question, it has an answer.

And what about the fact that user data is lost when the bot restarts?
I need to store some of the data the user has told the bot, for example their name, date of birth, and any preferences that have been identified.
What is the best way to do this?

I understand that I need to record at least the session ID in a database and link the user data to that session, but I have not yet fully worked out how to do this.

I already store the session ID in a cookie on the user's side and pass it to the REST backend, but I do not understand what to do next.
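One way to approach this is to keep user predicates in a small table keyed by session ID. Below is a minimal sketch using Python's built-in `sqlite3`, assuming the session ID arrives from the cookie with each REST request. The table and function names are hypothetical, and nothing here is part of Program-Y.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a predicate store; pass a file path to keep data across restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predicates ("
        "session_id TEXT NOT NULL, name TEXT NOT NULL, value TEXT, "
        "PRIMARY KEY (session_id, name))"
    )
    return conn

def set_predicate(conn, session_id, name, value):
    # Upsert: replaces any existing value for this (session_id, name) pair
    conn.execute(
        "INSERT OR REPLACE INTO predicates (session_id, name, value) VALUES (?, ?, ?)",
        (session_id, name, value),
    )
    conn.commit()

def load_predicates(conn, session_id):
    """Return all stored predicates for one session as a dict."""
    rows = conn.execute(
        "SELECT name, value FROM predicates WHERE session_id = ?", (session_id,)
    )
    return dict(rows.fetchall())

conn = open_store()  # e.g. open_store("bot_users.db") for on-disk persistence
set_predicate(conn, "abc123", "name", "Alice")
set_predicate(conn, "abc123", "colour", "blue")
print(load_predicates(conn, "abc123")["name"])  # Alice
```

On each request you would load the predicates for the cookie's session ID before asking the bot, then write back any that changed afterwards. How they are handed to the bot itself is left open here.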

@keiffster
Owner

If a question is not understood, then part of the recursive tree descent is to look for more and more generic patterns. For example,
* can be used to match anything, and given the priority rules this will be the last rule checked.
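The fallback behaviour can be sketched as a depth-first search over a word tree. An exact word is tried before `*`, so a bare `*` category only answers when nothing more specific matches. This is a simplified Python illustration, not Program-Y's matcher. It handles only exact words and `*` (one or more words), and it ignores the other AIML 2.0 wildcards such as `_` and `^` and their priority rules. The function names and sample patterns are hypothetical.

```python
TEMPLATE = "__template__"  # key marking "a pattern ends here"

def add_pattern(tree, pattern, template):
    node = tree
    for word in pattern.upper().split():
        node = node.setdefault(word, {})
    node[TEMPLATE] = template

def match(node, words):
    """Depth-first search: exact words are tried before the * wildcard."""
    if not words:
        return node.get(TEMPLATE)
    head, rest = words[0], words[1:]
    # 1. An exact word match takes priority
    if head in node:
        result = match(node[head], rest)
        if result is not None:
            return result
    # 2. Fall back to *, which swallows one or more words
    if "*" in node:
        for i in range(1, len(words) + 1):
            result = match(node["*"], words[i:])
            if result is not None:
                return result
    return None

tree = {}
add_pattern(tree, "HELLO", "Hi there!")
add_pattern(tree, "MY NAME IS *", "Nice to meet you.")
add_pattern(tree, "*", "Sorry, I don't understand.")

print(match(tree, "MY NAME IS KEITH".split()))  # Nice to meet you.
print(match(tree, "WHAT IS AIML".split()))      # Sorry, I don't understand.
```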

You are correct: at this time there is no persistence once the app terminates.

The typical way in AIML to remember something is to use the tag that writes the knowledge as new rules, which are then reloaded at restart.
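The general idea of writing learned knowledge out as new rules, which are picked up again on restart, can be sketched in Python. This is a hypothetical illustration (`learn_to_file` and `load_categories` are invented names), not how Program-Y or the AIML spec implements it. It appends a `<category>` to an .aiml file with `xml.etree.ElementTree`, which also takes care of escaping characters such as `<` and `&`. It does no locking, so concurrent writers are not handled.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def learn_to_file(path, pattern, template):
    """Append a learned category to an AIML file so it survives a restart (sketch)."""
    if os.path.exists(path):
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        root = ET.Element("aiml", version="2.0")
        tree = ET.ElementTree(root)
    category = ET.SubElement(root, "category")
    ET.SubElement(category, "pattern").text = pattern.upper()
    ET.SubElement(category, "template").text = template
    tree.write(path, encoding="utf-8", xml_declaration=True)

def load_categories(path):
    """Read the categories back, as the bot would when it reloads at startup."""
    root = ET.parse(path).getroot()
    return {c.findtext("pattern"): c.findtext("template") for c in root.iter("category")}

path = os.path.join(tempfile.mkdtemp(), "learned.aiml")
learn_to_file(path, "what is my name", "Your name is Alice.")
print(load_categories(path))  # {'WHAT IS MY NAME': 'Your name is Alice.'}
```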

However, there is no session management within the app at this time. It's not something covered by the AIML 2.0 spec, but it is something I am looking at, and it may result in the introduction of some sort of session storage database.

@seghcder

I think conversation and var/predicate persistence might be good, but I think anything beyond that goes past the core bot framework. For us, much of the user information is stored in other systems, and we pull what we need from those systems in real time. There's nothing stopping someone writing an extension/service for such functions as an optional add-on :-)

Perhaps if we are just reloading the brain tree it could also reload session info, but a full reload of the XML should (by default) wipe everything and start from scratch. It would be good to have the option, which leads to my next point...

Can we do a partial refresh of AIML, sets, predicates, etc. without restarting the entire bot? That might require more fundamental changes, as I think every brain row would need to track the original source AIML/config it came from, in case we later want to refresh it. The idea is to reduce the number of changes that require a full "reboot". Perhaps something for the back-backlog... not urgent.
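A partial refresh along these lines could work if the brain remembers which file each category came from. The class below is a hypothetical toy (`Brain`, `load_file` and `reload_file` are invented names, not Program-Y classes). It keeps a source-to-patterns index so that one file's categories can be dropped and reloaded without touching the rest. It assumes each pattern is defined in only one file; a real implementation would need a rule for duplicates across files.

```python
from collections import defaultdict

class Brain:
    """Toy brain that remembers which source file each category came from."""

    def __init__(self):
        self.categories = {}               # pattern -> template
        self.by_source = defaultdict(set)  # source file -> patterns it defined

    def load_file(self, source, categories):
        """Load (pattern, template) pairs that were parsed from `source`."""
        for pattern, template in categories:
            self.categories[pattern] = template
            self.by_source[source].add(pattern)

    def reload_file(self, source, categories):
        """Partial refresh: drop only this file's categories, then load the new version."""
        for pattern in self.by_source.pop(source, set()):
            self.categories.pop(pattern, None)
        self.load_file(source, categories)

brain = Brain()
brain.load_file("greetings.aiml", [("HELLO", "Hi!"), ("HI", "Hello!")])
brain.load_file("weather.aiml", [("IS IT RAINING", "I can't see outside.")])
brain.reload_file("greetings.aiml", [("HELLO", "Hi there!")])
print(sorted(brain.categories))  # ['HELLO', 'IS IT RAINING']
```

The same index idea would extend to sets and predicates: anything tagged with its source can be refreshed on its own.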

@vladimirmyshkovski
Author

I do not know how it works yet, as I have not had time to figure it out, but they use MySQL:

https://www.program-o.com/

It supposedly allows you to store AIML in SQL.
One of the things I’ve done recently was to write an algorithm for handling tags that includes a “lookup” table that only contains the pattern and topic fields, along with the ID of the category in the main table. This has proven to be a HUGE performance improvement with categories that contain a lot of tags, in that the script just makes a (comparatively) simple SQL query, rather than going through the process of finding the “best match” (which is a rather convoluted process) every time. A prime example of such a category is the “BURGER WARS” category in the standard ALICE AIML set. Using SRAI lookup has improved response times for that category alone by over 70%.

The way it works is simple. When the script encounters such a tag, it performs a quick search like this:

SELECT ID FROM SRAI_LOOKUP WHERE PATTERN = '$pattern' AND TOPIC = '$topic';

If a match is returned, another direct query is made to the main table, based on the returned id, which is lightning fast, comparatively. The contents of the template field are then parsed into the current response.

If no match is found, a nearly identical query is made against the main table, and if a direct match is found, the located id is added (along with the pattern and topic) to the lookup table, for later use. If that query fails, then it’s handed back to the function that performs the full search, and the resulting info is once again added to the lookup table.

The algorithm is “smart” enough to leave out the topic if it hasn’t been set, which shaves off a little time, and leaves out the tag altogether, since 99.99% of all SRAI calls don’t require it. In fact, you could also say the same for the tag, but the possibility of there being a matching category within a given topic is a bit higher, so I decided to work with that one.

There is one drawback with this system, in that a new chatbot contains an empty lookup table, so while it’s “learning” all of the SRAI calls there’s about a 2-3% decrease in performance, but as the lookup table grows that performance hit is offset more and more by the improved response times gained through use of the lookup table.

@ideasean, @keiffster, what do you think about it?

@seghcder

seghcder commented Jul 22, 2017

For our requirements, we don't need a large AIML set, and the performance to date has been fine. The fact that Program-Y runs all-Python is a key feature for us. Adding a MySQL requirement opens up all sorts of issues for enterprise deployments (support, standards, which security zone the bot goes in, firewall rules, ...).

For a more general chatbot with a large AIML set, I can see things might require optimising. I tried loading The Professor demo bot but didn't get far. But then, I think for the bulk of its content I would push the query to Wikipedia or another knowledge API. Even a human at some point would say "have you heard of this thing called Google?"

If optimisation was required, I would perhaps suggest trying one of the caching or indexed data-frame python libraries (or other approaches). Given Python is one of the most popular big data languages (after R), I am sure there would be a way of speeding things up without the overhead of a full RDBMS.

I've also been reading Designing Bots. It makes an interesting point that a bot that responds instantly "feels weird" to many people. Some delay is a good thing (perhaps this could be a feature, Keith: a minimum response delay?) :-)
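A minimum response delay like the one suggested here could be as simple as the wrapper below. It is a hypothetical sketch: `respond_with_min_delay` is not an existing Program-Y option, and the 1.5 second default is an arbitrary example, not a figure from Designing Bots. It measures elapsed time with `time.monotonic()`, so the processing time counts towards the delay rather than being added on top.

```python
import time

def respond_with_min_delay(get_reply, question, min_delay=1.5):
    """Call get_reply(question) but never return sooner than min_delay seconds."""
    start = time.monotonic()
    reply = get_reply(question)
    elapsed = time.monotonic() - start
    if elapsed < min_delay:
        time.sleep(min_delay - elapsed)  # pad only the remaining time
    return reply

reply = respond_with_min_delay(lambda q: "Hello!", "hi", min_delay=0.2)
print(reply)  # Hello!
```

In an asyncio-based server you would `await asyncio.sleep(...)` instead, so one padded reply does not block other conversations.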

@vladimirmyshkovski
Author

vladimirmyshkovski commented Jul 22, 2017

The problem I personally see in all the AIML-based bots so far is that their files are static: the bots cannot learn during a conversation and, in effect, just read responses from files. This is a good option when the bot's predictability is important. The bot knows nothing beyond what I have written into the AIML files or what can be obtained from external services.

But when I need a bot that learns and becomes smarter from each conversation, adding to my knowledge library, none of the solutions I have found fit.

I really like Program-Y; it's the best I've found, especially since I mostly write in Python. However, I'm still working out how to do what I want with it.

Performance is more than fine, except that the Professor requires about 16 GB of RAM :) But it responds super fast.

@seghcder

Unfortunately users cannot always be trusted...

Microsoft silences its new A.I. bot Tay, after Twitter users teach it racism

I think Keith is doing a great job to make it as extensible as possible. Hopefully there is a mix of extensions and updates to core that will work. Perhaps break down some of your core update priorities and Keith can put them on the backlog. Sometimes backlog items seem to get done at the time I find I need them! :-)

@vladimirmyshkovski
Author

Yes, I agree: users almost always cannot be trusted to teach a bot.

But then how do you fill the bot with thousands of AIML files and brain files, so that it knows about colors, cars, moods, and so on?

But that is not even the main problem.

What if I want the bot to initiate a conversation? For example, if I tell it that I'm going to the parking lot to get the car, then when I come back the bot should ask how the trip to the parking lot went; or if I say that I'm sick, the bot should later ask about my health.

I am confused by the very technology of AIML. At first it seems simple and attractive, but then, to me personally, it seems like an imitation of AI.

@seghcder

All AI so far is an imitation of intelligence, just in varying degrees :-) Even Eliza is still considered an "artificial" intelligence.

AIML is generally a pattern based approach. What you are probably looking for is a mix between NLP and deep learning.
http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/

Facebook M, with millions behind it, is still only part of the way there:
https://www.technologyreview.com/s/604117/facebooks-perfect-impossible-chatbot/?utm_campaign=add_this&utm_source=twitter&utm_medium=post

Lebrun adds, “It’s so hard, and we make progress slowly, but I think we have everything we need.” He could be right, but you can also imagine someone who met Eliza in 1964 saying much the same thing.

I'm not implying you should give up… There is a ton of value that a bot interface can add quickly. It also forces developers to think harder about user interaction, which is a good thing. I think we are still a long way from a generally intelligent artificial intelligence. According to some, that's a good thing too :-)

@vladimirmyshkovski
Author

What do you think about https://github.com/bwilcox-1234/ChatScript?
It has a Python client and offers a lot of possibilities.

@seghcder

Unfortunately it doesn't support AIML. This was a factor for us, though your requirements may differ. Cleverbot also seems to have more of the learning capabilities you are after; however, it hasn't done so well at the Loebner Prize for the past few years.

@keiffster
Owner

Just an update on Professor. Yes, it is huge, close to 500k terms, but it also has over 80k problems. Another contributor is currently working through these, but it does highlight that there are a lot of broken AIML files out there.

A fix for Professor is close; huge thanks to michel, who has been working hard cleaning this up.


3 participants