New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Idea for a simple rule based classifier #54
Comments
I'm wondering if it makes sense to make a distinction between rule-based systems. In my mind, there are two kinds of systems possible when we consider trees. Case WhenTree WhenDifferencesBoth systems are trees. However, the
I'm curious, what's your take on this? There's something to be said to make a distinction between these two classes of trees at the user-interface level. But I'm curious if I'm missing something. A minor comment: your operators ('=', '<>', '<', '>', '<=', '>=') seem sound, but should we perhaps add a |
I guess technically one can be transformed to the other and I also have some code that spits out both representations and also a third one from any decision tree. So it boils down to user interface. While both would be possible even there, I do not see non-technical people being able to create what you describe as "tree when". The third representation I was talking might be worth considering, though. It goes like: like case-when but have all the possible rules that will lead to a certain prediction batched up at one place and not scattered around.
in could be expressed with a combination of terms, but I also like the in as a convenience operator. |
Yeah, the underlying implementation should certainly be done by a parent class. But a long-term plan for this library is to add a user-interface so that folks may more easily declare rules. With that in mind, I might prefer child classes that might make a distinction.
Is it possible to make a conceptual drawing of this? Jjust make sure we'll be talking about the same thing, I often find pictures say more than words. |
I meant, all three representations are equivalent, so internally it does not matter how we store the rules.
my drawings tend to suck, but I tried: |
Have you thought about a pythonic API to declare these rules? I tried doing something like this before. clf = (Rules(dataf=df)
.casewhen(lambda d: d['age'] < 16 & d['income'], "risk")
.casewhen(lambda d: d['n_accounts'] >= 10, "risk")) This kind of works in python. It's relatively clear to write but it's tricky to get it into a nice set of serialized rules because of all the lambdas. We could also say, "let's just assume shallow sklearn trees for now and see if we can get that translated into SQL first". But I worry things will get nitty-gritty fast with all the SQL variants out there. |
First thought: wouldn't this be somewhat inside of the fit method? Seems like sklearn has some thoughts about where a model learns, and this would be in fit, no? Having Python Code for rules sounds off to me: If the author of rules can write Python, why noy just let them write arbitrary Python Code? Also, as I mentioned: it is pretty straight forward to translate shallow rules in deep ones and the other way around, so I am sure we can generate SQL from any sort of tree representation. |
Just wondering Vincent, what tool do you use to make your awesomely simple drawings? |
A lot of it is screenbrush. |
I thought about wipping something like this together, but then remembered to google first and found these libraries: The library lets you define variables that business rules can act on, and the potential actions that can be taken, and then https://github.com/venmo/business-rules They then have this simple UI to generate the JSON files: https://github.com/venmo/business-rules-ui I was thinking of something similar but then having BusinessRules defined as python classes and then a RulesEngine that is scikit-learn compatible that consists of a collection of BusinessRules (either a list or a dictionary to define a tree like structure). You could then both export this to ( Then you would have to develop a UI on top of that to make it truly user friendly. |
So I made a quick demo here: https://github.com/oegedijk/rule_estimator It is a slightly different approach as the user would have to define classes instead of functions and then wrapping those functions in estimators. For now you can define simple gt/ge/lt/le |
@oegedijk there's some user interface elements that I am experimenting with in line with that you're suggesting, but with a slightly different vantage point. A first demo can be found as part of my csvconf talk. It starts at 40:00. I think the problem isn't that we're not capable of translating casewhen-style domain rules into python. That's a syntax problem and that's solved. I think the problem lies more in the user interface, which is in-line with the ui demo, but it's not just the declaration of rules. It's two main issues:
I've got some ideas in this realm as well as small local demos, but nothing is ready for prime time just yet. There is one demo live though that I've made on behalf of my employer, Rasa, in case you're interested (check out the Bulk Labelling demo). |
Closing due to radio silence. |
Ideas for a rule based classifier after discussion with
This classifier would not have the full power of Python, but is rather a collection of rules entered by domain experts who are not necessarily technical people.
Rules
Rules have no structure and are always interpreted as disjunctions (or) and can be composed of conjunctions (and). To resolve conflict they can have a simple priority field.
Format of the rules could be
Examples
Rules need not be expressed as plain text, but also a structured format of nested lists/arrays. A parser for a text format like this would be possible with a very simple recursive descent parser.
API
Debugging support for plotting pairwise decision boundaries would be helpful.
The text was updated successfully, but these errors were encountered: