PMML supports for choosing model when the condition is satisfied #110

Closed
liupei101 opened this issue Oct 12, 2018 · 7 comments

4 participants
@liupei101

commented Oct 12, 2018

Hi, contributors!
I have a workflow involving sklearn2pmml, which is listed below:

# Example
pipeline = PMMLPipeline([
	("classifier", DecisionTreeClassifier())
])
pipeline.fit(iris_df[iris_df.columns.difference(["Species"])], iris_df["Species"])
sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr = True)

# My workflow 
pipeline = PMMLPipeline([
       {
           "X['Widths'] > 20": ("classifier", DecisionTreeClassifier()),
           "X['Widths'] < 20": ("classifier", XGBClassifier())
       }
])
pipeline.fit(iris_df[iris_df.columns.difference(["Species"])], iris_df["Species"])
sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr = True)

I searched for the basic usage of sklearn2pmml and found that it can convert a trained model to PMML,
but I don't know how to implement my workflow!

Does sklearn2pmml support choosing a model depending on which condition is satisfied?

Thanks!

@vruusmann

Member

commented Oct 12, 2018

Does sklearn2pmml support choosing a model depending on which condition is satisfied?

Is your workflow valid Python/Scikit-Learn syntax in the first place?

PMML can represent it using the model segmentation approach:
http://dmg.org/pmml/v4-3/MultipleModels.html

In brief, there would be a top-level MiningModel element that contains a TreeModel and a MiningModel (the latter for XGBoost) as child elements. Each of the two segments is associated with a predicate that determines whether it is selected.

In JPMML-SkLearn/SkLearn2PMML this can be implemented by introducing a custom estimator class.

@vruusmann

Member

commented Oct 12, 2018

Pseudo-code about this custom estimator class usage:

pipeline = PMMLPipeline([
  ("classifier", ModelSelector([
    ("X['Widths'] >= 20", DecisionTreeClassifier()),
    ("X['Widths'] < 20", XGBClassifier()),
  ]))
])

I wonder how you would fit such a workflow? Is the goal to split the training dataset between two child models already during the training?

@liupei101

Author

commented Oct 12, 2018

First of all, thank you very much! I am sorry for not explaining my problem clearly.

In fact, I want to build a web application that predicts risk for patients. The application should serve two independent populations (for example, patients with and without an X-ray inspection) using two corresponding predictive models.

So I should follow the logic below (pseudo-code):

if the patient with X-ray inspection:
    # trainset: (train_X_with_xray, train_y_with_xray)
    # base estimator: XGBoost Classifier
    # fitted by training data involving variables related to the result of X-ray inspection.
    Model1 = model(...)
    # predict
    risk = Model1.predict()
else if the patient without X-ray inspection:
    # trainset: (train_X_without_xray, train_y_without_xray)
    # base estimator: XGBoost Classifier
    # fitted by training data not involving variables related to the result of X-ray inspection.
    Model2 = model(...)
    # predict
    risk = Model2.predict()

The problem is that I have to use a single PMML file that returns the result once the patient's information is passed to it. I do not want to use two PMML files (one for patients with an X-ray inspection, the other for patients without) combined with an if-else in JavaScript on the web front end.

@vruusmann Thanks for your pseudo-code showing how this custom estimator class would be used. I will look into ModelSelector further. If it is convenient for you, could you give some suggestions about the problem I am facing?

Thank you very much!

@vruusmann

Member

commented Oct 13, 2018

This custom class should actually be named ModelChoice, because the suffix "Selector" already has a special meaning in Scikit-Learn (feature selectors).

So, class ModelChoice should implement both fit() and predict() functionality:

  • During fit(), every member model is trained on the subset of the training dataset for which its predicate evaluates to True.
  • During predict(), the prediction is made by the first model whose predicate evaluates to True.
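
A minimal Python sketch of the fit()/predict() behaviour described in the two points above. It is not the sklearn2pmml implementation, and several things are assumptions made for illustration: rows are plain dicts, each predicate is a Python expression evaluated per row with the row bound to X, and MajorityClass is a hypothetical stand-in for a real classifier such as DecisionTreeClassifier or XGBClassifier.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical stand-in estimator: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ModelChoice:
    """Sketch of the proposed estimator: (predicate, model) pairs, first match wins."""

    def __init__(self, steps):
        self.steps = steps

    @staticmethod
    def _matches(predicate, row):
        # The predicate is a Python expression over a single row named X.
        # eval() is used for illustration only; never feed it untrusted input.
        return bool(eval(predicate, {"__builtins__": {}}, {"X": row}))

    def fit(self, X, y):
        for predicate, model in self.steps:
            # Train each member model only on the rows its predicate accepts.
            idx = [i for i, row in enumerate(X) if self._matches(predicate, row)]
            if idx:
                model.fit([X[i] for i in idx], [y[i] for i in idx])
        return self

    def predict(self, X):
        result = []
        for row in X:
            for predicate, model in self.steps:
                # Use the first model whose predicate evaluates to True.
                if self._matches(predicate, row):
                    result.append(model.predict([row])[0])
                    break
            else:
                # No predicate matched this row.
                result.append(None)
        return result


# Toy data (made up for this sketch).
X_train = [{"Widths": 25}, {"Widths": 30}, {"Widths": 10}, {"Widths": 5}]
y_train = ["high", "high", "low", "low"]

choice = ModelChoice([
    ("X['Widths'] >= 20", MajorityClass()),
    ("X['Widths'] < 20", MajorityClass()),
])
choice.fit(X_train, y_train)
predictions = choice.predict([{"Widths": 22}, {"Widths": 3}])
```

In a real pipeline the predicates would more likely be evaluated against a pandas DataFrame in one pass, and the same predicate strings would have to be translated into PMML predicates on the Java side.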

This solution wouldn't be too difficult to implement, because there is a reusable predicate translator component already available:
https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/javacc/predicate.jj

@liupei101 My schedule is pretty tight during the next week. If you want to speed things up, you could prototype the Python side of the ModelChoice class yourself.

@liupei101 liupei101 closed this Oct 15, 2018

@vruusmann

Member

commented Oct 15, 2018

Reopening, because this is interesting functionality that should be implemented.

@vruusmann vruusmann reopened this Oct 15, 2018

@guleatoma


commented Feb 28, 2019

Hey! I have the exact same issue. I tried to handle it through preprocessing and a RuleSet model, but couldn't make it work. Any update on this?

Thanks a lot.

@avogels


commented Mar 26, 2019

Hello, I would be very interested in this feature as well! Thanks and regards.

@vruusmann vruusmann closed this in 2649be1 Jun 20, 2019
