Missing field when trying to score lightgbm model #48

Closed
jwgwalton opened this issue Apr 16, 2019 · 9 comments

@jwgwalton

jwgwalton commented Apr 16, 2019

Hi, I've used https://github.com/jpmml/jpmml-lightgbm to generate a PMML file from a LightGBM model:
lightgbm.txt
lightgbm.pmml.txt

When I load this into the server, I get the following response:

curl -X PUT --data-binary @lightgbm.pmml -H "Content-type: text/xml" http://localhost:8080/openscoring/model/lightgbm
{
  "id" : "lightgbm",
  "miningFunction" : "classification",
  "summary" : "Ensemble model",
  "properties" : {
    "created.timestamp" : "2019-04-16T07:24:15.570+0000",
    "accessed.timestamp" : null,
    "file.size" : 613912,
    "file.md5sum" : "b00ce9c05625965800699ab94da460ab"
  },
  "schema" : {
    "inputFields" : [ {
      "id" : "region",
      "dataType" : "double",
      "opType" : "continuous",
      "values" : [ "[0.0, 61.0]" ]
    }, {
      "id" : "site_group",
      "dataType" : "double",
      "opType" : "continuous",
      "values" : [ "[0.0, 178.0]" ]
    }, {
      "id" : "clean_title_cos_sim_keywords_string",
      "dataType" : "double",
      "opType" : "continuous",
      "values" : [ "[-1.0, 1.0]" ]
    }, {
      "id" : "clean_title_cos_sim_client_id",
      "dataType" : "double",
      "opType" : "continuous",
      "values" : [ "[-1.0, 1.0]" ]
    }, {
      "id" : "clean_description_cos_sim_keywords_string",
      "dataType" : "double",
      "opType" : "continuous",
      "values" : [ "[-1.0, 1.0]" ]
    }, {
      "id" : "clean_description_cos_sim_client_id",
      "dataType" : "double",
      "opType" : "continuous",
      "values" : [ "[-1.0, 0.726566731929779]" ]
    }, {
      "id" : "client_relevancy",
      "dataType" : "double",
      "opType" : "continuous",
      "values" : [ "[-1.0, 1.0]" ]
    } ],
    "targetFields" : [ {
      "id" : "_target",
      "dataType" : "integer",
      "opType" : "categorical",
      "values" : [ "0", "1" ]
    } ],
    "outputFields" : [ {
      "id" : "probability(0)",
      "dataType" : "double",
      "opType" : "continuous"
    }, {
      "id" : "probability(1)",
      "dataType" : "double",
      "opType" : "continuous"
    } ]
  }
}

However, when I try to test this, I get the following error:

curl -X POST --data-binary @lightgbm_request.json -H "Content-type: application/json" http://localhost:8080/openscoring/model/lightgbm
{
  "message" : "Field \"transformedLgbmValue\" is not defined"
}

with the following request JSON:

{
        "id": "1",
        "arguments": {
                "region": 58.0,
                "site_group": 10.0,
                "clean_title_cos_sim_keywords_string": 0.5951485633850098,
                "clean_title_cos_sim_client_id": 0.04875922203063965,
                "clean_description_cos_sim_keywords_string": 0.46828553080558777,
                "clean_description_cos_sim_client_id": 0.1009560078382492,
                "client_relevancy": 0.64421546459198
        }
}

Based on the returned model schema, I don't understand why I can't score this. Looking through the PMML file, there is a transformedLgbmValue field, but it isn't among the expected inputFields.

@jwgwalton
Author

@vruusmann hopefully you'll be able to help me. I generated PMML files for the models in the jpmml-lightgbm package to compare with my model, and some of them contain "transformedLgbmValue". However, I'm struggling to get this to work with the openscoring-service. Any help would be much appreciated, cheers.

@vruusmann
Member

It's interesting that the same (and possibly very major) issue gets reported twice by two different people in such a short timeframe.

Issue 47 is about XGBoost, whereas this issue is about LightGBM. However, the exception condition is exactly the same - the second stage of a GBT model (a RegressionModel element that performs the logit transformation) thinks that the first stage of the GBT model (a MiningModel element) returned a missing value as a prediction.
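
To make the two-stage structure concrete, here is a toy sketch in Python. It is not JPMML-Evaluator or Openscoring code: the function names, the `MissingFieldError` class, and the per-tree scores are all made up for illustration, and the second stage is assumed to apply the logistic function 1 / (1 + exp(-x)) to the first-stage value.

```python
import math


class MissingFieldError(Exception):
    """Hypothetical stand-in for the evaluator's missing-field exception."""


def first_stage(tree_outputs):
    # Toy ensemble: sum the per-tree scores.
    # A None entry models a tree that could not produce a prediction.
    if any(value is None for value in tree_outputs):
        return None
    return sum(tree_outputs)


def second_stage(fields, name="transformedLgbmValue"):
    # The second stage looks up the first-stage result by field name.
    # If that field is absent (or missing), it cannot continue.
    if fields.get(name) is None:
        raise MissingFieldError(f'Field "{name}" is not defined')
    return 1.0 / (1.0 + math.exp(-fields[name]))


raw = first_stage([0.25, -0.75, 1.0])  # made-up tree scores
print(second_stage({"transformedLgbmValue": raw}))

try:
    second_stage({})  # the intermediate field never reached stage two
except MissingFieldError as err:
    print(err)
```

The second call shows the failure mode in miniature: the error does not say which input was missing, only that the intermediate field was not available when the second stage asked for it.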

This issue provides a fully reproducible example, so I can observe this exceptional behaviour myself - thanks for that!

I'm puzzled right now. In most cases this exception means that the first stage of the GBT model was executed with incomplete input. But I can see that in this case the argument data record is complete (contains values for all seven input fields).
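
A completeness check like this can be scripted. The sketch below assumes the model schema and the request have the JSON shapes shown earlier in this issue (`schema.inputFields[].id` and `arguments`); the helper name `missing_arguments` is hypothetical, and the inline documents are abbreviated to two fields.

```python
import json


def missing_arguments(model_response, request):
    """Return the input field ids that the request omits or sets to null."""
    expected = [field["id"] for field in model_response["schema"]["inputFields"]]
    arguments = request.get("arguments", {})
    return [name for name in expected if arguments.get(name) is None]


# Abbreviated copies of the model schema and request from this issue.
model_response = json.loads(
    '{"schema": {"inputFields": [{"id": "region"}, {"id": "site_group"}]}}'
)
request = json.loads(
    '{"id": "1", "arguments": {"region": 58.0, "site_group": 10.0}}'
)

print(missing_arguments(model_response, request))  # → []
```

An empty list means every declared input field has a non-null value in the request.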

@vruusmann
Member

The most puzzling part for me is that if I download the latest 1.4.8 release of the JPMML-Evaluator command-line application, and convert the JSON request to a CSV file request, then the prediction succeeds without problem:

$ java -jar pmml-evaluator-example-executable-1.4.8.jar --model lightgbm.pmml --input input.csv --output output.csv

If there was a problem with the PMML file, or the 1.4.8 version of the JPMML-Evaluator library, then the above execution should fail with the same org.jpmml.evaluator.MissingFieldException: Field "transformedLgbmValue" is not defined message. But the prediction is successful - this suggests that there's something wrong with the Openscoring REST web service layer.
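
For anyone who wants to repeat the JSON-to-CSV conversion, a minimal Python sketch follows. It assumes the CSV layout is a header row of field names followed by one data row; the helper name `request_to_csv` is hypothetical, and the request is abbreviated to two of the seven arguments.

```python
import csv
import io
import json


def request_to_csv(request_json):
    """Turn the "arguments" object of a scoring request into a one-row CSV string."""
    arguments = json.loads(request_json)["arguments"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(arguments), lineterminator="\n")
    writer.writeheader()
    writer.writerow(arguments)
    return buffer.getvalue()


# Abbreviated request; the real one carries all seven input fields.
request_json = '{"id": "1", "arguments": {"region": 58.0, "site_group": 10.0}}'
print(request_to_csv(request_json))
```

The request "id" is dropped, since only the argument values are needed as model input.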

input.csv.txt
output.csv.txt

It's worth pointing out that issue 47 reports that the latest 1.4.X release of Openscoring works fine. It's the 2.0-SNAPSHOT codebase (i.e. a git clone) that is broken.

@vruusmann
Member

On line 60 of the example lightgbm.pmml.txt file there is the following OutputField element declaration:

<OutputField name="transformedLgbmValue" optype="continuous" dataType="double" feature="transformedValue" isFinalResult="false">

If the value of the OutputField@isFinalResult attribute is changed from false to true, then the scoring works fine:

<OutputField name="transformedLgbmValue" optype="continuous" dataType="double" feature="transformedValue" isFinalResult="true">

This attribute value was toggled in one of the latest JPMML-LightGBM (as well as JPMML-XGBoost) releases. Both false and true values are valid in that location. However, it looks like Openscoring is using an output field filtering strategy that prunes non-final output field values a bit too aggressively.
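
The attribute change described above can be applied with a short script instead of a manual edit. This is a sketch using Python's standard `xml.etree.ElementTree`; the helper name `mark_final` is hypothetical, and the inline PMML fragment (including its namespace URI) is a cut-down stand-in for the real file. Note that ElementTree drops comments and the XML declaration when re-serializing.

```python
import xml.etree.ElementTree as ET


def mark_final(pmml_text, field_name):
    """Set isFinalResult="true" on matching OutputField elements.

    Returns the updated XML text and the number of elements changed.
    """
    root = ET.fromstring(pmml_text)
    # Keep the document's default namespace unprefixed when serializing.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    changed = 0
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "OutputField" and element.get("name") == field_name:
            if element.get("isFinalResult") == "false":
                element.set("isFinalResult", "true")
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Cut-down stand-in for lightgbm.pmml; only the relevant element is kept.
sample = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_3"><Output>'
    '<OutputField name="transformedLgbmValue" optype="continuous" '
    'dataType="double" feature="transformedValue" isFinalResult="false"/>'
    '</Output></PMML>'
)
updated, count = mark_final(sample, "transformedLgbmValue")
print(count)
print(updated)
```

For a GradientBoostingClassifier model the field name would differ (for example the expDecisionFunction(0) field mentioned later in this thread), so the name is passed in rather than hard-coded.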

@xiaoluoyfy

@vruusmann So the current workaround is to modify the XGBoost/LightGBM PMML file manually?

@jwgwalton
Author

Cheers @vruusmann

@sam-s

sam-s commented May 3, 2019

I am getting Field "expDecisionFunction(0)" is not defined for an iris.pmml created from GradientBoostingClassifier by sklearn2pmml.
Is this related or should I create a separate issue?

@vruusmann
Member

@sam-s If you're absolutely sure that your input data record is complete (i.e. there are no missing/omitted values), then it's the same thing.

@vruusmann
Member

It's possible to disable output field filtering by commenting out this line (invocation of ModelEvaluatorBuilder#setOutputFilter(OutputFilters.KEEP_FINAL_RESULTS)):
https://github.com/openscoring/openscoring/blob/master/openscoring-service/src/main/java/org/openscoring/service/Openscoring.java#L195

This issue only affects the WIP codebase. If you've bothered to build a WIP version manually, then my recommendation is that you should comment out the above line, and rebuild.
