Message: Datafield/NeuralLayer? Issue with STATISTICA NN pmml. #13

Closed
kstawiski opened this issue Jul 15, 2016 · 7 comments

Comments

@kstawiski

kstawiski commented Jul 15, 2016

Hi again,
I'm struggling with a new issue, this time with a neural network model. Here is the problem:

konrad@gen:~$ curl -H 'Content-Type: application/json' -X POST -d '{"id":"Uw4Fz","arguments" : { "hsa-miR-1246":"0","hsa-miR-1307-5p":"0","hsa-miR-150-5p":"0","hsa-miR-1307-5p":"0","hsa-miR-16-2-3p":"0","hsa-miR-200a-3p":"0","hsa-miR-200c-3p":"0","hsa-miR-203a":"0","hsa-miR-23b-3p":"0","hsa-miR-29a-3p":"0","hsa-miR-30d-5p":"0","hsa-miR-320b":"0","hsa-miR-320c":"0","hsa-miR-320d":"0","hsa-miR-32-5p":"0","hsa-miR-335-5p":"0","hsa-miR-450b-5p":"0","hsa-miR-486-3p":"0","hsa-miR-92a-3p":"0"}}' http://localhost:8080/openscoring/model/NNCFS
{
  "message" : "DataField"
}

The model looks like this:

konrad@gen:~$ curl -X GET http://localhost:8080/openscoring/model/NNCFS
{
  "id" : "NNCFS",
  "miningFunction" : "classification",
  "summary" : "Neural network",
  "properties" : {
    "created.timestamp" : "2016-07-15T12:37:22.249+0000",
    "accessed.timestamp" : "2016-07-15T16:28:28.503+0000",
    "file.size" : 9830,
    "file.md5sum" : "ed5534fb3c8e3d120b99bfabb9f3f0d1"
  },
  "schema" : {
    "activeFields" : [ {
      "id" : "hsa-miR-16-2-3p",
      "opType" : "continuous"
    }, {
      "id" : "hsa-miR-200a-3p",
      "opType" : "continuous"
    }, {
      "id" : "hsa-miR-200c-3p",
      "opType" : "continuous"
    }, {
      "id" : "hsa-miR-320b",
      "opType" : "continuous"
    }, {
      "id" : "hsa-miR-320d",
      "opType" : "continuous"
    } ],
    "groupFields" : [ ],
    "targetFields" : [ {
      "id" : "2 group",
      "opType" : "categorical",
      "values" : [ "Cancer", "Controls+Borderline" ]
    } ],
    "outputFields" : [ ]
  }
}

I don't understand the error message or what the problem is.
Thank you very much in advance for help!
Konrad

@vruusmann
Member

The exception is also written to the Openscoring log file. What does it say?

I have reason to believe that your PMML file is invalid. More specifically, there's something wrong with the DataDictionary element (perhaps it does not contain any DataField child elements?). Could you share your PMML file?

Every activeField object should provide id, dataType and opType mappings:

"activeFields" : [
  {
    "id" : "hsa-miR-16-2-3p",
    "dataType" : "double",
    "opType" : "continuous"
  }
]

For some reason, there is no dataType mapping in your model response.

@kstawiski
Author

Thank you for your quick response. Here it is (I had to change the extension to txt):
NeuralNetwork.txt

@kstawiski
Author

I remembered my previous issue with STATISTICA (#4) and added dataType="" to all DataField elements. Now I get a new message:

{
"message" : "NeuralLayer"
}

kstawiski changed the title from "Message: Datafield?" to "Message: Datafield/NeuralLayer? Issue with STATISTICA NN pmml." on Jul 15, 2016
@vruusmann
Member

There's no point in adding an empty dataType="" attribute to all DataField elements.

You should add dataType="double" to continuous fields, and dataType="string" to categorical fields.
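
A quick way to see which fields are affected is to scan the file before uploading it. The following is only a minimal sketch that relies on the JDK's built-in DOM parser. The class name DataTypeCheck, the method name findUntypedFields and the sample document (including its PMML 4.2 namespace) are made up for illustration and are not part of Openscoring or JPMML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of any library
public class DataTypeCheck {

  // Returns the names of all DataField elements whose dataType attribute is missing or empty
  public static List<String> findUntypedFields(String pmmlXml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    Document document = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));

    List<String> result = new ArrayList<>();
    // Match on the local name only, so that the PMML schema version does not matter
    NodeList dataFields = document.getElementsByTagNameNS("*", "DataField");
    for(int i = 0; i < dataFields.getLength(); i++){
      Element dataField = (Element)dataFields.item(i);
      // getAttribute returns "" for an absent attribute, so this also catches dataType=""
      if(dataField.getAttribute("dataType").isEmpty()){
        result.add(dataField.getAttribute("name"));
      }
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    // Made-up sample document; the namespace URI is an assumption
    String pmml = "<PMML xmlns=\"http://www.dmg.org/PMML-4_2\"><DataDictionary>"
      + "<DataField name=\"hsa-miR-320b\" optype=\"continuous\"/>"
      + "<DataField name=\"2 group\" optype=\"categorical\" dataType=\"string\"/>"
      + "</DataDictionary></PMML>";
    System.out.println(findUntypedFields(pmml));
  }
}
```

Every name it reports needs dataType="double" or dataType="string", depending on the optype attribute of that field.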

@vruusmann
Member

I tried evaluating your NN model with the example data record. The exception message "NeuralLayer" is another InvalidFeatureException, which is raised because the second NeuralLayer doesn't specify the activationFunction attribute.

Am I correct to assume that both neural layers use the tanh activation function?

<NeuralLayer numberOfNeurons="2" activationFunction="tanh" normalizationMethod="softmax">

Anyway, after defining this attribute, the example data record evaluates to the following result:

{
  "id" : "Uw4Fz",
  "result" : {
    "2 group" : "Controls+Borderline"
  }
}

@vruusmann
Member

What should we do about STATISTICA neural network models then? Every time you train a new model, you must do the following:

  1. Add missing DataField@dataType attributes.
  2. Add missing NeuralLayer@activationFunction attributes.

This correction work could be automated by developing a special-purpose JPMML-Model Visitor class:

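// Package names below assume the JPMML-Model 1.2 series; adjust them for other versions
import org.dmg.pmml.ActivationFunctionType;
import org.dmg.pmml.DataField;
import org.dmg.pmml.DataType;
import org.dmg.pmml.NeuralLayer;
import org.dmg.pmml.OpType;
import org.dmg.pmml.VisitorAction;
import org.jpmml.model.visitors.AbstractVisitor;

public class StatisticaCorrector extends AbstractVisitor {

  // Fills in a missing DataField@dataType attribute based on the operational type
  @Override
  public VisitorAction visit(DataField dataField){
    OpType opType = dataField.getOpType();
    // Guard against a missing optype, because switching on null would throw an exception
    if(dataField.getDataType() == null && opType != null){
      switch(opType){
        case CONTINUOUS:
          dataField.setDataType(DataType.DOUBLE);
          break;
        case CATEGORICAL:
          dataField.setDataType(DataType.STRING);
          break;
        default:
          // Leave other operational types untouched
          break;
      }
    }
    return super.visit(dataField);
  }

  // Fills in a missing NeuralLayer@activationFunction attribute, assuming tanh
  @Override
  public VisitorAction visit(NeuralLayer neuralLayer){
    if(neuralLayer.getActivationFunction() == null){
      neuralLayer.setActivationFunction(ActivationFunctionType.TANH);
    }
    return super.visit(neuralLayer);
  }
}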

If you register this JPMML-Model Visitor class in the Openscoring configuration file, then all uploaded STATISTICA neural network models will be corrected automatically.
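
If registering a Visitor is not an option, the same two corrections could in principle be applied to the PMML file itself before uploading. The following is a rough sketch that uses only JDK XML APIs. The class name StatisticaPmmlPatcher, its method names and the sample document are hypothetical, and the tanh default is the assumption discussed in this thread, not something the file itself states.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library
public class StatisticaPmmlPatcher {

  public static Document parse(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
  }

  // Applies both corrections in place and returns the number of attributes that were set
  public static int patch(Document document){
    int count = 0;
    NodeList dataFields = document.getElementsByTagNameNS("*", "DataField");
    for(int i = 0; i < dataFields.getLength(); i++){
      Element dataField = (Element)dataFields.item(i);
      // A missing attribute reads as "", so dataType="" is corrected as well
      if(dataField.getAttribute("dataType").isEmpty()){
        String opType = dataField.getAttribute("optype");
        if("continuous".equals(opType)){
          dataField.setAttribute("dataType", "double");
          count++;
        } else if("categorical".equals(opType)){
          dataField.setAttribute("dataType", "string");
          count++;
        }
      }
    }
    NodeList neuralLayers = document.getElementsByTagNameNS("*", "NeuralLayer");
    for(int i = 0; i < neuralLayers.getLength(); i++){
      Element neuralLayer = (Element)neuralLayers.item(i);
      if(neuralLayer.getAttribute("activationFunction").isEmpty()){
        // Assumption from the thread: every layer uses tanh
        neuralLayer.setAttribute("activationFunction", "tanh");
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws Exception {
    // Made-up sample document; a real STATISTICA export contains much more
    Document document = parse("<PMML><DataDictionary>"
      + "<DataField name=\"hsa-miR-320b\" optype=\"continuous\"/>"
      + "</DataDictionary><NeuralNetwork><NeuralLayer numberOfNeurons=\"2\"/></NeuralNetwork></PMML>");
    System.err.println("Attributes set: " + patch(document));
    // Write the corrected document to standard output
    TransformerFactory.newInstance().newTransformer()
      .transform(new DOMSource(document), new StreamResult(System.out));
  }
}
```

Unlike the Visitor, this sketch works on raw XML and knows nothing about the PMML schema, so the patched file should still be checked by uploading it and scoring a known data record.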

@kstawiski
Author

Thank you. All is working now.
