
How to set the batch size for prediction? #36

Closed
o0windseed0o opened this issue Jan 2, 2018 · 9 comments
@o0windseed0o

Hi all, I assume it's possible to set the training batch size to 100 and the prediction batch size to 10, right?
So I tried several prediction batch sizes (1, 10, 50, and 100) and got different results after predicting. This is binary classification using match_pyramid, predicting 42,155 samples in total:
size=1: numpy.core._internal.AxisError: axis 1 is out of bounds for array of dimension 1
size=10: outputs predictions for 42,142 samples
size=50: outputs predictions for 42,142 samples
size=100: outputs predictions for 42,092 samples
Does anyone know what went wrong?
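(For context on the size=1 failure: that error message is what NumPy produces when argmax is taken over axis 1 of a 1-D array. A minimal sketch, assuming the pipeline squeezes a size-1 batch of predictions down to one dimension before taking the per-sample argmax — the arrays here are made up for illustration:)

```python
import numpy as np

# With batch_size > 1, predictions are a 2-D array of class scores,
# and taking the argmax over axis 1 works as expected.
preds_2d = np.array([[0.3, 0.7], [0.9, 0.1]])
labels = np.argmax(preds_2d, axis=1)  # one predicted class per row

# If a size-1 batch gets squeezed to a 1-D array somewhere in the
# pipeline, the same call raises exactly the error quoted above:
# "axis 1 is out of bounds for array of dimension 1".
preds_1d = np.array([0.3, 0.7])
try:
    np.argmax(preds_1d, axis=1)
    raised = False
except Exception:  # NumPy raises an AxisError here
    raised = True
```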

@uduse
Member

uduse commented Jan 3, 2018

@o0windseed0o can you paste your complete config file here? It's probably an iteration-boundary bug somewhere in our code.

@o0windseed0o
Author

o0windseed0o commented Jan 3, 2018

@uduse Thanks for your reply! Please see the following.

{
  "net_name": "match_pyramid",
  "global":{
      "model_type": "PY",
      "weights_file": "examples/QA/weights/matchpyramid_classify.weights",
      "save_weights_iters": 10,
      "num_iters": 200,
      "display_interval": 10,
      "test_weights_iters": 200,
      "optimizer": "adam",
      "learning_rate": 0.0001
  },
  "inputs": {
    "share": {
        "text1_corpus": "./data/QA/corpus_preprocessed.txt",
        "text2_corpus": "./data/QA/corpus_preprocessed.txt",
        "use_dpool": true,
        "embed_size": 100,
        "train_embed": true,
        "vocab_size": 28780,
        "target_mode": "classification",
        "class_num": 2,
        "text1_maxlen": 25,
        "text2_maxlen": 50
    },
    "train": {
        "input_type": "PointGenerator", 
        "phase": "TRAIN",
        "use_iter": false,
        "query_per_iter": 20,
        "batch_per_iter": 5,
        "batch_size": 100,
        "relation_file": "./data/QA/relation_train.txt"
    },
    "valid": {
        "input_type": "PointGenerator", 
        "phase": "EVAL",
        "batch_size": 100,
        "relation_file": "./data/QA/relation_train.txt"
    },
    "test": {
        "input_type": "PointGenerator", 
        "phase": "EVAL",
        "batch_size": 100,
        "relation_file": "./data/QA/relation_test.txt"
    },
    "predict": {
        "input_type": "PointGenerator", 
        "phase": "PREDICT",
        "batch_size": 50,
        "relation_file": "./data/QA/relation_test.txt"
    }
  },
  "outputs": {
    "predict": {
      "save_format": "TEXTNET",
      "save_path": "predict.test.medqa_matchpyramid_classify.txt"
    }
  },
  "model": {
    "model_path": "matchzoo/models/",
    "model_py": "matchpyramid.MatchPyramid",
    "setting": {
        "kernel_count": 32, 
        "kernel_size": [3, 3], 
        "dpool_size": [3, 10],
        "dropout_rate": 0.5
    }
  },
  "losses": [
    {
       "object_name": "categorical_crossentropy",
       "object_params": {}
    }
  ],
  "metrics": [ "accuracy" ]
}

There are several parameters in the config file that I don't know how to set, such as query_per_iter and batch_per_iter. Are there any instructions or introductions on how to write config files?

As for the error, if it's not related to the config file, it might be in the batch-generation logic, since the missing samples are always the last ones.

@o0windseed0o
Author

Have you figured out what caused the problem? Or can anyone tell me which .py file I should check, related to the batch generator?

@uduse
Member

uduse commented Jan 8, 2018

@o0windseed0o I haven't had a chance to dive into the problem yet. You might want to look at the PointGenerator class.

@o0windseed0o
Author

@uduse I have checked the PointGenerator class, and I think the problem is around the while True loop in the get_batch_generator function: there is no handling for the leftover samples that don't fill a complete batch. Please take a look when you are free. Thank you!
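(The symptom does match a common generator pattern. A minimal sketch of the suspected behavior, under the assumption described above — this is illustrative code, not the actual MatchZoo implementation:)

```python
# Hypothetical sketch of the suspected bug: a generator that only walks
# over complete batches silently drops the trailing samples whenever
# len(data) is not divisible by batch_size.
def batches_dropping_tail(data, batch_size):
    n_full = len(data) // batch_size  # integer division: the tail is lost
    for i in range(n_full):
        yield data[i * batch_size:(i + 1) * batch_size]

# A possible fix: step through by batch_size and let the final slice be
# shorter than a full batch.
def batches_keeping_tail(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(105))
n_dropped = sum(len(b) for b in batches_dropping_tail(data, 50))  # 2 full batches only
n_kept = sum(len(b) for b in batches_keeping_tail(data, 50))      # all samples
```

With 105 samples and batch_size=50, the first version emits 100 predictions and the second emits all 105 — the same pattern as the missing-last-samples counts reported above.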

@bwanglzu
Member

Need to figure out whether a bug exists.

@Genie-Liu

Today I ran into the same situation: I have 5,000 prediction samples, but the output contains only 4,998 predictions. No matter how I change batch_size, the output is always 4,998. I later found that there are duplicate samples in my prediction set.

@o0windseed0o Not sure if you have duplicate samples as well.

@bwanglzu By the way, can the model handle the duplicate case?
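(If the generator or the relation-file loader keys samples by the (query, doc) id pair, duplicates would collapse into one entry, which would explain a count that shrinks independently of batch_size. A hedged sketch of that effect, with made-up ids:)

```python
# Hypothetical illustration: deduplicating (query_id, doc_id) pairs shrinks
# the sample count regardless of how they are later split into batches.
pairs = [("q1", "d1"), ("q1", "d2"), ("q2", "d1"), ("q1", "d1"), ("q2", "d1")]
unique_pairs = list(dict.fromkeys(pairs))  # order-preserving deduplication
# 5 input pairs collapse to 3 unique pairs, so 2 predictions go missing.
```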

@bwanglzu
Member

bwanglzu commented Jul 3, 2018

Apparently there's something wrong in the Generator; I guess @faneshion and @yangliuy are the right people to ask.

@Genie-Liu can you provide a bit more context?

@bwanglzu
Member

@faneshion any ideas?
