KeyError: 'item_id_char' #105

Closed
goldenrati0 opened this issue Feb 16, 2019 · 1 comment
@goldenrati0

Hello,

My small_dataset.csv looks like this:

user_id,item_id,dow,hod,category_id,cusine_id,restaurant_id,item_count
-6088859261566846925,601d4dc5-2ad,7,3,9c64431e-79f,fb8b5b7f-013,706247b3-86e,1
-4662982070704311223,af48a23f-f80,7,2,a989cb37-db5,8631ea96-a78,ea90883c-a2f,1
-4849981472186386714,bab8cfae-930,1,1,4f31b9ea-059,5aa1a680-942,d9ab8a13-380,1
-5400975689533971605,3ae9d553-282,4,0,37d510f8-745,c9aa3993-86f,0b8d021a-c5c,1
4810905585891548302,3d673534-8f8,3,1,4264559d-470,5aa1a680-942,5772716f-1f3,1
8523163246689801258,f5cab6a9-8f5,2,3,60db56cd-21e,2d470ebd-e46,4e4c60a4-bca,1
6141444640923780397,e1d46baf-89c,6,17,b7226a64-569,5aa1a680-942,910396af-1dd,1
7782246672655206426,617f0d50-e63,2,23,68f8231d-6e1,8631ea96-a78,344071ad-459,1
-3808793554476278979,56903221-9d1,5,5,79803e34-c64,e153fa17-ac6,5746c349-5e1,1

My model_definition looks like this (I defined it programmatically):

INPUT_COLUMN_NAMES = "user_id, item_id, dow, hod, category_id, cusine_id, restaurant_id, item_count".split(", ")
NUMERIC_COLUMN_NAMES = ["dow", "hod", "item_count"]
OUTPUT_COLUMN_NAMES = "item_id, category_id, restaurant_id".split(", ")

model_definition = {
    "input_features": [
        {
            "name": col_name,
            "type": "numerical" if col_name in NUMERIC_COLUMN_NAMES else "text",
            "encoder": "parallel_cnn"
        } for col_name in INPUT_COLUMN_NAMES
    ],
    "output_features": [
        {
            "name": col_name,
            "type": "numerical" if col_name in NUMERIC_COLUMN_NAMES else "text"
        } for col_name in OUTPUT_COLUMN_NAMES
    ],
    "training": {
        "epochs": 10
    }
}

When I run the train() function, I get the error KeyError: 'item_id_char'.
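
Roughly, the training call looks like this (a simplified sketch; the real code wraps the model in a small class, as the traceback below shows, and assumes small_dataset.csv is read into a pandas dataframe):

import logging

import pandas as pd
from ludwig.api import LudwigModel

# Load the CSV shown above and train on the in-memory dataframe.
data_frame = pd.read_csv("small_dataset.csv")

model = LudwigModel(model_definition)
model.train(
    data_df=data_frame,
    logging_level=logging.INFO,
    output_directory="../ludwig_model"
)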

Log

INFO:root:Model name: run
INFO:root:Output path: ../ludwig_model\_run_5
INFO:root:

INFO:root:ludwig_version: '0.1.0'
INFO:root:command: 'D:/playground/ml-playground/ludwig-test/cnn.py'
INFO:root:dataset_type: 'generic'
INFO:root:random_seed: 42
INFO:root:model_definition: {   'combiner': {'type': 'concat'},
    'input_features': [   {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'user_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'item_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'name': 'dow',
                              'tied_weights': None,
                              'type': 'numerical'},
                          {   'encoder': 'parallel_cnn',
                              'name': 'hod',
                              'tied_weights': None,
                              'type': 'numerical'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'category_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'cusine_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'restaurant_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'name': 'item_count',
                              'tied_weights': None,
                              'type': 'numerical'}],
    'output_features': [   {   'decoder': 'generator',
                               'dependencies': [],
                               'level': 'char',
                               'loss': {   'class_distance_temperature': 0,
                                           'class_weights': 1,
                                           'type': 'softmax_cross_entropy',
                                           'weight': 1},
                               'name': 'item_id',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'type': 'text',
                               'weight': 1},
                           {   'decoder': 'generator',
                               'dependencies': [],
                               'level': 'char',
                               'loss': {   'class_distance_temperature': 0,
                                           'class_weights': 1,
                                           'type': 'softmax_cross_entropy',
                                           'weight': 1},
                               'name': 'category_id',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'type': 'text',
                               'weight': 1},
                           {   'decoder': 'generator',
                               'dependencies': [],
                               'level': 'char',
                               'loss': {   'class_distance_temperature': 0,
                                           'class_weights': 1,
                                           'type': 'softmax_cross_entropy',
                                           'weight': 1},
                               'name': 'restaurant_id',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'type': 'text',
                               'weight': 1}],
    'preprocessing': {   'bag': {   'fill_value': '',
                                    'format': 'space',
                                    'lowercase': 10000,
                                    'missing_value_strategy': 'fill_with_const',
                                    'most_common': False},
                         'binary': {   'fill_value': 0,
                                       'missing_value_strategy': 'fill_with_const'},
                         'category': {   'fill_value': '<UNK>',
                                         'lowercase': False,
                                         'missing_value_strategy': 'fill_with_const',
                                         'most_common': 10000},
                         'force_split': False,
                         'image': {'missing_value_strategy': 'backfill'},
                         'numerical': {   'fill_value': 0,
                                          'missing_value_strategy': 'fill_with_const'},
                         'sequence': {   'fill_value': '',
                                         'format': 'space',
                                         'lowercase': False,
                                         'missing_value_strategy': 'fill_with_const',
                                         'most_common': 20000,
                                         'padding': 'right',
                                         'padding_symbol': '<PAD>',
                                         'sequence_length_limit': 256,
                                         'unknown_symbol': '<UNK>'},
                         'set': {   'fill_value': '',
                                    'format': 'space',
                                    'lowercase': False,
                                    'missing_value_strategy': 'fill_with_const',
                                    'most_common': 10000},
                         'split_probabilities': (0.7, 0.1, 0.2),
                         'stratify': None,
                         'text': {   'char_format': 'characters',
                                     'char_most_common': 70,
                                     'char_sequence_length_limit': 1024,
                                     'fill_value': '',
                                     'lowercase': True,
                                     'missing_value_strategy': 'fill_with_const',
                                     'padding': 'right',
                                     'padding_symbol': '<PAD>',
                                     'unknown_symbol': '<UNK>',
                                     'word_format': 'space_punct',
                                     'word_most_common': 20000,
                                     'word_sequence_length_limit': 256},
                         'timeseries': {   'fill_value': '',
                                           'format': 'space',
                                           'missing_value_strategy': 'fill_with_const',
                                           'padding': 'right',
                                           'padding_value': 0,
                                           'timeseries_length_limit': 256}},
    'training': {   'batch_size': 128,
                    'bucketing_field': None,
                    'decay': False,
                    'decay_rate': 0.96,
                    'decay_steps': 10000,
                    'dropout_rate': 0.0,
                    'early_stop': 3,
                    'epochs': 50,
                    'gradient_clipping': None,
                    'increase_batch_size_on_plateau': 0,
                    'increase_batch_size_on_plateau_max': 512,
                    'increase_batch_size_on_plateau_patience': 5,
                    'increase_batch_size_on_plateau_rate': 2,
                    'learning_rate': 0.001,
                    'learning_rate_warmup_epochs': 5,
                    'optimizer': {   'beta1': 0.9,
                                     'beta2': 0.999,
                                     'epsilon': 1e-08,
                                     'type': 'adam'},
                    'reduce_learning_rate_on_plateau': 0,
                    'reduce_learning_rate_on_plateau_patience': 5,
                    'reduce_learning_rate_on_plateau_rate': 0.5,
                    'regularization_lambda': 0,
                    'regularizer': 'l2',
                    'staircase': False,
                    'validation_field': 'combined',
                    'validation_measure': 'loss'}}
INFO:root:

INFO:root:Using full dataframe
INFO:root:Building dataset (it may take a while)
INFO:root:Loading NLP pipeline
D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\features\numerical_feature.py:63: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  np.float32).as_matrix()
INFO:root:Writing train set metadata with vocabulary
Traceback (most recent call last):
  File "D:/playground/ml-playground/ludwig-test/cnn.py", line 27, in <module>
    main()
  File "D:/playground/ml-playground/ludwig-test/cnn.py", line 23, in main
    train(model)
  File "D:/playground/ml-playground/ludwig-test/cnn.py", line 13, in train
    model.train()
  File "D:\playground\ml-playground\ludwig-test\model.py", line 63, in train
    self.model.train(data_df=self.data_frame, logging_level=logging.INFO, output_directory="../ludwig_model")
  File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\api.py", line 448, in train
    random_seed=random_seed)
  File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\data\preprocessing.py", line 561, in preprocess_for_training
    [training_set, validation_set, test_set]
  File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\data\preprocessing.py", line 769, in replace_text_feature_level
    feature['level']
KeyError: 'item_id_char'
@w4nderlust
Collaborator

Please try using the latest code from master, as the issue was already solved by commit 1820ef5 and exposed by Issue #56. Feel free to reopen if the problem persists.
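
If you prefer not to clone the repository, installing master directly with pip should also pick up the fix (assuming the uber/ludwig repository URL):

pip install git+https://github.com/uber/ludwig.git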
