KeyError: 'item_id_char' #105

Closed
goldenrati0 opened this issue Feb 16, 2019 · 1 comment
@goldenrati0

Hello,

My small_dataset.csv looks like this:

user_id,item_id,dow,hod,category_id,cusine_id,restaurant_id,item_count
-6088859261566846925,601d4dc5-2ad,7,3,9c64431e-79f,fb8b5b7f-013,706247b3-86e,1
-4662982070704311223,af48a23f-f80,7,2,a989cb37-db5,8631ea96-a78,ea90883c-a2f,1
-4849981472186386714,bab8cfae-930,1,1,4f31b9ea-059,5aa1a680-942,d9ab8a13-380,1
-5400975689533971605,3ae9d553-282,4,0,37d510f8-745,c9aa3993-86f,0b8d021a-c5c,1
4810905585891548302,3d673534-8f8,3,1,4264559d-470,5aa1a680-942,5772716f-1f3,1
8523163246689801258,f5cab6a9-8f5,2,3,60db56cd-21e,2d470ebd-e46,4e4c60a4-bca,1
6141444640923780397,e1d46baf-89c,6,17,b7226a64-569,5aa1a680-942,910396af-1dd,1
7782246672655206426,617f0d50-e63,2,23,68f8231d-6e1,8631ea96-a78,344071ad-459,1
-3808793554476278979,56903221-9d1,5,5,79803e34-c64,e153fa17-ac6,5746c349-5e1,1

My model_definition looks like this (I defined it programmatically):

INPUT_COLUMN_NAMES = "user_id, item_id, dow, hod, category_id, cusine_id, restaurant_id, item_count".split(", ")
NUMERIC_COLUMN_NAMES = ["dow", "hod", "item_count"]
OUTPUT_COLUMN_NAMES = "item_id, category_id, restaurant_id".split(", ")

model_definition = {
    "input_features": [
        {
            "name": col_name,
            "type": "numerical" if col_name in NUMERIC_COLUMN_NAMES else "text",
            "encoder": "parallel_cnn"
        } for col_name in INPUT_COLUMN_NAMES
    ],
    "output_features": [
        {
            "name": col_name,
            "type": "numerical" if col_name in NUMERIC_COLUMN_NAMES else "text"
        } for col_name in OUTPUT_COLUMN_NAMES
    ],
    "training": {
        "epochs": 10
    }
}

When I run the train() function, I get the error KeyError: 'item_id_char'.
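
Roughly, the training call looks like this (a simplified sketch; the real code wraps the model in a small class, as the traceback below shows, and assumes small_dataset.csv is read into a pandas dataframe):

import logging

import pandas as pd
from ludwig.api import LudwigModel

# Load the CSV shown above and train on the in-memory dataframe.
data_frame = pd.read_csv("small_dataset.csv")

model = LudwigModel(model_definition)
model.train(
    data_df=data_frame,
    logging_level=logging.INFO,
    output_directory="../ludwig_model"
)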

Log

INFO:root:Model name: run
INFO:root:Output path: ../ludwig_model\_run_5
INFO:root:

INFO:root:ludwig_version: '0.1.0'
INFO:root:command: 'D:/playground/ml-playground/ludwig-test/cnn.py'
INFO:root:dataset_type: 'generic'
INFO:root:random_seed: 42
INFO:root:model_definition: {   'combiner': {'type': 'concat'},
    'input_features': [   {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'user_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'item_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'name': 'dow',
                              'tied_weights': None,
                              'type': 'numerical'},
                          {   'encoder': 'parallel_cnn',
                              'name': 'hod',
                              'tied_weights': None,
                              'type': 'numerical'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'category_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'cusine_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'restaurant_id',
                              'tied_weights': None,
                              'type': 'text'},
                          {   'encoder': 'parallel_cnn',
                              'name': 'item_count',
                              'tied_weights': None,
                              'type': 'numerical'}],
    'output_features': [   {   'decoder': 'generator',
                               'dependencies': [],
                               'level': 'char',
                               'loss': {   'class_distance_temperature': 0,
                                           'class_weights': 1,
                                           'type': 'softmax_cross_entropy',
                                           'weight': 1},
                               'name': 'item_id',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'type': 'text',
                               'weight': 1},
                           {   'decoder': 'generator',
                               'dependencies': [],
                               'level': 'char',
                               'loss': {   'class_distance_temperature': 0,
                                           'class_weights': 1,
                                           'type': 'softmax_cross_entropy',
                                           'weight': 1},
                               'name': 'category_id',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'type': 'text',
                               'weight': 1},
                           {   'decoder': 'generator',
                               'dependencies': [],
                               'level': 'char',
                               'loss': {   'class_distance_temperature': 0,
                                           'class_weights': 1,
                                           'type': 'softmax_cross_entropy',
                                           'weight': 1},
                               'name': 'restaurant_id',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'type': 'text',
                               'weight': 1}],
    'preprocessing': {   'bag': {   'fill_value': '',
                                    'format': 'space',
                                    'lowercase': 10000,
                                    'missing_value_strategy': 'fill_with_const',
                                    'most_common': False},
                         'binary': {   'fill_value': 0,
                                       'missing_value_strategy': 'fill_with_const'},
                         'category': {   'fill_value': '<UNK>',
                                         'lowercase': False,
                                         'missing_value_strategy': 'fill_with_const',
                                         'most_common': 10000},
                         'force_split': False,
                         'image': {'missing_value_strategy': 'backfill'},
                         'numerical': {   'fill_value': 0,
                                          'missing_value_strategy': 'fill_with_const'},
                         'sequence': {   'fill_value': '',
                                         'format': 'space',
                                         'lowercase': False,
                                         'missing_value_strategy': 'fill_with_const',
                                         'most_common': 20000,
                                         'padding': 'right',
                                         'padding_symbol': '<PAD>',
                                         'sequence_length_limit': 256,
                                         'unknown_symbol': '<UNK>'},
                         'set': {   'fill_value': '',
                                    'format': 'space',
                                    'lowercase': False,
                                    'missing_value_strategy': 'fill_with_const',
                                    'most_common': 10000},
                         'split_probabilities': (0.7, 0.1, 0.2),
                         'stratify': None,
                         'text': {   'char_format': 'characters',
                                     'char_most_common': 70,
                                     'char_sequence_length_limit': 1024,
                                     'fill_value': '',
                                     'lowercase': True,
                                     'missing_value_strategy': 'fill_with_const',
                                     'padding': 'right',
                                     'padding_symbol': '<PAD>',
                                     'unknown_symbol': '<UNK>',
                                     'word_format': 'space_punct',
                                     'word_most_common': 20000,
                                     'word_sequence_length_limit': 256},
                         'timeseries': {   'fill_value': '',
                                           'format': 'space',
                                           'missing_value_strategy': 'fill_with_const',
                                           'padding': 'right',
                                           'padding_value': 0,
                                           'timeseries_length_limit': 256}},
    'training': {   'batch_size': 128,
                    'bucketing_field': None,
                    'decay': False,
                    'decay_rate': 0.96,
                    'decay_steps': 10000,
                    'dropout_rate': 0.0,
                    'early_stop': 3,
                    'epochs': 50,
                    'gradient_clipping': None,
                    'increase_batch_size_on_plateau': 0,
                    'increase_batch_size_on_plateau_max': 512,
                    'increase_batch_size_on_plateau_patience': 5,
                    'increase_batch_size_on_plateau_rate': 2,
                    'learning_rate': 0.001,
                    'learning_rate_warmup_epochs': 5,
                    'optimizer': {   'beta1': 0.9,
                                     'beta2': 0.999,
                                     'epsilon': 1e-08,
                                     'type': 'adam'},
                    'reduce_learning_rate_on_plateau': 0,
                    'reduce_learning_rate_on_plateau_patience': 5,
                    'reduce_learning_rate_on_plateau_rate': 0.5,
                    'regularization_lambda': 0,
                    'regularizer': 'l2',
                    'staircase': False,
                    'validation_field': 'combined',
                    'validation_measure': 'loss'}}
INFO:root:

INFO:root:Using full dataframe
INFO:root:Building dataset (it may take a while)
INFO:root:Loading NLP pipeline
D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\features\numerical_feature.py:63: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  np.float32).as_matrix()
INFO:root:Writing train set metadata with vocabulary
Traceback (most recent call last):
  File "D:/playground/ml-playground/ludwig-test/cnn.py", line 27, in <module>
    main()
  File "D:/playground/ml-playground/ludwig-test/cnn.py", line 23, in main
    train(model)
  File "D:/playground/ml-playground/ludwig-test/cnn.py", line 13, in train
    model.train()
  File "D:\playground\ml-playground\ludwig-test\model.py", line 63, in train
    self.model.train(data_df=self.data_frame, logging_level=logging.INFO, output_directory="../ludwig_model")
  File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\api.py", line 448, in train
    random_seed=random_seed)
  File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\data\preprocessing.py", line 561, in preprocess_for_training
    [training_set, validation_set, test_set]
  File "D:\playground\ml-playground\ludwig-test\venv\lib\site-packages\ludwig\data\preprocessing.py", line 769, in replace_text_feature_level
    feature['level']
KeyError: 'item_id_char'
@w4nderlust
Collaborator

Please try using the latest code from master, as the issue was already solved by commit 1820ef5 and exposed by Issue #56. Feel free to reopen if the problem persists.
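
If you prefer not to clone the repository, installing master directly with pip should also pick up the fix (assuming the uber/ludwig repository URL):

pip install git+https://github.com/uber/ludwig.git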
