I encountered key error by using my own data set #333

PaulZhangIsing · 2019-01-04T06:23:00Z

We should not do the things described below; otherwise it yields very weird results, because only a few examples are passed into processing.

Instead, we should take a look at the data processing and ensure that text_a and label are passed in correctly,

and do this in _create_examples:

def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
        if i == 0:
            continue
        guid = "%s-%s" % (set_type, i)
        label = tokenization.convert_to_unicode(line[0])
        text_a = tokenization.convert_to_unicode(line[1])
        # text_b = tokenization.convert_to_unicode(line[2])
        examples.append(
            InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    random.shuffle(examples)
    return examples


I have encountered some key errors, similar to a previous issue I suppose. I followed the same steps, but I didn't manage to solve it.

The following is what was shown on screen:


/Paul $ python run_classifier.py   --task_name=bosco   --do_train=true    --do_eval=true    --dopredict=true   --data_dir=$MY_DATASET    --vocab_file=$BERT_BASE_DIR/vocab.txt    --bert_config_file=$BERT_BASE_DIR/bert_config.json    --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt    --max_seq_length=128   --train_batch_size=32   --learning_rate=5e-5  --num_train_epochs=50.0      --output_dir=.data/output
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb390f0ed90>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '.data/bosco_output', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb38387a5f8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Writing example 0 of 29206
Traceback (most recent call last):
  File "run_classifier.py", line 1010, in <module>
    tf.app.run()
  File "/home/yuwei/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_classifier.py", line 899, in main
    train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
  File "run_classifier.py", line 518, in file_based_convert_examples_to_features
    max_seq_length, tokenizer)
  File "run_classifier.py", line 487, in convert_single_example
    label_id = label_map[example.label]
KeyError: 'Quality'


PaulZhangIsing · 2019-01-08T07:29:19Z

It seems to be similar to issue #80.

PaulZhangIsing · 2019-01-09T03:49:55Z

There seems to be a bug on line 489 of run_classifier.py.
I added a tab before the feature line and everything is fine.

mice4869 · 2019-01-10T05:36:32Z

How did you solve this problem? I am not sure which line is line 489. @PaulZhangIsing

PaulZhangIsing · 2019-01-10T06:26:48Z

> How did you solve this problem? I am not sure which line is line 489. @PaulZhangIsing

Around here: I moved the feature line into the if block,
in the file run_classifier.py, line 489:

for (ex_index, example) in enumerate(examples):
    if ex_index % 10000 == 0:
        tf.logging.info("Writing example %d of %d" % (ex_index, len(examples)))
        feature = convert_single_example(ex_index, example, label_list,
                                         max_seq_length, tokenizer)
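
As a rough illustration of what that indentation change does, here is a small self-contained sketch. It does not call run_classifier.py: convert_stub is a hypothetical stand-in for convert_single_example, the rest of the real loop body is not modeled, and the example count is simply taken from the log above. With the conversion inside the if block, only indices divisible by 10000 are converted, which would explain why the KeyError goes away while most of the data is never processed.

```python
def convert_stub(example):
    """Hypothetical stand-in for convert_single_example."""
    return example.upper()


def convert_all(examples, inside_if):
    """Mimic the loop with the conversion inside or outside the logging check."""
    features = []
    for ex_index, example in enumerate(examples):
        if ex_index % 10000 == 0:
            # The progress logging call would sit here.
            if inside_if:
                features.append(convert_stub(example))
        if not inside_if:
            features.append(convert_stub(example))
    return features


examples = ["text %d" % i for i in range(29206)]
print(len(convert_all(examples, inside_if=False)))  # every example is converted
print(len(convert_all(examples, inside_if=True)))   # only indices 0, 10000, 20000
```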

mice4869 · 2019-01-10T06:34:58Z

I'm sorry, but would you mind sharing your run_classifier.py code with me? I have been stuck on this problem for several days. And if you don't mind, I'd like to see your data set too; I have some worries about my own data set. @PaulZhangIsing, my email is zhanghaord@163.com

PaulZhangIsing · 2019-01-10T08:51:01Z

> I'm sorry, but would you mind sharing your run_classifier.py code with me? I have been stuck on this problem for several days. And if you don't mind, I'd like to see your data set too; I have some worries about my own data set. @PaulZhangIsing, my email is zhanghaord@163.com

Sorry, I am unable to do so. But you could send yours to me, and I will try to edit it and send it back to you.

mice4869 · 2019-01-10T08:55:13Z

I have sent my dataset and code to you; please check your email. @PaulZhangIsing

PaulZhangIsing · 2019-01-11T06:42:07Z

I need this weekend to check, as my company is currently unable to connect to Outlook.


mice4869 · 2019-01-11T06:45:53Z

It's OK, I got this problem fixed. Thank you, man. @PaulZhangIsing

PaulZhangIsing · 2019-01-11T08:51:12Z

So basically, what did you do?


Danysolism · 2019-02-05T16:04:19Z

Hi, I'm having the same problem. I added a tab in line 489 but I am still getting the same error. How did you solve it?

PaulZhangIsing · 2019-02-21T01:08:26Z

> Hi, I'm having the same problem. I added a tab in line 489 but I am still getting the same error. How did you solve it?

You should take a look at the data processor instead.
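
One way to check the data processor is to compare the labels it produces against the label list it declares. The sketch below rebuilds a label-to-id mapping in the way the traceback suggests (label_map[example.label]) and reports every label that is missing from it. The helper names, the label list, and the sample rows are hypothetical and only meant to show the idea.

```python
def build_label_map(label_list):
    """Map each label string to an integer id."""
    return {label: i for i, label in enumerate(label_list)}


def find_unknown_labels(examples, label_list):
    """Return (index, label) pairs whose label is not in label_list."""
    label_map = build_label_map(label_list)
    return [(i, label) for i, (label, _text) in enumerate(examples)
            if label not in label_map]


# Hypothetical data: the first row is a header that was not skipped.
label_list = ["0", "1"]
examples = [("Quality", "#1 String"), ("1", "some text"), ("0", "other text")]

for index, label in find_unknown_labels(examples, label_list):
    print("row %d has label %r, which is not in the label list" % (index, label))
```

Any label reported here would raise the same kind of KeyError during feature conversion, so the fix belongs either in the label list or in how the rows are read.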

AsafBanana · 2019-07-21T17:04:48Z

I managed to solve it by adding:
[CLS]
[SEP]
[UNK]
[MASK]
to my vocab file.
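
A quick way to check a vocab file for these tokens is sketched below, assuming the usual one-token-per-line vocab.txt layout. The function name missing_special_tokens and the throwaway demo file are hypothetical; point it at your own vocab file instead.

```python
import os
import tempfile

REQUIRED_TOKENS = ["[CLS]", "[SEP]", "[UNK]", "[MASK]"]


def missing_special_tokens(vocab_path, required=REQUIRED_TOKENS):
    """Return the required tokens that have no line of their own in the vocab file."""
    with open(vocab_path, encoding="utf-8") as f:
        vocab = {line.strip() for line in f}
    return [tok for tok in required if tok not in vocab]


# Demo with a throwaway vocab file that lacks [MASK].
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("[CLS]\n[SEP]\n[UNK]\nhello\nworld\n")
    demo_missing = missing_special_tokens(path)

print(demo_missing)
```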

rjurney · 2019-10-11T21:27:00Z

@AsafBanana I have done this for [CLS] and [SEP] but I still get: KeyError: '[CLS]'

jasondennis · 2021-05-18T14:37:05Z

Actually, the main problem appears when you apply a fix such as the one in issue #717:

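import pandas as pd

# Read the raw TSV without treating the first row as a header.
df_train = pd.read_csv("data/train.tsv", header=None, sep="\t", encoding="UTF-8", quotechar='"')
# Keep columns 0-4 and replace newlines in the last column with spaces.
df_bert_train = pd.DataFrame({'0': df_train[0],
                              '1': df_train[1],
                              '2': df_train[2],
                              '3': df_train[3],
                              '4': df_train[4].replace(r'\n', ' ', regex=True)})
# Write the file back without an index column or a header row.
df_bert_train.to_csv('data/train.tsv', sep='\t', index=False, header=False, encoding="UTF-8")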

header=False is the key.

Open your TSV file and check the header; that may solve the problem.
That is what I did to fix it.
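
Whether the header matters depends on the processor. The _create_examples loop shown at the top of this thread skips row 0, so a file written without a header would lose its first data row there, while a processor that does not skip row 0 would read a header cell as a label. A minimal sketch for checking which case a file is in, using only the standard csv module; the function name, column index, and label list are assumptions for illustration:

```python
import csv
import io


def first_row_looks_like_header(tsv_file, label_column, label_list):
    """Guess that row 0 is a header when its label cell is not a known label."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    first_row = next(reader, None)
    if first_row is None:
        return False  # empty file, nothing to check
    return first_row[label_column] not in label_list


# In-memory stand-ins for two TSV files, one with and one without a header row.
with_header = io.StringIO("Quality\ttext\n1\tsome text\n")
without_header = io.StringIO("1\tsome text\n0\tother text\n")

print(first_row_looks_like_header(with_header, 0, ["0", "1"]))
print(first_row_looks_like_header(without_header, 0, ["0", "1"]))
```

For a real file, pass an open file handle instead of the StringIO objects.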

PaulZhangIsing closed this as completed Jan 9, 2019
