I encountered key error by using my own data set #333

Closed
PaulZhangIsing opened this issue Jan 4, 2019 · 15 comments

@PaulZhangIsing

PaulZhangIsing commented Jan 4, 2019

We should not do what is described below (moving the feature line into the if block); it yields very weird results, because only a few examples get passed into processing.

Instead, take a look at the data processing, make sure that text_a and label are passed in correctly, and do this in _create_examples:

def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
        if i == 0:
            continue
        guid = "%s-%s" % (set_type, i)
        label = tokenization.convert_to_unicode(line[0])
        text_a = tokenization.convert_to_unicode(line[1])
        # text_b = tokenization.convert_to_unicode(line[2])
        examples.append(
            InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    random.shuffle(examples)
    return examples

I have run into a KeyError, similar to a previous issue I suppose. I applied the same fix, but it did not solve the problem.

The following is what is shown on screen:


/Paul $ python run_classifier.py   --task_name=bosco   --do_train=true    --do_eval=true    --dopredict=true   --data_dir=$MY_DATASET    --vocab_file=$BERT_BASE_DIR/vocab.txt    --bert_config_file=$BERT_BASE_DIR/bert_config.json    --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt    --max_seq_length=128   --train_batch_size=32   --learning_rate=5e-5  --num_train_epochs=50.0      --output_dir=.data/output
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7fb390f0ed90>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '.data/bosco_output', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb38387a5f8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Writing example 0 of 29206
Traceback (most recent call last):
  File "run_classifier.py", line 1010, in <module>
    tf.app.run()
  File "/home/yuwei/anaconda2/envs/py36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_classifier.py", line 899, in main
    train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
  File "run_classifier.py", line 518, in file_based_convert_examples_to_features
    max_seq_length, tokenizer)
  File "run_classifier.py", line 487, in convert_single_example
    label_id = label_map[example.label]
KeyError: 'Quality'
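
The traceback ends at label_map[example.label], which suggests that the string 'Quality' reached the converter without being in the list returned by the processor's get_labels(). A minimal sketch of a pre-flight check, assuming label_map is built by enumerating the label list; the helper find_unknown_labels and the sample labels are hypothetical and not part of run_classifier.py:

```python
from collections import Counter


def find_unknown_labels(example_labels, label_list):
    """Count labels that appear in the data but not in label_list.

    Hypothetical helper, not part of run_classifier.py.
    """
    # Assumed to mirror how the label_map in the traceback is built.
    label_map = {label: i for i, label in enumerate(label_list)}
    return Counter(label for label in example_labels if label not in label_map)


# Made-up data: "Quality" occurs in the file but is missing from the label list.
labels_in_data = ["Price", "Quality", "Service", "Quality"]
labels_from_get_labels = ["Price", "Service"]

unknown = find_unknown_labels(labels_in_data, labels_from_get_labels)
print(dict(unknown))
```

Any label reported here would need to be added to get_labels(), or the column indices in _create_examples would need to be checked in case the label is being read from the wrong column.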
@PaulZhangIsing
Author

PaulZhangIsing commented Jan 8, 2019

This seems to be similar to issue #80.

@PaulZhangIsing
Author

There seems to be a bug on line 489 of run_classifier.py.
I added a tab before the feature line and everything is fine.

@mice4869

How did you solve this problem? I am not sure which line is line 489. @PaulZhangIsing

@PaulZhangIsing
Author

PaulZhangIsing commented Jan 10, 2019

How did you solve this problem? I am not sure which line is line 489. @PaulZhangIsing

About that: I moved the feature line into the if block, in run_classifier.py at line 489:

for (ex_index, example) in enumerate(examples):
    if ex_index % 10000 == 0:
        tf.logging.info("Writing example %d of %d" % (ex_index, len(examples)))

        feature = convert_single_example(ex_index, example, label_list,
                                         max_seq_length, tokenizer)
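
The indentation of the feature line decides how many examples are converted. A small stand-alone simulation, using a counter as a stand-in for the real convert_single_example call, illustrates why moving that line into the if block avoids the KeyError only by skipping almost all of the data:

```python
def count_converted(num_examples, feature_inside_if, log_every=10000):
    """Count how many examples would be converted for each indentation."""
    converted = 0
    for ex_index in range(num_examples):
        if ex_index % log_every == 0:
            # The original code only logs progress at this point.
            if feature_inside_if:
                converted += 1  # stand-in for convert_single_example(...)
        if not feature_inside_if:
            converted += 1  # stand-in for convert_single_example(...)
    return converted


# 29206 is the example count reported in the log earlier in this thread.
print(count_converted(29206, feature_inside_if=False))
print(count_converted(29206, feature_inside_if=True))
```

With the line inside the if block, only the examples at indices 0, 10000 and 20000 are converted, which matches the "only a few examples" symptom described in the first comment.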

@mice4869

I'm sorry, but would you mind sharing your run_classifier.py code with me? I have been stuck on this problem for several days. And if you don't mind, I'd like to see your data set too; I have some worries about mine. @PaulZhangIsing. My email is zhanghaord@163.com

@PaulZhangIsing
Author

I'm sorry, but would you mind sharing your run_classifier.py code with me? I have been stuck on this problem for several days. And if you don't mind, I'd like to see your data set too; I have some worries about mine. @PaulZhangIsing. My email is zhanghaord@163.com

Sorry, I am unable to do so. But you could send yours to me, and I will try to edit it and send it back to you.

@mice4869

I have sent my dataset and code to you; please check your email. @PaulZhangIsing

@PaulZhangIsing
Author

PaulZhangIsing commented Jan 11, 2019 via email

@mice4869

It's OK. I got this problem fixed. Thank you, man. @PaulZhangIsing

@PaulZhangIsing
Author

PaulZhangIsing commented Jan 11, 2019 via email

@Danysolism

Hi, I'm having the same problem. I added a tab in line 489 but I am still getting the same error. How did you solve it?

@PaulZhangIsing
Author

Hi, I'm having the same problem. I added a tab in line 489 but I am still getting the same error. How did you solve it?

You should take a look at the data processor instead.

@AsafBanana

I managed to solve it by adding:
[CLS]
[SEP]
[UNK]
[MASK]
to my vocab file.
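
A quick way to check for this is to scan the vocab file for the special tokens. A minimal sketch, assuming the file holds one token per line; the function name and the throwaway demo file are made up for illustration:

```python
import os
import tempfile

SPECIAL_TOKENS = ["[CLS]", "[SEP]", "[UNK]", "[MASK]"]


def missing_special_tokens(vocab_path, required=SPECIAL_TOKENS):
    """Return the required tokens that do not appear in the vocab file."""
    with open(vocab_path, encoding="utf-8") as f:
        # One token per line; strip the newline and surrounding whitespace.
        vocab = {line.strip() for line in f}
    return [tok for tok in required if tok not in vocab]


# Demo with a throwaway vocab file that lacks [UNK] and [MASK].
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("[CLS]\n[SEP]\nhello\nworld\n")
    demo_path = tmp.name

missing = missing_special_tokens(demo_path)
print(missing)
os.remove(demo_path)
```

An empty list means all four tokens are present, in which case a KeyError on one of them probably has a different cause, such as the wrong file being passed as --vocab_file.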

@rjurney

rjurney commented Oct 11, 2019

@AsafBanana I have done this for [CLS] and [SEP] but I still get: KeyError: '[CLS]'

@jasondennis

Actually, the main problem appears when you work around a problem such as the one in issue #717:

import pandas as pd

df_train = pd.read_csv("data/train.tsv", header=None, sep="\t", encoding="UTF-8", quotechar='"')
df_bert_train = pd.DataFrame({'0': df_train[0],
                              '1': df_train[1],
                              '2': df_train[2],
                              '3': df_train[3],
                              '4': df_train[4].replace(r'\n', ' ', regex=True)})
# header=False keeps pandas from writing a column-name row into the TSV.
df_bert_train.to_csv('data/train.tsv', sep='\t', index=False, header=False, encoding="UTF-8")

header=False is the key.

Open your TSV file and check the header; maybe that will solve the problem. That is what I did to fix it.
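
A header row can also be spotted programmatically, because its label cell is not a valid label. A minimal sketch with the standard csv module, assuming the label sits in column 0 as in the _create_examples shown in the first comment; the helper name and the sample rows are hypothetical:

```python
import csv
import io


def first_row_is_header(tsv_file, label_list, label_col=0):
    """Guess whether the first TSV row is a header.

    Heuristic: the first row is treated as a header when its label cell
    is not one of the known labels.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quotechar='"')
    first_row = next(reader, None)
    if first_row is None:
        return False  # empty file, nothing to skip
    return first_row[label_col] not in label_list


# In-memory stand-ins for train.tsv, with and without a header row.
with_header = io.StringIO("label\ttext\nPrice\ttoo expensive\n")
without_header = io.StringIO("Price\ttoo expensive\nService\tvery friendly\n")

print(first_row_is_header(with_header, ["Price", "Service"]))
print(first_row_is_header(without_header, ["Price", "Service"]))
```

If the file has a header and nothing skips it, the header's label cell is looked up in label_map and can raise the KeyError; if the file has no header but _create_examples still skips line 0, the first example is silently dropped.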
