🐛 Trying to use ROUGE metric : pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323 #116

astariul · 2020-05-15T01:12:06Z

I'm trying to use the ROUGE metric.

I have two files, test.pred.tokenized and test.gold.tokenized, with each line containing a sentence.
I tried:

import nlp

rouge = nlp.load_metric('rouge')
with open("test.pred.tokenized") as p, open("test.gold.tokenized") as g:
    for lp, lg in zip(p, g):
        rouge.add(lp, lg)

But I get the following error:

pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323

Full stack trace:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/me/.venv/transformers/lib/python3.6/site-packages/nlp/metric.py", line 224, in add
    self.writer.write_batch(batch)
  File "/home/me/.venv/transformers/lib/python3.6/site-packages/nlp/arrow_writer.py", line 148, in write_batch
    pa_table: pa.Table = pa.Table.from_pydict(batch_examples, schema=self._schema)
  File "pyarrow/table.pxi", line 1550, in pyarrow.lib.Table.from_pydict
  File "pyarrow/table.pxi", line 1503, in pyarrow.lib.Table.from_arrays
  File "pyarrow/public-api.pxi", line 390, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323

(nlp installed from source)

thomwolf · 2020-05-15T07:42:01Z

Can you share your data files or a minimally reproducible example?

astariul · 2020-05-17T22:21:47Z

Sure, here is a Colab notebook reproducing the error.

ArrowInvalid: Column 1 named references expected length 36 but got length 56

lhoestq · 2020-05-28T13:37:03Z

This is because add takes as input a batch of elements and you provided only one. I think we should have add for one prediction/reference and add_batch for a batch of predictions/references. This would make it more coherent with the way we use Arrow.

Let me make this change.
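To illustrate the distinction described above, here is a minimal pure-Python sketch. ToyMetricBuffer is a hypothetical stand-in, not the nlp API, and the idea that a bare string passed as a batch gets treated as a sequence of characters is an assumption about the cause of the mismatch, not something confirmed in this thread. It only shows how a batch-oriented method can report unequal column lengths when given single strings, and how a single-example add can wrap its inputs into a batch of size one.

```python
class ToyMetricBuffer:
    """Hypothetical stand-in for a batch-oriented metric buffer (not the nlp API)."""

    def __init__(self):
        self.predictions = []
        self.references = []

    def add_batch(self, predictions, references):
        # A batch is a pair of equal-length sequences, one entry per example.
        predictions, references = list(predictions), list(references)
        if len(predictions) != len(references):
            raise ValueError(
                f"Column 1 named references expected length {len(predictions)} "
                f"but got length {len(references)}"
            )
        self.predictions.extend(predictions)
        self.references.extend(references)

    def add(self, prediction, reference):
        # A single example is wrapped into a batch of size one.
        self.add_batch([prediction], [reference])


buf = ToyMetricBuffer()
buf.add("the cat sat", "a cat sat down")       # one example
buf.add_batch(["a b", "c d"], ["a b", "c e"])  # two examples

try:
    # Bare strings passed as a batch: list() splits each string into
    # characters, so the two columns end up with different lengths.
    buf.add_batch("the cat sat", "a cat sat down")
except ValueError as err:
    mismatch = str(err)
```

In this toy version the failing call reports one length per string, which mirrors the shape of the error in the report, where the two numbers would correspond to the lengths of the prediction line and the reference line.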

lhoestq · 2020-05-28T13:37:39Z

Thanks for noticing though. I mostly call .compute directly ^^

astariul · 2020-05-28T23:43:07Z

Thanks @lhoestq it works :)

astariul changed the title ~~Trying to use ROUGE metric : pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323~~ 🐛 Trying to use ROUGE metric : pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323 May 15, 2020

thomwolf added the metric bug A bug in a metric script label May 17, 2020

lhoestq mentioned this issue May 28, 2020

have 'add' and 'add_batch' for metrics #212

Merged

astariul closed this as completed May 28, 2020
