🐛 Trying to use ROUGE metric : pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323 #116

Closed
astariul opened this issue May 15, 2020 · 5 comments · Fixed by #212
Labels
metric bug A bug in a metric script

Comments

@astariul

I'm trying to use the ROUGE metric.

I have two files, test.pred.tokenized and test.gold.tokenized, each containing one sentence per line.
I tried:

import nlp

rouge = nlp.load_metric('rouge')
with open("test.pred.tokenized") as p, open("test.gold.tokenized") as g:
    for lp, lg in zip(p, g):
        rouge.add(lp, lg)

But I get the following error:

pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323


Full stack trace:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/me/.venv/transformers/lib/python3.6/site-packages/nlp/metric.py", line 224, in add
    self.writer.write_batch(batch)
  File "/home/me/.venv/transformers/lib/python3.6/site-packages/nlp/arrow_writer.py", line 148, in write_batch
    pa_table: pa.Table = pa.Table.from_pydict(batch_examples, schema=self._schema)
  File "pyarrow/table.pxi", line 1550, in pyarrow.lib.Table.from_pydict
  File "pyarrow/table.pxi", line 1503, in pyarrow.lib.Table.from_arrays
  File "pyarrow/public-api.pxi", line 390, in pyarrow.lib.pyarrow_wrap_table
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323

(nlp installed from source)

@astariul astariul changed the title Trying to use ROUGE metric : pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323 🐛 Trying to use ROUGE metric : pyarrow.lib.ArrowInvalid: Column 1 named references expected length 534 but got length 323 May 15, 2020
@thomwolf
Member

Can you share your data files or a minimally reproducible example?

@thomwolf thomwolf added the metric bug A bug in a metric script label May 17, 2020
@astariul
Author

Sure, here is a Colab notebook reproducing the error.

ArrowInvalid: Column 1 named references expected length 36 but got length 56

@lhoestq
Member

lhoestq commented May 28, 2020

This is because add takes as input a batch of elements and you provided only one. I think we should have add for one prediction/reference and add_batch for a batch of predictions/references. This would make it more coherent with the way we use Arrow.

Let me do this change
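
To illustrate the explanation above, here is a minimal pure-Python sketch of why a single example passed where a batch is expected can produce a length mismatch. It assumes (inferred from the numbers in the error message, not confirmed in this thread) that each bare string ends up being measured as a sequence of characters. The helpers column_lengths and check_batch are hypothetical stand-ins for the consistency check Arrow performs; they are not part of nlp or pyarrow.

```python
def column_lengths(batch):
    """Return the length of each column in a dict-of-columns batch."""
    return {name: len(column) for name, column in batch.items()}


def check_batch(batch):
    """Raise ValueError if the columns do not all have the same length.

    Hypothetical stand-in for the check Arrow performs when building a
    table from columns; this is not the pyarrow implementation.
    """
    lengths = column_lengths(batch)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return lengths


prediction = "the cat sat on the mat"
reference = "a cat sat on a mat"

# Bare strings: each "column" is measured in characters, so the two
# columns almost never have the same length.
try:
    check_batch({"predictions": prediction, "references": reference})
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# Wrapping each example in a list gives two columns of length 1.
ok = check_batch({"predictions": [prediction], "references": [reference]})
```

Under that assumption, wrapping each line in a one-element list would be the equivalent workaround with the batch-style add described above, until a separate single-example method exists.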

@lhoestq
Member

lhoestq commented May 28, 2020

Thanks for noticing though. I mostly call .compute directly ^^

@astariul
Author

Thanks @lhoestq it works :)
