Some questions about data processing #6

Closed
Orange2107 opened this issue May 6, 2024 · 5 comments
Comments

@Orange2107

Orange2107 commented May 6, 2024

Hi,
This is meaningful work : )
I have some questions.
The "text" for each subject looks like: [[Findings1, Impression1], [Findings2, Impression2], [Findings3, Impression3], ..., [Findings, Impression]].
Is this correct?
An earlier issue said that if the findings are missing, the impression alone can be given as a list, [impression]. But when I try that, "back_translation.py" fails with "ValueError: not enough values to unpack (expected 2, got 1)", which suggests that the list [impression] is not accepted.
Thank you.

@kihyunU
Collaborator

kihyunU commented May 7, 2024

I don't think your text column is the problem. Can you share the whole error message?

@Orange2107
Author

Thanks for your reply.
This is the error:

Traceback (most recent call last):
File "back_translation.py", line 136, in <module>
backtranslation.do_back_translation(
File "back_translation.py", line 88, in do_back_translation
for batch in tqdm(dataloader):
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
data = self._next_data()
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
return self._process_data(data)
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
data.reraise()
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "back_translation.py", line 56, in collate_fn
tokens = self.tokenizer(instances, return_tensors="pt", padding=True)
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2523, in __call__
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2609, in _call_one
return self.batch_encode_plus(
File "/home/comp/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2800, in batch_encode_plus
return self._batch_encode_plus(
File "/home/compt/miniconda3/envs/cxr-clip/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 731, in _batch_encode_plus
ids, pair_ids = ids_or_pair_ids
ValueError: not enough values to unpack (expected 2, got 1)

If I keep the missing text as "", this error does not happen.
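
The last frame of the traceback unpacks each batch element into two values. That suggests the tokenizer treats every list-like element as a (text, text_pair) pair. Below is a minimal sketch in plain Python that only mimics that unpacking step (it does not call transformers), to show why a one-element list fails while a pair containing an empty string passes. The helper names `unpack_like_tokenizer` and `pad_pair` are hypothetical and are not part of back_translation.py.

```python
def unpack_like_tokenizer(batch):
    """Mimic the `ids, pair_ids = ids_or_pair_ids` step seen in the traceback."""
    pairs = []
    for ids_or_pair_ids in batch:
        if not isinstance(ids_or_pair_ids, (list, tuple)):
            # A plain string is a single text with no paired text.
            ids, pair_ids = ids_or_pair_ids, None
        else:
            # A list/tuple is assumed to hold exactly (text, text_pair).
            # A one-element list raises ValueError here.
            ids, pair_ids = ids_or_pair_ids
        pairs.append((ids, pair_ids))
    return pairs


def pad_pair(entry):
    """Hypothetical helper: turn [impression] into ["", impression]."""
    if isinstance(entry, str):
        return ["", entry]
    if len(entry) == 1:
        return ["", entry[0]]
    return list(entry)


if __name__ == "__main__":
    try:
        unpack_like_tokenizer([["No acute findings."]])
    except ValueError as err:
        print("one-element list fails:", err)
    print(unpack_like_tokenizer([pad_pair(["No acute findings."])]))
```

Padding the missing findings with "" keeps every entry at length two, which matches the observation that the error goes away when the missing text is kept as "".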

@kihyunU
Collaborator

kihyunU commented May 7, 2024

What does "instances" (line 55) contain at the time of the error?
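
One way to answer this is to check the batch just before it reaches the tokenizer. The sketch below uses a hypothetical helper, `find_malformed`, that reports list-like items whose length is not two. How `collate_fn` builds `instances` is not shown in this thread, so where to call the helper is an assumption.

```python
def find_malformed(instances):
    """Return (index, item) for list/tuple items that are not length-2 pairs."""
    return [
        (i, item)
        for i, item in enumerate(instances)
        if isinstance(item, (list, tuple)) and len(item) != 2
    ]


if __name__ == "__main__":
    batch = ["a plain sentence", ["findings", "impression"], ["only impression"]]
    for index, item in find_malformed(batch):
        print(f"instances[{index}] has {len(item)} element(s): {item!r}")
```

Running the DataLoader with `num_workers=0` while debugging keeps the failure in the main process, which can make the offending item easier to inspect.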

@Orange2107
Author

Maybe something is wrong in my processing code.
By the way, if the findings and impression are both null, what should this text list [] be?
Thank you.

@kihyunU
Collaborator

kihyunU commented May 7, 2024

No, we deleted the sample if there were no findings or impression.
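
A minimal sketch of that filtering step, reading the reply as "drop a sample when both findings and impression are empty". The repository's actual preprocessing is not shown in this thread, so the function name `drop_empty_samples` and the dict layout (a "text" key holding [findings, impression] pairs) are assumptions for illustration.

```python
def drop_empty_samples(samples):
    """Keep only samples that have at least one non-empty report pair."""
    kept = []
    for sample in samples:
        pairs = [
            [findings, impression]
            for findings, impression in sample["text"]
            # Treat None and whitespace-only strings as missing.
            if (findings or "").strip() or (impression or "").strip()
        ]
        if pairs:
            kept.append({**sample, "text": pairs})
    return kept


if __name__ == "__main__":
    samples = [
        {"id": 1, "text": [["", "No pneumothorax."]]},
        {"id": 2, "text": [["", ""], [None, None]]},
    ]
    print(drop_empty_samples(samples))
```

A pair with only one side filled is kept as is, with "" for the missing side, so every remaining entry still has two elements.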
