Fix bug: KeyError: 'text' (corresponding to issue #296) #300
Lines 418-429 of data_juicer/config/config.py did not handle the case where the arg text_key had already been initialized to 'text'. As a result, text_key was never updated properly and always kept the value 'text'.
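To make the symptom concrete, here is a minimal sketch of how a stale text_key could lead to the KeyError from issue #296. The count_words function and the sample layout are hypothetical stand-ins, not data-juicer code; the only assumption is that an OP reads the field named by its text_key from each sample.

```python
def count_words(sample: dict, text_key: str = "text") -> int:
    """Hypothetical OP: reads the field named by text_key from a sample."""
    return len(sample[text_key].split())


# A dataset whose text lives under 'content' rather than 'text'.
sample = {"content": "hello data juicer"}

# If the configured text_key never reaches the OP, it falls back to 'text'
# and the lookup fails.
try:
    count_words(sample)
    stale_key_error = None
except KeyError as err:
    stale_key_error = err

# Once text_key is propagated to the OP, the lookup succeeds.
word_count = count_words(sample, text_key="content")
```

In this sketch the first call fails because the sample has no 'text' field, which matches the kind of failure the PR title describes.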
Original Code:
Fixed:
Hi @shiweijiezero, thanks for your contribution! You're correct👍🏻. Based on your report and modification, we found that an earlier PR, #165, made this code snippet ineffective, because the args of each OP are already initialized from the default args and command line args before it runs. The snippet in question: data-juicer/data_juicer/config/config.py, lines 418 to 434 in 094440b.
In the data-juicer config modules, the args of OPs are expected to be overwritten in this priority order: default args -> config args -> command line args. Following this rule and building on your modification, we suggest you complete the fix as follows:
......
else:
    if 'text_key' not in args or args['text_key'] is None:
        args['text_key'] = text_key
    if 'image_key' not in args or args['image_key'] is None:
        args['image_key'] = cfg.image_key
    if 'audio_key' not in args or args['audio_key'] is None:
        args['audio_key'] = cfg.audio_key
    if 'video_key' not in args or args['video_key'] is None:
        args['video_key'] = cfg.video_key
op[op_name] = args
......
Thanks for your contribution again! Feel free to discuss the fix suggestions with us if you have any further considerations!
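The fallback rule in the suggestion above can be sketched as a standalone function. This is only an illustration under stated assumptions: fill_missing_keys, the OP names, and the cfg values are hypothetical, cfg is modeled as a simple namespace, and text_key is read from cfg here rather than from a local variable as in the snippet.

```python
from types import SimpleNamespace

# Keys that every OP should end up with (taken from the suggestion above).
KEY_NAMES = ("text_key", "image_key", "audio_key", "video_key")


def fill_missing_keys(process, cfg):
    """Fill unset per-OP keys from the global config.

    A key that an OP sets explicitly is kept; only missing or None
    keys fall back to the global value.
    """
    for op in process:
        for op_name in list(op):
            args = op[op_name]
            if args is None:
                # OP listed without any args: start from an empty dict.
                args = {}
            for key in KEY_NAMES:
                if key not in args or args[key] is None:
                    args[key] = getattr(cfg, key)
            op[op_name] = args
    return process


# Hypothetical global config and process list.
cfg = SimpleNamespace(text_key="content", image_key="images",
                      audio_key="audios", video_key="videos")
process = [
    {"hypothetical_filter": {"text_key": "title", "min_len": 10}},
    {"hypothetical_mapper": None},
]
fill_missing_keys(process, cfg)
```

After the call, the first OP keeps its explicit text_key of 'title' while picking up the other three keys from cfg, and the second OP, which had no args at all, receives all four global keys.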
Yeah,
One last thing: you may need to run the pre-commit checks as described in the coding style doc.
Normalize Format
Nice. Thanks for your contribution!
Solution for issue #296
This change adds the missing text_key assignment, so that an OP's text_key is updated properly instead of always staying at 'text'.