from_dataset ignore null instance #454

Open
wants to merge 2 commits into
base: master
from

Conversation

MorningForest
Collaborator

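Description: briefly describe the content of this PR
Vocabulary's from_dataset now tolerates instances whose field is an empty string. Such instances are skipped and a warning is printed. Corresponding test cases are added.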

Main reason: why this change was made

Checklist: check whether each item below is complete

Please feel free to remove inapplicable items for your PR.

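  • The PR title starts with [$CATEGORY] (e.g. [bugfix] for a bug fix, [new] for a new feature, [test] for test changes, [rm] for removing old code)
  • Changes are complete (i.e. I finished coding on this PR). Only open a PR once the changes are finished.
  • All changes have test coverage and pass the tests. For changes under fastnlp/fastnlp/, test code must be provided under fastnlp/test/
  • Code is well-documented. Write the comments carefully, since the API documentation is extracted from them.
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change. If the change affects examples or tutorials, contact the core developers.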

Changes: describe each change item by item

Mention: ask someone to review your PR

@people who have modified this file
@core developers

@@ -396,6 +396,10 @@ def from_dataset(self, *datasets, field_name:Union[str,List[str]], no_create_ent
        def construct_vocab(ins, no_create_entry=False):
            for fn in field_name:
                field = ins[fn]
                # If field is empty or None, simply skip it.
                if field is None or len(field) == 0:
Member

This is risky: if field is a number, this breaks, because len() cannot be called on a number. We may need to check hasattr(bar, 'len') first.
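
One possible way to address this concern is sketched below. It is a standalone illustration, not the code in this PR: `is_empty_field` and `count_words` are hypothetical names, and the counting loop only approximates what `construct_vocab` does. Note that objects supporting `len()` expose it as the `__len__` attribute, so that is the name the sketch tests for.

```python
import warnings
from collections import Counter


def is_empty_field(field):
    """Return True if the field is None or a sized object of length 0.

    Hypothetical helper for illustration only.
    """
    if field is None:
        return True
    # Numbers (int, float) have no __len__, so len() is never called on them.
    if hasattr(field, "__len__"):
        return len(field) == 0
    return False


def count_words(instances, field_names):
    """Count words over dict-like instances, skipping empty fields with a warning."""
    counter = Counter()
    for idx, ins in enumerate(instances):
        for fn in field_names:
            field = ins[fn]
            if is_empty_field(field):
                warnings.warn(f"Instance {idx}: field '{fn}' is empty or None; skipped.")
                continue
            # Assumption: a string or a non-iterable value counts as one word.
            if isinstance(field, str) or not hasattr(field, "__iter__"):
                counter[field] += 1
            else:
                counter.update(field)
    return counter
```

With this guard, a numeric field such as `5` is counted as a single entry instead of raising `TypeError`, while `None`, `""`, and `[]` are still skipped.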


2 participants