attribute description added to extracted info in Chinese #152

Closed
ihorizons2022 opened this issue May 4, 2023 · 5 comments
Labels: bug (Something isn't working)

Comments

@ihorizons2022

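Code snippet:

from kor.nodes import Object, Text

# Schema for extracting cosmetics information from social media posts.
# English glosses of the Chinese descriptions are given in the comments.
schema = Object(
    id="post",
    # "Script that a social media blogger posts on social media"
    description="社交媒体博主在社交媒体上发布的脚本",
    attributes=[
        Text(
            id="ingredient",
            # "Raw materials and ingredients of the cosmetic"
            description="化妆品的原料和成分",
            examples=[],
            many=True,
        ),
        Text(
            id="function",
            # "Effects the product can deliver"
            description="产品能够起到的作用",
            examples=[],
            many=True,
        ),
        Text(
            id="brand",
            # "Cosmetic brand mentioned in the copy"
            description="文案中的化妆品品牌",
            examples=[],
            many=True,
        ),
        Text(
            id="product",
            # "Cosmetic product being promoted"
            description="宣传的化妆品产品",
            examples=[],
            many=True,
        ),
        Text(
            id="skin",
            # "Skin type and condition"
            description="皮肤的类型和状态",
            examples=[],
            many=True,
        ),
        Text(
            id="target",
            # "Users the brand or product is suited to"
            description="品牌或者产品适用的用户人群",
            examples=[],
            many=True,
        ),
        Text(
            id="feeling",
            # "Personal impressions after using the cosmetic"
            description="使用化妆品后的个人感受",
            examples=[],
            many=True,
        ),
        Text(
            id="scene",
            # "Places, climates, holidays, seasons, occasions, etc. suited to using the cosmetic"
            description="适合使用化妆品的地点,气候,节日,季节,场合等",
            examples=[],
            many=True,
        ),
        Text(
            id="promotion",
            # "Product promotion information"
            description="产品促销信息",
            examples=[],
            many=True,
        ),
        Text(
            id="special",
            # "Advantages and features of the product"
            description="产品的优势和特点",
            examples=[],
            many=True,
        ),
        Text(
            id="category",
            # "Category the cosmetic belongs to"
            description="化妆品所属的品类",
            # ("Second, have a good sunscreen", "sunscreen")
            examples=[("第二 有一支好的防晒霜", "防晒霜")],
            many=True,
        ),
    ],
    many=False,
)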

But the output looks like this:
{'post': {'brand': ['ZOTO'],
'product': ['防晒霜'],
'function': ['防晒'],
'skin': ['皮肤类型和状态'],
'target': ['用户人群'],
'feeling': ['使用化妆品后的个人感受'],
'scene': ['适合使用化妆品的地点,气候,节日,季节,场合等'],
'promotion': ['产品促销信息'],
'category': ['防晒霜']}}

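Actually, '皮肤类型和状态' ("skin type and condition") is the attribute description, not extracted information. The values for target, feeling, scene, and promotion likewise repeat all or part of their attribute descriptions instead of content from the post.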
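One way to spot this kind of leakage automatically is to compare each extracted value against its own attribute description. The sketch below is a hypothetical post-processing check, not part of Kor: the helper name find_echoed_descriptions and the 0.8 similarity threshold are assumptions, and substring matching plus difflib.SequenceMatcher is just one simple heuristic. It runs on the output reported above.

```python
from difflib import SequenceMatcher

# Attribute descriptions copied from the schema in the issue (English glosses in comments).
DESCRIPTIONS = {
    "brand": "文案中的化妆品品牌",  # cosmetic brand mentioned in the copy
    "product": "宣传的化妆品产品",  # cosmetic product being promoted
    "function": "产品能够起到的作用",  # effects the product can deliver
    "skin": "皮肤的类型和状态",  # skin type and condition
    "target": "品牌或者产品适用的用户人群",  # users the product is suited to
    "feeling": "使用化妆品后的个人感受",  # personal impressions after use
    "scene": "适合使用化妆品的地点,气候,节日,季节,场合等",  # suitable places, seasons, ...
    "promotion": "产品促销信息",  # promotion information
    "category": "化妆品所属的品类",  # product category
}


def find_echoed_descriptions(extracted, descriptions, threshold=0.8):
    """Hypothetical helper: return {attribute: [values]} for values that look
    copied from the attribute's own description.

    A value is flagged if it is a substring of the description, or if the
    difflib similarity ratio between value and description is >= threshold
    (0.8 is an arbitrary assumption).
    """
    flagged = {}
    for attr, values in extracted.items():
        desc = descriptions.get(attr)
        if not desc:
            continue
        echoed = [
            v for v in values
            if v and (v in desc or SequenceMatcher(None, v, desc).ratio() >= threshold)
        ]
        if echoed:
            flagged[attr] = echoed
    return flagged


# The 'post' output reported in the issue.
output = {
    "brand": ["ZOTO"],
    "product": ["防晒霜"],
    "function": ["防晒"],
    "skin": ["皮肤类型和状态"],
    "target": ["用户人群"],
    "feeling": ["使用化妆品后的个人感受"],
    "scene": ["适合使用化妆品的地点,气候,节日,季节,场合等"],
    "promotion": ["产品促销信息"],
    "category": ["防晒霜"],
}

print(find_echoed_descriptions(output, DESCRIPTIONS))
```

The substring test catches exact or partial copies such as '用户人群'. The similarity ratio catches near copies such as '皮肤类型和状态', which drops one character from '皮肤的类型和状态' and so is not a substring of it.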

@ihorizons2022
Author

ihorizons2022 commented May 4, 2023

If I provide examples in Chinese, the results get messy.
It seems the Chinese characters are converted to Unicode escape sequences in the final prompt.


@eyurtsev
Owner

eyurtsev commented May 5, 2023

@ihorizons2022 Thanks for filing the issue! I'm going to take a look at how to handle this.

@eyurtsev eyurtsev self-assigned this May 5, 2023
@eyurtsev eyurtsev added the bug Something isn't working label May 5, 2023
@eyurtsev
Owner

eyurtsev commented May 7, 2023

Fix merged into main

@eyurtsev
Owner

eyurtsev commented May 7, 2023

@ihorizons2022 I changed the default behavior of the JSON encoder so that it no longer encodes output as ASCII. The original characters should now be preserved as-is for the LLM to see.

Could you let me know if this helps and/or if you see any other issues associated with extraction when working with text in Chinese?

It should be sufficient for you to bump the library version to 0.9.2.
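To illustrate what this change does, here is a minimal sketch using only Python's standard json module. It does not call Kor; it assumes Kor's JSON encoder ultimately serializes examples with json.dumps, which the comment above implies but does not state. By default (ensure_ascii=True), json.dumps escapes every non-ASCII character as a \uXXXX sequence, so the model sees escapes instead of Chinese text. Passing ensure_ascii=False keeps the characters intact.

```python
import json

# A Chinese extraction value taken from the schema's example:
# "防晒霜" means "sunscreen".
example = {"category": ["防晒霜"]}

# Default behavior: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(example)

# ensure_ascii=False keeps the original characters in the output string.
preserved = json.dumps(example, ensure_ascii=False)

print(escaped)    # the Chinese characters appear as \uXXXX escapes
print(preserved)  # {"category": ["防晒霜"]}

# Both strings decode to the same Python object; only the text the LLM reads differs.
assert json.loads(escaped) == json.loads(preserved) == example
```

Because both forms round-trip to the same data, the fix does not change what is parsed back. It only changes what appears in the prompt, which is what matters for an LLM reading Chinese examples.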

@ihorizons2022
Author

Thank you for the quick response.
Closing the issue.
