attribute description added to extracted info in Chinese #152

ihorizons2022 · 2023-05-04T12:59:14Z

code snippet:

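from kor import Object, Text

schema = Object(
    id="post",
    # English: "script posted on social media by a social media blogger"
    description=(
        '''
社交媒体博主在社交媒体上发布的脚本
'''
    ),
    attributes=[
        Text(
            id="ingredient",
            description="化妆品的原料和成分",  # raw materials and ingredients of the cosmetic
            examples=[],
            many=True,
        ),
        Text(
            id="function",
            description="产品能够起到的作用",  # effects the product can deliver
            examples=[],
            many=True,
        ),
        Text(
            id="brand",
            description="文案中的化妆品品牌",  # cosmetics brand mentioned in the copy
            examples=[],
            many=True,
        ),
        Text(
            id="product",
            description="宣传的化妆品产品",  # cosmetic product being promoted
            examples=[],
            many=True,
        ),
        Text(
            id="skin",
            description="皮肤的类型和状态",  # skin type and condition
            examples=[],
            many=True,
        ),
        Text(
            id="target",
            description="品牌或者产品适用的用户人群",  # user groups the brand or product suits
            examples=[],
            many=True,
        ),
        Text(
            id="feeling",
            description="使用化妆品后的个人感受",  # personal feeling after using the cosmetic
            examples=[],
            many=True,
        ),
        Text(
            id="scene",
            # places, climates, holidays, seasons, occasions, etc. suited to using the cosmetic
            description="适合使用化妆品的地点，气候，节日，季节，场合等",
            examples=[],
            many=True,
        ),
        Text(
            id="promotion",
            description="产品促销信息",  # product promotion information
            examples=[],
            many=True,
        ),
        Text(
            id="special",
            description="产品的优势和特点",  # advantages and features of the product
            examples=[],
            many=True,
        ),
        Text(
            id="category",
            description="化妆品所属的品类",  # category the cosmetic belongs to
            examples=[
                # ("Second, have a good sunscreen", "sunscreen")
                ("第二有一支好的防晒霜", '防晒霜')
            ],
            many=True,
        ),
    ],
    many=False,
)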

but the output looks like this:
{'post': {'brand': ['ZOTO'],
          'product': ['防晒霜'],
          'function': ['防晒'],
          'skin': ['皮肤类型和状态'],
          'target': ['用户人群'],
          'feeling': ['使用化妆品后的个人感受'],
          'scene': ['适合使用化妆品的地点，气候，节日，季节，场合等'],
          'promotion': ['产品促销信息'],
          'category': ['防晒霜']}}

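Actually, '皮肤类型和状态' ("skin type and condition") is the attribute description, not extracted information. The same is true of the 'feeling', 'scene', and 'promotion' values above.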

ihorizons2022 · 2023-05-04T13:50:44Z

If I provide examples in Chinese, the result is messy.
It seems the Chinese characters are converted to Unicode escape sequences in the final prompt.

eyurtsev · 2023-05-05T16:37:23Z

@ihorizons2022 Thanks for filing the issue! I'm going to take a look at how to handle this.

eyurtsev · 2023-05-07T19:14:26Z

Fix merged into main

eyurtsev · 2023-05-07T19:32:14Z

@ihorizons2022 I changed the default behavior of the JSON encoder so that it no longer escapes non-ASCII characters. The original characters should now be preserved as-is for the LLM to see.

Could you let me know if this helps and/or if you see any other issues associated with extraction when working with text in Chinese?

It should be sufficient for you to bump the library version to 0.9.2.
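For readers following along, the escaping behavior described above can be reproduced with Python's standard `json` module alone. This is a minimal sketch, not the library's actual encoder code: it assumes the prompt was built by serializing the schema with `json.dumps`, and the variable names are purely illustrative.

```python
import json

# One of the attribute descriptions from the schema in this issue.
description = "皮肤的类型和状态"

# Default behavior (ensure_ascii=True): every non-ASCII character is
# written as a \uXXXX escape, so the prompt no longer shows readable Chinese.
escaped = json.dumps({"description": description})

# With ensure_ascii=False the original characters are kept as-is.
preserved = json.dumps({"description": description}, ensure_ascii=False)

print(escaped)
print(preserved)

# Both strings decode back to the same data; only the text form differs.
assert json.loads(escaped) == json.loads(preserved)
```

Both forms are valid JSON and round-trip to the same object. The difference matters only because the LLM reads the serialized text directly and may not treat the escaped form as Chinese.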

ihorizons2022 · 2023-05-08T10:54:30Z

Thank you for the quick response.
Closing the issue.

eyurtsev self-assigned this May 5, 2023

eyurtsev added the bug Something isn't working label May 5, 2023

eyurtsev mentioned this issue May 7, 2023

Fix to allow encoding non ascii in JSON #153

Merged

ihorizons2022 closed this as completed May 8, 2023
