@ihorizons2022 I changed the default behavior of the JSON encoder so that it no longer escapes non-ASCII characters. The original characters should now be preserved as-is for the LLM to see.
Could you let me know if this helps and/or if you see any other issues associated with extraction when working with text in Chinese?
It should be sufficient for you to bump the library version to 0.9.2.
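For anyone wondering what the encoder change means in practice, here is a minimal sketch using only Python's standard `json` module. It is not kor's actual code, just an illustration of the `ensure_ascii` flag; the `example` dict is made up for the demo.

```python
import json

# Hypothetical example payload, similar to what ends up in the prompt.
example = {"category": "防晒霜"}

# Default behavior: non-ASCII characters are escaped as \uXXXX sequences,
# so the LLM sees escape codes instead of the Chinese text.
escaped = json.dumps(example)

# With ensure_ascii=False the original characters are kept as-is.
preserved = json.dumps(example, ensure_ascii=False)

print(escaped)
print(preserved)

# Both forms decode back to the same data; only the text shown to the LLM differs.
assert json.loads(escaped) == json.loads(preserved) == example
```

The two strings carry the same data, but only the second one shows the model readable Chinese.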
Code snippet:
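from kor.nodes import Object, Text

schema = Object(
    id="post",
    # "A script posted on social media by a social media blogger"
    description=(
        '''
        社交媒体博主在社交媒体上发布的脚本
        '''
    ),
    attributes=[
        Text(
            id="ingredient",
            description="化妆品的原料和成分",  # raw materials and ingredients of the cosmetic
            examples=[],
            many=True,
        ),
        Text(
            id="function",
            description="产品能够起到的作用",  # what the product does
            examples=[],
            many=True,
        ),
        Text(
            id="brand",
            description="文案中的化妆品品牌",  # cosmetic brand mentioned in the copy
            examples=[],
            many=True,
        ),
        Text(
            id="product",
            description="宣传的化妆品产品",  # the cosmetic product being promoted
            examples=[],
            many=True,
        ),
        Text(
            id="skin",
            description="皮肤的类型和状态",  # skin type and condition
            examples=[],
            many=True,
        ),
        Text(
            id="target",
            description="品牌或者产品适用的用户人群",  # users the brand or product is suited to
            examples=[],
            many=True,
        ),
        Text(
            id="feeling",
            description="使用化妆品后的个人感受",  # personal impressions after using the cosmetic
            examples=[],
            many=True,
        ),
        Text(
            id="scene",
            # places, climates, holidays, seasons, occasions, etc. suited to the cosmetic
            description="适合使用化妆品的地点,气候,节日,季节,场合等",
            examples=[],
            many=True,
        ),
        Text(
            id="promotion",
            description="产品促销信息",  # product promotion information
            examples=[],
            many=True,
        ),
        Text(
            id="special",
            description="产品的优势和特点",  # advantages and features of the product
            examples=[],
            many=True,
        ),
        Text(
            id="category",
            description="化妆品所属的品类",  # category the cosmetic belongs to
            examples=[
                # ("Second, have a good sunscreen", "sunscreen")
                ("第二 有一支好的防晒霜", '防晒霜')
            ],
            many=True,
        ),
    ],
    many=False,
)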
But the output looks like this:
{'post': {'brand': ['ZOTO'],
          'product': ['防晒霜'],
          'function': ['防晒'],
          'skin': ['皮肤类型和状态'],
          'target': ['用户人群'],
          'feeling': ['使用化妆品后的个人感受'],
          'scene': ['适合使用化妆品的地点,气候,节日,季节,场合等'],
          'promotion': ['产品促销信息'],
          'category': ['防晒霜']}}
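In fact, '皮肤类型和状态' ("skin type and condition") is the attribute description, not extracted information. The values for 'target', 'feeling', 'scene', and 'promotion' likewise just repeat all or part of their descriptions.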