GetElementsTool produce unicode when the elements contain non-ascii text #4265

Closed
2 of 14 tasks
jack482653 opened this issue May 7, 2023 · 1 comment

System Info

LangChain version: 0.0.158
Platform: macOS 13.3.1
Python version: 3.11

Who can help?

@vowelparrot

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Code:

from langchain.agents.agent_toolkits import PlayWrightBrowserToolkit
from langchain.tools.playwright.utils import run_async

# This import is required only in Jupyter notebooks, since they already run their own event loop
import nest_asyncio
nest_asyncio.apply()

from playwright.async_api import async_playwright

playwright = async_playwright()
device = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.5672.53 Safari/537.36",
    "screen": {
      "width": 1920,
      "height": 1080
    },
    "viewport": {
      "width": 1280,
      "height": 720
    },
    "device_scale_factor": 1,
    "is_mobile": False,
    "has_touch": False,
    "default_browser_type": "chromium"
}
# Start Playwright first, then launch a headless Chromium browser from it
playwright_instance = run_async(playwright.start())
browser = run_async(playwright_instance.chromium.launch(headless=True))
context = await browser.new_context(**device)

toolkit = PlayWrightBrowserToolkit.from_browser(async_browser=browser)
tools = toolkit.get_tools()

tools_by_name = {tool.name: tool for tool in tools}
navigate_tool = tools_by_name["navigate_browser"]
get_elements_tool = tools_by_name["get_elements"]
extract_text_tool = tools_by_name["extract_text"]

url = "https://www.ftvnews.com.tw/news/detail/2023505W0297"
await navigate_tool.arun({"url": url})
await get_elements_tool.arun({"selector": "article"})

Result:

[{"innerText": "\\u8d99\\u6021\\u7fd4\\u8aaa\\uff1a\\u570b\\u6c11\\u9ee8\\u982d\\u75db\\u7684\\u662f\\uff0c\\u8a72\\u5982\\u4f55\\u628a\\u90ed\\u53f0\\u9298\\u300c\\u8f15\\u8f15\\u5730\\u653e\\u4e0b\\u300d\\u3002\\n\\n\\u8ad6\\u58c7\\u4e2d\\u5fc3\\uff0f\\u6797\\u975c\\u82ac\\u5831\\u5c0e\\n\\n\\u570b\\u6c11\\u9ee82024\\u7e3d\\u7d71\\u4eba\\u9078\\u5c1a\\u672a\\u5e95\\u5b9a\\uff0c\\u65b0\\u5317\\u5e02\\u9577\\u4faf\\u53cb\\u5b9c\\u8207\\u9d3b\\u6d77\\u5275\\u8fa6\\u4eba\\u90ed\\u53f0\\u9298\\u5be6\\u529b\\u76f8\\u7576\\uff0c\\u4f46\\u5982\\u4eca\\u50b3\\u51fa\\u570b\\u6c11\\u9ee8\\u5df2\\u5167\\u5b9a\\u4faf\\u51fa\\u99ac\\u53c3\\u9078\\u3002\\u5c0d\\u6b64\\uff0c\\u6c11\\u9032\\u9ee8\\u53f0\\u5317\\u5e02\\u8b70\\u54e1\\u8d99\\u6021\\u7fd4\\u5728\\u300a\\u5168\\u570b\\u7b2c\\u4e00\\u52c7\\u300b\\u7bc0\\u76ee\\u4e2d\\u8868\\u793a\\uff0c\\u90ed\\u53f0\\u9298\\u73fe\\u5728\\u5df2\\u7d93\\u4e82\\u4e86\\u3001\\u6025\\u4e86\\uff0c\\u300c\\u56e0\\u70ba\\u4ed6\\u77e5\\u9053\\u4ed6\\u5feb\\u88ab\\u505a\\u6389\\u4e86\\uff01\\u300d\\u800c\\u570b\\u6c11\\u9ee8\\u63a5\\u4e0b\\u4f86\\u8981\\u601d\\u8003\\u7684\\u662f\\uff0c\\u300c\\u5982\\u4f55\\u628a\\u90ed\\u53f0\\u9298\\u8f15\\u8f15\\u5730\\u653e\\u4e0b\\uff0c\\u4e00\\u65e6\\u653e\\u5f97\\u592a\\u5feb\\u3001\\u7834\\u788e\\u4e86\\uff0c\\u5c0d\\u4e0d\\u8d77\\uff0c\\u4ed6\\u53c8\\u518d\\u6b21\\u8ddf\\u4f60\\u570b\\u6c11\\u9ee8\\u7ffb\\u81c9\\u300d\\u3002\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\u66f4\\u591a\\u65b0\\u805e\\uff1a 
\\u5feb\\u65b0\\u805e\\uff0f\\u51fa\\u7344\\u756b\\u9762\\u66dd\\uff01\\u3000\\u8d99\\u7389\\u67f1\\u7372\\u5047\\u91cb\\u624b\\u6bd4\\u8b9a\\uff1a\\u975e\\u5e38\\u9ad8\\u8208\\n\\n\\u90ed\\u53f0\\u9298\\u8207\\u4faf\\u53cb\\u5b9c\\u7684\\u570b\\u6c11\\u9ee8\\u7e3d\\u7d71\\u53c3\\u9078\\u5fb5\\u53ec\\u4e4b\\u722d\\uff0c\\u8d8a\\u8da8\\u767d\\u71b1\\u5316\\u3002\\u300a\\u5168\\u570b\\u7b2c\\u4e00\\u52c7\\u300b\\u4f86\\u8cd3\\u8d99\\u6021\\u7fd4\\u6307\\u51fa\\uff0c\\u90ed\\u53f0\\u9298\\u6700\\u8fd1\\u62cb\\u51fa\\u7684\\u8a31\\u591a\\u8b70\\u984c\\uff0c\\u88ab\\u6279\\u8a55\\u6b20\\u7f3a\\u5468\\u5168\\u7684\\u601d\\u8003\\uff0c\\u5305\\u62ec\\u300c\\u6211\\u8981\\u7528AI\\u53bb\\u8655\\u7406\\u8a50\\u9a19\\u3001\\u6211\\u8981\\u7528\\u6a5f\\u5668\\u4eba\\u53bb\\u7dad\\u8b77\\u53f0\\u7063\\u7684\\u6230\\u5834\\u300d\\u7b49\\u7b49\\uff0c\\u70ba\\u4ec0\\u9ebc\\u9019\\u6a23\\u8aaa\\uff1f\\u300c\\u56e0\\u70ba\\u4ed6\\u6025\\u4e86\\uff0c\\u4ed6\\u77e5\\u9053\\u5728\\u6574\\u500b\\u904e\\u7a0b\\u7576\\u4e2d\\u5df2\\u7d93\\u90fd\\u88ab\\u5167\\u5b9a\\u4e86\\u300d\\u3002\\n\\n\\n\\n\\n\\n\\n\\n\\u8d99\\u6021\\u7fd4\\uff1a\\u4faf\\u3001\\u90ed\\u50cf\\u6253\\u96fb\\u52d5\\u5169\\u5144\\u5f1f\\uff0c 
\\u53ea\\u6709\\u4e00\\u500b\\u4eba\\u771f\\u73a9\\u3002\\uff08\\u5716\\uff0f\\u6c11\\u8996\\u65b0\\u805e\\uff09\\n\\n\\n\\n\\n\\u8d99\\u6021\\u7fd4\\u5206\\u6790\\u6307\\u51fa\\uff0c\\u90ed\\u53f0\\u9298\\u6700\\u8fd1\\u7684\\u8655\\u5883\\uff0c\\u8b93\\u4ed6\\u60f3\\u5230\\u7db2\\u8def\\u4e0a\\u4e00\\u5f35\\u8ff7\\u56e0\\u54cf\\u5716\\uff0c\\u300c\\u5c31\\u662f\\u5169\\u500b\\u5144\\u5f1f\\u5728\\u6253\\u96fb\\u52d5\\uff0c\\u5169\\u500b\\u90fd\\u6253\\u5f97\\u5f88\\u8a8d\\u771f\\uff0c\\u4f46\\u53ea\\u6709\\u54e5\\uff08\\u4faf\\u53cb\\u5b9c\\uff09\\u9059\\u63a7\\u5668\\u6709\\u63d2\\u9032\\u96fb\\u73a9\\u88e1\\u9762\\uff0c\\u5f1f\\u5f1f\\uff08\\u90ed\\u53f0\\u9298\\uff09\\u7684\\u662f\\u5b8c\\u5168\\u6c92\\u6709\\u63d2\\u9032\\u53bb\\u3002\\u5f1f\\u5f1f\\u5c31\\u662f\\u5728\\u6253\\u5047\\u7403\\uff0c\\u81eahigh\\u800c\\u5df2\\u300d\\u3002\\n\\n\\u8d99\\u6021\\u7fd4\\u9032\\u4e00\\u6b65\\u8868\\u793a\\uff1a\\u300c\\u6240\\u4ee5\\u6211\\u5c31\\u8aaa\\uff0c\\u53ea\\u6709\\u4faf\\u53cb\\u5b9c\\u771f\\u7684\\u5728\\u73a9\\uff0c\\u90ed\\u53f0\\u9298\\u4ee5\\u70ba\\u4ed6\\u5728\\u73a9\\uff0c\\u4f46\\u4ed6\\u9023\\u63d2\\u982d\\u90fd\\u6c92\\u63d2\\u9032\\u53bb\\uff0c\\u4e0d\\u77e5\\u9053\\u5728\\u73a9\\u4ec0\\u9ebc\\uff0c\\u4f46\\u91cd\\u9ede\\u662f\\uff0c\\u6839\\u672c\\u5f9e\\u982d\\u5230\\u5c3e\\u5c31\\u6c92\\u6709\\u4ed6\\u7684\\u4efd\\uff0c\\u53ea\\u4e0d\\u904e\\u9084\\u662f\\u6709\\u756b\\u9762\\u300d\\u3001\\u300c\\u56e0\\u70ba\\u4faf\\u53cb\\u5b9c\\u4e5f\\u5728\\u6309\\uff0c\\u5c31\\u662f\\u54e5\\u54e5\\u4e5f\\u5728\\u6309\\uff0c\\u5f1f\\u5f1f\\u6309\\u5f97\\u5f88\\u958b\\u5fc3\\uff0c\\u4ee5\\u70ba\\u662f\\u4ed6\\u5728\\u8df3\\uff0c\\u4f46\\u5176\\u5be6\\u662f\\u54e5\\u54e5\\u5728\\u8df3\\u3002\\u300d\\n\\n\\u4f46\\u90ed\\u8463\\u53ef\\u4ee5\\u4efb\\u6191\\u570b\\u6c11\\u9ee8\\u611a\\u5f04\\u55ce\\uff1f\\u8d99\\u6021\\u7fd4\\u8a8d\\u70ba\\uff0c\\u73fe\\u5728\\u570b\\u6c11\\u9ee8\\u5982\\u679c\\u8981\\u8aaa\\u670d\\u5927\\u5bb6\\uff0c\\u9019\\u500b\\u662f\\u4e00\\
u500b\\u516c\\u6b63\\u7684\\u9078\\u8209\\uff0c\\u5c31\\u61c9\\u8a72\\u628a\\u6c11\\u8abf\\u7684\\u57fa\\u6e96\\u3001\\u6642\\u9593\\u9ede\\u62ff\\u51fa\\u4f86\\uff0c\\u300c\\u4f60\\u628a\\u5230\\u6642\\u5019\\u8003\\u616e\\u7684\\uff0c\\u4e0d\\u540c\\u56e0\\u7d20\\u8ddf\\u767e\\u5206\\u6bd4\\u5168\\u90e8\\u90fd\\u62ff\\u51fa\\u4f86\\uff0c\\u8aaa\\u5c0d\\u4e0d\\u8d77\\uff0c\\u5ba2\\u89c0\\u800c\\u8a00\\u5c31\\u662f\\u4faf\\u53cb\\u5b9c\\u6bd4\\u8f03\\u5f37\\uff0c\\u6211\\u89ba\\u5f97\\u9019\\u6703\\u8cb7\\u55ae\\u7684\\u300d\\uff0c\\u4f46\\u662f\\u4eca\\u5929\\u5982\\u679c\\u4f60\\u662f\\u9ed1\\u7bb1\\u4f5c\\u696d\\uff0c\\u8aaa\\u4e0d\\u51fa\\u4f86\\u4efb\\u4f55\\u7684\\u4f9d\\u64da\\uff0c\\u6700\\u5f8c\\u5c31\\u63a8\\u4faf\\u53cb\\u5b9c\\u7684\\u8a71\\uff0c\\u570b\\u6c11\\u9ee8\\u6703\\u6709\\u9ebb\\u7169\\u3002\\n\\u300c\\u70ba\\u4ec0\\u9ebc\\uff1f\\u4f60\\u628a\\u4ed6\\u653e\\u5f97\\u592a\\u5feb\\u3001 \\u7834\\u788e\\u4e86\\uff0c\\u4ed6\\u53c8\\u518d\\u6b21\\u8ddf\\u4f60\\u570b\\u6c11\\u9ee8\\u7ffb\\u81c9\\uff0c\\u751a\\u81f3\\u65bc\\u53bb\\u52a0\\u5165\\u7b2c\\u4e09\\u9ee8\\uff0c\\u6240\\u4ee5\\u570b\\u6c11\\u9ee8\\u73fe\\u5728\\u8981\\u601d\\u8003\\u7684\\u5c31\\u662f\\uff0c\\u8981\\u5982\\u4f55\\u628a\\u90ed\\u53f0\\u9298\\u8f15\\u8f15\\u5730\\u653e\\u4e0b\\u3002\\u300d\\n\\n\\u66f4\\u591a\\u65b0\\u805e\\uff1a \\u8cf4\\u6e05\\u5fb7\\u65b0\\u5317\\u6c11\\u8abf\\u8d85\\u8eca\\u4faf\\u53cb\\u5b9c6\\uff05\\u3000\\u7acb\\u59d4\\u5206\\u6790\\u300c2\\u95dc\\u9375\\u300d\\u4faf\\u5931\\u53bb\\u512a\\u52e2"}]
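
Until the tool emits the characters directly, the escaped output can be decoded on the caller's side. The following is a minimal sketch, assuming the tool returns a JSON-encoded string and that the doubled backslashes above are only the notebook's repr of single backslashes. `raw_result` is a hypothetical stand-in for the returned string, shortened to the first four characters of the article.

```python
import json

# Hypothetical stand-in for the string returned by get_elements_tool.arun(...).
# In Python source, "\\u8d99" is a literal backslash followed by "u8d99",
# which is what a JSON string escaped with ensure_ascii=True contains.
raw_result = '[{"innerText": "\\u8d99\\u6021\\u7fd4\\u8aaa"}]'

# json.loads turns the \uXXXX escapes back into the original characters.
elements = json.loads(raw_result)
decoded_text = elements[0]["innerText"]
print(decoded_text)
```

This only changes how the result is read back. The escaped string and the decoded text carry the same data.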

Expected behavior

The tool should return readable text like this:

[{"innerText": "趙怡翔說:..."}]
vowelparrot pushed a commit that referenced this issue May 7, 2023
Adds ensure_ascii=False when dumping json in the GetElementsTool
Fixes issue #4265
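
The effect of the `ensure_ascii` flag named in the commit message can be seen with the standard library alone. This is a small sketch of the behaviour of `json.dumps`, not the tool's actual source. The `elements` list is a hypothetical, shortened version of what the tool collects.

```python
import json

# Hypothetical element list, shortened to the first four characters of the article.
elements = [{"innerText": "趙怡翔說"}]

# Default: ensure_ascii=True escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(elements)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(elements, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode to the same data; only the serialized form differs.
assert json.loads(escaped) == json.loads(readable) == elements
```

Both forms are valid JSON, so the change affects readability for a human or an LLM consuming the string, not correctness for a JSON parser.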

dosubot bot commented Aug 28, 2023

Hi, @jack482653! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you opened this issue regarding the GetElementsTool in the LangChain library. The problem is that the tool returns non-ASCII text as \uXXXX Unicode escape sequences, while the expected behavior is for it to return the text as is. Currently, there hasn't been any activity or comments on the issue.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

dosubot bot added the stale label Aug 28, 2023
dosubot bot closed this as not planned Sep 10, 2023
dosubot bot removed the stale label Sep 10, 2023