Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding Issue: Incorrect Umlaut (Speacial Character) Handling in SQL Execution Leading to Wrong Query Results #21018

Open
5 tasks done
jeppelsh opened this issue Apr 29, 2024 · 0 comments
Labels
Ɑ: agent Related to agents module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@jeppelsh
Copy link

jeppelsh commented Apr 29, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

This is the code I execute, but without my db it will be hard to reproduce....

from cx_Oracle import makedsn
from langchain.sql_database import SQLDatabase 

dsn_tns = makedsn(host=host, port=port, service_name=service_name)
connection_string = f"oracle+cx_oracle://{usr}:{pwd}@{dsn_tns}?encoding=UTF-8&nencoding=UTF-8"
db = SQLDatabase.from_uri(connection_string, schema=schema, include_tables=include_tables)
print("Dialect:", db.dialect)

from langchain.agents import create_sql_agent 
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=db, llm=chat_client)
agent_executor = create_sql_agent(llm=chat_client, toolkit=toolkit, agent_type="openai-tools", verbose=True, return_intermediate_steps=True)

Error Message and Stack Trace (if applicable)

No error message, because no error is being raised.

Description

Description

When using LangChain for NL2SQL, there is a discrepancy between the displayed SQL in the intermediate steps and the actual SQL that is executed. The executed SQL seems to mishandle umlauts (ä, ö, ü), using escape sequences (e.g., \u00fc for ü) instead of the actual umlaut characters. This results in incorrect query execution, as the database does not recognize the conditions specified due to encoding errors.

Expected Behavior

The SQL query should be executed exactly as shown in the intermediate steps, preserving the correct encoding for special characters such as umlauts. The conditions in the WHERE clause should correctly filter the records based on the given values.

Actual Behavior

The executed SQL does not correctly handle the encoding of umlauts, leading to no matches in conditions that involve these characters, even when appropriate records exist in the database.

Output of invoke()

Invoking: `sql_db_query_checker` with `{'query': "SELECT COUNT(*) AS open_cancellation_orders FROM auftraege WHERE type = 'K\\u00fcndigung erfassen' AND status = 'offener K\\u00fcndigungsvorgang'"}`

``sql
SELECT COUNT(*) AS open_cancellation_orders 
FROM auftraege 
WHERE type = 'Kündigung erfassen' 
AND status = 'offener Kündigungsvorgang'
``
Invoking: `sql_db_query` with `{'query': "SELECT COUNT(*) AS open_cancellation_orders FROM auftraege WHERE type  = 'K\\u00fcndigung erfassen' AND status = 'offener K\\u00fcndigungsvorgang'"}`

[(0,)]Es gibt derzeit keine offenen Kündigungsaufträge in der Datenbank.

The SQL statement in the middle is correct (including the special characters), this is what should be executed. And if I execute this SQL statement in another tool, I get the correct result.
However, the SQL statement at the top and the bottom (with wrong speacial character representation) seems to be what is actually executed. Since the value in the where-clause is wrong, the query does not return any row, which is wrong.
What surprises me is that even within the same output two different versions of the SQL statement are displayed, one correct and one incorrect, and that unfortunately the wrong one is executed.

System Info

System Information

OS: Windows
OS Version: 10.0.19045
Python Version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]

Package Information

langchain_core: 0.1.42
langchain: 0.1.16
langchain_community: 0.0.32
langsmith: 0.1.31
langchain_openai: 0.1.3
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

@dosubot dosubot bot added Ɑ: agent Related to agents module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ɑ: agent Related to agents module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature
Projects
None yet
Development

No branches or pull requests

1 participant