Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue: security concerns with exec() via multiple agents and Shell tool #5294

Closed
juppytt opened this issue May 26, 2023 · 3 comments
Closed
Labels
🤖:security Related to security issues, CVEs

Comments

@juppytt
Copy link
Contributor

juppytt commented May 26, 2023

Issue you'd like to raise.

TL;DR: The use of exec() in agents can lead to remote code execution vulnerabilities. Some Huggingface projects use such agents, despite the potential harm of LLM-generated Python code.

#1026 and #814 discuss the security concerns regarding the use of exec() in llm_math chain. The comments in #1026 proposed methods to sandbox the code execution, but due to environmental issues, the code was patched to replace exec() with numexpr.evaluate() (#2943). This restricted the execution capabilities to mathematical functionalities only. This bug was assigned the CVE number CVE-2023-29374.

As shown in the above issues, the usage of exec() in a chain can pose a significant security risk, especially when the chain is running on a remote machine. This seems common scenario for projects in Huggingface.

However, in the latest langchain, exec() is still used in PythonReplTool and PythonAstReplTool.
https://github.com/hwchase17/langchain/blob/aec642febb3daa7dbb6a19996aac2efa92bbf1bd/langchain/tools/python/tool.py#L55

https://github.com/hwchase17/langchain/blob/aec642febb3daa7dbb6a19996aac2efa92bbf1bd/langchain/tools/python/tool.py#L102

These functions are called by Pandas Dataframe Agent, Spark Dataframe Agent, CSV Agent. It seems they are intentionally designed to pass the LLM output to PythonAstTool or PythonAstReplTool to execute the LLM-generated code in the machine.

The documentation for these agents explicitly states that they should be used with caution since LLM-generated Python code can be potentially harmful. For instance:
https://github.com/hwchase17/langchain/blob/aec642febb3daa7dbb6a19996aac2efa92bbf1bd/docs/modules/agents/toolkits/examples/pandas.ipynb#L12

Despite this, I have observed several projects in Huggingface using create_pandas_dataframe_agent and create_csv_agent.

Suggestion:

Fixing this issue as done in llm_math chain seems challenging.
Simply restricting the LLM-generated code to Pandas and Spark execution might not be sufficient because there are still numerous malicious tasks that can be performed using those APIs. For instance, Pandas can read and write files.

Meanwhile, it seems crucial to emphasize the security concerns related to LLM-generated code for the overall security of LLM apps. Merely limiting execution to specific frameworks or APIs may not fully address the underlying security risks.

@juppytt
Copy link
Contributor Author

juppytt commented May 29, 2023

I have come across a functionality similar to exec() in shell tool available on https://github.com/hwchase17/langchain/blob/master/langchain/tools/shell/tool.py.

Shell tool is designed to execute shell commands generated by the LLM. The documentation for this tool does not provide a warning about its potential safety concerns. (https://python.langchain.com/en/latest/modules/agents/tools/examples/bash.html?highlight=shell)

However, within the code itself, there is a cautionary message stating that it may be unsafe to use the tool.
https://github.com/hwchase17/langchain/blob/99a1e3f3a309852da989af080ba47288dcb9a348/langchain/tools/shell/tool.py#L32-L35

It would be advisable to include a warning in the documentation to alert users about the potential risks involved.

Specifically, when utilizing the shell tool in an LLM application running on a server, it is possible for malicious users to exploit it, gaining control over the shell command execution and potentially running arbitrary code on the server. Similarly, if the LLM application is running on a local machine and receives untrusted data as input, such as a malicious webpage or untrusted files, the generated shell command by the LLM can be manipulated by this untrusted data, enabling the execution of arbitrary code on the local machine. It is crucial to be aware of these security risks when using the shell tool in such scenarios.

@juppytt juppytt changed the title Issue: security concerns with exec() via Pandas Dataframe Agent, Spark Dataframe Agent, CSV Agent Issue: security concerns with exec() via multiple agents and Shell tools May 29, 2023
@juppytt juppytt changed the title Issue: security concerns with exec() via multiple agents and Shell tools Issue: security concerns with exec() via multiple agents and Shell tool May 29, 2023
@mick-net
Copy link

Related #2301

@dosubot
Copy link

dosubot bot commented Sep 19, 2023

Hi, @juppytt! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you raised concerns about the use of exec() in agents, which can lead to remote code execution vulnerabilities. You suggested addressing the security concerns related to LLM-generated code and not solely relying on restricting execution to specific frameworks or APIs. In the comments, you provided an example of a shell tool that also poses potential safety concerns and suggested including a warning in the documentation.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain project!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 19, 2023
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 26, 2023
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 26, 2023
@eyurtsev eyurtsev added the 🤖:security Related to security issues, CVEs label Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:security Related to security issues, CVEs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants