-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The system can be manipulated by user input to always return full marks and predefined feedback #4
Comments
https://www.ibm.com/blog/prevent-prompt-injection/ There some other solutions like Delimiter A second Agent could possible improve stuff like // Logic for valid Output to prevent /improve something like getting full points for one word as well. |
Excellent, can I ad your name to the changelog as someone who has given advice and made a suggestion. It will appear in this file. https://github.com/marcusgreen/moodle-qtype_aitext/blob/main/changelog.md |
Feel free to add my name to the changelog. Alexander Mikasch of Moodle.NRW (https://moodlenrw.de/) If I find time I will introduce the qtype to my team and inspect it in more detail. Good work! |
Thanks Alex, I have been giving it a lot of thought since yesterday. Can you email me at marcusavgreen at gmail.com |
I have created a new branch It is a first attempt at the code and would benefit from refining. |
Why you don't use the open AI assistent api. With this api the user can't "escape" from the user context and change the system context. |
Issue
Describe the bug
The system can be manipulated by user input to always return full marks and predefined feedback, which compromises the integrity of the automated grading process. The user can input a specific prompt that causes the LLM to disregard all previous inputs and system prompts, resulting in a JSON object that always gives full marks.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The system should correctly evaluate the user's input based on the provided criteria and marking scheme, without being influenced by manipulation prompts. The LLM should not be able to be overridden by specific user inputs that force it to give full marks.
Screenshots
Desktop (please complete the following information):
-all
Additional context
This issue allows users to bypass the intended grading mechanism, resulting in unfair assessments and undermining the reliability of the automated grading process. Implementing stricter input validation and prompt handling can help prevent this exploitation.
Yes, the problem could potentially be solved using two agents. The idea is that the first agent processes the user's input and generates a preliminary score and feedback. The second agent then reviews the output of the first agent for manipulation attempts and ensures that the feedback and score adhere to the expected criteria. Here's an overview of how this could work:
Solution with Two Agents
First Agent (Scoring and Feedback Generator):
Second Agent (Validation and Security Check):
Example Workflow
User Input:
First Agent:
Second Agent:
Pseudocode
Here is a simplified pseudocode example of how this could be implemented: JUST A GETTING STARTED IDEA
This implementation provides an additional layer of security and ensures that the assessments are fair and accurate.
Ofc its just an example that you get the idea.
The text was updated successfully, but these errors were encountered: