FEAT: Add fetch function for SecLists AI LLM Bias Testing datasets (#267)#280

Merged
romanlutz merged 8 commits into
microsoft:mainfrom
KutalVolkan:feature/add-fetch-function-seclists
Aug 11, 2024

@KutalVolkan
Contributor

Hi @romanlutz,

To the best of my knowledge, I have completed the code implementation of the fetch function for the SecLists AI LLM Bias Testing datasets (#267).

Summary

Added a function to fetch SecLists AI LLM Bias Testing datasets, process the data, and convert it into a PromptDataset. This includes handling placeholders for Country, Region, Nationality, Gender, and Skin-Color.
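
A minimal sketch of the placeholder handling described above, not the code from this PR. The bracketed placeholder syntax, the `fill_placeholders` helper, and the sample values are all assumptions for illustration; the actual function also fetches the SecLists files and builds a PromptDataset, which is left out here.

```python
import random

# Hypothetical replacement values; a real run would draw from full lists.
PLACEHOLDER_VALUES = {
    "[Country]": ["Germany", "Brazil", "Japan"],
    "[Region]": ["Europe", "South America", "Asia"],
    "[Nationality]": ["German", "Brazilian", "Japanese"],
    "[GENDER]": ["female", "male", "non-binary"],
    "[SKIN_COLOR]": ["white", "black", "brown"],
}


def fill_placeholders(template: str, rng: random.Random) -> str:
    """Replace every known placeholder in `template` with a randomly chosen value.

    All occurrences of the same placeholder receive the same value.
    """
    for placeholder, values in PLACEHOLDER_VALUES.items():
        if placeholder in template:
            template = template.replace(placeholder, rng.choice(values))
    return template


rng = random.Random(0)  # seeded so the sketch is reproducible
prompts = [
    fill_placeholders(line, rng)
    for line in [
        "Describe a typical person from [Country].",
        "What jobs suit a [GENDER] applicant from [Region]?",
    ]
]
```

Keeping the mapping in one dictionary means a new placeholder type only needs a new entry, not a new code path.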

Changes

Please review the changes and let me know if there are any improvements or adjustments needed.

Best regards,
Volkan

@romanlutz romanlutz linked an issue Jul 23, 2024 that may be closed by this pull request
@romanlutz romanlutz mentioned this pull request Jul 23, 2024
Contributor

@rdheekonda rdheekonda left a comment

Great work, Kutal and Roman.

@KutalVolkan
Contributor Author

Hello @romanlutz, @rdheekonda,

While testing 8_test_seclists_bias_testing.py, I hit a data type conversion error in DuckDB: the string '2965249278352' could not be converted to INT128. A TypeError: 'NoneType' object is not iterable followed, indicating that the query result was None, most likely because of the conversion error.

Steps to Reproduce

  1. Merge the pull request from feature/many-shot-jailbreaking (the version Roman tried to merge, which possibly included commits from the main PyRIT repo) into feature/many-shot-jailbreaking.
  2. Merge feature/many-shot-jailbreaking into feature/add-fetch-function-seclists.
  3. Run the 8_test_seclists_bias_testing.py script.

Alternatively, it should be possible to just run the 8_test_seclists_bias_testing.py script.

Error Message

Error fetching data from table PromptMemoryEntries: (duckdb.duckdb.ConversionException) Conversion Error: Could not convert string '2965249278352' to INT128
[SQL: SELECT "PromptMemoryEntries".id AS "PromptMemoryEntries_id", "PromptMemoryEntries".role AS "PromptMemoryEntries_role", "PromptMemoryEntries".conversation_id AS "PromptMemoryEntries_conversation_id", "PromptMemoryEntries".sequence AS "PromptMemoryEntries_sequence", "PromptMemoryEntries".timestamp AS "PromptMemoryEntries_timestamp", "PromptMemoryEntries".labels AS "PromptMemoryEntries_labels", "PromptMemoryEntries".prompt_metadata AS "PromptMemoryEntries_prompt_metadata", "PromptMemoryEntries".converter_identifiers AS "PromptMemoryEntries_converter_identifiers", "PromptMemoryEntries".prompt_target_identifier AS "PromptMemoryEntries_prompt_target_identifier", "PromptMemoryEntries".orchestrator_identifier AS "PromptMemoryEntries_orchestrator_identifier", "PromptMemoryEntries".response_error AS "PromptMemoryEntries_response_error", "PromptMemoryEntries".original_value_data_type AS "PromptMemoryEntries_original_value_data_type", "PromptMemoryEntries".original_value AS "PromptMemoryEntries_original_value", "PromptMemoryEntries".original_value_sha256 AS "PromptMemoryEntries_original_value_sha256", "PromptMemoryEntries".converted_value_data_type AS "PromptMemoryEntries_converted_value_data_type", "PromptMemoryEntries".converted_value AS "PromptMemoryEntries_converted_value", "PromptMemoryEntries".converted_value_sha256 AS "PromptMemoryEntries_converted_value_sha256"
FROM "PromptMemoryEntries"
WHERE ("PromptMemoryEntries".orchestrator_identifier ->> $1) = $2::UUID]
[parameters: ('id', UUID('b88e999d-2595-4b96-8188-f8abeb52fdfa'))]
(Background on this error at: https://sqlalche.me/e/20/9h9h)
Traceback (most recent call last):
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\base.py", line 1970, in _exec_single_context
    self.dialect.do_execute(
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\duckdb_engine\__init__.py", line 162, in execute
    self.__c.execute(statement, parameters)
duckdb.duckdb.ConversionException: Conversion Error: Could not convert string '2965249278352' to INT128

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\vkuta\projects\PyRIT\pyrit\memory\duckdb_memory.py", line 272, in query_entries
    return query.all()
           ^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\orm\query.py", line 2673, in all
    return self._iter().all()  # type: ignore
           ^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\orm\query.py", line 2827, in _iter
    result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
                                                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\orm\session.py", line 2306, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\orm\session.py", line 2191, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\orm\context.py", line 293, in orm_execute_statement
    result = conn.execute(
             ^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\base.py", line 1421, in execute
    return meth(
           ^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\sql\elements.py", line 514, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\base.py", line 1643, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\base.py", line 1849, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\base.py", line 1989, in _exec_single_context
    self._handle_dbapi_exception(
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\base.py", line 2356, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\base.py", line 1970, in _exec_single_context
    self.dialect.do_execute(
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\sqlalchemy\engine\default.py", line 924, in do_execute
    cursor.execute(statement, parameters)
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\site-packages\duckdb_engine\__init__.py", line 162, in execute
    self.__c.execute(statement, parameters)
sqlalchemy.exc.DataError: (duckdb.duckdb.ConversionException) Conversion Error: Could not convert string '2965249278352' to INT128
[SQL: SELECT "PromptMemoryEntries".id AS "PromptMemoryEntries_id", "PromptMemoryEntries".role AS "PromptMemoryEntries_role", "PromptMemoryEntries".conversation_id AS "PromptMemoryEntries_conversation_id", "PromptMemoryEntries".sequence AS "PromptMemoryEntries_sequence", "PromptMemoryEntries".timestamp AS "PromptMemoryEntries_timestamp", "PromptMemoryEntries".labels AS "PromptMemoryEntries_labels", "PromptMemoryEntries".prompt_metadata AS "PromptMemoryEntries_prompt_metadata", "PromptMemoryEntries".converter_identifiers AS "PromptMemoryEntries_converter_identifiers", "PromptMemoryEntries".prompt_target_identifier AS "PromptMemoryEntries_prompt_target_identifier", "PromptMemoryEntries".orchestrator_identifier AS "PromptMemoryEntries_orchestrator_identifier", "PromptMemoryEntries".response_error AS "PromptMemoryEntries_response_error", "PromptMemoryEntries".original_value_data_type AS "PromptMemoryEntries_original_value_data_type", "PromptMemoryEntries".original_value AS "PromptMemoryEntries_original_value", "PromptMemoryEntries".original_value_sha256 AS "PromptMemoryEntries_original_value_sha256", "PromptMemoryEntries".converted_value_data_type AS "PromptMemoryEntries_converted_value_data_type", "PromptMemoryEntries".converted_value AS "PromptMemoryEntries_converted_value", "PromptMemoryEntries".converted_value_sha256 AS "PromptMemoryEntries_converted_value_sha256"
FROM "PromptMemoryEntries"
WHERE ("PromptMemoryEntries".orchestrator_identifier ->> $1) = $2::UUID]
[parameters: ('id', UUID('b88e999d-2595-4b96-8188-f8abeb52fdfa'))]
(Background on this error at: https://sqlalche.me/e/20/9h9h)
Traceback (most recent call last):
  File "C:\Users\vkuta\projects\PyRIT\doc\demo\8_test_seclists_bias_testing.py", line 142, in <module>
    asyncio.run(run())
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\anaconda3\envs\pyrit-dev\Lib\asyncio\base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\projects\PyRIT\doc\demo\8_test_seclists_bias_testing.py", line 115, in run
    memory = orchestrator.get_memory()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\projects\PyRIT\pyrit\orchestrator\orchestrator_class.py", line 75, in get_memory
    return self._memory.get_prompt_request_piece_by_orchestrator_id(orchestrator_id=self._id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vkuta\projects\PyRIT\pyrit\memory\memory_interface.py", line 163, in get_prompt_request_piece_by_orchestrator_id
    return sorted(prompt_pieces, key=lambda x: (x.conversation_id, x.timestamp))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not iterable
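
The two errors above appear to be linked: the "Error fetching data from table" log line suggests the query helper caught the database error, logged it, and returned None, which `sorted` then could not iterate. A minimal sketch of that failure mode follows, using hypothetical stand-in functions and not PyRIT's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def query_entries_swallowing_errors(run_query):
    """Hypothetical stand-in: logs a failure and implicitly returns None."""
    try:
        return run_query()
    except Exception as exc:
        logger.error("Error fetching data: %s", exc)
        # No return statement here, so the caller receives None.


def failing_query():
    # Stand-in for the database raising the conversion error.
    raise ValueError("Could not convert string '2965249278352' to INT128")


entries = query_entries_swallowing_errors(failing_query)

try:
    sorted(entries)  # same shape as the failing call in memory_interface.py
    outcome = "sorted"
except TypeError as exc:
    outcome = str(exc)

# A defensive caller could fall back to an empty list instead.
safe_entries = sorted(entries or [])
```

Under this reading, the TypeError is only a symptom; the conversion error in the first traceback is the one to chase.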

Thank you for your wonderful support so far! Any pointers, hints, or suggestions on how to resolve this error would be greatly appreciated!

@romanlutz
Contributor

@KutalVolkan can you try deleting your results folder and rerun? Over time, we may make small changes to the DB and that can throw it off. Sadly, the errors are fairly hard to decipher. In most cases that I've seen, deleting the results folder gives you a fresh start and the error doesn't show again.
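
A fresh start along these lines could be scripted as below. This is a sketch only: the `reset_results_dir` helper and the directory used are hypothetical, so point it at wherever your local results folder actually lives, and note that it deletes everything inside that folder.

```python
import shutil
import tempfile
from pathlib import Path


def reset_results_dir(results_dir: Path) -> None:
    """Delete a local results folder (if present) and recreate it empty."""
    if results_dir.exists():
        shutil.rmtree(results_dir)
    results_dir.mkdir(parents=True)


# Demonstration on a throwaway directory, not a real PyRIT checkout.
demo_root = Path(tempfile.mkdtemp())
demo_results = demo_root / "results"
demo_results.mkdir()
(demo_results / "stale.db").write_text("old schema")

reset_results_dir(demo_results)
remaining = list(demo_results.iterdir())
```

Recreating the empty folder is a choice made for this sketch; deleting it alone matches the suggestion above.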

…d fixed issues across all files. - Updated .gitignore for better exclusion. - Import changes in many_shot_jailbreak.ipynb and .py. - Updated datasets initialization and fetching scripts. - Added new seclists_bias_testing notebooks and scripts.
@KutalVolkan KutalVolkan force-pushed the feature/add-fetch-function-seclists branch from fed64e4 to 4f9855b Compare August 3, 2024 11:40
@KutalVolkan
Contributor Author

> @KutalVolkan can you try deleting your results folder and rerun? Over time, we may make small changes to the DB and that can throw it off. Sadly, the errors are fairly hard to decipher. In most cases that I've seen, deleting the results folder gives you a fresh start and the error doesn't show again.

Hi Roman,

Your suggestion to delete the results folder and rerun worked perfectly, thank you!

Additionally, I've successfully merged my branch with the PyRIT main branch and added the fetch function for SecLists AI LLM Bias Testing datasets. You can review all the changes at your convenience. Please feel free to provide any feedback, and if there are any changes you'd like to see, I'll be happy to make them.

Thanks again for your help!

Best,
Volkan

Comment thread pyrit/datasets/fetch_example_datasets.py
Comment thread .gitignore Outdated
Contributor

@romanlutz romanlutz left a comment

Looks great! A few tweaks and we should be able to merge. I'll make sure to ask my teammates for feedback as well (if any)

Comment thread .gitignore Outdated
Comment thread doc/code/orchestrators/many_shot_jailbreak.py Outdated
Comment thread doc/code/orchestrators/seclists_bias_testing.py Outdated
Comment thread doc/code/orchestrators/seclists_bias_testing.py Outdated
Comment thread doc/code/orchestrators/seclists_bias_testing.py Outdated
Comment thread doc/code/orchestrators/seclists_bias_testing.py Outdated
Comment thread pyrit/datasets/fetch_example_datasets.py
Comment thread pyrit/datasets/fetch_example_datasets.py Outdated
Comment thread doc/code/orchestrators/seclists_bias_testing.py Outdated
@romanlutz romanlutz marked this pull request as ready for review August 5, 2024 21:43
@romanlutz romanlutz changed the title [DRAFT] Add fetch function for SecLists AI LLM Bias Testing datasets (#267) FEAT: Add fetch function for SecLists AI LLM Bias Testing datasets (#267) Aug 5, 2024
…ed .gitignore to ignore unnecessary files. - Modified many_shot_jailbreak.ipynb and many_shot_jailbreak.py with improvements. - Modified seclists_bias_testing.ipynb and seclists_bias_testing.py for better functionality. - Updated fetch_example_datasets.py for enhanced placeholder management. Next steps: - Write comprehensive unit and integration tests to validate functionality.
…chestrator logic in seclists_bias_testing.ipynb
…chestrator logic in seclists_bias_testing.ipynb
Comment thread doc/code/orchestrators/seclists_bias_testing.py Outdated
Comment thread pyrit/datasets/fetch_example_datasets.py Outdated
Contributor

@romanlutz romanlutz left a comment

Amazing! Thank you for your patience and for incorporating all our feedback. Let me know if there's anything you still meant to add, otherwise I'll be happy to merge.

@KutalVolkan
Contributor Author

KutalVolkan commented Aug 9, 2024

Hi @romanlutz ,

I wanted to let you know that I've resolved all issues. Additionally, I've updated our file handling to include encoding='utf-8' to ensure compatibility with different languages.
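
For illustration, passing the encoding explicitly looks like the sketch below; the file name and sample text are made up and do not come from this PR. Without the argument, Python's `open` falls back to a platform-dependent default encoding, which on Windows is often not UTF-8.

```python
import tempfile
from pathlib import Path

sample = "Countries: Türkiye, Côte d'Ivoire, 日本"
path = Path(tempfile.mkdtemp()) / "prompts.txt"

# An explicit encoding makes the write and the read independent of the OS locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```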

Everything is done on my end, and it can be merged. Please let me know if you need anything else, and thank you for your great support!

@romanlutz
Contributor

Awesome! Can you add pycountry to the pyproject.toml? I think that's all that's left.

@KutalVolkan
Contributor Author

> Awesome! Can you add pycountry to the pyproject.toml? I think that's all that's left.

Hello Roman,

Done! Thank you for your great support!

@romanlutz romanlutz merged commit a578162 into microsoft:main Aug 11, 2024
@KutalVolkan KutalVolkan deleted the feature/add-fetch-function-seclists branch August 12, 2024 06:26
Development

Successfully merging this pull request may close these issues.

Add fetch function for SecLists AI LLM Bias Testing datasets

3 participants