How are you going about cleaning? #31
Comments
I use a regex to find a specific subset of instructions/inputs/outputs that are highly likely to be wrong. I then manually verify that each one is indeed incorrect, and either query gpt-3.5-turbo or fix it by hand.
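For anyone wanting to try the regex pass, here is a minimal sketch of the idea. The thread does not list the actual patterns, so the ones below are hypothetical, and the record layout (dicts with `instruction`, `input` and `output` keys) is an assumption. It only flags candidates; the manual check still comes afterwards.

```python
import re

# Hypothetical patterns for illustration; not the ones used in this project.
SUSPECT_PATTERNS = [
    re.compile(r"as an ai language model", re.IGNORECASE),
    re.compile(r"^\s*(n/a|none|<noinput>)\s*$", re.IGNORECASE),
    re.compile(r"https?://"),
]


def flag_suspects(records):
    """Return (index, field, pattern) for every field that matches a suspect pattern."""
    hits = []
    for i, rec in enumerate(records):
        for field in ("instruction", "input", "output"):
            text = rec.get(field, "")
            for pat in SUSPECT_PATTERNS:
                if pat.search(text):
                    hits.append((i, field, pat.pattern))
    return hits


if __name__ == "__main__":
    sample = [
        {"instruction": "Add 2 and 2", "input": "", "output": "4"},
        {"instruction": "Summarize", "input": "N/A",
         "output": "As an AI language model, I cannot do that."},
    ]
    for index, field, pattern in flag_suspects(sample):
        print(f"entry {index}, field {field!r} matched {pattern!r}")
```

Keeping the patterns in one list makes it easy to add a new one whenever a recurring kind of bad entry turns up.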
Much of it has been hand-curated with the help of various tools and regular expressions (see the tools directory). There is also a tool that can use GPT-3.5/4 to double-check the answers.
I bet one could get the gpt-3.5 API to take in a line and decide whether it's acceptable or needs fixing, and, if it needs fixing, to suggest a fix that a human could then quickly review (accept or reject).
We have this capability already built out in one of the tools. I think the first 1000 entries have been checked in this manner.
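The check, suggest and accept-or-reject flow discussed above could look roughly like this. This is only a sketch and not the code from the tools directory: `suggest_fix` and `approve` are hypothetical callables standing in for the model call and the human reviewer, so no particular API client is assumed.

```python
def review_entries(entries, suggest_fix, approve):
    """Apply only the fixes that a reviewer accepts.

    suggest_fix(entry) returns None if the entry looks acceptable,
    otherwise a proposed replacement (hypothetical stand-in for the model call).
    approve(entry, proposal) returns True to accept the proposal
    (hypothetical stand-in for the human quick review).
    """
    cleaned = []
    for entry in entries:
        proposal = suggest_fix(entry)
        if proposal is not None and approve(entry, proposal):
            cleaned.append(proposal)
        else:
            # Either nothing was flagged or the reviewer rejected the fix.
            cleaned.append(entry)
    return cleaned


if __name__ == "__main__":
    # Stub checker and reviewer so the sketch runs without any API access.
    known_fixes = {"2+2=5": "2+2=4"}
    result = review_entries(
        ["2+2=5", "3+3=6"],
        suggest_fix=known_fixes.get,
        approve=lambda entry, proposal: True,
    )
    print(result)
```

Passing the checker in as a function means the same loop works whether the suggestion comes from gpt-3.5-turbo, GPT-4 or a hand-written rule.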
How are you going about cleaning this?
Manually or with GPT-4.