Feature: `--plan` flag #12
Comments
The discovery that only GPT-4 can self-improve, while weaker models cannot, is very intriguing, indicating that a new type of emergent ability (i.e. improving upon natural language feedback) may only exist when the model is "mature" (large and well-aligned) enough.

Large Language Models (LLMs) have shown remarkable aptitude in code generation but still struggle on challenging programming tasks. Self-repair -- in which the model debugs and fixes mistakes in its own code -- has recently become a popular way to boost performance in these settings. However, only very limited studies exist in the literature on how and when self-repair works effectively, and one might wonder to what extent a model is really capable of providing accurate feedback on why the code is wrong when that code was generated by the same model. In this paper, we analyze GPT-3.5 and GPT-4's ability to perform self-repair on APPS, a challenging dataset consisting of diverse coding challenges.
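The self-repair loop described above (generate code, test it, have the model critique its own failure, then regenerate) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `generate`, `critique`, and `run_tests` callables are hypothetical stand-ins for LLM calls and an execution sandbox, and the demo below fakes the model with a deterministic stub.

```python
def self_repair(task, generate, critique, run_tests, max_rounds=3):
    """Generic generate -> test -> critique -> regenerate loop.

    All three callables are hypothetical placeholders for model/API calls:
      generate(task, feedback)  -> candidate code (str)
      run_tests(code)           -> (passed: bool, error: str | None)
      critique(task, code, err) -> natural-language feedback on the failure
    """
    code = generate(task, feedback=None)
    for _ in range(max_rounds):
        passed, error = run_tests(code)
        if passed:
            return code
        feedback = critique(task, code, error)    # model explains the bug
        code = generate(task, feedback=feedback)  # model rewrites the code
    return code  # best effort after max_rounds repair attempts


def demo():
    # Stub "model": first emits an off-by-one bug, then fixes it once
    # it receives any feedback. Purely illustrative.
    def generate(task, feedback):
        return "lambda x: x + 1" if feedback else "lambda x: x + 2"

    def run_tests(code):
        f = eval(code)
        if f(1) == 2:
            return True, None
        return False, "expected 2, got {}".format(f(1))

    def critique(task, code, error):
        return error  # echo the test failure back as feedback

    return self_repair("increment x", generate, critique, run_tests)
```

In this stub run, the first candidate fails the test, the critique step feeds the error back, and the second candidate passes, so the loop returns the repaired code.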
this is an easy one
Idea from: https://arxiv.org/pdf/2303.06689.pdf