Issue with Handling Input Prompt Files in codegen-inference.py #32
Sometimes the tokenizer won't return the same number of tokens, so the token limit enforced at CodeT/RepoCoder/build_prompt.py, Line 15 in 35f54d6, may be exceeded slightly.
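A minimal sketch, in case it helps, of re-tokenizing the final assembled prompt to see how far it drifts from the intended limit; the tokenizer name and the `max_prompt_tokens` budget below are assumptions, not values taken from build_prompt.py:

```python
# Sanity check of the assembled prompt length; the tokenizer name and the
# `max_prompt_tokens` budget are assumptions, not values from build_prompt.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
max_prompt_tokens = 2048 - 100  # hypothetical budget: context window minus generation length

def check_prompt_length(prompt: str) -> int:
    """Return the true token count of the prompt and warn if it exceeds the budget."""
    n_tokens = len(tokenizer.encode(prompt))
    if n_tokens > max_prompt_tokens:
        print(f"prompt exceeds the budget by {n_tokens - max_prompt_tokens} tokens")
    return n_tokens
```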
If you use the tokenizer's forced truncation, it will change the last lines of code and thus affect the target hole of the code completion. A better way is to reduce the length of the retrieved context, or to drop in-file context from its beginning lines so that the lines closest to the target hole are preserved (see the sketch below). You should also check carefully whether 'rg-one-gram-ws-20-ss-24.jsonl' is generated correctly; see the code at CodeT/RepoCoder/build_prompt.py, Line 77 in 35f54d6.
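A minimal sketch of the alternative suggested above: instead of letting the tokenizer force-truncate the end of the prompt, drop lines from the beginning of the in-file context until the whole prompt fits, so the lines next to the target hole stay intact. The helper name and token budget are illustrative, not code from the repository:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")

def truncate_from_start(retrieved_context: str, infile_context: str, budget: int) -> str:
    """Drop leading lines of the in-file context until the prompt fits in `budget`
    tokens, preserving the lines closest to the target hole."""
    infile_lines = infile_context.splitlines()
    while infile_lines:
        prompt = retrieved_context + "\n" + "\n".join(infile_lines)
        if len(tokenizer.encode(prompt)) <= budget:
            return prompt
        infile_lines.pop(0)  # discard the earliest, least relevant line first
    # Fall back to the retrieved context alone if no in-file lines can be kept.
    return retrieved_context
```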
Thank you very much for your prompt response! I appreciate your suggestions and will attempt both solutions. However, I have some concerns regarding the process, as I am trying to replicate the results you presented in your paper on the codegen-350M-mono model, specifically those in Table 2 (a, b). I followed the instructions provided in the README file meticulously and made no changes to the code logic other than modifying the hardcoded input paths. The steps I followed are as outlined below:
Could you please let me know if you encountered any issues when you achieved the results documented in your paper? I am concerned there may be an issue with my process, since I am strictly adhering to the steps mentioned without altering any fundamental code logic. Your guidance on this matter would be greatly appreciated.
The pipeline looks great. However, if you want to get the results for the 3rd and 4th iterations, you may need to change the …
@pppyb - thanks for posting this detailed issue. It has helped me better understand the process of using this repo. However, I still have one doubt remaining - did you implement codegen-inference.py yourself to query CodeGen, or is it a part of the repository? Thanks a ton!
Hey @kechenliuuu3469 -- apologies for the lack of response earlier on this. I think inference.py is a part of the repository, and I hope issue #28 will solve your problem.
Thank you very much to the authors for their contributions. While attempting to run the RepoCoder method, we encountered an issue in the codegen_inference.py file. After modifying the file to:
It seems that this file can only generate results for the in-file method, and we encountered the following error:
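For reference, a minimal sketch of what this inference step might look like with HuggingFace transformers, assuming each line of the input JSONL carries a `prompt` field; the model name, file layout, and generation settings are illustrative assumptions rather than the repository's exact codegen_inference.py logic:

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed setup: codegen-350M-mono and a JSONL file with one {"prompt": ...} per line.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

def generate_completions(prompt_file: str, max_new_tokens: int = 100) -> list[str]:
    completions = []
    with open(prompt_file) as f:
        for line in f:
            prompt = json.loads(line)["prompt"]
            inputs = tokenizer(prompt, return_tensors="pt")
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=False,
                pad_token_id=tokenizer.eos_token_id,
            )
            # Keep only the newly generated tokens, not the echoed prompt.
            new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
            completions.append(tokenizer.decode(new_tokens, skip_special_tokens=True))
    return completions
```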