Retrieved Content #33
Comments
We are very sorry, but the generated content cannot be restored for now. We would be glad to help you reproduce the results, though. |
Thanks for being willing to help! I just want to ask about the retrieval setting. When you do retrieval, do you filter out the file containing the code to complete? If not, the model may retrieve the target of code generation itself as context, which does not make much sense to me: if you want to use a model to help you complete the code, the target code does not yet exist in the repo at the time you call the model. If you do filter it out, is there an efficient way to do that? |
Of course, we need to filter out the target file to avoid leakage. However, we do not filter out all of the content in the target file. We keep the content at the front of the target file that is not covered by the context provided to the LM. For example, if file A has 100 lines, with lines 20-80 as the unfinished code and line 81 as the completion hole, then during retrieval we also retrieve lines 1-19 as useful supplementary information for the completion. The code related to this is CodeT/RepoCoder/search_code.py Line 44 in 35f54d6
The context_start_lineno is metadata we stored for each completion case. |
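The filtering rule described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code at line 44 of search_code.py: the `Window` type, the `filter_candidates` function, and the field names are hypothetical, and line numbers are assumed to be 1-indexed and inclusive. Only `context_start_lineno` is a name taken from the thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Window:
    """A retrieval candidate covering a span of lines in one file (hypothetical type)."""
    fpath: Tuple[str, ...]  # path components identifying the file
    start_lineno: int       # first covered line, 1-indexed, inclusive
    end_lineno: int         # last covered line, 1-indexed, inclusive


def filter_candidates(
    candidates: List[Window],
    target_fpath: Tuple[str, ...],
    context_start_lineno: int,
) -> List[Window]:
    """Drop windows from the target file that reach into the in-file context or beyond.

    Windows from other files are always kept. Windows from the target file are
    kept only if they end before context_start_lineno, i.e. they cover the part
    of the file that the LM does not already see.
    """
    kept = []
    for window in candidates:
        same_file = window.fpath == target_fpath
        if same_file and window.end_lineno >= context_start_lineno:
            continue  # overlaps the unfinished code, the hole, or later lines
        kept.append(window)
    return kept


# The example from the comment: file A has 100 lines, lines 20-80 are the
# unfinished code given to the LM, and line 81 is the completion hole.
file_a = ("repo", "a.py")
file_b = ("repo", "b.py")
candidates = [
    Window(file_a, 1, 10),    # entirely before the context: kept
    Window(file_a, 10, 19),   # ends just before line 20: kept
    Window(file_a, 15, 25),   # overlaps the context: dropped
    Window(file_a, 76, 85),   # contains the hole at line 81: dropped
    Window(file_b, 70, 90),   # different file: kept
]
kept = filter_candidates(candidates, file_a, context_start_lineno=20)
```

Because the check is a single comparison per candidate, it can be applied while iterating over the retrieval index rather than by rebuilding the index for every completion case.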
Thanks. The setting makes sense to me! I retrieved the GT context for the "function" split by adapting your code (window_size=50, slice_size=5). Then I filtered out the unfinished part using your logic (line 44) and ran code generation with ChatGPT (2k tokens for the GT context, 2k for the in-file context). I only got Pass@1=0.2895, whereas you reported 0.4263. The full evaluation results are:
Here's the GT context I got: Can you run your code to produce the GT context file for RepoEval-function, so I can compare the difference? Thanks!!! |
I am also trying to get this project running. Thanks to you all for this issue; it gives me hope that I can do it. Thanks! |
If you're interested, here's our implementation of GT retrieval (without filtering the unfinished part): https://github.com/code-rag-bench/code-rag-bench/tree/main?tab=readme-ov-file#retrieval. |
Wow, thanks for your support! I will try again!
|
Hi, can you also provide the retrieved content for Table 3, or the code retrieved by UniXCoder? Thanks!