Retrieved Content #33

yiqingxyq · 2024-05-29T18:26:18Z

Hi, can you also provide the retrieved content for Table 3, or the code retrieved by UniXCoder? Thanks!

zfj1998 · 2024-06-18T08:16:41Z

We are truly sorry the generated content cannot be restored for now. We would love to help you reproduce the results though.

yiqingxyq · 2024-06-18T08:42:42Z

Thanks for being willing to help! Just want to ask about the retrieval setting. When you do retrieval, do you filter out the file containing the code to complete?

If not, the model may retrieve the target of code generation as context, which does not make much sense to me: if you want to use a model to help you complete the code, then by the time you call the model, the target code does not exist in the repo yet.

If you do, is there an efficient way to do that?

zfj1998 · 2024-06-22T03:34:29Z

Of course, we need to filter out the target file to avoid leakage. However, we did not filter out all of its content: we keep the part at the beginning of the target file that is not covered by the context provided to the LM. For example, if file A has 100 lines, with lines 20-80 as the unfinished code and line 81 as the completion hole, then during retrieval we can also retrieve lines 1-19 as useful supplementary information for the completion.

The relevant code is line 44 of CodeT/RepoCoder/search_code.py (at commit 35f54d6):

    if metadata['end_line_no'] <= query_line['metadata']['context_start_lineno']:

Here, context_start_lineno is metadata we store for each completion case.
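As a rough illustration of the filtering rule described above, here is a minimal sketch. It is not the actual code from search_code.py. Only the `end_line_no` and `context_start_lineno` field names come from the quoted line; the `fpath` and `id` fields, the `filter_candidates` helper, and the convention that `end_line_no` is an exclusive 1-based end are all assumptions made for the example.

```python
def filter_candidates(candidates, query_metadata):
    """Keep windows from other files, plus target-file windows that end
    at or before the start of the in-file context given to the LM."""
    kept = []
    for cand in candidates:
        meta = cand["metadata"]
        # Hypothetical 'fpath' field identifies the file a window came from.
        same_file = meta["fpath"] == query_metadata["fpath"]
        # Assumption: end_line_no is exclusive, so a window covering
        # lines 1-19 has end_line_no == 20.
        ends_before_context = (
            meta["end_line_no"] <= query_metadata["context_start_lineno"]
        )
        if not same_file or ends_before_context:
            kept.append(cand)
    return kept


# File A has 100 lines; lines 20-80 are the in-file context, line 81 is the hole.
query_metadata = {"fpath": "A.py", "context_start_lineno": 20}
candidates = [
    {"id": "other", "metadata": {"fpath": "B.py", "end_line_no": 51}},
    {"id": "A_head", "metadata": {"fpath": "A.py", "end_line_no": 20}},
    {"id": "A_overlap", "metadata": {"fpath": "A.py", "end_line_no": 30}},
    {"id": "A_target", "metadata": {"fpath": "A.py", "end_line_no": 101}},
]
kept_ids = [c["id"] for c in filter_candidates(candidates, query_metadata)]
print(kept_ids)
```

In this toy setup, only the window covering lines 1-19 of the target file survives alongside windows from other files; anything overlapping the in-file context or the hole is dropped.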

yiqingxyq · 2024-06-22T23:00:08Z

Thanks. The setting makes sense to me!

I retrieved the GT context for the "function" split by adapting your code (window_size=50, slice_size=5). Then I filtered out the unfinished part using your logic (line 44) and ran code generation with ChatGPT (2k tokens for the GT context, 2k for the in-file context). I only got Pass@1=0.2895, whereas you reported 0.4263.

The full evaluation results are:

{
    "EM": 0.10723860589812333,
    "ES": 0.48067297674081083,
    "Pass@1": 0.289544235924933
}

Here's the GT context I got:
repoeval-function-4k-gt-top5-filter.jsonl.txt

Can you run your code to produce the GT context file for RepoEval-function, so I can compare the difference? Thanks!!!
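For anyone comparing setups, here is a minimal sketch of how I understand the window parameters. It assumes that slice_size sets the stride as window_size // slice_size, which is my reading and is not confirmed in this thread; the `build_windows` helper and its 0-based, end-exclusive line ranges are hypothetical and not taken from the repository.

```python
def build_windows(num_lines, window_size=50, slice_size=5):
    """Return (start, end) line ranges, 0-based with exclusive end,
    sliding over a file of num_lines lines."""
    # Assumption: the stride is the window size divided by slice_size.
    step = max(1, window_size // slice_size)
    windows = []
    for start in range(0, num_lines, step):
        end = min(start + window_size, num_lines)
        windows.append((start, end))
    return windows


windows = build_windows(100)
print(len(windows), windows[0], windows[-1])
```

If the real windowing differs (for example, windows centered on a line rather than starting at it), the retrieved GT context would differ too, which may account for part of the gap.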

binwensun · 2024-07-08T04:47:37Z

I am also trying to get this project running. Cheers, and thanks for this issue; it gives me hope that I can do it. Thanks!

yiqingxyq · 2024-07-14T00:03:13Z

> I am also trying to get this project running. Cheers, and thanks for this issue; it gives me hope that I can do it. Thanks!

If you're interested, here's our implementation of gt retrieval (without filtering the unfinished part): https://github.com/code-rag-bench/code-rag-bench/tree/main?tab=readme-ov-file#retrieval.

binwensun · 2024-07-14T00:05:14Z

Wow! Thanks for your support! I will try it again!


zfj1998 closed this as completed Nov 3, 2024
