About Data Format #2

Closed
hmdgit opened this issue Jul 2, 2020 · 1 comment

hmdgit commented Jul 2, 2020

Hi,

I have data in this format: (docstring, code).

I figured out that the data you have used is in the following format:

BooleanValue<CODESPLIT>URL<CODESPLIT>returnType.methodName<CODESPLIT>docString<CODESPLIT>code
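To make the layout concrete, here is a minimal sketch of splitting one such line into its five fields. The helper `parse_line`, the dictionary keys, and the sample line are hypothetical and only for illustration; they are not taken from the repository.

```python
SEP = "<CODESPLIT>"
# Hypothetical key names for the five fields, in the order shown above.
FIELDS = ("label", "url", "name", "docstring", "code")

def parse_line(line):
    """Split one <CODESPLIT>-delimited line into a dict of its five fields."""
    # Split at most 4 times so any later separator text stays in the code field.
    parts = line.rstrip("\n").split(SEP, 4)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))

# Made-up sample line, not real data from the dataset.
sample = (
    "1<CODESPLIT>https://example.com/repo<CODESPLIT>int.add"
    "<CODESPLIT>Add two ints.<CODESPLIT>int add(int a, int b) { return a + b; }\n"
)
record = parse_line(sample)
```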

In order to use the code for code search, I have the following concerns:

  1. How can I generate BooleanValue for my dataset? Does it provide any benefit for code search?
  2. I can generate 'returnType.methodName', but is it of any use in code search?
  3. Is the URL of any use in training and evaluating the model?

Kindly let me know about it.

@fengzhangyin
Collaborator

The data format contains multiple fields, but only docstring and code are useful. The other fields just retain some information about each example.

BooleanValue is 1 if the docstring and code come from the same example. Section 4.1 of the paper describes this in detail.
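As a rough sketch of how a (docstring, code) dataset could be written out in this format: each matched pair gets BooleanValue 1, and the URL and returnType.methodName fields are left as empty placeholders. The helper `make_line`, the empty placeholders, and the whitespace flattening are assumptions for illustration, not the repository's own preprocessing.

```python
SEP = "<CODESPLIT>"

def make_line(docstring, code, label=1, url="", name=""):
    """Build one <CODESPLIT>-delimited line from a (docstring, code) pair."""
    # Assumption: collapse all whitespace so each example fits on a single line.
    doc = " ".join(docstring.split())
    src = " ".join(code.split())
    # Field order: BooleanValue, URL, returnType.methodName, docString, code.
    return SEP.join([str(label), url, name, doc, src])

# Made-up example pair; both parts come from the same example, so the label is 1.
pairs = [("Return the sum of two numbers.", "def add(a, b):\n    return a + b")]
lines = [make_line(doc, code) for doc, code in pairs]
```

Writing `lines` to a text file, one entry per line, gives a file in the five-field layout from the question.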
