help about PostReferenceGH #6

Closed
mashalahmad opened this Issue Nov 9, 2018 · 1 comment

mashalahmad commented Nov 9, 2018

Hey, could you please tell me what the Copies attribute means in the PostReferenceGH table?

And how can I find out whether a GitHub class has many clones from Stack Overflow?

I read your paper "Attribution Required: Stack Overflow Code Snippets in GitHub Projects", where you use CPD to detect the clones. Is it suitable? Or is there any way to get this from the PostReferenceGH table?

@sbaltes sbaltes self-assigned this Nov 9, 2018

@sbaltes sbaltes added the help wanted label Nov 9, 2018

sbaltes commented Nov 9, 2018

could you please tell me what the Copies attribute means in the PostReferenceGH table?

Sure, Copies indicates how often that exact file appears in the dataset. For a certain FileId, it is equal to:

SELECT COUNT(*)
FROM `sotorrent-org.2018_09_23.PostReferenceGH`
WHERE FileId="<FILE_ID>";
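To make the meaning of that count concrete, here is a minimal sketch that runs the same kind of COUNT(*) against a toy in-memory SQLite table instead of BigQuery. The reduced schema (only FileId and RepoName), the row values, and the helper name count_copies are assumptions made for illustration; the real PostReferenceGH table has more columns and is queried through BigQuery.

```python
import sqlite3

# Toy stand-in for PostReferenceGH. The two-column schema and all rows
# are made up for illustration; they are not taken from the dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PostReferenceGH (FileId TEXT, RepoName TEXT)")
conn.executemany(
    "INSERT INTO PostReferenceGH VALUES (?, ?)",
    [
        ("file_a", "alice/project"),
        ("file_a", "bob/fork"),
        ("file_a", "carol/mirror"),
        ("file_b", "dave/tool"),
    ],
)


def count_copies(connection, file_id):
    """Count the rows for one FileId, mirroring the COUNT(*) query above."""
    row = connection.execute(
        "SELECT COUNT(*) FROM PostReferenceGH WHERE FileId = ?",
        (file_id,),
    ).fetchone()
    return row[0]


# The same count for every file at once, using GROUP BY.
copies_by_file = dict(
    conn.execute(
        "SELECT FileId, COUNT(*) FROM PostReferenceGH GROUP BY FileId"
    )
)

print(count_copies(conn, "file_a"))
print(copies_by_file)
```

In this toy data, file_a occurs in three rows and file_b in one, so the per-file counts play the role that the Copies value is described as having in the real table.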

and how can I find out whether a GitHub class has many clones from Stack Overflow?
I read your paper "Attribution Required: Stack Overflow Code Snippets in GitHub Projects", where you use CPD to detect the clones. Is it suitable? Or is there any way to get this from the PostReferenceGH table?

Unfortunately, I can only provide support for the dataset here. It's up to you to find a suitable approach to detect the code clones. You could use CPD, but most likely only on a sample of projects and snippets. However, there are many other code clone detectors available. You could start with these papers:

The corresponding full paper for the ICSE extended abstract you mentioned is now also available:

@sbaltes sbaltes closed this Nov 9, 2018
