This repository has been archived by the owner on Apr 11, 2023. It is now read-only.
I want to build a code search corpus. I have collected a lot of GitHub repositories, and now I need to break the code into tokens so that I can extract functions and their comments. In the paper "CodeSearchNet Challenge: Evaluating the State of Semantic Code Search" you write: "We then tokenize all Go, Java, JavaScript, Python, PHP and Ruby functions (or methods) using TreeSitter — GitHub’s universal parser — and, where available, their respective documentation text using a heuristic regular expression."
I can extract functions from Python code, but the result does not include their comments. How do you extract functions together with their comments? Could you share your code?
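To make the question concrete, here is a minimal sketch of the kind of output I am after: each function paired with its documentation text. It uses Python's standard `ast` module rather than TreeSitter, so it is not the method from the paper and only handles Python; the helper name `extract_functions` and the sample source are made up for illustration.

```python
import ast


def extract_functions(source: str):
    """Collect (name, docstring, code) for every function in `source`.

    Hypothetical helper for illustration: it relies on the stdlib `ast`
    module, not TreeSitter, and only picks up docstrings, not `#` comments.
    """
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            results.append({
                "name": node.name,
                # get_docstring returns None when the function has no docstring
                "docstring": ast.get_docstring(node) or "",
                # get_source_segment needs Python 3.8 or newer
                "code": ast.get_source_segment(source, node) or "",
            })
    return results


# Made-up sample input: one documented and one undocumented function.
sample = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def no_doc(x):
    return x
'''

pairs = extract_functions(sample)
for item in pairs:
    print(item["name"], "|", item["docstring"])
```

A function without a docstring comes back with an empty string, so such entries can be filtered out if only documented functions are wanted.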
Thank you for your reply! I have tried the function parser in the CodeSearchNet/function_parser/ folder, but I ran into some problems:
What is the input? In the examples, the input is the library keras-team/keras. Does that refer to https://github.com/keras-team/keras? That is a whole repository, so does each input correspond to one repository?