@bzz bzz commented Jan 24, 2020

Converts a split of the CodeSearchNet dataset to the plain-text format that OpenNMT uses.

Example:

wget 'https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/java.zip'
unzip java.zip

python notebooks/codesearchnet-opennmt.py --data_dir='java/final/jsonl/valid' --newline='\\n'

wc -l src-trian.txt tgt-trian.txt
  15328 src-trian.txt
  15328 tgt-trian.txt
  30656 total
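
For readers who want a feel for what such a conversion involves, here is a minimal sketch. It is not the script from this pull request. It assumes the CodeSearchNet split is a directory of `.jsonl` or `.jsonl.gz` files whose records carry `code` and `docstring` fields, and that code goes to the source file and the docstring to the target file; that mapping is a guess made for illustration. The names `convert_split`, `escape_newlines` and `newline_token` are hypothetical.

```python
import gzip
import json
from pathlib import Path


def escape_newlines(text, newline_token="\\n"):
    """Replace line breaks with a visible token so one example fits on one line."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", newline_token)


def convert_split(data_dir, src_path, tgt_path, newline_token="\\n"):
    """Write one source line and one target line per JSON record.

    Assumption: each record has "code" and "docstring" keys; which one is
    the source and which the target is a choice made for this sketch.
    Returns the number of examples written.
    """
    count = 0
    with open(src_path, "w", encoding="utf-8", newline="\n") as src, \
         open(tgt_path, "w", encoding="utf-8", newline="\n") as tgt:
        # Sort the shards so the output order is deterministic.
        for path in sorted(Path(data_dir).glob("*.jsonl*")):
            opener = gzip.open if path.suffix == ".gz" else open
            with opener(path, "rt", encoding="utf-8") as fh:
                for line in fh:
                    if not line.strip():
                        continue  # skip blank lines between records
                    record = json.loads(line)
                    src.write(escape_newlines(record["code"], newline_token) + "\n")
                    tgt.write(escape_newlines(record["docstring"], newline_token) + "\n")
                    count += 1
    return count
```

Because both files receive exactly one line per record, a matching `wc -l` count for the source and target files is a quick sanity check that the pairs stayed aligned.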

@bzz bzz requested a review from m09 January 24, 2020 11:40

@m09 m09 left a comment

👍

@m09 m09 force-pushed the add-codesearchnet-preproc branch 4 times, most recently from cce3726 to 3477cf4 January 24, 2020 17:52
Signed-off-by: m09 <142691+m09@users.noreply.github.com>
@m09 m09 force-pushed the add-codesearchnet-preproc branch from 3477cf4 to f2521a7 January 24, 2020 18:10
@m09 m09 merged commit 394388a into master Jan 24, 2020
@m09 m09 deleted the add-codesearchnet-preproc branch January 24, 2020 20:41