-
Notifications
You must be signed in to change notification settings - Fork 301
Roberta docstring reworking #910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Roberta docstring reworking #910
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Overall LGTM! Left some minor comments on the style.
[" afternoon sun", " night moon"] | ||
preprocessor("The quick b rown fox jumped.") | ||
# Tokenize a batch of single sentences . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove space between sentences and period.
preprocessor(["The quick brown fox jumped.", "Call me Ishmael."]) | ||
# Preprocess a batch of sentence pairs. | ||
# When handling multiple sequences, always convert to tensors first ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, remove the extra space
it should be the file path to merge rules. The merge rule file | ||
should have one merge rule per line. Every merge rule contains | ||
merge entities separated by a space. | ||
vocabulary: A dictionary or a string filename path. If passing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested edit:
A dictionary mapping tokens to integer ids, or file path to a json file containing the token to id mapping.
vocabulary: A dictionary or a string filename path. If passing a | ||
filename, the file should be a json file. both json and dictionary | ||
should map a single word piece token string to an integer id. | ||
merges: A list of merge rules or a string filename path, If passing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: filename path => file path, and use "." to replace ","
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM!
One minor thing - in the future after you fix a comment, you may click on "resolve" button so that we know it's been addressed.
#867
link to colab used for testing: https://colab.research.google.com/drive/1TR4dCo_15IOaMdWZq1rrYcK4mscW5ACS#scrollTo=Iy8I5mZpYeMz