on the Validator #11
Comments
Dear @wenjie-p, Yep, some of the parts are missing, you can find out which using
The
Then I make an
Then I add in the punctuation, replacing by nothing:
Punctuation symbols that are less frequent or might result in non-words, e.g.
Then I look for symbols which should be replaced with other symbols, e.g. normalising different kinds of apostrophes:
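As a rough illustration of the two kinds of rule described above (symbols replaced by nothing, and symbols replaced by other symbols), here is a minimal Python sketch. It is not the toolkit's code and does not reproduce the format of `validate.tsv`; the symbol sets `DELETE` and `REPLACE` and the function name `normalise` are hypothetical and would need to be chosen per language.

```python
# Hypothetical rule sets, for illustration only; pick the symbols per language.
DELETE = '.,;:!?"«»()'  # punctuation replaced by nothing
REPLACE = {
    "\u2019": "'",  # right single quotation mark -> ASCII apostrophe
    "\u2018": "'",  # left single quotation mark -> ASCII apostrophe
    "\u02bc": "'",  # modifier letter apostrophe -> ASCII apostrophe
    "`": "'",       # grave accent used as an apostrophe
}

# str.maketrans accepts a dict mapping single characters to a string or None;
# mapping to None deletes the character.
TABLE = str.maketrans({**{c: None for c in DELETE}, **REPLACE})


def normalise(text: str) -> str:
    """Apply the deletions and replacements, then collapse whitespace."""
    return " ".join(text.translate(TABLE).split())
```

For example, `normalise("L\u2019acqua, è «fredda»!")` gives `L'acqua è fredda` under these assumed rules.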
I added Italian in 9e7da6a. If you would like other languages from the missing list, please feel free to open separate issues so that we can discuss any complicated points.
Closed in 9e7da6a.
Dear @ftyers, thanks for the update of file. Btw, I am happy to add more missing data for this toolkit : )
I have a Makefile locally that makes that addition:
And then I can upload it to pip with
And thanks for the offer, I am happy to accept PRs! :)
Hi, thanks for the practical toolkit for CV data preprocessing!

I recently used this toolkit to validate data in different languages, but found that the `Validator` failed to initialize, i.e. it . After checking the code, I found that the initialization of `Validator` requires `data/$lang/validate.tsv` to be given.

Thus my questions are: 1) Will the missing data be added soon? and 2) How can I prepare the `data/$lang/validate.tsv` file from scratch?

Thanks in advance!
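
For anyone wanting to check locally which languages lack the file, a minimal Python sketch follows. It assumes a local checkout in which `data/` contains one subdirectory per language code, as the `data/$lang/validate.tsv` path suggests; the function name `missing_validate` is hypothetical and not part of the toolkit.

```python
from pathlib import Path


def missing_validate(data_dir: str) -> list[str]:
    """Return the names of subdirectories of data_dir that have no validate.tsv."""
    root = Path(data_dir)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and not (p / "validate.tsv").is_file()
    )
```

Calling `missing_validate("data")` from the repository root would then list the language directories that still need a `validate.tsv`.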