Fix MST-parser tokenization #188
@alexei-gl @akolonin I was thinking the following. Based on the following:
Why don't we decouple tokenization from the current parsing problem completely, and remove it from pair-counting and MST-parsing? What I mean is that I can remove all tokenization from these processes and rely on whatever tokenization is already present in the input file. Sentences would then be split on spaces only, just as the file-based parser does now.
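A minimal sketch of what "split on spaces only" could look like, assuming the input file holds one pre-tokenized sentence per line. The function names `tokenize_spaces_only` and `read_pretokenized` are hypothetical and not taken from the pipeline code. Note that Python's `str.split()` with no arguments splits on any run of whitespace, which is slightly broader than a literal single space.

```python
from typing import Iterable, Iterator, List


def tokenize_spaces_only(sentence: str) -> List[str]:
    """Split a sentence on whitespace and do nothing else.

    Punctuation, case, and contractions are left exactly as they
    appear in the input, so any cleanup has to happen upstream.
    """
    return sentence.split()


def read_pretokenized(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one token list per non-empty input line (hypothetical helper)."""
    for line in lines:
        tokens = tokenize_spaces_only(line)
        if tokens:  # skip blank lines
            yield tokens


if __name__ == "__main__":
    sample = ["Hello , world !", "", "don't stop."]
    for tokens in read_pretokenized(sample):
        print(tokens)
```

With this approach the token `stop.` keeps its trailing period, because nothing after the upstream cleaning step touches the text.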
@glicerico, @akolonin That sounds reasonable. In that case we still have to agree on the contents of 4.0.affix, and probably 4.0.regex, for MI-counting/MST-parsing (LG "any" mode) and for the induced grammar dictionaries used by the grammar tester, in order for pre/post processing to be disabled in
@alexei-gl I'm proposing to use an empty affix file. As far as I understand, that is the only file Link Grammar uses for tokenization, but I may be missing something. However, I don't know how you use that file in your post-parsing processes, so I'm not sure an empty affix file makes sense there.
@glicerico I would love it if both MI-Observer and MST-Parser had a "spaces-only" tokenization option, as you have already suggested to Sergey Shalyapin. We would use this option to avoid all this confusion and keep things under the full control of the Pre-Cleaner (improving the Pre-Cleaner for directed speech would be a separate mid-term task).
@akolonin @alexei-gl
@glicerico - please make sure the new MST-parses are in the new format, so that MI values are attached to the links.
I merged the branch in PR singnet/learn#5
@glicerico
I believe this issue has been handled... @akolonin should we close this issue?
Yes, @alexei-gl has just finished verifying the MST-parses
for the following parsing settings (rows 54-59):
https://docs.google.com/spreadsheets/d/1TPbtGrqZ7saUHhOIi5yYmQ9c-cvVlAGqY14ATMPVCq4/edit#gid=963717716
LG "English"
Baseline "random":
Baseline "sequential":
R=6, Weight = 1, mst-weight - none
R=6, Weight = 6/r, mst-weight = +1/r
LG "ANY", all parses, no mst-weight
This is a temporary solution for #93