Wrong Pearse code with suffix -e #77

Open
Ansa211 opened this issue Aug 25, 2017 · 8 comments
@Ansa211
Contributor

Ansa211 commented Aug 25, 2017

bin/words sancte perfide improbe

The Pearse code on the "e SUFFIX" line is for some reason 01, not 05 as expected.

I assume one has to look around src/words_engine/words_engine-list_package.adb lines 479--485 and 725--750, but I was not able to solve the issue.
(It's also not clear to me why this is not handled through ADDONS.LAT.)

@mk270
Owner

mk270 commented Aug 25, 2017

Thanks again. Please state the exact reproduction steps, and the expected and observed behaviour.

@Ansa211
Contributor Author

Ansa211 commented Aug 25, 2017

Set up Words with DO_PEARSE_CODES enabled, e.g. by writing the whole WORD.MDV file like this:

echo "HAVE_STATISTICS_FILE              N
WRITE_STATISTICS_FILE             N
SHOW_DICTIONARY                   N
SHOW_DICTIONARY_LINE              N
SHOW_DICTIONARY_CODES             Y
DO_PEARSE_CODES                   Y
DO_ONLY_INITIAL_WORD              N
FOR_WORD_LIST_CHECK               N
DO_ONLY_FIXES                     N
DO_FIXES_ANYWAY                   N
USE_PREFIXES                      Y
USE_SUFFIXES                      Y
USE_TACKONS                       Y
DO_MEDIEVAL_TRICKS                Y
DO_SYNCOPE                        Y
DO_TWO_WORDS                      Y
INCLUDE_UNKNOWN_CONTEXT           Y
NO_MEANINGS                       N
OMIT_ARCHAIC                      Y
OMIT_MEDIEVAL                     N
OMIT_UNCOMMON                     N
DO_I_FOR_J                        N
DO_U_FOR_V                        N
PAUSE_IN_SCREEN_OUTPUT            N
NO_SCREEN_ACTIVITY                N
UPDATE_LOCAL_DICTIONARY           N
UPDATE_MEANINGS                   N
MINIMIZE_OUTPUT                   N
START_FILE_CHARACTER             '@'
CHANGE_PARAMETERS_CHARACTER      '#'
CHANGE_DEVELOPER_MODES_CHARACTER '!'" > WORD.MDV
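
Before running the reproduction, it can help to confirm that the file really has DO_PEARSE_CODES switched on. The following is only a minimal sketch: it assumes each line of WORD.MDV is a parameter name followed by whitespace and a value, as in the listing above. The helper name `parse_mdv` is made up for illustration and is not part of Words.

```python
def parse_mdv(text):
    """Parse 'NAME   VALUE' lines into a dict (assumed layout, see above)."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


# A few lines copied from the listing above.
sample = """DO_PEARSE_CODES                   Y
USE_SUFFIXES                      Y
START_FILE_CHARACTER             '@'
"""

settings = parse_mdv(sample)
print(settings["DO_PEARSE_CODES"])
```

To check a real file, read it first, e.g. `parse_mdv(open("WORD.MDV").read())`.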

The output of bin/words perfide ends with:

01 e                    SUFFIX                  
06 -ly; -ily;  Converting ADJ to ADV
01 perfid.e             ADV    POS                         
02 perfidus, perfida, perfidum  ADJ   [XXXBX]  
03 faithless, treacherous, false, deceitful;

The 01 at the start of the first line should be 05, since it is a SUFFIX line. Because this line is generated by the code mentioned above rather than read from the ADDONS.LAT file, it gets the wrong code.
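
The check described here can be sketched as a small script that scans Pearse-coded output for affix lines whose code is not 05. This is an illustration under assumptions, not part of Words: it assumes the affix kind (SUFFIX, and by analogy PREFIX) is the third whitespace-separated field of the line, as in the output above, and the function name is hypothetical.

```python
# Assumption: affix lines carry one of these words as their third field.
AFFIX_KINDS = {"PREFIX", "SUFFIX"}


def find_miscoded_affix_lines(output):
    """Return affix lines whose leading two-digit code is not 05."""
    bad = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        code, kind = fields[0], fields[2]
        if len(code) == 2 and code.isdigit() and kind in AFFIX_KINDS and code != "05":
            bad.append(line.rstrip())
    return bad


# The tail of the output quoted above.
sample_output = """01 e                    SUFFIX
06 -ly; -ily;  Converting ADJ to ADV
01 perfid.e             ADV    POS
02 perfidus, perfida, perfidum  ADJ   [XXXBX]
03 faithless, treacherous, false, deceitful;
"""

for line in find_miscoded_affix_lines(sample_output):
    print("unexpected code on affix line:", line)
```

On the sample, only the "e SUFFIX" line is reported; once that line is emitted with 05, the function should return an empty list.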

@ids1024
Contributor

ids1024 commented Aug 25, 2017

What are Pearse codes based on? Googling "pearse code" doesn't help. Are they a standard thing (in computational linguistics, etc.), or just used in words?

"Pearse" must come from somewhere (someone's name, probably)...

@mk270
Owner

mk270 commented Aug 25, 2017

I'm guessing Roger Pearse. The classics / linguistics / compsci nexus on this github issues board is occasionally useful :)

@mk270
Owner

mk270 commented Aug 28, 2017

Thanks @Ansa211, I can now reproduce the problem. On what basis do you know the correct Pearse codes? Is the system documented somewhere?

@mk270 mk270 added the bug label Aug 28, 2017
@Ansa211
Contributor Author

Ansa211 commented Aug 29, 2017

I don't think it is really documented, but from experimenting with it a little bit, this is what I expect:

01 stem.ending      morphological analysis
02 dictionary forms         [dictionary codes]
03 translation
04 used only for unknown words
05 used for affixes: this line appears before lines 01,02,03, and gives the affix
06 this code has double meaning (which is a bit stupid):
        in conjunction with 05, it gives the English translation of the affix
        without 05, it gives information about a trick used to analyse the word 
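
The list above can be written down as a lookup table, with the ambiguity of 06 resolved by looking at the preceding line. This is only a sketch of the scheme as inferred in this comment, not documented behaviour of Words; the names `CODE_MEANINGS` and `annotate` are made up for illustration.

```python
# Meanings as inferred in the list above (not official documentation).
CODE_MEANINGS = {
    "01": "stem.ending and morphological analysis",
    "02": "dictionary forms and dictionary codes",
    "03": "translation",
    "04": "unknown word",
    "05": "affix",
}


def annotate(output):
    """Return (code, meaning) pairs; 06 is read in light of the previous code."""
    result = []
    prev = None
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        code = line[:2]
        if code == "06":
            # 06 directly after 05 translates the affix; otherwise it names a trick.
            meaning = "affix translation" if prev == "05" else "trick used in analysis"
        else:
            meaning = CODE_MEANINGS.get(code, "unrecognised")
        result.append((code, meaning))
        prev = code
    return result


expected_output = """05 e                    SUFFIX
06 -ly; -ily;  Converting ADJ to ADV
01 perfid.e             ADV    POS
"""
for code, meaning in annotate(expected_output):
    print(code, meaning)
```

Note that with the miscoded output from this issue (01 instead of 05 on the suffix line), this sketch would read the following 06 line as a trick rather than as an affix translation, which is one way the wrong code can mislead a downstream parser.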

I think that for the 06 lines of the second type to appear in the output, one has to have at least one of the following lines in WORD.MOD:

DO_COMPOUNDS                      Y
DO_TRICKS                         Y

and for the 02 lines, DO_DICTIONARY_FORMS Y has to be there.
Then try words abdias and abierant to see the second version of the 06 code.

Running grep '\b0[0-9]\b' * */* */*/* */*/*/* in the Words directory suggests that codes are output in only two places: src/words_engine/words_engine-list_package.adb and src/words_engine/words_engine-parse.adb. The second file seems to output lines with code 00, but I haven't seen them in the actual output, and I don't know of any setting that would switch them on.
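
The same search can be sketched in Python for anyone who wants file names and line numbers in one pass. It uses the same pattern as the grep command but walks the tree to any depth rather than four levels, so the results may differ slightly. The function name is hypothetical.

```python
import os
import re

# Same pattern as the grep above: a standalone two-digit token 00 to 09.
CODE_PATTERN = re.compile(r"\b0[0-9]\b")


def find_code_literals(root):
    """Return (path, line_number, line) for every line matching CODE_PATTERN."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, 1):
                        if CODE_PATTERN.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it
    return hits
```

Calling `find_code_literals("src")` from the checkout would list candidate lines; like the grep, it also matches unrelated numbers such as dates, so the hits still need to be read by hand.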

@mk270
Owner

mk270 commented Aug 29, 2017

Please check out commit ac417de which breaks the Pearse Code machinery into its own module

@Ansa211
Contributor Author

Ansa211 commented Sep 1, 2017

I've checked it out; the pull request has been updated to reflect these changes.
