Grammar: make check gives different results on Mac and Linux #37

Open
albbas opened this issue Nov 27, 2023 · 25 comments

@albbas
Contributor

albbas commented Nov 27, 2023

On commit 83ba211 I get 6 fails on my Linux box and 5 fails on Mac

The file that fails on Linux but not on Mac is: DEV-syn-pred-attr-PASS.yaml

@albbas
Contributor Author

albbas commented Nov 27, 2023

The failing test on Linux is:
[ 4/55][FAIL fp1] operidum:opereridum (, ()) => operidum:[opteridum, optieridum, superidum, tuperidum, duperidum, kuperidum, doneridum, operadume, hoveridum, oarridum] (typo)

@albbas
Contributor Author

albbas commented Nov 27, 2023

This is the result on Mac:
[ 4/55][PASS tp] operidum:opereridum (, ()) => operidum:[opteridum, optieridum, kuperidum, duperidum, superidum, tuperidum, jieridum, opereridum, opteridam, åhkeridum] (typo)

@albbas
Contributor Author

albbas commented Nov 27, 2023

Could it be that the speller package (lib-speller-something) is out of sync on Mac nightly and Linux nightly, since this is a difference in suggestions on a typo?

@albbas
Contributor Author

albbas commented Nov 27, 2023

apt-cache show divvun-gramcheck

Package: divvun-gramcheck
Source: libdivvun
Version: 0.3.11+g563~e101aba9-1~jammy1
Architecture: amd64
Maintainer: Debian Science Team <debian-science-maintainers@alioth-lists.debian.net>
Installed-Size: 1587
Depends: libxml2-utils, libarchive13 (>= 3.0.4), libc6 (>= 2.34), libdivvun0 (>= 0.3.11+g563~e101aba9), libgcc-s1 (>= 3.3.1), libhfst55 (>= 3.16.0+g3882~0136e846), libhfstospell11 (>= 0.5.3+g381~9bed46c8), libpugixml1v5 (>= 1.4), libstdc++6 (>= 11)
Provides: libdivvun-bin, libdivvun-tools
Homepage: https://github.com/divvun/libdivvun
Priority: optional
Section: science
Filename: pool/main/libd/libdivvun/divvun-gramcheck_0.3.11+g563~e101aba9-1~jammy1_amd64.deb
Size: 332816
SHA256: eaf079c04167894a3cfbdc3a035cbb80a7dfd8c6067e8d9ec363353619af72d8
SHA1: 7fc49678a033301d47cd21514353fd96371180ad
MD5sum: 33ef45415af7e6be098652971a1a4ef8
Description: Grammar checker tools for Divvun languages
 Helper tools for grammar checking for Divvun languages
Description-md5: 8151298b2db6426a3d1d4f55957d131a

@albbas
Contributor Author

albbas commented Nov 27, 2023

hfst-ospell --version gives this result

hfst-ospell --version

hfstospell 0.5.3
copyright (C) 2009 - 2018 University of Helsinki

on both machines

@flammie
Contributor

flammie commented Nov 27, 2023

Git hash e101aba9 is the most recent version, from 3 days ago, but I don't think anyone has touched suggestion sorting in ages. Are the weights of the mismatching suggestions exactly the same? If so, they can easily end up in a different order depending on circumstances such as each OS's data structures and sort algorithms...
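
To make the point about equal weights concrete, here is a minimal sketch of how ties can surface in a different order. It is not hfst-ospell's actual sorting code; the word/weight pairs are hypothetical and only chosen to contain a tie. A stable sort on weight alone keeps tied entries in whatever order they arrived, while adding the word as a secondary key gives the same result for any input order.

```python
# Hypothetical (word, weight) pairs; the weights are illustrative, not measured.
suggestions_a = [("doneridum", 37.5), ("opereridum", 37.5),
                 ("opteridum", 27.5), ("jieridum", 37.5)]
suggestions_b = [("jieridum", 37.5), ("opereridum", 37.5),
                 ("doneridum", 37.5), ("opteridum", 27.5)]


def by_weight_only(pairs):
    """Stable sort on weight alone: tied entries keep their input order."""
    return [word for word, _ in sorted(pairs, key=lambda p: p[1])]


def by_weight_then_word(pairs):
    """Sort on (weight, word): ties are broken identically for any input order."""
    return [word for word, _ in sorted(pairs, key=lambda p: (p[1], p[0]))]


print(by_weight_only(suggestions_a))
print(by_weight_only(suggestions_b))
print(by_weight_then_word(suggestions_a) == by_weight_then_word(suggestions_b))
```

The first two printed lists differ in their tied tail, while the last line prints True.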

@snomos
Member

snomos commented Nov 27, 2023

Does the test require the suggestions to be in a specific order? That seems like a recipe for bogus fails. What is relevant is whether the correct suggestion is in the list or not, and possibly the position of the correct suggestion (the higher the better). Other than that, the order and number of suggestions should not be considered at all.

@snomos snomos assigned unhammer and unassigned snomos and TinoDidriksen Nov 27, 2023
@albbas
Contributor Author

albbas commented Nov 27, 2023

echo operidum|hfst-ospell tools/grammarcheckers/smj.zhfst -S on Mac gives the correct suggestion as the eighth suggestion, whereas on Linux it is down in twenty-something

@albbas
Contributor Author

albbas commented Nov 27, 2023

The weight on Linux is ~37.99, on Mac it is ~37.59

@albbas
Contributor Author

albbas commented Nov 27, 2023

> Does the test require the suggestions to be in a specific order? That seems like a recipe for bogus fails. What is relevant is whether the correct suggestion is in the list or not, and possibly the position of the correct suggestion (the higher the better). Other than that, the order and number of suggestions should not be considered at all.

No, the test framework uses the output of divvun-checker, looking for the correct suggestion among the suggestions given by divvun-checker.
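
As an illustration of that kind of check (a sketch only, not the test framework's actual code), the two suggestion lists reported earlier in this thread can be compared by membership and rank rather than by exact order. The function name `rank_of` is made up for this example.

```python
# Suggestion lists copied from the FAIL (Linux) and PASS (Mac) lines above.
linux_suggestions = ["opteridum", "optieridum", "superidum", "tuperidum",
                     "duperidum", "kuperidum", "doneridum", "operadume",
                     "hoveridum", "oarridum"]
mac_suggestions = ["opteridum", "optieridum", "kuperidum", "duperidum",
                   "superidum", "tuperidum", "jieridum", "opereridum",
                   "opteridam", "åhkeridum"]


def rank_of(expected, suggestions):
    """Return the 1-based position of expected, or None if it is absent."""
    try:
        return suggestions.index(expected) + 1
    except ValueError:
        return None


print(rank_of("opereridum", linux_suggestions))  # None
print(rank_of("opereridum", mac_suggestions))    # 8
```

The order of the other suggestions plays no role here; the Linux run fails only because the expected form is missing from the ten suggestions it received.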

@albbas
Contributor Author

albbas commented Nov 27, 2023

The number of typo-suggestions from divvun-checker seems to be truncated to ten results

@unhammer
Contributor

> The weight on Linux is ~37.99, on Mac it is ~37.59

Should they not be the same regardless of platform?

(But hfst-ospell's last commit was in June, hfst's last was in September, so why did this only happen now?)

@albbas
Contributor Author

albbas commented Nov 29, 2023

I tested the smj.zhfst that I built on my Mac on my Linux machine. The weights are different (see the comment above), but the wordforms are the same.

❯ echo operidum|hfst-ospell ~/Viežžamat/smj.zhfst -S
"operidum" is NOT in the lexicon:
Corrections for "operidum":
opteridum    27.590923
optieridum    31.590923
superidum    32.590923
tuperidum    32.590923
duperidum    32.590923
kuperidum    32.590923
doneridum    37.590923
operadume    37.590923
hoveridum    37.590923
oarridum    37.590923
vomeridum    37.590923
moveridum    37.590923
råhperidum    37.590923
gåhperidum    37.590923
noteridum    37.590923
voteridum    37.590923
poneridum    37.590923
poleridum    37.590923
apteridum    37.590923
exeridum    37.590923
roteridum    37.590923
opteridus    37.590923
opteridu    37.590923
dåhperidum    37.590923
poseridum    37.590923
rokeridum    37.590923
åhkeridum    37.590923
opteridam    37.590923
logeridum    37.590923
ageridum    37.590923
koseridum    37.590923
doseridum    37.590923
doteridum    37.590923
joderidum    37.590923
opteridup    37.590923
opteridim    37.590923
jieridum    37.590923
opereridum    37.590923

vs. the smj.zhfst built on the Linux box:

❯ echo operidum|hfst-ospell tools/spellcheckers/smj.zhfst -S
"operidum" is NOT in the lexicon:
Corrections for "operidum":
opteridum    27.996086
optieridum    31.996086
superidum    32.996086
tuperidum    32.996086
duperidum    32.996086
kuperidum    32.996086
doneridum    37.996086
operadume    37.996086
hoveridum    37.996086
oarridum    37.996086
vomeridum    37.996086
moveridum    37.996086
råhperidum    37.996086
gåhperidum    37.996086
noteridum    37.996086
voteridum    37.996086
poneridum    37.996086
poleridum    37.996086
apteridum    37.996086
exeridum    37.996086
roteridum    37.996086
opteridus    37.996086
opteridu    37.996086
dåhperidum    37.996086
poseridum    37.996086
rokeridum    37.996086
åhkeridum    37.996086
opteridam    37.996086
logeridum    37.996086
ageridum    37.996086
koseridum    37.996086
doseridum    37.996086
doteridum    37.996086
joderidum    37.996086
opteridup    37.996086
opteridim    37.996086
jieridum    37.996086
opereridum    37.996086

@albbas
Contributor Author

albbas commented Nov 29, 2023

This is the output of hfst-ospell on my Mac, with natively built .zhfst vs the one built on Linux:

Mac-built

❯ echo operidum|hfst-ospell tools/spellcheckers/smj.zhfst -S
"operidum" is NOT in the lexicon:
Corrections for "operidum":
opteridum    27.590923
optieridum    31.590923
kuperidum    32.590923
duperidum    32.590923
superidum    32.590923
tuperidum    32.590923
jieridum    37.590923
opereridum    37.590923

Linux-built

❯ echo operidum|hfst-ospell ~/Downloads/smj.zhfst -S
"operidum" is NOT in the lexicon:
Corrections for "operidum":
opteridum    27.996086
optieridum    31.996086
kuperidum    32.996086
duperidum    32.996086
superidum    32.996086
tuperidum    32.996086
jieridum    37.996086
opereridum    37.996086

@albbas
Contributor Author

albbas commented Nov 29, 2023

The weights depend on where they were built and the wanted suggestion is way further down the list on Linux than on Mac.

@snomos
Member

snomos commented Nov 29, 2023

I notice the weights on the Linux side are consistently 0.5 higher than on the Mac. The whole weight difference is strange; the math and the source code should be identical. @flammie do you have any ideas?
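
For reference, the size of the offset can be read off the two listings above. This is a small sketch that uses only the eight weights from the Mac run (Mac-built vs. Linux-built file); the variable names are made up for the example.

```python
# Weights copied from the "Mac-built" and "Linux-built" listings above.
mac_built = {
    "opteridum": 27.590923, "optieridum": 31.590923,
    "kuperidum": 32.590923, "duperidum": 32.590923,
    "superidum": 32.590923, "tuperidum": 32.590923,
    "jieridum": 37.590923, "opereridum": 37.590923,
}
linux_built = {
    "opteridum": 27.996086, "optieridum": 31.996086,
    "kuperidum": 32.996086, "duperidum": 32.996086,
    "superidum": 32.996086, "tuperidum": 32.996086,
    "jieridum": 37.996086, "opereridum": 37.996086,
}

# Per-word difference, rounded to the six decimals that hfst-ospell printed.
offsets = {word: round(linux_built[word] - mac_built[word], 6)
           for word in mac_built}

print(set(offsets.values()))  # {0.405163}
```

Every suggestion is shifted by the same amount, about 0.405, which suggests a single constant added once rather than per-arc differences along each path.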

@flammie
Contributor

flammie commented Nov 29, 2023

Seems strange. I would have expected tiny variations in floating-point math between operating systems and processors, but a whole 0.4 is unexpected. I'd probably start by unzipping the zhfst files and diffing the hfst-fst2txt outputs, hoping they have only simple differences; otherwise it needs to be debugged, probably with debug prints at each step of the build in the compiler...
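
The unzip-and-diff step could be sketched roughly as follows. This is only one possible way to do it, under these assumptions: a .zhfst file is a zip archive whose transducers end in `.hfst`, `hfst-fst2txt` is on PATH and accepts `-i`/`-o`, and the helper names are invented for this example. Lines are compared as multisets, ignoring order, which is much cheaper than a full diff on files with millions of lines but does not show where the differences occur.

```python
import subprocess
import zipfile
from collections import Counter
from pathlib import Path


def extract_zhfst(zhfst_path, out_dir):
    """Unpack a .zhfst (a zip archive) and return the extracted .hfst paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zhfst_path) as archive:
        archive.extractall(out_dir)
    return sorted(out_dir.glob("*.hfst"))


def fst_to_text(hfst_path):
    """Convert one transducer to text with hfst-fst2txt (assumed on PATH)."""
    txt_path = Path(hfst_path).with_suffix(".txt")
    subprocess.run(
        ["hfst-fst2txt", "-i", str(hfst_path), "-o", str(txt_path)],
        check=True,
    )
    return txt_path


def count_unshared_lines(left_txt, right_txt):
    """Count lines found only on the left and only on the right (order ignored)."""
    with open(left_txt, encoding="utf-8") as handle:
        left = Counter(handle)
    with open(right_txt, encoding="utf-8") as handle:
        right = Counter(handle)
    only_left = sum((left - right).values())
    only_right = sum((right - left).values())
    return only_left, only_right


# Example use (paths are placeholders):
#   linux = [fst_to_text(p) for p in extract_zhfst("linux/smj.zhfst", "linux")]
#   mac = [fst_to_text(p) for p in extract_zhfst("mac/smj.zhfst", "mac")]
#   for l, m in zip(linux, mac):
#       print(l.name, count_unshared_lines(l, m))
```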

@albbas
Contributor Author

albbas commented Nov 29, 2023

After converting and diffing the files, the differences seem to be massive:

❯ wc -l linux/*.txt mac/*.txt *.diff

manually ordered output

 1 933 236 linux/acceptor.default.txt
 1 930 427 mac/acceptor.default.txt
 3 863 469 acceptor.default.diff
 
 2 171 024 linux/errmodel.default.txt
 2 171 024 mac/errmodel.default.txt
 3 633 755 errmodel.default.diff

@flammie
Contributor

flammie commented Nov 29, 2023

Well, the error models seem to be the same size, which is kind of unfortunate, of course, since the acceptor is the one that has a 100-step build process. I don't know of any way to debug and bisect other than going through the process step by step and comparing; maybe it diverges at some obvious step... Since current HFST builds also use OpenFst as a library, I think there could be a version difference there too between Mac and Linux.

@TinoDidriksen
Member

There is no version difference. Linux and macOS builds both use external OpenFST and Foma, and same version of them.

The OpenFST x86 and x86_64 Linux builds are with SSE math; without that, the HFST test suite failed. And it should actually keep things more consistent, as it forces 64-bit floating-point math everywhere, where it would otherwise use the 80-bit x87 FPU. And 0.5 is indeed a rather big difference.

@albbas
Contributor Author

albbas commented Nov 30, 2023

When building fsts, LC_ALL affects the weights. On my Mac, echo $LC_ALL gives an empty line

When fsts are built using empty LC_ALL, hfst-ospell then gives this list:

❯ echo operidum|hfst-ospell tools/spellcheckers/smj.zhfst -S

"operidum" is NOT in the lexicon:
Corrections for "operidum":
opteridum    27.590923
optieridum    31.590923
kuperidum    32.590923
duperidum    32.590923
superidum    32.590923
tuperidum    32.590923
jieridum    37.590923
opereridum    37.590923

When I set export LC_ALL=C and then compile the fsts, it gives this list:

echo operidum|hfst-ospell tools/spellcheckers/smj.zhfst -S

"operidum" is NOT in the lexicon:
Corrections for "operidum":
opteridum    27.656849
optieridum    31.656849
kuperidum    32.656849
duperidum    32.656849
superidum    32.656849
tuperidum    32.656849
jieridum    37.656849
opereridum    37.656849

@TinoDidriksen
Member

LC_ALL is definitely important. I always ensure all machines have a UTF-8 locale. On Linux this is usually C.UTF-8 and on macOS en_US.UTF-8. The Greenlandic team's scripts check LC_ALL and error out if it does not match the case-insensitive regex UTF-?8
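
A guard of that kind might look like the sketch below. It is not the Greenlandic team's script, only an illustration of the described check; the function name is made up, and it looks at LC_ALL alone, so an empty LC_ALL (as on the Mac above) counts as a failure.

```python
import os
import re

# Case-insensitive match for "UTF-8" or "UTF8" anywhere in the locale name.
UTF8_PATTERN = re.compile(r"UTF-?8", re.IGNORECASE)


def has_utf8_locale(environ=os.environ):
    """Return True if LC_ALL is set and names a UTF-8 locale."""
    return bool(UTF8_PATTERN.search(environ.get("LC_ALL", "")))


print(has_utf8_locale({"LC_ALL": "C.UTF-8"}))     # True
print(has_utf8_locale({"LC_ALL": "en_US.utf8"}))  # True
print(has_utf8_locale({"LC_ALL": "C"}))           # False
print(has_utf8_locale({}))                        # False
```

A build script could call `has_utf8_locale()` before compiling any fsts and stop with an error message when it returns False.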

@flammie
Contributor

flammie commented Nov 30, 2023

I'd guess there's at least one step there that reads or writes weights as floats using a decimal comma, and another part that expects a decimal point, or vice versa...
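
The failure mode guessed at here can be illustrated with a toy parser. This is not HFST code, and nothing in this thread confirms that such a step exists; the function is invented for the example. It mimics strtod-style parsing, which reads the longest numeric prefix and stops at the first character it does not recognise, so a number written with the other separator silently loses its fraction.

```python
import re


def parse_leading_number(text, decimal_sep):
    """Parse the longest numeric prefix of text, using decimal_sep as separator."""
    pattern = r"[+-]?\d+(?:" + re.escape(decimal_sep) + r"\d+)?"
    match = re.match(pattern, text)
    if match is None:
        raise ValueError(f"no number at start of {text!r}")
    return float(match.group(0).replace(decimal_sep, "."))


print(parse_leading_number("37.59", "."))  # 37.59
print(parse_leading_number("37,59", "."))  # 37.0
print(parse_leading_number("37.59", ","))  # 37.0
```

Note that the offset seen in the listings above is about 0.4 rather than a dropped fraction, so a mismatch like this would have to occur on some intermediate value, not on the final weights.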

@flammie
Contributor

flammie commented Nov 30, 2023

There are tons of environment variables, and probably other ways, to change locale settings, and some UTF-8 locales might not have that in their name. The command locale -ck charmap will usually tell what it resolves to in typical programs.

I'll try to experiment with a more minimal example to see if this can be reproduced under Linux with decimal-comma and decimal-point locales...

@flammie
Contributor

flammie commented Nov 30, 2023

Looking at the process now, all the weighting in tools/spellers uses a lot of coreutils. I'd guess there are also many ways to diverge there, since we know that even sort and uniq don't agree between Linux and macOS and between locales (I can't reproduce diffs between Linux C.utf8 and fi-FI.utf8, though).
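
To show why sort order can depend on the locale, here is a rough sketch using four suggestions from this thread. The byte-wise key corresponds to what sort does under LC_ALL=C; the accent-folding key is only a crude, hypothetical stand-in for a language-aware collation and does not reproduce any particular glibc or macOS locale.

```python
import unicodedata

# Four suggestions from the listings above; \u00e5 is the precomposed "å".
words = ["opteridum", "\u00e5hkeridum", "ageridum", "jieridum"]


def c_locale_key(word):
    """Compare raw UTF-8 bytes, as sort does under LC_ALL=C."""
    return word.encode("utf-8")


def folded_key(word):
    """Crude stand-in for locale collation: ignore case and accents."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(sorted(words, key=c_locale_key))
print(sorted(words, key=folded_key))
```

Under the byte-wise key "åhkeridum" sorts last, while under the folded key it sorts next to "ageridum". Any build step that pipes data through sort or uniq can therefore see its lines in a different order unless the locale is pinned.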
