lualatex.fmt not portable between 32-bit/64-bit machines #775
Akira just wrote me:
I could reproduce the error some days ago as I have both 32bit and 64bit binaries. But I doubt that this has any impact in the real world. I have the formats in different texmf-var trees and had to copy them manually to check the error.
If the conclusion is that it will not be fixed, I don't object, at least
until someone complains. I'll just put it in the "known bugs" list.
--thanks, karl.
@kberry I don't know how feasible a fix at the engine level would be (making 32-bit systems byte-compile Lua in the same format as 64-bit). That would be the most stable fix. Otherwise, every time you byte-compile anything that might get dumped, you need to test for incompatible systems and arrange for the Lua to be available in a file that can be reloaded in \everyjob, rather than using the byte-compiled version. That must be doable in theory, but it seems like work... Note that a real user did hit this the other day, so it does not just fail in theoretical test setups:
@davidcarlisle But wouldn't it be better (that is, much simpler and more reliable) to have a simple test in the engine that the format is compatible, and just bail out if not? I don't think it is unreasonable to say that formats generated for one architecture do not necessarily work on other architectures. That may be possible for some engines, but if it is not, you should get an appropriate error.
@FrankMittelbach If formats had been architecture-dependent from the start, that would be quite natural, but web2c implementations go to some lengths to make them cross-platform, and the default TeX Live directory structure has multiple bin directories, one for each architecture installed, but only a single default directory for generated formats, so
@davidcarlisle Well, as for "from the start", I think I remember that formats were not cross-compatible in the 1980s, so speaking of the start... but perhaps I'm mistaken. Anyway, I see your point. On the other hand, as you say, it is not clear how feasible that would be, while putting some flag in the formats for Windows and LuaTeX, or even more simply identifying them by file name extension (which would also resolve the problem with parallel installations), should be fairly easy, I would imagine.
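Frank's "simple test in the engine" could look roughly like the following hypothetical sketch: when the format is dumped, a short tag records the widths of int and size_t plus a byte-order probe, and the loader compares that tag before reading anything else, stopping with a readable message on a mismatch. None of these names exist in web2c or LuaTeX; this only illustrates the idea under those assumptions and is not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ABI tag stored at the start of a format file.
   None of these names exist in web2c or LuaTeX. */
#define FMT_TAG_LEN 4

typedef struct {
    unsigned char bytes[FMT_TAG_LEN];
} fmt_abi_tag;

/* Describe the running binary: tag version, int and size_t widths, byte order. */
static fmt_abi_tag fmt_current_tag(void) {
    const uint16_t probe = 0x0102;
    unsigned char first_byte;
    fmt_abi_tag t;
    memcpy(&first_byte, &probe, 1);          /* 0x02 on little-endian, 0x01 on big-endian */
    t.bytes[0] = 1;                          /* version of this tag layout */
    t.bytes[1] = (unsigned char)sizeof(int);
    t.bytes[2] = (unsigned char)sizeof(size_t);
    t.bytes[3] = first_byte;
    return t;
}

/* Returns NULL if the stored tag matches this binary; otherwise fills msg
   with a user-facing explanation and returns it. */
static const char *fmt_check_tag(const fmt_abi_tag *stored, char *msg, size_t msglen) {
    fmt_abi_tag here = fmt_current_tag();
    assert(stored != NULL && msg != NULL && msglen > 0);
    if (memcmp(stored->bytes, here.bytes, FMT_TAG_LEN) == 0)
        return NULL;
    snprintf(msg, msglen,
             "format dumped by an incompatible binary "
             "(int/size_t widths %u/%u, byte-order probe 0x%02x; here %u/%u, 0x%02x); "
             "please rebuild the format for this architecture",
             (unsigned)stored->bytes[1], (unsigned)stored->bytes[2], (unsigned)stored->bytes[3],
             (unsigned)here.bytes[1], (unsigned)here.bytes[2], (unsigned)here.bytes[3]);
    return msg;
}
```

On a typical 64-bit build reading a tag written by a 32-bit build, the size_t widths would differ (4 versus 8), so the engine could abort right away with this message instead of failing later inside the Lua loader.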
Is it possible to avoid byte-compiled Lua in the fmt in the first place,
instead reading the .lua source files? As Olsak proposed (and rejected,
for himself). It seems you are already reading quite a few .lua files at
runtime. I'm not even sure what is being byte-compiled. expl3.lua?
That is, is it your explicit choice to byte-compile some files, or is it
something that is happening behind the scenes in the engine somehow, and
thus gets complicated to avoid?
Possibly the engine could somehow make byte-compiled Lua
architecture-independent. I'll ask Luigi. Presumably not for this year,
in any case.
Just for posterity: it's certainly true that fmt/base[/mem] files
were/are not sharable in the original Knuthian code. I went to some
trouble to make them sharable, I imagine in the 1980s (long before TeX
Live, anyway), when it became fairly common to share TeX trees over NFS,
e.g., between Sparcs and x86's.
Thanks,
Karl
Luigi just wrote me:
See also the thread on the luatex list starting https://tug.org/pipermail/luatex/2022-February/007612.html
@kberry As the situation is, I suspect, unlikely to change before TeX Live 2022, would it be possible for texmf.cnf to arrange that the default path for LuaTeX formats is architecture-dependent (or at least different for win32), so that you can build both 32- and 64-bit formats by default?
Sorry, I can't conceive of a reasonable way to handle system-dependent
.fmt files, only unreasonable ways. I think it would add a terrible
layer of confusion to something that is already one of the most
confusing aspects of the whole system.
Also, at this point I don't expect that the engine situation (unportable
bytecode) will change, ever, unless Marcel or Phelype or someone tackles
it. At least I don't see Luigi getting excited about it, and it's
nothing I'm going to work on. So a "temporary" solution for 2022 will
quite likely end up being permanent. Doesn't seem like a good
direction.
Thus, how about not byte-compiling any of your .lua files (which no one
answered me about -- I don't understand where/how that byte compilation
is done), but treating them all the way you treat the myriad .lua
sources now that are read at runtime? lualatexquotejobname.lua and
plenty more. Then the whole problem goes away. -k
For quite a long time I have had three sets of binaries: the standard 32-bit and 64-bit ones and an "experimental" one, and typically some of the formats differ. I have therefore put a texmf.cnf in the 64-bit bin folder and in the experimental one which changes TEXMFSYSVAR:
That works without any problems, and IMHO it is easy to document as a workaround if there are format clashes.
@kberry It's not my code, so I'm not sure how complex it would be to avoid byte compilation. It is not a single block of code: it's a call before the dump that iterates through any declared intarrays in the Lua state at that time and byte-compiles them so that they can be saved in the format.
> You mention `lualatexquotejobname.lua` but that's only a line or two of Lua
Um, I said that file and "myriad others". Yes, I know that file is tiny,
but it also loads fontloader*.lua and lualibs and tons more at
runtime. Thousands of lines of code, surely. Possibly still small
compared to Unicode tables, though, I realize.
> avoiding dumping them in the format is a high price to pay
I understand. I guess it's up to you whether that price is worth having
unportable fmts. I don't want to work around this self-created problem
in full generality in fmtutil. My mind just boggles.
As far as I'm concerned, writing out Ulrike's change-the-envvar
workaround would suffice. How many people use 32-bit anything? (None of
this is Windows-specific, by the way.) Answer: hardly anyone. As I said
back at the beginning. -k
@kberry Out of curiosity, do you have one of those 32-bit format files handy? I think this is more of a Lua issue, but I don't know yet how problematic it would be, and would rather try it on an example file. We already discussed the portability of format files, but that was about endianness. That in particular can be done by patching https://tug.org/svn/texlive/trunk/Build/source/libs/lua53/lua53-src/src/ldump.c?view=markup and the respective (
Replying by email to kberry's comment above (Mon, 7 Feb 2022 at 23:29):
Yes, I realise it's not Windows-specific in principle, but it seems Windows
users are currently the most likely to have both 64- and 32-bit binaries on
the same system.
I guess we are out of time for TeX Live 2022. Conceptually, I would say that if
cross-platform formats are a requirement, we should look again at whether
LuaTeX's byte compiler can be made cross-platform, as that is the documented
way to dump things in the format. I don't blame the LuaTeX team if it's not
the highest priority, though.
Actually, never mind: I got "cross compilation" to 32 bits working. A quick patch to LuaTeX revealed the real error message, and also that I had mistakenly been looking at the Lua 5.4 sources; in 5.3 the cause of the error is obvious.
I haven't yet studied all the changes to Lua dumping in version 5.4, but there was this note in one of the commits:
So it may be safe to remove the checks even for 5.3(?). Without them I can load the OpTeX format correctly. No luck with
I used https://github.com/vlasakm/mmtex for testing this; maybe someone will find these rough steps useful:
Required patch:

```diff
diff --git a/src/lua/src/ldump.c b/src/lua/src/ldump.c
index f025aca..d1c1ee6 100644
--- a/src/lua/src/ldump.c
+++ b/src/lua/src/ldump.c
@@ -186,8 +186,8 @@ static void DumpHeader (DumpState *D) {
DumpByte(LUAC_VERSION, D);
DumpByte(LUAC_FORMAT, D);
DumpLiteral(LUAC_DATA, D);
- DumpByte(sizeof(int), D);
- DumpByte(sizeof(size_t), D);
+ //DumpByte(sizeof(int), D);
+ //DumpByte(sizeof(size_t), D);
DumpByte(sizeof(Instruction), D);
DumpByte(sizeof(lua_Integer), D);
DumpByte(sizeof(lua_Number), D);
diff --git a/src/lua/src/lundump.c b/src/lua/src/lundump.c
index edf9eb8..4451518 100644
--- a/src/lua/src/lundump.c
+++ b/src/lua/src/lundump.c
@@ -247,8 +247,8 @@ static void checkHeader (LoadState *S) {
if (LoadByte(S) != LUAC_FORMAT)
error(S, "format mismatch in");
checkliteral(S, LUAC_DATA, "corrupted");
- checksize(S, int);
- checksize(S, size_t);
+ //checksize(S, int);
+ //checksize(S, size_t);
checksize(S, Instruction);
checksize(S, lua_Integer);
checksize(S, lua_Number);
diff --git a/src/luatex/meson.build b/src/luatex/meson.build
index 6439c00..1393767 100644
--- a/src/luatex/meson.build
+++ b/src/luatex/meson.build
@@ -3,7 +3,6 @@ luatex_src = files(
'src/luamd5/md5.c',
'src/luamd5/md5lib.c',
'src/luapeg/lpeg.c',
- 'src/luazip/src/luazip.c',
'src/luazlib/lgzip.c',
'src/luazlib/lzlib.c',
'src/slnunicode/slnunico.c',
@@ -221,7 +220,6 @@ luatex_dependencies = [
cc.find_library('dl', required: false),
dependency('libpng'),
dependency('zlib'),
- dependency('zziplib'),
lua_dep,
mplib_dep,
pplib_dep,
diff --git a/src/luatex/src/lua/llualib.c b/src/luatex/src/lua/llualib.c
index 0586aba..2b4db72 100644
--- a/src/luatex/src/lua/llualib.c
+++ b/src/luatex/src/lua/llualib.c
@@ -199,7 +199,8 @@ static int get_bytecode(lua_State * L)
#else
"bytecode", NULL)) {
#endif
- return luaL_error(L, "bad bytecode register");
+ // error message is on top of the stack
+ return lua_error(L);
} else {
lua_pushvalue(L, -1);
bytecode_register_shadow_set(L, k);
diff --git a/src/luatex/src/lua/luastuff.c b/src/luatex/src/lua/luastuff.c
index fb05c60..b10e6fe 100644
--- a/src/luatex/src/lua/luastuff.c
+++ b/src/luatex/src/lua/luastuff.c
@@ -179,7 +179,6 @@ static const luaL_Reg lualibs[] = {
#endif
/*tex additional (public) libraries */
{ "unicode", luaopen_unicode },
- { "zip", luaopen_zip },
{ "md5", luaopen_md5 },
{ "sha2", luaopen_sha2 },
{ "lfs", luaopen_lfs },
```
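To see why dropping exactly these two header checks is enough for the header itself, here is a small standalone C model. It is not Lua code: the struct, the constants and first_mismatch are hypothetical names, only the field order follows DumpHeader/checkHeader, and the type widths are assumed typical values rather than values measured from TeX Live builds. With Instruction, lua_Integer and lua_Number the same on both sides, a 32-bit and a 64-bit build differ only in sizeof(size_t), so the stock loader stops at size_t while the patched one accepts the header.

```c
#include <assert.h>
#include <stddef.h>

/* The five type-size bytes Lua 5.3's DumpHeader writes (after the signature,
   version, format and LUAC_DATA), in this order. Struct and helper names are
   hypothetical; only the field order follows ldump.c/lundump.c. */
typedef struct {
    unsigned char int_size, size_t_size, instruction_size, integer_size, number_size;
} lua53_sizes;

/* Assumed typical widths: Instruction = unsigned int, lua_Integer = long long,
   lua_Number = double. 64-bit Unix (LP64) and Windows (LLP64) agree here. */
static const lua53_sizes BUILD_32BIT = {4, 4, 4, 8, 8};
static const lua53_sizes BUILD_64BIT = {4, 8, 4, 8, 8};

/* Mirrors the order of the checksize() calls in checkHeader(). Returns the
   first mismatching type name, or NULL. skip_int_size_t = 1 models the patch. */
static const char *first_mismatch(const lua53_sizes *dumped, const lua53_sizes *loader,
                                  int skip_int_size_t) {
    assert(dumped != NULL && loader != NULL);
    if (!skip_int_size_t) {
        if (dumped->int_size != loader->int_size) return "int";
        if (dumped->size_t_size != loader->size_t_size) return "size_t";
    }
    if (dumped->instruction_size != loader->instruction_size) return "Instruction";
    if (dumped->integer_size != loader->integer_size) return "lua_Integer";
    if (dumped->number_size != loader->number_size) return "lua_Number";
    return NULL;
}
```

This only models the header. In stock 5.3, as the removed lines of the later ldump.c diff show, DumpString writes the length of long strings (when tsslen(s) + 1 >= 0xFF) with DumpVar, i.e. as a raw size_t, so such chunks still differ between 32-bit and 64-bit builds even with the header checks gone.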
In Lua 5.4 (unlike Lua 5.3), "ints" and "sizes" are not dumped as is, but in a variable-length encoding (with a bounds check when loading). This is in fact the reason why the checks on size_t and int that I mentioned were later removed. It seems ideal for our use case, because 64-bit formats whose values stay within 32-bit bounds will work on 32-bit systems. I backported the changes to Lua 5.3. No changes to "LuaTeX" are needed, just to the TeX Live "libs"; this can probably be coordinated with other distributions as well, but I don't know their attitude towards format sharing. Here are the changes I propose for LuaTeX (note that I use a different setup and didn't actually test in TeX Live):

```diff
--- a/src/luatex/src/lua/llualib.c
+++ b/src/luatex/src/lua/llualib.c
@@ -195,11 +195,12 @@ static int get_bytecode(lua_State * L)
if (lua_load
(L, reader, (void *) (lua_bytecode_registers + k),
#ifdef LuajitTeX
- "bytecode")) {
+ "bytecode") != LUA_OK) {
#else
- "bytecode", NULL)) {
+ "bytecode", NULL) != 0) {
#endif
- return luaL_error(L, "bad bytecode register");
+ // error message is on top of the stack
+ return lua_error(L);
} else {
lua_pushvalue(L, -1);
bytecode_register_shadow_set(L, k);
```

And here are the Lua ones:

```diff
--- a/src/lua/src/ldump.c
+++ b/src/lua/src/ldump.c
@@ -55,8 +55,23 @@ static void DumpByte (int y, DumpState *D) {
}
+/* dumpInt Buff Size */
+#define DIBS ((sizeof(size_t) * 8 / 7) + 1)
+
+static void DumpSize (size_t x, DumpState *D) {
+ lu_byte buff[DIBS];
+ int n = 0;
+ do {
+ buff[DIBS - (++n)] = x & 0x7f; /* fill buffer in reverse order */
+ x >>= 7;
+ } while (x != 0);
+ buff[DIBS - 1] |= 0x80; /* mark last byte */
+ DumpVector(buff + DIBS - n, n, D);
+}
+
+
static void DumpInt (int x, DumpState *D) {
- DumpVar(x, D);
+ DumpSize(x, D);
}
@@ -72,17 +87,12 @@ static void DumpInteger (lua_Integer x, DumpState *D) {
static void DumpString (const TString *s, DumpState *D) {
if (s == NULL)
- DumpByte(0, D);
+ DumpSize(0, D);
else {
- size_t size = tsslen(s) + 1; /* include trailing '\0' */
+ size_t size = tsslen(s);
const char *str = getstr(s);
- if (size < 0xFF)
- DumpByte(cast_int(size), D);
- else {
- DumpByte(0xFF, D);
- DumpVar(size, D);
- }
- DumpVector(str, size - 1, D); /* no need to save '\0' */
+ DumpSize(size + 1, D);
+ DumpVector(str, size, D);
}
}
@@ -186,8 +196,6 @@ static void DumpHeader (DumpState *D) {
DumpByte(LUAC_VERSION, D);
DumpByte(LUAC_FORMAT, D);
DumpLiteral(LUAC_DATA, D);
- DumpByte(sizeof(int), D);
- DumpByte(sizeof(size_t), D);
DumpByte(sizeof(Instruction), D);
DumpByte(sizeof(lua_Integer), D);
DumpByte(sizeof(lua_Number), D);
--- a/src/lua/src/lundump.c
+++ b/src/lua/src/lundump.c
@@ -10,6 +10,7 @@
#include "lprefix.h"
+#include <limits.h>
#include <string.h>
#include "lua.h"
@@ -64,13 +65,30 @@ static lu_byte LoadByte (LoadState *S) {
}
-static int LoadInt (LoadState *S) {
- int x;
- LoadVar(S, x);
+static size_t LoadUnsigned (LoadState *S, size_t limit) {
+ size_t x = 0;
+ int b;
+ limit >>= 7;
+ do {
+ b = LoadByte(S);
+ if (x >= limit)
+ error(S, "integer overflow");
+ x = (x << 7) | (b & 0x7f);
+ } while ((b & 0x80) == 0);
return x;
}
+static size_t LoadSize (LoadState *S) {
+ return LoadUnsigned(S, ~(size_t)0);
+}
+
+
+static int LoadInt (LoadState *S) {
+ return cast_int(LoadUnsigned(S, INT_MAX));
+}
+
+
static lua_Number LoadNumber (LoadState *S) {
lua_Number x;
LoadVar(S, x);
@@ -87,10 +105,8 @@ static lua_Integer LoadInteger (LoadState *S) {
static TString *LoadString (LoadState *S, Proto *p) {
lua_State *L = S->L;
- size_t size = LoadByte(S);
+ size_t size = LoadSize(S);
TString *ts;
- if (size == 0xFF)
- LoadVar(S, size);
if (size == 0)
return NULL;
else if (--size <= LUAI_MAXSHORTLEN) { /* short string? */
@@ -247,8 +263,6 @@ static void checkHeader (LoadState *S) {
if (LoadByte(S) != LUAC_FORMAT)
error(S, "format mismatch in");
checkliteral(S, LUAC_DATA, "corrupted");
- checksize(S, int);
- checksize(S, size_t);
checksize(S, Instruction);
checksize(S, lua_Integer);
checksize(S, lua_Number);
```

So why should this work? For runtime values, Lua uses either "lua_Integer" for integers or "lua_Number" for floats. Usually these are "long long" and "double" respectively. There is also Instruction, which is "unsigned int" (if it is at least 32 bits long) or "unsigned long". As long as these types are the same, I think there shouldn't be any problems. Are there TeX Live architectures where `sizeof(long long) != 8`, double isn't an IEEE 754 double-precision number, or `sizeof(unsigned int) != 4`? Somebody please review the patch; I also haven't tested it thoroughly, though I got
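The key piece of the backport is the variable-length size encoding taken from Lua 5.4. The standalone sketch below re-implements the same scheme outside Lua so it can be tried in isolation. The names varint_write/varint_read are hypothetical, uint64_t stands in for a 64-bit size_t, and the limit parameter plays the role of the limit passed to LoadUnsigned (for example ~(size_t)0 for sizes or INT_MAX for ints on the reading machine).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Enough 7-bit groups for any 64-bit value (10 * 7 = 70 bits). */
#define VARINT_MAX 10

/* Same scheme as DumpSize in the patch: 7-bit groups, most significant first,
   with the high bit set only on the LAST byte. Returns the byte count. */
static size_t varint_write(uint64_t x, unsigned char out[VARINT_MAX]) {
    unsigned char buff[VARINT_MAX];
    size_t n = 0;
    do {
        buff[VARINT_MAX - (++n)] = (unsigned char)(x & 0x7f);  /* fill in reverse */
        x >>= 7;
    } while (x != 0);
    buff[VARINT_MAX - 1] |= 0x80;                               /* mark last byte */
    memcpy(out, buff + VARINT_MAX - n, n);
    return n;
}

/* Same scheme as LoadUnsigned: reject values that would not fit below `limit`
   on the reading machine. Assumes `in` holds a complete encoding.
   Returns 0 on success, -1 on overflow; *pos advances past the bytes read. */
static int varint_read(const unsigned char *in, size_t *pos, uint64_t limit,
                       uint64_t *result) {
    uint64_t x = 0;
    unsigned char b;
    assert(in != NULL && pos != NULL && result != NULL);
    limit >>= 7;
    do {
        b = in[(*pos)++];
        if (x >= limit)
            return -1;                /* LoadUnsigned raises "integer overflow" here */
        x = (x << 7) | (b & 0x7f);
    } while ((b & 0x80) == 0);
    *result = x;
    return 0;
}
```

For example, 300 encodes as the two bytes 0x02 0xAC, and a reader with a 32-bit limit (UINT32_MAX) decodes it unchanged, whereas 2^40 written by a 64-bit writer is rejected by that reader instead of being silently truncated. That rejection is the bounds check when loading that makes small values portable and large ones fail loudly.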
Also a thought: Lua is very configurable, and the types for "lua_Integer" and "Instruction" could be customly
> Are there TeX Live architectures where `sizeof(long long) != 8`, double isn't IEEE 754 double precision number or `sizeof(unsigned int)` != 4?
I have no way of knowing, but I suppose if it's a problem in practice (I
doubt it) we'll find out.
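One way to "find out" at build time rather than when a format fails to load is a compile-time guard. The following is a minimal sketch, assuming a C11 compiler and the usual Lua 5.3 configuration (lua_Integer = long long, lua_Number = double, Instruction = unsigned int); it is not part of Lua, LuaTeX or TeX Live, and the double check only approximates IEEE 754 binary64.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <limits.h>

/* Hypothetical build-time guard for the assumptions the patched bytecode format
   relies on. A platform that violates one fails to compile, rather than
   producing formats that fail to load elsewhere. */
static_assert(CHAR_BIT == 8, "bytes must be 8 bits wide");
static_assert(sizeof(long long) == 8, "lua_Integer (long long) must be 64 bits");
static_assert(sizeof(unsigned int) == 4, "Instruction (unsigned int) must be 32 bits");
/* Approximates "double is IEEE 754 binary64": radix 2, 53-bit significand, 8 bytes. */
static_assert(sizeof(double) == 8 && FLT_RADIX == 2 && DBL_MANT_DIG == 53,
              "lua_Number (double) must look like IEEE 754 binary64");

/* Trivial accessor so the unit compiles to something callable. */
static int lua_portable_types_ok(void) { return 1; }
```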
As for using types from stdint.h, I suspect that would lead us into
further complications without any real gain. Let's not go there unless
we need to.
Thanks much for working on this! I'll point Luigi to your patch here and
we'll see what he thinks. --karl
P.S. I tried to send 32-bit fmts last night but github helpfully
rejected the mail. Clearly you are well past that point. Yay.
I've already posted an updated patch to the list: https://tug.org/pipermail/luatex/2022-February/007623.html It also includes code that byte-swaps the loaded bytecode if needed.
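The list message has the actual code. Purely as an illustration of the idea, assuming a Lua 5.3-style header that stores LUAC_INT (0x5678) as an 8-byte lua_Integer, a loader can tell whether the dumping machine had the opposite byte order by checking whether the stored value equals 0x5678 directly or only after a byte swap. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Lua 5.3's header check value for lua_Integer (LUAC_INT in lundump.h). */
#define HEADER_TEST_INT 0x5678

/* Reverse the byte order of a 64-bit value. */
static uint64_t bswap64(uint64_t x) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff);
        x >>= 8;
    }
    return r;
}

/* Inspect the 8 stored bytes of the header test integer.
   Returns 0 if the dumping machine had the same byte order, 1 if every
   multi-byte value must be swapped while loading, -1 if neither matches. */
static int header_needs_swap(const unsigned char stored[8]) {
    uint64_t v;
    assert(stored != NULL);
    memcpy(&v, stored, sizeof v);
    if (v == HEADER_TEST_INT) return 0;
    if (bswap64(v) == HEADER_TEST_INT) return 1;
    return -1;
}
```

Once a swap is detected, a loader along these lines would apply the same reversal, at the appropriate width, to every multi-byte integer, number and instruction it reads afterwards.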
> 3. How much slower does lualatex start up if it doesn't do the byte
> compiling? Plausible to discern?
IIRC we saved around 0.6 seconds on a reasonably good laptop from 2016.
So it would be best if the optimization can be preserved for most
users. No opinion on the approaches, sorry.
That's not really the question here. At the moment, LaTeX never byte-compiles code in place of reading the source files (though I have thought about adding that); only data created dynamically during the format-building process is preserved as bytecode for the actual run. So there are no files we could read instead. We could probably write the Lua code into TeX macros which then get parsed in
Ignoring the bytecode question for a second, doesn't expl3 also store other system-specific data in the format? E.g. what does
Ah, I failed to realize the implication of just dumping tables, not code.
That actually seems like it should be easier to make system-independent
than dumping bytecode would be. Instead of dumping in the native
integer/whatever, use some defined order and read it back that way?
What obvious thing am I missing now :)?
As for platform description strings, if dumped, that does not seem like
an insurmountable problem to making the fmt system-independent. -k
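Karl's "defined order" idea can be sketched for integers and doubles, the numeric values such a dumped table would hold. Each value is written as exactly 8 bytes, least significant byte first, using shifts, so the result depends neither on the host's byte order nor on sizeof(long). This is a hypothetical illustration, not expl3 or LuaTeX code. It assumes both machines use IEEE 754 binary64 doubles, and converting uint64_t back to int64_t for negative values is implementation-defined in ISO C, although mainstream compilers simply wrap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == 8, "assumes IEEE 754 binary64 doubles");

/* Write v as exactly 8 bytes, least significant first, on any host. */
static void put_u64le(unsigned char out[8], uint64_t v) {
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Read back what put_u64le wrote, independent of host byte order. */
static uint64_t get_u64le(const unsigned char in[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}

/* Signed integers travel as their two's-complement bit pattern. */
static void put_i64le(unsigned char out[8], int64_t v) { put_u64le(out, (uint64_t)v); }
static int64_t get_i64le(const unsigned char in[8]) { return (int64_t)get_u64le(in); }

/* Doubles travel as their binary64 bit pattern. */
static void put_f64le(unsigned char out[8], double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    put_u64le(out, bits);
}

static double get_f64le(const unsigned char in[8]) {
    uint64_t bits = get_u64le(in);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```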
This issue has been automatically marked as stale because it has not had recent activity.
Evidently lualatex.fmt cannot be shared between 32-bit and 64-bit systems. All other fmts, including xelatex.fmt, pdflatex.fmt, and the plain luahbtex.fmt, are sharable. Of course the fmts are created by the exact same version of luahbtex (current TL svn).
I get the same message either when creating lualatex.fmt on 32-bit and reading it on 64-bit, or the other way around. (The fmts can be read fine on the machine where they are created.) I run this command in the TL Build/source/Work/texk/web2c/ directory (I'll explain trytexenv below):
where ~/lualatex.fmt has been copied in from the other machine,
and get this error:
Akira reports that he gets the same message with his Windows binaries in the same cross-architecture situation.
Above I said "all" other fmts work, but in truth there is one other that doesn't: optex. I noticed this last year, and asked Petr about it. I don't know if it is related to whatever is going on with expl3.lua, but FWIW, he wrote (sorry for the poor line breaks; they looked fine in my editor):
I expect that it's also the case for lualatex that "almost zero users" will notice if lualatex.fmt written on 64-bit fails to read on 32-bit and vice versa. It's pretty hard even to find such a pair of machines nowadays. However, if it'd ever be a problem it would surely be with latex where it got noticed. So, I wanted to report it to you.
If you want access to a 32-bit machine, I expect Nelson would be happy to provide. That's where I am testing this stuff.
wdyt? --thanks, karl.
Appendix. The trytexenv script I used above is this:
where $tm is an envvar I define outside the script to be the "Master" tree for whatever system I'm working on: the actual /Master subdir from an svn checkout, etc.
These environment settings are needed because when running from the build dir, of course the usual search based on $0 and SELFAUTO* cannot find texmf.cnf, and thus nothing else is defined.
As for LUAINPUTS, which seems like it should not be necessary, I think it needs to be defined explicitly because of the different binary name. I can get different errors if I change the search order around. I didn't investigate in depth. Without defining it at all, the file lualatexquotejobname.lua can't be loaded, so the fmt reading aborts early.