
Comparing changes

base repository: latex3/latex3
base: 2017-12-16
head repository: latex3/latex3
compare: 2018-04-30
This comparison is big! We’re only showing the most recent 250 commits
Commits on Mar 27, 2018
This is sufficiently widely-used that we can't hope
to keep it 'team only', and don't want loads of copies
of the primitive.
This is a tidy up of the 'already done' work.
Where we can avoid the primitive entirely, we do.
Otherwise we keep it for all uses in a module.
As \int_value:w is now public, and there is a performance gain,
this makes sense here.
We could use one per module, but that is a bit of a waste.
This *might* show up an issue as a couple of other people use the break functions.
As they were internal, there's a limited amount we can do ...
Commits on Mar 29, 2018
This requires a new public function, and is slightly slower.
The only places that really require the same functionality are the FPU
and bigint comparison. Both of these can use private copies which are
slightly more efficient. One might consider making the FPU version
wrap-up the "= 0" part other than for the bigint stuff.
Commits on Mar 30, 2018
In most drivers, the angle in \driver_box_use_rotate:Nn is allowed to
be an expression (because one needs its cosd and sind), so doing -#2
is unsafe; I changed to -(#2).  Perhaps l3fp could provide a function
that checks something is a decimal number and not an expression?
This requires loading l3intarray later.  Since I am somewhat likely to
implement an l3fparray it makes sense to load l3intarray and then
l3fparray just after l3fp.
…d, tn

The idea is that one can refer to a foreign internal by \cs{__foreign_...}
but not document it using \begin{function}{\__foreign_...} or similar, nor
use it in code wrapped by macrocode.
We have enough 'open ended' int expression usage that we do need
\int_eval:w I think. That allows us to reduce the number of
\@@_int_eval:w cases to those which are *really* needed (FPU).
Commits on Mar 31, 2018
This one really is 'just for us'.
.. should have been @@ from the start!
There were a couple of tricky cases here, at least in part due to the
lack of a set of interfaces for \kern.
Commits on Apr 01, 2018
That's slightly tricky because l3regex relied on the group that old
tl_build functions created.
Also actually test an error message
Makes finding 'real' uses of __ a bit easier.
Really goes with the \prg_map_.... changes.
Either a kernel-level message or 'do not do this'.
Commits on Apr 02, 2018
And to allow \dim_to_decimal_in_sp:n to be used in l3fp, make it faster.
It would be caught by \debug_on:n { check-expressions } but running all
tests with that option seems too expensive.
Raising a flag takes a time proportional to its height, while this
function takes a constant time.  When the actual height is not useful
it's much faster to only raise the flag once.
This is all about stuff only in l3prop.
We are moving to having everything either public-and-documented or
purely in the code section. These functions are not used outside of
their 'home' module so do not need wider docs (or moving to \__kernel).
A couple have to be made __kernel (I think).
This shows up when you have everything together.
A better place than \__unicode
Commits on Apr 03, 2018
The code is simple enough and there isn't any reasonable interface
This allows us to move \int_rand:nn to l3fp if we want.
I put the end of this year because we deprecated them at the very end of
last year.
This avoids one internal being 'out of order'.
A couple of cases outstanding in l3kernel
As these are 'legit' at this level.
This reverts commit d6fc40b.
... to make things clearer for non-experts.
This deals with an __ issue, and makes \int_rand:nn stable (seems reasonable).
Commits on Apr 06, 2018
This means we have only one set of data storage, avoiding __ issues.
This avoids some issues in test files.

Also means that we are in a known state.
The new approach is more general.  Not sure we should keep \seq_set_map:NNn
but it has been in l3candidates for a while so it would have to be deprecated
properly.
Commits on Apr 07, 2018
Some \tableofcontents were before even the title of the doc.  I've
left some others that were placed in reasonable places, but for most
files we don't even have a toc.
This should never matter since expl3 is only ever loaded in vertical
mode, but I do that for consistency with other lines in that file.
…ions

This is somewhat experimental.  It may make sense to provide a package that
suppresses \traceon output likewise for all non-expandable commands: even
for simple ones like \seq_put_left:Nn this reduces terminal output.
It is often simpler to use finer expansion control, specifically the
surprisingly useful \exp_last_unbraced:NNNNo.
Rather than deal with issues on the naming, by using some x-type
expansion at point-of-use we can drop the second argument here.
Commits on Apr 09, 2018
The name here may need adjustment: the "show" function is
\tl_show_analysis:n. Presumably one of them is the wrong way around ....

On the *returned* data, I've made the swap suggested by Bruno and have
charcode/catcode as for \char_generate:nn.
Commits on Apr 16, 2018
Eventually we'll use this in the sys module,
but at the moment it's too soon.
Commits on Apr 17, 2018
... as we only keep the engine name for the 'version' primitives.
As Chris argues, the naming should reflect the function,
even if here we are restricted in engine coverage.
For \tl_trim_spaces_apply:nN
This adds a new "!" modifier to the syntax which applies to optional
arguments. At present, it only does anything if the optional argument is
also trailing: others are left alone. The logic of that is something
like

    \DeclareDocumentCommand{\foo}{mom}{...}
    \foo{bar} [baz] {bop}

has to find a second mandatory, and grabbing "[" here as #3 would be
pretty strange.

What happens with multiple trailing optionals is not entirely clear.
I've gone for "!" applying on a per-argument basis, but that is only one
way of setting it up.
Commits on Apr 19, 2018
I got this by looking at all "updated" dates for functions in l3kernel.
This may miss syntax changes that only affected functions that were
subsequently updated.  The distinction between bug fixes and syntax
changes is not always easy to make.
Putting a one-level redirect here makes using these with a known number
of expansions tricky. Moreover, it doesn't actually help when we are not
using LuaTeX: a test is still required 'up front'.
Commits on Apr 20, 2018
When adding items to a comma list, spaces were removed but not braces,
even for items that did not need them.  Now the treatment is consistent
by having a common (internal) function to test whether an item should
be braced or not.  For instance,

    \clist_set:Nn \l_tmpa_clist { {a} }
    \clist_if_in:NnTF \l_tmpa_clist { a } {true} {false}

gave false but now gives true.
Commits on Apr 23, 2018
Almost all uses of int_step loops had a step of 1; adding these variants was
probably right.
These are the only two n-type synonyms of primitives, so the only two
n-type arguments that must be braced (the n-type argument of \cs_new:Npn
and similar functions must also be braced but only to end the p-type
argument before it).
Still some discussion to be had here on the best approach.
Commits on Apr 24, 2018
There's no actual content in interface3.
Commits on Apr 25, 2018
The trailing combining chars aren't set up for inputenc, which means
with TL'18 LaTeX2e they are pointing to an error message. That causes
havoc in the case changer: the older version has them just as chars.

Almost certainly these codepoints can't be used in an 8-bit document as
the fact they apply *after* the char is not supported by the engine. So
rather than try to pick up the error, simply adjust the test to stick to
legitimate input.
There is no need to use the primitive directly here as we can do the job
using \char_generate:nn.

We may want to think about a 'generic Unicode engine' switch, something
like \sys_if_engine_unicode_p: or \sys_if_engine_utex_p: (as we already
use utex for some primitives).
Commits on Apr 26, 2018
Many of the functions already needed expr evaluation for underlying
implementation reasons. The hit is therefore mainly in color functions.
This cuts out a lot of overful lines.

We might consider making this less repetitive ... but
that might be more work than it's worth.
This deals with more-or-less all of the ones due to overlong code
lines. There are various places that perhaps might be handled by
re-writing some docs. More tricky are the non-trivial number that come
at the end-of-definition lines: they are outside of something that can
be changed in the source. There are also a reasonable number that are (I
think) in the syntax environments.
Commits on Apr 27, 2018
l3doc can't pick this up.
This allows proper linking of the versions without TF.
The RNG's low bits are not great, so its high bits are used in preference.
Commits on Apr 28, 2018
Diffs due to:
- LuaTeX logging changes
- (u)pTeX boxing changes
- l3build normalisation adjustments
Commits on Apr 29, 2018
In some experiments that's about a 20% speed-up because we avoid re-reading
tokens that were expanded
These are the only two variable types with a "<var>_use:N" function
that does not just apply \the or expand a macro once.
It seems reasonable to use \__int_eval:w in these l3int functions.
They are the bottleneck in some upcoming \seq_shuffle:N code.
I am only committing this implementation for the record, and not updating
test files, because I will commit a much faster implementation using
\toks soon.
Commits on Apr 30, 2018
I couldn't find a good way to distinguish them so I took a bad way:
systematically compare every register that comes into fp expressions
with \infty and with \pi.
Some test file changes will likely be altered again:
l3build approach needs adjustment here.
Showing 374 changed files with 40,580 additions and 21,026 deletions.
@@ -15,3 +15,4 @@ xpackages/xfrontm/*.lof
xpackages/xfrontm/*.log
xpackages/xfrontm/*.lot
xpackages/xfrontm/*.toc
*.*~
@@ -84,4 +84,4 @@ LaTeX3 is developed by [The LaTeX3 Project](https://latex-project.org).

## Copyright

This README file is copyright 2017 The LaTeX3 Project.
This README file is copyright 2018 The LaTeX3 Project.
@@ -4,7 +4,6 @@ checkdeps = checkdeps or {maindir .. "/l3kernel"}
typesetdeps = typesetdeps or {maindir .. "/l3kernel"}
unpackdeps = unpackdeps or {maindir .. "/l3kernel"}

cmdchkfiles = cmdchkfiles or {"*.dtx"}
checkengines = checkengines
or {"pdftex", "xetex", "luatex", "ptex", "uptex"}
checksuppfiles = checksuppfiles or
@@ -18,8 +17,9 @@ checksuppfiles = checksuppfiles or
"SpecialCasing.txt",
"UnicodeData.txt",
}
tagfiles = tagfiles or {"*.dtx", "README.md"}
unpacksuppfiles = unpacksuppfiles or {"docstrip.tex"}
versionfiles = versionfiles or {"*.dtx", "README.md"}


packtdszip = true

@@ -33,42 +33,19 @@ if unpacksearch == nil then
end

-- Detail how to set the version automatically
setversion_update_line =
setversion_update_line or function(line, date, version)
local date = string.gsub(date, "%-", "/")
-- Replace the identifiers
if string.match(line, "^\\def\\ExplFileDate{%d%d%d%d/%d%d/%d%d}%%?$") or
string.match(line, "^%%? ?\\date{Released %d%d%d%d/%d%d/%d%d}$") then
line = string.gsub(line, "%d%d%d%d/%d%d/%d%d", date)
end
-- No real regex so do it one type at a time
for _,i in pairs({"Class", "File", "Package"}) do
if string.match(
line,
"^\\ProvidesExpl" .. i .. " *{[a-zA-Z0-9%-%.]+}"
) then
line = string.gsub(
line,
"{%d%d%d%d/%d%d/%d%d}",
"{" .. string.gsub(date, "%-", "/") .. "}"
)
end
end
-- Update the interlock
if string.match(
line, "^\\RequirePackage{expl3}%[%d%d%d%d/%d%d/%d%d%]$"
) then
line = "\\RequirePackage{expl3}[" .. date .. "]"
end
if string.match(
line, "^%%<package>\\@ifpackagelater{expl3}{%d%d%d%d/%d%d/%d%d}$"
) then
line = "%<package>\\@ifpackagelater{expl3}{" .. date .. "}"
end
if string.match(
line, "^Release %d%d%d%d/%d%d/%d%d$"
) then
line = "Release " .. date
function update_tag(file,content,tagname,tagdate)
local iso = "%d%d%d%d%-%d%d%-%d%d"
if string.match(file,"%.dtx$") then
content = string.gsub(content,
"\n\\ProvidesExpl" .. "(%w+ *{[^}]+} *){" .. iso .. "}",
"\n\\ProvidesExpl%1{" .. tagname .. "}")
return string.gsub(content,
"\n%% \\date{Released " .. iso .. "}\n",
"\n%% \\date{Released " .. tagname .. "}\n")
elseif string.match(file,"%.md$") then
return string.gsub(content,
"\nRelease " .. iso .. "\n",
"\nRelease " .. tagname .. "\n")
end
return line
return content
end
@@ -22,7 +22,7 @@ ctanbundles = {"l3kernel", "l3packages", "l3experimental"}
maindir = "."

-- A custom main function
function main (target)
function main(target)
local errorlevel
if target == "check" then
errorlevel = call(checkbundles, "check")
@@ -45,8 +45,14 @@ function main (target)
errorlevel = call(bundles, "setversion")
elseif target == "unpack" then
errorlevel = call (bundles, "unpack")
elseif target == "version" then
version ()
elseif target == "tag" then
if options["names"] and #options["names"] == 1 then
call(ctanbundles,"tag")
else
print("Tag name required")
help()
exit(1)
end
else
help ()
end
@@ -57,4 +63,7 @@ end

-- Find and run the build system
kpse.set_program_name("kpsewhich")
dofile(kpse.lookup("l3build.lua"))
if not release_date then
dofile(kpse.lookup("l3build.lua"))
end

@@ -28,4 +28,7 @@ dofile(maindir .. "/build-config.lua")

-- Find and run the build system
kpse.set_program_name("kpsewhich")
dofile(kpse.lookup("l3build.lua"))
if not release_date then
dofile(kpse.lookup("l3build.lua"))
end

Showing you all comments on commits in this comparison.

@jfbu jfbu commented on f6b5fce Apr 29, 2018

@blefloch typo "explict"

@blefloch blefloch commented on f6b5fce Apr 29, 2018

Fixed, thanks.

@jfbu jfbu commented on d4ebe94 Apr 30, 2018

@blefloch is using toks registers with odd indices non-globally an okay practice in expl3?

@jfbu jfbu commented on c83d05b Apr 30, 2018

@blefloch Have you considered this: \pdfuniformdeviate 16384 may be obtained by rescaling from 2^28 = 16384^2, hence this does not introduce non-uniformity, and it kills the low 14 bits. If you do this twice you get 28 bits again, of higher quality than from a single \pdfuniformdeviate \c__fp_rand_size_int. I am saying this in case the method in this commit is a bit more costly than the former one. After all, 16384 already looks big for LaTeX usage. Have you tested whether \pdfuniformdeviate 16384 taken modulo 2 has the bias you indicated?

@jfbu jfbu commented on c83d05b Apr 30, 2018

I mean the point is that \pdfuniformdeviate 16384 is surely faster than doing it with \c__fp_rand_size_int and then rescaling by hand to 16384. Thus we get better bits for low cost. This is the idea.
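The suggestion can be sketched in a few lines (Python, with `random.randrange` standing in for \pdfuniformdeviate; `rand28` is a hypothetical name for illustration):

```python
import random

def uniformdeviate(n):
    # Stand-in for the engine primitive \pdfuniformdeviate n
    return random.randrange(n)

def rand28():
    # Two 14-bit draws combined: since 16384**2 == 2**28, the result is
    # uniform on [0, 2**28), and its low 14 bits come from a full draw
    # rather than from the generator's weaker low bits.
    return uniformdeviate(16384) * 16384 + uniformdeviate(16384)
```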

@jfbu jfbu commented on c83d05b Apr 30, 2018

I have done the following experiment.

  1. generate 100 random bits by the parity of \pdfuniformdeviate 16384
    (or also by the parity of \pdfuniformdeviate 2 to the power 28 to confirm your finding that it depends only on parity of seed).

  2. do this for all 25 seeds = 1, 2, ..., 25

  3. for each pair of indices I and J, count how many bits at the same location are distinct for the two sequences with seeds I and J respectively. We expect this number to be about 50 on average, for I distinct from J.

  4. Output a matrix with these numbers.

This gives

 00 52 51 36 45 52 45 47 39 47 51 56 50 52 39 44 55 45 51 52 57 53 53 44 53
 52 00 47 56 41 54 51 39 55 43 47 50 50 40 53 48 41 51 55 58 53 55 47 56 41
 51 47 00 53 52 39 46 48 44 44 50 59 45 53 54 47 46 48 56 51 58 52 56 49 50
 36 56 53 00 43 44 45 53 49 43 49 42 52 52 41 48 55 49 47 50 45 53 51 48 57
 45 41 52 43 00 61 54 42 50 60 40 45 49 37 52 51 54 46 50 45 54 46 52 57 52
 52 54 39 44 61 00 45 51 43 41 55 46 46 56 55 58 51 51 53 52 49 49 53 50 49
 45 51 46 45 54 45 00 54 46 44 52 43 45 49 50 57 50 46 56 51 44 52 52 49 50
 47 39 48 53 42 51 54 00 48 52 40 49 51 55 46 51 42 56 48 51 62 52 48 51 42
 39 55 44 49 50 43 46 48 00 50 50 47 57 55 48 41 54 50 58 55 54 58 48 47 56
 47 43 44 43 60 41 44 52 50 00 54 55 37 55 42 51 48 48 56 57 56 48 48 49 50
 51 47 50 49 40 55 52 40 50 54 00 53 51 37 58 43 48 52 50 55 52 54 54 57 46
 56 50 59 42 45 46 43 49 47 55 53 00 48 54 37 50 57 45 55 52 49 57 51 56 57
 50 50 45 52 49 46 45 51 57 37 51 48 00 48 55 46 49 51 45 50 53 49 55 54 51
 52 40 53 52 37 56 49 55 55 55 37 54 48 00 53 50 45 53 49 48 47 43 59 56 47
 39 53 54 41 52 55 50 46 48 42 58 37 55 53 00 49 52 42 48 53 50 58 52 53 52
 44 48 47 48 51 58 57 51 41 51 43 50 46 50 49 00 49 51 45 56 59 53 55 48 55
 55 41 46 55 54 51 50 42 54 48 48 57 49 45 52 49 00 52 52 47 56 52 46 55 46
 45 51 48 49 46 51 46 56 50 48 52 45 51 53 42 51 52 00 56 55 44 50 54 49 54
 51 55 56 47 50 53 56 48 58 56 50 55 45 49 48 45 52 56 00 47 50 44 52 55 48
 52 58 51 50 45 52 51 51 55 57 55 52 50 48 53 56 47 55 47 00 47 53 39 50 57
 57 53 58 45 54 49 44 62 54 56 52 49 53 47 50 59 56 44 50 47 00 56 54 49 52
 53 55 52 53 46 49 52 52 58 48 54 57 49 43 58 53 52 50 44 53 56 00 52 49 42
 53 47 56 51 52 53 52 48 48 48 54 51 55 59 52 55 46 54 52 39 54 52 00 49 50
 44 56 49 48 57 50 49 51 47 49 57 56 54 56 53 48 55 49 55 50 49 49 49 00 55
 53 41 50 57 52 49 50 42 56 50 46 57 51 47 52 55 46 54 48 57 52 42 50 55 00

I am not good at statistical analysis but perhaps someone can conclude things out of this. For example the "36" at row 4 and column 1 is small: it is 3 standard deviations below "50", which has about 0.135% likelihood, which looks small even if we multiply by 24, but perhaps not so.

Anyway, by simply making two such \pdfuniformdeviate calls instead of one we probably get much better quality 28 bits; perhaps this can help reduce the workload on \int_rand:nn to achieve its goal.
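The experiment above can be sketched as follows (Python; `parity_bits` is a hypothetical helper, and Python's own PRNG stands in for pdftex's, since the point here is the distance computation, not the engine generator):

```python
import random
from itertools import combinations

def parity_bits(seed, n=100):
    # Stand-in generator: the original experiment used the parity of
    # \pdfuniformdeviate 16384 in pdftex.
    rng = random.Random(seed)
    return [rng.randrange(16384) % 2 for _ in range(n)]

seqs = {s: parity_bits(s) for s in range(1, 26)}

# For each pair of seeds, count positions where the two bit streams differ.
# For independent fair bits each count is Binomial(100, 1/2):
# mean 50, standard deviation 5.
dist = {(i, j): sum(a != b for a, b in zip(seqs[i], seqs[j]))
        for i, j in combinations(range(1, 26), 2)}
```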

This comment has been minimized.

@blefloch blefloch commented on c83d05b May 1, 2018

@jfbu jfbu commented on c83d05b May 1, 2018

Yes, the pdftex manual should be more precise about the limitations of the PRNG. Arguably, for reasonable usage in LaTeX, 16384 items is already large (what do you think?), and then the non-uniformity is not so dramatic.

Your question about statistics deserves attention indeed. I wanted earlier today to look at the sources but did not so far. All my information comes from the one added to expl3.

Although this does (edit: not) answer your question about the low bits, I am reporting here that I did some tests using the ent software (available from MacPorts on the Mac). I generated bytes (8 bits) seven by seven (7*8 = 56 = 4*14), using either the low or the high 14 bits of 4 calls to \pdfuniformdeviate 2**28. Thus I created (with Python helping with the conversion to actual bytes from hexadecimal strings produced by etex) 2 files of 7000 bytes each: one generated from the high bits, the other from the low bits. The ent output seems to confirm (I tested seeds 1, 12345, 12346) that the high bits are more random than the low bits. For example with 12345 as seed:

******** LOW

$ ent testrandomchilow.bytes 
Entropy = 7.972049 bits per byte.

Optimum compression would reduce the size
of this 7000 byte file by 0 percent.

Chi square distribution for 7000 samples is 270.77, and randomly
would exceed this value 23.77 percent of the times.

Arithmetic mean value of data bytes is 128.2531 (127.5 = random).
Monte Carlo value for Pi is 3.073756432 (error 2.16 percent).
Serial correlation coefficient is 0.008059 (totally uncorrelated = 0.0).

********* HIGH

$ ent testrandomchihigh.bytes 
Entropy = 7.974596 bits per byte.

Optimum compression would reduce the size
of this 7000 byte file by 0 percent.

Chi square distribution for 7000 samples is 246.63, and randomly
would exceed this value 63.50 percent of the times.

Arithmetic mean value of data bytes is 127.8886 (127.5 = random).
Monte Carlo value for Pi is 3.152658662 (error 0.35 percent).
Serial correlation coefficient is 0.010002 (totally uncorrelated = 0.0).

The serial correlation test comes however always (I mean, in the three cases I tested...) lower for the low bits and is the only indicator seeing low bits more random than high bits. Of course these things tell us nothing about the fact that using another seed might lead to a surprisingly correlated outcome, as your parity finding demonstrates.

@jfbu jfbu commented on d4ebe94 May 1, 2018

@blefloch already 12! > 2**28...

@jfbu jfbu commented on d4ebe94 May 1, 2018

At the tex.sx question I added a mention that with 12 elements the identity permutation is never generated.

@jfbu jfbu commented on c83d05b May 1, 2018

@blefloch Bad news indeed. I looked at the algorithm in pdftex.web. It is purely linear modulo 2**28. A recipe initializes 55 integers depending on the seed. All computations are done modulo 2**28, and you can reduce modulo any divisor of that. Then a sequence is generated by x_n = x_{n-55} - x_{n-31} modulo 2**28; again you can reduce modulo any 2**k. This means that the sequence of the k low bits is indexed only by the k low bits of the seed! For example, modulo 16, \pdfuniformdeviate 2**28 generates only 16 distinct sequences across all seeds!!! I confirmed that with a little TeX file which outputs to a file 100 hex digits (the 4 low bits of the generated sequence) for each seed = 0, 1, ..., 1000. Then

$ sort testrandom4.out | uniq > temp ; wc -l temp
      16 temp

Your observation on parity bits is completely general.
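This seed dependence can be illustrated with a small sketch (Python; the seeding recipe in `init_state` is a simplified stand-in, not pdftex's actual one, but like the real recipe it uses only additions and subtractions modulo 2**28, which is exactly what makes the low bits of the state depend only on the low bits of the seed):

```python
M28 = 1 << 28  # pdftex's generator works modulo 2**28

def init_state(seed):
    # Hypothetical, simplified seeding: fills the 55-integer state using
    # only +/- modulo 2**28, like the recipe in pdftex.web.
    state = [0] * 55
    j, k = seed % M28, 1
    for i in range(55):
        state[(21 * i) % 55] = j
        j, k = (k - j) % M28, j
    return state

def lagged_fib(seed, n):
    # x_i = x_{i-55} - x_{i-31}  (mod 2**28), the recurrence in pdftex.web
    x = init_state(seed)
    out = []
    for _ in range(n):
        v = (x[-55] - x[-31]) % M28
        x.append(v)
        x.pop(0)
        out.append(v)
    return out

# The sequence reduced mod 16 is identical for seeds that agree mod 16:
low_a = [v % 16 for v in lagged_fib(3, 100)]
low_b = [v % 16 for v in lagged_fib(3 + 16, 100)]
print(low_a == low_b)  # True: low 4 bits see only the low 4 bits of the seed
```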

@jfbu jfbu commented on c83d05b May 1, 2018

Here is a quote from Knuth TAOCP, vol. II, chapter on random numbers, summary vii):

(he seems to be commenting on an X <- aX+c mod N rule, with N a power of 2 and a equal to 5 mod 8, so that it generates the maximal-size subgroup of the multiplicative group)

The least significant (right-hand) digits of X are not very random, so
decisions based on the number X should always be influenced primarily by
the most significant digits. It is generally best to think of X as a random
fraction X/m between 0 and 1, that is, to visualize X with a radix point at
its left, rather than to regard X as a random integer between 0 and m - 1.
To compute a random integer between 0 and k - 1, one should multiply by k
and truncate the result. (Don't divide by k; see exercise 3.4.1-3.)

Thus of course he is well aware of the badness of the low bits (I haven't checked the exercise). I have not checked pdftex, but I expect it to follow the advice and do trunc(k*random(2**28)/2**28) to get a random integer from 0 to k-1. This is not quite uniform though, as you comment in the expl3 code.

Edit: the MetaPost code does the rescaling with rounding, remapping the upper limit of the range to zero.
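Knuth's two points can be seen concretely with a toy power-of-two-modulus LCG (Python; this is an illustration of the general phenomenon, not pdftex's generator):

```python
M28 = 1 << 28

def lcg(x):
    # a = 5 (a = 5 mod 8), c odd: the textbook case of weak low bits
    return (5 * x + 1) % M28

x, lows, highs = 12345, [], []
for _ in range(12):
    x = lcg(x)
    lows.append(x % 2)     # low bit: strictly alternates 0,1,0,1,...
    highs.append(x >> 27)  # high bit: not locked into such a pattern

# Knuth's recipe: a random integer in [0, k) should be obtained by
# multiplying and truncating, so the high bits dominate:
k = 6
draw = (x * k) >> 28       # trunc(k * x / 2**28), not x % k
```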

@jfbu jfbu commented on c83d05b May 1, 2018

(not to spam but just to say that having a more limited number of possible sequences of say 4 low bits doesn't mean that the actual sequences do not behave randomly; imagine a RNG which has no seed. It can still be excellently random )

This comment has been minimized.

@blefloch blefloch commented on c83d05b May 1, 2018

@jfbu jfbu commented on c83d05b May 2, 2018

I would recommend being terse in the documentation! I don't know what the current state of research is regarding "lagged Fibonacci sequences" as used in the MetaPost RNG. I opened volume II of Knuth TAOCP, which contains information about them, for the first time yesterday, and all of it may be dated nowadays.

Discussion starts at 3.2.2-(7) page 27. See also exercise 3.2.2-30, which seems to indicate that in our case the period of each sequence (whatever the seed) is 2**27(2**55-1) from the theory of primitive trinomials over Z/2Z, but I am currently confused whether the implementation in pdftex.web introduces a twist from running through the array of 55 integers "backwards", which would make the period longer (I am confused about that). By the way, I corrected my post on tex.sx about the impossibility of ending up with the identity permutation from \seq_shuffle:N on 12 items, because I had only checked when doing it immediately after having set the seed. I initially thought the underlying math was an iteration of a function F on the set Z/2**28Z, but understood better only yesterday after having checked pdftex.web: blocks of 55 such integers are used.

On mention of some issues of these "lagged Fibonacci" in TAOCP, see among others

  • 3.3.2 end of section J.
  • exercise 3.3.2-31 and 3.3.2-35 and 3.6-14.

In particular exercise 3.3.2-31, which comments on parity bits for what I think is about the same recurrence as in pdftex.web (modulo the "backwards run through the array" I mentioned earlier, which confuses me), seems to say that when we start at a random point in the sequence (your \sys_gset_rand_seed:n) there is more than a 51% chance that runs of 79 parity bits have more 1s than 0s.

Nevertheless the parity bits are not that bad, they guided the construction to generate sequences with guaranteed very big periods (2**55-1 at least for the parity bits alone).

About documenting non-uniformity, perhaps something like "The underlying pseudo-random number generator works with 28-bit integers; even if it achieved a perfectly uniform distribution, the rescaling (done by the engine) to produce an integer in a range with N values by necessity introduces non-uniformity, of relative order of magnitude about N/2**28. For N <= 2**14, \uniformdeviate returns the engine-produced value untouched; for larger N, extra steps are taken to ensure that the non-uniformity remains relatively bounded by about 1/2**14." Hmm, that's already a mouthful.
Alternative is to say nothing at all...

I find your plans about going via floats quite reasonable. I briefly examined an alternative but it looks more costly: the problem with the rescaling from the fraction U/2**28, with U a random integer, is not really that the low bits are not good, but simply that the induced non-uniformity is in proportion to N/2**28 with N the target range. One could imagine trying to achieve morally trunc(N * (U*2**28 + V)/2**56) (or only 28+k bits with N having fewer than 14+k bits), but this looks costly as it loses the engine doing the rescaling for us; also we can't easily use \numexpr's temporary double precision here.
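The N/2**28 figure can be checked exactly in a few lines (Python; `bucket_counts` is a hypothetical helper, and it assumes, as surmised above, that the engine computes trunc(N*u/2**28) for a uniform u in [0, 2**28)):

```python
M28 = 1 << 28

def bucket_counts(N):
    # Value k is hit by exactly the u in
    # [ceil(k*2**28/N), ceil((k+1)*2**28/N)), so the counts are:
    ceil = lambda a, b: -(-a // b)
    return [ceil((k + 1) * M28, N) - ceil(k * M28, N) for k in range(N)]

c = bucket_counts(100000)
# Counts differ by at most 1, so the relative non-uniformity is about
# 1 / (2**28 / N) = N / 2**28, matching the figure in the discussion.
print(sum(c) == M28, max(c) - min(c))  # True 1
```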

@jfbu jfbu commented on c83d05b May 2, 2018

Ah ok, I think I got a not-too-bad way of handling N >= 2**14. Here is the algorithm:

  • compute Q such that (Q+1)*2**14 = N - R, R<2**14.

  • the idea is to consider separately the first Q*2**14 integers and the 2**14+R remaining ones.

  • if Q = 0, i.e. N < 2**15, use directly \uniformdeviate N.

  • if Q>0 do N1 = (\uniformdeviate Q ) times 2**14 + \uniformdeviate 2**14. Also do N2 = Q*2**14 + \uniformdeviate (2**14 + R).

  • It remains to say when to return N1 and when to return N2. For this we need a good approximation to the fraction (2**14+R)/N, which is the probability that we should return N2. Fortunately we can use \numexpr to evaluate F = \the\numexpr 2**28*R'/N\relax with no overflow, R' = (2**14+R). Then make a call to \uniformdeviate 2**28 and choose N2 if it returns an integer less than F; else return N1.

Let's evaluate the non-uniformity of an integer when the returned value is N2. It is obtained with probability 1/R' (accepting uniformity here to simplify the discussion) times F/2**28. Let e be the absolute value of F/2**28 - R'/N; this e is, if I am correct, at most 1/2**29. Thus the probability 1/R' times F/2**28 is 1/N plus an error of absolute value at most e/R'. The relative error compared to 1/N is Ne/R'. But R' has been constructed to be at least 2**14 and N is at most 2**28, so we get 2**14 e, which is bounded above by 1/2**15. In fact there was non-uniformity that we neglected in 1/R', but as R' is less than 2**15 that non-uniformity was at most of the order of 2**15/2**28. Altogether we get relative non-uniformity at most a bit more than 1/2**13. Perhaps by looking more closely we can improve a bit and get closer to 1/2**14. Hopefully I made no mistake.

The non-uniformity of an integer obtained via the N1 road: it is chosen with probability 1/(Q*2**14) times 1 - F/2**28. Again F/2**28 = R'/N ± e, and 1 - R'/N = Q*2**14/N, so we get 1/N with an error e/(Q*2**14); the ratio, N times e/(Q*2**14), is I think at most 3e from our construction N = Q*2**14 + R' (Q not zero). Thus this is very small, and what counts is the non-uniformity we neglected so far when obtaining N1 = U * 2**14 + V, and that looks under control.

The whole thing should be tightened up but maybe a viable way.
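The steps above can be sketched as follows (Python; `random.randrange` stands in for \uniformdeviate, `rand_int` is a hypothetical name, and the rounding used for F is an assumption about the \numexpr step):

```python
import random

TWO14, TWO28 = 1 << 14, 1 << 28

def uniformdeviate(n):
    # Stand-in for the engine primitive \uniformdeviate n
    return random.randrange(n)

def rand_int(N):
    # Sketch of the algorithm above, for 1 <= N < 2**28.
    if N < 2 * TWO14:                 # Q = 0: use the primitive directly
        return uniformdeviate(N)
    Q, R = (N >> 14) - 1, N % TWO14   # N = (Q+1)*2**14 + R, R < 2**14
    Rp = TWO14 + R                    # size of the upper block
    # F/2**28 approximates Rp/N to within about 2**-29 (the \numexpr step)
    F = (TWO28 * Rp + N // 2) // N
    # Decide the block first, then draw within it:
    if uniformdeviate(TWO28) < F:
        return Q * TWO14 + uniformdeviate(Rp)               # upper 2**14+R values
    return uniformdeviate(Q) * TWO14 + uniformdeviate(TWO14)  # lower Q*2**14 values
```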

@jfbu jfbu commented on c83d05b May 2, 2018

In the implementation, of course, we first decide whether it will return N1 or N2 (one call to \uniformdeviate) and only then do we compute N1 (2 calls) or N2 (one call).

Thus the doc could say: "for N less than 2**15, the engine primitive is used with no modification; for N >= 2**15, expl3 code takes extra steps to achieve a uniform distribution up to a relative error bounded above by about 1/2**13, which is better than the N/2**28 that would have resulted from returning the engine value directly."

In all of the above I restricted myself mentally to N < 2**28; the argument must be checked for higher values. I don't know what maximum expl3 authorizes.

@jfbu jfbu commented on c83d05b May 2, 2018

@blefloch Here is another idea, but it recycles the same trick: we need to approximate a frequency closely. I think you mentioned something about log(N) somewhere, so perhaps you have already considered this; apologies if this is repetition.

Goal: draw a uniform integer < N with small non-uniformity.

Initialize: A = 0.

  1. If N = 1 stop: return value of A. Else compute M = N//2 and if N is even go to 2. else go to 3.

  2. do A <- A+M if \uniformdeviate 2 returns 0, else don't change A. Replace N by its half M and go back to 1.

  3. Define M' = M+1 and compute rounding F = \the\numexpr 2**28*M'/N\relax. This happens with no overflow. The exact f = M'/N (which is at least 1/2) differs from F/2**28 by relative error at most 1/2**28. Do \uniformdeviate 2**28,

  • if the result is >= F do A <- A+M' (M' = M+1 is the value slightly above half of N). Replace N by M and go to 1.

  • if the result was < F, do not modify A but replace N by M' and go to 1.

In about log_2(N) steps the algorithm concludes (if N is even we consume one binary digit; if N is odd this is also true, except in the sole case N = 2**K - 1: then we replace N by (N+1)/2 = 2**(K-1), but then we have a power of 2 and conclude in K-1 extra steps). Each step introduces a relative non-uniformity of 1 part in 2**28. This means we cover the whole range of TeX integers with something with at most 30/2**28 relative non-uniformity.

Rather than making possibly 30 calls to `\pdfuniformdeviate 2`, we can perhaps do only one to `\pdfuniformdeviate 2**28` and extract bits (2 calls extra will be needed for `N>= 2**28`; so say either one or two calls to `\pdfuniformdeviate 2**15` depending on size of `N`). However there is the cost of division by 2 then both for that "oracle" and for the evolving `N`. Furthermore we have the problem of the low bits (this is very confusing this business of low bits).

Sorry, the last paragraph was nonsense. The point of the manoeuvre is that we do \uniformdeviate 2**28 in cases when N is odd; only when N is even can we do \uniformdeviate 2.
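The halving algorithm can be sketched as follows (Python; `random.randrange` stands in for \uniformdeviate, `rand_below` is a hypothetical name, and the rounding of F approximates the \numexpr step):

```python
import random

TWO28 = 1 << 28

def uniformdeviate(n):
    # Stand-in for the engine primitive \uniformdeviate n
    return random.randrange(n)

def rand_below(N):
    # About log2(N) steps; each odd-N step adds at most ~2**-28
    # relative non-uniformity.
    A = 0
    while N > 1:
        M = N // 2
        if N % 2 == 0:                       # split [0,N) into two halves
            if uniformdeviate(2) == 0:
                A += M
            N = M
        else:                                # N odd: blocks of sizes M+1 and M
            Mp = M + 1
            F = (TWO28 * Mp + N // 2) // N   # F/2**28 ~ Mp/N (the \numexpr)
            if uniformdeviate(TWO28) >= F:   # probability ~ M/N: upper block
                A += Mp
                N = M
            else:                            # probability ~ Mp/N: lower block
                N = Mp
    return A
```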

@jfbu jfbu commented on c83d05b May 2, 2018

I corrected the last paragraph of the previous comment. Possibly expl3 could provide two randint functions, especially for N<=2**14: the "fast" one, which is simply the engine primitive, and the "better" one, which obtains uniformity with error a constant times 2**-28, but at a cost in execution speed?

@jfbu jfbu commented on c83d05b May 2, 2018

A variant, possibly faster: still the naive idea that to get a random integer in a range of size N which is split into one part of size K and one of size L, we need good approximations of K/N (or L/N), which means we must somehow use not-too-small values of either K or L. This is because we will do e.g. \numexpr\c_@@_rand_size_int*L/N\relax, which is allowed by \numexpr "scaling", and which estimates L/N the more precisely (in relative terms) the larger L is. If we arrange for K to be a power of 2, then a single \uniformdeviate will terminate the procedure in about K/N of the cases.

So the variant here is N = K + L, K a power of two, and L not too small in proportion to N. If we take K = 2**k with 2*K <= N < 4*K, we have N = K + L, K <= L < 3*K, and we get 1/2 <= L/N < 3/4. So in the worst scenario we have a 25% chance that the next step will be with K, hence concluded with one \uniformdeviate, and a 75% chance to continue with L, which is bounded by 3N/4. If I did not make a mistake, I evaluate that worst-case scenarios lead to at most 4 steps on average until reduction to a power-of-2 range. Hence at worst 4 \numexpr + \uniformdeviate and one final \uniformdeviate. The 4 steps each bring at worst 1/2**28 relative non-uniformity. But if I understand myself correctly this tells only the average non-uniformity, not the maximal one. The maximal one should be log(N)/log(4/3) times 1/2**28, which brings us to around 100 times 1/2**28. The achieved guaranteed uniformity is thus less good than in the previous method, but apparently with only, on average, 4 \numexpr and 5 \uniformdeviate. This could be interesting.

@jfbu jfbu commented on c83d05b May 3, 2018

@blefloch Hi again. Sorry for my never-ending comments, but I have given some extra thought to the "non-uniformity".

Imagine we use \uniformdeviate N with R repetitions. Can we detect, for a given value 0 <= i < N, that its probability p_i diverges from 1/N? At each draw of the random X, the event X = i is a Bernoulli trial with parameter p_i and standard deviation sqrt(p_i q_i), with q_i = 1 - p_i \approx 1. Typically the number of occurrences O_i will be the expected one E_i = R p_i plus or minus something of size about sqrt(R p_i q_i). So the observed frequency will differ from p_i by an absolute error of about sqrt(p_i q_i / R). On the other hand we know that p_i can differ from 1/N by an absolute error of about 1/2**28. To detect that empirically, one is tempted to say that sqrt(p_i q_i / R) (which is \approx sqrt(1/(RN))) must not be too big compared to it; this means RN must not be negligible with respect to 2**56.

For N less than 2**15, this means R not too small compared to 2**41. This is a very big number of draws in a LaTeX run... now my reasoning might be crooked because I am focusing on checking non-uniformity for one value among the outcomes.

So I have started running some global statistics with the help of a Python snippet doing the same as pdftex.web code (see links below). And for example I get this:

>>> test_uniform(12345, 37957, 300000000)
37658.08519214392
Power_divergenceResult(statistic=37658.085192173334, pvalue=0.8603065208066698)

where I do a χ2 test of 300,000,000 draws of \uniformdeviate 37957 with seed 12345. It says that there is an 86% chance that such a χ2 or higher can arise from an exactly uniform variable with the 37957 values 0, ..., 37956. Thus this statistic cannot detect non-uniformity, and we already did 3e8 draws, which is not a small number of draws! (My Python takes a while to complete... perhaps not much faster than doing it in TeX actually.)

If we use \uniformdeviate N with N = 3*2**26, then the probability of occurrence of i depends on i mod 6 and is twice as big for i mod 6 = 2, 5 as for 0, 1, 3, 4. Nevertheless the above reasoning, if true, says that to detect this non-uniformity we need on the order of 2**56/(3*2**26) = 2**30/3 trials. This is not so stupid-looking, because for 3*2**26 possible outcomes it is reasonable that non-uniformity cannot be detected from fewer draws than some reasonable multiple (here 16/9) of the number of possible outcomes.
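The mod-6 claim can be checked exactly by enumeration on a scaled-down analogue with the same ratio N/max_rand = 3/4 (a sketch assuming the engine rescales a raw draw with rounding to nearest; with truncation instead, the doubled residues shift but the same 1:1:2 pattern persists):

```python
# Exact enumeration on a scaled-down analogue of N = 3*2**26 vs 2**28:
# same ratio N/max_rand = 3/4, so the same residue pattern appears.
max_rand = 2**8
N = 3 * 2**6

counts = [0] * N
for r in range(max_rand):
    v = (N * r + max_rand // 2) // max_rand  # scaling with rounding
    counts[v] += 1

# residues v mod 3 == 2 (i.e. v mod 6 in {2, 5}) occur twice as often
assert all(counts[v] == (2 if v % 3 == 2 else 1) for v in range(N))
```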

Fwiw, I Pythonified pdftex.web code here (see also this tex file, which uses \pdfuniformdeviate to check Python produces same random ints). To verify my reasoning one could use this code with 2**28 replaced say by 2**8 and test that non-uniformity then shows up only when we do of the order of 2**16/N (or a bit less) random draws at least, for any N less than 2**8. I will check that later, but will refrain from adding here to my never-ending reports, except if it turns out that I can not confirm my expectations.

The general conclusion is that for N < 2**28 non-uniformity does not appear to be a real problem because for it to show really significantly the number of trials R must not be too small compared to 2**56/N.

@jfbu jfbu commented on c83d05b May 3, 2018

(Can't stop.) Still focusing on N < 2**28. I gave the example with N = 3 * 2**26 where the probabilities are in fact periodic of period 3, with values in proportion 1/4:1/4:1/2, thus clearly non-uniform if we do random(N) mod 3. For any N = M times 2**k (M odd), if one does random(N) and then reduces modulo M, the non-uniformity is worse by a factor 2**k than doing random(M) directly. This looks to me like the main limitation, which should be documented or worked around. One option would be to always produce random(N) by random(M)*2**k + random(2**k) or random(2**k)*M + random(M) (I don't know offhand which is best). If there is one corrective measure to take in priority, I think that would be it: "correct" \uniformdeviate N if N is neither odd nor a power of 2.
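The proposed correction could be sketched as follows (a hypothetical helper, not existing expl3 code; `random.randrange` stands in for an ideal `\uniformdeviate`):

```python
import random

def corrected_rand(n):
    """For n = m * 2**k with m odd, combine random(m) and random(2**k)
    so the mod-m non-uniformity of random(n) is not amplified by 2**k."""
    m, k = n, 0
    while m % 2 == 0:
        m //= 2
        k += 1
    if k == 0 or m == 1:
        # n odd, or a power of 2: no correction needed
        return random.randrange(n)
    return random.randrange(m) * 2**k + random.randrange(2**k)
```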

@blefloch blefloch commented on c83d05b May 4, 2018

@jfbu jfbu commented on c83d05b May 4, 2018

I agree with 99% of it, but look at R = 3*2**26 in your scheme:

S = \uniformdeviate R
X = \uniformdeviate 16384
R1 = (R-8192)/16384
R2 = R - 16384*R1
result = min + R1*X + (R2*X+S)/16384
if result > max: result = min

If I understand correctly you are laying out mentally 16384 copies of range R. One such interval is picked at random (this step is uniform) and in the interval one uses range R (non-uniform). Then one divides by 16384 with rounding.

The relative probabilities before division by 16384 obey the sequence 1:1:2:1:1:2:... etc.

The value 1 will be obtained from 8192 to 8192+16383. As 8192 is 2 modulo 3, the pattern is 2:1:1 repeated 5461 times and then 2, for a total weight of 5461*4+2 = 21846. The value 2 will be obtained from 8192+16384 to 8192+16384+16383. We start with 1:1:2 and end with 1. The weight is 21845. The next one starts with 1:2:1 and ends with 1. Weight again 21845. The next one will again have weight 21846. So if we now take this modulo 3 we have non-uniformity (starting at 0): 21845:21846:21845. This is better than the 1:1:2 from doing (\uniformdeviate 3*2**26) mod 3, but it is there. So the construct does not give uniformity (if I am not mistaken).

@jfbu jfbu commented on c83d05b May 4, 2018

A. The requirement of a fixed number of calls to \uniformdeviate appears to be a very strong constraint: if you manage to make it in 4 calls, you cannot improve in the future if the improvement requires 5 calls.

B. I would drop the requirement for exact uniformity, because it is already an article of faith that it is achieved modulo 2**28, or at least we don't know to what extent it is achieved. Placing myself in this perspective!

  • P = 268,435,399 is the largest prime < 2**28 (it is the 14630843th prime https://primes.utm.edu/nthprime/index.php)

  • it is efficient to get a "uniform" integer in range(268435399): draw one with maxrand and redraw if it is among the rare 268435399 <= x < 268435456

  • but the previous step cannot be rigorously bounded above in terms of the number of \uniformdeviate calls! Anyway, almost all the time it will be 1.

  • now, to get a "uniform" one in range(N) under the constraint N < P, use round(N * rand(268435399) / 268435399), which can be done in one single \numexpr "scaling" operation (followed by the remapping of the top value to 0)

  • the idea is that the problems we have with 3*2**26 vs 2**28 have (a priori) no analogue with P, because P is prime.

  • Let's say we do the above only if N <= 100000 for example, because this removes dependency on low bits hence improves the correlation problem (but I am in the complete dark about what exactly happens for high bits; I see no obvious way to show correlation of the type we found for low bits)

  • For N > 100000 we could proceed in a way I described in a previous comment, which improves the uniformity. But later I pointed out in fact I believed the higher non-uniformity for higher N's was not a real problem because doing even 0.01 (2**56)/N draws is a lot!

For me the more serious problem is the one with \uniformdeviate N when N is not a power of 2 but is divisible by some large power of 2. The procedure above using P is designed to get rid of it, but it costs a \numexpr.

Underlying all of this is that I am happy with the non-uniformity inherent in the "re-scaling" method to go from P to N as I am with the one from 2**28 to N.

Then you also want random floats but that's another matter. Going via uniform float to get uniform integer is like replacing 2 by 10 and will still have the problems with numbers such as 3*10**b, b>0.
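Putting the bullets together, a sketch of the P-based recipe (P as stated above; `random.randrange` stands in for `\uniformdeviate`, and the rejection loop is what makes the number of draws unbounded, though it almost always runs once):

```python
import random

P = 268435399     # stated above to be the largest prime below 2**28
MAXRAND = 2**28

def rand_mod_p():
    """Rejection sampling: redraw the rare values P <= x < 2**28."""
    while True:
        x = random.randrange(MAXRAND)  # stands in for \uniformdeviate
        if x < P:
            return x

def uniform_via_p(n):
    """round(n * rand(P) / P) in one scaling step; top value remapped to 0."""
    assert n < P
    r = (n * rand_mod_p() + P // 2) // P
    return 0 if r == n else r
```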

@blefloch blefloch commented on c83d05b May 4, 2018

@jfbu jfbu commented on c83d05b May 4, 2018

So far I had carefully neglected the "big" ones... wait, R >= 2**31? I am a bit tired now, but can this be possible at all? Ah yes, I see the problem of the range. Who would ever worry about that ;-).
How about this arrangement:

if R2 is at most 7

R2 * X1Y1 + div(R1R0 * X1Y1, 2 ** 28) + div(div(R2R1R0 * Y0X0, 2**30), 2**26)

if R2 is from 8 to 15, S2 = R2 - 8

"min + R2 * X1Y1" + div(R1R0 * X1Y1, 2 ** 28) + div(div(S2R1R0 * Y0X0, 2**30), 2**26) + div(8*Y0X0, 2**28) 

but the final result can probably overflow ("over-big" ranges were not my perspective, so my brain is not ready), even if combined with +min. There are probably typos above.

expl3 should provide only randrange(a,b), which does not include the upper limit b!

@blefloch blefloch commented on c83d05b May 5, 2018

@jfbu jfbu commented on c83d05b May 5, 2018

As I now need to think before and not while posting I will ponder later your P=268435399 and N=88783.

Just for the record, I wish to summarize to myself here that the bigger constraint is to get a scheme which is efficient in e-TeX.

As your proposal with R0R1R2 etc. already shows, we could in principle simply do this: if we trust that the pdfTeX RNG for 28-bit integers is uniform enough and uncorrelated enough across successive trials, then we just do round(N times 0.XY) where X, Y are base-2**28 digits. This will give relative non-uniformity of order N/2**56. Even if N is divisible by a high power of two, this is not a problem: as we limit N < 2**32, the relative non-uniformity is at most 1/2**24, and any "modulo" or generalized such scheme cannot make this worse when reducing to a smaller range: in the worst case, a given value in a smaller size-N range will inherit the non-uniformity prevailing at \approx 2**32, i.e. 1/2**24. Thus, there is no "real" problem except:

  1. we don't know quantitatively the quality of the Knuth RNG. We don't trust the low bits: not that we doubt their randomness, but simply because the space of seeds is smaller than what we would have thought naively.

  2. we must do this efficiently in e-TeX.

The first item can be addressed, with some waste, by only using \uniformdeviate 16384, so we get base-2**14 digits and use them. Except that the "ent" software indicated in my testing that serial correlation was a bit higher for the 14 high bits than for the 14 low bits...

Item 2 has the issue that if our recipe does not give in all cases the exact theoretical result, then the non-uniformity analysis is more complicated. Thus we need to achieve exactly round(N * 0.XY) and work around overflows. If the hierarchy had not decided that N as big as 2**32-1 was to be acceptable, that task would be easier.

(I think this is all paraphrasing things tacit in your R0R1R2 etc. ... just to convince myself the issue is one of TeXing, so it gives me an excuse to wait for you to solve it ;-). Also, none of the above would be really relevant if the goal of exact uniformity had been maintained; as you recalled, that can't be done in a fixed number of \uniformdeviate calls, since a universe of cardinality a power of 2 cannot be divided evenly into 3 parts.)
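As a reference point, the round(N times 0.XY) scheme is trivial in exact integer arithmetic; the whole difficulty noted in item 2 is reproducing it in e-TeX without overflow. A Python sketch (the wrap of the top value to 0 is my assumption, mirroring the earlier P-based recipe):

```python
import random

D = 2**28  # base of the digits X and Y

def two_digit_rand(n):
    """round(n * 0.XY) with X, Y two base-2**28 digits: relative
    non-uniformity of order n/2**56, even for n up to 2**32 - 1."""
    x = random.randrange(D)  # stands in for \uniformdeviate 2**28
    y = random.randrange(D)
    r = (n * (x * D + y) + D * D // 2) // (D * D)  # round to nearest
    return 0 if r == n else r  # wrap the top value to 0
```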

@jfbu jfbu commented on c83d05b May 5, 2018

this is not a problem as we limit N < 2**32, the relative non-uniformity is at most 1/2**24, and any "modulo" or generalized such scheme cannot make this worse when reducing to a smaller range: in the worst case a given value in a smaller size-N range will inherit the non-uniformity prevailing at \approx 2**32, i.e. 1/2**24.

by reducing to smaller range I mean any scheme going from N1 to N2 where N2 divides N1 and which would theoretically map uniform distribution at level N1 to uniform distribution at level N2.

@jfbu jfbu commented on c83d05b May 5, 2018

Sorry, in my previous post the restriction that N2 divides N1 is not needed. I consider any scheme of the following type: I have a set with N1 elements; I partition it into N2 sub-parts of the same cardinality Q, plus a remainder. Then I produce a random in range N2 from one in range N1 by repeating trials until I fall into one of the indexed sub-parts, and return the index. Then the relative non-uniformity for the range-N2 variable is at worst the one prevailing at range N1. We don't even need Q to be the Euclidean quotient of N1 by N2. For example we can take Q = 1, meaning we have chosen some N2 outcomes among N1 and just wait until we hit one of them.
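The reduction scheme just described, as a sketch (hypothetical code; `random.randrange` stands in for the range-N1 variable):

```python
import random

def reduce_range(n1, n2, q=None):
    """Map a range-n1 uniform draw down to range n2: partition [0, n2*q)
    into n2 blocks of size q and retry on the remainder [n2*q, n1)."""
    if q is None:
        q = n1 // n2  # q need not be the Euclidean quotient; q = 1 works too
    while True:
        x = random.randrange(n1)  # stands in for the range-n1 variable
        if x < n2 * q:
            return x // q
```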

However, if we now authorize ourselves to produce N2 from some such recipe involving a pair of randoms in range N1, we worsen the non-uniformity by a factor 2. (I.e., we consider here N2 subsets of range(N1) x range(N1) of equal cardinality.)

Regarding your observations about P and N, I guess it tells us a concrete scheme to go from N1 = N to N2 = 2, and it makes explicit how to effectively transfer non-uniformity at level N1 to non-uniformity at level 2.

If with K outcomes we have relative non-uniformity e, we expect about K/e^2 draws to exhibit clearly that one possible outcome is more likely, for example. The number of trials needed to exhibit non-uniformity is thus proportional to the number of outcomes. However, imagine K is even, and that the first K/2 outcomes each have frequency (1+e)/K and the next K/2 each have (1-e)/K. Then clearly I can create a range-2 random variable with probabilities 1+e:1-e, and in 1/e^2 trials I will discover there is something fishy. This reasoning shows that whatever the initial K, we must achieve that 1/e^2 is big with respect to the number of draws we allow ourselves.

The P-scheme does not change the fact that somewhere we have e near 1: without looking into the details (maybe things are not as bad), we thus must reject it. We must get an e on the 2**32 range which is such that 1/e^2 is big. I would say 10**9 is already big for LaTeX, thus achieving 1/e^2 = 10**16 is amply enough. That means achieving e of the order of 10**-8. This supports your 2**-24 \approx 6e-8. Adopting e = 2**-24 we get 1/e^2 = 2**48 > 2e14. The clock frequency of my computer is 2.8 GHz, so if I am not wrong this means 2.8e9 cycles per second; the ratio 2**48/2.8e9 \approx 100526 seconds, hence almost 28 hours. I think we are safe at that level. Maybe you will know better how many "clock beats" one \uniformdeviate needs, which will give a factor by which to multiply these 28 hours and perhaps lower our requirement 2**-24.

(by square root of that factor)
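A quick check of the arithmetic above (the one-draw-per-clock-cycle assumption is the same optimistic one as in the comment):

```python
e = 2**-24                # adopted relative non-uniformity
trials = 1 / e**2         # about 2**48 draws needed to expose bias e
clock_hz = 2.8e9          # 2.8 GHz clock, one draw per cycle (optimistic)
seconds = trials / clock_hz
hours = seconds / 3600    # close to 28 hours, as stated above
```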

@blefloch blefloch commented on c83d05b May 5, 2018

@jfbu jfbu commented on c83d05b May 5, 2018

@blefloch about

One parameter that I have been ignoring somewhat is that \uniformdeviate itself is certainly not exactly uniform. Do you know how non-uniform it is? You seemed to be saying that the last bit is much more non-uniform than that, for instance, with detectably more 1 than 0.

As I don't know I will give instead some background information as memo that I got from looking in volume 2 of Knuth TAOCP:

The underlying recurrence is x(n) = x(n-55) - x(n-24), and it is for this one that Knuth says (exercise 3.3.2-31) that the following happens: if

  1. we take a random n,
  2. then count how many 1s and 0s there are in the x(n), x(n+1), ... , x(n+78),

then there is more than a 51% chance that we found more 1s than 0s.

But this does not apply directly to our true sequence y(n) which is related to x(n) by this formula:

y(55*k + j) = x(55*k + 54 -j)

Here y(1) is the returned value after having set the seed. It is thus x(53), because the array index is initially 54 and morally indicates the last returned value. So the first returned one corresponds to x(53), then x(52), ..., x(0), and then a new batch is created: x(109) is returned, then downwards to x(55), and then it will be x(164), etc.

At the level of the y(n) the recurrence is this

if n mod 55 is < 31, y(n) = y(n-55) - y(n+24)

if n mod 55 is >= 31, y(n) = y(n-55) - y(n-86)

we can express all in terms of past

if n mod 55  < 7, y(n) = y(n-7) - y(n-31) - y(n-38) + y(n-55)

elif n mod 55 < 31, y(n) = -y(n-31) + y(n-55) + y(n-62)

else y(n) = y(n-55) - y(n-86)

Python:

# init_randoms and unif_rand below are from the Python port of the
# pdftex.web RNG linked above (not reproduced here)

def TEST(seed, R):
    print(seed)

    max_rand = 2**28

    init_randoms(seed)

    L = [0]  # when we start, j_random is 54, which means morally
    # that an array item is already consumed; the first random will be
    # the one at randoms[53], so we put a dummy value in the 0th
    # position of L; the first random will be put in position 1, etc.
    for k in range(1, R+1):
        L.append(unif_rand(max_rand))

    # This is the recursion using only the past.
    # We check that Z is zero modulo 2**28, always.
    for n in range(110, R+1-24):
        u = n % 55
        if u < 7:
            Z = L[n] - (L[n-7] - L[n-31] - L[n-38] + L[n-55])
        elif u < 31:
            Z = L[n] - (-L[n-31] + L[n-55] + L[n-62])
        else:
            Z = L[n] - (L[n-55] - L[n-86])

        # never happens...
        if Z % max_rand:
            print("error", n)
            return None

If I am correct the above recurrences are obeyed exactly by successive returned values of \uniformdeviate 2**28, with n=1 corresponding to immediately after setting the seed.

The switch from the x's to the y's seems to perturb the 79 observation. I have not reproduced it in my brief testing (and I forgot to test the x's so far).

At the level of the x's, as far as I understand, the (x(n), x(n+1), ..., x(n+54)) mod 2 are never all zeros, and all 2**55 - 1 nonzero bit patterns are obtained exactly once along a period of 2**55 - 1. And modulo 2**28 it seems, from TAOCP Vol. II, exercise 3.2.2-30, that the period is 2**27*(2**55-1). This happens for any given seed.

Here is the quote (page 570); I change the notations to mine above:

Notes: This exercise is based on the discovery by Vattulainen, Ala-Nissila, and Kankaala [Physical Review Letters 73 (1994), 2513-2516] that a lagged Fibonacci generator fails a more complicated two-dimensional random walk test. Notice that the sequence x(2n), x(2n+2), ... will fail the test too, because it satisfies the same recurrence. The bias toward 1s also carries over into the subsequence consisting of the even-valued elements generated by x(n) = (x(n-55) ± x(n-24)) mod 2^e; we tend to have more occurrences of (... 10) than (... 00) in binary notation. There's nothing magic about the number 79 in this test; experiments show that a significant bias towards a majority of 1s is present also in random walks of length 101 or 1001 or 10001. But a formal proof seems to be difficult.

(Sometimes I find Knuth's wording a bit hard to understand; when he says random walks of length 101, he seems to say: get a run of 101 starting at x(n), count 1s and 0s, and if more 1s move to the right, else move to the left; then go to x(n+1) and repeat. At least that's my understanding.)

@jfbu jfbu commented on c83d05b May 5, 2018

@blefloch Ok, I found a bias in the parity bits. I initially was thinking of using batches of 55*79 runs, but it turns out that it is flagrant with batches of 165 random integers.

Procedure: do \pdfsetrandomseed, then 54 times \pdfuniformdeviate, then count whether there are more odd than even integers in stretches of 2N+1 at a time. For certain lengths 2N+1 one discovers that there is a definite bias; for example with 2N+1 = 165, there is a 53.5%--54% chance that there are more 1s than 0s in the parity bits, in the first few thousands of such batches of 2N+1 integers (I don't know what happens later on).

>>> test_walk(0, 10000, 165)
[...]
{0: 4655, 1: 5345}
observed probability of more 1s after 10000 batches of 165 is 0.5345
Power_divergenceResult(statistic=47.61, pvalue=5.200253931276336e-12)

>>> test_walk(1, 10000, 165)
[...]
{0: 4603, 1: 5397}
observed probability of more 1s after 10000 batches of 165 is 0.5397
Power_divergenceResult(statistic=63.0436, pvalue=2.021812540152667e-15)

But no bias with batches of 2N+1= 55 draws at a time:

>>> test_walk(0, 30000, 55)
[...]
{0: 14932, 1: 15068}
observed probability of more 1s after 30000 batches of 55 is 0.5022666666666666
Power_divergenceResult(statistic=0.6165333333333334, pvalue=0.43233844574370783)

>>> test_walk(1, 30000, 55)
[...]
{0: 15079, 1: 14921}
observed probability of more 1s after 30000 batches of 55 is 0.4973666666666667
Power_divergenceResult(statistic=0.8321333333333333, pvalue=0.36165637797402184)

The Python code is here.

It does work exactly the same with the real thing in TeX!

ADDITIONS

More experiments on the parity bits, with an arbitrary initial shift of 1 to 55 calls to \uniformdeviate, indicate there is always a preponderance of 1s in successive batches of 165 draws, except if the initial shift is 23; then there is an abnormal (more than 6 sigma for 100,000 batches of 165 randints, for both seeds) preponderance of 0s. See https://gist.github.com/jfbu

For batches of 55 at a time, no abnormality is observed, except that I just now see in
https://gist.github.com/jfbu/cc50b862123a742fbd2fdb094caab5c2#file-rngwalkwithshift-out-txt-L241-L242
a big preponderance of odd integers in 30000 successive batches of 55 randints, for the initial shift 23.

(shift=23, seed=0) 0:13381, 1:16619
(shift=23, seed=1) 0:13384, 1:16616

Also the high bit

I tried seeds 0, 1, 8035872, 10427991, 149847074 and 256844633, and batches of 165 \pdfuniformdeviate 2. It seems again there is a preponderance of 1s versus 0s for all seeds tested, apart perhaps from 10427991.

@jfbu jfbu commented on c83d05b May 6, 2018

It looks hard (I am probably lazy) to bound with complete certainty the non-uniformity of R2R1R0 times 0.X1Y1Y0X0 if the implementation does not obtain the mathematically exact rounding.

No problem with base-10**8 digits RS times XY with a hard-coded optimized multiplication, knowing it will round modulo 10**16. The relative non-uniformity 2**32/10**16 is about 7 times 2**-24, so it reduces our safety margin by a factor of about 50, but that still looks OK: with e = 2**32/10**16, e^{-2} times (200 CPU cycles per \uniformdeviate) divided by 10 GHz \approx 108420 s \approx 30 hours.

Con: there is no way to do this in a fixed number of \uniformdeviate calls.

@blefloch blefloch commented on c83d05b May 6, 2018

@jfbu jfbu commented on c83d05b May 7, 2018

@blefloch

  • Thanks a lot for the proposal. I think the expl3 documentation is perennial but gists are volatile, thus I am perfectly happy with the current docs, no need to refer to me or to my partial explorations... I checked the docs now and only made one comment about your non-uniformity example modulo 3, but that's a very minor remark.

  • I agree that the runs of 55 etc. numbers are about detecting correlations. It is funny that runs of 55 would not show clear deviations from independence (on the low bit) but runs of 165 do... and as we know the underlying iteration is on a 55-item array, detecting correlations with circa 150 successive draws is not that surprising perhaps.

  • Having said that about not being surprised, the high bit is a mystery to me (gist added this morning). I tested only 5 or 6 seeds, so this probably tells nothing, but comparing to Python's randrange(), the behaviour seems really different. By the way, the simpler pdftex RNG is much faster than Python's randrange(): my Python implementation of the pdftex RNG runs faster than the built-in (presumably in C) Python "Mersenne twister", whatever that is.

  • So far, I have not looked at your implementation. I agree that neglecting things like R_0 X_1 would make the theoretical analysis of non-uniformity quite complicated, so I am glad you found an efficient way to incorporate it.

@car222222 car222222 commented on c83d05b May 7, 2018

See https://en.m.wikipedia.org/wiki/Mersenne_Twister, but it is well out of the class of PRNGs that Bruno wishes to use. It is lauded thus: "It was the first PRNG to provide fast generation of high-quality pseudorandom integers" and "The Mersenne Twister is the default PRNG in [very many] software systems"; see the article for the list.

@blefloch blefloch commented on c83d05b May 7, 2018

@jfbu jfbu commented on c83d05b May 7, 2018

Bruno's major bugs are my everyday tiny oversights ;-). I agree with the theoretical analysis and formulas for the R2R1R0 times 0.X1Y1Y0X0 approach. No small feat to get this all working up to 2**32-1.

Thanks Chris for the link to the wiki page. Interestingly, when testing my batches for predominance of 1s with randrange(2), short batches (say with 11 items) gave a somewhat strange result with the Mersenne Twister: they looked less random than they should have been. But my experiments are limited to a few tries... and I am no statistician.

@car222222 car222222 commented on c83d05b May 7, 2018

The documentation does say that it needs careful, complex seeding, or a long run-in period, to become uniform.