Metamath Shortest Possible PM Proofs
If implemented correctly, this code proves that all of the proofs and subproofs in
pmproofs.txt up to length 17 are already as short as possible.
The original goal was either to discover some shorter proofs than those in
pmproofs.txt, or to prove that those are the shortest:
"Open problem: Are these the shortest proofs?"
I started by downloading the
Since most of my tools were to be written in Python 3, which has a JSON parser in its standard library, I wrote a script that converts
pmproofs.txt to JSON:
This script uses a modified version of the
drule.c program as mentioned in
The modifications that I made are captured in a patch, and the ultimate output of that patch is a program called
druler, compiled from
druler.c which is based on the patch:
pmparser.py are in place, the
pmparser.py script will produce the following JSON output:
Next I ported the proof-checking subset of
drule.c to under fifty lines of Python 3, slightly changing the D-Rule algorithm in the process:
To ensure that
dproof.py works correctly I wrote a script that compares its output with all of the proofs in
pmproofs.json, which are deemed to be correct since they in turn had been generated by
druler which is based on
dproof.py was likely to be correct I could have used it to check proofs en masse, but doing so would have been unacceptably slow. This is because
dproof.py, as well as being written in slow Python 3 rather than fast C, always evaluates all subproofs independently. It is a vast improvement instead to memoise the results of checking each subproof.
To memoise the subproofs they could, for example, be stored efficiently in a trie. I found some Googleware called pygtrie and downloaded it as a dependency:
Then I converted
dproof.py to use
tries.py for memoisation, calling the resulting script
dtrie.py take proofs in reverse polish notation rather than the original polish notation of
dtrie.py script has a number of modes for reading, caching, and writing previously checked subproofs, which makes checking proofs en masse much more efficient. Even though I wrote a faster C version of
druler, which is currently unpublished, based on the modified
dproof.py algorithm, for quicker development I decided to use a trie in Python 3 rather than writing the equivalent in C.
To check all possible proofs exhaustively, I first needed to generate all possible proofs, which is accomplished by the script called
You call it with an odd number as an argument, and it generates proofs, in reverse polish notation for compatibility with
dtrie.py, from length 3 to the argument number inclusive. The number of such terms it generates is proportional to the Catalan sequence, so the output starts to become voluminous very quickly, generating about 50MB for terms up to and including length 15, and I think around 500MB for length 17.
It then became possible to feed the proofs that
trees.py generates to
dtrie.py for mass checking, writing to a database to record the results. To check all proofs up to length 15 for example, we can do:
python3 trees.py 15 | python3 dtrie.py cw proofs15.txt.gz -
This gives 233,677 valid proofs out of the 3,137,841 possible proof combinations between length 3 and 15 inclusive. Both the proofs and the terms proven are recorded in the output database,
proofs15.txt.gz, which is in the repository and available for download.
Next I wanted to check whether any of these 233,677 valid proofs are shorter than those already known in
pmproofs.txt. I wanted to do this not only for the top level proofs in that file, but also for any subproofs just in case we could improve on a subproof of the overall proof, which would allow us to shorten the overall proof too.
In order to check subproofs it is necessary to extract all subproofs first, and to do this I wrote a script called
You give it a reverse polish proof as its input argument, and it will print out all of the subproofs of that proof. This is a generic script, so I wrote another script which wraps
subtrees.py to extract all of the subproofs from the
pmproofs.json file specifically:
Its output is recorded in a file called
Now we can check whether any of the proofs in
proofs15.txt.gz or any other database that we generate are shorter for the equivalent terms than any of those in
pmsubtrees.txt, which would give us the shorter proof that we're trying to discover. The script that does this search is called
Since it does not discover any shorter proofs in
proofs15.txt.gz, this proves that all of the subproofs in
pmproofs.txt up to length 15 inclusive are as short as possible.
proofs15.txt.gz takes just under 5 minutes on my computer. Using the following commands, which take
proofs15.txt.gz as a basis for memoisation, I then checked whether any proof up to length 17 inclusive was shorter than any of the subproofs in
./trees.py 17 | ./dtrie.py rw proofs15.txt.gz - > proofs17.txt ./pmsearch.py pmsubtrees.txt proofs17.txt | grep DISCOVERY
Out of 31,284,531 generated proofs to check, 1,591,746 are valid, but since
pmsearch.py still failed to discover any shorter proofs, all of the subproofs in
pmproofs.txt up to length 17 inclusive are proven to be as short as possible if this code is correct. Generating the proofs for checking took about 45 minutes on my computer, and the output database is about 5MB even when compressed so I have not included it in the repository for download.
Though it would be possible to exhaustively check terms up to and including length 19, further optimisations could be made. Instead of generating every syntactically valid proof and then checking all of them for semantic validity with
trees.py could instead contain a filtering mechanism based on those subproofs which have been previously rejected.
In other words, when
dtrie.py checks its output it only records the successful proofs in its output database. If there were a database of all the unsuccessful proofs as well, these unsuccessful proofs could be used in
trees.py and it could then filter out any subproof of what it generates based on the list of unsuccessful proofs. This means that
dtrie.py could do less checking, and the overall process would probably be much more efficient as a result.
There are two sources of prior work that I know of. One is
drule.c, which seems to contain some code for combinatorial tests. The other is the work of Paul Tarau, who generates all trees for the SKI calculus up to a certain depth and then analyses the resulting terms:
Since the SKI calculus is composed of
ax-1 (K) and
ax-2 (S) it is I believe equivalent in finitary terms only to the positive implicational fragment of propositional logic, missing as it does
ax-3 which adds negation. Moreover, I think that Tarau is interested in these combinators more from the infinitary Turing-complete standpoint than their finitary logical counterpart. For these reasons, if I understand them properly, Tarau's work did not directly help in proving or disproving the existence of shortest proofs in
pmproofs.txt, but it was certainly indirect inspiration for the method that I used.
I was disappointed that I did not find any shorter proofs, but I assume that people have done this kind of exhaustive checking before and have either not published their results or I am not aware of them. Despite that, it was instructive for many reasons, resulted in the negative proof up to length 17 inclusive, and most of all was extremely fun.