Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
107 lines (56 sloc) 8.43 KB

Metamath Shortest Possible PM Proofs

If implemented correctly, this code proves that all of the proofs and subproofs in pmproofs.txt up to length 17 are already as short as possible.

The original goal was either to discover some shorter proofs than those in pmproofs.txt, or to prove that those are the shortest:

"Open problem: Are these the shortest proofs?"

http://us.metamath.org/award2003.html

I started by downloading the pmproofs.txt file:

http://us.metamath.org/mmsolitaire/pmproofs.txt

Since most of my tools were to be written in Python 3, which has a JSON parser in its standard library, I wrote a script that converts pmproofs.txt to JSON:

https://github.com/sbp/mmm/blob/master/pmparser.py

This script uses a modified version of the drule.c program as mentioned in pmproofs.txt:

http://us.metamath.org/downloads/drule.c

The modifications that I made are captured in a patch, and the ultimate output of that patch is a program called druler, compiled from druler.c which is based on the patch:

https://github.com/sbp/mmm/blob/master/druler.c.patch

Once druler and pmparser.py are in place, the pmparser.py script will produce the following JSON output:

https://github.com/sbp/mmm/blob/master/pmproofs.json

Next I ported the proof-checking subset of drule.c to under fifty lines of Python 3, slightly changing the D-Rule algorithm in the process:

https://github.com/sbp/mmm/blob/master/dproof.py

To ensure that dproof.py works correctly I wrote a script that compares its output with all of the proofs in pmproofs.json, which are deemed to be correct since they in turn had been generated by druler which is based on drule.c:

https://github.com/sbp/mmm/blob/master/dprooftests.py

Now that dproof.py was likely to be correct I could have used it to check proofs en masse, but doing so would have been unacceptably slow. This is because dproof.py, as well as being written in slow Python 3 rather than fast C, always evaluates all subproofs independently. It is a vast improvement instead to memoise the results of checking each subproof.

To memoise the subproofs they could, for example, be stored efficiently in a trie. I found some Googleware called pygtrie and downloaded it as a dependency:

https://github.com/sbp/mmm/blob/master/tries.py

Then I converted dproof.py to use tries.py for memoisation, calling the resulting script dtrie.py:

https://github.com/sbp/mmm/blob/master/dtrie.py

Both dproof.py and dtrie.py take proofs in reverse polish notation rather than the original polish notation of drule and druler.

The dtrie.py script has a number of modes for reading, caching, and writing previously checked subproofs, which makes checking proofs en masse much more efficient. Even though I wrote a faster C version of druler, which is currently unpublished, based on the modified dproof.py algorithm, for quicker development I decided to use a trie in Python 3 rather than writing the equivalent in C.

To check all possible proofs exhaustively, I first needed to generate all possible proofs, which is accomplished by the script called trees.py:

https://github.com/sbp/mmm/blob/master/trees.py

You call it with an odd number as an argument, and it generates proofs, in reverse polish notation for compatibility with dproof.py and dtrie.py, from length 3 to the argument number inclusive. The number of such terms it generates is proportional to the Catalan sequence, so the output starts to become voluminous very quickly, generating about 50MB for terms up to and including length 15, and I think around 500MB for length 17.

It then became possible to feed the proofs that trees.py generates to dtrie.py for mass checking, writing to a database to record the results. To check all proofs up to length 15 for example, we can do:

python3 trees.py 15 | python3 dtrie.py cw proofs15.txt.gz -

This gives 233,677 valid proofs out of the 3,137,841 possible proof combinations between length 3 and 15 inclusive. Both the proofs and the terms proven are recorded in the output database, proofs15.txt.gz, which is in the repository and available for download.

Next I wanted to check whether any of these 233,677 valid proofs are shorter than those already known in pmproofs.txt. I wanted to do this not only for the top level proofs in that file, but also for any subproofs just in case we could improve on a subproof of the overall proof, which would allow us to shorten the overall proof too.

In order to check subproofs it is necessary to extract all subproofs first, and to do this I wrote a script called subtrees.py:

https://github.com/sbp/mmm/blob/master/subtrees.py

You give it a reverse polish proof as its input argument, and it will print out all of the subproofs of that proof. This is a generic script, so I wrote another script which wraps subtrees.py to extract all of the subproofs from the pmproofs.json file specifically:

https://github.com/sbp/mmm/blob/master/pmsubtrees.py

Its output is recorded in a file called pmsubtrees.txt:

https://github.com/sbp/mmm/blob/master/pmsubtrees.txt

Now we can check whether any of the proofs in proofs15.txt.gz or any other database that we generate are shorter for the equivalent terms than any of those in pmsubtrees.txt, which would give us the shorter proof that we're trying to discover. The script that does this search is called pmsearch.py:

https://github.com/sbp/mmm/blob/master/pmsearch.py

Since it does not discover any shorter proofs in proofs15.txt.gz, this proves that all of the subproofs in pmproofs.txt up to length 15 inclusive are as short as possible.

Generating proofs15.txt.gz takes just under 5 minutes on my computer. Using the following commands, which take proofs15.txt.gz as a basis for memoisation, I then checked whether any proof up to length 17 inclusive was shorter than any of the subproofs in pmproofs.txt:

./trees.py 17 | ./dtrie.py rw proofs15.txt.gz - > proofs17.txt
./pmsearch.py pmsubtrees.txt proofs17.txt | grep DISCOVERY

Out of 31,284,531 generated proofs to check, 1,591,746 are valid, but since pmsearch.py still failed to discover any shorter proofs, all of the subproofs in pmproofs.txt up to length 17 inclusive are proven to be as short as possible if this code is correct. Generating the proofs for checking took about 45 minutes on my computer, and the output database is about 5MB even when compressed so I have not included it in the repository for download.

Though it would be possible to exhaustively check terms up to and including length 19, further optimisations could be made. Instead of generating every syntactically valid proof and then checking all of them for semantic validity with dtrie.py, trees.py could instead contain a filtering mechanism based on those subproofs which have been previously rejected.

In other words, when dtrie.py checks its output it only records the successful proofs in its output database. If there were a database of all the unsuccessful proofs as well, these unsuccessful proofs could be used in trees.py and it could then filter out any subproof of what it generates based on the list of unsuccessful proofs. This means that dtrie.py could do less checking, and the overall process would probably be much more efficient as a result.

There are two sources of prior work that I know of. One is drule.c, which seems to contain some code for combinatorial tests. The other is the work of Paul Tarau, who generates all trees for the SKI calculus up to a certain depth and then analyses the resulting terms:

https://arxiv.org/abs/1507.06944

Since the SKI calculus is composed of ax-1 (K) and ax-2 (S) it is I believe equivalent in finitary terms only to the positive implicational fragment of propositional logic, missing as it does ax-3 which adds negation. Moreover, I think that Tarau is interested in these combinators more from the infinitary Turing-complete standpoint than their finitary logical counterpart. For these reasons, if I understand them properly, Tarau's work did not directly help in proving or disproving the existence of shortest proofs in pmproofs.txt, but it was certainly indirect inspiration for the method that I used.

I was disappointed that I did not find any shorter proofs, but I assume that people have done this kind of exhaustive checking before and have either not published their results or I am not aware of them. Despite that, it was instructive for many reasons, resulted in the negative proof up to length 17 inclusive, and most of all was extremely fun.