Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factorisation #4859

Closed
wants to merge 17 commits into from
Closed

Factorisation #4859

wants to merge 17 commits into from

Conversation

Sventimir
Copy link
Contributor

This pull request adds Data.Nat.Factor module to contrib package. It defines notions of a factor (a natural number, which divides another number without a remainder), and common factors of two natural numbers. It proves several basic laws of factors and implmements a decision procedure as to whether a certain number is a factor of another number of not. Finally, the notion of a greatest common divisor is formally defined and Euclid's algorithm for finding GCDs is implemented and proven correct.

This is a major step towards implementing a decent Rational type, which will be my next goal.

||| divisors. It is proven correct and total; returns a proof that computed
||| number actually IS the GCD. Unfortunately it's very slow, so improvements
||| in terms of efficiency would be welcome.
gcd : (a, b : Nat) -> {auto aok : LTE 1 a} -> {auto bok : LTE 1 b} -> (f : Nat ** GCD a b f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a gcd defined in Prelude.Nat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. But that definition does not provide any proof that the returned number is indeed what it claims to be. That is the whole point of this module: to define what a factor is and start proving things about factors and factorisation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my point is just that there is a name collision.

Copy link
Contributor Author

@Sventimir Sventimir May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about naming it gcf for greatest common factor. That would be along with the naming convention in this module, but on the other hand it would not be along with what people expect that function to be named. Type of this function is different, so in some cases (when you supply implicit arguments explicitly) calls will be unambiguous. Otherwise you'll have to qualify it. Also returned type is different, so if the compiler can infer that, it also won't require qualified name. I don't think it's too bad…

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other gcd function only stipulates that the inputs can't both be zero. Could that work here? I've looked at a few expositions of the algorithm, and nobody ever spends much time discussing the nature of the inputs. It would certainly be cleaner if there weren't LTE 1 n proofs floating around all over, but I don't know how necessary they are.

Copy link
Contributor Author

@Sventimir Sventimir May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zeros are problematic, because according to this definition at least, everything is a factor of zero. This destroys some nice properties, like that a factor is always less than or equal to the number it divides. Also there is consideration that according to this setup, zero is a factor of zero, which implicitly assumes that you can actually divide by zero. Also when you want to implement rational numbers, that non-zero property comes in handy, because all ratios are either positive or negative and you just add a special constructor for zero. This way you avoid the problem ZZ has, that 0 and -0 are distinct values, which messes up the (=) proofs. Also the requirement that divisor must be non-zero is graciously provided by this arrangement (due to Uninhabited (Factor n Z) implmentation, which is no longer valid when you consider divisors of zero).

On the other hand, as I review the code, I see that the non-zero property is not used that often (mostly is just passed forward as is or with some alteration). So probably we could require non-zero-ness for those proofs only, and we could require it explicitly for rational numbers as well. I'm currently looking into the possibilities of removing that constraint and it seems possible.

subtractEqLeft {a = Z} prf = prf
subtractEqLeft {a = S k} {b} {c} prf = subtractEqLeft $ succInjective (k + b) (k + c) prf

subtractEqRight : {a, b, c : Nat} -> a + c = b + c -> a = b
Copy link
Contributor

@nickdrozd nickdrozd May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subtractEqRight can be proved as an instance of subtractEqLeft:

subtractEqRight : {a, b, c : Nat} -> a + c = b + c -> a = b
subtractEqRight {a} {b} {c} prf =
  subtractEqLeft {a = c} {b = a} {c = b} $
    trans (plusCommutative c a) $
      trans prf $
        plusCommutative b c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

rewrite sym $ plusSuccRightSucc b k in
LTESucc $ addLteRight prf

addLteLeft : LTE b c -> LTE (a + b) (a + c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, addLteRight would be simpler to prove as an instance of addLteLeft. It's probably true for any of these left / right proofs that one side is easier to prove, and the other side can be derived from that one.

Copy link
Contributor Author

@Sventimir Sventimir May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. Then I'll try rewriting them.

@nickdrozd
Copy link
Contributor

When I compile and load Factor into the repl (:module Data.Nat.Factor), it can't find gcd (or gcdProof, as I've renamed it locally). In the Emacs buffer, all the data types (Factor, NotFactor, etc) get underlined in red. I don't know what it is, but it seems like there is some kind of path / visibility problem.

@Sventimir
Copy link
Contributor Author

Sventimir commented May 16, 2020

When I compile and load Factor into the repl (:module Data.Nat.Factor), it can't find gcd (or gcdProof, as I've renamed it locally)

It's because of the %access private in the GCD namespace. I thought it would be restricted to that block, but apparently it is not. I will add exportqualifier to the gcd function and it should be fine. As to the underlining, I don't know. In my Atom nothing is underlined. Does Emacs suggest anything about what it thinks is wrong? Maybe it's some style issue, indentation size perhaps?


||| If p is a factor of both n and m, then it is also a factor of their sum.
plusFactor : Factor n p -> Factor m p -> Factor (n + m) p
plusFactor {n} {p} nFactor@(CofactorExists qn positN prfN) (CofactorExists qm positM prfM) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the @ mean?

Copy link
Contributor Author

@Sventimir Sventimir May 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ binds the whole pattern on the right to the name on the left. In this case it binds CofactorExists qn positN prfN to the name nFactor. This way you can simultaneously refer to the whole expression and to its subexpressions.

It's roughly equivalent to this:

plusFactor {n} {p} nFactor (CofactorExists qm positM prfM) =
    let CofactorExists qn positN prfN = nFactor in
    ...

@nickdrozd
Copy link
Contributor

It's because of the %access private in the GCD namespace. I thought it would be restricted to that block, but apparently it is not. I will add exportqualifier to the gcd function and it should be fine. As to the underlining, I don't know. In my Atom nothing is underlined. Does Emacs suggest anything about what it thinks is wrong? Maybe it's some style issue, indentation size perhaps?

That fixed it.

I don't know what the lines are about. It could be a bug in the mode. Actually, it is certainly a bug in the mode. For: the underlining is either accurate or not. If it isn't accurate, then that's obviously a bug. On the other hand, if it is accurate, then the lines are an indication that something is wrong, but that something isn't being reported, which is a bug.

Is that an intuitionistically valid proof?

||| A decision procedure for whether of not p is a factor of n.
decFactor : (n, d : Nat) -> {auto nok : LTE 1 n} -> {auto dok : LTE 1 d} -> DecFactor n d
decFactor n (S d) {nok} {dok} with (Data.Fin.Extra.divMod n (S d))
| (Fraction n (S d) q r prf) = case r of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case statement can be pushed back into the arg list:

decFactor n (S d) {nok} {dok} with (Data.Fin.Extra.divMod n (S d))
  | (Fraction n (S d) q FZ prf) =
    let prf =
            replace {P = \x => x = n} (plusZeroRightNeutral (q + (d * q))) $
            replace {P = \x => x + 0 = n} (multCommutative q (S d)) prf
    in
    ItIsFactor $ CofactorExists q nok prf
  | (Fraction n (S d) q (FS pr) prf) =
    ItIsNotFactor $ ProperRemExists q pr nok $
      rewrite multCommutative d q in
      rewrite sym $ multRightSuccPlus q d in
      prf

(Indent to taste)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes… I usually make a double indentation after the function pattern, because if I'm going to add a where clause later, I like it to be less indented than the function body. I feel it's more apparent where the body ends that way. And if I'm not going to add a where, at least all functions are indented according to the same rule. However, I'm not sure if it's the right approach. I also like 4-space-wide indents, while you apparently prefer 2-spaces only. I should probably have a closer look at older modules and adapt my style to what there already is…

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly I stick to the indentation levels provided by the Emacs mode, and I don't think it ever provides 4-spacing. (It doesn't matter, of course.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think it does matter in the sense that it should be consistent within a project.

@Sventimir
Copy link
Contributor Author

Is that an intuitionistically valid proof?

Seems valid to me, although rather informal.

CofactorExists (S k) ok (rewrite plusZeroRightNeutral k in Refl)

||| For all natural numbers n, n is (the greatest) a factor of n.
selfFactor : (n : Nat) -> {auto ok : LTE 1 n} -> Factor n n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this factorReflexive. The factor relationship is a partial order, right? That might be worth exploring, although the ok thing complicates it a little.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, these LTEs are annoying. However, considering zeros destroys most of the properties of factors, since anything is a factor of zero. On the other hand, maybe I should exclude zero only where it really hurts and allow considering factors of zero in general case.

Hmm, indeed! The factor relation is a partial order. That's a thing worth proving. I'll add it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(CofactorExists b bok (rewrite plusZeroRightNeutral b in Refl))

||| Any natural number is a common factor of itself and itself.
selfIsCommonFactor : (a : Nat) -> {auto ok : LTE 1 a} -> CommonFactor a a a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I would call this commonFactorReflexive and move it next to commonFactorSym. Is there a sense in which CommonFactor is transitive? If it is, you could implement it as an equivalence relation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in order to be transitive, it would have to be a two-argument predicate. This can be done like so:

CommonFactor (a, b) p

However now sides of the relation have different types, which is no good. However, if you stretch the definition a little bit, you can come up with something like this:

cfTransitive : CommonFactor (a, b) p -> CommonFactor (b, c) p -> CommonFactor (a, c) p

So it is transitive when you consider it a binary relation (either there is or there isn't a common factor of two numbers in question), without considering what that factor actually is. When looked upon like this, it is indeed transitive, and this is the sense in which it is symmetric in commonFactorSym proof.

gcd Z _ {aok} = absurd aok
gcd _ Z {bok} = absurd bok
gcd (S a) (S b) with (cmp (S a) (S b))
gcd (S (b + S d)) (S b) | CmpGT d =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to handle Euclid's algorithm is to arrange it so that the first argument is the smaller one. If the first argument is larger, just flip the arguments around and proceed as usual, as in gcd 27 18 = gcd 18 27 = 9. That would simplify the logic (just keep the simpler of the two branches), although I think it would require an assert_total.

This is discussed at the very beginning of Knuth's The Art of Computer Programming.

Copy link
Contributor Author

@Sventimir Sventimir May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's exactly what I have done. If a > b, then the recursion is run with (a, b) pair, otherwise – with (b, a). The GCD.Search type even contains a proof that a >= b. Totality of the function is implied by the use of sizeInd function from WellFounded module, which provides looping capability already proved to be total.

Writing the recursion by hand, I could find no way to convince the compiler about totality. The trick provided by the Sized interface is quite ingenious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, I'm still working through the details.

let sbIsFactor = the (plus q (mult b q) = S a) $
rewrite multCommutative b q in
rewrite sym $ multRightSuccPlus q b in
replace {P = \x => x = S a} (plusZeroRightNeutral (q * S b)) prf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace can be replaced with

                    rewrite sym prf in
                    sym $ plusZeroRightNeutral (q * S b)

IME explicit use of replace is a "proof smell", and there is almost always a nicer way to do it.

Copy link
Contributor Author

@Sventimir Sventimir May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to use rewrite rather than replace whenever I can, but it's not always possible. There is the huge difference between them, however. replace makes a substitution in a proof you already have to create another proof, whereas rewrite makes a substitution in the goal (which is a proof you do not have yet) instead. In practice it means that with rewrite you work kind of backwards, rewriting your goal until it matches something you already have, while replace has a more direct approach. Also with rewrite you can only use equality (=) proofs (the same with replace of course), which means that if you you need use a custom proof-altering function like cong or something in the middle of replacements, you actually can no longer use rewrite from that point on. Unless of course I am missing something here.

This problem does not apply here, of course. This probably arose from some code reworking or maybe I was just tired.

||| Factor n p is a witness that p is indeed a factor of n,
||| i.e. there exists a q such that p * q = n.
data Factor : Nat -> Nat -> Type where
CofactorExists : {n, p : Nat} -> (q : Nat) -> LTE 1 n -> p * q = n -> Factor n p
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that there are a lot of places with rewrite sym prf or
something like that. I think the syms are requried because this
proof has the form p * q = n. If it were n = p * q, I think the
syms could be done away with, which would be a little less verbose.

Generally proof equations are written as complexity-reducing devices,
with the more complex part on the left and the less complex on the
right. For example, the monoid neutral law looks like n + 0 = n,
because n is simpler than n + 0. But with factors, the reduction
in complexity is to say that a larger number can be written as the
product of smaller numbers, so I think n = p * q is appropriate.

(It might be too late to implement this minor suggestion.)

Copy link
Contributor Author

@Sventimir Sventimir May 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. And I don't think it's too late for the change. It's only 300 lines of code. I have done much larger refactorings with much less type-checking to support me, so I believe I will do it. ;)

@Sventimir
Copy link
Contributor Author

Sventimir commented May 18, 2020

I implemented Preorder interface for Nat and Factor. I also tried to implement Poset, but proving the antisymmetric property is messy, because there are 4 variables and only two equations, so squeezing contradictions from them takes a lot of pattern matching on those Nats. I also inverted the equations in Factor and NotFactor proof constructors and eliminated these pesky LTE 1s.

Refl
)

reflexive Z = CofactorExists 1 Refl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't require case splitting: reflexive n = CofactorExists 1 $ sym $ multOneRightNeutral n

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point


||| Factor n p is a witness that p is indeed a factor of n,
||| i.e. there exists a q such that p * q = n.
data Factor : Nat -> Nat -> Type where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is maybe getting into bikeshedding territory, but what about
changing this from Factor n p to Factor p n? Factor n p reads as
"n has p as a factor", but I feel like the more natural way to say it
would be "p is a factor of n", or alternatively, "p evenly divides n".
With infix syntax, it could even be p | n or something like that.
Similar with CommonFactor. Instead of saying "a and b both have p as
a factor", the reading would be "p is a factor of both a and b" or "p
divides both a and b".

Besides being more natural (IMO), this would clarify the way in which
CommonFactor is symmetric. Given some "p", the relation is "has
factor p in common with", as in "a has factor p in common with b".
Something like this:

using (p : Nat)
  Preorder Nat (CommonFactor p) where

  Equivalence Nat (CommonFactor p) where

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we can't have (|) as a type constructor, since it is already reserved as a part of Idris syntax. Nontheless, this is probably a better choice, so I'll change it.


||| A decision procedure for whether of not p is a factor of n.
decFactor : (n, d : Nat) -> DecFactor n d
decFactor n Z = ?whaatIfZ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hole

Copy link
Contributor Author

@Sventimir Sventimir May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damnit. sorry.

@Sventimir
Copy link
Contributor Author

Sventimir commented May 19, 2020

I have switched places of arguments to Factor and NotFactor.

As to the leftover hole, I'm not really proud of the way I filled it in, but I see no other way. Standard procedure of proving not-a-factor statement requires to specify a non-zero remainder of the division, which is also strictly less than the divisor. Without the last restriction, you could for instance prove that 2 is not a factor of 4, since 2 * 1 + 2 = 4. However, when divisor is 0, there is no such remainder you could possibly construct for any number. Yet, you can't prove that 0 is a divisor of a non-zero number either. So the only way to fill that vacuum I can think of is to provide a special constructor that asserts that 0 is not a factor of any successor. This way decFactor function is total once more and factorNotFactoAbsurd property still holds.

But I don't like it really, so if anyone could come up with a simpler/nicer solution, that would be welcome.

@Sventimir Sventimir closed this Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants