Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(computability/timed): add complexity analysis for sorting algorithms #14494

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

tomaz1502
Copy link
Collaborator

@tomaz1502 tomaz1502 commented May 31, 2022


Add formalization and proofs for time complexity of insertion_sort and merge_sort, as discussed here and here. Reference (pt-br): https://github.com/tomaz1502/RunTimeFormalization/blob/main/Report.pdf

Open in Gitpod

@tomaz1502 tomaz1502 marked this pull request as ready for review May 31, 2022 17:10
@tomaz1502 tomaz1502 added awaiting-review The author would like community review of the PR undergrad Relates to undergrad.yaml labels May 31, 2022
@vihdzp vihdzp added the awaiting-CI The author would like to see what CI has to say before doing more work. label Jun 2, 2022
@github-actions github-actions bot removed the awaiting-CI The author would like to see what CI has to say before doing more work. label Jun 2, 2022
Copy link
Member

@robertylewis robertylewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I've made some light suggestions here. In general: it would be easier to review this if you could break it into smaller PRs. You can keep this one open as a "tracking" PR, and open smaller ones that add one file at a time.

I'll defer to @digama0 about the broader approach.

Comment on lines +91 to +92
rw ordered_length,
rw l_ih, }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rw ordered_length,
rw l_ih, }
rw [ordered_length, l_ih], }

Here and elsewhere you can combine multiple rws into one

unfold insertion_sort,
rw ← ordered_insert_equivalence r l_hd fst,
cases ordered_insert r l_hd fst,
unfold insertion_sort, }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unfold insertion_sort, }
refl }

A goal-ending unfold can usually be replaced by refl

- log_2_times : ∀ (a : ℕ), 2 * nat.log 2 (a + 2) ≤ a + 2
-/

lemma log_pred : ∀ (a : ℕ) , nat.log 2 a - 1 = nat.log 2 (a / 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a lot cleaner by translating to + and using nat.log_of_one_lt_of_le. (There may be ways to make this slicker, this was just the first I came up with.)

lemma log_pred (n : ℕ) : nat.log 2 n - 1 = nat.log 2 (n / 2) :=
if h : n < 2 then by simp [nat.log_of_lt h] else
(nat.sub_eq_iff_eq_add (by rw ← nat.pow_le_iff_le_log; linarith)).mpr 
  (nat.log_of_one_lt_of_le (by norm_num) (le_of_not_lt h))

cases h,
end

lemma log_2_val : nat.log 2 2 = 1 :=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is a consequence of a more general lemma that should go in data.nat.log.

lemma log_base {b : ℕ} (hb : 1 < b) : nat.log b b = 1 :=
by simpa using nat.log_pow hb 1

lemma log_2_val : nat.log 2 2 = 1 :=
log_base (by norm_num)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think most of these lemmas can be simplified and/or generalized.

Comment on lines +53 to +54
simp only [list.length] at IH,
simp only [list.length],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
simp only [list.length] at IH,
simp only [list.length],
simp only [list.length] at IH ⊢,

and below. The symbol can be entered as \vdash.

@robertylewis robertylewis added awaiting-author A reviewer has asked the author a question or requested changes and removed awaiting-review The author would like community review of the PR labels Jun 14, 2022
Copy link
Collaborator

@jn1z jn1z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds cool! Added some comment and proof-shortening suggestions.

# Timed Insertion Sort
This file defines a new version of Insertion Sort that, besides sorting the input list, counts the
number of comparisons made through the execution of the algorithm. Also, it presents proofs of
it's time complexity and it's equivalence to the one defined in data/list/sort.lean
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want "its", the possessive. ("it's" is simply "it is".)

# Timed Merge
This file defines a new version of Merge that, besides combining the input lists, counts the
number of operations made through the execution of the algorithm. Also, it presents proofs of
it's time complexity and it's equivalence to the one defined in data/list/sort.lean
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"its"

# Timed Merge Sort
This file defines a new version of Merge Sort that, besides sorting the input list, counts the
number of operations made through the execution of the algorithm. Also, it presents proofs of
it's time complexity and it's equivalence to the one defined in data/list/sort.lean
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"its"

# Timed Split
This file defines a new version of Split that, besides splitting the input lists into two halves,
counts the number of operations made through the execution of the algorithm. Also, it presents
proofs of it's time complexity, it's equivalence to the one defined in data/list/sort.lean and of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"its"

Comment on lines +187 to +199
have l₂s_id : (merge_sort r l₂).fst = l₂s := (congr_arg prod.fst h₂).trans rfl,
rw merge_sort_equivalence at l₂s_id,
have same_lengths₂ := list.length_merge_sort r l₂,
have l₂s_len_l₂_len : l₂s.length = l₂.length :=
begin
rw l₂s_id at same_lengths₂,
exact same_lengths₂,
end,
rw l₁s_len_l₁_len,
rw l₂s_len_l₂_len,

exact split_lengths l l₁ l₂ hs,
end,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be shortened by generalizing with the previous section, i.e., lines 176-185

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Robert's comment above that there are other parts that can be generalized. Since merge sort is divide-and-conquer, I'd suspect most proofs for the two sub-lists can be reused.

Comment on lines +154 to +174
have ns_bound : ns ≤ 4 * (l.length + 1) * nat.log 2 l.length :=
begin
have ns_id : (merge_sort r l₁).snd = ns := (congr_arg prod.snd h₁).trans rfl,
rw ← ns_id,
refine le_trans ih₁ _,
calc 8 * l₁.length * nat.log 2 l₁.length
= 4 * (2 * l₁.length) * nat.log 2 l₁.length :
by linarith
... ≤ 4 * (l.length + 1) * nat.log 2 l₁.length :
begin
rw mul_assoc,
rw mul_assoc 4 (l.length + 1) (nat.log 2 l₁.length),
refine (mul_le_mul_left zero_lt_four).mpr _,
exact nat.mul_le_mul_right (nat.log 2 l₁.length) l₁_length,
end
... ≤ 4 * (l.length + 1) * nat.log 2 l.length :
begin
refine nat.mul_le_mul_left (4 * (l.length + 1)) _,
exact @nat.log_monotone 2 l₁.length l.length l₁_length_weak,
end
end,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be shortened by generalizing with the previous section, i.e., lines 131-152

Comment on lines +76 to +82
have (split (a₁ :: a₂ :: t)).snd.fst.length < (a₁ :: a₂ :: t).length :=
begin
cases e : split (a₁ :: a₂ :: t) with l₁ l₂n,
cases l₂n with l₂ n,
cases length_split_lt e with h₁ h₂,
exact h₂,
end,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be shortened by generalizing with the previous section, i.e., lines 70-75

@semorrison semorrison added the too-late This PR was ready too late for inclusion in mathlib3 label Jul 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-author A reviewer has asked the author a question or requested changes too-late This PR was ready too late for inclusion in mathlib3 undergrad Relates to undergrad.yaml
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants