Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few improvements #536

Merged
merged 35 commits into from
Jul 24, 2023
Merged

A few improvements #536

merged 35 commits into from
Jul 24, 2023

Conversation

zonotope
Copy link
Contributor

This patch contains a few improvements I uncovered while investigating switching to persistent-sorted-set in #523.

Besides cleanup of unused namespaces, functions, and index node attributes, the major changes are to add start and end flake parameters to the fluree.db.index/tree-chan api. This allows us to rely on the tree-chan function itself to trim an index leaf node's flakes immediately upon retrieval instead of forcing us to do it after the fact with a transducer. It also allows us to chose only the branch node's children we need to consider between the start and end flakes in logarithmic time instead of checking each child in linear time.

Besides that, there are changes to make use of transients more when adding to/removing from novelty sets and correcting an error where child maps on branch nodes were silently converted to unsorted maps when calculating ttids.

If this eats up too much memory, we can bump it back down. It's pretty sloooooow
right now because it's running single threaded.
@zonotope zonotope requested a review from a team July 21, 2023 12:14
@zonotope
Copy link
Contributor Author

I have run a benchmark query against a ledger with the schema.org vocabulary loaded in it for both this branch and main. Of course, all the caveats with the unreliability of benchmarks apply, but there is a modest improvement.

Benchmark Query

@(fluree/query db '{:select [?s]
                    :where  [[?s "schema:rangeIncludes" {"@id" "schema:DateTime"}]]})

main

Evaluation count : 1440 in 60 samples of 24 calls.
Execution time mean : 43.707714 ms
Execution time std-deviation : 515.532382 µs
Execution time lower quantile : 43.026732 ms ( 2.5%)
Execution time upper quantile : 44.662472 ms (97.5%)
Overhead used : 5.241825 ns

Found 2 outliers in 60 samples (3.3333 %)
low-severe 1 (1.6667 %)
low-mild 1 (1.6667 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

this patch

Evaluation count : 1440 in 60 samples of 24 calls.
Execution time mean : 42.863995 ms
Execution time std-deviation : 601.249422 µs
Execution time lower quantile : 42.288177 ms ( 2.5%)
Execution time upper quantile : 44.593585 ms (97.5%)
Overhead used : 5.300420 ns

Found 6 outliers in 60 samples (10.0000 %)
low-severe 4 (6.6667 %)
low-mild 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

This was referenced Jul 22, 2023
Copy link
Member

@cap10morgan cap10morgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor formatting & documentation stuff and a perf question but looks like a ton of good cleanups, fixes, and improvements otherwise!

(->Flake util/max-long 0 util/max-long const/$xsd:decimal 0 true nil))
(->Flake max-s max-p max-s max-dt max-t max-op max-meta))

(def minimum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a docstring of "The smallest flake possible"?

[ss to-add to-remove]
(as-> (transient ss) trans
(reduce disj! trans to-remove)
(reduce conj! trans to-add)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does reduce negate some of the benefit of the transient here? Since we don't care about the return value each time, would it be noticeably more performant to just loop these for the side effects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right! I just didn't think about it. I'll change it.

(try* (let [resolved (<? (resolve r node))]
(trim-node resolved start-flake end-flake))
(catch* e
(log/error e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indent the body of the catch* like catch? I.e. two spaces more than (catch* ...

@@ -402,7 +402,7 @@
novel? (fn [node]
(or (seq remove-preds)
(seq (index/novelty-subrange node t novelty))))]
(->> (index/tree-chan conn root novel? (constantly true) 1 refresh-xf error-ch)
(->> (index/tree-chan conn root novel? 4 refresh-xf error-ch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 feels kind of magic-number-y here in a way that 1 doesn't as much. Should it be def'd somewhere to give it a name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. I just felt like we should be doing this with higher parallelization, and 4 seemed like a fine number. In reality though, this type of thing should be parameterized. I'll change it back to one and make a note to parameterize this later.

Copy link
Contributor

@dpetran dpetran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎏 Looks good!

@zonotope zonotope merged commit 1b7b60d into main Jul 24, 2023
6 checks passed
@zonotope zonotope deleted the fix/delete-all-matches branch July 24, 2023 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants