
ETS and Snapshots #34

Closed
martinsumner opened this issue Feb 28, 2017 · 16 comments

@martinsumner
Owner

martinsumner commented Feb 28, 2017

The Ledger Cache is kept small (up to about 2000 entries) because, when a clone is required, the cache is passed through ets:to_list so that a snapshot can exist in the clone.

For short-lived 2i queries this is probably a poor trade-off. It would probably be better to run the query against the ETS table and push the results downstream to the clone, rather than push the whole cache.
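As a rough sketch of the two approaches (hedged - the table handling and the match function below are illustrative only, not the actual leveled code):

```erlang
%% Hedged sketch only - not the leveled implementation. The match function
%% and accumulator shape are assumptions used for illustration.
-module(ledger_cache_sketch).
-export([snapshot_whole_cache/1, query_in_place/3]).

%% Current approach (as described above): copy every entry so the clone
%% owns an immutable snapshot of the whole cache.
snapshot_whole_cache(Tab) ->
    ets:tab2list(Tab).

%% Alternative: run the 2i query against the ETS table in the Bookie and
%% push only the matching results downstream to the clone.
query_in_place(Tab, MatchFun, Acc0) ->
    ets:foldl(
        fun(Entry, Acc) ->
            case MatchFun(Entry) of
                true  -> [Entry | Acc];
                false -> Acc
            end
        end,
        Acc0,
        Tab).
```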

@DeadZen
Contributor

DeadZen commented Feb 28, 2017

What about a generated module? It could hold the entries and some query functions to scan them, and the module could be deleted when the cache expires.
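Roughly what I mean, as a sketch only (the module name and the entries/0 function are made up for illustration, and this is not how goldrush does it): compile the cache entries into a throwaway module as a literal term, then purge the module when the cache expires.

```erlang
%% Hedged sketch of the generated-module idea; names are hypothetical.
-module(cache_modgen).
-export([compile_cache/2, delete_cache/1]).

%% Build, compile and load a module ModName exporting entries/0, which
%% returns the cached Entries as a literal term.
compile_cache(ModName, Entries) ->
    Forms =
        [{attribute, 1, module, ModName},
         {attribute, 2, export, [{entries, 0}]},
         {function, 3, entries, 0,
          [{clause, 3, [], [], [erl_parse:abstract(Entries)]}]}],
    {ok, ModName, Bin} = compile:forms(Forms),
    {module, ModName} = code:load_binary(ModName, atom_to_list(ModName), Bin),
    ok.

%% Remove the module once the cache expires.
delete_cache(ModName) ->
    code:purge(ModName),
    code:delete(ModName).
```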

@martinsumner
Owner Author

Do you have any pointers to an example of doing that sort of thing? I'd like to try and understand the idea better.

@DeadZen
Contributor

DeadZen commented Feb 28, 2017

Sure, I have a library that does part of it: http://github.com/DeadZen/goldrush
It should not be too hard to make it store and scan data internally; I would have to make a PoC.

@DeadZen
Contributor

DeadZen commented Feb 28, 2017

I'll work on it now, can't do anything else with AWS down atm .. ;p

@martinsumner
Owner Author

OK. I'm still a bit unsure about what you're suggesting, so I'm intrigued.

Just to be clear: the snapshot of the cache is taken for the 2i query so that the query will provide a view of the database at the point in time the query was initiated. I don't need to stream into the snapshot any new events that arrive after initiation of the query.

The Ledger Cache is an ets ordered set, and it gets flushed occasionally (after it has pushed itself to the Penciller). So by extracting from the set at the point of the query, the Bookie is free to flush the set, without any concern that there may be a clone pointing at the ets table expecting it to still be there.

The extraction at the moment is done by doing a tab2list, and this takes about 1ms. However, I will have N/3 nodes doing this concurrently - so if there are a lot of 2i queries, all those 1ms CPU hits will add up. All I was intending to do was to iterate over the ets ordered set to extract only the results relevant to the query at the snapshot, on the basis that a few calls to ets:next will be cheaper than an ets:tab2list.
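Something like this is what I have in mind - a sketch only, with the key handling simplified compared to the real ledger keys:

```erlang
%% Hedged sketch: extract only the entries in [StartKey, EndKey] from an
%% ordered_set table, instead of copying the whole table with ets:tab2list/1.
-module(range_extract).
-export([range_to_list/3]).

range_to_list(Tab, StartKey, EndKey) ->
    %% Find the first key at or after StartKey.
    FirstKey =
        case ets:lookup(Tab, StartKey) of
            [] -> ets:next(Tab, StartKey);
            _  -> StartKey
        end,
    walk(Tab, FirstKey, EndKey, []).

walk(_Tab, '$end_of_table', _EndKey, Acc) ->
    lists:reverse(Acc);
walk(_Tab, Key, EndKey, Acc) when Key > EndKey ->
    lists:reverse(Acc);
walk(Tab, Key, EndKey, Acc) ->
    [Entry] = ets:lookup(Tab, Key),
    walk(Tab, ets:next(Tab, Key), EndKey, [Entry | Acc]).
```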

@DeadZen
Contributor

DeadZen commented Feb 28, 2017

OK, I was actually going to ask whether that was the case with needing to stream new events.
That's good, for sure.

@martinsumner
Owner Author

#39

Will close this for now. Hopefully over the next few days I will have some proper basho_bench results to compare with leveldb.

@martinsumner
Owner Author

In single-threaded tests this does OK, but less so when running multiple parallel queries.

There is some possible concern about the cost of starting a query - i.e. what is the cost of a query on a near-empty database with limited results to return? Is the overhead of snapshotting too big?

@martinsumner reopened this Mar 6, 2017
@DeadZen
Contributor

DeadZen commented Mar 11, 2017

How do I reproduce this test?

@martinsumner
Owner Author

There's a basho_bench script I'm building for 2i. It is a work in progress at the moment. I will upload it tomorrow evening.

@martinsumner
Owner Author

Here's the current state of the script (sorry, I forgot about posting this):

https://github.com/martinsumner/basho_bench/blob/mas-nhsload/src/basho_bench_driver_riakc_pb.erl

I'm going to do some more work on this Thursday.

There is a bunch of optimisation work I've undertaken which is all merged into this branch:

https://github.com/martinsumner/leveled/tree/mas-pushmem-i46/src

This is really about improving the performance, not under query load, but under the update-related load when adding more indexes.

The big differences come from this commit:

2b0ec1d#diff-25d902c24283ab8cfbac54dfa101ad31

There is a bit of a convoluted history around what made the difference. There are two parts: firstly, allowing slots that contain index entries to have twice as many keys (which reduces the pace of merge activity in the tree); and secondly, implementing that in a way which reduces needless object copying by the beam.

Part of the issue is that I'd made a rookie mistake in assuming that all large objects are passed as references, when in fact only large binaries are. So there has been a general history of performance-related issues from message-passing large objects between processes in leveled.
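For reference, a minimal illustration of that distinction (a toy example, not leveled code): only large binaries (over 64 bytes) live off-heap and are passed between processes by reference; any other large term is deep-copied on every message send.

```erlang
%% Toy illustration: large binaries are shared by reference between
%% processes, but other large terms are deep-copied into the receiver's
%% heap on every message send.
-module(copy_demo).
-export([run/0]).

run() ->
    BigList = lists:seq(1, 100000),               %% a large non-binary term
    BigBin  = binary:copy(<<0>>, 1024 * 1024),    %% a large (off-heap) binary
    Pid = spawn(fun() -> sink() end),
    Pid ! {list, BigList},  %% the whole list is copied into Pid's heap
    Pid ! {bin, BigBin},    %% only a small reference to the binary is copied
    ok.

sink() ->
    receive
        _Msg -> sink()
    end.
```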

Prior to this, running a 6-hour test with multiple 2i index updates for every object led to the beam occupying 40% of RAM by the end of the test. Now, with the same test, the beam ends up with just 10% of RAM - with all that saved memory being used by the page cache, and presumably CPU time saved in garbage collection.

@martinsumner
Owner Author

I've rolled through a number of ideas to try and improve this, and they all hit the buffers at some stage.

One thing to note is that although the manifest is small, it must be copied very aggressively for 2i queries. When volume testing at 200 2i queries per second, there are nearly 5000 manifest copies being generated every second. Each individual version of the manifest will be copied thousands of times.

Although the manifest is small, this seems wasteful. Instead, a separate worker should be started for the first snapshot for any given process, and subsequent snapshots for that manifest should use the same worker.
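A rough sketch of the idea (the registry shape and message protocol are illustrative only, not a design): keep a registry keyed on the manifest version, start one worker for the first snapshot of that version, and point subsequent snapshots at the existing worker rather than taking another copy.

```erlang
%% Hedged sketch: reuse one snapshot worker per manifest version instead of
%% copying the manifest for every query. Names and messages are hypothetical.
-module(snapshot_registry).
-export([new/0, worker_for/3]).

new() -> #{}.

worker_for(ManifestSQN, Manifest, Registry) ->
    case maps:find(ManifestSQN, Registry) of
        {ok, Pid} ->
            {Pid, Registry};                          %% reuse existing worker
        error ->
            Pid = spawn(fun() -> worker_loop(Manifest) end),
            {Pid, maps:put(ManifestSQN, Pid, Registry)}
    end.

worker_loop(Manifest) ->
    receive
        {query, From, Ref, QueryFun} ->
            From ! {Ref, QueryFun(Manifest)},   %% run the query against the
            worker_loop(Manifest);              %% shared manifest copy
        stop ->
            ok
    end.
```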

@martinsumner
Owner Author

martinsumner commented Mar 24, 2017

The change to share manifests across snapshots didn't make a huge difference the first time I tested it... but I was busy rolling a lot of changes at the time.

I will re-test this. I'm also considering a broader overhaul of actor organisation within the Penciller. Perhaps L0 and the manifest should live under a dedicated actor for handling those things, and all queries (including mainstream non-clone queries) would be handled by runners which only have the level-zero index. So there would be a permanent runner (using runner as another actor name, as in the Bookie's runner) for non-snapshot requests, and runner clones for query requests. The actor holding the L0 state would have to support snapshot requests and handle multiple versions (of both L0 and the manifest).

@martinsumner
Owner Author

Tried to have a background accumulator of the penciller memory which could be used in queries ... but this didn't help.

https://github.com/martinsumner/leveled/tree/mas-pmemacc-i34/src

A quick volume test with 2i showed a significant reduction in throughput when compared to master.

@martinsumner
Owner Author

I'm of the opinion that I should defer work on this until we can test Leveled end-to-end with Riak in OTP 20.0 (and perhaps 20.0 ++ the I/O scheduler changes). Time might be a healer.

Actual testing hasn't highlighted this as a major issue. Also, despite lots of thinking, I'm devoid of any ideas to resolve it that don't have nasty side effects.

I'm moving this to the back-burner for now

@martinsumner
Owner Author

When testing in OTP 20/21 there appear to be significant speed-ups, especially when running ct tests.

In the target environment (NHS/Riak) things are ultimately slowed by disk - so enhancements here don't appear to be relevant. Also 2i query response times show no issues.

Won't fix - for now.
