
ISPN-6225 Allow the ratio segments/shard to be configurable for the AffinityIndexManager #4576

Closed
gustavocoding wants to merge 1 commit

Conversation


@gustavocoding commented Sep 28, 2016

https://issues.jboss.org/browse/ISPN-6225

Marking as preview, since it has a few dependencies.

Summary of changes

AffinityIndexManager (AIM) supports the shard configuration from Hibernate Search, for example:

```java
props.put("default.sharding_strategy.nbr_of_shards", "10");
```

If specified, this splits the index into a fixed number of shards; otherwise the number of shards defaults to the number of segments (as it is on master).
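
For context, a rough sketch of how a cache could be wired up to use the AIM together with the new property. The fully-qualified index manager class name and the `Index.LOCAL` mode are my assumptions for illustration, not taken from this PR; only the sharding property itself is:

```java
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.Index;

class AffinityShardConfigSketch {
   static Configuration build() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.indexing()
             .index(Index.LOCAL)
             // index manager class name assumed here for illustration
             .addProperty("default.indexmanager", "org.infinispan.query.affinity.AffinityIndexManager")
             // when present, the index is split into a fixed number of shards;
             // when absent, the number of shards defaults to the number of segments
             .addProperty("default.sharding_strategy.nbr_of_shards", "10");
      return builder.build();
   }
}
```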

Implications

On master, the number of shards is always equal to the number of segments; the proposed configuration above makes it possible to have fewer shards without reducing the number of segments. When configured with a fixed number of shards, the AIM will not strictly have all the affinity characteristics described in https://github.com/infinispan/infinispan/wiki/Index-affinity-proposal, but will behave similarly to the InfinispanIndexManager, except that it is not constrained to a single index with a single node handling all the indexing cluster-wide.
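
To make the trade-off concrete, a purely conceptual sketch (not the actual Infinispan/Hibernate Search code): with a fixed shard count the target shard is derived from the key alone, so it no longer necessarily lines up with the Infinispan segment that owns the key, which is why strict affinity is lost while writes can still be spread across several shards.

```java
// Conceptual sketch only, not the real implementation: with a fixed shard count,
// shard selection depends only on the key (e.g. a hash modulo the shard count),
// whereas the segment-per-shard scheme on master routes by the owning segment.
class ShardSelectionSketch {
   static int pickShard(Object key, int nbrOfShards) {
      return Math.abs(key.toString().hashCode() % nbrOfShards);
   }
}
```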

Benchmark

Numbers from AffinityPerfTest (all nodes in the same JVM) with:

  • 3 nodes writing, DIST caches (REPL for lock and metadata only)
  • 1 node querying with 10 threads doing Lucene Term Queries (a sketch of such a query follows this list); each thread waits 10ms between queries.
  • Total entries inserted: 10k (for sync indexing mode) and 500k (for async indexing mode)
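
A minimal sketch of the kind of query each query thread runs, assuming a hypothetical indexed value type with a String field `name` (the actual AffinityPerfTest entity and field names may differ):

```java
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.infinispan.Cache;
import org.infinispan.query.CacheQuery;
import org.infinispan.query.Search;
import org.infinispan.query.SearchManager;

// Hypothetical indexed value type standing in for whatever the perf test stores.
@Indexed
class Entity {
   @Field
   String name;

   Entity(String name) {
      this.name = name;
   }
}

class QueryWorker {
   // One iteration of a query thread: a single Lucene TermQuery, then a 10ms pause.
   static List<?> queryOnce(Cache<Integer, Entity> cache, String value) throws InterruptedException {
      SearchManager searchManager = Search.getSearchManager(cache);
      CacheQuery query = searchManager.getQuery(new TermQuery(new Term("name", value)), Entity.class);
      List<?> results = query.list();
      Thread.sleep(10); // each thread waits 10ms between queries
      return results;
   }
}
```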

Sync Indexing Mode

| Shards | Put threads/node | Query 90th (ms) | QPS | Puts/s |
|--------|------------------|-----------------|-----|--------|
| 1 | 3 | 2 | 940 | 317 |
| 1 | 10 | 2 | 966 | 1333 |
| 1 | 50 | 3 | 890 | 2857 |
| 2 | 3 | 10 | 623 | 350 |
| 2 | 10 | 2 | 931 | 800 |
| 2 | 50 | 10 | 744 | 2222 |
| 4 | 3 | 10 | 639 | 235 |
| 4 | 10 | 13 | 544 | 606 |
| 4 | 50 | 10 | 662 | 1666 |
| 16 | 3 | 60 | 229 | 192 |
| 16 | 10 | 151 | 87 | 294 |
| 16 | 50 | 398 | 33 | 952 |
| 256 (master) | 3 | 1333 | 9 | 256 |
| 256 (master) | 10 | 4663 | 2 | 285 |
| 256 (master) | 50 | 14226 | 0.9 | 312 |

Async Indexing Mode

| Shards | Put threads/node | Query 90th (ms) | QPS | Puts/s |
|--------|------------------|-----------------|-----|--------|
| 1 | 3 | 6 | 160 | 3952 |
| 1 | 10 | 25 | 240 | 8403 |
| 1 | 50 | 28 | 685 | 10638 |
| 2 | 3 | 12 | 451 | 6993 |
| 2 | 10 | 21 | 758 | 8928 |
| 2 | 50 | 140 | 115 | 11627 |
| 4 | 3 | 20 | 373 | 8695 |
| 4 | 10 | 42 | 220 | 12658 |
| 4 | 50 | 73 | 164 | 12502 |
| 16 | 3 | 48 | 253 | 9090 |
| 16 | 10 | 139 | 103 | 11637 |
| 16 | 50 | 269 | 60 | 13513 |
| 256 (master) | 3 | 5453 | 2.5 | 3424 |
| 256 (master) | 10 | 10468 | 1.2 | 5235 |
| 256 (master) | 50 | 24964 | 0.54 | 6849 |

As a comparison, numbers for the InfinispanIndexManager:

Sync Indexing Mode

| Put threads/node | Query 90th (ms) | QPS | Puts/s |
|------------------|-----------------|-----|--------|
| 3 | 2 | 868 | 392 |
| 10 | 3 | 873 | 1111 |
| 50 | 9 | 732 | 3334 |

Async Indexing Mode

| Put threads/node | Query 90th (ms) | QPS | Puts/s |
|------------------|-----------------|-----|--------|
| 3 | 5 | 713 | 10416 |
| 10 | 9 | 358 | 8196 |
| 50 | 29 | 325 | 10525 |

@Sanne self-assigned this Oct 16, 2016
@gustavocoding (Author)

@Sanne Unfortunately, after rebasing, the changes on master are causing the perf tests to deadlock; I am investigating at the moment.

@gustavocoding (Author)

Rebased. Updated the benchmark numbers after increasing the OOB pool size and reducing the max number of put threads per node (from 80 to 50) to avoid deadlocking.

@gustavocoding (Author)

@Sanne, more numbers, this time using REPL caches for the index cache trio (lock, metadata, data):

REPL Caches for Index instead of DIST

Sync Indexing Mode

| Shards | Put threads/node | Query 90th (ms) | QPS | Puts/s |
|--------|------------------|-----------------|-----|--------|
| 1 | 3 | 0.7 | 979 | 500 |
| 1 | 10 | 0.97 | 974 | 1250 |
| 1 | 50 | 3 | 1061 | 4000 |
| 2 | 3 | 1 | 623 | 955 |
| 2 | 10 | 1.6 | 931 | 966 |
| 2 | 50 | 1.8 | 744 | 871 |
| 4 | 3 | 3.1 | 639 | 861 |
| 4 | 10 | 4 | 838 | 740 |
| 4 | 50 | 6.6 | 806 | 2000 |
| 16 | 3 | 23 | 492 | 196 |
| 16 | 10 | 70 | 266 | 327 |
| 16 | 50 | 120 | 157 | 1000 |
| 256 (master) | 3 | 180 | 9 | 75 |
| 256 (master) | 10 | 232 | 2 | 40 |
| 256 (master) | 50 | 574 | 0.9 | 23 |

Async Indexing Mode

| Shards | Put threads/node | Query 90th (ms) | QPS | Puts/s |
|--------|------------------|-----------------|-----|--------|
| 1 | 3 | 1.2 | 861 | 11494 |
| 1 | 10 | 2.6 | 815 | 11363 |
| 1 | 50 | 9.4 | 690 | 10638 |
| 2 | 3 | 1.64 | 843 | 10752 |
| 2 | 10 | 4.5 | 765 | 10869 |
| 2 | 50 | 6.6 | 732 | 11235 |
| 4 | 3 | 2.3 | 815 | 12048 |
| 4 | 10 | 6.8 | 695 | 12820 |
| 4 | 50 | 12.3 | 655 | 12820 |
| 16 | 3 | 8.8 | 687 | 9345 |
| 16 | 10 | 21.8 | 476 | 11111 |
| 16 | 50 | 39.7 | 329 | 11494 |
| 256 (master) | 3 | 648 | 26 | 1422 |
| 256 (master) | 10 | 945 | 20 | 3236 |
| 256 (master) | 50 | 1275 | 14 | 5050 |

As expected, query latency is lower than in the previous benchmarks, but latency still increases (linearly?) with the number of shards.


Sanne commented Oct 24, 2016

Very interesting, thanks.

For my part, I've made good progress on https://hibernate.atlassian.net/browse/HSEARCH-402: I think the implementation is mostly drafted out, but I haven't been able to test or benchmark it yet.

If you want to have a look:

I wouldn't use it yet, for all I know it might NPE on initialization ;)

@gustavocoding (Author)

@Sanne, last but not least, here are the numbers for the most query-friendly scenario possible, REPL index caches with no writes happening:

Query only, REPL Caches for Index instead of DIST

Sync Indexing Mode

| Shards | Query threads/node | Query 90th (ms) | QPS |
|--------|--------------------|-----------------|-----|
| 1 | 3 | 0.23 | 291 |
| 1 | 10 | 0.39 | 978 |
| 1 | 50 | 0.38 | 4889 |
| 2 | 3 | 0.68 | 284 |
| 2 | 10 | 0.47 | 953 |
| 2 | 50 | 0.52 | 4283 |
| 4 | 3 | 0.71 | 283 |
| 4 | 10 | 0.62 | 953 |
| 4 | 50 | 3.44 | 4283 |
| 16 | 3 | 1.98 | 254 |
| 16 | 10 | 2.54 | 842 |
| 16 | 50 | 77.59 | 1181 |
| 256 (master) | 3 | 46.14 | 62 |
| 256 (master) | 10 | 184.55 | 65 |
| 256 (master) | 50 | 1191 | 70 |

As expected, query latency is even lower than in the other benchmarks, but the number of shards still adds considerable latency.


gustavocoding commented Oct 27, 2016

@Sanne More tables with numbers. Same conditions as the very first benchmark (R+W REPL_DIST), but using default.reader.strategy=async and default.reader.async_refresh_period_ms=100:
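
For reference, those two reader settings written out in the same props.put style as the first snippet (property names exactly as quoted above; this is just a restatement, not new configuration):

```java
// Same props.put style as the earlier snippet; property names exactly as above.
Properties props = new Properties();   // java.util.Properties
props.put("default.reader.strategy", "async");               // asynchronous index reader refresh
props.put("default.reader.async_refresh_period_ms", "100");  // refresh readers every 100 ms
```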

Query + Writes, DIST caches, async reader

Sync Indexing Mode, InfinispanIndexManager

| Put threads/node | Query 90th (ms) | QPS | Puts/s |
|------------------|-----------------|-----|--------|
| 3 | 0.47 | 911 | 363 |
| 10 | 0.49 | 855 | 1001 |
| 50 | 0.315 | 580 | 2012 |

Sync Indexing Mode, AffinityIndexManager

| Shards | Put threads/node | Query 90th (ms) | QPS | Puts/s |
|--------|------------------|-----------------|-----|--------|
| 1 | 3 | 0.57 | 822 | 384 |
| 1 | 10 | 0.22 | 726 | 714 |
| 1 | 50 | 0.36 | 715 | 3333 |
| 2 | 3 | 0.74 | 820 | 322 |
| 2 | 10 | 0.79 | 736 | 714 |
| 2 | 50 | 2.28 | 557 | 2000 |
| 4 | 3 | 0.75 | 860 | 270 |
| 4 | 10 | 0.37 | 661 | 294 |
| 4 | 50 | 1 | 610 | 769 |
| 16 | 3 | 0.69 | 907 | 227 |
| 16 | 10 | 2.26 | 776 | 294 |
| 16 | 50 | 3.88 | 668 | 769 |
| 256 (master) | 3 | 42.21 | 309 | 185 |
| 256 (master) | 10 | 76.55 | 183 | 208 |
| 256 (master) | 50 | 114.82 | 138 | 256 |

The numbers show query latency is reduced a lot (about 10x)!

OTOH, there is still a couple of orders of magnitude gap between using fewer than 4 shards and 256.


Sanne commented Oct 27, 2016

Why is the put/s metric inversely proportional to the number of shards?

I realize that this is not the Affinity IndexManager, but I didn't expect write performance to be significantly affected with the "traditional" approach.


gustavocoding commented Oct 27, 2016

> Why is the put/s metric inversely proportional to the number of shards?

Those numbers are for sync indexing, so I suppose the more shards, the less batching and thus the more expensive the commits.

> I realize that this is not the Affinity IndexManager

Sorry, your reply got garbled. Exactly what is not the Affinity IndexManager?

@gustavocoding (Author)

@Sanne Oops, the tables were both titled InfinispanIndexManager; fixed, sorry about that.

@gustavocoding (Author)

Rebased


galderz commented Jan 10, 2017

Needs rebasing again. @gustavonalle do you need further feedback from @Sanne or anyone else?


gustavocoding commented Jan 10, 2017

@galderz Rebased. @Sanne volunteered to review this, but feel free to review if you wish.

@gustavocoding (Author)

Actually, the query testsuite on master is not stable... Better to wait before integrating this.


galderz commented Jan 23, 2017

Still needs rebasing, and the testsuite is not stable enough...

Beta2 goes out later today... is this still planned for beta2?

@gustavocoding (Author)

@galderz Let me rebase it and re-run the perf tests

@gustavocoding (Author)

@galderz Not going into Beta2; there is a deadlock when running stress tests (https://issues.jboss.org/browse/ISPN-7381), which is also present on master.

@gustavocoding removed this from the 9.0.0.Beta2 milestone Jan 23, 2017

galderz commented Jan 23, 2017

Needs rebasing. Thx for the update @gustavonalle. Closing until ISPN-7381 has been fixed. I'll add a note in the JIRA to reopen this PR when that's fixed.

@galderz closed this Jan 23, 2017
@gustavocoding reopened this Jan 23, 2017

galderz commented Jan 23, 2017

@gustavonalle Why reopen?


galderz commented Jan 23, 2017

Needs rebasing too.

@gustavocoding (Author)

I had it rebased 1h ago, need to do it again.

@gustavocoding (Author)

@galderz Why close? It can be reviewed and I'd like to have CI feedback on this. The deadlock is on master as well; I'm still investigating...


galderz commented Jan 23, 2017

Because we won't be able to integrate it until ISPN-7381 has been fixed. Keeping it open is taking CI resources away from other, higher-priority things. Once ISPN-7381 is fixed we can refocus on this.

@gustavocoding (Author)

@galderz OK, I was not aware it was because of the CI; closing for now...
