Cnc merge #12

Merged
merged 19 commits into from Jul 3, 2013

Conversation

Projects
None yet
4 participants

Merge in CNC classes

@divchenko divchenko commented on the diff Jun 27, 2013

...n/java/com/browseengine/bobo/api/BoboIndexReader.java
@@ -192,7 +191,9 @@ public IndexReader getInnerReader()
@Override
public synchronized IndexReader reopen() throws CorruptIndexException,
IOException {
- IndexReader newInner = null;
+ throw new UnsupportedOperationException();
+ /*
@divchenko

divchenko Jun 27, 2013

Please just remove unused code.

@jhartman

jhartman Jun 27, 2013

Sure happy to take it out. I'm not completely sure why Vlod made that change. It would be a little weird to reopen a bobo index reader anyway.

@divchenko

divchenko Jun 28, 2013

Please take it out and merge

@divchenko divchenko commented on the diff Jun 27, 2013

...in/java/com/browseengine/bobo/sort/SortCollector.java
public CollectorContext(BoboIndexReader reader, int base, DocComparator comparator) {
this.reader = reader;
this.base = base;
this.comparator = comparator;
+ _runtimeFacetMap = reader.getRuntimeFacetHandlerMap();
+ _runtimeFacetDataMap = reader.getRuntimeFacetDataMap();
+ }
+
+ public void restoreRuntimeFacets() {
@divchenko

divchenko Jun 27, 2013

When do you call this method?

@jhartman

jhartman Jun 27, 2013

I'm not sure. I think Vlod or Lei did this before they left.

@jhartman

jhartman Jun 27, 2013

Looks like it is used in the sensei result merger to get the runtime facets per index reader.

@divchenko divchenko commented on the diff Jun 27, 2013

...wseengine/bobo/facets/filter/AdaptiveFacetFilter.java
@@ -106,23 +116,37 @@ public RandomAccessDocIdSet getRandomAccessDocIdSet(BoboIndexReader reader)
// takeComplement is only used to choose between TermListRandomAccessDocIdSet and innerDocSet
int validFreqCount = _takeComplement ? (totalCount - freqCount) : freqCount;
-
- if (_facetDataCacheBuilder.getIndexFieldName() != null && ((validFreqCount<<1) < totalCount)) {
+
+ int invertedIndexCost = estimateInvertedIndexCost(validFreqCount, _valSet.size(), totalCount);
+ int forwardIndexCost = estimateForwardIndexCost(validFreqCount, _valSet.size(), totalCount);
+
+ if (_facetDataCacheBuilder.getIndexFieldName() != null && invertedIndexCost < forwardIndexCost) {
@divchenko

divchenko Jun 27, 2013

While estimation of num hits * log(num qieries) is technically correct, the comparison is only fair when inverted index hits are currently in memory. That's why you probably need to introduce inverted_index_penalty. I'd think that just a simple flag to control whether to hit inverted vs forward index for filtering would solve this problem.
I'm not totally against the change as I think it's still better than (validFreqCount<<1) < totalCount

@jhartman

jhartman Jun 27, 2013

That will not work easily because sometimes we want to go to the inverted index for some columns and other times we want the forward index.

Plus by making it configurable someone can have a different tradeoff depending on what they see based on their configuration of memory, disk, SSD, etc. All uscp data is in memory but for many cases it is still faster to use the forward index because a skip list is not that fast compared to a linear scan when the selectivity is high.

We saw CPU reduce about 15% as a result of this change.

@rahula rahula added a commit that referenced this pull request Jul 3, 2013

@rahula rahula Merge pull request #12 from linkedin/cnc-merge
Cnc merge
dfa23a2

@rahula rahula merged commit dfa23a2 into master Jul 3, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment