Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize index boost and phrase suggest collation keys in a consistent order #20081

Closed
wants to merge 8 commits into from

Conversation

chengpohi
Copy link
Contributor

@chengpohi chengpohi commented Aug 19, 2016

Closes #19986

This cache Key not equal issue is caused by when stream out Map and stream in Map has the different keys order. so the generated bytes are not equal

The possible affected maps include: ObjectFloatHashMap , HashMap. for LinkedHashMap has accessOrder to keep the order.

  1. when ObjectFloatHashMap insert values, it will calculate slot by hashKey and a random generated keyMixer, so even though same data has the different slot(means different order).
    stream out and stream in maybe will generate different orders key-values.
  protected  int hashKey(KType key) {
    assert !((key) == null); // Handled as a special case (empty slot marker).
    return BitMixer.mix(key, this.keyMixer);
  }
  1. HashMap, although HashMap doesn't have random seed to calculate slot, for HashMap public HashMap(Map<? extends K, ? extends V> m), this constructor will pre-size the HashMap, Example(PhraseSuggestionBuilder):
    public PhraseSuggestionBuilder collateParams(Map<String, Object> collateParams) {
        Objects.requireNonNull(collateParams, "collate parameters cannot be null.");
        this.collateParams = new HashMap<>(collateParams); //putAll collateParams to the HashMap
        return this;
    }

this will cause if we use this to construct HashMap, use other HashMap constructor public HashMap(int initialCapacity) to stream in, this will not pre-size:

    private Map readHashMap() throws IOException {
        int size10 = readVInt();
        Map map10 = new HashMap(size10);
        for (int i = 0; i < size10; i++) {
            map10.put(readString(), readGenericValue());
        }
        return map10;
    }

it will cause the different keys order because slot calculate factor is not same.

Solution:

  1. When stream out the ObjectFloatHashMap, try to order their String key to ensure every request stream out have same bytes order, like:
+        List<Tuple<String, Float>> ibs = StreamSupport
+            .stream(indexBoost.spliterator(), false)
+            .map(i -> tuple(i.key, i.value)).sorted((o1, o2) -> o1.v1().compareTo(o2.v1()))
+            .collect(Collectors.toList());
  1. for PhraseSuggestionBuilder collateParams custom map writer with consistent order

@@ -250,11 +254,7 @@ public void writeTo(StreamOutput out) throws IOException {
boolean hasIndexBoost = indexBoost != null;
out.writeBoolean(hasIndexBoost);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are able to break the wire protocol, can you remove the boolean and just use size=0 on read to avoid building the map? May as well size a boolean over the wire and in the cache key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 , good call 👍

@nik9000
Copy link
Member

nik9000 commented Aug 19, 2016

Fix looks right to me. I left a suggestion for a thing that'd be nice to touch while you are there but isn't required.

@nik9000
Copy link
Member

nik9000 commented Aug 19, 2016

Oh! Can you update the PR description to make to make it clear what the bug you were fixing is? When we cut the release notes we do it with links to PRs rather than issues so it is convenient to have a useful description in the PR.


private void assertMapsOrder(Iterator<Map.Entry<String, Object>> streamInMap, Iterator<Map.Entry<String, Object>> streamOutMap) {
while(streamInMap.hasNext()) {
assertEquals(streamInMap.next().getKey(), streamOutMap.next().getKey());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the method's name I expected it to check the values too.

* write map to stream with consistent order
* to make sure every map generated bytes order are same
*/
public void writeMapWithConsistentOrder(@Nullable Map<String, Object> map) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is for writing map to stream have the consistent order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this makes reading the read and write side easy. Can you add a javadoc to the method saying it is compatible with readMap and readGenericValue?

Can you add a dedicated test case for this method in BytesStreamsTests?

I think we shouldn't try to handle sorting recursively in this, at least for now. Can you say something about only sorting the keys of the map, not maps contained within the map?

Maybe the signature for the method should be public <K extends Comparable<K>> void writeMapWithConsistentOrder(@Nullable Map<K, ? extends Object> map) throws IOException? That way it can function on Map<String, String> and Map<Integer, String> or any other weirdness we need to throw at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, If don't care about the value's type, can just use writeGenericValue.

About the generic type for key, I don't think we can do it, because if we use generic type for key, we need to use writeGenericValue for writing key. but for writeGenericValue, it will write extra byte for Type, like String use byte 0 for String type. so if we do this, we need to create a new method maybe like readGenericHashMap.
but this will cause writeMapWithConsistentOrder method is incompatible with readMap and readGenericValue. so How about your idea?

and for the other writeMap, they also use String type as key. maybe we can transform Integer to String when we need to writeMap with consistent order.


public void testWriteMapWithConsistentOrder() throws IOException {
Map<String, Object> map = new HashMap<>();
map.put("gWrgS", randomAsciiOfLength(5));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hardcode these test keys?

For the HashMap default capacity is 16, and threshold is 12(DEFAULT_LOAD_FACTOR(0.75) * DEFAULT_INITIAL_CAPACITY(16)).

for these keys, their slots should be:

Key Slot
gWrgS 10
HLRYi 4
HyKnF 12

and for Map<String, Object> reOrderMap = new HashMap<>(map); it will recalculate capacity size and threshold, capacity: 8, threshold: 6,

and the generated slots should be:

Key Slot
gWrgS 2
HLRYi 4
HyKnF 4 (collision!!! )

when met collision, it will append this key into the existed node.

for these test keys, when met collision,this two maps order will not equal.

but these two map generated bytes still should be equal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super comfortable relying on the internals of HashMap for this. I wonder if we could get away instead with two TreeMaps, one with natural ordering and one with Comparator.reverse().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, TreeMap is a good choice to generate the different order Map.

BytesStreamOutput output1 = new BytesStreamOutput();
Map<String, Object> map1 = new LinkedHashMap<>();
try {
output1.writeMapWithConsistentOrder(map1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use expectThrows for this sort of testing. something like

Throwable e = expectThrows(AssertionError.class, () -> output.writeMapWithConsistentOrder(map));
assertEquals("Use writeMap with LinkedHashMaps", e.getMessage());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nearly not know this usage. :)

@nik9000
Copy link
Member

nik9000 commented Aug 24, 2016

Looks good to me. I'll yank it locally now and test. Assuming everything passes I'll merge. Thanks for fixing this!

import static java.util.Collections.emptyList;
import static org.elasticsearch.search.builder.SearchSourceBuilderTests.createSearchSourceBuilder;

public class SearchSourceBuilderTest extends ESTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build failed because this test should be named SearchSourceBuilderTests with a s on the end. There is actually already a file with that name so maybe these should be just be combined in there?

@nik9000
Copy link
Member

nik9000 commented Aug 24, 2016

Build failed locally - I left a comment explaining the failure. You can catch similar failures with gradle precommit which'll run things like checkstyle. gradle check will run all the tests will take a long time. I have to do that before I can merge it and now that we're close to ready to merge it is probably worth it for you to get that passing locally.

@chengpohi
Copy link
Contributor Author

oh, very sorry about this, I only run gradle test in my local.

@nik9000
Copy link
Member

nik9000 commented Aug 26, 2016

Testing it locally and everything passing so far. This doesn't merge cleanly with master but the manual merge is fairly obvious. Once it passes locally I'll manually squash and merge.

@chengpohi
Copy link
Contributor Author

chengpohi commented Aug 26, 2016

Thanks @nik9000

@nik9000
Copy link
Member

nik9000 commented Aug 26, 2016

Thanks nik9000

Thanks for fixing this!

@nik9000
Copy link
Member

nik9000 commented Aug 26, 2016

Squashed and merged as 22242ec.

@nik9000 nik9000 closed this Aug 26, 2016
@clintongormley clintongormley changed the title When serialization indexBoost the keys should have unique order When serializing indexBoost the keys should have a deterministic order Aug 29, 2016
writeByte((byte) -1);
return;
}
assert false == (map instanceof LinkedHashMap);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be an IllegalStateException rather than assertion. @nik9000 what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fairly safe as an assertion. Mostly we are in control of the types we pass in here. Any LinkedHashMap isn't wrong to pass here, it is just super silly because we'll reorder it. If we want something stronger than an assertion then maybe we should do something like

if (map instanceof LinkedHashMap) {
    // Already has consistent order
    writeMap(map);
    return;
}

?

javanna added a commit that referenced this pull request Aug 29, 2016
Now that #20081 is merged we can check that cacheKey is consistent across equal search requests, something that wasn't true before due to ordering of map keys when using index boost.

Relates to #19986
@nik9000 nik9000 changed the title When serializing indexBoost the keys should have a deterministic order Serialize index boost and phrase suggest collation keys in a consistent order Aug 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants