NullPointerExceptions during stress testing with ORecordCacheSoftRefs #6686

agarciadom opened this issue Sep 12, 2016 · 2 comments

agarciadom commented Sep 12, 2016

OrientDB Version, operating system, or hardware.

2.2.8

Operating System

Linux (Ubuntu 16.04)

Expected behavior and actual behavior

During a stress test with 8+ concurrent connections (on a 2-core machine with hyperthreading) to an embedded local database with 5M+ nodes, while using ORecordCacheSoftRefs, Orient will sometimes fail to retrieve records due to NullPointerExceptions originating in the evictStaleEntries method of the underlying OSoftRefsHashMap cache.

In some heavily starved situations, reverseLookup.remove(sv) will return null:

  private void evictStaleEntries() {
    int evicted = 0;

    Reference<? extends V> sv;
    while ((sv = refQueue.poll()) != null) {
      hashCodes.remove(reverseLookup.remove(sv));
      evicted++;
    }

    if (evicted > 0)
      OLogManager.instance().debug(this, "Evicted %d items", evicted);
  }
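
One way a null key could turn into a NullPointerException here, assuming (this is not confirmed in the report) that hashCodes is backed by a ConcurrentHashMap: that class rejects null keys, so remove(null) throws, whereas a plain HashMap would quietly return null. A minimal standalone sketch, with a hypothetical class name NullKeyDemo:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not part of OrientDB.
class NullKeyDemo {
    /** Returns true if map.remove(null) throws a NullPointerException. */
    public static boolean removeNullThrows(Map<String, String> map) {
        try {
            map.remove(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // HashMap permits null keys, so removing one is harmless.
        System.out.println(removeNullThrows(new HashMap<>()));           // prints "false"
        // ConcurrentHashMap does not permit null keys and throws instead.
        System.out.println(removeNullThrows(new ConcurrentHashMap<>())); // prints "true"
    }
}
```

If the backing map is of a different type, the exception may come from somewhere else in the call chain; the stack trace would settle that.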

Steps to reproduce the problem

  • Create N*2 ODatabaseDocumentTx instances pointing to the same plocal:// embedded database.
  • Run a long-running query in all of them at once.
  • Some of the initial queries will work, but once Java starts to reclaim memory, some of the queries will start to fail.

I am not sure why the reverseLookup.remove(sv) call might be returning null: this happens even if I use a simple ThreadLocal<ODatabaseDocumentTx> instead of one of Orient's pools (ensuring each thread has its own instance).
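
The steps above, including the one-connection-per-thread variant, could be driven by a small harness along these lines. This is only a sketch: StressHarness, run, openConnection and query are hypothetical names, and the connection type is left generic so the block compiles without OrientDB on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical harness; not part of OrientDB.
class StressHarness {
    /**
     * Runs `query` once on each of `workers` threads, released at the same moment,
     * each thread using its own connection. Returns how many queries threw.
     * Connections are not closed here; a real test should close them.
     */
    public static <C, R> int run(int workers, Supplier<C> openConnection, Function<C, R> query) {
        ThreadLocal<C> connection = ThreadLocal.withInitial(openConnection);
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<R>> results = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                results.add(pool.submit(() -> {
                    start.await();                      // wait until every worker is queued
                    return query.apply(connection.get());
                }));
            }
            start.countDown();                          // release all workers at once
            int failures = 0;
            for (Future<R> f : results) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    failures++;                         // the query threw on its worker thread
                }
            }
            return failures;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For the scenario in this report, openConnection would open an ODatabaseDocumentTx on the plocal:// URL and query would run the long-running query; a non-zero return value would then correspond to the failures described above.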

Proposed fix

Adding a simple null check to the method seems to work fine:

    private void evictStaleEntries() {
        int evicted = 0;

        Reference<? extends V> sv;
        while ((sv = refQueue.poll()) != null) {
            final K key = reverseLookup.remove(sv);
            if (key != null) {
                hashCodes.remove(key);
                evicted++;
            }
        }

        if (evicted > 0)
            OLogManager.instance().debug(this, "Evicted %d items", evicted);
    }
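
The guarded loop can be exercised deterministically without waiting for the garbage collector. The sketch below uses a hypothetical, simplified stand-in class (GuardedSoftCache, not OrientDB's OSoftRefsHashMap), assumes ConcurrentHashMap backing maps, and calls Reference.enqueue() by hand to simulate the collector queuing a reference whose reverse mapping is already gone:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for a soft-reference cache.
class GuardedSoftCache<K, V> {
    private final Map<K, SoftReference<V>> hashCodes = new ConcurrentHashMap<>();
    private final Map<Reference<? extends V>, K> reverseLookup = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> refQueue = new ReferenceQueue<>();

    public SoftReference<V> put(K key, V value) {
        SoftReference<V> ref = new SoftReference<>(value, refQueue);
        hashCodes.put(key, ref);
        reverseLookup.put(ref, key);
        return ref;
    }

    /** Drops only the reverse mapping, simulating a reference that is no longer tracked. */
    public void forgetReverseMapping(Reference<? extends V> ref) {
        reverseLookup.remove(ref);
    }

    public int size() {
        return hashCodes.size();
    }

    /** Drains the queue with the null check in place; returns the number of entries evicted. */
    public int evictStaleEntries() {
        int evicted = 0;
        Reference<? extends V> sv;
        while ((sv = refQueue.poll()) != null) {
            final K key = reverseLookup.remove(sv);
            if (key != null) {          // skip references with no reverse mapping
                hashCodes.remove(key);
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        GuardedSoftCache<String, Object> cache = new GuardedSoftCache<>();
        SoftReference<Object> tracked = cache.put("a", new Object());
        SoftReference<Object> untracked = cache.put("b", new Object());
        cache.forgetReverseMapping(untracked);
        tracked.enqueue();              // simulate the GC queuing both references
        untracked.enqueue();
        System.out.println(cache.evictStaleEntries()); // prints "1"
    }
}
```

In this sketch the entry for "b" stays in hashCodes after eviction, so the null check prevents the exception but does not by itself explain or clean up after a missing reverse mapping.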

tglman commented Sep 13, 2016

hi @bluezio,

Nice investigation, and the fix you suggest makes sense. Feel free to open a pull request for it; we will be happy to merge it.

In any case we are going to double check that part.

bye

tglman commented Sep 14, 2016

Manually merged, closing this.

tglman closed this as completed on Sep 14, 2016
robfrank modified the milestones: 2.2.x (next hotfix), 2.2.10 on Sep 15, 2016