
Overhead does not decrease on MongoDB Cluster for read operation #744

Open

vreniers opened this issue May 25, 2015 · 5 comments

Comments

@vreniers

Hi,

I've performed various benchmarks on a local single-node setup (with MongoDB and YCSB running on the same machine) and on a MongoDB cluster of 9 nodes (1 router server, 3 config servers and 5 database shards), with no replica sets but with sharding enabled on the collection to distribute the objects evenly across the nodes.

When evaluating the performance of this abstraction layer, there is one key assumption that I make:
I assume that the overhead induced by the abstraction layer is constant per operation. On the local machine there is no network delay or packet travel time. Since the overhead should in theory remain constant, the relative overhead of Kundera should decrease as the total runtime increases due to network delay.
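
In other words: if the layer adds a constant cost c per operation and the native per-operation time is t, the relative overhead is c / t, which shrinks as t grows with network latency. For illustration, with purely hypothetical numbers: a constant 5 µs on top of a 50 µs local read is 10%, while the same 5 µs on top of a 500 µs networked read is only 1%.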

In my observations, however, I found that the overhead does not decrease as expected for the read operation. In fact, it increases slightly compared to the read overhead on the local machine. The overhead does decrease by a fair percentage for the write, read-update and update operations.

There is no doubt about my results, as I have taken a very large sample size, reading in various steps from 100K to 1,000K records from a large data set.

Is there any explanation as to why the overhead for read would not decrease in a similar fashion on the cluster? For my setup I use single-threaded execution, with a cache clear after every 1000 operations. Could it have something to do with the usage of transactions?

@devender-yadav
Contributor

@vreniers
The read overhead should not increase on the cluster because of the abstraction layer. Are you comparing the cumulative time for each operation performed via Kundera with and without the cluster setup and observing a lag?

@vreniers
Author

@devender-yadav

I'm comparing the runtime for 1 million reads on the cluster and on the local node, with and without Kundera. I have taken a lot of samples with YCSB and executed this read workload many times.

These are the results I have:

Read local node Kundera: ~ 91 seconds
Read local node Native MongoDB API: ~ 84 seconds
Overhead: 10%

Read cluster Kundera: 492 seconds
Read cluster native MongoDB API: 413 seconds
Overhead: 19%

This is not what I would expect. I assume that the overhead of a read operation in Kundera is constant, so when using the cluster the absolute overhead should remain the same per operation; since the overall runtime increases, the relative overhead compared to the MongoDB native API should decrease.
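
(Assuming the overhead percentages are computed as (Kundera time − native time) / native time, the numbers above give (91 − 84) / 84 ≈ 8% locally and (492 − 413) / 413 ≈ 19% on the cluster. If the ~7-second absolute overhead measured locally stayed constant, the cluster run via Kundera would be expected to finish in roughly 413 + 7 ≈ 420 seconds, i.e. under 2% relative overhead, rather than the observed 492 seconds.)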

I'm using transactions and a single thread in YCSB.
Any ideas on what causes this behavior?

@devender-yadav
Copy link
Contributor

@vreniers

Can you please share a few details for clarification:

  • The persistence.xml settings for the Kundera mongo persistence unit.
  • The logging level enabled for Kundera and for the native YCSB client.
  • A code snippet of the YCSB MongoDB client and the Kundera code used for connection handling.

-Devender

@vreniers
Author

Persistence.xml file:

<persistence-unit name="kundera-mongodb">
        <provider>com.impetus.kundera.KunderaPersistence</provider>
        <properties>
            <!-- <property name="kundera.nodes" value="192.168.145.168" /> -->
            <property name="kundera.nodes" value="localhost" />
            <property name="kundera.port" value="27017" />
            <property name="kundera.keyspace" value="kundera" />
            <property name="kundera.dialect" value="mongodb" />
            <property name="kundera.client.lookup.class"
                value="com.impetus.client.mongodb.MongoDBClientFactory" />
<!--            <property name="kundera.cache.provider.class" value="com.impetus.kundera.cache.ehcache.EhCacheProvider" />           -->
<!-- <property name="kundera.cache.config.resource" value="/ehcache-test.xml" /> -->
<!--            <property name="kundera.pool.size.max.active" value="5" /> -->
<!--            <property name="kundera.pool.size.max.total" value="5" /> -->
            <property name="kundera.client.property" value="kunderaMongoTest.xml" />
        </properties>
</persistence-unit>

The kunderaMongoTest.xml contains read.preference primary. However, this should not matter,
since I'm not using replica sets. The connection pool was enabled during the benchmark, but since I'm only testing with a single thread in YCSB this should not influence performance: each read has to wait for the previous one, so multiple connections can't be used. The same settings were used on the local node.

This is my read operation in Kundera.

// Obtain an EntityManager from the factory for this read
EntityManager em = emf.createEntityManager();

// Primary-key lookup on the User entity
User u = em.find(User.class, key);

// Clear the persistence context every 1000 operations to limit caching
if(amountOps++ % 1000 == 0)
      em.clear();

em.close();

The logging should be disabled. The YCSB MongoDB client is based on the one from your benchmark; I've modified it slightly to bring it up to date with the latest driver.
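
For reference, a native read of the same shape would look roughly like the sketch below. This is a minimal illustration using the legacy MongoDB Java driver API, not the actual YCSB client; the database name is taken from the persistence unit above and the collection name "User" is an assumption.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

// Connect to the same MongoDB instance the Kundera unit points at
MongoClient mongo = new MongoClient("localhost", 27017);
DB db = mongo.getDB("kundera");                 // database name assumed from kundera.keyspace
DBCollection users = db.getCollection("User");  // collection name assumed from the entity name

// Primary-key lookup, the native counterpart of em.find(User.class, key)
DBObject doc = users.findOne(new BasicDBObject("_id", key));

mongo.close();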

@devender-yadav
Contributor

@vreniers

What is the logging level used with Kundera and the native mongo client, and which request distribution are you using: uniform or Zipfian?
It would help in verifying the issue if you could share the native client code. You can send it to kundera@impetus.co.in.

-Devender
