Performance suffers near 2.8 million docs #1696

Closed
wojons opened this issue Nov 23, 2013 · 16 comments

@wojons
Contributor

wojons commented Nov 23, 2013

I have a table with about 2.8 million docs according to the approximation system. Every time I get near this number, whether on a cluster of machines or a single machine, I start getting weird issues. For example, when trying to write 200 docs I will get socket timeout errors. This insert is the only operation going on.
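
Roughly, the write that times out is just a plain batch insert, something like this (a minimal sketch using the Python driver; the database, table, and document fields are placeholders, not my actual schema):

```python
import rethinkdb as r  # driver import style from the 1.x era

# Connect to a single node; host/port/db here are placeholders.
conn = r.connect(host="localhost", port=28015, db="mydb")

# Build a batch of ~200 docs and insert them in one query.
batch = [{"id": i, "payload": "..."} for i in range(200)]
result = r.table("events").insert(batch).run(conn)
print(result)  # e.g. {'inserted': 200, 'errors': 0, ...} when the write succeeds
```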

@wojons
Contributor Author

wojons commented Nov 23, 2013

Also, not sure if this will be useful at all, but if I rename the table, then make a new table with the same name as the old one and start inserting data into the new table, there is no issue anymore. I also have 2 secondary indexes on the table, if that has anything to do with it.
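
For context, recreating the table under the old name and re-adding its two indexes looks roughly like this (a sketch with the Python driver; the rename of the old table is left out, and all table and field names are placeholders):

```python
import rethinkdb as r

conn = r.connect("localhost", 28015)

# Fresh table under the old name (the original having been renamed out of the way).
r.db("mydb").table_create("events").run(conn)

# Recreate the two secondary indexes; the field names are placeholders.
r.db("mydb").table("events").index_create("created_at").run(conn)
r.db("mydb").table("events").index_create("user_id").run(conn)
```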

@coffeemug
Contributor

This sounds like it might be a caching issue. I'd like to wait until the new cache gets in, and then test this. Moving to subsequent so we can keep track of it.

@coffeemug
Contributor

FYI, relevant cache issues are #1642 and #97.

@wojons
Contributor Author

wojons commented Nov 23, 2013

Not that I think you're wrong or anything, but normally when I have seen this sort of thing it's an issue with the amount of time it takes to find and move objects within the tree so the new object can go there. Also, this was very sudden: 2.6 million was fine, I think 2.7 million was fine, then just suddenly things started failing.

@coffeemug
Contributor

The suddenness of it would imply that there may be a significant part of the tree that can no longer be accessed from RAM directly and needs to be read in from disk. It's just a hypothesis though -- definitely could be wrong as I haven't run any tests. I'd prefer to wait until the new cache is in to retest this (there of course might be other things we need to fix then), but we'll see what we can do.

@wojons
Contributor Author

wojons commented Nov 23, 2013

Would it be useful if I remade the table and reimported everything, and this time gave it 4 GB of RAM? I can also look into dropping one or both secondary indexes.
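
Dropping an index would be along these lines (sketch with the Python driver; the table and index names are placeholders):

```python
import rethinkdb as r

conn = r.connect("localhost", 28015)

# See which secondary indexes the table currently has.
print(r.db("mydb").table("events").index_list().run(conn))

# Drop one of them before re-running the import.
r.db("mydb").table("events").index_drop("created_at").run(conn)
```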

@coffeemug
Contributor

Yes -- these would be very helpful tests, thank you!

@wojons
Contributor Author

wojons commented Nov 24, 2013

@coffeemug I spent some time working with @danielmewes on this. I dropped a secondary index, but the system got into a weird state, so I dropped the table. We then ran it with a table with a 4 GB cache (but because of a cache bug it became 16 GB, so we then lowered it to 2 GB so it would be 8 GB of cache). Something interesting is that the problem still occurred around the 2.7 million record mark. We then made a change to try it with --no-direct-io, and since then no problems have occurred.

@wojons
Contributor Author

wojons commented Dec 4, 2013

@coffeemug how would you like to proceed with this?

@coffeemug
Contributor

@wojons -- thanks for the ping. We are working through the performance issues now and there is a lot of stuff happening internally to get these all fixed. There are some issues we're working on that might resolve this one, so unfortunately we have to wait until those get into next. So the only thing to do here is wait.

We definitely did not forget about this. It will be tested and fixed before the LTS. I'm sorry it's going to take a bit of time, but unfortunately that's out of my hands.

@wojons
Contributor Author

wojons commented Dec 4, 2013

Oh, it's fine. I just wanted to make sure you had the information about the memory stuff related to it. I know the new cache system should help a lot with these types of issues.

@wojons
Contributor Author

wojons commented Dec 6, 2013

Just wanted to add that I split the data between 2 tables with a 1023 MB cache on 1.11.1 and there was no performance issue at all.
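
The split itself can be done client-side, e.g. by hashing each document's primary key to pick one of the two tables (a rough sketch with the Python driver; the table names and routing scheme here are just illustrative, not necessarily exactly what I did):

```python
import hashlib
import rethinkdb as r

conn = r.connect("localhost", 28015, db="mydb")

TABLES = ["events_a", "events_b"]  # placeholder names for the two tables

def target_table(doc):
    """Pick one of the two tables by hashing the document's primary key."""
    h = int(hashlib.md5(str(doc["id"]).encode()).hexdigest(), 16)
    return TABLES[h % len(TABLES)]

def insert_doc(doc):
    # Route each write to whichever table its id hashes to.
    return r.table(target_table(doc)).insert(doc).run(conn)
```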

@coffeemug
Contributor

@wojons -- one more question. Was the underlying disk an SSD or a rotational drive? (can't believe I didn't ask this before)

@wojons
Contributor Author

wojons commented Dec 17, 2013

Spinning disk.

@coffeemug
Contributor

OK, this is pretty clearly a caching/RAM issue. Moving to backlog. I think a good resolution here would be to give an indication in the web UI that the cache is thrashing, but this is outside the scope of the LTS.

@danielmewes
Member

A lot has been improved since this issue was observed. I suspect this might have been fixed.
