-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
benchmark: populating your instance and stress test with locust #31
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of clarifications. I like that we see more and more documentation for this.
@@ -0,0 +1,65 @@ | |||
# Benchmarking your Invenio instance | |||
|
|||
## Populate your instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normal/discussion: What is the bigger context for why we are showing how to add fake random records to the real deployed system in the benchmarking page? Maybe this is about "Checking your deployed Invenio instance" or "Verifying your Invenio instance" i.e. 1) check that you can add records to your deployed instances 2) check what is the load your deployment can take ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it is an optional step to engage people. They have the full experience of a deployed instance with some data there so they can see them in the UI. Personally I perceive the deployment as the containerized
version but without the warning "You should not put this in production" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how to add fake random records to the real deployed system in the benchmarking page?
It is true I did not mention it, sorry: I wouldn't do this test in a production/real instance. Mainly because it is highly likely to take it down, several times. For example, we managed to render the DB to not respond for some time (~20mins) until the DBAs acted on it... It is quite a hardcore test.
On this, my idea was:
- If you have a production instance, you might not want to make an effort to "duplicate" your data, nor use the prod DB/ES for stress test. However, you do want to create a sensible amount of records that represent the amount of records in prod. E.g. ES does not search as fast with 1 record than with 3M. It might need improvements on the shards rather than the application architecture, but still this test would point out slow parts.
- You do not have a production instance. So you need to create "dummy" records to simulate what you would expect to have.
On naming on the section, I am open to change it. But I would still mention this here :) WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "Creating demo records" and explain a bit what you said about the purpose behind it? I think the 2 examples you just mentioned are quite useful for people :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding some of the context/explanation/motivation you just provided would be helpful and changing the filename + main title from "Benchmarking your InvenioRDM instance" to "Stress testing your InvenioRDM instance" (or "Load testing" or "verifying" or something like that :) ) would clarify the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Last thing for me is just the filename/title change. You can merge once you are happy with the name.
@fenekku thanks once again! I will apply the changes and merge. However, about naming: I actually prefer benchmarking, because even if we have used for stress test it doesnt have to be a stress test... You can just run your instance on a specific load and "benchmark" it. We did stress test it by getting as high as we could (on requests haha) but it doesnt have to be that way. Plus for title its shorter. WDYT? |
I am not too attached to the name. That being said, what I find awkward is the record creation part under benchmarking... The locust part does benchmarking, but the record creation part is more for making sure the application is operating correctly, yes? Maybe I missed it, but we are not measuring anything when it comes to creating records; it's more a binary of did it work or not.
? |
I personally dont like "verifying" cuz it sounds like a "config check" or so not like testing performance. In terms of record creation, it can be placed somewhere else, however I did not want to create yet another top level section and I did not fin any fit. Where would you put it? (I'll move it and finish this PR :D) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really small, but it actually connects the "Populate your instance" section with the "Benchmark" making its presence in this file all good for me. You can merge with this.
@fenekku sorry about the misunderstanding on:
I hadn't commit the text with the explanations, cuz I wanted to do it just once (thats why I wanted to clarify the naming first). To avoid ping-pong, it seems I managed the contrary 👼 I'm sorry. I have added the following explanation (see in file also):
Which justifies the data population, and is basically what I explained above in the discussion point also with Zach. |
Aaah I get it now. All good. |
Adds:
Populating the instance with rpm-records demo. Points to the bulk indexing used in helm-invenio, which is much more performant (3M records in a few hours). I think
rdm-records
should stay as is, 10 records is more than enough and is better not to add complexity. WDYT?How to use locust and points to the locust.py file.
closes #34