Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmark: populating your instance and stress test with locust #31

Merged
merged 2 commits into from
Mar 24, 2020

Conversation

ppanero
Copy link
Member

@ppanero ppanero commented Mar 18, 2020

Adds:

  • Populating the instance with rpm-records demo. Points to the bulk indexing used in helm-invenio, which is much more performant (3M records in a few hours). I think rdm-records should stay as is, 10 records is more than enough and is better not to add complexity. WDYT?

  • How to use locust and points to the locust.py file.

closes #34

Copy link
Collaborator

@fenekku fenekku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of clarifications. I like that we see more and more documentation for this.

docs/deployment/benchmark.md Outdated Show resolved Hide resolved
docs/deployment/benchmark.md Outdated Show resolved Hide resolved
docs/deployment/benchmark.md Outdated Show resolved Hide resolved
@@ -0,0 +1,65 @@
# Benchmarking your Invenio instance

## Populate your instance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal/discussion: What is the bigger context for why we are showing how to add fake random records to the real deployed system in the benchmarking page? Maybe this is about "Checking your deployed Invenio instance" or "Verifying your Invenio instance" i.e. 1) check that you can add records to your deployed instances 2) check what is the load your deployment can take ?

Copy link
Member

@zzacharo zzacharo Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is an optional step to engage people. They have the full experience of a deployed instance with some data there so they can see them in the UI. Personally I perceive the deployment as the containerized version but without the warning "You should not put this in production" :)

Copy link
Member Author

@ppanero ppanero Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to add fake random records to the real deployed system in the benchmarking page?

It is true I did not mention it, sorry: I wouldn't do this test in a production/real instance. Mainly because it is highly likely to take it down, several times. For example, we managed to render the DB to not respond for some time (~20mins) until the DBAs acted on it... It is quite a hardcore test.

On this, my idea was:

  • If you have a production instance, you might not want to make an effort to "duplicate" your data, nor use the prod DB/ES for stress test. However, you do want to create a sensible amount of records that represent the amount of records in prod. E.g. ES does not search as fast with 1 record than with 3M. It might need improvements on the shards rather than the application architecture, but still this test would point out slow parts.
  • You do not have a production instance. So you need to create "dummy" records to simulate what you would expect to have.

On naming on the section, I am open to change it. But I would still mention this here :) WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "Creating demo records" and explain a bit what you said about the purpose behind it? I think the 2 examples you just mentioned are quite useful for people :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some of the context/explanation/motivation you just provided would be helpful and changing the filename + main title from "Benchmarking your InvenioRDM instance" to "Stress testing your InvenioRDM instance" (or "Load testing" or "verifying" or something like that :) ) would clarify the context.

docs/deployment/benchmark.md Show resolved Hide resolved
Copy link
Collaborator

@fenekku fenekku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Last thing for me is just the filename/title change. You can merge once you are happy with the name.

docs/deployment/benchmark.md Outdated Show resolved Hide resolved
@ppanero
Copy link
Member Author

ppanero commented Mar 20, 2020

@fenekku thanks once again! I will apply the changes and merge. However, about naming: I actually prefer benchmarking, because even if we have used for stress test it doesnt have to be a stress test... You can just run your instance on a specific load and "benchmark" it.

We did stress test it by getting as high as we could (on requests haha) but it doesnt have to be that way. Plus for title its shorter. WDYT?

@fenekku
Copy link
Collaborator

fenekku commented Mar 20, 2020

I am not too attached to the name. That being said, what I find awkward is the record creation part under benchmarking... The locust part does benchmarking, but the record creation part is more for making sure the application is operating correctly, yes? Maybe I missed it, but we are not measuring anything when it comes to creating records; it's more a binary of did it work or not.

  • Maybe record creation could be placed in another file...
    or
  • Maybe "Verifying your InvenioRDM instance" verifying.md

?

@ppanero
Copy link
Member Author

ppanero commented Mar 23, 2020

I personally dont like "verifying" cuz it sounds like a "config check" or so not like testing performance. In terms of record creation, it can be placed somewhere else, however I did not want to create yet another top level section and I did not fin any fit. Where would you put it? (I'll move it and finish this PR :D)

Copy link
Collaborator

@fenekku fenekku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really small, but it actually connects the "Populate your instance" section with the "Benchmark" making its presence in this file all good for me. You can merge with this.

docs/deployment/benchmark.md Outdated Show resolved Hide resolved
@ppanero
Copy link
Member Author

ppanero commented Mar 23, 2020

@fenekku sorry about the misunderstanding on:

I will apply the changes and merge. However, about naming [...]

I hadn't commit the text with the explanations, cuz I wanted to do it just once (thats why I wanted to clarify the naming first). To avoid ping-pong, it seems I managed the contrary 👼 I'm sorry.

I have added the following explanation (see in file also):

The operations that we are going to run in this section can be highly
 demanding. Therefore, you might not want to run them against a production
 instance since it might cause downtime.

  Nonetheless, we want to have an environment that resembles as close as
 possible to our InvenioRDM instance. In consequence, we might want to add some
 data to our instance in order to have some records to stress-test against.

Which justifies the data population, and is basically what I explained above in the discussion point also with Zach.

@fenekku
Copy link
Collaborator

fenekku commented Mar 24, 2020

Aaah I get it now. All good.

@ppanero ppanero merged commit 70ef0e1 into inveniosoftware:master Mar 24, 2020
@ppanero ppanero deleted the benchmark branch March 30, 2020 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

benchmarking: document how to benchmark an instance with locust
3 participants