Insert big docs #250

Merged
merged 3 commits into from Jun 21, 2019

ldennis
Contributor

@ldennis ldennis commented Jun 21, 2019

This adds a new workload that inserts big documents.

@ldennis ldennis requested a review from a team as a code owner June 21, 2019 15:48
@ldennis ldennis requested a review from guoyr June 21, 2019 15:49
Contributor

@guoyr guoyr left a comment

💯

Collaborator

@rtimmons rtimmons left a comment

Overall looks good, thanks for adding. A couple of comments for you to consider, but there's nothing wrong with this as it is if it measures what you're looking for.

src/workloads/scale/InsertBigDocs.yml (Outdated)
BatchSize: 1
Document:
  x: {^RandomInt: {min: 0, max: 2147483647}}
  string0: {^RandomString: {length: 15000000}}
Collaborator

  1. I would suggest using string0: {^FastRandomString: {length: 15000000}}. The tradeoff is slightly less entropy but much faster string generation. I made this change on my laptop and genny was able to produce writes at around 80 MB/sec versus only 25 MB/sec with regular RandomString.

  2. Consider dividing the load among a number of threads, i.e. change Threads: 1 to something like Threads: 10 and then divide DocumentCount: by the new thread count to keep the total number of documents the same. This will of course change what the workload does (multiple writes in parallel), so it may not be what you're aiming for. But since the workload is single-threaded as it stands, each insert_one invocation has to wait for genny to produce the next document.
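To put rough numbers on the two suggestions above, here is a minimal Python sketch. It is only arithmetic, not anything genny runs. It assumes that "mb" in the measurements means megabytes of 1,000,000 bytes, that the 15000000-character string dominates the document size, and that a single DocumentCount value applies to each thread (as the comment implies). The function names and the total of 1000 documents are hypothetical; the PR does not state the actual DocumentCount.

```python
from __future__ import annotations

from typing import Tuple

# Length of string0 in the workload; the small integer field x is ignored.
DOC_BYTES = 15_000_000


def docs_per_second(throughput_mb_per_sec: float, doc_bytes: int = DOC_BYTES) -> float:
    """Approximate documents produced per second (assumes 1 MB = 1,000,000 bytes)."""
    return throughput_mb_per_sec * 1_000_000 / doc_bytes


def per_thread_document_count(total_docs: int, threads: int) -> Tuple[int, int]:
    """Return (count per thread, leftover documents) when total_docs is split evenly.

    A non-zero leftover means the total cannot be preserved exactly with one
    shared per-thread count.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return divmod(total_docs, threads)


if __name__ == "__main__":
    # Rates are the laptop measurements quoted in the review comment.
    for name, rate in [("RandomString", 25), ("FastRandomString", 80)]:
        print(f"{name}: ~{docs_per_second(rate):.2f} docs/sec")

    # 1000 is a hypothetical total, used only to illustrate the division.
    per_thread, leftover = per_thread_document_count(1000, 10)
    print(f"Threads: 10 -> DocumentCount: {per_thread} (leftover {leftover})")
```

Under these assumptions the quoted rates correspond to roughly 1.7 documents per second with RandomString and 5.3 with FastRandomString, which is why a single thread spends most of its time waiting on document generation between inserts.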

Contributor

  1. Yeah, good call. FastRandomString is perfect for this use case.
  2. I'm pretty sure we want just 1 thread for this particular workload. The 10-thread case may be interesting but can be done as future work (in the form of another phase).

@@ -36,7 +36,7 @@
 'console_scripts': [
     'genny-metrics-report = genny.cedar_report:main__cedar_report',
     'genny-metrics-legacy-report = genny.legacy_report:main__legacy_report',
-    'lint-yaml = genny.workload_linter:main'
+    'lint-yaml = genny.yaml_linter:main'
Collaborator

Thanks!
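For readers unfamiliar with the entry being changed in that diff: a console_scripts entry has the form name = module:function, and the installed command imports the module and calls the function. The change touches only the module part of the lint-yaml target. The sketch below shows how such a string can be split and resolved. The helper names parse_entry_point and load_entry_point are hypothetical and are not part of genny or setuptools; this is an illustration, not how the packaging tools are implemented.

```python
import importlib
from typing import Callable, Tuple


def parse_entry_point(spec: str) -> Tuple[str, str, str]:
    """Split 'name = package.module:attr' into (name, module, attr)."""
    name, _, target = spec.partition("=")
    module, _, attr = target.strip().partition(":")
    name = name.strip()
    if not name or not module or not attr:
        raise ValueError(f"malformed entry point: {spec!r}")
    return name, module, attr


def load_entry_point(spec: str) -> Callable:
    """Import the module named in spec and return the referenced callable.

    Raises ImportError if the module path does not exist, which is the kind of
    failure a wrong module name in an entry point would cause at run time.
    """
    _, module, attr = parse_entry_point(spec)
    return getattr(importlib.import_module(module), attr)


if __name__ == "__main__":
    print(parse_entry_point("lint-yaml = genny.yaml_linter:main"))
    # Resolve a standard-library target so the demo runs without genny installed.
    dumps = load_entry_point("demo = json:dumps")
    print(dumps([1, 2, 3]))
```

Parsing needs nothing installed, while loading requires the target module to be importable, so the demo resolves json:dumps instead of the genny module.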

Contributor

guoyr commented Jun 21, 2019

Stealing this PR 😛 since I promised @ldennis to get some DSI results today.

@guoyr guoyr merged commit 261e280 into mongodb:master Jun 21, 2019
3 participants