Raise an error when attempting to bulk-index without any corpora #568

Closed
danielmitterdorfer opened this issue Sep 6, 2018 · 2 comments
Labels: enhancement, good first issue, help wanted, :Usability

Comments

@danielmitterdorfer
Member

source: https://discuss.elastic.co/t/no-throughput-metrics-available-for-bulk-likely-cause-the-benchmark-ended-already-during-warmup/147368/8

The root cause of the problem in the discussion above was that the user used outdated track syntax and wondered why no documents were bulk-indexed. We could make this more explicit by raising an error in the bulk parameter source if the list of corpora is empty.

danielmitterdorfer added the enhancement, help wanted, and :Usability labels on Sep 6, 2018
danielmitterdorfer added this to the 1.x milestone on Sep 6, 2018
danielmitterdorfer added the good first issue label on Feb 19, 2020
@bartier
Contributor

bartier commented Apr 13, 2020

Hi! Can I try to work on this?

I tested this by running Rally as shown below, after removing the corpora definition for the 'percolator' track from my local default repository to force the error:

./rally --track=percolator --challenge=append-no-conflicts --kill-running-processes --distribution-version 7.6.0

~/.rally/benchmarks/tracks/default/percolator/track.json

{% import "rally.helpers" as rally with context %}

{
  "version": 2,
  "description": "Percolator benchmark based on AOL queries",
  "indices": [
    {
      "name": "queries",
      "body": "index.json"
    }
  ],
  "operations": [
    {{ rally.collect(parts="operations/*.json") }}
  ],
  "challenges": [
    {{ rally.collect(parts="challenges/*.json") }}
  ]
}


I got the following error:
image

Is this a valid way to reproduce this error? If yes, I would propose your validation suggestion in TrackSpecificationReader#_create_corpora if the list of corpora is empty:

    def _create_corpora(self, corpora_specs, indices):
        if len(corpora_specs) == 0:
            raise exceptions.TrackConfigError(f"There is no document corpora definition for track {self.name}.")
        document_corpora = []
        known_corpora_names = set()
        ...
        ...

With the implementation above I got the TrackConfigError when trying to use a track without any corpora:
image

By the way, there is a None in the error message, and I'm not sure whether it can be hidden from the user.

@danielmitterdorfer
Member Author

It is perfectly fine to define a track without a corpus (for example if you only want to run a query benchmark). I'd instead modify the constructor of BulkIndexParamSource. Here it determines which corpora should be used.

self.corpora = self.used_corpora(track, params)

After that line, I'd add a check for whether that list is empty and, if it is, raise exceptions.InvalidSyntax.
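
A minimal sketch of that check, assuming simplified stand-ins rather than Rally's real classes: `InvalidSyntax` below is a local placeholder for `exceptions.InvalidSyntax`, the track is modeled as a plain dict, and the body of `used_corpora` and the error wording are hypothetical. Only the position of the check (right after `self.corpora` is assigned) comes from the suggestion above.

```python
class InvalidSyntax(Exception):
    """Local placeholder for Rally's exceptions.InvalidSyntax (assumption)."""


class BulkIndexParamSource:
    """Simplified stand-in; the real constructor does considerably more."""

    def __init__(self, track, params):
        self.corpora = self.used_corpora(track, params)
        # Proposed check: fail early instead of silently indexing nothing.
        if not self.corpora:
            raise InvalidSyntax(
                "There is no document corpus definition for track %s. "
                "Bulk-indexing requires at least one corpus." % track["name"]
            )

    def used_corpora(self, track, params):
        # Hypothetical selection logic: take the track's corpora,
        # optionally restricted to the names listed in params["corpora"].
        wanted = params.get("corpora")
        return [
            c for c in track.get("corpora", [])
            if wanted is None or c["name"] in wanted
        ]
```

With this in place, a track without corpora can still be loaded and used for query-only benchmarks; the error only surfaces once a bulk operation is actually constructed.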

By the way, there is a None in the error message, and I'm not sure whether it can be hidden from the user.

Good point; this is likely due to the top-level error handler:

rally/esrally/rally.py

Lines 746 to 760 in cc2296b

logging.getLogger(__name__).exception("Cannot run subcommand [%s].", sub_command)
msg = str(e.message)
nesting = 0
while hasattr(e, "cause") and e.cause:
    nesting += 1
    e = e.cause
    if hasattr(e, "message"):
        msg += "\n%s%s" % ("\t" * nesting, e.message)
    else:
        msg += "\n%s%s" % ("\t" * nesting, str(e))
console.error("Cannot %s. %s" % (sub_command, msg))
console.println("")
print_help_on_errors()
return False

I assume that this exception has no cause attached and we mistakenly extract None at some point. IMHO this should be solved separately from this issue, though.
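
For reference, the message-flattening logic of that handler can be isolated into a small function and made defensive against missing messages. This is only a sketch: `RallyLikeError` and `flatten_messages` are hypothetical names, and the thread does not establish where the None actually originates, so this is not a confirmed fix.

```python
class RallyLikeError(Exception):
    """Hypothetical exception carrying a message and an optional cause."""

    def __init__(self, message, cause=None):
        super().__init__(message)
        self.message = message
        self.cause = cause


def flatten_messages(e):
    """Join the messages of an exception and its chain of causes.

    Each nested cause goes on its own line, indented by one tab per level.
    Falls back to str(e) whenever a `message` attribute is absent or None.
    """
    text = getattr(e, "message", None)
    msg = str(e) if text is None else str(text)
    nesting = 0
    while getattr(e, "cause", None):
        nesting += 1
        e = e.cause
        text = getattr(e, "message", None)
        if text is None:
            text = str(e)
        msg += "\n%s%s" % ("\t" * nesting, text)
    return msg
```

Keeping the formatting in a pure function like this would also make it easy to unit-test separately from the console output.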
