Validations support #44
Conversation
@@ -104,33 +106,125 @@ public static void runQueryBenchmarks(List<QueryBenchmark> queryBenchmarks, Stri

for (int threads = benchmark.minThreads; threads <= benchmark.maxThreads; threads++) {
    QueryGenerator queryGenerator = new QueryGenerator(benchmark);
    List<SolrBenchQuery> queryResults = new Vector<>();
Experimented a bit in this change to see whether we can get the generated queries from controllerExecutor#run instead. 😊
The proposed changes aim to:
- Keep the query supplier focused on generating/supplying queries. It is not obvious why the query supplier should take a list of queryResults as an input parameter (and a mutable one at that, which raises other concerns such as concurrent access, logic flow, etc.).
- On the other hand, if the controllerExecutor#run method returns a list of "results", that is more straightforward to understand: these are the results obtained from executing all the supplied runnables/callables.
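The callable-based pattern being proposed could look roughly like this; a minimal sketch with hypothetical names (CallableExecutorSketch, run) rather than the actual controllerExecutor API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableExecutorSketch {
    // run() owns the result collection: callers get the results back instead
    // of passing in a mutable list for worker threads to populate.
    static <T> List<T> run(List<Callable<T>> tasks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll returns the futures in submission order
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // rethrows if a task failed
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        System.out.println(run(tasks, 2)); // prints [0, 1, 4, 9]
    }
}
```

Because the executor, not the supplier, collects the results, the concurrent-mutation concern disappears: workers never share a writable list.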
Sounds good :-)
I think once ^ this one is merged, I'll rebase this PR. :-)
…-callable Proposals to use callable instead of runnable and collect executed qu…
if (StressMain.generateValidations) {
    int totalDocsIndexed = getTotalDocsIndexed(baseUrl, collection);

    FileWriter outFile = new FileWriter("suites/validations-"+benchmark.queryFile+"-docs-"+totalDocsIndexed+"-queries-"+benchmark.totalCount+".tsv");
Shall we use JSON for this file, since:
- The facet part is already formatted as JSON.
- It would give us better flexibility to do different types of comparison than TSV. For example, we could add the total docs indexed as part of the result, and also verify different params per query result in the future (more than just numFound and facets).
- In general it's easier to follow JSON with keys (especially since the steps require a dev to go through the generated files) than TSV, which just has the values 😅
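For illustration, one possible shape for such a JSON record (field names here are hypothetical, not an actual solr-bench format):

```json
{
  "query": "q=*:*&facet=true&facet.field=category",
  "total-docs-indexed": 100000,
  "numFound": 4213,
  "facets": { "category": { "books": 1200, "music": 800 } }
}
```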
+1 to json.
Thanks @chatman, good to see this feature!
int success = 0, failure = 0;

if (StressMain.generateValidations) {
    int totalDocsIndexed = getTotalDocsIndexed(baseUrl, collection);
Since this is structured under runQueryBenchmarks, perhaps it shouldn't be coupled with docs indexed as well? This number could mean different things too: the total docs in the collection, or the docs indexed during the benchmarking?
As for "external" mode, it is valid to run ONLY query benchmarking (i.e. in the test plan, the only test task is query) and start with a pre-populated collection. Therefore such a number could change, and probably shouldn't be captured here.
Perhaps we can safely remove this, as the validation step does not seem to validate on the total # of docs either.
My intention was to use that only to name the validation file, but not for any validation. I agree that validating with this wouldn't make sense in external mode.
        "\t" + numFound + "\t" + new ObjectMapper().writeValueAsString(facets).replace('\n', ' ') + "\n");
}
outFile.close();
} else if (StressMain.validate && benchmark.validationFile != null && new File("suites/" + benchmark.validationFile).exists()) {
🧠 Storming a little bit here. Wondering if we can make this more abstract and provide at least two implementations:
- A validator based on a previously generated input file, as implemented here.
- A validator based on more static criteria, which might be as simple as just verifying that the doc count is > 0.
I do think that the current validation proposal (with a file) would be the preferred option in most cases. However, when I think about the FS test workflow (solr-perf driven by K8s), we might not really want to further complicate the procedure of input doc/query updates.
If we apply this file validation to our K8s test flow, then on each query/input docs update we will need to run it once as a different task (generate result), capture the validation file, then update the corresponding image with the new validation file, etc.
It's not super complicated, but it's not trivial either. I think perhaps a middle ground is: as long as the executed queries return something, we can assume they are legit results.
The main goal of solr-bench is to confirm performance changes due to code changes; the validation, imho, is useful to flag any obvious anomaly that renders the benchmarking results useless. While it's nice if the validation works more accurately (as the validation-file approach suggests), the added burden of maintaining such a file might not be that appealing for our use case.
Any thoughts? @justinrsweeney @hiteshk25 @chatman 😊
> The validator that's based on a previously generated input file - as implemented here
> The validator that's based on more static criteria - might be as simple as just verifying that the doc count is > 0

That makes sense. We can have validations as a complex JSON object in the configuration, and specify either a blind comparison against a file (your point 1) or some special conditions (e.g. results > 0), etc.
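Such a configuration object might look something like this; the key names ("mode", "file", "min-results") are purely illustrative, not an agreed-upon schema:

```json
{
  "query-benchmarks": [
    { "name": "file-validated",     "validation": { "mode": "file", "file": "validations-main.json" } },
    { "name": "criteria-validated", "validation": { "mode": "criteria", "min-results": 1 } }
  ]
}
```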
taskResults.put("mean", controlledExecutor.stats.getMean());
taskResults.put("total-queries", controlledExecutor.stats.getN());
taskResults.put("total-time", time);
if (StressMain.validate) taskResults.put("validations-succeeded", success);
minor nit: combine these 2 if cases into one? :)
            }
        }
    }
}

private static Map<String, Pair<Integer, String>> loadValidations(QueryBenchmark benchmark) throws IOException {
As per the previous thought about making the validation logic abstract, we might be able to move all of loadValidations (and other logic such as the preprocessing logic, string replacement, and validation file export/import) into the concrete implementation of that abstraction. For example, having a Validator#recordQueryResults(generatedQueries) and a Validator#validateResults(generatedQueries).
This could offload some of the logic from BenchmarksMain 😊
Makes sense to refactor it into an abstraction. Also, I'll raise a separate refactoring PR to rationalize the class names a bit and move every operation (restart, indexing, querying, pause, etc.) into its own class (instead of today's huge if-else ladder).
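A minimal sketch of the suggested abstraction, assuming the two method names proposed above; SolrBenchQuery is stubbed here and everything else is hypothetical, not the actual codebase:

```java
import java.util.Arrays;
import java.util.List;

// Stub of the real SolrBenchQuery, just enough for this sketch.
class SolrBenchQuery {
    final String query;
    final long numFound;
    SolrBenchQuery(String query, long numFound) {
        this.query = query;
        this.numFound = numFound;
    }
}

// The proposed abstraction: recording and validating live behind one type,
// so BenchmarksMain no longer carries the file-handling details.
abstract class Validator {
    /** Persist the results of the supplied queries, e.g. to a validation file. */
    abstract void recordQueryResults(List<SolrBenchQuery> generatedQueries);

    /** Check the supplied results; returns the number of failed queries. */
    abstract int validateResults(List<SolrBenchQuery> generatedQueries);
}

// "Static criteria" flavor: a query passes as long as it returned something.
class StaticCriteriaValidator extends Validator {
    @Override
    void recordQueryResults(List<SolrBenchQuery> generatedQueries) {
        // Nothing to record; the criteria are fixed.
    }

    @Override
    int validateResults(List<SolrBenchQuery> generatedQueries) {
        int failures = 0;
        for (SolrBenchQuery q : generatedQueries) {
            if (q.numFound <= 0) failures++;
        }
        return failures;
    }
}

public class ValidatorSketch {
    public static void main(String[] args) {
        Validator v = new StaticCriteriaValidator();
        List<SolrBenchQuery> results = Arrays.asList(
                new SolrBenchQuery("q=*:*", 42),
                new SolrBenchQuery("q=no-hits", 0));
        System.out.println(v.validateResults(results)); // prints 1
    }
}
```

A file-based implementation would then hold the loadValidations / export logic that currently sits in BenchmarksMain.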
Random random;
AtomicLong counter = new AtomicLong();
final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyy-MM-dd");
List<String> queryStrings = new ArrayList<>();
Minor nit: we can probably remove this field, as it's only used locally in the ctor?
int totalDocsIndexed = getTotalDocsIndexed(baseUrl, collection);

FileWriter outFile = new FileWriter("suites/validations-"+benchmark.queryFile+"-docs-"+totalDocsIndexed+"-queries-"+benchmark.totalCount+".tsv");
for (SolrBenchQuery sbq: generatedQueries) {
This might not work properly. I assume that we want every single executed query during the test, and to verify them all?
Instead of returning a new instance of QueryRequest for each call as in https://github.com/fullstorydev/solr-bench/pull/44/files#diff-e8aa24ed8e0a14385ea7fdfa11268a85f6dcce52021610761f29a76c334e558bL57, it's now returning a saved instance of the stored field in QueryGenerator (https://github.com/fullstorydev/solr-bench/pull/44/files#diff-e8aa24ed8e0a14385ea7fdfa11268a85f6dcce52021610761f29a76c334e558bR73). Therefore calls that use the same query will overwrite previous results.
However, I do have some concerns if we correct this by generating a new SolrBenchQuery instance per call, as the design right now stores both the underlying QueryRequest and the string response, and those are kept until the very end to generate the validation file / validate the results.
Both the QueryRequest and the response string used to have a much shorter lifespan (just for the local call, after which they could be GCed), but now they cannot be GCed until the final validation logic runs. So if we have a huge number of query runs, we could run into memory problems.
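One way to sidestep the retention problem would be validate-as-you-go: compare each response against its expectation as soon as it arrives and keep only counters, so the request and response objects stay GC-eligible. A hedged sketch with hypothetical names (StreamingValidation, onQueryCompleted), not the current design:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamingValidation {
    // Loaded once up front: query string -> expected numFound.
    final Map<String, Long> expectedNumFound;
    final AtomicInteger success = new AtomicInteger();
    final AtomicInteger failure = new AtomicInteger();

    StreamingValidation(Map<String, Long> expectedNumFound) {
        this.expectedNumFound = expectedNumFound;
    }

    // Called from each query worker as soon as the response is parsed; the
    // request and response objects can be GCed right after this returns.
    void onQueryCompleted(String query, long numFound) {
        Long expected = expectedNumFound.get(query);
        if (expected != null && expected == numFound) {
            success.incrementAndGet();
        } else {
            failure.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        StreamingValidation v = new StreamingValidation(Map.of("q=*:*", 42L));
        v.onQueryCompleted("q=*:*", 42); // matches expectation
        v.onQueryCompleted("q=*:*", 7);  // mismatch
        System.out.println(v.success.get() + " succeeded, " + v.failure.get() + " failed");
    }
}
```

Memory then grows with the size of the expectations map, not with the number of query runs.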
…of Git repository
…idations abstract class
FYI, this cannot be merged at the moment. Master branch has some errors, looking into them (#62).
Struggling to fix the merge conflicts, so opened a new one: #66
Added support for validating results of query benchmark tasks.
Workflow for a user:
a. Add a "validation": "" parameter in the query-benchmark definition.
b. Run stress.sh with a -v (validate) flag. It will use the validations file from the query benchmark task and report the number of successful and failed queries.
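An illustrative configuration fragment for step (a); all key names other than "validation" and the file name itself are placeholders, not the exact solr-bench schema:

```json
{
  "query-benchmarks": [
    {
      "name": "example-queries",
      "validation": "validations-example.json"
    }
  ]
}
```

After the validations file exists under suites/, running stress.sh with the -v flag (step b) compares fresh query results against it.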