Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC20 support + other prep work for future integration with new execution system #455

Merged
merged 27 commits into from Jun 1, 2019

Conversation

@dongahn
Copy link
Contributor

dongahn commented Mar 22, 2019

This PR adds a bunch of support needed in preparation for future integration with the new execution systems:

  • Add support for various emit formats including RFC20 RV1 (Issue #436, #443 and #445)
  • Add performance improvements such as adding reserve-vtx-vec options and remove uuid (Issue #451)
  • Add better performance measurement rigs (Issue #448)
  • Various bug fixes (Issue #453)
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Mar 22, 2019

OK. I closed the old PR and newly posted this PR as I think this is getting closer. Before I will turn into a pumpkin, I will try to add a few test cases that uses JGF validator (and run some valgrind tests) at which point this should be ready for review. Hence I removed WIP from the title.

There are whole bunch of other things I need to work on to be ready for the integration work with the new execution system. But this would be a good merge point as this PR becomes larger and larger, and I won't be able to get to this work for awhile due to travel+vacation.

@dongahn dongahn requested review from SteVwonder and grondo Mar 22, 2019
@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Mar 22, 2019

Codecov Report

Merging #455 into master will increase coverage by 1.14%.
The diff coverage is 84.49%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #455      +/-   ##
==========================================
+ Coverage   75.05%   76.19%   +1.14%     
==========================================
  Files          43       45       +2     
  Lines        4990     5477     +487     
==========================================
+ Hits         3745     4173     +428     
- Misses       1245     1304      +59
Impacted Files Coverage Δ
resource/utilities/command.hpp 100% <ø> (ø) ⬆️
resource/traversers/dfu_impl.hpp 100% <ø> (ø) ⬆️
resource/traversers/dfu.cpp 76.92% <100%> (+3.23%) ⬆️
resource/planner/planner_multi.c 62.75% <100%> (+0.45%) ⬆️
resource/policies/base/matcher.cpp 66.91% <100%> (ø) ⬆️
resource/writers/match_writers.hpp 100% <100%> (ø)
resource/generators/gen.cpp 87.08% <100%> (+0.41%) ⬆️
resource/schema/resource_data.cpp 52.63% <50%> (-0.15%) ⬇️
resource/modules/resource_match.cpp 71.46% <50%> (-3.4%) ⬇️
resource/utilities/resource-query.cpp 48.55% <73.98%> (+9.45%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0aa79e1...0fb7229. Read the comment docs.

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Apr 9, 2019

Hey @dongahn. Thanks for putting this PR together!

I like the new interface for writers. It seems very extensible to new formats! For the JSON-based writers, I think we should seriously consider pulling in a C++ JSON library (crossref #383). It would remove the need for a lot of the manual string manipulation, easing future extension and maintenance, and the pretty printing could be handled by the JSON library, removing a lot of the duplicate code.

Per the naming, it looks like writers is used for the classes/types, but emitters is used for the filenames/directory. I think those should probably be the same (either one works for me). When looking for the jgf_match_writers_t implementation, I was looking for a writers directory, but it was actually under emitters.

When I tested out the resource-query command on the new sierra.graphml and the old t/data/resource/jobspecs/basics/test004.yaml, I noticed some oddities on the JGF output (link to the output that I saw). Specifically, it looks like there are edges to targets whose IDs don't exist in the nodes section of the graph. My gut instinct is that this is related to pruning of resources that aren't a core, node, socket, etc, but I haven't tracked it down.

Adding the json-schema for JGF is a great addition! Is the resource-query JGF output being passed through validate_schema.py with the JGF json-schema? If not, I think that would be a useful test to add.

The reserve_vtx_vec is a nice addition. Noticeable speed-ups on the sierra GRUG!

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Apr 10, 2019

Great reviews @SteVwonder across the board!

C++ JSON library
I haven't looked at this library yet. Maybe this is the time.

emitters vs. writers for the directory/file names.
I will change this to writers since this will reduce confusion like what you had.

When I tested out the resource-query command on the new sierra.graphml and the old t/data/resource/jobspecs/basics/test004.yaml, I noticed some oddities on the JGF output (link to the output that I saw). Specifically, it looks like there are edges to targets whose IDs don't exist in the nodes section of the graph. My gut instinct is that this is related to pruning of resources that aren't a core, node, socket, etc, but I haven't tracked it down.

Hmmm. I can't say why there are disconnected edges like this at this point. I do need to seriously look at this issue before this PR gets merged (we may need a test case for this as well). Thank you for kicking the tire for this!

Adding the json-schema for JGF is a great addition! Is the resource-query JGF output being passed through validate_schema.py with the JGF json-schema? If not, I think that would be a useful test to add.

Yes, this is the intent. I have not been able to work on this because of other duties... much slower than my expectations.

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Apr 10, 2019

C++ JSON library
I haven't looked at this library yet. Maybe this is the time.

To keep this PR from growing too large in scope and to get it merged quicker, we can make the C++ JSON library a separate PR if you want. Also, if you want, I am willing to pick up that task, so you can focus on other aspects of the PR and integration with the new exec system.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Apr 10, 2019

To keep this PR from growing too large in scope and to get it merged quicker, we can make the C++ JSON library a separate PR if you want. Also, if you want, I am willing to pick up that task, so you can focus on other aspects of the PR and integration with the new exec system.

Sounds good for both counts. Thanks you @SteVwonder!

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Apr 25, 2019

I like the new interface for writers. It seems very extensible to new formats! For the JSON-based writers, I think we should seriously consider pulling in a C++ JSON library (crossref #383). It would remove the need for a lot of the manual string manipulation, easing future extension and maintenance, and the pretty printing could be handled by the JSON library, removing a lot of the duplicate code.

@SteVwonder: rereading again: I do believe this IS the right way to go for the future. I found myself worrying about indentations/formats and maintainability. The introduction of this library should solve this headache.

dongahn added 8 commits Feb 19, 2019
Clean up main by refactoring initializers and finalizers
into their own functions.

Reorder and change certain types of fields within
resource_context_t to make it more consistent
with resource_match.cpp.
Introduce vertex/edge label writers that that can emit
the matched resource set with different formats.

Add support for writers such as simple writers,
JSON Graph Format writers, resource set (or R)
version 1 writers as well as more human readable
versions of these writers.
Introduce match writers into the context, which gets
initialized based on a load/CLI option (--match-format).

Set the default format for resource-query "simple" for
backward compatibility to avoid rebaselining some
of the test results.

Set the default format for resource match service
"rv1_nosched" for now.
Add both graph load time and graph vertex/edge count
printing support. The latter will be used for a
regression test that ensures the correctness of our
graph construction phase.

Turn this on only when the --elapse-time option
is given so that we will not break other test
cases.
Improve graph loading performance by reserving std::vector
size used to store graph vertices.
Without this, vector resize operations that are called
as part of add_vertex functions significantly
increase the performance overheads.
@dongahn dongahn force-pushed the dongahn:match-format branch from eac79bc to fab54dc Apr 25, 2019
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Apr 25, 2019

When I tested out the resource-query command on the new sierra.graphml and the old t/data/resource/jobspecs/basics/test004.yaml, I noticed some oddities on the JGF output (link to the output that I saw). Specifically, it looks like there are edges to targets whose IDs don't exist in the nodes section of the graph. My gut instinct is that this is related to pruning of resources that aren't a core, node, socket, etc, but I haven't tracked it down.

Nice catch! I reproduced this, and it appears that somehow higher-level resources (e.g., rack) that do not contain the allocated resources are also emitted! This is clearly a bug. I will work on it. Thanks.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Apr 25, 2019

Hmmm. I can't say why there are disconnected edges like this at this point. I do need to seriously look at this issue before this PR gets merged (we may need a test case for this as well). Thank you for kicking the tire for this!

@SteVwonder: I found the root cause. It was basically due to a mistake where unqualified edges at a high level resource (e.g., cluster) are emitted along with those qualified edges. Fixed it.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Apr 25, 2019

I am getting CI errors for the newly added jsonschema validation test for JGF.

FAIL: t6001-match-formats.t 1 - R emitted with -F jgf validates against schema
FAIL: t6001-match-formats.t 2 - R emitted with -F pretty_jgf validates against schema

Perhaps jsonschma python module is not installed for CI...

@dongahn dongahn mentioned this pull request Apr 26, 2019
@dongahn dongahn force-pushed the dongahn:match-format branch 2 times, most recently from d19641f to 1665a5e Apr 26, 2019
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Apr 26, 2019

Ok. This is ready for another review. Some of the commits need to be squashed and reordered but I'm leaving it this way for quicker review. @SteVwonder: do you want to kick the tire for this once again?

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Apr 29, 2019

do you want to kick the tire for this once again?

Yep. Just added to my TODO list for this week.

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented May 30, 2019

LGTM! Let me know once you have squashed/reordered any commits that you want to, and then I'll push the button.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented May 30, 2019

LGTM! Let me know once you have squashed/reordered any commits that you want to, and then I'll push the button.

Thanks @SteVwonder! I will do this either today or tomorrow.

query="../../resource/utilities/resource-query"

test_expect_success "vertex/edge counts for a tiny machine are correct" '
echo "quit" >> input1.txt &&

This comment has been minimized.

Copy link
@SteVwonder

SteVwonder May 31, 2019

Member

Sorry. I just caught this will checking out #460, but I believe you want this to be:

Suggested change
echo "quit" >> input1.txt &&
echo "quit" > input1.txt &&

in the case where input1.txt already exists and has other commands in it. Same goes for the other tests in this file.

This comment has been minimized.

Copy link
@dongahn

dongahn May 31, 2019

Author Contributor

Good catch! Will address it shortly.

dongahn added 19 commits Mar 3, 2019
Also added large machine GRUG configuration,
t/data/resource/grugs/sierra.graphml as part of that
to check the performance improvement. But since
it takes a bit long, the test is guarded with
LONGTEST.
Keep track of the min, max and avg time of the match
operations (i.e., match allocate and match allocate_orelse_reserve).

Add stat sub-command to print out these simple performance
stats on demand.
Disable "resource-query> " prompt. This option will help
gather cleaner outputs for automated scripted tests.
The current supported usage is the following:

Usage: resource-bench.sh [OPTIONS] -- [CONFIGURE_ARGS...]
Run the performance benchmark for resource-query by matching
a configurable number of 1-core jobs on a simple configurable cluster.

Options:
 -h, --help                    Display this message
 -n, --nnodes                  Num of nodes in cluster (default=256)
 -c, --ncores                  Num of cores per node (default=32)
 -j, --njobs                   Num of jobs to schedule (default=4096)
 -g, --grug                    System GRUG input file (default=benchmark.graphml.in)
 -s, --jobspec-fn              Jobspec file name (default=benchmark.yaml)
 -p, --path                    Where inputs reside (default=./)
 -k, --keep                    Keep intermediate files
Problem: uuid_generate adds a huge startup overhead
for populating the graph data store with no clear
current use case.

Replace uuid with integer uniq_id. The top level
instance generates uniquely identifiable integers
for resource vertices by using vertex indicies.
Nest instance will use these unique IDs instead
of regenerating new ones.
Track overall performance statistics tracking for
resource-matching service.
Fix a bug where unqualified edges at a high level resource
(e.g., cluster) are emitted along with those qualified edges.
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented May 31, 2019

LGTM! Let me know once you have squashed/reordered any commits that you want to, and then I'll push the button.

I reviewed this again and I actually don't see a compelling reason to squash and reorder much. I will force a push to address your last review point. And this can go in from my perspective.

@dongahn dongahn force-pushed the dongahn:match-format branch from cc660a4 to 0fb7229 May 31, 2019
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jun 1, 2019

@SteVwonder: Forgot to drop a line. Forced a push and this is ready to be merged.

@SteVwonder SteVwonder merged commit a3dcb3d into flux-framework:master Jun 1, 2019
3 checks passed
3 checks passed
codecov/patch 84.49% of diff hit (target 75.05%)
Details
codecov/project 76.19% (+1.14%) compared to 0aa79e1
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Jun 1, 2019

Thanks! Merged.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Jun 2, 2019

Thanks @SteVwonder for reviewing this thoroughly. I know this was pretty substantial!

@dongahn dongahn deleted the dongahn:match-format branch Jul 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.