Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for resource matching service #386

Merged
merged 18 commits into from Sep 20, 2018

Conversation

Projects
None yet
7 participants
@dongahn
Copy link
Contributor

dongahn commented Sep 1, 2018

This PR addresses all of the comments I had received on the posting of the previous experimental PR.

  • Refactor, augment, modify the resource scheduling infrastructure used by both the resource module and resource-query utility.

  • Add the resource matching service module that uses this infrastructure to populate the resource graph data store and expose four request handlers (match, cancel, info, and next_jobid) next_jobid is only needed for testing.

  • Add flux-resource.py as the front-end command that can interact with resource module. This is essential to drive our tests when we don't have the upcoming schedule loop service.

Don't merge this yet as we have a few more steps before landing this. The proposed remaining steps are:

  1. Add tests. They will focus on testing the handler and RPC logic as I prefer using resource-query for the general correctness and performance tests for the resource scheduling infrastructure (which I already have a bunch).

  2. Squash bugs we found during adding tests

  3. Land this

  4. Merge in @SteVwonder's PR #385 so that resource module can be populated with hwloc data.

  5. Add rc1 and rc3 support with the hwloc mode being default. (Another PR).

If you want to play with this:

copy t/data/resource/jobspecs/basics/test001.yaml and t/data/resource/grugs/tiny.graphml into a directory first.

% flux start -s 1 -o,-S,log-filename=out
% flux module load resource grug-conf=tiny.graphml subsystems=containment policy=high
% flux resource match allocate test001.yaml
JOBID                STATUS               AT                   OVERHEAD (Secs)
0                    ALLOCATED            Now                  0.00181102752686
================================================================================
MATCHED RESOURCES:
      ---------------core35[1:x]
      ------------socket1[1:x]
      ---------node1[1:s]
      ------rack0[1:s]
      ---tiny0[1:s]

% flux resource match allocate test001.yaml
JOBID                STATUS               AT                   OVERHEAD (Secs)
1                    ALLOCATED            Now                  0.000900983810425
================================================================================
MATCHED RESOURCES:
      ---------------core17[1:x]
      ------------socket0[1:x]
      ---------node1[1:s]
      ------rack0[1:s]
      ---tiny0[1:s]

% flux resource match allocate test001.yaml
JOBID                STATUS               AT                   OVERHEAD (Secs)
2                    ALLOCATED            Now                  0.000859975814819
================================================================================
MATCHED RESOURCES:
      ---------------core35[1:x]
      ------------socket1[1:x]
      ---------node0[1:s]
      ------rack0[1:s]
      ---tiny0[1:s]

% flux resource match allocate test001.yaml
JOBID                STATUS               AT                   OVERHEAD (Secs)
3                    ALLOCATED            Now                  0.000686883926392
================================================================================
MATCHED RESOURCES:
      ---------------core17[1:x]
      ------------socket0[1:x]
      ---------node0[1:s]
      ------rack0[1:s]
      ---tiny0[1:s]

% flux resource match allocate test001.yaml
MATCHED RESOURCES: Not found

% flux resource info 3
JOBID                STATUS               AT                   OVERHEAD (Secs)
3                    ALLOCATED            Now                  0.000686883926392

% flux resource match allocate_orelse_reserve test001.yaml
JOBID                STATUS               AT                   OVERHEAD (Secs)
4                    RESERVED             3600                 0.00125002861023
================================================================================
MATCHED RESOURCES:
      ---------------core35[1:x]
      ------------socket1[1:x]
      ---------node1[1:s]
      ------rack0[1:s]
      ---tiny0[1:s]

@dongahn dongahn force-pushed the dongahn:matching-service branch from 15a11bc to 61daec2 Sep 1, 2018

@coveralls

This comment has been minimized.

Copy link

coveralls commented Sep 1, 2018

Coverage Status

Coverage increased (+0.2%) to 77.745% when pulling 170f972 on dongahn:matching-service into 3f36f33 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Sep 1, 2018

Codecov Report

Merging #386 into master will increase coverage by 0.12%.
The diff coverage is 72.67%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #386      +/-   ##
=========================================
+ Coverage   75.88%     76%   +0.12%     
=========================================
  Files          59      64       +5     
  Lines        9829   10175     +346     
=========================================
+ Hits         7459    7734     +275     
- Misses       2370    2441      +71
Impacted Files Coverage Δ
resource/schema/sched_data.cpp 61.76% <ø> (ø) ⬆️
resource/evaluators/scoring_api.cpp 69.02% <ø> (+0.7%) ⬆️
resource/policies/dfu_match_low_id_first.cpp 83.78% <ø> (ø) ⬆️
resource/traversers/dfu_impl.cpp 82.38% <ø> (+0.74%) ⬆️
resource/schema/infra_data.cpp 40.74% <ø> (+5.55%) ⬆️
resource/policies/base/matcher.hpp 100% <ø> (ø) ⬆️
resource/policies/dfu_match_locality.cpp 0% <ø> (ø) ⬆️
resource/evaluators/fold.hpp 14.28% <ø> (ø) ⬆️
resource/generators/gen.cpp 84.61% <ø> (+1.09%) ⬆️
resource/policies/dfu_match_high_id_first.cpp 86.48% <ø> (+2.7%) ⬆️
... and 29 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3f36f33...170f972. Read the comment docs.

@dongahn dongahn force-pushed the dongahn:matching-service branch 3 times, most recently from ea54bfa to f5b4f1f Sep 2, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 2, 2018

@SteVwonder: Ok, I added a few test cases and adjusted minor things (e.g., changing flux-resource.py to flux-resource. Of all the planned test cases, I decide to add resource load option tests after we merge your hwloc PR. In a near future, I will also need a valgrind test as well. When Travis CI turns green, this is ready for your review.

@dongahn dongahn requested review from SteVwonder and tpatki Sep 2, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 2, 2018

Requested @tpatki as a reviewer as well.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 3, 2018

Found ./flux-sched-0.6.0-17-g2d7394d/_build/sub/t/t4001-match-allocate.output
expecting success: 
    flux module load resource grug-conf=${grug} \
subsystems=containment policy=high
flux-module: resource: not found in module search path
not ok 1 - loading resource module with a tiny machine config works

Somehow make distcheck doesn't find the new resource module (and also flux-resource command). make distcheck works on quartz so this could have something to do with the lib tool version.

I wonder if this is related to the fact that I have the containing directory specified in the dependency...

fluxmod_LTLIBRARIES = modules/resource.la
@tpatki

This comment has been minimized.

Copy link
Contributor

tpatki commented Sep 4, 2018

@dongahn: I'll test this tmrw in detail, just sending a note as a quick update.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 6, 2018

Hmmm. I moved resource module build rules to its own directory (resource/modules/Makefile.am) But I am still getting:

Found ./flux-sched-0.6.0-19-g2961643/_build/sub/t/t4001-match-allocate.output
expecting success: 
    flux module load resource grug-conf=${grug} \
subsystems=containment policy=high
flux-module: resource: not found in module search path
not ok 1 - loading resource module with a tiny machine config works
expecting success: 
    flux resource match allocate ${jobspec} &&
    flux resource match allocate ${jobspec} &&
    flux resource match allocate ${jobspec} &&
    flux resource match allocate ${jobspec}
flux: `resource' is not a flux command.  See 'flux --help'
not ok 2 - match-allocate works with a 1-node, 1-socket jobspec

Probably missing something more fundamental...

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 6, 2018

Seems that I just needed to add resource/modules/.libs to FLUX_MODULE_PATH_PREPEND as make distcheck runs in-build check. Hopefully this will do.

@dongahn dongahn force-pushed the dongahn:matching-service branch from 5dfee78 to ee59014 Sep 6, 2018

FLUX_MODULE_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched/.libs"
FLUX_EXEC_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched"
FLUX_MODULE_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched/.libs:${SHARNESS_BUILD_DIRECTORY}/resource/modules/.libs"
FLUX_EXEC_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched:${SHARNESS_BUILD_DIRECTORY}/resource/modules"

This comment has been minimized.

@grondo

grondo Sep 6, 2018

Contributor

Since your flux-resource utility is a script and not a built program, I think you will want to point to ${SHARNESS_TEST_SRCDIR}/../resource/modules here and not to the "build" directory.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 6, 2018

Ah. That makes a lot of sense and likely resolve the current issue. Thanks @grondo!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Sep 6, 2018

@dongahn, is the flux-resource utility meant to be a first class flux utility, or more of a test or "plumbing" utility? If the latter, I suggest renaming it (if it is installed) so that it does not conflict with a future, user-facing flux-resource utility or similar. If it is to be a first-class utility, we may have to discuss how to make a standardized UI or something so that other schedulers can also provide a flux-resource interface.

Does that make any sense?

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 6, 2018

It meant to be a utility for testing for now.

In terms of first class scheduler UIs, what I would like to do is to get an end to end demonstration between the new exec system, R, this matching service and the upcoming scheduler loop service first before standardizing user facing interfaces.

It is unclear if flux-resource should be the name for the ultimate UI though. I envision something like flux sched which interacts with both the scheduler loop and resource services to do status and control on the scheduler components. Then, other scheduler with a single module design can also easily satisfy the same UI.

For now, I'd be happy to change the name toflux-resource-query or flux-resource-test

@dongahn dongahn force-pushed the dongahn:matching-service branch from ee59014 to c8a4f57 Sep 6, 2018

FLUX_MODULE_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched/.libs"
FLUX_EXEC_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched"
FLUX_MODULE_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched/.libs:${SHARNESS_BUILD_DIRECTORY}/resource/modules/.libs"
FLUX_EXEC_PATH_PREPEND="${SHARNESS_BUILD_DIRECTORY}/sched:${SHARNESS_BUILD_DIRECTORY}/../resource/modules"

This comment has been minimized.

@grondo

grondo Sep 6, 2018

Contributor

For FLUX_PATH_PREPEND, try s/SHARNESS_BUILD_DIRECTORY/SHARNESS_TEST_SRCDIR/.

Also, if the flux-resource utility is just for testing, you could save some trouble by placing it under ./t (we have some test command like this in flux-core)

@dongahn dongahn force-pushed the dongahn:matching-service branch 3 times, most recently from 7024402 to 0f3d8b0 Sep 6, 2018

dongahn added some commits Sep 7, 2018

travis: Remove eval to avoid incorrect expansions
Pick this up from flux-core.

Fix a side-effect where eval turns the semicolon
into a command separator.
resource: Add factory pattern for match policy object instantiation
Use the factory pattern to allow the upper layer to easily
instantiate an object of different match policy class.
resource: Refactor jobinfo data structure
Factor out the jobinfo data structure and
a utility function to a new source file
as this will be used by both resource-query
and upcoming resource module.

Adjust resource-query code as well.

Update Makefile.am
resource: Add resource matching service as a comms module
Name: resource.

Use the resource scheduler infrastructure code to populate
the resource graph database.

Use the request handlers in place of the control
loop in resource-query to receive various match
requests: e.g., match allocate and allocate_orelse_reserve.

Receive a jobspec directly from the RPC message and
return the matched R in response.

Note that this will likely change later, getting
some handle from the user (e.g., schedule loop service)
of this service to find the jobspec from KVS and
put the matched R into KVS instead of sending
it back in the response message.
build: Update Makefile.am for resource
Make a convenience library out of the resource
scheduling infrastructure.  This reduces
the compilation time.

Add rules for resource module, building it against
the convenience library.

Update rules for resource-query to build
against the same library

@dongahn dongahn force-pushed the dongahn:matching-service branch 3 times, most recently from 6f2a716 to 01380e0 Sep 11, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 11, 2018

Thank you all for reviewing this PR.

I decided to export the HOME environment variable in the new test scripts so that we don't have to hit this problem next time sharness.sh is pulled from the upstream.

Travis CI is green now.

In addition, I believe I've also addressed all of the great review comments from @grondo, @SteVwonder and @trws. Please let me know if there is anything else you want me to address.

From my perspective, this can go in.

@SteVwonder
Copy link
Member

SteVwonder left a comment

One more minor change. Otherwise this looks ready to go in to me.

Show resolved Hide resolved t/scripts/flux-resource Outdated

dongahn added some commits Sep 1, 2018

resource: Add flux-resource, a front-end cmd for resource module
Add this command primarily for testing.

Add several sub-commands matching the request handlers
within the resource module.

Use python's argparse module with Action class to
register a handler per each sub-commmand.

Support match allocate and allocate_orelse_reserve,
info and cancel sub-commands.
test: Add basic test for match-allocate handler within resource
Test the correctness of the match(-allocate) handler within
the resource module using flux-resource front-end command.

Add data/resource/jobspecs/basics/bad.yaml, a malformed jobspec
yaml file to test if flux-resource handles a Yaml exception
gracefully.

Use the original HOME environment variable as sharness
interferes with Python's module search path.
Python uses HOME to determine its local site-package path,
and changing this leads to an incorrect search path.

Update the Makefile rule.
test: Add the paths for locating resource module and command
Need for both in-build testing and install testing.
test: Add basic test for allocate_orelse_reserve handler within resource
Test the functionality of the match(-allocate_orelse_reserve)
handler within the resource module using flux-resource
front-end command.

Use the original HOME environment variable as sharness
interferes with Python's module search path.
Python uses HOME to determine its local site-package path,
and changing this leads to an incorrect search path.

The Makefile rule updated.
test: Add basic test for cancel and info handlers within resource
Test the functionality of the cancel and info handlers within
resource module using flux-resource front-end command.

Use the original HOME environment variable as sharness
interferes with Python's module search path.
Python uses HOME to determine its local site-package path,
and changing this leads to an incorrect search path.

Update Makefile rule.

@dongahn dongahn force-pushed the dongahn:matching-service branch from 01380e0 to 170f972 Sep 12, 2018

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 12, 2018

OK. Forced a push!

@SteVwonder
Copy link
Member

SteVwonder left a comment

Thanks @dongahn. LGTM! Anyone else have any feedback? If there is no more feedback, I will push the button tomorrow.

@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 20, 2018

Ping?

@SteVwonder

This comment has been minimized.

Copy link
Member

SteVwonder commented Sep 20, 2018

My bad. Merging now.

@SteVwonder SteVwonder merged commit c27dc14 into flux-framework:master Sep 20, 2018

3 of 4 checks passed

codecov/patch 72.67% of diff hit (target 75.88%)
Details
codecov/project 76% (+0.12%) compared to 3f36f33
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.2%) to 77.745%
Details
@dongahn

This comment has been minimized.

Copy link
Contributor Author

dongahn commented Sep 20, 2018

Thanks @SteVwonder!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.