Proposal for improved submission process #61
+1. An alternative would be to upload a tgz to write-only blob storage that MLPerf folk would merge into the repo, giving access to the repo only once the tgz is merged.
Generally, we should watch ROI when increasing burden. Guenther's plan sounds like it might actually be easier than the current approach, but it would be best to do it under the MLC CLA, and to make sure there is an acknowledgement that a submission is a contribution under the CLA.
Sure, personally I'm not hung up on how we do it, as long as we address the one-sided visibility issue.
Action items (AIs) going forward: discuss in training, amend rules, change infrastructure.
Ashwin suggests the following:
Ashwin will investigate whether PRs from a private submitter's repo can be sent to the private MLC submitters repo, and report back. We would like feedback from training, HPC, and mobile on this.
To clarify step 4: if the PR is being made from a private fork to another private fork, I don't think the forks have to be made public, as long as the chair has access to both forks, but I guess that needs to be verified.
I kinda like the old approach, where you can submit your results as you generate them, in the window between the random seeds being distributed and the submission deadline. It gives me a warm and fuzzy feeling to have PRs piling up in the main submission repo. I don't mind in the slightest that other submitters can see them early. Whatever else we decide to allow, can we keep the old approach, please?
We already guard against that somewhat by asking non-submitters to declare themselves before the repo gets properly populated (e.g. 1 week in advance). Of course, a submitter can still declare they intend to submit until the last minute and then not do so, or submit and then withdraw. Were the review committee to suspect dishonest behaviour, they could take a vote to disqualify them from the next submission round.
The proposal is just to allow delaying the real PR by providing a hash in the meantime. If you send the PR as you would have in the past, it should be the same, imo.
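The hash-in-the-meantime idea could be sketched as below: the submitter packs their submission into a tgz and sends only its SHA-256 digest to the results chair by the deadline, delivering the real PR later. This is a minimal sketch with hypothetical helper names, not the actual packing script discussed later in the thread.

```python
import hashlib
import tarfile
from pathlib import Path

def sha256_of(path: str) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def pack_submission(submission_dir: str, out_tgz: str) -> str:
    """Pack the submission directory into a .tgz and return the hash
    the submitter would register with the results chair by the deadline."""
    with tarfile.open(out_tgz, "w:gz") as tar:
        tar.add(submission_dir, arcname=Path(submission_dir).name)
    return sha256_of(out_tgz)
```

Since only the digest is shared before the deadline, nothing about the results leaks, yet the submitter is bound to the exact bytes they will later PR.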
I think it's fine if one wants to submit their PR directly to the submission repo. Note, however, that this submission repo is accessible to all potential submitters, including those who in the end may not actually submit. One of the goals of this proposal is to prevent such 'no-show' submitters from gaining access to the results before the general public. So, to clarify what I have in mind: after the final_submission_repo has been populated, the draft_submission_repo can probably be deleted.
I found that GitHub does NOT allow these operations:
@georgelyuan Looks like the steps need to be:
For the submission repo we never have a public repo. What gets published at the end is a copy into a new repo. This is because we don't want the issues filed during review to go public, and we need to avoid revoked submissions (that are still in that repo) becoming public. I'm not totally sure what happens between step 1 and step 2 above: how does the submitters' repo get PRed back into our private repo? I think as long as the submitters' repo is private, the PR will not work cross-org. One could make a git patch and upload it to some write-only blob storage at the deadline, and we'd just apply that patch to the repo in mlcommons.
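The patch-based flow avoids cross-org PRs entirely: produce a patch from the private fork, drop it into write-only storage, and the chair applies it to the mlcommons repo. In practice `git format-patch` / `git apply` would be the tools; as an illustrative stdlib approximation, a unified diff of the same shape can be produced with difflib (function name is hypothetical):

```python
import difflib

def make_patch(old_text: str, new_text: str, filename: str) -> str:
    """Produce a unified diff, similar in shape to `git diff` output;
    the chair could apply the real thing with `git apply` or `patch -p1`."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff)
```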
@guschmue Aha, I did not realize PRs do not work between private forks. Your git patch idea seems like a feasible solution. What is an example of a write-only blob storage space?
AWS, Azure, and GCP all have blob storage, and you can create write-only access (I know for sure on Azure, but AWS and GCP will do the same).
@georgelyuan please work with Guenther on a workable solution. Please note it is optional for submitters. The old way of submitting is fine for v1.0.
Revised proposal: by the submission deadline, the submitter will submit 4 things to the results chair as proof of work:
As soon as the deadline passes, the submission repository goes private. Access will be removed for everyone EXCEPT the following parties:
For #3 above, verification will be performed by the results chair using a script (NVIDIA can contribute to this):
With the process outlined, the results chair will push submissions made using the new secure submission process, to minimize delays. The competitive-intelligence gap raised by this issue no longer exists, since no participant can view secure submissions before the deadline (submission tarballs are encrypted). Once the deadline has passed, only verified submitters will be able to view each other's submissions. In an alternative post-deadline process, the chair can give access to the submitter and have the submitter push the results in a timely manner. At this point, the submitter is committed to submitting results; if they do not, for whatever reason, then the results chair will push in their stead, since by that point the submitter will have had access to the other submissions. Once the submitter pushes the results, the results chair will still need to verify that the submission matches the decrypted and extracted tarball.
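The chair-side check described above, that a pushed submission matches what was registered before the deadline, reduces to recomputing the tarball's digest and comparing it to the one on file. A minimal sketch, assuming SHA-256 digests and a hypothetical function name:

```python
import hashlib

def chair_verify(tgz_path: str, registered_hash: str) -> bool:
    """Recompute the tarball's SHA-256 and compare it against the hash
    the submitter registered with the results chair before the deadline."""
    h = hashlib.sha256()
    with open(tgz_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == registered_hash.strip().lower()
```

Any mismatch means the pushed submission differs from the proof-of-work tarball and should be rejected or escalated to the review committee.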
WG: submission rules will be updated with this optional process. PR review next meeting.
@guschmue and @georgelyuan are working on the script for submission encryption. Pending script check-in. The submitter can pick any storage, but the access has to be public.
@georgelyuan PR reminder :) |
Should I just put it in the MLCommons shared drive? Not sure where to put the script.
I have a bunch of submission-related scripts here:
Right, my only objection is that the script is not inference-specific.
Yeah, no worries. We can come up with something better once the training folks ask for it.
@guschmue do you have an mlcommons email? Or should I instruct folks to use your Microsoft email?
Good question, maybe we should create a new address for this. Let me find somebody to create one.
any linked PR for this? @georgelyuan |
Pending on me. I'm waiting for Peter to create a mail alias for it.
@georgelyuan ... you can use submissions@mlcommons.org |
D'oh, looks like I'm missing the MLCommons CLA or something? I just submitted my form.
Ah I see Guenther had sent me an invitation back in January but it expired :( |
mlcommons/inference#846 |
@georgelyuan is there any rule text update related to this? Thanks for the script. |
The current submission process is inadequate in two ways:
Proposal for next round:
@DilipSequeira @petermattson @TheKanter