
Review of submission 126dietsch #16

Closed
neilernst opened this issue Jul 3, 2019 · 13 comments


@neilernst commented Jul 3, 2019

No description provided.

@re19ar self-assigned this Jul 4, 2019

@re19ar commented Jul 9, 2019

I am trying to execute ./run-experiments.sh on the virtual machine @danieldietsch provided, and got the error message:

# ./run-experiments.sh
####### Running benchmark req2ta2UPPAAL_part1_to_3 ######
Using 0 threads
2019-07-09 17:37:16 - ERROR - At least ONE thread must be given!
WARNING: No file matches 'results/benchexec_def_req2ta2UPPAAL_part1_to_3.*.results.req2ta2UPPAAL.xml'.
ERROR: No benchmark results found.
grep: results/benchexec_def_req2ta2UPPAAL_part1_to_3.*.results.*.csv: No such file or directory
grep: results/benchexec_def_req2ta2UPPAAL_part1_to_3.*.results.*.csv: No such file or directory
### req2ta2UPPAAL ###
ID 	 rt-inc. 	 Time 
* 	  	  

####### Running benchmark ultimate_reqanalyzer_part1_to_3 ######
Using 0 threads
2019-07-09 17:37:16 - ERROR - At least ONE thread must be given!
WARNING: No file matches 'results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.results.ultimate_reqanalyzer.xml'.
ERROR: No benchmark results found.
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.results.*.csv: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.results.*.csv: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.logfiles/*: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.logfiles/*: No such file or directory
grep: results/benchexec_def_ultimate_reqanalyzer_part1_to_3.*.logfiles/*: No such file or directory
### ultimate_reqanalyzer ###
ID 	 Vac. 	 rt-inc. 	 TO 	 Time 
* 	  () 	  () 	  () 	  

Any ideas on the cause and how to fix it?

@danieldietsch commented Jul 9, 2019

Oh, perhaps you are using only 1 core? You need at least 2.

@re19ar commented Jul 9, 2019

Oh, perhaps you are using only 1 core? You need at least 2.

The virtual machine is set up with 4 cores. Is there any additional setup I need to do?

@danieldietsch commented Jul 9, 2019

Ok, I am very sorry for this, it seems there are still some issues with the VM and the script.

  1. The script has an off-by-one error for the VM's memory. Please insert the line mem_in_gibi=$((mem_in_gibi + 1)) after the initial declaration of mem_in_gibi at the beginning of the script. It should then look like this:
#!/bin/bash
# This script runs the experiments presented in "Scalable Analysis of Real-Time Requirements", RE 2019, Section VII "EVALUATION AND APPLICATION" and produces the results of Table 1 and Table 2.

# The "benchmarks" array controls which benchmarks this script will run
# If you have access to a machine with 32 cores (usually 16 physical cores and with HT 32) and 128GB memory, you can expect the following runtimes.
# * part1_to_3 specifies parts 1 to 3 of Table 1 and should run fairly fast (approx. 30min)
# * part4 has a timeout of 9000s and will take that time for req2ta2UPPAAL
# * part5 is the complete Table 2 and will take approx. 12h
benchmarks=(
  req2ta2UPPAAL_part1_to_3
  ultimate_reqanalyzer_part1_to_3
#  req2ta2UPPAAL_part4
#  ultimate_reqanalyzer_part4
#  ultimate_reqanalyzer_part5
)
log_file="eval-results.log"

xml_tmp_dir="req2ta-tmp-output"
number_of_cores=$(getconf _NPROCESSORS_ONLN)
mem_in_gibi=$(free -g|awk '/^Mem:/{print $2}')
mem_in_gibi=$((mem_in_gibi + 1))

#############################
### Functions             ###
#############################
...
  2. It seems like numerous cgroup features are not correctly preserved in the VM after import. Before executing the script, you have to run the following commands:
sudo chmod o+wt '/sys/fs/cgroup/cpuset/'
sudo chmod o+wt '/sys/fs/cgroup/cpu,cpuacct/user.slice'
sudo chmod o+wt '/sys/fs/cgroup/freezer/'
sudo chmod o+wt '/sys/fs/cgroup/memory/user.slice/user-1000.slice/user@1000.service'
sudo swapoff -a
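After running the commands above, a quick way to confirm the permission changes took effect is to check that the affected cgroup paths are writable by the current (non-root) user. This is a hypothetical sanity check, not part of the original script; the paths mirror the chmod commands above:

```shell
# Verify that the cgroup paths BenchExec needs are writable by this user.
check_writable() {
  if [ -w "$1" ]; then
    echo "ok: $1"
  else
    echo "not writable: $1"
  fi
}
check_writable '/sys/fs/cgroup/cpuset/'
check_writable '/sys/fs/cgroup/freezer/'
```

If any path reports "not writable", re-run the corresponding chmod command before starting the benchmarks.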

But then it works ;)
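For context, the "Using 0 threads" message from the original report is consistent with the off-by-one: free -g rounds down, so the script can underestimate the VM's memory, and integer division by the per-run memory requirement then yields zero threads. A minimal sketch of the arithmetic (mem_per_run is a hypothetical stand-in for the script's actual per-run requirement):

```shell
mem_in_gibi=3              # free -g on a 4 GiB VM rounds down to 3
mem_per_run=4              # hypothetical per-run memory requirement in GiB
echo "Using $((mem_in_gibi / mem_per_run)) threads"   # Using 0 threads

mem_in_gibi=$((mem_in_gibi + 1))                      # the suggested fix
echo "Using $((mem_in_gibi / mem_per_run)) threads"   # Using 1 threads
```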

@re19ar commented Jul 9, 2019

But then it works ;)

Yes, it fixes the problem. Now it runs. Thanks!

@re19ar commented Jul 10, 2019

This submission contains all of the required documentation for the package, i.e., README, STATUS, LICENSE, and INSTALL files. The STATUS file indicates that the authors are pursuing an AVAILABLE badge. The data are publicly available on Zenodo with a DOI. The license is LGPLv3-compatible, which fulfills the OSS license requirement for the AVAILABLE badge.

Most of the content that should appear in INSTALL.md is currently in README.md. However, this does not affect preparing the environment for running the experiments. I experienced a small issue during setup, but it was resolved by the authors' response on GitHub. The authors should update the corresponding artifacts and consider moving the content to the expected place upon acceptance.

One folder in the artifact is newer than the one used in the paper. The authors make this clear in the README and explain the differences between the old and new results in detail.

I was able to follow the steps to run parts 1 to 3 of the experiment for Table 1 in the paper. Since I only used two cores, the observed results differed from the ones reported by the authors, but the general trend is the same. Because my machine does not have enough memory, I was not able to run the rest of the experiments; however, the artifacts for running all of them are available.

In summary, I recommend giving the artifact an AVAILABLE badge.

@neilernst (author) commented Jul 10, 2019

@timm do you concur on Available?

@danieldietsch commented Jul 16, 2019

I just realized that one can apply for two badges -- contrary to the wording on http://re19.ajou.ac.kr/pages/submission/artifact_papers/ , which states that one should apply for a single badge: "one of reusable, available, replicated, reproduced".

Is it too late to also ask for the reusable badge?

@re19ar commented Jul 18, 2019

I think this dataset has the potential to be granted a "Reusable" badge, but not based on the current submission. For the "Reusable" badge, the artifacts need to be "very carefully documented and well-structured to the extent that reuse and repurposing is facilitated" (reference here). Moreover, the review period is closed, so it is too late to ask a second reviewer. @neilernst @timm What's your view on this request?

@timm commented Jul 18, 2019

our review process is done. the authors should have asked for 2 badges.

that said, we (@timm and @neilernst) could have done a better job of saying in the CFP that multiple badges are possible.

lesson learned. will do so in future

@danieldietsch commented Jul 18, 2019

@timm I understand. Perhaps next time.
@re19ar Can you elaborate on why you think our contribution is not "reusable" so that we might improve in future iterations? In particular, our contribution provides multiple complete sets of requirements directly from industry (albeit anonymised). We did also go to great lengths to allow for repeatability and replicability by not only providing exact versions of all the used software, but also using state-of-the-art measurement and benchmarking tools to ensure a maximum of precision during reproduction.

@re19ar commented Jul 19, 2019

In particular, our contribution provides multiple complete sets of requirements directly from industry (albeit anonymised). We did also go to great lengths to allow for repeatability and replicability by not only providing exact versions of all the used software, but also using state-of-the-art measurement and benchmarking tools to ensure a maximum of precision during reproduction.

I agree, and I think potential users of your artifact will appreciate your effort. My suggestion for making it more reusable is to improve the documentation for your tool ULTIMATE REQANALYZER. Finding the tool and its usage is not intuitive given the current explanation in the README:

UAutomizer-linux/ contains Ultimate ReqAnalyzer 0.1.24-4f1d294 (i.e., our implementation of the method described in our paper), which is based on the Ultimate program analysis framework. Note that this is a newer version than the one in the paper.

It would be really helpful if you could clearly tell the readers of your paper how to reuse Req2Pea and/or Pea2Boogie in other settings, to support better reusability.

@danieldietsch commented Jul 19, 2019

@re19ar Thank you for clarifying. You are right, we do not explain the tool usage in sufficient detail. We hope to provide adequate documentation in future iterations.

@timm closed this Jul 19, 2019
