Docs: Test case, subtask, and overall verdicts (#138)
fushar committed May 27, 2017
1 parent 5f2c281 commit fb587d5
Showing 7 changed files with 98 additions and 47 deletions.
2 changes: 1 addition & 1 deletion docs/getting-started/getting-started.rst
@@ -454,7 +454,7 @@ You should get the following output:
- Floating point exception: 8
sum_4: Accepted

[ RESULT ]
[ VERDICT ]
Time Limit Exceeded [25]

We get a detailed verdict of each test case. Nice, isn't it? The final result here is **Time Limit Exceeded**, which is the "worst" verdict among all test case verdicts, and we get 25 points because we get one test case correct out of all four test cases.
104 changes: 75 additions & 29 deletions docs/topic-guides/grading.rst
@@ -25,13 +25,22 @@ For example, suppose you have written a problem package for a problem. Your frie

./runner grade --solution=./solution_alt

The verdict of each test case will be shown. The verdict will be one of the following:
The verdict of each test case and each subtask (if any), as well as the overall verdict, will be shown, as described below.

.. _grading_verdicts:

Verdicts
--------

A verdict consists of a status and, optionally, points.

The recognized statuses, from the best to the worst, are as follows:

Accepted
The output produced by the solution is correct.

OK [points]
The output produced by the solution is partially correct with the given points.
OK
The output produced by the solution is partially correct.

Wrong Answer
The output produced by the solution is incorrect. By default, the diff will be shown, truncated to the first 10 lines.
@@ -43,11 +52,69 @@ Time Limit Exceeded
The solution did not stop within the time limit, if specified.

Internal Error
Custom scorer (if any) crashed or did not give valid verdict.
Custom :ref:`scorer <styles_scorer>` / :ref:`communicator <styles_communicator>` (if any) crashed or did not give a valid verdict.

Test case verdicts
******************

The verdict of each test case will be shown. For OK statuses, the points (given by the scorer) will also be shown.

Subtask verdicts
****************

If the problem has subtasks, the verdict of each subtask will be shown as well. The verdict of a subtask is the combination of:

- status: the worst status of test case verdicts in the subtask
- points:

- the subtask points (assigned via ``Points()``), if all test case verdicts in the subtask are Accepted,
- the minimum points of OK verdicts in the subtask, if at least one test case verdict is OK and the rest are Accepted, or
- 0, otherwise.
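
The subtask rules above can be sketched in C++ as follows. This is only an illustration under stated assumptions: ``Status``, ``TestCaseVerdict``, ``SubtaskVerdict``, and ``combineSubtask`` are hypothetical names, not tcframe's own classes, and the relative order of Runtime Error and Time Limit Exceeded is assumed from the status list.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-ins for illustration; these are not tcframe's own classes.
// Statuses are declared from best to worst, so a larger value means "worse".
// The position of RuntimeError relative to TimeLimitExceeded is an assumption.
enum class Status { Accepted, OK, WrongAnswer, RuntimeError, TimeLimitExceeded, InternalError };

struct TestCaseVerdict {
    Status status;
    double points;  // only meaningful when status == Status::OK
};

struct SubtaskVerdict {
    Status status;
    double points;
};

// Combines the test case verdicts of one subtask into a subtask verdict.
// subtaskPoints is the value assigned to the subtask via Points().
SubtaskVerdict combineSubtask(const std::vector<TestCaseVerdict>& verdicts, double subtaskPoints) {
    assert(!verdicts.empty());

    Status worst = Status::Accepted;
    double minOkPoints = std::numeric_limits<double>::infinity();
    for (const TestCaseVerdict& verdict : verdicts) {
        worst = std::max(worst, verdict.status);
        if (verdict.status == Status::OK) {
            minOkPoints = std::min(minOkPoints, verdict.points);
        }
    }

    double points = 0;  // any status worse than OK yields 0 points
    if (worst == Status::Accepted) {
        points = subtaskPoints;  // every test case was Accepted
    } else if (worst == Status::OK) {
        points = minOkPoints;    // only Accepted and OK verdicts are present
    }
    return SubtaskVerdict{worst, points};
}
```

For instance, a subtask worth 40 points whose test cases are Accepted, OK [21], and OK [30] would get the verdict OK [21] under this sketch, while a single Wrong Answer would bring it down to Wrong Answer [0].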

Overall verdict
***************

Finally, the overall verdict is as follows.

For problems without subtasks:

- status: the worst test case verdict status
- points: the sum of test case verdict points, where:

The verdict of each subtask will be also shown. The verdict of a subtask is the worst verdict of all verdicts of test cases that are assigned to it. Here, RTE is worse than WA, and WA is worse than AC.
- an Accepted status will be given 100 / (number of test cases) points
- an OK status will be given its own points
- any other status will be given 0 points

Here is a sample output of a local grading for problems with subtasks.
For problems with subtasks:

- status: the worst subtask verdict status
- points: the sum of subtask verdict points
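
As a rough sketch of the overall verdict rules, the two cases could be computed as follows. ``GradeStatus``, ``GradeVerdict``, and both functions are hypothetical names used only for illustration, not tcframe's API. The sketch also assumes that only official test cases are counted and that Runtime Error ranks between Wrong Answer and Time Limit Exceeded.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; these are not tcframe's own classes.
// Statuses are declared from best to worst, so a larger value means "worse".
enum class GradeStatus { Accepted, OK, WrongAnswer, RuntimeError, TimeLimitExceeded, InternalError };

struct GradeVerdict {
    GradeStatus status;
    double points;  // for test cases, only meaningful when status is OK
};

// Problems without subtasks: worst test case status, sum of test case points.
GradeVerdict overallWithoutSubtasks(const std::vector<GradeVerdict>& testCaseVerdicts) {
    assert(!testCaseVerdicts.empty());
    const double fullShare = 100.0 / testCaseVerdicts.size();

    GradeVerdict overall{GradeStatus::Accepted, 0};
    for (const GradeVerdict& verdict : testCaseVerdicts) {
        overall.status = std::max(overall.status, verdict.status);
        if (verdict.status == GradeStatus::Accepted) {
            overall.points += fullShare;       // 100 / (number of test cases)
        } else if (verdict.status == GradeStatus::OK) {
            overall.points += verdict.points;  // points given by the scorer
        }
        // Any other status contributes 0 points.
    }
    return overall;
}

// Problems with subtasks: worst subtask status, sum of subtask points.
GradeVerdict overallWithSubtasks(const std::vector<GradeVerdict>& subtaskVerdicts) {
    assert(!subtaskVerdicts.empty());

    GradeVerdict overall{GradeStatus::Accepted, 0};
    for (const GradeVerdict& verdict : subtaskVerdicts) {
        overall.status = std::max(overall.status, verdict.status);
        overall.points += verdict.points;
    }
    return overall;
}
```

With four test cases judged Accepted, Accepted, OK [21], and Wrong Answer, the first function gives 25 + 25 + 21 + 0 = 71 points with status Wrong Answer. With subtask verdicts Accepted [40], Wrong Answer [0], and Runtime Error [0], the second gives Runtime Error [40].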

Sample local grading output
---------------------------

Here is a sample output of a local grading for problems without subtasks.

.. sourcecode:: bash

Local grading with solution command: './solution_alt'...

[ SAMPLE TEST CASES ]
k-product_sample_1: Accepted

[ OFFICIAL TEST CASES ]
k-product_1: Accepted
k-product_2: Accepted
k-product_3: OK [21]
k-product_4: Wrong Answer
* scorer Diff:
(expected) [line 01] 11
(received) [line 01] 12

[ VERDICT ]
Wrong Answer [71]

And here is one for problems with subtasks.

.. sourcecode:: bash

@@ -84,35 +151,14 @@ Here is a sample output of a local grading for problems with subtasks.
- Exit code: 1
- Standard error:

[ SUBTASK RESULTS ]
[ SUBTASK VERDICTS ]
Subtask 1: Accepted [40]
Subtask 2: Wrong Answer [0]
Subtask 3: Runtime Error [0]

[ RESULT ]
[ VERDICT ]
Runtime Error [40]

and here is for problems without subtasks

.. sourcecode:: bash

Local grading with solution command: './solution_alt'...

[ SAMPLE TEST CASES ]
k-product_sample_1: Accepted

[ OFFICIAL TEST CASES ]
k-product_1: Accepted
k-product_2: Accepted
k-product_3: OK [21]
k-product_4: Wrong Answer
* scorer Diff:
(expected) [line 01] 11
(received) [line 01] 12

[ RESULT ]
Wrong Answer [71]

This local grading feature is useful for creating "unit tests" for your test cases. For each problem, you can write many solutions with different intended results. For example, ``solution_123.cpp`` should pass subtasks 1 - 3; ``solution_12.cpp`` should pass subtasks 1 and 2 but not subtask 3, etc.

Notes
11 changes: 8 additions & 3 deletions docs/topic-guides/styles.rst
@@ -41,6 +41,8 @@ Enabled by calling ``InteractiveEvaluator()``. The solution will participate in
Helper programs
---------------

.. _styles_scorer:

Scorer
******

@@ -52,15 +54,16 @@ A scorer is a program which decides the verdict of a test case. It will receive

It must print the test case verdict to the standard output, which is a line consisting of either:

- ``AC``: indicates that the contestant's output is correct. It will be given 100 / (number of test cases in its subtask) points.
- ``WA``: indicates that the contestant's output is incorrect. It will be given 0 points.
- ``AC``: indicates that the contestant's output is correct.
- ``WA``: indicates that the contestant's output is incorrect.
- ``OK``: indicates that the contestant's output is partially correct. The second line must contain a floating-point number denoting the points. For example:

::

OK
9

See also :ref:`local grading verdicts <grading_verdicts>` for how the verdicts are interpreted during local grading.

The scorer must be compiled prior to test cases generation/local grading, and the execution command should be passed to the runner program as the ``--scorer`` option. For example:

Expand Down Expand Up @@ -107,6 +110,8 @@ Here is an example scorer which gives AC if the contestant's output differs not
}
}

.. _styles_communicator:

Communicator
************

@@ -122,7 +127,7 @@ The communicator must be compiled prior to local grading, and the execution comm

./runner grade --solution=./solution_alt --communicator=./my_communicator

The default scorer command is ``./communicator`` if not specified.
The default communicator command is ``./communicator`` if not specified.

Here is an example communicator program in a typical binary search problem.

4 changes: 2 additions & 2 deletions include/tcframe/grader/GraderLogger.hpp
@@ -28,15 +28,15 @@ class GraderLogger : public BaseLogger {

virtual void logResult(const map<int, Verdict>& subtaskVerdicts, const Verdict& verdict) {
if (subtaskVerdicts.size() > 1) {
engine_->logHeading("SUBTASK RESULTS");
engine_->logHeading("SUBTASK VERDICTS");
for (auto entry : subtaskVerdicts) {
engine_->logParagraph(
1,
"Subtask " + StringUtils::toString(entry.first) + ": " + entry.second.toString());
}
}

engine_->logHeading("RESULT");
engine_->logHeading("VERDICT");
engine_->logParagraph(1, verdict.toString());
}
};
14 changes: 7 additions & 7 deletions include/tcframe/verdict/VerdictCreator.hpp
@@ -23,16 +23,16 @@ class VerdictCreator {
virtual Verdict fromStream(istream* in) {
Verdict builder;

string verdictString;
if (!getline(*in, verdictString)) {
throw runtime_error("Expected: <verdict> on the first line");
string statusString;
if (!getline(*in, statusString)) {
throw runtime_error("Expected: <status> on the first line");
}

if (verdictString == "AC") {
if (statusString == "AC") {
return Verdict(VerdictStatus::ac());
} else if (verdictString == "WA") {
} else if (statusString == "WA") {
return Verdict(VerdictStatus::wa());
} else if (verdictString == "OK") {
} else if (statusString == "OK") {
string secondLine;
if (!getline(*in, secondLine)) {
throw runtime_error("Expected: <points> on the second line");
@@ -45,7 +45,7 @@ class VerdictCreator {
throw runtime_error("Unknown points format: " + pointsString);
}

throw runtime_error("Unknown verdict: " + verdictString);
throw runtime_error("Unknown status: " + statusString);
}

virtual optional<Verdict> fromExecutionResult(const ExecutionResult& executionResult) {
6 changes: 3 additions & 3 deletions test/unit/tcframe/grader/GraderLoggerTests.cpp
@@ -32,7 +32,7 @@ TEST_F(GraderLoggerTests, Result) {
Verdict verdict(VerdictStatus::ac());
{
InSequence sequence;
EXPECT_CALL(engine, logHeading("RESULT"));
EXPECT_CALL(engine, logHeading("VERDICT"));
EXPECT_CALL(engine, logParagraph(1, verdict.toString()));
}
logger.logResult({{Subtask::MAIN_ID, verdict}}, verdict);
@@ -44,10 +44,10 @@ TEST_F(GraderLoggerTests, Result_WithSubtasks) {
Verdict verdict(VerdictStatus::wa(), 70);
{
InSequence sequence;
EXPECT_CALL(engine, logHeading("SUBTASK RESULTS"));
EXPECT_CALL(engine, logHeading("SUBTASK VERDICTS"));
EXPECT_CALL(engine, logParagraph(1, "Subtask 1: " + subtask1Verdict.toString()));
EXPECT_CALL(engine, logParagraph(1, "Subtask 2: " + subtask2Verdict.toString()));
EXPECT_CALL(engine, logHeading("RESULT"));
EXPECT_CALL(engine, logHeading("VERDICT"));
EXPECT_CALL(engine, logParagraph(1, verdict.toString()));
}
logger.logResult({
4 changes: 2 additions & 2 deletions test/unit/tcframe/verdict/VerdictCreatorTests.cpp
@@ -56,7 +56,7 @@ TEST_F(VerdictCreatorTests, FromStream_Empty) {
verdictCreator.fromStream(new istringstream(""));
FAIL();
} catch (runtime_error& e) {
EXPECT_THAT(e.what(), StrEq("Expected: <verdict> on the first line"));
EXPECT_THAT(e.what(), StrEq("Expected: <status> on the first line"));
}
}

@@ -65,7 +65,7 @@ TEST_F(VerdictCreatorTests, FromStream_UnknownVerdict) {
verdictCreator.fromStream(new istringstream("hokus pokus"));
FAIL();
} catch (runtime_error& e) {
EXPECT_THAT(e.what(), StrEq("Unknown verdict: hokus pokus"));
EXPECT_THAT(e.what(), StrEq("Unknown status: hokus pokus"));
}
}
