Docs: Test case, subtask, and overall verdicts (#138)

ia-toki · May 27, 2017 · fb587d5
1 parent 5f2c281
commit fb587d5
Showing 7 changed files with 98 additions and 47 deletions.
diff --git a/docs/getting-started/getting-started.rst b/docs/getting-started/getting-started.rst
@@ -454,7 +454,7 @@ You should get the following output:
           - Floating point exception: 8
       sum_4: Accepted
 
-    [ RESULT ]
+    [ VERDICT ]
       Time Limit Exceeded [25]
 
 We get a detailed verdict of each test case. Nice, isn't it? The final result here is **Time Limit Exceeded**, which is the "worst" verdict among all test case verdicts, and we get 25 points because we get one test case correct out of all four test cases.

diff --git a/docs/topic-guides/grading.rst b/docs/topic-guides/grading.rst
@@ -25,13 +25,22 @@ For example, suppose you have written a problem package for a problem. Your frie
 
     ./runner grade --solution=./solution_alt
 
-The verdict of each test case will be shown. The verdict will be one of the following:
+The verdict of each test case and each subtask (if any), as well as the overall verdict, will be shown, as described below.
+
+.. _grading_verdicts:
+
+Verdicts
+--------
+
+A verdict consists of the verdict status and optionally verdict points.
+
+The recognized statuses, from the best to the worst, are as follows:
 
 Accepted
     The output produced by the solution is correct.
 
-OK [points]
-    The output produced by the solution is partially correct with the given points.
+OK
+    The output produced by the solution is partially correct.
 
 Wrong Answer
     The output produced by the solution is incorrect. By default, the diff will be shown, truncated to the first 10 lines.
@@ -43,11 +52,69 @@ Time Limit Exceeded
     The solution did not stop within the time limit, if specified.
 
 Internal Error
-    Custom scorer (if any) crashed or did not give valid verdict.
+    Custom :ref:`scorer <styles_scorer>` / :ref:`communicator <styles_communicator>` (if any) crashed or did not give a valid verdict.
+
+Test case verdicts
+******************
+
+The verdict of each test case will be shown. For OK statuses, the points (given by the scorer) will also be shown.
+
+Subtask verdicts
+****************
+
+If the problem has subtasks, the verdict of each subtask will be shown as well. The verdict of a subtask is the combination of:
+
+- status: the worst status of test case verdicts in the subtask
+- points:
+
+  - the subtask points (assigned via ``Points()``), if all test case verdicts in the subtask are Accepted,
+  - the minimum points of OK verdicts in the subtask, if at least one test case verdict is OK and the rest are Accepted, or
+  - 0, otherwise.
+
+Overall verdict
+***************
+
+Finally, the overall verdict is as follows.
+
+For problems without subtasks:
+
+- status: the worst test case verdict status
+- points: the sum of test case verdict points, where:
 
-The verdict of each subtask will be also shown. The verdict of a subtask is the worst verdict of all verdicts of test cases that are assigned to it. Here, RTE is worse than WA, and WA is worse than AC.
+  - an Accepted status will be given 100 / (number of test cases) points
+  - an OK status will be given its own points
+  - any other status will be given 0 points
 
-Here is a sample output of a local grading for problems with subtasks.
+For problems with subtasks:
+
+- status: the worst subtask verdict status
+- points: the sum of subtask verdict points
+
+Sample local grading output
+---------------------------
+
+Here is a sample output of a local grading for problems without subtasks.
+
+.. sourcecode:: bash
+
+    Local grading with solution command: './solution_alt'...
+
+    [ SAMPLE TEST CASES ]
+      k-product_sample_1: Accepted
+
+    [ OFFICIAL TEST CASES ]
+      k-product_1: Accepted
+      k-product_2: Accepted
+      k-product_3: OK [21]
+      k-product_4: Wrong Answer
+        * scorer Diff:
+    (expected) [line 01]    11
+    (received) [line 01]    12
+
+    [ VERDICT ]
+      Wrong Answer [71]
+
+And here is a sample output for problems with subtasks.
 
 .. sourcecode:: bash
 
@@ -84,35 +151,14 @@ Here is a sample output of a local grading for problems with subtasks.
           - Exit code: 1
           - Standard error:
 
-    [ SUBTASK RESULTS ]
+    [ SUBTASK VERDICTS ]
       Subtask 1: Accepted [40]
       Subtask 2: Wrong Answer [0]
       Subtask 3: Runtime Error [0]
 
-    [ RESULT ]
+    [ VERDICT ]
       Runtime Error [40]
 
-and here is for problems without subtasks
-
-.. sourcecode:: bash
-
-    Local grading with solution command: './solution_alt'...
-
-    [ SAMPLE TEST CASES ]
-      k-product_sample_1: Accepted
-
-    [ OFFICIAL TEST CASES ]
-      k-product_1: Accepted
-      k-product_2: Accepted
-      k-product_3: OK [21]
-      k-product_4: Wrong Answer
-        * scorer Diff:
-    (expected) [line 01]    11
-    (received) [line 01]    12
-
-    [ RESULT ]
-      Wrong Answer [71]
-
 This local grading feature is useful for creating "unit tests" for your test cases. For each problem, you can write many solutions with different intended results. For example, ``solution_123.cpp`` should pass subtasks 1 - 3; ``solution_12.cpp`` should pass subtasks 1 and 2 but not subtask 3, etc.
 
 Notes
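
Note on the aggregation rules documented in grading.rst above: they can be checked against the two sample outputs with a small standalone sketch. This is not tcframe's implementation. The `Status` enum, the `Verdict` struct, and the three function names are hypothetical, and the position of Runtime Error in the ordering is an assumption inferred from the sample outputs, since that part of the list falls outside the hunk.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical types for illustration only; not tcframe's own classes.
// Statuses are ordered from best to worst. The position of RuntimeError
// is an assumption inferred from the sample outputs.
enum class Status { Accepted, OK, WrongAnswer, RuntimeError, TimeLimitExceeded, InternalError };

struct Verdict {
    Status status;
    double points;  // meaningful for OK test case verdicts and aggregated verdicts
};

// Subtask verdict: worst status, with points chosen by the three documented rules.
Verdict subtaskVerdict(const std::vector<Verdict>& testCases, double subtaskPoints) {
    assert(!testCases.empty());
    Status worst = Status::Accepted;
    bool sawOk = false;
    double minOkPoints = 0.0;
    for (const Verdict& v : testCases) {
        worst = std::max(worst, v.status);
        if (v.status == Status::OK) {
            minOkPoints = sawOk ? std::min(minOkPoints, v.points) : v.points;
            sawOk = true;
        }
    }
    if (worst == Status::Accepted) return {worst, subtaskPoints};  // all Accepted
    if (worst == Status::OK) return {worst, minOkPoints};          // only OK and Accepted
    return {worst, 0.0};                                           // anything worse
}

// Overall verdict for a problem without subtasks.
Verdict overallWithoutSubtasks(const std::vector<Verdict>& testCases) {
    assert(!testCases.empty());
    Status worst = Status::Accepted;
    double total = 0.0;
    for (const Verdict& v : testCases) {
        worst = std::max(worst, v.status);
        if (v.status == Status::Accepted) {
            total += 100.0 / testCases.size();  // equal share per test case
        } else if (v.status == Status::OK) {
            total += v.points;                  // points given by the scorer
        }                                       // any other status adds 0
    }
    return {worst, total};
}

// Overall verdict for a problem with subtasks: worst status, summed points.
Verdict overallWithSubtasks(const std::vector<Verdict>& subtasks) {
    assert(!subtasks.empty());
    Status worst = Status::Accepted;
    double total = 0.0;
    for (const Verdict& v : subtasks) {
        worst = std::max(worst, v.status);
        total += v.points;
    }
    return {worst, total};
}
```

Feeding the four official test cases from the sample without subtasks (Accepted, Accepted, OK [21], Wrong Answer) into `overallWithoutSubtasks` gives Wrong Answer with 25 + 25 + 21 = 71 points, and feeding the three subtask verdicts from the other sample into `overallWithSubtasks` gives Runtime Error with 40 points, matching the documented output.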

diff --git a/docs/topic-guides/styles.rst b/docs/topic-guides/styles.rst
@@ -41,6 +41,8 @@ Enabled by calling ``InteractiveEvaluator()``. The solution will participate in
 Helper programs
 ---------------
 
+.. _styles_scorer:
+
 Scorer
 ******
 
@@ -52,15 +54,16 @@ A scorer is a program which decides the verdict of a test case. It will receive
 
 It must print the test case verdict to the standard output, which is a line consisting of either:
 
-- ``AC``: indicates that the contestant's output is correct. It will be given 100 / (number of test cases in its subtask) points.
-- ``WA``: indicates that the contestant's output is incorrect. It will be given 0 points.
+- ``AC``: indicates that the contestant's output is correct.
+- ``WA``: indicates that the contestant's output is incorrect.
 - ``OK``: indicates that the contestant's output is partially correct. The second line must contain a floating-point number denoting the points. For example:
 
   ::
 
       OK
       9
 
+See also :ref:`local grading verdicts <grading_verdicts>` for how the verdicts are interpreted during local grading.
 
 The scorer must be compiled prior to test cases generation/local grading, and the execution command should be passed to the runner program as the ``--scorer`` option. For example:
 
@@ -107,6 +110,8 @@ Here is an example scorer which gives AC if the contestant's output differs not
         }
     }
 
+.. _styles_communicator:
+
 Communicator
 ************
 
@@ -122,7 +127,7 @@ The communicator must be compiled prior to local grading, and the execution comm
 
     ./runner grade --solution=./solution_alt --communicator=./my_communicator
 
-The default scorer command is ``./communicator`` if not specified.
+The default communicator command is ``./communicator`` if not specified.
 
 Here is an example communicator program in a typical binary search problem.
 

diff --git a/include/tcframe/grader/GraderLogger.hpp b/include/tcframe/grader/GraderLogger.hpp
@@ -28,15 +28,15 @@ class GraderLogger : public BaseLogger {
 
     virtual void logResult(const map<int, Verdict>& subtaskVerdicts, const Verdict& verdict) {
         if (subtaskVerdicts.size() > 1) {
-            engine_->logHeading("SUBTASK RESULTS");
+            engine_->logHeading("SUBTASK VERDICTS");
             for (auto entry : subtaskVerdicts) {
                 engine_->logParagraph(
                         1,
                         "Subtask " + StringUtils::toString(entry.first) + ": " + entry.second.toString());
             }
         }
 
-        engine_->logHeading("RESULT");
+        engine_->logHeading("VERDICT");
         engine_->logParagraph(1, verdict.toString());
     }
 };

diff --git a/include/tcframe/verdict/VerdictCreator.hpp b/include/tcframe/verdict/VerdictCreator.hpp
@@ -23,16 +23,16 @@ class VerdictCreator {
     virtual Verdict fromStream(istream* in) {
         Verdict builder;
 
-        string verdictString;
-        if (!getline(*in, verdictString)) {
-            throw runtime_error("Expected: <verdict> on the first line");
+        string statusString;
+        if (!getline(*in, statusString)) {
+            throw runtime_error("Expected: <status> on the first line");
         }
 
-        if (verdictString == "AC") {
+        if (statusString == "AC") {
             return Verdict(VerdictStatus::ac());
-        } else if (verdictString == "WA") {
+        } else if (statusString == "WA") {
             return Verdict(VerdictStatus::wa());
-        } else if (verdictString == "OK") {
+        } else if (statusString == "OK") {
             string secondLine;
             if (!getline(*in, secondLine)) {
                 throw runtime_error("Expected: <points> on the second line");
@@ -45,7 +45,7 @@ class VerdictCreator {
             throw runtime_error("Unknown points format: " + pointsString);
         }
 
-        throw runtime_error("Unknown verdict: " + verdictString);
+        throw runtime_error("Unknown status: " + statusString);
     }
 
     virtual optional<Verdict> fromExecutionResult(const ExecutionResult& executionResult) {

diff --git a/test/unit/tcframe/grader/GraderLoggerTests.cpp b/test/unit/tcframe/grader/GraderLoggerTests.cpp
@@ -32,7 +32,7 @@ TEST_F(GraderLoggerTests, Result) {
     Verdict verdict(VerdictStatus::ac());
     {
         InSequence sequence;
-        EXPECT_CALL(engine, logHeading("RESULT"));
+        EXPECT_CALL(engine, logHeading("VERDICT"));
         EXPECT_CALL(engine, logParagraph(1, verdict.toString()));
     }
     logger.logResult({{Subtask::MAIN_ID, verdict}}, verdict);
@@ -44,10 +44,10 @@ TEST_F(GraderLoggerTests, Result_WithSubtasks) {
     Verdict verdict(VerdictStatus::wa(), 70);
     {
         InSequence sequence;
-        EXPECT_CALL(engine, logHeading("SUBTASK RESULTS"));
+        EXPECT_CALL(engine, logHeading("SUBTASK VERDICTS"));
         EXPECT_CALL(engine, logParagraph(1, "Subtask 1: " + subtask1Verdict.toString()));
         EXPECT_CALL(engine, logParagraph(1, "Subtask 2: " + subtask2Verdict.toString()));
-        EXPECT_CALL(engine, logHeading("RESULT"));
+        EXPECT_CALL(engine, logHeading("VERDICT"));
         EXPECT_CALL(engine, logParagraph(1, verdict.toString()));
     }
     logger.logResult({

diff --git a/test/unit/tcframe/verdict/VerdictCreatorTests.cpp b/test/unit/tcframe/verdict/VerdictCreatorTests.cpp
@@ -56,7 +56,7 @@ TEST_F(VerdictCreatorTests, FromStream_Empty) {
         verdictCreator.fromStream(new istringstream(""));
         FAIL();
     } catch (runtime_error& e) {
-        EXPECT_THAT(e.what(), StrEq("Expected: <verdict> on the first line"));
+        EXPECT_THAT(e.what(), StrEq("Expected: <status> on the first line"));
     }
 }
 
@@ -65,7 +65,7 @@ TEST_F(VerdictCreatorTests, FromStream_UnknownVerdict) {
         verdictCreator.fromStream(new istringstream("hokus pokus"));
         FAIL();
     } catch (runtime_error& e) {
-        EXPECT_THAT(e.what(), StrEq("Unknown verdict: hokus pokus"));
+        EXPECT_THAT(e.what(), StrEq("Unknown status: hokus pokus"));
     }
 }
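
Note on the renamed messages: the scorer output protocol and the new error strings can be exercised outside the test suite with a standalone sketch like the one below. It only mirrors the control flow visible in the `VerdictCreator.hpp` hunk. `ParsedVerdict` and `parseScorerOutput` are hypothetical names, and the points parsing is an assumption, because that part of `fromStream` is not shown in the diff.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical result type for illustration; tcframe uses its own Verdict class.
struct ParsedVerdict {
    std::string status;  // "AC", "WA", or "OK"
    bool hasPoints;      // true only for OK
    double points;       // valid only when hasPoints is true
};

// Parses scorer output: a status line, followed by a points line for OK.
ParsedVerdict parseScorerOutput(std::istream& in) {
    std::string statusString;
    if (!std::getline(in, statusString)) {
        throw std::runtime_error("Expected: <status> on the first line");
    }
    if (statusString == "AC" || statusString == "WA") {
        return {statusString, false, 0.0};
    }
    if (statusString == "OK") {
        std::string secondLine;
        if (!std::getline(in, secondLine)) {
            throw std::runtime_error("Expected: <points> on the second line");
        }
        // Assumption: points are read as a single floating-point number.
        std::istringstream pointsStream(secondLine);
        double points = 0.0;
        if (!(pointsStream >> points)) {
            throw std::runtime_error("Unknown points format: " + secondLine);
        }
        return {statusString, true, points};
    }
    throw std::runtime_error("Unknown status: " + statusString);
}

// Usage demo mirroring the unit tests touched by this commit.
void demoParseScorerOutput() {
    std::istringstream ok("OK\n9\n");
    ParsedVerdict v = parseScorerOutput(ok);
    assert(v.status == "OK" && v.hasPoints && v.points == 9.0);

    std::istringstream unknown("hokus pokus");
    try {
        parseScorerOutput(unknown);
        assert(false);  // should not be reached
    } catch (const std::runtime_error& e) {
        assert(std::string(e.what()) == "Unknown status: hokus pokus");
    }
}
```

Keeping the parser on `std::istream&` rather than a file path makes it easy to drive from `std::istringstream` in tests, which is the same approach the existing unit tests take.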