
jdyeager/Python-Autograder-Template


This document is a reference for other or future instructors who may wish to use this template.


Files

The template contains a common set of files:

  • dist.sh - Shell script for generating the autograder. Try not to include solution .py files in the list of things included in the autograder.
  • files.txt - A file containing a (newline-separated) list of solution file names. This determines which student-submitted files are considered. There is a default test that looks for all of these and makes sure that they have author information.
  • README.md - This file; contains information about the assignment template for reference.
  • requirements.txt - List of additional python libraries required by the autograder. Used to install said libraries.
  • run_autograder - Shell script used by the autograder to start running tests.
  • run_tests.py - Python script used by the run_autograder script to run tests. By current default, tests are visible to students, but not the output thereof.
  • setup.sh - Shell script for setting up autograder environment (mostly installing things).
  • metatest.py - Python file with machinery for the autograder to have and inherit.
  • test.py - Python file with the actual tests in it.
  • Dockerfile - A Dockerfile for an environment similar to the ultimate Gradescope environment. Presently makes an environment that should support C and C++ testing via gtest as well as python testing via the gradescope utilities library.

Setting up Environment

Regardless of the programming environment the students use for the class, this template (and projects derived therefrom) has a Dockerfile so that the autograder can be more readily developed in a matching environment.

The Dockerfile is set up to (hopefully) allow for autograders for python and C/C++ testing (I'm not testing the gtest stuff for C/C++ right now, especially since this is more aimed at my python meta-auto-grader).

Installing Docker

To make use of the Dockerfile, Docker naturally needs to be installed. Consult the internet for appropriate installation information.

For the Docker novices like me, I think it suffices to understand two concepts:

  • Images: I think of Docker images as "pre-configured operating systems".
  • Containers: I think of Docker containers as an actual running virtual machine/instance of an image.

The following information is largely cobbled together from my understanding after mucking around with incantations from here.

Building the Image

A docker image can be built (and tagged for easier reference) via:

docker build -t [CHOSEN_TAG] [DIRECTORY_OF_DOCKERFILE]

For instance, when running a terminal in the directory containing the Dockerfile, one could run:

docker build -t autograder_env .

Launching a Container

Containers for a given image can be launched with a shared folder, making it easy to edit files on one's native system while running/compiling in the specific environment. The incantation for all that is:

docker run -it -v [HOST_FULL_PATH_DIR]:[TARGET_DIR] [IMAGE] /bin/bash

(The shell can presumably be changed as desired.) Continuing from the earlier example, my preference is something like:

docker run -it -v $(pwd):/home autograder_env /bin/bash

(Replace $(pwd) with whatever incantation is necessary for your operating system of choice.)

Note about Python version

As of this writing, the latest version of Ubuntu that Gradescope lets us base autograders on is Ubuntu 22.04. This comes with python 3.10 and cannot readily be updated to python 3.12 (at least it could not at some point). New students downloading python will have python 3.12 (or later, once that exists). This is especially an issue because the handling of f-strings has improved remarkably: f"{"Hello" if b else "Goodbye"}, World!" now works, but did not parse properly before 3.12.
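
The f-string difference can be checked from any interpreter with a short sketch like the one below. It is only an illustration: the names portable, source, and nested are made up here, and the newer syntax is passed through eval() so that the file itself still parses on python 3.10.

```python
import sys

b = True

# Portable form: the inner quotes differ from the outer ones, so this
# parses on older interpreters as well as on 3.12.
portable = f"{'Hello' if b else 'Goodbye'}, World!"

# Newer form: the same quote character is reused inside the braces.
# It is kept in a string and evaluated so that this file still parses
# on interpreters that reject it.
source = 'f"{"Hello" if b else "Goodbye"}, World!"'
try:
    nested = eval(source, {"b": b})
except SyntaxError:
    nested = None  # interpreter predates the 3.12 f-string grammar

print(sys.version_info[:2], repr(portable), repr(nested))
```

On python 3.12 both variables hold the same string; on 3.10 the second one stays None, which is the situation a student's submission would hit on an unmodified Ubuntu 22.04 image.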

All this is to say, some shenanigans were done to get the old Ubuntu 22.04 to use current python; that's what the ppa:deadsnakes/ppa stuff is about. At present, though, python3.12 needs to be invoked directly, as python3 translates to python3.10. pip3, however, is version 25.0.1 at present and comes from python 3.12.


Running Tests Locally

Running tests locally is fairly simple; there is no need for a special environment or for building the whole autograder.

python3.12 -m unittest test.py

However, this mingles the default type of test output with any custom assert statement outputs.

To get just the default test progress and results output, trash stdout.

python3.12 -m unittest test.py 1> /dev/null

To get just the assert statement outputs, trash stderr.

python3.12 -m unittest test.py 2> /dev/null
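
The redirections work because unittest's text runner writes its progress and results to stderr by default, while anything a test prints goes to stdout. A minimal sketch of that split, using a made-up Demo test class and in-memory buffers standing in for the two streams:

```python
import contextlib
import io
import unittest


class Demo(unittest.TestCase):
    """Hypothetical test case; not part of the template."""

    def test_prints(self):
        print("custom output from the test")  # goes to stdout
        self.assertEqual(1 + 1, 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)

report = io.StringIO()   # stands in for stderr (the runner's default stream)
printed = io.StringIO()  # stands in for stdout

with contextlib.redirect_stdout(printed):
    result = unittest.TextTestRunner(stream=report).run(suite)

# The runner's summary and the test's own print end up in different buffers.
summary = report.getvalue()
custom = printed.getvalue()
```

Discarding one buffer or the other corresponds to the 1> /dev/null and 2> /dev/null variants above.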

Building Autograder

In order to actually build the autograder (rather, to zip up all of the necessary files for the autograder), simply run

./dist.sh

This will create a file named gradescope.zip that can be given to Gradescope to configure/build the autograder.


Autograder Lore and Details

The meta-auto-grader has grown into a fairly obtuse creature, some of which only makes sense in the light of history. The first example of this was already explained: the shenanigans to get the most recent version of python working on Gradescope.

But for the actual testing, the "library" of stuff resides in metatest.py and cull.py. Tests for a given assignment go in test.py, whose test class inherits from metatest.MetaTest.

Loading Student Modules

The first bit of eccentricity is in how modules are loaded. A list of python file names is written in the files.txt file. On (Metatest) class setup, these modules are loaded in a fancy manner. Using the cull.py machinery, each of these files has an edited clone made in which the global code is wrapped in try/except blocks. input() is then replaced with something that throws errors, standard output is routed to the void, and the module is imported. This means that, by and large, whatever global code the students left in their work doesn't mess up the attempt to access their functions for testing purposes. The import is done under the original module name, and the variable of that name in the class's module is set to the imported module. That is to say, when the Metatest code is run from something extending Metatest in test.py and attempts to import foo.py, the variable foo in test.py is set to the module foo_alt (the global-code-wrapped foo).
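
To make the idea concrete, here is a rough sketch of that kind of loading. It is not the actual cull.py code: wrap_global_code, load_wrapped, and _no_input are hypothetical names, the clone is built in memory rather than written to a foo_alt file, and wrapping every top-level statement other than definitions and imports is an assumption about what counts as "global code".

```python
import ast
import contextlib
import io
import types

# Top-level statements that are kept as-is (assumption: everything else
# counts as "global code" and gets wrapped).
KEEP = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
        ast.Import, ast.ImportFrom)


def wrap_global_code(source):
    """Return a clone of source with loose top-level code in try/except."""
    tree = ast.parse(source)
    new_body = []
    for node in tree.body:
        if isinstance(node, KEEP):
            new_body.append(node)
            continue
        handler = ast.ExceptHandler(
            type=ast.Name(id="Exception", ctx=ast.Load()),
            name=None,
            body=[ast.Pass()],
        )
        new_body.append(
            ast.Try(body=[node], handlers=[handler], orelse=[], finalbody=[])
        )
    tree.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def _no_input(prompt=""):
    raise RuntimeError("input() is not available while importing")


def load_wrapped(name, source):
    """Execute the wrapped clone as a module, silencing stdout."""
    module = types.ModuleType(name)
    module.__dict__["input"] = _no_input  # shadows the builtin during import
    with contextlib.redirect_stdout(io.StringIO()):
        exec(compile(wrap_global_code(source), name + "_alt", "exec"),
             module.__dict__)
    module.__dict__.pop("input", None)  # functions see the builtin again
    return module


# A made-up student submission with troublesome global code.
student_source = '''
def double(x):
    return 2 * x

name = input("Name? ")
print("Hello", name)
print(1 / 0)
'''

foo = load_wrapped("foo", student_source)
print(foo.double(4))
```

The function double is still reachable even though the three loose statements below it would otherwise block on input, print to the console, or crash the import.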

Writing Tests

The tests are largely abstracted to invocations of the meta_omni_test_eq function inherited from the Metatest class. The function takes four arguments:

  • A module name: the module which the function being tested is from.
  • A function name: the function being tested.
  • A dictionary generator function (or probably a dictionary generator as well; maybe even a list/tuple of the dictionaries works): some generator of dictionaries containing information about what the function should do in what circumstances.
  • A message to be displayed upon success.

Each of these testing dictionaries contains all the information for a single invocation of the function being tested: at heart, the inputs and the expected outputs. They can have a number of keys:

  • "mode": This determines how the return from calling the student's function is compared against the solution. There are three possibilities:
    • "eq": Checks equality. This is the default if "mode" is unspecified.
    • "aeq": Checks if the return is almost equal to the expected solution. This is intended to be used when the function returns floats, or dicts/lists of floats, or some more elaborate recursive structure where the bottom is floats.
    • "regex": Checks if the return matches some regex. Naturally to be used with string returns that have some flexibility.
  • "args": A list/tuple of arguments to be passed to the function call. Defaults to None, which means nothing is passed in.
  • "ret": The expected/solution return from the function given the arguments (and inputs from input(), see the "inps" key). Defaults to None. For the "regex" mode, this must be a particular regex dictionary that will be specified later.
  • "tol": The tolerance used in "aeq" mode, and serves as a default for regex-based comparisons (to be explained more later). Defaults to None, which I believe is interpreted as 7 places of accuracy. An int value is interpreted as a number of places of accuracy. A float value is interpreted as an absolute tolerance. Finally, a nested data structure matching the solution's type is recursively unpacked.
  • "err": An error that should be raised by the function call. Defaults to None, meaning "no error". If there should be an error, the return will not be checked.
  • "inps": A list/tuple of inputs to be routed into the input() function. Defaults to None, meaning input() should be unused.
  • "outp": A list of those regex dictionaries specifying how each non-empty line of standard output should look. Defaults to None, meaning no output to check.
  • "files": A dictionary from file names whose contents should be checked, to lists of those regex dictionaries specifying how each non-empty line of a specific file should look. Defaults to None, meaning no files to check.
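
As an illustration of the keys above, a dictionary generator for a made-up function mean(values) might look like the sketch below. The almost_equal helper is not the template's code; it is one possible reading of the "tol" rules, with the 7-place default borrowed from unittest's assertAlmostEqual. Whether "err" expects an exception class (as used here) or an instance is also an assumption.

```python
def mean_cases():
    """Yield one test dictionary per call of a hypothetical mean(values)."""
    yield {"args": ([1, 2, 3],), "ret": 2}                     # "mode" defaults to "eq"
    yield {"mode": "aeq", "args": ([0.1, 0.2],), "ret": 0.15}  # default tolerance
    yield {"mode": "aeq", "args": ([1, 2],), "ret": 1.5, "tol": 0.01}  # absolute
    yield {"args": ([],), "err": ZeroDivisionError}            # return is not checked


def almost_equal(actual, expected, tol=None):
    """Hypothetical "aeq" comparison that recurses through dicts and lists."""
    if isinstance(expected, dict):
        if not isinstance(actual, dict) or actual.keys() != expected.keys():
            return False
        return all(
            almost_equal(actual[k], expected[k],
                         tol[k] if isinstance(tol, dict) else tol)
            for k in expected
        )
    if isinstance(expected, (list, tuple)):
        if not isinstance(actual, (list, tuple)) or len(actual) != len(expected):
            return False
        # A nested tolerance is unpacked alongside the data; a scalar is shared.
        tols = tol if isinstance(tol, (list, tuple)) else [tol] * len(expected)
        return all(almost_equal(a, e, t)
                   for a, e, t in zip(actual, expected, tols))
    if tol is None:
        tol = 7  # assumed default: 7 places of accuracy
    if isinstance(tol, int):
        return round(abs(actual - expected), tol) == 0  # places of accuracy
    return abs(actual - expected) <= tol                # absolute tolerance


cases = list(mean_cases())
print(len(cases), almost_equal(0.1 + 0.2, 0.3))
```

A generator like mean_cases would be the third argument to meta_omni_test_eq; the exact form of the first two arguments should be checked against metatest.py.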

The "regex" mode, and the "outp" and "files" checking all make use of a particularly formatted dictionary for information about how each non-empty line (only one line in "regex" return checking) should look.

  • "soln": An example of (or maybe exactly) how the line should look.
  • "regex": A regex for how the line should look. Defaults to None, meaning "exactly match the solution".
  • "vals": A list of float values that occur in the regex match and should be checked for ("almost equal") accuracy. Nones in the list mean "not a number to check". The whole list defaults to None, meaning treat this as just a regex check.
  • "tols": tolerances for each value check in "vals". Individual Nones, ints, and floats behave akin to the usual almost equality checking. Defaults to None, meaning use the "tol" key's value as the default.
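
A sketch of how one such regex dictionary could be checked against a single line follows. The helper names line_matches and close are hypothetical, and several details are assumptions rather than facts about metatest.py: that the whole line must match the regex, that "vals" line up with the regex's capture groups in order, and that a missing "tols" falls back to one shared default.

```python
import re


def close(actual, expected, tol):
    """Scalar almost-equality: None or int = places, float = absolute."""
    if tol is None:
        tol = 7  # assumed default number of places
    if isinstance(tol, int):
        return round(abs(actual - expected), tol) == 0
    return abs(actual - expected) <= tol


def line_matches(line, spec, default_tol=None):
    """Check one output line against one regex dictionary (hypothetical)."""
    regex = spec.get("regex")
    if regex is None:
        return line == spec["soln"]      # no regex: exact match required
    match = re.fullmatch(regex, line)
    if match is None:
        return False
    vals = spec.get("vals")
    if vals is None:
        return True                      # plain regex check
    tols = spec.get("tols")
    if tols is None:
        tols = [default_tol] * len(vals)  # fall back to the "tol" key's value
    return all(
        expected is None or close(float(text), expected, tol)
        for text, expected, tol in zip(match.groups(), vals, tols)
    )


# Made-up example: one captured number, checked to within 0.01.
spec = {
    "soln": "Average: 2.50 points",
    "regex": r"Average:\s*(-?\d+(?:\.\d+)?)\s*points?",
    "vals": [2.5],
    "tols": [0.01],
}

print(line_matches("Average: 2.5 points", spec))
```

With this reading, formatting differences such as 2.5 versus 2.50 are accepted, while a wrong number or a line that does not fit the pattern is rejected.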

Misc References and Resources

I just want to have a list of things that I have found useful in the course of fiddling with this.

About

Template for Gradescope autograder used in a CS class
