This document exists as a reference for future or other instructors who may wish to use this template.
The template contains a common set of files:

- `dist.sh`: Shell script for generating the autograder. Try not to include `solution.py` files in the list of things included in the autograder.
- `files.txt`: A file containing a (newline-separated) list of solution file names. This is what determines which student-submitted files are considered. There is a default test that looks for all of these and makes sure that they have author information.
- `README.md`: This file; contains information about the assignment template for reference.
- `requirements.txt`: List of additional Python libraries required by the autograder. Used to install said libraries.
- `run_autograder`: Shell script used by the autograder to start running tests.
- `run_tests.py`: Python script used by the `run_autograder` script to run tests. By current default, tests are visible to students, but not the output thereof.
- `setup.sh`: Shell script for setting up the autograder environment (mostly installing things).
- `metatest.py`: Python file with machinery for the autograder to have and inherit.
- `test.py`: Python file with the actual tests in them.
- `Dockerfile`: A Dockerfile for an environment similar to the ultimate Gradescope environment. Presently makes an environment that should support C and C++ testing via gtest as well as Python testing via the Gradescope utilities library.
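For illustration, a `files.txt` for an assignment with two solution files (the file names here are hypothetical) would contain just the names, one per line:

```
hw1.py
helpers.py
```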
Regardless of the paradigm of the programming environment the class uses for students, this template (and projects derived therefrom) includes a Dockerfile to allow the autograder to be developed in a matching environment.
The Dockerfile is set up to (hopefully) allow for autograders for Python and C/C++ testing (I'm not testing the gtest stuff for C/C++ right now, especially since this is more aimed at my Python meta-auto-grader).
In order to make use of the Dockerfile, naturally docker needs to be installed. Consult the internet for appropriate installation information.
For the Docker novices like me, I think it suffices to understand two concepts:
- Images: I think of Docker images as "pre-configured operating systems".
- Containers: I think of Docker containers as an actual running virtual machine/instance of an image.
The following information is largely cobbled together from my understanding after mucking around with incantations from here.
A docker image can be built (and tagged for easier reference) via:
```
docker build -t [CHOSEN_TAG] [DIRECTORY_OF_DOCKERFILE]
```
For instance, when running a terminal in this directory with the Dockerfile, one could run:

```
docker build -t autograder_env .
```
Containers for a given image can be launched with a shared folder, making it easy to edit files on one's native system while running/compiling in the specific environment. The incantation for all that is:
```
docker run -it -v [HOST_FULL_PATH_DIR]:[TARGET_DIR] [IMAGE] /bin/bash
```
(The shell can presumably be changed as desired.) Continuing from the earlier example, my preference is something like:
```
docker run -it -v $(pwd):/home autograder_env /bin/bash
```
(Replace $(pwd) with whatever incantation is necessary for operating system of choice.)
As of the time of writing this, the latest version of Ubuntu that Gradescope lets us base autograders on is Ubuntu 22.04.
This comes with Python 3.10, and doesn't even readily allow updating to Python 3.12 (at least it didn't at some point).
New students downloading Python will have Python 3.12 (or later, once that exists).
This is especially an issue because the handling of f-strings has improved remarkably:
`f"{"Hello" if b else "Goodbye"}, World!"` now works, but did not parse properly before.
All this is to say, some shenanigans were done to get the old Ubuntu 22.04 to use current Python;
that's what the `ppa:deadsnakes/ppa` stuff is about.
At present, though, `python3.12` needs to be invoked directly,
as `python3` still resolves to Python 3.10.
`pip3`, however, is at present version 25.0.1 from Python 3.12.
It is fairly simple to run tests locally; there is no need for special environments or building the whole autograder.

```
python3.12 -m unittest test.py
```
However, this mingles the default type of test output with any custom assert statement outputs.
To get just the default test progress and results output, trash stdout.
```
python3.12 -m unittest test.py 1> /dev/null
```
To get just the assert statement outputs, trash stderr.
```
python3.12 -m unittest test.py 2> /dev/null
```
In order to actually build the autograder (rather, to zip up all of the necessary files for the autograder), simply run
```
./dist.sh
```
This will create a file named gradescope.zip that can be given to Gradescope to configure/build the autograder.
The meta-auto-grader has grown into a fairly obtuse creature, parts of which only make sense in the light of history. The first example of this was already explained: the shenanigans to get the most recent version of Python working on Gradescope.
But for the actual testing, the "library" of stuff resides in metatest.py and cull.py.
Tests for a given assignment go in test.py, whose test class inherits from metatest.MetaTest.
The first bit of eccentricity is in how modules are loaded.
A list of Python file names is written in the files.txt file.
On (MetaTest) class setup, these modules are loaded in a fancy manner.
Using the cull.py machinery, each of these files has an edited clone made
where the global code is wrapped in try-excepts,
input() is replaced with something that throws errors,
and standard output is routed to the void; then the module is imported.
This means that whatever global code the students leave in their work
largely doesn't mess up the attempt to access their functions for testing purposes.
The import is done under the original module name, and the variable for that name
in the class's module is set to the imported module. That is to say,
when that MetaTest code is run from something extending MetaTest in
test.py and attempts to import foo.py, the variable foo in test.py
is set to the module foo_alt (the global-code-wrapped foo).
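A rough, self-contained sketch of the idea follows. This is not the actual cull.py code; the student source, the `foo_alt` module name, and the helper names here are invented for illustration:

```python
import contextlib
import io
import types

# Invented example of a student file with stray global code.
student_source = '''
def greet(name):
    return "Hello, " + name

print(greet(input("Name? ")))  # stray global code
'''

def _no_input(prompt=""):
    # Stand-in for the replaced input(): fails instead of blocking.
    raise RuntimeError("input() called at import time")

# Wrap the whole module body in a try/except, akin to the edited clone.
indented = "".join("    " + line + "\n" for line in student_source.splitlines())
wrapped = "try:\n" + indented + "except BaseException:\n    pass\n"

mod = types.ModuleType("foo_alt")
mod.input = _no_input  # shadow the builtin within the module's globals
with contextlib.redirect_stdout(io.StringIO()):  # route stdout to the void
    exec(wrapped, mod.__dict__)

# The function is still accessible despite the failing global code.
print(mod.greet("tester"))  # → Hello, tester
```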
The tests are largely abstracted to invocations of the meta_omni_test_eq function inherited from the MetaTest class.
The function takes four arguments:
- A module name: the module which the function being tested is from.
- A function name: the function being tested.
- A dictionary generator function (a dictionary generator probably works as well, and maybe even a list/tuple of the dictionaries): some generator of dictionaries containing information about what the function should do in what circumstances.
- A message to be displayed upon success.
These testing dictionaries contain all the information for a single invocation of the function being tested; at heart, they specify the inputs and the expected outputs. They can have a number of keys:
- `"mode"`: This determines how the return from calling the student's function is compared against the solution. There are three possibilities:
  - `"eq"`: Checks equality. This is the default if `"mode"` is unspecified.
  - `"aeq"`: Checks if the return is almost equal to the expected solution. This is intended to be used when the function returns `float`s, or `dict`s/`list`s of `float`s, or some more elaborate recursive structure where the bottom is `float`s.
  - `"regex"`: Checks if the return matches some regex. Naturally to be used with string returns that have some flexibility.
- `"args"`: A list/tuple of arguments to be passed to the function call. Defaults to `None`, which means nothing is passed in.
- `"ret"`: The expected/solution return from the function given the arguments (and inputs from `input()`; see the `"inps"` key). Defaults to `None`. For the `"regex"` mode, this must be a particular regex dictionary that will be specified later.
- `"tol"`: The tolerance used in `"aeq"` mode; it also serves as a default for regex-based comparisons (to be explained more later). Defaults to `None`, which I believe is interpreted as 7 places of accuracy. An `int` value is interpreted as a number of places of accuracy. A `float` value is interpreted as an absolute tolerance. Finally, a nested data structure matching the solution's type is recursively unpacked.
- `"err"`: An error that should be raised by the function call. Defaults to `None`, meaning "no error". If there should be an error, the return will not be checked.
- `"inps"`: A list/tuple of inputs to be routed into the `input()` function. Defaults to `None`, meaning `input()` should be unused.
- `"outp"`: A list of those regex dictionaries specifying how each non-empty line of standard output should look. Defaults to `None`, meaning no output to check.
- `"files"`: A dictionary from file names whose contents should be checked, to lists of those regex dictionaries specifying how each non-empty line of a specific file should look. Defaults to `None`, meaning no files to check.
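To make this concrete, here is a hypothetical generator of such dictionaries for an imaginary `average` function (the function and all values here are illustrative, not from the template):

```python
# Hypothetical test-case dictionaries for an imaginary average() function.
def average_cases():
    # Default "eq" mode: plain equality on the return value.
    yield {"args": ([2, 4, 6],), "ret": 4}
    # "aeq" mode with an absolute (float) tolerance.
    yield {"mode": "aeq", "args": ([0.1, 0.2],), "ret": 0.15, "tol": 1e-9}
    # Expecting an error; the return value is not checked.
    yield {"args": ([],), "err": ZeroDivisionError}

cases = list(average_cases())
print(len(cases))  # → 3
```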
The "regex" mode, and the "outp" and "files" checking all make use of a particularly formatted dictionary for information about how each non-empty line (only one line in "regex" return checking) should look.
- `"soln"`: An example of (or maybe exactly) how the line should look.
- `"regex"`: A regex for how the line should look. Defaults to `None`, meaning "exactly match the solution".
- `"vals"`: A list of `float` values that occur in the regex match that should be checked for ("almost equal") accuracy. `None`s in the list mean "not a number to check". The overall thing defaults to `None`, meaning treat as just a regex check.
- `"tols"`: Tolerances for each value check in `"vals"`. Individual `None`s, `int`s, and `float`s behave akin to the usual almost-equality checking. Defaults to `None`, meaning use the `"tol"` key's value as the default.
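As a hypothetical illustration of the shape of these regex dictionaries (the checking logic below is a simplification for demonstration, not the template's actual code):

```python
import re

# Invented regex dictionary for one expected line of output.
line_check = {
    "soln": "The area is 3.14 square units.",
    "regex": r"The area is (\d+\.\d+) square units\.",
    "vals": [3.14],  # the captured number is checked for near-equality
    "tols": [0.01],  # absolute tolerance for that value
}

# Simplified version of what such a check might do.
m = re.fullmatch(line_check["regex"], "The area is 3.141 square units.")
value = float(m.group(1))
within = abs(value - line_check["vals"][0]) <= line_check["tols"][0]
print(within)  # → True
```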
I just want to have a list of things that I have found useful in the course of fiddling with this.
- Python Documentation
- Gradescope Utils Function Prototypes
- Grip utility (for viewing `.md` files)