Student code quality

This document describes the steps to perform the automated analysis of student programs, as presented in the paper 'Code Quality Issues in Student Programs' (in Proceedings of ITiCSE 2017). The code is provided to perform checks, extend the analysis and allow replication. Comments and questions can be emailed to the first author. Both Java and Haskell are used for the automated analysis.


The following resources are needed for the analysis.

Blackbox database

The database is not publicly available. Permission to access the Blackbox database must be requested from its maintainers.

PMD Version 5.5.2

Add custom ruleset myrules to pmd-java-5.5.2.jar.

CPD Version 5.4.1

A custom CPDRunner has been created, which runs CPD on every file in a folder separately within a single run, avoiding the overhead of restarting CPD for each file. The runner can be executed using a bat file.
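The idea of analysing each file on its own inside one JVM can be sketched as follows. This is not the repository's CPDRunner: the class name `PerFileRunner` and the `analyse` callback are hypothetical stand-ins, and the actual call into CPD is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: run one analysis per .java file inside a single JVM,
// so the tool is started once instead of once per file.
final class PerFileRunner {

    // 'analyse' stands in for a per-file duplicate check (for example a call
    // into CPD); R is whatever result type the caller wants to collect.
    static <R> Map<Path, R> runOnFolder(Path folder, Function<Path, R> analyse)
            throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(folder)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        }
        Map<Path, R> results = new LinkedHashMap<>();
        for (Path file : files) {
            // Each file is analysed in isolation, so duplicates are only
            // reported within a file, never across files.
            results.put(file, analyse.apply(file));
        }
        return results;
    }
}
```

A caller would pass the folder with extracted code files and a function that runs the duplicate detection on a single path.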


cloc

Used for counting lines of code.

SQLite database

Used for storing results locally.

Java code

The following libraries are needed to run the Java code:

  • blackboxAnalyser Java library (for connecting to the Blackbox DB)
  • PMD 5.4.1 libraries (for custom CPD runner)
  • JUnit (for unit tests in BBTests)

Set the values of a number of constants to store local settings in, such as data directories and executable directories:

  • Main.dbUrl
  • Main.inDir4days and Main.outDir4days
  • Main.outFileNameFix

Haskell code

The following libraries are needed to run the Haskell code:

Create the following values in Main to store your local settings:

  • mySettings of type Settings
  • myPMDSettings of type PMDSettings
  • myCPDSettings of type CPDSettings


Issue selection

  1. Copy the payload and index files of the days below from the Blackbox server to a local folder

    • 8 September 2014
    • 8 December 2014
    • 9 March 2015
    • 8 June 2015
  2. [Java] Extract the code files from these days into folder '4daysJ' using Main.extract4Days(<inFolder>, <outFolder>).

  3. [Haskell] Run Main.issueSelection mySettings, which produces a csv-file with PMD output.

  4. [Haskell] Run Processing.PMD.freqAnalysis "dir\\<filename>.csv" with the name of the csv-file.

  5. [Haskell] Run Main.issueSelectionCPD.

  6. The results are combined into an Excel file for further processing.
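As an illustration of what a frequency analysis over the PMD csv-file involves, the sketch below counts how often each rule name occurs. It is written in Java for illustration only and is not the Haskell Processing.PMD.freqAnalysis from this repository. The class name `RuleFrequency` is hypothetical, and the position of the rule name column is an assumption that has to be checked against the actual csv-file. The caller is expected to skip any header line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of a rule frequency count over CSV lines.
final class RuleFrequency {

    // Split one CSV line into fields, honouring double-quoted fields
    // (a doubled quote inside a quoted field is an escaped quote).
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++; // skip the second quote of the escaped pair
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Count occurrences per rule name. ruleColumn is the zero-based index of
    // the column holding the rule name (an assumption about the file layout).
    static Map<String, Integer> countRules(List<String> dataLines, int ruleColumn) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : dataLines) {
            List<String> fields = splitCsvLine(line);
            if (fields.size() > ruleColumn) {
                counts.merge(fields.get(ruleColumn), 1, Integer::sum);
            }
        }
        return counts;
    }

    // The n most frequent rule names, ties broken alphabetically.
    static List<String> mostFrequent(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Calling `mostFrequent(countRules(lines, ruleColumn), 10)` would give a top 10 list comparable in shape to Table 4, provided the column index matches the file.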

Extension selection

[Java] To retrieve the results described in section 3.2.3 of the paper, run showAllExtensionsSelected4Weeks() in BlackboxDB.

Preparing local database

A local SQLite database is used to store the data needed for the analysis. The purple tables contain data copied from Blackbox, the green tables contain data from running PMD and CPD, the pink table contains data from running cloc, and the black table contains the names of the issues and the first letter of the corresponding category (data).


The database can be created using queries from this file.

Storing Blackbox data

[Java] Run Main.fillSpaDB(BlackboxDB db) to store (startup) events, snapshots and extensions.

Storing code file analyses

  1. Retrieve the payloads and indices for weeks 37 and 50 of 2014 and weeks 11 and 24 of 2015 from the Blackbox server and store them in the binDataDir.

  2. [Haskell] Run Processing.CodeFiles.processAll to extract code, run cloc/PMD/CPD, store the results into the database and remove the code (this takes a long time!).

  3. The issue table contains one record for each duplicate, instead of the aggregated number of issues from PMD. The view issue2 is created to provide consistent information by aggregating duplicates (duplicate50 and duplicate100). Fill the issue3 table with the data from view issue2.
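If the week numbers in step 1 are read as ISO-8601 weeks (an assumption, since the numbering scheme is not stated here), the four days listed under Issue selection are the Mondays of these four weeks. The sketch below computes the Monday of a given ISO week with java.time; the class name `WeekDates` is hypothetical and not part of the repository.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.IsoFields;

// Hypothetical helper for mapping week numbers to calendar dates.
final class WeekDates {

    // Monday of the given ISO-8601 week. 4 January always lies in ISO week 1,
    // so it is a safe starting point within the week-based year.
    static LocalDate mondayOfIsoWeek(int weekBasedYear, int week) {
        return LocalDate.of(weekBasedYear, 1, 4)
                .with(IsoFields.WEEK_OF_WEEK_BASED_YEAR, week) // move to the requested week
                .with(ChronoField.DAY_OF_WEEK, 1);             // 1 = Monday in ISO
    }
}
```

For example, `WeekDates.mondayOfIsoWeek(2014, 37)` gives 2014-09-08 and `WeekDates.mondayOfIsoWeek(2015, 24)` gives 2015-06-08. Adding up to six days to the result covers the whole week when selecting payload and index files.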

Optimising and cleaning database

  • Add indices.
  • [SQL] Add filenames to the issue table with query.
  • [SQL] Cleaning database query.



[Haskell] For Table 2: Data set summary run Reporting.Reports.generalInfo mySettings and Reporting.Reports.generalLocInfo mySettings.


Table 3: Summary of initial PMD run and Table 4: Top 10 issues, see section on Issue selection above.

[Haskell, Java] For Table 5: Issue occurrence run Reporting.Reports.issueOccs mySettings. Set Main.csvFile to the location of the csv-file and run

[Haskell, SQL] For Figure 1: Issues over time run query. The number of unique source files per month, used in these queries, is calculated by dbUniqueSFPerMonth sett.
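To make the notion of "unique source files per month" concrete, the sketch below counts distinct source file ids per calendar month. It is written in Java for illustration and is not the repository's dbUniqueSFPerMonth. The `Snapshot` type and its fields are hypothetical and do not reflect the actual database schema.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: count distinct source files per calendar month.
final class UniqueFilesPerMonth {

    // Hypothetical record of one snapshot: which source file, on which date.
    static final class Snapshot {
        final long sourceFileId;
        final LocalDate date;

        Snapshot(long sourceFileId, LocalDate date) {
            this.sourceFileId = sourceFileId;
            this.date = date;
        }
    }

    // A source file with several snapshots in one month is counted once
    // for that month; the result is ordered chronologically.
    static Map<YearMonth, Integer> countPerMonth(List<Snapshot> snapshots) {
        Map<YearMonth, Set<Long>> seen = new TreeMap<>();
        for (Snapshot s : snapshots) {
            seen.computeIfAbsent(YearMonth.from(s.date), m -> new HashSet<>())
                .add(s.sourceFileId);
        }
        Map<YearMonth, Integer> counts = new TreeMap<>();
        seen.forEach((month, ids) -> counts.put(month, ids.size()));
        return counts;
    }
}
```

Counting distinct files rather than snapshots keeps files that are compiled often from dominating a month's total.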


[Java] For Table 6: Issues Fixes run Main.issueFixing().


[SQL] For Table 7: Extension use execute query.

[SQL] For Figure 2: Issues and extension use run queries.


Automated analysis of code quality in student programs





