Student code quality
This document describes the steps to perform the automated analysis of student programs presented in the paper 'Code Quality Issues in Student Programs' (in Proceedings of ITiCSE 2017). The code is provided so that others can check, extend and replicate the analysis. Comments and questions can be emailed to the first author. Both Java and Haskell are used for the automated analysis.
The following resources are needed for the analysis.
The Blackbox database is not publicly available. Permission to access the [Blackbox database](https://www.bluej.org/blackbox.html) must be requested from its maintainers.
Add the custom ruleset myrules to pmd-java-5.5.2.jar.
A custom CPDRunner (CPDRunner.java) runs CPD on each file in a folder separately within a single run, which avoids the overhead of restarting CPD for every file. The runner can be executed using a bat file.
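The runner idea can be sketched as follows: list the files in a folder once and hand each file to an analysis that stays loaded in the same JVM, so nothing is restarted between files. This is a minimal sketch, not the code of CPDRunner.java; the FileAnalysis interface and runOnEachFile method are hypothetical, and a line count stands in for the actual CPD call, whose API is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PerFileRunner {
    // Hypothetical per-file analysis; a real runner would invoke CPD here.
    @FunctionalInterface
    interface FileAnalysis<R> {
        R analyse(Path file) throws IOException;
    }

    // Runs `analysis` on every regular file in `folder`, one file at a time,
    // inside the current JVM, so there is no per-file start-up cost.
    static <R> Map<String, R> runOnEachFile(Path folder, FileAnalysis<R> analysis) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(folder)) {
            files = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, R> results = new LinkedHashMap<>(); // keeps the sorted file order
        for (Path file : files) {
            results.put(file.getFileName().toString(), analysis.analyse(file));
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("per-file-demo");
        Files.writeString(dir.resolve("A.java"), "class A {}\n");
        Files.writeString(dir.resolve("B.java"), "class B {}\nclass C {}\n");
        // Stand-in analysis: number of lines per file.
        Map<String, Integer> lines = runOnEachFile(dir, f -> Files.readAllLines(f).size());
        System.out.println(lines); // {A.java=1, B.java=2}
    }
}
```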
cloc is used for counting lines of code.
SQLite is used for storing results locally.
The following libraries are needed to run the Java code:
- blackboxAnalyser Java library (for connecting to the Blackbox DB)
- PMD 5.4.1 libraries (for custom CPD runner)
- JUnit (for unit tests in BBTests)
In Main.java, set the constants that store your local settings, such as data directories and executable directories:
The following libraries are needed to run the Haskell code:
Create the following values in Main to store your local settings:
Copy the payload and index files for the following days from the Blackbox server to a local folder:
- 8 September 2014
- 8 December 2014
- 9 March 2015
- 8 June 2015
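All four days are Mondays, and the section on storing code file analyses refers to the same data as weeks 37 and 50 of 2014 and weeks 11 and 24 of 2015. Assuming those week numbers follow ISO-8601 week numbering, the following sketch (class and method names are hypothetical) uses java.time to convert each date into its ISO week, which makes the correspondence easy to check.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.List;

public class BlackboxWeeks {
    // The four sample days listed above.
    static final List<LocalDate> SAMPLE_DAYS = List.of(
        LocalDate.of(2014, 9, 8),
        LocalDate.of(2014, 12, 8),
        LocalDate.of(2015, 3, 9),
        LocalDate.of(2015, 6, 8));

    // Formats a date as "yyyy-Www" using the ISO-8601 week-based year and week number.
    static String isoWeek(LocalDate day) {
        int year = day.get(IsoFields.WEEK_BASED_YEAR);
        int week = day.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format("%d-W%02d", year, week);
    }

    public static void main(String[] args) {
        for (LocalDate day : SAMPLE_DAYS) {
            System.out.println(day + " -> " + isoWeek(day) + " (" + day.getDayOfWeek() + ")");
        }
        // 2014-09-08 -> 2014-W37 (MONDAY)
        // 2014-12-08 -> 2014-W50 (MONDAY)
        // 2015-03-09 -> 2015-W11 (MONDAY)
        // 2015-06-08 -> 2015-W24 (MONDAY)
    }
}
```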
[Java] Extract the code files from these days into folder '4daysJ' using
Main.issueSelection mySettings, which produces a csv-file with PMD output.
Processing.PMD.freqAnalysis "dir\\<filename>.csv", replacing <filename> with the name of the csv-file.
The results are combined into an Excel file for further processing.
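A frequency analysis of a PMD csv-file essentially counts how often each rule occurs. The Java sketch below illustrates that idea; it is not Processing.PMD.freqAnalysis (which belongs to the Haskell code), and it assumes PMD's CSV report layout: a header line, every field double-quoted, and the rule name in the last column. The class and method names are hypothetical and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RuleFrequency {
    // Splits one CSV line into fields; commas inside double quotes are kept,
    // and a doubled quote ("") inside a quoted field becomes a single quote.
    static List<String> parseCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');
                    i++;
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Counts occurrences of each rule (last column), skipping the header line,
    // ordered from most to least frequent (ties broken by rule name).
    static Map<String, Long> countRules(List<String> csvLines) {
        return csvLines.stream()
            .skip(1)
            .filter(l -> !l.isBlank())
            .map(RuleFrequency::parseCsvLine)
            .collect(Collectors.groupingBy(f -> f.get(f.size() - 1), Collectors.counting()))
            .entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Made-up rows in the assumed PMD CSV layout.
        List<String> lines = List.of(
            "\"Problem\",\"Package\",\"File\",\"Priority\",\"Line\",\"Description\",\"Rule set\",\"Rule\"",
            "\"1\",\"\",\"A.java\",\"3\",\"4\",\"first, sample\",\"Unused Code\",\"UnusedLocalVariable\"",
            "\"2\",\"\",\"A.java\",\"3\",\"9\",\"second sample\",\"Unused Code\",\"UnusedLocalVariable\"",
            "\"3\",\"\",\"B.java\",\"3\",\"2\",\"third sample\",\"Empty Code\",\"EmptyCatchBlock\"");
        System.out.println(countRules(lines)); // {UnusedLocalVariable=2, EmptyCatchBlock=1}
    }
}
```

Sorting the counts before collecting into a LinkedHashMap keeps the most frequent issues first, which is convenient when copying the output into a spreadsheet.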
[Java] To retrieve the results described in Section 3.2.3 of the paper, run
Preparing local database
A local SQLite database is used to store the data needed for the analysis. In the database diagram, the purple tables contain data copied from Blackbox, the green tables contain data from running PMD and CPD, the pink table contains data from running cloc, and the black table contains the names of the issues together with the first letter of their corresponding category (data).
The database can be created using queries from this file.
Storing Blackbox data
Run Main.fillSpaDB(BlackboxDB db) to store (startup) events, snapshots and extensions.
Storing code file analyses
Retrieve the payloads and indices for weeks 37 and 50 of 2014 and weeks 11 and 24 of 2015 from the Blackbox server and store them in the binDataDir.
Run Processing.CodeFiles.processAll to extract the code, run cloc/PMD/CPD, store the results in the database and remove the code (this takes a long time!).
The issue table contains one record for each duplicate, instead of the aggregated number of issues reported by PMD. The view issue2 provides consistent information by aggregating the duplicates (duplicate50 and duplicate100). Fill the issue3 table with the data from view issue2.
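The aggregation performed by the view issue2 can be pictured as a group-by over (source file, issue) that sums the counts. The Java sketch below only illustrates that idea and is not the SQL definition of issue2; the record and column names (sourceFileId, issue, count) are hypothetical, and it assumes each duplicate row carries a count of 1 while PMD rows already hold an aggregated count.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateAggregation {
    // One row of a hypothetical issue table.
    record IssueRow(long sourceFileId, String issue, int count) {}

    // Grouping key: one aggregated row per (source file, issue).
    record Key(long sourceFileId, String issue) {}

    // Sums the counts per (source file, issue): one-per-duplicate rows collapse
    // into a single total, while rows already aggregated by PMD keep their value.
    static Map<Key, Integer> aggregate(List<IssueRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            r -> new Key(r.sourceFileId(), r.issue()),
            LinkedHashMap::new,
            Collectors.summingInt(IssueRow::count)));
    }

    public static void main(String[] args) {
        List<IssueRow> rows = List.of(
            new IssueRow(1, "duplicate50", 1),
            new IssueRow(1, "duplicate50", 1),
            new IssueRow(1, "duplicate100", 1),
            new IssueRow(1, "UnusedLocalVariable", 3));
        aggregate(rows).forEach((k, total) ->
            System.out.println(k.sourceFileId() + " " + k.issue() + " " + total));
        // 1 duplicate50 2
        // 1 duplicate100 1
        // 1 UnusedLocalVariable 3
    }
}
```

In SQL the same idea corresponds to a GROUP BY on the file and issue columns with SUM over the count column; storing the result in issue3 avoids recomputing the view for every report.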
Optimising and cleaning database
[Haskell] For Table 2: Data set summary run
Reporting.Reports.generalInfo mySettings and
For Table 3: Summary of initial PMD run and Table 4: Top 10 issues, see the Issue selection section above.
[Haskell, Java] For Table 5: Issue occurrence run
Then set Main.csvFile to the location of the csv-file and run CSVR.java.
[Haskell, SQL] For Figure 1: Issues over time run query. The number of unique source files per month, which is used in these queries, is calculated by
[Java] For Table 6: Issues Fixes run
[SQL] For Table 7: Extension use execute query.
[SQL] For Figure 2: Issues and extension use run queries.