
Tyranny of the Source File

Paul Phillips edited this page Jun 10, 2015 · 1 revision

In every mainstream programming language, file-level divisions play a major role in code organization. Sometimes, a language will attempt to offer more nuanced avenues of demarcation, but such abstractions always leak because of the inflexible medium which lies underneath. And each file nurtures its own cluster of file-level boilerplate: copyright notices, function prototypes, imports, package declarations. Any language elements which must be coordinated with the file name or path are perpetual impediments to progress.

Usually the content of this boilerplate is deterministic (that's why we call it boilerplate) so correctness requires that it be continuously derived from the external data which determines it. Without virtual files, this is impossible. In the best case, dependent files may be considered for regeneration on each build, which is wasteful but offers some hope of consistency.

Far more commonly, the boilerplate is embedded into each file: copied, pasted, and manually customized for the current local conditions. Afterward it drifts out of date from both the prototypical boilerplate (which itself changes over time) and from the specific data which informed the initial manual embedding.

Later, someone will change a filename and something unexpected will break, or a move between Linux and OS X will reveal further breakage caused by differences in filesystem case sensitivity.

Some unlucky drone will be tasked with patrolling the repository for brokenness.

Ad hoc consistency checkers will be written, catching some but not all of the inconsistent states which should never have been allowed to arise.

Java mandates a simple relationship between filename and class name: a public top-level class must be declared in a file of the same name.

% javac Bar.java
Bar.java:1: error: class Foo is public, should be declared in a file named Foo.java
public class Foo { }
       ^
1 error

Countless lines of Java have been written differently than they would otherwise have been, simply to avoid creating yet another source file containing one line of logic swaddled in twenty boilerplate blankets.

Scala dropped that requirement but introduced new ones.

% scalac bar1.scala bar2.scala
bar2.scala:1: error: Companions 'class Bar' and 'object Bar' must be defined in same file:
  Found in bar1.scala and bar2.scala
object Bar
       ^
one error found

When a machine requires a human to manually undertake a deterministic task in order that the machine might proceed, the machine has failed in its obligation.

Think of suffuse as a glue language through which we can accommodate the whims and frailties of the various compilers while also organizing all information in the manner of our choosing.

For instance, store Java source code any way you like. Place ten public classes in the same file, put each individual function in a database and stitch the class together on the fly, parse Java snippets out of Stack Overflow answers - whatever works best in your organization. That each public class resides in its own file of the same name is a lie that javac requires, and a lie which our vfs is glad to tell.

If the code is parseable (and it can't be compiled if it can't be parsed, so we know it can be parsed) then we also know the names of the top-level public classes. From there it's a simple matter to create the virtual filesystem javac expects.
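As a rough illustration of that step, here is a minimal sketch in Java. It is not suffuse code: `JavaSplitter` and `split` are hypothetical names, and instead of a real parser it counts braces and uses a regular expression. It therefore assumes a snippet with no package or import lines, and no braces inside strings or comments. Every top-level type gets a virtual file named after it, which satisfies javac's rule for public classes and is harmless for the rest.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of suffuse: splits one source text holding
// several top-level types into the one-type-per-file layout javac expects.
final class JavaSplitter {
    // Captures the name that follows a class/interface/enum keyword.
    private static final Pattern TYPE_NAME =
        Pattern.compile("\\b(?:class|interface|enum)\\s+(\\w+)");

    /** Maps a virtual file name such as "Foo.java" to the text of that type. */
    static Map<String, String> split(String source) {
        Map<String, String> files = new LinkedHashMap<>();
        int depth = 0;  // current brace nesting depth
        int start = 0;  // index where the current top-level chunk begins
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                // A top-level declaration just closed.
                String chunk = source.substring(start, i + 1).trim();
                // Look only at the text before the first brace, so that
                // nested types are never mistaken for top-level ones.
                String header = chunk.substring(0, chunk.indexOf('{'));
                Matcher m = TYPE_NAME.matcher(header);
                if (m.find()) {
                    files.put(m.group(1) + ".java", chunk);
                }
                start = i + 1;
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String source = "public class Foo { }\npublic class Bar { }";
        split(source).forEach((name, text) -> System.out.println(name + " <- " + text));
    }
}
```

A real implementation would take the names from the parse tree rather than from pattern matching, and would copy any package and import declarations into each virtual file.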

The Scala compiler has hundreds of test cases which require multiple files: for instance, separate compilation bugs often involve compiling A.scala and B.scala together, and then B.scala alone. There are also many tests which require both Java and Scala source files. Multi-file tests are tedious to write and maintain, in part because it's nonsensical to be forced to break ten lines of logic across several separate files.

With suffuse we can easily embed multiple virtual files in a single underlying one.

Call this file test.vfs:

### S.scala
class Foo { new Bar }
### J.java
public class Bar { }
### run.sh
#!/bin/bash
#
scalac S.scala J.java
javac J.java
scalac S.scala

Now we instruct suffuse to translate .vfs files in the obvious way. What is test.vfs in the underlying filesystem takes on the implied hierarchy in the virtual filesystem.

test
├── S.scala
├── J.java
└── run.sh
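The translation can be sketched in a few lines of Java. This is an illustration under assumptions, not suffuse's implementation: `VfsParser` and `parse` are hypothetical names, and the format is assumed to be exactly what the example shows, where a line beginning with `### ` names a virtual file and the lines that follow are its contents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of suffuse: turns the text of a .vfs file
// into a map from virtual path to virtual file contents.
final class VfsParser {
    // Assumed marker: "### name" opens a new virtual file.
    private static final String MARKER = "### ";

    /** baseName is the .vfs file name without extension, e.g. "test". */
    static Map<String, String> parse(String baseName, String text) {
        Map<String, StringBuilder> building = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String line : text.split("\n")) {
            if (line.startsWith(MARKER)) {
                String name = line.substring(MARKER.length()).trim();
                current = new StringBuilder();
                building.put(baseName + "/" + name, current);
            } else if (current != null) {
                // Lines such as "#!/bin/bash" or "#" are ordinary content,
                // since they do not start with the full marker.
                current.append(line).append('\n');
            }
        }
        Map<String, String> files = new LinkedHashMap<>();
        building.forEach((path, body) -> files.put(path, body.toString()));
        return files;
    }

    public static void main(String[] args) {
        String text = "### S.scala\nclass Foo { new Bar }\n### J.java\npublic class Bar { }\n";
        parse("test", text).keySet().forEach(System.out::println);
    }
}
```

Lines that appear before the first marker are ignored in this sketch. Whether they should instead be an error is a design choice the example above does not settle.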

Conversely, we can just as easily stitch multiple physical files together into a single virtual one, and we can do so in a way which creates the dependency graph as a byproduct. Static generation of composite files is always vulnerable to falling out of date. With suffuse we can create composite virtual files which are never out of date, because they do not exist in an intermediate form which can become stale.

At least, not until we choose to add caching - but that's a story for another document.
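To make the composite idea concrete, here is a minimal Java sketch of a virtual file that is never stale. It assumes nothing about how suffuse is built: `CompositeFile`, `add`, `read`, and `dependencies` are hypothetical names, and each `Supplier<String>` stands in for reading one physical file. The contents are assembled only when `read` is called, and the list of parts doubles as the dependency edges.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not part of suffuse: a virtual file whose contents
// are derived from its parts on every read, so no stale copy can exist.
final class CompositeFile {
    // Part name -> a way to read that part's current contents.
    private final Map<String, Supplier<String>> parts = new LinkedHashMap<>();

    CompositeFile add(String name, Supplier<String> reader) {
        parts.put(name, reader);
        return this;
    }

    /** The names of the parts this file is built from, in order. */
    List<String> dependencies() {
        return new ArrayList<>(parts.keySet());
    }

    /** Concatenates the current contents of every part at read time. */
    String read() {
        StringBuilder out = new StringBuilder();
        for (Supplier<String> reader : parts.values()) {
            out.append(reader.get());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] header = { "// Copyright 2015\n" };
        CompositeFile file = new CompositeFile()
            .add("header.txt", () -> header[0])
            .add("Body.java", () -> "class Body { }\n");
        System.out.print(file.read());
        header[0] = "// Copyright 2016\n"; // the "physical file" changes
        System.out.print(file.read());     // the next read reflects the change
    }
}
```

Caching would replace the unconditional re-read with a check against each dependency, which is exactly where staleness could creep back in.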
