Skip to content
Permalink
master
Switch branches/tags
Go to file
 
 
Cannot retrieve contributors at this time
executable file 286 lines (250 sloc) 10.6 KB
#!/usr/bin/env python3
"""
Computer programs have defects. One of the most common is a buffer
overflow, which occurs when a program writes after the end of a
buffer and scribbles over whatever variables happen to sit after the
buffer in memory. Managed languages such as C# and Java make buffer
overflows impossible because they check each array access against the
boundaries of the array. But C and assembly language don't have such
luxuries. In fact, unlike C, assembly language makes no distinction
between a scalar variable and a 1-element array, making it easy to
forget whether a variable is an array at all.
There are three times when a developer can find defects in a program:
at compile time, at run time during controlled testing, or at run
time after deployment. Obviously, one wants to find defects earlier
when they are less expensive to fix. Because assembly language
provides few tools for finding defects at compile time, any technique
to make run-time problems show up sooner is valuable.
Ordinarly, one can detect an overflow from the behavior of a program
when it uses what is stored after the overflowed buffer. However,
the effects of writing after a buffer into memory that a program is
not using at the moment may not be apparent until it's too late.
One class of techniques to make overflows apparent sooner adds what
could be called a "canary in the coal mine". Stack smash protection
measures used by modern C compilers insert explicit canary variables
near sensitive variables, but this has a few drawbacks on an 8-bit
computer: canaries take space in memory, and checking their values
takes time. Some operating systems for modern PCs and other
machines with memory protection hardware use address space layout
randomization (ASLR), which sets up memory so that a buffer overflow
causes an exception that the CPU traps. But small microcontrollers
often have no memory protection.
So I propose a method that combines ideas from canaries and ASLR:
randomize the order of things in memory, so that each buffer that
can be overflowed is more likely to eventually end up before
something where the effect of the overflow is visible. In essence,
each variable is randomly selected to be another variable's canary.
This involves an extension to 6502 assembly language that introduces
a new control command called '.shuffle'. An assembler should permute
the lines between .shuffle and .endshuffle when assembling the
program, so that variables end up in a different order each time
the program is assembled. For example:
.shuffle
foo: .res 32
bar: .res 4
baz: .res 4
cnut: .res 32
.endshuffle
might become
cnut: .res 32
baz: .res 4
foo: .res 32
bar: .res 4
It's also useful for finding overflows that fall off the end of a
read-only data segment. For these, you'll want to permute chunks
longer than one line, which is why the .shuffle keyword takes an
optional delimiter argument.
.shuffle THE_GAME
title_screen:
.byt ...
.byt ...
THE_GAME
character_menu:
.byt ...
.byt ...
THE_GAME
stage_menu:
.byt ...
.byt ...
.endshuffle
The .shuffle command also helps to document the data flow within a
program. If lines of code can be executed in no particular order,
such as 'lda' and 'clc' instructions, putting them in a .shuffle
block lets others know that this is the case.
Another potential application even after you have found and fixed
buffer overflows is binary watermarking. If you are distributing
copies of a program under nondisclosure agreement, you can covertly
mark each copy uniquely by permuting the subroutines and variables.
This program ordinarily shuffles based on randomness provided by the
operating system. But it also has command-line options to force
a specific order: forward, reverse, or with a seed derived from
a string. This is useful because sometimes, .shuffle itself can
lead to defects when it turns out that a program's correctness did
in fact depend on order. To make sure that your .shuffle commands
are semantics-preserving, first test each program with all modules
shuffled forward and again with all modules shuffled in reverse.
The output of this preprocessor doesn't have the .shuffle commands.
This means line numbers in your assembler's error messages won't
necessarily match line numbers in the original source code file.
So there is another option to preserve line numbers by making a
blank line where the shuffle command hits, which makes the line
numbers correct when you force forward order.
It's fairly easy to integrate this into a makefile. One way
reshuffles the file every time you change the source code:
AS65 = ca65
$(objdir)/%.o: $(objdir)/%.s
$(AS65) $(CFLAGS65) $< -o $@
$(objdir)/%.shuffle.s: $(srcdir)/%.s
tools/shuffle.py $< -o $@
Or changing the seed to be based on the file name would let you
freeze the order while you investigate a given defect:
$(objdir)/%.shuffle.s: $(srcdir)/%.s
tools/shuffle.py $< --seed 345$< -o $@
Further reading:
Article on Wikipedia about explicit instruction-level parallelism
http://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing
"Software Watermarking: Models and Dynamic Embeddings"
by C. Collberg and C. Thomborson of the University of Auckland
http://www.cs.arizona.edu/~collberg/Research/Publications/CollbergThomborson99a/
"""
from __future__ import with_statement
import sys
import random
versionText = """0.03
Copyright 2016 Damian Yerrick
Copying and distribution of this file, with or without
modification, are permitted in any medium without royalty
provided the copyright notice and this notice are preserved.
This file is offered as-is, without any warranty.
"""
class BlocksReader(object):
def __init__(self, seq, terminator, sep=None):
self.lines = []
self.stopped = False
self.seq = seq
self.terminator = terminator
self.sep = sep
def __iter__(self):
return self
def __repr__(self):
return ("BlocksReader(%s, %s, %s)"
% (repr(self.seq), self.terminator, self.sep))
def __next__(self):
if self.stopped:
raise StopIteration
if self.sep is None:
line = next(self.seq)
if line.strip() == self.terminator:
self.stopped = True
raise StopIteration
return line
# if we have a separator, read lines and combine them
lines = []
while True:
try:
line = next(self.seq)
except StopIteration:
self.stopped = True
return ''.join(lines)
stripped = line.strip()
if stripped == self.terminator:
self.stopped = True
return ''.join(lines)
if stripped == self.sep:
return ''.join(lines)
lines.append(line)
class ShuffledReader(object):
def __init__(self, seq, rng=random, preserveLineNumbers=False):
self.seq = iter(seq)
self.rng = rng
self.bookends = '\n' if preserveLineNumbers else ''
self.factorials = []
def __iter__(self):
return self
def __next__(self):
line = next(self.seq) # raises StopIteration here
splitLine = line.split(None, 1)
if len(splitLine) == 0:
return line
cmd = splitLine[0]
args = splitLine[1].strip() if len(splitLine) > 1 else None
if cmd == '.shuffle':
inner = ShuffledReader(self.seq, self.rng, self.bookends)
lines = list(BlocksReader(inner, '.endshuffle', args))
if len(lines) > 1:
self.factorials.append(len(lines))
self.factorials.extend(inner.factorials)
self.rng.shuffle(lines)
between = self.bookends if args is not None else ''
return self.bookends + between.join(lines) + self.bookends
else:
return line
class ForwardShuffle(object):
def shuffle(self, ls):
pass
class ReverseShuffle(object):
def shuffle(self, ls):
ls.reverse()
def main(argv=None):
from optparse import OptionParser
if argv is None:
argv = sys.argv
parser = OptionParser(usage="%prog [options] [FILE...]",
version=versionText)
parser.add_option("-f", "--forward",
action="store_false", dest="reverse",
help="shuffle all blocks in forward order")
parser.add_option("-r", "--reverse",
action="store_true", dest="reverse",
help="shuffle all blocks in reverse order")
parser.add_option("-l", "--print-lengths",
action="store_true", dest="printLengths",
help="write lengths of shuffled blocks to stderr")
parser.add_option("-s", "--seed",
dest="seed",
help="seed the random number generator with a string",
metavar="SEEDSTRING")
parser.add_option("-o", "--output", dest="outfile",
help="write shuffled output to OUTFILE", metavar="OUTFILE")
parser.add_option("-U", "--universal-newlines",
action="store_true", dest="unl", default=False,
help="no-op, ignored for compatibility")
parser.add_option("--pln", "--preserve-line-numbers",
action="store_true", dest="preserveLineNumbers", default=False,
help="insert blank lines to compensate for removed .shuffle commands")
(options, filenames) = parser.parse_args(argv[1:])
if options.outfile is None:
outfp = sys.stdout
else:
outfp = open(options.outfile, 'w')
if options.reverse is False:
rng = ForwardShuffle()
elif options.reverse is True:
rng = ReverseShuffle()
else:
rng = random.Random(options.seed)
if len(filenames) == 0:
reader = ShuffledReader(sys.stdin, rng,
options.preserveLineNumbers)
for line in reader:
outfp.write(line)
else:
readMode = 'r'
for filename in filenames:
with open(filename, readMode) as infp:
reader = ShuffledReader(infp, rng,
options.preserveLineNumbers)
for line in reader:
outfp.write(line)
if options.outfile is not None:
outfp.close()
if options.printLengths and len(reader.factorials) > 0:
import math
lengths = "".join('%d!' % l for l in reader.factorials)
product = 1
for l in reader.factorials:
product *= math.factorial(l)
print("%s = %d" % (lengths, product), file=sys.stderr)
if __name__=='__main__':
main()
## main('shuffle --pln -U ../src/litemain.s -o test.out.txt'.split())