Consider interphase notation (X@Y)

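It looks like the tokenizer currently splits interphase notation at the "@" symbol, but when we build the vocabulary, the "@" gets merged with some common neighboring tokens into a multi-word token. The result is inconsistent: the same X@Y notation can end up split in some cases and partially re-joined in others. We probably need to handle this more deliberately, e.g. by deciding whether X@Y should be a single token or always stay as separate tokens.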
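To make the two options concrete, here is a minimal sketch, not our actual tokenizer. The names `INTERPHASE_RE`, `tokenize`, and `is_mergeable` are hypothetical, and it assumes plain whitespace splitting and that an interphase term is exactly two non-empty parts joined by one "@". It shows either keeping X@Y as one token, or splitting it into three tokens and keeping "@" out of multi-word token candidates.

```python
import re

# Hypothetical pattern (an assumption, not the project's real rule):
# two non-empty, whitespace-free parts joined by exactly one "@".
INTERPHASE_RE = re.compile(r"^([^\s@]+)@([^\s@]+)$")


def tokenize(text, keep_interphase=True):
    """Whitespace tokenizer that treats X@Y consistently.

    keep_interphase=True  -> "X@Y" stays a single token
    keep_interphase=False -> "X@Y" becomes ["X", "@", "Y"]
    """
    tokens = []
    for raw in text.split():
        match = INTERPHASE_RE.match(raw)
        if match is None or keep_interphase:
            # Not interphase notation, or we want it kept whole.
            tokens.append(raw)
        else:
            tokens.extend([match.group(1), "@", match.group(2)])
    return tokens


def is_mergeable(bigram):
    """Return False for candidate phrases containing a bare "@".

    Intended as a filter in front of whatever phrase detection builds
    the multi-word tokens, so "@" never fuses with only one side.
    """
    return "@" not in bigram


print(tokenize("Au@SiO2 nanoparticles"))
print(tokenize("Au@SiO2 nanoparticles", keep_interphase=False))
print(is_mergeable(("Au", "@")), is_mergeable(("core", "shell")))
```

Caveats: this pattern would also match things like email addresses, and it does not strip trailing punctuation ("Au@SiO2," would not match), so the real rule likely needs to be stricter about what counts as X and Y.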