New regalloc design.

+We need to switch to a new register allocator.
+The current one is split into a global and a local register allocator.
+The global one can assign only callee-saves registers and happens
+on the tree-based internal representation: it assigns local variables
+to hardware registers.
+The local one runs on the linear representation on a per-basic-block
+basis and assigns hard registers to virtual registers (which
+hold temporary values during expression evaluation); it also deals
+with the platform-specific issues (fixed registers, calling conventions).
+Moving to a different register allocator will help solve some of the
+performance issues introduced by the above split, make the register allocator
+more easily portable and solve some of the issues generated by dealing with trees.
+The general design ideas are below.
+The new allocator should have a global view of the whole method, so it can
+also assign variables to some of the volatile registers where possible,
+even across basic blocks (this would improve performance).
+The allocator would be driven by per-arch declarative data, so porting
+should be easier: an architecture needs to specify register classes,
+calling convention and instruction requirements (similar to the gcc code).
+The allocator should operate on the linear representation: this way it's
+easier and faster to track usages correctly. We need to assign virtual
+registers on a per-method basis instead of per basic block. We can assign
+virtual registers to variables, too. Note that since we fix the stack offset
+of local vars only after this step (which happens after the burg rules are run),
+some of the burg rules that try to optimize the code won't apply anymore:
+the peephole code may need to be enhanced to do the optimizations instead.
+We need to handle floating point registers in the global allocator, too.
+The new allocator also needs to keep precise track of which registers
+contain references or managed pointers to allow us to move to a precise GC.
+It may be worth using a single increasing set of integers for the virtual
+registers, with the class of the register stored separately (unlike the
+current local allocator, which keeps integer and fp registers separate).
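The single numbering idea above could look roughly like the sketch below. It is only an illustration: VRegTable, VRegClass, MAX_VREGS and the function names are hypothetical and do not exist in the current code, and the fixed-size table is an assumption made to keep the example short.

```c
#include <assert.h>

/* Hypothetical register classes; the names are illustrative only. */
typedef enum { VREG_CLASS_INT, VREG_CLASS_FP } VRegClass;

#define MAX_VREGS 1024 /* assumed fixed limit, to keep the sketch simple */

/* One counter for all virtual registers; the class lives in a side table. */
typedef struct {
    int next_vreg;                  /* next unused virtual register number */
    VRegClass reg_class[MAX_VREGS]; /* class of each allocated vreg */
} VRegTable;

static void vreg_table_init(VRegTable *t)
{
    t->next_vreg = 0;
}

/* Allocate a fresh vreg of the given class and return its number. */
static int vreg_alloc(VRegTable *t, VRegClass cls)
{
    assert(t->next_vreg < MAX_VREGS);
    int vreg = t->next_vreg++;
    t->reg_class[vreg] = cls;
    return vreg;
}

/* Look up the class of a previously allocated vreg. */
static VRegClass vreg_class(const VRegTable *t, int vreg)
{
    assert(vreg >= 0 && vreg < t->next_vreg);
    return t->reg_class[vreg];
}
```

With this layout an integer vreg and an fp vreg never share a number, so liveness data can be indexed by the vreg number alone and the class is consulted only when a hard register is picked.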
+Since this is a large task, we need to do it in steps as much as possible.
+The first is to run the register allocator _after_ the burg rules: this
+requires a rewrite of the liveness code, too, to use linear indexes instead
+of basic-block/tree number combinations. This can be done by:
+*) allocating virtual regs to all the locals that can be register allocated
+*) running the burg rules (some may require adjustments): the local virtual
+registers are assigned starting from global-virt-regs+1, instead of the current
+hardware-regs+1, so we can tell apart global and local virt regs.
+*) running the liveness code (or whatever else is needed) to allocate the global registers
+*) allocating the rest of the local variables to stack slots
+*) continuing with the current local allocator
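The numbering used in the steps above can be sketched as follows. This is a guess at one possible shape, not the actual code: VRegNumbering and all the function names are hypothetical, and it assumes that numbers up to hardware-regs stand for hard registers and that global vregs are numbered right after them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for the three number ranges. */
typedef struct {
    int hardware_regs;    /* highest number reserved for hard registers */
    int global_virt_regs; /* highest number given to a global vreg so far */
    int next_local;       /* next number to hand out to a local vreg */
    bool globals_sealed;  /* true once the burg rules start running */
} VRegNumbering;

static void numbering_init(VRegNumbering *n, int hardware_regs)
{
    n->hardware_regs = hardware_regs;
    n->global_virt_regs = hardware_regs; /* no global vregs yet */
    n->next_local = 0;
    n->globals_sealed = false;
}

/* First step: give a global vreg to a local variable. */
static int alloc_global_vreg(VRegNumbering *n)
{
    assert(!n->globals_sealed);
    return ++n->global_virt_regs;
}

/* Called once before the burg rules run: locals start at global-virt-regs+1. */
static void seal_globals(VRegNumbering *n)
{
    n->next_local = n->global_virt_regs + 1;
    n->globals_sealed = true;
}

/* Second step: the burg rules ask for temporaries. */
static int alloc_local_vreg(VRegNumbering *n)
{
    assert(n->globals_sealed);
    return n->next_local++;
}

static bool is_global_vreg(const VRegNumbering *n, int reg)
{
    return reg > n->hardware_regs && reg <= n->global_virt_regs;
}

static bool is_local_vreg(const VRegNumbering *n, int reg)
{
    return n->globals_sealed && reg > n->global_virt_regs;
}
```

The point of the split is that the existing local allocator can keep treating anything above global-virt-regs as its own, while the new global pass only looks at the range in between.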
+This work could take 2-3 weeks.
+The next step is to define the kind of declarative data an architecture needs,
+assign virtual regs to all the registers and make the allocator
+assign from the volatile registers, too.
+Note that some of the code that is currently emitted in the arch-specific
+code will need to be emitted as instructions that the reg allocator
+can inspect. Think of a method that returns its first argument, which is
+received in a register: the current code copies it to either a local slot or
+a global reg in the prolog and copies it back to the return register
+in the basic block, but since neither the reg allocator nor the peephole code
+knows about the prolog code, the first store cannot be optimized away.
+The gcc code has some examples of how to specify register classes in a
+declarative way.
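As a rough idea of what such per-arch declarative data might contain, here is a minimal sketch. Nothing in it is taken from gcc or from the current code: ArchDesc, RegClassDesc, the bitmask encoding and the 8-register toy architecture are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register classes for a made-up architecture. */
typedef enum { RC_INT, RC_FP, RC_COUNT } RegClass;

/* One register class, described as bitmasks over hard register numbers. */
typedef struct {
    const char *name;
    uint32_t members;      /* hard registers that belong to the class */
    uint32_t callee_saved; /* subset that is preserved across calls */
} RegClassDesc;

/* Everything the allocator would need to know about one architecture. */
typedef struct {
    const char *arch_name;
    RegClassDesc classes[RC_COUNT];
    int return_reg;  /* integer return register */
    int arg_regs[2]; /* integer argument registers, in order */
    int num_arg_regs;
} ArchDesc;

/* Toy architecture: 8 int regs (4..7 callee-saved), 8 fp regs (all volatile). */
static const ArchDesc toy_arch = {
    .arch_name = "toy",
    .classes = {
        [RC_INT] = { "int", 0xFFu, 0xF0u },
        [RC_FP]  = { "fp",  0xFFu, 0x00u },
    },
    .return_reg = 0,
    .arg_regs = { 0, 1 },
    .num_arg_regs = 2,
};

/* Volatile registers are the class members that are not callee-saved. */
static uint32_t volatile_regs(const ArchDesc *arch, RegClass cls)
{
    assert(cls < RC_COUNT);
    return arch->classes[cls].members & ~arch->classes[cls].callee_saved;
}
```

A port would then mostly consist of filling in a table like toy_arch, and the allocator would derive things such as the volatile set from it instead of hardcoding them per architecture.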

