Technical details
- 32-Bit OR1000 Emulator with MMU, TICK counter and PIC (OR1K Specification)
- 32 MB RAM (alterable)
- UART 16550 connected to a terminal
- UART 16550 not connected
- OCFB Framebuffer 640x400 16bpp with LPC32xx touchscreen
- virtio device with support for the 9p filesystem
- ATA connected to a 64 kB hard drive
- Opencore-keyboard controller
- Ethernet MAC controller
- Audio controller
- Real time clock
- Linux Terminal Emulator
- Linux 3.16, Busybox and much much more
| Memory | IRQ | Device |
|---|---|---|
| 0x00000000 - 0x01F00000 | - | 31 MB Random Access Memory (alterable) |
| 0x90000000 - 0x90000006 | 2 | UART 16550 connected to the terminal and keyboard |
| 0x91000000 - 0x91001000 | 8 | Opencore VGA/LCD 2.0 core frame buffer |
| 0x92000000 - 0x92001000 | 4 | Ethernet MAC controller |
| 0x93000000 - 0x93000100 | 9 | LPC32xx touchscreen controller |
| 0x94000000 - 0x94000100 | 5 | Opencore keyboard controller |
| 0x96000000 - 0x96000006 | 3 | second UART 16550 |
| 0x97000000 - 0x97001000 | 6 | virtio device for the 9p filesystem |
| 0x98000000 - 0x98000400 | 7 | audio controller |
| 0x99000000 - 0x99001000 | 10 | LPC32xx real time clock |
| 0x9e000000 - 0x9e001000 | 15 | ATA controller |
The emulated machine is big endian, but Javascript typed arrays use the byte order of the host, which is little endian on virtually all current hardware. Most memory accesses are aligned 32-bit accesses. Therefore, right after the image is loaded, the bytes of every 32-bit word are swapped once, and 8-bit and 16-bit accesses XOR the address with 3 or 2 respectively.
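The swap-once scheme can be sketched as follows. This is a minimal illustration, not the jor1k source: the helper names `swapWords`, `read8`, `read16` and `read32` are hypothetical, and the code assumes a little-endian host.

```javascript
// Swap the four bytes of every 32-bit word in place (done once after loading).
function swapWords(buffer) {
  const words = new Uint32Array(buffer);
  for (let i = 0; i < words.length; i++) {
    const w = words[i];
    words[i] = ((w >>> 24) | ((w >>> 8) & 0xFF00) |
                ((w << 8) & 0xFF0000) | (w << 24)) >>> 0;
  }
}

// Aligned 32-bit reads need no further work after the swap.
function read32(words, addr) { return words[addr >> 2]; }

// 16-bit reads XOR the byte address with 2 to pick the right half of the word.
function read16(halfwords, addr) { return halfwords[(addr ^ 2) >> 1]; }

// 8-bit reads XOR the byte address with 3 to pick the right byte of the word.
function read8(bytes, addr) { return bytes[addr ^ 3]; }

// Example: an 8-byte big-endian image (hypothetical data).
const image = new Uint8Array([0x11, 0x22, 0x33, 0x44, 0xAA, 0xBB, 0xCC, 0xDD]);
swapWords(image.buffer);
const mem8 = new Uint8Array(image.buffer);
const mem16 = new Uint16Array(image.buffer);
const mem32 = new Uint32Array(image.buffer);
```

After the swap, `read32(mem32, 0)` yields the big-endian value `0x11223344` with a plain array access, while the rarer byte and halfword accesses pay only one extra XOR.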
Most of the code runs in its own thread using the Web Worker API. Message passing is used to communicate between the worker and the objects related to the graphical user interface.
Javascript has no predefined integer types. To reinterpret a number as an unsigned or signed 32-bit integer, one can use (number >>> 0) or (number >> 0). These operations only change how the 32 bits of "number" are interpreted.
In Javascript every number is nominally a double-precision floating-point number. However, the Javascript compiler optimizes the code and tries to figure out whether an integer representation is sufficient. Unfortunately, some compilers still lack support for fast unsigned integers and convert them into doubles. The code is therefore written to avoid unsigned integer arithmetic as much as possible.
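A short illustration of both casts, using only standard Javascript operators. The helper `lessThanUnsigned` is a hypothetical name, added to show where the distinction matters.

```javascript
// The same 32 bits, seen as signed and as unsigned.
const raw = 0xFFFFFFFF;            // all 32 bits set
const asSigned = raw >> 0;         // interpreted as a signed 32-bit integer
const asUnsigned = asSigned >>> 0; // interpreted as an unsigned 32-bit integer

// Unsigned comparison of two values that are stored as signed integers.
function lessThanUnsigned(a, b) {
  return (a >>> 0) < (b >>> 0);
}
```

Here `asSigned` is `-1` and `asUnsigned` is `4294967295`. A plain `1 < -1` is false, whereas `lessThanUnsigned(1, -1)` is true because `-1` is treated as the largest 32-bit value.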
Sometimes a number must be sign extended. This is done efficiently by the expression (number << x) >> x, where x is an appropriate shift value. To sign extend a signed 8 bit value to a signed 32 bit value the expression is ((number << 24) >> 24).
The carry flag and overflow flag are not used by the gcc compiler, so they are ignored in this emulation. The code that supports these flags can be uncommented for better compatibility at the cost of speed.
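As a small sketch, the shift pair can be wrapped in a helper for arbitrary field widths. The name `signExtend` is hypothetical and `bits` is assumed to be between 1 and 32.

```javascript
// Sign extend the lowest `bits` bits of `value` to a signed 32-bit integer.
function signExtend(value, bits) {
  const shift = 32 - bits;
  // The left shift moves the field's sign bit into bit 31,
  // the arithmetic right shift copies it back down.
  return (value << shift) >> shift;
}

const byteMinusOne = signExtend(0xFF, 8);  // -1
const bytePositive = signExtend(0x7F, 8);  // 127
const halfMin = signExtend(0x8000, 16);    // -32768
```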
Most of the time the whole instruction fetch is done very efficiently with the command

    if ((checkpc ^ this.pc) >> 11) {
        ...
    }
    ins = int32mem[(currenttlb ^ this.pc)];

The important trick is to first check whether the current page is still valid and, if so, to simply XOR the program counter with the cached value. The fast TLB lookup for data accesses is implemented in a similar way.
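The following sketch shows one way such a single-entry lookup can work. It is an illustration under assumptions, not the jor1k code: `createFetcher` and `translate` are hypothetical names, the program counter is assumed to be a word index, and a page is assumed to hold 2048 words (8 kB), which matches the shift by 11 above.

```javascript
const PAGE_WORD_SHIFT = 11; // assumed: 2048 words per page

// `translate` is a hypothetical slow-path function that maps the first word
// index of a virtual page to the first word index of its physical page.
function createFetcher(int32mem, translate) {
  let checkpc = -1;   // virtual page base currently cached (-1 matches no page)
  let currenttlb = 0; // virtual base XOR physical base
  let refills = 0;
  return {
    fetch(pc) {
      // Any difference above the page offset bits means a different page.
      if ((checkpc ^ pc) >> PAGE_WORD_SHIFT) {
        const vbase = (pc >> PAGE_WORD_SHIFT) << PAGE_WORD_SHIFT;
        checkpc = vbase;
        currenttlb = vbase ^ translate(vbase);
        refills++;
      }
      // The XOR cancels the virtual base and leaves physical base plus offset.
      return int32mem[currenttlb ^ pc];
    },
    refillCount() { return refills; }
  };
}

// Example: virtual page 2 is mapped to physical page 1 (hypothetical mapping).
const ram = new Int32Array(8192);
ram[2048 + 5] = 42;
const fetcher = createFetcher(ram, (vbase) => vbase - 2048);
const firstFetch = fetcher.fetch(4096 + 5);
```

Because both page bases have their offset bits cleared, a single XOR replaces the usual mask-and-add, and consecutive fetches from the same page skip the translation entirely.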
The TLB Refill is done in Javascript. Unfortunately this makes it dependent on the Linux kernel as it needs the pointer to the internal translation table of the Linux kernel.
The fastest path for one instruction through the code is given by

    for (;;) {
        if (ppc == fence) {
            ....
        }
        ins = int32ram[ppc >> 2];
        ppc = ppc + 4;
        switch ((ins >> 26) & 0x3F) {
            ....
        }
    }

The idea here is that the virtual pc is computed only when needed by translating ppc (physical pc) back to the virtual pc address. The variable fence is used to break out of the fast path when ppc reaches a jump, the end of the current page or when the interrupt checks need to be done.
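To make the role of the fence concrete, here is a toy version of such a loop. Everything in it is an assumption for illustration: the two opcodes (0x00 halts, 0x01 adds a 16-bit immediate to an accumulator), the function names and the return value are hypothetical and do not reflect the OR1K instruction set or the jor1k code.

```javascript
// Run a toy program from physical address startPpc until it halts
// or until ppc reaches the fence.
function runFastPath(int32ram, startPpc, fence) {
  let ppc = startPpc;
  let acc = 0;
  for (;;) {
    if (ppc === fence) {
      // Slow path: a real emulator would handle jumps, page ends
      // or interrupt checks here and then set a new fence.
      return { acc: acc, ppc: ppc, reason: "fence" };
    }
    const ins = int32ram[ppc >> 2];
    ppc = ppc + 4;
    switch ((ins >> 26) & 0x3F) {
      case 0x01: // toy opcode: add the low 16 bits to the accumulator
        acc = (acc + (ins & 0xFFFF)) | 0;
        break;
      default:   // toy opcode 0x00 and anything unknown: halt
        return { acc: acc, ppc: ppc, reason: "halt" };
    }
  }
}

// The virtual pc is reconstructed only when it is actually needed.
function virtualPc(ppc, physicalPageBase, virtualPageBase) {
  return virtualPageBase + (ppc - physicalPageBase);
}

const program = new Int32Array([(1 << 26) | 5, (1 << 26) | 7, 0, 0]);
const untilHalt = runFastPath(program, 0, 16);
const untilFence = runFastPath(program, 0, 4);
```

With the fence at address 16 the program runs to its halt instruction and returns `acc` 12. With the fence at address 4 the loop stops after the first instruction, which is how the fast path hands control back at a page end or when an interrupt check is due.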
When the system goes idle, the operating system sends a sleep or halt signal. In this case the CPU should wait until the next interrupt occurs. We can use the setTimeout() method of Javascript to accomplish this. The usual tick under Linux is <=10ms. Unfortunately, with the overhead of the web browsers and their Javascript engines, a 10ms tick is often too short to keep the host processor usage below 1%. Therefore the Linux kernel was compiled with a tick every 20ms (50 ticks per second). Usually this is not a problem as long as you don't use time-critical applications like video players. The responsiveness of the system, for example when typing on the keyboard, is not affected.
While a worker thread is executing code, it does not respond to arriving messages. The worker thread must go idle to process the message queue. A setTimeout() call with 0ms does not work here. In order to run the CPU at full speed, a message ping-pong is performed every 5-10ms. The worker sends an "execute" signal to the master, and the master thereupon sends its own "execute" signal back to the worker. By doing this, we keep the worker responsive while using the worker thread efficiently.
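The ordering that this ping-pong produces can be shown with a small single-threaded simulation. It does not use the real Web Worker API: the two arrays stand in for the message queues of the worker and the master, and the function name, the "keypress" message and the parameters are hypothetical.

```javascript
// Simulate `slices` execution slices. After slice number `inputAtSlice`
// the master has a "keypress" message waiting for the worker.
function simulatePingPong(slices, inputAtSlice) {
  const workerQueue = ["execute"]; // messages waiting for the worker
  const masterQueue = [];          // messages waiting for the master
  const log = [];
  let executed = 0;
  while (workerQueue.length > 0 || masterQueue.length > 0) {
    // The worker is idle, so it can process its queued messages.
    while (workerQueue.length > 0) {
      const msg = workerQueue.shift();
      if (msg === "execute") {
        if (executed < slices) {
          executed++;
          log.push("run slice " + executed);
          masterQueue.push("execute"); // ping
        }
      } else {
        log.push("handle " + msg);
      }
    }
    // The master answers every ping with its own "execute" message.
    while (masterQueue.length > 0) {
      masterQueue.shift();
      if (executed === inputAtSlice) workerQueue.push("keypress");
      workerQueue.push("execute");     // pong
    }
  }
  return log;
}

const order = simulatePingPong(3, 1);
```

In this run the key press that arrives during the first slice is handled before the second slice starts, because the worker returns to its message queue between slices instead of running in one uninterrupted loop.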
The most advanced feature of jor1k is the filesystem, which is fully implemented in Javascript. The 9p/virtio implementation of Linux serves as the interface. The complete filesystem layout is loaded at the beginning in the form of an XML file (https://github.com/s-macke/jor1k-sysroot). When files are opened, they are downloaded from the repository. Compression reduces the overall loading time. This implementation is much faster than an NFS filesystem or an on-demand block device implementation because of the significantly reduced overhead. In the future, dependencies between files (like library dependencies) could be implemented to further reduce the loading time. This feature also enables us to work with the filesystem directly within Javascript, for example to upload and download files or complete archives.
The first time Linux booted on the emulator, Chrome was the fastest web browser (0.5-1 MIPS). After more and more optimizations were implemented, Firefox was a little faster than Google Chrome (5 MIPS). When IE10 became compatible with my code, it was the fastest (10 MIPS). After the worker thread was implemented, Firefox 22 took the lead, being 3 times faster than the other browsers (33 MIPS). For some reason this advantage was lost with Firefox 23-24 (4-9 MIPS). Instead, Chrome took the lead with version 29, reaching 30-60 MIPS. In Firefox the asm.js version of the CPU seems to reach 30-100 MIPS. At the moment, changing one line of code in the Step() function can reduce or increase the speed by a factor of 3. The reason for these speed oscillations is the tremendous complexity of today's JIT compilers and their black box behavior, which makes it almost impossible to write really fast code.
| Browser | run on | core | benchmark | MIPS |
|---|---|---|---|---|
| Chrome 29 | Core i7-2600 3.4GHz | normal CPU | fbdemo V1 | 45 |
| Chrome 35 | Core i7-2600 3.4GHz | normal CPU | fbdemo V1 | 51 |
| Chrome 30-34 | Core i7-2600 3.4GHz | asm.js V1 | fbdemo V1 | 53 |
| Chrome 35 | Core i7-2600 3.4GHz | asm.js V1 | fbdemo V1 | 55 |
| Firefox 22 | Core i7-2600 3.4GHz | normal CPU | fbdemo V1 | 33 |
| Firefox 24-28 | Core i7-2600 3.4GHz | normal CPU | fbdemo V1 | 7 |
| Firefox 29-30 | Core i7-2600 3.4GHz | normal CPU | fbdemo V1 | 67 |
| Firefox 24-30 | Core i7-2600 3.4GHz | asm.js V1 | fbdemo V1 | 74 |
| IE 10 | Core i7-2600 3.4GHz | asm.js V1 | fbdemo V1 | 22 |
| IE 11 | Core i7-2600 3.4GHz | asm.js V1 | fbdemo V1 | 51 |
| Firefox 31 | Intel XEON (unknown) | asm.js V1 | fbdemo V1 | 200 |
| Firefox 32 | Core i7-2600 3.4GHz | asm.js V2 | fbdemo V2 | 75.5 |
| Firefox 32 | Core i7-2600 3.4GHz | asm.js V2 (without asm statement) | fbdemo V2 | 58.1 |
| Chrome 37 | Core i7-2600 3.4GHz | asm.js V2 | fbdemo V2 | 60.7 |
| IE 11 | Core i7-2600 3.4GHz | asm.js V2 | fbdemo V2 | 60.7 |
| Safari on iPad Air | Apple A7 | asm.js V2 | fbdemo V2 | 81.0 |
| Samsung Galaxy S5 | Exynos 5 Octa 5422 | asm.js V2 | fbdemo V2 | 18.1 |
The overall speed is equivalent to a Pentium 90.