Technical details
- 32-Bit OR1000 Emulator with MMU, TICK counter and PIC (OR1K Specification)
- 32 MB RAM (alterable)
- UART 16550 connected to a terminal
- UART 16550 not connected
- ocfb Framebuffer 640x400 32bpp with LPC32xx touchscreen
- ATA connected to a 30 MB hard drive
- virtio device with support for the 9p filesystem
- Opencore-keyboard controller
- Ethernet MAC controller
- Renesas FSI (Fifo-attached Serial Interface) audio controller
- Linux Terminal Emulator
- Linux 3.11, Busybox and several images
0x00000000 - 0x01F00000 31 MB Random Access Memory
0x90000000 - 0x90000006 UART 16550 connected to the terminal and keyboard
0x91000000 - 0x91001000 Opencore VGA/LCD 2.0 core frame buffer
0x92000000 - 0x92001000 Ethernet MAC controller
0x93000000 - 0x93000100 LPC32xx touchscreen controller
0x94000000 - 0x94000100 Opencore keyboard controller
0x96000000 - 0x96000006 second UART 16550
0x97000000 - 0x97001000 virtio device for the 9p filesystem
0x98000000 - 0x98000400 audio controller
0x9e000000 - 0x9e001000 ATA controller
The machine is big endian, but the typed arrays of Javascript work with little endian on practically all current hosts. Most memory accesses are aligned 32-bit accesses. So at the beginning, after loading the image, the bytes of every 32-bit word are swapped, and the addresses of 8-bit and 16-bit memory accesses are XORed with 3 and 2, respectively.
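The swap and the address XOR can be illustrated with a minimal sketch. The helper names `swapWords`, `readByte` and `readHalf` are hypothetical and not taken from the jor1k sources, and the sketch assumes a little-endian host.

```javascript
// Swap the bytes of every 32-bit word of a big-endian image in place, so that
// Int32Array reads on a little-endian host return the intended value.
function swapWords(buffer) {
  const bytes = new Uint8Array(buffer);
  for (let i = 0; i + 3 < bytes.length; i += 4) {
    const b0 = bytes[i];
    const b1 = bytes[i + 1];
    bytes[i] = bytes[i + 3];
    bytes[i + 1] = bytes[i + 2];
    bytes[i + 2] = b1;
    bytes[i + 3] = b0;
  }
}

// After the swap, the guest byte at address addr is stored at index addr ^ 3.
function readByte(bytes, addr) {
  return bytes[addr ^ 3];
}

// An aligned guest halfword at address addr is stored at byte offset addr ^ 2,
// which is index (addr ^ 2) >> 1 in a Uint16Array view.
function readHalf(halves, addr) {
  return halves[(addr ^ 2) >> 1];
}

// Big-endian image that contains the single word 0x11223344.
const image = new Uint8Array([0x11, 0x22, 0x33, 0x44]).buffer;
swapWords(image);
const words = new Int32Array(image);
const bytes8 = new Uint8Array(image);
const halves16 = new Uint16Array(image);

console.log(words[0].toString(16));              // → 11223344
console.log(readByte(bytes8, 0).toString(16));   // → 11
console.log(readHalf(halves16, 2).toString(16)); // → 3344
```

Paying the swap once at load time keeps the common case, an aligned 32-bit access, a plain array read.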
Most of the code runs in its own thread by using the Web Worker API. Message passing is used to communicate between the worker and the objects related to the graphical user interface.
Javascript normally does not have predefined types. To cast to unsigned and signed numbers one can use the (number >>> 0) and (number >> 0) operators, which only change how the 32-bit pattern of "number" is interpreted.
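A short illustration of the two casts, using made-up variable names:

```javascript
const raw = 0xFFFFFFFF;             // stored as the double 4294967295

const asSigned = raw >> 0;          // same 32 bits read as signed
const asUnsigned = asSigned >>> 0;  // same 32 bits read as unsigned

// Bitwise operators always produce signed 32-bit results, so a value with the
// top bit set looks negative until it is converted back with >>> 0.
const topBit = 0x80000000 | 0;
const topBitUnsigned = topBit >>> 0;

console.log(asSigned);       // → -1
console.log(asUnsigned);     // → 4294967295
console.log(topBit);         // → -2147483648
console.log(topBitUnsigned); // → 2147483648
```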
In Javascript every number is nominally a double precision floating point number. However, the Javascript compilers optimize the code and try to figure out whether an integer is also appropriate. Unfortunately, support for fast unsigned integers is still missing in some compilers, so these values are transformed into doubles. The code is therefore written to avoid unsigned integer arithmetic as much as possible.
Sometimes numbers must be sign extended. This is done efficiently by the expression (number << x) >> x, where x is an appropriate shift value. To sign extend a signed 8-bit value to a signed 32-bit value the expression is ((number << 24) >> 24).
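As a small sketch, the shift pair can be wrapped in a helper that derives x from the width of the source value. The name `signExtend` is an assumption made for this example.

```javascript
// Sign extend the lowest `bits` bits of `value` to a signed 32-bit integer.
// The left shift moves the sign bit of the narrow value into bit 31, and the
// arithmetic right shift copies it back down over the upper bits.
function signExtend(value, bits) {
  const shift = 32 - bits;
  return (value << shift) >> shift;
}

console.log(signExtend(0xFF, 8));    // → -1
console.log(signExtend(0x7F, 8));    // → 127
console.log(signExtend(0x8000, 16)); // → -32768
```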
The Carry Flag and Overflow Flag are not used by the gcc compiler, so they are ignored in this emulation. The code to support these flags can be uncommented for better compatibility at the cost of lower speed.
Most of the time the whole instruction fetch is done very efficiently with the following code:
if ((checkpc^this.pc) >> 11) {
...
}
ins = int32mem[(currenttlb ^ this.pc)];
The important trick is to first check whether the current page is still valid and, if this is the case, to just XOR the program counter. The fast TLB lookup for data accesses is implemented in a similar way.
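The following sketch shows one way such a single-entry translation cache could work. It is not the jor1k implementation. It assumes that the program counter counts 32-bit words and that a page holds 2048 words, and `createFetcher`, `translatePage` and `missCount` are hypothetical names. The slow path, elided as `...` in the snippet above, is replaced here by a simple callback.

```javascript
const PAGE_SHIFT = 11; // assumption: pc counts 32-bit words, 2048 words per page

function createFetcher(int32mem, translatePage) {
  let checkpc = -1;   // -1 differs from every valid pc in the page bits: first fetch misses
  let currenttlb = 0; // (physical page XOR virtual page) shifted into the page bits
  let misses = 0;
  return {
    fetch(pc) {
      // Non-zero only if pc left the page that checkpc belongs to.
      if ((checkpc ^ pc) >> PAGE_SHIFT) {
        const vpage = pc >> PAGE_SHIFT;
        const ppage = translatePage(vpage); // slow path, stands in for the TLB lookup
        currenttlb = (ppage ^ vpage) << PAGE_SHIFT;
        checkpc = pc;
        misses++;
      }
      // The XOR replaces the virtual page bits by the physical ones and
      // leaves the offset inside the page untouched.
      return int32mem[currenttlb ^ pc];
    },
    missCount() {
      return misses;
    }
  };
}

// Demo: virtual page 5 is mapped to physical page 1.
const fetchMem = new Int32Array(4 * 2048);
fetchMem[1 * 2048 + 3] = 0x1234;
fetchMem[1 * 2048 + 4] = 0x5678;
const fetcher = createFetcher(fetchMem, (vpage) => (vpage === 5 ? 1 : 0));

console.log(fetcher.fetch(5 * 2048 + 3).toString(16)); // → 1234
console.log(fetcher.fetch(5 * 2048 + 4).toString(16)); // → 5678
console.log(fetcher.missCount());                      // → 1
```

As long as execution stays within one page, each fetch costs one XOR, one shift, one branch and one array read.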
The TLB Refill is done in Javascript. Unfortunately this makes it dependent on the Linux kernel as it needs the pointer to the internal translation table of the Linux kernel.
When the system goes idle the operating system sends a sleep or halt signal. In this case the CPU should wait until the next interrupt occurs. We can use the setTimeout() method of Javascript to accomplish this. The usual tick is set to <=10ms under Linux. Unfortunately, with the overhead of the web browsers and their Javascript engines, 10ms are often not sufficient to reach a host processor usage of < 1%. Therefore the Linux kernel was compiled with a tick every 20ms (50 ticks per second). Usually this is not a problem as long as you don't use time critical applications like video players. The responsiveness of the system, for example when typing on the keyboard, is not affected.
When a worker thread is executing some code it is no longer responsive to arriving messages. The worker thread must go idle to process the message queue. A setTimeout command with 0ms does not work here. In order to run the CPU at full speed a message ping-pong every 5-10ms is performed. The worker sends an "execute" signal to the master and the master thereupon sends its own "execute" signal back to the worker. By doing this, we keep the responsiveness while using the worker thread efficiently.
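The ping-pong can be sketched roughly as follows. This is a simplified model, not the jor1k code: plain arrays stand in for the message queues instead of a real Web Worker, a slice is bounded by an instruction count rather than by 5-10ms, and `createWorkerSide`, `createMasterSide`, `demoCpu` and the "key" message are hypothetical.

```javascript
// Worker side: run one bounded slice, then ask the master for the next
// "execute". Returning between slices lets other queued messages be handled.
function createWorkerSide(cpu, postToMaster, sliceSize) {
  return function onMessage(msg) {
    if (msg.command === "execute") {
      cpu.step(sliceSize);
      if (!cpu.halted) postToMaster({ command: "execute" });
    } else if (msg.command === "key") {
      cpu.keys.push(msg.data);
    }
  };
}

// Master side: bounce every "execute" straight back to the worker.
function createMasterSide(postToWorker) {
  return function onMessage(msg) {
    if (msg.command === "execute") postToWorker({ command: "execute" });
  };
}

// Stand-in wiring with arrays as message queues (no real Worker involved).
const toWorker = [];
const toMaster = [];
const demoCpu = {
  executed: 0,
  halted: false,
  keys: [],
  step(n) {
    this.executed += n;
    if (this.executed >= 3000) this.halted = true;
  }
};
const workerOnMessage = createWorkerSide(demoCpu, (m) => toMaster.push(m), 1000);
const masterOnMessage = createMasterSide((m) => toWorker.push(m));

toWorker.push({ command: "execute" });        // the master starts the ping-pong
toWorker.push({ command: "key", data: "a" }); // input that arrives while the CPU runs
while (toWorker.length || toMaster.length) {
  if (toWorker.length) workerOnMessage(toWorker.shift());
  if (toMaster.length) masterOnMessage(toMaster.shift());
}

console.log(demoCpu.executed); // → 3000
console.log(demoCpu.keys);     // → [ 'a' ]
```

In a browser the two handlers would be attached to the `onmessage` events of the worker and the main thread, and the post callbacks would call `postMessage`. The key press is delivered between the first and the second slice, which is the point of the scheme.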
The most advanced feature of jor1k is the filesystem, which is fully implemented in Javascript. The 9p/virtio implementation of Linux is used as the interface. The complete filesystem layout is loaded at the beginning in the form of an XML file (https://github.com/s-macke/jor1k-sysroot). When files are opened, they are downloaded from the repository. Compression reduces the overall loading time. This implementation is much faster than an NFS filesystem or an on-demand block device implementation because of the significantly reduced overhead. In the future, dependencies between files (like library dependencies) can be implemented to further reduce the loading time. This feature also enables us to work with the filesystem directly within Javascript, for example to upload and download files or complete archives.
The first time Linux booted on the emulator, Chrome was the fastest web browser (0.5-1 MIPS). After more and more optimizations were implemented, Firefox was a little bit faster than Google Chrome (5 MIPS). When IE10 became compatible with my code it was the fastest (10 MIPS). After implementing the worker thread, Firefox 22 became superior, being 3 times faster than the other browsers (33 MIPS). For some reason this advantage was lost with Firefox 23-24 (4-9 MIPS). Chrome then took the lead with version 29 at 30-60 MIPS. In Firefox the asm.js version of the CPU seems to reach 30-100 MIPS. At this moment, changing one line of code in the Step() function can reduce or increase the speed by a factor of 3. The reason for these speed oscillations is the tremendous complexity of today's JIT compilers and their black box behavior, which makes it almost impossible to write really fast code.
Tested on a Core-i7 running fbdemo:
- Chrome 29 with standard CPU: 45 MIPS
- Chrome 35 with standard CPU: 51 MIPS
- Chrome 30-34 with asm.js CPU: 53 MIPS
- Chrome 35 with asm.js CPU: 55 MIPS
- Firefox 22 with standard CPU: 33 MIPS
- Firefox 24-28 with standard CPU: 7 MIPS
- Firefox 29-30 with standard CPU: 67 MIPS
- Firefox 24-30 with asm.js CPU: 74 MIPS
- Internet Explorer 10 with standard CPU: 22 MIPS
- Internet Explorer 11 with standard CPU: 51 MIPS
- Firefox 31 with asm.js CPU on an Intel Xeon host CPU: 200 MIPS!
The overall speed is equivalent to a Pentium 90.