Markdown lint cleanups
- Removed some stray tab characters
- Add spaces after titles and before lists
- Use consistent bullet prefixes

[ci skip]
jbush001 committed Apr 15, 2018
1 parent 9f98cdf commit 99560fd
Showing 14 changed files with 57 additions and 51 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -1,4 +1,5 @@
# Nyuzi Processor

[![Build Status](https://travis-ci.org/jbush001/NyuziProcessor.svg?branch=master)](https://travis-ci.org/jbush001/NyuziProcessor)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/fbafdd72749e459d8de6f381abc7436d)](https://www.codacy.com/app/jbush001/NyuziProcessor?utm_source=github.com&utm_medium=referral&utm_content=jbush001/NyuziProcessor&utm_campaign=Badge_Grade)
[![Chat at https://gitter.im/jbush001/NyuziProcessor](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/jbush001/NyuziProcessor?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
@@ -99,7 +100,7 @@ Occasionally a change will require a new version of the compiler. To rebuild:
make
sudo make install

## What Next?
## Next Steps

Sample applications are available in [software/apps](software/apps). You can
run these in the emulator by typing 'make run' (some need 3rd party data
11 changes: 6 additions & 5 deletions hardware/README.md
@@ -1,5 +1,6 @@
This directory contains the hardware implementation of the processor. There are
three directories:

- core/
The GPGPU. The top level module is 'nyuzi'. Configurable options (cache size,
associativity, number of cores) are in core/config.sv
@@ -24,7 +25,7 @@ defined.

This design uses parameterized memories (FIFOs and SRAM blocks) in the modules
core/sram_1r1w.sv, core/sram_2r1w.sv, and core/sync_fifo.sv. By default, these
instantite simulator versions, which are not synthesizable (at least not
instantiate simulator versions, which are not synthesizable (at least not
efficiently).

- For Altera parts, the build files define the preprocessor macro
@@ -67,10 +68,10 @@ To write a waveform trace, set the environment variable DUMP_WAVEFORM
and rebuild:

make clean
cd ..
cd ..
DUMP_WAVEFORM=1 cmake .
cd hardware
make
cd hardware
make

The simulator writes a file called `trace.vcd` in
"[value change dump](http://en.wikipedia.org/wiki/Value_change_dump)"
@@ -93,7 +94,7 @@ pragma above the module instantiation:

The timescale is set to 1 ns by default, which simulates a 1 GHz clock speed.

### Support for VCS:
### Support for VCS

Template scripts have been added to support building and running with
[VCS](https://www.synopsys.com/verification/simulation/vcs.html).
7 changes: 4 additions & 3 deletions hardware/fpga/de2-115/README.md
@@ -25,7 +25,7 @@ For example:
For a different serial device, you will need to find
the device path. It may also be something like:

/dev/ttyS0
/dev/ttyS0
/dev/ttyUSB0

This defaults to 921600 baud. If your serial device does not
@@ -87,7 +87,7 @@ The build system is command line based and does not use the Quartus GUI.
make program

You may get an error when running this command. If so, this can usually be
fixed by running the following command:
fixed by running the following command:

sudo killall -9 jtagd

@@ -100,7 +100,8 @@ The build system is command line based and does not use the Quartus GUI.
cd ../../../tests/fpga/blinky
run_fpga

Other notes:
## Other notes

- Most programs have a script 'run_fpga' that will load them
onto the FPGA board using the serial_loader program (tools/serial_loader).
- Reload programs by pressing the reset button (push button 0) and using
1 change: 0 additions & 1 deletion software/README.md
@@ -20,4 +20,3 @@ first.
for setting up the FPGA are in hardware/fpga/de2-115/README.
- **run_debug**: Execute the program in the emulator and attach to it with the
debugger (lldb).

8 changes: 4 additions & 4 deletions software/apps/doom/README.md
@@ -13,9 +13,9 @@ the data files over the serial port into the ramdisk.

The primary changes I made for the port were:

* in i_video.c, added code to copy the screen to the framebuffer, expanding
- in i_video.c, added code to copy the screen to the framebuffer, expanding
from an 8 bit paletted format to the 32 bit per pixel framebuffer format.
* in i_video.c, read from a virtual keyboard device for input.
* Fixed a number of places that assumed unaligned accesses were supported.
* Code from i_net and i_sound removed, as there is no hardware support
- in i_video.c, read from a virtual keyboard device for input.
- Fixed a number of places that assumed unaligned accesses were supported.
- Code from i_net and i_sound removed, as there is no hardware support
for them.
10 changes: 6 additions & 4 deletions software/apps/quakeview/README.md
@@ -3,10 +3,12 @@ checked in, but you can find the shareware .PAK file by searching the web. Name
the file 'pak0.pak' and put into this directory.

Not implemented:

- Animated textures
- Clipping/collision detection for camera

Controls:

- Up/Down arrows: move camera forward and backward
- Right/left arrows: rotate left and right
- U/D keys: move camera up and down
@@ -16,7 +18,7 @@ Controls:

You can load other episodes/missions by changing this line in main.cpp:

pak.readBspFile("maps/e1m1.bsp");
pak.readBspFile("maps/e1m1.bsp");

## Running in Emulator

@@ -51,10 +53,10 @@ loop and stops the worker threads:

context->finish();
printf("rendered frame in %d uS\n", clock() - time);
+ exit(1);
}
+ exit(1);
}

return 0;
return 0;

2. Comment out keyboard polling in main. In the verilator configuration, there is a dummy module that
generates continuous keypresses, but this will cause an infinite loop with this program:
14 changes: 7 additions & 7 deletions software/apps/sceneview/README.md
@@ -35,6 +35,7 @@ memory:
RenderContext *context = new RenderContext(0x1000000);

There are a few debug defines in the top of sceneview.cpp:

- **TEST_TEXTURE** If defined, this uses a checkerboard texture in place
of the normal textures. Each mip level is a different color.
- **SHOW_DEPTH** If defined, this shades the pixels with lighter values
@@ -48,13 +49,13 @@ following changes:
1. At the bottom of the main loop in sceneview.cpp, add a call to exit(). This stops the main
loop and causes the worker threads to stop:

context->finish();
printf("rendered frame in %d instructions\n", __builtin_nyuzi_read_control_reg(6)
- startInstructions);
+ exit(1);
}
context->finish();
printf("rendered frame in %d instructions\n", __builtin_nyuzi_read_control_reg(6)
- startInstructions);
+ exit(1);
}

return 0;
return 0;

2. Increase the amount of RAM configured in the FPGA configuration. In hardware/testbench/soc_tb.sv,
change MEM_SIZE to 'h3000000 (48 MB)
@@ -64,4 +65,3 @@ change MEM_SIZE to 'h3000000 (48 MB)
Once you have made these changes, you can run the test by typing 'make verirun'. This is
compute intensive and will take hours to complete. You should not run this with
VCD logging enabled, as the files will be enormous (described in hardware README).

3 changes: 3 additions & 0 deletions software/benchmarks/dhrystone/README.md
@@ -1,10 +1,12 @@
This is a port of the [Dhrystone](https://en.wikipedia.org/wiki/Dhrystone)
benchmark for Nyuzi. The Dhrystone benchmark has a number of well-known problems:

- Its performance can vary widely due to compiler optimizations
- It calls into standard library functions like strcpy, so its performance is also
dependent on how optimized those implementations are.

It has additional issues when used against Nyuzi:

- It is single threaded. Nyuzi is optimized for multithreaded workloads. For example,
Nyuzi has a longer pipeline to improve clock speed, but relies on multiple threads to
keep it highly utilized.
@@ -25,6 +27,7 @@ The second form runs it against the hardware model:

I've made modifications to the original sources to get them to run on Nyuzi.
The changes are in the file nyuzi_changes.diff. They include:

- Hard code the number of runs instead of reading from stdin, since there
is no stdin in this test environment and my standard library doesn't
have scanf.
2 changes: 1 addition & 1 deletion software/kernel/README.md
@@ -11,7 +11,7 @@ virtual memory implementation, but no user level synchronization or filesystem
APIs. The downside of that is that I can't use it to stress the hardware with
real workloads.

The reason I built something from scatch rather than using an existing
The reason I built something from scratch rather than using an existing
OS was that I couldn't find something that met my needs. Linux and
FreeBSD are very large: they would take forever to boot in simulation,
making them not well suited for automated testing and CI. There are simpler
15 changes: 9 additions & 6 deletions software/libs/librender/README.md
@@ -8,17 +8,19 @@ function parallelExecute() in libos.
This is a tile based renderer, also known as a sort-middle architecture. It
divides the destination into fixed size rectangles. Threads render each tile
completely before moving to the next. This approach has a few advantages:
* It allows splitting the work across many threads. Because each thread

- It allows splitting the work across many threads. Because each thread
exclusively owns a tile, there is no locking required to preserve pixel
ordering, which minimizes synchronization overhead.
* It reduces external memory bandwidth, as the tiles that it is actively
- It reduces external memory bandwidth, as the tiles that it is actively
rendering fit in the L2 cache.

# Pipeline

Rendering occurs in two phases:

## Geometry Phase

This phase has two steps, which execute in sequence for each draw call.
Each step finishes completely before the next starts.

@@ -31,13 +33,14 @@ does not look at the index buffer, but computes all vertices in the array.
2. Set up triangles. This is scalar, but divided among threads. This phase
builds a list of triangles that potentially cover each tile. It also:

- Clips triangles against the near plane (potentially splitting into multiple
- Clips triangles against the near plane (potentially splitting into multiple
triangles)
- Culls triangles that are facing away from the camera
- Converts from screen space to raster coordinates.
- Insert triangles in tile queues using a bounding box test.
- Culls triangles that are facing away from the camera
- Converts from screen space to raster coordinates.
- Insert triangles in tile queues using a bounding box test.
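
The bounding box test in the last bullet can be sketched roughly as follows. This is a hypothetical Python illustration, not the librender code: the function name `bin_triangle`, the dictionary of tile queues, and the clamping behavior are all assumptions, and only the 64x64 tile size comes from the README text.

```python
import math

TILE_SIZE = 64  # tile edge in pixels (the README describes 64x64 tiles)


def bin_triangle(tile_queues, tri_index, verts, width, height):
    """Append tri_index to the queue of every tile that the triangle's
    bounding box overlaps. This is conservative: a tile may receive a
    triangle that does not actually cover any of its pixels.

    tile_queues: dict mapping (tile_x, tile_y) -> list of triangle indices
                 (hypothetical layout, chosen for the example)
    verts:       three (x, y) pairs in raster coordinates
    """
    xs = [v[0] for v in verts]
    ys = [v[1] for v in verts]

    # Clamp the bounding box to the render target.
    min_x = max(math.floor(min(xs)), 0)
    max_x = min(math.floor(max(xs)), width - 1)
    min_y = max(math.floor(min(ys)), 0)
    max_y = min(math.floor(max(ys)), height - 1)
    if min_x > max_x or min_y > max_y:
        return  # Entirely outside the render target

    for ty in range(min_y // TILE_SIZE, max_y // TILE_SIZE + 1):
        for tx in range(min_x // TILE_SIZE, max_x // TILE_SIZE + 1):
            tile_queues.setdefault((tx, ty), []).append(tri_index)
```

Because only the bounding box is tested, a thin diagonal triangle lands in every tile its box touches, which matches the "potentially cover" wording above.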

## Pixel Phase

This phase starts after the geometry phase finishes. Each thread
renders a 64x64 tile of the render target at a time, using the tile's triangle
list that the previous phase created. It also performs:
2 changes: 0 additions & 2 deletions tests/README.md
@@ -54,7 +54,6 @@ The Makefile does not run the following tests:
| kernel/vga/ |
| render/ | These only run under the emulator (they support verilator, but they take a long time to run)


# Adding Tests

Each test consists of a function, which takes two arguments: the name parameter (which
@@ -148,4 +147,3 @@ categories:
output to expected values. These include everything from simple, single
threaded programs to a full fledged kernel. Most can run both in Verilog
simulation and on FPGA.

10 changes: 4 additions & 6 deletions tests/cosimulation/README.md
@@ -7,9 +7,9 @@ created by the generate_random utility. Each file is a separate test.
Randomized cosimulation is a common processor verification technique. Here
are a few papers that describe its application in some commercial processors:

* [Functional Verification of a Multiple-issue, Out-of-Order, Superscalar Alpha Processor— The DEC Alpha 21264 Microprocessor](http://www.cs.clemson.edu/~mark/464/21264.verification.pdf)
* [Functional Verification of the HP PA 8000 Processor](http://www.cs.clemson.edu/~mark/464/hp8000.verification.pdf)
* [PicoJava II Verification Guide](http://www1.pldworld.com/@xilinx/html/pds/HDL/picoJava-II/docs/pj2-verif-guide.pdf)
- [Functional Verification of a Multiple-issue, Out-of-Order, Superscalar Alpha Processor— The DEC Alpha 21264 Microprocessor](http://www.cs.clemson.edu/~mark/464/21264.verification.pdf)
- [Functional Verification of the HP PA 8000 Processor](http://www.cs.clemson.edu/~mark/464/hp8000.verification.pdf)
- [PicoJava II Verification Guide](http://www1.pldworld.com/@xilinx/html/pds/HDL/picoJava-II/docs/pj2-verif-guide.pdf)

# Executing Tests

@@ -130,7 +130,6 @@ the random test program does not flush them and The C model does not emulate
the caches.

# How it works
## Checking

The test program runs the verilog simulator with the +trace flag, which
causes it to print text descriptions of register writebacks and memory stores
@@ -159,11 +158,10 @@ simulator and flags an error if there is a mismatch.
1. Hardware executes store_sync, which fails. It does not log a cosimulation event
because it *only* logs memory side effects, and those only occur on success
2. Interrupt comes in. Emulator jumps to interrupt handler without executing the
store_sync and register does not get set to 0 to reflect failure.
store_sync and register does not get set to 0 to reflect failure.
- If control register 13 (subcycle) is read after an interrupt, it may not match the
value in hardware, since hardware does not log scatter stores to lanes that don't
have the mask bit set.
- This does not validate virtual memory translation. This has a software managed
TLB, and the TLB replacement behavior is timing specific, which makes it hard to match
behavior exactly.

10 changes: 5 additions & 5 deletions tests/whole-program/README.md
@@ -20,15 +20,15 @@ To run only on emulator:

./runtest.py --target emulator *program names*

* The test script skips filenames that begin with underscore, which is
- The test script skips filenames that begin with underscore, which is
useful for known failing cases.
* This skips tests with "noverilator" in the filename (used for tests that
- This skips tests with "noverilator" in the filename (used for tests that
take too long to run in verilator).
* This runtest script supports '--target host' to run on host machine.
- This runtest script supports '--target host' to run on host machine.
Some tests fail in this configuration because they use intrinsics
that exist only for Nyuzi.
* The Csmith random generation tool generated the csmith* tests:
- The Csmith random generation tool generated the csmith* tests:
http://embed.cs.utah.edu/csmith/
* This uses the compiler installed at /usr/local/llvm-nyuzi/. To test a
- This uses the compiler installed at /usr/local/llvm-nyuzi/. To test a
development compiler, adjust COMPILER_DIR variable in test_harness.py
in the parent directory.
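
The filename-based skipping rules in the list above could look something like this. It is a minimal sketch under assumptions, not the logic in runtest.py or test_harness.py: the helper name `should_skip` is hypothetical, `'verilator'` is an assumed target name, and applying the "noverilator" rule only to that target is a guess at the intent.

```python
import os


def should_skip(filename, target):
    """Return True if a test file should be skipped for the given target.

    Hypothetical helper; the name and signature are not from the
    Nyuzi test harness.
    """
    base = os.path.basename(filename)

    # Files beginning with an underscore are known failing cases.
    if base.startswith('_'):
        return True

    # Assumption: tests marked "noverilator" are skipped only when the
    # target is the Verilog simulator, because they take too long there.
    if target == 'verilator' and 'noverilator' in base:
        return True

    return False
```

For example, `should_skip('_broken.c', 'emulator')` is true under these rules, while `should_skip('big_noverilator.c', 'emulator')` is false.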
12 changes: 6 additions & 6 deletions tools/emulator/README.md
@@ -16,7 +16,7 @@ This also simulates hardware peripherals supported by the FPGA and Verilog
simulator environments (which are all software compatible), such as video
output and a mass storage device.

### Command line options:
### Command line options

|Option|Arguments |Meaning |
|------|---------------------------|--------------------------------------------------|
@@ -53,10 +53,10 @@ Other notes:
causes it to dump instruction statistics.
- See [SOC-Test-Environment](https://github.com/jbush001/NyuziProcessor/wiki/SOC-Test-Environment)
for list of supported device registers. The emulator doesn't support the following devices:
* LED/HEX display output registers
* Serial reads
* VGA frame buffer address/toggle
* SPI GPIO mode
- LED/HEX display output registers
- Serial reads
- VGA frame buffer address/toggle
- SPI GPIO mode

### Debugging with LLDB

Expand All @@ -76,6 +76,7 @@ The steps to run the debugger manually are:
/usr/local/llvm-nyuzi/bin/lldb --arch nyuzi <program>.elf -o "gdb-remote 8000"

Other notes:

- The emulator does not support the debugger in cosimulation mode.
- Debugging works better if you compile the program with optimizations disabled.
For example, at -O3, lldb cannot read variables if they are not live at the
@@ -131,4 +132,3 @@ llvm-symbolizer program. This is not installed by default, but is in the
build directory for the toolchain:

echo <address> | tools/NyuziToolchain/build/bin/llvm-symbolizer -demangle -obj=<program>.elf
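
The kinds of cleanup listed in the commit message (stray tabs, heading punctuation, consistent bullet prefixes, trailing blank lines) could be partly automated with a small script. The following is a rough sketch under assumptions, not the tool used for this commit: the function name `lint_markdown` is hypothetical, and expanding tabs to four spaces is a guess at how the stray tabs were handled.

```python
import re


def lint_markdown(text):
    """Apply a few mechanical Markdown cleanups to a document string.

    Caveats: this does not understand code blocks, so it may also rewrite
    lines inside them, and stripping trailing whitespace removes
    two-space hard line breaks.
    """
    out = []
    for line in text.split('\n'):
        line = line.rstrip()        # Drop trailing whitespace
        line = line.expandtabs(4)   # Assumption: tabs become four spaces

        # Use '-' instead of '*' as the bullet prefix.
        line = re.sub(r'^(\s*)\* ', r'\1- ', line)

        # Remove a trailing colon from ATX headings ("### Title:").
        match = re.match(r'^(#{1,6} .*?):$', line)
        if match:
            line = match.group(1)

        out.append(line)

    # End the file with exactly one newline.
    return '\n'.join(out).rstrip('\n') + '\n'
```

Running it on `"### Support for VCS:\n* item  \n\n\n"` yields `"### Support for VCS\n- item\n"`. Inserting blank lines after headings and before lists, which this commit also does, is left out to keep the sketch short.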
