EE2003 Course Project 2021: Implementation and Analysis of Peripheral for NanoJPEG Decoder in PicoRV32 environment
- By Bachotti Sai Krishna Shanmukh, Katari Hari Chandan and Potta Muni Asheesh
Roll Numbers : EE19B009, EE19B032 and EE19B048
Professor : Nitin Chandrachoodan
Note: This is a forked repository of PicoRV32 - the original README is available at [[README_picorv32.md]].
Acknowledgements:
Video Demonstration on YouTube
The code does not depend on the row workspace and col workspace directories to run. However, they contain important information such as synthesis reports and timing information from Verilog simulation. More about them can be found in the submission report and the readme file.
Note: the Makefile uses the library /usr/share/yosys/xilinx/cells_sim.v (refer to lines 64 and 88 of the Makefile).
EE2003_Project_Report.pdf gives a brief description of the entire project.
bottleneck_kitten.txt lists the clock cycles taken by various functions in NanoJPEG without the peripheral.
AP_kitten.txt lists the clock cycles taken after the peripherals are implemented.
rowidct.v is the unsynthesized Verilog module, while rowidct_synth.v is generated by Yosys. Similarly, col_idct_synth.v is the synthesized code.
This project includes the code for the nanojpeg decoder, with some modifications so that it can be run under the constrained environment of the picorv processor.
The user does not have access to things like File Input/Output, memory management (`malloc` etc.) or `printf` style statements that you can usually use for debugging.
To get around these the code has the following additions:
- A set of functions have been defined in `njmem.c` that can allocate memory for random use. It uses a very trivial form of memory allocation that only works because our program never needs to `free` memory and try to use it again later. A set of addresses starting at `0x40000000` are defined for use with the memory management.
- A set of addresses starting from `0x30000000` are defined for reading from the file. You first need to run a pre-processing script (`firmware/jpg2hex.py`) to generate the file `firmware/jpg.hex`, which will be mapped to this memory range. This is marked as a read-only memory, so you can only read from that range of addresses. Since the file size cannot be read this way, the script also puts the size of the file as an `int` in the first 4 bytes of the memory range.
- Writing to the address `0x20000000` will result in dumping the appropriate byte into the file `output.dump`. This means that you can use this to do the equivalent of an `fwrite` function in C. However, the filename is always fixed as it cannot be changed from the C program.
- There are also two functions defined in `hello.c` that can be used to read out the number of cycles from the CPU at any point. This can be used, for example, to find out the time taken by the `njDecode` function. More importantly, you can use a similar technique inside your code to get the time taken for other functions and find out which ones take the longest to run.
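The "never free" allocation scheme described for `njmem.c` is often called a bump allocator. The following is a minimal Python model of that idea, not the code in `njmem.c`: the class name `BumpAllocator`, the method `alloc`, the pool size and the 4-byte alignment are all assumptions made for illustration. Only the base address `0x40000000` comes from the description above.

```python
class BumpAllocator:
    """Model of an allocator that only ever moves a pointer forward."""

    def __init__(self, base=0x40000000, size=0x10000, align=4):
        # base is the address quoted in the text; size and align are assumed values
        self.limit = base + size
        self.align = align
        self.next_free = base

    def alloc(self, nbytes):
        """Return the start address of a new block, or None if the pool is full."""
        start = self.next_free
        # Round the end of the block up to the next aligned address
        end = (start + nbytes + self.align - 1) & ~(self.align - 1)
        if end > self.limit:
            return None
        self.next_free = end
        return start


heap = BumpAllocator()
a = heap.alloc(10)  # occupies 12 bytes after alignment
b = heap.alloc(3)   # starts right after the first block
print(hex(a), hex(b))
```

Because nothing is ever returned to the pool, there is no free list or block header to maintain, which is why the scheme only suits programs that never reuse memory.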
The code comes with sample data in `firmware/jpg.hex` - this corresponds to the input file `firmware/k8x8.jpg`. The hex file is generated as follows:
$ cd firmware
$ python3 jpg2hex.py k8x8.jpg > jpg.hex
You can replace `k8x8.jpg` with some other JPEG file to try with that. Note that the system has an overall memory limitation so any file larger than about 100x100 will most likely run into problems.
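To make the "size in the first 4 bytes" layout concrete, here is a rough sketch of what such a conversion could look like. It is not the contents of `jpg2hex.py`: the function name `jpeg_to_hex_lines`, the choice of one 32-bit word per output line, the little-endian byte order and the zero padding are all assumptions.

```python
import struct


def jpeg_to_hex_lines(data: bytes) -> list:
    """Return hex words: the payload length first, then the payload itself."""
    # Prefix the data with its size as a 4-byte integer (byte order is assumed)
    blob = struct.pack("<I", len(data)) + data
    # Pad with zeros so the total length is a whole number of 32-bit words
    blob += b"\x00" * (-len(blob) % 4)
    lines = []
    for offset in range(0, len(blob), 4):
        (word,) = struct.unpack_from("<I", blob, offset)
        lines.append(f"{word:08x}")
    return lines


# Five made-up bytes standing in for a JPEG file
sample = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00])
lines = jpeg_to_hex_lines(sample)
print("\n".join(lines))
```

With this layout, a program reading from the start of the range would first fetch the length word and then know how many of the following bytes belong to the file.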
$ make
Just typing the above command (while you are in the `nanojpeg` folder, not inside one of the subfolders) will take care of compiling and running with iverilog.
WARNING: This is horrendously slow - it takes about 6-7 minutes to run on the default input file, which is just a single JPEG macroblock and the entire image is of size 8 pixels by 7 pixels.
Therefore, if you try this with another file (say `kitten.jpg`, which is 24x22 macroblocks in size), you can expect it to take more than 3000 minutes, that is, more than 2 days, to run. Please do not try this on the EE2003 server - if the system shows excessive load it will be restarted more than once a day as needed, so simulations will almost certainly not run to completion.
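The 3000-minute figure can be checked with a quick calculation. This sketch assumes that simulation time scales linearly with the number of macroblocks, which is a simplification; the variable names are illustrative.

```python
macroblocks = 24 * 22           # kitten.jpg size in macroblocks
minutes_per_block = (6, 7)      # measured range for the single-macroblock input

low = macroblocks * minutes_per_block[0]
high = macroblocks * minutes_per_block[1]

minutes_per_day = 24 * 60
print(f"{macroblocks} macroblocks: {low} to {high} minutes")
print(f"about {low / minutes_per_day:.1f} to {high / minutes_per_day:.1f} days")
```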
Fortunately, there is a much faster Verilog simulator called `verilator`. This works by first converting the Verilog code into C++, compiling it, and running the resulting executable. This can actually finish simulating the entire `kitten.jpg` input in less than 1 minute. If you want to test any changes to your code, you are strongly advised to use this approach.
To run this, you can just type
$ make test_verilator
This is already set up to take the exact same inputs and generate the same output.
When you run the code, you will see that it generates a file named `output.dump` in the main `nanojpeg` folder. You can rename this file as `output.ppm`, and then it should be possible to view this file. Note that you cannot view it on the server; you will need to download the file to your local machine and then view it.
The default input will generate an image of a kitten that is 8x7 pixels in size -- in other words, if you recognize it as a kitten, you have a very good imagination. For comparison, the actual output generated by running the converter on another PC is also available in the file `firmware/k8x8.ppm`.
Note that there is currently a bug in the code that results in one extra byte being added to the output. This means that you cannot directly compare the two files to check for correctness. However, if you use the command
$ xxd output.dump
it will dump out the hex formatted output, and here you can see that it matches the original except for the last byte.
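If you would rather check this automatically than read the hex dump by eye, a comparison that ignores the extra byte could look like the sketch below. It assumes the extra byte is the last one in the output, as described above; the function name `matches_ignoring_extra_byte` is hypothetical, and the byte strings are made-up stand-ins for the real files.

```python
def matches_ignoring_extra_byte(dump: bytes, reference: bytes) -> bool:
    """True if dump equals reference followed by exactly one extra byte."""
    return len(dump) == len(reference) + 1 and dump[:-1] == reference


# Stand-in data; with real files, read them using open(path, "rb").read()
reference = b"P6\n8 7\n255\n" + bytes(8 * 7 * 3)
dump = reference + b"\x00"

print(matches_ignoring_extra_byte(dump, reference))
```

For the real files, `dump` would be the contents of `output.dump` and `reference` the contents of `firmware/k8x8.ppm`.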