EE2003 Course Project 2021: Implementation and Analysis of Peripheral for NanoJPEG Decoder in PicoRV32 environment

By Bachotti Sai Krishna Shanmukh, Katari Hari Chandan and Potta Muni Asheesh
Roll Numbers : EE19B009, EE19B032 and EE19B048

Professor : Nitin Chandrachoodan

Note: This is a forked repository of PicoRV32. The original README is available at README_picorv32.md.

Acknowledgements:

PicoRV32 - from C. Wolf (YosysHQ).
NanoJPEG

Video Demonstration on YouTube

Submission Notes

The code runs without depending on the row_workspace and col_workspace directories. However, these directories contain important information such as synthesis reports and timing information from Verilog simulation. More about them can be found in the submission report and the README file.

Note: The Makefile uses the library /usr/share/yosys/xilinx/cells_sim.v (refer to lines 64 and 88 of the Makefile).

EE2003_Project_Report.pdf gives a brief description of the entire project.

bottleneck_kitten.txt lists the clock cycles taken by the various functions in NanoJPEG without the peripheral.

AP_kitten.txt lists the clock cycles taken after the peripherals are implemented.

rowidct.v is the unsynthesized Verilog module, while rowidct_synth.v is generated from it by Yosys. Similarly, col_idct_synth.v is the synthesized version of col_idct.v.

Problem statement

This project includes the code for the NanoJPEG decoder, with some modifications so that it can run in the constrained environment of the PicoRV32 processor.

The user does not have access to file input/output, memory management (malloc etc.) or the printf-style statements that are usually available for debugging.

To get around these limitations, the code has the following additions:

A set of functions has been defined in njmem.c that can allocate memory for general use. It uses a very trivial form of memory allocation that only works because our program never needs to free memory and reuse it later. A set of addresses starting at 0x40000000 is defined for use with the memory management.
A set of addresses starting from 0x30000000 are defined for reading from the file. You first need to run a pre-processing script (firmware/jpg2hex.py) to generate the file firmware/jpg.hex which will be mapped to this memory range. This is marked as a read-only memory, so you can only read from that range of addresses. Since the file size cannot be read this way, the script also puts the size of the file as an int in the first 4 bytes of the memory range.
Writing to the address 0x20000000 will result in dumping the appropriate byte into the file output.dump. This means that you can use this to do the equivalent of a fwrite function in C. However, the filename is always fixed as it cannot be changed from the C program.
There are also two functions defined in hello.c that can be used to read out the number of cycles from the CPU at any point. This can be used, for example, to find out the time taken by the njDecode function. More importantly, you can use a similar technique inside your code to get the time taken for other functions and find out which ones take the longest to run.
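The "never free" allocation scheme described in the first item above is often called a bump allocator. The following is a minimal Python model of that idea, not the code in njmem.c: the class name, the 4-byte alignment and the 64 KiB heap size are assumptions made for illustration. Only the base address 0x40000000 comes from the text.

```python
# Hypothetical model of a bump ("never free") allocator.
# Only HEAP_BASE comes from the README; everything else is assumed.
HEAP_BASE = 0x40000000  # start of the address range reserved for allocation
ALIGN = 4               # assumed word alignment
HEAP_SIZE = 64 * 1024   # assumed heap size, for illustration only


class BumpAllocator:
    def __init__(self, base=HEAP_BASE, size=HEAP_SIZE):
        self.limit = base + size
        self.next_free = base  # everything below this address is in use

    def alloc(self, nbytes):
        """Return the address of a fresh block, or None if the heap is full."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        rounded = (nbytes + ALIGN - 1) & ~(ALIGN - 1)  # round up to ALIGN
        if self.next_free + rounded > self.limit:
            return None
        addr = self.next_free
        self.next_free += rounded  # "bump" the pointer; nothing is ever freed
        return addr


heap = BumpAllocator()
first = heap.alloc(10)   # occupies 12 bytes after alignment
second = heap.alloc(64)
print(hex(first), hex(second))
```

Because the pointer only moves forward, allocation is a single addition, which is why this approach is attractive when the program never releases memory.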

How to run

Step 1 - Generate a suitable input

The code comes with sample data in firmware/jpg.hex - this corresponds to the input file firmware/k8x8.jpg. The hex file is generated as follows:

$ cd firmware
$ python3 jpg2hex.py k8x8.jpg > jpg.hex

You can replace k8x8.jpg with some other JPEG file to try with that. Note that the system has an overall memory limitation so any file larger than about 100x100 will most likely run into problems.
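As a rough illustration of the kind of conversion this step performs, the sketch below prefixes the file contents with their size as a 4-byte integer and prints the result as hex words. It is not the actual firmware/jpg2hex.py: the function name, the little-endian byte order, the zero padding and the one-32-bit-word-per-line layout are all assumptions.

```python
import struct


def jpeg_to_hex_lines(data):
    """Return hex text lines: a 4-byte size header followed by the data.

    Assumptions (not taken from jpg2hex.py): little-endian byte order,
    zero padding to a multiple of 4 bytes, one 32-bit word per line.
    """
    payload = struct.pack("<I", len(data)) + data
    payload += b"\x00" * (-len(payload) % 4)  # pad to a whole number of words
    return ["%08x" % word for (word,) in struct.iter_unpack("<I", payload)]


# Five made-up bytes standing in for the start of a JPEG file.
sample = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00])
lines = jpeg_to_hex_lines(sample)
print("\n".join(lines))
```

The first word carries the file size, so code reading from 0x30000000 can learn how many bytes follow without any file system support.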

Step 2.1 - Build and run with iverilog

$ make

Just typing the above command (while you are in the nanojpeg folder, not inside one of the subfolders) will take care of compiling and running with iverilog.

WARNING: This is horrendously slow - it takes about 6-7 minutes to run on the default input file, which is just a single JPEG macroblock (the entire image is 8 pixels by 7 pixels).

Therefore if you try this with another file (say kitten.jpg, which is 24x22 macroblocks in size), you can expect it to take more than 3000 minutes -- that is, more than 2 days to run. Please do not try this on the EE2003 server - if the system shows excessive load it will be restarted more than once a day as needed, so simulations will almost certainly not run to completion.
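The estimate above can be reproduced with a short calculation, assuming that simulation time scales linearly with the number of macroblocks. This is a simplification, and the variable names are illustrative.

```python
# Back-of-the-envelope runtime estimate for iverilog on kitten.jpg.
# Assumption: runtime grows linearly with the macroblock count.
MINUTES_PER_MACROBLOCK = (6, 7)  # measured range for the single-block input
MACROBLOCKS = 24 * 22            # kitten.jpg is 24x22 macroblocks

low_minutes = MACROBLOCKS * MINUTES_PER_MACROBLOCK[0]
high_minutes = MACROBLOCKS * MINUTES_PER_MACROBLOCK[1]
minutes_per_day = 60 * 24

print("macroblocks:", MACROBLOCKS)
print("minutes:", low_minutes, "to", high_minutes)
print("days: %.1f to %.1f" % (low_minutes / minutes_per_day,
                               high_minutes / minutes_per_day))
```

With 528 macroblocks this gives between 3168 and 3696 minutes, which is consistent with the "more than 3000 minutes" figure quoted above.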

Step 2.2 - Build and run with verilator

Fortunately, there is a much faster verilog simulator called verilator. This works by first converting the Verilog code into C++, compiling it, and running the resulting executable. This can actually finish simulating the entire kitten.jpg input in less than 1 minute. If you want to test any changes to your code, you are strongly advised to use this approach.

To run this, you can just type

$ make test_verilator

This is already set up to take the exact same inputs and generate the same output.

Step 3 - Understand the results

When you run the code, it generates a file named output.dump in the main nanojpeg folder. You can rename this file to output.ppm and then open it in an image viewer. Note that you cannot view it on the server; you will need to download the file to your local machine and view it there.

The default input will generate an image of a kitten that is 8x7 pixels in size -- in other words, if you recognize it as a kitten, you have a very good imagination. For comparison, the actual output generated by running the converter on another PC is available in the file firmware/k8x8.ppm.

Note that there is currently a bug in the code that results in one extra byte being added to the output. This means that you cannot directly compare the two files to check for correctness. However, if you use the command

$ xxd output.dump

it will dump out the hex formatted output, and here you can see that it matches the original except for the last byte.
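The manual comparison can also be automated. The sketch below checks that the produced output equals the reference once a single trailing byte is ignored. It assumes the extra byte is appended at the very end of output.dump, and the function names are hypothetical.

```python
def matches_except_trailing_byte(produced, reference):
    """True if `produced` is `reference` plus exactly one extra final byte."""
    return len(produced) == len(reference) + 1 and produced[:-1] == reference


def compare_files(produced_path, reference_path):
    """Read both files as raw bytes and apply the check above."""
    with open(produced_path, "rb") as f:
        produced = f.read()
    with open(reference_path, "rb") as f:
        reference = f.read()
    return matches_except_trailing_byte(produced, reference)


# Self-contained demonstration with made-up data: a one-pixel PPM image.
reference = b"P6\n1 1\n255\n\x10\x20\x30"
produced = reference + b"\x00"  # simulated extra trailing byte
print(matches_except_trailing_byte(produced, reference))
```

On real data this would be called as compare_files("output.dump", "firmware/k8x8.ppm") from the main nanojpeg folder.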
