This is a course project in EE2003 Computer Organization, EE Department, IIT Madras

shanmukh2607/NanoJPEG

EE2003 Course Project 2021: Implementation and Analysis of Peripheral for NanoJPEG Decoder in PicoRV32 environment

  • By Bachotti Sai Krishna Shanmukh, Katari Hari Chandan and Potta Muni Asheesh
    Roll Numbers : EE19B009, EE19B032 and EE19B048

Professor : Nitin Chandrachoodan

Note: This is a forked repository of PicoRV32. The original README is available at README_picorv32.md.

Acknowledgements:

Video Demonstration on YouTube

Submission Notes

The code does not depend on the row workspace and col workspace directories to run. However, they contain useful information such as synthesis reports and timing information from Verilog simulation. More about them can be found in the submission report and readme file.

Note: the Makefile uses the /usr/share/yosys/xilinx/cells_sim.v library (refer to lines 64 and 88).

EE2003_Project_Report.pdf gives a brief description of the entire project.

bottleneck_kitten.txt lists the clock cycles taken by various functions in NanoJPEG without the peripheral.

AP_kitten.txt lists the clock cycles after the peripherals are implemented.

rowidct.v is the unsynthesized Verilog module, while rowidct_synth.v is generated by Yosys. Similarly, col_idct_synth.v is the synthesized code.

Problem statement

This project includes the code for the NanoJPEG decoder, with some modifications so that it can run in the constrained environment of the PicoRV32 processor.

In this environment the program has no access to file input/output, memory management (malloc etc.) or the printf-style statements you would usually use for debugging.

To get around these limitations, the code has the following additions:

  • A set of functions have been defined in njmem.c that can allocate memory for general use. They use a very trivial form of memory allocation that only works because our program never needs to free memory and reuse it later. A set of addresses starting at 0x40000000 are defined for use with the memory management.
  • A set of addresses starting from 0x30000000 are defined for reading from the file. You first need to run a pre-processing script (firmware/jpg2hex.py) to generate the file firmware/jpg.hex which will be mapped to this memory range. This is marked as a read-only memory, so you can only read from that range of addresses. Since the file size cannot be read this way, the script also puts the size of the file as an int in the first 4 bytes of the memory range.
  • Writing to the address 0x20000000 will result in dumping the appropriate byte into the file output.dump. This means that you can use this to do the equivalent of a fwrite function in C. However, the filename is always fixed as it cannot be changed from the C program.
  • There are also two functions defined in hello.c that can be used to read out the number of cycles from the CPU at any point. This can be used, for example, to find out the time taken by the njDecode function. More importantly, you can use a similar technique inside your code to get the time taken for other functions and find out which ones take the longest to run.
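
The memory map described above can be modelled on a host machine to see how the pieces fit together. The following Python sketch is only an illustration and is not the code in njmem.c: the class names `BumpAllocator` and `OutputPort`, the 4-byte alignment and the pool size are all assumptions made for the example. It shows why allocation is trivial when nothing is ever freed (the allocator just advances a pointer) and how a write to 0x20000000 behaves like appending one byte to a file.

```python
HEAP_BASE = 0x40000000    # start of the allocation range (from the text)
OUTPUT_ADDR = 0x20000000  # byte-wide output port (from the text)


class BumpAllocator:
    """Hands out addresses by advancing a pointer; nothing is ever freed."""

    def __init__(self, base=HEAP_BASE, size=1 << 20, align=4):
        # size and align are illustrative assumptions, not project values
        self.base = base
        self.limit = base + size
        self.align = align
        self.next_free = base

    def alloc(self, nbytes):
        addr = self.next_free
        # round the request up so the next block stays aligned
        rounded = (nbytes + self.align - 1) // self.align * self.align
        if addr + rounded > self.limit:
            raise MemoryError("out of simulated heap")
        self.next_free = addr + rounded
        return addr


class OutputPort:
    """Models the output address: each store appends one byte to the dump."""

    def __init__(self):
        self.dump = bytearray()

    def store(self, addr, value):
        if addr != OUTPUT_ADDR:
            raise ValueError("not the output port")
        self.dump.append(value & 0xFF)


heap = BumpAllocator()
a = heap.alloc(10)   # 0x40000000
b = heap.alloc(3)    # 0x4000000C, since the first request is rounded up to 12

port = OutputPort()
for byte in b"P6\n":
    port.store(OUTPUT_ADDR, byte)
```

Because there is no free operation, the allocator needs no bookkeeping beyond a single pointer, which is what makes this scheme workable in such a small environment.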

How to run

Step 1 - Generate a suitable input

The code comes with sample data in firmware/jpg.hex - this corresponds to the input file firmware/k8x8.jpg. The hex file is generated as follows:

$ cd firmware
$ python3 jpg2hex.py k8x8.jpg > jpg.hex

You can replace k8x8.jpg with some other JPEG file to try with that. Note that the system has an overall memory limitation, so any image larger than about 100x100 pixels will most likely run into problems.
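
For reference, the layout that the script has to produce can be sketched as follows. This is not the contents of jpg2hex.py: the function names, the little-endian byte order, the zero padding and the one-32-bit-word-per-line text format are assumptions chosen for illustration. Only the size prefix in the first 4 bytes comes from the description above.

```python
import struct


def build_image(jpeg_bytes):
    """Prefix the data with its length as a 4-byte int (little-endian assumed)."""
    image = struct.pack("<I", len(jpeg_bytes)) + jpeg_bytes
    # pad with zeros to a whole number of 32-bit words
    image += b"\x00" * (-len(image) % 4)
    return image


def to_hex_words(image):
    """Render the image as one 8-digit hex word per line (assumed format)."""
    words = struct.unpack("<%dI" % (len(image) // 4), image)
    return "\n".join("%08x" % w for w in words)


def read_back(image):
    """What the decoder side would do: read the size, then that many bytes."""
    (size,) = struct.unpack_from("<I", image, 0)
    return image[4:4 + size]


data = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00])  # a few JPEG-like bytes
image = build_image(data)
print(to_hex_words(image))
```

Storing the size first lets the decoder know where the file ends without any file-system call, since it can only read raw words from the mapped range.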

Step 2.1 - Build and run with iverilog

$ make

Just typing the above command (while you are in the nanojpeg folder, not inside one of the subfolders) will take care of compiling and running with iverilog.

WARNING: This is horrendously slow - it takes about 6-7 minutes to run on the default input file, which is just a single JPEG macroblock and the entire image is of size 8 pixels by 7 pixels.

Therefore if you try this with another file (say kitten.jpg, which is 24x22 macroblocks in size), you can expect it to take more than 3000 minutes -- that is, more than 2 days to run. Please do not try this on the EE2003 server - if the system shows excessive load it will be restarted more than once a day as needed, so simulations will almost certainly not run to completion.
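
The estimate above follows from scaling the single-macroblock run time by the number of macroblocks. The sketch below redoes that arithmetic. It assumes simulation time grows linearly with the macroblock count, which ignores any fixed start-up cost, and the function name is made up for the example.

```python
def estimated_minutes(blocks_wide, blocks_high, minutes_per_block):
    """Scale the per-macroblock simulation time to a whole image."""
    return blocks_wide * blocks_high * minutes_per_block


low = estimated_minutes(24, 22, 6)    # 528 macroblocks at 6 minutes each
high = estimated_minutes(24, 22, 7)   # 528 macroblocks at 7 minutes each

print(low, high)  # 3168 3696
print(round(low / (60 * 24), 1), round(high / (60 * 24), 1))  # 2.2 2.6
```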

Step 2.2 - Build and run with verilator

Fortunately, there is a much faster Verilog simulator called Verilator. It works by first converting the Verilog code into C++, compiling it, and running the resulting executable. It can finish simulating the entire kitten.jpg input in less than 1 minute. If you want to test any changes to your code, you are strongly advised to use this approach.

To run this, you can just type

$ make test_verilator

This is already set up to take the exact same inputs and generate the same output.

Step 3 - Understand the results

When you run the code, you will see that it generates a file named output.dump in the main nanojpeg folder. You can rename this file to output.ppm, and then it should be possible to view it. Note that you cannot view it on the server; you will need to download the file to your local machine and view it there.

The default input will generate an image of a kitten that is 8x7 pixels in size -- in other words, if you recognize it as a kitten, you have a very good imagination. For comparison, the reference output generated by running the converter on another PC is available in the file firmware/k8x8.ppm.
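
Before downloading the file, a quick sanity check is to read its header. The sketch below assumes output.dump is a binary PPM (magic number P6) or PGM (P5) with a plain header of magic number, width, height and maximum value, and no comment lines. The function name `read_pnm_header` is made up for this example.

```python
def read_pnm_header(data):
    """Return (magic, width, height, maxval, pixel_offset) for P5/P6 data."""
    fields = []
    pos = 0
    while len(fields) < 4:
        # skip whitespace between header fields
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated header")
        fields.append(data[start:pos])
    magic = fields[0].decode("ascii")
    if magic not in ("P5", "P6"):
        raise ValueError("not a binary PGM/PPM file")
    width, height, maxval = (int(f.decode("ascii")) for f in fields[1:])
    # exactly one whitespace byte separates the header from the pixel data
    return magic, width, height, maxval, pos + 1


# stand-in for the file contents: an 8x7 colour image of zero pixels
sample = b"P6\n8 7\n255\n" + bytes(8 * 7 * 3)
magic, width, height, maxval, offset = read_pnm_header(sample)
expected_size = offset + width * height * (3 if magic == "P6" else 1)
```

To apply this to a real run, read the file with `open("output.dump", "rb").read()` and compare `expected_size` with the length of the data.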

Note that there is currently a bug in the code that results in one extra byte being added to the output. This means that you cannot directly compare the two files to check for correctness. However, if you use the command

$ xxd output.dump

it will dump out the hex formatted output, and here you can see that it matches the original except for the last byte.
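
The same check can be automated. The sketch below assumes the extra byte is appended at the end of output.dump, as the description above suggests. The function name is hypothetical and the byte strings stand in for the real file contents.

```python
def matches_except_trailing_byte(dump, reference):
    """True if dump equals reference followed by exactly one extra byte."""
    return len(dump) == len(reference) + 1 and dump[:-1] == reference


# stand-ins for firmware/k8x8.ppm and output.dump
reference = b"P6\n8 7\n255\n" + bytes(range(168))
dump = reference + b"\x00"

print(matches_except_trailing_byte(dump, reference))       # True
print(matches_except_trailing_byte(reference, reference))  # False
```

In practice both arguments would come from reading the two files in binary mode.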
