Skip to content
/ reducto Public

Project II for 3460:635 Advanced Algorithms, which is composed of two parts. Part I uses singular value decomposition (SVD) to compress PGM images. Part II uses principal component analysis (PCA) to reduce the dimensionality of high-dimensional datasets.

Notifications You must be signed in to change notification settings

hmm34/reducto

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

86 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

reducto

The goal of this project is to use linear algebra matrix operations to shrink large datasets. The specific requirements consist of two major parts: image compression and dimensionality reduction. Image compression is accomplished by using singular value decomposition (SVD), and high-dimensional dataset dimensionality reduction is accomplished by using principal component analysis (PCA).

Part I: Image Compression

The most important operation used in this section is singular value decomposition (SVD). A singular value decomposition of an m by n matrix M is a factorization of the form M=UΣV*, where

  • U is an m by m unitary matrix
  • Σ is an m by n rectangular diagonal matrix with nonnegative numbers on the diagonal, and
  • V* is the transpose of a unitary matrix V.

Uses

reducto 1 image.pgm

Saves the necessary information for the PGM image in ASCII PGM format using binary encoding, without preserving comments. 2 bytes are used to save the width of the image, 2 bytes are used to save the height are the image, 1 byte is used to save the grey scale, and 1 byte is used for the grey level of each pixel. See issue #6.

For example, if the image.pgm contains:

P2
#my image
4 5
255
234 23 12 345 22 11 22 11 22 1
1 2 3 4 5 6 7 8 9 10

then the file will be saved using 2 + 2 + 1 + (4 * 5) = 25 bytes. The comment is not preserved. The output is a binary file named image_b.pgm.

reducto 2 image_b.pgm

Reverses the previous command. Converts a binary file, image_b.pgm, to an ASCII PGM file. The original comment is not preserved, and the new comment is "# Back off! I will make you teensy!". The outupt file is saved as image2.pgm. See issue #7.

reducto 3 header.txt SVD.txt k

Saves the necessary information in the approximated image in a binary format. The input includes the header of the original image, three matrices U, Σ, V, and an integer k representing the rank of the approximation. The saved image is named image_b.pgm.SVD, is recoverable, and can be viewed using any PGM viewer. The goal is to save storage.

The header.txt contains three integers (width, height, grey scale levels). The SVD.txt contains three matrices (U, Σ, V). See issue #8.

reducto 4 image_b.pgm.SVD

Reverses the previous command. Takes a PGM image that was compressed using SVD and outputs an uncompressed PGM image, named image_k.pgm, that can be viewed using any pgm viewer. See issue #9.

reducto 5 image.pgm

Possible extra credit opportunity. Takes a PGM image in ASCII format and produces the header.txt and svd.txt files that can be used for step 3. Employs SVD decomposition on the original matrix A (which is the image) to produce three matrices U, S, and V, which are written to image_svd.txt. The width, height, and maximum grayscale values from the original image are pereserved and written to image_header.txt.

Part II: Dimensionality Reduction

The most important algorithm used in this section is principal component analysis (PCA). It is a statistical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It's commonly used for dimensionality reduction for visualization of high-dimensional datasets. See issue #5

Testing

test storage

Runs multiple test cases of image compression and decompression using the first two reducto commands and varying sizes of images, as well as the last two commands that use SVD with varying levels of rank approximation k and relaxing the norm of the error matrix.

test timing

Runs multiple timing metrics using the aforementioned commands for image compression and decompression based on size of the original image as well as compression size.

About

Project II for 3460:635 Advanced Algorithms, which is composed of two parts. Part I uses singular value decomposition (SVD) to compress PGM images. Part II uses principal component analysis (PCA) to reduce the dimensionality of high-dimensional datasets.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published