
Original Report

Experimental Image Compression Algorithm

Matthew Lake, Joseph Garrone

Marist College Ashgrove

Polygon version of MCA logo

Introduction

For our project, we have investigated the area of image compression, creating an experimental image compression format designed to allow lossless images to be stored in a very small file, without the encoding-time overheads of traditional general-purpose compression algorithms. We were inspired to undertake this investigation by our own experiences of the difficulty of sending image files in a bandwidth-friendly manner without wasting time zipping directories. Furthermore, we quickly realised that this was symptomatic of a larger problem – image and photo sharing has become something of a modern obsession, but existing lossless image formats, designed for an earlier, less connected age, are often too large to be sent across the internet. While this may seem a simple inconvenience, it not only places an increasing load on existing networking infrastructure, but also poses an ongoing and growing problem as networks expand globally. For these reasons, we believed that image compression was a potentially relevant and interesting area of research.

We felt that a more efficient and user-friendly approach would be an image format specifically designed for the compression of individual images, one which would exist as a compromise between relatively large lossless formats and time-consuming traditional compression software. We decided to investigate image compression in an attempt to create that format, and as a demonstration of our research, we built a simple, albeit largely proof-of-concept image sharing application.

Objective

Our primary objective in this project was to create a lossless image format suitable for being shared on the web. Our goal was for this format to be smaller than PNG both in ordinary usage and when compressed using traditional folder compression methods.

Project Design

Our algorithm consists of three stages. Firstly, we generate a highly efficient approximation of the image and then store not the literal ARGB values of each pixel, but the difference in each channel between the approximation and the real value. In doing so, we have drawn upon the fact that images, unlike many other forms of data, may be fairly accurately approximated by a simplified representation. This fact lies at the heart of all lossy image formats, and is here used to provide a relatively small basis for the lossless compression. Our project generates this simplified representation by using a number of overlapping, coloured polygons to approximate the image.
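
As a rough illustration of the difference-storage idea, the following is a minimal sketch rather than our actual implementation; the interleaved 8-bit ARGB layout and the array-based interface are assumptions made for demonstration.

```csharp
// Minimal sketch of the difference stage (assumed layout: interleaved
// 8-bit ARGB channels, with the original image and the rendered
// approximation flattened to byte arrays of the same length).
static class DeltaStage
{
    // Store how far each channel of the real image deviates from the
    // polygon-based approximation, rather than the raw channel value.
    public static int[] ComputeDifferences(byte[] original, byte[] approximation)
    {
        var differences = new int[original.Length];
        for (int i = 0; i < original.Length; i++)
        {
            // When the approximation is good, these values cluster around
            // zero and so need fewer bits than the raw 8-bit channels.
            differences[i] = original[i] - approximation[i];
        }
        return differences;
    }
}
```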

Genetically evolved shape-based representation

Original image (sketch of Firefox logo)

The size, colour, transparency, number and placement of these polygons are determined through a genetic algorithm that evaluates and evolves a number of possible options. This algorithm works by starting with an essentially random placement of a single polygon upon a background of the image’s average colour, then incrementally adding more polygons and mutating those that already exist by simulating random, but mostly minor, changes to their position and colour.
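
A minimal sketch of what such a mutation step might look like is shown below; the Polygon type, the nudge ranges and the 50/50 split between position and colour mutations are illustrative assumptions, not the project's actual parameters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical polygon type used only for this sketch; the real project's
// data structures may differ.
class Polygon
{
    public List<(int X, int Y)> Points = new List<(int X, int Y)>();
    public byte A, R, G, B;
}

static class Mutation
{
    static readonly Random Rng = new Random();

    // Apply a random, mostly minor change to one polygon: either nudge a
    // vertex within the image bounds or shift one colour channel slightly.
    public static void Mutate(Polygon polygon, int width, int height)
    {
        if (Rng.NextDouble() < 0.5)
        {
            int i = Rng.Next(polygon.Points.Count);
            var (x, y) = polygon.Points[i];
            polygon.Points[i] = (
                Math.Clamp(x + Rng.Next(-10, 11), 0, width - 1),
                Math.Clamp(y + Rng.Next(-10, 11), 0, height - 1));
        }
        else
        {
            int delta = Rng.Next(-15, 16);
            switch (Rng.Next(4))
            {
                case 0: polygon.A = (byte)Math.Clamp(polygon.A + delta, 0, 255); break;
                case 1: polygon.R = (byte)Math.Clamp(polygon.R + delta, 0, 255); break;
                case 2: polygon.G = (byte)Math.Clamp(polygon.G + delta, 0, 255); break;
                default: polygon.B = (byte)Math.Clamp(polygon.B + delta, 0, 255); break;
            }
        }
    }
}
```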

Overview of genetic algorithm

This algorithm draws upon the ideas of genetic programming and Darwinian evolution by treating each set of polygons as an individual, with each polygon as a chromosome. The suitability of these individuals is quantified by a fitness function that measures the difference between the shape-based representation and the literal bitmap image. The pixels of the original image are compared to the shape-based representation, and the sum of the squares of the differences between the colour of the shape and the colour of the image for each pixel is taken as the fitness of the individual. The squares of the differences are used because this ensures that greater differences are more harshly penalised.
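
In code, such a fitness measure might look like the following sketch, assuming both the original image and the rendered representation are flattened into interleaved 8-bit ARGB byte arrays:

```csharp
// Sketch of the fitness measure described above: the sum of squared
// per-channel differences between the original image and the rendered
// shape-based representation. Lower values mean a fitter individual.
static long Fitness(byte[] original, byte[] rendered)
{
    long total = 0;
    for (int i = 0; i < original.Length; i++)
    {
        int difference = original[i] - rendered[i];
        // Squaring penalises a few large local errors far more harshly
        // than the same total error spread thinly over many pixels.
        total += (long)difference * difference;
    }
    return total;
}
```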

Once the fitness of each individual is determined, the new iteration of the individual replaces the old if it is fitter – if not, it is discarded and the process is repeated. Periodically, the individuals are compared to each other, and the least fit individuals are removed and replaced by a composite of the fittest individuals. In this way, our algorithm simulates the natural process of evolution (survival of the fittest) and uses a simple machine learning technique to approach an optimal representation of the original image. This stage of the larger algorithm concludes when the fitness of the best solution implies that the average difference between the pixel values of the shape-based representation and the original image is small enough to be stored in 7 bits or fewer. The resulting approximation is highly efficient and relatively tiny, since a polygon can cover a large area without requiring more than a few points. Since the difference between the approximation and the real ARGB values will in most cases be fairly small, there is already a considerable reduction in size, the extra data storing the approximation notwithstanding.
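
The hill-climbing core of this stage could be sketched as below; the clone, mutate and fitness delegates stand in for the rest of the project, and the periodic culling-and-crossover step is omitted for brevity.

```csharp
using System;

// Sketch of the accept-if-fitter loop: each candidate (a set of polygons)
// is mutated, and the mutant replaces its parent only if its error is lower.
static class Evolution
{
    public static T Evolve<T>(
        T individual,
        Func<T, T> clone,
        Action<T> mutate,
        Func<T, long> fitness,
        int generations)
    {
        long currentFitness = fitness(individual);
        for (int generation = 0; generation < generations; generation++)
        {
            T candidate = clone(individual);
            mutate(candidate);
            long candidateFitness = fitness(candidate);

            // Survival of the fittest: keep the mutant only if it is a
            // strictly better approximation of the original image.
            if (candidateFitness < currentFitness)
            {
                individual = candidate;
                currentFitness = candidateFitness;
            }
        }
        return individual;
    }
}
```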

The second stage of our compression algorithm involves the replacement of common difference values with a reference to the last identical value wherever reasonable. To achieve this, the array of difference values is split into chunks, which are then processed using asynchronous multi-threaded tasks. This allows the data to be processed quickly without losing too many potential matches, since it is fairly likely that an exact match can be found within 2048 pixels. This method is particularly effective in conjunction with the first stage, since the use of difference values serves to cluster the data closer to the average, meaning that exact matches are more likely.
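
A simplified sketch of the referencing idea for a single 2048-value chunk follows; the Token encoding here is illustrative only and is not the project's actual bit format.

```csharp
using System.Collections.Generic;

// A token is either a literal difference value or a back-reference
// (an offset to the last place the same value appeared in this chunk).
readonly struct Token
{
    public readonly bool IsReference;
    public readonly int Value; // literal difference, or offset back to the match
    public Token(bool isReference, int value) { IsReference = isReference; Value = value; }
}

static class Referencing
{
    public static List<Token> EncodeChunk(int[] chunk)
    {
        var output = new List<Token>(chunk.Length);
        var lastSeen = new Dictionary<int, int>(); // value -> last index it appeared at

        for (int i = 0; i < chunk.Length; i++)
        {
            if (lastSeen.TryGetValue(chunk[i], out int previousIndex))
            {
                // Refer back to the earlier identical value instead of repeating it.
                output.Add(new Token(isReference: true, value: i - previousIndex));
            }
            else
            {
                output.Add(new Token(isReference: false, value: chunk[i]));
            }
            lastSeen[chunk[i]] = i;
        }
        return output;
    }
}
```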

Finally, a simple dictionary compression technique is used to further reduce the file size. This method works by again dividing the data into asynchronously processed chunks, each large enough to probably contain any string that is repeated enough to be worth storing in the dictionary. Once all repeated strings have been identified, they are ordered based on the product of their frequency and their length, and are assigned a dictionary key based on a Huffman-style encoding. In this way, the most frequently appearing strings are represented by the shortest keys.
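
A sketch of how dictionary entries could be ranked is given below; candidate extraction and the actual Huffman-style bit encoding are simplified, and the names and thresholds are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Rank candidate strings by frequency * length and assign the smallest key
// numbers to the highest-scoring entries, so that a variable-length encoding
// can store the most valuable entries in the fewest bits.
static class DictionaryStage
{
    public static Dictionary<string, int> BuildDictionary(
        IEnumerable<string> candidates, int maxEntries)
    {
        var scores = new Dictionary<string, long>();
        foreach (var candidate in candidates)
        {
            scores.TryGetValue(candidate, out long count);
            scores[candidate] = count + 1;
        }

        // The saving a string offers is roughly proportional to how often
        // it occurs and how long it is, hence the frequency * length score.
        var ranked = scores
            .Where(pair => pair.Value > 1) // only strings that actually repeat
            .OrderByDescending(pair => pair.Value * pair.Key.Length)
            .Take(maxEntries)
            .Select((pair, rank) => (Word: pair.Key, Rank: rank));

        return ranked.ToDictionary(entry => entry.Word, entry => entry.Rank);
    }
}
```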

Research, Design Process & Problems Encountered

Upon deciding to investigate image compression, we were inspired by minimalistic styles of artwork to explore the possibility of using a shape-based representation. Though we were committed to developing a lossless standard, we realised that, in a properly designed bit format, it would take less space to store a small number than a large one. Thus, if we could replace the large bitmap array with a difference array of much smaller numbers and a polygon-based representation, it would be possible to store the image in much less space.

However, we initially hit problems in attempting to generate the shape-based representation. Our first attempt centred on a method which stepped through all the image's pixels, placing rectangles when there was a sufficiently large difference between pixels. This rectangle would extend downwards as long as there was such a difference at the same horizontal position. This attempt was fraught with difficulties: it was far too slow, created far too many shapes, and failed to generate a semi-accurate representation on all but the most simplistic test images.

Manual placement versus genetic algorithm

Reviewing our process, it became evident firstly that placing rectangles wherever there was a change in colour was non-productive, since it tended to generate tens of thousands of tiny rectangles, meaning that there was little reduction in size. Furthermore, stepping through the entire image pixel by pixel checking for distinct colour differences, and determining along the way where to place rectangles, proved incredibly slow, taking up to an hour to produce even a very inaccurate representation.

An attempt was made to mitigate these problems by generating the representation based on a 256-colour version of the image, but this was unsuccessful, meeting with the same flaws as earlier. Additionally, more time was lost in the conversion, primarily due to the .Net Bitmap class lacking built-in support for converting bitmaps between colour depths. Whilst this problem of slow conversion was eventually solved by a manual conversion using Windows' low-level GDI+ API, the method was ultimately fruitless. In reflecting upon our algorithm design process, we came to the realisation that manually creating and positioning rectangles was doomed to be too time-consuming and inaccurate to be feasible. However, our research into methods of image representation and simplification suggested another approach - genetic programming. After more focussed research into genetic programming and evolutionary algorithms, we decided that rather than constructing the shape-based representation manually, it would be faster and more productive to use a genetic algorithm to generate it.

We decided upon a genetic programming algorithm that works by creating shapes, mutating them, and then testing if the mutated version is fitter than the non-mutated version. Immediately, it was obvious that this approach was much more efficient, yielding accurate results in a much shorter time. Building upon this success, we made the decision to move away from rectangles as the only shape. This restriction, which was necessary when manual control meant that only simple shapes could be used, no longer made sense when any mutation was permissible. This also resulted in significant improvements in accuracy.

Refining the genetic algorithm

Reflecting upon our method as an evolutionary algorithm, it became apparent that we were not taking full advantage of the competitive nature of evolutionary simulations. Though we modelled competition between parents and children, we did not make use of competition and inter-breeding in a larger population. We attempted to rectify this by creating ten different possible solutions which, after mutation, were compared to each other. The least fit individual was deleted and replaced with a composite of the two fittest individuals. Immediately, however, it was obvious that the added computational load of comparing all the individuals after every iteration was causing great reductions in the algorithm's performance.

Introducing competing populations to the genetic algorithm

We thus decided, based upon testing data, to limit the comparison to once every thousand generations. Whilst the performance benefits were minor on small images, larger images experienced more significant accuracy increases, probably due to the algorithm gaining a sense of the fundamental composition of the image through inter-breeding between successful individuals and the implicit transfer of information.

A further development of our genetic algorithm, prompted by testing images of extreme sizes, was to allow the number of polygons to be modified. It was realised that whilst a static number of polygons might be suitable for medium-sized images, that number was excessive for small images and insufficient for larger ones. We thus modified the algorithm to start with a single polygon and add or remove polygons based on a random mutation chance.

Once we had effectively produced a shape-based representation and a small difference array, it came to our attention that there was a significant degree of repetition within the difference array. Taking advantage of this, we decided to replace all repeated difference values with a reference to the original, saving a lot of space.

Relative size before and after references

We initially sought to achieve this by having every pixel search the entire image for a matching pixel. However, this proved to be highly inefficient, bringing the algorithm to a crawl. Instead, we decided to process the image in 2048 pixel chunks, and to only include references to matching pixels if they come before the pixel being considered.

Having done so and achieved reasonable performance, we turned our attention to the possibility of utilising multi-threading. Since each of the chunks requires no interaction with the others, we thought that this process was a perfect case for multi-threading. Firstly, we tried to implement multi-threading using the basic, somewhat outdated Thread class. Surprisingly, we experienced little improvement over the single-threaded version.

Comparison of multi-threading approaches

Upon closer inspection, it became evident that our usage of the Thread class was too simplistic, as the way we called our threaded methods in fact caused the main thread to wait for each one before moving on. Faced with the prospect of possibly rewriting much of our referencing algorithm, we decided instead to adopt the more modern Task class, which allowed greater flexibility and introduced significant speed boosts.
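
For illustration, a Task-based version of the chunk processing might look like the following sketch; processChunk here is a hypothetical stand-in for the real per-chunk work, not the project's actual method.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Sketch of the Task-based approach that replaced our raw Thread usage:
// each chunk is processed by its own Task, and Task.WhenAll lets the caller
// await all of them without blocking a dedicated thread per chunk.
static class ChunkProcessing
{
    public static async Task<int[][]> ProcessAllAsync(
        int[][] chunks, Func<int[], int[]> processChunk)
    {
        // Queue one Task per chunk; the thread pool schedules them in parallel.
        Task<int[]>[] tasks = chunks
            .Select(chunk => Task.Run(() => processChunk(chunk)))
            .ToArray();

        // Nothing here waits on an individual thread before moving on;
        // the results arrive once every chunk has finished.
        return await Task.WhenAll(tasks);
    }
}
```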

Tools

Based upon the (albeit limited) experience of some of us with programming in Visual Basic .Net, we made the decision to develop our project in .Net. Upon further recommendation, we chose C# rather than VB, since it was determined to be more accessible for those in the team with little programming experience beyond JavaScript. Since we had not developed in C# before, this presented initial problems of unfamiliarity, as well as a lack of knowledge of higher-level programming concepts such as classes, optimisation and type conversion. Furthermore, we decided to use Windows Presentation Foundation rather than Windows Forms for our project in order to utilise GPU-accelerated graphical calculations (naturally important for an image-compression algorithm).

Conclusion

Our project, integrating aspects of genetic programming and traditional dictionary-based compression techniques, appears to have had some success in producing an output file that is in most cases some ten to fifteen percent smaller in size than the equivalent PNG file. Additionally, even when compressed in a folder, our format is still, in the majority of cases, slightly smaller than the PNG equivalent. This success, however, was not without its drawbacks, with our method being significantly slower than PNG for large images. This notwithstanding, our project can be considered to have achieved its objective, since the use case of image sharing does not require instantaneous saving and sending – rather, a delay of several minutes is an acceptable trade-off for reduced network strain.

In terms of this investigation as a learning experience, we both learned a considerable number of new programming concepts and skills, since we had not undertaken algorithm-based programming projects before. We have also learned much about the intriguing field of genetic programming, a topic that was entirely new to us before our research. In terms of project design, this project demonstrated to us the value of focussing on a single use case that fulfils an existing need or solves a real problem. In this way, we learnt to make the trade-offs and compromises necessary to fulfil these requirements.

If we were to continue this line of investigation in the future, we would likely shift our focus onto dictionary-based methods of compression, since this would likely yield greater reductions in file size. Additionally, in terms of procedure, it would be useful to settle on realistic programming methods sooner, rather than starting with unworkable ones (such as the manual assignation of shapes). Such an effort would likely involve greater research before commencement of work, in order to focus our investigation on the most promising areas.

In all, we feel that our investigation, whilst ambitious, delivered satisfactory results that fulfil the task objective and requirements, developing a compression-first image format designed to be light-weight and portable.