Emoji upsampling workflow using LIIF
All TTF glyphs were matched to the Emojipedia reference images (excluding one 'hidden' glyph), with the assistance of image hashing functions. These hashes are not used in the final version, however: instead, name-matching rules are used to correspond the icons to the original glyphs.
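For illustration, the hash-assisted matching could look something like the minimal sketch below, using the `imagehash` package from the requirements (the directory names and the choice of `phash` are assumptions, not the actual matching code):

```python
from pathlib import Path

import imagehash
from PIL import Image

# Hash every extracted glyph and every Emojipedia reference image, then pair each
# glyph with its nearest reference by Hamming distance between perceptual hashes.
glyph_hashes = {p: imagehash.phash(Image.open(p)) for p in Path("glyphs").glob("*.png")}
ref_hashes = {p: imagehash.phash(Image.open(p)) for p in Path("emojipedia").glob("*.png")}

matches = {}
for glyph_path, glyph_hash in glyph_hashes.items():
    ref_path, distance = min(
        ((rp, glyph_hash - rh) for rp, rh in ref_hashes.items()), key=lambda t: t[1]
    )
    matches[glyph_path] = (ref_path, distance)  # a small distance suggests the same emoji
```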
To enlarge the extracted glyphs, run `enlarge_osx_glyphs.py`, which will populate the `enlarged` directory (skipping any glyphs already enlarged in this way). By default this is sped up by multiprocessing on all available cores.
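For orientation, here is a minimal sketch of how such a driver could parallelise calls to LIIF's `demo.py` interface (`--input`/`--model`/`--resolution`/`--output`). The glyph directory, model filename, and output resolution are assumptions, not the script's actual values:

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path

# Assumed locations (cf. the path_to_liif_script and path_to_model variables mentioned below)
path_to_liif_script = Path("../liif/demo.py")
path_to_model = Path("../liif/edsr-baseline-liif.pth")
glyph_dir = Path("glyphs")   # assumed input directory of extracted 160x160 PNGs
out_dir = Path("enlarged")

def enlarge(glyph_png: Path) -> None:
    """Super-resolve one glyph via LIIF's demo script, skipping already-enlarged glyphs."""
    out_png = out_dir / glyph_png.name
    if out_png.exists():
        return
    subprocess.run(
        [
            "python", str(path_to_liif_script),
            "--input", str(glyph_png),
            "--model", str(path_to_model),
            "--resolution", "2000,2000",
            "--output", str(out_png),
        ],
        check=True,
    )

if __name__ == "__main__":
    out_dir.mkdir(exist_ok=True)
    with Pool() as pool:  # defaults to one worker per available core
        pool.map(enlarge, sorted(glyph_dir.glob("*.png")))
```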
See `requirements.txt`:
- LIIF
  - Currently this requires a sibling directory `../liif/` with LIIF installed there (change this in the `path_to_liif_script` and `path_to_model` variables in `enlarge_osx_glyphs.py`)
- Python 3
- PyTorch >= 1.6.0
  - If your CUDA version is greater than 10.2 then get a more recent PyTorch, e.g. 1.7.1
  - See here
- Depends: TensorboardX, yaml, numpy, tqdm, imageio
- bs4, requests, fonttools, matplotlib, more_itertools, imagehash, pandas, scikit-image
I get everything via pip except `cudatoolkit` and `torch`/`torchvision`, which come via conda (`-c pytorch`).
However, there is currently an open bug in PyTorch, resolvable by changing the CUDA toolkit dependency to `cudatoolkit>=11.0,<11.0.221`:
```sh
conda create -n liif_env
conda activate liif_env
# gives cudatoolkit=11.0.3:
conda install "cudatoolkit>=11.0,<11.0.221" -c conda-forge
# gives pytorch=1.7.1, torchvision=0.8.2, also adds python=3.8.5:
conda install pytorch torchvision -c pytorch
# Install the rest via PyPI:
pip install $(cat requirements.txt | grep -Ev "^(torch(vision)?)$")
```
The results are nice from 160x160 PNGs extracted from the font TTF.
See this thread on Twitter for some examples.
However, to call this complete I'd want to recover equivalent PNGs, including transparency, and for this alpha decomposition is required.
This turned out to be tricky!
Attempting to run the alpha channel through LIIF failed to recover a mask matching the super-resolved glyph (RGB channels), so instead I'm planning to estimate it from the RGB:
Figure generated from 🍃 (`U+1F343`) by `restore_alpha_to_enlarged_subview.py`; the closer-up view was generated by `restore_alpha_to_enlarged_2x2_subview.py`.
The alpha channel estimation results were pretty poor (TODO)
After doing a fair bit of closer inspection I still can't figure out precisely how to "pull up" the pixels that get given an alpha that is too low (and guessing too naively would degrade the result). I'm perhaps 50% of the way there after restating the problem as minimising the loss after re-alpha compositing, and carrying out the re-estimation in 2 passes: once for pixels with a "uniform loss", e.g. per-channel RGB differences of `(10,10,10)`, and a second for pixels with a "partial loss", e.g. an RGB difference of `(11,12,13)`, for which only the minimum (11) would be re-calibrated -- as all backgrounds are grayscale.
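The diagnostic half of that restatement can be sketched as follows: re-composite the current foreground/alpha estimate over the flat grayscale background, take the per-channel loss against the super-resolved RGB, and split the loss mask into "uniform" and "partial" pixels. This is only an illustrative sketch assuming NumPy arrays (float alpha in [0, 1], a single grayscale background level); the actual re-estimation lives in the scripts mentioned below.

```python
import numpy as np

def recomposite_loss(sr_rgb, fg_rgb, alpha, bg_gray):
    """Per-channel difference between the super-resolved RGB and a re-composite
    of the current foreground/alpha estimate over a flat grayscale background."""
    sr = sr_rgb.astype(np.float64)              # (H, W, 3)
    bg = np.full_like(sr, float(bg_gray))       # grayscale background: equal R, G, B
    recomposited = alpha[..., None] * fg_rgb + (1.0 - alpha[..., None]) * bg
    return sr - recomposited                    # signed per-channel loss, (H, W, 3)

def split_loss_mask(loss, tol=0.5):
    """Split nonzero-loss pixels into 'uniform' (equal across channels, e.g. (10,10,10))
    and 'partial' (unequal, e.g. (11,12,13)) groups."""
    spread = loss.max(axis=-1) - loss.min(axis=-1)
    nonzero = np.abs(loss).max(axis=-1) > tol
    uniform = nonzero & (spread <= tol)
    partial = nonzero & (spread > tol)
    return uniform, partial
```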
A further attempt to recover the SR image, generated by `reestimate_leaf_sr_transparency.py`, resulted in an improvement but not a perfect fix (yet).
Here's the beginning of a second pass (from which it can be seen that the first pass didn't fully resolve the pixels in the loss mask, bottom left, which is less intense but still not uniformly grey). Generated by `further_reestimate_leaf_sr_transparency.py`.
Yinbo Chen, first author of the LIIF paper, says:
> I am not very familiar with the processing of alpha channel. Since the model is only trained for RGB images, the alpha channel will make it an out-of-distribution task. If there are a large amount of LR-HR pairs of images with alpha channel, a straight-forward method is to modify the code to work on 4-channel images (since all the code assume 3-channel, there can be many necessary modifications such as the encoder model part and the data normalization part) and train a 4-channel SR model.
There's an issue on another project here with pretty much the same suggestion, reaching the same conclusion. The only question is where to find a DIV2K-like dataset of semi-transparent images (note: not 'stickers' with simply binary alpha).
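To make the suggested modification concrete, the kind of changes involved might look like the sketch below. This is not LIIF's actual code: the layer shape, channel width, and normalisation statistics are illustrative assumptions only.

```python
import torch.nn as nn

# 3-channel (RGB) assumptions baked into an SR codebase typically include the
# encoder's first convolution and the per-channel normalisation statistics:
# conv_in = nn.Conv2d(3, 64, kernel_size=3, padding=1)
# mean, std = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)

# A 4-channel (RGBA) variant widens both, then retrains on LR-HR pairs of
# images that genuinely carry an alpha channel:
conv_in = nn.Conv2d(4, 64, kernel_size=3, padding=1)
mean, std = (0.5, 0.5, 0.5, 0.5), (0.5, 0.5, 0.5, 0.5)
```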
TODO...
- Image thumbnails (transparent PNGs) are converted to non-transparent by selecting a colour not used in the image and flattening with this as a background colour (either in Python, or ImageMagick `convert`)
- Upon calculation, the RGB value of this alpha colour is stored in a database
- Image thumbnails are upsampled (i.e. enlarged) using LIIF
- The resulting large (2000x2000) 'high resolution' images are restored to transparency by removing the alpha colour [retrieved from the database] (see the sketch after this list)
- #2: add background info on the LIIF method
- Find a dataset of images with alpha channel (DIV2K has 1000 images in 8:1:1 split) and train a 4-channel model to avoid having to alpha decomposite
- Alternatively find some way to supervise the decompositing (there are no such datasets around...)
- PyTorch 1.8.0 introduced `set_per_process_memory_fraction`; it's not clear whether this would be more desirable (it sounds like it raises a CUDA out of memory error if the process tries to take more memory than allocated, which is happening anyway with multiple processes, so I don't know if this would actually help unless there was a way for the process to detect the size of its allocation...)
- Check whether allowing only half the CPU cores would speed up the total run time, which would be the case if the bottleneck was at the GPU rather than the CPU
- If it's helpful, see if PyTorch will work with 1.8.0
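Below is a minimal sketch of the flatten/restore round trip described in the list above, assuming Pillow and NumPy. The background-colour search and the exact-colour match when restoring transparency are illustrative assumptions; in practice the super-resolved output no longer matches the background colour exactly, which is the alpha-decompositing difficulty discussed earlier.

```python
import numpy as np
from PIL import Image

def pick_unused_colour(rgba: np.ndarray) -> tuple:
    """Choose an RGB colour that does not appear anywhere in the image."""
    used = {tuple(map(int, px)) for px in rgba[..., :3].reshape(-1, 3)}
    for r in range(256):
        for g in range(256):
            for b in range(256):
                if (r, g, b) not in used:
                    return (r, g, b)
    raise ValueError("no unused colour available")

def flatten(png_in: str, png_out: str) -> tuple:
    """Flatten a transparent PNG onto an unused background colour; return that colour."""
    im = Image.open(png_in).convert("RGBA")
    bg = pick_unused_colour(np.asarray(im))
    background = Image.new("RGBA", im.size, bg + (255,))
    Image.alpha_composite(background, im).convert("RGB").save(png_out)
    return bg  # store this in the database, keyed by glyph name

def restore_transparency(png_in: str, png_out: str, bg: tuple) -> None:
    """Naively restore transparency by making the stored background colour transparent."""
    rgb = np.asarray(Image.open(png_in).convert("RGB"))
    alpha = np.where((rgb == np.array(bg)).all(axis=-1), 0, 255).astype(np.uint8)
    Image.fromarray(np.dstack([rgb, alpha]), mode="RGBA").save(png_out)
```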
The two emoji image sets are sourced from:
- Emojipedia (iOS 14.2) 72x72 PNG
- OSX 10.15 (Catalina) 160x160 PNG
All credit and copyright belong to Apple for the iOS/OSX emoji character images, reproduced here under fair use and for noncommercial purposes.
Emojipedia's Emoji Image Licensing page states:
> To the best of our knowledge, specific information about licensing emojis from Apple... is not publicly available.
>
> Unless otherwise stated, emoji images are © copyright, and enquiries about commercial licensing of emoji images should be directed to their respective font vendors.
They also link to a 2017 blog post, “Who Owns Emoji?”, which states:
> Apple has not made licensing options publicly available for Apple Color Emoji.
>
> As such, those wanting to use Apple’s emoji images may be restricted to using these images in a way that could be considered fair use.
Conventional reading of fair use includes such purposes as:
"limited use of copyrighted material without permission for purposes such as criticism, parody, news reporting, research and scholarship, and teaching."