Tool to prevent duplicate emoji imports by comparing new .png images against existing .tga emojis, using perceptual similarity instead of filenames.

We'll use the imagededup library's perceptual hashing (PHash), which should work well for emoji-like graphics.
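To make the similarity score concrete, here is a minimal sketch of the Hamming distance between two perceptual hashes. It assumes the hashes are equal-length hex strings (imagededup's PHash produces 64-bit hashes written as 16 hex characters, as far as we know). The helper name `hamming_distance` and both hash values are made up for illustration; they are not taken from the emoji set.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Hypothetical 64-bit hashes (16 hex characters), invented for this example.
existing_hash = "9f172786e71f1e00"
candidate_hash = "9f172786e71f1e0f"  # only the last hex digit differs: 0x0 vs 0xf

print(hamming_distance(existing_hash, candidate_hash))  # 4
```

A distance of 0 means identical hashes, and small distances mean visually similar images. `find_duplicates` accepts a `max_distance_threshold` argument (default 10, to our knowledge) that controls how many differing bits still count as a duplicate.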

TL;DR:

Load existing .tga emojis from Emotes/.

Convert them to a comparable format (decoded in memory with PIL into RGB arrays that imagededup can hash, so no converted files are written to disk).

Load new .png emojis from ~/tmp/Emojis.

Compare similarity between each new .png and existing .tga.

Sort the new emojis into accepted candidates and duplicates (of an existing emoji or of another new one).
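The final sorting step can be sketched in isolation, without touching any files. This is only an illustration under assumptions: `classify` is a hypothetical helper, the paths and scores below are invented, and the `duplicates` dict just mimics the shape returned by `find_duplicates(..., scores=True)` (each key maps to a list of `(other_key, distance)` pairs). It uses set membership to tell existing emojis from new ones.

```python
def classify(candidates, existing, duplicates):
    """Split candidate keys into new emojis and the two kinds of duplicates."""
    existing_set, candidate_set = set(existing), set(candidates)
    result = {"new": [], "existing_dup": [], "candidate_dup": []}
    for key in candidates:
        # Ignore self-matches, keep only the names of the matched images.
        matches = [name for name, _score in duplicates.get(key, []) if name != key]
        hits_existing = any(name in existing_set for name in matches)
        hits_candidate = any(name in candidate_set for name in matches)
        if hits_existing:
            result["existing_dup"].append(key)
        if hits_candidate:
            result["candidate_dup"].append(key)
        if not (hits_existing or hits_candidate):
            result["new"].append(key)
    return result


# Invented example data; only the dict shape matters here.
existing = ["Emotes/monkaS.tga"]
candidates = ["new/ccMonkaS.png", "new/wave.png", "new/wave_copy.png", "new/bedge.png"]
duplicates = {
    "new/ccMonkaS.png": [("Emotes/monkaS.tga", 4)],
    "new/wave.png": [("new/wave_copy.png", 2)],
    "new/wave_copy.png": [("new/wave.png", 2)],
    "new/bedge.png": [],
}

result = classify(candidates, existing, duplicates)
print(result["new"])            # ['new/bedge.png']
print(result["existing_dup"])   # ['new/ccMonkaS.png']
print(result["candidate_dup"])  # ['new/wave.png', 'new/wave_copy.png']
```

A candidate that matches both an existing emoji and another new one lands in both duplicate lists, so it can be reviewed from either side.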

In [52]:
import shutil
from pathlib import Path
from imagededup.methods import PHash
from PIL import Image
import numpy as np

In [None]:
EMOTES_DIR = Path("../Emotes")
TMP_DIR = Path("~/tmp/Emojis").expanduser()
RESULTS_DIR = Path("~/tmp/Emojis_Results").expanduser()
OUTPUT_DIR = RESULTS_DIR / "Emojis_New"
DUPLICATE_DIR = RESULTS_DIR / "Emojis_Duplicates"
EXISTING_DUP_DIR = DUPLICATE_DIR / 'existing'
CANDIDATE_DUP_DIR = DUPLICATE_DIR / 'candidate'

def clear_folder(folder):
    if folder.exists():
        print(f"⚠️ Warning: {folder} exists, clearing all files.")
        for f in folder.iterdir():
            if f.is_file():
                f.unlink()
            elif f.is_dir():
                shutil.rmtree(f)
    else:
        folder.mkdir(parents=True, exist_ok=True)

for d in [RESULTS_DIR, OUTPUT_DIR, DUPLICATE_DIR, EXISTING_DUP_DIR, CANDIDATE_DUP_DIR]:
    clear_folder(d)



In [None]:
# Init perceptual hash model
phasher = PHash()

# Encode all images (existing and new) into a single encoding map
all_encodings = {}

# Add existing .tga emoji encodings
for img_path in EMOTES_DIR.rglob("*.tga"):
    try:
        img = Image.open(img_path).convert("RGB")
        img_array = np.array(img)
        hash_val = phasher.encode_image(image_array=img_array)
        all_encodings[str(img_path)] = hash_val
    except Exception as e:
        print(f"⚠️ Skipping {img_path}: {e}")

# Add new .png emoji encodings
png_encodings = phasher.encode_images(image_dir=str(TMP_DIR))
for filename, hash_val in png_encodings.items():
    all_encodings[str(TMP_DIR / filename)] = hash_val

# Find duplicates in the combined set
duplicates = phasher.find_duplicates(encoding_map=all_encodings, scores=True)

# Report each new image: copy non-duplicates to OUTPUT_DIR and copy duplicates to the duplicate subfolders
for filename in png_encodings.keys():
    full_path = str(TMP_DIR / filename)
    dups = duplicates.get(full_path, [])
    is_existing_dup = False
    is_candidate_dup = False
    for dup_name, score in dups:
        if dup_name == full_path:
            continue
        score_str = f"{score:.2f}"
        if str(EMOTES_DIR) in dup_name:
            is_existing_dup = True
            # Copy to existing subfolder with clear name and score
            # Keep the source file's own extension (some inputs are .gif, not .png)
            out_name = f"{filename}__DUPLICATE_OF__{Path(dup_name).name}__SCORE_{score_str}{Path(filename).suffix}"
            shutil.copy2(TMP_DIR / filename, EXISTING_DUP_DIR / out_name)
            print(f"🚫 {filename} is a duplicate of existing emoji: {Path(dup_name).name} (score={score_str})")
        elif str(TMP_DIR) in dup_name:
            is_candidate_dup = True
            # Copy to candidate subfolder with clear name and score
            # Keep the source file's own extension (some inputs are .gif, not .png)
            out_name = f"{filename}__DUPLICATE_OF__{Path(dup_name).name}__SCORE_{score_str}{Path(filename).suffix}"
            shutil.copy2(TMP_DIR / filename, CANDIDATE_DUP_DIR / out_name)
            print(f"⚠️ {filename} is a duplicate of another new emoji: {Path(dup_name).name} (score={score_str})")
    if not is_existing_dup and not is_candidate_dup:
        print(f"✅ Adding new emoji: {filename}")
        shutil.copy2(TMP_DIR / filename, OUTPUT_DIR / filename)

2025-09-07 06:16:11,488: INFO Start: Calculating hashes...
100%|██████████| 141/141 [00:00<00:00, 2718.14it/s]
2025-09-07 06:16:11,634: INFO End: Calculating hashes!
2025-09-07 06:16:11,636: INFO Start: Evaluating hamming distances for getting duplicates
2025-09-07 06:16:11,636: INFO Start: Retrieving duplicates using Cython Brute force algorithm
100%|██████████| 3673/3673 [00:00<00:00, 6257.25it/s]
2025-09-07 06:16:12,321: INFO End: Retrieving duplicates using Cython Brute force algorithm
2025-09-07 06:16:12,321: INFO End: Evaluating hamming distances for getting duplicates

✅ Adding new emoji: alliance_wow.png
✅ Adding new emoji: AngryNelfNoisesIntensify.gif
✅ Adding new emoji: AngryNightElfNoises.png
✅ Adding new emoji: azerothcore.png
✅ Adding new emoji: basement.png
✅ Adding new emoji: bedge.png
✅ Adding new emoji: blizzard.png
✅ Adding new emoji: bloodelf_f.png
✅ Adding new emoji: bloodelf_m.png
✅ Adding new emoji: bongobutt.gif
✅ Adding new emoji: CatOK.png
✅ Adding new emoji: CavemanBob.png
✅ Adding new emoji: CcAngryNelfNoises.png
✅ Adding new emoji: CcAriseMyChampion2.png
✅ Adding new emoji: ccbench.png
✅ Adding new emoji: ccbonk.gif
✅ Adding new emoji: ccbonk1.png
✅ Adding new emoji: ccbonk2.png
✅ Adding new emoji: ccbonk3.png
✅ Adding new emoji: CcCryingDruid2.png
✅ Adding new emoji: ccgrinch.png
✅ Adding new emoji: cchappyGe.png
✅ Adding new emoji: CcKekThas.png
✅ Adding new emoji: ccKneecap.png
🚫 ccMonkaS.png is a duplicate of existing emoji: monkaS.tga (score=4.00)
🚫 ccMonkaS.png is a dupl