To find fonts that can represent a wide range of Unicode characters, we need to analyze the Unicode coverage of multiple fonts and compare them. 

1. Create a list of fonts we want to evaluate.
2. Retrieve the supported Unicode character range for each font in the list.
3. Compile the results and compare the Unicode coverage among the fonts. Look for fonts that have a broader range of supported Unicode characters.
4. Consider factors such as the scripts or languages we’re interested in, the specific Unicode blocks we want to cover, and the visual design or stylistic preferences we have for the font.
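
As a rough illustration of step 4, the sketch below counts how many supported code points fall inside a few Unicode blocks. The `BLOCKS` table, the `block_coverage` helper, and the `sample` set are hypothetical names introduced here; in practice `sample` would be replaced by the set of keys returned by `getBestCmap()` for a real font.

```python
# Hypothetical selection of Unicode blocks: name -> inclusive code point range.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Greek and Coptic": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Arabic": (0x0600, 0x06FF),
}


def block_coverage(code_points, blocks=BLOCKS):
    """Count how many of the given code points fall inside each block."""
    counts = {}
    for name, (start, end) in blocks.items():
        counts[name] = sum(1 for cp in code_points if start <= cp <= end)
    return counts


# Stand-in for set(font.getBestCmap().keys()); these values are made up.
sample = {0x0041, 0x0042, 0x00E9, 0x0411, 0x0628}

for block_name, count in block_coverage(sample).items():
    print(f"{block_name}: {count}")
```

Comparing these per-block counts across fonts gives a quick view of which scripts each font is likely to serve, before looking at individual characters.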


In [1]:
#pip install fonttools

# Make sure you install this first!

The code below uses the getBestCmap() method to obtain the font's character map, a dictionary whose keys are the Unicode code points the font supports. The loop that prints each code point in hexadecimal is commented out to keep the output short. We can also extract additional information about the font, such as the font family name and font style. The first font used is NotoSans, a popular choice because the Noto family as a whole aims to cover every script in Unicode. A single file such as NotoSans-Regular.ttf covers only part of that range.

# 1. NotoSans

In [2]:
from fontTools.ttLib import TTFont

# Specify the path to the font file
font_path = "/Users/fatimaadmin/Documents/Alphabets/NotoSans-Regular.ttf"

# Load the font using TTFont
font = TTFont(font_path)

# Retrieve the supported Unicode character range
supported_range = font.getBestCmap().keys()

# # Print the supported Unicode characters
# for char in supported_range:
#     print(hex(char))

In [3]:
from fontTools.ttLib import TTFont

# Specify the path to the font file
font_path = "/Users/fatimaadmin/Documents/Alphabets/NotoSans-Regular.ttf"

# Load the font using TTFont
font = TTFont(font_path)

# Retrieve the font family name (nameID 1) and style (nameID 2).
# toUnicode() decodes the record with its own encoding and avoids
# errors when getEncoding() cannot determine one.
family_name = None
style = None
for record in font["name"].names:
    if record.nameID == 1:
        family_name = record.toUnicode()
    elif record.nameID == 2:
        style = record.toUnicode()

# Print the font information
print("Family Name:", family_name)
print("Style:", style)

Family Name: Noto Sans
Style: Regular


# 2. Arial Unicode MS

Repeat the above, but with a different font.

In [4]:
from fontTools.ttLib import TTFont

# Specify the path to the font file
font_path = "/Users/fatimaadmin/Documents/Alphabets/Arial Unicode MS Font.ttf"

# Load the font using TTFont
font = TTFont(font_path)

# Retrieve the supported Unicode character range
supported_range = font.getBestCmap().keys()

# # Print the supported Unicode characters
# for char in supported_range:
#     print(hex(char))

In [5]:
from fontTools.ttLib import TTFont

# Specify the path to the font file
font_path = "/Users/fatimaadmin/Documents/Alphabets/Arial Unicode MS Font.ttf"

# Load the font using TTFont
font = TTFont(font_path)

# Retrieve the font family name (nameID 1) and style (nameID 2).
# toUnicode() decodes the record with its own encoding and avoids
# errors when getEncoding() cannot determine one.
family_name = None
style = None
for record in font["name"].names:
    if record.nameID == 1:
        family_name = record.toUnicode()
    elif record.nameID == 2:
        style = record.toUnicode()

# Print the font information
print("Family Name:", family_name)
print("Style:", style)

Family Name: Arial Unicode MS
Style: Normal


# Comparison

In [6]:
from fontTools.ttLib import TTFont

# Specify the path to the NotoSans font file
noto_path = "/Users/fatimaadmin/Documents/Alphabets/NotoSans-Regular.ttf"

# Load the NotoSans font using TTFont
noto_font = TTFont(noto_path)

# Retrieve the supported Unicode character range for NotoSans
noto_supported_range = set(noto_font.getBestCmap().keys())

# Specify the path to the Arial Unicode MS font file
arial_path = "/Users/fatimaadmin/Documents/Alphabets/Arial Unicode MS Font.ttf"

# Load the Arial Unicode MS font using TTFont
arial_font = TTFont(arial_path)

# Retrieve the supported Unicode character range for Arial Unicode MS
arial_supported_range = set(arial_font.getBestCmap().keys())

# Find unique Unicode characters in NotoSans
noto_unique_chars = noto_supported_range - arial_supported_range

# Find unique Unicode characters in Arial Unicode MS
arial_unique_chars = arial_supported_range - noto_supported_range

# Find overlapping Unicode characters
overlap_chars = noto_supported_range.intersection(arial_supported_range)

# Print the results:

# print("Unique characters in NotoSans:")
# for char in noto_unique_chars:
#     print(hex(char))

# print("\nUnique characters in Arial Unicode MS:")
# for char in arial_unique_chars:
#     print(hex(char))

# print("\nOverlap characters:")
# for char in overlap_chars:
#     print(hex(char))

Alternatively, we can define a function that compares the Unicode coverage of any number of fonts at once.

In [7]:
from fontTools.ttLib import TTFont

def compare_unicode_coverage(font_paths):
    unicode_coverage = {}

    for font_path in font_paths:
        font = TTFont(font_path)
        supported_range = set(font.getBestCmap().keys())
        unicode_coverage[font_path] = supported_range

    overlap_chars = set.intersection(*unicode_coverage.values())
    unique_chars = {
        font_path: supported_range - overlap_chars
        for font_path, supported_range in unicode_coverage.items()
    }

    return overlap_chars, unique_chars

In [8]:
# Example usage comparing NotoSans, Arial Unicode MS, Courier New, Segoe UI, and Times New Roman:
font_paths = ["/Users/fatimaadmin/Documents/Alphabets/NotoSans-Regular.ttf", 
              "/Users/fatimaadmin/Documents/Alphabets/Arial Unicode MS Font.ttf", 
              "/Users/fatimaadmin/Documents/Alphabets/courier_new.ttf",
              "/Users/fatimaadmin/Documents/Alphabets/Segoe UI.ttf",
              "/Users/fatimaadmin/Documents/Alphabets/Times New Roman Font.ttf"]
overlap_chars, unique_chars = compare_unicode_coverage(font_paths)

# print("Overlap characters:")
# for char in overlap_chars:
#     print(hex(char))

# print("\nUnique characters:")
# for font_path, chars in unique_chars.items():
#     print(f"\n{font_path}:")
#     for char in chars:
#         print(hex(char))

Code points are printed in hexadecimal because that is how Unicode itself identifies characters. For example, hex(65) prints 0x41, which corresponds to U+0041, the letter "A". This makes each value easy to look up in the Unicode code charts.
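
Raw hexadecimal values can be hard to read in bulk. As a small sketch, the hypothetical `describe` helper below turns a code point into the conventional `U+XXXX` form and adds the character and its official name from Python's standard `unicodedata` module. The three code points in the loop are arbitrary examples, not output from any of the fonts above.

```python
import unicodedata


def describe(code_point):
    """Return 'U+XXXX <char> <NAME>' for a single code point."""
    char = chr(code_point)
    # Some code points (e.g. control characters) have no name; use a fallback.
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{code_point:04X} {char} {name}"


for cp in (0x0041, 0x00E9, 0x0411):
    print(describe(cp))
```

The same helper could be applied to any of the sets computed in this notebook, for instance inside the commented-out printing loops in place of `hex(char)`.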

### Unique Characters:

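Under each font path, each line is the hexadecimal code point of one character that the font supports but that is not shared by every font in the list. Such a character may still be present in some of the other fonts, so "unique" here means "not common to all fonts" rather than "found in this font only".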

### Overlapping Characters:
Each line represents an overlap character, i.e., a character that is present in the Unicode coverage of every font in the list.

This information can help us identify common symbols or characters that are widely supported across different fonts. Thus we can determine the essential characters that should be covered by any font we choose.
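
Because compare_unicode_coverage subtracts only the characters common to all fonts, a character shared by two out of three fonts still shows up under both of them. If we want the characters that exactly one font provides, one possible approach is sketched below. The `exclusive_characters` helper and the toy `coverage` dictionary are assumptions made for illustration; real input would map each font path to its set of code points.

```python
def exclusive_characters(coverage):
    """Map each font to the code points that no other font in `coverage` has."""
    exclusive = {}
    for name, chars in coverage.items():
        # Union of the code points supported by every other font.
        others = set().union(
            *(c for other, c in coverage.items() if other != name)
        )
        exclusive[name] = chars - others
    return exclusive


# Toy data standing in for real font coverage sets.
coverage = {
    "FontA": {1, 2, 3},
    "FontB": {2, 3, 4},
    "FontC": {3, 5},
}

for font_name, chars in exclusive_characters(coverage).items():
    print(font_name, sorted(chars))
```

In this toy data, code point 2 is shared by FontA and FontB only, so it is excluded from both exclusive sets even though it is not common to all three fonts.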

# Filtering Fonts

In [9]:
from fontTools.ttLib import TTFont

def filter_fonts_by_coverage(font_paths, desired_characters):
    filtered_fonts = []

    for font_path in font_paths:
        font = TTFont(font_path)
        supported_range = set(font.getBestCmap().keys())

        overlap_chars = supported_range.intersection(desired_characters)
        unique_chars = supported_range - overlap_chars
        
        # Keep the font if it covers at least 3 of the desired characters
        # and supports at least 10 characters beyond those.
        # Both thresholds can be adjusted according to our requirements.
        if len(overlap_chars) >= 3 and len(unique_chars) >= 10:
            filtered_fonts.append(font_path)

    return filtered_fonts

# Example usage:
font_paths = [
    "/Users/fatimaadmin/Documents/Alphabets/NotoSans-Regular.ttf",
    "/Users/fatimaadmin/Documents/Alphabets/Arial Unicode MS Font.ttf",
    "/Users/fatimaadmin/Documents/Alphabets/courier_new.ttf",
    "/Users/fatimaadmin/Documents/Alphabets/Segoe UI.ttf",
    "/Users/fatimaadmin/Documents/Alphabets/Times New Roman Font.ttf"
]
desired_characters = {
    0x0020,  # Space
    0x0021,  # Exclamation mark
    0x0022,  # Quotation mark
    # Add more desired characters here
}

filtered_fonts = filter_fonts_by_coverage(font_paths, desired_characters)

# print("Filtered Fonts:")
# for font_path in filtered_fonts:
#     print(font_path)
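
The thresholds in filter_fonts_by_coverage are absolute counts, which need retuning whenever the set of desired characters grows. A possible variant, sketched here, filters by the fraction of desired characters a font covers. The `coverage_ratio` and `filter_by_ratio` helpers, the `min_ratio` parameter, and the toy coverage data are all hypothetical; they work on plain sets so that the logic can be tried without any font files.

```python
def coverage_ratio(supported, desired):
    """Return the fraction of `desired` code points present in `supported`."""
    if not desired:
        return 1.0  # Nothing was requested, so nothing is missing.
    return len(supported & desired) / len(desired)


def filter_by_ratio(coverage, desired, min_ratio=1.0):
    """Return the font names whose coverage ratio is at least `min_ratio`."""
    return [
        name
        for name, supported in coverage.items()
        if coverage_ratio(supported, desired) >= min_ratio
    ]


# Toy data: each set stands in for set(font.getBestCmap().keys()).
toy_coverage = {
    "FontA": {0x0020, 0x0021, 0x0022, 0x0041},
    "FontB": {0x0020, 0x0021},
    "FontC": {0x0041},
}
wanted = {0x0020, 0x0021, 0x0022}

print(filter_by_ratio(toy_coverage, wanted))        # full coverage only
print(filter_by_ratio(toy_coverage, wanted, 0.5))   # at least half covered
```

A `min_ratio` of 1.0 keeps only fonts that cover every desired character, while lower values tolerate gaps that a fallback font could fill.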