Skip to content

madziejm/project-fontr

Repository files navigation

FontR

Info about project

FontR is a font recognition project aimed at creating a solution for font recognition based on the idea proposed by one of the team members. The project is built on the foundation of Adobe's DeepFont research, which can be found in their paper here. https://arxiv.org/pdf/1507.03196v1.pdf

Team members:

@madziejm @mleonowicz @kjpolak

Running

To run the project, follow the steps below:

Install the required dependencies:

poetry run pip install -r src/requirements.txt

Build the project requirements:

poetry run kedro build-reqs

Run the project:

poetry run kedro run

You can also run specific pipeline using -p flag if you want to skip reproducing steps. For example to run the model training and evaluation part run:

poetry run kedro -p data_science

To build our solution we used kedro framework. Kedro is a Python framework for building scalable, maintainable, and reproducible data pipelines It provides a standardized project structure and features for data engineering, machine learning, and production deployment, simplifying the development of end-to-end data solutions.

To manage dependencies our solution uses poetry which is a Python dependency manager, thanks to which a lock of dependencies can be generated, which increases reproducibility and replicability.

To store our data we use s3 which is Amazon storage solution. To use your own storage you need to change filepath parameters in conf/base/catalog.yml and values in conf/base/parameters/bcf_preprocessing.yml.

Weights and Biases logging

Weights and Biases (W&B) is a platform for tracking, visualizing, and optimizing machine learning experiments, which we use in our project. It is required to run the project.

To use W&B logging in this project create a file named credentials_wandb.yml in the conf/local directory. It should have one attribute called api_key that stores an API Key to your W&B account.

Once you have set up the credentials_wandb.yml file, you can run the project and enjoy the benefits of W&B logging. The project will automatically log relevant experiment metrics and provide visualizations through the W&B platform.

AdobeVFR Dataset

AdobeVFR dataset focuses on popular fonts and consists of synthetic and real-world data. The dataset is divided into four parts.

  1. Synthetic Data

    • VFR_syn_train: This subset contains 1,000 images per class for training.
    • VFR_syn_val: This subset includes 100 images per class for validation.

    To generate synthetic training data, the dataset creators render long English words sampled from a large corpus. The resulting text images are tightly cropped, grayscale, and size-normalized. Each of the 2,383 font classes in the dataset has 1,000 training images and 100 validation images.

  2. Real-world Data

    • VFR_real_test: This subset consists of 4,384 real-world test images with reliable font labels. These images were collected from typography forums where users seek help in identifying fonts. The images are converted to grayscale, manually cropped, and normalized to a height of 105 pixels. These images exhibit larger appearance variations due to scaling, background clutter, lighting, noise, perspective distortions, and compression artifacts.
    • VFR_real_u: This subset contains 197,396 unlabeled real-world images. These images were not annotated with font labels but were utilized to pre-train a "shared-feature" extraction subnetwork to reduce the domain gap.

Also a fontlist specifying the 2,383 font classes used in the dataset is provided, allowing to download the corresponding font files (.otf) and render synthetic images.

Furthermore, the text mentions the availability of sample codes, provided in bcf format, for processing the dataset. These sample codes are intended for use with cuda-convenet and are not optimized for product-level usage.

Examples:

Example 1 Example 2 Example 3

Architecture

Our solution consists of three distinct kedro pipelines:

bcf_preprocessing

This pipeline focuses on the preprocessing steps necessary for the solution. It handles tasks such as downloading, preparing, and uploading images to the Amazon S3 storage. These images are essential for replicating the original DeepFont implementation.

data_processing

The data_processing pipeline is responsible for splitting the prepared dataset and feeding it into the models used in the solution.

data_science

The data_science pipeline represents the core component of our solution. It encompasses the training, evaluation, and serialization processes for both the encoder and classifier models. This pipeline is where the machine learning magic happens, as it leverages the provided data to train accurate models capable of font recognition.

By leveraging the capabilities of the kedro framework, we were able to develop a robust and efficient solution for font recognition, with clearly defined and modular pipelines that enable easy maintenance, reproducibility, and scalability

Visualisation of the project using kedro-viz can be seen bellow: Pipeline

Results

TODO:

Sources

Publications on the problem of font recognition:

Code layout and pipeline - highlights

src/
┣ fontr/
┃ ┣ fontr/
┃ ┃ ┣ autoencoder.py
┃ ┃ ┣ classifier.py
┃ ┃ ┣ logger.py
┃ ┃ ┣ transforms.py
┃ ┃ ┗ __init__.py
┃ ┣ pipelines/
┃ ┃ ┣ bcf_preprocessing/
┃ ┃ ┣ data_processing/
┃ ┃ ┣ data_science/
┃ ┃ ┣ nodes.py
┃ ┃ ┗ __init__.py
┃ ┣ datasets.py
┃ ┣ pipeline_registry.py
┃ ┣ settings.py
┃ ┣ __init__.py
┃ ┗ __main__.py
┣ tests/
┃ ┣ pipelines/
┃ ┃ ┗ __init__.py
┃ ┣ conftest.py
┃ ┣ test_autoencoder.py
┃ ┣ test_classifier.py
┃ ┣ test_run.py
┃ ┗ __init__.py
┣ requirements-test-only.txt
┗ requirements.txt