# Initial Exploration

The aim of this project is to create something similar to the "audio features"
([link](https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/))
that Spotify has for its songs, applied to graphic design documents.

For Spotify, the audio features form the bedrock of
their recommendation data products -- Discover Weekly, Daily Mix, Tastebreakers and others.

Several companies are springing up that offer ready-made graphic design templates in
categories such as business cards, wedding invitations and slideshows. [Canva](https://www.canva.com/)
is one of them.

To create similar recommendation data products for Canva, we'd need to
generate information about these graphic design templates, so that we can
embed users' selections into a feature space. With that in place, finding "similar"
templates (e.g. *Recommended for you*) comes down to finding the nearest templates within
this feature space.
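
As a rough illustration of the nearest-template idea, here is a minimal sketch. The template names, the three features and their values are all made up for the example, and representing a user as the mean of their selected templates is just one simple assumption about how to embed selections.

```python
import math

# Hypothetical feature vectors: (serifness, floralness, busyness), each in [0, 1].
TEMPLATES = {
    "rustic_floral": (0.8, 0.9, 0.6),
    "modern_minimal": (0.1, 0.0, 0.1),
    "garden_party": (0.7, 0.8, 0.5),
    "bold_geometric": (0.0, 0.1, 0.7),
}


def embed_user(selected_ids, templates):
    """Represent a user as the mean feature vector of the templates they selected."""
    vectors = [templates[t] for t in selected_ids]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def recommend(selected_ids, templates, k=2):
    """Return the k unselected templates closest (Euclidean) to the user vector."""
    user = embed_user(selected_ids, templates)
    candidates = [t for t in templates if t not in selected_ids]
    return sorted(candidates, key=lambda t: math.dist(user, templates[t]))[:k]


print(recommend(["rustic_floral"], TEMPLATES, k=1))
```

Euclidean distance is used here only because it is the simplest choice. Other metrics (cosine distance, or per-feature weighting) may suit the final feature set better.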

To start, we need to collect a dataset covering a range of graphic design templates.
We can focus on one category: **invitations**.

<!-- Why menus? With a quick glance it seemed that `menu templates` as a search query on google
has less *noisy* data. As in, the templates are images with no cropping required for them.
By comparison `invitations` is a messy set of images with lots of processing of the images
required. We could jump to a different type if the data isn't working out for us. -->

# Data

Possible places to get data from:

- Google search results.
- Web scraping of graphic design sites such as Etsy and Canva.

We would want to drop any outliers from the data.
To keep our first iteration simple, we want images that are
clean renderings of the template, not photos
of an already-printed object. A well-worded Google search can filter for these.

# Model

- Leverage a pre-trained image model.
- Train it further on our data.
- To simplify the processing, have a single model output a vector covering a range of metrics.
  This way the outputs share the same network nodes. If the results need improving,
  we can split the outputs into separate models.
- Because the cost of labelling the data is high, use semi-supervised learning.

## Outputs

Colours:

- Colour palettes
- Total number of colours present
- Hue
- Saturation
- Contrast
- Monochrome indicator

Layout:

- White space between elements

Text:

- Serifness (sans serif -> serif -> script)
- Kerning
- Monospaced indicator

Graphics:

- Floralness
- Busyness
- Realness (from 0 for a definite vector graphic up to
1 for something that looks hand-painted or hand-drawn)

Some of these outputs are deterministic and wouldn't need to be fed through the model.
We can split the above into those that are aggregations over the image's pixels (colour, hue, saturation)
and those that are about the semantics of the image. With more fine-grained data,
more of this wishlist of features can be found for "free". For example, if we had
information on the individual elements of a design, we could run one smaller model just
for text and another for graphics. Without that, we have to work with the JPEG output
of the design.
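
To make the "deterministic" half concrete, the sketch below computes a few colour aggregations directly from pixel values, with no model involved. It assumes the image has already been decoded into a list of `(r, g, b)` tuples. The quantisation level, the monochrome threshold and the definition of contrast as the luminance range are all illustrative assumptions, not settled choices.

```python
import colorsys


def colour_features(pixels, levels=4, mono_threshold=0.1):
    """Aggregate simple colour statistics from an iterable of (r, g, b) pixels in 0-255."""
    pixels = list(pixels)
    # Quantise each channel into `levels` buckets so near-identical shades count once.
    step = 256 // levels
    distinct = {(r // step, g // step, b // step) for r, g, b in pixels}

    saturations = []
    luminances = []
    for r, g, b in pixels:
        _h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        saturations.append(s)
        # Rec. 601 luma weights, a common approximation of perceived brightness.
        luminances.append((0.299 * r + 0.587 * g + 0.114 * b) / 255)

    mean_saturation = sum(saturations) / len(saturations)
    return {
        "n_colours": len(distinct),
        "mean_saturation": mean_saturation,
        # Assumed definition: spread between the brightest and darkest pixel.
        "contrast": max(luminances) - min(luminances),
        # Assumed definition: very low average saturation means greyscale.
        "monochrome": mean_saturation < mono_threshold,
    }


print(colour_features([(0, 0, 0), (255, 255, 255), (250, 250, 250)]))
```

Hue is left out because it is an angle, so averaging it naively gives misleading results. A hue histogram or a circular mean would be needed.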