# Generating synthetic contracts with signature areas

We want to generate synthetic contract pages with random text and randomly placed signature areas, each of which may or may not contain a human-looking signature. The purpose is to train an ML model to recognize signature areas in scanned text documents and to determine whether or not each area contains a human-looking signature.

We first write a small function that randomly samples the high-level features of a page, namely the number of text areas and the number of signature areas:

In [2]:
import numpy as np

In [21]:
def sample_high_level_features(text_number_of_areas_min=0,
                               text_number_of_areas_max=10,
                               signature_number_of_areas_min=0,
                               signature_number_of_areas_max=6):
    """Sample how many text areas and signature areas a page will contain."""
    # np.random.randint excludes `high`, so add 1 to make each maximum inclusive.
    text_number_of_areas = np.random.randint(low=text_number_of_areas_min,
                                             high=text_number_of_areas_max + 1)
    signature_number_of_areas = np.random.randint(low=signature_number_of_areas_min,
                                                  high=signature_number_of_areas_max + 1)

    return text_number_of_areas, signature_number_of_areas
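
The `+1` above is needed because `np.random.randint` excludes its upper bound. As a minimal sketch of the same sampling, NumPy's newer `Generator` API can draw from an inclusive range with `endpoint=True` and accepts a seed, which makes generated pages reproducible. The name `sample_counts` and its shortened parameter names are illustrative assumptions, not part of this notebook.

```python
import numpy as np

def sample_counts(rng, text_min=0, text_max=10, sig_min=0, sig_max=6):
    """Hypothetical variant: draw both counts from inclusive ranges."""
    # endpoint=True makes the upper bound inclusive, so no +1 is needed.
    n_text = int(rng.integers(text_min, text_max, endpoint=True))
    n_sig = int(rng.integers(sig_min, sig_max, endpoint=True))
    return n_text, n_sig

# A fixed seed gives the same sequence of pages on every run.
rng = np.random.default_rng(seed=0)
samples = [sample_counts(rng) for _ in range(1000)]
```

Passing the generator in explicitly, rather than relying on NumPy's global random state, keeps the sampling independent of any other code that draws random numbers.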

Likewise, we need a function that randomly samples the placement of a text or signature box. It includes some logic to ensure that every box lies entirely inside the page:

In [23]:
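def sample_box_placement(width_min, width_max, height_min, height_max, page_width, page_height):
    """Sample a box as ((x0, x1), (y0, y1)) that lies entirely inside the page."""
    # np.random.randint excludes `high`; add 1 so the maxima are inclusive,
    # matching sample_high_level_features.
    width = np.random.randint(low=width_min, high=width_max + 1)
    height = np.random.randint(low=height_min, high=height_max + 1)

    # Largest top-left corner that still keeps the box on the page.
    x_max = page_width - width
    y_max = page_height - height

    x0 = np.random.randint(low=0, high=x_max + 1)
    y0 = np.random.randint(low=0, high=y_max + 1)

    x1 = x0 + width
    y1 = y0 + height

    return (x0, x1), (y0, y1)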
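
The containment logic comes down to restricting the top-left corner to the range `[0, page_width - width]` horizontally and `[0, page_height - height]` vertically. The following is a small self-contained sketch of that arithmetic. The names `sample_corner` and `rng`, and the page size, are assumptions chosen only for illustration. It uses inclusive bounds, so a box exactly as wide as the page is still valid and is forced to `x0 = 0`.

```python
import numpy as np

def sample_corner(rng, width, height, page_width, page_height):
    """Hypothetical helper: sample a top-left corner that keeps the box on the page."""
    if width > page_width or height > page_height:
        raise ValueError("box does not fit on the page")
    # Inclusive upper bounds: the box may sit flush against the right/bottom edge.
    x0 = int(rng.integers(0, page_width - width, endpoint=True))
    y0 = int(rng.integers(0, page_height - height, endpoint=True))
    return x0, y0

rng = np.random.default_rng(1)
page_width, page_height = 1240, 1754  # assumed: A4 at roughly 150 dpi

x0, y0 = sample_corner(rng, 400, 120, page_width, page_height)

# A box as wide as the page leaves only one possible horizontal position.
full_x0, _ = sample_corner(rng, page_width, 120, page_width, page_height)
```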

We also need to keep track of previously added boxes and make sure that new boxes do not overlap them. Later we'll decide what to do if a box does overlap.
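
One common way to test this, sketched here under stated assumptions, is the interval test for axis-aligned rectangles: two boxes overlap only if their x-ranges overlap and their y-ranges overlap. The sketch assumes the `((x0, x1), (y0, y1))` format returned by `sample_box_placement` and treats boxes that merely share an edge as non-overlapping. The helper names `boxes_overlap` and `overlaps_any` are hypothetical and are not the notebook's own implementation.

```python
def boxes_overlap(box_a, box_b):
    """Return True if two ((x0, x1), (y0, y1)) boxes share interior area."""
    (ax0, ax1), (ay0, ay1) = box_a
    (bx0, bx1), (by0, by1) = box_b
    # Two intervals overlap when each one starts before the other ends.
    overlap_x = ax0 < bx1 and bx0 < ax1
    overlap_y = ay0 < by1 and by0 < ay1
    return overlap_x and overlap_y

def overlaps_any(box, existing_boxes):
    """Return True if `box` overlaps at least one box in `existing_boxes`."""
    return any(boxes_overlap(box, other) for other in existing_boxes)

existing = [((0, 100), (0, 50)), ((200, 300), (0, 50))]

crossing = overlaps_any(((50, 150), (25, 75)), existing)   # cuts into the first box
touching = overlaps_any(((100, 200), (0, 50)), existing)   # shares edges only
below = overlaps_any(((0, 100), (60, 90)), existing)       # same columns, lower rows
```

Using strict inequalities is a design choice: with `<=` instead, boxes that touch along an edge would also count as overlapping, which leaves a guaranteed gap between areas on the page.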

In [None]:
def check_box_overlap(box_coordinates, existing_boxes_list):