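```python
    # Relies on `torch` being imported at module level.
    @staticmethod
    def slice_and_pad(desired_length, source_tensor, start, stop, min_padding, left_padding, right_padding):
        # Copy source_tensor[:, start:stop] into a zero-filled tensor of width
        # desired_length + 2*min_padding, starting at column left_padding.
        # right_padding is not used: the trailing columns are simply left at zero.
        L = stop - start  # length of the chunk
        channels = source_tensor.size(0)
        t = torch.zeros(channels, desired_length + (2 * min_padding), dtype=torch.uint8)
        t[:, left_padding:left_padding + L] = source_tensor[:, start:stop]
        return t.unsqueeze(0)  # add a leading batch dimension: 1 x channels x width
```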

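```python
    # Relies on `from random import randint` at module level.
    @staticmethod
    def pad_and_shift(text, desired_length, random_shifting, min_padding):
        """
        Adds space padding on the left and the right and shifts randomly unless deactivated.
        Returns the padded text together with the final left and right padding lengths.
        """
        # min_padding + text + filler + min_padding, with len(filler) = desired_length - len(text)
        # Example: desired_length=150, len(text)=129, min_padding=20
        #   padding = 20 + (150-129) + 20 = 61
        #   20 + 10 + 129 + 11 + 20 = 190  ->  left_padding = 30, right_padding = 31
        # Result: left_padding + text + right_padding
        padding = min_padding + (desired_length - len(text)) + min_padding
        left_padding = padding // 2
        right_padding = padding - left_padding
        if random_shifting:
            # randint is inclusive on both ends: the shift range is symmetric, and
            # padding == 0 does not raise (randrange(0, 0) would raise ValueError).
            shift = randint(-left_padding, right_padding)
        else:
            shift = 0
        left_padding = left_padding + shift
        right_padding = right_padding - shift
        padded_text = ' ' * left_padding + text + ' ' * right_padding
        return padded_text, left_padding, right_padding
```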
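
To check the arithmetic in the comments, here is a standalone sketch of the same padding logic written as a plain function. Taking it out of the class and adding an optional `rng` argument for reproducibility are assumptions of this sketch, not part of smtag; it uses `random.randint`, so the shift range is inclusive on both sides.

```python
import random


def pad_and_shift(text, desired_length, random_shifting, min_padding, rng=random):
    """Standalone sketch of the padding logic above (not the smtag original)."""
    padding = min_padding + (desired_length - len(text)) + min_padding
    left_padding = padding // 2
    right_padding = padding - left_padding
    # Inclusive shift: the text may move anywhere inside the padded window.
    shift = rng.randint(-left_padding, right_padding) if random_shifting else 0
    left_padding += shift
    right_padding -= shift
    padded_text = " " * left_padding + text + " " * right_padding
    return padded_text, left_padding, right_padding


# Worked example from the comments: desired_length=150, len(text)=129, min_padding=20.
text = "x" * 129
padded, left, right = pad_and_shift(text, 150, False, 20)
print(left, right, len(padded))  # 30 31 190

# With random shifting, only the split changes; the total length stays the same.
padded, left, right = pad_and_shift(text, 150, True, 20, rng=random.Random(0))
print(len(padded), left + right)  # 190 61
```

Whatever the shift, every example ends up exactly `desired_length + 2*min_padding` characters long, which is what allows examples to be stacked into fixed-width tensors.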

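```python
    def padding(self, input):
        # Relies on `ceil` (from math), `SPACE_ENCODED` and `config` defined outside this snippet.
        # 123456789012345678901234567890
        #        this cat               len(input)==8, min_size==10, min_padding=5
        #       +this cat+              pad to bring to min_size
        #  _____+this cat+_____         add min_padding on both sides
        # Earlier formula:
        # padding_length = ceil(max(config.min_size - len(input), config.min_padding)/2)
        min_size = 1530  # hard-coded here instead of config.min_size
        # min_size = config.min_size
        min_padding = config.min_padding
        # Half of the shortfall to min_size (rounded up), plus min_padding, on each side.
        padding_length = ceil(max(min_size - len(input), 0) / 2) + min_padding
        pad = SPACE_ENCODED.repeat(padding_length)
        padded_string = pad + input + pad
        print(len(padded_string))  # debug: total padded length
        return padded_string
```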
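
A string-based sketch of the same arithmetic, reproducing the "this cat" diagram. A plain space `' '` stands in for `SPACE_ENCODED`, and `min_size`/`min_padding` are passed as arguments instead of being read from `config`; both are assumptions of this sketch, and `pad_to_min_size` is a hypothetical name.

```python
from math import ceil


def pad_to_min_size(text, min_size, min_padding, space=" "):
    """Sketch of padding(): pad up to min_size (split over both sides), then add min_padding."""
    padding_length = ceil(max(min_size - len(text), 0) / 2) + min_padding
    pad = space * padding_length
    return pad + text + pad


# Example from the comment diagram: len("this cat") == 8, min_size == 10, min_padding == 5
padded = pad_to_min_size("this cat", 10, 5)
print(len(padded))  # 20 = 6 + 8 + 6
```

Because of `ceil`, an odd shortfall yields one extra character: `pad_to_min_size("cat", 10, 0)` is 4 + 3 + 4 = 11 characters long, not 10.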

# Experimentation with large text-based padding but no intrinsic padding in unet2


In `smtag.common.config`:

    _min_padding       = 380 # the number of (usually space) characters added to each example as padding to mitigate 'border effects' in learning


## Panel level examples (`L=400`)

### Role for geneprod

    smtag-convert2th -c 181203all -L400 -X10 -E ".//sd-panel" -y ".//sd-tag[@type='gene']",".//sd-tag[@type='protein']"     -e ".//sd-tag[@type='gene']",".//sd-tag[@type='protein']"  -A ".//sd-tag[@role='intervention']",".//sd-tag[@role='assayed']",".//sd-tag[@role='normalizing']",".//sd-tag[@role='experiment']",".//sd-tag[@role='component']" -f 10X_L400_geneprod_anonym_not_reporter_large_padding_no_ocr --noocr
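
The `-E`, `-y`, `-e` and `-A` arguments are XPath expressions (comma-separated lists of them) over the `sd-panel`/`sd-tag` XML. As a hedged illustration of what these expressions select, here is a sketch using Python's `xml.etree.ElementTree` on a made-up two-panel figure. Reading `-y` as "keep only examples containing a match" (enrichment) and `-A` as "mask the text of matching tags" (anonymization) is an assumption inferred from the dataset names, not taken from smtag's code.

```python
import xml.etree.ElementTree as ET

# Made-up figure legend with two panels (illustrative data only).
FIGURE = """
<fig>
  <sd-panel>Knockdown of <sd-tag type="gene" role="intervention">TP53</sd-tag>
    increases <sd-tag type="protein" role="assayed">p21</sd-tag> levels.</sd-panel>
  <sd-panel>Cells were imaged after 24 h.</sd-panel>
</fig>
""".strip()

EXAMPLE = ".//sd-panel"                                                       # as in -E
ENRICH = [".//sd-tag[@type='gene']", ".//sd-tag[@type='protein']"]            # as in -y
ANONYMIZE = [".//sd-tag[@role='intervention']", ".//sd-tag[@role='assayed']"]  # subset of -A

root = ET.fromstring(FIGURE)
kept = []
for panel in root.findall(EXAMPLE):
    # Assumed enrichment: keep the panel only if at least one -y expression matches in it.
    if not any(panel.findall(xpath) for xpath in ENRICH):
        continue
    # Assumed anonymization: overwrite the text of matching tags with underscores.
    for xpath in ANONYMIZE:
        for tag in panel.findall(xpath):
            tag.text = "_" * len(tag.text or "")
    kept.append("".join(panel.itertext()))

print(len(kept))  # 1: the second panel has no gene/protein tag
print(kept[0])
```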

Training __with only 3 layers__:

    smtag-meta -f 10X_L400_geneprod_anonym_not_reporter_large_padding_no_ocr -E10 -Z128 -R0.01 -o intervention,assayed -k 6,6,6 -n 8,8,8 -p 2,2,2 -w /efs/smtag
    
__saved as `10X_L400_geneprod_anonym_not_reporter_large_padding_no_ocr_intervention_assayed_2019-01-15-17-43.zip`__
   
    
    smtag-meta -f 10X_L400_geneprod_anonym_not_reporter_large_padding_no_ocr -E10 -Z128 -R0.01 -o intervention,assayed -k 6,6,6 -n 16,32,64 -p 2,2,2 -w /efs/smtag

Not better than the run above.
    
    smtag-meta -f 10X_L400_geneprod_anonym_not_reporter_large_padding_no_ocr -E10 -Z128 -R0.01 -o intervention,assayed -k 6,6,6,6 -n 16,16,16,16 -p 2,2,2,2 -w /efs/smtag
   
Not possible with L=400 + 2*380 of padding.
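
Each example is L + 2 × min_padding characters long, i.e. 400 + 2 × 380 = 1160 here. These notes do not say why four layers fail. One plausible constraint, assumed rather than confirmed, is that the model needs the input length to be divisible by the product of the `-p` pooling factors: 1160 is divisible by 2^3 = 8 but not by 2^4 = 16. A small sketch of that check, treating `-k`/`-n`/`-p` as comma-separated per-layer lists (one entry per layer, as in the 3- and 4-layer runs above); the helper names are hypothetical.

```python
from math import prod


def example_length(L, min_padding):
    """Length of one example: the text window plus min_padding on both sides."""
    return L + 2 * min_padding


def parse_layers(option):
    """Parse a smtag-meta style per-layer list such as '2,2,2' into integers."""
    return [int(x) for x in option.split(",")]


def pooling_divides(length, pools):
    """Assumed constraint: length must be divisible by the product of the pooling factors."""
    return length % prod(pools) == 0


n = example_length(400, 380)
print(n)  # 1160
print(pooling_divides(n, parse_layers("2,2,2")))    # True  (1160 / 8 = 145)
print(pooling_divides(n, parse_layers("2,2,2,2")))  # False (1160 / 16 = 72.5)
```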
    

### Gene product entity and geneproduct reporter with enrichment:

    smtag-convert2th -c 181203all -L400 -X10 -E ".//sd-panel" -y ".//sd-tag[@type='gene']",".//sd-tag[@type='protein']" -e ".//sd-tag[@type='gene']",".//sd-tag[@type='protein']" -f 10X_L400_geneprod_exclusive_padding_no_ocr --noocr
    
    smtag-meta -f 10X_L400_geneprod_exclusive_padding_no_ocr -E60 -Z128 -R0.01 -o geneprod -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    
__saved as `10X_L400_geneprod_exclusive_padding_no_ocr_geneprod_2019-01-15-20-56.zip`__
    
    smtag-meta -f 10X_L400_geneprod_exclusive_padding_no_ocr -E10 -Z128 -R0.01 -o reporter -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    
__saved as `10X_L400_geneprod_exclusive_padding_no_ocr_reporter_2019-01-15-21-25.zip`__


### Role for small molecules:

    smtag-convert2th -c 181203all -L400 -X10 -E ".//sd-panel" -y ".//sd-tag[@type='molecule']" -e ".//sd-tag[@type='molecule']" -A ".//sd-tag[@type='molecule']" -f 10X_L400_small_molecule_anonym_large_padding_no_ocr --noocr /efs/smtag

    smtag-meta -f 10X_L400_small_molecule_anonym_large_padding_no_ocr -E30 -Z128 -R0.001 -o intervention,assayed -k 6,6,6 -n 8,8,8 -p 2,2,2 -w /efs/smtag


### General L400 dataset without enrichment:

    smtag-convert2th -c 181203all -L400 -X10 -E ".//sd-panel" -f 10X_L400_all_large_padding_no_ocr --noocr

#### Small molecule:

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o small_molecule -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
 
__saved as `10X_L400_all_large_padding_no_ocr_small_molecule_2019-01-18-11-03.zip`__

Trying with a smaller learning rate:
    
    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.001 -o small_molecule -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    

saved as `...`
   
---
#### Geneproduct

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o geneprod -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    
    
---

#### Subcellular

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o subcellular -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    
__saved: `10X_L400_all_large_padding_no_ocr_subcellular_2019-01-18-16-41.zip`__

---
#### Cell

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o cell -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    
__saved: `10X_L400_all_large_padding_no_ocr_cell_2019-01-18-19-00.zip`__
    
---
#### Tissue

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o tissue -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    
Large instabilities and divergence as soon as the model starts overfitting. Capacity too low or learning rate too large?
    
    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.001 -o tissue -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag

__saved as `10X_L400_all_large_padding_no_ocr_tissue_2019-01-19-13-03`__

Note: stable, f1=0.55.

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o tissue -k 6,6,6 -n 32,32,32 -p 2,2,2 -w /efs/smtag

__saved as `10X_L400_all_large_padding_no_ocr_tissue_2019-01-19-15-33.zip`__

Note: stable, f1=0.65, but the loss on the validation set is the same as above.

---
#### Organism

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o organism -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
    
__saved `10X_L400_all_large_padding_no_ocr_organism_2019-01-18-23-17.zip`__
    
---
#### Experimental assay

    smtag-meta -f 10X_L400_all_large_padding_no_ocr -E10 -Z128 -R0.01 -o assay -k 6,6,6 -n 16,16,16 -p 2,2,2 -w /efs/smtag
   
Note: 10 epochs only; the model overfits after that.
    
__saved `10X_L400_all_large_padding_no_ocr_assay_2019-01-19-10-48`__

---



## Figure level with large padding


No enrichment

    smtag-convert2th -c 181203all -L1200 -X10 -E ".//figure-caption" -f 10X_L1200_all_large_padding_no_ocr --noocr

Panel (dataset at __figure level__):

    smtag-meta -f 10X_L1200_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o panel_start -k "12" -n "32" -p "1" -w /efs/smtag
    
__saved as `10X_L1200_all_large_padding_no_ocr_panel_start_2019-01-15-22-34.zip`__


Small molecule:

without enrichment:

    smtag-convert2th -c 181203all -L1200 -X10 -E ".//figure-caption" -f 10X_L1200_all_large_padding_no_ocr --noocr

with enrichment:

    smtag-convert2th -c 181203all -L1200 -X10 -E ".//figure-caption" -f 10X_L1200_all_large_padding_no_ocr -y molecule --noocr

    smtag-meta -f 10X_L1200_all_large_padding_no_ocr -E60 -Z128 -R0.01 -o  -k "12" -n "32" -p "1" -w /efs/smtag






# Disease with large padding

To intermingle negative examples, an `encoded/10X_L1200_NCBI_disease_augmented` dataset was assembled manually by adding all of the already-encoded examples from `encoded/10X_L1200_all_large_padding_no_ocr`:

    smtag-convert2th -L1200 -X10 -b -c NCBI_disease -f 10X_L1200_NCBI_disease_augmented_large_padding --noocr
    
    smtag-meta -E60 -Z128 -R0.001 -n "16,16,16" -p "2,2,2" -k "6,6,6" -o disease -f 10X_L1200_NCBI_disease_augmented_large_padding -w /efs/smtag
    
__saved as `10X_L1200_NCBI_disease_augmented_large_padding_disease_2019-01-20-23-13`__
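
The manual assembly of the augmented disease dataset amounts to copying encoded examples from one directory into another. A minimal sketch, assuming a hypothetical layout in which each encoded example is a separate file or subdirectory directly under the dataset directory (the actual layout of smtag's `encoded/` datasets is not described in these notes); `merge_encoded` is a hypothetical helper, not an smtag command.

```python
import shutil
from pathlib import Path


def merge_encoded(source_dir, target_dir):
    """Copy every example from source_dir into target_dir, skipping name clashes.

    Hypothetical layout: one entry (file or directory) per encoded example.
    Returns the number of examples copied.
    """
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = 0
    for entry in sorted(source.iterdir()):
        destination = target / entry.name
        if destination.exists():
            continue  # keep the existing example rather than overwrite it
        if entry.is_dir():
            shutil.copytree(entry, destination)
        else:
            shutil.copy2(entry, destination)
        copied += 1
    return copied
```

For example, `merge_encoded("encoded/10X_L1200_all_large_padding_no_ocr", "encoded/10X_L1200_NCBI_disease_augmented")`. Skipping existing names keeps the disease examples intact if both datasets happen to use the same example identifiers.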
