After extensive testing and digging, @mheinzinger and I figured out that ELMo (aka: SeqVec) has an "initialization" problem (see: allenai/allennlp#1169)
In short: the first batch (or first few batches) embedded with SeqVec will produce embeddings that differ significantly to slightly from what is expected. A visual example:
In this case, a reference set of sequences was embedded, including P12345. Then, P12345 was embedded in batches of 1 (setting `max_amino_acids: 1`), and the Euclidean distance between these embeddings and the "reference" embeddings was calculated.
As is evident, in the first batch, P12345 is at a Euclidean distance of 0.02 from itself. In the second batch, the distance is 0.007 (a factor of >10 less than in the first batch). Further down the line, the distance decreases further.
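The comparison described above can be sketched as follows. Note this is a minimal illustration: the embedding vectors are made-up placeholders, not actual SeqVec output, and stand in for the per-protein embeddings of P12345 from the two runs.

```python
import numpy as np

# Hypothetical embeddings of the same protein (e.g. P12345) from two runs.
# Values are illustrative only, not real SeqVec output.
reference = np.array([0.12, -0.40, 0.33])    # from the reference run
first_batch = np.array([0.13, -0.41, 0.35])  # from the first small-batch run

# Euclidean distance between the two embeddings of the same protein;
# for a properly initialized model this should be ~0.
distance = np.linalg.norm(reference - first_batch)
```

A nonzero `distance` for identical input sequences is exactly the symptom reported here.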
At this stage, the suggested fix is that after programmatic initialization, SeqVec/ELMo needs to be run on a random (but real!) sequence in a single batch before processing the actual sequence set. @mheinzinger suggests actually running 2-3 sequences in 2-3 batches (1 sequence per batch). This should truly "initialize" the model.
This is relevant for both CPU and GPU, meaning that, in the code, you should add a call to `embed`: