
Fix negative sampling RNGs. #39

Merged 1 commit into master from sampling_fix on Apr 25, 2019
Conversation

sebpuetz
Member

During refactoring, the ZipfRangeGenerator was unintentionally replaced by a plain RNG, so negative samples were drawn uniformly rather than from a Zipf-like distribution over the vocabulary. This commit reverts that change.
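For context, a minimal sketch of the difference (the ZipfSampler below is an illustrative stand-in with exponent 1, not finalfrontier's actual ZipfRangeGenerator): negative samples should be drawn with probability roughly proportional to 1 / (rank + 1), not uniformly.

```rust
use rand::Rng;

// Illustrative stand-in for a Zipf-like range generator: draws indices in
// [0, n) with probability proportional to 1 / (idx + 1).
struct ZipfSampler {
    cumulative: Vec<f64>, // prefix sums of the unnormalized weights
}

impl ZipfSampler {
    fn new(n: usize) -> Self {
        let mut cumulative = Vec::with_capacity(n);
        let mut sum = 0.0;
        for idx in 0..n {
            sum += 1.0 / (idx as f64 + 1.0);
            cumulative.push(sum);
        }
        ZipfSampler { cumulative }
    }

    fn next(&self, rng: &mut impl Rng) -> usize {
        // Invert the CDF: pick a point in [0, total) and binary-search
        // for the bucket it falls into.
        let total = *self.cumulative.last().unwrap();
        let point = rng.gen_range(0.0..total);
        self.cumulative.partition_point(|&c| c <= point)
    }
}

fn main() {
    let mut rng = rand::thread_rng();
    let zipf = ZipfSampler::new(10_000);
    // The bug in effect: gen_range treats all vocab indices equally, while
    // the Zipf-like sampler strongly favors low (frequent) indices.
    let uniform: Vec<usize> = (0..5).map(|_| rng.gen_range(0..10_000)).collect();
    let zipfian: Vec<usize> = (0..5).map(|_| zipf.next(&mut rng)).collect();
    println!("uniform: {:?}\nzipfian: {:?}", uniform, zipfian);
}
```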

@sebpuetz sebpuetz requested a review from danieldk April 19, 2019 09:39
@sebpuetz
Member Author

sebpuetz commented Apr 19, 2019

I'll hold off on merging until the fix is verified by training models.

@sebpuetz
Member Author

sebpuetz commented Apr 22, 2019

Well, there was yet another bug to squash; I've already squashed and pushed the fix.
Structgram panicked because of out-of-bounds indices from negative samples. I can't remember why I put this code there:

```rust
let negative = match self.skipgram_config.model {
    ModelType::StructuredSkipGram => {
        let context_size = self.skipgram_config.context_size as usize;
        let offset = output % (context_size * 2);
        let rand_type = self.range_gen.next().unwrap();
        // In structured skipgram, the offset into the output matrix is
        // computed as: (vocab_idx * context_size * 2) + offset.
        rand_type * context_size * 2 + offset
    }
    ModelType::SkipGram => self.range_gen.next().unwrap(),
};
```

But I guess that would've drawn a negative sample with the same offset as the original context.
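If I read that right, the indexing works out roughly like this (a sketch, not the code in this PR; negative_output_index is a hypothetical helper, and sampled_vocab_idx stands for a draw from the Zipf-like range generator):

```rust
// Sketch: in structured skipgram the output matrix is laid out as
// vocab_idx * (context_size * 2) + offset, one row per (word, position) pair.
// Reusing the positive context's offset keeps the negative sample in the
// same position band and only swaps out the word.
fn negative_output_index(
    positive_output: usize,   // index of the positive (observed) context
    context_size: usize,
    sampled_vocab_idx: usize, // drawn from the Zipf-like range generator
) -> usize {
    let bands = context_size * 2; // positions on both sides of the focus word
    let offset = positive_output % bands;
    sampled_vocab_idx * bands + offset
}
```

As long as sampled_vocab_idx stays below the vocabulary size, the resulting index stays in bounds of the output matrix.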

@sebpuetz sebpuetz requested a review from danieldk April 22, 2019 21:58
@danieldk
Member

Weird, that seems to originate from before BandedRangeGenerator.

@sebpuetz
Member Author

> Weird, that seems to originate from before BandedRangeGenerator.

I'm suspecting that I copy-pasted that from one of my working branches.

The skipgram model is done and is now only 0.26% below the pretrained model (56.94% vs. 57.20%). Structgram will take a while; I didn't notice that it had crashed until last night.

@sebpuetz
Member Author

Structgram is also done now; the newly trained model is 0.57% below the pretrained model (59.53% vs. 60.10%). Do you think this difference could be due to random factors?
I won't merge this PR until you green-light it.

@danieldk
Member

danieldk commented Apr 25, 2019

> Structgram is also done now; the newly trained model is 0.57% below the pretrained model (59.53% vs. 60.10%). Do you think this difference could be due to random factors?

Hard to tell. I did see some variance between runs; I think 0.57% is within the realm of possibilities. Maybe you could train a basic topo model with both structgram models to verify that the results are the same on a downstream task?

(Of course, it may not give the best result without Zipf distribution parameter hacking, but it's a good extra sanity check.)

@sebpuetz
Member Author

New embeddings score 98.19 on the validation set vs. 98.08 for the old embeddings.

That is actually a new overall high score for our topo experiments.

Anyway, I think this confirms that we are getting comparable embeddings out of the new release. The 0.11-point difference could be due to random effects in toponn or in finalfrontier; I don't think it's possible to tell which.

@danieldk
Member

danieldk commented Apr 25, 2019 via email

@sebpuetz sebpuetz merged commit 16035a3 into master Apr 25, 2019
@sebpuetz sebpuetz deleted the sampling_fix branch April 25, 2019 15:27
@sebpuetz sebpuetz mentioned this pull request Oct 10, 2019