Support explicitly stored ngrams #61

Closed
7 tasks done
sebpuetz opened this issue Sep 26, 2019 · 90 comments

@sebpuetz
Member

sebpuetz commented Sep 26, 2019

Entails changes to finalfusion-rust for serialization/usage.

  • Add config
  • Restructure vocab into module (vocab.rs for all variants is quite unwieldy)
  • Extend vocab module
    • Use the indexer approach from finalfusion-rust: parameterize SubwordVocab with an Indexer and separate ngram and word indices through an enum (might get ugly, considering how many type parameters we already have for the trainer structs); see the sketch below the task list
    • Implement independent vocab type
  • Add support in binaries (depends on support in finalfusion-rust)
  • Update finalfusion-rust dependency once it supports NGramVocabs
  • Replace the finalfusion dependency with a release version.
  • Fix EOS ngrams (#72)
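
A rough sketch of the indexer-parameterized vocab idea (the trait, type, and method names here are illustrative, not the actual finalfusion-rust API):

pub trait Indexer {
    /// Map an ngram to an index, if the indexer knows it.
    fn index_ngram(&self, ngram: &str) -> Option<u64>;
}

/// Separates in-vocabulary word indices from subword (ngram) indices.
pub enum WordIdx {
    Word(usize),
    NGram(u64),
}

pub struct SubwordVocab<I> {
    words: Vec<String>,
    indexer: I,
    min_n: usize,
    max_n: usize,
}

impl<I: Indexer> SubwordVocab<I> {
    /// Return the word index (if known) plus the indices of its ngrams.
    pub fn indices(&self, word: &str) -> Vec<WordIdx> {
        let mut indices = Vec::new();
        if let Some(pos) = self.words.iter().position(|w| w == word) {
            indices.push(WordIdx::Word(pos));
        }
        // Bracket the word and collect character-ngram indices.
        let chars: Vec<char> = format!("<{}>", word).chars().collect();
        for n in self.min_n..=self.max_n {
            for window in chars.windows(n) {
                let ngram: String = window.iter().collect();
                if let Some(idx) = self.indexer.index_ngram(&ngram) {
                    indices.push(WordIdx::NGram(idx));
                }
            }
        }
        indices
    }
}
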
@sebpuetz sebpuetz self-assigned this Sep 26, 2019
@sebpuetz sebpuetz added the feature New feature or request label Sep 26, 2019
@sebpuetz
Member Author

Before going deeper into NGram vocabs in this repo, I'd like to pull in some more logic from finalfusion-rust. There's a lot of duplicate code floating around in the subword and vocab department which I think can be simplified a fair bit.

For instance, we should be able to use the finalfusion-rust SubwordVocab to cover both bucket and ngram vocabs if we wrap it in some TrainVocab in finalfrontier. Then we wouldn't need any subword module in finalfrontier and would only need to provide additional methods required for training on the vocabs.
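
As a rough sketch of what I have in mind (the names are just to illustrate the shape, not an actual API):

/// Wraps a finalfusion vocab (bucket or ngram indexer) and adds only the
/// extra state and methods finalfrontier needs for training.
pub struct TrainVocab<V> {
    vocab: V,            // e.g. a finalfusion SubwordVocab
    counts: Vec<usize>,  // per-word corpus counts
    discards: Vec<f32>,  // per-word keep probabilities for subsampling
}

impl<V> TrainVocab<V> {
    /// Probability of keeping an occurrence of word `idx` during training.
    pub fn keep_probability(&self, idx: usize) -> f32 {
        self.discards[idx]
    }

    /// Corpus frequency of word `idx`.
    pub fn count(&self, idx: usize) -> usize {
        self.counts[idx]
    }

    /// The wrapped finalfusion vocab, used for all index lookups.
    pub fn inner(&self) -> &V {
        &self.vocab
    }
}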

What do you think? @danieldk @NianhengWu

@danieldk
Member

Yes, I think this is a great idea. The finalfusion crate wasn't around when we implemented finalfrontier, but now that it exists, it makes sense to reuse as many primitives from finalfusion as we can.

@sebpuetz
Member Author

sebpuetz commented Oct 3, 2019

This one's pretty much done. What's missing now is a release of `finalfusion-rust`; then we can replace the dependencies with proper versions.

@danieldk
Member

danieldk commented Oct 3, 2019

Maybe it would be good to do test runs and compute-accuracy for finalfrontier/finalfusion before and after the changes (obviously, the ngrams model can only be done after), to see whether there are any regressions.

The last time we did big changes, this shook out an important bug.

@sebpuetz
Member Author

sebpuetz commented Oct 3, 2019

I'm currently training an ngram model on turing (started with ngram-mincount=100), that'll be done some time tonight. After that there are two more models lined up with ngram-mincount=50 and ngram-mincount=30.

I can train a bucket-vocab model in parallel to catch possible regressions. I guess a skipgram model should suffice, since we didn't touch the training routine, only the vocab and config. The ngram models I'm training are all structgram models, since we've been using structgram for almost all experiments.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

~48% on compute-accuracy for a skipgram model with

[common_config]
dims = 300
epochs = 15
loss = 'LogisticNegativeSampling'
lr = 0.05000000074505806
negative_samples = 5
zipf_exponent = 0.5

[model_config]
context_size = 10
model = 'SkipGram'
type = 'SkipGramLike'

[vocab_config]
discard_threshold = 0.00009999999747378752
max_n = 6
min_count = 30
min_n = 3
type = 'SubwordVocab'

[vocab_config.indexer]
buckets_exp = 21
type = 'Buckets'

That doesn't seem right. I'm now training on 0.6.1 with the same parameters to see where we might have introduced a regression.

@danieldk did we verify anything after adding support for no-subwords training?

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

I checked finalfusion-utils for possible regressions in compute-accuracy, but our public skipgram model still gets its 57.2%. No change on that end.

Unfortunately, I didn't change the name before restarting training, and since finalfrontier eagerly creates the model file before training, the model from my previous comment is gone.

@danieldk
Member

danieldk commented Oct 7, 2019

So accuracy is down by ~9%; that must be a bug. I guess the next step is finding out whether it's the finalfusion or the finalfrontier changes causing this.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

Do you remember if we verified models before the 0.6 release?

@danieldk
Member

danieldk commented Oct 7, 2019

Of finalfrontier? I don't think so. We verified 0.5.0, where we had the bug where the ZipfRangeGenerator was accidentally replaced by an RNG.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

My hunch is that it's related to the 0.6 release; we changed quite a lot directly related to the training. If the cargo-installed ff-train-skipgram results in similar accuracy, we might be onto something.

@danieldk
Member

danieldk commented Oct 7, 2019

0.5.0 -> 0.6.1 is fairly small though: embedding norms storage, better default hyperparameters, the dirgram (directional skipgram) model, and that's pretty much it. All the big changes were in 0.5.0, which we checked.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

I misremembered releasing the no-subwords training. I meant to say it's probably related to changing the vocabs and indexing.

@danieldk
Member

danieldk commented Oct 7, 2019

I misremembered releasing the no-subwords training. I meant to say it's probably related to changing the vocabs and indexing.

That's likely. One possibility is some incorrect offset in the embedding matrix (where the subword index is not added to the known vocab size).
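
For reference, a minimal sketch of the offset in question (illustrative, not the actual code):

// The embedding matrix stores the known words first, followed by the
// subword/bucket rows. A subword index therefore has to be shifted by the
// vocab size; forgetting that offset makes subwords hit word rows instead.
fn subword_row(vocab_len: usize, subword_idx: u64) -> usize {
    vocab_len + subword_idx as usize
}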

@danieldk
Member

danieldk commented Oct 7, 2019

What version of ff-compute-accuracy did you use? Should be easy to check on the old embeddings whether the finalfusion part is correct.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

I verified against finalfusion-utils/master with an updated finalfusion dependency pointing to finalfusion = { git = "https://github.com/finalfusion/finalfusion-rust" }.

So I think that rules out most of the finalfusion changes, unless serialization was broken in the meantime.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

It shouldn't be about subword indices: adding this to SkipgramTrainer::train_iter_from doesn't panic, but it would if there were multiple indices into the word matrix for a single word.

                // Count how many of this word's indices point into the word
                // part of the matrix; more than one would indicate a bug.
                let mut word_n = 0;
                for &idx in &idx {
                    if (idx as usize) < self.vocab.len() {
                        if word_n >= 1 {
                            panic!();
                        }
                        word_n += 1;
                    }
                }

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

  • Every word, apart from the EOS marker, has at least one subword index, so we're properly generating the subwords.

  • TrainModel::mean_embedding also gets the correct number of indices.

  • Running in debug mode also doesn't produce any crashes.

  • The upper bound for negative sampling is correctly set to the vocab length; negative sampling doesn't seem to be periodic and produces random output.

@danieldk
Member

danieldk commented Oct 7, 2019

Can we see the difference in performance in the loss for a relatively small training corpus? If so, you could git bisect to find the offending commit.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

0.4

$ cargo install finalfrontier-utils --force --version ^0.4
$ ff-train tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.66629 lr: 0.00064 ETA: 00:00:00
$ ff-train tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.65123 lr: 0.00013 ETA: 00:00:00

0.5

$ cargo install finalfrontier-utils --force --version ^0.5
$ ff-train-skipgram tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.66744 lr: 0.00053 ETA: 00:00:00
$ ff-train-skipgram tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.64953 lr: 0.00010 ETA: 00:00:00

0.6

$ cargo install finalfrontier-utils --force
$ ff-train-skipgram tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.20165 lr: 0.00002 ETA: 00:00:00
$ ff-train-skipgram tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.20786 lr: 0.00020 ETA: 00:00:00

master

$ cargo build --release
$ target/release/ff-train-skipgram tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.20882 lr: 0.00021 ETA: 00:00:00
$ target/release/ff-train-skipgram tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5
    100% loss: 1.21025 lr: 0.00020 ETA: 00:00:00

Starting with 0.6 something apparently changed: the loss is now much lower. I'm not sure whether that's really indicative, though; we changed something about the loss calculation at some point.

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

The default context size changed, so that explains my differences, but not the regression.

@danieldk
Member

danieldk commented Oct 7, 2019

We changed some defaults:

    The changed defaults are:
    
    - Context size: 5 -> 10
    - Dims: 100 -> 300
    - Epochs: 5 -> 15

So I guess you have to modify the context size.

@danieldk
Member

danieldk commented Oct 7, 2019

Heh, simultaneous.

Maybe you could redo them with the same context size?

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

Already done:

$ cargo install finalfrontier-utils --force --version ^0.4
$ ff-train tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5 --context 10
    100% loss: 1.19736 lr: 0.00018 ETA: 00:00:00

and

$ cargo install finalfrontier-utils --force --version ^0.5
$ ff-train-skipgram tdz-text.conll throwaway --threads 14 --dims 100 --epochs 5 --context 10
    100% loss: 1.20563 lr: 0.00017 ETA: 00:00:00

Loss is the same as on 0.6 and master now. I'll convert one of our public models with current finalfusion-utils and run it against compute-accuracy to see whether serialization is broken...

@sebpuetz
Member Author

sebpuetz commented Oct 7, 2019

$ finalfusion convert /zdv/sfb833-a3/public-data/finalfusion/german-skipgram-mincount-30-ctx-10-dims-300.fifu convert_test.fifu -f finalfusion -t finalfusion
$ finalfusion compute-accuracy ./convert_test.fifu ~/finalfrontier/analogies/de_trans_Google_analogies.txt --threads 10
[...]
    Total: 10611/18552 correct, accuracy: 57.20, avg cos: 0.69

Looks fine...

@sebpuetz
Member Author

sebpuetz commented Oct 9, 2019

The cargo-installed finalfrontier-utils comes in with even lower scores:

$ finalfusion metadata twe-bucket-vocab-test.fifu
[common_config]
dims = 300
epochs = 15
loss = 'LogisticNegativeSampling'
lr = 0.05000000074505806
negative_samples = 5
zipf_exponent = 0.5

[model_config]
context_size = 10
model = 'SkipGram'
type = 'SkipGramLike'

[vocab_config]
buckets_exp = 21
discard_threshold = 0.00009999999747378752
max_n = 6
min_count = 30
min_n = 3
type = 'SubwordVocab'

The cargo-installed finalfusion-utils produces the same numbers.

$ finalfusion compute-accuracy ../finalfrontier/twe-bucket-vocab-test.fifu ~/finalfrontier/analogies/de_trans_Google_analogies.txt --threads 40                            
██████████████████████████████ 100%  ETA: 00:00:00
capital-common-countries: 282/506 correct, accuracy: 55.73, avg cos: 0.74, skipped: 0
capital-world: 2678/4524 correct, accuracy: 59.20, avg cos: 0.73, skipped: 0
city-in-state: 716/2467 correct, accuracy: 29.02, avg cos: 0.68, skipped: 0
currency: 58/866 correct, accuracy: 6.70, avg cos: 0.60, skipped: 0
family: 275/506 correct, accuracy: 54.35, avg cos: 0.74, skipped: 0
gram2-opposite: 181/812 correct, accuracy: 22.29, avg cos: 0.65, skipped: 0
gram3-comparative: 901/1332 correct, accuracy: 67.64, avg cos: 0.66, skipped: 0
gram4-superlative: 310/1122 correct, accuracy: 27.63, avg cos: 0.56, skipped: 0
gram5-present-participle: 202/1056 correct, accuracy: 19.13, avg cos: 0.64, skipped: 0
gram6-nationality-adjective: 583/1599 correct, accuracy: 36.46, avg cos: 0.70, skipped: 0
gram7-past-tense: 734/1560 correct, accuracy: 47.05, avg cos: 0.66, skipped: 0
gram8-plural: 494/1332 correct, accuracy: 37.09, avg cos: 0.67, skipped: 0
gram9-plural-verbs: 601/870 correct, accuracy: 69.08, avg cos: 0.72, skipped: 0
Total: 8015/18552 correct, accuracy: 43.20, avg cos: 0.68
Skipped: 0/18552 (0%)

@danieldk
Member

So, does this mean that the bug was already in 0.6.1?

@danieldk
Member

If it's really down to missed updates, then we should see differences in losses, right?

At some point, yes, but we obscure the losses a bit by averaging over all time: the initial updates will always be much larger, so the effect is probably less observable there, and the long tail we cannot observe as closely because of the averaging. We did this to mimic word2vec, but we should really show a moving average instead.

BTW, I'm not saying that this is the culprit, but it's one of the few things I could think of that would make a single version produce good and bad results between runs.

It would also explain why the older embeddings do so much better: I think I used to train with at most 20 threads, sometimes even fewer (e.g. on shaw).
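
A minimal sketch of the kind of moving average I mean (an exponential moving average; the smoothing factor here is an arbitrary choice):

/// Exponentially weighted moving average of the training loss.
struct MovingLoss {
    alpha: f32,        // smoothing factor, e.g. 0.01
    value: Option<f32>,
}

impl MovingLoss {
    /// Fold a new per-instance loss into the average and return it.
    fn update(&mut self, loss: f32) -> f32 {
        let new = match self.value {
            Some(prev) => (1.0 - self.alpha) * prev + self.alpha * loss,
            None => loss,
        };
        self.value = Some(new);
        new
    }
}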

@danieldk
Member

Are spectre mitigations turned on for hopper?

Yes, I haven't disabled them yet.

@danieldk
Member

I'll restart the master training on hopper with 10 rather than 30 threads. It would be good if we didn't train other embeddings on the same machine.

@sebpuetz
Member Author

Losses for the 30-thread models on turing are super close to each other at 0.00195 and 0.00203.
Tesniere on 40 threads was a fair bit higher at 0.00282.

What did you get on hopper?

@danieldk
Member

danieldk commented Oct 11, 2019

0.4.0: 0.00178
0.6.1: 0.00212

But again, let's not over-analyze an average loss, which is also updated with Hogwild ;). (Both were with 30 threads.)

@sebpuetz
Member Author

sebpuetz commented Oct 11, 2019

Sure, I just thought it'd be good to verify we're not orders of magnitude apart. Btw, 0.4.0 probably still had the incorrect loss calculation. Never mind, see 9bb40df.

@sebpuetz
Member Author

sebpuetz commented Oct 12, 2019

Threads could really be our answer. This is from tesniere with 20 threads on 0.5:

finalfusion compute-accuracy twe-bucket-vocab-test_0.5_20threads.fifu ../finalfusion-utils/de_trans_Google_analogies.txt --threads 40
██████████████████████████████ 100%  ETA: 00:00:00
capital-common-countries: 454/506 correct, accuracy: 89.72, avg cos: 0.76, skipped: 0
capital-world: 3885/4524 correct, accuracy: 85.88, avg cos: 0.74, skipped: 0
city-in-state: 876/2467 correct, accuracy: 35.51, avg cos: 0.69, skipped: 0
currency: 95/866 correct, accuracy: 10.97, avg cos: 0.59, skipped: 0
family: 289/506 correct, accuracy: 57.11, avg cos: 0.74, skipped: 0
gram2-opposite: 188/812 correct, accuracy: 23.15, avg cos: 0.65, skipped: 0
gram3-comparative: 893/1332 correct, accuracy: 67.04, avg cos: 0.67, skipped: 0
gram4-superlative: 282/1122 correct, accuracy: 25.13, avg cos: 0.57, skipped: 0
gram5-present-participle: 199/1056 correct, accuracy: 18.84, avg cos: 0.65, skipped: 0
gram6-nationality-adjective: 793/1599 correct, accuracy: 49.59, avg cos: 0.73, skipped: 0
gram7-past-tense: 812/1560 correct, accuracy: 52.05, avg cos: 0.68, skipped: 0
gram8-plural: 708/1332 correct, accuracy: 53.15, avg cos: 0.65, skipped: 0
gram9-plural-verbs: 704/870 correct, accuracy: 80.92, avg cos: 0.75, skipped: 0
Total: 10178/18552 correct, accuracy: 54.86, avg cos: 0.69
Skipped: 0/18552 (0%)

And turing on 0.5 with 30 threads:

 finalfusion compute-accuracy ../finalfrontier/twe-bucket-vocab-test_0.5.fifu ~/finalfrontier/analogies/de_trans_Google_analogies.txt --threads 40 
capital-common-countries: 273/506 correct, accuracy: 53.95, avg cos: 0.75, skipped: 0
capital-world: 2581/4524 correct, accuracy: 57.05, avg cos: 0.73, skipped: 0
city-in-state: 713/2467 correct, accuracy: 28.90, avg cos: 0.68, skipped: 0
currency: 73/866 correct, accuracy: 8.43, avg cos: 0.61, skipped: 0
family: 282/506 correct, accuracy: 55.73, avg cos: 0.74, skipped: 0
gram2-opposite: 179/812 correct, accuracy: 22.04, avg cos: 0.65, skipped: 0
gram3-comparative: 882/1332 correct, accuracy: 66.22, avg cos: 0.66, skipped: 0
gram4-superlative: 279/1122 correct, accuracy: 24.87, avg cos: 0.56, skipped: 0
gram5-present-participle: 196/1056 correct, accuracy: 18.56, avg cos: 0.64, skipped: 0
gram6-nationality-adjective: 568/1599 correct, accuracy: 35.52, avg cos: 0.70, skipped: 0
gram7-past-tense: 688/1560 correct, accuracy: 44.10, avg cos: 0.66, skipped: 0
gram8-plural: 491/1332 correct, accuracy: 36.86, avg cos: 0.68, skipped: 0
gram9-plural-verbs: 537/870 correct, accuracy: 61.72, avg cos: 0.72, skipped: 0
Total: 7742/18552 correct, accuracy: 41.73, avg cos: 0.68
Skipped: 0/18552 (0%)

I'll give master on tesniere and turing with 20 threads a go now.

We should add n_threads to metadata.
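
For concreteness, a sketch of what that could look like on the config side (assuming the common config is a serde-derived struct that ends up in the metadata chunk; the struct name and field types here are illustrative):

use serde::{Deserialize, Serialize};

// Mirrors the [common_config] metadata shown above, with the proposed
// n_threads field added so a trained model records its thread count.
#[derive(Serialize, Deserialize)]
struct CommonConfig {
    dims: u32,
    epochs: u32,
    loss: String,
    lr: f32,
    negative_samples: u32,
    zipf_exponent: f64,
    n_threads: u32,
}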

@danieldk
Member

danieldk commented Oct 12, 2019

It would not be very satisfying, but it would at least be a good explanation. I agree on adding n_threads to the metadata. Once we have established what works, we may also want to print a warning when someone uses high thread counts.

If this is indeed the culprit, we should also investigate switching to HogBatch. From a quick glance over the paper (I still need to read it in detail), this seems simple enough to implement: it updates the 'global' matrix every n instances rather than per instance. We would need a performant sparse matrix implementation, but there is a lot of prior work there (and only the rows need to be sparse; within a row we could use dense vectors).
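
Very roughly, and under those assumptions (a row-sparse local buffer flushed every n instances; single-threaded here for simplicity, not an implementation of the paper):

use std::collections::HashMap;

/// Buffer of pending updates: sparse over rows, dense within a row.
struct UpdateBuffer {
    dims: usize,
    batch_size: usize,
    seen: usize,
    rows: HashMap<usize, Vec<f32>>,
}

impl UpdateBuffer {
    /// Accumulate an update for one row; flush after `batch_size` instances.
    fn add(&mut self, row: usize, update: &[f32], global: &mut [Vec<f32>]) {
        let dims = self.dims;
        let buf = self.rows.entry(row).or_insert_with(|| vec![0.0; dims]);
        for (b, u) in buf.iter_mut().zip(update) {
            *b += u;
        }
        self.seen += 1;
        if self.seen >= self.batch_size {
            self.flush(global);
        }
    }

    /// Apply all buffered updates to the shared matrix in one go.
    fn flush(&mut self, global: &mut [Vec<f32>]) {
        for (row, buf) in self.rows.drain() {
            for (g, b) in global[row].iter_mut().zip(buf) {
                *g += b;
            }
        }
        self.seen = 0;
    }
}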

@sebpuetz
Member Author

I haven't really looked at more than the figures in the paper; if I find time, I'll also go through it. There was also an issue that suggested implementing some different optimization techniques: #42

Another small datapoint, not related to the bug but to turing: with identical hyperparameters, turing shows an ETA more than 24 hours longer than tesniere.

Are your results from hopper in yet?

@sebpuetz
Member Author

There's also additional support for the threads theory: the skipgram model from a couple of months ago that wasn't broken was also trained on 20 threads. I found that by accident in my bash history.

@danieldk
Member

danieldk commented Oct 12, 2019

Are your results from hopper in yet?

Nope, but I suspended the process overnight because I had to run something else (preparing a Dutch treebank) and didn't want that to influence the result in any way. I continued training this morning, but the ETA is completely unreliable now. IIRC it would take a bit more than a day with 10 threads, and it's now at 26%.

@danieldk
Member

danieldk commented Oct 12, 2019

Another small datapoint not related to the bug but turing: With identical hyperparameters turing shows an ETA of +24h compared to tesniere.

I continue to be surprised about this, since they have identical CPUs. Of course, there is the VM, but that should only make a few percent difference. I have reset the CPU affinities of the VMs. I also noticed that the powersave CPU governor was in use again since the last reboot; I changed it to performance.

Edit: tesniere also uses powersave.

@sebpuetz
Member Author

The difference is still almost 20 hours; the current ETAs are 17h on tesniere and 36h on turing.

Turing is running an older Ubuntu release (16.04) while tesniere is on 18.04, although that shouldn't explain such a dramatic difference.

@danieldk
Member

You could perf record both processes for a bit (perf record -p <PID>) and read the perf report to see whether the hot loops are spending most of their time in the same machine code.

The Ubuntu versions should indeed not make a large difference: the same compiler is used on both machines, so it should produce roughly the same machine code, and our hot loops don't do syscalls anyway.

@sebpuetz
Member Author

Results from tesniere on master are in: 56.51%. Looks like everything was fine all along...

@danieldk
Member

master with 10 threads should be done in half an hour or so ;).

@sebpuetz
Member Author

Turing won't be done until the early morning. Anyway, I don't expect different results from turing; we've seen on 0.5, 0.6, and master that the number of threads can really do this. That probably means we should set an upper bound on the default number of threads, something like n_cpus / 2, capped at 20 or 18.
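
A sketch of what that default could look like (the exact cap is still up for discussion; this assumes the num_cpus crate for the core count):

/// Default number of training threads: half the logical CPUs, capped at 20.
fn default_n_threads() -> usize {
    let n_cpus = num_cpus::get();
    (n_cpus / 2).max(1).min(20)
}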

@danieldk
Member

danieldk commented Oct 13, 2019

10 threads on hopper on master:

capital-common-countries: 471/506 correct, accuracy: 93.08, avg cos: 0.77, skipped: 0
capital-world: 4029/4524 correct, accuracy: 89.06, avg cos: 0.75, skipped: 0
city-in-state: 825/2467 correct, accuracy: 33.44, avg cos: 0.69, skipped: 0
currency: 94/866 correct, accuracy: 10.85, avg cos: 0.60, skipped: 0
family: 300/506 correct, accuracy: 59.29, avg cos: 0.75, skipped: 0
gram2-opposite: 192/812 correct, accuracy: 23.65, avg cos: 0.65, skipped: 0
gram3-comparative: 888/1332 correct, accuracy: 66.67, avg cos: 0.67, skipped: 0
gram4-superlative: 278/1122 correct, accuracy: 24.78, avg cos: 0.57, skipped: 0
gram5-present-participle: 194/1056 correct, accuracy: 18.37, avg cos: 0.65, skipped: 0
gram6-nationality-adjective: 785/1599 correct, accuracy: 49.09, avg cos: 0.74, skipped: 0
gram7-past-tense: 822/1560 correct, accuracy: 52.69, avg cos: 0.68, skipped: 0
gram8-plural: 793/1332 correct, accuracy: 59.53, avg cos: 0.66, skipped: 0
gram9-plural-verbs: 718/870 correct, accuracy: 82.53, avg cos: 0.76, skipped: 0
Total: 10389/18552 correct, accuracy: 56.00, avg cos: 0.70
Skipped: 0/18552 (0%)

@danieldk
Member

Which probably means we should set upper bounds on the default number of threads. Something like n_cpus / 2 unless n_cpus > 20 or 18.

Yes. And we have to put a warning in the README. Most people will think 'I have 40 cores, let's use all of them'.

@danieldk danieldk added this to the release-0.7 milestone Oct 13, 2019
@danieldk
Member

danieldk commented Oct 13, 2019

Ok, it seems like this issue can almost be closed. Maybe we should also run master with 20 threads for structured skipgram, dependency embeddings, and directional skipgram, to ensure that those are fine too. I am not sure whether we want to do bucketed/explicit ngrams for each of these? I guess that we would detect regressions if we do something like:

skipgram - bucket
structured skipgram - bucket
directional skipgram - bucket
dependency - bucket
skipgram - explicit n-grams

We should probably do all of them with the fixed compute-accuracy and then replace https://github.com/finalfusion/finalfrontier/blob/master/ACCURACY.md with the results (that file is outdated anyway), so that we have everything readily available for testing 0.8 and beyond.

@sebpuetz
Member Author

Well, there's still #72 to fix, and the finalfusion dependency needs to be updated to a release.

@danieldk
Member

Well, there's still #72 to fix, and the finalfusion dependency needs to be updated to a release.

That's true. But we currently don't add EOS markers anyway, right? So this would only affect a hypothetical </s> in the data that makes the cut-off.

@danieldk
Member

Never mind, we do.

@sebpuetz
Member Author

They appear among the most frequent tokens in the vocab, usually the top index.

@danieldk
Member

I had a vague recollection that we once had a discussion about not using EOS.

@sebpuetz
Member Author

That was in #60, but we didn't remember it. I think we could patch this out, since we include punctuation in training, which essentially acts as EOS marker(s).

@sebpuetz
Member Author

Btw, following these changes, enabling training of actual fastText embeddings is only a From<SubwordVocab<FastTextIndexer>> impl and the addition of a "fasttext" choice to the subwords option away.
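
Roughly like this (a sketch with stand-in types, not the final API):

// Hypothetical stand-ins for the real finalfusion/finalfrontier types.
struct FastTextIndexer;
struct SubwordVocab<I> {
    words: Vec<String>,
    indexer: I,
}
struct TrainVocab<V> {
    vocab: V,
}

// The only new piece needed on the vocab side: wrap a fastText-indexed
// subword vocab so it can be used for training.
impl From<SubwordVocab<FastTextIndexer>> for TrainVocab<SubwordVocab<FastTextIndexer>> {
    fn from(vocab: SubwordVocab<FastTextIndexer>) -> Self {
        TrainVocab { vocab }
    }
}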
