
removing tensor resizing in future_mask #877

Closed
wants to merge 1 commit

Conversation

taylanbil (Contributor)

Tensor resizing doesn't work well with TPUs; this change is equivalent to the base and works better with TPUs.
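The idea can be sketched roughly as follows: instead of resizing a cached buffer in place with `resize_` (which XLA/TPU compiles poorly), rebuild the mask when it is too small and slice it otherwise. This is an illustrative reconstruction from the PR description, not the merged diff; the function name and caching scheme are assumptions.

```python
import torch

def buffered_future_mask(tensor, _cache={}):
    """Upper-triangular -inf mask for causal attention, without resize_().

    In-place resizing compiles poorly on TPU/XLA, so we rebuild the mask
    when it is too small and slice it otherwise.  Illustrative sketch only;
    the caching scheme here is an assumption, not the merged diff.
    """
    dim = tensor.size(0)
    mask = _cache.get("mask")
    if mask is None or mask.size(0) < dim:
        # -inf strictly above the diagonal, 0 elsewhere
        mask = torch.triu(torch.full((dim, dim), float("-inf")), 1)
        _cache["mask"] = mask
    return mask[:dim, :dim]
```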
@facebook-github-bot left a comment

@myleott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@myleott merged this pull request in c38b1f9.

facebook-github-bot pushed a commit that referenced this pull request Sep 30, 2019
…with Transformer Models" (#877)

Summary:
Pull Request resolved: fairinternal/fairseq-py#877

This PR implements guided alignment training described in  "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".

In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training.
Pull Request resolved: #1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
@taylanbil taylanbil deleted the model branch October 2, 2019 20:20
taylanbil added a commit to pytorch-tpu/fairseq that referenced this pull request Nov 19, 2019
TPU specific changes [here](https://gist.github.com/taylanbil/150abd31b1fbf5c91ca90ef5a4d79f08)

The rest is rebasing on a more current fairseq upstream commit.

---


* v0.7.1 -> v0.7.2 (#891)

Summary:
No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/891

Differential Revision: D16377132

Pulled By: myleott

fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7

* Switch to torch.nn.functional.gelu when available

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/735

Differential Revision: D16377046

Pulled By: myleott

fbshipit-source-id: 9725d4a3ce6b2fc8cee0b1d1cb8921f9d59c551a

* Improve interactive generation (support --tokenizer and --bpe)

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/734

Differential Revision: D16377044

Pulled By: myleott

fbshipit-source-id: 37d5553d76aa7c653113fec089f59710281c31d7

* Store task in the criterion base class

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/737

Differential Revision: D16377805

Pulled By: myleott

fbshipit-source-id: 1e090a02ff4fbba8695173f57d3cc5b88ae98bbf

* Create standalone label_smoothed_nll_loss

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/739

Differential Revision: D16377798

Pulled By: myleott

fbshipit-source-id: 20047c80de2e6f108269ace4ae3eec906a5920dd

* Allow not specifying --warmup-init-lr

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/736

Differential Revision: D16378001

Pulled By: myleott

fbshipit-source-id: 2907f63bcbf7068ceaa48b00096040fa2639e569

* Rename _load_model_ensemble -> load_model_ensemble_and_task

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/738

Differential Revision: D16377803

Pulled By: myleott

fbshipit-source-id: 6beb2f78e7464b70ff65a965d2b747cdca0ca951

* Rename data.transforms -> data.encoders

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/747

Differential Revision: D16403464

Pulled By: myleott

fbshipit-source-id: ee3b4184f129a02be833c7bdc00685978b4de883

* Fix topp sampling issues (#882)

Summary:
Two issues here:

1. `last_included` should be the last included index `cumsum_mask[:, :, -1:]` instead of `cumsum_mask[:, :, :1]`  (which is either 0 or 1);

2. If `--no-repeat-ngram-size` is set, the sum of `probs` may be less than 1, so we need to re-normalize it to make it a valid probability distribution.

The following code reproduces these issues:

```python
import torch
import numpy as np

def _sample_topp(probs):

    # =====  Code from  fairseq/search.py _sample_topp ======

    # sort the last dimension (vocab dimension) in descending order
    sorted_probs, sorted_indices = probs.sort(descending=True)

    # compute a mask to indicate the words to be included in the top-P set.
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    # note that mask was computed by 'lt'. One more word needs to be included
    # so that the cumulative probability mass can exceed p.
    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, :1]
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims.
    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]

    # trim the words that are not in top-P by setting their probabilities
    # to 0, so that they would not be sampled later.
    trim_mask = 1 - truncated_mask
    trimed_probs = truncated_probs.masked_fill_(trim_mask, 0)
    return trimed_probs, truncated_indices

    # ========================================================

if __name__ == '__main__':
    np.random.seed(1234)
    torch.manual_seed(1234)

    sampling_topp = 0.9
    probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
    # probs = tensor([0.0545, 0.0779, 0.0189, 0.0647, 0.0282, 0.0862, 0.0656, 0.1041, 0.0399, 0.4600])
    print('probs =', probs[0][0])

    trimed_probs, truncated_indices = _sample_topp(probs)

    cum_probs = trimed_probs.cumsum(dim=-1)[0][0]
    # cumsum = tensor([0.4600, 0.5641])
    print('cumsum =', cum_probs)
    # Will throw AssertionError
    assert float(cum_probs[-1]) >= sampling_topp

```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/882

Differential Revision: D16409269

Pulled By: xingz9

fbshipit-source-id: 94b1122eed50c656057b64e22af6f4a6ea7a68af
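Applying both fixes from the summary yields something like the following sketch: renormalize the input distribution first, and take `last_included` from the last column of the cumulative mask. This is a reconstruction from the description above, not the merged fairseq patch; `sample_topp_fixed` is a hypothetical name.

```python
import torch

def sample_topp_fixed(probs, sampling_topp=0.9):
    """Top-p truncation with the two fixes from the summary (a sketch).

    Fix 2: renormalize `probs` first, since n-gram blocking can leave it
    summing to less than 1.  Fix 1: take `last_included` from the LAST
    column of the cumulative mask (the count of words below p), not the
    first.  Hypothetical reconstruction, not the merged fairseq patch.
    """
    probs = probs / probs.sum(dim=-1, keepdim=True)      # fix 2

    sorted_probs, sorted_indices = probs.sort(descending=True)
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, -1:]               # fix 1
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims and zero out words beyond top-p
    max_dim = last_included.max()
    truncated_mask = mask[:, :, : max_dim + 1]
    truncated_probs = sorted_probs[:, :, : max_dim + 1]
    truncated_indices = sorted_indices[:, :, : max_dim + 1]

    trimmed_probs = truncated_probs.masked_fill_(~truncated_mask, 0)
    return trimmed_probs, truncated_indices
```

With this version, the assertion in the reproduction script above holds: the retained probability mass is at least `sampling_topp`.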

* Default to mmap and infer dataset implementations automatically

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751

Differential Revision: D16410989

Pulled By: myleott

fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe

* Update GPT-2 BPE

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/749

Differential Revision: D16410984

Pulled By: myleott

fbshipit-source-id: 7698df46b8a179afccb287990f9705358690454a

* Misc improvements to torch hub interface

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/750

Differential Revision: D16410986

Pulled By: myleott

fbshipit-source-id: 8ee6b4371d6ae5b041b00a54a6039a422345795e

* Move Masked LM components to legacy/ -- new ones are coming

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f

* Add fallback for SLURM config

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/752

Differential Revision: D16417582

Pulled By: myleott

fbshipit-source-id: 6b4289febcf9290452bb91f1f2181a02c09c82a7

* Fix --reset-meters

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/756

Differential Revision: D16418302

Pulled By: myleott

fbshipit-source-id: 62495a0bff41d1741e2b09807a3b43ff2c66c8fb

* Simplify hubconf

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/758

Differential Revision: D16418932

Pulled By: myleott

fbshipit-source-id: 59f005164b61b9fa712922eeb23525f7eec38f38

* Add new Datasets

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/757

Differential Revision: D16418305

Pulled By: myleott

fbshipit-source-id: 25f293a2792509f7a75c688e4bf8cff02e6bba2e

* Add new Masked LM task + criterion

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/761

Differential Revision: D16421335

Pulled By: myleott

fbshipit-source-id: 257d92c2b90361147642e2baa38486b4d18f6297

* Implement sparse transformer fixed attention pattern (#804)

Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adding an implementation of the sparse transformer to multi-head attention using the fixed attention pattern specified https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, a mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.

Four inputs are added to the config: sparse, is_bidirectional, stride, expressivity. If we are using the sparse transformer, is_bidirectional, stride, and expressivity must be specified (there are defaults). If is_bidirectional is False, values are masked using the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary.

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5
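A rough, unoptimized sketch of the bidirectional fixed pattern as described above: each position may attend within its own stride window plus the last `expressivity` positions (the summary) of every window, and everything else is set to -inf so it vanishes after softmax. The function name and loop structure are assumptions for illustration, not fairseq's implementation.

```python
import torch

def fixed_sparse_mask(seq_len, stride=4, expressivity=1):
    """Additive mask for the bidirectional 'fixed' attention pattern.

    Allowed positions get 0, disallowed positions get -inf (which becomes
    0 after softmax).  Hypothetical simplification of the paper's pattern.
    """
    mask = torch.full((seq_len, seq_len), float("-inf"))
    num_windows = (seq_len + stride - 1) // stride
    for i in range(seq_len):
        win = i // stride
        lo, hi = win * stride, min((win + 1) * stride, seq_len)
        mask[i, lo:hi] = 0.0                     # own stride window
        for w in range(num_windows):             # summary of every window
            end = min((w + 1) * stride, seq_len)
            mask[i, max(end - expressivity, 0):end] = 0.0
    return mask
```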

* Fix read_binarized.py script

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/762

Differential Revision: D16427266

Pulled By: myleott

fbshipit-source-id: 9bd9b8c6b4994ae98a62a37b34d03265bd365453

* Initializing mask as a tensor of ints (not long) (#875)

Summary:
Since mask really is a tensor of ints, this change should be mathematically
equivalent to the base.

On the other hand, this has performance implications for xla, hence the
pull request.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/875

Differential Revision: D16232877

Pulled By: myleott

fbshipit-source-id: e63175ee0016dcf0dfe10e2fd22570b8bbfbde84

* Update README.md

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/899

Differential Revision: D16448602

Pulled By: myleott

fbshipit-source-id: afd1a1b713274b6328150cd85d7f8a81833597aa

* check save_dir before beginning training

Summary: I sadly discovered that my checkpoint directory wasn't globally readable after 8 hours of training. Adding this check at the beginning of the train loop to keep that from happening again!

Reviewed By: myleott

Differential Revision: D16455394

fbshipit-source-id: 35959aa058150b2afb63710c468d01ebc8a12b0c
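The idea behind the check, as a minimal sketch: probe the save directory for writability before any training happens, so a permissions mistake fails in seconds rather than hours. `verify_checkpoint_dir` is a hypothetical helper illustrating the approach, not fairseq's exact code.

```python
import os

def verify_checkpoint_dir(save_dir):
    """Fail fast if the checkpoint directory cannot be written.

    Better to find out before training than 8 hours in.  Hypothetical
    helper illustrating the idea; not fairseq's exact check.
    """
    os.makedirs(save_dir, exist_ok=True)
    probe = os.path.join(save_dir, ".write_test")
    with open(probe, "w") as f:           # raises OSError if not writable
        f.write("ok")
    os.remove(probe)
```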

* Update torch.hub usage

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/770

Differential Revision: D16491911

Pulled By: myleott

fbshipit-source-id: 8dd2b76f8fa24183640ae9d1129ea47ded77d43d

* Standardize on 'teacher forcing' rather than 'input feeding' which is… (#769)

Summary:
Input feeding generally refers to a slightly different concept
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/769

Differential Revision: D16491898

Pulled By: myleott

fbshipit-source-id: 68573584e820f11f199db4e7e37e9ee7a69a3287

* Add RoBERTa README

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/778

Differential Revision: D16525447

Pulled By: myleott

fbshipit-source-id: e721e3a10e243a2408a04f89f06b5adbbe2fdff2

* Add return_all_hiddens flag to hub interface

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/909

Differential Revision: D16532919

Pulled By: myleott

fbshipit-source-id: 16ce884cf3d84579026e4406a75ba3c01a128dbd

* Fix compatibility with PyTorch 1.0.x (Fixes #906)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/910

Differential Revision: D16536532

Pulled By: myleott

fbshipit-source-id: 56bb5570e70b5670ad87c64d9dd20c64c1fa9f5c

* Make hub_utils.generator inherit from nn.Module

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/913

Differential Revision: D16536562

Pulled By: myleott

fbshipit-source-id: ce28642da6868ec884e3e416388a652977a062df

* Misc dataset improvements

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/911

Differential Revision: D16536559

Pulled By: myleott

fbshipit-source-id: 7fe495054ce5b7658b1d3a43eca38c5858360236

* Correctly zero padding index in TransformerSentenceEncoder

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/912

Differential Revision: D16536561

Pulled By: myleott

fbshipit-source-id: 54c5c20a826a14f4e690770e027bcb282acdf911

* Add Adamax optimizer

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/914

Differential Revision: D16536670

Pulled By: myleott

fbshipit-source-id: 8a41c98f0fb87af6c384cdade756e3eae2978a88

* Change default --num-workers to 1

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/779

Differential Revision: D16536673

Pulled By: myleott

fbshipit-source-id: bf56e9a81d3086f3d95a3273391dc5e04ed2dbc4

* Update BPE library code

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/780

Differential Revision: D16537567

Pulled By: myleott

fbshipit-source-id: 4e18c529959935e82ea122c3a2ee477308ffcbe3

* Add RoBERTa

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/916

Differential Revision: D16537774

Pulled By: myleott

fbshipit-source-id: 86bb7b1913a428ee4a21674cc3fc7b39264067ec

* Add instructions to load RoBERTa models on PyTorch 1.0

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/921

Differential Revision: D16541025

Pulled By: myleott

fbshipit-source-id: bb78d30fe285da2adfc7c4e5897ee01fa413b2e4

* Fix RoBERTa model import (fixes #918)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/920

Differential Revision: D16540932

Pulled By: myleott

fbshipit-source-id: b64438ad8651ecc8fe8904c5f69fa6111b4bed64

* Add missing files for RoBERTa hub interface

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/923

Differential Revision: D16541289

Pulled By: myleott

fbshipit-source-id: b3563a9d61507d4864ac6ecf0648672eaa40b5f3

* Update README.md to add top-p sampling (#783)

Summary:
Update README.md to include the recently implemented top-p/nucleus sampling.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/783

Differential Revision: D16543974

Pulled By: myleott

fbshipit-source-id: 27c502af10ee390d29607038118a99ff0067aec4

* Support different --max-positions and --tokens-per-sample

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/924

Differential Revision: D16548165

Pulled By: myleott

fbshipit-source-id: 49569ece3e54fad7b4f0dfb201ac99123bfdd4f2

* adding glue data preprocessing scripts (#771)

Summary:
1) Added glue data pre-processing script.
2) updated README with usage.

TODO:
1) releasing fairseq dictionary and remove hardcoded path.
2) remove hard-coded path for bpe-encoding.

myleott what do you recommend for above TODOs?
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/771

Reviewed By: myleott

Differential Revision: D16547679

Pulled By: myleott

fbshipit-source-id: 6a6562d9b6215523d048fdf3daee63ffac21e231

* Fix tokenization (fixes #926) (#929)

Summary:
Fixes https://github.com/pytorch/fairseq/issues/926
Pull Request resolved: https://github.com/pytorch/fairseq/pull/929

Differential Revision: D16560281

Pulled By: myleott

fbshipit-source-id: 751051bcdbf25207315bb05f5bee0235d21be627

* Relicense fairseq under MIT license (#786)

Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034

* 1) replaced fstring 2) fixed error from max-positions arg

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/787

Differential Revision: D16562052

fbshipit-source-id: 640e30b2378ec917d60092558d3088a77f9741cb

* Add roberta.decode to hub interface to decode BPE (#931)

Summary:
Fixes https://github.com/pytorch/fairseq/issues/930.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/931

Differential Revision: D16562511

Pulled By: myleott

fbshipit-source-id: c4c07e2f067326b79daa547dcb3db84aeddbd555

* Wmt19 models (#767)

Summary:
Release of the WMT 19 pretrained models
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/767

Reviewed By: edunov

Differential Revision: D16472717

Pulled By: nng555

fbshipit-source-id: acf0fa3548c33f2bf2b5f71e551c782ad8c31a42

* Use commandline interface in preprocess_GLUE_tasks.sh (#937)

Summary:
Just a small fix for issue https://github.com/pytorch/fairseq/issues/936 .
Pull Request resolved: https://github.com/pytorch/fairseq/pull/937

Differential Revision: D16580263

Pulled By: myleott

fbshipit-source-id: 1777e782491c63697726e95bd555892da3fed4ec

* Update language_model README.md (#941)

Summary:
Adding a backslash in the convolutional language model training usage.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/941

Differential Revision: D16581388

Pulled By: myleott

fbshipit-source-id: 7e2e05ecf13e86cb844dc5200d49f560c63b12ff

* Roberta add classification finetuning example readme (#790)

Summary:
Added a readme for IMDB classification as a tutorial for custom fine-tuning of RoBERTa
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/790

Reviewed By: myleott

Differential Revision: D16587877

Pulled By: myleott

fbshipit-source-id: ed265b7254e6fa2fc8a899ba04c0d2bb45a7f5c4

* Fix citation errors (#791)

Summary:
Fixing booktitle in wmt19 citation
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/791

Reviewed By: myleott

Differential Revision: D16589372

Pulled By: nng555

fbshipit-source-id: 28402784bb6ef0615e46b8d8383bfa52d79e46de

* Fix small syntax error in hub_utils.py (fixes #942)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/944

Differential Revision: D16593568

Pulled By: myleott

fbshipit-source-id: 611bccae2ad0b8dc704c47a8a3343161010c2356

* Update PyTorch Hub interface

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/782

Differential Revision: D16542256

Pulled By: myleott

fbshipit-source-id: ea3279e7a1ce4687a5914f32b76787c419be1ffa

* Fix sampling with beam>1

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/792

Differential Revision: D16591987

Pulled By: myleott

fbshipit-source-id: d27c490ae75f80ded19226b8384f4776485dd694

* Changed tensor comparison return type from uint8 to bool (#21113)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21113
ghimport-source-id: 9c4ba63457a72bfc41894387e0b01be3fd9a9baf

Test Plan: Imported from OSS

Differential Revision: D15552204

Pulled By: izdeby

fbshipit-source-id: a608213668649d058e22b510d7755cb99e7d0037

* Add more details for bulk BPE encoding

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/793

Differential Revision: D16603930

Pulled By: myleott

fbshipit-source-id: b302db3743db4f36c14fb0dc7f3456fe8a0079dd

* Use ==/!= to compare str, bytes, and int literals (#948)

Summary:
Identity is not the same thing as equality in Python.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/948

Differential Revision: D16608269

Pulled By: myleott

fbshipit-source-id: be203d62e7824c96c59400d1b342196adb89a839
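The distinction behind this fix, as a minimal illustration (not code from the patch itself):

```python
# Identity (`is`) compares object ids; equality (`==`) compares values.
a = "".join(["str", "ing"])   # builds a new str object at runtime
b = "string"

print(a == b)    # True: equal values
# `a is b` depends on CPython interning, an implementation detail,
# so `is` must never be used to compare str, bytes, or int values.

n = 10 ** 10
m = 10 ** 10
print(n == m)    # True, while `n is m` is unreliable for large ints
```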

* Fix wmt19 links (#796)

Summary:
fix links to .tar.gz vs .tar.bz2
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/796

Reviewed By: myleott

Differential Revision: D16611740

Pulled By: nng555

fbshipit-source-id: 76210484225ed917ff14ef626845680d918948f5

* Update beam search code to support torch.bool change

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/797

Differential Revision: D16617067

Pulled By: myleott

fbshipit-source-id: 52e3aeb98d6e3b55ff9154b784028bf13eabfe38

* Update READMEs for torch.hub

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/795

Differential Revision: D16620488

Pulled By: myleott

fbshipit-source-id: 1998a9ccd8816fc7f590861fb4898f910a36bc1e

* Add single-models for WMT'19 for hub tutorial

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/800

Differential Revision: D16621509

Pulled By: myleott

fbshipit-source-id: d3e8e97d30bcafbc35c3f67cd8bbc657b6fa5fe7

* Fewer torch.hub requirements (#959)

Summary:
We will raise exceptions if these are needed and aren't available. Only keep the minimum set of requirements.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/959

Differential Revision: D16623304

Pulled By: myleott

fbshipit-source-id: 8e65253742e393b527e8396a9433e64ebec9bb55

* Avoid cast in PositionalEmbeddings to fix BLEU drop in pytorch native export

Summary:
Tracing mode doesn't generalize correctly in positional embedding calculation, which caused -5 BLEU at transformer export when using pytorch native.

Details: The original issue was that in ensemble_export, _to_tensor(x) in scripting mode turns an integer x into the 1-d tensor torch.tensor([x]), not the 0-d tensor (scalar x) expected by the embedding. So the return value of the embedding's forward() was actually of the wrong shape: when self.weights is of size [x, y], the return value should be (bsz, y, 1) but it was (bsz, 1, y), which caused problems in downstream computation. Tracing only became an issue when I used pos = timestep.view(-1)[0] to fix the shape: casting the scalar to a primitive int for use as an index is not generalizable by tracing mode. Thus I needed to convert everything to tensors and replace the advanced indexing with the index_select operator.

In summary, less-understood features on both the scripting and tracing sides caused the BLEU drop. :)

Reviewed By: myleott

Differential Revision: D16623025

fbshipit-source-id: 0c7a2c3eafbd774760a5c880c6034009ee084abb
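The index_select replacement described above can be illustrated with a small sketch; the function and variable names here are assumptions for illustration, not the fairseq code.

```python
import torch

def positional_row(weights, timestep):
    """Select one row of an embedding table in a trace-friendly way.

    Casting the position to a Python int and using advanced indexing
    (`weights[int(pos)]`) bakes the index into a trace as a constant;
    keeping it a tensor and using index_select generalizes across inputs.
    Illustrative sketch of the idea only, not the fairseq implementation.
    """
    pos = timestep.view(-1)[:1]           # 1-d tensor, no Python int cast
    return weights.index_select(0, pos)   # shape (1, embed_dim)
```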

* Fix generating with a fixed prefix

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/801

Differential Revision: D16628318

Pulled By: myleott

fbshipit-source-id: 50e93bb9108afd2ba90f1edd4f34306a7c9964a4

* remove default params from args so architecture works properly

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/798

Reviewed By: myleott

Differential Revision: D16619502

Pulled By: alexeib

fbshipit-source-id: af20c90c4522458850d8f42cab001259ef4293cc

* Add doc string for Roberta.encode function

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/969

Differential Revision: D16642388

Pulled By: myleott

fbshipit-source-id: c5b1655dbddb697822feefa433f33f6bb08253ab

* fixed roberta finetuning with --find-unused-parameters on multiGPU

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/806

Differential Revision: D16649933

fbshipit-source-id: 6eeda6e2caf8019228e3efc0c27ddfcc3c4d8674

* Add back set_epoch functionality lost in RoBERTa merge

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/982

Differential Revision: D16668353

Pulled By: myleott

fbshipit-source-id: 699243d6c028c47cd0e3f801d89051b3f919b17e

* Add code to realign RoBERTa features to word-level tokenizers

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/805

Differential Revision: D16670825

Pulled By: myleott

fbshipit-source-id: 872a1a0274681a34d54bda00bfcfcda2e94144c6

* Fix tests and GLUE finetuning (fixes #989)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/991

Differential Revision: D16687970

Pulled By: myleott

fbshipit-source-id: d877fc16891a8ab97aec47a8d440baa56c2b5f46

* Added mask_fill api and some examples in README (#807)

Summary:
1) This currently works only for a single `<mask>` token; for multiple masks we might have to look more into the order of factorization.
2) This is currently only for single BPE token
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/807

Differential Revision: D16674509

fbshipit-source-id: 0a020030ee5df6a5115e5f85d5a9ef52b1ad9e1c

* fixed reloading from checkpoint (#811)

Summary:
Tested by starting training from (a) `roberta.large`, (b) `roberta.large.mnli`, (c) `checkpoints/checkpoint_last.pt`
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/811

Reviewed By: myleott

Differential Revision: D16689528

Pulled By: myleott

fbshipit-source-id: 849d72ede9d526c34b4753c1bffd689554d1f837

* Asr initial push (#810)

Summary:
Initial code for speech recognition task.
Right now only one ASR model added - https://arxiv.org/abs/1904.11660

unit test testing:
python -m unittest discover tests

also run model training with this code and obtained
5.0 test_clean | 13.4 test_other
on librispeech with pytorch/audio features
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/810

Reviewed By: cpuhrsch

Differential Revision: D16706659

Pulled By: okhonko

fbshipit-source-id: 89a5f9883e50bc0e548234287aa0ea73f7402514

* Integrate with Apache Arrow/Plasma in-memory store for large datasets (#995)

Summary:
Datasets with many examples can generate very large indexes in TokenBlockDataset (and possibly elsewhere). When using `--num-workers>0` these indexes are pickled and transferred via a multiprocessing pipe, which is slow and can fail if the index grows beyond 4GB (~0.5B examples). Apache Arrow has an in-memory store called Plasma that will offload these arrays to shared memory, which both reduces duplication of the data and avoids needing to pickle.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/995

Differential Revision: D16697219

Pulled By: myleott

fbshipit-source-id: 1b679ee5b3d2726af54ff418f6159a3671173fb8
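The mechanism described above can be sketched with the stdlib rather than the actual pyarrow/Plasma API: the large index array lives in shared memory once, and workers attach to it by name instead of pickling gigabytes through a multiprocessing pipe. This is an illustration of the shared-memory idea only, not the code from this PR.

```python
import numpy as np
from multiprocessing import shared_memory

# Build a large index once and place it in shared memory.
index = np.arange(1_000_000, dtype=np.int64)

shm = shared_memory.SharedMemory(create=True, size=index.nbytes)
shared = np.ndarray(index.shape, dtype=index.dtype, buffer=shm.buf)
shared[:] = index                      # one copy into shared memory

# A worker process would attach by shm.name; shown here in-process:
attached = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(index.shape, dtype=np.int64, buffer=attached.buf)
result = int(view[-1])                 # read without any pickling

del view, shared                       # release buffer exports...
attached.close()                       # ...before unmapping
shm.close()
shm.unlink()
```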

* replace 'mkdir' with 'mkdir -p' (#997)

Summary:
Allow shell script to create sub directories with -p flag. Amends readme file too.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/997

Differential Revision: D16710813

Pulled By: myleott

fbshipit-source-id: 89abefa27e8fac99d212fc9b7b0dbc3690043ba0

* added superglue dev set results to readme

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/815

Differential Revision: D16733633

fbshipit-source-id: 0a5029e41b6dbb9fb28e9703ad057d939d489d90

* MacOS requires c++ flag (#1000)

Summary:
To install on MacOS, `-stdlib=libc++` needs to be specified.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1000

Differential Revision: D16733819

Pulled By: myleott

fbshipit-source-id: 7a1ed11e2b4e1071e61c64c379c84f72e02ad2b5

* added sentence ranking task and loss (#809)

Summary:
This task and loss are used for sentence ranking and multiple choice tasks such as RACE
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/809

Reviewed By: myleott

Differential Revision: D16715745

Pulled By: jingfeidu

fbshipit-source-id: cb4d1c7b26ebb3e2382449ba51af5745ef56f30f

* Fix Python 3.5 compat

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1005

Differential Revision: D16751489

Pulled By: myleott

fbshipit-source-id: 6e372ac23643e32a3791044c13f4466bdc28f049

* Add WSC task and criterion

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1004

Differential Revision: D16751443

Pulled By: myleott

fbshipit-source-id: f70acd6c7be6d69da45b5b32fe4c4eff021539ab

* Fix torch.hub for MNLI

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1006

Differential Revision: D16753078

Pulled By: myleott

fbshipit-source-id: 970055632edffcce4e75931ed93b42a249120a4a

* Update --restore-file logic (partially fixes #999)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1007

Differential Revision: D16762490

Pulled By: myleott

fbshipit-source-id: d67137bcf581887850323d188bb4ea643a35ac9e

* Remove LAMB optimizer (at least until we can test it more)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1008

Differential Revision: D16763315

Pulled By: myleott

fbshipit-source-id: d4bad8384eec273f2d5de4ed29fb8d158ab9187c

* Lint

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/817

Differential Revision: D16762905

Pulled By: myleott

fbshipit-source-id: d920595bec44ed26b72dfc6fbc15c0aa107b4e56

* Minor fixes for RACE finetuning (#818)

Summary:
- remove unnecessary extra spaces in RACE data in preprocessing
- fix finetuning instructions (add `--truncate-sequence` and add `--dropout` params)
- close file handle in SentenceRankingTask
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/818

Differential Revision: D16770055

Pulled By: myleott

fbshipit-source-id: 2c80084e92cdf8692f2ea7e43f7c344c402b9e61

* ignore files starting with . e.g. .ipynb_checkpoints (#819)

Summary:
A .ipynb_checkpoints folder inside a models folder crashed importlib; now there is a check for this.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/819

Differential Revision: D16772192

Pulled By: myleott

fbshipit-source-id: 01c956aef4ed312bc7645c31c83dbf98af89d931

* fix cosine scheduler docstring

Summary: as title

Reviewed By: myleott

Differential Revision: D16773845

fbshipit-source-id: 2d10e197c31f94d894430559327289a4d03e33f7

* added readme code for inference with GLUE finetuned model

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/820

Differential Revision: D16783469

fbshipit-source-id: d5af8ba6a6685608d67b72d584952b8e43eabf9f

* Add Commonsense QA task

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1014

Differential Revision: D16784120

Pulled By: myleott

fbshipit-source-id: 946c0e33b594f8378e4ab6482ce49efcb36e1743

* Add fairseq-validate

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/765

Differential Revision: D16763357

Pulled By: myleott

fbshipit-source-id: 758b03158e486ee82786e2d5bf4e46073b50c503

* Updates for PyTorch 1.2 masking/bool behavior

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/821

Differential Revision: D16790120

Pulled By: myleott

fbshipit-source-id: 2fb5070172636561d08596a29f08c93df07548bf

* Fix tests

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/822

Differential Revision: D16800078

Pulled By: myleott

fbshipit-source-id: b86e08e01f2fe13c64b77f1d23a5f6800f252bf7

* v0.7.2 -> v0.8.0 (#1017)

Summary:
Changelog:
- Relicensed under MIT license
- Add RoBERTa
- Add wav2vec
- Add WMT'19 models
- Add initial ASR code
- Changed torch.hub interface (`generate` renamed to `translate`)
- Add `--tokenizer` and `--bpe`
- `f812e52`: Renamed data.transforms -> data.encoders
- `654affc`: New Dataset API (optional)
- `47fd985`: Deprecate old Masked LM components
- `5f78106`: Set mmap as default dataset format and infer format automatically
- Misc fixes for sampling
- Misc fixes to support PyTorch 1.2
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017

Differential Revision: D16799880

Pulled By: myleott

fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7

* Update READMEs

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/823

Differential Revision: D16804995

Pulled By: myleott

fbshipit-source-id: abac5dc0ed6b7bfe2309ba273456e54b37340b2c

* initial light and dynamic convolution kernels (#547)

Summary:
CUDA code for light/dynamicconv kernels, including pytorch modules. Modules can be built by running setup.py in each respective folder, and can then be imported and used like any other module.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/547

Reviewed By: myleott, shubho

Differential Revision: D15703660

Pulled By: nng555

fbshipit-source-id: e9c913753be3a1cd571965f7200df6678b644520

* added effcient wsc task/criterion for winogrande (#825)

Summary:
1) So far getting `78%` on the winogrande validation dataset compared to `63.5%` in the paper.
2) Will update the readme once everything is finalized.

Questions:

1) Should I call it `binary_wsc_task` instead of `winogrande`, to be generic rather than specific to one dataset?
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/825

Differential Revision: D16810159

fbshipit-source-id: cfde73561fa4caaaa63a4773c0aecd12ce1fa518

* Update README

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/826

Differential Revision: D16830402

Pulled By: myleott

fbshipit-source-id: 25afaa6d9de7b51cc884e3f417c8e6b349f5a7bc

* Backward reranking public (#667)

Summary:
Implementation of noisy channel model reranking for release with paper
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/667

Reviewed By: michaelauli

Differential Revision: D15901665

Pulled By: nng555

fbshipit-source-id: 2de2c518be8e5828ffad72db3e741b0940623373

* Update README

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/827

Differential Revision: D16833252

Pulled By: myleott

fbshipit-source-id: 8eded8cc651002dfd60869fc2383d305ed335d3a

* BMUF Resetting local state param

Summary:
BMUF
1) Resetting BMUF parameters after warmup.
2) Resetting local param state after warmup.
3) Allowing user to pass block momentum value instead of gpu derived Block Momentum.

Reviewed By: skritika, mrshenli

Differential Revision: D16692026

fbshipit-source-id: d02eaf29d0e4b37007418166ec937d4bf5fe6aca

* added hf bert bpe

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/829

Differential Revision: D16856693

fbshipit-source-id: 545bbf4815f5c40e72a6ed241312a51dc90e34a1

* added check in token block dataset for multiple consecutive blank lines

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/830

Differential Revision: D16861799

fbshipit-source-id: d85deaf78ec5b9c23eafd4145a96252e3901fa22

* implement tri-stage lr_scheduler (#1028)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1028

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/831

The tri-stage lr-scheduler consists of 3 stages: 1. warmup; 2. hold; 3.
(exponential) decay; used in https://arxiv.org/pdf/1904.08779.pdf

Reviewed By: myleott

Differential Revision: D16806206

fbshipit-source-id: 40e472ec382449a0fb711f8ee980f14d27d2114a
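The three stages can be sketched in a few lines of Python; the parameter names below are illustrative, not the scheduler's actual command-line options:

```python
import math

def tri_stage_lr(step, peak_lr=1e-3, init_lr_scale=0.01, final_lr_scale=0.05,
                 warmup_steps=1000, hold_steps=2000, decay_steps=3000):
    """Sketch of a tri-stage schedule: linear warmup, hold, exponential decay."""
    init_lr = init_lr_scale * peak_lr
    if step < warmup_steps:  # stage 1: linear warmup from init_lr to peak_lr
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    step -= warmup_steps
    if step < hold_steps:    # stage 2: hold at peak_lr
        return peak_lr
    step -= hold_steps
    # stage 3: exponential decay toward final_lr_scale * peak_lr
    decay_factor = -math.log(final_lr_scale) / decay_steps
    return peak_lr * math.exp(-decay_factor * min(step, decay_steps))
```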

* Fix bug (the returned value has a dimension mismatch) in label-smoothed-cross-entropy for MoE (#1037)

Summary:
MoE will encounter a dimension mismatch bug when using label-smoothed cross entropy as the criterion, which occurs at [https://github.com/pytorch/fairseq/blob/master/fairseq/tasks/translation_moe.py#L125](url). This is a fix to the bug.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1037

Differential Revision: D16892674

Pulled By: myleott

fbshipit-source-id: a73bc03d2280356667d02422d22ad11d968d0c65

* remove shlex.quote in scripts/spm_train.py (#972)

Summary:
to resolve the issue https://github.com/pytorch/fairseq/issues/971
Pull Request resolved: https://github.com/pytorch/fairseq/pull/972

Differential Revision: D16892827

Pulled By: myleott

fbshipit-source-id: baf277961f1e292f4593eefe31e3541aa9d0d8c4

* add constraints when checking multiple consecutive blank lines (#1031)

Summary:
The new assertion causes a runtime error on some standard datasets (e.g. wikitext-103).

Details:
After preprocessing the wikitext-103 data with the current master branch, running fairseq-train produces the following error:
```bash
Traceback (most recent call last):
  File "/home/trinkle/.local/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/data/git/Transformer/fairseq/fairseq_cli/train.py", line 321, in cli_main
    main(args)
  File "/data/git/Transformer/fairseq/fairseq_cli/train.py", line 46, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=0)
  File "/data/git/Transformer/fairseq/fairseq/tasks/language_modeling.py", line 167, in load_dataset
    break_mode=self.args.sample_break_mode, include_targets=True,
  File "/data/git/Transformer/fairseq/fairseq/data/token_block_dataset.py", line 54, in __init__
    "Found multiple blank lines in the dataset, please remove them"
AssertionError: Found multiple blank lines in the dataset, please remove them (eg. cat -s raw.txt) and preprocess the data again.
```

It's because these datasets contain multiple blank lines. The assertion was added in https://github.com/pytorch/fairseq/commit/851c022610b27da3beaa4e40a6834b5fb3b44f44; however, a hard assertion is not a good way to handle this.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1031

Differential Revision: D16892942

Pulled By: myleott

fbshipit-source-id: 90c41b7d98a7b78f506bb57320f9f6b901e05d5b

* Add instructions to resume training from released RoBERTa models (fixes #1034)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1041

Differential Revision: D16904073

Pulled By: myleott

fbshipit-source-id: 22e5e25a15f7a0b6f2d827d98c953a6cec07610e

* Small fixes

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/835

Differential Revision: D16904038

Pulled By: myleott

fbshipit-source-id: 2c9d0b913f8d688297ac80fcabd905bd1397f66a

* Back out "[fairseq][PR] Fix bug (the returned value has a dimension mismatch) in label-smoothed-cross-entropy for MoE" (#837)

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/837

Original commit changeset: a73bc03d2280

Differential Revision: D16904372

fbshipit-source-id: b4c4047b2686ba47258cdf0783059726134c920a

* Fix method has same name as property

Summary:
Training is failing sometimes because `self.collater` can be both method and property for AsrDataset
https://github.com/pytorch/fairseq/issues/1036

Reviewed By: jcai1

Differential Revision: D16919945

fbshipit-source-id: b34ba54e4dae315b7c723996610a348a8e3031af

* Give path when checkpoint can't be found (#1040)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1040

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/836

Reviewed By: myleott, liezl200

Differential Revision: D16889252

fbshipit-source-id: 45a1b6c1217fb099f0350096e38e1c7d83ea0a64

* vggblock support without pooling and pooling_kernel_size missing self (#839)

Summary:
1) VggBlock was not supported if the pooling kernel size was None.
2) Since we modify the pooling kernel size using _pair, we should use self.pooling_kernel_size. That said, it doesn't matter in practice, as PyTorch is robust to this.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/839

Differential Revision: D16934112

Pulled By: okhonko

fbshipit-source-id: b6b95163b0e7f7203d76d535f01a41912382bdc3

* Multiset (#838)

Summary:
Adds ability to tag individual examples with the names of their datasets, along with some minor miscellaneous fixes and improvements
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/838

Differential Revision: D16919175

Pulled By: alexeib

fbshipit-source-id: 4bf493299645bae63f3ee6382e15f18a9f73666c

* Parameterized criterions (#808)

Summary:
Support criterion with parameters, such as AutoSegmentationCriterion (ASG) used in wav2letter which has a transition matrix parameter. This is needed to integrate wav2letter's ASG into PySpeech.

With this diff, parameters in criterions will be:
(1) updated by optimizers, with a configurable learning rate
(2) saved and loaded from checkpoints, preserving backward compatibility for criterions without parameters
(3) synchronized across nodes in distributed training.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/808

Reviewed By: jcai1

Differential Revision: D16934097

Pulled By: okhonko

fbshipit-source-id: 121ec9382459385c6f9cbef3a8274bec1a434038

* fix string format to work in python 3.5 (#1050)

Summary:
change string format in fairseq/data/subsample_dataset.py#20
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1050

Differential Revision: D16946060

Pulled By: okhonko

fbshipit-source-id: 0eabf22e7ffd4f658b6d18c87dc6e59c81a355c7

* Misc changes

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/840

Differential Revision: D16947645

Pulled By: myleott

fbshipit-source-id: e869789bc22bbf5cb08d9adfa44f9fc09b3805af

* Add links to cuda models (#828)

Summary:
Add links to pre-trained cuda models in pay less attention
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/828

Reviewed By: michaelauli

Differential Revision: D16833577

Pulled By: nng555

fbshipit-source-id: 1556aa77fd87ea259812de8ef65963257c370f9b

* Fix year in noisy channel citation (#842)

Summary:
2018->2019
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/842

Differential Revision: D16973530

Pulled By: nng555

fbshipit-source-id: 00207b79821ac0257a53a0581a84582130e1bff5

* wav2vec everstore support

Summary: changes for internal support

Differential Revision: D16646887

fbshipit-source-id: ac5bf6c32901819726249422324eae32a0a6e148

* Cythonize token block dataset (#834)

Summary:
Cythonized token block dataset code, it's `> 100x` faster. Token block for entire `bookwiki+CC+stories+openweb` is just ~`39.9` seconds.

TODO:
1) I think I can make it 2x faster still.
2) Cleanup.

EDIT History:
~~First pass at parallelizing `token_block_dataset`. The code feels somewhat complicated and cluttered.
This is 2-3x faster though in my tests on the `bookwiki` dataset with both `complete` and `complete_doc` modes.
myleott Can you take a look for correctness, as I am still not 100% sure that I am not missing corner cases.~~
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/834

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Test workflow: f133816198

Reviewed By: myleott

Differential Revision: D16970257

Pulled By: myleott

fbshipit-source-id: ec45a308193c9e9f3e7075336c15df4723228d6f

* Suppress leaked semaphore warnings

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/844

Differential Revision: D16985131

Pulled By: myleott

fbshipit-source-id: 66ba3b9aa0cdf329a1e38fc09786f34906afdb43

* fix cython dependency in the setup (#847)

Summary:
Fixes broken build for `pytext` https://github.com/pytorch/fairseq/commit/4fc39538aec5141aa41f5d6d7dc0097e7c0f7b48

Earlier version of setup tools required `cython` to be installed before even starting setup.py. This one fixes it.
More details: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#180
and https://stackoverflow.com/questions/37471313/setup-requires-with-cython
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/847

Differential Revision: D16997450

fbshipit-source-id: 5f65026c228a1b94280ca73937078ee3e21ce4f8

* wav2vec everstore support fix

Summary: fixes some merge issues that prevented wav2vec from training properly

Reviewed By: myleott

Differential Revision: D16981120

fbshipit-source-id: cad39aaf2f44daabcbafe7b4e8735d055b3842a7

* installing numpy headers for cython

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/848

Differential Revision: D17060283

fbshipit-source-id: c7e61cae76a0566cc3e2ddc3ab4d48f8dec9d777

* Minor update of README.md of language model example (#1063)

Summary:
With this white space, the command might fail.
```
fairseq-preprocess: error: unrecognized arguments:
zsh: command not found: --destdir
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1063

Differential Revision: D17072516

Pulled By: myleott

fbshipit-source-id: 68bb9d05b40b215b18aceac2bff3f5ec1ef2f537

* Minor cleanup for setup.py

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1078

Differential Revision: D17072514

Pulled By: myleott

fbshipit-source-id: 69a8c8c9cc7caa7e04c414329a5d79e6e1a6621c

* use numpy function for filter by size when possible (#845)

Summary:
For the general masked language modeling use case, this is much faster (`1 sec` instead of `3 minutes`).

Let me know what you think about it, myleott. If you don't like all the special-case checking, we can think about reorganizing the dataset APIs to always have `sizes` as a property calculated in `__init__`.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/845

Reviewed By: myleott

Differential Revision: D16993769

Pulled By: myleott

fbshipit-source-id: 161bba62af2965190c07c47e838ee967cb886e88
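The speedup comes from replacing a per-example Python loop with a single vectorized numpy comparison; a minimal sketch of the idea (function name and signature are illustrative, not fairseq's API):

```python
import numpy as np

def filter_indices_by_size(sizes, max_size):
    # vectorized: one boolean comparison over the whole sizes array,
    # instead of a Python-level loop over every example
    sizes = np.asarray(sizes)
    return np.arange(len(sizes))[sizes <= max_size]
```

For a dataset whose `sizes` are already stored as a numpy array, this is effectively free compared to iterating in Python.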

* Fix multi-gpu training (fixes #1088)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1089

Differential Revision: D17108918

Pulled By: myleott

fbshipit-source-id: 818c77a5bbf3b146028991aca64d79b93f144b28

* Adopt Contributor Covenant

Summary:
In order to foster healthy open source communities, we're adopting the
[Contributor Covenant](https://www.contributor-covenant.org/). It has been
built by open source community members and represents a shared understanding of
what is expected from a healthy community.

Reviewed By: josephsavona, danobi, rdzhabarov

Differential Revision: D17104640

fbshipit-source-id: d210000de686c5f0d97d602b50472d5869bc6a49

* set numpy seed explicitly + other minor fixes (#850)

Summary:
Not setting the numpy seed explicitly at the beginning was an extremely annoying bug to find: it caused different gpus to have a different view of the data if some randomization was used in the dataset (e.g. subsample dataset).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/850

Differential Revision: D17085006

Pulled By: alexeib

fbshipit-source-id: 62bb2116369fb703df878e6bc24c06f1ea4e75a0

* add missing colorize dataset

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/851

Differential Revision: D17145769

Pulled By: alexeib

fbshipit-source-id: 9dd26799d044ae5386e8204a129b5e3fc66d6e85

* Improve support for `python setup.py build_ext --inplace`

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/852

Differential Revision: D17147452

Pulled By: myleott

fbshipit-source-id: 5fd9c7da3cc019c7beec98d41db1aef1329ee57a

* Cleaner handling of numpy-based extensions in setup.py

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/853

Differential Revision: D17147879

Pulled By: myleott

fbshipit-source-id: b1f5e838533de62ade52fa82112ea5308734c70f

* fixed numpy based size filtering (#854)

Summary:
This bug got introduced in my [commit](https://github.com/fairinternal/fairseq-py/commit/9624f9651478bcb88022decf7e1b0685b410133b) for fast numpy based size filtering.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/854

Differential Revision: D17150350

fbshipit-source-id: cb564119543e116d6a17784d1c22e9bce7059a0c

* Fix an error in the command about Hierarchical Neural Story Generation (#1099)

Summary:
When trying to reproduce the experiment in _Hierarchical Neural Story Generation_, I found that the generation command could not be executed.

It fails with **fairseq-generate: error: unrecognized arguments: --sampling-temperature 0.8**
In the document, I find:
```
--temperature   temperature for generation
Default: 1.0
```
There is no parameter named `--sampling-temperature`, so it should be changed to `--temperature`.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1099

Differential Revision: D17163065

Pulled By: myleott

fbshipit-source-id: 25c430eeee4703f8ec30353825ffec4bb973da0d

* added cython to install_requires

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/856

Reviewed By: myleott

Differential Revision: D17162411

Pulled By: myleott

fbshipit-source-id: e70ecc802398bbba2b5326e9700f2121c422fd18

* Fix multilingual translation bug for to-many case

Summary:
The logic for adding decoder side language token was wrongly implemented.
The way we inject the language token is by replacing the eos symbol with language token symbol. However, the parameter for source / target eos symbol was not set correctly.

Reviewed By: tangyuq

Differential Revision: D17129108

fbshipit-source-id: 6fae385b787370656fd7ca7ab74e6bb91fe5463b

* Return predicted token for RoBERTa filling mask

Summary:
Added the `predicted_token` to each `topk` filled output item

Updated RoBERTa filling mask example in README.md

Reviewed By: myleott

Differential Revision: D17188810

fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c

* Average local optimizer param after warmup and during bmuf sync

Summary: We have seen that averaging the local params, instead of doing a reset or broadcast after warmup, improves WER.

Reviewed By: skritika

Differential Revision: D16739278

fbshipit-source-id: 75033d2d25f9a88fd6dd325d0d9d4c856d22d947

* added fast stats sync option (#858)

Summary:
Added `--fast-stat-sync` option.
This avoids pickling and achieves `~7%` higher `wps` on 16 nodes.
It is less flexible, as it aggregates only basic stats and ignores the aggregate function defined by the criterion.

Let me know what you think myleott
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/858

Differential Revision: D17398770

fbshipit-source-id: 36261a1d970e67deeda8211af8f009ef9b4f9c14
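The idea is to pack a fixed, known set of scalar stats into a flat vector and sum it elementwise across workers, instead of pickling arbitrary logging dicts. A pure-Python sketch (the key names are illustrative, and plain summation stands in for an all_reduce over a tensor):

```python
STAT_KEYS = ("loss", "ntokens", "nsentences", "sample_size")  # fixed order

def pack_stats(logging_output):
    # pack basic stats into a flat list in a fixed, known order
    return [float(logging_output.get(k, 0)) for k in STAT_KEYS]

def reduce_stats(all_worker_vectors):
    # elementwise sum across workers (stands in for an all_reduce)
    return [sum(vals) for vals in zip(*all_worker_vectors)]

w0 = pack_stats({"loss": 2.0, "ntokens": 100, "sample_size": 4})
w1 = pack_stats({"loss": 3.0, "ntokens": 120, "sample_size": 4})
total = dict(zip(STAT_KEYS, reduce_stats([w0, w1])))
```

Because every worker contributes a vector of the same fixed layout, the reduction is a cheap numeric sum, with no serialization of Python objects.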

* Update README.md

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1140

Differential Revision: D17431506

Pulled By: myleott

fbshipit-source-id: b47dae303d7e76daa5b49795476b5e48d7b090ad

* Fix link to RACE fine-tuning instructions.

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1125

Differential Revision: D17431557

Pulled By: myleott

fbshipit-source-id: f712e5355d8dbb0a8f1170674d62e2b6880295b4

* don't project masked tokens for mlm loss (#859)

Summary:
This saves ~4-5gb gpu memory while training roberta large with `seq_len=512`.

I am able to fit `--max-sentences=16` on `volta32gb` for `roberta-large`
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/859

Differential Revision: D17435814

fbshipit-source-id: 2663909768fac0ef0102107613770ee01b1f8c00
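The memory saving comes from applying the large vocabulary projection only at masked positions, rather than at every token; a minimal sketch of the idea (names are illustrative):

```python
def mlm_project(features, masked_positions, project):
    # gather features at masked positions, then run only those through
    # the (expensive) vocabulary projection
    return [project(features[i]) for i in masked_positions]
```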

* Minor fix to make adafactor work for >2d conv kernels (#1122)

Summary:
A `.unsqueeze(-1)` was missing in line 124;
without this change we encounter a runtime error for >2d convolutional kernels. With this fix, adafactor's 2d logic is applied to the two final dimensions.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1122

Differential Revision: D17431662

Pulled By: myleott

fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36

* Add autogenerated cython files to gitignore (#860)

Summary:
`python setup.py build_ext --inplace` generates C++ source files directly in the Python source tree. They should most likely be ignored by git.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/860

Differential Revision: D17460597

Pulled By: jma127

fbshipit-source-id: 72a29d438ebb57627b68ec7e9a2a77c8a36f1c21

* Add cython language_level hints

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1147

Differential Revision: D17468447

Pulled By: myleott

fbshipit-source-id: 0dbac04b92c8df74ad991d5e92cd02036d662369

* Add dataset class for weighted sampling with replacement. (#861)

Summary:
As discussed with Naman earlier today. Weighted sampling with
replacement can be done on a per-epoch basis using `set_epoch()`
functionality, which generates the samples as a function of random seed
and epoch.

Additionally, `FairseqTask` needs to set the starting epoch for the
dataset at the very beginning of iterator construction.

Not yet implemented is the per-epoch iterator construction, which
is necessary to actually regenerate the batches for each epoch.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861

Differential Revision: D17460687

Pulled By: jma127

fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
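A sketch of the `set_epoch()` idea: the samples are a deterministic function of (seed, epoch), so every worker regenerates the same weighted sample with replacement each epoch. Class and attribute names below are illustrative, not the actual fairseq dataset API:

```python
import random

class EpochWeightedSampler:
    """Weighted sampling with replacement, regenerated per epoch."""

    def __init__(self, dataset, weights, num_samples, seed=0):
        self.dataset, self.weights = dataset, weights
        self.num_samples, self.seed = num_samples, seed
        self.set_epoch(0)

    def set_epoch(self, epoch):
        # deterministic per (seed, epoch): all workers draw identical samples
        rng = random.Random(self.seed + epoch)
        self.indices = rng.choices(range(len(self.dataset)),
                                   weights=self.weights, k=self.num_samples)

    def __getitem__(self, i):
        return self.dataset[self.indices[i]]
```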

* added multilingual masked LM training (#849)

Summary:
The multilingual RoBERTa training is working with aconneau's XLM data.

Two pieces remaining:

1) `XLM` limits batches to be from the same language. I am not 100% sure about the reason, but it should be easy to implement: basically we can add a `batch_by_size_and_language` function instead of the default `batch_by_size`. If it's not critical, I would want to leave it out, as that keeps the code very clean and simple.

2) `sample_ratio` in `ConcatDataset` works with `int`s by tiling the datasets based on the ratio. Currently I handle it by rounding off the ratio to the `first decimal` and then multiplying by `10`. We can see if such simple heuristics are good enough; there are other options (we can talk about them offline).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/849

Differential Revision: D17162460

fbshipit-source-id: d967f3d872f7a1f0aa4ea418bd362b68af9e432f

* Update README.race.md

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1155

Differential Revision: D17509762

Pulled By: myleott

fbshipit-source-id: 4de535289c1f35abff0d8142d8580f3ede039f47

* Remove extraneous call to RNG in multi-GPU code path

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/865

Differential Revision: D17510276

Pulled By: myleott

fbshipit-source-id: 24119402ad5fe95a1312fadb77bafe49a9197c6b

* fixed train valid epoch iter

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/866

Differential Revision: D17517115

fbshipit-source-id: fd6921e642c99e37fce6ad58b24c93e70a5364e5

* Miscellaneous documentation improvements: (#868)

Summary:
- More clearly document the correspondence between FairseqAdam and torch.optim.AdamW
- Add ResamplingDataset to Sphinx docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/868

Differential Revision: D17523244

Pulled By: jma127

fbshipit-source-id: 8e7b34b24889b2c8f70b09a52a625d2af135734b

* fixed corner case in mlm criterion when all tokens get masked

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/869

Reviewed By: myleott

Differential Revision: D17531776

Pulled By: myleott

fbshipit-source-id: 349c9449a0a7db5d3bb8449561302d4220cfa60c

* Issue 1146: Minor fix to roberta pre-training readme (#1165)

Summary:
This is to make these instructions a little more generalizable, since on some systems bash will parse the spaces within quotes

Addressing https://github.com/pytorch/fairseq/issues/1146
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1165

Differential Revision: D17547810

Pulled By: myleott

fbshipit-source-id: 5a026d42f678126b5ca8bc4477ba8f26ea549dcd

* PR for Issue #1154: Two comments in lstm.py seem to be incorrect

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1185

Differential Revision: D17602249

Pulled By: lematt1991

fbshipit-source-id: bd515b7d2ebce8181a80684f45223a8db7c7e3cd

* Update getting_started.rst (#1188)

Summary:
Hi,

I think there is a minor mistake in the doc. `--distributed-no-spawn` argument is needed for distributed training on multiple machines without `slurm`. Otherwise, the program will start 8 jobs on each GPU, when `nproc_per_node=8`.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1188

Differential Revision: D17627778

Pulled By: myleott

fbshipit-source-id: 35ab6b650dc1132d7cb2d150e80d2ebf0caf3e69

* Explain the language modelling format in RoBERTa pretraining readme

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1174

Differential Revision: D17627767

Pulled By: myleott

fbshipit-source-id: 7b5f77146b8776a5967699e430136039c066c851

* Fixing BMUF warmup and sync strategy

Summary:
BMUF sync started happening even before warmup was done.
This diff fixes the behavior and does the BMUF sync once warmup is done, or immediately if warmup is zero.

TODO: write a unit test case so that these problems can be figured out faster.

Reviewed By: jay-mahadeokar

Differential Revision: D17356277

fbshipit-source-id: 21500e6ed1225b97794e4ee203e5d7d04a2840f8

* Levenshtein Transformer paper code

Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines
* Add an option for prepending BOS to dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7

* Fixing example of batched predictions for Roberta (#1195)

Summary:
For batched predictions in Roberta, the README was giving an example that was pretty unclear. After a thorough discussion with ngoyal2707 in issue https://github.com/pytorch/fairseq/issues/1167 he gave a clear example of how batched predictions were supposed to be done. Since I spent a lot of time on this inconsistency, I thought that it might benefit the community if his solution was in the official README 😄 !

For more details, see issue https://github.com/pytorch/fairseq/issues/1167
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1195

Differential Revision: D17639354

Pulled By: myleott

fbshipit-source-id: 3eb60c5804a6481f533b19073da7880dfd0d522d

* RoBERTa now supported on TPU and TensorFlow via transformers library

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1197

Differential Revision: D17651374

Pulled By: myleott

fbshipit-source-id: 5feb986de1e682eb83c4479f419ad51325718572

* Implementation of the WeCNLP abstract "Cross+Self-Attention for Transformer Models" (#1097)

Summary:
This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019).

Cross+Self-Attention reduces the number of parameters and increases inference speed without any degradation in translation quality.
More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097

Differential Revision: D17653168

Pulled By: myleott

fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c

* fix typo in README of examples/translation

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1200

Differential Revision: D17659658

Pulled By: myleott

fbshipit-source-id: 1863e6d60a439dbb7e71e5da68817c9d53649737

* Fix torch.hub to not depend on libnat

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/878

Differential Revision: D17661768

Pulled By: myleott

fbshipit-source-id: 1e4c5f09eb14c40d491ca2459fd2adb8382fb6d2

* Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877)

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877

This PR implements guided alignment training described in  "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".

In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
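One way to "guide" a head, consistent with the paper's description, is to penalize the divergence between the head's attention distribution and a normalized external alignment. The sketch below illustrates such a loss; it is not fairseq's exact implementation:

```python
import math

def alignment_loss(attn_probs, alignment):
    """Cross-entropy between a head's attention and an external alignment.

    attn_probs[t][s]: attention of target position t to source position s.
    alignment[t][s]: external alignment weight (normalized per target row).
    """
    eps = 1e-9
    loss, count = 0.0, 0
    for t, row in enumerate(alignment):
        total = sum(row)
        if total == 0:  # target word with no aligned source word: skip
            continue
        for s, a in enumerate(row):
            if a > 0:
                loss -= (a / total) * math.log(attn_probs[t][s] + eps)
        count += 1
    return loss / max(count, 1)
```

When the attention matches the alignment exactly, the loss approaches zero; diffuse attention over aligned and unaligned positions is penalized.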

* extract FP16OptimizerMixin to share the same logic with PyText (#1180)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1180

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/874

extract FP16OptimizerMixin to share the same logic with PyText

Reviewed By: hudeven

Differential Revision: D17594102

fbshipit-source-id: 8625a4e4f3e09cbaba6ae92599c1121b86ed4e78

* Native Torchscript Wordpiece Tokenizer Op for BERTSquadQA, Torchscriptify BertSQUADQAModel (#879)

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/879

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1023

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1211

Added a new native op that does wordpiece tokenization while additionally returning token start and end indices in the raw text as required by BertSquadQA. Includes Unit Tests for the native op and also to check its parity with the PyText Wordpiece Tokenizer.

Also combined is a torchscript implementation of the Bert SQUAD QA Model.

There are scripts for evaluation and testing of the torchscript code as well.

Reviewed By: borguz, hikushalhere

Differential Revision: D17455985

fbshipit-source-id: c2617c7ecbce0f733b31d04558da965d0b62637b

* Add periodic CUDA cache cleanup (#882)

Summary:
This adds a periodic call to `torch.cuda.empty_cache()` in order to
mitigate memory fragmentation in the PyTorch CUDA cached allocator
that can cause OOMs on models approaching GPU memory limit.
By default, this will occur every 64 updates.

Performance considerations:

- I've benchmarked this on a reasonably large model with memory
  footprint 16 GB, and the overhead with the default setting is <0.2%.
  With `update-freq > 1`, the cost is mitigated even further.
- This behavior can be disabled with a value of zero.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/882

Differential Revision: D17742386

Pulled By: jma127

fbshipit-source-id: 68d8f93f798d6818b5efc3d67d43b52dfb8b2865
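The update-counting logic can be sketched independently of CUDA; in fairseq the cleanup function would be `torch.cuda.empty_cache`, and an interval of 0 disables the behavior, as described above. The class name is illustrative:

```python
class PeriodicCleanup:
    """Call a cleanup function every `interval` updates; 0 disables it."""

    def __init__(self, cleanup_fn, interval=64):
        self.cleanup_fn = cleanup_fn
        self.interval = interval
        self.num_updates = 0

    def step(self):
        self.num_updates += 1
        if self.interval > 0 and self.num_updates % self.interval == 0:
            self.cleanup_fn()

calls = []
pc = PeriodicCleanup(lambda: calls.append(1), interval=64)
for _ in range(200):
    pc.step()
# cleanup ran at updates 64, 128, and 192
```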

* add pre-trained wav2vec model

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/884

Differential Revision: D17774515

Pulled By: alexeib

fbshipit-source-id: d1ffe8ab723fa284c69b067bbd43d699eaa2f02f

* Setting Global sync to 50 in BMUF

Summary:
In all our final settings, we are using global_sync = 50 and we get comparable results with DDP and caffe2.

We set the default global-sync-iter to 50, so users can just pass --use-bmuf to enable it for training.

Reviewed By: skritika

Differential Revision: D17765094

fbshipit-source-id: 369591eeff266d757f89e1fc8dda01711146fdbc

* fix max lengths in Levenshtein Transformer

Summary: Fix the max length calculation in Levenshtein Transformer

Reviewed By: jhcross

Differential Revision: D17672946

fbshipit-source-id: e5efbe7e56cf879d3e822864e4398f99f45b04d4

* ensemble levts

Summary:
Add ensemble wrappers to the Levenshtein NAT: a final-softmax ensemble over the pipeline of three steps:
1. Deletion
2. Placeholder Insertion
3. Word Selection

Each step involves scoring, averaging the scores over the ensemble, and then making hard decisions with argmax; only then does the next step follow. By design, the three steps cannot be run in parallel.
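The score-averaging-then-argmax pattern for a single decision can be sketched like this. Names and shapes are illustrative, not the fairseq API; models emit log-probabilities, and the average is taken in probability space before the hard argmax.

```python
import math

def ensemble_argmax(log_probs_per_model):
    """Average per-model distributions and take a hard decision.

    log_probs_per_model: list (one entry per ensemble member) of lists of
    log-probabilities over the same candidate set.
    Returns the argmax index of the ensemble-averaged distribution.
    """
    n = len(log_probs_per_model)
    num_candidates = len(log_probs_per_model[0])
    # Average in probability space, then go back to log space.
    avg_log_probs = [
        math.log(sum(math.exp(m[c]) for m in log_probs_per_model) / n)
        for c in range(num_candidates)
    ]
    return max(range(num_candidates), key=lambda c: avg_log_probs[c])
```

Because each step's argmax decision feeds the next step's inputs, the three steps must run sequentially, as the summary notes.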

Reviewed By: kahne

Differential Revision: D17723202

fbshipit-source-id: 05f7a4fcd922a972cc4796ca397e8220f0b4d53e

* Add printing of PyTorch memory summary on OOM (#885)

Summary:
PyTorch now has more comprehensive memory instrumentation, added in https://github.com/pytorch/pytorch/pull/27361 . This PR makes fairseq print a summary table of the memory state when an OOM occurs.
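The catch-report-reraise shape of this change can be sketched generically. `run_with_oom_report` and `summarize_fn` are hypothetical stand-ins, with `summarize_fn` playing the role of PyTorch's memory summary; this is not fairseq's actual trainer code.

```python
def run_with_oom_report(step_fn, summarize_fn,
                        is_oom=lambda e: "out of memory" in str(e)):
    """Run one training step; on OOM, print a memory summary, then re-raise.

    step_fn:      callable executing one training step.
    summarize_fn: callable returning a memory-state summary string
                  (in PyTorch this would be torch.cuda.memory_summary()).
    """
    try:
        return step_fn()
    except RuntimeError as e:
        if is_oom(e):
            print("| OOM: memory state at failure:")
            print(summarize_fn())
        raise  # the caller still sees the original error
```

Re-raising keeps existing OOM-handling behavior intact; the summary table is purely diagnostic output.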
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/885

Differential Revision: D17820445

Pulled By: jma127

fbshipit-source-id: 1887417c7648d703f78e1cff9f2a5b89901f49d0

* Fix data loading memory issue in pyspeech

Summary:
We currently shard data when creating the batch iterator. This means we first load all indices/frame lengths/handles into memory, and only then do the sharding. That makes it impossible to train on large datasets with a high number of workers, because each worker needs to load the entire dataset into memory. For training on a million hours of data (i.e. semi-supervised or unsupervised approaches), this data loading just makes it…
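The alternative to load-everything-then-slice can be sketched as lazy round-robin assignment of indices to workers. `shard_indices` is a hypothetical helper, not pyspeech's actual API.

```python
def shard_indices(num_examples: int, num_shards: int, shard_id: int):
    """Lazily yield the example indices owned by one data-loading worker.

    Round-robin assignment: worker `shard_id` owns indices
    shard_id, shard_id + num_shards, shard_id + 2 * num_shards, ...
    No worker ever materializes the full index list in memory.
    """
    for i in range(shard_id, num_examples, num_shards):
        yield i
```

Every index is owned by exactly one shard, so the union over all workers covers the dataset without overlap.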
ebetica pushed a commit to ebetica/fairseq that referenced this pull request Nov 20, 2019
…with Transformer Models" (facebookresearch#877)

Summary:
Pull Request resolved: fairinternal/fairseq-py#877

This PR implements guided alignment training described in  "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".

In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training.
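The supervision described, pushing a selected head's attention toward the external alignments, amounts to a cross-entropy term on the attention distribution. A minimal sketch, assuming one aligned source index per target position (the paper handles alignments more generally); `guided_alignment_loss` is an illustrative name, not the fairseq API.

```python
import math

def guided_alignment_loss(attn_probs, aligned_src_positions):
    """Cross-entropy between a head's attention and external alignments.

    attn_probs: per target position, a distribution over source positions
                (one attention head's probabilities).
    aligned_src_positions: per target position, the source index marked by
                the external (statistical) aligner.
    Returns the mean negative log attention probability at aligned indices.
    """
    total = 0.0
    for probs, j in zip(attn_probs, aligned_src_positions):
        total += -math.log(max(probs[j], 1e-12))  # clamp for stability
    return total / len(aligned_src_positions)
```

Minimizing this term drives the chosen head's attention mass onto the externally aligned source tokens, which is what makes its attention probabilities usable for alignment extraction at inference time.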
Pull Request resolved: facebookresearch#1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
moussaKam pushed a commit to moussaKam/language-adaptive-pretraining that referenced this pull request Sep 29, 2020
…with Transformer Models" (facebookresearch#877)
