
CHORE: Use faster test translation scenario, cut CI time by ~5mins #3046

Merged
Merged 8 commits into master from chore/faster-tests on Jun 27, 2023

Conversation

connortann
Collaborator

@connortann connortann commented Jun 26, 2023

Supports #3045

Overview

Changes the model used in the translation scenario to a much smaller one that runs much faster:
https://huggingface.co/mesolitica/finetune-translation-t5-super-super-tiny-standard-bahasa-cased

Timings

The change seems to save ~5 min on Linux, and 7+ min on MacOS.

On Linux GH runner python 3.11, test timings before:

80.04s call     tests/explainers/test_partition.py::test_translation
76.44s call     tests/explainers/test_partition.py::test_translation_auto
76.27s call     tests/explainers/test_partition.py::test_translation_algorithm_arg
74.24s call     tests/explainers/test_partition.py::test_serialization
73.27s call     tests/explainers/test_partition.py::test_serialization_custom_model_save
69.42s call     tests/explainers/test_partition.py::test_serialization_no_model_or_masker

Test timings after:

22.75s call     tests/explainers/test_partition.py::test_translation
<19s   call     tests/explainers/test_partition.py::test_translation_auto
<19s   call     tests/explainers/test_partition.py::test_translation_algorithm_arg
20.38s call     tests/explainers/test_partition.py::test_serialization
20.70s call     tests/explainers/test_partition.py::test_serialization_custom_model_save
19.80s call     tests/explainers/test_partition.py::test_serialization_no_model_or_masker

Overall that's 328 seconds faster on Python 3.11 🎉

Timings vary between python versions and platforms, so the overall average speedup may differ.
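As a sanity check on the quoted figure, summing the per-test timings above reproduces the ~328 s saving (the two "<19s" entries are approximated as 19.0, which is an assumption):

```python
# Per-test timings from the tables above, in seconds (Linux runner, Python 3.11).
before = [80.04, 76.44, 76.27, 74.24, 73.27, 69.42]
# The two "<19s" rows are read as 19.0 for this estimate.
after = [22.75, 19.0, 19.0, 20.38, 20.70, 19.80]

saving = sum(before) - sum(after)
print(round(saving))  # ~328 seconds saved per run on this platform
```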

Note about protobuf

This new model requires that we use protobuf<=3.20.x; otherwise a TypeError is thrown when the tokenizer is loaded. There is a related thread on Stack Overflow here.

Here is the full traceback:

____________ ERROR at setup of test_serialization_custom_model_save ____________

    @pytest.mark.skipif(sys.platform == 'win32', reason="Integer division bug in HuggingFace on Windows")
    @pytest.fixture(scope="session")
    def basic_translation_scenario():
        """ Create a basic transformers translation model and tokenizer.
        """
        AutoTokenizer = pytest.importorskip("transformers").AutoTokenizer
        AutoModelForSeq2SeqLM = pytest.importorskip("transformers").AutoModelForSeq2SeqLM
    
        # Use a very small model, for speed
        name = "mesolitica/finetune-translation-t5-super-super-tiny-standard-bahasa-cased"
>       tokenizer = AutoTokenizer.from_pretrained(name)

tests/explainers/conftest.py:16: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py:691: in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1825: in from_pretrained
    return cls._from_pretrained(
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1988: in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/models/t5/tokenization_t5_fast.py:133: in __init__
    super().__init__(
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py:114: in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py:1307: in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py:445: in __init__
    from .utils import sentencepiece_model_pb2 as model_pb2
/opt/hostedtoolcache/Python/3.11.4/x64/lib/python3.11/site-packages/transformers/utils/sentencepiece_model_pb2.py:91: in <module>
    _descriptor.EnumValueDescriptor(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'google.protobuf.descriptor.EnumValueDescriptor'>, name = 'UNIGRAM'
index = 0, number = 1, type = None, options = None, serialized_options = None
create_key = <object object at 0x7f886b0750d0>

    def __new__(cls, name, index, number,
                type=None,  # pylint: disable=redefined-builtin
                options=None, serialized_options=None, create_key=None):
>     _message.Message._CheckCalledFromGeneratedFile()
E     TypeError: Descriptors cannot not be created directly.
E     If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E     If you cannot immediately regenerate your protos, some other possible workarounds are:
E      1. Downgrade the protobuf package to 3.20.x or lower.
E      2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E     
E     More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

In the future, we might be able to relax this pin if the transformers library is updated, or if we find an alternative tokenizer model that was trained with a more recent version of protobuf.
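One way to make the constraint explicit in the test suite would be a small version guard like the sketch below; the helper name and the idea of skipping at fixture setup are illustrative, not part of this PR:

```python
def protobuf_is_compatible(ver: str) -> bool:
    """Return True if `ver` (e.g. google.protobuf.__version__) satisfies the <=3.20.x pin."""
    major, minor = (int(part) for part in ver.split(".")[:2])
    return (major, minor) <= (3, 20)

# A fixture could then skip with a clear message instead of hitting the TypeError above:
# if not protobuf_is_compatible(google.protobuf.__version__):
#     pytest.skip("translation scenario requires protobuf<=3.20.x")
```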

@connortann connortann added the ci Relating to Continuous Integration / GitHub Actions label Jun 26, 2023
@connortann connortann changed the title Use faster test translation scenario CHORE: Use faster test translation scenario Jun 26, 2023

codecov bot commented Jun 26, 2023

Codecov Report

Merging #3046 (1bb00b8) into master (9d72ec7) will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #3046   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files          90      90           
  Lines       12850   12850           
======================================
  Misses      12850   12850           


@connortann connortann changed the title CHORE: Use faster test translation scenario CHORE: Use faster test translation scenario, faster tests by ~5min Jun 27, 2023
@connortann connortann added the enhancement Indicates new feature requests label Jun 27, 2023
@connortann connortann self-assigned this Jun 27, 2023
@connortann connortann changed the title CHORE: Use faster test translation scenario, faster tests by ~5min CHORE: Use faster test translation scenario Jun 27, 2023
@connortann connortann marked this pull request as ready for review June 27, 2023 10:04
@connortann connortann changed the title CHORE: Use faster test translation scenario CHORE: Use faster test translation scenario, cut CI time by ~5mins Jun 27, 2023
Collaborator

@thatlittleboy thatlittleboy left a comment


Great work!


@thatlittleboy thatlittleboy merged commit 8f9f7d1 into master Jun 27, 2023
15 checks passed
@connortann connortann deleted the chore/faster-tests branch June 27, 2023 15:16