[Synapse] Tokenizer padding / generate fixes #846

Merged: 13 commits, Jul 18, 2022

@opentaco opentaco commented Jul 18, 2022

Live testing of core_server and core_validator on nobunaga appears successful.
Ready for merging.

Define PAD Token = EOS Token = 50256, according to https://github.com/huggingface/transformers/blob/49c8c67fb815a277405f84dea4a66353e19fb347/tests/models/gpt2/test_modeling_gpt2.py#L532
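
One consequence of sharing an id between PAD and EOS is worth illustrating. The following is a minimal sketch, not code from this PR: the helper names `mask_from_ids` and `mask_from_length` are hypothetical, and the two non-EOS token ids are arbitrary placeholders. It shows that when PAD = EOS = 50256, an attention mask derived by comparing ids against the pad id would also mask a genuine EOS, so the mask is more safely built from the known unpadded length.

```python
PAD_TOKEN_ID = 50256  # GPT-2 EOS id, reused as PAD per the PR description
EOS_TOKEN_ID = 50256


def mask_from_ids(padded_ids):
    """Naive mask: treats every token equal to the pad id as padding."""
    return [int(tok != PAD_TOKEN_ID) for tok in padded_ids]


def mask_from_length(padded_ids, real_length):
    """Mask built from the unpadded length (padding assumed on the left)."""
    pad_count = len(padded_ids) - real_length
    return [0] * pad_count + [1] * real_length


# A 3-token sequence that ends in a genuine EOS, left-padded to length 5.
real_ids = [15496, 995, EOS_TOKEN_ID]  # placeholder ids for illustration
padded = [PAD_TOKEN_ID] * 2 + real_ids

naive_mask = mask_from_ids(padded)                      # masks the real EOS too
length_mask = mask_from_length(padded, len(real_ids))   # keeps the real EOS
```

The naive mask hides the trailing EOS because it cannot be told apart from padding by id alone, while the length-based mask keeps it.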

Set padding_side = "left", since generation by default expects the most recent token on the right-hand side, with padding on the left, according to huggingface/transformers#10552.
Note that tokenizer(padding=True, ...) is not used, because the unpadded offset_mapping is required for logit translation operations.
To allow the function to be used in various scenarios, including causallm and generate, a single remapping_token function now serves all server forward functions.
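
The left-padding idea above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the PR's actual implementation: `left_pad_batch` is a hypothetical helper, and the token ids in the example batch are arbitrary. It pads each sequence manually (rather than via tokenizer(padding=True, ...)), so the original unpadded sequences remain available to the caller.

```python
from typing import List, Sequence, Tuple

PAD_TOKEN_ID = 50256  # PAD = EOS, as defined in the PR description


def left_pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = PAD_TOKEN_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token id sequences to a common length.

    Returns (input_ids, attention_mask). The input sequences are not
    modified, so any per-token data aligned with them stays unpadded.
    """
    max_len = max((len(seq) for seq in sequences), default=0)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Two sequences of different lengths (placeholder token ids).
batch = [[15496, 995], [40, 588, 284, 4483]]
input_ids, attention_mask = left_pad_batch(batch)

# With left padding, the final position of every row is a real token,
# which is where generation continues from.
last_tokens = [row[-1] for row in input_ids]
```

With right padding, the final position of the shorter row would hold a pad token instead of the most recent real token.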
coveralls commented Jul 18, 2022

Pull Request Test Coverage Report for Build 4e8eedeb-dedc-4050-8e16-1ed2819c6795

  • 2 of 4 (50.0%) changed or added relevant lines in 1 file are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.03%) to 65.56%

Changes Missing Coverage           | Covered Lines | Changed/Added Lines | %
bittensor/_tokenizer/__init__.py   | 2             | 4                   | 50.0%

Files with Coverage Reduction          | New Missed Lines | %
bittensor/_metagraph/metagraph_impl.py | 1                | 63.72%
bittensor/utils/__init__.py            | 3                | 78.68%

Totals
Change from base Build f8d61c2e-48fe-47b6-8f20-9c23d75c42e7 | 0.03%
Covered Lines                                               | 3830
Relevant Lines                                              | 5842

💛 - Coveralls

@opentaco opentaco merged commit c78d4be into Synapse Jul 18, 2022
@ifrit98 ifrit98 deleted the Synapse-tokenizer-pull branch May 24, 2023 14:32