@xenova commented Jan 4, 2024

A fix for #497, where token type IDs were not correctly chosen from the post-processor template.

closes #497

Example code:

import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model_id = 'Xenova/ms-marco-TinyBERT-L-2-v2';

const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

const features = tokenizer(
    ['How many people live in Berlin?', 'How many people live in Berlin?'],
    {
        text_pair: [
            'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
            'New York City is famous for the Metropolitan Museum of Art.',
        ],
        padding: true,
        truncation: true,
    }
);

const scores = await model(features);
console.log(scores);
// quantized:   [ 7.210887908935547, -11.559350967407227 ]
// unquantized: [ 7.235750675201416, -11.562294006347656 ]
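For context on what the fix restores: in a BERT-style pair template (`[CLS] A [SEP] B [SEP]`), the `[CLS]` token, the first sequence, and its `[SEP]` get type ID 0, while the second sequence and its trailing `[SEP]` get type ID 1. A minimal sketch of that assignment (a hypothetical illustration, not the library's implementation):

```javascript
// Sketch: token type IDs for a BERT-style pair template
//   [CLS] A:0 [SEP]:0 B:1 [SEP]:1
function buildPairTypeIds(numTokensA, numTokensB) {
    // [CLS] + tokens of A + first [SEP] all belong to segment 0
    const segmentA = new Array(1 + numTokensA + 1).fill(0);
    // tokens of B + trailing [SEP] belong to segment 1
    const segmentB = new Array(numTokensB + 1).fill(1);
    return segmentA.concat(segmentB);
}

// e.g. a 3-token question paired with a 5-token passage:
console.log(buildPairTypeIds(3, 5));
// [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

Before this fix, the `token_type_ids` produced by the tokenizer did not follow this template, which broke pair-input models such as the cross-encoder above.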


@xenova xenova mentioned this pull request Jan 4, 2024
@xenova xenova merged commit ebd5335 into main Jan 4, 2024
@xenova xenova deleted the fix-token-type-ids branch January 4, 2024 16:37