This repository was archived by the owner on Sep 25, 2025. It is now read-only.

Handling domain specific vocabulary #237

@hoonkai

Similar to #9, I'm trying to handle words that are not already in vocab.txt, e.g. "ohm" and "farad". These words are not compositions of the wordpieces in vocab.txt. Should I add them to vocab.txt manually, or should I follow the advice @jacobdevlin-google gave, which is to keep the existing vocab.txt and fine-tune the model on in-domain text? If these words aren't compositions of existing wordpieces, how can the model learn them?
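
To make the question concrete, here is a minimal sketch of greedy longest-match-first wordpiece splitting, which is how BERT's tokenizer is generally described as working. The function name `wordpiece_tokenize` and the toy vocabulary are assumptions made up for illustration; this is not the repository's code and the vocabulary is not the real vocab.txt. It shows the two cases at stake: a word that splits into known pieces, and a word with no matching pieces that falls back to `[UNK]`.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single word into wordpieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # Nothing matches at this position, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for illustration only (NOT the real vocab.txt).
toy_vocab = {"oh", "##m", "far", "##ad"}

ohm_pieces = wordpiece_tokenize("ohm", toy_vocab)      # splits into known pieces
farad_pieces = wordpiece_tokenize("farad", toy_vocab)  # splits into known pieces
volt_pieces = wordpiece_tokenize("volt", toy_vocab)    # no matching pieces
print(ohm_pieces, farad_pieces, volt_pieces)
```

Under this sketch, whether "ohm" or "farad" ends up as several pieces or as `[UNK]` depends entirely on which pieces the vocabulary contains, so running the actual tokenizer against the real vocab.txt is the way to check what happens to a given word.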
