Description
Code
`# Imports required by the snippet
from transformers import RagRetriever, RagTokenForGeneration, RagTokenizer

# Initialize the tokenizer and model
model_name = "facebook/rag-token-nq"
tokenizer = RagTokenizer.from_pretrained(model_name)
model = RagTokenForGeneration.from_pretrained(model_name)

# Initialize the retriever (the traceback below originates in retrieval_rag.py)
retriever = RagRetriever.from_pretrained(model_name)

# Tokenization function: truncate or pad every example to 512 tokens
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        truncation=True,
        padding='max_length',
        max_length=512
    )

# Tokenize the dataset (dataset is not defined in this snippet)
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=dataset.column_names
)`
Error
HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'https://storage.googleapis.com/huggingface-nlp/datasets/wiki_dpr/'. Use `repo_type` argument if needed.
The above exception was the direct cause of the following exception:
OSError Traceback (most recent call last)
OSError: Incorrect path_or_model_id: 'https://storage.googleapis.com/huggingface-nlp/datasets/wiki_dpr/'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/transformers/models/rag/retrieval_rag.py in _resolve_path(self, index_path, filename)
122 f"- or '{index_path}' is the correct path to a directory containing a file named {filename}.\n\n"
123 )
--> 124 raise EnvironmentError(msg)
125 if is_local:
126 logger.info(f"loading file {resolved_archive_file}")
OSError: Can't load 'psgs_w100.tsv.pkl'. Make sure that:
- 'https://storage.googleapis.com/huggingface-nlp/datasets/wiki_dpr/' is a correct remote path to a directory containing a file named psgs_w100.tsv.pkl
- or 'https://storage.googleapis.com/huggingface-nlp/datasets/wiki_dpr/' is the correct path to a directory containing a file named psgs_w100.tsv.pkl.
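Possible workaround (a sketch under assumptions, not a confirmed fix): the traceback passes through `_resolve_path` while looking for `psgs_w100.tsv.pkl`, which suggests the retriever is trying to load the legacy index from the Google Storage URL. The RAG examples in the Transformers documentation instead pass `index_name="exact"` and `use_dummy_dataset=True` to `RagRetriever.from_pretrained`, and hand the retriever to the model. The helper names `retriever_kwargs` and `load_rag` below are hypothetical, and whether this avoids the error may depend on the installed `transformers`, `datasets`, and `faiss` versions.

```python
def retriever_kwargs(use_dummy_dataset: bool = True) -> dict:
    """Keyword arguments that request the datasets-backed index rather than the legacy one."""
    return {"index_name": "exact", "use_dummy_dataset": use_dummy_dataset}


def load_rag(model_name: str = "facebook/rag-token-nq"):
    """Load tokenizer, retriever, and model (downloads weights; needs datasets and faiss)."""
    # Imported lazily so retriever_kwargs can be inspected without transformers installed.
    from transformers import RagRetriever, RagTokenForGeneration, RagTokenizer

    tokenizer = RagTokenizer.from_pretrained(model_name)
    # Assumption: overriding index_name keeps the retriever off the legacy index path.
    retriever = RagRetriever.from_pretrained(model_name, **retriever_kwargs())
    model = RagTokenForGeneration.from_pretrained(model_name, retriever=retriever)
    return tokenizer, retriever, model
```

Calling `load_rag()` downloads the model weights and a small dummy index. Passing `use_dummy_dataset=False` requests the full `wiki_dpr` index, which is much larger.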