TypeError: Couldn't build proto file into descriptor pool! Invalid proto descriptor for file "sentencepiece_model.proto": sentencepiece_model.proto: A file with this name is already in the pool. #12882

Closed
KawaiiNotHawaii opened this issue May 23, 2023 · 1 comment

KawaiiNotHawaii commented May 23, 2023

What version of protobuf and what language are you using?
Version: v3.8.0
Language: Python

What operating system (Linux, Windows, ...) and version?
Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-142-generic x86_64)

What runtime / compiler are you using (e.g., python version or gcc version)?
Python 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38)
[GCC 7.3.0] :: Anaconda, Inc. on linux

What did you do?
Steps to reproduce the behavior:

  1. Load a LanguageModelingTransformer from lightning_transformers
  2. Load a dataset from BigBench with load_dataset(), imported from datasets
  3. See the error (the stack trace differs, but the same TypeError occurs even if the order is swapped, i.e., the dataset is loaded first and the model after it); a minimal sketch follows
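Roughly, the repro looks like this (a minimal sketch; the "gpt2" checkpoint is illustrative rather than copied from my actual run, and the truncated split argument from my notebook is omitted):

```python
# Minimal repro sketch; "gpt2" is an illustrative checkpoint, not necessarily
# the one from my actual run.
from lightning_transformers.task.nlp.language_modeling import LanguageModelingTransformer
from datasets import load_dataset

# Step 1: load a language-modeling transformer.
model = LanguageModelingTransformer(pretrained_model_name_or_path="gpt2")

# Step 2: loading BigBench imports its dataset script, which pulls in
# t5 -> seqio -> sentencepiece and re-registers sentencepiece_model.proto.
dataset = load_dataset("bigbench", "modified_arithmetic", cache_dir="data")
# Step 3: the TypeError below is raised during the import chain above.
```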

What did you expect to see?
Dataset loaded successfully

What did you see instead?
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 2>:2 │
│ │
│ 1 # dataset = ["What the result of 1+3", "Calculate 4*253"] │
│ ❱ 2 dataset = load_dataset("bigbench", 'modified_arithmetic', cache_dir='data', split='valid │
│ 3 # dataset = dataset['validation']['inputs'][:] │
│ 4 │
│ 5 # Create a DataLoader │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/datasets/load.py:1773 in │
│ load_dataset │
│ │
│ 1770 │ ) │
│ 1771 │ │
│ 1772 │ # Create a dataset builder │
│ ❱ 1773 │ builder_instance = load_dataset_builder( │
│ 1774 │ │ path=path, │
│ 1775 │ │ name=name, │
│ 1776 │ │ data_dir=data_dir, │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/datasets/load.py:1512 in │
│ load_dataset_builder │
│ │
│ 1509 │ ) │
│ 1510 │ │
│ 1511 │ # Get dataset builder class from the processing script │
│ ❱ 1512 │ builder_cls = import_main_class(dataset_module.module_path) │
│ 1513 │ builder_kwargs = dataset_module.builder_kwargs │
│ 1514 │ data_files = builder_kwargs.pop("data_files", data_files) │
│ 1515 │ config_name = builder_kwargs.pop("config_name", name) │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/datasets/load.py:115 in │
│ import_main_class │
│ │
│ 112 │ - a DatasetBuilder if dataset is True │
│ 113 │ - a Metric if dataset is False │
│ 114 │ """ │
│ ❱ 115 │ module = importlib.import_module(module_path) │
│ 116 │ │
│ 117 │ if dataset: │
│ 118 │ │ main_cls_type = DatasetBuilder │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/importlib/__init__.py:127 in import_module │
│ │
│ 124 │ │ │ if character != '.': │
│ 125 │ │ │ │ break │
│ 126 │ │ │ level += 1 │
│ ❱ 127 │ return _bootstrap._gcd_import(name[level:], package, level) │
│ 128 │
│ 129 │
│ 130 _RELOADING = {} │
│ in _gcd_import:1014 │
│ in _find_and_load:991 │
│ in _find_and_load_unlocked:975 │
│ in _load_unlocked:671 │
│ in exec_module:783 │
│ in _call_with_frames_removed:219 │
│ │
│ /home/cxsun/.cache/huggingface/modules/datasets_modules/datasets/bigbench/d2757373c3fb6b35a846ee │
│ 951265c3f8fbf0124fb650b12cef5678cf902914d2/bigbench.py:22 in │
│ │
│ 19 │
│ 20 from typing import Optional │
│ 21 │
│ ❱ 22 import bigbench.api.util as bb_utils # From: "bigbench @ https://storage.googleapis.com
│ 23 import bigbench.bbseqio.bigbench_bridge as bbb │
│ 24 from bigbench.api import json_task │
│ 25 from bigbench.bbseqio import bigbench_json_paths as bb_json_paths │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/bigbench/api/util.py:25 in │
│ │
│ │
│ 22 import json │
│ 23 import os │
│ 24 import bigbench.api.task as task_api │
│ ❱ 25 import bigbench.api.json_task as json_task │
│ 26 import bigbench.api.model as model_api │
│ 27 import bigbench.api.results as results_api │
│ 28 import bigbench.api.task_metrics as task_metrics │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/bigbench/api/json_task.py:26 in │
│ │
│ │
│ 23 │
│ 24 from bigbench.api import json_utils │
│ 25 import bigbench.api.task as task │
│ ❱ 26 import bigbench.api.task_metrics as metrics │
│ 27 import bigbench.api.results as results_api │
│ 28 import numpy as np │
│ 29 from scipy.special import logsumexp │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/bigbench/api/task_metrics.py:24 │
│ in │
│ │
│ 21 │
│ 22 from datasets import load_metric │
│ 23 from scipy.special import logsumexp │
│ ❱ 24 from t5.evaluation import metrics │
│ 25 from sklearn.metrics import f1_score │
│ 26 │
│ 27 │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/t5/__init__.py:17 in │
│ │
│ 14 │
│ 15 """Import API modules.""" │
│ 16 │
│ ❱ 17 import t5.data │
│ 18 import t5.evaluation │
│ 19 │
│ 20 # Version number. │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/t5/data/__init__.py:17 in │
│ │
│ │
│ 14 │
│ 15 """Import data modules.""" │
│ 16 # pylint:disable=wildcard-import,g-bad-import-order │
│ ❱ 17 from t5.data.dataset_providers import * │
│ 18 from t5.data.glue_utils import * │
│ 19 import t5.data.postprocessors │
│ 20 import t5.data.preprocessors │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/t5/data/dataset_providers.py:28 │
│ in │
│ │
│ 25 from collections.abc import Mapping │
│ 26 import re │
│ 27 │
│ ❱ 28 import seqio │
│ 29 from t5.data import utils │
│ 30 import tensorflow.compat.v2 as tf │
│ 31 │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/seqio/__init__.py:18 in │
│ │
│ 15 """Import to top-level API.""" │
│ 16 # pylint:disable=wildcard-import,g-bad-import-order │
│ 17 │
│ ❱ 18 from seqio.dataset_providers import * │
│ 19 from seqio import evaluation │
│ 20 from seqio import experimental │
│ 21 from seqio.evaluation import Evaluator │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/seqio/dataset_providers.py:38 in │
│ │
│ │
│ 35 import numpy as np │
│ 36 from packaging import version as version_lib │
│ 37 import pyglove as pg │
│ ❱ 38 from seqio import metrics as metrics_lib │
│ 39 from seqio import preprocessors as seqio_preprocessors │
│ 40 from seqio import task_registry_provenance_tracking │
│ 41 from seqio import utils │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/seqio/metrics.py:25 in │
│ │
│ 22 import clu.metrics │
│ 23 import flax │
│ 24 import numpy as np │
│ ❱ 25 from seqio import utils │
│ 26 import tensorflow.compat.v2 as tf │
│ 27 │
│ 28 │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/seqio/utils.py:29 in │
│ │
│ 26 │
│ 27 from absl import logging │
│ 28 import numpy as np │
│ ❱ 29 from seqio.vocabularies import Vocabulary │
│ 30 import tensorflow.compat.v2 as tf │
│ 31 import tensorflow_datasets as tfds │
│ 32 │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/seqio/vocabularies.py:25 in │
│ │
│ │
│ 22 import tensorflow.compat.v2 as tf │
│ 23 import tensorflow_text as tf_text │
│ 24 │
│ ❱ 25 from sentencepiece import sentencepiece_model_pb2 │
│ 26 import sentencepiece as sentencepiece_processor │
│ 27 │
│ 28 PAD_ID = 0 │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/sentencepiece/sentencepiece_mode │
│ l_pb2.py:16 in │
│ │
│ 13 │
│ 14 │
│ 15 │
│ ❱ 16 DESCRIPTOR = _descriptor.FileDescriptor( │
│ 17 name='sentencepiece_model.proto', │
│ 18 package='sentencepiece', │
│ 19 syntax='proto2', │
│ │
│ /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages/google/protobuf/descriptor.py:1024 │
│ in __new__ │
│ │
│ 1021 │ │ except KeyError: │
│ 1022 │ │ raise RuntimeError('Please link in cpp generated lib for %s' % (name)) │
│ 1023 │ elif serialized_pb: │
│ ❱ 1024 │ │ return _message.default_pool.AddSerializedFile(serialized_pb) │
│ 1025 │ else: │
│ 1026 │ │ return super(FileDescriptor, cls).__new__(cls) │
│ 1027 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "sentencepiece_model.proto":
sentencepiece_model.proto: A file with this name is already in the pool.
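For context, the message means two different descriptors were registered under the same proto file name in a single descriptor pool; here, the sentencepiece wheel's generated module collides with a copy that some earlier import already registered. A standalone sketch of the same conflict, independent of sentencepiece (assuming the default C++/upb protobuf backend; the pure-Python backend raises a different conflict error):

```python
# Sketch of the underlying conflict: registering two *different* descriptors
# under one file name in the same pool fails.
from google.protobuf import descriptor_pb2, descriptor_pool

pool = descriptor_pool.DescriptorPool()
pool.Add(descriptor_pb2.FileDescriptorProto(
    name="sentencepiece_model.proto", package="sentencepiece"))
pool.Add(descriptor_pb2.FileDescriptorProto(
    name="sentencepiece_model.proto", package="other"))
# -> TypeError under the C++/upb backend ("... already in the pool");
#    the pure-Python backend raises a conflicting-definition error instead.
```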


Anything else we should know about your project / environment?
Dependencies mentioned:

Name: lightning-transformers
Version: 0.2.5
Summary: Lightning Transformers.
Home-page: https://github.com/Lightning-AI/lightning-transformers
Author: Lightning AI et al.
Author-email: pytorch@lightning.ai
License: Apache-2.0
Location: /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages
Requires: datasets, Pillow, pytorch-lightning, sentencepiece, torchmetrics, transformers
Required-by:

Name: datasets
Version: 2.12.0
Summary: HuggingFace community-driven open-source library of datasets
Home-page: https://github.com/huggingface/datasets
Author: HuggingFace Inc.
Author-email: thomas@huggingface.co
License: Apache 2.0
Location: /data2/cxsun/anaconda3/envs/llm_new/lib/python3.8/site-packages
Requires: aiohttp, dill, fsspec, huggingface-hub, multiprocess, numpy, packaging, pandas, pyarrow, pyyaml, requests, responses, tqdm, xxhash
Required-by: bigbench, lightning-transformers
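
protobuf itself is not in the list above; a quick way to confirm which runtime version and backend the environment actually loads (api_implementation is an internal helper, but it exists in these versions):

```python
# Report the protobuf runtime version and which backend is active.
import google.protobuf
from google.protobuf.internal import api_implementation

print(google.protobuf.__version__)   # e.g. "3.8.0"
print(api_implementation.Type())     # "cpp" or "python" (or "upb" in 4.x)
```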

KawaiiNotHawaii added the untriaged label (auto-added to all new issues) on May 23, 2023
KawaiiNotHawaii (Author) commented:

I fixed this bug following the instructions here.
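
For anyone who lands here after the link rots: I can't vouch that this is exactly the linked fix, but a commonly cited workaround for this class of error is to force the pure-Python protobuf backend before anything imports a *_pb2 module (or to pin protobuf to a version compatible with all the installed *_pb2 modules):

```python
# Commonly cited workaround (assumption, not confirmed as the linked fix):
# select the pure-Python protobuf backend before any *_pb2 import happens.
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Only import proto-registering libraries after the variable is set:
# from datasets import load_dataset
```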

This issue was closed.