Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLEURT: Match key names to correspond with filename #3348

Merged
merged 2 commits into from Dec 7, 2021

Conversation

jaehlee
Copy link
Contributor

@jaehlee jaehlee commented Dec 1, 2021

In order to properly locate downloaded ckpt files key name needs to match filename. Correcting change introduced in #3235

In order to properly locate downloaded ckpt files key name needs to match filename
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out !

Maybe to avoid a breaking change we can just consider self.config_name.lower() ? This way both upper case and lower case will work

@jaehlee
Copy link
Contributor Author

jaehlee commented Dec 2, 2021

Thanks for the suggestion! I think the current checked-in CHECKPOINT_URLS is already not working. I believe anyone who tried using the new ckpts (BLEURT-20-X) can't unless this fix is in. The zip file from bleurt side unzips to directory name matching the filename (capitalized for new ones). For example without current changes I'd get the following error

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-5-f6832fe20f84> in <module>()
      1 predictions = ["hello there", "general kenobi"]
      2 references = ["hello there", "general kenobi"]
----> 3 bleurt = datasets.load_metric("bleurt", "bleurt-20")
      4 results = bleurt.compute(predictions=predictions, references=references)

4 frames
/usr/local/lib/python3.7/dist-packages/bleurt/checkpoint.py in read_bleurt_config(path)
     84   """Reads and checks config file from a BLEURT checkpoint."""
     85   assert tf.io.gfile.exists(path), \
---> 86       "Could not find BLEURT checkpoint {}".format(path)
     87   config_path = os.path.join(path, CONFIG_FILE)
     88   assert tf.io.gfile.exists(config_path), \

AssertionError: Could not find BLEURT checkpoint /root/.cache/huggingface/metrics/bleurt/bleurt-20/downloads/extracted/e34c60f1a05394ecda54e253a10413ca7b5d59f9a23f3cc73258c6b78ffa2f50/bleurt-20

inspecting specified path I see that directory name is BLEURT-20 instead of bleurt-20.
Other solution similar to your suggestion is meddle with dl_manager.download_and_extract to unzip to paths with lowering all the paths but I imagine this will affect other parts of the library.

@lhoestq
Copy link
Member

lhoestq commented Dec 6, 2021

Indeed, good catch ! Your solution that fixes CHECKPOINT_URLS is simple and works well, thanks :)

Furthermore to avoid breaking changes though we could also keep the support for the lowercase one:

        if self.config_name.lower() in CHECKPOINT_URLS:
            checkpoint_name = self.config_name.lower()
        elif self.config_name.upper() in CHECKPOINT_URLS:
            checkpoint_name = self.config_name.upper()
        else:
            raise KeyError(
                f"{self.config_name} model not found. You should supply the name of a model checkpoint for bleurt in {CHECKPOINT_URLS.keys()}"
            )

and then we can use checkpoint_name instead of self.config_name to download and instantiate the model:

        model_path = dl_manager.download_and_extract(CHECKPOINT_URLS[checkpoint_name])
        self.scorer = score.BleurtScorer(os.path.join(model_path, checkpoint_name))

Please let me know if that sounds reasonable to you !

@jaehlee
Copy link
Contributor Author

jaehlee commented Dec 6, 2021

Thanks for the suggestion! I believe your suggestion should work to make keys case insensitive. Changes are committed to the PR now.

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

@lhoestq lhoestq changed the title Match key names to correspond with filename BLEURT: Match key names to correspond with filename Dec 7, 2021
@lhoestq lhoestq merged commit 0009a8e into huggingface:master Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants