
How can I use the code offline? #315

Closed
TinaChen95 opened this issue Oct 9, 2022 · 18 comments


@TinaChen95

TinaChen95 commented Oct 9, 2022

Hi friends, I've run into a problem with using the code offline.

import evaluate
metric = evaluate.load("accuracy")

How can I pre-download related files and solve this problem?
Thanks.

@morrisalp

I see this was possible with datasets.load_metric: https://discuss.huggingface.co/t/using-load-metric-offline-in-datasets/9283

Does it work for evaluate?

I am encountering the same issue with BLEU. It's not intuitive that the library downloads the metric from the internet on every use, and it would be great for the documentation to cover how to cache metrics for offline use.

@mathemakitten
Contributor

mathemakitten commented Oct 10, 2022

Hi @TinaChen95! Yes, loading a metric offline should work as mentioned in the thread linked by @morrisalp. Accuracy is a "canonical" metric, so it's already integrated into the evaluate library at metrics/accuracy/accuracy.py, and you should be able to pass that path in. Let me know if that doesn't work.
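
For example, assuming you have a local clone of the evaluate repository (the clone path below is just an illustration), something like this should load the metric from the local script:

import evaluate

# point evaluate.load at the metric script inside the local clone of the evaluate repo
accuracy = evaluate.load("./evaluate/metrics/accuracy/accuracy.py", module_type="metric")
print(accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))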

You're right that we should document this better, though; right now it's sort of buried in the package reference. I'll add it to our backlog to have another, more prominent example. Thanks for the suggestion @morrisalp!

@liumaishen

I wonder why such commonly used metrics need to be loaded online

@lvwerra
Member

lvwerra commented Nov 16, 2022

I wonder why such commonly used metrics need to be loaded online

The main reason is that it keeps the library simpler: there is a single mechanism for loading metrics, which causes less confusion than having a different mechanism for some of them.

@lvwerra lvwerra closed this as completed Nov 16, 2022
@Marcel1805

Marcel1805 commented Nov 28, 2022

Unfortunately the proposed solution is not working for me:
I've cloned the datasets repo (https://github.com/huggingface/datasets), which includes the metrics directory, and uploaded accuracy.py to my server. The server has no public internet access.
Do you have any idea what my problem might be?
Thank you very much!
[screenshot of the error attached]

@lvwerra
Member

lvwerra commented Nov 29, 2022

Hi @Marcel1805, can you clone the evaluate repository instead of datasets? The metrics of the two libraries are not compatible. Let me know if this works.

@lvwerra lvwerra reopened this Nov 29, 2022
@Marcel1805

Hi @Marcel1805, can you clone the evaluate repository instead of datasets? The metrics of the two libraries are not compatible. Let me know if this works.

Yes it worked with those metrics from the evaluate repo, thank you very much!

@lvwerra lvwerra closed this as completed Jan 6, 2023
@JohnGiorgi

JohnGiorgi commented Feb 14, 2023

Is there no way to use the metrics offline without cloning the .py file locally?

Previously with from datasets import load_metric, you could (1) load the metric once with an internet connection, and it would get cached, and (2) run the same code again with HF_DATASETS_OFFLINE=1 without internet, and it would work fine. This seems a lot more convenient; if you have to maintain the scripts locally, they can easily go out of date with your installed version of evaluate.
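
For reference, the old workflow looked roughly like this (a sketch only; the script name is illustrative):

# Step 1: on a machine with internet access, load the metric once so it gets cached.
from datasets import load_metric

metric = load_metric("accuracy")

# Step 2: later runs of the same code work without internet if the offline flag is set, e.g.:
#   HF_DATASETS_OFFLINE=1 python train.py
# load_metric("accuracy") then resolves from the local cache.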

@XinliYu

XinliYu commented Feb 15, 2023

I have to say this way of loading metrics is a really counter-intuitive experience for the many people who have to work offline with their models (company/lab policy). I spent two hours just getting the metric to work, and you end up using the hacky path parameter. We could just design it as from evaluate.metrics import accuracy.

@JohnGiorgi

Just an update to this issue: if you try to load a metric offline that you have previously loaded and cached online, it will actually work! It just takes a long time, on the order of tens of minutes (at least on my system). So it looks like the code does actually support loading cached metrics offline?

I haven't dug into the code base, but it might be straightforward to detect whether the user is offline (maybe with an environment variable similar to HF_DATASETS_OFFLINE) and go straight to loading the script from the cache in that case.

@JArnoldAMD

I observed similar behavior to @JohnGiorgi. I was able to avoid the delay by setting HF_EVALUATE_OFFLINE=1. This doesn't appear to be documented, but I found it in the code at: https://github.com/huggingface/evaluate/blob/main/src/evaluate/loading.py#L133
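
In other words, something like this should skip the online lookup, assuming the metric was already loaded (and therefore cached) on a machine with internet access; the script name below is just an illustration:

# run the script with the offline flag set, e.g.:
#   HF_EVALUATE_OFFLINE=1 python eval_offline.py
import evaluate

metric = evaluate.load("accuracy")  # resolved from the local cache, no network lookup
print(metric.compute(predictions=[0, 1, 1], references=[0, 1, 0]))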

@YihanCao123

Please consider redesigning this as from evaluate.metrics import accuracy; downloading scripts is very hard to use for those who have to run their models on certain servers. I strongly agree with the above comments about this issue.

@guotong1988

same question.

@Ki-Seki

Ki-Seki commented Nov 2, 2023

Please consider redesigning this as from evaluate.metrics import accuracy; downloading scripts is very hard to use for those who have to run their models on certain servers. I strongly agree with the above comments about this issue.

Totally agree. Please rethink the loading function.

@pourion

pourion commented Feb 3, 2024

  1. git clone https://github.com/huggingface/evaluate.git /workspace/thirdparty/evaluate
  2. also pip install evaluate
  3. in your script do:
import evaluate

# load the metric script directly from the local clone
perplexity_module = evaluate.load("/workspace/thirdparty/evaluate/metrics/perplexity/perplexity.py", module_type="metric")
...
perplexity_module.compute(model_id="gpt2", predictions=predictions)

with predictions being a list of strings.

@younengma

I encountered the same issue when computing "bleu". It is not cached, and it takes a long time every time I run the code.

The evaluate module requires internet access even when I copy the whole evaluate folder and try to load it locally.
My final solution was to adapt the metric's compute function and use it directly. This way I can get rid of evaluate.load.
e.g.:

# imports needed for the adapted function; the file names assume you copied
# tokenizer_13a.py from metrics/bleu/ in the evaluate repo and saved the reference BLEU script
# from https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py as nmt_bleu.py next to this file
from tokenizer_13a import Tokenizer13a
from nmt_bleu import compute_bleu as nmt_compute_bleu  # aliased so the wrapper below does not shadow it

def compute_bleu(predictions, references, tokenizer=Tokenizer13a(), max_order=4, smooth=False):
    # if only one reference is provided make sure we still use list of lists
    if isinstance(references[0], str):
        references = [[ref] for ref in references]

    references = [[tokenizer(r) for r in ref] for ref in references]
    predictions = [tokenizer(p) for p in predictions]
    score = nmt_compute_bleu(
        reference_corpus=references, translation_corpus=predictions, max_order=max_order, smooth=smooth
    )
    (bleu, precisions, bp, ratio, translation_length, reference_length) = score
    return {
        "bleu": bleu,
        "precisions": precisions,
        "brevity_penalty": bp,
        "length_ratio": ratio,
        "translation_length": translation_length,
        "reference_length": reference_length,
    }

@justin13601

I encountered the same issue when computing "bleu". [...] My final solution was to adapt the metric's compute function and use it directly.

This is the only method that worked for me, and it was surprisingly simple since these metrics are rather straightforward - thank you!

@CaptXiong

CaptXiong commented Jun 14, 2024

I cannot load from the local path, so I decided to look inside accuracy.py, and then I changed my code to this instead of metrics.accuracy:

import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {
        "accuracy": float(accuracy_score(labels, predictions))
    }
