How to use this package offline #84

Closed
3 tasks done
liuyukid opened this issue Nov 17, 2023 · 1 comment
Labels: question (Further information is requested)

Comments

@liuyukid

Before Asking

  • I have read the README carefully.

  • I have pulled the latest code of the main branch and run it again, and the problem still exists.

Search before asking

  • I have searched the Data-Juicer issues and found no similar questions.

Question

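I need to use this package inside an offline container. The package itself is already installed, but at runtime I found that it needs to download a lot of things such as model weights. Is there an effective way to use this package in an offline environment?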

Additional

No response

@liuyukid liuyukid added the question Further information is requested label Nov 17, 2023
@zhijianma
Collaborator

Hello, thank you for using Data-Juicer!
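Some OPs are model-based; for example, language_id_score_filter depends on fasttext. Data-Juicer first checks its cache directory, and if the model is not found there, it tries to download it from the model's original link and then from a backup link. In addition, some OPs download their models from Hugging Face. Both cases require the machine to have network access.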
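The lookup order described above (local cache first, then the original link, then a backup link) can be sketched as follows. This is a minimal illustration under stated assumptions, not Data-Juicer's actual implementation: the function name resolve_model, its parameters, and the injectable download callback are all hypothetical. It also shows why pre-filling the cache is enough for offline use: on a cache hit, no download is ever attempted.

```python
import os
from pathlib import Path
from typing import Callable, List, Sequence
from urllib.request import urlretrieve


def _urllib_download(url: str, dest: Path) -> None:
    """Default downloader: fetch `url` into `dest` using only the standard library."""
    urlretrieve(url, str(dest))


def resolve_model(
    filename: str,
    cache_dir: str,
    urls: Sequence[str],
    download: Callable[[str, Path], None] = _urllib_download,
) -> Path:
    """Hypothetical helper: return a cached model file, downloading only if missing.

    Lookup order: the local cache first, then each URL in `urls` in order
    (for example the original link, then a backup link).
    """
    target = Path(cache_dir).expanduser() / filename
    if target.is_file():
        # Cache hit: no network access is needed, which is what makes offline use work.
        return target

    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    errors: List[str] = []
    for url in urls:
        try:
            # Download under a temporary name so an interrupted transfer never
            # leaves a truncated file that would later look like a cache hit.
            download(url, partial)
            os.replace(partial, target)
            return target
        except OSError as exc:  # urllib's URLError/HTTPError are OSError subclasses
            errors.append(f"{url}: {exc}")
            partial.unlink(missing_ok=True)
    raise FileNotFoundError(
        f"{filename} not found in {target.parent} and all downloads failed: {errors}"
    )
```

In an offline container every URL fails, so the only way this lookup succeeds is for the file to already be in the cache directory.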
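We suggest downloading these models in advance on a machine with network access and placing them in the default cache directory. If you download them to a different directory and want to share it, you can point Data-Juicer to it by setting the environment variable: export CACHE_HOME=your_cache_path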
Data-Juicer's default cache path is ~/.cache/data_juicer, and Hugging Face's default cache path is ~/.cache/huggingface.
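A minimal sketch for the offline container, assuming the cache directories were filled on a machine with network access (for example, by running the same OPs there once) and then copied over. CACHE_HOME comes from the reply above; HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE and HF_HOME are standard Hugging Face environment variables and are an addition not mentioned in this thread. The helper name configure_offline is hypothetical, and how Data-Juicer lays out subdirectories under CACHE_HOME should be checked against the version you use.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def configure_offline(cache_path: str, hf_home: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: use a pre-filled cache and keep Hugging Face offline.

    Call this before importing data_juicer or transformers, because these
    libraries may read the variables at import time.
    """
    settings = {
        # From the maintainer's reply: where Data-Juicer looks for its cache.
        "CACHE_HOME": str(Path(cache_path).expanduser()),
        # Assumption: Hugging Face offline switches, so a missing file fails
        # fast with an error instead of attempting a network request.
        "HF_HUB_OFFLINE": "1",
        "TRANSFORMERS_OFFLINE": "1",
    }
    if hf_home is not None:
        # Optional: a Hugging Face cache copied from an online machine.
        settings["HF_HOME"] = str(Path(hf_home).expanduser())

    # Fail early if a cache directory was not copied into the container.
    for name in ("CACHE_HOME", "HF_HOME"):
        if name in settings and not Path(settings[name]).is_dir():
            raise FileNotFoundError(f"{name} directory does not exist: {settings[name]}")

    os.environ.update(settings)
    return settings
```

For example, call configure_offline with your own cache paths at the very top of your script, before importing data_juicer. The shell equivalent is exporting the same variables before launching the process.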
