Disabling the "phrase pinyin dictionary" to reduce memory usage #113

Open
Provinm opened this Issue Jan 3, 2018 · 4 comments

Provinm commented Jan 3, 2018

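Environment

  • Operating system (Linux/macOS/Windows): Windows
  • Python version: 2.7.10
  • pypinyin version: 0.25.0

Problem description and how it was solved

In my project I only convert single Chinese characters to pinyin and never need phrases. With an unmodified pypinyin, simply importing pinyin takes up roughly 30 MB of memory (an increment of about 27 MiB in the profile below):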

Line #    Mem usage    Increment   Line Contents
================================================
     7   25.246 MiB   25.246 MiB   @profile
     8                             def func():
     9   52.117 MiB   26.871 MiB       from pypinyin import pinyin,Style
    10   52.125 MiB    0.008 MiB       print pinyin(u'啦', style=Style.INITIALS)

After asking mozillazg by email, I learned that setting the environment variable PYPINYIN_NO_PHRASES=true reduces memory usage substantially.

Setup code

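# -*- coding: utf-8 -*-
import os

# Must be set before pypinyin is imported for the first time
os.environ['PYPINYIN_NO_PHRASES'] = 'true'

@profile  # provided by memory_profiler when run via: python -m memory_profiler
def func():
    from pypinyin import pinyin,Style
    print pinyin(u'啦', style=Style.INITIALS)

func()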

Final result

Line #    Mem usage    Increment   Line Contents
================================================
     7   25.258 MiB   25.258 MiB   @profile
     8                             def func():
     9   29.176 MiB    3.918 MiB       from pypinyin import pinyin,Style
    10   29.184 MiB    0.008 MiB       print pinyin(u'啦', style=Style.INITIALS)

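Thanks to mozillazg for the patient answers. The above is the whole process of solving the problem, shared as a reference for anyone with a similar issue.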
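For anyone who wants to repeat this comparison without memory_profiler, here is a minimal sketch (Python 3, standard library only). It imports a module in a fresh interpreter and reports the peak allocation seen by tracemalloc. The helper name import_peak_bytes is made up for illustration. tracemalloc only counts Python-level allocations, so its numbers will not match the process-level figures in the profiles above.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: start tracing, import the module named
# in sys.argv[1], then print the peak number of traced bytes.
CHILD_SNIPPET = (
    "import importlib, sys, tracemalloc\n"
    "tracemalloc.start()\n"
    "importlib.import_module(sys.argv[1])\n"
    "print(tracemalloc.get_traced_memory()[1])\n"
)


def import_peak_bytes(module_name, extra_env=None):
    """Return the peak bytes traced while importing module_name.

    The import happens in a fresh interpreter so earlier imports in the
    current process cannot hide the cost. extra_env is merged into the
    child's environment (for example to set PYPINYIN_NO_PHRASES).
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    output = subprocess.check_output(
        [sys.executable, "-c", CHILD_SNIPPET, module_name], env=env
    )
    return int(output.decode().strip())
```

Assuming pypinyin is installed, comparing import_peak_bytes("pypinyin") with import_peak_bytes("pypinyin", {"PYPINYIN_NO_PHRASES": "true"}) should show the same kind of difference as the profiles above.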

@mozillazg mozillazg added the question label Jan 3, 2018

Owner

mozillazg commented Jan 3, 2018

Thanks @Provinm for sharing this use case. Memory usage can be reduced further by removing the copy method call from the line PINYIN_DICT = pinyin_dict.pinyin_dict.copy():

# Before
PINYIN_DICT = pinyin_dict.pinyin_dict.copy()
# After
PINYIN_DICT = pinyin_dict.pinyin_dict

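The purpose of the copy is to make it possible to fall back to the bundled pinyin dictionary when a user's custom dictionary causes problems. If you don't need that, you can disable the copy.

PRs are welcome that add a way to disable the copy without modifying pypinyin's code (for example, a PYPINYIN_NO_DICT_COPY or PYPINYIN_DISABLE_DICT_COPY environment variable that controls it).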
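As a rough illustration of what such a switch could look like, here is a minimal sketch. It uses PYPINYIN_NO_DICT_COPY, one of the two names suggested above. BUNDLED_DICT and build_working_dict are hypothetical stand-ins rather than pypinyin's actual code, and the two entries are placeholder data.

```python
import os

# Stand-in for the bundled single-character table (placeholder data,
# keyed by Unicode code point like the real table).
BUNDLED_DICT = {0x5566: "la", 0x554A: "a"}


def build_working_dict(bundled, environ=os.environ):
    """Return the dict the library would use at runtime.

    If PYPINYIN_NO_DICT_COPY is set to a non-empty value, reuse the
    bundled dict itself: no second copy in memory, but also no pristine
    table to fall back on. Otherwise return an independent copy.
    """
    if environ.get("PYPINYIN_NO_DICT_COPY"):
        return bundled
    return bundled.copy()


# Default behaviour: an independent copy of the bundled table.
copied = build_working_dict(BUNDLED_DICT, environ={})
# With the variable set: the very same object as the bundled table.
shared = build_working_dict(BUNDLED_DICT, environ={"PYPINYIN_NO_DICT_COPY": "true"})
```

Passing the environment in as a parameter is only there to make the sketch easy to exercise. A library would more likely read os.environ once at import time, the same way PYPINYIN_NO_PHRASES has to be set before the import.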

daya0576 commented Jan 8, 2018

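"The purpose of the copy is to make it possible to fall back to the bundled pinyin dictionary when a user's custom dictionary causes problems."

Does this mean that if the custom PINYIN_DICT turns out to be broken, users can go back to pinyin_dict.pinyin_dict themselves?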

Owner

mozillazg commented Jan 8, 2018

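@daya0576 You can simply clear PINYIN_DICT and then use load_single_dict to load the default pinyin_dict.pinyin_dict. The phrase dictionary can be restored in a similar way using load_phrases_dict (see the custom pinyin dictionary documentation).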
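The clear-and-reload idea, and why it depends on the copy, can be sketched with plain dictionaries. Everything below is a stand-in: load_entries plays the role of load_single_dict, the working dict plays the role of PINYIN_DICT, and the entries are placeholder data. It is not pypinyin's implementation.

```python
def make_state(copy_enabled):
    """Return (bundled, working) dicts, with or without the copy."""
    bundled = {0x5566: "la", 0x554A: "a"}  # placeholder bundled table
    working = bundled.copy() if copy_enabled else bundled
    return bundled, working


def load_entries(working, entries):
    """Stand-in for load_single_dict: merge entries into the working dict."""
    working.update(entries)


def reset_to_bundled(working, bundled):
    """Clear the working dict, then reload the bundled entries."""
    working.clear()
    load_entries(working, bundled)


# With the copy: a bad custom entry can be undone.
bundled, working = make_state(copy_enabled=True)
load_entries(working, {0x5566: "WRONG"})
reset_to_bundled(working, bundled)
restored_with_copy = dict(working)  # bundled entries are back

# Without the copy: working and bundled are the same object, so clear()
# empties the only table there is and nothing is left to reload.
bundled, working = make_state(copy_enabled=False)
load_entries(working, {0x5566: "WRONG"})
reset_to_bundled(working, bundled)
restored_without_copy = dict(working)  # empty
```

So once the copy is removed, this recovery path is no longer available within the same process. That is the trade-off for the memory saved.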

@daya0576 daya0576 referenced this issue Jan 12, 2018

Merged

feat: disable the copy operation via an environment variable. #115

3 of 3 tasks complete

crispgm commented Jan 19, 2018

Great! I just ran into this exact problem 👍
