
Memory leak when using docx.Document to parse a large Word file #1428

@joyme123

Description


Similar issue: #1364

Code to reproduce:

import gc
import os

import psutil
from docx import Document
from memory_profiler import profile


@profile
def main():
    file = "test.docx"
    document = Document(file)

    del document
    gc.collect()

    print(
        "current process memory: %.4f GB"
        % (psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024 / 1024),
    )


if __name__ == "__main__":
    main()
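The script above reports process RSS. A complementary measurement, sketched here with the standard-library `tracemalloc` module (not part of the original report), counts only memory allocated through Python's own allocators. If the traced figure drops back near zero after the result is deleted while RSS stays high, the retained memory is probably not held by live Python objects. It may instead sit in C-level allocations (possibly including the libxml2 trees underneath lxml, which python-docx builds on) or in freed pages the allocator has not returned to the OS. Treat this as a way to test that hypothesis, not as proof of either explanation; `traced_python_memory` is a hypothetical helper name.

```python
import gc
import tracemalloc


def traced_python_memory(func, *args):
    """Call func(*args), drop the result, and report Python-heap usage.

    Returns (current, peak) in bytes as seen by tracemalloc. Only memory
    allocated through Python's allocators is counted; memory that C code
    obtains directly from malloc() does not show up here.
    """
    gc.collect()
    tracemalloc.start()
    try:
        result = func(*args)
        del result    # drop the only reference to the returned object
        gc.collect()  # also collect any reference cycles it left behind
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current, peak


# Hypothetical usage with the file from the reproduction:
#   from docx import Document
#   current, peak = traced_python_memory(Document, "test.docx")
#   print(f"peak {peak / 2**20:.1f} MiB, still traced {current / 2**20:.1f} MiB")
```

A large peak with a small `current` would point away from Python-level references keeping the document alive.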

File to reproduce:
新建 DOCX 文档.docx ("New DOCX Document.docx")

Output:

❯ python test.py
current process memory: 2.6228 GB
Filename: test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     9     28.3 MiB     28.3 MiB           1   @profile
    10                                         def main():
    11     28.3 MiB      0.0 MiB           1       file = "test.docx"
    12   2685.7 MiB   2657.4 MiB           1       document = Document(file)
    13
    14   2685.7 MiB      0.0 MiB           1       del document
    15   2685.7 MiB      0.0 MiB           1       gc.collect()
    16
    17   2685.7 MiB      0.0 MiB           2       print(
    18   2685.7 MiB      0.0 MiB           2           "current process memory: %.4f GB"
    19   2685.7 MiB      0.0 MiB           1           % (psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024 / 1024),
    20                                             )
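Until the cause is found, one possible mitigation for long-running processes is to parse each file in a short-lived child process. This is offered as an assumption, not a confirmed fix and not a python-docx feature. Everything the child allocates, including memory held by C extensions, goes back to the operating system when the child exits. The sketch assumes a POSIX system where the `fork` start method is available (so not Windows). `run_isolated` and `count_paragraphs` are hypothetical names.

```python
import multiprocessing as mp


def _child(conn, func, args):
    # Runs in the child: send back (ok, payload); send() pickles immediately,
    # so an unpicklable result is reported as an error instead of hanging.
    try:
        conn.send((True, func(*args)))
    except Exception as exc:
        conn.send((False, repr(exc)))
    finally:
        conn.close()


def run_isolated(func, *args):
    """Call func(*args) in a forked child process and return its result."""
    ctx = mp.get_context("fork")  # POSIX only; func and args are not pickled
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(send_conn, func, args))
    proc.start()
    send_conn.close()  # parent drops its write end so recv() sees EOF if the child dies
    try:
        ok, payload = recv_conn.recv()  # read before join() so a large reply cannot block
    except EOFError:
        proc.join()
        raise RuntimeError(f"child exited without a result (exit code {proc.exitcode})")
    finally:
        recv_conn.close()
    proc.join()
    if not ok:
        raise RuntimeError(f"child process failed: {payload}")
    return payload


# Hypothetical usage: parse in the child, return only a small summary.
#   def count_paragraphs(path):
#       from docx import Document
#       return len(Document(path).paragraphs)
#
#   n = run_isolated(count_paragraphs, "test.docx")
```

Returning only a small summary (here a paragraph count) keeps the parent from rebuilding the large object tree itself. If the child is killed before replying, `recv()` raises `EOFError`, which the helper converts into a `RuntimeError`.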
