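# Text file open test
- Compare Python file-reading performance across text file sizes and reading methods
- Execution time is measured with line_profiler
- Memory usage is measured with memory_profiler
- The input files are the ones generated by running ./make_file.sh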
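
The contents of `./make_file.sh` are not shown here. As a rough, hypothetical Python equivalent, the sketch below writes ASCII files with 100-byte lines, which is consistent with the 10,000 loop iterations recorded for the 1MB file in the profiler output. The function name `make_file`, the line content, and the convention 1 MB = 1,000,000 bytes are all assumptions.

```python
def make_file(path: str, size_mb: int, line_len: int = 100) -> None:
    """Write a text file of about size_mb megabytes (hypothetical helper).

    Assumes 1 MB = 1,000,000 bytes and fixed-length ASCII lines.
    """
    line = "a" * (line_len - 1) + "\n"      # line_len bytes including the newline
    lines_per_mb = 1_000_000 // line_len    # 10,000 lines per MB when line_len is 100
    with open(path, "w", encoding="ascii", newline="\n") as file:
        for _ in range(size_mb):
            file.write(line * lines_per_mb)  # write one MB per iteration
```

For example, `make_file("resource/t1M.txt", 1)` would produce a 1,000,000-byte file with 10,000 lines, assuming the `resource` directory already exists.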

## Reading methods
- Patterns tested
  - read
  - readline
  - readlines
  - iterator
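
For reference, the four patterns can be collected in one module as below. The function bodies follow the code visible in the profiler output (without the `@profile` decorator). The `READERS` table and the `run` helper are hypothetical names, and the mapping of the trailing command-line argument 0 to 3 onto each reader is inferred from the file names printed in that output.

```python
from typing import Callable, Dict


def read(path: str) -> str:
    with open(path, "r") as file:
        text = file.read()              # whole file in one call
    return text


def readline(path: str) -> str:
    with open(path, "r") as file:
        line = file.readline()
        text = line
        while line:                     # readline() returns "" at end of file
            line = file.readline()
            text += line
    return text


def readlines(path: str) -> str:
    text = ""
    with open(path, "r") as file:
        texts = file.readlines()        # list holding every line
        for line in texts:
            text += line
    return text


def iterator(path: str) -> str:
    text = ""
    with open(path, "r") as file:
        for line in file:               # the file object iterates line by line
            text += line
    return text


# Hypothetical dispatch table: 0=read, 1=readline, 2=readlines, 3=iterator
READERS: Dict[int, Callable[[str], str]] = {
    0: read,
    1: readline,
    2: readlines,
    3: iterator,
}


def run(path: str, mode: int) -> str:
    """Call the reader selected by mode (assumed meaning of the last CLI argument)."""
    return READERS[mode](path)
```

All four functions return the same string; they differ only in how the file is consumed.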

## Memory usage test
- Measure how much memory is used when a file is opened and read
- Read the 1MB, 10MB, 100MB, and 1000MB files in that order

% ./file_open.sh
+ source ./venv/bin/activate
+ python -m memory_profiler file_open.py resource/t1M.txt 0
Filename: /Users/test/work/PyBench/reader/ext_read.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.617 MiB   46.617 MiB   @profile
     9                             def read(path: str) -> str:
    10   46.617 MiB    0.000 MiB       with open(path, "r") as file:
    11   48.547 MiB    1.930 MiB           text = file.read()
    12
    13   48.547 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t1M.txt 1
Filename: /Users/test/work/PyBench/reader/ext_readline.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.504 MiB   46.504 MiB   @profile
     9                             def readline(path: str) -> str:
    10   46.504 MiB    0.000 MiB       with open(path, "r") as file:
    11   46.512 MiB    0.008 MiB           line = file.readline()
    12   46.512 MiB    0.000 MiB           text = line
    13   65.000 MiB    0.000 MiB           while line:
    14   65.000 MiB    0.012 MiB               line = file.readline()
    15   65.000 MiB    0.156 MiB               text += line
    16
    17   64.066 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t1M.txt 2
Filename: /Users/test/work/PyBench/reader/ext_readlines.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.469 MiB   46.469 MiB   @profile
     9                             def readlines(path: str) -> str:
    10   46.469 MiB    0.000 MiB       text = ""
    11   46.469 MiB    0.000 MiB       with open(path, "r") as file:
    12   48.070 MiB    1.602 MiB           texts = file.readlines()
    13   66.379 MiB    0.000 MiB           for line in texts:
    14   66.379 MiB    0.156 MiB               text += line
    15
    16   65.445 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t1M.txt 3
Filename: /Users/test/work/PyBench/reader/ext_iterator.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.293 MiB   46.293 MiB   @profile
     9                             def iterator(path: str) -> str:
    10   46.293 MiB    0.000 MiB       text = ""
    11   46.293 MiB    0.000 MiB       with open(path, "r") as file:
    12   65.070 MiB    0.012 MiB           for line in file:
    13   65.070 MiB    0.156 MiB               text += line
    14
    15   64.137 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t10M.txt 0
Filename: /Users/test/work/PyBench/reader/ext_read.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.477 MiB   46.477 MiB   @profile
     9                             def read(path: str) -> str:
    10   46.477 MiB    0.000 MiB       with open(path, "r") as file:
    11   65.742 MiB   19.266 MiB           text = file.read()
    12
    13   65.742 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t10M.txt 1
Filename: /Users/test/work/PyBench/reader/ext_readline.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.402 MiB   46.402 MiB   @profile
     9                             def readline(path: str) -> str:
    10   46.402 MiB    0.000 MiB       with open(path, "r") as file:
    11   46.402 MiB    0.000 MiB           line = file.readline()
    12   46.402 MiB    0.000 MiB           text = line
    13  212.703 MiB    0.000 MiB           while line:
    14  212.703 MiB    0.012 MiB               line = file.readline()
    15  212.703 MiB    0.156 MiB               text += line
    16
    17  203.102 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t10M.txt 2
Filename: /Users/test/work/PyBench/reader/ext_readlines.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.477 MiB   46.477 MiB   @profile
     9                             def readlines(path: str) -> str:
    10   46.477 MiB    0.000 MiB       text = ""
    11   46.477 MiB    0.000 MiB       with open(path, "r") as file:
    12   61.734 MiB   15.258 MiB           texts = file.readlines()
    13  229.020 MiB    0.004 MiB           for line in texts:
    14  229.020 MiB    0.156 MiB               text += line
    15
    16  219.418 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t10M.txt 3
Filename: /Users/test/work/PyBench/reader/ext_iterator.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.551 MiB   46.551 MiB   @profile
     9                             def iterator(path: str) -> str:
    10   46.551 MiB    0.000 MiB       text = ""
    11   46.551 MiB    0.000 MiB       with open(path, "r") as file:
    12  211.625 MiB    0.012 MiB           for line in file:
    13  211.625 MiB    0.156 MiB               text += line
    14
    15  202.023 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t100M.txt 0
Filename: /Users/test/work/PyBench/reader/ext_read.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.457 MiB   46.457 MiB   @profile
     9                             def read(path: str) -> str:
    10   46.457 MiB    0.000 MiB       with open(path, "r") as file:
    11  239.105 MiB  192.648 MiB           text = file.read()
    12
    13  239.105 MiB    0.000 MiB       return text


+ python -m memory_profiler file_open.py resource/t1000M.txt 0
Filename: /Users/test/work/PyBench/reader/ext_read.py

Line #    Mem usage    Increment   Line Contents
================================================
     8   46.383 MiB   46.383 MiB   @profile
     9                             def read(path: str) -> str:
    10   46.383 MiB    0.000 MiB       with open(path, "r") as file:
    11  971.770 MiB  925.387 MiB           text = file.read()
    12
    13  971.863 MiB    0.094 MiB       return text

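## Results

Each value is the peak memory increase over the baseline at function entry, derived from the `Mem usage` column above.

|File size (MB)|read (MiB)|readline (MiB)|readlines (MiB)|iterator (MiB)|
|-|-|-|-|-|
|1|1.930|18.496|19.910|18.777|
|10|19.266|166.301|182.543|165.074|
|100|192.648|-|-|-|
|1000|925.480|-|-|-|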
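
By default memory_profiler samples process-level memory, so the figures in this table include more than the returned string itself. As one possible cross-check, the standard-library `tracemalloc` module can report both the memory still held after `read()` and the peak reached during the call. This is only a sketch: `measure_read` is a hypothetical helper, it tracks Python-level allocations only, and its numbers are not expected to match memory_profiler's exactly.

```python
import tracemalloc
from typing import Tuple


def measure_read(path: str) -> Tuple[int, int, int]:
    """Return (characters read, bytes still allocated, peak bytes) for file.read()."""
    tracemalloc.start()
    try:
        with open(path, "r") as file:
            text = file.read()
        # Taken while `text` is still referenced, so `current` includes the string
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(text), current, peak
```

Comparing `peak` with `current` for each file size may help show whether temporary buffers used while reading and decoding contribute to the measured increase.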

The following runs were aborted because memory_profiler had not finished after one hour of waiting.
- readline (100MB, 1000MB)
- readlines (100MB, 1000MB)
- iterator (100MB, 1000MB)

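```
When the 100MB file was processed with readline, the process had already consumed 1000 MiB by the time the run was aborted, so this approach is not practical.
When reading line by line, do not retain the lines; discard the old data each time a new line is read.
With readline, readlines, and iterator, memory consumption is probably about 10 times that of read.
Why does read consume about twice the file size in memory?
Also, for the 1000MB file, read consumes only about as much memory as the file size. What makes this different from the 1MB, 10MB, and 100MB cases?
```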
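
The second note above suggests processing each line as it is read instead of accumulating the whole text. A minimal sketch of that idea follows. The function name `count_lines_and_chars` and the choice of counting as the per-line work are illustrative assumptions, not part of the benchmark.

```python
from typing import Tuple


def count_lines_and_chars(path: str) -> Tuple[int, int]:
    """Stream a text file, keeping only running totals in memory."""
    lines = 0
    chars = 0
    with open(path, "r") as file:
        for line in file:        # the file object yields one line at a time
            lines += 1
            chars += len(line)   # `line` is rebound on the next iteration, so the old line can be freed
    return lines, chars
```

Because no line is retained after it has been processed, memory use should stay roughly constant regardless of file size.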

## Execution time test
- Measure the time from opening the file to closing it
- Read the 1MB, 10MB, 100MB, and 1000MB files in that order
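
line_profiler traces every profiled line, which adds some overhead, and that overhead weighs more heavily on the loop-based readers. As a complementary measurement, the wall-clock time of a whole call can be taken with `time.perf_counter`. The helper below is a sketch; `time_reader` and its `repeat` parameter are hypothetical names.

```python
import time
from typing import Callable, List


def time_reader(reader: Callable[[str], str], path: str, repeat: int = 3) -> List[float]:
    """Return the elapsed seconds of each of `repeat` calls to reader(path)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        reader(path)                               # open, read, and close happen inside the reader
        timings.append(time.perf_counter() - start)
    return timings


def read(path: str) -> str:
    """Same body as the read pattern in this benchmark."""
    with open(path, "r") as file:
        text = file.read()
    return text
```

Calling `min(time_reader(read, "resource/t10M.txt"))` would give the fastest of three runs for the 10MB file.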

### Execution results

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1M.txt 0
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.002066 s
File: /Users/test/work/PyBench/reader/ext_read.py
Function: read at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def read(path: str) -> str:
    10         1        103.0    103.0      5.0      with open(path, "r") as file:
    11         1       1962.0   1962.0     95.0          text = file.read()
    12
    13         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1M.txt 1
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.015578 s
File: /Users/test/work/PyBench/reader/ext_readline.py
Function: readline at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readline(path: str) -> str:
    10         1         79.0     79.0      0.5      with open(path, "r") as file:
    11         1         39.0     39.0      0.3          line = file.readline()
    12         1          1.0      1.0      0.0          text = line
    13     10001       3226.0      0.3     20.7          while line:
    14     10000       6510.0      0.7     41.8              line = file.readline()
    15     10000       5722.0      0.6     36.7              text += line
    16
    17         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1M.txt 2
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.013614 s
File: /Users/test/work/PyBench/reader/ext_readlines.py
Function: readlines at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readlines(path: str) -> str:
    10         1          8.0      8.0      0.1      text = ""
    11         1        108.0    108.0      0.8      with open(path, "r") as file:
    12         1       3677.0   3677.0     27.0          texts = file.readlines()
    13     10001       3535.0      0.4     26.0          for line in texts:
    14     10000       6286.0      0.6     46.2              text += line
    15
    16         1          0.0      0.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1M.txt 3
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.017188 s
File: /Users/test/work/PyBench/reader/ext_iterator.py
Function: iterator at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def iterator(path: str) -> str:
    10         1         12.0     12.0      0.1      text = ""
    11         1        165.0    165.0      1.0      with open(path, "r") as file:
    12     10001       7974.0      0.8     46.4          for line in file:
    13     10000       9037.0      0.9     52.6              text += line
    14
    15         1          0.0      0.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t10M.txt 0
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.010989 s
File: /Users/test/work/PyBench/reader/ext_read.py
Function: read at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def read(path: str) -> str:
    10         1         64.0     64.0      0.6      with open(path, "r") as file:
    11         1      10924.0  10924.0     99.4          text = file.read()
    12
    13         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t10M.txt 1
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.176368 s
File: /Users/test/work/PyBench/reader/ext_readline.py
Function: readline at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readline(path: str) -> str:
    10         1         64.0     64.0      0.0      with open(path, "r") as file:
    11         1         34.0     34.0      0.0          line = file.readline()
    12         1          0.0      0.0      0.0          text = line
    13    100001      34214.0      0.3     19.4          while line:
    14    100000      78500.0      0.8     44.5              line = file.readline()
    15    100000      63555.0      0.6     36.0              text += line
    16
    17         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t10M.txt 2
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.113633 s
File: /Users/test/work/PyBench/reader/ext_readlines.py
Function: readlines at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readlines(path: str) -> str:
    10         1          5.0      5.0      0.0      text = ""
    11         1         58.0     58.0      0.1      with open(path, "r") as file:
    12         1      26897.0  26897.0     23.7          texts = file.readlines()
    13    100001      32597.0      0.3     28.7          for line in texts:
    14    100000      54076.0      0.5     47.6              text += line
    15
    16         1          0.0      0.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t10M.txt 3
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.105926 s
File: /Users/test/work/PyBench/reader/ext_iterator.py
Function: iterator at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def iterator(path: str) -> str:
    10         1          4.0      4.0      0.0      text = ""
    11         1         52.0     52.0      0.0      with open(path, "r") as file:
    12    100001      51284.0      0.5     48.4          for line in file:
    13    100000      54585.0      0.5     51.5              text += line
    14
    15         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t100M.txt 0
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 0.115549 s
File: /Users/test/work/PyBench/reader/ext_read.py
Function: read at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def read(path: str) -> str:
    10         1         61.0     61.0      0.1      with open(path, "r") as file:
    11         1     115487.0 115487.0     99.9          text = file.read()
    12
    13         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t100M.txt 1
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 1.45125 s
File: /Users/test/work/PyBench/reader/ext_readline.py
Function: readline at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readline(path: str) -> str:
    10         1         60.0     60.0      0.0      with open(path, "r") as file:
    11         1         26.0     26.0      0.0          line = file.readline()
    12         1          1.0      1.0      0.0          text = line
    13   1000001     312814.0      0.3     21.6          while line:
    14   1000000     604239.0      0.6     41.6              line = file.readline()
    15   1000000     534111.0      0.5     36.8              text += line
    16
    17         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t100M.txt 2
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 1.04156 s
File: /Users/test/work/PyBench/reader/ext_readlines.py
Function: readlines at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readlines(path: str) -> str:
    10         1          4.0      4.0      0.0      text = ""
    11         1         50.0     50.0      0.0      with open(path, "r") as file:
    12         1     238296.0 238296.0     22.9          texts = file.readlines()
    13   1000001     302026.0      0.3     29.0          for line in texts:
    14   1000000     501179.0      0.5     48.1              text += line
    15
    16         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t100M.txt 3
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 1.08408 s
File: /Users/test/work/PyBench/reader/ext_iterator.py
Function: iterator at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def iterator(path: str) -> str:
    10         1          4.0      4.0      0.0      text = ""
    11         1         51.0     51.0      0.0      with open(path, "r") as file:
    12   1000001     535396.0      0.5     49.4          for line in file:
    13   1000000     548624.0      0.5     50.6              text += line
    14
    15         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1000M.txt 0
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 1.49888 s
File: /Users/test/work/PyBench/reader/ext_read.py
Function: read at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def read(path: str) -> str:
    10         1         58.0     58.0      0.0      with open(path, "r") as file:
    11         1    1498822.0 1498822.0    100.0          text = file.read()
    12
    13         1          3.0      3.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1000M.txt 1
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 14.477 s
File: /Users/test/work/PyBench/reader/ext_readline.py
Function: readline at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readline(path: str) -> str:
    10         1         67.0     67.0      0.0      with open(path, "r") as file:
    11         1         24.0     24.0      0.0          line = file.readline()
    12         1          0.0      0.0      0.0          text = line
    13  10000001    2926478.0      0.3     20.2          while line:
    14  10000000    6157430.0      0.6     42.5              line = file.readline()
    15  10000000    5393000.0      0.5     37.3              text += line
    16
    17         1          0.0      0.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1000M.txt 2
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 12.9202 s
File: /Users/test/work/PyBench/reader/ext_readlines.py
Function: readlines at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def readlines(path: str) -> str:
    10         1          7.0      7.0      0.0      text = ""
    11         1         75.0     75.0      0.0      with open(path, "r") as file:
    12         1    2886449.0 2886449.0     22.3          texts = file.readlines()
    13  10000001    4426648.0      0.4     34.3          for line in texts:
    14  10000000    5607039.0      0.6     43.4              text += line
    15
    16         1          1.0      1.0      0.0      return text

+ ./line_profiler/kernprof.py -l -v file_open.py resource/t1000M.txt 3
Wrote profile results to file_open.py.lprof
Timer unit: 1e-06 s

Total time: 10.1053 s
File: /Users/test/work/PyBench/reader/ext_iterator.py
Function: iterator at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def iterator(path: str) -> str:
    10         1          5.0      5.0      0.0      text = ""
    11         1         66.0     66.0      0.0      with open(path, "r") as file:
    12  10000001    4971487.0      0.5     49.2          for line in file:
    13  10000000    5133766.0      0.5     50.8              text += line
    14
    15         1          1.0      1.0      0.0      return text



## Results

|File size (MB)|read (s)|readline (s)|readlines (s)|iterator (s)|
|-|-|-|-|-|
|1|0.002066|0.015578|0.013614|0.017188|
|10|0.010989|0.176368|0.113633|0.105926|
|100|0.115549|1.45125|1.04156|1.08408|
|1000|1.49888|14.477|12.9202|10.1053|


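- read is roughly 7 to 16 times faster than the other methods, depending on the method and file size
- The per-line loop (the `while`/`for` statement plus `text += line`) is thought to be the cause of the slowdown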
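
If the file must be consumed line by line but the full text is still needed, one option that avoids a Python-level statement per line is `"".join(file)`, which consumes the file iterator and builds the string in a single call. This is a sketch for comparison only and was not measured in this benchmark; `join_lines` is a hypothetical name.

```python
def join_lines(path: str) -> str:
    """Concatenate all lines with a single str.join call instead of `text += line`."""
    with open(path, "r") as file:
        return "".join(file)   # join iterates over the file object line by line
```

It returns the same string as `read`. Whether it is faster than the loop-based readers would need to be confirmed with the same profiling runs.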
