This is related to #36. Because `peek` is not implemented in `HdfsFile`, the same problem also occurs when using HDFS. Unlike #36, where the POSIX file's `peek` was merely hidden by `FileObject`, on HDFS `peek` has to be added.
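To see why a missing `peek` matters, here is a minimal sketch. It assumes a raw, in-memory stream (the hypothetical `CountingRawReader`, standing in for `HdfsFile`) and counts how many reads reach the underlying storage during `pickle.load`, first on the bare stream and then with the stream wrapped in `io.BufferedReader`. Protocol 3 (the default in Python 3.7) has no framing, so without `peek` the unpickler issues one small `read` per opcode and per argument.

```python
import io
import pickle


class CountingRawReader(io.RawIOBase):
    """Hypothetical stand-in for a raw file object without peek().

    Backed by an in-memory buffer; counts every read that reaches it.
    """

    def __init__(self, data: bytes):
        super().__init__()
        self._src = io.BytesIO(data)
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # RawIOBase.read() and io.BufferedReader both end up here.
        self.calls += 1
        return self._src.readinto(b)


# Protocol 3 pickles have no frames, so every opcode is read separately.
payload = pickle.dumps(list(range(100_000)), protocol=3)

raw = CountingRawReader(payload)
unbuffered = pickle.load(raw)          # no peek(): one read per opcode/argument

wrapped = CountingRawReader(payload)
buffered = pickle.load(io.BufferedReader(wrapped))  # peek() lets pickle prefetch

print("raw reads without peek:", raw.calls)
print("raw reads with io.BufferedReader:", wrapped.calls)
```

On a local in-memory buffer each extra call is cheap, but on a remote filesystem every call that reaches the storage layer has a cost, which is why the difference in call counts matters.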
```
$ python test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/share/java/slf4j-simple.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
19/08/16 14:53:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
chainerio: 186.07044076919556
```
```
$ cat test.py
import time
import pickle
import chainerio

chainerio.set_root("hdfs")
cache_path = 'a_large_file.pkl'

start = time.time()
with chainerio.open(cache_path, 'rb') as f:
    data = pickle.load(f)
print('chainerio: ', time.time() - start)
```
```
Wed Jul 24 19:43:47 2019    profile

         466895442 function calls (466887441 primitive calls) in 161.088 seconds

   Ordered by: internal time
   List reduced from 2941 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   74.242   74.242  161.008  161.008 {built-in method _pickle.load}
233293409   46.669    0.000   84.175    0.000 xxx/versions/3.7.2/lib/python3.7/site-packages/chainerio/fileobject.py:50(read)
233293409   37.505    0.000   37.505    0.000 {method 'read' of '_io.BufferedReader' objects}
     1452    0.384    0.000    0.388    0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
```
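The report above has the layout `pstats` prints when stats are sorted by `tottime` ("internal time") and limited with `print_stats(10)`. The 233 million calls to `fileobject.py:50(read)` are the signature of a Python-level wrapper that exposes `read` but not `peek`. The sketch below reproduces that shape locally. The `PyFileObject` class is hypothetical and only mimics such a wrapper. How the original profile was collected is not stated in the issue.

```python
import cProfile
import io
import pickle
import pstats


class PyFileObject:
    """Hypothetical pure-Python wrapper exposing read()/readline() but not peek()."""

    def __init__(self, raw):
        self._raw = raw

    def read(self, size=-1):
        return self._raw.read(size)

    def readline(self, size=-1):
        return self._raw.readline(size)


payload = pickle.dumps(list(range(50_000)), protocol=3)

profiler = cProfile.Profile()
profiler.enable()
result = pickle.load(PyFileObject(io.BytesIO(payload)))
profiler.disable()

stats = pstats.Stats(profiler)
# sort_stats("tottime") prints "Ordered by: internal time";
# print_stats(10) prints "List reduced from N to 10 due to restriction <10>".
stats.sort_stats("tottime").print_stats(10)
```

With no `peek` available, `PyFileObject.read` is called roughly twice per list element, mirroring the per-opcode reads in the HDFS profile.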
This commit fixes the performance issue described in #42.
To add the missing `peek` support, the `HdfsFile` object is wrapped in
`io.BufferedReader` when it is opened with mode `'rb'`,
which is the mode used when loading a file with pickle.
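A minimal sketch of this kind of wrapping follows. `wrap_binary` is a hypothetical helper, not chainerio's code, and `io.FileIO` stands in for `HdfsFile` because it is also a raw stream without `peek`. The sketch assumes the raw object behaves like `io.RawIOBase` (`readable()`/`readinto()` for reading, `writable()`/`write()` for writing), which `io.BufferedReader` and `io.BufferedWriter` require. The commit list also mentions `io.BufferedWriter`, so `'wb'` is wrapped here too. That mirrors the bullet, but the details of that change are an assumption.

```python
import io
import os
import pickle
import tempfile


def wrap_binary(raw, mode):
    """Hypothetical helper: give a raw binary stream the buffered API."""
    if mode == "rb":
        return io.BufferedReader(raw)   # adds peek() and read-ahead buffering
    if mode == "wb":
        return io.BufferedWriter(raw)   # batches small writes
    return raw                          # other modes pass through in this sketch


path = os.path.join(tempfile.mkdtemp(), "a_large_file.pkl")
data = {"ids": list(range(1000)), "name": "example"}

# io.FileIO is unbuffered and has no peek(), like the HDFS file object here.
with wrap_binary(io.FileIO(path, "wb"), "wb") as f:
    pickle.dump(data, f)

with wrap_binary(io.FileIO(path, "rb"), "rb") as f:
    has_peek = hasattr(f, "peek")
    loaded = pickle.load(f)
```

Closing the buffered wrapper flushes it and closes the raw stream as well, so the `with` blocks manage both objects.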
* Improve pickle performance on HDFS
* add comment string
* add `io.BufferedWriter`
* update comment string