In papermill==2.3.4, when I try to use HadoopFileSystem, I get the following error message.
$ papermill Untitled.ipynb hdfs://myhost/tmp.ipynb
Executing: 0%| | 0/9 [00:00<?, ?cell/s]
Traceback (most recent call last):
File "/opt/conda/bin/papermill", line 8, in <module>
sys.exit(papermill())
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/papermill/cli.py", line 267, in papermill
execution_timeout=execution_timeout,
File "/opt/conda/lib/python3.7/site-packages/papermill/execute.py", line 118, in execute_notebook
**engine_kwargs
File "/opt/conda/lib/python3.7/site-packages/papermill/engines.py", line 49, in execute_notebook_with_engine
return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/papermill/engines.py", line 357, in execute_notebook
nb_man.notebook_start()
File "/opt/conda/lib/python3.7/site-packages/papermill/engines.py", line 69, in wrapper
return func(self, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/papermill/engines.py", line 198, in notebook_start
self.save()
File "/opt/conda/lib/python3.7/site-packages/papermill/engines.py", line 69, in wrapper
return func(self, *args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/papermill/engines.py", line 139, in save
write_ipynb(self.nb, self.output_path)
File "/opt/conda/lib/python3.7/site-packages/papermill/iorw.py", line 447, in write_ipynb
papermill_io.write(nbformat.writes(nb), path)
File "/opt/conda/lib/python3.7/site-packages/papermill/iorw.py", line 143, in write
return self.get_handler(path).write(buf, path)
File "/opt/conda/lib/python3.7/site-packages/papermill/iorw.py", line 372, in write
with self._get_client().open(path, 'wb') as f:
File "/opt/conda/lib/python3.7/site-packages/papermill/iorw.py", line 361, in _get_client
self._client = HadoopFileSystem()
File "pyarrow/_hdfs.pyx", line 55, in pyarrow._hdfs.HadoopFileSystem.__init__
TypeError: __init__() takes at least 1 positional argument (0 given)
Steps to reproduce the behavior
Install the latest version of papermill by either of the following steps:
pip install papermill==2.3.4
Install according to CONTRIBUTING.md
Execute papermill, specifying a URL starting with hdfs:// as the output path.
e.g.: $ papermill untitled.ipynb hdfs://myhost/tmp.ipynb
Analysis
I found that the HDFSHandler in iorw.py was not updated when the filesystem change (#615) switched to the new PyArrow API, even though the two APIs are incompatible.
Proposed amendment
As mentioned above, the constructor interface has changed, so I would like to fix HDFSHandler to match the new pyarrow.fs.HadoopFileSystem API.
Note that in that case, I think we should delete the import of the deprecated pyarrow.HadoopFileSystem and update requirements/hdfs.txt.
(If we continue to support the previous filesystem as well, the handler would need to determine which class was imported and call the appropriate API.
However, I personally think that supporting the deprecated class is not worth the future maintenance effort.
Would you mind sharing your thoughts?)
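To illustrate the proposed direction, here is a minimal sketch of what an updated handler could look like. It assumes the new pyarrow.fs.HadoopFileSystem constructor, which requires an explicit host (hence the TypeError above when called with no arguments), so the hdfs:// URL must be parsed for host and port first. The class and method names mirror papermill's handler protocol, but parse_hdfs_url and the default port are my own illustrative choices, not papermill's actual code:

```python
# Hypothetical sketch of an updated HDFSHandler for papermill's iorw.py,
# targeting the new pyarrow.fs.HadoopFileSystem API (see #615).
# parse_hdfs_url and the 8020 default port are assumptions for illustration.
from urllib.parse import urlparse


def parse_hdfs_url(url):
    """Split an hdfs:// URL into (host, port, path).

    Needed because the new pyarrow.fs.HadoopFileSystem no longer
    supports a zero-argument constructor.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "default"
    port = parsed.port or 8020  # common NameNode RPC default; an assumption
    return host, port, parsed.path


class HDFSHandler:
    def __init__(self):
        self._client = None

    def _get_client(self, path):
        if self._client is None:
            # Imported lazily so environments without libhdfs can still
            # import this module.
            from pyarrow.fs import HadoopFileSystem

            host, port, _ = parse_hdfs_url(path)
            self._client = HadoopFileSystem(host, port)
        return self._client

    def read(self, path):
        # New API: open_input_stream replaces the old fs.open(path, 'rb').
        _, _, hdfs_path = parse_hdfs_url(path)
        with self._get_client(path).open_input_stream(hdfs_path) as f:
            return f.read()

    def write(self, buf, path):
        # New API: open_output_stream replaces the old fs.open(path, 'wb').
        _, _, hdfs_path = parse_hdfs_url(path)
        with self._get_client(path).open_output_stream(hdfs_path) as f:
            return f.write(str(buf).encode("utf-8"))

    def pretty_path(self, path):
        return path
```

With this shape, papermill_io.write would reach open_output_stream instead of the removed zero-argument constructor, and the pyarrow.HadoopFileSystem import could be dropped entirely.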