This repository has been archived by the owner on May 22, 2019. It is now read-only.

error while running how_to_use_ast2vec.ipynb #120

Closed
monperrus opened this issue Nov 22, 2017 · 14 comments

@monperrus

I've installed ml with pip3 install -r requirements.txt.

Now I'm executing the example:
jupyter-nbconvert --execute how_to_use_ast2vec.ipynb

But I get an error. Any idea how to fix it?

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nbconvert/nbconvertapp.py", line 393, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 174, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 192, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/html.py", line 85, in from_notebook_node
    return super(HTMLExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/templateexporter.py", line 280, in from_notebook_node
    nb_copy, resources = super(TemplateExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 134, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 311, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 262, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 286, in preprocess_cell
    raise CellExecutionError.from_cell_and_msg(cell, out)
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
# setup logging
from ast2vec import setup_logging
setup_logging(level="DEBUG")

# setup linguist - mandatory to launch first time to build enry.
# after this you can specify path to enry file.
from ast2vec import install_enry
install_enry()

# check bblfsh server
from ast2vec import ensure_bblfsh_is_running_noexc
ensure_bblfsh_is_running_noexc()
------------------

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-8dceed0627d5> in <module>()
      1 # setup logging
----> 2 from ast2vec import setup_logging
      3 setup_logging(level="DEBUG")
      4 
      5 # setup linguist - mandatory to launch first time to build enry.

ImportError: cannot import name 'setup_logging'


@zurk
Contributor

zurk commented Nov 23, 2017

@monperrus, thank you for your report! The notebook was slightly outdated: we updated modelforge and moved the log-related functions there. I have fixed it, and you will be able to run it once @vmarkovtsev merges the PR.

Also, make sure that you put some real repositories in the py_repos list.
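
As a rough illustration of the import change described above, the sketch below looks for `setup_logging` in several places and falls back to the standard library. The module path `modelforge.logs` is an assumption based on the remark that the log-related functions moved to modelforge; the helper name `find_setup_logging` is hypothetical and not part of either package.

```python
import importlib
import logging


def find_setup_logging(candidates=("modelforge.logs", "ast2vec")):
    """Return the first importable setup_logging, or a stdlib fallback.

    The candidate module names are assumptions, not a confirmed API.
    """
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # package not installed, try the next candidate
        func = getattr(module, "setup_logging", None)
        if callable(func):
            return func

    def fallback(level="INFO"):
        # Plain stdlib logging when no candidate provides the helper.
        numeric = getattr(logging, str(level).upper(), logging.INFO)
        logging.basicConfig(level=numeric)

    return fallback


setup_logging = find_setup_logging()
setup_logging("DEBUG")
```

With this in the first notebook cell, the cell no longer fails on the import itself, whichever version of the packages is installed.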

@monperrus
Author

monperrus commented Nov 23, 2017 via email

@zurk
Contributor

zurk commented Nov 23, 2017

@monperrus just look at the second cell in the notebook you are trying to run: https://github.com/src-d/ml/blob/master/doc/how_to_use_ast2vec.ipynb

It contains a template value that you need to replace with real repositories.

# repositories to use
py_repos = ['list', 'of', 'repositories', 'from', 'github', 'or', 'local', 'files', 'with', 'repositories']

@monperrus
Author

monperrus commented Nov 23, 2017 via email

@vmarkovtsev
Collaborator

The URLs must be cloneable with git clone - it is called under the hood. You can mix URLs and local dirs, the URLs will be cloned automatically.
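
A minimal sketch of a sanity check for the `py_repos` list before running the notebook. It only checks the shape of each entry (existing local directory, or something that looks like a git URL); it does not prove that `git clone` will succeed. The helper names `classify_repo` and `check_repos` and the URL pattern are illustrative assumptions, not part of ast2vec.

```python
import os
import re

# Common git URL forms: https://, http://, git://, ssh://, or git@host:path
_URL_PATTERN = re.compile(r"^(?:https?://|git://|ssh://|git@[\w.-]+:)\S+$")


def classify_repo(entry):
    """Return 'local', 'url', or 'invalid' for one py_repos entry."""
    if os.path.isdir(entry):
        return "local"
    if _URL_PATTERN.match(entry):
        return "url"
    return "invalid"


def check_repos(repos):
    """Split entries into usable ones and ones that need fixing."""
    usable = [r for r in repos if classify_repo(r) != "invalid"]
    invalid = [r for r in repos if classify_repo(r) == "invalid"]
    return usable, invalid


# The template words from the notebook are flagged unless a directory
# with the same name happens to exist in the working directory.
template = ["list", "of", "repositories"]
usable, invalid = check_repos(template + ["https://github.com/INRIA/spoon"])
print("usable:", usable)
print("needs fixing:", invalid)
```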

@monperrus
Author

Thanks.

After setting the list, there is a new error.

Increasing the timeout does not help, and the timeout occurs after a long period of CPU idleness. Waiting for network?

[NbConvertApp] Converting notebook how_to_use_ast2vec.ipynb to html
[NbConvertApp] Executing notebook with kernel: python3
[NbConvertApp] ERROR | Timeout waiting for execute reply (30s).
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 324, in _wait_for_reply
    msg = self.kc.shell_channel.get_msg(timeout=timeout)
  File "/usr/lib/python3/dist-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/jupyter-nbconvert", line 11, in <module>
    load_entry_point('nbconvert==5.3.1', 'console_scripts', 'jupyter-nbconvert')()
  File "/usr/lib/python3/dist-packages/jupyter_core/application.py", line 267, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/usr/lib/python3/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/lib/python3/dist-packages/nbconvert/nbconvertapp.py", line 325, in start
    self.convert_notebooks()
  File "/usr/lib/python3/dist-packages/nbconvert/nbconvertapp.py", line 493, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/usr/lib/python3/dist-packages/nbconvert/nbconvertapp.py", line 464, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/usr/lib/python3/dist-packages/nbconvert/nbconvertapp.py", line 393, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 174, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 192, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/html.py", line 85, in from_notebook_node
    return super(HTMLExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/templateexporter.py", line 280, in from_notebook_node
    nb_copy, resources = super(TemplateExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 134, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 311, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 262, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 280, in preprocess_cell
    reply, outputs = self.run_cell(cell, cell_index)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 348, in run_cell
    exec_reply = self._wait_for_reply(msg_id, cell)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 337, in _wait_for_reply
    raise exception("Cell execution timed out")
TimeoutError: Cell execution timed out
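
For reference, the 30s in the log above is the per-cell limit of nbconvert's `ExecutePreprocessor`. The sketch below builds a command line that sets it explicitly. It assumes the `--ExecutePreprocessor.timeout` option and that `-1` disables the limit, as described in the nbconvert documentation; check this against the installed version. The helper name `nbconvert_command` is hypothetical. Raising the limit only helps if the cell is actually making progress, which the report above suggests was not the case here.

```python
def nbconvert_command(notebook, timeout=-1):
    """Build an nbconvert command line that executes a notebook.

    timeout is the per-cell limit in seconds; -1 is assumed to
    disable the limit entirely.
    """
    return [
        "jupyter-nbconvert",
        "--execute",
        "--ExecutePreprocessor.timeout={}".format(timeout),
        notebook,
    ]


cmd = nbconvert_command("how_to_use_ast2vec.ipynb", timeout=3600)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the conversion.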

@monperrus
Author

And with a smaller repo, another error. I had a look at the code but the error is not obvious.

[NbConvertApp] Converting notebook how_to_use_ast2vec.ipynb to html
[NbConvertApp] Executing notebook with kernel: python3
[NbConvertApp] ERROR | Error while converting 'how_to_use_ast2vec.ipynb'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nbconvert/nbconvertapp.py", line 393, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 174, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 192, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/html.py", line 85, in from_notebook_node
    return super(HTMLExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/templateexporter.py", line 280, in from_notebook_node
    nb_copy, resources = super(TemplateExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 134, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/exporters/exporter.py", line 311, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 262, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/usr/lib/python3/dist-packages/nbconvert/preprocessors/execute.py", line 286, in preprocess_cell
    raise CellExecutionError.from_cell_and_msg(cell, out)
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
# prepare input to swiwel:
# many (tokens, co-occ) -> swivel input format
from ast2vec.id_embedding import PreprocessTransformer
prep = PreprocessTransformer()

input_to_swivel = "input_to_swivel"  # folder for swivel shards, etc.
df_loc = "df.asdf"  # location for document freq
prep.transform(X=coocc_folder, output=input_to_swivel, df=df_loc)
------------------

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-98dd654e0352> in <module>()
      6 input_to_swivel = "input_to_swivel"  # folder for swivel shards, etc.
      7 df_loc = "df.asdf"  # location for document freq
----> 8 prep.transform(X=coocc_folder, output=input_to_swivel, df=df_loc)

/home/martin/martin-no-backup/ml/ast2vec/id_embedding.py in transform(self, X, output, df, vocabulary_size, shard_size)
     43                          input=X, df=df, shard_size=self.shard_size,
     44                          output=output)
---> 45         preprocess(args)
     46 
     47     def _get_log_name(self):

/home/martin/martin-no-backup/ml/ast2vec/id_embedding.py in preprocess(args)
     86             "vocabulary_size={0} is less than shard_size={1}. "
     87             "You should specify smaller shard_size "
---> 88             "(pass shard_size={0} argument).".format(vs, sz))
     89     vs -= vs % sz
     90     log.info("Effective vocabulary size: %d", vs)

ValueError: vocabulary_size=0 is less than shard_size=8. You should specify smaller shard_size (pass shard_size=0 argument).
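
The check that fails here can be sketched as follows. This is a reconstruction from the traceback lines shown above (`vs -= vs % sz` and the error text), not the actual ast2vec code; the function name and the exact comparison are assumptions.

```python
def effective_vocabulary_size(vocabulary_size, shard_size):
    """Round vocabulary_size down to a multiple of shard_size.

    Raises ValueError when the vocabulary is smaller than one shard,
    which is what happens when no tokens were extracted at all.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    if vocabulary_size < shard_size:
        raise ValueError(
            "vocabulary_size={} is less than shard_size={}".format(
                vocabulary_size, shard_size))
    return vocabulary_size - vocabulary_size % shard_size
```

A vocabulary size of 0 means the earlier steps produced no identifiers, so the cause lies in the input repositories rather than in the `shard_size` setting.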


@vmarkovtsev
Collaborator

@monperrus Thank you very much for your patience and testing! @zurk Please investigate this. @EgorBu Please suffer too.

I think we should prepare a dockerized notebook that is guaranteed to be reproducible, with the needed bblfsh inside and the rest of the quirks handled. I will have a personal look tomorrow.

Let me apologize for this. I have extensively tested the command line apps but apparently did not pay enough attention to the notebook. I suggest switching to the command line. @zurk should provide the commands tomorrow.

@zurk
Contributor

zurk commented Nov 24, 2017

On my way.
@monperrus Can you tell me which repository you used the first time?

@monperrus
Author

monperrus commented Nov 24, 2017 via email

@zurk
Contributor

zurk commented Nov 24, 2017

With the second repository, everything is clear.
ast2vec can only work with repositories written in languages that Babelfish can parse.
For the Babelfish version that ast2vec uses, these are Java and Python, and there are no such files in https://github.com/github/scientist.
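
A quick way to check a local clone for parsable files before running the notebook could look like this. It is a sketch that assumes file extensions are a good enough proxy for language (`.py` and `.java`); the function name is hypothetical and the real language detection in ast2vec may differ.

```python
import os
from collections import Counter


def count_files_by_extension(repo_dir, extensions=(".py", ".java")):
    """Count files under repo_dir whose extension is in extensions."""
    counts = Counter()
    for root, dirs, files in os.walk(repo_dir):
        # Skip git metadata; it holds no source files to parse.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in extensions:
                counts[ext] += 1
    return counts
```

If `sum(count_files_by_extension(path).values())` is 0 for a clone, the repository is likely to produce an empty vocabulary.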

FYI, for newer versions you can check the supported languages in the Babelfish documentation. Also, Ruby and Bash will be available soon.

P.S.: We are now working on a big ast2vec update. It will also be renamed to sourced ml. It will use the new Babelfish and our new and awesome https://github.com/src-d/engine.

And please give me more time to investigate the first error.

@zurk
Contributor

zurk commented Nov 24, 2017

About this repo: https://github.com/INRIA/spoon

It is an encoding-related issue caused by this file:
https://github.com/INRIA/spoon/blob/master/src/test/resources/noclasspath/IsoEncoding.java
This file contains non-standard characters.

Still investigating.
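
To locate such files in a local clone, one option is to try decoding each source file as UTF-8. This is only a diagnostic sketch: it assumes that "non-standard characters" means bytes that are not valid UTF-8, which the thread does not state explicitly, and the function name is hypothetical.

```python
import os


def find_non_utf8_files(root, extensions=(".py", ".java")):
    """Return sorted paths of source files that are not valid UTF-8."""
    suffixes = tuple(extensions)  # str.endswith needs a tuple
    bad = []
    for dirpath, dirs, files in os.walk(root):
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)
```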

@zurk
Contributor

zurk commented Nov 24, 2017

So, it is not a real problem, because ast2vec just ignores such files.
Anyway, I created an issue for the Babelfish team to get a better exception message:
bblfsh/python-client#61

I found that everything works for me with https://github.com/INRIA/spoon when I run it using `jupyter notebook`. But it takes 20 minutes (extremely long for 1200 Java files). It may be related to the recent Babelfish issue bblfsh/bblfshd#130. Not sure.

A PR with some README changes and a new Docker container for the notebook will follow soon, as @vmarkovtsev mentioned.

@zurk
Contributor

zurk commented Dec 12, 2017

Sorry for the late response.
I modified the Dockerfile and added instructions to the README on how to run it with a Jupyter notebook, so you should be able to run it easily.
We are closing the issue for now. If you run into problems, feel free to reopen it.
