
Training a model fails #552

Closed

campoy opened this issue Jan 26, 2019 · 13 comments
@campoy

campoy commented Jan 26, 2019

Hi there,

Today I tried to use style-analyzer on a new repository, starting from scratch, as part of a demo and to better understand the whole project.

I failed, and this is a report on how.

Following the quickstart guide

I tried to follow the steps in this list.
Unfortunately, I don't understand how this could work, since there's no trained model yet. Is there?

I tried it anyway:

$ python3 -m lookout run lookout.style.format -c config.yml
INFO:d823:run:Created SQLAlchemyModelRepository(db=sqlite:////tmp/lookout.sqlite, fs=/tmp)
INFO:d823:run:Created DataService(0.0.0.0:10301)
INFO:d823:run:Created AnalyzerManager(style.format.analyzer.FormatAnalyzer/1)
INFO:d823:run:Created EventListener(0.0.0.0:9930, 1 workers)
INFO:d823:run:Listening 0.0.0.0:9930
INFO:9d89:EventListener:new ReviewEvent
INFO:9d89:code-format:Reading /tmp/github.com/campoy/node/style.format.analyzer.FormatAnalyzer_1.asdf...
INFO:9d89:AnalyzerManager:cache miss: style.format.analyzer.FormatAnalyzer
INFO:9d89:DataService:Opened <grpc._channel.Channel object at 0x7f312284f5c0>
WARNING:9d89:FeaturesExtractor:could not parse file benchmark/cluster/echo.js with error 'Couldn't find the token in the specified position:
Node role: Operator
Parsed form: “=”
Raw form: “”
Start position: 0, 0, 0
End position: 0, 0, 0', skipping
INFO:9d89:AnalyzerManager:style.format.analyzer.FormatAnalyzer: 0 comments
INFO:9d89:EventListener:OK 0.870

Ok, so I do need a trained model first. Searching for "train" in the README doesn't help, so I search the filenames instead and find a train.py file under lookout/style/format/research.

Giving up on the docs, let's train this thing!

Ok, so the docs don't really tell me much ... I'll read the python code.
It seems like I need to create input and output directories:

$ mkdir ~/training_dir
$ cd ~/training_dir && git clone https://github.com/nodejs/node
$ mkdir ~/output_path

And finally, once everything is set up, I start training the model!

$ python3 train.py ~/training_dir/node/ ~/output_path/

It seems like it's working but it's quite slow ... I start looking at the logs of the containers started by docker-compose up for lookout and I see something interesting in the logs of bblfsh:

time="2019-01-25T20:40:45Z" level=error msg="request processed content 35313 bytes, status Fatal" elapsed=2.299299ms filename="doc/api/os.md" language=markdown
time="2019-01-25T20:40:45Z" level=error msg="error selecting pool: unexpected error: runtime failure: missing driver for language "markdown""

Wait ... why are we trying to parse markdown? And ... why is it failing? status Fatal is far from being meaningful. Anyway, it seems like we're spending a crazy amount of time parsing markdown, python, and other languages which don't seem relevant to the style-analyzer since it only supports javascript.
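Filtering on the client side would sidestep this entirely; below is a minimal sketch (a hypothetical helper, not part of style-analyzer's API), under the assumption that the format analyzer only consumes JavaScript sources:

```python
from pathlib import PurePosixPath

# Assumption: the format analyzer only handles JavaScript, so anything
# else (markdown, python, ...) can be dropped before it reaches bblfsh.
SUPPORTED_EXTENSIONS = {".js", ".jsx", ".mjs"}

def filter_parseable(filenames):
    """Keep only the files worth sending to the javascript driver."""
    return [name for name in filenames
            if PurePosixPath(name).suffix.lower() in SUPPORTED_EXTENSIONS]
```

This would also avoid the noisy `status Fatal` / "missing driver" errors for languages that were never going to be parsed.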

So I drop all the unnecessary drivers using bblfshctl; now there's only one:

$ docker exec -it lookout_bblfsh_1 bblfshctl driver list
+------------+------------------------------------------+-------------+--------+-----------+--------+-----+----------+
|  LANGUAGE  |                  IMAGE                   |   VERSION   | STATUS |  CREATED  |   OS   | GO  |  NATIVE  |
+------------+------------------------------------------+-------------+--------+-----------+--------+-----+----------+
| javascript | docker://bblfsh/javascript-driver:v1.2.0 | dev-adcd1b4 | beta   | 10 months | alpine | 1.9 | 8.9.3-r0 |
+------------+------------------------------------------+-------------+--------+-----------+--------+-----+----------+
Response time 773.632µs
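For reference, the removal step might look roughly like this, assuming bblfshctl's `driver remove` subcommand (documented for bblfshd) and the container name from the listing above:

```shell
# Assumption: "bblfshctl driver remove <language>" exists in this bblfshd
# version, and the container is named lookout_bblfsh_1 as shown above.
remove_unneeded_drivers() {
    for lang in markdown python go java ruby php bash; do
        docker exec lookout_bblfsh_1 bblfshctl driver remove "$lang"
    done
}
# Afterwards, verify only javascript remains:
# docker exec lookout_bblfsh_1 bblfshctl driver list
```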

After doing this, the training process accelerates and soon I get to 16525 iterations ... where I get this:

16525it [38:35, 46.06it/s]ERROR:2c2a:grpc._common:Exception deserializing message!
Traceback (most recent call last):
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_common.py", line 82, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "train.py", line 89, in <module>
    main()
  File "train.py", line 85, in main
    train(**vars(args))
  File "train.py", line 69, in train
    FakeDataService(bblfsh_client, prepare_files(filenames, bblfsh_client, language), None)
  File "/home/francesc/style-analyzer/lookout/style/format/utils.py", line 50, in prepare_files
    res = client.parse(file)
  File "/home/francesc/.local/lib/python3.5/site-packages/bblfsh/client.py", line 71, in parse
    return self._stub.Parse(request, timeout=timeout)
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_channel.py", line 550, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Exception deserializing response!"
        debug_error_string = "None"
>

Oh ... that's bad. Maybe it's bad luck and I should run it again.

16526it [07:05, 49.21it/s]ERROR:1676:grpc._common:Exception deserializing message!
Traceback (most recent call last):
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_common.py", line 82, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "train.py", line 89, in <module>
    main()
  File "train.py", line 85, in main
    train(**vars(args))
  File "train.py", line 69, in train
    FakeDataService(bblfsh_client, prepare_files(filenames, bblfsh_client, language), None)
  File "/home/francesc/style-analyzer/lookout/style/format/utils.py", line 50, in prepare_files
    res = client.parse(file)
  File "/home/francesc/.local/lib/python3.5/site-packages/bblfsh/client.py", line 71, in parse
    return self._stub.Parse(request, timeout=timeout)
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_channel.py", line 550, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Exception deserializing response!"
        debug_error_string = "None"
>

Ok, so at least now it fails much faster (7 minutes vs 38), but the error is still pretty cryptic.
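One way a long training run could survive a single bad file is a defensive wrapper around the parse call; a rough sketch (`parse_or_skip` is a hypothetical helper, not style-analyzer API):

```python
def parse_or_skip(parse, filename):
    """Call a bblfsh-style parse function, returning None instead of raising.

    Both grpc's _Rendezvous error and protobuf's DecodeError derive from
    Exception, so a broad catch lets the training loop skip one
    unparseable file rather than die 38 minutes in.
    """
    try:
        return parse(filename)
    except Exception as exc:  # deliberately broad: log, skip, move on
        print("skipping %s: %s" % (filename, exc))
        return None
```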

Time to go to bed.

@campoy
Author

campoy commented Jan 26, 2019

Just to be sure, I tried with a different repo, this time vue (https://github.com/vuejs/vue), and the same error occurred (just much faster).

465it [01:02,  7.39it/s]ERROR:1863:grpc._common:Exception deserializing message!
Traceback (most recent call last):
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_common.py", line 82, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "train.py", line 89, in <module>
    main()
  File "train.py", line 85, in main
    train(**vars(args))
  File "train.py", line 69, in train
    FakeDataService(bblfsh_client, prepare_files(filenames, bblfsh_client, language), None)
  File "/home/francesc/style-analyzer/lookout/style/format/utils.py", line 50, in prepare_files
    res = client.parse(file)
  File "/home/francesc/.local/lib/python3.5/site-packages/bblfsh/client.py", line 71, in parse
    return self._stub.Parse(request, timeout=timeout)
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_channel.py", line 550, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/francesc/.local/lib/python3.5/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Exception deserializing response!"
        debug_error_string = "None"
>

@campoy
Author

campoy commented Jan 26, 2019

After a while, something happened: maybe the analyzer that I started running initially actually trained something?

I see these logs:

/home/francesc/.local/lib/python3.5/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.
0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
INFO:9d89:FormatAnalyzer:trained {'__init__': True,
 'created_at': datetime.datetime(2019, 1, 26, 0, 36, 12, 350828),
 'dependencies': [],
 'model': 'code-format',
 'uuid': '5bc4738b-e235-4719-b9c7-1599b5346e6d',
 'version': [1]}
style.format.analyzer.FormatAnalyzer/[1] https://github.com/campoy/node.git 4385240d999708ab6a3904095d9666c5aba5221c
# javascript
23 rules, avg.len. 6.4
DEBUG:code-format:/ruless/thresholds/ -> lz4 compression
DEBUG:code-format:/ruless/features/ -> lz4 compression
DEBUG:code-format:/ruless/cls/ -> lz4 compression
DEBUG:code-format:/ruless/support/ -> lz4 compression
DEBUG:code-format:/ruless/cmps/ -> lz4 compression
DEBUG:code-format:/ruless/conf/ -> lz4 compression
DEBUG:code-format:/ruless/lengths/ -> lz4 compression
DEBUG:code-format:/ruless/artificial/ -> lz4 compression
DEBUG:code-format:/origin_configs/feature_extractor/selected_features/ -> lz4 compression
INFO:9d89:EventListener:OK 464.649

The logs are full of messages of this style:

WARNING:9d89:FeaturesExtractor:could not parse file test/parallel/test-fs-read-stream-fd.js with error 'Couldn't find the token in the specified position:
Node role: Operator
Parsed form: “+=”
Raw form: “”
Start position: 0, 0, 0
End position: 0, 0, 0', skipping

And when I send a PR (https://github.com/campoy/node/pull/4/files) to the analyzer, the parser fails:

INFO:9d89:EventListener:new ReviewEvent
WARNING:9d89:FeaturesExtractor:could not parse file benchmark/cluster/echo.js with error 'Couldn't find the token in the specified position:
Node role: Operator
Parsed form: “=”
Raw form: “”
Start position: 0, 0, 0
End position: 0, 0, 0', skipping
INFO:9d89:AnalyzerManager:style.format.analyzer.FormatAnalyzer: 0 comments
INFO:9d89:EventListener:OK 0.551

@vmarkovtsev
Collaborator

According to the logs, the babelfish driver has the wrong version. We will add a check, since this is critical.

So we need to update the docs, because everything has changed recently. There are two ways to run the thing; you tried the developer's way, which is trickier to set up.

@vmarkovtsev
Collaborator

Version check is blocked by bblfsh/python-client#141
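The check could be as simple as comparing version tuples before training starts; a sketch under the assumption that driver tags look like `v1.2.0` (all names here are illustrative, not the actual style-analyzer API):

```python
# Hypothetical version gate: refuse to start training when the installed
# javascript driver is older than the version the analyzer needs.
REQUIRED = (1, 2, 0)

def parse_version(tag):
    """Turn a tag like 'v1.2.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))

def check_driver(installed_tag, required=REQUIRED):
    installed = parse_version(installed_tag)
    if installed < required:
        raise RuntimeError(
            "javascript driver %s is older than required %s"
            % (installed_tag, ".".join(map(str, required))))
    return installed
```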

@campoy
Author

campoy commented Jan 29, 2019

The babelfish driver for javascript is docker://bblfsh/javascript-driver:v1.2.0 ... isn't that the correct one?

@vmarkovtsev
Collaborator

In my experience, sometimes you are sure that the driver is correct, but it's actually not. I had exactly the same problem before the Eng demo.

@campoy
Author

campoy commented Jan 29, 2019

This is a serious problem, then.

Does this mean there's an issue with babelfish not pulling the right version? If so, the @src-d/language-analysis team should be aware of this.

@vmarkovtsev
Collaborator

It pulls but it is still tricky because a tiny mistake ruins everything.

@creachadair

It pulls but it is still tricky because a tiny mistake ruins everything.

Sorry, I may be missing some context here: What kind of mistake can cause this to happen? If we can do something to make such errors less likely, I'm interested to know.

@vmarkovtsev
Collaborator

I was mainly talking about docker: a container restart kills the driver if there is no volume.
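A named volume over bblfshd's driver storage would keep installed drivers across container restarts; a minimal docker-compose sketch, assuming the service is called bblfsh (matching the lookout_bblfsh_1 container above) and that bblfshd stores its drivers under /var/lib/bblfshd:

```yaml
services:
  bblfsh:
    image: bblfsh/bblfshd
    privileged: true          # bblfshd manages driver containers itself
    volumes:
      - bblfshd-drivers:/var/lib/bblfshd
volumes:
  bblfshd-drivers:
```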

@m09
Contributor

m09 commented Jan 30, 2019

In my experience it happens when you first install the recommended driver and then install the correct version.

@zurk
Contributor

zurk commented Jan 30, 2019

My experience is the same as Hugo's. And there is an issue for it: bblfsh/bblfshd#184.

@campoy
Author

campoy commented Feb 4, 2019

Ok, I'll close this as a duplicate of bblfsh/bblfshd#184, in that case.
