-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FE: Implement FE service #73
Conversation
Signed-off-by: Maxim Sukharev <maxim@sourced.tech>
Signed-off-by: Maxim Sukharev <maxim@sourced.tech>
Signed-off-by: Maxim Sukharev <maxim@sourced.tech>
Signed-off-by: Maxim Sukharev <maxim@sourced.tech>
Should we add the python tests to travis in this PR? |
@carlosms what do you mean? they are added. Do you want to remove them? (I don't) |
My bad, I scanned the PR quickly thought your changes were testing the docker build only. |
.travis.yml
Outdated
@@ -54,6 +54,12 @@ matrix: | |||
env: STYLE_CHECK=true | |||
script: ./sbt scalastyle | |||
|
|||
- scala: 2.11.2 | |||
script: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's minor, but for convenience and consistency, could you please add env
section to this build with something descriptive and human-readable, like FE_PYTHON_TEST=true
?
It does not mean much yet, but is super-helpful with TravisCI, when a project has many profiles, to quickly identify the purpose of the failing one.
Like i.e here one would be able to tell that it's only Integration Tests failing for this job.
Looks awesome, a quick question - is there a reproducible way how one can generate/update the fixtures for the FE? |
"""Extract identifiers weighted set""" | ||
|
||
extractor = IdentifiersBagExtractor( | ||
docfreq_threshold=request.docfreqThreshold, split_stem=request.splitStem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can see, all BagsExtractor
s, of which IdentifiersBagExtractor
is a particular kind, have a weight
parameter - do you think it should be added on our side as well?
AFAIK it's something that most probably going to be used as part of parameter adjustment as soon as the values are inferred from a labeled data, provided by 🐈 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. I'm not sure how I missed it in the task about creating proto files. My bad.
Do you mind if I add this parameter in a separate PR? It should be in all extractors and requires changes in proto & scala.
"""Extract uast2seq weighted set""" | ||
|
||
extractor = UastSeqBagExtractor( | ||
docfreq_threshold=request.docfreqThreshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same situation with arguments of the base algorithm class here , allthough these guys token2index
and token_parser
class name might be of less value right now and could be added later, as soon as they are really needed.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found only 1 implementation for token parser and token index in ml lib. So I decided, for now, we don't need additional parameters there.
if __name__ == '__main__': | ||
parser = argparse.ArgumentParser(description='Feature Extractor Service.') | ||
parser.add_argument("--port", type=int, default=9001, | ||
help="server listen port") | ||
args = parser.parse_args() | ||
|
||
# sourced-ml expects PYTHONHASHSEED != random or unset | ||
if os.getenv("PYTHONHASHSEED", "random") == "random": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, these are src-d/ml implementation details, relevant to some extractors.
What do you think - on our side, would it be wise to be more careful and make put them as part of the API and pass them as explicit parameters of RPC request?
This could help to decouple a library API with it's runtime environment, which is a good practice anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. This parameter changes how python interpreter works. You can’t change it in runtime. That’s why the check is in __main__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Reading https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONHASHSEED helped to understand it better, which would be nice to have somewhere (i.e in that comment above the code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a comment already. 2 lines below.
Signed-off-by: Maxim Sukharev <maxim@sourced.tech>
You can generate a fixture using sdk. The command for it was added here bblfsh/sdk#249 all comments were addressed (by comments). @bzz could you please take another look? I see only 1 problem: weight param. But I strongly prefer to add it in separate PR to avoid conflicts. |
Really, really nice, thank you for linking 👍 ! I only wish it could be documented somewhere, i.e same as code generation from .proto, so next person has to update the tests or something like that - it saves time for a him/her.
sounds good to me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, sans very minor documentation issue on test fixture generation.
I'm not sure about the rightest place to put info about fixtures for now. I believe I would need fixtures for integration test scala-python (last task in umbrella issue), it might change the location of fixtures or I'll create some script. I'll document how to generate fixtures in that task when figure out the best way to do it. |
Part of #51