Lib classifiers #35
```kotlin
companion object {
    val LANGUAGE_NAME = "python"
    val FILE_EXTS = listOf("py", "py3")
    val LIBRARIES_CLASSIFIER_PATH = "data/models/python_libs_pipeline.pmml"
```
Add this file to Git.
```kotlin
val pmml = PMMLUtil.unmarshal(File(LIBRARIES_CLASSIFIER_PATH).inputStream())
val evaluator = ModelEvaluatorFactory.newInstance()
        .newModelEvaluator(pmml)
```
It's better to align on the dot, or just indent by 4 spaces if it doesn't fit that way.
```kotlin
override fun getLineLibrary(line: String): String {
    val arguments = LinkedHashMap<FieldName, FieldValue>()

    for (inputField in evaluator.inputFields) {
```
Are the fields all known libraries?
`evaluator.inputFields` is a wrapper over the classifier's input.
```kotlin
    return libraries.toList()
}

override fun getLineLibrary(line: String): String {
```
This function isn't used, and it doesn't take the detected imports into account. Was that intentional?
It's a basic building block that returns the classifier's result; when aggregating commit stats, we should then intersect that result with the detected imports.
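The aggregation step described in this reply could look roughly like the sketch below. It assumes, hypothetically, that the classifier has already produced a list of candidate libraries for each changed line and that the file's imports were detected separately; `aggregateLibraryStats` and its parameters are illustrative names, not part of this PR.

```kotlin
// Hypothetical helper (not from the PR): counts, for each library, how many
// lines were attributed to it, keeping only libraries that also appear among
// the file's detected imports.
fun aggregateLibraryStats(
    lineCandidates: List<List<String>>,
    detectedImports: Set<String>
): Map<String, Int> {
    val counts = mutableMapOf<String, Int>()
    for (candidates in lineCandidates) {
        // Intersect this line's classifier output with the detected imports;
        // intersect() returns a Set, so each library is counted once per line.
        for (lib in candidates.intersect(detectedImports)) {
            counts[lib] = (counts[lib] ?: 0) + 1
        }
    }
    return counts
}
```

Libraries that the classifier suggests but that are never imported (here, anything outside `detectedImports`) simply drop out of the stats.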
I think it's better to select the library according to the detected imports inside this function. That way we don't lose a library that is actually imported but has a lower score than a library that isn't imported.
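A minimal sketch of this suggestion, assuming the classifier's output is available as a score per library (`Map<String, Double>`) and the detected imports as a set of library names. `selectLibrary` is a hypothetical name; the `getLineLibrary` in this PR takes only the line.

```kotlin
// Hypothetical helper (not from the PR): picks the best-scoring library among
// those that are actually imported, and falls back to the overall top-scoring
// library when none of the scored libraries is imported.
fun selectLibrary(scores: Map<String, Double>, detectedImports: Set<String>): String? {
    val imported = scores.filterKeys { it in detectedImports }
    val pool = if (imported.isNotEmpty()) imported else scores
    // maxByOrNull returns null for an empty map (no scores at all).
    return pool.maxByOrNull { it.value }?.key
}
```

With scores `django=0.6, flask=0.3` and only `flask` imported, this returns `flask` rather than the higher-scoring but unimported `django`, which is exactly the case the comment describes.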
I understand, but my current working approach is that the classifier returns a set of libraries based on the output probabilities: if some libraries have almost the same probability as the most probable one, we return them too. Anyway, I'll think about how to improve the quality by taking imported libraries into account. Thanks!
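The approach described here, returning every library whose probability is close to the top one, can be sketched as follows. The absolute `tolerance` of 0.05 and the name `topLibraries` are assumptions for illustration; the thread does not say how "almost the same probability" is measured, and a relative threshold would work just as well.

```kotlin
// Hypothetical helper (not from the PR): returns every library whose
// probability is within `tolerance` of the highest one, ordered from most
// to least probable. An empty input yields an empty list.
fun topLibraries(probabilities: Map<String, Double>, tolerance: Double = 0.05): List<String> {
    val best = probabilities.values.maxOrNull() ?: return emptyList()
    return probabilities.entries
        .filter { best - it.value <= tolerance }
        .sortedByDescending { it.value }
        .map { it.key }
}
```

For example, with `numpy=0.48, scipy=0.45, pandas=0.07` both `numpy` and `scipy` are returned, while `pandas` is dropped.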
Branch updated from 75c8648 to 6832446.
astansler left a comment:
Awesome!
Please wrap lines that exceed the 80-column limit.
```kotlin
        && it != "-"}
val stringRegex = Regex("""(".+?"|'.+?')""")
val newLine = stringRegex.replace(line, "")
//TODO: multiline comment regex
```
TODO(lyaronskaya)
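On the `TODO` about a multi-line comment regex: in Python, multi-line "comments" are usually triple-quoted strings (docstrings). One possible approach, sketched here as an assumption rather than the PR's code, is a single regex that removes triple-quoted strings, ordinary string literals, and `#` comments in one left-to-right pass. `stripStringsAndComments` and `PYTHON_NOISE_REGEX` are hypothetical names. Unlike `(".+?"|'.+?')`, this pattern also matches empty literals such as `""`. It only helps with docstrings if it runs on whole-file text (or multi-line chunks), since a single line passed to `getLineLibrary(line)` cannot reveal where a docstring starts or ends. String prefixes such as `r` or `f` are left behind.

```kotlin
// Hypothetical helper (not from the PR): strips Python string literals and
// '#' comments so the remaining tokens can be fed to the classifier.
// Alternatives are tried left to right at each position:
//   1. triple-quoted strings ("""...""" or '''...'''), which may span lines;
//      the backreference \1 makes the closing delimiter match the opening one
//   2. a double-quoted string on one line, allowing backslash escapes
//   3. a single-quoted string on one line, allowing backslash escapes
//   4. a '#' comment up to (not including) the end of the line
// Because strings are matched in the same pass, a '#' inside a string is not
// mistaken for a comment.
val PYTHON_NOISE_REGEX =
    Regex("""("{3}|'{3})[\s\S]*?\1|"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'|#[^\n]*""")

fun stripStringsAndComments(code: String): String =
    PYTHON_NOISE_REGEX.replace(code, "")
```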
* wip: lib classifier integration, python model pmml
* wip: updated classifiers, tests, add models.zip
* chore: todo