@yaronskaya
Contributor

No description provided.

companion object {
    val LANGUAGE_NAME = "python"
    val FILE_EXTS = listOf("py", "py3")
    val LIBRARIES_CLASSIFIER_PATH = "data/models/python_libs_pipeline.pmml"
Member

Add this file to Git.


val pmml = PMMLUtil.unmarshal(File(LIBRARIES_CLASSIFIER_PATH).inputStream())
val evaluator = ModelEvaluatorFactory.newInstance()
.newModelEvaluator(pmml)
Member

Better to align on the dot, or indent by just 4 spaces if it doesn't fit that way.

override fun getLineLibrary(line: String): String {
val arguments = LinkedHashMap<FieldName, FieldValue>()

for (inputField in evaluator.inputFields) {
Member

Are the fields all known libraries?

Contributor Author

evaluator.inputFields is a wrapper over the classifier's input.

return libraries.toList()
}

override fun getLineLibrary(line: String): String {
Member

This function is not used, and it doesn't account for detected imports. Was that intentional?

Contributor Author

It's a basic building block that returns the classifier's result, which we should then use when aggregating commit stats, by intersecting it with the detected imports.

Member

I think it's better to select the library according to the detected imports inside this function, because that way we don't lose a library that is actually imported but has a lower score than a library that isn't imported.
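
To illustrate the suggestion above, here is a minimal sketch, not the PR's code. It assumes the classifier output is available as a map from library name to score and that the detected imports are a set of library names. The name pickImportedLibrary and the fallback to the overall best score are hypothetical.

```kotlin
// Hypothetical helper, not part of the PR: picks the best-scoring library
// among those that the file actually imports.
fun pickImportedLibrary(
    scores: Map<String, Double>,
    imports: Set<String>
): String? {
    // Keep only the classifier scores for libraries that are imported.
    val imported = scores.filterKeys { it in imports }
    // Assumption: if no imported library was scored, fall back to all scores.
    val candidates = if (imported.isEmpty()) scores else imported
    // Returns null only when there are no scores at all.
    return candidates.maxByOrNull { it.value }?.key
}
```

With scores numpy = 0.6, pandas = 0.3 and imports containing only pandas, this returns pandas even though numpy scores higher, which is the case the comment is concerned about.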

Contributor Author

I understand, but my current working approach is for the classifier to return a bunch of libraries based on the output probabilities (if some libraries have almost the same probability as the highest-scoring library, we return them too). Anyway, I'll think about how to improve quality by taking imported libraries into account. Thanks!
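
The approach described in this reply could look roughly like the sketch below. It is only an illustration under stated assumptions: the probabilities arrive as a map from library name to probability, "almost the same" is modelled as an absolute margin, and both the name nearTopLibraries and the default margin of 0.05 are hypothetical values not taken from the PR.

```kotlin
// Hypothetical helper, not part of the PR: returns every library whose
// probability is within `margin` of the highest one, best first.
fun nearTopLibraries(
    probabilities: Map<String, Double>,
    margin: Double = 0.05 // assumed threshold, chosen only for illustration
): List<String> {
    // No classifier output means no candidates.
    val top = probabilities.values.maxOrNull() ?: return emptyList()
    return probabilities
        .filterValues { top - it <= margin }
        .entries
        .sortedByDescending { it.value }
        .map { it.key }
}
```

The intersection with detected imports mentioned earlier in the thread would then be a plain filter over this list, for example nearTopLibraries(probabilities).filter { it in imports }.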

@astansler changed the base branch from master to develop on October 5, 2017 15:38
@yaronskaya changed the title from "Classifier wrapper for python" to "Lib classifiers" on Oct 13, 2017
Member

@astansler left a comment

Awesome!
Please wrap lines that exceed the 80-column limit.

&& it != "-"}
val stringRegex = Regex("""(".+?"|'.+?')""")
val newLine = stringRegex.replace(line, "")
//TODO: multiline comment regex
Member

TODO(lyaronskaya)
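
For context on the snippet under review, here is a small self-contained sketch of what the string-stripping regex does to a single Python line. The regex is copied from the diff. The helper name stripStringsAndComment and the trailing # comment removal are assumptions added for illustration. Multiline comments, the subject of the TODO, are not handled here.

```kotlin
// Regex copied from the diff: lazily matches non-empty "..." or '...' literals.
val stringRegex = Regex("""(".+?"|'.+?')""")

// Hypothetical helper, not part of the PR: drops string literals first so
// that a '#' inside a string is not mistaken for the start of a comment,
// then cuts the trailing single-line comment.
fun stripStringsAndComment(line: String): String {
    val withoutStrings = stringRegex.replace(line, "")
    return withoutStrings.substringBefore('#').trimEnd()
}
```

Because the pattern uses .+? it requires at least one character between the quotes, so empty literals such as "" are left in place.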

@yaronskaya merged commit 9691d68 into develop on Oct 16, 2017
astansler pushed a commit that referenced this pull request on Nov 27, 2017
* wip: lib classifier integration, python model pmml

* wip: updated classifiers, tests, add models.zip

* chore: todo
4 participants