Aligning the terminology used by CNTK with usual ML practices #2056

Closed
vermorel opened this issue Jun 28, 2017 · 5 comments

@vermorel
Contributor

CNTK has its own terminology, and it does not feel fully aligned with the rest of the world.

For example:

  • a sequence (CNTK) is typically called an instance
  • a sample (CNTK) is typically called a feature
  • input streams (plural, CNTK) are typically called feature columns (arguably TF terminology) or features
  • the criterion (CNTK) is typically called the goal or the loss
  • the evaluation (CNTK) is typically a metric
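
The mapping above can be written down as a small lookup table. This is only an illustrative sketch: the names `CNTK_TO_COMMON` and `translate` are hypothetical, and the entries simply restate the list in this comment rather than any official CNTK glossary.

```python
# Hypothetical glossary restating the list above; not part of CNTK.
CNTK_TO_COMMON = {
    "sequence": ["instance"],
    "sample": ["feature"],
    "input stream": ["feature column", "feature"],
    "criterion": ["goal", "loss"],
    "evaluation": ["metric"],
}


def translate(cntk_term):
    """Return the commonly used equivalents of a CNTK term (empty list if unknown)."""
    return CNTK_TO_COMMON.get(cntk_term.strip().lower(), [])


if __name__ == "__main__":
    # Print one line per CNTK term with its proposed everyday equivalents.
    for term in CNTK_TO_COMMON:
        print(f"{term} (CNTK) -> {' / '.join(translate(term))}")
```

A table like this could back a terminology note in the documentation without renaming anything in the API.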

Two authoritative sources for machine learning terminology:

There are probably more examples to be found. Changing the terminology later on will be an even greater pain than it is today. Yet, the CNTK solution would significantly gain in transparency by adopting established terminology.

@ghost

ghost commented Jun 28, 2017

@vermorel
I think that while these differences exist, the CNTK terms make a lot of sense and there are good reasons to keep them.
Removing this terminology would be catastrophic, or at least imprudent. It would be better to add a note explaining the terms used in CNTK literature; such a note would be very helpful.

I do agree with you on the issue of sequence (CNTK) vs instance. But I would also like to give you my opinions in a little detail on the terminology examples you have posed.

  • Criterion (CNTK) makes sense because it is the basis on which a network is trained. While the term goal is also very sensible, I have never liked the term loss much, because it does not give an intuitive understanding of what is being done.

  • Input stream (CNTK) is also a much better term than feature column. The term feature usually makes sense in regression-based tasks, or in cases where some pre-extracted set of numbers is provided as input to a CNN. However, in the context of computer vision, where images are provided as input, the term feature does not make much sense and in fact contradicts the usual meaning of the word. CNNs function as feature extractors, so the output of a CNN should be called a feature, not the input. The term input stream is much more sensible: it implies a flow of data coming into a network, so it is independent of the context in which it is used.

  • Evaluation (CNTK) is another sensible term. Metric refers to a measure. In CNTK, metric is used to refer to the quantity that measures a network's performance; this is what is reported by ProgressWriter. CNTK's use of evaluation to refer to computing a forward pass over a network is a natural way to describe that process.

  • Sample (CNTK) is more sensible than feature, for the same reason given in the input stream point above.

  • sequence (CNTK) is a point where I do agree with your comment. I think calling it an instance is a much more prudent thing to do. Sequence (CNTK) restricts the natural usage to speech or NLP applications while instance makes it application independent.
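
To make the sequence and sample terms concrete, here is a rough sketch in plain Python lists. It assumes the nesting that CNTK's documentation describes (a minibatch holds sequences, and each sequence holds one or more samples); the variable and function names are made up for illustration and are not CNTK API.

```python
# Assumed nesting: minibatch -> sequences -> samples.
# Each sample here is a small vector of numbers; the values are arbitrary.
minibatch = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # a sequence of 3 samples (e.g. 3 time steps)
    [[0.7, 0.8]],                          # a sequence of 1 sample (e.g. one image or one row)
]


def count_sequences(mb):
    """Number of sequences ("instances" in the proposed terminology)."""
    return len(mb)


def count_samples(mb):
    """Total number of samples across all sequences in the minibatch."""
    return sum(len(seq) for seq in mb)


if __name__ == "__main__":
    print("sequences:", count_sequences(minibatch))
    print("samples:", count_samples(minibatch))
```

In a non-sequential task every sequence has length one, which is why the word instance reads more naturally there.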

I think it is more important for CNTK to be an independent library in its take on machine learning practices, using sensible terms (while describing them!), than to go after the popular terminology.

@vermorel
Contributor Author

@ujjwal-researcher Thank you very much for this excellent piece of feedback. The content of your post deserves a page in the public CNTK documentation. It would really help a lot.

@ghost

ghost commented Jun 28, 2017

Thank you very much if you found it helpful :)

cha-zhang assigned frankseide and n17s and unassigned frankseide Jul 13, 2017

@ldmtwo

ldmtwo commented Jul 14, 2017

Thank you both for clarifying and translating the confusion. This would be most appropriate on the getting started page.

sayanpa self-assigned this Aug 8, 2017
sayanpa added a commit that referenced this issue Aug 10, 2017

@sayanpa
Contributor

sayanpa commented Aug 11, 2017

Incorporated your changes into our introductory CNTK 101 tutorial.

sayanpa closed this as completed Aug 11, 2017