How to export the classification result to a txt file #44
Comments
I've stumbled upon this issue as well, although I think it might be by design. What you can do is output the output layer's values to a .txt file by changing the action of the test config to "write" and specifying the outputPath and outputNodeNames attributes. For the MNIST example, this dumps the values of the ol.z node to a text file, one sample per row (in this example you get 10 values for the 10 classes). Now all you need is a simple Python/Matlab/whatever script to determine the index of the max value per row. But maybe there is an easier way?
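The post-processing step suggested above can be sketched as a few lines of Python. The input layout (whitespace-separated scores, one sample per row) is taken from the description; everything else here is illustrative:

```python
# Minimal sketch: read rows from a CNTK "write"-action dump (one sample
# per row, one score per class) and emit the argmax class index per row.
# The row format is assumed from the description above.

def predicted_classes(lines):
    """Return the index of the max score in each whitespace-separated row."""
    preds = []
    for line in lines:
        scores = [float(v) for v in line.split()]
        preds.append(scores.index(max(scores)))
    return preds

if __name__ == "__main__":
    rows = ["0.1 0.9 0.0 0.2", "2.0 -1.0 0.5 0.1"]
    print(predicted_classes(rows))  # [1, 0]
```

In practice you would read the rows from the file named by outputPath instead of a hard-coded list.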
Yeap, after digging into the code, I've found a 'dirty' way to do this. The function AssignNumOfDiff is designed for this. Taking CPUMatrix.cpp as an example, you can change this template function so that it writes each label pair out as it compares them. Say a(i,j) is the original label and b(i,j) is the predicted label; you can then export them into "classification.txt" for further evaluation. If you choose the GPU version, you would have to look into the template where the 'num of diff' is actually computed, in _assignNumOfDiff(...). And again, maybe there's a better way to do this, but I am happy with the output at the moment.
I would be happy to be active on this.
OK, thanks so much! We should address this at some point in time.

From: xqyd [mailto:notifications@github.com]

Yeap, after digging into the code, I've found a 'dirty' way to do this. The function "AssignNumOfDiff" is designed for this. Taking CPUMatrix.cpp as an example, you can change this function into a template:

```cpp
ElemType n = 0;
int old, cur;
FILE *stream = fopen("classification.txt", "w");
if (!searchInCol)
{
    // ...
}
else
{
    // ...
}
fclose(stream);
Resize(1, 1); // result should be one element
(*this)(0, 0) = n;
return *this;
```

Say a(i,j) is the original label, and b(i,j) is the predicted label. Now you can export them into "classification.txt" for further evaluation. If you choose the GPU version, you would have to look into the template where the 'num of diff' is actually computed in _assignNumOfDiff(...). And again, maybe there's a better way to do this. But I am happy with the output at this moment.
I've implemented a new node called "ClassPrediction(labels,output)" that outputs predictions and corresponding labels of a net. You can use the "write" action together with this node to dump predictions and labels to a text file.
Thanks! Could you please see how it differs from HardMaxNode?

From: enricoschroeder [mailto:notifications@github.com]
I think it will output the id of the class instead of the hard scores. Thanks, Dong Yu (俞栋)
Yes, exactly. It outputs the class index of the prediction and of the label. When you dump the node's output to a text file using the "write" action you get a text file containing N rows (for N samples) and 2 columns, one for the prediction and one for the label.
Nice! This is our first actual code contribution, and I like how it fits nicely with the rest. I do have two concerns, though:

One is that Readers can interpret their inputs as "real" values or "category" values. The BinaryWriter can also write "category" values (not helpful to you because it's binary). I think that is what you are achieving as well, but in a very different way.

The second is that so far, CNTK does not have a notion of index vectors (except in Readers, where category indices get converted into one-hot immediately). It has been a discussion whether we should introduce this explicit notion, and if we do, what impact it would have on other nodes/parts of the system. E.g., would we also need the opposite operation (an indexing node)? Shuffling operations? Back-propagation through an index vector? Our thinking so far is that we would hold off on introducing this new notion until we have fully understood and vetted the impact.

What I had in mind when I first saw your and similar issues was somewhat different from your approach: locate this conversion from one-hot to index in the writers:

- Add an optional labelType parameter to the writer (SimpleWriter for now).
  - It would expect the input to be a one-hot vector, and its output would be the index of the vector instead of the vector itself, like what ClassPredictionNode achieves, but confined to an I/O concept.
- Add an optional labelMappingFile parameter that allows mapping the indices back to a string representation.
  - (Not needed in your specific case, but useful for language processing.)
- Classification would be done using HardmaxNode instead (which creates a one-hot representation), and in order to output both the classification result and the label, you would just pass both the Hardmax and input-label nodes to the writer.
  - You would need one extra 'paste' command on the command line in order to zip both values together next to each other.

I wonder what your thoughts on this are?
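The writer-side conversion proposed above can be illustrated with a rough sketch. This is not the actual CNTK implementation; the function name and arguments are made up for this example:

```python
# Hypothetical illustration of the proposed labelType/labelMappingFile
# behavior: one-hot (or Hardmax) rows become class indices, optionally
# mapped back to string labels. Names here are invented for the sketch.

def onehot_to_labels(rows, mapping=None):
    """Convert one-hot rows to class indices; optionally map each index
    to a string via `mapping` (the labelMappingFile analogue)."""
    out = []
    for row in rows:
        idx = row.index(max(row))  # one-hot -> index
        out.append(mapping[idx] if mapping is not None else idx)
    return out

if __name__ == "__main__":
    hardmax_rows = [[0, 1, 0], [1, 0, 0]]
    print(onehot_to_labels(hardmax_rows))                          # [1, 0]
    print(onehot_to_labels(hardmax_rows, ["zero", "one", "two"]))  # ['one', 'zero']
```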
Would you be open to modifying your submission to implement it this way? (Maybe without the labelMappingFile for now; we can add that once we have a test case.) Thanks so much for your contribution! Frank
Sorry, I should have said this more clearly: It's the first contribution of a new module--we are very happy about the various bug fixes we have also received! |
Hi Frank and others, sorry for the late response (I was on holiday and without internet for the last week). Your way of implementing this feature makes more sense. The way I did it was more of a little exercise to get acquainted with CNTK's architecture, and especially with how to implement missing functionality via new nodes. I could maybe spend some time in the next couple of weeks to implement it the way you proposed (or are you already working on it?).
Hi Enrico, thanks! I have already implemented it, as I needed the same for a new project. Once that lands in master, it will support interpreting outputs as category labels (it will pick the max), and you can also map them with a dictionary. For example, add this in the "write" section:

```
format = [ type = "category"; labelMappingFile = "echo -e 'zero\none\ntwo\n...' |" ]
```

You can also specify header, footer, and separator strings, which supports some simple syntaxes such as Matlab matrices. I will notify this thread once it is in. (You can try it in branch fseide/s2s.)
Hi, yes, an easier export feature would be helpful. Thanks, JM
Ah, I just saw that a correction has been added to the source code on GitHub that was not part of the binary that I downloaded. Thanks, JM
Hi,
After compiling CNTK on Win8.1, I was able to run the Simple2d example to get a taste of this toolkit. The performance is amazing!!! It's super fast!!!
However, I have a simple question: besides printing the EvalErrorPrediction value to the screen, is there a command or action that can output the classification result to a .txt file like this:
0 0
1 1
0 1
1 0
1 1
...
where the first column is the label from the test file, and the second column is the label identified by CNTK. The reason I'm asking is that we intend to run different fiber identification tasks on CNTK, and hence need to know the identified fiber blend ratio; say, a 70/30 cotton/wool blend might be identified as a 65/35 blend.
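Once the per-sample predictions are in a file like the one sketched above, the blend ratio follows from simple counting. The class encoding (0 = cotton, 1 = wool) is an assumption for illustration:

```python
# Illustrative only: compute the identified blend ratio (percent of
# samples per class) from a list of predicted class indices.
from collections import Counter

def blend_ratio(predictions):
    """Return {class_index: percent of samples} for the given predictions."""
    counts = Counter(predictions)
    total = len(predictions)
    return {cls: 100.0 * n / total for cls, n in counts.items()}

if __name__ == "__main__":
    preds = [0] * 65 + [1] * 35  # e.g. 65 samples identified as cotton, 35 as wool
    print(blend_ratio(preds))  # {0: 65.0, 1: 35.0}
```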
Thanks!!