
How to export the classification result to a txt file #44

Closed

xqyd opened this issue Jan 28, 2016 · 14 comments

Comments

@xqyd commented Jan 28, 2016

Hi,
After compiling CNTK on Win8.1, I was able to run the Simple2d example to get a taste of this toolkit. The performance is amazing!!! It's super fast!!!

However, I have a simple question: besides printing the EvalErrorPrediction value to the screen, is there a command or action that can output the classification result to a .txt file like this:
0 0
1 1
0 1
1 0
1 1
...
where the first column is the label from the test file, and the second column is the label identified by CNTK. The reason I'm asking is that we intend to run various fiber identification tasks on CNTK, so we need to know the identified fiber blend ratio; for example, a 70/30 cotton/wool blend might be identified as a 65/35 blend.
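(For illustration, a minimal Python sketch of the downstream computation I have in mind, assuming such a two-column file exists; the rows and the 0 = cotton / 1 = wool coding below are made up, not actual CNTK output.)

```python
# Minimal sketch: given rows of "true-label predicted-label" pairs,
# compute the identified blend ratio. Rows and label coding are illustrative.
from collections import Counter

rows = ["0 0", "1 1", "0 1", "1 0", "1 1"]  # sample rows as above
pairs = [tuple(int(v) for v in row.split()) for row in rows]
identified = Counter(pred for _, pred in pairs)
ratio = {label: count / len(pairs) for label, count in identified.items()}
print(ratio)  # fraction of samples identified as each fiber class
```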

Thanks!!

@enricoschroeder

I've stumbled upon this issue as well, although I think it might be by design. What you can do is output the output layer's values to a .txt file by changing the action of the test config to "write" and specifying the outputPath and outputNodeNames attributes.

Like so (for the MNIST example):

```
test = [
    action = write
    outputPath = "path/to/some/location"
    outputNodeNames = "ol.z"  # ol.z is the name of the output layer's final value in this particular example
    ...
]
```

This dumps values of the ol.z node to a text file, one sample per row (in this example you have 10 values for the 10 classes). Now all you need is a simple Python/Matlab/whatever script to determine the index of the max value per row.
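For instance, a minimal Python sketch of that post-processing step (the rows are made-up sample values, not actual CNTK output):

```python
# Determine the index of the max value per row of a "write"-action dump.
# Each row holds one sample's output values, one value per class.
rows = ["0.1 0.7 0.2", "0.9 0.05 0.05"]  # illustrative dump lines
argmax_per_row = [vals.index(max(vals))
                  for vals in ([float(v) for v in row.split()] for row in rows)]
print(argmax_per_row)  # predicted class index per sample
```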

But maybe there is an easier way?

@xqyd (author) commented Jan 29, 2016

Yeap, after digging into the code, I've found a 'dirty' way to do this. The function "AssignNumOfDiff" is designed for this. Taking CPUMatrix.cpp as an example, you can change this function to:

```cpp
template <class ElemType>
CPUMatrix<ElemType>& CPUMatrix<ElemType>::AssignNumOfDiff(const CPUMatrix<ElemType>& a, const CPUMatrix<ElemType>& b, bool searchInCol)
{
    if (a.GetNumCols() != b.GetNumCols())
        throw std::invalid_argument("AssignNumOfDiff: a and b must have the same number of columns.");
    if (!searchInCol && a.GetNumRows() != b.GetNumRows())
        throw std::invalid_argument("AssignNumOfDiff: a and b must have the same number of rows.");

    ElemType n = 0;
    int old, cur;
    FILE* stream = fopen("classification.txt", "w");
    if (!searchInCol)
    {
        foreach_coord (i, j, a)
        {
            old = (int) a(i, j);
            cur = (int) b(i, j);
            fprintf(stream, "%d %d\n", old, cur);
            n += (a(i, j) != b(i, j));
        }
    }
    else
    {
        size_t crow = b.GetNumRows();
        const ElemType* curCol = b.m_pArray;
        for (size_t icol = 0; icol < a.GetNumCols(); icol++, curCol += crow)
        {
            auto res = std::find(curCol, curCol + crow, a(0, icol));
            if (res == curCol + crow)
                n++;
        }
    }
    fclose(stream);

    Resize(1, 1); // result should be one element
    (*this)(0, 0) = n;

    return *this;
}
```

Say a(i,j) is the original label, and the b(i,j) is the predicted label. Now you can export them into "classification.txt" for further evaluation.
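As a sanity check, the error count n that AssignNumOfDiff returns can be recomputed from the dumped file; a small Python sketch (the sample pairs are illustrative):

```python
# Recompute the 'num of diff' from the two-column classification.txt format:
# count the rows where the original label and the predicted label disagree.
pairs = [(0, 0), (1, 1), (0, 1), (1, 0), (1, 1)]  # (label, prediction) rows
num_of_diff = sum(1 for a, b in pairs if a != b)
print(num_of_diff)
```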

If you choose the GPU version, you would have to look into

```cpp
template <class ElemType>
GPUMatrix<ElemType>& GPUMatrix<ElemType>::AssignNumOfDiff(const GPUMatrix<ElemType>& a, const GPUMatrix<ElemType>& b, bool searchInCol)
```

where the 'num of diff' is actually computed in _assignNumOfDiff(...).

And again, maybe there's a better way to do this. But I am happy with the output at this moment.


@frankseide
Contributor

OK, thanks so much! We should address this at some point in time.


@enricoschroeder

I've implemented a new node called "ClassPrediction(labels, output)" that outputs the predictions and corresponding labels of a net. You can use the "write" action together with this node to dump predictions and labels to a text file.
I'll issue a pull request; in the meantime, anyone interested can find it here: https://github.com/enricoschroeder/CNTK.

@frankseide
Contributor

Thanks! Could you please see how it differs from HardMaxNode?


@dongyu888
Contributor

I think it will output the id of the class instead of the hard scores.

Thanks,

Dong Yu (俞栋)


@enricoschroeder

Yes, exactly. It outputs the class index of the prediction and the label. When you dump the node's output to a text file using the "write" action, you get a text file containing N rows (for N samples) and 2 columns, one for the prediction and one for the label.
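(As an aside, such a two-column prediction/label file is straightforward to evaluate further; a minimal Python sketch with made-up rows:)

```python
# Turn the N-row, 2-column (prediction, label) dump into overall accuracy
# and per-(label, prediction) confusion counts. Rows are made-up values.
from collections import Counter

rows = [(1, 1), (0, 0), (1, 0), (0, 0)]  # (prediction, label) pairs
accuracy = sum(p == l for p, l in rows) / len(rows)
confusion = Counter((l, p) for p, l in rows)
print(accuracy, confusion[(0, 1)])  # accuracy; count of label 0 predicted as 1
```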

@frankseide
Contributor

Nice! This is our first actual code contribution, and I like how it fits nicely with the rest.

I do have two concerns though:

One is that Readers can interpret their inputs as "real" values or "category" values. The BinaryWriter can also write "category" values (not helpful to you because it's binary). I think that is what you are achieving as well, but in a very different way.

The second is that so far, CNTK does not have a notion of index vectors (except in Readers, where category indices get converted into one-hot immediately). There has been discussion about whether we should introduce this explicit notion, and if we do, what impact it would have on other nodes/parts of the system. E.g. would we also need the opposite operation (an indexing node)? Shuffling operations? Back-propagation through an index vector? Our thinking so far is to hold off on introducing this new notion until we have fully understood and vetted the impact.

What I had in mind when I first saw your Issue and similar ones was somewhat different from your approach: locate this conversion from one-hot to index in the writers:

- add an optional labelType parameter to the writer (SimpleWriter for now)
  - it would expect the input to be a one-hot vector, and its output would be the index of the vector instead of the vector itself, like what ClassPredictionNode achieves, but confined to an I/O concept
- add an optional labelMappingFile parameter that allows mapping the indices back to a string representation
  - (not needed in your specific case, but useful for language processing)
- classification would be done using HardmaxNode instead (which creates a one-hot representation), and in order to output both the classification result and the label, you would just pass both the Hardmax and input-label nodes to the writer
  - you would need one extra 'paste' command on the command line in order to zip both values together next to each other
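That final 'paste' step would just zip the two dumped columns together; a Python stand-in for the Unix paste command (the file contents below are illustrative):

```python
# Zip the Hardmax dump and the label dump side by side, one pair per line,
# like `paste hardmax.txt labels.txt`. The lines here are illustrative.
hardmax_lines = ["1", "0", "1"]  # e.g. read from the Hardmax node's dump
label_lines = ["1", "0", "0"]    # e.g. read from the label node's dump
pasted = ["{} {}".format(h, l) for h, l in zip(hardmax_lines, label_lines)]
print("\n".join(pasted))
```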

What are your thoughts on this? Would you be open to modifying your submission to implement it this way? (Maybe without the labelMappingFile for now; we can add that once we have a test case.)

Thanks so much for your contribution!

Frank


@frankseide
Contributor

> This is our first actual code contribution

Sorry, I should have said this more clearly: it's the first contribution of a new module; we are very happy about the various bug fixes we have also received!

@enricoschroeder

Hi Frank and others, sorry for the late response (I was on holiday and without internet for the last week). Your way of implementing this feature makes more sense. The way I did it was more of a little exercise to get acquainted with CNTK's architecture, and especially with how to implement missing functionality via new nodes. I could maybe spend some time in the next couple of weeks implementing it the way you proposed (or are you already working on it?).

@frankseide
Contributor

Hi Enrico, thanks! I have already implemented it, as I needed the same for a new project. Once that lands in master, it will support interpreting outputs as category labels (it will pick the max), and you can also map it with a dictionary. For example, add this in the "write" section:

```
format = [ type = "category"; labelMappingFile = "echo -e 'zero\none\ntwo\n...' |" ]
```

You can also specify header, footer, and separator strings, which support some simple formats such as Matlab matrices.
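(To illustrate what the category mapping does, a minimal Python sketch; the label strings follow the example mapping above:)

```python
# Map category indices (argmax per sample) to their string labels, the way
# a labelMappingFile does: one label string per line, indexed by class id.
mapping = ["zero", "one", "two"]  # one label per line of the mapping file
indices = [2, 0, 1]               # category outputs, one per sample
print([mapping[i] for i in indices])
```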

I will notify this thread once it is in. (You can try it in branch fseide/s2s.)

@jhmeijer

Hi,

Yes, an easier export feature would be helpful.
I have been struggling for four hours to get the simple network example to export anything other than log-likelihood values. It is still not clear to me how to export the predicted label values for each of the inputs of a test file.

Thanks,

JM

@jhmeijer

Ah, I just saw that a correction has been added to the source code on GitHub that was not part of the binary I downloaded.
It works now.

Thanks,

JM

@wolfma61 wolfma61 closed this as completed Jul 4, 2017