reading CSV #57

Closed
parvanesh opened this issue Jan 29, 2016 · 6 comments

Comments

@parvanesh

Hello All,

  • For reading CSV files, should I use UCIFastReader?
  • If my test dataset doesn't have labels, what kind of configuration should I apply? Should I remove the labels section from the test config? I did that and CNTK stopped working.
@dongyu888
Contributor

Yes, you should use UCIFastReader.

For testing you do need labels to compute the error rates. If what you want is to output the value of an output node so that you can compare it with labels you should use the “write” command instead of “eval”, in which case I think the label column does not need to be there.

Thanks,

Dong Yu (俞栋)

@parvanesh
Author

Thanks for the response. For CSV reading, when I use UCIFastReader as the reader with a comma-separated file, CNTK fails with the error "EXCEPTION occurred: label found in data not specified in label mapping file:XXXXXX", where XXXXXX is the first row of my CSV file.
When I replace the comma-separated file with a tab-separated file, it works. Is this a bug, or do I need to specify another parameter? I use it as:

Simple2_Demo_Test = [
    action = "test"
    # Parameter values for the reader
    reader = [
        readerType = "UCIFastReader"
        file = "$DataDir$/iris.txt"
        features = [
            dim = 4
            start = 0
        ]
        labels = [
            start = 4
            dim = 1
            labelDim = 3
            labelMappingFile = "$DataDir$/SimpleMapping2.txt"
        ]
    ]
]
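
For anyone who needs the same workaround (converting the comma-separated file to a tab-separated one before pointing UCIFastReader at it), here is a minimal Python sketch. It is not part of CNTK; the function name csv_to_tsv is made up for illustration, and it assumes every field is a plain value with no embedded tabs.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert comma-separated text to tab-separated text (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Strip stray spaces around each field, e.g. "4.9, 3.0" -> "4.9", "3.0".
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = "5.1,3.5,1.4,0.2,0\n4.9, 3.0,1.4,0.2,0\n"
    print(csv_to_tsv(sample), end="")
```

To convert a real file, read it into a string, pass it through csv_to_tsv, and write the result to the path used in the file = ... setting.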

@dongyu888
Contributor

Currently we do not support comma delimited files, but it would be very simple to do so. Just look for the state transition tables in UCIParser.cpp:

SetState(',', Whitespace, Whitespace);
SetState(' ', Whitespace, Whitespace);
SetState('\t', Whitespace, Whitespace);
SetState('\r', Whitespace, Whitespace);

There are several places where ' ', '\t', and '\r' appear in the tables (near the top of the file). Just add a similar ',' entry and it should all work.

Note that this would not handle empty fields (two commas in a row, with or without whitespace in-between), but if that is not an issue, then it would just work.
Also note that commas would then not be allowed inside labels or other strings (e.g. as the decimal separator used in some locales).

As a fix, we will add "," and ";" as whitespace characters.
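
To see why empty fields are a problem once ',' is treated as whitespace, here is a simplified Python model of a two-state scanner (in whitespace / in a token). This is only an illustration, not the actual UCIParser.cpp logic; split_fields and WHITESPACE are hypothetical names, and lines are assumed to be passed in without their trailing newline.

```python
# Characters treated as whitespace; ',' is added next to the existing ones.
WHITESPACE = set(" \t\r,")


def split_fields(line, whitespace=WHITESPACE):
    """Split one line into tokens using a whitespace/token scanner."""
    fields = []
    current = []
    for ch in line:
        if ch in whitespace:
            # A whitespace character only ends the token in progress (if any).
            # Consecutive delimiters such as ",," or ", ," emit nothing extra,
            # so they collapse into a single separator.
            if current:
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(split_fields("5.1,3.5,1.4,0.2,0"))  # five tokens
    print(split_fields("5.1,,1.4,0.2,0"))     # only four tokens
```

In this model a row with a missing value yields one token fewer, so every later column shifts left by one position instead of the gap being reported.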

@parvanesh
Author

Thanks. I was wondering whether there is a direct way to do it without converting the file, so it is good to know it will be supported. Thanks!

@dongyu888
Contributor

The change is now checked in. You can specify

customDelimiter="," 

inside the UCIFastReader block in the config file. Please note that it requires values between commas. In other words, it would not handle empty fields (two commas in a row, with or without whitespace in-between).
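
Because empty fields are not handled, it may be worth scanning a file for them before switching to customDelimiter. Below is a small Python sketch of such a check; it is a standalone helper, not a CNTK tool, and the name find_empty_fields is made up for illustration. It also flags leading and trailing delimiters, on the assumption that those indicate a missing value.

```python
def find_empty_fields(lines, delimiter=","):
    """Return 1-based numbers of lines that contain an empty field."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        # A field is empty if nothing but spaces sits between two delimiters,
        # or before the first / after the last delimiter.
        if any(field.strip() == "" for field in stripped.split(delimiter)):
            bad.append(number)
    return bad


if __name__ == "__main__":
    rows = [
        "5.1,3.5,1.4,0.2,0",
        "4.9,,1.4,0.2,0",
        "4.7, ,1.3,0.2,0",
    ]
    print(find_empty_fields(rows))
```

Any line numbers it reports would need to be fixed (for example by filling in a value) before the file is read with a comma delimiter.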

@parvanesh
Author

Thanks!
