DelimitedLineTokenizer always trims the input data [BATCH-1696] #1890

Closed
spring-issuemaster opened this issue Feb 16, 2011 · 3 comments

Comments

@spring-issuemaster spring-issuemaster commented Feb 16, 2011

Arun Yogesh opened BATCH-1696 and commented

The DelimitedLineTokenizer seems to trim all tokens by default; there is no way to get untrimmed raw data without implementing a custom tokenizer.

I guess this bug was introduced by the fix for BATCH-285.
The method 'maybeStripQuotes' trims the data unconditionally and returns it to 'doTokenize'.
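
To make the report concrete, here is a minimal, self-contained sketch of the behaviour being described. It is not the Spring Batch source: the class `BlindTrimSketch` and both of its methods are hypothetical stand-ins, and the delimiter handling is deliberately naive (no delimiters inside quotes).

```java
import java.util.ArrayList;
import java.util.List;

public class BlindTrimSketch {

    // Hypothetical stand-in for the reported behaviour: every token is trimmed
    // first, so padding in unquoted fields is lost along the way.
    static String stripQuotesAfterTrim(String token, char quote) {
        String value = token.trim();
        if (value.length() >= 2
                && value.charAt(0) == quote
                && value.charAt(value.length() - 1) == quote) {
            // Quoted field: drop the quotes, keep whatever is inside them.
            return value.substring(1, value.length() - 1);
        }
        // Unquoted field: the trimmed value is returned, not the original.
        return value;
    }

    // Naive split on the delimiter; enough for this illustration only.
    static List<String> tokenize(String line, char delimiter, char quote) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == delimiter) {
                tokens.add(stripQuotesAfterTrim(line.substring(start, i), quote));
                start = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The padded unquoted field " b " comes back as "b";
        // only the quoted field keeps its inner spaces.
        System.out.println(tokenize("a, b ,\" c \"", ',', '"'));
    }
}
```

In this sketch the only way to keep the padding around a value is to quote it, which matches the complaint above.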


Affects: 2.1.5

Referenced from: commits 57b0cb7

@spring-issuemaster spring-issuemaster commented Feb 20, 2011

Dave Syer commented

The way to get non-trimmed raw data is to quote it. That's pretty normal if you get the file from a spreadsheet for instance.

Moved from Bug to New Feature (since we'd have to keep the old behaviour anyway if we added an option not to trim unquoted strings).

@spring-issuemaster spring-issuemaster commented Feb 28, 2011

Arun Yogesh commented

I understand why you would want to put it as a feature request, but please do let me explain my line of reasoning.

First, we don't always have control over the input files, especially in enterprise applications where a file may come from independent source systems. So I expected the tokenizer to separate out the tokens as-is, without modifying the data in any way (which is what one expects from a tokenizer, is it not?).

Second, the mapper already has the logic to get either trimmed or raw data (FieldSet.readString(0), for instance, returns trimmed data, and readRawString() returns the actual value). By having the tokenizer itself trim the input, we break this functionality, since the input to the FieldSet is already trimmed.

These were the things I felt when I noticed this behavior.
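
The second point can be illustrated with a small sketch. `MiniFieldSet` below is a hypothetical miniature that only borrows the method names `readString` and `readRawString` mentioned above; it is not the Spring Batch `FieldSet`, and it assumes nothing about that class beyond the trimmed/raw distinction described in this comment.

```java
public class RawVersusTrimmedSketch {

    // Hypothetical miniature of a field set, not the Spring Batch class.
    public static final class MiniFieldSet {
        private final String[] tokens;

        public MiniFieldSet(String... tokens) {
            this.tokens = tokens.clone();
        }

        // Trimmed view of the token.
        public String readString(int index) {
            return tokens[index].trim();
        }

        // Token exactly as the tokenizer handed it over.
        public String readRawString(int index) {
            return tokens[index];
        }
    }

    public static void main(String[] args) {
        // Tokenizer leaves the token alone: raw and trimmed reads differ.
        MiniFieldSet untouched = new MiniFieldSet(" b ");
        // Tokenizer trims first: the raw read can no longer recover the padding.
        MiniFieldSet preTrimmed = new MiniFieldSet(" b ".trim());

        System.out.println("[" + untouched.readRawString(0) + "]");
        System.out.println("[" + preTrimmed.readRawString(0) + "]");
    }
}
```

Once the token has been trimmed upstream, the raw and trimmed reads return the same value, so the choice between them stops meaning anything.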

@spring-issuemaster spring-issuemaster commented Mar 15, 2011

Dave Syer commented

Seems reasonable on reflection. DelimitedLineTokenizer now only strips whitespace from quoted fields.
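
One possible reading of that change, as a hedged sketch rather than the committed code: whitespace is dropped only around a quoted field, and an unquoted token is returned exactly as it was read. The class `QuotedOnlyTrimSketch` is hypothetical, and the method below merely reuses the name `maybeStripQuotes` from the report; escaped quotes and empty quoted values are not handled here.

```java
public class QuotedOnlyTrimSketch {

    // Hypothetical re-implementation for illustration only.
    static String maybeStripQuotes(String token, char quote) {
        // Trim a copy, used to detect quoting and to drop padding around the quotes.
        String value = token.trim();
        if (value.length() >= 2
                && value.charAt(0) == quote
                && value.charAt(value.length() - 1) == quote) {
            // Quoted field: padding outside the quotes goes, the content stays.
            return value.substring(1, value.length() - 1);
        }
        // Unquoted field: hand back the original token, padding included.
        return token;
    }

    public static void main(String[] args) {
        System.out.println("[" + maybeStripQuotes(" b ", '"') + "]");
        System.out.println("[" + maybeStripQuotes("  \" c \"  ", '"') + "]");
    }
}
```

Under this reading, callers that want trimmed values can still trim when they read the field, while callers that need the raw value get it unchanged.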
