Line tokenizer: Invalid quote char [BATCH-2820] #794

Closed
spring-issuemaster opened this issue May 5, 2019 · 1 comment
@spring-issuemaster spring-issuemaster commented May 5, 2019

andre-castro-garcia opened BATCH-2820 and commented

I'm using a FlatFileItemReaderBuilder to load some CSV files (in Brazilian Portuguese) into a MongoDB database. One of my records contains a line formatted like the example below:

03.730.263/0001-44;INSTITUCIONAL "T' FUNDO DE INVESTIMENTO EM AÇÕES;2005-03-22;2000-03-23..........

Because the line contains only one quote character ("), the tokenizer does not create all the tokens. It keeps looking for a closing quote character, and there is no other one on the line.

The file that I'm trying to load: http://dados.cvm.gov.br/dados/FI/CAD/DADOS/inf_cadastral_fi_20190503.csv

Examples:

  1. Works on the rest of the lines:

[image: image-2019-05-05-10-27-27-189.png]

  2. Does not work on that line:

[image: image-2019-05-05-10-39-37-889.png]

 


Issue Links:

  • BATCH-2581 DelimitedLineTokenizer always interprets quotes ("duplicates")
@spring-issuemaster spring-issuemaster commented May 5, 2019

andre-castro-garcia commented

In my case, the CSV file itself contains an error (a single, unmatched " character). I tried to add more unit tests to DelimitedLineTokenizerTests.java, but this situation is something of a paradox, see below:

  1. Inside a quoted string, you can use the delimiter character (a comma, for example);
  2. A delimiter character inside a quoted string is just one more character, not necessarily the end of the string.

In my case, I set the line break, which can never appear inside a single line, as the quote character, and that works for me.

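import org.springframework.batch.item.file.transform.DelimitedLineTokenizer
val tokenizer = DelimitedLineTokenizer(";")
// A single line never contains '\n', so quote handling is effectively disabled
tokenizer.setQuoteCharacter('\n')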
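
To illustrate why this helps, here is a minimal, self-contained Kotlin sketch of a quote-aware splitter. It is not Spring Batch's actual implementation; `splitLine` is a hypothetical helper, and the sample line is an abbreviated version of the record quoted above. It only shows the general mechanism: once an opening quote is seen, delimiters are treated as ordinary characters until a closing quote arrives, so an unmatched quote swallows the rest of the line.

```kotlin
// Simplified, hypothetical quote-aware splitter (NOT the Spring Batch code).
fun splitLine(line: String, delimiter: Char = ';', quote: Char = '"'): List<String> {
    val tokens = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    for (ch in line) {
        when {
            // A quote character toggles the quoted state and is not kept in the token
            ch == quote -> inQuotes = !inQuotes
            // A delimiter ends the token only when we are outside quotes
            ch == delimiter && !inQuotes -> {
                tokens.add(current.toString())
                current.setLength(0)
            }
            else -> current.append(ch)
        }
    }
    tokens.add(current.toString())
    return tokens
}

fun main() {
    // Abbreviated sample with a single, unmatched double quote
    val line = "03.730.263/0001-44;INSTITUCIONAL \"T' FUNDO;2005-03-22;2000-03-23"
    println(splitLine(line).size)                // 2: everything after the quote is one token
    println(splitLine(line, quote = '\n').size)  // 4: no quote char ever matches, so every ';' splits
}
```

The trade-off is that legitimately quoted fields containing the delimiter would no longer be handled, so this only suits files where quotes carry no special meaning.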

:)
