Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim quoted values when importing csv? #1051

Closed
benwhalley opened this issue Jun 13, 2019 · 11 comments
Closed

Trim quoted values when importing csv? #1051

benwhalley opened this issue Jun 13, 2019 · 11 comments
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. csv The csv file format, csv output format, or generally CSV-related.

Comments

@benwhalley
Copy link

I am importing some CSV from Amex and noticed that their csv amount field is double-quoted and includes a space in front of the value. So, for example: " -140.00".

I noticed that this causes an error when trying to invert the amount in the csv rules file. So if I add the line: amount -%amount GBP it then throws and error on import.

In full, the output is:

the CSV record is:       "14/04/2019", "Reference: XXX", " 6.99", "PAYPAL *ELIDA76 LTD 4029357733", " Process Date 14/04/2019", ""
the amount rule is:      -%amount GBP
the currency rule is:    unspecified
the default-currency is: unspecified
the parse error is:      1:2:
  |
1 | - 6.99 GBP
  |  ^
unexpected space
expecting amount

Deleting the space in the input csv file removes the error.

Is there a way to force hledger to trim incoming values?

@benwhalley benwhalley added the A-BUG Something wrong, confusing or sub-standard in the software, docs, or user experience. label Jun 13, 2019
@simonmichael simonmichael added A-WISH Some kind of improvement request, hare-brained proposal, or plea. and removed A-BUG Something wrong, confusing or sub-standard in the software, docs, or user experience. labels Jun 13, 2019
@simonmichael
Copy link
Owner

It does some of that already, but I guess it would be good to do a bit more here. Thanks for the example. Would you be interested in having a look at the Hledger.Read.CsvReader code ? The strip function would do the job.

@simonmichael simonmichael added csv The csv file format, csv output format, or generally CSV-related. easy? labels Jun 13, 2019
@benwhalley
Copy link
Author

Hmmm. Haskell a bit of a mind bender. After fighting with types and fmap for a bit have lazily used this for the moment:

cat amex.csv | sed 's/,\" /,"/g' |  hledger import --rules-file rules.txt  -

@simonmichael
Copy link
Owner

simonmichael commented Jun 14, 2019

Cool. It definitely is at first. If you feel like taking another whack at it, help is available in #hledger. I'll recommend stack build hledger once, then make ghcid and make ghci in side windows, then try making a trivial change in CsvReader.hs and see ghcid update, then try reloading and running in ghci (:r <ENTER>, :main -f examples/sample.csv print <ENTER>), then try to see where to put a strip call. Though, maybe you tried that and found the failure is in the cassava CSV parsing lib, which would be a bit more difficult. I suppose then our options would be 1. preprocess the text like you've done above, but in parseCsv, and/or 2. see if we can get this supported in cassava.

@simonmichael
Copy link
Owner

Though, thinking about it a little more.. space is significant in CSV, so cassava is right to return it to us. The problem is our field assignment combining a minus sign and a CSV value, which is a fragile way to generate a parseable number. Interpolating a CSV value happens in renderTemplate. If we called strip there, this should work ? However maybe it's not appropriate to strip every interpolated CSV field this way..

And separately, I have become confused. My local repro has been:

2000-01-01,a, " 1"
fields date, description, amount
$ hledger -f a.csv print   # using latest hledger master, built with cassava-0.5.1.0
hledger: user error (/Users/simon/src/PLAINTEXTACCOUNTING/hledger/a.csv:1:15:
  |
1 | 2000-01-01,a, " 1"
  |               ^
unexpected '"'
expecting ',', end of input, end of line, or unescaped character
)

which has no interpolation rule, and fails earlier than yours, while cassava is parsing the CSV. I don't understand how yours gets as far as it does.

@simonmichael
Copy link
Owner

Oh yea, I see my error. , "1" causes cassava to fail. ," 1" causes your hledger rule to fail. I must be mixing up two issues.

@simonmichael
Copy link
Owner

Yes, I was thinking of #1037. Back to your issue: I confirm that adding strip like so fixes it:

    replace ('%':pat) = maybe pat (\i -> strip $ atDef "" record (i-1)) mindex

Now, is it acceptably intuitive/predictable/principled that %-interpolated values would always have whitespace stripped..

@simonmichael
Copy link
Owner

simonmichael commented Jun 14, 2019

Actually there's a separate code path when assigning an amount field (amount/amount-in/amount-out). So we could strip whitespace only in that case, and leave it as-is otherwise. Which is best ?

@simonmichael
Copy link
Owner

Sorry for the stream-of-thought. Rather, it would be:

"Any CSV value interpolated in a amount, amount-in or amount-out field assignment will have any outer whitespace removed. In all other field assignments, outer whitespace in CSV values is left intact."

Which sounds a bit messy.

simonmichael added a commit that referenced this issue Jun 14, 2019
This removes a potential snag in amount field assignments, and
hopefully is harmless and acceptable otherwise.
@simonmichael
Copy link
Owner

I just went ahead and committed "always strip outer whitespace when interpolating". This removes the snag you hit and will hopefully be harmless otherwise. I'll mention it on list.

@benwhalley
Copy link
Author

Hey - thanks. And for the tip on ghci too - that makes debugging etc much easier! I'll try and have a go if something else comes up. I've been interested in Haskell for a while.

@simonmichael
Copy link
Owner

For the record: RFC 4180 2.4 says "Spaces are considered part of a field and should not be ignored", so we're going slightly against the CSV spec here. Perhaps better if we would strip spaces only from amounts, but we're not aware of CSV fields' meanings when interpolating.

simonmichael added a commit that referenced this issue Jul 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-WISH Some kind of improvement request, hare-brained proposal, or plea. csv The csv file format, csv output format, or generally CSV-related.
Projects
None yet
Development

No branches or pull requests

2 participants