
Import data into custom fields from CSV (& other file formats) #4270

Closed
jensb opened this issue Mar 30, 2022 · 14 comments

@jensb

jensb commented Mar 30, 2022

It is possible to create custom fields for transactions which is a very valuable feature.
I would like to import transaction data from another finance app (almost 18 years of it to be honest) and these transactions contain columns not just for date, value, payee, notes and category, but also for

  • tags,
  • comments (made by me in addition to the bank providing notes),
  • a (free text) transaction type, and
  • bank balance (provided by the bank once every day, in addition to all transactions - to double check for double or missing transactions).

I created custom fields for these values, but I can't import them from the CSV because the custom fields cannot be selected/assigned to columns in the import dialog.

What would need to be done to allow users to import data into the custom fields?

@vomikan
Member

vomikan commented Mar 30, 2022

What would need to be done to allow users to import data into the custom fields?

You would need to write some fairly complex code.

@jensb
Author

jensb commented Mar 30, 2022

What would need to be done to allow users to import data into the custom fields?

You would need to write some fairly complex code.

:-) Well, OK. More details, please. Which step is the most complex one?

  1. Get all existing custom (text) fields by name and add them to the column assignment selection list in the import view so users can assign them to their CSV columns as needed.
  2. When import starts and an (otherwise) unknown name is encountered as a target for a value, check if a custom field with this name exists. If so, insert the value there instead of into the main field.
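
A minimal sketch of step 2, with hypothetical names (resolveTarget and CustomField are illustrative, not the actual MMEX API): look the column name up among the standard fields first, then among the custom fields, and route the value accordingly.

```cpp
// Sketch: route a CSV column either to a standard transaction field or,
// if its name matches a custom field, to the custom-field table.
// resolveTarget and CustomField are illustrative names, not MMEX code.
#include <map>
#include <optional>
#include <string>

struct CustomField { int fieldId; std::string name; };

// Returns a tagged target string, or nullopt if the column is unknown.
std::optional<std::string> resolveTarget(
    const std::string& column,
    const std::map<std::string, int>& standardColumns,
    const std::map<std::string, CustomField>& customFields)
{
    if (standardColumns.count(column))
        return "standard:" + column;               // goes to the main table
    auto it = customFields.find(column);
    if (it != customFields.end())                  // goes to the custom-field table
        return "custom:" + std::to_string(it->second.fieldId);
    return std::nullopt;                           // unknown column: skip or warn
}
```

With this shape, step 1 amounts to populating the custom-field map from the database before the import dialog builds its column-assignment list.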

Since I'd like to import my data as losslessly as possible and need to get it out of a completely different SQLite schema (iFinance app), I'll probably implement a custom data import in Ruby (with SQLite3 library). Let's see where this will get me. Split transactions are also going to be exciting.

@n-stein
Contributor

n-stein commented Jun 7, 2023

I'm working on this for 1.6.5; I estimate it's about 80% finished. The basic idea is straightforward: all custom fields will appear in the import dialog after the default list. They are shown using their field descriptions:

[screenshots: custom fields listed in the import dialog]

In the DB they are saved/restored using their FIELDIDs, so renaming a custom field doesn't affect saved presets. They are identified by the prefix "UDF:":

[screenshot: saved preset using the UDF: prefix]
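
As a rough illustration of the preset encoding described above (the helper name and details are assumed, not taken from the MMEX source), a decoder would strip the "UDF:" prefix and recover the FIELDID:

```cpp
// Sketch of decoding a saved preset entry: standard columns are stored by
// name, custom fields as "UDF:" + FIELDID, so renames don't break presets.
// parsePresetEntry is an illustrative helper, not actual MMEX code.
#include <string>

// Returns the FIELDID if the entry names a custom field, -1 otherwise.
int parsePresetEntry(const std::string& entry)
{
    const std::string prefix = "UDF:";
    // rfind(prefix, 0) == 0 is the pre-C++20 idiom for starts_with.
    if (entry.rfind(prefix, 0) == 0 && entry.size() > prefix.size())
        return std::stoi(entry.substr(prefix.size()));
    return -1;
}
```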

Currently I'm working on data validations, and I have a few questions that should be discussed.

One basic question is "If the value a user is attempting to import into a custom field fails data validation (perhaps they attempt to load a decimal into a custom date field, or they load a value into a "choice" field that isn't one of the choices), should the record be rejected (with some suitable error message) or should the record be loaded but the failed custom fields initialized to the default values?"

Let's look at validation type by type. Here is what I currently have implemented, along with a few questions that need to be considered. I'm looking for feedback: anything incorrect, anything missed, questions, etc.

  1. STRING
    • Currently Implemented: RegEx validation (if selected)
  2. INTEGER
    • Currently Implemented: First validates the value is an integer, then performs RegEx validation (if selected)
    • Questions: Should we also accept doubles and either round or truncate them to an integer?
  3. DECIMAL
    • Currently Implemented: First validates the value is an integer or double, rounds to the specified precision ("Digits Scale"), then performs RegEx validation (if selected).
    • Questions: I've rounded, but I suppose we could truncate to the correct precision?
  4. BOOLEAN
    • Currently Implemented: Validates the user has loaded the string "True" or "False" (case insensitive). Converts to "TRUE"/"FALSE" for storage in the DB.
    • Questions: Should we accept any forms other than the words "True" and "False"?
  5. DATE
    • Currently Implemented: Attempts to parse the value as a date using the format selected in the CSV Import dialog. Converts the value to ISO format for storage in the DB.
    • Questions: Is using the user-selected format sufficient? This means the custom field should have the same format as the "Date" field.
  6. TIME
    • Currently Implemented: Attempts to parse the value as a time. Converts the value to ISO format for storage in the DB.
  7. SINGLE CHOICE
    • Currently Implemented: Validates that the value matches one of the selections (case insensitive). Stores the matching selection in the DB (to normalize text case).
  8. MULTIPLE CHOICE
    • Currently Implemented: Delimit value by ; character, check each value against list of choice options. Store matching selections in DB delimited by ;.
    • Questions: What should happen if a subset of the delimited values are valid choices -- save only the valid ones, save nothing, reject the record?
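
To make two of the rules above concrete, here is a minimal sketch of the INTEGER check and the MULTIPLE CHOICE check. Helper names are hypothetical; the real validators also apply the optional RegEx step, and choice matching is case-insensitive (omitted here for brevity).

```cpp
// Illustrative sketches of two validators described above; not MMEX code.
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// INTEGER: accept an optional sign followed by digits only;
// doubles such as "123.45" are rejected rather than rounded.
bool isInteger(const std::string& v)
{
    size_t i = (!v.empty() && (v[0] == '+' || v[0] == '-')) ? 1 : 0;
    if (i == v.size()) return false;
    for (; i < v.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(v[i]))) return false;
    return true;
}

// MULTIPLE CHOICE: split on ';' and require every token to be a valid
// choice (case-sensitive here; the described behavior is case-insensitive).
bool validateMultiChoice(const std::string& v,
                         const std::vector<std::string>& choices)
{
    std::istringstream ss(v);
    std::string token;
    while (std::getline(ss, token, ';')) {
        bool found = false;
        for (const auto& c : choices)
            if (c == token) { found = true; break; }
        if (!found) return false;   // one bad token fails the whole field
    }
    return true;
}
```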

@jensb
Author

jensb commented Jun 7, 2023

This looks very good, thank you!
My 2 cents regarding the import validation:

  • About booleans: Be as flexible as possible. "t", "1", "Y", and "true" can also be true (and "f", "0", "N", "false", and "" can be false). If possible, make this list configurable.
  • About multiple choice: ";" sounds fine to me. If there is no way to fully accept a row, reject it and tell the user.

Generally:

  • Don't lose data until explicitly told to (i.e. the user confirms it): don't silently ignore values, round values, or skip parts of values.
  • Assume a failure will make the user try again with (possibly) corrected data. In this case, no duplicates should be imported even if some column values are different (= corrected).
  • If anything is skipped, be as precise as possible about what was skipped (e.g. a list of line numbers in the CSV file, or a list of quoted full lines if the UI allows it).
  • Provide working example files which exercise all possible options, so users can see what should work and how their CSV should look.

Looking at it this way, it would probably be best to skip columns which have unimportable values, and also skip values which already exist, if MMEX doesn't already do this. This way, you can import a CSV, notice that some columns were skipped, correct the values with an editor, and then reimport and MMEX will fill in the missing columns.

Duplicate detection could be based on date + amount + sender/payee, or some other combination of mandatory fields.
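
A duplicate check along these lines could combine the mandatory fields into a key and refuse rows whose key was already seen. This is only a sketch of the suggestion (MMEX has no such feature at this point), and the key function name is illustrative:

```cpp
// Sketch of duplicate detection via a composite key of mandatory fields
// (date + amount + payee), as suggested above. Not existing MMEX behavior.
#include <string>

std::string duplicateKey(const std::string& isoDate, double amount,
                         const std::string& payee)
{
    // '|' separates the parts so "1|23" and "12|3" can't collide.
    return isoDate + "|" + std::to_string(amount) + "|" + payee;
}
```

During import, each row's key would be inserted into a `std::set<std::string>` of keys already present in the account; a failed insert marks a probable duplicate.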

@n-stein
Contributor

n-stein commented Jun 7, 2023

  • Assume a failure will make the user try again with (possibly) corrected data. In this case, no duplicates should be imported even if some column values are different (= corrected).

Looking at it this way, it would probably be best to skip columns which have unimportable values, and also skip values which already exist, if MMEX doesn't already do this. This way, you can import a CSV, notice that some columns were skipped, correct the values with an editor, and then reimport and MMEX will fill in the missing columns.

There is no duplicate checking in MMEX right now. We can't skip columns and then go back and load data into those columns on existing transactions in the DB. It's all or nothing by row.

Duplicate detection could be based on date + amount + sender/payee, or some other combination of mandatory fields.

The only mandatory fields are Date and either Amount or Withdrawal and Deposit. This makes duplicate checking very difficult.

  • Don't lose data until explicitly told to (i.e. the user confirms it): don't silently ignore values, round values, or skip parts of values.

MMEX already presents the user with an option to either accept the load or reject it. If the user clicks "Cancel" the load is discarded and no data is saved.

[screenshot: import confirmation dialog with the log]

I have made the log describe the errors like this:

Line 1: OK, imported.
Line 2: Error:  Invalid Date. Value wrongbool for custom field 'A Bool Field' is not type Boolean.
Line 3: Error:  Value Blah 123 %& ⚙️ 🎺 does not match regex ^reg test.*$ for custom field 'A String Field'.
Line 4: Error:  Value 123.45 for custom field 'An Int Field' is not type Integer.
Line 5: Error:  Value 01-05 for custom field 'A Date Field' is not type Date.
Line 6: Error:  Value TIME 12:34:56 for custom field 'A Time field' is not type Time.
Line 7: Error:  Value badChoice for SingleChoice custom field 'A SingleChoice Field' is not a valid selection.
Line 8: Error:  Value notADouble for custom field 'A Dec-3 Field' is not type Decimal. Value notADouble does not match regex [0-9]{3}\.[0-9]{3} for custom field 'A Dec-3 Field'.
Line 9: Error:  Value Choice3 for MultiChoice custom field 'A MultiChoice Field' is not a valid selection. Value Choice5 for MultiChoice custom field 'A MultiChoice Field' is not a valid selection.

@tactilis

tactilis commented Jun 8, 2023

@n-stein

MMEX already presents the user with an option to either accept the load or reject it. If the user clicks "Cancel" the load is discarded and no data is saved.

And that is exactly how it should stay.

The onus is on the user to correct any errors that MMEX reports in their source data file and then re-run the import.

It's much safer that way and sidesteps a lot of complexity.

@tactilis

tactilis commented Jun 8, 2023

@n-stein

In the DB they are saved/restored using their FIELDIDs so that renaming the custom field doesn't affect saved presets. Identified by the prefix UDF

Is it therefore the case that if a user chooses, perhaps unwisely, to create a custom field with the same name as a standard field ('Payee' for example), the import will still work correctly? There'll just be 2 fields named Payee in the list?

@tactilis

tactilis commented Jun 8, 2023

@n-stein

Let's look at validation type by type. Here is what I currently have implemented, along with a few questions that need to be considered. I'm looking for feedback: anything incorrect, anything missed, questions, etc.

STRING
   Currently Implemented: RegEx validation (if selected)

All good.

INTEGER
   Currently Implemented: First validates the value is an integer, then performs RegEx validation (if selected)
   Questions: Should we also accept doubles and either round or truncate them to an integer?

My gut feeling is that the user should format their data appropriately before importing it. Importing a floating-point value into an integer field loses precision and is probably an error. On that basis, anything that is not an integer should be rejected.

Someone might put forward the counter-argument that an Integer custom field on the Edit Transaction dialog does permit the entry of a floating-point value, for example, 12.49 (rounded to 12) and 12.5 (rounded to 13). However, I would say this is just a side effect of the fact that the field permits the entry of arithmetic expressions, e.g. 5 + 7 + 0.3 = 12.3 (rounded to 12). There is no reason to support any of this for import.

Finally, I'd say, that if the user really wants floating point values in the CSV file to be rounded to the nearest integer value when imported, then they should specify a Decimal custom field with the Decimal Places attribute set to 0.

DECIMAL
    Currently Implemented: First validates the value is an integer or double, rounds to the specified precision ("Digits Scale"), then performs RegEx validation (if selected).
    Questions: I've rounded, but I suppose we could truncate to the correct precision?

You should follow the rounding behaviour of the Decimal custom field on the Edit Transaction dialog.

BOOLEAN
    Currently Implemented: Validates the user has loaded the string "True" or "False" (case insensitive). Converts to "TRUE"/"FALSE" for storage in the DB.
    Questions: Should we accept any forms other than the words "True" and "False"?

Again, I think the user should be responsible for formatting their data appropriately.

However, I would have no reason to object if you chose to implement @jensb's suggestions of "t", "1", "Y", "true" = True, "f", "0", "N", "false" and "" = False (all case insensitive) provided that there was no expectation that the user would get anything other than "True" or "False" back if they subsequently exported the data from MMEX.

BTW, I don't think that the list of aliases for True/False should be configurable (as @jensb suggests). I think this is OTT unless people can present real-world use cases for such a feature where it would be unreasonable for them to have to pre-format the data in their CSV file.

DATE
    Currently Implemented: Attempts to parse the value as a date using the format selected in the CSV Import dialog. Converts the value to ISO format for storage in the DB.
    Questions: Is using the user-selected format sufficient? This means the custom field should have the same format as the "Date" field.

I think that unless people can present real-world use cases where it would be unreasonable for them to have to pre-format the data in their CSV file, requiring the custom date field to have the same format as the selected Date format is sufficient. It keeps the Import and Export UI simple.

TIME
    Currently Implemented: Attempts to parse the value as a time. Converts the value to ISO format for storage in the DB.

Do you require the full hh:mm:ss (i.e. the same format as displayed in a Time field) or are hh, hh:mm, or hh:mm:ss all accepted?

SINGLE CHOICE
    Currently Implemented: Validates that the value matches one of the selections (case insensitive). Stores the matching selection in the DB (to normalize text case).

All good.

MULTIPLE CHOICE
    Currently Implemented: Delimit value by ; character, check each value against list of choice options. Store matching selections in DB delimited by ;.
    Questions: What should happen if a subset of the delimited values are valid choices -- save only the valid ones, save nothing, reject the record?

It's an error in the data. Reject the record (and therefore the Import) so that the user can fix their data.

@n-stein
Contributor

n-stein commented Jun 8, 2023

Is it therefore the case that if a user chooses, perhaps unwisely, to create a custom field with the same name as a standard field ('Payee' for example), the import will still work correctly? There'll just be 2 fields named Payee in the list?

Precisely. Users can name them whatever they want. All backend processes use IDs rather than text descriptions to handle the data to avoid any conflicts.

On that basis, anything that is not an integer, should be rejected.

Will leave the logic as implemented then.

You should follow the rounding behaviour of the Decimal custom field on the Edit Transaction dialog.

I can use the same format specifier "%.*f" so behavior is identical.
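
For reference, a minimal illustration of how the "%.*f" specifier rounds to a runtime-chosen number of decimal places; the wrapper function is hypothetical, not the MMEX implementation:

```cpp
// Sketch: "%.*f" takes the precision ("Digits Scale") as a runtime argument,
// so the same rounding behavior applies regardless of the field's scale.
// roundToScale is an illustrative wrapper, not actual MMEX code.
#include <cstdio>
#include <string>

std::string roundToScale(double value, int digitsScale)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", digitsScale, value);
    return buf;
}
```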

However, I would have no reason to object if you chose to implement @jensb's suggestions of "t", "1", "Y", "true" = True, "f", "0", "N", "false" and "" = False (all case insensitive)

I have added these additional values. They are not user configurable.
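
The alias set described above could be normalized with a small helper like the following (a sketch; the function name is illustrative, and the accepted aliases are exactly the ones listed in the thread):

```cpp
// Sketch: normalize the accepted boolean aliases (case-insensitive) to the
// "TRUE"/"FALSE" strings stored in the DB. normalizeBool is illustrative.
#include <algorithm>
#include <cctype>
#include <string>

std::string normalizeBool(std::string v)
{
    std::transform(v.begin(), v.end(), v.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (v == "t" || v == "1" || v == "y" || v == "true")
        return "TRUE";
    if (v == "f" || v == "0" || v == "n" || v == "false" || v.empty())
        return "FALSE";
    return "";   // not a recognized boolean: the record is rejected
}
```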

Do you require the full hh:mm:ss (i.e. the same format as displayed in a Time field) or are hh, hh:mm, or hh:mm:ss all accepted?

The wxDateTime::ParseTime function is pretty lenient. It will take hh, hh:mm (with optional am/pm), or hh:mm:ss.
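
The lenient behavior described can be imitated with a small standalone sketch (this is not the wxDateTime implementation; am/pm handling is omitted): hh, hh:mm, and hh:mm:ss are all accepted and normalized to ISO hh:mm:ss.

```cpp
// Sketch of lenient time parsing: accept hh, hh:mm, or hh:mm:ss and
// normalize to ISO hh:mm:ss. Imitates the behavior attributed to
// wxDateTime::ParseTime above; am/pm suffixes are not handled here.
#include <cstdio>
#include <string>

std::string parseTimeToIso(const std::string& v)
{
    int h = 0, m = 0, s = 0;
    int n = std::sscanf(v.c_str(), "%d:%d:%d", &h, &m, &s);
    if (n < 1 || h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 59)
        return "";                       // unparseable: validation fails
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02d:%02d:%02d", h, m, s);
    return buf;
}
```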

@n-stein
Contributor

n-stein commented Jun 8, 2023

If a custom field is selected for import and the custom field has a default value configured, should we populate the default value if a value is not provided by the file, or would users expect the field to stay empty?

I suppose the default value is more for the convenience of entering new transactions within MMEX. It is not intended to automatically provide a value for every transaction; the user still has to click the check box to associate the default value with the transaction. Therefore I am inclined to leave them blank if they are imported with no value.

@n-stein
Contributor

n-stein commented Jun 8, 2023

This is pretty much done, so once 1.6.4 is released I'll open a PR so others can test it out.

@n-stein
Contributor

n-stein commented Jun 21, 2023

Import & export of custom field data for both CSV and XML formats is now available for testing.

vomikan added a commit that referenced this issue Jun 21, 2023
feat(#4270): custom field import & export for CSV/XML
@n-stein
Contributor

n-stein commented Dec 23, 2023

@jensb Are you satisfied with the functionality? Can we close this issue?

@jensb
Author

jensb commented Dec 23, 2023

I'd have to check some details with my data, but from what I've read, this is even more than I was hoping for.
So yes.
If I find anything, I'll reopen. :-)

Thanks!

@whalley whalley removed this from the v1.7.1 milestone Dec 23, 2023
@whalley whalley added this to the v1.7.0 milestone Dec 23, 2023
@whalley whalley closed this as completed Dec 23, 2023