
Import data into custom fields from CSV (& other file formats) #4270

Closed
jensb opened this issue Mar 30, 2022 · 14 comments

@jensb

jensb commented Mar 30, 2022

It is possible to create custom fields for transactions which is a very valuable feature.
I would like to import transaction data from another finance app (almost 18 years of it to be honest) and these transactions contain columns not just for date, value, payee, notes and category, but also for

  • tags,
  • comments (made by me in addition to the bank providing notes),
  • a (free text) transaction type, and
  • bank balance (provided by the bank once every day, in addition to all transactions - to double check for double or missing transactions).

I created custom fields for these values, but I can't import them from the CSV because the custom fields cannot be selected/assigned to columns in the import dialog.

What would need to be done to allow users to import data into the custom fields?

@vomikan
Member

vomikan commented Mar 30, 2022

What would need to be done to allow users to import data into the custom fields?

You would need to write some fairly complex code.

@jensb
Author

jensb commented Mar 30, 2022

What would need to be done to allow users to import data into the custom fields?

You would need to write some fairly complex code.

:-) Well, OK. More details, please. Which step is the most complex one?

  1. Get all existing custom (text) fields by name and add them to the column assignment selection list in the import view so users can assign them to their CSV columns as needed.
  2. When import starts and an (otherwise) unknown name is encountered as a target for a value, check if a custom field with this name exists. If so, insert the value there instead of into the main field.
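
A minimal sketch of step 2, with hypothetical names (resolveTarget and CustomField are illustrative, not the actual MMEX API): look the column name up among the standard fields first, then among the custom fields, and route the value accordingly.

```cpp
// Sketch: route a CSV column either to a standard transaction field or,
// if its name matches a custom field, to the custom-field table.
// resolveTarget and CustomField are illustrative names, not MMEX code.
#include <map>
#include <optional>
#include <string>

struct CustomField { int fieldId; std::string name; };

// Returns a tagged target string, or nullopt if the column is unknown.
std::optional<std::string> resolveTarget(
    const std::string& column,
    const std::map<std::string, int>& standardColumns,
    const std::map<std::string, CustomField>& customFields)
{
    if (standardColumns.count(column))
        return "standard:" + column;               // goes to the main table
    auto it = customFields.find(column);
    if (it != customFields.end())                  // goes to the custom-field table
        return "custom:" + std::to_string(it->second.fieldId);
    return std::nullopt;                           // unknown column: skip or warn
}
```

With this shape, step 1 amounts to populating the custom-field map from the database before the import dialog builds its column-assignment list.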

Since I'd like to import my data as losslessly as possible and need to get it out of a completely different SQLite schema (iFinance app), I'll probably implement a custom data import in Ruby (with SQLite3 library). Let's see where this will get me. Split transactions are also going to be exciting.

@n-stein
Contributor

n-stein commented Jun 7, 2023

I'm working on this for 1.6.5; I estimate it's about 80% finished. The basic idea is straightforward: all custom fields will appear in the import dialog after the default list. They are shown using their field descriptions:

[screenshots: custom fields listed in the import dialog]

In the DB they are saved/restored using their FIELDIDs, so renaming a custom field doesn't affect saved presets. They are identified by the prefix "UDF:":

[screenshot: saved preset using the UDF: prefix]
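
As a rough illustration of the preset encoding described above (the helper name and details are assumed, not taken from the MMEX source), a decoder would strip the "UDF:" prefix and recover the FIELDID:

```cpp
// Sketch of decoding a saved preset entry: standard columns are stored by
// name, custom fields as "UDF:" + FIELDID, so renames don't break presets.
// parsePresetEntry is an illustrative helper, not actual MMEX code.
#include <string>

// Returns the FIELDID if the entry names a custom field, -1 otherwise.
int parsePresetEntry(const std::string& entry)
{
    const std::string prefix = "UDF:";
    // rfind(prefix, 0) == 0 is the pre-C++20 idiom for starts_with.
    if (entry.rfind(prefix, 0) == 0 && entry.size() > prefix.size())
        return std::stoi(entry.substr(prefix.size()));
    return -1;
}
```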

Currently I'm working on data validations, and I have a few questions that should be discussed.

One basic question is "If the value a user is attempting to import into a custom field fails data validation (perhaps they attempt to load a decimal into a custom date field, or they load a value into a "choice" field that isn't one of the choices), should the record be rejected (with some suitable error message) or should the record be loaded but the failed custom fields initialized to the default values?"

Let's look at validation type by type. Here is what I currently have implemented, along with a few questions that need to be considered. I'm looking for feedback: anything incorrect, anything missed, questions, etc.

  1. STRING
    • Currently Implemented: RegEx validation (if selected)
  2. INTEGER
    • Currently Implemented: First validates the value is an integer, then performs RegEx validation (if selected)
    • Questions: Should we also accept doubles and either round or truncate them to an integer?
  3. DECIMAL
    • Currently Implemented: First validates the value is an integer or double, rounds to the specified precision ("Digits Scale"), then performs RegEx validation (if selected).
    • Questions: I've rounded, but I suppose we could truncate to the correct precision?
  4. BOOLEAN
    • Currently Implemented: Validates the user has loaded the string "True" or "False" (case insensitive). Converts to "TRUE"/"FALSE" for storage in the DB.
    • Questions: Should we accept any forms other than the words "True" and "False"?
  5. DATE
    • Currently Implemented: Attempts to parse the value as a date using the format selected in the CSV Import dialog. Converts the value to ISO format for storage in the DB.
    • Questions: Is using the user-selected format sufficient? This means the custom field should have the same format as the "Date" field.
  6. TIME
    • Currently Implemented: Attempts to parse the value as a time. Converts the value to ISO format for storage in the DB.
  7. SINGLE CHOICE
    • Currently Implemented: Validates that the value matches one of the selections (case insensitive). Stores the matching selection in the DB (to normalize text case).
  8. MULTIPLE CHOICE
    • Currently Implemented: Delimit value by ; character, check each value against list of choice options. Store matching selections in DB delimited by ;.
    • Questions: What should happen if a subset of the delimited values are valid choices -- save only the valid ones, save nothing, reject the record?
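
To make two of the rules above concrete, here is a minimal sketch of the INTEGER check and the MULTIPLE CHOICE check. Helper names are hypothetical; the real validators also apply the optional RegEx step, and choice matching is case-insensitive (omitted here for brevity).

```cpp
// Illustrative sketches of two validators described above; not MMEX code.
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// INTEGER: accept an optional sign followed by digits only;
// doubles such as "123.45" are rejected rather than rounded.
bool isInteger(const std::string& v)
{
    size_t i = (!v.empty() && (v[0] == '+' || v[0] == '-')) ? 1 : 0;
    if (i == v.size()) return false;
    for (; i < v.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(v[i]))) return false;
    return true;
}

// MULTIPLE CHOICE: split on ';' and require every token to be a valid
// choice (case-sensitive here; the described behavior is case-insensitive).
bool validateMultiChoice(const std::string& v,
                         const std::vector<std::string>& choices)
{
    std::istringstream ss(v);
    std::string token;
    while (std::getline(ss, token, ';')) {
        bool found = false;
        for (const auto& c : choices)
            if (c == token) { found = true; break; }
        if (!found) return false;   // one bad token fails the whole field
    }
    return true;
}
```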

@jensb
Author

jensb commented Jun 7, 2023

This looks very good, thank you!
My 2 cents regarding the import validation:

  • About booleans: Be as flexible as possible. "t", "1", "Y", and "true" can also be true (and "f", "0", "N", "false", and "" can be false). If possible, make this list configurable.
  • About multiple choice: ";" sounds fine to me. If there is no way to fully accept a row, reject it and tell the user.

Generally:

  • Don't lose data until explicitly told to (i.e. the user confirms it): don't silently ignore values, round values, or skip parts of values.
  • Assume a failure will make the user try again with (possibly) corrected data. In this case, no duplicates should be imported even if some column values are different (= corrected).
  • If anything is skipped, be as precise as possible about what was skipped (e.g. a list of line numbers in the CSV file, or a list of quoted full lines if the UI allows it).
  • Provide working example files which exercise all possible options, so users can see what should work and how their CSV should look.

Looking at it this way, it would probably be best to skip columns which have unimportable values, and also skip values which already exist, if MMEX doesn't already do this. This way, you can import a CSV, notice that some columns were skipped, correct the values with an editor, and then reimport and MMEX will fill in the missing columns.

Duplicate detection could be based on date + amount + sender/payee, or some other combination of mandatory fields.
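
A duplicate check along these lines could combine the mandatory fields into a key and refuse rows whose key was already seen. This is only a sketch of the suggestion (MMEX has no such feature at this point), and the key function name is illustrative:

```cpp
// Sketch of duplicate detection via a composite key of mandatory fields
// (date + amount + payee), as suggested above. Not existing MMEX behavior.
#include <string>

std::string duplicateKey(const std::string& isoDate, double amount,
                         const std::string& payee)
{
    // '|' separates the parts so "1|23" and "12|3" can't collide.
    return isoDate + "|" + std::to_string(amount) + "|" + payee;
}
```

During import, each row's key would be inserted into a `std::set<std::string>` of keys already present in the account; a failed insert marks a probable duplicate.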

@n-stein
Contributor

n-stein commented Jun 7, 2023

  • Assume a failure will make the user try again with (possibly) corrected data. In this case, no duplicates should be imported even if some column values are different (= corrected).

Looking at it this way, it would probably be best to skip columns which have unimportable values, and also skip values which already exist, if MMEX doesn't already do this. This way, you can import a CSV, notice that some columns were skipped, correct the values with an editor, and then reimport and MMEX will fill in the missing columns.

There is no duplicate checking in MMEX right now. We can't skip columns and then go back and load data into those columns on existing transactions in the DB. It's all or nothing by row.

Duplicate detection could be based on date + amount + sender/payee, or some other combination of mandatory fields.

The only mandatory fields are Date and either Amount or Withdrawal and Deposit. This makes duplicate checking very difficult.

  • Don't lose data until explicitly told to (i.e. the user confirms it): don't silently ignore values, round values, or skip parts of values.

MMEX already presents the user with an option to either accept the load or reject it. If the user clicks "Cancel" the load is discarded and no data is saved.

[screenshot: import confirmation dialog with the log]

I have made the log describe the errors like this:

Line 1: OK, imported.
Line 2: Error:  Invalid Date. Value wrongbool for custom field 'A Bool Field' is not type Boolean.
Line 3: Error:  Value Blah 123 %& ⚙️ 🎺 does not match regex ^reg test.*$ for custom field 'A String Field'.
Line 4: Error:  Value 123.45 for custom field 'An Int Field' is not type Integer.
Line 5: Error:  Value 01-05 for custom field 'A Date Field' is not type Date.
Line 6: Error:  Value TIME 12:34:56 for custom field 'A Time field' is not type Time.
Line 7: Error:  Value badChoice for SingleChoice custom field 'A SingleChoice Field' is not a valid selection.
Line 8: Error:  Value notADouble for custom field 'A Dec-3 Field' is not type Decimal. Value notADouble does not match regex [0-9]{3}\.[0-9]{3} for custom field 'A Dec-3 Field'.
Line 9: Error:  Value Choice3 for MultiChoice custom field 'A MultiChoice Field' is not a valid selection. Value Choice5 for MultiChoice custom field 'A MultiChoice Field' is not a valid selection.

@tactilis

tactilis commented Jun 8, 2023

@n-stein

MMEX already presents the user with an option to either accept the load or reject it. If the user clicks "Cancel" the load is discarded and no data is saved.

And that is exactly how it should stay.

The onus is on the user to correct any errors that MMEX reports in their source data file and then re-run the import.

It's much safer that way and sidesteps a lot of complexity.

@tactilis

tactilis commented Jun 8, 2023

@n-stein

In the DB they are saved/restored using their FIELDIDs so that renaming the custom field doesn't affect saved presets. Identified by the prefix UDF

Is it therefore the case that if a user chooses, perhaps unwisely, to create a custom field with the same name as a standard field ('Payee' for example), the import will still work correctly? There'll just be 2 fields named Payee in the list?

@tactilis

tactilis commented Jun 8, 2023

@n-stein

Let's look at validation type by type. Here is what I currently have implemented, along with a few questions that need to be considered. I'm looking for feedback: anything incorrect, anything missed, questions, etc.

STRING
   Currently Implemented: RegEx validation (if selected)

All good.

INTEGER
   Currently Implemented: First validates the value is an integer, then performs RegEx validation (if selected)
   Questions: Should we also accept doubles and either round or truncate them to an integer?

My gut feeling is that the user should format their data appropriately before importing it. Importing a floating-point value into an integer field loses precision and is probably an error. On that basis, anything that is not an integer should be rejected.

Someone might put forward the counter-argument that an Integer custom field on the Edit Transaction dialog does permit the entry of a floating-point value, for example, 12.49 (rounded to 12) and 12.5 (rounded to 13). However, I would say this is just a side effect of the fact that the field permits the entry of arithmetic expressions, e.g. 5 + 7 + 0.3 = 12.3 (rounded to 12). There is no reason to support any of this for import.

Finally, I'd say, that if the user really wants floating point values in the CSV file to be rounded to the nearest integer value when imported, then they should specify a Decimal custom field with the Decimal Places attribute set to 0.

DECIMAL
    Currently Implemented: First validates the value is an integer or double, rounds to the specified precision ("Digits Scale"), then performs RegEx validation (if selected).
    Questions: I've rounded, but I suppose we could truncate to the correct precision?

You should follow the rounding behaviour of the Decimal custom field on the Edit Transaction dialog.

BOOLEAN
    Currently Implemented: Validates the user has loaded the string "True" or "False" (case insensitive). Converts to "TRUE"/"FALSE" for storage in the DB.
    Questions: Should we accept any forms other than the words "True" and "False"?

Again, I think the user should be responsible for formatting their data appropriately.

However, I would have no reason to object if you chose to implement @jensb's suggestions of "t", "1", "Y", "true" = True, "f", "0", "N", "false" and "" = False (all case insensitive) provided that there was no expectation that the user would get anything other than "True" or "False" back if they subsequently exported the data from MMEX.

BTW, I don't think that the list of aliases for True/False should be configurable (as @jensb suggests). I think this is OTT unless people can present real-world use cases for such a feature where it would be unreasonable for them to have to pre-format the data in their CSV file.

DATE
    Currently Implemented: Attempts to parse the value as a date using the format selected in the CSV Import dialog. Converts the value to ISO format for storage in the DB.
    Questions: Is using the user-selected format sufficient? This means the custom field should have the same format as the "Date" field.

I think that unless people can present real-world use cases where it would be unreasonable for them to have to pre-format the data in their CSV file, requiring the custom date field to have the same format as the selected Date format is sufficient. It keeps the Import and Export UI simple.

TIME
    Currently Implemented: Attempts to parse the value as a time. Converts the value to ISO format for storage in the DB.

Do you require the full hh:mm:ss (i.e. the same format as displayed in a Time field) or are hh, hh:mm, or hh:mm:ss all accepted?

SINGLE CHOICE
    Currently Implemented: Validates that the value matches one of the selections (case insensitive). Stores the matching selection in the DB (to normalize text case).

All good.

MULTIPLE CHOICE
    Currently Implemented: Delimit value by ; character, check each value against list of choice options. Store matching selections in DB delimited by ;.
    Questions: What should happen if a subset of the delimited values are valid choices -- save only the valid ones, save nothing, reject the record?

It's an error in the data. Reject the record (and therefore the Import) so that the user can fix their data.

@n-stein
Contributor

n-stein commented Jun 8, 2023

Is it therefore the case that if a user chooses, perhaps unwisely, to create a custom field with the same name as a standard field ('Payee' for example), the import will still work correctly? There'll just be 2 fields named Payee in the list?

Precisely. Users can name them whatever they want. All backend processes use IDs rather than text descriptions to handle the data to avoid any conflicts.

On that basis, anything that is not an integer, should be rejected.

Will leave the logic as implemented then.

You should follow the rounding behaviour of the Decimal custom field on the Edit Transaction dialog.

I can use the same format specifier "%.*f" so behavior is identical.
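
For reference, a minimal illustration of how the "%.*f" specifier rounds to a runtime-chosen number of decimal places; the wrapper function is hypothetical, not the MMEX implementation:

```cpp
// Sketch: "%.*f" takes the precision ("Digits Scale") as a runtime argument,
// so the same rounding behavior applies regardless of the field's scale.
// roundToScale is an illustrative wrapper, not actual MMEX code.
#include <cstdio>
#include <string>

std::string roundToScale(double value, int digitsScale)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", digitsScale, value);
    return buf;
}
```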

However, I would have no reason to object if you chose to implement @jensb's suggestions of "t", "1", "Y", "true" = True, "f", "0", "N", "false" and "" = False (all case insensitive)

I have added these additional values. They are not user configurable.
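
The alias set described above could be normalized with a small helper like the following (a sketch; the function name is illustrative, and the accepted aliases are exactly the ones listed in the thread):

```cpp
// Sketch: normalize the accepted boolean aliases (case-insensitive) to the
// "TRUE"/"FALSE" strings stored in the DB. normalizeBool is illustrative.
#include <algorithm>
#include <cctype>
#include <string>

std::string normalizeBool(std::string v)
{
    std::transform(v.begin(), v.end(), v.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (v == "t" || v == "1" || v == "y" || v == "true")
        return "TRUE";
    if (v == "f" || v == "0" || v == "n" || v == "false" || v.empty())
        return "FALSE";
    return "";   // not a recognized boolean: the record is rejected
}
```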

Do you require the full hh:mm:ss (i.e. the same format as displayed in a Time field) or are hh, hh:mm, or hh:mm:ss all accepted?

The wxDateTime::ParseTime function is pretty lenient. It will take hh, hh:mm (with optional am/pm), or hh:mm:ss.
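
The lenient behavior described can be imitated with a small standalone sketch (this is not the wxDateTime implementation; am/pm handling is omitted): hh, hh:mm, and hh:mm:ss are all accepted and normalized to ISO hh:mm:ss.

```cpp
// Sketch of lenient time parsing: accept hh, hh:mm, or hh:mm:ss and
// normalize to ISO hh:mm:ss. Imitates the behavior attributed to
// wxDateTime::ParseTime above; am/pm suffixes are not handled here.
#include <cstdio>
#include <string>

std::string parseTimeToIso(const std::string& v)
{
    int h = 0, m = 0, s = 0;
    int n = std::sscanf(v.c_str(), "%d:%d:%d", &h, &m, &s);
    if (n < 1 || h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 59)
        return "";                       // unparseable: validation fails
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02d:%02d:%02d", h, m, s);
    return buf;
}
```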

@n-stein
Contributor

n-stein commented Jun 8, 2023

If a custom field is selected for import and the custom field has a default value configured, should we populate the default value if a value is not provided by the file, or would users expect the field to stay empty?

I suppose the default value is more for the convenience of entering new transactions within MMEX. It is not intended to automatically provide a value for every transaction; the user still has to click the check box to associate the default value with the transaction. Therefore I am inclined to leave them blank if they are imported with no value.

@n-stein
Contributor

n-stein commented Jun 8, 2023

This is pretty much done, so once 1.6.4 is released I'll open a PR so others can test it out.

@n-stein
Contributor

n-stein commented Jun 21, 2023

Import & export of custom field data for both CSV and XML formats is now available for testing.

vomikan added a commit that referenced this issue Jun 21, 2023
feat(#4270): custom field import & export for CSV/XML
@n-stein
Contributor

n-stein commented Dec 23, 2023

@jensb Are you satisfied with the functionality? Can we close this issue?

@jensb
Author

jensb commented Dec 23, 2023

I'd have to check some details with my data, but from what I've read, this is even more than I was hoping for.
So yes.
If I find anything, I'll reopen. :-)

Thanks!

@whalley whalley removed this from the v1.7.1 milestone Dec 23, 2023
@whalley whalley added this to the v1.7.0 milestone Dec 23, 2023
@whalley whalley closed this as completed Dec 23, 2023