Semi-automatic import of .txt-Files #239

MadenMorris · 2022-06-21T19:05:19Z

Hello,
we have noticed two problems when importing documents (raw data) in DNA 3.0.8. We used the manual for DNA 2.0 Beta 25 and will use the same definitions in the following. We are not sure whether these are just settings we need to adjust first or whether they are "bugs" in the software:

1. Disappearing Regex syntax

When using the semi-automatic way to upload documents, DNA 3.0.8 does not seem to come with a preset Regex configuration (2.0 does).

So we used the syntax from DNA 2.0 Beta 25, and that worked fine (apart from one problem, see 2.). Unfortunately, the syntax is not saved, so we have to copy and paste it from DNA 2.0 every time we add new documents.

2. Date in Document Table

After importing documents, the column "Date" in the Document Table shows the date on which the document was added, not the date from the metadata.

DNA 2.0 does show the correct date:

Do you have solutions for these two problems?

Problem 1 may not be a bug, but it does make DNA 3.0.8 a lot more laborious to use, because you either have to know regex or have to copy and paste the syntax every time you use the import function.

Thanks for your help in advance and best regards,
Morris

leifeld · 2022-07-04T13:32:18Z

Sorry for the delay; I was hoping to get this done sooner. I am going to work on this next week.

leifeld · 2022-12-15T16:26:40Z

Hi there. Sorry this took me forever. But I have now fixed the problem.

There was a bug in the code that prevented dates from being parsed properly if only the date but not the time of the day was provided. It should work now. I will commit my changes in a minute, and I am going to release version 3.0.9 in a few weeks after adding more functionality.

The first issue with the missing defaults is a bit trickier. People use all sorts of file names and may want to include or omit different pieces of document metadata.

I have now included the regex [a-zA-Z].+[a-zA-Z](?=\s*[0-9]{2}\.[0-9]{2}\.[0-9]{4}\s*\.[^\.\s]+$) for the title field. The first part, [a-zA-Z].+[a-zA-Z], looks for any text that starts with a letter ([a-zA-Z]), ends with a letter as well ([a-zA-Z]), and has one or more arbitrary characters in between (.+). The title must be followed by a sequence that is not itself part of the match. This is called a positive look-ahead, where (?=X) defines the look-ahead such that the following sequence X must exist but is ignored. This sequence starts with zero or more whitespace characters (\s*), then contains a date of the form dd.MM.yyyy ([0-9]{2}\.[0-9]{2}\.[0-9]{4}), then again zero or more whitespace characters (\s*), and finally a file extension starting with a dot and followed by one or more characters that are neither dots nor whitespace, up to the end of the file name (\.[^\.\s]+$).
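For anyone who wants to try the title pattern outside of DNA, here is a minimal sketch in Python. It is only an illustration, not DNA's own code (DNA is written in Java); it assumes that Python's re module treats this particular pattern the same way, which should hold for the constructs used here. The function name extract_title and the example file names are made up for the demonstration.

```python
import re

# Default title pattern quoted above: text that starts and ends with a letter,
# followed (look-ahead only) by optional whitespace, a dd.MM.yyyy date,
# optional whitespace, and a file extension at the end of the name.
TITLE_PATTERN = re.compile(
    r"[a-zA-Z].+[a-zA-Z](?=\s*[0-9]{2}\.[0-9]{2}\.[0-9]{4}\s*\.[^\.\s]+$)"
)


def extract_title(file_name):
    """Return the title part of a file name, or None if the pattern does not match.

    Hypothetical helper for illustration; not part of DNA.
    """
    match = TITLE_PATTERN.search(file_name)
    return match.group(0) if match else None


# Made-up file names: with a space before the date, without one, and without a date.
for name in [
    "Climate bill debate 21.06.2022.txt",
    "Energy policy15.12.2022.txt",
    "notes.txt",
]:
    print(repr(name), "->", repr(extract_title(name)))
```

The date and the extension are never part of the returned title because they only appear inside the look-ahead. A file name without a dd.MM.yyyy date before the extension yields no title at all.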

I chose not to include defaults for author, source, and type. Instead, I suggest that users choose file names that are easy to parse. For example, consider the following file name structure:

[title: some title] [author: some author] [section: some section].txt

Then it's easy to parse the title using (?<=\[title: ).+?(?=\]), the author using (?<=\[author: ).+?(?=\]), and the section using (?<=\[section: ).+?(?=\]). I will leave these instructions here in case somebody else needs ideas on how to import text data.
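The bracketed naming scheme can be checked quickly with a small Python sketch. Again, this is an illustration under assumptions and not how DNA implements the import: the three patterns are the ones given above, while the helper parse_file_name and the example file name are invented here.

```python
import re

# The three look-behind/look-ahead patterns suggested above, one per field.
FIELD_PATTERNS = {
    "title": re.compile(r"(?<=\[title: ).+?(?=\])"),
    "author": re.compile(r"(?<=\[author: ).+?(?=\])"),
    "section": re.compile(r"(?<=\[section: ).+?(?=\])"),
}


def parse_file_name(file_name):
    """Map each field to its value in the file name, or None if it is absent.

    Hypothetical helper for illustration; not part of DNA.
    """
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(file_name)
        fields[field] = match.group(0) if match else None
    return fields


# Made-up file name that follows the suggested structure.
example = "[title: Budget speech] [author: Jane Doe] [section: Politics].txt"
print(parse_file_name(example))
```

Because .+? is lazy and stops at the first closing bracket, a value cannot itself contain a ] character. Fields that are missing from a file name simply come back empty instead of breaking the other fields.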

I hope this works for everyone. I'll commit the code changes in a minute and will release the new version in two or three weeks.

leifeld self-assigned this Jun 21, 2022

leifeld added the bug, DNA GUI (issues related to the graphical user interface), and version 3.0 (issues relating to version 3.0, including rDNA) labels Jun 21, 2022

leifeld added this to the DNA 3.0: GUI and export updates milestone Jun 21, 2022

leifeld closed this as completed in 8ec7f85 Dec 15, 2022

leifeld mentioned this issue Dec 24, 2022

Problems when import files from directory #254

Closed
