Semi-automatic import of .txt-Files #239

Closed
MadenMorris opened this issue Jun 21, 2022 · 2 comments

MadenMorris commented Jun 21, 2022

Hello,
we have noticed two problems when importing documents (raw data) in DNA 3.0.8. We used the manual for DNA 2.0 Beta 25 and follow its terminology below. We are not sure whether these are just settings we need to adjust first or whether they are bugs in the software:

1. Disappearing Regex syntax

When using the semi-automatic way to upload documents, DNA 3.0.8 does not seem to come with a preset Regex configuration (2.0 does).

image

So we used the syntax from DNA 2.0 Beta 25, and that worked fine (apart from one problem, see 2.). Unfortunately, the syntax is not saved, so we have to copy and paste it from DNA 2.0 every time we add new documents.

2. Date in Document Table

After importing documents, the column "Date" in the Document Table shows the date on which the document was added, not the date from the metadata.

image
image

DNA 2.0 does show the correct date:
image

Do you have solutions for these two problems?

Problem 1 may not be a bug, but it does make using DNA 3.0.8 a lot more laborious, because you either have to know regex or have to copy and paste the syntax every time you use the import function.

Thanks for your help in advance and best regards,
Morris

@leifeld leifeld self-assigned this Jun 21, 2022
@leifeld leifeld added bug DNA GUI Issues related to the graphical user interface version 3.0 Issues relating to version 3.0, including rDNA labels Jun 21, 2022

leifeld commented Jul 4, 2022

Sorry for the delay; I was hoping to get this done sooner. I am going to work on this next week.

leifeld commented Dec 15, 2022

Hi there. Sorry this took me forever. But I have now fixed the problem.

There was a bug in the code that prevented dates from being parsed properly if only the date but not the time of the day was provided. It should work now. I will commit my changes in a minute, and I am going to release version 3.0.9 in a few weeks after adding more functionality.
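For anyone who wants to check their own date strings before importing, here is a minimal sketch of the general idea of accepting a date with or without a time of day. It is not the code from DNA (which is written in Java); the function name `parse_document_date` and the time layout `HH:mm:ss` are assumptions made up for this illustration, and only the `dd.MM.yyyy` date layout comes from this thread.

```python
from datetime import datetime
from typing import Optional

# Assumed layouts: dd.MM.yyyy with an optional time of day. The time layout
# is a guess for illustration and is not taken from the DNA source code.
CANDIDATE_FORMATS = ("%d.%m.%Y %H:%M:%S", "%d.%m.%Y")


def parse_document_date(text: str) -> Optional[datetime]:
    """Try the date-and-time layout first, then fall back to date only.

    A date without a time of day is returned as midnight of that day.
    Returns None if no layout matches.
    """
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # this layout did not match, try the next one
    return None


if __name__ == "__main__":
    print(parse_document_date("21.06.2022"))
    print(parse_document_date("21.06.2022 14:30:00"))
```

Trying the more specific layout first and falling back to the date-only layout means a missing time of day no longer causes the whole parse to fail.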

The first issue with the missing defaults is a bit trickier. People use all sorts of file names and may want to include or omit those document meta-data.

I have now included the regex [a-zA-Z].+[a-zA-Z](?=\s*[0-9]{2}\.[0-9]{2}\.[0-9]{4}\s*\.[^\.\s]+$) for the title field. The first part, [a-zA-Z].+[a-zA-Z], matches any text that starts with a letter ([a-zA-Z]), ends with a letter ([a-zA-Z]), and has one or more arbitrary characters in between (.+). The title must be followed by a sequence that has to be present but is not included in the match. This is called a positive look-ahead: (?=X) requires that the sequence X follows, but X itself is ignored. Here, the sequence starts with zero or more whitespace characters (\s*), then contains a date of the form dd.MM.yyyy ([0-9]{2}\.[0-9]{2}\.[0-9]{4}), then again zero or more whitespace characters (\s*), and finally a file extension at the end of the file name, consisting of a dot followed by one or more characters that are neither dots nor whitespace (\.[^\.\s]+$).
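To try the title pattern outside DNA, a small sketch like the following may help. It uses Python's `re` module rather than the Java regex engine that DNA runs on, on the assumption that the two behave the same for this particular pattern; the helper name `extract_title` and the sample file names are made up for the example.

```python
import re
from typing import Optional

# The title regex quoted above, copied unchanged.
TITLE_PATTERN = re.compile(
    r"[a-zA-Z].+[a-zA-Z](?=\s*[0-9]{2}\.[0-9]{2}\.[0-9]{4}\s*\.[^\.\s]+$)"
)


def extract_title(file_name: str) -> Optional[str]:
    """Return the title part of a file name, or None if nothing matches."""
    match = TITLE_PATTERN.search(file_name)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    print(extract_title("Climate policy debate 21.06.2022.txt"))
    print(extract_title("notes.txt"))  # no date before the extension
```

Because the date and extension sit inside the look-ahead, they are required for a match but are not part of the returned title. A file name without a dd.MM.yyyy date directly before the extension yields no title at all.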

I chose not to include defaults for author, source, and type. Instead, I suggest users should choose file names that are easy to parse. For example, consider the following file name structure:

[title: some title] [author: some author] [section: some section].txt

Then it's easy to parse the title using (?<=\[title: ).+?(?=\]), the author using (?<=\[author: ).+?(?=\]), and the section using (?<=\[section: ).+?(?=\]). I will leave these instructions here in case somebody else needs ideas on how to import text data.
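As a quick way to test such a naming scheme before importing, the three patterns can be tried on a sample file name. This is only a sketch in Python's `re` module, assuming it treats these look-behind and look-ahead patterns the same way as DNA's Java regex engine; the function name `parse_file_name` and the sample file name are hypothetical.

```python
import re
from typing import Dict, Optional

# The three patterns from the comment above, copied unchanged.
FIELD_PATTERNS = {
    "title": re.compile(r"(?<=\[title: ).+?(?=\])"),
    "author": re.compile(r"(?<=\[author: ).+?(?=\])"),
    "section": re.compile(r"(?<=\[section: ).+?(?=\])"),
}


def parse_file_name(file_name: str) -> Dict[str, Optional[str]]:
    """Extract each bracketed field; a missing field maps to None."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(file_name)
        result[field] = match.group(0) if match else None
    return result


if __name__ == "__main__":
    # Hypothetical file name following the suggested structure.
    sample = "[title: Budget speech] [author: Jane Doe] [section: Politics].txt"
    print(parse_file_name(sample))
```

The lazy quantifier `.+?` stops at the first closing bracket, so each field ends where its own bracket closes and does not run on into the next field.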

I hope this works for everyone. I'll commit the code changes in a minute and will release the new version in two or three weeks.
