Merge libraries #295

koppor · 2018-01-30T12:50:19Z

As a researcher, I have created dozens of .bib files, which I want to consolidate into one. I would like to point JabRef to a directory. It then recursively crawls the directory for *.bib files and imports each file it finds into the currently opened library.

For each entry:

If there is an equal entry, silently ignore it.
If there is an entry with the same key, do not import it (or open the merge entries dialog; this could be configurable).
If there is a duplicate entry (according to our algorithms), either a) pop up the merge entries dialog or b) do not import the entry at all. This could be configurable for "silent dropping", see above.
If the attached file is not stored relatively under the directory of the bib file where it is imported, ask to copy it.
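The first three rules above can be read as an ordered decision per incoming entry. The following is only a sketch under stated assumptions: `SimpleEntry`, `ImportDecision`, and `decide` are hypothetical names, not JabRef classes, and the duplicate check is passed in as a predicate rather than implemented. Same-key entries are skipped here; opening the merge dialog instead would be the configurable alternative.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class MergeDecisionSketch {

    // Hypothetical, simplified stand-in for a BibTeX entry; NOT JabRef's BibEntry.
    record SimpleEntry(String key, Map<String, String> fields) {}

    // Possible outcomes for one incoming entry (names are assumptions).
    enum ImportDecision { SKIP_EQUAL, SKIP_SAME_KEY, ASK_MERGE, IMPORT }

    // Applies the rules in the order they are listed; the first matching rule wins.
    static ImportDecision decide(SimpleEntry candidate,
                                 List<SimpleEntry> library,
                                 BiPredicate<SimpleEntry, SimpleEntry> isDuplicate) {
        // Rule 1: an equal entry (same key, same fields) already exists.
        if (library.contains(candidate)) {
            return ImportDecision.SKIP_EQUAL;
        }
        // Rule 2: another entry already uses this key.
        for (SimpleEntry existing : library) {
            if (existing.key().equals(candidate.key())) {
                return ImportDecision.SKIP_SAME_KEY;
            }
        }
        // Rule 3: the supplied duplicate check matches an existing entry.
        for (SimpleEntry existing : library) {
            if (isDuplicate.test(existing, candidate)) {
                return ImportDecision.ASK_MERGE;
            }
        }
        return ImportDecision.IMPORT;
    }

    public static void main(String[] args) {
        // Invented sample data, for illustration only.
        List<SimpleEntry> library = List.of(
                new SimpleEntry("Leymann2022", Map.of("title", "An Example Title")));
        BiPredicate<SimpleEntry, SimpleEntry> sameTitle = (x, y) ->
                x.fields().getOrDefault("title", "")
                 .equalsIgnoreCase(y.fields().getOrDefault("title", ""));
        SimpleEntry incoming = new SimpleEntry("Other2022", Map.of("title", "an example title"));
        System.out.println(decide(incoming, library, sameTitle)); // ASK_MERGE
    }
}
```

A "silent dropping" option would simply treat `ASK_MERGE` like a skip instead of opening a dialog.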

Note that this issue refs JabRef#160. That issue is about updated paper-bib-files and a main file, whereas this issue here is about merging data of "old" bib files.


koppor · 2022-06-13T22:47:11Z

Test cases need to be implemented. Either create a separate sub folder in JabRef or use jimfs:

    testImplementation ('com.google.jimfs:jimfs:1.2') {
        exclude group: "com.google.auto.service"
        exclude group: "com.google.code.findbugs"
        exclude group: "org.checkerframework"
    }
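As a starting point for such a test, the recursive crawl from the issue description could look roughly like the sketch below. It uses only the JDK and a temporary directory so that it runs without extra dependencies; with jimfs, the root path would come from an in-memory file system instead. `BibFileCrawlerSketch`, `findBibFiles`, and `createSampleTree` are hypothetical names, not existing JabRef code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

final class BibFileCrawlerSketch {

    // Recursively collects all regular files below root whose name ends in ".bib",
    // sorted so that results are stable across runs.
    static List<Path> findBibFiles(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".bib"))
                        .sorted()
                        .toList();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small throwaway directory tree with two .bib files and one other file.
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("merge-libraries");
            Path nested = Files.createDirectories(root.resolve("papers").resolve("2022"));
            Files.writeString(root.resolve("main.bib"), "@book{coolbook, title = {A}}\n");
            Files.writeString(nested.resolve("paper.bib"), "@article{Leymann2022, title = {B}}\n");
            Files.writeString(nested.resolve("notes.txt"), "not a library\n");
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createSampleTree();
        // Prints the two .bib files; notes.txt is filtered out.
        findBibFiles(root).forEach(System.out::println);
    }
}
```

Because `findBibFiles` only takes a `Path`, the same method should work unchanged whether the path belongs to the default file system or to an in-memory one.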

leonzolati · 2022-10-11T04:50:19Z

Hi, we are a group of students from the ANU and we would really like to work on this issue. What is the procedure for doing that?

leonzolati · 2022-10-13T03:29:34Z

In addition to the above question, we would like to ask for some clarification surrounding definitions.
What is the difference between an equal and duplicate entry? When an entry has the same key why should we not import it - from our understanding a key is like an intext reference which can have duplicates right?
Thank you very much.

koppor · 2022-10-14T22:48:06Z

@leonzolati I assigned you. Thus, it should be clear for others that someone is working on it.

Providing some background:

.bib files can have several thousand entries
When merging different .bib files, they could originate from different researchers.
The result of a merge is a single .bib file
I want a single .bib file open and then execute the function "Merge other bib files into current library..."
Merging 10 .bib files could lead to thousands upon thousands of entries
As a researcher, I do not want to have duplicates in my database.
As a user, I like the concept of wizards guiding me through the features of JabRef
BibTeX does not allow having multiple entries in the same file having the SAME BibTeX key (because the key needs to be unique).

What is the difference between an equal and duplicate entry?

org.jabref.model.entry.BibEntry#equals compares two entries following the Java conventions for equality. org.jabref.logic.database.DuplicateCheck#compareEntriesStrictly uses JabRef's duplicate algorithm.

from our understanding a key is like an intext reference which can have duplicates right?

I don't get your question fully.

I think you mean that the BibTeX key coolbook could refer to different entries for you and me. Thus, it is NOT enough to check for BibTeX key equivalence.

However, Leymann2022 could be the same entry. As a researcher, I do NOT want to have the same entries in my database.
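To make the distinction concrete, here is a small sketch with invented sample data. `Entry` and `looksLikeDuplicate` are hypothetical; the loose comparison below is a stand-in for illustration only and is NOT the algorithm in `DuplicateCheck`.

```java
import java.util.Locale;

final class EqualVersusDuplicateSketch {

    // Hypothetical minimal entry; a record's equals() compares all components, including the key.
    record Entry(String key, String author, String title, String year) {}

    // Loose check that ignores the key and compares normalised author, title and year.
    // Illustration only; NOT JabRef's duplicate algorithm.
    static boolean looksLikeDuplicate(Entry a, Entry b) {
        return normalise(a.title()).equals(normalise(b.title()))
                && normalise(a.author()).equals(normalise(b.author()))
                && a.year().equals(b.year());
    }

    // Lower-cases the value and collapses anything that is not a letter or digit into one space.
    static String normalise(String value) {
        return value.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    public static void main(String[] args) {
        // Same key, different works: key equality alone says nothing about the content.
        Entry mine = new Entry("coolbook", "Doe, J.", "Cool Things", "2019");
        Entry yours = new Entry("coolbook", "Roe, R.", "Other Cool Things", "2021");
        System.out.println(mine.key().equals(yours.key()));   // true
        System.out.println(looksLikeDuplicate(mine, yours));  // false

        // Different keys and slightly different spelling, but the same work.
        Entry first = new Entry("Leymann2022", "Leymann, F.", "An Example Title", "2022");
        Entry second = new Entry("leymann:2022", "Leymann, F.", "An example title.", "2022");
        System.out.println(first.equals(second));             // false
        System.out.println(looksLikeDuplicate(first, second)); // true
    }
}
```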

Maybe the following helps: Please check the paragraph at https://en.wikipedia.org/wiki/BibTeX#Basic_structure. With \cite{KEY} (from a .tex file), a reference to an entry in a .bib database is made. JabRef manages .bib files only. Example .bib file: https://github.com/JabRef/jabref/blob/main/src/test/resources/testbib/complex.bib

I also have a small presentation at https://speakerdeck.com/koppor/jabref-and-open-source-development?slide=6. However, it is somewhat incomplete, as \cite{KEY} is missing.

leonzolati · 2022-10-18T21:03:23Z

Thank you for this explanation; it helped us a lot. Just to keep you informed, we have a working prototype of the code with some black-box testing, but we plan to increase code coverage in the coming days. We would like to ask for some additional clarification on this point:

what is meant by the following: If the attached file is not stored relatively under the directory of the bib file where it is imported, ask to copy it. Do you mean that the directory containing the .bib files to merge should be a child of the working directory and, if it isn't, we should ask to copy it into a new directory that is a child of the working directory?

Thank you very much, Leon.

koppor · 2022-11-02T19:34:34Z

what is meant by the following: If the attached file is not stored relatively under the directory of the bib file where it is imported, ask to copy it.

Maybe this is too confusing for the user and should be a separate functionality. :) Leave it out of your PR.

claell · 2023-04-27T09:30:29Z

Just out of curiosity, in which way is this different to opening both files in JabRef, copying all entries from one file and pasting them in the other file? Isn't there already some check for duplicates and possibilities of merging? And if not sufficient, should this be enhanced on the go while implementing this functionality?

koppor · 2023-04-28T23:59:28Z

It is more of a convenience feature. JabRef currently does not allow for having a "view" on all BibTeX libraries. JabRef is still file-based. Situation: the research group manages papers at c:\git-repositories\publications. I manage my bib at c:\git-repositories\private-library. I just want to know all entries of the whole group. I can open all bib files, but with > 300 publications of the whole group, the usability of JabRef would suffer. Moreover, I do not want to manually open the bib file of each new publication and put it into JabRef. When I am in the mode of collecting references, I just want to have a "sync" of existing references. Thereby, I do not want to think: Which publications are new? Which bib files might have changed...

I know

Nevertheless, a good exercise for students to think of cases, edge cases, ...

For sure, the duplication check needs to be adapted (in any case) - refs JabRef#9769.

claell · 2023-10-14T11:54:15Z

Got it, in your use case with a ton of different publications with their own .bib files, that indeed makes sense!
One more thing to add to your suggestion: Having the imported entries grouped in the central database by their original source will be pretty helpful (at least to me). That also includes that duplicates won't simply be skipped, but in the case that the duplicate comes from a new source, they should get assigned to the corresponding group in the central database.

Additionally, there might be use cases where one wants to remove a group from an entry in the central database (possibly also removing the entry altogether if no groups from sources are left) in the case where an entry is removed from a source. That might be even harder to implement in a robust way, though.
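One way to picture this bookkeeping is a map from each entry in the central database to the set of sources that contain it. This is only a sketch: `SourceGroupsSketch` and its methods are hypothetical, the entry identifier is assumed to be whatever stable ID survives deduplication, and nothing here reflects how JabRef groups are actually stored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SourceGroupsSketch {

    // Entry identifier -> names of the source libraries that contain the entry.
    private final Map<String, Set<String>> groupsByEntry = new LinkedHashMap<>();

    // Records that the source contains the entry.
    // Returns true if the entry is new to the central library, false if it was a duplicate
    // that merely gained another source group.
    boolean addFromSource(String entryId, String source) {
        boolean isNew = !groupsByEntry.containsKey(entryId);
        groupsByEntry.computeIfAbsent(entryId, id -> new TreeSet<>()).add(source);
        return isNew;
    }

    // Removes the source group from the entry.
    // Returns true if no source is left and the entry was dropped from the central library.
    boolean removeFromSource(String entryId, String source) {
        Set<String> groups = groupsByEntry.get(entryId);
        if (groups == null) {
            return false;
        }
        groups.remove(source);
        if (groups.isEmpty()) {
            groupsByEntry.remove(entryId);
            return true;
        }
        return false;
    }

    // Returns an immutable snapshot of the entry's source groups (empty if unknown).
    Set<String> groupsOf(String entryId) {
        return Set.copyOf(groupsByEntry.getOrDefault(entryId, Set.of()));
    }

    public static void main(String[] args) {
        SourceGroupsSketch central = new SourceGroupsSketch();
        central.addFromSource("Leymann2022", "paper-a.bib");
        central.addFromSource("Leymann2022", "paper-b.bib"); // duplicate from a new source
        System.out.println(central.groupsOf("Leymann2022").size()); // 2
    }
}
```

Whether an entry with no remaining sources should really be deleted is a policy question; the sketch only shows that the information needed to decide is available.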

koppor added feature good first issue and removed good first issue labels Jan 30, 2018

koppor mentioned this issue Oct 13, 2021

Support of different type of BibTeX files JabRef/jabref#160

Open

ThiloteE added the Project: Teaching Project X label Jun 16, 2022

koppor assigned leonzolati Oct 14, 2022

leonzolati mentioned this issue Oct 24, 2022

Add MergeLibraries functionality JabRef/jabref#9292

Closed

6 tasks

koppor removed the Project: SE HIT 2022 label Oct 27, 2022

Siedlerchr unassigned leonzolati Jan 30, 2023

koppor mentioned this issue Jan 16, 2024

feature request: batch importer JabRef/jabref#10192

Open

4 tasks
