# Explanation

This Jupyter Notebook explains the module `check_duplicates.py` (from the `fachreferats_functions` package). As a concrete use case, we assume that a person wants to donate some books to the library. We have given the donor an Excel template with a set of predefined columns, and the donor returns it filled in. The example file is `Fachreferats-Toolbox/data/dummy_data/Beispiel_Geschenke.xlsx`.
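As an illustration of the template, the sketch below builds an empty table with the same column names as the example file. The helper name `make_donation_template` and the output filename are hypothetical. In the example file the `Bestand_*` and `Übernehmen?` columns are empty, which suggests, but does not confirm, that they are meant to be filled in by the library rather than the donor.

```python
import pandas as pd

# Column names as they appear in the example donation table.
TEMPLATE_COLUMNS = [
    "Titel",
    "ISBN",
    "Vorname_Autor",
    "Nachname_Autor",
    "Erscheinungsort",
    "Verlag",
    "Erscheinungsjahr",
    # In the example data these columns are empty (presumably for the library).
    "Bestand_SUB?",
    "Bestand_Göttingen?",
    "Bestand_Verbund_KVK?",
    "Übernehmen?",
]


def make_donation_template() -> pd.DataFrame:
    """Return an empty table with the template columns (hypothetical helper)."""
    return pd.DataFrame(columns=TEMPLATE_COLUMNS)


template = make_donation_template()
print(template.columns.tolist())

# Writing .xlsx needs an Excel engine such as openpyxl. index=False avoids an
# extra unnamed index column in the file (hypothetical filename):
# template.to_excel("Geschenke_Vorlage.xlsx", index=False)
```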


To run it, download the repository and open this notebook in Jupyter. Your computer needs Python and the libraries imported below (for example pandas, NumPy and lxml), plus an Excel engine such as openpyxl so that pandas can read `.xlsx` files.
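A quick way to see which libraries are missing is to look them up with `importlib.util.find_spec` from the standard library. This is a sketch: the names are taken from the import cell of this notebook, plus `openpyxl`, which is an assumption (it is the engine pandas normally uses for `.xlsx` files).

```python
import importlib.util

# Libraries imported in this notebook, plus openpyxl (assumed Excel engine).
REQUIRED = ["pandas", "numpy", "lxml", "openpyxl"]


def missing_packages(names):
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required packages are available.")
```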

# Import

## Libraries and Functions

In [1]:
```python
import sys
import os
import glob
from lxml import etree
import pandas as pd
import numpy as np
```


In [2]:
```python
# Make the parent directory, which contains the fachreferats_functions package, importable
sys.path.append(os.path.abspath("./../"))
```

In [3]:
```python
from fachreferats_functions import check_duplicates
```



## Data

In [4]:
```python
books_donated = pd.read_excel("./../../data/dummy_data/Beispiel_Geschenke.xlsx")
```

In [5]:
```python
books_donated
```

| Unnamed: 0 | Titel | ISBN | Vorname_Autor | Nachname_Autor | Erscheinungsort | Verlag | Erscheinungsjahr | Bestand_SUB? | Bestand_Göttingen? | Bestand_Verbund_KVK? | Übernehmen? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | La familia de Pascual Duarte | 9788423000000.0 | | | | | | | | | |
| 1 | The picture of Dorian Gray | 9780141000000.0 | | | | | | | | | |
| 2 | Auf der Eidechsburg | | Ilse-Dore | Tanner | Leipzig | A. H. Payne | 1938? | | | | |
| 3 | Gustav Klimt | 9781403000000.0 | | | | | | | | | |
| 4 | Bayerisches Kochbuch | | Maria | Hofmann | München | Birken | 1950 | | | | |
| 5 | Centenaire de l‘Impressionisme | | | | Paris | Musées nationaux | 1974 | | | | |
| 6 | Les parents terribles | | Jean | Cocteau | Paris | Gallimard | 1938 | | | | |
| 7 | Straightforward Statistics | | James D. | Evans | Pacific Grove | Brooks/Cole | 1996 | | | | |
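The ISBN column above is displayed as floating-point numbers (e.g. `9788423000000.0`): pandas stores a numeric column that contains empty cells (NaN) as floats. The sketch below turns such cells back into digit strings for inspection. `normalize_isbn` is a hypothetical helper; whether `check_duplicate_with_isbn` accepts string ISBNs is not shown in this notebook, so the usage line is left commented out.

```python
import math


def normalize_isbn(value):
    """Turn an ISBN cell read from Excel into a digit string, or None if empty."""
    if value is None:
        return None
    if isinstance(value, float):  # also covers numpy.float64
        if math.isnan(value):
            return None
        # 13-digit ISBNs are exactly representable as floats, so int() is safe.
        return str(int(value))
    # Text cells: keep digits and "X" (ISBN-10 check character), drop hyphens/spaces.
    text = str(value).strip().upper()
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == "X")
    return cleaned or None


print(normalize_isbn(9788423339044.0))        # 9788423339044
print(normalize_isbn("978-0-14-062322-2"))    # 9780140623222
# books_donated["ISBN_text"] = books_donated["ISBN"].map(normalize_isbn)
```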


# Run Functions on Data

## Check copies with ISBN
Check whether the donated books are already held by the Göttingen library, using their ISBNs.
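A mistyped ISBN simply returns no catalogue hits, which looks the same as "not held". Validating the ISBN-13 check digit first helps to tell the two cases apart. This is a sketch of the standard ISBN-13 checksum; whether `check_duplicate_with_isbn` performs such a check is not shown here. The first three test values are the ISBNs from the result table of this notebook; the last has a deliberately wrong check digit.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if `isbn` is 13 ASCII digits with a correct check digit.

    Digits are weighted alternately 1 and 3 from the left; for a valid
    ISBN-13 the weighted sum of all 13 digits is a multiple of 10.
    """
    if len(isbn) != 13 or not (isbn.isascii() and isbn.isdigit()):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
    return total % 10 == 0


for isbn in ["9788423339044", "9780140623222", "9781402759208", "9788423339045"]:
    print(isbn, is_valid_isbn13(isbn))
```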


In [7]:
```python
books_donated = check_duplicates.check_duplicate_with_isbn(
    books_donated,
    name_column_isbn="ISBN",
    name_column_title="Titel",
)
```

```
La familia de Pascual Duarte 9788423339044 1
The picture of Dorian Gray 9780140623222 0
Gustav Klimt 9781402759208 0
```
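The printed lines list the title, ISBN and number of catalogue hits, and the result table further down gains three columns per check (`…_number_GUK`, `…_in_GUK?`, `…_url_GUK`). The sketch below reproduces only that bookkeeping pattern under stated assumptions: the notebook does not show how `check_duplicates` queries the OPAC, so the query is replaced by a hypothetical `lookup` callable, and `fake_lookup` with its example.org URL is made-up test data.

```python
import pandas as pd


def add_catalogue_columns(df, column, lookup):
    """Add hit-count, found/not-found and URL columns for one search column.

    `lookup` is a hypothetical callable taking a cell value and returning
    (number_of_hits, result_url); it stands in for the real catalogue query.
    Rows with an empty cell are skipped and keep missing values.
    """
    number_col = f"based_on_{column}_number_GUK"
    found_col = f"based_on_{column}_in_GUK?"
    url_col = f"based_on_{column}_url_GUK"

    df = df.copy()
    df[number_col] = float("nan")
    df[found_col] = None
    df[url_col] = None

    for idx, value in df[column].items():
        if pd.isna(value):
            continue  # nothing to search for
        hits, url = lookup(value)
        df.at[idx, number_col] = hits
        df.at[idx, found_col] = hits > 0  # "in GUK?" means at least one hit
        df.at[idx, url_col] = url
    return df


# Usage with a fake lookup (hypothetical data and URL, no network access).
def fake_lookup(isbn):
    hits = {"9788423339044": 1}.get(isbn, 0)
    return hits, f"https://example.org/search?isbn={isbn}"


demo = pd.DataFrame({"ISBN": ["9788423339044", "9780140623222", None]})
print(add_catalogue_columns(demo, "ISBN", fake_lookup))
```

Passing the lookup in as a parameter keeps the table logic testable without network access.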


## Check copies with Title
Check whether the donated books are already held by the Göttingen library, using their titles.

In [8]:
```python
books_donated = check_duplicates.check_duplicate_with_title(
    books_donated,
    name_column_title="Titel",
)
```

```
La familia de Pascual Duarte
['8']
The picture of Dorian Gray
['42']
Auf der Eidechsburg
['0']
Gustav Klimt
['31']
Bayerisches Kochbuch
['1']
Centenaire de l‘Impressionisme
['0']
Les parents terribles
['4']
Straightforward Statistics
['2']
```
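Each title is printed with a one-element list such as `['8']`. That is the shape an lxml XPath `text()` query returns, which fits the `from lxml import etree` import, but this is an inference: the notebook does not show how the hit count is obtained. In the sketch below, both the HTML snippet and the XPath expression are hypothetical, not the real structure of the OPAC result page.

```python
from lxml import etree

# Hypothetical markup; the real OPAC result page may look quite different.
SAMPLE_PAGE = """
<html><body>
  <div class="summary">Treffer: <span class="hits">8</span></div>
</body></html>
"""


def extract_hit_count(html, xpath='//span[@class="hits"]/text()'):
    """Return the hit count found on a result page, or 0 if none is found."""
    tree = etree.HTML(html)
    values = tree.xpath(xpath)  # a list of strings, e.g. ['8']
    if not values:
        return 0
    return int(values[0].strip())


print(extract_hit_count(SAMPLE_PAGE))  # prints 8
```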


# Results

In [9]:
```python
books_donated
```

| Unnamed: 0 | Titel | ISBN | Vorname_Autor | Nachname_Autor | Erscheinungsort | Verlag | Erscheinungsjahr | Bestand_SUB? | Bestand_Göttingen? | Bestand_Verbund_KVK? | Übernehmen? | based_on_ISBN_number_GUK | based_on_ISBN_in_GUK? | based_on_ISBN_url_GUK | based_on_Titel_number_GUK | based_on_Titel_in_GUK? | based_on_Titel_url_GUK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | La familia de Pascual Duarte | 9788423339044 | | | | | | | | | | 1.0 | True | https://opac.sub.uni-goettingen.de/DB=1/SET=6/... | 8 | True | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |
| 1 | The picture of Dorian Gray | 9780140623222 | | | | | | | | | | 0.0 | False | https://opac.sub.uni-goettingen.de/DB=1/SET=6/... | 42 | True | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |
| 2 | Auf der Eidechsburg | 0 | Ilse-Dore | Tanner | Leipzig | A. H. Payne | 1938? | | | | | | | | 0 | False | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |
| 3 | Gustav Klimt | 9781402759208 | | | | | | | | | | 0.0 | False | https://opac.sub.uni-goettingen.de/DB=1/SET=6/... | 31 | True | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |
| 4 | Bayerisches Kochbuch | 0 | Maria | Hofmann | München | Birken | 1950 | | | | | | | | 1 | True | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |
| 5 | Centenaire de l‘Impressionisme | 0 | | | Paris | Musées nationaux | 1974 | | | | | | | | 0 | False | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |
| 6 | Les parents terribles | 0 | Jean | Cocteau | Paris | Gallimard | 1938 | | | | | | | | 4 | True | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |
| 7 | Straightforward Statistics | 0 | James D. | Evans | Pacific Grove | Brooks/Cole | 1996 | | | | | | | | 2 | True | https://opac.sub.uni-goettingen.de/DB=1/SET=2/... |


The column "based_on_ISBN_in_GUK?" shows that, of the three books for which the donor provided an ISBN, the SUB already holds one with exactly this ISBN (*La familia de Pascual Duarte*).

The column "based_on_Titel_in_GUK?" shows that six of the eight proposed donations already have catalogue records with the same title. For the other two (*Auf der Eidechsburg* and *Centenaire de l‘Impressionisme*), the catalogue has no records with these titles. The column "based_on_Titel_number_GUK" gives the number of distinct catalogue records with each title. Some of these may be different works that merely share the title of a donated book. For this reason, the librarian also gets the column "based_on_Titel_url_GUK", with direct links to the catalogue, to decide whether to accept each book.
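The counts in the paragraph above can be reproduced from the result table, and the same columns can drive a review list for the librarian. In this sketch the data are copied from the table; the helper `titles_to_review` and its ordering (fewest hits first, since those can be checked quickly by following the URL) are one possible design choice, not part of `check_duplicates`.

```python
import pandas as pd

# Three columns copied from the result table (titles and title-based checks).
results = pd.DataFrame({
    "Titel": [
        "La familia de Pascual Duarte",
        "The picture of Dorian Gray",
        "Auf der Eidechsburg",
        "Gustav Klimt",
        "Bayerisches Kochbuch",
        "Centenaire de l‘Impressionisme",
        "Les parents terribles",
        "Straightforward Statistics",
    ],
    "based_on_Titel_number_GUK": [8, 42, 0, 31, 1, 0, 4, 2],
    "based_on_Titel_in_GUK?": [True, True, False, True, True, False, True, True],
})


def titles_to_review(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows with at least one title match, fewest hits first."""
    found = df[df["based_on_Titel_in_GUK?"]]
    return found.sort_values("based_on_Titel_number_GUK")


n_found = int(results["based_on_Titel_in_GUK?"].sum())
print(f"{n_found} of {len(results)} titles have at least one catalogue record")
# prints: 6 of 8 titles have at least one catalogue record
print(titles_to_review(results)[["Titel", "based_on_Titel_number_GUK"]])
```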

The results can be exported as a table (Excel or tab-separated values).

In [10]:
```python
books_donated.to_excel("./../../data/Beispiel_Geschenke_checked.xlsx")
```

In [11]:
```python
books_donated.to_csv("./../../data/Beispiel_Geschenke_checked.tsv", sep="\t")
```
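By default, `to_excel` and `to_csv` also write the DataFrame's row index as an extra first column. When such a file is read back, pandas names that column "Unnamed: 0", which is probably where that column in the input table comes from. The sketch below shows, on a small in-memory example with made-up rows, how `index=False` avoids this and how `dtype=str` keeps ISBNs as digit strings when reading a TSV back. The same `index=False` argument works for `to_excel`.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Titel": ["La familia de Pascual Duarte", "Auf der Eidechsburg"],
    "ISBN": ["9788423339044", None],
})

# index=False stops pandas from writing the row index as an extra,
# unnamed first column (read back later as "Unnamed: 0").
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)

# Reading every column as text keeps ISBNs as digit strings instead of floats;
# empty cells still come back as missing values (NaN).
buffer.seek(0)
round_trip = pd.read_csv(buffer, sep="\t", dtype=str)
print(round_trip.columns.tolist())  # ['Titel', 'ISBN']
```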