# Capstone Project - Deduplication of Swissbib Raw Data

**Program** Applied Data Science : Machine Learning<br>
**Institution** EPFL Extension School<br>
**Course** \#5, Capstone Project<br><br>
**Title** Deduplication of Swissbib Raw Data<br>
**Author** Andreas Jud<br>
**Date** 06-APR-2020

## Table of Contents

- [Introduction](#Introduction)
    - [Requirements](#Requirements)
    - [Acknowledgements](#Acknowledgements)
- [Structure of the Project](#Structure-of-the-Project)
- [Runs and Results](#Runs-and-Results)
    - [Runtime Parameters](#Runtime-Parameters)
    - [Overview of Runs](#Overview-of-Runs)
    - [Runs Execution](#Runs-Execution)
- [Assessment of Results](#Assessment-of-Results)
    - [Run with id 0](#Run-with-id-0)
    - [Run with id 1](#Run-with-id-1)
    - [Run with id 2](#Run-with-id-2)
    - [Run with id 3](#Run-with-id-3)
    - [Run with id 4](#Run-with-id-4)
    - [Conclusion of Runs](#Conclusion-of-Runs)
    - [Wrong Predictions](#Wrong-Predictions)
    - [Classification of Swissbib's Goldstandard Data](#Classification-of-Swissbib's-Goldstandard-Data)
- [Comparison with Literature](#Comparison-with-Literature)
- [Summary and Outlook](#Summary-and-Outlook)

## Introduction

The goal of this capstone project is to explore data deduplication with the help of machine learning methods. The data to be deduplicated is bibliographical data provided by the online catalogue [Swissbib](https://www.swissbib.ch/). Swissbib's currently implemented deduplication mechanism is to be replaced by a new mechanism implemented with machine learning. More information on Swissbib, its deduplication architecture and its data can be found in the [proposal](./project-proposal-andreas-jud.ipynb) for the capstone project.

This capstone project is the fifth and concluding module of the online course [Applied Data Science: Machine Learning](https://www.extensionschool.ch/applied-data-science-machine-learning) by the [EPFL Extension School](https://www.extensionschool.ch/). In its first two modules, the course teaches the programming language [Python](https://www.python.org/), while modules 3 and 4 give an insight into machine learning. For this reason, this capstone project is implemented in Python and applies the machine learning methods covered by the course.

### Requirements

This capstone project uses several publicly available Python libraries. Each chapter that needs a library shows the corresponding
<br>$\texttt{! pip install <library name>}$<br>
command in a separate code cell. These commands were executed once in the author's development environment and have since been commented out to keep the notebooks readable. To run the set of notebooks in a Python development environment with a basic setup, a [requirements.txt](./requirements.txt) file is provided. It can be installed from the code cell below and brings in the library packages needed for this capstone project.

In [1]:
#! pip install -r requirements.txt

### Acknowledgements

The author of this capstone project comes from outside the academic library sector. This capstone project would not have been possible without the support of the Swissbib project team. During a short period of about four months, the author had the chance to meet Swissbib's project team members, learn from them and discuss his ideas with them. It was a steep learning curve, and the author appreciates the team members' patience and their curiosity about his methods. Special and warm thanks go to Silvia Witzig of the project team. She is the undisputed expert on Swissbib's goldstandard data and on the implemented deduplication logic, and she shared her knowledge very efficiently and effectively. Further explicit thanks go to Günter Hipler. Günter brought up the idea for the project, a problem which the author initially had no idea how to solve. Günter also implemented several file exports of the goldstandard and of test data. Without this data provisioning, this project would not have been possible.

Two more people have supported the author highly in the course of the online training in general and explicitly during this capstone project. EPFL course trainer Michael Notter was available on demand for inspiring discussions with helpful hints and ideas for the project. And the author's spouse Claudia has supported him mentally, has given him guidance and motivation even in moments of strain.

This capstone project forms the final module of an online course that has been done as an in-service training. The author's employer and sponsor Baloise Insurance has to be mentioned at this place. Baloise is a company with an open-minded management and the author's thanks go explicitly to his supervisor Nicole Rupp who agreed on a topic for this capstone project, outside the insurance sector.

Thanks to all of you! I hope I can give a little of my enthusiasm back to you.

## Structure of the Project

The initial notebook of this capstone project is its [proposal](./project-proposal-andreas-jud.ipynb). Building on it, the collection of notebooks of the capstone project consists of the following chapters.

0. Overview and Summary
0. [Data Analysis](./1_DataAnalysis.ipynb)
0. [Goldstandard and Data Preparation](./2_GoldstandardDataPreparation.ipynb)
0. [Data Synthesizing](./3_DataSynthesizing.ipynb)
0. [Feature Matrix Generation](./4_FeatureMatrixGeneration.ipynb)
0. [Features Discussion and Dummy Classifier Baseline](./5_FeatureDiscussionDummyBaseline.ipynb)
0. [Decision Tree Model](./6_DecisionTreeModel.ipynb)
0. [Support Vector Classifier Model](./7_SVCModel.ipynb)
0. [Neural Network Model](./8_NeuralNetwork.ipynb)

Appendix

- [A. References](./A_References.ipynb)
- [B. Comparison of Similarity Metrics](./B_CompareSimilarities.ipynb)
- [C. Assessment of Models Trained with Synthetic Data](./C_AssessmentSyntheticModels.ipynb)

Chapter 0 is the summary chapter that executes all Jupyter notebooks of the project, analyses their results and assesses the calculated models. Running this notebook takes about 14 hours on a two-year-old Apple desktop.

Chapter 1 analyses nearly 200,000 records provided by Swissbib as a data extract. At the beginning of the chapter, some sample records of different formats of bibliographical units are shown. After this general assessment, all attributes are investigated separately and in depth, with the goal of understanding their contents and their potential contribution as a feature to a machine learning model. The artefact of the chapter is a dictionary of metadata on the attributes processed in the scope of this capstone project. This artefact is read by the next chapter for further processing.

Chapter 2 analyses Swissbib's goldstandard data. After having understood the relationship of the records among each other in the raw data, pairs of records will be built. These records of pairs of the original records will be the starting point for the feature matrix with the similarity values. This feature base will be handed over as an artefact of the chapter to be further processed in the next chapter.

Chapter 3 generates artificial records of pairs with the goal to increase the ratio of pairs of duplicates in the data for training and testing. The amount of increase can be controlled by a numerical target value for the ratio of duplicate pairs. A pair of identical duplicates would not correspond to Swissbib's raw data reality, though. Therefore, a second target amount of pairs of duplicates is modified according to suggestions described in [[Chri2012](./A_References.ipynb#chri2012)]. This second target amount can be controlled by another parameter. At the end of the chapter, the amount of pairs of uniques can be reduced with the help of yet another control parameter. This reduction of pairs of uniques increases the ratio of duplicates in the training data. The main goal of this downsampling of uniques is the reduction of the full amount of training data, though. The idea behind it is an increase of the run performance of the notebooks where the models are calculated.

Chapter 4 is one central part of the capstone project. In there, the features of the feature matrix are being determined. For the project at hand, a feature is a numerical distance value between two same attributes of one pair of records. The distance value is calculated with the help of a similarity algorithm, provided by the functions of a library of Python code. The artefact of chapter 4 is the labelled feature matrix, still stored in the form of a pandas DataFrame.
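To make the notion of a feature more concrete, the following minimal sketch builds one row of a feature matrix for a single pair of records. It is not the implementation of chapter 4: the similarity function is swapped for `difflib.SequenceMatcher` from the Python standard library, and the attribute names (`title`, `person`, `year`) as well as the two sample records are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical attribute names, chosen for illustration only.
ATTRIBUTES = ["title", "person", "year"]


def similarity(left: str, right: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, left, right).ratio()


def feature_row(record_a: dict, record_b: dict) -> dict:
    """Compare the same attribute on both sides of a pair of records."""
    return {
        attribute + "_sim": similarity(record_a[attribute], record_b[attribute])
        for attribute in ATTRIBUTES
    }


# Two invented records that describe the same bibliographical unit.
record_a = {"title": "Die Physiker", "person": "Duerrenmatt, Friedrich", "year": "1962"}
record_b = {"title": "Die Physiker", "person": "Duerrenmatt, F.", "year": "1962"}

row = feature_row(record_a, record_b)
print(row)
```

Stacking such rows for all pairs, together with a label that marks each pair as duplicate or unique, would give a labelled feature matrix in the sense described above.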

Chapter 5 uses the artefact of the preceding chapter to analyse the effect of the features calculated in chapter 4. A similarity metric is valid for an attribute if it separates records of pairs of uniques from records of pairs of duplicates. The analysis is done with the help of a series of histograms. Afterwards, the first machine learning models are generated. The fitted models belong to the machine learning category of unsupervised learning. Since the target vector is omitted when training them, those models have to find clusters of records in the absence of labels. A Principal Component Analysis, a t-SNE model, and a k-means classifier each generate impressive results. Chapter 5 ends with fitting a dummy classifier which will be used as a statistical baseline for the models fitted in the remaining chapters.
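Why a statistical baseline is needed for imbalanced classes can be illustrated with a small sketch in plain Python. It assumes a dummy classifier that always predicts the most frequent class, which may differ from the strategy chosen in chapter 5. The class counts are invented and are not the project's figures.

```python
from collections import Counter

# Invented labels: 0 = pair of uniques, 1 = pair of duplicates.
y_true = [0] * 95 + [1] * 5


def most_frequent_baseline(labels):
    """Predict the most frequent class for every sample."""
    majority_class, _ = Counter(labels).most_common(1)[0]
    return [majority_class] * len(labels)


def accuracy(truth, predicted):
    """Fraction of samples with a correct prediction."""
    hits = sum(t == p for t, p in zip(truth, predicted))
    return hits / len(truth)


def recall(truth, predicted, positive=1):
    """Fraction of positive samples that were predicted as positive."""
    true_positives = sum(t == positive and p == positive
                         for t, p in zip(truth, predicted))
    positives = sum(t == positive for t in truth)
    return true_positives / positives


y_pred = most_frequent_baseline(y_true)
baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
print(baseline_accuracy, baseline_recall)
```

With such data, the baseline reaches a high accuracy without finding a single duplicate. A fitted model therefore has to be judged against this baseline and not against an accuracy of 50 %.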

Chapter 6 is titled 'Decision Tree Model'. In it, three different classifiers of the decision tree family are fitted. A simple Decision Tree Classifier serves as a warm-up, before statistically more robust models like a Decision Tree Classifier with cross-validation and a Random Forest Classifier are fitted. One important part of chapter 6 is the introduction of the performance metric used for this capstone project. Another important part is a first effort to understand and interpret the models of this project. Chapter 6 passes the models' performance to this chapter (chapter 0) as its results for global assessment and comparison.

Chapter 7 calculates in its central part a Support Vector Classifier with the help of cross-validation. This classifier offers another chance for approaching the interpretation of the models, and some effort is made in this direction. Chapter 7 passes the models' performance to chapter 0 as its results for the global assessment.

Chapter 8 is the third and last chapter for training a model. A Neural Network is fitted in an implementation with the Keras library. The slow convergence of the networks is striking. Many epochs need to be run for the model to reach its absolute maximum. The second striking aspect is the size of the network favoured by better performance figures. A network with two hidden layers on the one hand, and a network with a large number of neurons per layer on the other hand, show the best performance characteristics. Chapter 8, too, contributes its results to the global assessment of this chapter.

Appendix A holds the bibliography of the capstone project.

Appendix B systematically compares all similarity metrics available in Python library $\texttt{textdistance}$ for samples of all attributes used for the feature matrix of this capstone project. This systematic comparison is the basis for the choice of similarity metrics made in chapter [Feature Matrix Generation](./4_FeatureMatrixGeneration.ipynb).

Appendix C compares the results of two Ensemble classifier models trained with the additional help of synthetic data produced in chapter [Data Synthesizing](./3_DataSynthesizing.ipynb) with the results of the same classifiers trained exclusively with pure Swissbib data. This comparison reveals the quality of the chosen data synthesis of chapter [Data Synthesizing](./3_DataSynthesizing.ipynb).

## Runs and Results

This section starts with explaining the runtime parameters with which the notebooks of the capstone project can be called. After the parameter space has been settled, a series of runs will be executed with different parameter values each.

### Runtime Parameters

The notebooks of this capstone project can be called with eight specific global parameters. These parameters are listed and explained here.

- $\texttt{execution}\_\texttt{mode}$ - The reason for introducing this parameter has been the runtime of execution. Grid search has been implemented with the goal of finding the best parameters for a model. The bigger the grid space, i.e. the more grid points the grid space has for each of its dimensions, the more model fits have to run and the longer the runtime of a notebook. Oversampling of records of duplicates increases the runtime of a notebook even more. When searching for the best parameters of a model, the grid space has to be scanned widely. The runtime may extend to hours, even days, for such calculations. For some runs, smaller grid spaces may be sufficient. A restricted grid space can be chosen in order to save calculation time. The execution mode of a notebook may have four distinct values.
    - Mode $\texttt{manual}$ will be used for executing the notebook, opening it and running it directly cell by cell. This kind of execution shall be called a local execution mode. The original purpose of this mode of execution has been to open the notebook and read its text, in order to focus on the contents and specific explanation for a model. Runtime is supposed to be moderate or even short for these execution modes. The result of the notebook is to be reflected by its text with the purpose to explain it thoroughly. The grid parameters chosen for this mode have flowed back from the insights found from results with full execution mode of this chapter.
    - Mode $\texttt{full}$ will be used for executing the notebook, calling it in this very chapter and collecting the results of each notebook for final comparison and assessment.
    - Mode $\texttt{restricted}$ serves for exploring different data processing options controlled by parameters explained below in this list, cf. parameters $\texttt{factor}$, $\texttt{mode}\_\texttt{exactDate}$, and $\texttt{strip}\_\texttt{number}\_\texttt{digits}$. Runtime is supposed to be short to moderate again for this execution mode. The grid parameters chosen for this mode have flowed back from the insights found from results with full execution mode of this chapter.
    - Mode $\texttt{tune}$ will be used for a final fine tuning of the models' parameters. Goal of a run in mode $\texttt{tune}$ is to get the best models of a grid space close to a precalculated best model of the wide grid space run with mode $\texttt{full}$. While mode $\texttt{full}$ will be used for scanning a wide range of orders of magnitude of the parameter space, mode $\texttt{tune}$ will be used for scanning the neighbouring parameter points of the best models of the mode $\texttt{full}$ run. This approach helps in an iterative search for the best parameters of the models.
- $\texttt{oversampling}$ - The number of records of pairs of duplicates generated with Swissbib's goldstandard data has been low compared to the number of records of pairs of uniques. The consequence has been to generally use balancing for model fitting. In order to increase the ratio of duplicates in the training and testing data, an oversampling with synthetic data has been implemented. To control the ratio, parameter $\texttt{oversampling}$ can be used. Synthetic data will be multiplied in a for loop so as to reach a ratio of oversampling in percent \[%\] in the final data set for model calculation. If $\texttt{oversampling}=0$, no synthetic data will be added to the goldstandard data. This parameter will be used in chapter [Data Synthesizing](./3_DataSynthesizing.ipynb).
- $\texttt{modification}\_\texttt{ratio}$ - This parameter will be used in chapter [Data Synthesizing](./3_DataSynthesizing.ipynb), too. In that chapter, some specific kinds of data modification (typos) to be simulated have been defined for each attribute. If an attribute shows one or more kinds of modification, this parameter controls the ratio and therefore the amount of records with modification.
- $\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$ - The models of chapters [Support Vector Classifier Model](./7_SVCModel.ipynb) and [Neural Network Model](./8_NeuralNetwork.ipynb) explicitly suffer from long runtime during training. Ways to reduce this duration to a smaller order of magnitude have been searched. Two different basic ways could be imagined. The first one is to use Principal Component Analysis (PCA) to transform the features of a model to a lower dimensionality. This way of dimensionality reduction has been rejected out of the desire to keep the full information of all the features of the model. An alternative way of reducing the calculation load on a model has been chosen instead. In chapter [Data Synthesizing](./3_DataSynthesizing.ipynb), two kinds of downsampling have been implemented. The first kind reduces the amount of records of the training, validation and testing data independently of class membership, by selecting a purely random subset of records out of the basic data set. This kind of downsampling leaves the ratio of the two classes unchanged. The target ratio for the subset is set by parameter $\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$.
- $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$ - The second kind of downsampling implemented in chapter [Data Synthesizing](./3_DataSynthesizing.ipynb) reduces the amount of records of class unique, only. This kind leaves the low amount of records of pairs of duplicates untouched. Reducing exclusively the amount of records of class unique increases the ratio of records of pairs of duplicates in the total data set for training, validation and testing as a side effect. Details on the implementation are given in chapter [Data Synthesizing](./3_DataSynthesizing.ipynb), the parameter for controlling the downsampling in the second kind is $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$.
- $\texttt{factor}$ - In Swissbib's raw data, records may have missing values in attributes. When building pairs of records for generating the feature matrix, records may occur with a value on both sides of a pair, but also with a missing value on one side of a pair and even with missing values on both sides of a pair, see chapters [Feature Matrix Generation](./4_FeatureMatrixGeneration.ipynb) and [Features Discussion and Dummy Classifier Baseline](./5_FeatureDiscussionDummyBaseline.ipynb) for a deeper discussion. Missing values may influence the model. For that reason, a decision has been taken to mark the features of records of pairs with missing attribute values. One way of marking them is to transform them to a negative similarity value. During implementation, a discussion arose on how the distance from the origin (similarity value of 0) on the negative similarity side would influence a model, especially a Neural Network, due to its linear dependency on the firing of a neuron. To be able to set the distance from the origin, this factor has been introduced. In the implemented code, the factor ...
    - multiplies -0.5 if one attribute of the pair is missing.
    - multiplies -1.0 if both attributes of the pair are missing.
- $\texttt{mode}\_\texttt{exactDate}$ - The basic similarity metric of attribute $\texttt{exact}\_\texttt{date}$ undergoes some modification in the presence of unknown values, see chapter [Feature Matrix Generation](./4_FeatureMatrixGeneration.ipynb) for implementation details. Two different modes of modifying the basic similarity metric have been implemented. To select one of these modes, parameter $\texttt{mode}\_\texttt{exactDate}$ has been introduced.
- $\texttt{strip}\_\texttt{number}\_\texttt{digits}$ - Swissbib's raw data bring attributes $\texttt{scale}$, $\texttt{part}$, and $\texttt{volumes}$ as full-text strings. Swissbib's deduplication engine extracts their number digit parts in a preprocessing step with the goal to generate more reliable results. A very basic stripping function has been implemented in this capstone project with the goal to copy Swissbib's more sophisticated logic. The model result may change as a function of the similarity values of these three attributes. To assess the effect of stripping the attributes values, parameter $\texttt{strip}\_\texttt{number}\_\texttt{digits}$ will be used for switching on ($\texttt{strip}\_\texttt{number}\_\texttt{digits}=\texttt{True}$) and off ($\texttt{strip}\_\texttt{number}\_\texttt{digits}=\texttt{False}$) the stripping to number digits logic.
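
The effect of parameter $\texttt{factor}$ described in the list above can be sketched as follows. This is a minimal illustration and not the code of chapter 4: the function name `marked_similarity` is hypothetical, a missing value is assumed to be represented by `None`, and `difflib.SequenceMatcher` stands in for the similarity metric selected per attribute.

```python
from difflib import SequenceMatcher
from typing import Optional


def marked_similarity(left: Optional[str], right: Optional[str],
                      factor: float = 1.0) -> float:
    """Similarity in [0, 1] for two present values, a negative marker otherwise."""
    missing = (left is None) + (right is None)
    if missing == 2:
        return factor * -1.0  # both sides of the pair are missing
    if missing == 1:
        return factor * -0.5  # exactly one side of the pair is missing
    # Stand-in metric, an assumption of this sketch.
    return SequenceMatcher(None, left, right).ratio()


# Compare the two values of factor used in the runs of this chapter.
for factor in (1.0, 0.1):
    print(factor,
          marked_similarity("1:25000", "1:25000", factor),
          marked_similarity("1:25000", None, factor),
          marked_similarity(None, None, factor))
```

A smaller factor moves the markers for missing values closer to the origin while keeping them on the negative side, so they stay distinguishable from any regular similarity value.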

The usage of the above global parameters for the notebooks will be explained in the run strategy outlined in the next subsection.

### Overview of Runs

In the course of this capstone project, many runs have been executed. These runs were started in very early stages of the implementation. The results of some of these runs have been discarded while others have been stored in the github space for the proposal [[PropRepo](./A_References.ipynb#proposal_repo)] or the project github space [[ProjRepo](./A_References.ipynb#project_repo)] of the capstone project. An overview of the most important runs is given in the <a href='https://en.wikipedia.org/wiki/Numbers_(spreadsheet)'>Apple Numbers</a> file [runs_summary](./documentation/runs_summary.numbers) which is also provided as a .csv file [runs_summary.csv](./documentation/runs_summary.csv). The assessment of these runs has generated a specific kind of experience gained on the behaviour of the models fitted in the course of the capstone project. This experience has flowed into the run strategy described in this subsection.

With the description of the runtime parameters in the subsection above, a multi-dimensional space of calculation options has been spanned. The dimensions of this space have increased in the course of the project. This increase made the limitations of the hardware resources more and more the critical factor. One answer to the increased need for computational power was the downsampling of the records, controlled by parameters $\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$ and $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$ mentioned in the subsection above. It took the author a long time to decide on a resampling of the data. The reason was the fear of influencing the models in an inappropriate way. Comparing the results of downsampled models with the results of the original runs finally showed a reassuring picture, though. The results remained tolerably close in their accuracy performance. This statement can be retraced in file [runs_summary.csv](./documentation/runs_summary.csv).

Another answer to the increased need for computational power, which also arises from the need to explain the results of the capstone project in this chapter, is the strategy of runs described in this subsection. It is important to design the runs well in order to avoid unnecessary calculation time and to increase the informative value of the documented runs. This design was only possible with the experience of many previous runs, documented in file [runs_summary.csv](./documentation/runs_summary.csv). The strategy described here is the result of a series of full runs of which only traces of a selection are visible in the summary file.
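The two kinds of downsampling can be sketched in a few lines of plain Python. The sketch is an assumption-laden illustration, not the code of chapter 3: the function names, the key `duplicates` and the toy data are hypothetical, and the fraction is interpreted as the share of records that is kept.

```python
import random


def downsample_all(pairs, fraction, seed=0):
    """Keep a random fraction of all pairs, regardless of their class.

    The ratio of the two classes stays unchanged in expectation.
    """
    rng = random.Random(seed)
    return rng.sample(pairs, round(len(pairs) * fraction))


def downsample_uniques(pairs, fraction, seed=0):
    """Keep all pairs of duplicates and a random fraction of the pairs of uniques."""
    rng = random.Random(seed)
    duplicates = [p for p in pairs if p["duplicates"] == 1]
    uniques = [p for p in pairs if p["duplicates"] == 0]
    return duplicates + rng.sample(uniques, round(len(uniques) * fraction))


def duplicate_ratio(pairs):
    """Share of pairs of duplicates in a data set."""
    return sum(p["duplicates"] for p in pairs) / len(pairs)


# Invented toy data: 1000 pairs of uniques and 50 pairs of duplicates.
pairs = [{"duplicates": 0}] * 1000 + [{"duplicates": 1}] * 50

subset = downsample_uniques(pairs, fraction=0.4)
print(len(pairs), duplicate_ratio(pairs))
print(len(subset), duplicate_ratio(subset))
```

Reducing only the pairs of uniques shrinks the data set and, as a side effect, raises the share of pairs of duplicates.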

The strategy used for the runs of this capstone project is shown in the table below. This strategy with its specific parameters has grown in the course of the capstone project iteratively. In the end, this chapter condenses the author's learning on the basic behaviour of the models and their best-suited parameter space.

| run id | description | parameter set |
| :----: | :---------- | :--------- |
| 0 | Goldstandard sampling,<br>**full feature modification** | $\texttt{execution}\_\texttt{mode}$ = $\texttt{full}$<br>$\texttt{oversampling}$ = $\texttt{None}$ with $\texttt{modification}\_\texttt{ratio}$ = \< irrelevant \><br>$\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$ = $1.0$ and $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$ = $0.4$<br>$\texttt{factor}$ = $1.0$<br>$\texttt{mode}\_\texttt{exactDate}$ = $\texttt{added}\_\texttt{u}$ and $\texttt{strip}\_\texttt{number}\_\texttt{digits}$ = $\texttt{True}$ |
| 1 | Goldstandard sampling,<br>**little feature modification** | $\texttt{execution}\_\texttt{mode}$ = $\texttt{restricted}$<br>$\texttt{oversampling}$ = $\texttt{None}$ with $\texttt{modification}\_\texttt{ratio}$ = \< irrelevant \><br>$\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$ = $1.0$ and $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$ = $0.4$<br>$\texttt{factor}$ = $1.0$<br><font color='red'>$\texttt{mode}\_\texttt{exactDate}$ = $\texttt{xor}$ and $\texttt{strip}\_\texttt{number}\_\texttt{digits}$ = $\texttt{False}$</font> |
| 2 | Goldstandard sampling,<br>**small separation of missings** | $\texttt{execution}\_\texttt{mode}$ = $\texttt{restricted}$<br>$\texttt{oversampling}$ = $\texttt{None}$ with $\texttt{modification}\_\texttt{ratio}$ = \< irrelevant \><br>$\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$ = $1.0$ and $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$ = $0.4$<br><font color='red'>$\texttt{factor}$ = $0.1$</font><br>$\texttt{mode}\_\texttt{exactDate}$ = $\texttt{added}\_\texttt{u}$ and $\texttt{strip}\_\texttt{number}\_\texttt{digits}$ = $\texttt{True}$ |
| 3 | **Oversampling** | $\texttt{execution}\_\texttt{mode}$ = $\texttt{full}$<br><font color='red'>$\texttt{oversampling}$ = $\texttt{20}$ with $\texttt{modification}\_\texttt{ratio}$ = $0.2$</font><br>$\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$ = $1.0$ and $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$ = $0.4$<br>$\texttt{factor}$ = $1.0$<br>$\texttt{mode}\_\texttt{exactDate}$ = $\texttt{added}\_\texttt{u}$ and $\texttt{strip}\_\texttt{number}\_\texttt{digits}$ = $\texttt{True}$ |
| 4 | Fine tuning | <font color='red'>$\texttt{execution}\_\texttt{mode}$ = $\texttt{tune}$</font><br>$\texttt{oversampling}$ = $\texttt{None}$ with $\texttt{modification}\_\texttt{ratio}$ = \< irrelevant \><br>$\texttt{sampling}\_\texttt{fraction}\_\texttt{nreb}$ = $1.0$ and $\texttt{sampling}\_\texttt{fraction}\_\texttt{reb}$ = $0.4$<br>$\texttt{factor}$ = $1.0$<br>$\texttt{mode}\_\texttt{exactDate}$ = $\texttt{added}\_\texttt{u}$ and $\texttt{strip}\_\texttt{number}\_\texttt{digits}$ = $\texttt{True}$ |

The strategy for finding the best parameters for the best model can be described as follows. The item numbers in the list below correspond to the run ids in the table above.

0. The first group of runs scans the parameter space widely with a coarse granularity in the grid space. The runs are done with downsampled goldstandard data, due to runtime. The parameter $\texttt{factor}$ is set to its originally intended value of 1.0. The stripping of text attributes to numbers is switched on with the expectation of coming closer to Swissbib's data preprocessing. This run represents a first search of parameters based on technical guesses, in order to narrow them down for the best models later on.
0. The assumption of the text attributes' stripping is validated with the next group of runs, leaving attributes $\texttt{part}$, $\texttt{scale}$, and $\texttt{volumes}$ unabbreviated, as in Swissbib's original raw data output. This validation is done with downsampling on a restricted grid space, based on the findings of the best models from the run with id 0.
0. The next group of runs validates the assumptions of the influence of the distance of missing data from the origin on the models. Setting $\texttt{factor} = 0.1$ stands for the expectation of a better performance for Neural Networks, see above. The other parameters are set according to the findings so far, comparing the performance of the models.
0. The low ratio of records with duplicate pairs compared to the amount of records with uniques has shown to be of low significance for training the models. The effect of oversampling with synthetic data still remains an interesting point to be investigated. The result is to be compared with Swissbib's goldstandard data without oversampling. The other parameters will be the ones found from the best models, up to that point. For oversampled data, runtime becomes even more critical. Therefore, this run, too, will be done with a downsampled data set.
0. The last group of runs scans the grid space in a fine granularity in the vicinity of the grid points found for the best models in the preceding runs, explicitly in run 0. This will be a fine tuning step in order to be sure to have found the very best parameters for the best model of all best models.
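
The interplay of a coarse scan (run 0) and a fine scan in the vicinity of the best grid point (run 4) can be sketched for a single hyperparameter. Everything in this sketch is hypothetical: the helper names, the grid bounds, and in particular `fake_score`, which merely stands in for a cross-validated model score.

```python
def coarse_grid(low_exp, high_exp):
    """One grid point per order of magnitude, e.g. 1e-3 ... 1e3."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def fine_grid(best, points=5, span=10.0):
    """Log-spaced points between best / span and best * span, centred on best."""
    half = points // 2
    step = span ** (1.0 / half)
    return [best * step ** k for k in range(-half, half + 1)]


def fake_score(value):
    """Stand-in for a cross-validated accuracy; peaks near value = 30."""
    return -abs(value - 30.0)


# Wide scan over several orders of magnitude.
wide = coarse_grid(-3, 3)
best_wide = max(wide, key=fake_score)

# Fine scan around the best point of the wide scan.
narrow = fine_grid(best_wide)
best_narrow = max(narrow, key=fake_score)
print(best_wide, best_narrow)
```

The fine grid contains the best point of the coarse grid itself, so the second scan cannot end up with a worse score than the first one under the same scoring function.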

Before the defined runs can be executed, the global parameters described in subsection [Runtime Parameters](#Runtime-Parameters) have to be set according to the strategy.

In [2]:
# Generate dictionary for parameter handover
runtime_param_dict = {
    'em' : 'full' #execution_mode : ['restricted', 'full', 'tune']
    , 'os' : 0 # oversampling : [0, 20]
    , 'mr' : 0.2 # modification_ratio
    , 'dsn' : 1 # sampling_fraction_nreb : <= 1
    , 'dsw' : 0.4 # sampling_fraction_reb : <= 1
    , 'fa' : 1.0 # factor : [0.1, 1.0]
    , 'me' : 'added_u' # mode_exactDate : ['added_u', 'xor']
    , 'sn' : True # strip_number_digits : [True, False]
}
# Run id = 0
runtime_param_dict_list = [runtime_param_dict]

# Run id = 1
runtime_param_dict = runtime_param_dict_list[0].copy()
runtime_param_dict['em'] = 'restricted'
runtime_param_dict['me'] = 'xor'
runtime_param_dict['sn'] = False
runtime_param_dict_list.append(runtime_param_dict)

# Run id = 2
runtime_param_dict = runtime_param_dict_list[0].copy()
runtime_param_dict['em'] = 'restricted'
runtime_param_dict['fa'] = 0.1
runtime_param_dict_list.append(runtime_param_dict)

# Run id = 3
runtime_param_dict = runtime_param_dict_list[0].copy()
runtime_param_dict['os'] = 20
runtime_param_dict_list.append(runtime_param_dict)

# Run id = 4
runtime_param_dict = runtime_param_dict_list[0].copy()
runtime_param_dict['em'] = 'tune'
runtime_param_dict_list.append(runtime_param_dict)

# Let's have a look at the predefined parameters
for run in range(len(runtime_param_dict_list)):
    print('Parameters for run', run, ': \n', runtime_param_dict_list[run])

Parameters for run 0 : 
 {'em': 'full', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'added_u', 'sn': True}
Parameters for run 1 : 
 {'em': 'restricted', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'xor', 'sn': False}
Parameters for run 2 : 
 {'em': 'restricted', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 0.1, 'me': 'added_u', 'sn': True}
Parameters for run 3 : 
 {'em': 'full', 'os': 20, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'added_u', 'sn': True}
Parameters for run 4 : 
 {'em': 'tune', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'added_u', 'sn': True}


The parameters for each run have been set according to the strategy of scanning the grid space. All groups of runs can be executed as a next step.

### Runs Execution

To execute the notebooks of the capstone project, functions from Python libraries for notebook handling, such as $\texttt{nbformat}$, $\texttt{nbparameterise}$, and $\texttt{nbconvert}$, will be used.

In [3]:
#! pip install nbparameterise

The notebooks are executed with the parameters specified in the list of dictionaries $\texttt{runtime}\_\texttt{param}\_\texttt{dict}\_\texttt{list}$. The call of a notebook and its execution are implemented in the separate module [results_saving_funcs.py](./results_saving_funcs.py).

In [4]:
import os
import results_saving_funcs as rsf
import pandas as pd

path_results = './results'
path_goldstandard = './daten_goldstandard/'

# Determine all relevant notebooks, omit Overview Summary and Appendixes
notebook = ! ls [1-9]_* | grep .ipynb

for run in range(len(runtime_param_dict_list)):
    print('\nRun id', run)
    rsf.run_notebooks(notebook, runtime_param_dict_list[run], run, path_results)

    # Save the resulting handover files for the run done right now
    os.rename(os.path.join(path_results, 'results.pkl'),
              os.path.join(path_results, 'results_run_' + str(run) + '.pkl'))
    os.rename(os.path.join(path_goldstandard, 'wrong_predictions.pkl'),
              os.path.join(path_goldstandard, 'wrong_predictions_run_' + str(run) + '.pkl'))
    # Assessment of run
    results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

    results['results_best_model'].reset_index(drop=True, inplace=True)
    # Ranking metric of models : accuracy
    display(results['results_best_model'].sort_values(by=['accuracy'], ascending=False))

    for classifier in results['results_model_scores'].keys() :
        # Persist results per classifier for analysis
        results['results_model_scores'][classifier].to_csv(os.path.join(path_results,
                                                                        classifier + '_run_' + str(run) + '.csv'),
                                                           index=False)

    print('********\n')

print('Done with all runs of all notebooks.')


Run id 0
Executing notebook 1_DataAnalysis.ipynb
Executing notebook 2_GoldstandardDataPreparation.ipynb
Executing notebook 3_DataSynthesizing.ipynb
Executing notebook 4_FeatureMatrixGeneration.ipynb
Executing notebook 5_FeatureDiscussionDummyBaseline.ipynb
Executing notebook 6_DecisionTreeModel.ipynb
Executing notebook 7_SVCModel.ipynb
Executing notebook 8_NeuralNetwork.ipynb


Unnamed: 0,model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
3,RandomForestClassifier,97.268752,99.885338,97.212544,94.576271,3.600412,6.770933,3.580041,2.914387
1,DecisionTreeClassifier,96.927346,99.871005,96.853147,93.898305,3.482628,6.65315,3.458767,2.796604
2,DecisionTreeClassifier_CV,97.762689,99.871005,95.27027,95.59322,3.799895,6.65315,3.051302,3.122026
6,NeuralNetwork,96.573825,99.832784,94.827586,93.220339,3.373726,6.393639,2.961831,2.691243
5,SVC_CV,97.072608,99.828006,93.602694,94.237288,3.531058,6.365468,2.749293,2.853762
4,SVC,96.22515,99.804118,93.493151,92.542373,3.27681,6.235415,2.732315,2.595933
0,DummyClassifier,50.152071,97.243323,1.712329,1.694915,0.696193,3.591144,0.017272,0.017094


********


Run id 1
Executing notebook 1_DataAnalysis.ipynb
Executing notebook 2_GoldstandardDataPreparation.ipynb
Executing notebook 3_DataSynthesizing.ipynb
Executing notebook 4_FeatureMatrixGeneration.ipynb
Executing notebook 5_FeatureDiscussionDummyBaseline.ipynb
Executing notebook 6_DecisionTreeModel.ipynb
Executing notebook 7_SVCModel.ipynb
Executing notebook 8_NeuralNetwork.ipynb


Unnamed: 0,model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
2,DecisionTreeClassifier_CV,98.610146,99.894893,95.348837,97.288136,4.275972,6.857944,3.068053,3.607534
3,RandomForestClassifier,97.096838,99.875782,96.864111,94.237288,3.53937,6.69089,3.462258,2.853762
1,DecisionTreeClassifier,96.905539,99.828006,93.898305,93.898305,3.475557,6.365468,2.796604,2.796604
6,NeuralNetwork,95.893436,99.808896,94.425087,91.864407,3.192584,6.260107,2.886893,2.508922
5,SVC_CV,96.215459,99.785008,92.22973,92.542373,3.274245,6.142324,2.554865,2.595933
4,SVC,95.200932,99.76112,92.387543,90.508475,3.036749,6.036964,2.575384,2.354771
0,DummyClassifier,50.152071,97.243323,1.712329,1.694915,0.696193,3.591144,0.017272,0.017094


********


Run id 2
Executing notebook 1_DataAnalysis.ipynb
Executing notebook 2_GoldstandardDataPreparation.ipynb
Executing notebook 3_DataSynthesizing.ipynb
Executing notebook 4_FeatureMatrixGeneration.ipynb
Executing notebook 5_FeatureDiscussionDummyBaseline.ipynb
Executing notebook 6_DecisionTreeModel.ipynb
Executing notebook 7_SVCModel.ipynb
Executing notebook 8_NeuralNetwork.ipynb


Unnamed: 0,model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
3,RandomForestClassifier,97.607735,99.894893,97.231834,95.254237,3.73293,6.857944,3.586985,3.047918
2,DecisionTreeClassifier_CV,97.762689,99.871005,95.27027,95.59322,3.799895,6.65315,3.051302,3.122026
1,DecisionTreeClassifier,96.581094,99.847117,95.818815,93.220339,3.37585,6.483251,3.174576,2.691243
6,NeuralNetwork,97.0823,99.847117,94.880546,94.237288,3.534374,6.483251,2.972122,2.853762
4,SVC,96.067774,99.823229,95.104895,92.20339,3.235964,6.338069,3.016934,2.551481
5,SVC_CV,95.898282,99.818451,95.087719,91.864407,3.193764,6.3114,3.013432,2.508922
0,DummyClassifier,50.152071,97.243323,1.712329,1.694915,0.696193,3.591144,0.017272,0.017094


********


Run id 3
Executing notebook 1_DataAnalysis.ipynb
Executing notebook 2_GoldstandardDataPreparation.ipynb
Executing notebook 3_DataSynthesizing.ipynb
Executing notebook 4_FeatureMatrixGeneration.ipynb
Executing notebook 5_FeatureDiscussionDummyBaseline.ipynb
Executing notebook 6_DecisionTreeModel.ipynb
Executing notebook 7_SVCModel.ipynb
Executing notebook 8_NeuralNetwork.ipynb


Unnamed: 0,model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
3,RandomForestClassifier,99.939074,99.932488,99.858705,99.97022,7.403268,7.30062,6.562072,8.119101
2,DecisionTreeClassifier_CV,99.896056,99.891394,99.806648,99.918106,6.869076,6.825196,6.248415,7.1075
5,SVC_CV,99.884713,99.885523,99.828856,99.880881,6.7655,6.772552,6.370422,6.732806
1,DecisionTreeClassifier,99.874845,99.876717,99.821402,99.865992,6.683373,6.698444,6.327788,6.615023
6,NeuralNetwork,99.870947,99.876717,99.843657,99.843657,6.652699,6.698444,6.460873,6.460873
4,SVC,99.844118,99.847364,99.784194,99.828767,6.463826,6.48487,6.138546,6.369901
0,DummyClassifier,50.394349,52.741576,39.912281,39.294223,0.701065,0.749539,0.509365,0.499131


********


Run id 4
Executing notebook 1_DataAnalysis.ipynb
Executing notebook 2_GoldstandardDataPreparation.ipynb
Executing notebook 3_DataSynthesizing.ipynb
Executing notebook 4_FeatureMatrixGeneration.ipynb
Executing notebook 5_FeatureDiscussionDummyBaseline.ipynb
Executing notebook 6_DecisionTreeModel.ipynb
Executing notebook 7_SVCModel.ipynb
Executing notebook 8_NeuralNetwork.ipynb


Unnamed: 0,model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
3,RandomForestClassifier,97.438244,99.890115,97.222222,94.915254,3.664477,6.813492,3.583519,2.978925
2,DecisionTreeClassifier_CV,97.762689,99.871005,95.27027,95.59322,3.799895,6.65315,3.051302,3.122026
1,DecisionTreeClassifier,97.070185,99.823229,93.288591,94.237288,3.530231,6.338069,2.701361,2.853762
5,SVC_CV,96.401911,99.823229,94.482759,92.881356,3.324767,6.338069,2.897292,2.642453
6,NeuralNetwork,95.898282,99.818451,95.087719,91.864407,3.193764,6.3114,3.013432,2.508922
4,SVC,95.886167,99.794563,93.448276,91.864407,3.190815,6.187786,2.725442,2.508922
0,DummyClassifier,50.152071,97.243323,1.712329,1.694915,0.696193,3.591144,0.017272,0.017094


********

Done with all runs of all notebooks.


The results for each run have been stored in specific files and will be analysed in the next section of this chapter.

## Assessment of Results

The ranking of the models is shown above for each run. As a next step, the results are discussed for each run separately, and the ranking display is repeated for each discussion. The goal of this first step of the discussion is to identify the best grid search parameters for each model.
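The columns with suffix `_log` in the rankings are numerically consistent with the transformation $-\ln(1 - \texttt{score})$, applied to the score as a fraction. This is an observation on the printed values, not a definition taken from the project code, and the function name below is hypothetical.

```python
import math


def neg_log_error(score_percent):
    """Map a score in percent to -ln(1 - score), with the score as a fraction."""
    return -math.log(1.0 - score_percent / 100.0)


# Accuracy and roc auc of the Random Forest Classifier in run 0, from the table above.
print(round(neg_log_error(99.885338), 3))
print(round(neg_log_error(97.268752), 3))
```

Under this reading, the transformation stretches the region close to 100%, so that small differences between very good models become visible.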

### Run with id 0

In [5]:
# Import again in case the module has not been imported yet
import results_saving_funcs as rsf

run = 0
path_results = './results'

print('\nRun id', run, 'with parameters\n', runtime_param_dict_list[run])

results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

results['results_best_model'].set_index('model', inplace=True)
# Ranking metric according to chapter 6 : 1. accuracy, 2. roc auc
display(results['results_best_model'].sort_values(by=['accuracy', 'auc'], ascending=False).round(3))


Run id 0 with parameters
 {'em': 'full', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'added_u', 'sn': True}


model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
RandomForestClassifier,97.269,99.885,97.213,94.576,3.6,6.771,3.58,2.914
DecisionTreeClassifier_CV,97.763,99.871,95.27,95.593,3.8,6.653,3.051,3.122
DecisionTreeClassifier,96.927,99.871,96.853,93.898,3.483,6.653,3.459,2.797
NeuralNetwork,96.574,99.833,94.828,93.22,3.374,6.394,2.962,2.691
SVC_CV,97.073,99.828,93.603,94.237,3.531,6.365,2.749,2.854
SVC,96.225,99.804,93.493,92.542,3.277,6.235,2.732,2.596
DummyClassifier,50.152,97.243,1.712,1.695,0.696,3.591,0.017,0.017


The ranking of the best models of the run with id 0 is shown above. The Random Forest Classifier has the best overall accuracy of the models compared. This accuracy value can be derived from the number of wrongly predicted records, which amounts to a total of 24 wrong predictions and a ratio of 0.11% in the resulting notebook [Decision Tree Model](./results/6_DecisionTreeModel_run_0.ipynb) of the run, see subsection [Wrong Predictions](#Wrong-Predictions) below. The highest accuracy does not coincide with the highest value in every metric, such as precision and recall. The Decision Tree Classifier with cross-validation shows a higher recall value than the Random Forest Classifier, which even results in a higher roc auc score.

The performance metric values for the Neural Network may vary from run to run. The reason is the Keras library used for the implementation of the Neural Network in this capstone project. In a scikit-learn implementation, reproducibility can be controlled by setting the parameter $\texttt{random}\_\texttt{state}$ to a fixed value when instantiating a classifier object. A Keras implementation requires some more lines of code, see [[KeraRand](./A_References.ipynb#kerarand)]. This effort has not been made in this capstone project. Statements on the Neural Network may therefore differ from the picture above. Some runs have been observed with the following properties.
- There have been runs where the Neural Network has shown a higher precision score than the Random Forest Classifier, which has not resulted in a higher roc auc, though.
- There have been runs where the Neural Network has ranked in between the Ensemble classifier family and the Support Vector Classifiers, with the second highest recall score and the second highest roc auc.
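The additional lines of code for a reproducible Keras run typically amount to seeding every random number generator involved. The following is a minimal sketch under the assumption of a TensorFlow 2 backend. The helper `seed_everything` is hypothetical and has not been used in this project, and even with fixed seeds some GPU operations may remain non-deterministic.

```python
import os
import random


def seed_everything(seed=0):
    """Seed the random number generators that are available (hypothetical helper)."""
    # Hash seed for subprocesses; the running interpreter reads it only at startup.
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed, nothing to seed


seed_everything(42)
```

Such a call would have to be placed before the model is built, so that the weight initialization is covered as well.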

Overall, the Ensemble family classifiers form the group with the best scores, followed, depending on chance, by either the Support Vector Classifier with cross-validation or the Neural Network classifier. The models are close to each other, with accuracy values of around 99.8% and roc auc values higher than 96.2%. The Dummy Classifier, with a roc auc of about 50%, confirms that the models perform clearly better than a random baseline.

Altogether, this gives a nice and consistent picture.
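The relation between the number of wrong predictions and the accuracy mentioned above can be checked with a few lines. The size of the evaluated data set is not stated in this chapter; the value computed below is inferred from the two published numbers and is therefore an assumption.

```python
def accuracy_from_errors(n_wrong, n_total):
    """Accuracy as the fraction of correctly predicted records."""
    return 1.0 - n_wrong / n_total


n_wrong = 24           # wrong predictions of the Random Forest Classifier, run 0
accuracy = 0.99885338  # accuracy of the same model, from the ranking above

# Inferred, not taken from the notebooks: data set size implied by both numbers.
implied_total = round(n_wrong / (1.0 - accuracy))

print('Implied number of records:', implied_total)
print('Ratio of wrong predictions in %:', round(100 * n_wrong / implied_total, 2))
```

The rounded ratio agrees with the 0.11% reported in the result notebook.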

In [6]:
# Import in case pandas has not been imported yet
import pandas as pd

results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

# Unlimited number of columns allowed
pd.options.display.max_columns = None

for classifier in results['results_model_scores'].keys() :
    if classifier != 'DummyClassifier': # DummyClassifier has no results to be analysed
        # Show results
        print(f'\n{classifier}')
        display(results['results_model_scores'][classifier].sort_values(by=['accuracy_val'], ascending=False).head(20))


DecisionTreeClassifier


Unnamed: 0,class_weight,criterion,max_depth,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
22,,entropy,10.0,0.999537,0.999343,7.678147,7.32796
24,,entropy,14.0,0.999776,0.999343,8.404084,7.32796
23,,entropy,12.0,0.999642,0.999343,7.934081,7.32796
69,balanced,entropy,28.0,0.999985,0.999283,11.112134,7.240948
74,balanced,entropy,50.0,0.999985,0.999283,11.112134,7.240948
65,balanced,entropy,20.0,0.999985,0.999283,11.112134,7.240948
66,balanced,entropy,22.0,0.999985,0.999283,11.112134,7.240948
67,balanced,entropy,24.0,0.999985,0.999283,11.112134,7.240948
68,balanced,entropy,26.0,0.999985,0.999283,11.112134,7.240948
73,balanced,entropy,45.0,0.999985,0.999283,11.112134,7.240948



DecisionTreeClassifier_CV


Unnamed: 0,class_weight,criterion,max_depth,accuracy_val,std_accuracy_val,log_accuracy_val
52,balanced,gini,35.0,0.998722,0.000317,6.662455
56,balanced,gini,,0.998722,0.000317,6.662455
55,balanced,gini,50.0,0.998722,0.000317,6.662455
54,balanced,gini,45.0,0.998722,0.000317,6.662455
53,balanced,gini,40.0,0.998722,0.000317,6.662455
51,balanced,gini,30.0,0.998722,0.000317,6.662455
50,balanced,gini,28.0,0.998722,0.000317,6.662455
49,balanced,gini,26.0,0.998722,0.000317,6.662455
48,balanced,gini,24.0,0.99871,0.000317,6.653152
47,balanced,gini,22.0,0.998686,0.000307,6.634803



RandomForestClassifier


Unnamed: 0,class_weight,max_depth,n_estimators,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
41,,26.0,16,0.999955,0.999582,10.013522,7.779945
51,,,16,0.999955,0.999582,10.013522,7.779945
46,,28.0,16,0.999955,0.999582,10.013522,7.779945
93,balanced,24.0,64,0.999985,0.999522,11.112134,7.646413
31,,22.0,16,0.999955,0.999522,10.013522,7.646413
17,,16.0,32,0.99991,0.999522,9.320375,7.646413
81,balanced,20.0,16,0.99994,0.999522,9.72584,7.646413
86,balanced,22.0,16,0.999955,0.999522,10.013522,7.646413
85,balanced,22.0,8,0.999955,0.999522,10.013522,7.646413
95,balanced,26.0,8,0.999955,0.999463,10.013522,7.52863



SVC


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
43,10.0,,4,0.1,poly,0.999104,0.998985,7.01779,6.892642
23,1.0,,3,1.0,poly,0.999522,0.998925,7.646399,6.835483
5,0.1,,3,1.0,poly,0.999209,0.998806,7.141843,6.730123
32,1.0,balanced,3,1.0,poly,0.999164,0.998686,7.086783,6.634813
8,0.1,,4,1.0,poly,0.999716,0.998686,8.167695,6.634813
50,10.0,balanced,3,1.0,poly,0.999612,0.998626,7.854038,6.590361
41,10.0,,3,1.0,poly,0.999746,0.998567,8.278921,6.547801
40,10.0,,3,0.1,poly,0.998746,0.998567,6.681318,6.547801
26,1.0,,4,1.0,poly,0.999821,0.998507,8.627228,6.506979
17,0.1,balanced,4,1.0,poly,0.999582,0.998507,7.77993,6.506979



SVC_CV


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_val,std_accuracy_val,log_accuracy_val
23,1.0,,3,1.0,poly,0.998232,0.000335,6.338066
5,0.1,,3,1.0,poly,0.998208,0.000275,6.324645
25,1.0,,4,0.1,poly,0.998137,0.000317,6.285428
43,10.0,,4,0.1,poly,0.998125,0.000222,6.279036
38,10.0,,2,1.0,poly,0.998101,0.000357,6.266378
40,10.0,,3,0.1,poly,0.998089,0.000368,6.260109
20,1.0,,2,1.0,poly,0.998089,0.000223,6.260108
37,10.0,,2,0.1,poly,0.998053,0.000219,6.241532
2,0.1,,2,1.0,poly,0.998053,0.000219,6.241532
22,1.0,,3,0.1,poly,0.998029,0.0002,6.229337



NeuralNetwork


Unnamed: 0,class_weight,dropout_rate,l2_alpha,number_of_hidden1_layers,number_of_hidden2_layers,sgd_learnrate,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
46,"[0.507135415404744, 35.536502546689306]",0.0,0.0,40,80,0.001,0.998777,0.998194,6.706377,6.316682
10,,0.0,0.0,40,80,0.001,0.998765,0.998089,6.696678,6.260106
40,"[0.507135415404744, 35.536502546689306]",0.0,0.0,20,80,0.001,0.998693,0.997974,6.640286,6.201846
22,,0.1,0.0,40,80,0.001,0.998507,0.997974,6.506983,6.201834
58,"[0.507135415404744, 35.536502546689306]",0.1,0.0,40,80,0.001,0.998457,0.997965,6.47392,6.197137
16,,0.1,0.0,20,80,0.001,0.99818,0.997965,6.308805,6.197137
20,,0.1,0.0,40,40,0.001,0.99833,0.997955,6.395073,6.192451
56,"[0.507135415404744, 35.536502546689306]",0.1,0.0,40,40,0.001,0.998311,0.997927,6.383715,6.178528
7,,0.0,0.0,40,0,0.01,0.998335,0.997917,6.397932,6.173933
34,,0.2,0.0,40,80,0.001,0.998268,0.997917,6.358552,6.173933


The detailed ranking per classifier model above serves to find the best grid parameter set for each model. Be aware that the accuracy values of the detailed runs' data differ from the accuracy values of the best models in the comparison ranking. The reason is the data set used in each case. For the models' comparison ranking, the validation part of the training data split has been used. For the grid parameter comparison results, the test data part of the full data set has been used.

| model | parameters assessment |
| :---- | :-------------------- |
| RandomForestClassifier | A tendency for unbalanced $\texttt{class}\_\texttt{weight}$ can be detected for the best estimator. High values of $\texttt{max}\_\texttt{depth}$ in a range around 20 are preferred which may confirm expectations as the model has been trained with a total of 20 features. Be aware though that the validation accuracy score of the first three models have identical values. Even the group of models ranked higher than 3 have a deviation in score $\texttt{accuracy}\_\texttt{val}$ of 0.006% which seems to be a statistically questionable significance. The lowest $\texttt{max}\_\texttt{depth}$ in the two groups of top rank is 20. For the parameter $\texttt{n}\_\texttt{estimators}$, values higher or equal to 16 show the best performance. |
| DecisionTreeClassifier_CV | Although the run for this classifier has been done with $\texttt{class}\_\texttt{weight}=\texttt{balanced}$ and $\texttt{None}$, balancing generates the best results. The gini measure and not entropy generates the overall best measure, when looking at the accuracy value. For the ranking of this classifier, it is noticeable that a $\texttt{max}\_\texttt{depth}$ value of 26 is the lowest value with the highest accuracy. For all $\texttt{max}\_\texttt{depth} \ge 26$, the accuracy remains constant. Remarkable is a view on the standard deviation of $\texttt{accuracy}\_\texttt{val}$. Comparing the accuracy scores with overlapping standard deviation intervals reveals that the models' statistical distinction is a hard job to do. |
| DecisionTreeClassifier | As the Decision Tree Classifier with cross-validation is stronger in its statistical statement, the classifier found without cross-validation will not be discussed deeper, here. |
| SVC_CV | Polynomial kernels of degree 3 or possibly 4 generate the best accuracy results on the validation data. A $\gamma$ value of around 1.0 or lower and a $\texttt{C}$ value of around 1.0 produce the best models. Again, when the standard deviation is taken into account, the accuracy scores on the validation data give only a weak basis for identifying the best parameters. |
| SVC | As the Support Vector Classifier with cross-validation is stronger in its statistical statement, the model found without cross-validation will not be discussed deeper, here.  |
| NeuralNetwork | The Neural Network fits its best classifiers with a balanced training data set. This observation is confirmed by the experience from a large number of earlier runs, documented in file [runs_summary.csv](./documentation/runs_summary.csv). A low dropout rate with a maximum of 0.1 seems to be beneficial for the model. As can be seen in chapter [Neural Network Model](./8_NeuralNetwork.ipynb), the models need an extraordinarily long training phase and converge very slowly. This might be a reason why dropout does not seem to play an important role for training the network. As for regularization, a value of 0 has always proven to be the best value for all models of all runs done. Therefore, no higher value of $\texttt{l2}\_\texttt{alpha}$ has been set in the grid search. Surprisingly, the highest numbers of neurons result in higher accuracy values. Explicitly, this is also true for a second hidden layer. This observation, together with the observation of a low learning rate of around 0.001, may be another reason for the described slow stabilization of the Neural Network models. |
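The remark on overlapping standard deviation intervals can be made concrete with a small check. This is an informal heuristic under the assumption of intervals of one standard deviation around the mean, not a statistical significance test, and the function name is hypothetical.

```python
def intervals_overlap(mean_a, std_a, mean_b, std_b):
    """True if [mean - std, mean + std] of both models share at least one point."""
    return abs(mean_a - mean_b) <= std_a + std_b


# DecisionTreeClassifier_CV in run 0: max_depth 35 versus max_depth 22,
# values taken from columns accuracy_val and std_accuracy_val above.
print(intervals_overlap(0.998722, 0.000317, 0.998686, 0.000307))
```

For the two parameter sets shown, the difference of the mean accuracies is far smaller than the sum of the standard deviations, which illustrates why the ranking within the top group is hard to justify statistically.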

### Run with id 1

For the assessment of the runs with an id of 1 and higher, the same structure as in the previous subsection will be chosen. Mainly differences between the results of the runs will be pointed out.

In [7]:
run = 1

print('\nRun id', run, 'with parameters\n', runtime_param_dict_list[run])

results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

results['results_best_model'].set_index('model', inplace=True)
# Ranking metric according to chapter 6 : 1. accuracy, 2. roc auc
display(results['results_best_model'].sort_values(by=['accuracy', 'auc'], ascending=False).round(3))


Run id 1 with parameters
 {'em': 'restricted', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'xor', 'sn': False}


model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
DecisionTreeClassifier_CV,98.61,99.895,95.349,97.288,4.276,6.858,3.068,3.608
RandomForestClassifier,97.097,99.876,96.864,94.237,3.539,6.691,3.462,2.854
DecisionTreeClassifier,96.906,99.828,93.898,93.898,3.476,6.365,2.797,2.797
NeuralNetwork,95.893,99.809,94.425,91.864,3.193,6.26,2.887,2.509
SVC_CV,96.215,99.785,92.23,92.542,3.274,6.142,2.555,2.596
SVC,95.201,99.761,92.388,90.508,3.037,6.037,2.575,2.355
DummyClassifier,50.152,97.243,1.712,1.695,0.696,3.591,0.017,0.017


Comparing the accuracy scores of this run with the values of the first run, the question might come up why parameters $\texttt{mode}\_\texttt{exactDate}=\texttt{xor}$ and $\texttt{strip}\_\texttt{number}\_\texttt{digits}=\texttt{False}$ have been chosen for calculating the best of best models of the capstone project. The decision had been hard to take but, again, it is a summary of observations of model performances documented in file [runs_summary.csv](./documentation/runs_summary.csv) that turned the balance for the choice made.

In this run, the Decision Tree Classifier with cross-validation has the best accuracy of the models compared, instead of the Random Forest Classifier. The family of Ensemble classifiers ranks topmost, followed here by the Neural Network classifier. The Support Vector Classifiers rank lowest, although the variant with cross-validation has a higher roc auc value than the Neural Network.

It can be mentioned that the results so far document a tendency for the general ranking of the methods used but the differences in the specific accuracy values of the validation data set show an arbitrariness of the results up to a certain degree.

In [8]:
results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

# Unlimited number of columns allowed
pd.options.display.max_columns = None

for classifier in results['results_model_scores'].keys() :
    if classifier != 'DummyClassifier': # DummyClassifier has no results to be analysed
        # Show results
        print(f'\n{classifier}')
        display(results['results_model_scores'][classifier].sort_values(by=['accuracy_val'], ascending=False).head(20))


DecisionTreeClassifier


Unnamed: 0,class_weight,criterion,max_depth,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
19,balanced,entropy,18.0,0.999895,0.999283,9.166224,7.240948
21,balanced,entropy,22.0,0.999955,0.999224,10.013522,7.160906
23,balanced,entropy,,0.999985,0.999164,11.112134,7.086798
1,,gini,18.0,0.99997,0.999164,10.418987,7.086798
22,balanced,entropy,24.0,0.999985,0.999164,11.112134,7.086798
20,balanced,entropy,20.0,0.99994,0.999164,9.72584,7.086798
18,balanced,entropy,16.0,0.999701,0.999044,8.116402,6.953266
0,,gini,16.0,0.999895,0.999044,9.166224,6.953266
4,,gini,24.0,0.999985,0.998985,11.112134,6.892642
3,,gini,22.0,0.999985,0.998985,11.112134,6.892642



DecisionTreeClassifier_CV


Unnamed: 0,class_weight,criterion,max_depth,accuracy_val,std_accuracy_val,log_accuracy_val
23,balanced,entropy,,0.998614,0.000225,6.581692
20,balanced,entropy,20.0,0.998555,0.000246,6.539491
22,balanced,entropy,24.0,0.998543,0.000279,6.531262
21,balanced,entropy,22.0,0.998519,0.000286,6.515001
0,,gini,16.0,0.998471,0.000296,6.483252
17,balanced,gini,,0.998459,0.000312,6.475471
1,,gini,18.0,0.998447,0.000381,6.467749
8,,entropy,20.0,0.998447,0.000223,6.467747
19,balanced,entropy,18.0,0.998435,0.000308,6.460086
9,,entropy,22.0,0.998435,0.000231,6.460085



RandomForestClassifier


Unnamed: 0,class_weight,max_depth,n_estimators,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
0,,18.0,128,0.99994,0.999522,9.72584,7.646413
2,,22.0,128,0.999985,0.999403,11.112134,7.42327
3,,24.0,128,0.999985,0.999403,11.112134,7.42327
4,,,128,0.999985,0.999403,11.112134,7.42327
1,,20.0,128,0.999955,0.999283,10.013522,7.240948



SVC


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
0,1.0,,3,1.0,poly,0.999403,0.998686,7.423255,6.634813



SVC_CV


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_val,std_accuracy_val,log_accuracy_val
0,1.0,,3,1.0,poly,0.998125,0.000349,6.279038



NeuralNetwork


Unnamed: 0,class_weight,dropout_rate,l2_alpha,number_of_hidden1_layers,number_of_hidden2_layers,sgd_learnrate,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
1,,0.1,0.0,40,80,0.001,0.998227,0.997974,6.335351,6.20184
0,,0.1,0.0,40,40,0.001,0.998125,0.997907,6.278997,6.169354
3,"[0.507135415404744, 35.536502546689306]",0.1,0.0,40,80,0.001,0.99817,0.997869,6.303547,6.151249
2,"[0.507135415404744, 35.536502546689306]",0.1,0.0,40,40,0.001,0.998127,0.99785,6.280301,6.142327


The global parameters of this run lose their importance due to the decision to use the parameters of run with id 0. Additionally, the $\texttt{execution}\_\texttt{mode}=\texttt{restricted}$ has scanned a limited grid space. The detailed discussion of the results above is therefore omitted.

### Run with id 2

In [9]:
run = 2

print('\nRun id', run, 'with parameters\n', runtime_param_dict_list[run])

results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

results['results_best_model'].set_index('model', inplace=True)
# Ranking metric according to chapter 6 : 1. accuracy, 2. roc auc
display(results['results_best_model'].sort_values(by=['accuracy', 'auc'], ascending=False).round(3))


Run id 2 with parameters
 {'em': 'restricted', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 0.1, 'me': 'added_u', 'sn': True}


model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
RandomForestClassifier,97.608,99.895,97.232,95.254,3.733,6.858,3.587,3.048
DecisionTreeClassifier_CV,97.763,99.871,95.27,95.593,3.8,6.653,3.051,3.122
NeuralNetwork,97.082,99.847,94.881,94.237,3.534,6.483,2.972,2.854
DecisionTreeClassifier,96.581,99.847,95.819,93.22,3.376,6.483,3.175,2.691
SVC,96.068,99.823,95.105,92.203,3.236,6.338,3.017,2.551
SVC_CV,95.898,99.818,95.088,91.864,3.194,6.311,3.013,2.509
DummyClassifier,50.152,97.243,1.712,1.695,0.696,3.591,0.017,0.017


In this run, the Ensemble family ranks topmost, as in the runs so far. The Random Forest Classifier shows the highest accuracy score with a value of 99.895%, which is 0.01 percentage points higher than the accuracy of the same classifier in run with id 0. As can be checked in the result notebook [Decision Tree Model](./results/6_DecisionTreeModel_run_0.ipynb) of run 0 and in the result notebook [Decision Tree Model](./results/6_DecisionTreeModel_run_2.ipynb) of this run, this difference originates from a total of two false negative records that have been predicted correctly by the model of this run. In subsection [Wrong Predictions](#Wrong-Predictions) below, the comparison of the records' indices reveals that the records with index \[103609, 103618, 104110, 104493\] have been falsely predicted as uniques exclusively in run 0, while the records with index \[103813, 104493\] have been falsely predicted as uniques exclusively in run 2. The set of falsely predicted duplicates is the same for both runs. The Random Forest Classifier reaches a higher roc auc score due to the smaller total of false negative records.
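A comparison of wrongly predicted record indices between two runs, as described above, boils down to set operations. The sketch below uses hypothetical toy indices and a hypothetical function name; it does not reproduce the index lists of the project.

```python
def compare_wrong_predictions(idx_a, idx_b):
    """Split wrongly predicted record indices into exclusive and shared groups."""
    a, b = set(idx_a), set(idx_b)
    return {
        'only_a': sorted(a - b),  # wrong in the first run only
        'only_b': sorted(b - a),  # wrong in the second run only
        'both': sorted(a & b),    # wrong in both runs
    }


# Hypothetical toy indices, for illustration only.
wrong_run_a = [11, 12, 13, 20]
wrong_run_b = [13, 20, 31]

comparison = compare_wrong_predictions(wrong_run_a, wrong_run_b)
print(comparison)
```

The difference in the sizes of the two exclusive groups corresponds to the difference in the total number of wrong predictions between the runs.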

This run was motivated by the assumption that a small value for parameter $\texttt{factor}$ would result in a better Neural Network classifier than a parameter $\texttt{factor}=1$. The accuracy and recall scores confirm the expectation, leading to a ranking of the Neural Network higher than the Support Vector Classifier with cross-validation. The resulting roc auc score is remarkably higher for this run compared to run with id 0. On the other hand, the precision score is lower for this run. Overall, it is hard to attribute a distinct effect to parameter $\texttt{factor}$ based on these score values.

Looking at the result of this run, the question might arise why the global parameters of this group have been chosen for a subsidiary run only. The answer is the same as above for [Run with id 1](#Run-with-id-1), see file [runs_summary.csv](./documentation/runs_summary.csv).

In [10]:
results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

# Unlimited number of columns allowed
pd.options.display.max_columns = None

for classifier in results['results_model_scores'].keys() :
    if classifier != 'DummyClassifier': # DummyClassifier has no results to be analysed
        # Show results
        print(f'\n{classifier}')
        display(results['results_model_scores'][classifier].sort_values(by=['accuracy_val'], ascending=False).head(20))


DecisionTreeClassifier


Unnamed: 0,class_weight,criterion,max_depth,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
23,balanced,entropy,,0.999985,0.999283,11.112134,7.240948
22,balanced,entropy,24.0,0.999985,0.999283,11.112134,7.240948
21,balanced,entropy,22.0,0.999985,0.999283,11.112134,7.240948
20,balanced,entropy,20.0,0.999985,0.999283,11.112134,7.240948
6,,entropy,16.0,0.999866,0.999283,8.91491,7.240948
16,balanced,gini,24.0,0.999985,0.999224,11.112134,7.160906
7,,entropy,18.0,0.999925,0.999224,9.502697,7.160906
17,balanced,gini,,0.999985,0.999224,11.112134,7.160906
15,balanced,gini,22.0,0.999985,0.999224,11.112134,7.160906
14,balanced,gini,20.0,0.99997,0.999164,10.418987,7.086798



DecisionTreeClassifier_CV


Unnamed: 0,class_weight,criterion,max_depth,accuracy_val,std_accuracy_val,log_accuracy_val
17,balanced,gini,,0.998722,0.000317,6.662455
16,balanced,gini,24.0,0.99871,0.000317,6.653152
15,balanced,gini,22.0,0.998686,0.000307,6.634803
23,balanced,entropy,,0.998615,0.000371,6.581695
21,balanced,entropy,22.0,0.998615,0.000371,6.581695
22,balanced,entropy,24.0,0.998615,0.000371,6.581695
20,balanced,entropy,20.0,0.998615,0.000371,6.581695
14,balanced,gini,20.0,0.998614,0.000349,6.581694
8,,entropy,20.0,0.998603,0.000502,6.573112
19,balanced,entropy,18.0,0.998603,0.000274,6.573109



RandomForestClassifier


Unnamed: 0,class_weight,max_depth,n_estimators,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
1,,20.0,128,0.99997,0.999343,10.418987,7.32796
2,,22.0,128,0.999985,0.999343,11.112134,7.32796
3,,24.0,128,0.999985,0.999343,11.112134,7.32796
4,,,128,0.999985,0.999343,11.112134,7.32796
0,,18.0,128,0.999955,0.999283,10.013522,7.240948



SVC


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
0,1.0,,3,1.0,poly,0.998985,0.998626,6.892627,6.590361



SVC_CV


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_val,std_accuracy_val,log_accuracy_val
0,1.0,,3,1.0,poly,0.998077,0.000298,6.253878



NeuralNetwork


Unnamed: 0,class_weight,dropout_rate,l2_alpha,number_of_hidden1_layers,number_of_hidden2_layers,sgd_learnrate,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
1,,0.1,0.0,40,80,0.001,0.998414,0.998127,6.446452,6.280313
2,"[0.507135415404744, 35.536502546689306]",0.1,0.0,40,40,0.001,0.99838,0.997898,6.425627,6.164801
0,,0.1,0.0,40,40,0.001,0.998321,0.997888,6.389377,6.160263
3,"[0.507135415404744, 35.536502546689306]",0.1,0.0,40,80,0.001,0.998378,0.997716,6.424156,6.081967


The global parameters of this run lose their importance because the parameters of the run with id 0 are used for fine tuning. A detailed discussion of the results above is therefore omitted.

### Run with id 3

This run has been done with upsampling records of class duplicate with the help of synthetic data, see the description of chapter [Data Synthesizing](./3_DataSynthesizing.ipynb). The effect of the chosen algorithm for synthesizing the data is shown in the interesting result below.

In [11]:
run = 3

print('\nRun id', run, 'with parameters\n', runtime_param_dict_list[run])

results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

results['results_best_model'].set_index('model', inplace=True)
# Ranking metric according to chapter 6 : 1. accuracy, 2. roc auc
display(results['results_best_model'].sort_values(by=['accuracy', 'auc'], ascending=False).round(3))


Run id 3 with parameters
 {'em': 'full', 'os': 20, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'added_u', 'sn': True}


model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
RandomForestClassifier,99.939,99.932,99.859,99.97,7.403,7.301,6.562,8.119
DecisionTreeClassifier_CV,99.896,99.891,99.807,99.918,6.869,6.825,6.248,7.107
SVC_CV,99.885,99.886,99.829,99.881,6.765,6.773,6.37,6.733
DecisionTreeClassifier,99.875,99.877,99.821,99.866,6.683,6.698,6.328,6.615
NeuralNetwork,99.871,99.877,99.844,99.844,6.653,6.698,6.461,6.461
SVC,99.844,99.847,99.784,99.829,6.464,6.485,6.139,6.37
DummyClassifier,50.394,52.742,39.912,39.294,0.701,0.75,0.509,0.499


The synthetic data, generated in chapter [Data Synthesizing](./3_DataSynthesizing.ipynb), lead to remarkably higher scores throughout. In particular, precision and recall reach values that the models of the other runs cannot reach. Note explicitly that not only the training but also the performance testing in this run has been done with the help of synthetic data. Therefore, the excellent performance of the models of this run comes along with the suspicion that the scores would not be confirmed when testing these models with Swissbib's original data without synthesized supplement. The answer to this question remains open.

The total of wrong predictions for the best classifier of Random Forest for run 0 is 24, while the total of wrong predictions for the best classifier of Random Forest for this run is 28, compare subsection [Wrong Predictions](#Wrong-Predictions) below. The total of records in the testing data set is 20,931 for run 0 and 34,068 for the run of this subsection. The ratio of false predicted records for run 0, with a value of 0.11%, is higher than the ratio of false predictions for run 3, with a value of 0.08%. This observation is in line with the high roc auc values of this subsection and demonstrates that the high scores are not only due to the higher number of data records used for testing the performance.
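The two ratios can be recomputed directly from the counts quoted above. The short sketch below only repeats this arithmetic; the variable names are chosen for illustration and are not part of the project code.

```python
# Counts quoted in the text for the best Random Forest classifier
wrong_run0, total_run0 = 24, 20931   # run 0: wrong predictions, test records
wrong_run3, total_run3 = 28, 34068   # run 3: wrong predictions, test records

# Ratio of wrong predictions in percent
ratio_run0 = 100 * wrong_run0 / total_run0
ratio_run3 = 100 * wrong_run3 / total_run3

print(f'Run 0: {ratio_run0:.2f}% wrong predictions')
print(f'Run 3: {ratio_run3:.2f}% wrong predictions')
```

Run 3 has more wrong predictions in absolute terms, but relative to the size of its test data set its ratio is the smaller one.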

In [12]:
results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

# Unlimited number of columns allowed
pd.options.display.max_columns = None

for classifier in results['results_model_scores'].keys() :
    if classifier != 'DummyClassifier': # DummyClassifier has no results to be analysed
        # Show results
        print(f'\n{classifier}')
        display(results['results_model_scores'][classifier].sort_values(by=['accuracy_val'], ascending=False).head(20))


DecisionTreeClassifier


Unnamed: 0,class_weight,criterion,max_depth,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
26,,entropy,18.0,0.999963,0.998532,10.212965,6.524113
64,balanced,entropy,18.0,0.999982,0.998532,10.906112,6.524113
25,,entropy,16.0,0.999917,0.998496,9.402035,6.49942
23,,entropy,12.0,0.999716,0.998459,8.165272,6.475323
63,balanced,entropy,16.0,0.999936,0.998459,9.653349,6.475323
32,,entropy,30.0,0.999991,0.998422,11.599259,6.451792
28,,entropy,22.0,0.999991,0.998422,11.599259,6.451792
27,,entropy,20.0,0.999991,0.998422,11.599259,6.451792
35,,entropy,45.0,0.999991,0.998422,11.599259,6.451792
34,,entropy,40.0,0.999991,0.998422,11.599259,6.451792



DecisionTreeClassifier_CV


Unnamed: 0,class_weight,criterion,max_depth,accuracy_val,std_accuracy_val,log_accuracy_val
64,balanced,entropy,18.0,0.99876,0.000202,6.692509
62,balanced,entropy,14.0,0.998723,0.000147,6.663354
75,balanced,entropy,,0.998723,0.000234,6.663353
74,balanced,entropy,50.0,0.998723,0.000234,6.663353
72,balanced,entropy,40.0,0.998723,0.000234,6.663353
71,balanced,entropy,35.0,0.998723,0.000234,6.663353
70,balanced,entropy,30.0,0.998723,0.000234,6.663353
69,balanced,entropy,28.0,0.998723,0.000234,6.663353
68,balanced,entropy,26.0,0.998723,0.000234,6.663353
67,balanced,entropy,24.0,0.998723,0.000234,6.663353



RandomForestClassifier


Unnamed: 0,class_weight,max_depth,n_estimators,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
43,,26.0,64,0.999991,0.999303,11.599259,7.268553
38,,24.0,64,0.999991,0.999266,11.599259,7.21726
48,,28.0,64,0.999991,0.999266,11.599259,7.21726
94,balanced,24.0,128,0.999991,0.999266,11.599259,7.21726
53,,,64,0.999991,0.999266,11.599259,7.21726
23,,18.0,64,0.999972,0.999266,10.500647,7.21726
84,balanced,20.0,128,0.999991,0.999229,11.599259,7.16847
33,,22.0,64,0.999991,0.999229,11.599259,7.16847
89,balanced,22.0,128,0.999991,0.999229,11.599259,7.16847
83,balanced,20.0,64,0.999991,0.999229,11.599259,7.16847



SVC


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
5,0.1,,3,1.0,poly,0.999404,0.998496,7.424872,6.49942
14,0.1,balanced,3,1.0,poly,0.999376,0.998422,7.379751,6.451792
52,10.0,balanced,4,0.1,poly,0.999303,0.998422,7.268526,6.451792
43,10.0,,4,0.1,poly,0.99934,0.998386,7.322593,6.428803
34,1.0,balanced,4,0.1,poly,0.99889,0.998386,6.803469,6.428803
49,10.0,balanced,3,0.1,poly,0.999064,0.998239,6.974286,6.341791
2,0.1,,2,1.0,poly,0.998688,0.998239,6.636414,6.341791
37,10.0,,2,0.1,poly,0.998688,0.998239,6.636414,6.341791
29,1.0,balanced,2,1.0,poly,0.998954,0.998239,6.863061,6.341791
25,1.0,,4,0.1,poly,0.998817,0.998202,6.739447,6.321172



SVC_CV


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_val,std_accuracy_val,log_accuracy_val
52,10.0,balanced,4,0.1,poly,0.998855,0.000126,6.772553
5,0.1,,3,1.0,poly,0.998826,0.000184,6.747233
43,10.0,,4,0.1,poly,0.998819,0.000163,6.741004
14,0.1,balanced,3,1.0,poly,0.998811,0.000142,6.734812
49,10.0,balanced,3,0.1,poly,0.998723,0.000134,6.663352
40,10.0,,3,0.1,poly,0.998672,7.8e-05,6.623911
38,10.0,,2,1.0,poly,0.998635,0.000107,6.596662
20,1.0,,2,1.0,poly,0.99862,0.000251,6.585968
47,10.0,balanced,2,1.0,poly,0.998613,0.00013,6.580662
25,1.0,,4,0.1,poly,0.998576,0.000126,6.554551



NeuralNetwork


Unnamed: 0,class_weight,dropout_rate,l2_alpha,number_of_hidden1_layers,number_of_hidden2_layers,sgd_learnrate,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
10,,0.0,0.0,40,80,0.001,0.998871,0.998632,6.786818,6.594511
44,"[0.8254306689603372, 1.2682127833823476]",0.0,0.0,40,40,0.001,0.998776,0.998503,6.705695,6.504296
11,,0.0,0.0,40,80,0.01,0.998667,0.998415,6.620593,6.447129
32,,0.2,0.0,40,40,0.001,0.998328,0.998409,6.393967,6.443428
46,"[0.8254306689603372, 1.2682127833823476]",0.0,0.0,40,80,0.001,0.998939,0.998391,6.84841,6.432429
47,"[0.8254306689603372, 1.2682127833823476]",0.0,0.0,40,80,0.01,0.998723,0.99838,6.663359,6.425148
58,"[0.8254306689603372, 1.2682127833823476]",0.1,0.0,40,80,0.001,0.998504,0.99835,6.505268,6.40719
38,"[0.8254306689603372, 1.2682127833823476]",0.0,0.0,20,40,0.001,0.99867,0.998333,6.622787,6.39658
20,,0.1,0.0,40,40,0.001,0.998515,0.998327,6.512187,6.393061
70,"[0.8254306689603372, 1.2682127833823476]",0.2,0.0,40,80,0.001,0.998258,0.998327,6.352684,6.393061


The run of this subsection had been motivated by the desire for more training data. This subsection shows that the models are strongly influenced by the specific implementation chosen for this kind of data upsampling. Due to this observation, the results of this subsection are rejected and the above listing is not analysed any further.

### Run with id 4

The last group of runs searches for the best parameters on the grid of each model. Several runs have been done with Swissbib's full goldstandard data that have been persisted in the repository of the project [[ProjRepo](./A_References.ipynb#project_repo)] with the goal to find the set of overall best parameters for each model, see [runs_summary.csv](./documentation/runs_summary.csv). Here, the runs done will be reproduced with the downsampled goldstandard data on the best parameters found.

In [13]:
run = 4

print('\nRun id', run, 'with parameters\n', runtime_param_dict_list[run])

results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

results['results_best_model'].set_index('model', inplace=True)
# Ranking metric according to chapter 6 : 1. accuracy, 2. roc auc
display(results['results_best_model'].sort_values(by=['accuracy', 'auc'], ascending=False).round(3))


Run id 4 with parameters
 {'em': 'tune', 'os': 0, 'mr': 0.2, 'dsn': 1, 'dsw': 0.4, 'fa': 1.0, 'me': 'added_u', 'sn': True}


model,auc,accuracy,precision,recall,auc_log,accuracy_log,precision_log,recall_log
RandomForestClassifier,97.438,99.89,97.222,94.915,3.664,6.813,3.584,2.979
DecisionTreeClassifier_CV,97.763,99.871,95.27,95.593,3.8,6.653,3.051,3.122
DecisionTreeClassifier,97.07,99.823,93.289,94.237,3.53,6.338,2.701,2.854
SVC_CV,96.402,99.823,94.483,92.881,3.325,6.338,2.897,2.642
NeuralNetwork,95.898,99.818,95.088,91.864,3.194,6.311,3.013,2.509
SVC,95.886,99.795,93.448,91.864,3.191,6.188,2.725,2.509
DummyClassifier,50.152,97.243,1.712,1.695,0.696,3.591,0.017,0.017


After fully tuning all models separately, the Ensemble family has remained on top of all models. In particular, the Random Forest Classifier could be tuned to remain consistently the best classifier overall. The tuning of the Neural Network has moved this classifier into close vicinity of the Ensemble family. The Support Vector Classifiers have moved down to the last ranked models. This ranking is due to the comparison of the accuracy. When looking at the roc auc score, the Support Vector Classifier with cross-validation may exhibit a higher score than the Neural Network. This result has some randomness according to [[KeraRand](./A_References.ipynb#kerarand)], see above.

In [14]:
results = rsf.restore_dict_results(path_results, 'results_run_' + str(run) + '.pkl')

# Unlimited number of columns allowed
pd.options.display.max_columns = None

for classifier in results['results_model_scores'].keys() :
    if classifier != 'DummyClassifier': # DummyClassifier has no results to be analysed
        # Show results
        print(f'\n{classifier}')
        display(results['results_model_scores'][classifier].sort_values(by=['accuracy_val'], ascending=False).head(20))


DecisionTreeClassifier


Unnamed: 0,class_weight,criterion,max_depth,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
51,balanced,entropy,17.0,0.99991,0.999343,9.320375,7.32796
58,balanced,entropy,24.0,0.999985,0.999283,11.112134,7.240948
68,balanced,entropy,50.0,0.999985,0.999283,11.112134,7.240948
65,balanced,entropy,35.0,0.999985,0.999283,11.112134,7.240948
67,balanced,entropy,45.0,0.999985,0.999283,11.112134,7.240948
54,balanced,entropy,20.0,0.999985,0.999283,11.112134,7.240948
55,balanced,entropy,21.0,0.999985,0.999283,11.112134,7.240948
56,balanced,entropy,22.0,0.999985,0.999283,11.112134,7.240948
57,balanced,entropy,23.0,0.999985,0.999283,11.112134,7.240948
69,balanced,entropy,,0.999985,0.999283,11.112134,7.240948



DecisionTreeClassifier_CV


Unnamed: 0,class_weight,criterion,max_depth,accuracy_val,std_accuracy_val,log_accuracy_val
33,balanced,gini,50.0,0.998722,0.000317,6.662455
31,balanced,gini,40.0,0.998722,0.000317,6.662455
29,balanced,gini,30.0,0.998722,0.000317,6.662455
28,balanced,gini,29.0,0.998722,0.000317,6.662455
27,balanced,gini,28.0,0.998722,0.000317,6.662455
26,balanced,gini,27.0,0.998722,0.000317,6.662455
25,balanced,gini,26.0,0.998722,0.000317,6.662455
24,balanced,gini,25.0,0.998722,0.000317,6.662455
32,balanced,gini,45.0,0.998722,0.000317,6.662455
34,balanced,gini,,0.998722,0.000317,6.662455



RandomForestClassifier


Unnamed: 0,class_weight,max_depth,n_estimators,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
39,,19,100,0.999955,0.999522,10.013522,7.646413
217,balanced,24,110,0.999985,0.999522,11.112134,7.646413
228,balanced,25,110,0.999985,0.999522,11.112134,7.646413
42,,19,115,0.999955,0.999522,10.013522,7.646413
132,balanced,17,70,0.99997,0.999522,10.418987,7.646413
43,,19,120,0.999955,0.999522,10.013522,7.646413
34,,19,75,0.999955,0.999522,10.013522,7.646413
206,balanced,23,110,0.999985,0.999522,11.112134,7.646413
40,,19,105,0.999955,0.999522,10.013522,7.646413
41,,19,110,0.999955,0.999522,10.013522,7.646413



SVC


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
12,1.3,,3,0.6,poly,0.999358,0.999044,7.350934,6.953266
7,1.25,,3,0.6,poly,0.999358,0.999044,7.350934,6.953266
2,1.2,,3,0.6,poly,0.999328,0.999044,7.305472,6.953266
22,1.4,,3,0.6,poly,0.999358,0.998985,7.350934,6.892642
4,1.2,,3,0.8,poly,0.999418,0.998985,7.448573,6.892642
9,1.25,,3,0.8,poly,0.999418,0.998985,7.448573,6.892642
34,1.5,,3,0.8,poly,0.999463,0.998985,7.528615,6.892642
14,1.3,,3,0.8,poly,0.999418,0.998985,7.448573,6.892642
19,1.35,,3,0.8,poly,0.999433,0.998985,7.474548,6.892642
24,1.4,,3,0.8,poly,0.999433,0.998985,7.474548,6.892642



SVC_CV


Unnamed: 0,C,class_weight,degree,gamma,kernel,accuracy_val,std_accuracy_val,log_accuracy_val
2,1.2,,3,0.6,poly,0.998388,0.000309,6.430005
12,1.3,,3,0.6,poly,0.998352,0.000331,6.408027
26,1.45,,3,0.5,poly,0.998352,0.000299,6.408027
17,1.35,,3,0.6,poly,0.99834,0.000349,6.400807
22,1.4,,3,0.6,poly,0.99834,0.000349,6.400807
31,1.5,,3,0.5,poly,0.99834,0.000286,6.400807
7,1.25,,3,0.6,poly,0.99834,0.000334,6.400806
32,1.5,,3,0.6,poly,0.998328,0.000368,6.393638
27,1.45,,3,0.6,poly,0.998328,0.000368,6.393638
21,1.4,,3,0.5,poly,0.998328,0.00032,6.393638



NeuralNetwork


Unnamed: 0,class_weight,dropout_rate,l2_alpha,number_of_hidden1_layers,number_of_hidden2_layers,sgd_learnrate,accuracy_tr,accuracy_val,log_accuracy_tr,log_accuracy_val
23,"[0.507135415404744, 35.536502546689306]",0.0,0.0,50,80,0.002,0.99892,0.998357,6.831037,6.410933
10,"[0.507135415404744, 35.536502546689306]",0.0,0.0,30,75,0.002,0.998839,0.998194,6.758491,6.316676
8,"[0.507135415404744, 35.536502546689306]",0.0,0.0,30,65,0.002,0.998689,0.998165,6.636598,6.300931
17,"[0.507135415404744, 35.536502546689306]",0.0,0.0,40,80,0.002,0.99887,0.998165,6.785604,6.300931
20,"[0.507135415404744, 35.536502546689306]",0.0,0.0,50,65,0.002,0.99893,0.998118,6.839908,6.275228
15,"[0.507135415404744, 35.536502546689306]",0.0,0.0,40,70,0.002,0.998827,0.998108,6.748327,6.270168
13,"[0.507135415404744, 35.536502546689306]",0.0,0.0,40,60,0.002,0.998755,0.998099,6.688985,6.265115
4,"[0.507135415404744, 35.536502546689306]",0.0,0.0,20,75,0.002,0.998698,0.998041,6.643942,6.235417
14,"[0.507135415404744, 35.536502546689306]",0.0,0.0,40,65,0.002,0.998868,0.998003,6.783496,6.216094
3,"[0.507135415404744, 35.536502546689306]",0.0,0.0,20,70,0.002,0.998717,0.997946,6.658701,6.187792


The assumption of this section is that the best parameters found on the full goldstandard data are the best parameters found on the downsampled goldstandard data. In this capstone project, the finally wanted results refer to the full goldstandard data. The shortcuts taken in this chapter are due to runtime restrictions and serve as description of the found overall models. Looking at the parameters' ranking of each model brings the insight described in the following table.

| model | parameters assessment |
| :---- | :-------------------- |
| RandomForestClassifier | The observed tendency for unbalanced $\texttt{class}\_\texttt{weight}$ of the runs above is confirmed with the runs of this group for Random Forest Classifier. A $\texttt{max}\_\texttt{depth}$ value of 22 produces the overall best accuracy on the test data of the full goldstandard data set. Although this value is close to the number of features, the run with $\texttt{max}\_\texttt{depth}=20$ has been ranked much lower than the runs of $\texttt{max}\_\texttt{depth}$ values in the vicinity of 20. Parameter $\texttt{n}\_\texttt{estimators}$ is confirmed to be high. Again, the accuracy values of the test data are close to each other for models with different parameters. It is difficult to concisely define the best parameter point of the grid. |
| DecisionTreeClassifier_CV | The Decision Tree Classifier with cross-validation shows a separation between $\texttt{criterion}$ gini and entropy. Measure gini exhibits the 15 top accuracy values. The accuracies with $\texttt{max}\_\texttt{depth}\ge 25$ are all the same and at the very top. This picture is consistent with the one of [Run with id 0](#Run-with-id-0) and gives the additional information of finer granularity. |
| DecisionTreeClassifier | For the same reason as documented in the runs above, see subsection [Run with id 0](#Run-with-id-0), this model is not discussed deeper here. |
| SVC_CV | For the tuning run of Support Vector Classifier, the decision for polynomial of degree 3 has been taken. The overall best accuracy has been reached with $C=1.2$ and $\gamma=0.6$. For an additional assessment of these values, see the comments of the next subsection, below. |
| SVC | For the same reason as documented in the runs above, this model is not discussed deeper here.  |
| NeuralNetwork | The tuning of the Neural Network results in a two-layer network with a high amount of neurons in both layers. The first hidden layer has its best tuning value at 40 and the second hidden layer at 75 which is close to the doubled value of the first layer. The specific result may vary, though, due to [[KeraRand](./A_References.ipynb#kerarand)]. |

### Conclusion of Runs

Several observations can be made as a general conclusion of the runs above.

- For the problem at hand, the overall best classifiers can be built with models of the Ensemble family. These classifiers have two advantages: they perform well in their training process and they can be interpreted easily. In the course of the capstone project, these models were the least complex ones when doing grid search. Only a small number of parameters had to be searched and the results showed reproducible stability. For all runs, the classifiers of the Ensemble family have ranked at the top, although the differences between the models have proven to be small.
- The results of the Support Vector Classifier have come along with a poorer performance than the models of the Ensemble family. On the one hand, training the models took longer, and on the other hand, searching for the best values of $C$ and $\gamma$ has proven to be tricky. The two parameters interact with each other, and the search for the maximum accuracy as a function of this two-parameter space was a task that needed some patience. The results above remain arbitrary in this aspect of the capstone project. Stated positively, the Support Vector Classifier models show stability over a wide range of parameter values. Unfortunately, they do not exhibit the same prediction accuracy as the models of the Ensemble family.
- Some effort has been spent during the capstone project to interpret and understand the models. In chapter [Support Vector Classifier Model](./7_SVCModel.ipynb), the idea had been to gain insight into the threshold that would divide the subset of records with pairs of uniques from the records with pairs of duplicates. Adding up the weighted ($w_i$ : feature weight as to relative feature importance) values of all features $s_{ij}$ of a record $r_j = (s_{1j}, ... , s_{20j})$ would result in a sum $S_j$ for the similarity $$S_j = \sum_{i=1}^{20}w_i\cdot s_{ij},\;\;j=1,...,N$$ which could be compared to a threshold $\Theta$. This threshold would have been the similarity threshold that would have divided the two classes $S_u \le \Theta < S_d$, where $r_u \in \{\texttt{records of pair of uniques}\}$ and $r_d \in \{\texttt{records of pair of duplicates}\}$. This threshold would have helped in identifying the feature(s) $s_{ij}$ of a record $r_j$ that determine its class belonging for a prediction, like the ones shown in subsection [Wrong Predictions](#Wrong-Predictions) below. This intention had to be given up after some effort because the Support Vector Classifier model uses a non-linear kernel, so the sum above does not hold and no threshold $\Theta$ could be determined. Therefore, identifying the cause for a wrong prediction and deriving any measures to turn it into a correct prediction remains an open topic.
- The Neural Network training has proven to be a difficult task. Although the implementation with library keras has been done quickly, the trained models have shown only a very slow convergence to a constant accuracy value. This has made the grid search process cumbersome. The final result found above, a two-layer network with a high number of neurons in the second layer, is representative of the generally slow learning of the models. The network model finds a first rough separation between the classes quickly. Finding the finally stable network for the goldstandard data with a sharp threshold between records of pairs of uniques and records of pairs of duplicates seems to be far more difficult. One reason for this observation may be given in the overall assessment of the results at the very end of this chapter.
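The weighted similarity sum $S_j$ and its comparison with a threshold $\Theta$, as described in the third observation above, can be sketched as follows. This is a minimal illustration only: the weights, the feature values, and the threshold `theta` are hypothetical numbers, only three instead of 20 features are used, and the helper names `similarity_sum` and `classify_by_threshold` are not part of the project code.

```python
def similarity_sum(weights, features):
    """Return S_j = sum_i w_i * s_ij for one record r_j."""
    if len(weights) != len(features):
        raise ValueError('weights and features must have the same length')
    return sum(w * s for w, s in zip(weights, features))


def classify_by_threshold(s_j, theta):
    """Apply the rule S_u <= theta < S_d to a single similarity sum."""
    return 'duplicate' if s_j > theta else 'unique'


# Hypothetical feature weights (relative feature importance), summing to 1
weights = [0.5, 0.3, 0.2]
# Hypothetical threshold separating the two classes
theta = 0.5

# Hypothetical similarity features of two records
record_a = [1.0, 1.0, 0.5]    # high similarity in all features
record_b = [0.0, 0.5, -1.0]   # low similarity, one feature missing (-1.0)

for name, record in [('record_a', record_a), ('record_b', record_b)]:
    s_j = similarity_sum(weights, record)
    print(name, round(s_j, 3), classify_by_threshold(s_j, theta))
```

Such a rule is linear in the features. With a non-linear kernel, the decision of the Support Vector Classifier cannot be reduced to one weight per feature, which is why no such threshold could be determined.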

### Wrong Predictions

In a first step, some print functions are to be defined.

In [15]:
# Unlimited number of columns allowed
pd.options.display.max_columns = None

def print_wrongly_predicted_records(run_id, wpg, wrong_predictions):
    # Size of the test data set of run 0 (20636 uniques and 295 duplicates)
    number_of_test_records = 20931
    # False predicted uniques are true duplicates (295 records),
    # false predicted duplicates are true uniques (20636 records)
    number_of_class_in_test = [number_of_test_records - 20636, number_of_test_records - 295]
    true_class = ['duplicate', 'unique']

    for g in range(2) :
        print('Run', run_id, '-', wpg[g], '\n*****')
        for model in wrong_predictions.keys() :
            fp = wrong_predictions[model][wpg[g]].sort_index().index.tolist()
            print('\n{} has {:d} {} which corresponds to ...\n * {:.3f}% of records of class {}\n * {:.3f}% of all test data'.format(
                model, len(fp), wpg[g],
                100*len(fp)/number_of_class_in_test[g],
                true_class[g],
                100*len(fp)/number_of_test_records))
            print('The wrongly classified records have the index ...\n', fp)
        print('')

    return None

def print_union_intersection_wrongly_predicted_records(run_id, wpg, wrong_predictions):
    for g in range(2) :
        print('Run', run_id, '-', wpg[g], '\n*****')
        false_predicted_union = set()
        false_predicted_intersect = set(wrong_predictions['SVC_CV'][wpg[g]].sort_index().index.tolist())
        for model in wrong_predictions.keys() :
            fp = wrong_predictions[model][wpg[g]].sort_index().index.tolist()
            false_predicted_union = false_predicted_union.union(fp)
            false_predicted_intersect = false_predicted_intersect.intersection(fp)

        # Result output
        print('- All {} of all classifiers (union)\n{}'.format(
            wpg[g], false_predicted_union))
        print('- All common {} of all classifiers (intersection)\n{}\n'.format(
            wpg[g], false_predicted_intersect))

    return None

def display_wrongly_predicted_records(run_id, wpg, wrong_predictions, df):
    for g in range(2) :
        print('Run', run_id, '-', wpg[g], '\n*****')
        for model in wrong_predictions.keys() :
            fp = wrong_predictions[model][wpg[g]].sort_index().index.tolist()
            print(wpg[g], 'for', model)
            display(df.iloc[fp])

    return None

The numbers of wrong predictions of a model are condensed in the characteristic numbers accuracy, precision, and recall, see chapter [Decision Tree Model](./6_DecisionTreeModel.ipynb). So far, the discussion has been on this condensed view of the performance of the models. When looking at the specific wrong predictions for each model, two aspects may be interesting to assess.
- How many false predicted records of pairs of unique and how many false predicted records of pairs of duplicate can be differentiated?
- Which are the overlapping explicit records of pairs of unique and pairs of duplicate for the models? Identifying this common set of records that are difficult to classify for the models, might be helpful in understanding the reason why a record is hard to classify. Identifying the set of distinct records on the other hand, might be helpful to understand the specifics of each model.
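The distinction between common and model-specific records can be sketched with plain set operations. The two index lists are copied from the run 0 output for false predicted uniques shown further below; the variable names are chosen for illustration only and are not part of the project code.

```python
# False predicted uniques of run 0 for two of the models
dt = {103289, 103290, 103810, 103813, 103823, 103872, 103923, 103959, 104086,
      104094, 104098, 104099, 104100, 104101, 104105, 104110, 104113, 104114}
dt_cv = {103289, 103290, 103816, 104094, 104098, 104099, 104100, 104101,
         104105, 104113, 104114, 104493, 104497}

# Records both models classify wrongly (hard for both)
common = dt & dt_cv
# Records only one of the two models classifies wrongly (model-specific)
only_dt = dt - dt_cv
only_dt_cv = dt_cv - dt

print('common    :', sorted(common))
print('only DT   :', sorted(only_dt))
print('only DT_CV:', sorted(only_dt_cv))
```

The same pattern extends to more than two models by intersecting all sets for the common records and by subtracting the union of the other models for the model-specific ones.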

Starting with run with id 0, the explicit false predictions are shown below.

In [16]:
run = 0
wrong_prediction_groups = ['false_predicted_uniques', 'false_predicted_duplicates']

# Read confusion matrix results from chapters
wrong_predictions = rsf.restore_dict_results(path_goldstandard, 'wrong_predictions_run_' + str(run) + '.pkl')

print_wrongly_predicted_records(run, wrong_prediction_groups, wrong_predictions)

Run 0 - false_predicted_uniques 
*****

DecisionTreeClassifier has 18 false_predicted_uniques which corresponds to ...
 * 6.102% of records of class duplicate
 * 0.086% of all test data
The wrongly classified records have the index ...
 [103289, 103290, 103810, 103813, 103823, 103872, 103923, 103959, 104086, 104094, 104098, 104099, 104100, 104101, 104105, 104110, 104113, 104114]

DecisionTreeClassifier_CV has 13 false_predicted_uniques which corresponds to ...
 * 4.407% of records of class duplicate
 * 0.062% of all test data
The wrongly classified records have the index ...
 [103289, 103290, 103816, 104094, 104098, 104099, 104100, 104101, 104105, 104113, 104114, 104493, 104497]

RandomForestClassifier has 16 false_predicted_uniques which corresponds to ...
 * 5.424% of records of class duplicate
 * 0.076% of all test data
The wrongly classified records have the index ...
 [103289, 103290, 103609, 103618, 103810, 103872, 103959, 104098, 104099, 104101, 104105, 104110, 104113, 104114, 1

The total number of records in the test data set is 20,931 with a distribution of 20,636 records of uniques and 295 records of duplicates, see e.g. [Decision Tree Model](./6_DecisionTreeModel.ipynb) and its subsections for the train/test split. In the list above, the number of false predicted uniques and the number of false predicted duplicates have each been divided by the size of the corresponding class and by this total to illustrate the proportion of wrong predictions. All ratios are below 10% for each model, which is a satisfying result.
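The percentages in the listing can be reproduced from the counts. The sketch below repeats the calculation for the false predicted uniques of the three models whose counts are shown above; the dictionary and variable names are illustrative assumptions.

```python
n_test = 20931        # all records in the test data set
n_duplicates = 295    # records of class duplicate in the test data set

# Number of false predicted uniques per model, taken from the output above
false_predicted_uniques = {
    'DecisionTreeClassifier': 18,
    'DecisionTreeClassifier_CV': 13,
    'RandomForestClassifier': 16,
}

shares = {}
for model, n_wrong in false_predicted_uniques.items():
    share_of_class = 100 * n_wrong / n_duplicates   # relative to class duplicate
    share_of_test = 100 * n_wrong / n_test          # relative to all test data
    shares[model] = (share_of_class, share_of_test)
    print(f'{model}: {share_of_class:.3f}% of class duplicate, '
          f'{share_of_test:.3f}% of all test data')
```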

As for the specific indices, the union and the intersection of the indices of all models can be differentiated. This is shown below.

In [17]:
# Read confusion matrix results from chapters
wrong_predictions = rsf.restore_dict_results(path_goldstandard, 'wrong_predictions_run_' + str(run) + '.pkl')

print_union_intersection_wrongly_predicted_records(run, wrong_prediction_groups, wrong_predictions)

Run 0 - false_predicted_uniques 
*****
- All false_predicted_uniques of all classifiers (union)
{103872, 103810, 103609, 103618, 103813, 104003, 103816, 103802, 103823, 103184, 103188, 104086, 103959, 104605, 104094, 104225, 104098, 104099, 104100, 104101, 103653, 104226, 104105, 104620, 104493, 104110, 104621, 104113, 104114, 103923, 104497, 103729, 103289, 103290, 103867, 103614}
- All common false_predicted_uniques of all classifiers (intersection)
{104098, 103289, 103290, 104101}

Run 0 - false_predicted_duplicates 
*****
- All false_predicted_duplicates of all classifiers (union)
{84736, 2688, 1290, 85901, 58382, 97551, 64145, 11282, 42386, 90898, 92068, 2597, 56493, 17455, 59185, 97845, 98487, 73144, 25401, 97851, 7361, 1222, 89031, 100823, 22615, 58588, 29537, 22113, 78945, 100968, 82158, 39154, 70130, 58868, 628, 36341, 35061, 93949}
- All common false_predicted_duplicates of all classifiers (intersection)
{39154, 2597}



Now, let's have a look at the full data of the false predicted records.

In [18]:
import os
import bz2
import _pickle as cPickle

# Restore DataFrame with features from compressed pickle file
with bz2.BZ2File((os.path.join(
    path_goldstandard, 'labelled_feature_matrix_full.pkl')), 'rb') as file:
    df_attribute_with_sim_feature = cPickle.load(file)

In [19]:
# Read confusion matrix results from chapters
wrong_predictions = rsf.restore_dict_results(path_goldstandard, 'wrong_predictions_run_' + str(run) + '.pkl')

display_wrongly_predicted_records(run, wrong_prediction_groups, wrong_predictions, df_attribute_with_sim_feature)

Run 0 - false_predicted_uniques 
*****
false_predicted_uniques for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103823,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1950uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,4355.0,912.0,-0.5,,912 912,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.880856,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-0.5,,"aberthermann, schikanederemanuel",-0.5,,e. eulenburg,-1.0,,,0.833333,die zauberflöte,"die zauberflöte, köchel no 620",-1.0,,,0.733333,1,1 412
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
103923,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,19702006,19702006,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-0.5,m006546749,,1.0,4553.0,4553.0,0.0,19,5 19,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.757681,wolfgang amadeus mozart ; vorgelegt von gernot...,wolfgang amadeus mozart ; [text von emanuel sc...,0.774269,"schikanederemanuel, grubergernot, faberrudolf","grubergernot, orelalfred, schikanederemanuel, ...",1.0,bärenreiter,bärenreiter,-1.0,,,0.989418,"die zauberflöte, kv 620 : eine deutsche oper i...","die zauberflöte, [kv 620 : eine deutsche oper ...",-1.0,,,0.0,1,379
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,könemann,,-1.0,,,1.0,"die zauberflöte, partitura","die zauberflöte, partitura",-1.0,,,0.511111,225,1 225
104086,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1880aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,21,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.761014,von emanuel schikaneder ; musik von w.a. mozart,von emanuel schikaneder [und j.g.k.l. giesecke],0.892308,schikanederemanuel,"schikanederemanuel, gieseckecarl ludwig",-0.5,breitkopf & härtel,,-1.0,,,0.989744,"die zauberflöte, il flauto magico : deutsche o...","die zauberflöte, (il flauto magico) : deutsche...",-1.0,,,0.0,34,1
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74


false_predicted_uniques for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412.0,1.0
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1.0,412.0
103816,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,10425.0,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-1.0,,,1.0,soldankurt,soldankurt,-1.0,,,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen : klavi...","die zauberflöte, oper in zwei aufzügen : klavi...",1.0,"die zauberflöte, ausgabe für gesang und klavier","die zauberflöte, ausgabe für gesang und klavier",1.0,1.0,1.0
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104100,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620 2620,2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,p. reclam jun.,,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104101,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"schikanederemanuel, krusegeorg richard","krusegeorg richard, schikanederemanuel",0.809524,p. reclam jun.,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104105,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,wolfgang amadeus mozart ; dichtung von emanuel...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,philipp reclam,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104113,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19739999,0.428571,30600,30500,1.0,cr,cr,0.0,[],[0335-1793],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,libération,-1.0,,,0.933333,liberation,libération,-1.0,,,-1.0,,


false_predicted_uniques for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,könemann,,-1.0,,,1.0,"die zauberflöte, partitura","die zauberflöte, partitura",-1.0,,,0.511111,225,1 225
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104101,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"schikanederemanuel, krusegeorg richard","krusegeorg richard, schikanederemanuel",0.809524,p. reclam jun.,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74


false_predicted_uniques for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103802,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,,99036.0,-1.0,,,-1.0,,,1.0,ein film von luc jacquet,ein film von luc jacquet,1.0,jacquetluc,jacquetluc,-0.5,frenetic films,,-1.0,,,0.742268,"die reise der pinguine, die natur schreibt die...",die reise der pinguine,-1.0,,,0.888889,1 82 2,1 82
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103823,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1950uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,4355.0,912.0,-0.5,,912 912,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.880856,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-0.5,,"aberthermann, schikanederemanuel",-0.5,,e. eulenburg,-1.0,,,0.833333,die zauberflöte,"die zauberflöte, köchel no 620",-1.0,,,0.733333,1,1 412
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,könemann,,-1.0,,,1.0,"die zauberflöte, partitura","die zauberflöte, partitura",-1.0,,,0.511111,225,1 225
104003,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,99064.0,,-1.0,,,-1.0,,,1.0,ein film von luc jacquet,ein film von luc jacquet,1.0,jacquetluc,jacquetluc,-0.5,bonne pioche,,-1.0,,,0.767123,"die reise der pinguine, die natur schreibt die...",die reise der pinguine,-1.0,,,0.75,1,1 82


false_predicted_uniques for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103729,1,-1.0,,,-1.0,,,1.0,schweizbundesamt für landestopografie,schweizbundesamt für landestopografie,-1.0,,,-1.0,,,0.75,2007aaaa,2007uuuu,1.0,10347,10347,1.0,mp,mp,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,bundesamt für landestopografie swisstopo,bundesamt für landestopografie swisstopo,1.0,dufourguillaume henri,dufourguillaume henri,-1.0,,,-1.0,,,1.0,"dufourkarten, topografische karte der schweiz","dufourkarten, topografische karte der schweiz",-1.0,,,1.0,2,2
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103823,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1950uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,4355.0,912.0,-0.5,,912 912,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.880856,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-0.5,,"aberthermann, schikanederemanuel",-0.5,,e. eulenburg,-1.0,,,0.833333,die zauberflöte,"die zauberflöte, köchel no 620",-1.0,,,0.733333,1,1 412


false_predicted_uniques for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103184,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2009aaaa,2009uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[978-3-15-020008-7],[978-3-15-020008-7],-1.0,,,-1.0,,,1.0,20008,20008,1.0,austenjane,austenjane,0.69774,jane austen ; aus dem englischen übersetzt von...,jane austen,-0.5,"grawechristian, graweursula",,0.848485,reclam jun.,reclam,-1.0,,,1.0,"emma, roman","emma, roman",-1.0,,,1.0,600,600
103188,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2009aaaa,2009uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[978-3-15-020008-7],[978-3-15-020008-7],-1.0,,,-1.0,,,1.0,20008,20008,1.0,austenjane,austenjane,0.69774,jane austen,jane austen ; aus dem englischen übersetzt von...,-0.5,,"grawechristian, graweursula",0.848485,reclam,reclam jun.,-1.0,,,1.0,"emma, roman","emma, roman",-1.0,,,1.0,600,600
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103729,1,-1.0,,,-1.0,,,1.0,schweizbundesamt für landestopografie,schweizbundesamt für landestopografie,-1.0,,,-1.0,,,0.75,2007aaaa,2007uuuu,1.0,10347,10347,1.0,mp,mp,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,bundesamt für landestopografie swisstopo,bundesamt für landestopografie swisstopo,1.0,dufourguillaume henri,dufourguillaume henri,-1.0,,,-1.0,,,1.0,"dufourkarten, topografische karte der schweiz","dufourkarten, topografische karte der schweiz",-1.0,,,1.0,2,2
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103867,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1471aaaa,1471uuuu,1.0,20053,20053,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,crescenzipietro de',crescenzipietro de',-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,ruralia commoda,ruralia commoda,-1.0,,,1.0,418,418
104086,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1880aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,21,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.761014,von emanuel schikaneder ; musik von w.a. mozart,von emanuel schikaneder [und j.g.k.l. giesecke],0.892308,schikanederemanuel,"schikanederemanuel, gieseckecarl ludwig",-0.5,breitkopf & härtel,,-1.0,,,0.989744,"die zauberflöte, il flauto magico : deutsche o...","die zauberflöte, (il flauto magico) : deutsche...",-1.0,,,0.0,34,1


Run 0 - false_predicted_duplicates 
*****
false_predicted_duplicates for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
1290,0,-1.0,,,-1.0,,,0.8,les arts florissants,arts florissants,-1.0,,,-1.0,,,0.75,1996aaaa,1996uuuu,0.428571,40100,40000,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,630.0,630.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.591919,wolfgang amadeus mozart ; libretto: emanuel sc...,mozart,0.775282,"mozartwolfgang amadeus, schikanederemanuel, ch...","christiewilliam, dessaynatalie, mannionrosa, b...",-0.5,,erato,-1.0,,,0.709588,"die zauberflöte, the magic flute : opera in tw...","die zauberflöte, kv 620",-1.0,,,0.777778,2,2 1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
25401,0,-1.0,,,-1.0,,,0.095238,les arts florissants,wiener philharmoniker,-1.0,,,-1.0,,,0.25,1996aaaa,2004uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,630.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,wolfgang amadeus mozart ; libretto: emanuel sc...,,0.559259,"mozartwolfgang amadeus, schikanederemanuel, ch...",karajanherbert von,-0.5,,membran,-1.0,,,0.762821,"die zauberflöte, the magic flute : opera in tw...",die zauberflöte,-1.0,,,1.0,2,2
29537,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,2.0,0.75,2009aaaa,2009uuuu,1.0,20000,20000,1.0,bk,bk,0.0,[978-3-15-020008-7],[978-3-596-90041-1],-1.0,,,-1.0,,,0.6,20008,90041,1.0,austenjane,austenjane,0.676656,jane austen ; aus dem englischen übersetzt von...,jane austen ; mit einem nachw. von rudolf sühnel,-0.5,"grawechristian, graweursula",,0.480963,reclam jun.,fischer-taschenbuch-verlag,-1.0,,,1.0,"emma, roman","emma, roman",-1.0,,,0.555556,600,509
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237
97845,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2012aaaa,2012uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[978-3-7255-6535-1],[978-3-7255-6535-1],-1.0,,,-1.0,,,1.0,63 63,63 63,1.0,käserbeatrice,käserbeatrice,0.533493,beatrice käser,sozialversicherungsmissbrauch : am beispiel de...,-1.0,,,1.0,schulthess,schulthess,-1.0,,,0.727735,"sozialleistungsbetrug, sozialversicherungsbetr...","sozialleistungsbetrug, sozialversicherungsbetrug",-1.0,,,1.0,244,244
100823,0,-1.0,,,-1.0,,,0.354839,philharmonia orchestra (london),wiener philharmoniker,-1.0,,,-1.0,,,0.625,2009aaaa,2004uuuu,0.428571,40000,40100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,50999.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,mozart,,0.600204,klempererotto,karajanherbert von,0.455988,emi records,membran,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,1.0,2,2


false_predicted_duplicates for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57.0,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
1222,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,19601969,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,245.0,4355.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.694948,von emanuel schikaneder ; [musik von] wolfgang...,by emanuel schikaneder ; music by wolfgang ama...,-0.5,,"schikanederemanuel, aberthermann",-0.5,,eulenburg,-1.0,,,0.747011,"die zauberflöte, oper in zwei akten [kv 620] =...","die zauberflöte, a german opera : köchel no. 620",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,1.0,1,1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
22615,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1955uuuu,0.111111,10200,40000,1.0,mu,mu,1.0,[],[],-1.0,,,0.142857,245.0,4357412.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.573413,von emanuel schikaneder ; [musik von] wolfgang...,mozart,-0.5,,mozartwolfgang amadeus,-0.5,,polydor,-1.0,,,0.74143,"die zauberflöte, oper in zwei akten [kv 620] =...",die zauberflöte […],-0.5,"die zauberflöte, ausgabe für gesang und klavier",,0.777778,1,1 2
36341,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,181uuuuu,0.428571,10200,10000,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,245.0,117.0,-0.5,,2.0,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.715134,von emanuel schikaneder ; [musik von] wolfgang...,von w. a. mozart ; in vollständigem clavieraus...,0.681818,"kienzlwilhelm, schikanederemanuel",schikanederemanuel,-0.5,,heckel,-1.0,,,0.752891,"die zauberflöte (il flauto magico), oper in zw...","die zauberflöte, grosse oper in zwei aufzügen",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,1.0,1,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1.0,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
56493,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.75,1990aaaa,1990uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,,4.0,-1.0,,,0.691477,sigrid kessler ... [et al.],sigrid kessler... [et al.] ; [éd.:] interkanto...,-0.5,kesslersigrid,,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.842656,"bonne chance!, cours de langue française : tro...","bonne chance!, cours de langue française, deux...",-1.0,,,-0.5,,589
58382,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1970uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.596141,von emanuel schikaneder [und j.g.k.l. giesecke],wolfgang amadeus mozart ; nacherzählt von ingr...,0.659968,"schikanederemanuel, gieseckecarl ludwig","weixelbaumeringrid, riera rochasroque",-0.5,,ueberreuter,-1.0,,,0.74359,"die zauberflöte, (il flauto magico) : deutsche...",die zauberflöte,-1.0,,,1.0,1,1
59185,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,19001999,19601969,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,3714.0,4355.0,-0.5,12,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.654449,von w. a. mozart ; revidiert und mit einführun...,by emanuel schikaneder ; music by wolfgang ama...,-0.5,,"schikanederemanuel, aberthermann",-0.5,,eulenburg,-1.0,,,0.770833,die zauberflöte,"die zauberflöte, a german opera : köchel no. 620",-1.0,,,1.0,1,1


false_predicted_duplicates for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
1290,0,-1.0,,,-1.0,,,0.8,les arts florissants,arts florissants,-1.0,,,-1.0,,,0.75,1996aaaa,1996uuuu,0.428571,40100,40000,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,630.0,630.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.591919,wolfgang amadeus mozart ; libretto: emanuel sc...,mozart,0.775282,"mozartwolfgang amadeus, schikanederemanuel, ch...","christiewilliam, dessaynatalie, mannionrosa, b...",-0.5,,erato,-1.0,,,0.709588,"die zauberflöte, the magic flute : opera in tw...","die zauberflöte, kv 620",-1.0,,,0.777778,2,2 1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237
85901,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,20151475,20151475,1.0,20000,20000,1.0,bk,bk,1.0,"[88-7922-121-3, 978-88-7922-121-4]",[978-88-7922-121-4],-1.0,,,-1.0,,,1.0,1 1,1 1,1.0,petrarcafrancesco,petrarcafrancesco,1.0,francesco petrarca ; commento di bernardo lapini,francesco petrarca ; commento di bernardo lapini,1.0,lapinibernardo,lapinibernardo,-0.5,"adv, biblioteca cantonale di lugano",,-1.0,,,0.691756,"trionfi, riedizione accurata dell'incunabolo c...",trionfi,-1.0,,,0.703704,2,2 494 102
97551,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,20111991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.741667,volker schlöndorff ; nach dem roman von max fr...,volker schlöndorff,0.590909,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, delp...",0.777778,"suhrkamp, absolut medien",suhrkamp,-1.0,,,0.747967,homo faber,"homo faber, nach dem roman von max frisch",-1.0,,,1.0,1 117,1 117


false_predicted_duplicates for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollständigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberflöte, grosse oper in zwei aufzügen ...","die zauberflöte, grosse oper in zwei aufzügen ...",-1.0,,,1.0,1,1
7361,0,-1.0,,,-1.0,,,-0.5,interkantonale lehrmittelzentrale,,-1.0,,,-1.0,,,0.75,1981aaaa,1981uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-0.5,binderheidy,,0.74821,"heidy binder, sigrid kessler, charlotte ritsch...",heidy binder... [et al.] ; [éd.:] interkantona...,-0.5,"kesslersigrid, ritschardcharlotte",,0.963845,staatlicher lehrmittelverlag,staatlicher lehrmittelverl.,-1.0,,,0.921296,"bonne chance!, cours de langue française, prem...","bonne chance!, cours de langue française, prem...",-1.0,,,0.777778,4,413
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeaurené,pomeaurené,0.923077,flammarion,gf flammarion,-1.0,,,1.0,traité sur la tolérance,traité sur la tolérance,-1.0,,,1.0,192,192
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109


false_predicted_duplicates for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollständigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberflöte, grosse oper in zwei aufzügen ...","die zauberflöte, grosse oper in zwei aufzügen ...",-1.0,,,1.0,1,1
7361,0,-1.0,,,-1.0,,,-0.5,interkantonale lehrmittelzentrale,,-1.0,,,-1.0,,,0.75,1981aaaa,1981uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-0.5,binderheidy,,0.74821,"heidy binder, sigrid kessler, charlotte ritsch...",heidy binder... [et al.] ; [éd.:] interkantona...,-0.5,"kesslersigrid, ritschardcharlotte",,0.963845,staatlicher lehrmittelverlag,staatlicher lehrmittelverl.,-1.0,,,0.921296,"bonne chance!, cours de langue française, prem...","bonne chance!, cours de langue française, prem...",-1.0,,,0.777778,4,413
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeaurené,pomeaurené,0.923077,flammarion,gf flammarion,-1.0,,,1.0,traité sur la tolérance,traité sur la tolérance,-1.0,,,1.0,192,192
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
35061,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,aaaaaaaa,2014uuuu,0.428571,10200,40100,1.0,mu,mu,0.0,[],[978-3-944063-13-3],-1.0,,,-0.5,134.0,,-1.0,,,-0.5,mozartwolfgang amadeus,,-0.5,,"mit luca zamperoni ... [et al.] ; idee, text, ...",-0.5,,"zamperoniluca, braunrichard, petzoldbert alexa...",-1.0,,,-1.0,,,0.710992,"die zauberflöte, grosse oper in zwei aufzügen ...","wolfgang amadeus mozart - die zauberflöte, für...",0.77305,"die zauberflöte, ausgabe für gesang und klavier",die zauberflöte,0.75,1,1 72
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237
64145,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,13.0,0.625,1982aaaa,1988uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-499-17476-6],[3-499-17476-6],-1.0,,,-1.0,,,1.0,7476,7476,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.741633,wolfgang amadeus mozart ; hrsg. von attila csa...,hrsg. von attila csampai und dietmar holland,0.528309,csampaiattila,"schikanederemanuel, csampaiattila",1.0,rowohlt,rowohlt,-1.0,,,1.0,"die zauberflöte, texte, materialien, kommentare","die zauberflöte, texte, materialien, kommentare",-1.0,,,0.777778,282,281


false_predicted_duplicates for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57.0,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
7361,0,-1.0,,,-1.0,,,-0.5,interkantonale lehrmittelzentrale,,-1.0,,,-1.0,,,0.75,1981aaaa,1981uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-0.5,binderheidy,,0.74821,"heidy binder, sigrid kessler, charlotte ritsch...",heidy binder... [et al.] ; [éd.:] interkantona...,-0.5,"kesslersigrid, ritschardcharlotte",,0.963845,staatlicher lehrmittelverlag,staatlicher lehrmittelverl.,-1.0,,,0.921296,"bonne chance!, cours de langue française, prem...","bonne chance!, cours de langue française, prem...",-1.0,,,0.777778,4,413
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1.0,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237
70130,0,-1.0,,,-1.0,,,1.0,interkantonale lehrmittelzentrale,interkantonale lehrmittelzentrale,-1.0,,,-1.0,,,0.625,1981aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,0.579953,binderheidy,kesslersigrid,0.77778,"heidy binder, sigrid kessler, charlotte ritsch...","sigrid keller, caty laubscher, helen wallimann...",0.558939,"kesslersigrid, ritschardcharlotte","laubschercaty, wallimannhelen",1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.894141,"bonne chance!, cours de langue française, prem...","bonne chance!, cours de langue française, exig...",-1.0,,,1.0,4,4
82158,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,19001999,19441999,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,3714.0,4355.0,-0.5,12,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.69804,von w. a. mozart ; revidiert und mit einführun...,wolfgang amadeus mozart ; libretto by emanuel ...,-0.5,,"aberthermann, schikanederemanuel",-1.0,,,-1.0,,,0.770833,die zauberflöte,"die zauberflöte, the magic flute : opera : k 620",-1.0,,,0.733333,1,1 412


The output above was used to assess the similarity metrics applied during preprocessing of the attributes. In the end, the different preprocessing variants controlled by the global parameters $\texttt{exact}\_\texttt{date}$ and $\texttt{strip}\_\texttt{number}\_\texttt{digits}$, described in subsection [Runtime Parameters](#Runtime-Parameters), constitute this kind of model tuning. In the course of this capstone project, some effort went into analysing the wrong predictions of each model. Deciding on improvements to the similarity metrics turned out to be difficult, because modifying them changed the results above only slightly. The models are highly stable in their behaviour and results under the modifications tried: the characteristic numbers of the confusion matrix stay in the same order of magnitude when the similarity metric applied to an attribute is modified, although the specific records that are wrongly predicted may change. This stability is interpreted as a positive result of the project.
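
To make the notion of a similarity metric more concrete, the `exactDate_delta` values in the tables above are consistent with a position-wise comparison of the two eight-character date strings. The sketch below is a reconstruction from those values, not the project's actual implementation: the function name `exact_date_similarity` is hypothetical, and treating `a` and `u` as placeholder characters worth half a point is an assumption. How the real metric scores two identical placeholders cannot be inferred from the rows shown.

```python
def exact_date_similarity(date_x: str, date_y: str, placeholders: str = "au") -> float:
    """Position-wise similarity of two equally long date strings.

    Assumed scoring (inferred from the tables, not taken from the project code):
    1.0 for identical characters, 0.5 if either character is a placeholder,
    0.0 otherwise. The result is the mean score over all positions.
    """
    if not date_x or len(date_x) != len(date_y):
        raise ValueError("date strings must be non-empty and of equal length")

    score = 0.0
    for char_x, char_y in zip(date_x, date_y):
        if char_x in placeholders or char_y in placeholders:
            score += 0.5  # unknown digit: count as half a match
        elif char_x == char_y:
            score += 1.0  # known digits agree
    return score / len(date_x)


# Pairs taken from the exactDate_x / exactDate_y columns above
pairs = [
    ("2009aaaa", "2009uuuu"),
    ("1999aaaa", "1998uuuu"),
    ("aaaaaaaa", "1970uuuu"),
    ("20091990", "20081991"),
    ("20151475", "20151475"),
]
for date_x, date_y in pairs:
    print(date_x, date_y, exact_date_similarity(date_x, date_y))
```

Under these assumptions the five pairs score 0.75, 0.625, 0.5, 0.75 and 1.0, which matches the `exactDate_delta` column for the corresponding records.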

For completeness, the wrong predictions of all models for the remaining runs are shown below.

In [20]:
run_done = run + 1

# Rest of runs
for r in range(run_done, len(runtime_param_dict_list)):

    # Restore the wrong predictions stored for run r
    wrong_predictions = rsf.restore_dict_results(path_goldstandard, 'wrong_predictions_run_' + str(r) + '.pkl')

    print_wrongly_predicted_records(r, wrong_prediction_groups, wrong_predictions)
    print_union_intersection_wrongly_predicted_records(r, wrong_prediction_groups, wrong_predictions)

Run 1 - false_predicted_uniques 
*****

DecisionTreeClassifier has 18 false_predicted_uniques which corresponds to ...
 * 6.102% of records of class duplicate
 * 0.086% of all test data
The wrongly classified records have the index ...
 [103289, 103290, 103628, 103810, 103872, 103959, 104094, 104098, 104099, 104100, 104101, 104105, 104113, 104114, 104147, 104381, 104493, 104497]

DecisionTreeClassifier_CV has 8 false_predicted_uniques which corresponds to ...
 * 2.712% of records of class duplicate
 * 0.038% of all test data
The wrongly classified records have the index ...
 [103289, 103290, 104094, 104098, 104099, 104100, 104101, 104105]

RandomForestClassifier has 17 false_predicted_uniques which corresponds to ...
 * 5.763% of records of class duplicate
 * 0.081% of all test data
The wrongly classified records have the index ...
 [103289, 103290, 103653, 103810, 103813, 103872, 104094, 104098, 104099, 104100, 104101, 104105, 104110, 104113, 104114, 104493, 104497]

SVC has 28 false_
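
The index lists printed for run 1 can also be compared directly as sets. The following sketch is not the project's `print_union_intersection_wrongly_predicted_records` helper, whose implementation is not shown here; it only illustrates the idea with the three complete lists printed above. The variable names are hypothetical, and the count of 295 duplicate records is an inference from the reported percentages, not a figure stated in the output.

```python
# Indices of false_predicted_uniques for run 1, copied from the output above
false_uniques = {
    "DecisionTreeClassifier": [
        103289, 103290, 103628, 103810, 103872, 103959, 104094, 104098, 104099,
        104100, 104101, 104105, 104113, 104114, 104147, 104381, 104493, 104497,
    ],
    "DecisionTreeClassifier_CV": [
        103289, 103290, 104094, 104098, 104099, 104100, 104101, 104105,
    ],
    "RandomForestClassifier": [
        103289, 103290, 103653, 103810, 103813, 103872, 104094, 104098, 104099,
        104100, 104101, 104105, 104110, 104113, 104114, 104493, 104497,
    ],
}

index_sets = [set(indices) for indices in false_uniques.values()]

# Records that every listed model misclassifies
common = set.intersection(*index_sets)
# Records that at least one listed model misclassifies
union = set.union(*index_sets)

print("misclassified by all three models:", sorted(common))
print("misclassified by at least one model:", len(union))

# Assumption: 295 duplicate records in the test data, inferred from 18 / 6.102%
n_duplicates = 295
for model, indices in false_uniques.items():
    share = 100 * len(indices) / n_duplicates
    print(f"{model}: {share:.3f}% of records of class duplicate")
```

For these three models, the eight records missed by `DecisionTreeClassifier_CV` are also missed by the other two, which suggests a small core of duplicate pairs that is hard to recognise regardless of the model.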

In [21]:
# Rest of runs
for r in range(run_done, len(runtime_param_dict_list)):
    if r != 3: # Run 3 has been done with oversampled data. This data is not available here.

        # Restore the wrong predictions stored for run r
        wrong_predictions = rsf.restore_dict_results(path_goldstandard, 'wrong_predictions_run_' + str(r) + '.pkl')

        display_wrongly_predicted_records(r, wrong_prediction_groups, wrong_predictions, df_attribute_with_sim_feature)

Run 1 - false_predicted_uniques 
*****
false_predicted_uniques for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412.0,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1.0,412
103628,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1998aaaa,1998uuuu,1.0,10100,10100,1.0,cf,cf,1.0,[3-932992-42-3],[3-932992-42-3],-1.0,,,-1.0,,,0.8,42 42,42,1.0,frischmax,frischmax,1.0,max frisch,max frisch,-1.0,,,-0.5,terzio-verlag,,-1.0,,,0.70412,homo faber,"homo faber, originaltext, interpretation, biog...",-1.0,,,1.0,1.0,1
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einf√ºhrun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberfl√∂te, [daraus:] aria ""in diesen hei...",die zauberfl√∂te,-1.0,,,-0.5,,1
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, litt√©rature, spectacles : (j...",-1.0,,,-1.0,,
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,k√∂nemann,,-1.0,,,1.0,"die zauberfl√∂te, partitura","die zauberfl√∂te, partitura",-1.0,,,0.511111,225.0,1 225
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberfl√∂te, oper in zwei aufz√ºgen","die zauberfl√∂te, oper in zwei aufz√ºgen : volls...",-1.0,,,1.0,74.0,74
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberfl√∂te, oper in zwei aufz√ºgen","die zauberfl√∂te, oper in zwei aufz√ºgen : volls...",-1.0,,,1.0,74.0,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberfl√∂te, oper in zwei aufz√ºgen","die zauberfl√∂te, oper in zwei aufz√ºgen : volls...",-1.0,,,1.0,74.0,74
104100,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620 2620,2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,p. reclam jun.,,-1.0,,,0.881356,"die zauberfl√∂te, oper in zwei aufz√ºgen : volls...","die zauberfl√∂te, oper in zwei aufz√ºgen",-1.0,,,1.0,74.0,74


false_predicted_uniques for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104100,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620 2620,2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,p. reclam jun.,,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74
104101,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"schikanederemanuel, krusegeorg richard","krusegeorg richard, schikanederemanuel",0.809524,p. reclam jun.,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74
104105,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,wolfgang amadeus mozart ; dichtung von emanuel...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,philipp reclam,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74


false_predicted_uniques for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104100,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620 2620,2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,p. reclam jun.,,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74


false_predicted_uniques for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103635,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1998aaaa,1998uuuu,1.0,10100,10100,1.0,cf,cf,1.0,[3-932992-42-3],[3-932992-42-3],-1.0,,,-1.0,,,1.0,42 42,42 42,-0.5,,frischmax,1.0,max frisch,max frisch,-0.5,frischmax,,1.0,terzio-verlag,terzio-verlag,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1,1
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103729,1,-1.0,,,-1.0,,,1.0,schweizbundesamt für landestopografie,schweizbundesamt für landestopografie,-1.0,,,-1.0,,,0.75,2007aaaa,2007uuuu,1.0,10347,10347,1.0,mp,mp,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,bundesamt für landestopografie swisstopo,bundesamt für landestopografie swisstopo,1.0,dufourguillaume henri,dufourguillaume henri,-1.0,,,-1.0,,,1.0,"dufourkarten, topografische karte der schweiz","dufourkarten, topografische karte der schweiz",-1.0,,,1.0,2,2
103779,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,ein film von luc jacquet,ein film von luc jacquet,0.851282,"jacquetluc, du montsky, bohringerromane",jacquetluc,1.0,frenetic films,frenetic films,-1.0,,,0.742268,die reise der pinguine,"die reise der pinguine, die natur schreibt die...",-1.0,,,0.888889,1 82,1 82 2
103802,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,,99036.0,-1.0,,,-1.0,,,1.0,ein film von luc jacquet,ein film von luc jacquet,1.0,jacquetluc,jacquetluc,-0.5,frenetic films,,-1.0,,,0.742268,"die reise der pinguine, die natur schreibt die...",die reise der pinguine,-1.0,,,0.888889,1 82 2,1 82


false_predicted_uniques for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103823,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1950uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,4355.0,912.0,-0.5,,912 912,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.880856,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-0.5,,"aberthermann, schikanederemanuel",-0.5,,e. eulenburg,-1.0,,,0.833333,die zauberflöte,"die zauberflöte, köchel no 620",-1.0,,,0.733333,1,1 412
103867,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1471aaaa,1471uuuu,1.0,20053,20053,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,crescenzipietro de',crescenzipietro de',-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,ruralia commoda,ruralia commoda,-1.0,,,1.0,418,418


false_predicted_uniques for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103183,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2009aaaa,2009uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[978-3-15-020008-7],[978-3-15-020008-7],-1.0,,,-1.0,,,1.0,20008,20008,1.0,austenjane,austenjane,0.818905,jane austen ; aus dem englischen übersetzt von...,jane austen ; aus dem engl. übers. von ursula ...,-0.5,"grawechristian, graweursula",,0.848485,reclam jun.,reclam,-1.0,,,0.787879,"emma, roman",emma,-1.0,,,1.0,600,600
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103729,1,-1.0,,,-1.0,,,1.0,schweizbundesamt für landestopografie,schweizbundesamt für landestopografie,-1.0,,,-1.0,,,0.75,2007aaaa,2007uuuu,1.0,10347,10347,1.0,mp,mp,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,bundesamt für landestopografie swisstopo,bundesamt für landestopografie swisstopo,1.0,dufourguillaume henri,dufourguillaume henri,-1.0,,,-1.0,,,1.0,"dufourkarten, topografische karte der schweiz","dufourkarten, topografische karte der schweiz",-1.0,,,1.0,2,2
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103945,1,-1.0,,,-1.0,,,1.0,theater st. gallen,theater st. gallen,-1.0,,,-1.0,,,0.75,2002aaaa,2002uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,[red.: ute vollmar],[red.: ute vollmar],1.0,"mozartwolfgang amadeus, vollmarute","mozartwolfgang amadeus, vollmarute",1.0,theater st. gallen,theater st. gallen,-1.0,,,1.0,"die zauberflöte, [grosse oper in zwei aufzügen...","die zauberflöte, [grosse oper in zwei aufzügen...",-1.0,,,1.0,80,80


Run 1 - false_predicted_duplicates 
*****
false_predicted_duplicates for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
12579,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,8.0,8.0,0.75,2011aaaa,2011uuuu,0.428571,20000,20053,1.0,bk,bk,0.0,"[978-3-642-16480-4, 3-642-16480-3, 978-3-642-1...",[978-3-642-16481-1],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,erland erdmann (hrsg.),erland erdmann (hrsg.),1.0,erdmannerland,erdmannerland,-0.5,springer,,-1.0,,,1.0,"klinische kardiologie, krankheiten des herzens...","klinische kardiologie, krankheiten des herzens...",-1.0,,,0.0,607,1 607
15850,0,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.625,1991aaaa,1995uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,433.0,66467243.0,-0.5,,3.0,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.432367,mozart,wolfgang amadeus mozart,-0.5,"heilmannuwe, josumi, krausmichael, leitnerlott...",,0.527273,decca,emi records,-1.0,,,0.851852,die zauberflöte,"die zauberflöte, highlights",-1.0,,,0.0,2,1 72
16809,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1988aaaa,1995uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,66467243.0,-0.5,,3.0,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.432367,mozart,wolfgang amadeus mozart,-0.5,harnoncourtnikolaus,,0.590909,teldec,emi records,-1.0,,,0.851852,die zauberflöte,"die zauberflöte, highlights",-1.0,,,0.0,2,1 72
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
24027,0,-1.0,,,-1.0,,,-0.5,"opernhaus (zürich)orchester, opernhaus (zürich...",,-1.0,,,-1.0,,,0.625,1988aaaa,1980uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.333333,242.0,410.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.720721,mozart ; [livret emanuel schikaneder],mozart,-0.5,"harnoncourtnikolaus, schikanederemanuel",,0.539683,teldec,polydor,-1.0,,,0.701346,"die zauberflöte, kv 620","die zauberflöte, the magic flute : oper in zwe...",-1.0,,,0.692308,2 73 17 70 26,3
27616,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1932aaaa,1955uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,10425.0,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.782064,w. a. mozart ; klavierauszug nach dem in der p...,[musik] von w. a. mozart ; klavierauszug nach ...,-0.5,,"zallingermeinhard von, schikanederemanuel",-0.5,,peters,-1.0,,,0.92029,"die zauberflöte, oper in 2 aufzügen : [kv 620]","die zauberflöte, oper in 2 aufzügen",-1.0,,,1.0,1,1
35050,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19aaaaaa,1880uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,245.0,5944.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.571429,von emanuel schikaneder ; [musik von] wolfgang...,von w. a. mozart,-1.0,,,-0.5,,peters,-1.0,,,0.912726,"die zauberflöte, oper in zwei akten [kv 620] =...","die zauberflöte, oper in zwei akten = il flaut...",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,1.0,1,1
37143,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1814uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,422.0,1092.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart,-0.5,,schikanederemanuel,-0.5,,simrock,-1.0,,,0.831929,"die zauberflöte, an opera in two acts","die zauberflöte, grosse oper in zwei akten",-0.5,die zauberflöte,,1.0,1,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1.0,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117


false_predicted_duplicates for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollständigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberflöte, grosse oper in zwei aufzügen ...","die zauberflöte, grosse oper in zwei aufzügen ...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
27616,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1932aaaa,1955uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,10425.0,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.782064,w. a. mozart ; klavierauszug nach dem in der p...,[musik] von w. a. mozart ; klavierauszug nach ...,-0.5,,"zallingermeinhard von, schikanederemanuel",-0.5,,peters,-1.0,,,0.92029,"die zauberflöte, oper in 2 aufzügen : [kv 620]","die zauberflöte, oper in 2 aufzügen",-1.0,,,1.0,1,1
33467,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1979uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-0.5,m200205343,,-0.5,3714.0,,-0.5,,7,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.761299,wolfgang amadeus mozart ; ed. by hermann abert,wolfgang amadeus mozart ; text von emanuel sch...,-0.5,"mozartwolfgang amadeus, aberthermann",,-0.5,e. eulenburg,,-1.0,,,0.738987,"die zauberflöte, the magic flute : overture to...","die zauberflöte, kv 620 : eine deutsche oper i...",-1.0,,,-0.5,1 32,
35050,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19aaaaaa,1880uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,245.0,5944.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.571429,von emanuel schikaneder ; [musik von] wolfgang...,von w. a. mozart,-1.0,,,-0.5,,peters,-1.0,,,0.912726,"die zauberflöte, oper in zwei akten [kv 620] =...","die zauberflöte, oper in zwei akten = il flaut...",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,1.0,1,1
37143,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1814uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,422.0,1092.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart,-0.5,,schikanederemanuel,-0.5,,simrock,-1.0,,,0.831929,"die zauberflöte, an opera in two acts","die zauberflöte, grosse oper in zwei akten",-0.5,die zauberflöte,,1.0,1,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
59061,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.6875,2009aaaa,200uuuuu,1.0,20000,20000,1.0,bk,bk,0.0,[978-3-15-020008-7],[978-0-14-062010-8],-1.0,,,-1.0,,,-0.5,20008,,1.0,austenjane,austenjane,0.69774,jane austen ; aus dem englischen übersetzt von...,jane austen,-0.5,"grawechristian, graweursula",,0.473776,reclam jun.,penguin books,-1.0,,,0.787879,"emma, roman",emma,-1.0,,,0.0,600,367
73144,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,,99064.0,-1.0,,,-1.0,,,1.0,ein film von luc jacquet,ein film von luc jacquet,1.0,jacquetluc,jacquetluc,-1.0,,,-1.0,,,1.0,die reise der pinguine,die reise der pinguine,-1.0,,,0.597222,1 82,2 82 230


false_predicted_duplicates for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
1290,0,-1.0,,,-1.0,,,0.8,les arts florissants,arts florissants,-1.0,,,-1.0,,,0.75,1996aaaa,1996uuuu,0.428571,40100,40000,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,630.0,630.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.591919,wolfgang amadeus mozart ; libretto: emanuel sc...,mozart,0.775282,"mozartwolfgang amadeus, schikanederemanuel, ch...","christiewilliam, dessaynatalie, mannionrosa, b...",-0.5,,erato,-1.0,,,0.709588,"die zauberflöte, the magic flute : opera in tw...","die zauberflöte, kv 620",-1.0,,,0.777778,2,2 1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237
85901,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,20151475,20151475,1.0,20000,20000,1.0,bk,bk,1.0,"[88-7922-121-3, 978-88-7922-121-4]",[978-88-7922-121-4],-1.0,,,-1.0,,,1.0,1 1,1 1,1.0,petrarcafrancesco,petrarcafrancesco,1.0,francesco petrarca ; commento di bernardo lapini,francesco petrarca ; commento di bernardo lapini,1.0,lapinibernardo,lapinibernardo,-0.5,"adv, biblioteca cantonale di lugano",,-1.0,,,0.691756,"trionfi, riedizione accurata dell'incunabolo c...",trionfi,-1.0,,,0.703704,2,2 494 102
97551,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,20111991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.741667,volker schlöndorff ; nach dem roman von max fr...,volker schlöndorff,0.590909,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, delp...",0.777778,"suhrkamp, absolut medien",suhrkamp,-1.0,,,0.747967,homo faber,"homo faber, nach dem roman von max frisch",-1.0,,,1.0,1 117,1 117


false_predicted_duplicates for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
6037,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1994aaaa,1995uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.5,6646.0,66467243.0,-0.5,,3,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,1.0,emi records,emi records,-1.0,,,0.851852,die zauberflöte,"die zauberflöte, highlights",-1.0,,,0.464286,3 64 71,1 72
8803,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,18uuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,1.0,2620 5,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.698207,von mozart ; dichtung von emanuel schikaneder ...,von w.a. mozart ; dichtung nach ludwig gieseck...,0.68797,"krusegeorg richard, schikanederemanuel","schikanederemanuel, wittmanncarl friedrich",1.0,reclam,reclam,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen",-1.0,,,0.0,74,92
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeaurené,pomeaurené,0.923077,flammarion,gf flammarion,-1.0,,,1.0,traité sur la tolérance,traité sur la tolérance,-1.0,,,1.0,192,192
16881,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.625,1990aaaa,19972000,1.0,20000,20000,1.0,bk,bk,0.0,[],"[3-906721-31-0 (Cahier d'exercices), 3-906721-...",-1.0,,,-1.0,,,-0.5,,1,-1.0,,,0.539447,sigrid kessler ... [et al.],"[sigrid kessler, charlotte ritschard, helen wa...",0.853061,kesslersigrid,"kesslersigrid, ritschardcharlotte, wallimannhelen",1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.781053,"bonne chance!, cours de langue française : tro...","bonne chance!, cours de langue française, étap...",-1.0,,,-1.0,,
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
19991,0,-1.0,,,-1.0,,,0.032787,"philharmonia chorus (london), philharmonia orc...",münchner bläserakademie,-1.0,,,-1.0,,,0.5,aaaaaaaa,1984uuuu,1.0,40000,40000,1.0,mu,mu,1.0,[],[],-1.0,,,0.333333,63.0,92.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.82,wolfgang amadeus mozart,wolfgang amadeus mozart ; arr. joseph heindenr...,-0.5,"frickgottlob, geddanicolai, janowitzgundula, k...",,0.0,emi,orfeo,-1.0,,,0.789259,"die zauberflöte, grosser querschnitt","die zauberflöte, harmoniemusik",-1.0,,,1.0,1 33,1 33
26177,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1879uuuu,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,620.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-1.0,,,-1.0,,,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,1 188,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
41508,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1764aaaa,1961uuuu,0.428571,20000,20053,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,voltaire,voltaire,-0.5,,voltaire,-1.0,,,0.409297,[s.n.],"in melanges, ed. van den heuvel. paris, gallimard",-1.0,,,1.0,traité sur la tolérance,traité sur la tolérance,-1.0,,,0.777778,191,1
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145


false_predicted_duplicates for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollständigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberflöte, grosse oper in zwei aufzügen ...","die zauberflöte, grosse oper in zwei aufzügen ...",-1.0,,,1.0,1,1
8803,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,18uuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,1.0,2620 5,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.698207,von mozart ; dichtung von emanuel schikaneder ...,von w.a. mozart ; dichtung nach ludwig gieseck...,0.68797,"krusegeorg richard, schikanederemanuel","schikanederemanuel, wittmanncarl friedrich",1.0,reclam,reclam,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen",-1.0,,,0.0,74,92
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeaurené,pomeaurené,0.923077,flammarion,gf flammarion,-1.0,,,1.0,traité sur la tolérance,traité sur la tolérance,-1.0,,,1.0,192,192
16881,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.625,1990aaaa,19972000,1.0,20000,20000,1.0,bk,bk,0.0,[],"[3-906721-31-0 (Cahier d'exercices), 3-906721-...",-1.0,,,-1.0,,,-0.5,,1,-1.0,,,0.539447,sigrid kessler ... [et al.],"[sigrid kessler, charlotte ritschard, helen wa...",0.853061,kesslersigrid,"kesslersigrid, ritschardcharlotte, wallimannhelen",1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.781053,"bonne chance!, cours de langue française : tro...","bonne chance!, cours de langue française, étap...",-1.0,,,-1.0,,
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
54588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1977aaaa,19uuuuuu,0.428571,10000,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,12780.0,-0.5,,38,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.623769,[musik von] w.a. mozart ; [libretto von] e. sc...,mozart,-0.5,"schikanederemanuel, moszkowiczimo",,-0.5,,litolff,-1.0,,,0.759596,die zauberflöte,zauberflöte,-1.0,,,-0.5,,1
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109


false_predicted_duplicates for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
6037,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1994aaaa,1995uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.5,6646.0,66467243.0,-0.5,,3,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,1.0,emi records,emi records,-1.0,,,0.851852,die zauberflöte,"die zauberflöte, highlights",-1.0,,,0.464286,3 64 71,1 72
17016,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001950,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.728388,von emanuel schikaneder ; [musik von] wolfgang...,von emanuel schikaneder ; musik von w. a. moza...,0.62896,"kienzlwilhelm, schikanederemanuel","aberthermann, schikanederemanuel, mozartwolfga...",-0.5,,e. eulenburg,-1.0,,,0.737566,"die zauberflöte (il flauto magico), oper in zw...","die zauberflöte, eine deutsche oper",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,0.733333,1,1 412
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237


Run 2 - false_predicted_uniques
*****
false_predicted_uniques for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,könemann,,-1.0,,,1.0,"die zauberflöte, partitura","die zauberflöte, partitura",-1.0,,,0.511111,225,1 225
104086,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1880aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,21,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.761014,von emanuel schikaneder ; musik von w.a. mozart,von emanuel schikaneder [und j.g.k.l. giesecke],0.892308,schikanederemanuel,"schikanederemanuel, gieseckecarl ludwig",-0.5,breitkopf & härtel,,-1.0,,,0.989744,"die zauberflöte, il flauto magico : deutsche o...","die zauberflöte, (il flauto magico) : deutsche...",-1.0,,,0.0,34,1
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74


false_predicted_uniques for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412.0,1.0
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1.0,412.0
103816,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,10425.0,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-1.0,,,1.0,soldankurt,soldankurt,-1.0,,,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen : klavi...","die zauberflöte, oper in zwei aufzügen : klavi...",1.0,"die zauberflöte, ausgabe für gesang und klavier","die zauberflöte, ausgabe für gesang und klavier",1.0,1.0,1.0
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104100,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620 2620,2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,p. reclam jun.,,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104101,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"schikanederemanuel, krusegeorg richard","krusegeorg richard, schikanederemanuel",0.809524,p. reclam jun.,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104105,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,wolfgang amadeus mozart ; dichtung von emanuel...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,philipp reclam,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104113,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19739999,0.428571,30600,30500,1.0,cr,cr,0.0,[],[0335-1793],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,libération,-1.0,,,0.933333,liberation,libération,-1.0,,,-1.0,,


false_predicted_uniques for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,könemann,,-1.0,,,1.0,"die zauberflöte, partitura","die zauberflöte, partitura",-1.0,,,0.511111,225,1 225
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104101,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"schikanederemanuel, krusegeorg richard","krusegeorg richard, schikanederemanuel",0.809524,p. reclam jun.,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74
104105,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,wolfgang amadeus mozart ; dichtung von emanuel...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,philipp reclam,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74


false_predicted_uniques for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103802,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,,99036.0,-1.0,,,-1.0,,,1.0,ein film von luc jacquet,ein film von luc jacquet,1.0,jacquetluc,jacquetluc,-0.5,frenetic films,,-1.0,,,0.742268,"die reise der pinguine, die natur schreibt die...",die reise der pinguine,-1.0,,,0.888889,1 82 2,1 82
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103823,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1950uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,4355.0,912.0,-0.5,,912 912,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.880856,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-0.5,,"aberthermann, schikanederemanuel",-0.5,,e. eulenburg,-1.0,,,0.833333,die zauberflöte,"die zauberflöte, köchel no 620",-1.0,,,0.733333,1,1 412


false_predicted_uniques for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103656,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1900uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.909163,von w.a. mozart ; hrsg. von kurt soldan,von w.a. mozart ; ... hrsg. von kurt soldan,-0.5,,"soldankurt, mozartwolfgang amadeus",-0.5,,c.f. peters,-1.0,,,0.901235,"die zauberflöte, oper in zwei aufzügen : klavi...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,1 188,1 188
103729,1,-1.0,,,-1.0,,,1.0,schweizbundesamt für landestopografie,schweizbundesamt für landestopografie,-1.0,,,-1.0,,,0.75,2007aaaa,2007uuuu,1.0,10347,10347,1.0,mp,mp,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,bundesamt für landestopografie swisstopo,bundesamt für landestopografie swisstopo,1.0,dufourguillaume henri,dufourguillaume henri,-1.0,,,-1.0,,,1.0,"dufourkarten, topografische karte der schweiz","dufourkarten, topografische karte der schweiz",-1.0,,,1.0,2,2
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1


false_predicted_uniques for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103867,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1471aaaa,1471uuuu,1.0,20053,20053,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,crescenzipietro de',crescenzipietro de',-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,ruralia commoda,ruralia commoda,-1.0,,,1.0,418,418
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
104086,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1880aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,21,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.761014,von emanuel schikaneder ; musik von w.a. mozart,von emanuel schikaneder [und j.g.k.l. giesecke],0.892308,schikanederemanuel,"schikanederemanuel, gieseckecarl ludwig",-0.5,breitkopf & härtel,,-1.0,,,0.989744,"die zauberflöte, il flauto magico : deutsche o...","die zauberflöte, (il flauto magico) : deutsche...",-1.0,,,0.0,34,1


Run 2 - false_predicted_duplicates 
*****
false_predicted_duplicates for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
1224,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,20121970,0.428571,10200,10000,1.0,mu,mu,1.0,[],[],-0.5,,9790006201334.0,-0.5,,155.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.664665,von w.a. mozart ; klavierauszug nach dem in de...,w. a. mozart ; herausgegeben von/edited by ger...,-0.5,,"schikanederemanuel, grubergernot, orelalfred, ...",-1.0,,,-1.0,,,0.752098,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, kv 620",-1.0,,,0.6,1 188,1 370
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
25401,0,-1.0,,,-1.0,,,0.095238,les arts florissants,wiener philharmoniker,-1.0,,,-1.0,,,0.25,1996aaaa,2004uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,630.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,wolfgang amadeus mozart ; libretto: emanuel sc...,,0.559259,"mozartwolfgang amadeus, schikanederemanuel, ch...",karajanherbert von,-0.5,,membran,-1.0,,,0.762821,"die zauberflöte, the magic flute : opera in tw...",die zauberflöte,-1.0,,,1.0,2,2
35050,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19aaaaaa,1880uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,245.0,5944.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.571429,von emanuel schikaneder ; [musik von] wolfgang...,von w. a. mozart,-1.0,,,-0.5,,peters,-1.0,,,0.912726,"die zauberflöte, oper in zwei akten [kv 620] =...","die zauberflöte, oper in zwei akten = il flaut...",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,1.0,1,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
47454,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,18aaaaaa,1875uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,491.0,0.833333,2,23,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w.a. mozart,-0.5,,radecke,-0.5,,holle,-1.0,,,0.538961,"die zauberflöte, grosse oper in zwei aufzügen ...",zauberflöte,-1.0,,,-0.5,,1
52061,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3884.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.52749,"die zauberflöte, [daraus:] aria ""in diesen hei...",zauberflöte-ouvertüre,-1.0,,,-0.5,,1
55511,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,192uuuuu,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3884.0,-0.5,,14,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.637904,von w.a. mozart ; klavierauszug neu rev. von w...,w.a. mozart,-1.0,,,-1.0,,,-1.0,,,0.530399,"die zauberflöte, oper in 2 akten = il flauto m...","zauberflöte-ouvertüre, [kv 620] : [1791]",-1.0,,,0.783333,1 167,1 36
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237


false_predicted_duplicates for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57.0,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
1222,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,19601969,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,245.0,4355.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.694948,von emanuel schikaneder ; [musik von] wolfgang...,by emanuel schikaneder ; music by wolfgang ama...,-0.5,,"schikanederemanuel, aberthermann",-0.5,,eulenburg,-1.0,,,0.747011,"die zauberflöte, oper in zwei akten [kv 620] =...","die zauberflöte, a german opera : köchel no. 620",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,1.0,1,1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
22615,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1955uuuu,0.111111,10200,40000,1.0,mu,mu,1.0,[],[],-1.0,,,0.142857,245.0,4357412.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.573413,von emanuel schikaneder ; [musik von] wolfgang...,mozart,-0.5,,mozartwolfgang amadeus,-0.5,,polydor,-1.0,,,0.74143,"die zauberflöte, oper in zwei akten [kv 620] =...",die zauberflöte […],-0.5,"die zauberflöte, ausgabe für gesang und klavier",,0.777778,1,1 2
36341,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,181uuuuu,0.428571,10200,10000,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,245.0,117.0,-0.5,,2.0,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.715134,von emanuel schikaneder ; [musik von] wolfgang...,von w. a. mozart ; in vollständigem clavieraus...,0.681818,"kienzlwilhelm, schikanederemanuel",schikanederemanuel,-0.5,,heckel,-1.0,,,0.752891,"die zauberflöte (il flauto magico), oper in zw...","die zauberflöte, grosse oper in zwei aufzügen",-0.5,"die zauberflöte, ausgabe für gesang und klavier",,1.0,1,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1.0,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
56493,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.75,1990aaaa,1990uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,,4.0,-1.0,,,0.691477,sigrid kessler ... [et al.],sigrid kessler... [et al.] ; [éd.:] interkanto...,-0.5,kesslersigrid,,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.842656,"bonne chance!, cours de langue française : tro...","bonne chance!, cours de langue française, deux...",-1.0,,,-0.5,,589
58382,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1970uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.596141,von emanuel schikaneder [und j.g.k.l. giesecke],wolfgang amadeus mozart ; nacherzählt von ingr...,0.659968,"schikanederemanuel, gieseckecarl ludwig","weixelbaumeringrid, riera rochasroque",-0.5,,ueberreuter,-1.0,,,0.74359,"die zauberflöte, (il flauto magico) : deutsche...",die zauberflöte,-1.0,,,1.0,1,1
59185,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,19001999,19601969,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,3714.0,4355.0,-0.5,12,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.654449,von w. a. mozart ; revidiert und mit einführun...,by emanuel schikaneder ; music by wolfgang ama...,-0.5,,"schikanederemanuel, aberthermann",-0.5,,eulenburg,-1.0,,,0.770833,die zauberflöte,"die zauberflöte, a german opera : köchel no. 620",-1.0,,,1.0,1,1


false_predicted_duplicates for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
1290,0,-1.0,,,-1.0,,,0.8,les arts florissants,arts florissants,-1.0,,,-1.0,,,0.75,1996aaaa,1996uuuu,0.428571,40100,40000,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,630.0,630.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.591919,wolfgang amadeus mozart ; libretto: emanuel sc...,mozart,0.775282,"mozartwolfgang amadeus, schikanederemanuel, ch...","christiewilliam, dessaynatalie, mannionrosa, b...",-0.5,,erato,-1.0,,,0.709588,"die zauberflöte, the magic flute : opera in tw...","die zauberflöte, kv 620",-1.0,,,0.777778,2,2 1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237
85901,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,20151475,20151475,1.0,20000,20000,1.0,bk,bk,1.0,"[88-7922-121-3, 978-88-7922-121-4]",[978-88-7922-121-4],-1.0,,,-1.0,,,1.0,1 1,1 1,1.0,petrarcafrancesco,petrarcafrancesco,1.0,francesco petrarca ; commento di bernardo lapini,francesco petrarca ; commento di bernardo lapini,1.0,lapinibernardo,lapinibernardo,-0.5,"adv, biblioteca cantonale di lugano",,-1.0,,,0.691756,"trionfi, riedizione accurata dell'incunabolo c...",trionfi,-1.0,,,0.703704,2,2 494 102
97551,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,20111991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.741667,volker schlöndorff ; nach dem roman von max fr...,volker schlöndorff,0.590909,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, delp...",0.777778,"suhrkamp, absolut medien",suhrkamp,-1.0,,,0.747967,homo faber,"homo faber, nach dem roman von max frisch",-1.0,,,1.0,1 117,1 117


false_predicted_duplicates for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57.0,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollständigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberflöte, grosse oper in zwei aufzügen ...","die zauberflöte, grosse oper in zwei aufzügen ...",-1.0,,,1.0,1,1
7361,0,-1.0,,,-1.0,,,-0.5,interkantonale lehrmittelzentrale,,-1.0,,,-1.0,,,0.75,1981aaaa,1981uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-0.5,binderheidy,,0.74821,"heidy binder, sigrid kessler, charlotte ritsch...",heidy binder... [et al.] ; [éd.:] interkantona...,-0.5,"kesslersigrid, ritschardcharlotte",,0.963845,staatlicher lehrmittelverlag,staatlicher lehrmittelverl.,-1.0,,,0.921296,"bonne chance!, cours de langue française, prem...","bonne chance!, cours de langue française, prem...",-1.0,,,0.777778,4,413
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1.0,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237


false_predicted_duplicates for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeaurené,pomeaurené,0.923077,flammarion,gf flammarion,-1.0,,,1.0,traité sur la tolérance,traité sur la tolérance,-1.0,,,1.0,192,192
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237
70130,0,-1.0,,,-1.0,,,1.0,interkantonale lehrmittelzentrale,interkantonale lehrmittelzentrale,-1.0,,,-1.0,,,0.625,1981aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,0.579953,binderheidy,kesslersigrid,0.77778,"heidy binder, sigrid kessler, charlotte ritsch...","sigrid keller, caty laubscher, helen wallimann...",0.558939,"kesslersigrid, ritschardcharlotte","laubschercaty, wallimannhelen",1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.894141,"bonne chance!, cours de langue française, prem...","bonne chance!, cours de langue française, exig...",-1.0,,,1.0,4,4


false_predicted_duplicates for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
1290,0,-1.0,,,-1.0,,,0.8,les arts florissants,arts florissants,-1.0,,,-1.0,,,0.75,1996aaaa,1996uuuu,0.428571,40100,40000,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,630.0,630.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.591919,wolfgang amadeus mozart ; libretto: emanuel sc...,mozart,0.775282,"mozartwolfgang amadeus, schikanederemanuel, ch...","christiewilliam, dessaynatalie, mannionrosa, b...",-0.5,,erato,-1.0,,,0.709588,"die zauberflöte, the magic flute : opera in tw...","die zauberflöte, kv 620",-1.0,,,0.777778,2,2 1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollständigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberflöte, grosse oper in zwei aufzügen ...","die zauberflöte, grosse oper in zwei aufzügen ...",-1.0,,,1.0,1,1
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeaurené,pomeaurené,0.923077,flammarion,gf flammarion,-1.0,,,1.0,traité sur la tolérance,traité sur la tolérance,-1.0,,,1.0,192,192
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue française prem...","bonne chance !, cours de langue française prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schlöndorff ; nach dem rom...,ein film von volker schlöndorff ; nach dem rom...,0.871796,"schlöndorffvolker, wurlitzerrudy, frischmax, s...","schlöndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [éd.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue française, troi...","bonne chance !, cours de langue française troi...",-1.0,,,1.0,237,237


Run 4 - false_predicted_uniques 
*****
false_predicted_uniques for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412.0,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1.0,412
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103816,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,10425.0,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-1.0,,,1.0,soldankurt,soldankurt,-1.0,,,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen : klavi...","die zauberflöte, oper in zwei aufzügen : klavi...",1.0,"die zauberflöte, ausgabe für gesang und klavier","die zauberflöte, ausgabe für gesang und klavier",1.0,1.0,1
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,könemann,,-1.0,,,1.0,"die zauberflöte, partitura","die zauberflöte, partitura",-1.0,,,0.511111,225.0,1 225
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74
104100,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620 2620,2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,p. reclam jun.,,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74


false_predicted_uniques for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412.0,1.0
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1.0,412.0
103816,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,10425.0,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-1.0,,,1.0,soldankurt,soldankurt,-1.0,,,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen : klavi...","die zauberflöte, oper in zwei aufzügen : klavi...",1.0,"die zauberflöte, ausgabe für gesang und klavier","die zauberflöte, ausgabe für gesang und klavier",1.0,1.0,1.0
104094,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74.0,74.0
104100,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.814815,2620 2620,2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.858072,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"schikanederemanuel, krusegeorg richard","schikanederemanuel, krusegeorg richard",-0.5,p. reclam jun.,,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104101,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"schikanederemanuel, krusegeorg richard","krusegeorg richard, schikanederemanuel",0.809524,p. reclam jun.,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104105,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,wolfgang amadeus mozart ; dichtung von emanuel...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,philipp reclam,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74.0,74.0
104113,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19739999,0.428571,30600,30500,1.0,cr,cr,0.0,[],[0335-1793],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,libération,-1.0,,,0.933333,liberation,libération,-1.0,,,-1.0,,


false_predicted_uniques for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103872,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19451966,0.428571,30300,30700,1.0,cr,cr,0.0,[],[1144-1321],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,idealibri,,-1.0,,,0.681481,arts,"arts, beaux-arts, littérature, spectacles : (j...",-1.0,,,-1.0,,
103959,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1993aaaa,1993uuuu,0.428571,10100,10000,1.0,mu,mu,1.0,[963-8303-08-5],[963-8303-08-5],-1.0,,,-0.5,,1004.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,-1.0,,,-0.5,könemann,,-1.0,,,1.0,"die zauberflöte, partitura","die zauberflöte, partitura",-1.0,,,0.511111,225,1 225
104098,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1941uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"krusegeorg richard, schikanederemanuel","schikanederemanuel, krusegeorg richard",0.809524,reclam,p. reclam jun.,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104099,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 5,2620 2620,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,von mozart ; dichtung von emanuel schikaneder ...,wolfgang amadeus mozart ; dichtung von emanuel...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,reclam,philipp reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : volls...",-1.0,,,1.0,74,74
104101,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1941aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.839556,von mozart ; dichtung von emanuel schikaneder ...,von mozart ; dichtung von emanuel schikaneder ...,0.771292,"schikanederemanuel, krusegeorg richard","krusegeorg richard, schikanederemanuel",0.809524,p. reclam jun.,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74
104105,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.796296,2620 2620,2620 5,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.770017,wolfgang amadeus mozart ; dichtung von emanuel...,von mozart ; dichtung von emanuel schikaneder ...,1.0,"krusegeorg richard, schikanederemanuel","krusegeorg richard, schikanederemanuel",0.412698,philipp reclam,reclam,-1.0,,,0.881356,"die zauberflöte, oper in zwei aufzügen : volls...","die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,74,74


false_predicted_uniques for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103802,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,,99036.0,-1.0,,,-1.0,,,1.0,ein film von luc jacquet,ein film von luc jacquet,1.0,jacquetluc,jacquetluc,-0.5,frenetic films,,-1.0,,,0.742268,"die reise der pinguine, die natur schreibt die...",die reise der pinguine,-1.0,,,0.888889,1 82 2,1 82
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103821,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,19uuuuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4355.0,4355.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-1.0,,,-1.0,,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,1.0,1,1
103823,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1950uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,4355.0,912.0,-0.5,,912 912,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.880856,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-0.5,,"aberthermann, schikanederemanuel",-0.5,,e. eulenburg,-1.0,,,0.833333,die zauberflöte,"die zauberflöte, köchel no 620",-1.0,,,0.733333,1,1 412


false_predicted_uniques for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103813,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,von w.a. mozart ; klavierauszug nach dem in de...,,-0.5,,soldankurt,-1.0,,,-1.0,,,0.794613,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen : klavi...",-0.5,,"die zauberflöte, ausgabe für gesang und klavier",0.733333,1 188,1
103823,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1950uuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,4355.0,912.0,-0.5,,912 912,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.880856,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-0.5,,"aberthermann, schikanederemanuel",-0.5,,e. eulenburg,-1.0,,,0.833333,die zauberflöte,"die zauberflöte, köchel no 620",-1.0,,,0.733333,1,1 412
103867,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1471aaaa,1471uuuu,1.0,20053,20053,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,crescenzipietro de',crescenzipietro de',-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,ruralia commoda,ruralia commoda,-1.0,,,1.0,418,418


false_predicted_uniques for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
103289,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1920uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,w. a. mozart ; oper in 2 akten v. emanuel schi...,-0.5,,schikanederemanuel,-0.5,ernst eulenburg,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,412,1
103290,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1920aaaa,uuuuuuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,w. a. mozart ; oper in 2 akten v. emanuel schi...,,-0.5,schikanederemanuel,,-0.5,,ernst eulenburg,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,0.0,1,412
103566,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2005aaaa,2005uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[0-521-82437-0],"[978-0-521-82437-8, 0-521-82437-0]",-1.0,,,-1.0,,,-1.0,,,1.0,austenjane,austenjane,1.0,jane austen ; ed. by richard cronin ... [et al.],jane austen ; ed. by richard cronin ... [et al.],-0.5,,croninrichard,-1.0,,,-1.0,,,1.0,emma,emma,-1.0,,,0.0,599,600
103609,1,-1.0,,,-1.0,,,-0.5,,wiener philharmoniker,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,171433.0,433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,von emanuel schikaneder ; wolfgang amadeus mozart,mozart,0.562408,schikanederemanuel,"heilmannuwe, josumi, krausmichael, leitnerlott...",0.805556,decca record,decca,-1.0,,,0.798246,"die zauberflöte, oper in zwei aufzügen",die zauberflöte,-1.0,,,0.733333,2 152,2
103614,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433210.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.713536,wolfgang amadeus mozart ; wiener philharmonike...,von emanuel schikaneder ; wolfgang amadeus mozart,0.587407,soltigeorg,schikanederemanuel,-0.5,,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,1.0,2 152,2 152
103618,1,-1.0,,,-1.0,,,-0.5,wiener philharmoniker,,-1.0,,,-1.0,,,0.75,1991aaaa,1991uuuu,1.0,40100,40100,1.0,mu,mu,1.0,[],[],-1.0,,,0.428571,433.0,171433.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.499433,mozart,von emanuel schikaneder ; wolfgang amadeus mozart,0.562408,"heilmannuwe, josumi, krausmichael, leitnerlott...",schikanederemanuel,0.805556,decca,decca record,-1.0,,,0.798246,die zauberflöte,"die zauberflöte, oper in zwei aufzügen",-1.0,,,0.733333,2,2 152
103653,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,uuuuuuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.67731,von emanuel schikaneder ; [musik von] wolfgang...,von w.a. mozart ; klavierauszug neu rev. von w...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.854023,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in 2 akten = il flauto m...",-1.0,,,1.0,1 167,1 167
103729,1,-1.0,,,-1.0,,,1.0,schweizbundesamt für landestopografie,schweizbundesamt für landestopografie,-1.0,,,-1.0,,,0.75,2007aaaa,2007uuuu,1.0,10347,10347,1.0,mp,mp,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,bundesamt für landestopografie swisstopo,bundesamt für landestopografie swisstopo,1.0,dufourguillaume henri,dufourguillaume henri,-1.0,,,-1.0,,,1.0,"dufourkarten, topografische karte der schweiz","dufourkarten, topografische karte der schweiz",-1.0,,,1.0,2,2
103810,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001999,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,3714.0,-0.5,,12,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,-0.5,,von w. a. mozart ; revidiert und mit einführun...,-1.0,,,-1.0,,,-1.0,,,0.747312,"die zauberflöte, [daraus:] aria ""in diesen hei...",die zauberflöte,-1.0,,,-0.5,,1
103821,1,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,19uuuuuu,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4355.0,4355.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,a german opera by emanuel schikaneder ; music ...,a german opera by emanuel schikaneder ; music ...,-1.0,,,-1.0,,,-1.0,,,1.0,die zauberflöte,die zauberflöte,-1.0,,,1.0,1,1


Run 4 - false_predicted_duplicates 
*****
false_predicted_duplicates for DecisionTreeClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,bärenreiter,bärenreiter,-1.0,,,0.83484,"die zauberflöte, eine deutsche oper in zwei au...","die zauberflöte, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
7361,0,-1.0,,,-1.0,,,-0.5,interkantonale lehrmittelzentrale,,-1.0,,,-1.0,,,0.75,1981aaaa,1981uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-0.5,binderheidy,,0.74821,"heidy binder, sigrid kessler, charlotte ritsch...",heidy binder... [et al.] ; [éd.:] interkantona...,-0.5,"kesslersigrid, ritschardcharlotte",,0.963845,staatlicher lehrmittelverlag,staatlicher lehrmittelverl.,-1.0,,,0.921296,"bonne chance!, cours de langue française, prem...","bonne chance!, cours de langue française, prem...",-1.0,,,0.777778,4,413
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gläser-zikuda ... [et al.],hrsg. von michaela gläser-zikuda ... [et al.],-0.5,gläser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
38403,0,-1.0,,,-1.0,,,-0.5,,"kölnische bibliotheksgesellschaft, bühnen köln...",-1.0,,,-1.0,,,0.5,1909aaaa,19911794,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.501876,textbuch von emanuel schikaneder ; szenische e...,[wolfgang amade mozart],0.872,"schikanederemanuel, loewenfeldhans, leflerhein...",schikanederemanuel,-1.0,,,-1.0,,,0.805556,die zauberflöte,"die zauberflöte, textbuch, köln 1794",-1.0,,,0.0,1,72
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schlöndorffvolker,,0.700586,volker schlöndorff ; nach dem roman von max fr...,regie: volker schlöndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schlöndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
40391,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19201929,1955uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,0.2,245.0,10425.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.693399,von emanuel schikaneder ; [musik von] wolfgang...,w.a. mozart ; klavierauszug nach dem in der de...,-0.5,"schikanederemanuel, kienzlwilhelm",,-0.5,universal edition,,-1.0,,,0.747354,"die zauberflöte, il flauto magico : oper in zw...","die zauberflöte, oper in zwei aufzügen [kv 620...",-1.0,,,0.733333,1 167,1
41780,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1949uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,0.0,2620,7 7,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.707364,von mozart ; dichtung von emanuel schikaneder ...,von w.a. mozart ; dichtung nach ludwig gisecke...,0.736885,"schikanederemanuel, krusegeorg richard","giseckeludwig, schikanederemanuel",-0.5,,apollo-verlag,-1.0,,,1.0,"die zauberflöte, oper in zwei aufzügen","die zauberflöte, oper in zwei aufzügen",-1.0,,,0.0,74,40
43901,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,2014uuuu,0.428571,40100,30100,1.0,mu,mu,0.0,[],[978-3-944063-13-3],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,1.0,wolfgang amadeus mozart,wolfgang amadeus mozart,0.499823,"mathisedith, karajanherbert von",zamperoniluca,-0.5,,amorverlag,-1.0,,,0.759596,zauberflöte,die zauberflöte,-1.0,,,0.0,3,1 72
44865,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,19aaaaaa,18801900,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,4355.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.559229,a german opera by emanuel schikaneder ; music ...,von w. a. mozart,-1.0,,,-1.0,,,-1.0,,,0.809524,die zauberflöte,"die zauberflöte, oper in 2 aufzügen",-1.0,,,0.733333,1,1 132


false_predicted_duplicates for DecisionTreeClassifier_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
628,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1999aaaa,1998uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[3-495-47879-5],[3-495-47879-5],-1.0,,,-1.0,,,-0.5,,57.0,1.0,fluryandreas,fluryandreas,1.0,andreas flury,andreas flury,-1.0,,,1.0,alber,alber,-1.0,,,1.0,"der moralische status der tiere, henry salt, p...","der moralische status der tiere, henry salt, p...",-1.0,,,0.777778,316,356
1222,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,19601969,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,245.0,4355.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.694948,von emanuel schikaneder ; [musik von] wolfgang...,by emanuel schikaneder ; music by wolfgang ama...,-0.5,,"schikanederemanuel, aberthermann",-0.5,,eulenburg,-1.0,,,0.747011,"die zauberfl√∂te, oper in zwei akten [kv 620] =...","die zauberfl√∂te, a german opera : k√∂chel no. 620",-0.5,"die zauberfl√∂te, ausgabe f√ºr gesang und klavier",,1.0,1,1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gl√§ser-zikuda ... [et al.],hrsg. von michaela gl√§ser-zikuda ... [et al.],-0.5,gl√§ser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
22615,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,19aaaaaa,1955uuuu,0.111111,10200,40000,1.0,mu,mu,1.0,[],[],-1.0,,,0.142857,245.0,4357412.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.573413,von emanuel schikaneder ; [musik von] wolfgang...,mozart,-0.5,,mozartwolfgang amadeus,-0.5,,polydor,-1.0,,,0.74143,"die zauberfl√∂te, oper in zwei akten [kv 620] =...",die zauberfl√∂te [¬Ö],-0.5,"die zauberfl√∂te, ausgabe f√ºr gesang und klavier",,0.777778,1,1 2
36341,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,181uuuuu,0.428571,10200,10000,1.0,mu,mu,1.0,[],[],-1.0,,,0.0,245.0,117.0,-0.5,,2.0,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.715134,von emanuel schikaneder ; [musik von] wolfgang...,von w. a. mozart ; in vollst√§ndigem clavieraus...,0.681818,"kienzlwilhelm, schikanederemanuel",schikanederemanuel,-0.5,,heckel,-1.0,,,0.752891,"die zauberfl√∂te (il flauto magico), oper in zw...","die zauberfl√∂te, grosse oper in zwei aufz√ºgen",-0.5,"die zauberfl√∂te, ausgabe f√ºr gesang und klavier",,1.0,1,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1.0,-0.5,schl√∂ndorffvolker,,0.700586,volker schl√∂ndorff ; nach dem roman von max fr...,regie: volker schl√∂ndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schl√∂ndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
56493,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.75,1990aaaa,1990uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,,4.0,-1.0,,,0.691477,sigrid kessler ... [et al.],sigrid kessler... [et al.] ; [√©d.:] interkanto...,-0.5,kesslersigrid,,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.842656,"bonne chance!, cours de langue fran√ßaise : tro...","bonne chance!, cours de langue fran√ßaise, deux...",-1.0,,,-0.5,,589
58382,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,1970uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.596141,von emanuel schikaneder [und j.g.k.l. giesecke],wolfgang amadeus mozart ; nacherz√§hlt von ingr...,0.659968,"schikanederemanuel, gieseckecarl ludwig","weixelbaumeringrid, riera rochasroque",-0.5,,ueberreuter,-1.0,,,0.74359,"die zauberfl√∂te, (il flauto magico) : deutsche...",die zauberfl√∂te,-1.0,,,1.0,1,1
59185,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,19001999,19601969,1.0,10100,10100,1.0,mu,mu,1.0,[],[],-1.0,,,0.25,3714.0,4355.0,-0.5,12,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.654449,von w. a. mozart ; revidiert und mit einf√ºhrun...,by emanuel schikaneder ; music by wolfgang ama...,-0.5,,"schikanederemanuel, aberthermann",-0.5,,eulenburg,-1.0,,,0.770833,die zauberfl√∂te,"die zauberfl√∂te, a german opera : k√∂chel no. 620",-1.0,,,1.0,1,1


false_predicted_duplicates for RandomForestClassifier


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
1290,0,-1.0,,,-1.0,,,0.8,les arts florissants,arts florissants,-1.0,,,-1.0,,,0.75,1996aaaa,1996uuuu,0.428571,40100,40000,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,630.0,630.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.591919,wolfgang amadeus mozart ; libretto: emanuel sc...,mozart,0.775282,"mozartwolfgang amadeus, schikanederemanuel, ch...","christiewilliam, dessaynatalie, mannionrosa, b...",-0.5,,erato,-1.0,,,0.709588,"die zauberfl√∂te, the magic flute : opera in tw...","die zauberfl√∂te, kv 620",-1.0,,,0.777778,2,2 1
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
22113,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,1.0,,0.75,2007aaaa,2007uuuu,1.0,20000,20000,1.0,bk,bk,0.0,"[978-3-7815-1531-4, 3-7815-1531-1]",[978-3-7815-1513-0],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.856549,hrsg. michaela gl√§ser-zikuda ... [et al.],hrsg. von michaela gl√§ser-zikuda ... [et al.],-0.5,gl√§ser-zikudamichaela,,-0.5,klinkhardt,,-1.0,,,1.0,"lernprozesse dokumentieren, reflektieren und b...","lernprozesse dokumentieren, reflektieren und b...",-1.0,,,1.0,304,304
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schl√∂ndorffvolker,,0.700586,volker schl√∂ndorff ; nach dem roman von max fr...,regie: volker schl√∂ndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schl√∂ndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schl√∂ndorff ; nach dem rom...,ein film von volker schl√∂ndorff ; nach dem rom...,0.871796,"schl√∂ndorffvolker, wurlitzerrudy, frischmax, s...","schl√∂ndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [√©d.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue fran√ßaise, troi...","bonne chance !, cours de langue fran√ßaise troi...",-1.0,,,1.0,237,237
85901,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,1.0,20151475,20151475,1.0,20000,20000,1.0,bk,bk,1.0,"[88-7922-121-3, 978-88-7922-121-4]",[978-88-7922-121-4],-1.0,,,-1.0,,,1.0,1 1,1 1,1.0,petrarcafrancesco,petrarcafrancesco,1.0,francesco petrarca ; commento di bernardo lapini,francesco petrarca ; commento di bernardo lapini,1.0,lapinibernardo,lapinibernardo,-0.5,"adv, biblioteca cantonale di lugano",,-1.0,,,0.691756,"trionfi, riedizione accurata dell'incunabolo c...",trionfi,-1.0,,,0.703704,2,2 494 102
97551,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,20111991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schl√∂ndorffvolker,,0.741667,volker schl√∂ndorff ; nach dem roman von max fr...,volker schl√∂ndorff,0.590909,"frischmax, junkersdorfeberhard","schl√∂ndorffvolker, frischmax, shepardsam, delp...",0.777778,"suhrkamp, absolut medien",suhrkamp,-1.0,,,0.747967,homo faber,"homo faber, nach dem roman von max frisch",-1.0,,,1.0,1 117,1 117


false_predicted_duplicates for SVC


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollst√§ndigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberfl√∂te, grosse oper in zwei aufz√ºgen ...","die zauberfl√∂te, grosse oper in zwei aufz√ºgen ...",-1.0,,,1.0,1,1
7361,0,-1.0,,,-1.0,,,-0.5,interkantonale lehrmittelzentrale,,-1.0,,,-1.0,,,0.75,1981aaaa,1981uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-0.5,binderheidy,,0.74821,"heidy binder, sigrid kessler, charlotte ritsch...",heidy binder... [et al.] ; [√©d.:] interkantona...,-0.5,"kesslersigrid, ritschardcharlotte",,0.963845,staatlicher lehrmittelverlag,staatlicher lehrmittelverl.,-1.0,,,0.921296,"bonne chance!, cours de langue fran√ßaise, prem...","bonne chance!, cours de langue fran√ßaise, prem...",-1.0,,,0.777778,4,413
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeauren√©,pomeauren√©,0.923077,flammarion,gf flammarion,-1.0,,,1.0,trait√© sur la tol√©rance,trait√© sur la tol√©rance,-1.0,,,1.0,192,192
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schl√∂ndorffvolker,,0.700586,volker schl√∂ndorff ; nach dem roman von max fr...,regie: volker schl√∂ndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schl√∂ndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue fran√ßaise prem...","bonne chance !, cours de langue fran√ßaise prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schl√∂ndorff ; nach dem rom...,ein film von volker schl√∂ndorff ; nach dem rom...,0.871796,"schl√∂ndorffvolker, wurlitzerrudy, frischmax, s...","schl√∂ndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [√©d.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue fran√ßaise, troi...","bonne chance !, cours de langue fran√ßaise troi...",-1.0,,,1.0,237,237
70130,0,-1.0,,,-1.0,,,1.0,interkantonale lehrmittelzentrale,interkantonale lehrmittelzentrale,-1.0,,,-1.0,,,0.625,1981aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,0.579953,binderheidy,kesslersigrid,0.77778,"heidy binder, sigrid kessler, charlotte ritsch...","sigrid keller, caty laubscher, helen wallimann...",0.558939,"kesslersigrid, ritschardcharlotte","laubschercaty, wallimannhelen",1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.894141,"bonne chance!, cours de langue fran√ßaise, prem...","bonne chance!, cours de langue fran√ßaise, exig...",-1.0,,,1.0,4,4


false_predicted_duplicates for SVC_CV


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
2688,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,2.0,,0.5,1836aaaa,18001899,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,134.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.748787,von w. a. mozart ; [text von emanuel schikaned...,w.a. mozart ; in vollst√§ndigem clavierauszug m...,-0.5,schikanederemanuel,,-0.5,meyer,,-1.0,,,0.97575,"die zauberfl√∂te, grosse oper in zwei aufz√ºgen ...","die zauberfl√∂te, grosse oper in zwei aufz√ºgen ...",-1.0,,,1.0,1,1
7361,0,-1.0,,,-1.0,,,-0.5,interkantonale lehrmittelzentrale,,-1.0,,,-1.0,,,0.75,1981aaaa,1981uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-0.5,binderheidy,,0.74821,"heidy binder, sigrid kessler, charlotte ritsch...",heidy binder... [et al.] ; [√©d.:] interkantona...,-0.5,"kesslersigrid, ritschardcharlotte",,0.963845,staatlicher lehrmittelverlag,staatlicher lehrmittelverl.,-1.0,,,0.921296,"bonne chance!, cours de langue fran√ßaise, prem...","bonne chance!, cours de langue fran√ßaise, prem...",-1.0,,,0.777778,4,413
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeauren√©,pomeauren√©,0.923077,flammarion,gf flammarion,-1.0,,,1.0,trait√© sur la tol√©rance,trait√© sur la tol√©rance,-1.0,,,1.0,192,192
17455,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,1983aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,1.0,4553.0,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.730483,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.636111,"grubergernot, orelalfred, moehnheinz, schikane...","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,0.733333,1 221,1
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schl√∂ndorffvolker,,0.700586,volker schl√∂ndorff ; nach dem roman von max fr...,regie: volker schl√∂ndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schl√∂ndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue fran√ßaise prem...","bonne chance !, cours de langue fran√ßaise prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schl√∂ndorff ; nach dem rom...,ein film von volker schl√∂ndorff ; nach dem rom...,0.871796,"schl√∂ndorffvolker, wurlitzerrudy, frischmax, s...","schl√∂ndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [√©d.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue fran√ßaise, troi...","bonne chance !, cours de langue fran√ßaise troi...",-1.0,,,1.0,237,237
70130,0,-1.0,,,-1.0,,,1.0,interkantonale lehrmittelzentrale,interkantonale lehrmittelzentrale,-1.0,,,-1.0,,,0.625,1981aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,0.579953,binderheidy,kesslersigrid,0.77778,"heidy binder, sigrid kessler, charlotte ritsch...","sigrid keller, caty laubscher, helen wallimann...",0.558939,"kesslersigrid, ritschardcharlotte","laubschercaty, wallimannhelen",1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.894141,"bonne chance!, cours de langue fran√ßaise, prem...","bonne chance!, cours de langue fran√ßaise, exig...",-1.0,,,1.0,4,4


false_predicted_duplicates for NeuralNetwork


Unnamed: 0,duplicates,coordinate_E_delta,coordinate_E_x,coordinate_E_y,coordinate_N_delta,coordinate_N_x,coordinate_N_y,corporate_full_delta,corporate_full_x,corporate_full_y,doi_delta,doi_x,doi_y,edition_delta,edition_x,edition_y,exactDate_delta,exactDate_x,exactDate_y,format_postfix_delta,format_postfix_x,format_postfix_y,format_prefix_delta,format_prefix_x,format_prefix_y,isbn_delta,isbn_x,isbn_y,ismn_delta,ismn_x,ismn_y,musicid_delta,musicid_x,musicid_y,part_delta,part_x,part_y,person_100_delta,person_100_x,person_100_y,person_245c_delta,person_245c_x,person_245c_y,person_700_delta,person_700_x,person_700_y,pubinit_delta,pubinit_x,pubinit_y,scale_delta,scale_x,scale_y,ttlfull_245_delta,ttlfull_245_x,ttlfull_245_y,ttlfull_246_delta,ttlfull_246_x,ttlfull_246_y,volumes_delta,volumes_x,volumes_y
2597,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1970aaaa,1970uuuu,1.0,10200,10200,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,,4553.0,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.743253,wolfgang amadeus mozart ; text von emanuel sch...,w.a. mozart ; libretto: emanuel schikaneder ; ...,0.749495,"grubergernot, moehnheinz, schikanederemanuel","moehnheinz, schikanederemanuel",1.0,b√§renreiter,b√§renreiter,-1.0,,,0.83484,"die zauberfl√∂te, eine deutsche oper in zwei au...","die zauberfl√∂te, eine deutsche oper in zwei au...",-1.0,,,1.0,1,1
11282,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,1989aaaa,1999uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[2-08-070552-0],[2-08-070552-0],-1.0,,,-1.0,,,1.0,552 552,552 552,1.0,voltaire,voltaire,0.90777,"voltaire ; introd., notes, bibliogr., chronolo...","voltaire ; introd., notes , bibliogr., chronol...",1.0,pomeauren√©,pomeauren√©,0.923077,flammarion,gf flammarion,-1.0,,,1.0,trait√© sur la tol√©rance,trait√© sur la tol√©rance,-1.0,,,1.0,192,192
17016,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.5,aaaaaaaa,19001950,0.428571,10200,10100,1.0,mu,mu,1.0,[],[],-1.0,,,-0.5,245.0,,-1.0,,,1.0,mozartwolfgang amadeus,mozartwolfgang amadeus,0.728388,von emanuel schikaneder ; [musik von] wolfgang...,von emanuel schikaneder ; musik von w. a. moza...,0.62896,"kienzlwilhelm, schikanederemanuel","aberthermann, schikanederemanuel, mozartwolfga...",-0.5,,e. eulenburg,-1.0,,,0.737566,"die zauberfl√∂te (il flauto magico), oper in zw...","die zauberfl√∂te, eine deutsche oper",-0.5,"die zauberfl√∂te, ausgabe f√ºr gesang und klavier",,0.733333,1,1 412
39154,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,2011aaaa,2011uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,0.777778,1 1,1,-0.5,schl√∂ndorffvolker,,0.700586,volker schl√∂ndorff ; nach dem roman von max fr...,regie: volker schl√∂ndorff ; nach dem roman von...,0.612207,"frischmax, junkersdorfeberhard","schl√∂ndorffvolker, frischmax, shepardsam, depl...",0.765873,"suhrkamp, absolut medien",absolut medien,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 117,1 117
40703,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,20071990,2009uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,0.5,502023.0,502430.0,-1.0,,,-1.0,,,0.729134,ein film von volker schl√∂ndorff ; nach dem rom...,ein film von volker schl√∂ndorff ; musik stanle...,0.890534,"schl√∂ndorffvolker, wurlitzerrudolph, frischmax...","schl√∂ndorffvolker, myersstanley, wurlitzerrudo...",-0.5,,"kinowelt film entertainment, arthaus",-1.0,,,1.0,homo faber,homo faber,-1.0,,,0.866667,2 109,1 109
42386,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1987aaaa,1987uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,-0.5,,heidy binder... [et al.],-0.5,,binderheidy,0.977778,[staatlicher lehrmittelverlag],staatlicher lehrmittelverlag,-1.0,,,0.910003,"bonne chance !, cours de langue fran√ßaise prem...","bonne chance !, cours de langue fran√ßaise prem...",-1.0,,,0.555556,134,145
58588,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,20091990,20081991,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-0.5,502430.0,,-1.0,,,-1.0,,,0.794715,ein film von volker schl√∂ndorff ; nach dem rom...,ein film von volker schl√∂ndorff ; nach dem rom...,0.871796,"schl√∂ndorffvolker, wurlitzerrudy, frischmax, s...","schl√∂ndorffvolker, frischmax, arvantisjorgos, ...",1.0,kinowelt home entertainment,kinowelt home entertainment,-1.0,,,1.0,homo faber,homo faber,-1.0,,,1.0,1 109,1 109
58868,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.75,1984aaaa,1984uuuu,1.0,20000,20000,1.0,bk,bk,1.0,[],[],-1.0,,,-1.0,,,-0.5,1,,-1.0,,,0.75,sigrid kessler... [et al.] ; [√©d.:] interkanto...,sigrid kessler... [et al.],-0.5,,kesslersigrid,1.0,staatlicher lehrmittelverlag,staatlicher lehrmittelverlag,-1.0,,,0.759266,"bonne chance!, cours de langue fran√ßaise, troi...","bonne chance !, cours de langue fran√ßaise troi...",-1.0,,,1.0,237,237
86933,0,-0.5,,e0074147,-0.5,,n0460833,-0.5,,eidgen√∂ssische landestopographie,-1.0,,,0.2,10.0,1899.0,0.5,2002aaaa,1902uuuu,0.428571,10200,10300,0.0,mu,mp,1.0,[],[],-0.5,m006450510,,-0.5,4553.0,,-0.5,,23 1902,-0.5,mozartwolfgang amadeus,,0.576962,w.a. mozart ; libretto: emanuel schikaneder ; ...,g.h. dufour direxit ; h. m√ºllhaupt sculpsit,0.628571,"moehnheinz, schikanederemanuel","dufourguillaume henri, m√ºllhaupthans heinrich",-0.5,b√§renreiter,,-0.5,,100000.0,0.462293,"die zauberfl√∂te, eine deutsche oper in zwei au...","domo d'ossola, arona",-1.0,,,1.0,1,1
90898,0,-1.0,,,-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.625,2000aaaa,2003uuuu,1.0,10300,10300,1.0,vm,vm,1.0,[],[],-1.0,,,-1.0,,,-1.0,,,-1.0,,,0.442063,james levine...[u.a.],w.a. mozart ; video director brian large,0.829553,"mozartwolfgang amadeus, levinejames, schikaned...","mozartwolfgang amadeus, largebrian",0.660819,deutsche grammophon,del prado,-1.0,,,0.798246,"die zauberfl√∂te, oper in zwei aufz√ºgen",die zauberfl√∂te,-0.5,"the magic flute : [dvd-video], la fl√ªte enchant√©e",,0.904762,1 0 169,1 169


### Classification of Swissbib's Goldstandard Data

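For final consistency checks and for communication with Swissbib's project team, the docids of the wrongly predicted record pairs can be printed. Only one example is shown below: for each model, the pairs of the last run in the group false_predicted_uniques.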

In [22]:
# Load the pickled intermediary DataFrame that holds the docids of each record pair
df_index_docids = pd.read_pickle(os.path.join(
    path_goldstandard, 'index_docids_df.pkl'), compression=None)
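
The per-model index lists printed in this section overlap: indices such as 103289 and 103290 appear in several of them. As a minimal sketch, and not part of the project code, the following shows one way such lists could be compared across models. The function names and the toy data are hypothetical; the input is assumed to be a plain dictionary that maps a model name to a list of row indices.

```python
from collections import Counter


def indices_wrong_in_all_models(index_lists):
    """Return the sorted indices that occur in every model's list."""
    sets = [set(indices) for indices in index_lists.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


def count_models_per_index(index_lists):
    """Count, for each index, how many models list it."""
    return Counter(
        index
        for indices in index_lists.values()
        for index in set(indices)
    )


# Hypothetical toy data, for illustration only
toy_lists = {
    'model_a': [3, 5, 8, 13],
    'model_b': [5, 8, 21],
    'model_c': [1, 5, 8],
}

common = indices_wrong_in_all_models(toy_lists)
counts = count_models_per_index(toy_lists)
print(common)
print(counts.most_common(2))
```

Pairs that every model gets wrong are natural candidates for a manual check of the goldstandard label, while pairs that only one model gets wrong point to a weakness of that model.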

In [23]:
# Careful: depending on the global run parameters, the last run may be
# the only one that holds the right indices.
run = len(runtime_param_dict_list) - 1

wrong_predictions = rsf.restore_dict_results(
    path_goldstandard, 'wrong_predictions_run_' + str(run) + '.pkl')

# Only the first group (false_predicted_uniques) is shown here.
# To show both groups, loop over wrong_prediction_groups[:2] instead.
group = wrong_prediction_groups[0]
for model in wrong_predictions.keys():
    # Row indices of the wrongly predicted pairs of this model
    fp = wrong_predictions[model][group].sort_index().index.tolist()
    print(model, '-', group, '-', fp)
    display(df_index_docids.iloc[fp])

DecisionTreeClassifier - false_predicted_uniques - [103289, 103290, 103810, 103816, 103872, 103959, 104094, 104098, 104099, 104100, 104101, 104105, 104113, 104114, 104147, 104493, 104497]


Unnamed: 0,035liste_x,035liste_y,docid_x,docid_y
103289,"[(OCoLC)887157168, (ABN)000223912]","[(OCoLC)73930687, (IDSBB)004654810]",00236865X,122976142
103290,"[(OCoLC)73930687, (IDSBB)004654810]","[(OCoLC)887157168, (ABN)000223912]",122976142,00236865X
103810,"[(OCoLC)610683747, (IDSBB)002888041]","[(OCoLC)611159941, (IDSLU)000434741]",080158323,027991695
103816,"[(OCoLC)883955957, (IDSBB)002137208]","[(OCoLC)883955957, (IDSBB)002137208]",105670464,105670464
103872,"[(OCoLC)881895230, (SBT)000288262]","[(OCoLC)637580555, (NEBIS)000013680]",03928722X,126589577
103959,"[(OCoLC)806965128, (SGBN)000433323]","[(OCoLC)806965128, (IDSLU)001278755]",053400631,482993472
104094,"[(OCoLC)604627094, (IDSBB)002888123]",[(RERO)1706143],090996828,214241025
104098,"[(OCoLC)604627094, (NEBIS)009407654]",[(RERO)1706143],195531280,214241025
104099,"[(OCoLC)604627094, (NEBIS)009407654]",[(RERO)R006024330],195531280,247704512
104100,[(RERO)1706143],"[(OCoLC)604627094, (IDSBB)002888123]",214241025,090996828


DecisionTreeClassifier_CV - false_predicted_uniques - [103289, 103290, 103816, 104094, 104098, 104099, 104100, 104101, 104105, 104113, 104114, 104493, 104497]


Unnamed: 0,035liste_x,035liste_y,docid_x,docid_y
103289,"[(OCoLC)887157168, (ABN)000223912]","[(OCoLC)73930687, (IDSBB)004654810]",00236865X,122976142
103290,"[(OCoLC)73930687, (IDSBB)004654810]","[(OCoLC)887157168, (ABN)000223912]",122976142,00236865X
103816,"[(OCoLC)883955957, (IDSBB)002137208]","[(OCoLC)883955957, (IDSBB)002137208]",105670464,105670464
104094,"[(OCoLC)604627094, (IDSBB)002888123]",[(RERO)1706143],090996828,214241025
104098,"[(OCoLC)604627094, (NEBIS)009407654]",[(RERO)1706143],195531280,214241025
104099,"[(OCoLC)604627094, (NEBIS)009407654]",[(RERO)R006024330],195531280,247704512
104100,[(RERO)1706143],"[(OCoLC)604627094, (IDSBB)002888123]",214241025,090996828
104101,[(RERO)1706143],"[(OCoLC)604627094, (NEBIS)009407654]",214241025,195531280
104105,[(RERO)R006024330],"[(OCoLC)604627094, (NEBIS)009407654]",247704512,195531280
104113,"[(OCoLC)731363725, (IDSBB)001974245]","[(OCoLC)881692520, (NEBIS)001584785]",092884261,139651489


RandomForestClassifier - false_predicted_uniques - [103289, 103290, 103810, 103813, 103872, 103959, 104098, 104099, 104101, 104105, 104110, 104113, 104114, 104493, 104497]


Unnamed: 0,035liste_x,035liste_y,docid_x,docid_y
103289,"[(OCoLC)887157168, (ABN)000223912]","[(OCoLC)73930687, (IDSBB)004654810]",00236865X,122976142
103290,"[(OCoLC)73930687, (IDSBB)004654810]","[(OCoLC)887157168, (ABN)000223912]",122976142,00236865X
103810,"[(OCoLC)610683747, (IDSBB)002888041]","[(OCoLC)611159941, (IDSLU)000434741]",080158323,027991695
103813,"[(OCoLC)249599050, (IDSLU)000900603]","[(OCoLC)883955957, (IDSBB)002137208]",028592700,105670464
103872,"[(OCoLC)881895230, (SBT)000288262]","[(OCoLC)637580555, (NEBIS)000013680]",03928722X,126589577
103959,"[(OCoLC)806965128, (SGBN)000433323]","[(OCoLC)806965128, (IDSLU)001278755]",053400631,482993472
104098,"[(OCoLC)604627094, (NEBIS)009407654]",[(RERO)1706143],195531280,214241025
104099,"[(OCoLC)604627094, (NEBIS)009407654]",[(RERO)R006024330],195531280,247704512
104101,[(RERO)1706143],"[(OCoLC)604627094, (NEBIS)009407654]",214241025,195531280
104105,[(RERO)R006024330],"[(OCoLC)604627094, (NEBIS)009407654]",247704512,195531280


SVC - false_predicted_uniques - [103289, 103290, 103609, 103614, 103618, 103802, 103810, 103813, 103821, 103823, 103842, 103872, 103959, 104003, 104086, 104098, 104099, 104101, 104105, 104110, 104113, 104114, 104493, 104497]


Unnamed: 0,035liste_x,035liste_y,docid_x,docid_y
103289,"[(OCoLC)887157168, (ABN)000223912]","[(OCoLC)73930687, (IDSBB)004654810]",00236865X,122976142
103290,"[(OCoLC)73930687, (IDSBB)004654810]","[(OCoLC)887157168, (ABN)000223912]",122976142,00236865X
103609,"[(OCoLC)808021169, (BGR)000119170]",[(RERO)R006457716],020561318,250956500
103614,"[(OCoLC)796203880, (IDSBB)005967090]","[(OCoLC)808021169, (BGR)000119170]",114467048,020561318
103618,[(RERO)R006457716],"[(OCoLC)808021169, (BGR)000119170]",250956500,020561318
103802,"[(OCoLC)847628370, (SGBN)001196568]","[(OCoLC)884775669, (IDSBB)005975902, (RERO)vtl...",359526373,126081794
103810,"[(OCoLC)610683747, (IDSBB)002888041]","[(OCoLC)611159941, (IDSLU)000434741]",080158323,027991695
103813,"[(OCoLC)249599050, (IDSLU)000900603]","[(OCoLC)883955957, (IDSBB)002137208]",028592700,105670464
103821,"[(OCoLC)611159941, (IDSLU)000464498]","[(OCoLC)611159941, (IDSLU)000464498]",028968867,028968867
103823,"[(OCoLC)611159941, (IDSLU)000464498]","[(VAUD)991019165679702852, (RNV)000396480-41bc...",028968867,405473354


SVC_CV - false_predicted_uniques - [103289, 103290, 103609, 103614, 103618, 103653, 103810, 103813, 103823, 103867, 104003, 104086, 104098, 104099, 104101, 104105, 104110, 104225, 104226, 104493, 104497]


Unnamed: 0,035liste_x,035liste_y,docid_x,docid_y
103289,"[(OCoLC)887157168, (ABN)000223912]","[(OCoLC)73930687, (IDSBB)004654810]",00236865X,122976142
103290,"[(OCoLC)73930687, (IDSBB)004654810]","[(OCoLC)887157168, (ABN)000223912]",122976142,00236865X
103609,"[(OCoLC)808021169, (BGR)000119170]",[(RERO)R006457716],020561318,250956500
103614,"[(OCoLC)796203880, (IDSBB)005967090]","[(OCoLC)808021169, (BGR)000119170]",114467048,020561318
103618,[(RERO)R006457716],"[(OCoLC)808021169, (BGR)000119170]",250956500,020561318
103653,"[(OCoLC)890130815, (NEBIS)003645770]","[(OCoLC)695884327, (IDSLU)000901978]",15172783X,021555524
103810,"[(OCoLC)610683747, (IDSBB)002888041]","[(OCoLC)611159941, (IDSLU)000434741]",080158323,027991695
103813,"[(OCoLC)249599050, (IDSLU)000900603]","[(OCoLC)883955957, (IDSBB)002137208]",028592700,105670464
103823,"[(OCoLC)611159941, (IDSLU)000464498]","[(VAUD)991019165679702852, (RNV)000396480-41bc...",028968867,405473354
103867,"[(OCoLC)611643448, (IDSSG)000416104]","[(OCoLC)611643448, (IDSSG)000416104]",032531982,032531982


NeuralNetwork - false_predicted_uniques - [103289, 103290, 103566, 103609, 103614, 103618, 103653, 103729, 103810, 103821, 103865, 103867, 104086, 104098, 104099, 104101, 104105, 104110, 104225, 104226, 104493, 104497, 104582, 104588]


Unnamed: 0,035liste_x,035liste_y,docid_x,docid_y
103289,"[(OCoLC)887157168, (ABN)000223912]","[(OCoLC)73930687, (IDSBB)004654810]",00236865X,122976142
103290,"[(OCoLC)73930687, (IDSBB)004654810]","[(OCoLC)887157168, (ABN)000223912]",122976142,00236865X
103566,"[(OCoLC)218626148, (IDSLU)000449481]","[(OCoLC)218626148, (IDSSG)000338145]",022315098,035554215
103609,"[(OCoLC)808021169, (BGR)000119170]",[(RERO)R006457716],020561318,250956500
103614,"[(OCoLC)796203880, (IDSBB)005967090]","[(OCoLC)808021169, (BGR)000119170]",114467048,020561318
103618,[(RERO)R006457716],"[(OCoLC)808021169, (BGR)000119170]",250956500,020561318
103653,"[(OCoLC)890130815, (NEBIS)003645770]","[(OCoLC)695884327, (IDSLU)000901978]",15172783X,021555524
103729,"[(OCoLC)611356565, (IDSLU)000546873]","[(OCoLC)611356565, (IDSLU)000546873]",023403969,023403969
103810,"[(OCoLC)610683747, (IDSBB)002888041]","[(OCoLC)611159941, (IDSLU)000434741]",080158323,027991695
103821,"[(OCoLC)611159941, (IDSLU)000464498]","[(OCoLC)611159941, (IDSLU)000464498]",028968867,028968867


A final analysis of the models' false predictions had to be done together with Swissbib's project team. Some sample record pairs of the test data set were chosen that had a target classification of uniques but had been predicted as duplicates, and vice versa. Swissbib's project team was asked to explain why these sample records had been classified in their goldstandard as they were. The answer was twofold.

- The base data of the goldstandard is subject to change at its sources. Therefore, new criteria might have been added that would change today's classification of the goldstandard data compared to its original classification, which was done manually one year ago.
- In one of the discussed examples of the Decision Tree Classifier, the classification by the machine learning model was judged to be correct despite the opposite classification in the goldstandard data. This means that the machine learning result was found to be better than the target result from the goldstandard data.

This surprising and promising answer has to be discussed in more depth with Swissbib's project team. As a next step, each of the falsely predicted records could be analysed with the goal of updating the goldstandard data to the latest state of its sources. With an improved goldstandard data set, a new training could be initiated, fitting new models. This could be done iteratively until a satisfying level of quality has been reached.
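
One way to prioritise such a review is to start with the record pairs that several models mispredict at once, since agreement across model families hints at a questionable goldstandard label rather than a weakness of a single model. The following is a minimal sketch under that assumption. It uses the five `false_predicted_uniques` index lists printed above; the helper name `rank_by_agreement` is hypothetical and not part of the project code.

```python
from collections import Counter

# Index lists copied from the outputs printed above (five of the fitted models).
false_predicted_uniques = {
    "DecisionTreeClassifier_CV": [
        103289, 103290, 103816, 104094, 104098, 104099, 104100, 104101,
        104105, 104113, 104114, 104493, 104497],
    "RandomForestClassifier": [
        103289, 103290, 103810, 103813, 103872, 103959, 104098, 104099,
        104101, 104105, 104110, 104113, 104114, 104493, 104497],
    "SVC": [
        103289, 103290, 103609, 103614, 103618, 103802, 103810, 103813,
        103821, 103823, 103842, 103872, 103959, 104003, 104086, 104098,
        104099, 104101, 104105, 104110, 104113, 104114, 104493, 104497],
    "SVC_CV": [
        103289, 103290, 103609, 103614, 103618, 103653, 103810, 103813,
        103823, 103867, 104003, 104086, 104098, 104099, 104101, 104105,
        104110, 104225, 104226, 104493, 104497],
    "NeuralNetwork": [
        103289, 103290, 103566, 103609, 103614, 103618, 103653, 103729,
        103810, 103821, 103865, 103867, 104086, 104098, 104099, 104101,
        104105, 104110, 104225, 104226, 104493, 104497, 104582, 104588],
}


def rank_by_agreement(index_lists):
    """Count for each record pair index how many models mispredicted it."""
    counts = Counter()
    for indices in index_lists.values():
        counts.update(set(indices))  # set() guards against repeated indices
    return counts


counts = rank_by_agreement(false_predicted_uniques)
n_models = len(false_predicted_uniques)

# Pairs that every listed model predicts against the goldstandard label.
flagged_by_all = sorted(i for i, c in counts.items() if c == n_models)
print(flagged_by_all)
```

For the five lists above, eight record pairs are flagged by all models. These would be natural first candidates for a discussion with Swissbib's project team.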

## Comparison with Literature

An early starting point of this capstone project was the review article [[Padm2012](./A_References.ipynb#padm2012)], which is referenced in the [proposal](./project-proposal-andreas-jud.ipynb). In this section, the findings of this capstone project are compared with this review article.

[[Padm2012](./A_References.ipynb#padm2012)] was the starting point and motivation for the idea of implementing a Neural Network for the problem at hand. While the authors describe the implementation of a one-layer network, it turned out early in this capstone project that a network with two hidden layers produces slightly better results. The first difference between [[Padm2012](./A_References.ipynb#padm2012)] and this capstone project is therefore the complexity of the implemented network. The second difference is presumably the implementation with the Keras library in this project: Keras was first released in 2015 and was not available in 2012. In this respect, the implementation of this capstone project may be of interest.

One fundamental difference between this project and the approach described in [[Padm2012](./A_References.ipynb#padm2012)] lies in the features. While this project uses one single similarity metric for each attribute, [[Padm2012](./A_References.ipynb#padm2012)] uses three different similarity metrics for each attribute. These three similarity metrics are chosen once and kept fixed for all input attributes. This is a very different approach to the one chosen here, and it would be interesting to try a solution with several distinct similarity metrics per attribute on the data of this project.
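
The idea of several similarity metrics per attribute can be sketched as follows. This is an illustration only: the three metrics below (exact match, `difflib` sequence ratio and token Jaccard) are assumptions chosen for simplicity and are not the metrics of [[Padm2012](./A_References.ipynb#padm2012)], and the attribute names `title` and `author` are hypothetical placeholders.

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """1.0 if both strings are identical, else 0.0."""
    return 1.0 if a == b else 0.0


def sequence_ratio(a, b):
    """Character-based similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    """Jaccard similarity of the lower-cased whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# The same fixed set of metrics is applied to every attribute.
METRICS = (exact_match, sequence_ratio, token_jaccard)


def feature_vector(record_x, record_y, attributes):
    """Return len(attributes) * len(METRICS) similarity features for a pair."""
    features = []
    for attr in attributes:
        a, b = record_x[attr], record_y[attr]
        features.extend(metric(a, b) for metric in METRICS)
    return features


# Hypothetical record pair for illustration.
x = {"title": "Die Physiker", "author": "Dürrenmatt"}
y = {"title": "die physiker", "author": "Dürrenmatt"}
vec = feature_vector(x, y, ["title", "author"])
print(len(vec), vec)
```

With this layout, the feature matrix grows by a factor equal to the number of metrics, and the model decides which metric is informative for which attribute.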

The authors of [[Padm2012](./A_References.ipynb#padm2012)] report a resulting accuracy of nearly 80%. This accuracy was reached with the help of synthetic data, though. The reason for this low accuracy compared to the accuracy values reached in this project remains open. However, it is certain that the data used for training a machine learning model and for measuring its performance is critical.

## Summary and Outlook

This chapter of the capstone project ties together all the other chapters. Each chapter is implemented and written in a separate Jupyter Notebook that can be run on its own. This chapter executes each of these notebooks, which offers an alternative way of running the project and collecting its results.

In this chapter, the implemented models have been fully run several times, each time with a different set of parameters. Taken together, the runs explore the implemented models from different angles. As an overall result, the findings can be summarised in the following items.

- The overall best models for the problem of deduplication with Swissbib's data can be found in the Ensemble classifier family. Fitting a Neural Network shows results of comparable performance, although the results of the Ensemble classifiers could not be exceeded. As for the Neural Network, it is remarkable that networks with more than one layer and with a relatively high number of neurons exhibit the best results, within a small range of quality. A network with more layers and more neurons can learn more interactions between the features.
- A general and satisfying experience with the calculated models has been their stability. When some parameters of the run conditions were changed, the models exhibited about the same results with only minor differences. This observation gives confidence that reliable and reproducible results have been found in the course of this capstone project.

As a concluding outlook, the following items list options for improving the results of this capstone project.

- As described in the [proposal](./project-proposal-andreas-jud.ipynb), Swissbib has implemented a sophisticated preprocessing of their data before its deduplication. Some examples can be mentioned here explicitly.
    - Attributes $\texttt{century}$ and $\texttt{decade}$ are extracted parts of attribute $\texttt{exactDate}$. Taking the same approach in this capstone project has been discussed in chapter [Data Analysis](./1_DataAnalysis.ipynb). The decision taken there could be revised and models could be fitted based on a feature matrix with additional, although redundant, information.
    - Attributes $\texttt{edition}$, $\texttt{musicid}$, $\texttt{pages}$, $\texttt{part}$, and $\texttt{volumes}$ are each generated by elaborate algorithms that interpret digits and even spelled-out number expressions in different languages. This capstone project has tried to imitate this kind of preprocessing by stripping the values down to their digits. The implementation here has remained very rudimentary, though. In a better implementation, Swissbib's data preprocessing could be used for the preparation of the goldstandard training data. It would be interesting to observe whether better prepared data would lead to even better results.
    - Nearly all attributes of Swissbib's data are optional, see chapter [Data Analysis](./1_DataAnalysis.ipynb) for details. In the course of this capstone project, the decision has been taken to mark missing attributes with a negative number. An alternative implementation would be to mark missing attributes with an additional feature, a special flag in the feature matrix. It would be interesting to see the effect of such an implementation.
- The similarity metrics applied to the attributes of Swissbib data have been chosen by an iterative process. [[Chri2012](./A_References.ipynb#chri2012)] cites some literature on how to find the best similarity metric for an attribute with the help of machine learning. Some effort would be needed to implement this idea. Although the results of this capstone project reach a satisfying level, a deeper engagement with the similarity metrics used would be rewarding.
- An understanding of the resulting predictions at the level of individual data records has been attempted at several stages of the capstone project. In the end, the deep interpretability of the models, that is, their feature-wise understanding, remains an open issue that would require more elaborate techniques and effort.
- Subsection [Classification of Swissbib's Goldstandard Data](#Classification-of-Swissbib's-Goldstandard-Data) discusses the quality of the goldstandard data. An iterative improvement of the goldstandard data is proposed, which requires detailed discussion with Swissbib's project team. Whether these discussions take place depends on the availability of Swissbib's resources.
- The [proposal](./project-proposal-andreas-jud.ipynb) of the capstone project suggests implementing one of the models designed here in [Apache Flink](https://flink.apache.org/) or [Apache Spark](https://spark.apache.org/). At the moment, [Apache Beam](https://beam.apache.org/) would be an even more attractive opportunity to explore. The goal of such an implementation would be the replacement of Swissbib's deduplication logic in production with a solution resulting from this capstone project. Considering the large amount of Swissbib data and the $O(N^2)$ scaling of the pair comparison introduced here, one important step would have to be resolved before any deployment into operation. Swissbib's amount of data can only be processed after a rigorous preclustering, as described in the [proposal](./project-proposal-andreas-jud.ipynb). Some feasible preclustering solutions are described in [[Chri2012](./A_References.ipynb#chri2012)]. An implementation would still have to be explored. But this is a different project.
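
The effect of a preclustering on the $O(N^2)$ pair comparison can be illustrated with a generic key-based blocking. The sketch below is an assumption for illustration and is neither Swissbib's preclustering nor necessarily one of the methods in [[Chri2012](./A_References.ipynb#chri2012)]. The records, the attribute names and the blocking key (first three characters of a hypothetical `title` attribute) are invented.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n):
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def blocked_pairs(records, key):
    """Yield candidate pairs only for records that share a blocking key."""
    blocks = defaultdict(list)
    for docid, record in records.items():
        blocks[key(record)].append(docid)
    for docids in blocks.values():
        # Pairs are built within each block, never across blocks.
        yield from combinations(sorted(docids), 2)


# Hypothetical toy records, keyed by an invented document id.
records = {
    "a1": {"title": "faust", "year": "1808"},
    "a2": {"title": "faust", "year": "1808"},
    "b1": {"title": "heidi", "year": "1880"},
    "b2": {"title": "heidi", "year": "1881"},
    "c1": {"title": "momo", "year": "1973"},
}


def blocking_key(record):
    """Hypothetical key: the first three characters of the title."""
    return record["title"][:3]


candidates = list(blocked_pairs(records, blocking_key))
print(all_pairs_count(len(records)), "pairs without blocking")
print(len(candidates), "pairs with blocking:", candidates)
```

In this toy example, the ten possible pairs shrink to two candidate pairs. The price of any such blocking is that true duplicates with different keys are never compared, so the choice of key trades runtime against recall.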

These options conclude the discussion of the results of this capstone project.