Cognitive Sciences and Artificial Intelligence, Tilburg School of Humanities and Digital Sciences, Tilburg University, The Netherlands.
Abstract: Dimensionality reduction algorithms are commonly used for reducing the dimension of multi-dimensional data to visualize them on a standard display. Although many dimensionality reduction algorithms such as the t-distributed Stochastic Neighbor Embedding (t-SNE) aim to preserve close neighborhoods in low-dimensional space, they might not accomplish that for every sample of the data and eventually produce erroneous representations. In this study, we developed a supervised confidence estimation algorithm for detecting erroneous samples in embeddings. Our algorithm generates a confidence score for each sample in an embedding based on a distance-oriented score and a random forest regressor. We evaluate its performance on both intra- and inter-domain data and compare it with the neighborhood preservation ratio as our baseline. Our results showed that the resulting confidence score provides distinctive information about the correctness of any sample in an embedding compared to the baseline.
This repository contains the code for our journal publication:
[1] B. Ozgode Yigin and G. Saygili, "Confidence estimation for t‑SNE embeddings using random forest", International Journal of Machine Learning and Cybernetics, 2022.
Please cite our paper [1] if you use this code.
Created by Busra Ozgode Yigin and Gorkem Saygili on 11-09-22.
Datasets:
- MNIST
- https://zenodo.org/record/4557712#.YUbplLgzZPY (AMB_integrated.zip)
Important Note: This code is released under the MIT License:
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
How to use:
- Run the conf_pred_with_existing_model function to apply the existing models, pre-trained on the AMB18 and MNIST datasets, to your test set.
- Run the conf_pred_with_training function to train your own model on your own training set and make confidence predictions on your test set.
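For orientation, the neighborhood preservation ratio used as the baseline in the abstract can be sketched as follows. This is a minimal illustration under one common definition (the fraction of a sample's k nearest neighbors in the original space that are also among its k nearest neighbors in the embedding); the exact definition used in the paper may differ. The function names knn_indices and neighborhood_preservation_ratio are hypothetical and are not part of this repository's API.

```python
import numpy as np


def knn_indices(points, k):
    """Return the indices of the k nearest neighbors of each row (self excluded)."""
    # Pairwise squared Euclidean distances via broadcasting.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def neighborhood_preservation_ratio(high_dim, low_dim, k=10):
    """Per-sample fraction of the k high-dimensional neighbors kept in the embedding.

    Assumed definition for illustration only; not taken from the repository.
    """
    high_nn = knn_indices(np.asarray(high_dim, dtype=float), k)
    low_nn = knn_indices(np.asarray(low_dim, dtype=float), k)
    return np.array(
        [len(set(h).intersection(l)) / k for h, l in zip(high_nn, low_nn)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 50))
    embedding = data[:, :2]  # crude stand-in for a real t-SNE embedding
    scores = neighborhood_preservation_ratio(data, embedding, k=10)
    print(scores.shape, float(scores.mean()))
```

Each score lies between 0 and 1, and a low score marks a sample whose neighborhood was poorly preserved. The brute-force distance matrix keeps the sketch short but scales quadratically with the number of samples, so a tree-based or approximate neighbor search would be preferable for large datasets.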