This project explores document embedding vectors for image clustering and reverse image search, in support of large-scale document EDA (exploratory data analysis).
It is applicable to enterprise documents, which tend to be text-heavy, black-and-white, and multi-page.
This project extends the BiT Colab Notebook to demonstrate how embeddings can be extracted from the model and used for unsupervised learning. The notebook explores the Big Transfer model built by Google, as described in Big Transfer (BiT): General Visual Representation Learning.
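As a rough illustration of the extraction step, the sketch below pulls the publicly released BiT-M R50x1 feature-vector module from TF Hub; its default signature returns a 2048-dimensional pre-logits embedding per image. The batch here is random placeholder data, not the project's document pages.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load BiT-M R50x1 from TF Hub; the feature-vector module returns a
# 2048-d pre-logits embedding per image.
bit_model = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1")

# BiT expects float images in [0, 1]; a random batch stands in for
# real document pages here.
images = tf.random.uniform((8, 224, 224, 3))
embeddings = bit_model(images)  # shape: (8, 2048)
```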
The data is preloaded in the data folder to make the notebook easy to run without any additional downloads.
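With embeddings like those above in hand, the clustering and reverse-image-search steps reduce to a few lines of scikit-learn. The sketch below is illustrative rather than the notebook's actual pipeline; the array contents, cluster count, and neighbour count are placeholder assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

# Stand-in for the (n_images, 2048) embedding matrix computed above.
embeddings = np.random.rand(100, 2048).astype("float32")

# Unsupervised clustering of document pages by visual similarity.
kmeans = KMeans(n_clusters=10, random_state=0).fit(embeddings)
print(kmeans.labels_[:10])

# Reverse image search: cosine nearest neighbours of a query page.
nn = NearestNeighbors(n_neighbors=5, metric="cosine").fit(embeddings)
distances, indices = nn.kneighbors(embeddings[:1])
print(indices[0])  # the 5 pages most similar to the query (itself included)
```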
When initialised, the notebook will automatically run the set-up script in the background.
Note: Deepnote currently runs only on cloud CPUs; GPU support is not yet available.
The abstract of the BiT paper summarises the approach: Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes, from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19-task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.
The goal of Big Transfer is to be the standard starting model for any computer vision task, similar to BERT's position in natural language processing. To achieve this, Google Brain collated very large datasets to pre-train the model on, with the resulting models working very well out of the box even in 1-shot settings and being easy to fine-tune on other image types.
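As a hedged sketch of that fine-tuning step, one option is to attach a fresh, zero-initialised classification head to the Hub backbone, broadly following the paper's recipe; the class count, optimiser settings, and dataset below are illustrative assumptions, not the notebook's configuration.

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 5  # placeholder for the target task

# Pre-trained backbone plus a new zero-initialised head, as the
# BiT paper recommends for transfer.
model = tf.keras.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1", trainable=True),
    tf.keras.layers.Dense(NUM_CLASSES, kernel_initializer="zeros"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.003, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# model.fit(train_ds, epochs=...)  # train_ds: batches of (image, label) pairs
```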
BiT has three model variants:
- BiT-S models, pre-trained on 1.28M images from ILSVRC-2012
- BiT-M models, pre-trained on 14.2M images from ImageNet-21k
- BiT-L models, pre-trained on 300M images from JFT-300M
One interesting point made in the paper is that increasing model size only has an effect if the pre-training dataset is correspondingly large; conversely, there is little benefit in pre-training smaller models on larger datasets.