Finetuning any deep neural network for better embeddings on neural search tasks
Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks. It accompanies Jina to deliver the last mile of performance for domain-specific neural search applications.
`finetuner.fit()` is a one-liner that unlocks rich features such as Siamese/triplet networks, interactive labeling, layer pruning, weight freezing, and dimensionality reduction.
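As a rough sketch of how these options can appear together in a single call (note: `loss` and `output_dim` are assumed parameter names for the triplet-loss and dimensionality-reduction knobs, used here only for illustration; the scenarios below show the documented calls):

```python
import finetuner

# a minimal sketch: tune an existing embedding model on labeled data,
# freeze two early layers, and shrink the embedding dimension
tuned_model = finetuner.fit(
    embed_model,                    # your existing embedding model
    train_data=labeled_data,        # labeled training data
    loss='TripletLoss',             # assumed name for the siamese/triplet knob
    freeze=['layer_1', 'layer_2'],  # weight freezing (as in the examples below)
    output_dim=128,                 # assumed name for dimensionality reduction
)
```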
## How does it work
```bash
pip install finetuner
```
| Usage | Do you have an embedding model? | Do you have labeled data? |
|-------|---------------------------------|---------------------------|
| 🟠    | Yes                             | Yes                       |
| 🟢    | Yes                             | No                        |
| 🟡    | No                              | Yes                       |
| 🔵    | No                              | No                        |
### 🟠 Have embedding model and labeled data
Since `labeled_data` is already given by you, simply do:
```python
import finetuner

tuned_model = finetuner.fit(
    embed_model,
    train_data=labeled_data,
)
```
### 🟢 Have embedding model and unlabeled data
You have an `embed_model` to use, but no labeled data for finetuning it. No worries, that's already good enough! You can use Finetuner to interactively label data and train `embed_model` as below:
```python
import finetuner

tuned_model = finetuner.fit(
    embed_model,
    train_data=unlabeled_data,
    interactive=True,
)
```
### 🟡 Have general model and labeled data
You have a `general_model` which does not output embeddings. Luckily, you can provide some `labeled_data` for training. No worries, Finetuner can convert your model into an embedding model and train it via:
```python
import finetuner

tuned_model = finetuner.fit(
    general_model,
    train_data=labeled_data,
    to_embedding_model=True,
    layer_name='my_embedding_layer',
    freeze=['layer_1', 'layer_2'],
)
```
### 🔵 Have general model and unlabeled data
You have a `general_model` which is not meant for embeddings, and you don't have labeled data for training either. But no worries, Finetuner can help you train an embedding model with interactive labeling on-the-fly:
```python
import finetuner

tuned_model = finetuner.fit(
    general_model,
    train_data=unlabeled_data,
    interactive=True,
    to_embedding_model=True,
    layer_name='my_embedding_layer',
    freeze=['layer_1', 'layer_2'],
)
```
## Finetuning ResNet50 on CelebA
⚡ To get the best experience, you will need a GPU machine for this example. For CPU users, we provide examples of finetuning an MLP on FashionMNIST and finetuning a Bi-LSTM on CovidQA that run out of the box on low-profile machines. Check out more examples in our docs!
- Download the CelebA-small dataset (7.7MB) and decompress it to `./img_align_celeba`. The full dataset can be found here.
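  If you prefer to script this step, here is a minimal sketch, assuming the dataset ships as a zip archive; the URL below is a placeholder that you should replace with the download link from this README:

  ```python
  import zipfile
  import urllib.request

  # placeholder: replace with the actual CelebA-small download link
  CELEBA_SMALL_URL = 'https://example.com/celeba-small.zip'

  urllib.request.urlretrieve(CELEBA_SMALL_URL, 'celeba-small.zip')

  # decompress so that the images end up under ./img_align_celeba
  with zipfile.ZipFile('celeba-small.zip') as zf:
      zf.extractall('.')
  ```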
- Finetuner accepts Jina `DocumentArrayMemmap`, so we load the CelebA images into this format using a generator:
```python
from jina.types.document.generators import from_files

# please change the file path to your data path
data = list(from_files('img_align_celeba/*.jpg', size=100, to_dataturi=True))

for doc in data:
    doc.load_uri_to_image_blob(
        height=224, width=224
    ).set_image_blob_normalization().set_image_blob_channel_axis(
        -1, 0
    )  # no need to change the channel axis if you are using tf/keras
```
- Load pretrained ResNet50 using PyTorch/Keras/Paddle:
PyTorch:

```python
import torchvision

model = torchvision.models.resnet50(pretrained=True)
```

Keras:

```python
import tensorflow as tf

model = tf.keras.applications.resnet50.ResNet50(weights='imagenet')
```

Paddle:

```python
import paddle

model = paddle.vision.models.resnet50(pretrained=True)
```
- Start the Finetuner:
```python
import finetuner

finetuner.fit(
    model=model,
    interactive=True,
    train_data=data,
    to_embedding_model=True,
    input_size=(3, 224, 224),
    layer_name='my_embedding_layer',
    freeze=['layer_1', 'layer_2'],
)
```
- After downloading the model and loading the data (~20s depending on your network/CPU/GPU), your browser will open the Labeler UI as shown below. You can now label the relevance of celebrity faces via mouse/keyboard. The ResNet50 model gets finetuned and improved as you label. If you are running this example on a CPU machine, each labeling round may take up to 20 seconds.
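When you are done labeling, you can keep the tuned model around. The sketch below assumes the PyTorch variant from the model-loading step and that you assigned the return value as in the earlier snippets (e.g. `tuned_model = finetuner.fit(...)`), so the result is a regular `torch.nn.Module`:

```python
import torch

# assumes `tuned_model = finetuner.fit(...)` returned a regular torch.nn.Module
torch.save(tuned_model.state_dict(), 'tuned_resnet50.pt')

# later: rebuild the same architecture and restore the tuned weights
# tuned_model.load_state_dict(torch.load('tuned_resnet50.pt'))
```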
- Use Discussions to talk about your use cases, questions, and support queries.
- Join our Slack community and chat with other Jina community members about ideas.
- Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
- Subscribe to the latest video tutorials on our YouTube channel.