# Arxiv Search with OpenCLIP and LanceDB

In this example we'll build an arXiv search (or recommender) based on semantic search using LanceDB. We'll also compare the results with keyword-based search and explore the embeddings on Nomic's Atlas.


## OpenCLIP

![CLIP (1)](https://github.com/lancedb/vectordb-recipes/assets/15766192/11b3b900-0bcb-4a4a-8fd4-804611c85972)


OpenCLIP is an open-source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training) and is available with various backends.

In [1]:
# SETUP
!pip install lancedb open_clip_torch arxiv --q


In [2]:
!pip install pandas



## Creating table from arxiv API

### Embedding Paper Summary using CLIP


In [3]:
import torch
import open_clip
import pandas as pd
from open_clip import tokenizer
from tqdm import tqdm
from collections import defaultdict
import arxiv
import lancedb


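# Load the model and tokenizer once; reloading them on every call is slow
clip_model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
clip_model.eval()
clip_tokenizer = open_clip.get_tokenizer("ViT-B-32")


def embed_func_clip(text):
    # Tokenize the text and encode it without tracking gradients
    with torch.no_grad():
        text_features = clip_model.encode_text(clip_tokenizer(text))
    return text_features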
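
One thing to keep in mind when embedding long abstracts: CLIP text encoders work on a fixed context window (77 tokens for the standard CLIP models, which ViT-B-32 is assumed to follow here), and the tokenizer truncates anything longer. The sketch below mimics that truncation with plain integer ids. The helper `pad_or_truncate` and the whitespace "tokenizer" are hypothetical stand-ins for illustration, not OpenCLIP's actual implementation.

```python
CONTEXT_LENGTH = 77  # assumed context window of the CLIP text encoder
SOT, EOT, PAD = 49406, 49407, 0  # start/end-of-text ids (CLIP BPE vocabulary); 0 pads


def pad_or_truncate(token_ids, context_length=CONTEXT_LENGTH):
    """Wrap ids in SOT/EOT, then cut or zero-pad to exactly `context_length`."""
    ids = [SOT] + list(token_ids) + [EOT]
    if len(ids) > context_length:
        ids = ids[:context_length]
        ids[-1] = EOT  # keep the end-of-text marker after truncation
    return ids + [PAD] * (context_length - len(ids))


# Toy stand-in for BPE: one fake id per whitespace-separated word
words = ("a long paper abstract " * 50).split()
fake_ids = list(range(1, len(words) + 1))

window = pad_or_truncate(fake_ids)
kept = sum(1 for i in window if i not in (SOT, EOT, PAD))
print(f"{len(words)} input tokens -> {kept} kept")
```

If this assumption holds for the model you pick, only the opening part of a long summary or query influences its vector, which is worth remembering when searching with a full abstract.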

### Create a DataFrame of the desired length

Here we'll use the `arxiv` Python package to query the arXiv API and fetch the paper metadata.

In [4]:
def get_arxiv_df(embed_func, length=100):
    search = arxiv.Search(
        query="cat:cs.AI OR cat:cs.CV OR cat:stat.ML",
        max_results=length,
        sort_by=arxiv.SortCriterion.Relevance,
        sort_order=arxiv.SortOrder.Descending,
    )
    # Search.results() is deprecated in recent arxiv releases; use a Client
    results = arxiv.Client().results(search)
    df = defaultdict(list)
    for result in tqdm(results, total=length):
        try:
            # Embed first so a failure cannot leave the columns with unequal lengths
            vector = embed_func(result.summary).tolist()[0]
        except Exception as e:
            print("error: ", e)
            continue

        df["title"].append(result.title)
        df["summary"].append(result.summary)
        df["authors"].append(str(result.authors))
        df["url"].append(result.entry_id)
        df["vector"].append(vector)

    return pd.DataFrame(df)
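
The `defaultdict(list)` pattern above builds the table column by column, which only works if every column ends up with the same number of entries. A minimal sketch of the idea, using made-up records and a hypothetical `fake_embed` in place of the CLIP call, performs the step that can fail first and appends to the columns only once the whole row is known to be good:

```python
from collections import defaultdict


def fake_embed(text):
    # Hypothetical stand-in for an embedding call that can fail
    if not text:
        raise ValueError("empty summary")
    return [float(len(text)), 0.0]


# Made-up records for illustration only
records = [
    {"title": "Paper A", "summary": "first summary"},
    {"title": "Paper B", "summary": ""},  # this one will fail to embed
    {"title": "Paper C", "summary": "third"},
]

columns = defaultdict(list)
for rec in records:
    try:
        vector = fake_embed(rec["summary"])  # do the risky step first
    except ValueError as err:
        print("skipping", rec["title"], "-", err)
        continue
    # Append to every column together so the lengths stay in sync
    columns["title"].append(rec["title"])
    columns["summary"].append(rec["summary"])
    columns["vector"].append(vector)

lengths = {name: len(values) for name, values in columns.items()}
print(lengths)
```

With equal-length columns, the dictionary can be handed to `pd.DataFrame` without a length mismatch error.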

In [5]:
LENGTH = 100  # Reduce the size for demo


def create_table():
    db = lancedb.connect("db")
    df = get_arxiv_df(embed_func_clip, LENGTH)

    tbl = db.create_table("arxiv", data=df, mode="overwrite")

    return tbl

In [6]:
tbl = create_table()

100%|██████████| 100/100 [02:57<00:00,  1.77s/it]


In [7]:
import lancedb

db = lancedb.connect("db")

if "arxiv" not in db.table_names():
    tbl = create_table()
else:
    tbl = db.open_table("arxiv")

In [8]:
len(tbl)

100

## Semantic Search by concepts or summary

In [9]:
from IPython.display import display, HTML


def search_table(query, embed_func=embed_func_clip, lim=3):
    db = lancedb.connect("db")
    tbl = db.open_table("arxiv")

    embs = embed_func(query)

    return tbl.search(embs.tolist()[0]).limit(lim).to_pandas()
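
Conceptually, a vector search ranks the stored vectors by their distance to the query vector and returns the closest few. The sketch below does this by brute force over a tiny hand-written collection, using Euclidean (L2) distance, which is assumed here to correspond to LanceDB's default metric. The data and the `nearest` helper are hypothetical, and a real vector database may use approximate indexes rather than a full scan.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, limit=3):
    """Return the `limit` rows closest to `query`, each with its distance."""
    scored = [
        {"title": row["title"], "_distance": l2_distance(query, row["vector"])}
        for row in rows
    ]
    return sorted(scored, key=lambda r: r["_distance"])[:limit]


# Tiny made-up 2-D "embeddings"
rows = [
    {"title": "segmentation on mobile", "vector": [0.9, 0.1]},
    {"title": "GAN training stability", "vector": [0.1, 0.9]},
    {"title": "lightweight image encoders", "vector": [0.8, 0.3]},
    {"title": "bayesian inference", "vector": [-0.5, 0.2]},
]

hits = nearest([1.0, 0.2], rows, limit=2)
for hit in hits:
    print(f"{hit['_distance']:.3f}  {hit['title']}")
```

A smaller distance means a closer match, which is why the hits come back in ascending order of `_distance`.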


In [11]:
# MobileSAM paper abstract 2nd half
query = """
Many of such applications need to be run on resource-constraint edge devices,
like mobile phones. In this work, we aim to make SAM mobile-friendly by replacing the heavyweight
image encoder with a lightweight one. A naive way to train such a new SAM as in the original SAM
paper leads to unsatisfactory performance, especially when limited training sources are available. We
find that this is mainly caused by the coupled optimization of the image encoder and mask decoder,
motivated by which we propose decoupled distillation. Concretely, we distill the knowledge from
the heavy image encoder (ViT-H in the original SAM) to a lightweight image encoder, which can be
automatically compatible with the mask decoder in the original SAM. The training can be completed
on a single GPU within less than one day, and the resulting lightweight SAM is termed MobileSAM
which is more than 60 times smaller yet performs on par with the original SAM. For inference speed,
With a single GPU, MobileSAM runs around 10ms per image: 8ms on the image encoder and 4ms
on the mask decoder. With superior performance, our MobileSAM is around 5 times faster than the
concurrent FastSAM and 7 times smaller, making it more suitable for mobile applications. Moreover,
we show that MobileSAM can run relatively smoothly on CPU
"""

result = search_table(query)

result.pop("vector")
display(HTML(result.to_html()))

index,title,summary,authors,url,_distance
0,XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification,"In recent years, there have been numerous developments towards solving\nmultimodal tasks, aiming to learn a stronger representation than through a\nsingle modality. Certain aspects of the data can be particularly useful in this\ncase - for example, correlations in the space or time domain across modalities\n- but should be wisely exploited in order to benefit from their full predictive\npotential. We propose two deep learning architectures with multimodal\ncross-connections that allow for dataflow between several feature extractors\n(XFlow). Our models derive more interpretable features and achieve better\nperformances than models which do not exchange representations, usefully\nexploiting correlations between audio and visual data, which have a different\ndimensionality and are nontrivially exchangeable. Our work improves on existing\nmultimodal deep learning algorithms in two essential ways: (1) it presents a\nnovel method for performing cross-modality (before features are learned from\nindividual modalities) and (2) extends the previously proposed\ncross-connections which only transfer information between streams that process\ncompatible data. Illustrating some of the representations learned by the\nconnections, we analyse their contribution to the increase in discrimination\nability and reveal their compatibility with a lip-reading network intermediate\nrepresentation. We provide the research community with Digits, a new dataset\nconsisting of three data types extracted from videos of people saying the\ndigits 0-9. Results show that both cross-modal architectures outperform their\nbaselines (by up to 11.5%) when evaluated on the AVletters, CUAVE and Digits\ndatasets, achieving state-of-the-art results.","[arxiv.Result.Author('Cătălina Cangea'), arxiv.Result.Author('Petar Veličković'), arxiv.Result.Author('Pietro Liò')]",http://arxiv.org/abs/1709.00572v2,40.346901
1,Dualing GANs,"Generative adversarial nets (GANs) are a promising technique for modeling a\ndistribution from samples. It is however well known that GAN training suffers\nfrom instability due to the nature of its maximin formulation. In this paper,\nwe explore ways to tackle the instability problem by dualizing the\ndiscriminator. We start from linear discriminators in which case conjugate\nduality provides a mechanism to reformulate the saddle point objective into a\nmaximization problem, such that both the generator and the discriminator of\nthis 'dualing GAN' act in concert. We then demonstrate how to extend this\nintuition to non-linear formulations. For GANs with linear discriminators our\napproach is able to remove the instability in training, while for GANs with\nnonlinear discriminators our approach provides an alternative to the commonly\nused GAN training algorithm.","[arxiv.Result.Author('Yujia Li'), arxiv.Result.Author('Alexander Schwing'), arxiv.Result.Author('Kuan-Chieh Wang'), arxiv.Result.Author('Richard Zemel')]",http://arxiv.org/abs/1706.06216v1,40.449284
2,Domain Generalization for Object Recognition with Multi-task Autoencoders,"The problem of domain generalization is to take knowledge acquired from a\nnumber of related domains where training data is available, and to then\nsuccessfully apply it to previously unseen domains. We propose a new feature\nlearning algorithm, Multi-Task Autoencoder (MTAE), that provides good\ngeneralization performance for cross-domain object recognition.\n Our algorithm extends the standard denoising autoencoder framework by\nsubstituting artificially induced corruption with naturally occurring\ninter-domain variability in the appearance of objects. Instead of\nreconstructing images from noisy versions, MTAE learns to transform the\noriginal image into analogs in multiple related domains. It thereby learns\nfeatures that are robust to variations across domains. The learnt features are\nthen used as inputs to a classifier.\n We evaluated the performance of the algorithm on benchmark image recognition\ndatasets, where the task is to learn features from multiple datasets and to\nthen predict the image label from unseen datasets. We found that (denoising)\nMTAE outperforms alternative autoencoder-based models as well as the current\nstate-of-the-art algorithms for domain generalization.","[arxiv.Result.Author('Muhammad Ghifary'), arxiv.Result.Author('W. Bastiaan Kleijn'), arxiv.Result.Author('Mengjie Zhang'), arxiv.Result.Author('David Balduzzi')]",http://arxiv.org/abs/1508.07680v1,41.127644


In [12]:
# Example 2: Search via a concept you're reading about
query = """
What is the general idea behind self-supervised learning.
"""

result = search_table(query)

result.pop("vector")
display(HTML(result.to_html()))

index,title,summary,authors,url,_distance
0,A General Theory for Training Learning Machine,"Though the deep learning is pushing the machine learning to a new stage,\nbasic theories of machine learning are still limited. The principle of\nlearning, the role of the a prior knowledge, the role of neuron bias, and the\nbasis for choosing neural transfer function and cost function, etc., are still\nfar from clear. In this paper, we present a general theoretical framework for\nmachine learning. We classify the prior knowledge into common and\nproblem-dependent parts, and consider that the aim of learning is to maximally\nincorporate them. The principle we suggested for maximizing the former is the\ndesign risk minimization principle, while the neural transfer function, the\ncost function, as well as pretreatment of samples, are endowed with the role\nfor maximizing the latter. The role of the neuron bias is explained from a\ndifferent angle. We develop a Monte Carlo algorithm to establish the\ninput-output responses, and we control the input-output sensitivity of a\nlearning machine by controlling that of individual neurons. Applications of\nfunction approaching and smoothing, pattern recognition and classification, are\nprovided to illustrate how to train general learning machines based on our\ntheory and algorithm. Our method may in addition induce new applications, such\nas the transductive inference.",[arxiv.Result.Author('Hong Zhao')],http://arxiv.org/abs/1704.06885v1,33.708359
1,Learning Visual Reasoning Without Strong Priors,"Achieving artificial visual reasoning - the ability to answer image-related\nquestions which require a multi-step, high-level process - is an important step\ntowards artificial general intelligence. This multi-modal task requires\nlearning a question-dependent, structured reasoning process over images from\nlanguage. Standard deep learning approaches tend to exploit biases in the data\nrather than learn this underlying structure, while leading methods learn to\nvisually reason successfully but are hand-crafted for reasoning. We show that a\ngeneral-purpose, Conditional Batch Normalization approach achieves\nstate-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%\nerror rate. We outperform the next best end-to-end method (4.5%) and even\nmethods that use extra supervision (3.1%). We probe our model to shed light on\nhow it reasons, showing it has learned a question-dependent, multi-step\nprocess. Previous work has operated under the assumption that visual reasoning\ncalls for a specialized architecture, but we show that a general architecture\nwith proper conditioning can learn to visually reason effectively.","[arxiv.Result.Author('Ethan Perez'), arxiv.Result.Author('Harm de Vries'), arxiv.Result.Author('Florian Strub'), arxiv.Result.Author('Vincent Dumoulin'), arxiv.Result.Author('Aaron Courville')]",http://arxiv.org/abs/1707.03017v5,36.282284
2,Encoder Based Lifelong Learning,"This paper introduces a new lifelong learning solution where a single model\nis trained for a sequence of tasks. The main challenge that vision systems face\nin this context is catastrophic forgetting: as they tend to adapt to the most\nrecently seen task, they lose performance on the tasks that were learned\npreviously. Our method aims at preserving the knowledge of the previous tasks\nwhile learning a new one by using autoencoders. For each task, an\nunder-complete autoencoder is learned, capturing the features that are crucial\nfor its achievement. When a new task is presented to the system, we prevent the\nreconstructions of the features with these autoencoders from changing, which\nhas the effect of preserving the information on which the previous tasks are\nmainly relying. At the same time, the features are given space to adjust to the\nmost recent environment as only their projection into a low dimension\nsubmanifold is controlled. The proposed system is evaluated on image\nclassification tasks and shows a reduction of forgetting over the\nstate-of-the-art","[arxiv.Result.Author('Amal Rannen Triki'), arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Mathew B. Blaschko'), arxiv.Result.Author('Tinne Tuytelaars')]",http://arxiv.org/abs/1704.01920v1,37.25425


## Full Text Search

In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases.

LanceDB now provides **experimental** support for full text search. This is currently Python only. We plan to push the integration down to Rust in the future to make this available for JS as well.


In [16]:
!pip install tantivy



### Build FTS index for the summary
Here, we build the FTS index using the Python bindings for tantivy. You can also build the index on any other text column. A full-text index stores information about significant words and their locations within one or more columns of a database table.
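
To make the idea of a full-text index concrete, here is a toy inverted index in plain Python. It maps each significant word to the documents and positions where it occurs. This is only a sketch of the concept with made-up documents and hypothetical helper names; tantivy's real index is far more elaborate (tokenization, compression, relevance scoring).

```python
from collections import defaultdict

# Made-up documents standing in for paper summaries
docs = {
    0: "lifelong learning with a network of experts",
    1: "a brief survey of deep reinforcement learning",
    2: "graph approximation and clustering on a budget",
}

STOP_WORDS = {"a", "of", "the", "and", "on", "with"}


def build_index(documents):
    """Map each significant word to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            if word in STOP_WORDS:
                continue
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Rank documents by how many query words they contain."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.get(word, {}):
            hits[doc_id] += 1
    # Most matching words first; ties broken by document id
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


index = build_index(docs)
print(index["learning"])
print(search(index, "deep learning"))
```

Because lookups go from word to documents, a query only touches the entries for its own words instead of scanning every summary.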

In [17]:
# This cell might take a few mins
tbl.create_fts_index("summary")

In [18]:
# FTS over the indexed summary column
result = (
    tbl.search("What is the general idea behind self-supervised learning.")
    .limit(10)
    .to_pandas()
)

result.pop("vector")

display(HTML(result.to_html()))

index,title,summary,authors,url,score
0,Expert Gate: Lifelong Learning with a Network of Experts,"In this paper we introduce a model of lifelong learning, based on a Network\nof Experts. New tasks / experts are learned and added to the model\nsequentially, building on what was learned before. To ensure scalability of\nthis process,data from previous tasks cannot be stored and hence is not\navailable when learning a new task. A critical issue in such context, not\naddressed in the literature so far, relates to the decision which expert to\ndeploy at test time. We introduce a set of gating autoencoders that learn a\nrepresentation for the task at hand, and, at test time, automatically forward\nthe test sample to the relevant expert. This also brings memory efficiency as\nonly one expert network has to be loaded into memory at any given time.\nFurther, the autoencoders inherently capture the relatedness of one task to\nanother, based on which the most relevant prior model to be used for training a\nnew expert, with finetuning or learning without-forgetting, can be selected. We\nevaluate our method on image classification and video prediction problems.","[arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Punarjay Chakravarty'), arxiv.Result.Author('Tinne Tuytelaars')]",http://arxiv.org/abs/1611.06194v2,4.703215
1,Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs,"The idea of computer vision as the Bayesian inverse problem to computer\ngraphics has a long history and an appealing elegance, but it has proved\ndifficult to directly implement. Instead, most vision tasks are approached via\ncomplex bottom-up processing pipelines. Here we show that it is possible to\nwrite short, simple probabilistic graphics programs that define flexible\ngenerative models and to automatically invert them to interpret real-world\nimages. Generative probabilistic graphics programs consist of a stochastic\nscene generator, a renderer based on graphics software, a stochastic likelihood\nmodel linking the renderer's output and the data, and latent variables that\nadjust the fidelity of the renderer and the tolerance of the likelihood model.\nRepresentations and algorithms from computer graphics, originally designed to\nproduce high-quality images, are instead used as the deterministic backbone for\nhighly approximate and stochastic generative models. This formulation combines\nprobabilistic programming, computer graphics, and approximate Bayesian\ncomputation, and depends only on general-purpose, automatic inference\ntechniques. We describe two applications: reading sequences of degraded and\nadversarially obscured alphanumeric characters, and inferring 3D road models\nfrom vehicle-mounted camera images. Each of the probabilistic graphics programs\nwe present relies on under 20 lines of probabilistic code, and supports\naccurate, approximately Bayesian inferences about ambiguous real-world images.","[arxiv.Result.Author('Vikash K. Mansinghka'), arxiv.Result.Author('Tejas D. Kulkarni'), arxiv.Result.Author('Yura N. Perov'), arxiv.Result.Author('Joshua B. Tenenbaum')]",http://arxiv.org/abs/1307.0060v1,4.515473
2,Learning Visual Reasoning Without Strong Priors,"Achieving artificial visual reasoning - the ability to answer image-related\nquestions which require a multi-step, high-level process - is an important step\ntowards artificial general intelligence. This multi-modal task requires\nlearning a question-dependent, structured reasoning process over images from\nlanguage. Standard deep learning approaches tend to exploit biases in the data\nrather than learn this underlying structure, while leading methods learn to\nvisually reason successfully but are hand-crafted for reasoning. We show that a\ngeneral-purpose, Conditional Batch Normalization approach achieves\nstate-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%\nerror rate. We outperform the next best end-to-end method (4.5%) and even\nmethods that use extra supervision (3.1%). We probe our model to shed light on\nhow it reasons, showing it has learned a question-dependent, multi-step\nprocess. Previous work has operated under the assumption that visual reasoning\ncalls for a specialized architecture, but we show that a general architecture\nwith proper conditioning can learn to visually reason effectively.","[arxiv.Result.Author('Ethan Perez'), arxiv.Result.Author('Harm de Vries'), arxiv.Result.Author('Florian Strub'), arxiv.Result.Author('Vincent Dumoulin'), arxiv.Result.Author('Aaron Courville')]",http://arxiv.org/abs/1707.03017v5,4.33287
3,Memory Aware Synapses: Learning what (not) to forget,"Humans can learn in a continuous manner. Old rarely utilized knowledge can be\noverwritten by new incoming information while important, frequently used\nknowledge is prevented from being erased. In artificial learning systems,\nlifelong learning so far has focused mainly on accumulating knowledge over\ntasks and overcoming catastrophic forgetting. In this paper, we argue that,\ngiven the limited model capacity and the unlimited new information to be\nlearned, knowledge has to be preserved or erased selectively. Inspired by\nneuroplasticity, we propose a novel approach for lifelong learning, coined\nMemory Aware Synapses (MAS). It computes the importance of the parameters of a\nneural network in an unsupervised and online manner. Given a new sample which\nis fed to the network, MAS accumulates an importance measure for each parameter\nof the network, based on how sensitive the predicted output function is to a\nchange in this parameter. When learning a new task, changes to important\nparameters can then be penalized, effectively preventing important knowledge\nrelated to previous tasks from being overwritten. Further, we show an\ninteresting connection between a local version of our method and Hebb's\nrule,which is a model for the learning process in the brain. We test our method\non a sequence of object recognition tasks and on the challenging problem of\nlearning an embedding for predicting $<$subject, predicate, object$>$ triplets.\nWe show state-of-the-art performance and, for the first time, the ability to\nadapt the importance of the parameters based on unlabeled data towards what the\nnetwork needs (not) to forget, which may vary depending on test conditions.","[arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Francesca Babiloni'), arxiv.Result.Author('Mohamed Elhoseiny'), arxiv.Result.Author('Marcus Rohrbach'), arxiv.Result.Author('Tinne Tuytelaars')]",http://arxiv.org/abs/1711.09601v4,4.307245
4,Explaining Aviation Safety Incidents Using Deep Temporal Multiple Instance Learning,"Although aviation accidents are rare, safety incidents occur more frequently\nand require a careful analysis to detect and mitigate risks in a timely manner.\nAnalyzing safety incidents using operational data and producing event-based\nexplanations is invaluable to airline companies as well as to governing\norganizations such as the Federal Aviation Administration (FAA) in the United\nStates. However, this task is challenging because of the complexity involved in\nmining multi-dimensional heterogeneous time series data, the lack of\ntime-step-wise annotation of events in a flight, and the lack of scalable tools\nto perform analysis over a large number of events. In this work, we propose a\nprecursor mining algorithm that identifies events in the multidimensional time\nseries that are correlated with the safety incident. Precursors are valuable to\nsystems health and safety monitoring and in explaining and forecasting safety\nincidents. Current methods suffer from poor scalability to high dimensional\ntime series data and are inefficient in capturing temporal behavior. We propose\nan approach by combining multiple-instance learning (MIL) and deep recurrent\nneural networks (DRNN) to take advantage of MIL's ability to learn using weakly\nsupervised data and DRNN's ability to model temporal behavior. We describe the\nalgorithm, the data, the intuition behind taking a MIL approach, and a\ncomparative analysis of the proposed algorithm with baseline models. We also\ndiscuss the application to a real-world aviation safety problem using data from\na commercial airline company and discuss the model's abilities and\nshortcomings, with some final remarks about possible deployment directions.",[arxiv.Result.Author('Vijay Manikandan Janakiraman')],http://arxiv.org/abs/1710.04749v2,4.206257
5,A General Theory for Training Learning Machine,"Though the deep learning is pushing the machine learning to a new stage,\nbasic theories of machine learning are still limited. The principle of\nlearning, the role of the a prior knowledge, the role of neuron bias, and the\nbasis for choosing neural transfer function and cost function, etc., are still\nfar from clear. In this paper, we present a general theoretical framework for\nmachine learning. We classify the prior knowledge into common and\nproblem-dependent parts, and consider that the aim of learning is to maximally\nincorporate them. The principle we suggested for maximizing the former is the\ndesign risk minimization principle, while the neural transfer function, the\ncost function, as well as pretreatment of samples, are endowed with the role\nfor maximizing the latter. The role of the neuron bias is explained from a\ndifferent angle. We develop a Monte Carlo algorithm to establish the\ninput-output responses, and we control the input-output sensitivity of a\nlearning machine by controlling that of individual neurons. Applications of\nfunction approaching and smoothing, pattern recognition and classification, are\nprovided to illustrate how to train general learning machines based on our\ntheory and algorithm. Our method may in addition induce new applications, such\nas the transductive inference.",[arxiv.Result.Author('Hong Zhao')],http://arxiv.org/abs/1704.06885v1,4.150894
6,A Brief Survey of Deep Reinforcement Learning,"Deep reinforcement learning is poised to revolutionise the field of AI and\nrepresents a step towards building autonomous systems with a higher level\nunderstanding of the visual world. Currently, deep learning is enabling\nreinforcement learning to scale to problems that were previously intractable,\nsuch as learning to play video games directly from pixels. Deep reinforcement\nlearning algorithms are also applied to robotics, allowing control policies for\nrobots to be learned directly from camera inputs in the real world. In this\nsurvey, we begin with an introduction to the general field of reinforcement\nlearning, then progress to the main streams of value-based and policy-based\nmethods. Our survey will cover central algorithms in deep reinforcement\nlearning, including the deep $Q$-network, trust region policy optimisation, and\nasynchronous advantage actor-critic. In parallel, we highlight the unique\nadvantages of deep neural networks, focusing on visual understanding via\nreinforcement learning. To conclude, we describe several current areas of\nresearch within the field.","[arxiv.Result.Author('Kai Arulkumaran'), arxiv.Result.Author('Marc Peter Deisenroth'), arxiv.Result.Author('Miles Brundage'), arxiv.Result.Author('Anil Anthony Bharath')]",http://arxiv.org/abs/1708.05866v2,3.549962
7,Interpretable Explanations of Black Boxes by Meaningful Perturbation,"As machine learning algorithms are increasingly applied to high impact yet\nhigh risk tasks, such as medical diagnosis or autonomous driving, it is\ncritical that researchers can explain how such algorithms arrived at their\npredictions. In recent years, a number of image saliency methods have been\ndeveloped to summarize where highly complex neural networks ""look"" in an image\nfor evidence for their predictions. However, these techniques are limited by\ntheir heuristic nature and architectural constraints. In this paper, we make\ntwo main contributions: First, we propose a general framework for learning\ndifferent kinds of explanations for any black box algorithm. Second, we\nspecialise the framework to find the part of an image most responsible for a\nclassifier decision. Unlike previous works, our method is model-agnostic and\ntestable because it is grounded in explicit and interpretable image\nperturbations.","[arxiv.Result.Author('Ruth Fong'), arxiv.Result.Author('Andrea Vedaldi')]",http://arxiv.org/abs/1704.03296v4,3.451381
8,Self corrective Perturbations for Semantic Segmentation and Classification,"Convolutional Neural Networks have been a subject of great importance over\nthe past decade and great strides have been made in their utility for producing\nstate of the art performance in many computer vision problems. However, the\nbehavior of deep networks is yet to be fully understood and is still an active\narea of research. In this work, we present an intriguing behavior: pre-trained\nCNNs can be made to improve their predictions by structurally perturbing the\ninput. We observe that these perturbations - referred as Guided Perturbations -\nenable a trained network to improve its prediction performance without any\nlearning or change in network weights. We perform various ablative experiments\nto understand how these perturbations affect the local context and feature\nrepresentations. Furthermore, we demonstrate that this idea can improve\nperformance of several existing approaches on semantic segmentation and scene\nlabeling tasks on the PASCAL VOC dataset and supervised classification tasks on\nMNIST and CIFAR10 datasets.","[arxiv.Result.Author('Swami Sankaranarayanan'), arxiv.Result.Author('Arpit Jain'), arxiv.Result.Author('Ser Nam Lim')]",http://arxiv.org/abs/1703.07928v2,3.417501
9,Graph Approximation and Clustering on a Budget,"We consider the problem of learning from a similarity matrix (such as\nspectral clustering and lowd imensional embedding), when computing pairwise\nsimilarities are costly, and only a limited number of entries can be observed.\nWe provide a theoretical analysis using standard notions of graph\napproximation, significantly generalizing previous results (which focused on\nspectral clustering with two clusters). We also propose a new algorithmic\napproach based on adaptive sampling, which experimentally matches or improves\non previous methods, while being considerably more general and computationally\ncheaper.","[arxiv.Result.Author('Ethan Fetaya'), arxiv.Result.Author('Ohad Shamir'), arxiv.Result.Author('Shimon Ullman')]",http://arxiv.org/abs/1406.2602v1,3.358721
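
A quick way to compare the two search modes is to check how many papers they have in common. The sketch below uses the arXiv ids from the two result tables for the self-supervised learning query (top 3 from vector search, top 10 from full-text search). The variable names are only illustrative.

```python
# arXiv ids copied from the result tables for the same query
semantic_top3 = ["1704.06885v1", "1707.03017v5", "1704.01920v1"]
fts_top10 = [
    "1611.06194v2", "1307.0060v1", "1707.03017v5", "1711.09601v4",
    "1710.04749v2", "1704.06885v1", "1708.05866v2", "1704.03296v4",
    "1703.07928v2", "1406.2602v1",
]

# Position (0-based, as in the tables) of each paper in the FTS ranking
fts_rank = {paper_id: rank for rank, paper_id in enumerate(fts_top10)}

for paper_id in semantic_top3:
    rank = fts_rank.get(paper_id)
    where = f"FTS rank {rank}" if rank is not None else "not in FTS top 10"
    print(paper_id, "->", where)

overlap = [p for p in semantic_top3 if p in fts_rank]
print(f"{len(overlap)} of {len(semantic_top3)} semantic hits also appear in the FTS results")
```

On this small sample, some semantic hits also show up in the keyword results but at different ranks, which suggests the two modes can complement each other.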


## Analysing OpenCLIP embeddings on Nomic
Atlas is a platform for interacting with both small and internet scale unstructured datasets.

Atlas enables you to:
* Store, update and organize multi-million point datasets of unstructured text, images and embeddings.
* Visually interact with embeddings of your data from a web browser.
* Operate over unstructured data and embeddings with topic modeling, semantic duplicate clustering and semantic search.
* Generate high dimensional and two-dimensional embeddings of your data.

In [19]:
!pip install nomic --q


### Nomic Login

We use Nomic's Atlas to visualize the dataset as clusters, which requires logging in to Nomic first.

In [20]:
!nomic login

Authenticate with the Nomic API
https://atlas.nomic.ai/cli-login
Click the above link to retrieve your access token and then run `nomic login [token]`


In [22]:
!nomic login [token]  # Replace [token] with the access token from the Nomic CLI login page

In [25]:
from nomic import atlas
import numpy as np

# Get pandas dataframe from lancedb table
df = tbl.to_pandas()

# get embeddings from df
embs = np.array(df.pop("vector").to_list())

project = atlas.map_data(embeddings=embs, data=df.to_dict("records"))
print()

2024-02-25 06:18:19.434 | INFO | nomic.dataset:_create_project:868 - Creating dataset `inquisitive-jaynes`
2024-02-25 06:18:19.719 | INFO | nomic.atlas:map_data:108 - Uploading data to Atlas.
1it [00:00,  1.62it/s]
2024-02-25 06:18:20.370 | INFO | nomic.dataset:_add_data:1536 - Upload succeeded.
2024-02-25 06:18:20.374 | INFO | nomic.atlas:map_data:123 - `prasantdixit9876/inquisitive-jaynes`: Data upload succeeded to dataset`
2024-02-25 06:18:21.396 | INFO | nomic.dataset:create_index:1245 - Created map `inquisitive-jaynes` in dataset `prasantdixit9876/inquisitive-jaynes`: https://atlas.nomic.ai/data/prasantdixit9876/inquisitive-jaynes/map





The visualizations are interesting and worth exploring further. A preliminary look shows that Atlas successfully groups similar types of papers into clusters. Possible next steps include comparing embeddings from OpenCLIP models of different sizes and pretraining datasets.
<img width="1433" alt="Screenshot 2023-08-24 at 3 47 51 PM" src="https://github.com/lancedb/vectordb-recipes/assets/15766192/34ef88a3-2925-4450-abcd-1abc350ef3e4">