[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/master/docs/release/tutorials/rag-operations.ipynb)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixeltable/pixeltable/blob/master/docs/release/tutorials/rag-operations.ipynb)

# RAG Operations in Pixeltable

In this tutorial, we'll explore Pixeltable's flexible handling of RAG operations on unstructured text. In a traditional AI workflow, such operations might be implemented as a Python script that runs on a periodic schedule or in response to certain events. In Pixeltable, as with everything else, they are implemented as persistent table operations that update incrementally as new data becomes available. In our tutorial workflow, we'll chunk Wikipedia articles in various ways with a document splitter, then apply several kinds of embeddings to the chunks.

## Set Up the Table Structure

We start by installing the necessary dependencies, creating a Pixeltable directory `rag_ops_demo` (if it doesn't already exist), and setting up the table structure for our new workflow.

In [1]:
%pip install -q sentence-transformers pixeltable

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pixeltable as pxt

# Create the Pixeltable workspace
pxt.create_dir('rag_ops_demo', ignore_errors=True)

# Clean the database to ensure we're using fresh table instances
# (in case this demo has been run before)
pxt.drop_table('rag_ops_demo.short_char_chunks', ignore_errors=True)
pxt.drop_table('rag_ops_demo.short_chunks', ignore_errors=True)
pxt.drop_table('rag_ops_demo.chunks', ignore_errors=True)
pxt.drop_table('rag_ops_demo.sentences', ignore_errors=True)
pxt.drop_table('rag_ops_demo.docs', ignore_errors=True)

Connected to Pixeltable database at: postgresql://postgres:@/pixeltable?host=/Users/orm/.pixeltable/pgdata


## Creating Tables and Views

Now we'll create the tables that represent our workflow, starting with a table to hold references to source documents. The table contains a single column `source_doc` whose elements have type `pxt.DocumentType`, representing a general document instance. In this tutorial, we'll be working with HTML documents, but Pixeltable supports a range of other document types, such as Markdown and PDF.

In [3]:
docs = pxt.create_table('rag_ops_demo.docs', {'source_doc': pxt.DocumentType()})

Created table `docs`.


If we take a peek at the `docs` table, we see its very simple structure.

In [4]:
docs

Column Name,Type,Computed With
source_doc,document,


Next we create a view to represent chunks of our HTML documents. A Pixeltable view is a virtual table, which is dynamically derived from a source table by applying a transformation and/or selecting a subset of data. In this case, our view represents a one-to-many transformation from source documents into individual sentences. This is achieved using Pixeltable's built-in `DocumentSplitter` class.

Note that the `docs` table is currently empty, so creating this view doesn't actually *do* anything yet: it simply defines an operation that we want Pixeltable to execute when it sees new data.

In [5]:
from pixeltable.iterators.document import DocumentSplitter

sentences = pxt.create_view(
    'rag_ops_demo.sentences',  # Name of the view
    docs,  # Table from which the view is derived
    iterator=DocumentSplitter.create(
        document=docs.source_doc,
        separators='sentence',  # Chunk docs into sentences
        metadata='title,heading,sourceline'
    )
)

Created view `sentences` with 0 rows, 0 exceptions.


Let's take a peek at the new `sentences` view.

In [6]:
sentences

Column Name,Type,Computed With
pos,int,
text,string,
title,string,
heading,json,
sourceline,int,
source_doc,document,


We see that `sentences` inherits the `source_doc` column from `docs`, together with some new fields:
- `pos`: The position in the source document where the sentence appears.
- `text`: The text of the sentence.
- `title`, `heading`, and `sourceline`: The metadata we requested when we set up the view.
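
To make the one-to-many shape of this view concrete, here is a minimal plain-Python sketch. It is not Pixeltable's `DocumentSplitter`: the function name `split_into_sentences`, the naive punctuation rule, and the `'doc-1'` identifier are all assumptions made for illustration. It only shows how a single source document can fan out into rows that carry a `pos` and a `text` alongside a reference to the source.

```python
import re


def split_into_sentences(doc_id, text):
    """Yield one row per sentence, tagged with its position in the document."""
    # Naive rule (an assumption): a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    for pos, sentence in enumerate(p for p in parts if p):
        yield {'pos': pos, 'text': sentence, 'source_doc': doc_id}


rows = list(split_into_sentences(
    'doc-1', 'Chagall was born in 1887. He moved to Paris. He painted.'
))
for row in rows:
    print(row)
```

Each output row keeps a pointer back to its source, which mirrors how `sentences` exposes `source_doc` next to the per-sentence fields.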

## Data Ingestion

Ok, now it's time to insert some data into our workflow. A document in Pixeltable is just a URL; the following command inserts a single row into the `docs` table with the `source_doc` field set to the specified URL:

In [7]:
docs.insert(source_doc='https://en.wikipedia.org/wiki/Marc_Chagall')

Inserting rows into `docs`: 1 rows [00:00, 934.56 rows/s]
Inserting rows into `sentences`: 1461 rows [00:00, 3202.77 rows/s]
Inserted 1462 rows with 0 errors.


UpdateStatus(num_rows=1462, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])

We can see that two things happened. First, a single row was inserted into `docs`, containing the URL representing our source document. Then, the view `sentences` was incrementally updated by applying the `DocumentSplitter` according to the definition of the view. This illustrates an important principle in Pixeltable: by default, anytime Pixeltable sees new data, the update is incrementally propagated to any downstream views or computed columns.
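
The propagation principle can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not how Pixeltable is implemented: `ToyTable`, `ToyView`, and `on_new_row` are hypothetical names, and the "documents" are just strings. The point is that an insert hands only the new row to each downstream view, and that a view created later is first populated from the rows already present.

```python
class ToyTable:
    """A base table that pushes each newly inserted row to its views."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, row):
        self.rows.append(row)
        # Only the new row is handed to the views; older rows are not reprocessed.
        for view in self.views:
            view.on_new_row(row)


class ToyView:
    """A derived table defined by a one-to-many transform over a base table."""

    def __init__(self, base, transform):
        self.transform = transform
        self.rows = []
        self.calls = 0  # how many base rows this view has processed
        # Populate from data already in the base table, then subscribe to new rows.
        for row in base.rows:
            self.on_new_row(row)
        base.views.append(self)

    def on_new_row(self, row):
        self.calls += 1
        self.rows.extend(self.transform(row))


toy_docs = ToyTable()
toy_words = ToyView(toy_docs, lambda doc: doc.split())
toy_docs.insert('one two three')
toy_docs.insert('four five')
print(len(toy_words.rows), toy_words.calls)
```

The `calls` counter makes the incremental behavior visible: the transform runs once per inserted document, never once per document per insert.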

We can see the effect of the insertion with the `select` command. There's a single row in `docs`:

In [8]:
docs.select(docs.source_doc.fileurl).show()

source_doc_fileurl
https://en.wikipedia.org/wiki/Marc_Chagall


And here are the first 20 rows in `sentences`. The content of the article is broken into individual sentences, as expected.

In [9]:
sentences.select(sentences.text, sentences.heading).show(20)

text,heading
Marc Chagall - Wikipedia Jump to content Search Search,{}
Marc Chagall 81 languages Afrikaans Alemannisch العربية,{1: Marc Chagall}
Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी,{1: Marc Chagall}
Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių ...... Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย,{1: Marc Chagall}
Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש,{1: Marc Chagall}
"粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) ""Chagall"" redirects here.",{1: Marc Chagall}
"For other uses, see Chagall (disambiguation) .",{1: Marc Chagall}
"Marc Chagall Chagall, c. 1920",{1: Marc Chagall}
"Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus)",{1: Marc Chagall}
[1] Died 28 March 1985 (1985-03-28) (aged 97),{1: Marc Chagall}


## Experimenting with Chunking

Of course, chunking into sentences isn't the only way to split a document. We might want to experiment with different chunking strategies to see which one performs best in a particular application. Pixeltable makes this easy: we simply create several views of the same source table. The three views below all combine paragraph splitting with a size limit: 2048 tokens, 72 tokens, and 72 characters, respectively. Notice that as each new view is created, it is initially populated from the data already in `docs`.

In [10]:
chunks = pxt.create_view(
    'rag_ops_demo.chunks', docs,
    iterator=DocumentSplitter.create(
        document=docs.source_doc,
        separators='paragraph,token_limit',
        limit=2048,
        overlap=0,
        metadata='title,heading,sourceline'
    )
)

Inserting rows into `chunks`: 205 rows [00:00, 14621.76 rows/s]
Created view `chunks` with 205 rows, 0 exceptions.


In [11]:
short_chunks = pxt.create_view(
    'rag_ops_demo.short_chunks', docs,
    iterator=DocumentSplitter.create(
        document=docs.source_doc,
        separators='paragraph,token_limit',
        limit=72,
        overlap=0,
        metadata='title,heading,sourceline'
    )
)

Inserting rows into `short_chunks`: 531 rows [00:00, 19589.72 rows/s]
Created view `short_chunks` with 531 rows, 0 exceptions.


In [12]:
short_char_chunks = pxt.create_view(
    'rag_ops_demo.short_char_chunks', docs,
    iterator=DocumentSplitter.create(
        document=docs.source_doc,
        separators='paragraph,char_limit',
        limit=72,
        overlap=0,
        metadata='title,heading,sourceline'
    )
)

Inserting rows into `short_char_chunks`: 1763 rows [00:00, 15904.40 rows/s]
Created view `short_char_chunks` with 1763 rows, 0 exceptions.


In [13]:
chunks.select(chunks.text, chunks.heading).show(20)

text,heading
Marc Chagall - Wikipedia Jump to content Search Search,{}
"Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Б ...... encyclopedia Russian-French artist (1887–1985) ""Chagall"" redirects here. For other uses, see Chagall (disambiguation) .",{1: Marc Chagall}
"Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russia ...... f Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944) ​ Valentina (Vava) Brodsky ​ ​ ( m. 1952) ​ [3] Children 2 [4]",{1: Marc Chagall}
"Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Russian-French artist. [b] An ...... including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints.",{1: Marc Chagall}
"Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlem ...... ccupied France to the United States, where he lived in New York City for seven years before returning to France in 1948.",{1: Marc Chagall}
"Art critic Robert Hughes referred to Chagall as ""the quintessential Jewish artist of the twentieth century"". According t ...... Pablo Picasso remarked in the 1950s, ""Chagall will be the only painter left who understands what colour really is"". [17]",{1: Marc Chagall}
Early life and education [ edit ],"{1: Marc Chagall, 2: Early life and education[edit]}"
"Early life [ edit ] Marc Chagall's childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper","{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}"
"Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, th ...... ecause the city was built mostly of wood, little of it survived years of occupation and destruction during World War II.","{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}"
"Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish com ...... les each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years:","{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}"


In [14]:
short_chunks.select(short_chunks.text, short_chunks.heading).show(20)

text,heading
Marc Chagall - Wikipedia Jump to content Search Search,{}
Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡ,{1: Marc Chagall}
ортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한,{1: Marc Chagall}
국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lë,{1: Marc Chagall}
tzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemont,{1: Marc Chagall}
èis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi,{1: Marc Chagall}
"Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) ""Chagall"" redirects here",{1: Marc Chagall}
". For other uses, see Chagall (disambiguation) .",{1: Marc Chagall}
"Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03-",{1: Marc Chagall}
"28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian, later French [2] Known for Painting stained glass Notabl ...... Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944) ​ Valentina",{1: Marc Chagall}


In [15]:
short_char_chunks.select(short_char_chunks.text, short_char_chunks.heading).show(20)

text,heading
Marc Chagall - Wikipedia Jump to content Search Search,{}
Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտա,{1: Marc Chagall}
հայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (та,{1: Marc Chagall}
рашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά,{1: Marc Chagall}
Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrva,{1: Marc Chagall}
tski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswah,{1: Marc Chagall}
ili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy م,{1: Marc Chagall}
صرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbe,{1: Marc Chagall}
kcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Ro,{1: Marc Chagall}
mână Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina S,{1: Marc Chagall}
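
The rows above are cut mid-word (for example `Hrva` / `tski`), which is what a hard character limit would produce. A minimal sketch of that behavior, assuming a simple fixed-width slice and ignoring the paragraph separator that the real view also applies (`char_chunks` is a hypothetical helper, not a Pixeltable function):

```python
def char_chunks(text, limit):
    """Split text into consecutive slices of at most `limit` characters."""
    if limit <= 0:
        raise ValueError('limit must be positive')
    return [text[i:i + limit] for i in range(0, len(text), limit)]


paragraph = 'Marc Chagall was a Russian-French artist. ' * 4
pieces = char_chunks(paragraph, limit=72)
print([len(p) for p in pieces])
```

Because the slices ignore word boundaries, joining them back together reproduces the input exactly, but individual chunks can begin or end in the middle of a word.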


Now let's add a few more documents to our workflow. Notice how all of the downstream views are updated incrementally, processing just the new documents as they are inserted.

In [16]:
urls = [
    'https://en.wikipedia.org/wiki/Pierre-Auguste_Renoir',
    'https://en.wikipedia.org/wiki/Henri_Matisse',
    'https://en.wikipedia.org/wiki/Marcel_Duchamp'
]
docs.insert({'source_doc': url} for url in urls)

Inserting rows into `docs`: 3 rows [00:00, 3640.89 rows/s]
Inserting rows into `sentences`: 2105 rows [00:03, 645.11 rows/s]
Inserting rows into `chunks`: 276 rows [00:00, 15634.75 rows/s]
Inserting rows into `short_chunks`: 811 rows [00:00, 19953.19 rows/s]
Inserting rows into `short_char_chunks`: 2636 rows [00:00, 5401.02 rows/s]
Inserted 5831 rows with 0 errors.


UpdateStatus(num_rows=5831, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])

## Further Experiments

This is a good time to mention another important guiding principle of Pixeltable. The preceding examples all used the built-in `DocumentSplitter` class with various configurations. That's probably fine as a first cut or to prototype an application quickly, and it might be sufficient for some applications. But other applications might want to do more sophisticated kinds of chunking, implementing their own specialized logic or leveraging third-party tools. Pixeltable imposes no constraints on the AI or RAG operations a workflow uses: the iterator interface is highly general, and it's easy to implement new operations or adapt existing code or third-party tools into the Pixeltable workflow.
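
As an example of the kind of specialized logic an application might bring, here is a sliding-window chunker written as a plain Python generator. It is a sketch only: the name `sliding_word_chunks` and its parameters are assumptions, it splits on whitespace rather than real tokens, and wiring it into Pixeltable's iterator interface is not shown here.

```python
def sliding_word_chunks(text, size, overlap):
    """Yield chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError('overlap must satisfy 0 <= overlap < size')
    words = text.split()
    step = size - overlap
    for start in range(0, len(words), step):
        yield ' '.join(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reached the end of the text


windows = list(sliding_word_chunks('a b c d e f g', size=4, overlap=2))
print(windows)
```

Overlapping windows trade extra storage for context: a sentence that straddles a chunk boundary still appears whole in at least one chunk when the overlap is large enough.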

## Computing Embeddings

Next, let's look at how embeddings can be added seamlessly to an existing Pixeltable workflow. To compute our embeddings, we'll use the Hugging Face `sentence-transformers` package, running it over the `chunks` view that broke our documents up into paragraph-sized chunks. Pixeltable has a built-in `sentence_transformer` adapter, and all we have to do is add a new column that leverages it. Pixeltable takes care of the rest, applying the new column to all existing data in the view.

In [17]:
from pixeltable.functions.huggingface import sentence_transformer

chunks['minilm_embed'] = sentence_transformer(chunks.text, model_id='paraphrase-MiniLM-L6-v2')

Computing cells: 100%|███████████████████████████████████████| 481/481 [00:01<00:00, 379.37 cells/s]
Added 481 column values with 0 errors.


The new column is a *computed column*: it is defined as a function on top of existing data and updated incrementally as new data are added to the workflow. Let's have a look at how the new column affected the `chunks` view.

In [18]:
chunks

Column Name,Type,Computed With
pos,int,
text,string,
title,string,
heading,json,
sourceline,int,
minilm_embed,"array((384,), dtype=FLOAT)","sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2')"
source_doc,document,
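
The two behaviors of a computed column, backfilling existing rows when the column is added and evaluating it for each row that arrives later, can be illustrated with a toy table in plain Python. This is a conceptual sketch, not Pixeltable's implementation: `ToyComputedTable` and `add_computed_column` are hypothetical names, and a character count stands in for the embedding function.

```python
class ToyComputedTable:
    """Rows are dicts; computed columns are functions of a row's other values."""

    def __init__(self):
        self.rows = []
        self.computed = {}  # column name -> function(row) -> value

    def add_computed_column(self, name, fn):
        self.computed[name] = fn
        # Backfill: apply the new column to every row already stored.
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, row):
        row = dict(row)
        # New rows get every computed column evaluated once, on arrival.
        for name, fn in self.computed.items():
            row[name] = fn(row)
        self.rows.append(row)


table = ToyComputedTable()
table.insert({'text': 'Marc Chagall'})
table.add_computed_column('n_chars', lambda r: len(r['text']))
table.insert({'text': 'Henri Matisse'})
print([r['n_chars'] for r in table.rows])
```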


In [19]:
chunks.head()

pos,text,title,heading,sourceline,minilm_embed,source_doc
0,Marc Chagall - Wikipedia Jump to content Search Search,Marc Chagall - Wikipedia,{},0,"[-0.262,-0.119,-0.133, 0.048, 0.12 ,-0.006,...,-0.556, 0.372, 0.468,-0.234,  -0.226, 0.164]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
1,"Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Б ...... encyclopedia Russian-French artist (1887–1985) ""Chagall"" redirects here. For other uses, see Chagall (disambiguation) .",Marc Chagall - Wikipedia,{1: Marc Chagall},820,"[-0.136, 0.401,-0.53 ,-0.181,-0.453,-0.125,...,-0.184, 0.122, 0.644,-0.54 ,  0.188, 0.203]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
2,"Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russia ...... f Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944) ​ Valentina (Vava) Brodsky ​ ​ ( m. 1952) ​ [3] Children 2 [4]",Marc Chagall - Wikipedia,{1: Marc Chagall},1015,"[ 6.518e-05, 3.302e-01,-3.144e-01, 1.680e-01,-1.229e-01, 3.993e-01,...,  -1.386e-01,-1.360e-01, 9.149e-02,-4.365e-01,-1.723e-01,-4.351e-02]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
3,"Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Russian-French artist. [b] An ...... including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints.",Marc Chagall - Wikipedia,{1: Marc Chagall},1029,"[ 0.061, 0.155,-0.189, 0.168,-0.089, 0.171,...,-0.136,-0.32 , 0.249,-0.007,  -0.094, 0.025]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
4,"Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlem ...... ccupied France to the United States, where he lived in New York City for seven years before returning to France in 1948.",Marc Chagall - Wikipedia,{1: Marc Chagall},1030,"[ 0.013, 0.248,-0.692, 0.143,-0.379, 0.254,...,-0.232,-0.157,-0.018,-0.225,  -0.208,-0.095]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
5,"Art critic Robert Hughes referred to Chagall as ""the quintessential Jewish artist of the twentieth century"". According t ...... Pablo Picasso remarked in the 1950s, ""Chagall will be the only painter left who understands what colour really is"". [17]",Marc Chagall - Wikipedia,{1: Marc Chagall},1031,"[-0.172, 0.348,-0.307, 0.034,-0.071, 0.111,...,-0.31 ,-0.011, 0.302,-0.273,  -0.163, 0.152]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
6,Early life and education [ edit ],Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit]}",1034,"[-0.213, 0.418, 0.094, 0.135,-0.069, 0.265,...,-0.548, 0.164, 0.075, 0.205,  0.309, 0.277]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
7,"Early life [ edit ] Marc Chagall's childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper",Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}",1035,"[-0.04 , 0.143,-0.357, 0.412,-0.331, 0.201,...,-0.006,-0.057, 0.255, 0.181,  0.018,-0.021]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
8,"Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, th ...... ecause the city was built mostly of wood, little of it survived years of occupation and destruction during World War II.",Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}",1038,"[ 0.123, 0.198,-0.496, 0.154,-0.368, 0.078,...,-0.057,-0.141,-0.063,-0.096,  -0.136,-0.232]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
9,"Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish com ...... les each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years:",Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}",1039,"[-0.19 , 0.266,-0.4 , 0.129,-0.493, 0.063,...,-0.194,-0.2 , 0.322, 0.024,  -0.068, 0.031]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51


Similarly, we might want to add a CLIP embedding to our workflow; once again, it's just another computed column:

In [20]:
from pixeltable.functions.huggingface import clip_text

chunks['clip_embed'] = clip_text(chunks.text, model_id='openai/clip-vit-base-patch32')

Computing cells: 100%|███████████████████████████████████████| 481/481 [00:01<00:00, 273.05 cells/s]
Added 481 column values with 0 errors.


In [21]:
chunks

Column Name,Type,Computed With
pos,int,
text,string,
title,string,
heading,json,
sourceline,int,
minilm_embed,"array((384,), dtype=FLOAT)","sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2')"
clip_embed,"array((512,), dtype=FLOAT)","clip_text(text, model_id='openai/clip-vit-base-patch32')"
source_doc,document,


In [22]:
chunks.head()

pos,text,title,heading,sourceline,minilm_embed,clip_embed,source_doc
0,Marc Chagall - Wikipedia Jump to content Search Search,Marc Chagall - Wikipedia,{},0,"[-0.262,-0.119,-0.133, 0.048, 0.12 ,-0.006,...,-0.556, 0.372, 0.468,-0.234,  -0.226, 0.164]","[ 0.439,-0.204,-0.37 ,-0.248, 0.253, 0.11 ,..., 0.216,-0.31 , 0.22 , 0.557,  0.173,-0.271]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
1,"Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Б ...... encyclopedia Russian-French artist (1887–1985) ""Chagall"" redirects here. For other uses, see Chagall (disambiguation) .",Marc Chagall - Wikipedia,{1: Marc Chagall},820,"[-0.136, 0.401,-0.53 ,-0.181,-0.453,-0.125,...,-0.184, 0.122, 0.644,-0.54 ,  0.188, 0.203]","[ 0.106, 0.006,-0.152, 0.043, 0.283,-0.017,..., 0.237,-0.424, 0.203, 0.016,  0.117,-0.32 ]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
2,"Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russia ...... f Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944) ​ Valentina (Vava) Brodsky ​ ​ ( m. 1952) ​ [3] Children 2 [4]",Marc Chagall - Wikipedia,{1: Marc Chagall},1015,"[ 6.518e-05, 3.302e-01,-3.144e-01, 1.680e-01,-1.229e-01, 3.993e-01,...,  -1.386e-01,-1.360e-01, 9.149e-02,-4.365e-01,-1.723e-01,-4.351e-02]","[ 0.301,-0.216,-0.186,-0.098, 0.273, 0.154,..., 0.051,-0.35 , 0.324, 0.152,  0.156,-0.235]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
3,"Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Russian-French artist. [b] An ...... including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints.",Marc Chagall - Wikipedia,{1: Marc Chagall},1029,"[ 0.061, 0.155,-0.189, 0.168,-0.089, 0.171,...,-0.136,-0.32 , 0.249,-0.007,  -0.094, 0.025]","[ 0.313,-0.156,-0.14 ,-0.062, 0.424, 0.07 ,..., 0.153,-0.361, 0.414,-0.295,  0.159,-0.21 ]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
4,"Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlem ...... ccupied France to the United States, where he lived in New York City for seven years before returning to France in 1948.",Marc Chagall - Wikipedia,{1: Marc Chagall},1030,"[ 0.013, 0.248,-0.692, 0.143,-0.379, 0.254,...,-0.232,-0.157,-0.018,-0.225,  -0.208,-0.095]","[ 0.326, 0.067,-0.061,-0.046, 0.242, 0.227,..., 0.153,-0.295, 0.282, 0.013,  0.276,-0.331]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
5,"Art critic Robert Hughes referred to Chagall as ""the quintessential Jewish artist of the twentieth century"". According t ...... Pablo Picasso remarked in the 1950s, ""Chagall will be the only painter left who understands what colour really is"". [17]",Marc Chagall - Wikipedia,{1: Marc Chagall},1031,"[-0.172, 0.348,-0.307, 0.034,-0.071, 0.111,...,-0.31 ,-0.011, 0.302,-0.273,  -0.163, 0.152]","[ 0.374, 0.032,-0.311,-0.093, 0.242, 0.264,..., 0.178,-0.119, 0.247, 0.206,  0.35 ,-0.516]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
6,Early life and education [ edit ],Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit]}",1034,"[-0.213, 0.418, 0.094, 0.135,-0.069, 0.265,...,-0.548, 0.164, 0.075, 0.205,  0.309, 0.277]","[-0.111,-0.318, 0.043, 0.033, 0.042,-0.531,...,-0.213, 0.212,-0.019,-0.233,  0.465,-0.051]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
7,"Early life [ edit ] Marc Chagall's childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper",Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}",1035,"[-0.04 , 0.143,-0.357, 0.412,-0.331, 0.201,...,-0.006,-0.057, 0.255, 0.181,  0.018,-0.021]","[-0.211, 0.181,-0.237, 0.158, 0.337,-0.144,..., 0.248, 0.048, 0.399,-0.22 ,  -0.058,-0.508]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
8,"Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, th ...... ecause the city was built mostly of wood, little of it survived years of occupation and destruction during World War II.",Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}",1038,"[ 0.123, 0.198,-0.496, 0.154,-0.368, 0.078,...,-0.057,-0.141,-0.063,-0.096,  -0.136,-0.232]","[ 0.272,-0.137,-0.257,-0.087, 0.43 , 0.179,..., 0.17 ,-0.337, 0.315,-0.335,  0.208,-0.215]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
9,"Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish com ...... les each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years:",Marc Chagall - Wikipedia,"{1: Marc Chagall, 2: Early life and education[edit], 3: Early life[edit]}",1039,"[-0.19 , 0.266,-0.4 , 0.129,-0.493, 0.063,...,-0.194,-0.2 , 0.322, 0.024,  -0.068, 0.031]","[ 0.182,-0.073,-0.195,-0.15 , 0.277, 0.246,..., 0.119,-0.211, 0.115,-0.045,  0.497,-0.432]",/Users/orm/.pixeltable/file_cache/e04c4df579814f55a952d0b64618e7dc_0_3c5b8031c7610e17a42ab6df79e614c2ee1a85dbd022497373d574cf65e15c51
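
Embedding vectors like these are typically compared with cosine similarity, for example to find the chunk closest to a query. The sketch below uses only the standard library and made-up 3-dimensional vectors as stand-ins for the 384- and 512-dimensional arrays above; the names `query` and `candidates` are assumptions, and no Pixeltable API is involved.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration; real ones would come from an embedding column.
query = [1.0, 0.0, 1.0]
candidates = {'chunk_a': [0.9, 0.1, 1.1], 'chunk_b': [-1.0, 0.5, 0.0]}
best = max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))
print(best)
```

Only vectors produced by the same model are meaningfully comparable, so a query embedded with MiniLM should be compared against `minilm_embed` and not against `clip_embed`.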
