# Union and intersection of rankers

Let's build pipelines using the union `|` and intersection `&` operators.

In [1]:
from cherche import data, rank, retrieve
from sentence_transformers import SentenceTransformer

The first step is to define the corpus on which we will perform the neural search. The towns dataset contains about a hundred documents, each with four attributes: an `id`, the `title` of the article, its `url`, and the content of the `article`.

In [2]:
documents = data.load_towns()
documents[:4]

[{'id': 0,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris (French pronunciation: \u200b[paʁi] (listen)) is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles).'},
 {'id': 1,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': "Since the 17th century, Paris has been one of Europe's major centres of finance, diplomacy, commerce, fashion, gastronomy, science, and arts."},
 {'id': 2,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.'},
 {'id': 3,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The Paris Region had 

## Union

Let's create the union of two pipelines: the first favours precision at the cost of recall, and the second offers better recall.
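To make the idea concrete, here is a minimal sketch of what a union of two result lists could look like in plain Python. It is illustrative only: the `union` helper and the toy result lists are hypothetical, not part of Cherche, and the sketch ignores the similarity scores that the library attaches to each result.

```python
def union(first, second, key="id"):
    """Merge two ranked result lists, keeping each document once.

    Hypothetical helper: results of `first` come before results of `second`.
    """
    seen = set()
    merged = []
    for doc in first + second:
        if doc[key] not in seen:
            seen.add(doc[key])
            merged.append(doc)
    return merged


# Toy results: a short, precise list and a longer, broader one.
precise_results = [{"id": 20}, {"id": 24}]
broad_results = [{"id": 24}, {"id": 16}, {"id": 21}]

merged_ids = [doc["id"] for doc in union(precise_results, broad_results)]
print(merged_ids)  # [20, 24, 16, 21]
```

The precise results keep their position at the top, and the broader list only contributes documents that were not already found.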

In [3]:
# Low recall, high precision
precision = retrieve.Flash(key="id", on=["title", "article"], k=30) + rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
)

# High recall
recall = retrieve.TfIdf(
    key="id", on=["title", "article"], documents=documents, k=30
) + rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
)

In [4]:
# Union: precision | recall
search = precision | recall
search.add(documents)

Encoder ranker: 100%|████████| 2/2 [00:02<00:00,  1.35s/it]
Encoder ranker: 100%|████████| 2/2 [00:02<00:00,  1.32s/it]


Union Pipeline
-----
Flash retriever
	key      : id
	on       : title, article
	documents: 110
Encoder ranker
	key       : id
	on        : title, article
	normalize : True
	embeddings: 105
TfIdf retriever
	key      : id
	on       : title, article
	documents: 105
Encoder ranker
	key       : id
	on        : title, article
	normalize : True
	embeddings: 105
-----

In [5]:
search("Paris football", k=30)

Flash retriever: 100%|█████| 1/1 [00:00<00:00, 8473.34it/s]
TfIdf retriever: 100%|██████| 1/1 [00:00<00:00, 240.38it/s]


[{'id': 20, 'similarity': 2.074074074074074},
 {'id': 24, 'similarity': 0.5},
 {'id': 16, 'similarity': 0.738095238095238},
 {'id': 21, 'similarity': 0.5689655172413793},
 {'id': 22, 'similarity': 0.4645161290322581},
 {'id': 1, 'similarity': 0.3958333333333333},
 {'id': 0, 'similarity': 0.3463203463203463},
 {'id': 2, 'similarity': 0.3088235294117647},
 {'id': 25, 'similarity': 0.27936507936507937},
 {'id': 6, 'similarity': 0.25555555555555554},
 {'id': 3, 'similarity': 0.23587223587223588},
 {'id': 23, 'similarity': 0.21929824561403508},
 {'id': 14, 'similarity': 0.20512820512820512},
 {'id': 7, 'similarity': 0.19163763066202089},
 {'id': 8, 'similarity': 0.18095238095238095},
 {'id': 17, 'similarity': 0.17151162790697674},
 {'id': 9, 'similarity': 0.16310160427807485},
 {'id': 13, 'similarity': 0.15555555555555556},
 {'id': 12, 'similarity': 0.14874141876430205},
 {'id': 15, 'similarity': 0.1425531914893617},
 {'id': 5, 'similarity': 0.13605442176870747},
 {'id': 10, 'similarity': 0

In [6]:
search("speciality Lyon", k=10)

Flash retriever: 100%|█████| 1/1 [00:00<00:00, 5377.31it/s]
TfIdf retriever: 100%|██████| 1/1 [00:00<00:00, 609.37it/s]


[{'id': 49, 'similarity': 2.1818181818181817},
 {'id': 45, 'similarity': 1.1538461538461537},
 {'id': 48, 'similarity': 0.3333333333333333},
 {'id': 41, 'similarity': 0.6428571428571428},
 {'id': 47, 'similarity': 0.5333333333333333},
 {'id': 50, 'similarity': 0.16666666666666666},
 {'id': 42, 'similarity': 0.14285714285714285},
 {'id': 46, 'similarity': 0.125},
 {'id': 44, 'similarity': 0.33986928104575165},
 {'id': 43, 'similarity': 0.3111111111111111},
 {'id': 56, 'similarity': 0.08333333333333333},
 {'id': 55, 'similarity': 0.0625},
 {'id': 10, 'similarity': 0.05263157894736842},
 {'id': 58, 'similarity': 0.05}]

We can automatically map document identifiers to their content.

In [7]:
search += documents
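Conceptually, mapping identifiers to content amounts to a dictionary lookup. The sketch below shows the idea in plain Python; the `attach_content` helper and the sample data are hypothetical and do not reflect how Cherche implements `+=`.

```python
def attach_content(results, documents, key="id"):
    """Return each result enriched with the fields of its source document.

    Hypothetical helper: results whose identifier is unknown are skipped.
    """
    index = {doc[key]: doc for doc in documents}
    return [
        {**index[result[key]], **result}
        for result in results
        if result[key] in index
    ]


# Sample data invented for the illustration.
sample_documents = [
    {"id": 20, "title": "Paris"},
    {"id": 45, "title": "Lyon"},
]
sample_results = [{"id": 45, "similarity": 0.6}]

enriched = attach_content(sample_results, sample_documents)
print(enriched)  # [{'id': 45, 'title': 'Lyon', 'similarity': 0.6}]
```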

In [8]:
search("Paris football", k=30)[:5]

Flash retriever: 100%|████| 1/1 [00:00<00:00, 10866.07it/s]
TfIdf retriever: 100%|█████| 1/1 [00:00<00:00, 1649.35it/s]


[{'id': 20,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.',
  'similarity': 2.074074074074074},
 {'id': 24,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 1938 and 1998 FIFA World Cups, the 2007 Rugby World Cup, as well as the 1960, 1984 and 2016 UEFA European Championships were also held in the city.',
  'similarity': 0.5},
 {'id': 16,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris received 12.',
  'similarity': 0.738095238095238},
 {'id': 21,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.',
  'similarity': 0.5689655172413793},
 {'id': 22,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/P

In [9]:
search("speciality Lyon", k=30)[:5]

Flash retriever: 100%|██████| 1/1 [00:00<00:00, 529.05it/s]
TfIdf retriever: 100%|██████| 1/1 [00:00<00:00, 958.04it/s]


[{'id': 52,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Economically, Lyon is a major centre for banking, as well as for the chemical, pharmaceutical and biotech industries.',
  'similarity': 2.1176470588235294},
 {'id': 49,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Lyon was historically an important area for the production and weaving of silk.',
  'similarity': 1.1111111111111112},
 {'id': 56,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': "It ranked second in France and 40th globally in Mercer's 2019 liveability rankings.",
  'similarity': 0.7719298245614035},
 {'id': 45,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Lyon is the prefecture of the Auvergne-Rhône-Alpes region and seat of the Departmental Council of Rhône (whose jurisdiction, however, no longer extends over the Metropolis of Lyon since 2015).',
  'similarity': 0.6},
 {'id': 48,
  't

## Intersection

In [10]:
retriever = retrieve.Lunr(key="id", on=["title", "article"], documents=documents)

We will combine two rankers, each built on a different pre-trained model, with the intersection operator `&`. The pipeline will only return documents that were found by the retriever and that appear in the results of both rankers.
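As a rough illustration of the intersection idea, the sketch below keeps only the documents that appear in both of two ranked lists. The `intersection` helper and the toy lists are hypothetical and written in plain Python; they are not Cherche's implementation and they leave similarity scores aside.

```python
def intersection(first, second, key="id"):
    """Keep documents present in both lists, in the order of `first`.

    Hypothetical helper used only to illustrate the `&` operator.
    """
    shared = {doc[key] for doc in second}
    return [doc for doc in first if doc[key] in shared]


# Toy outputs of two rankers applied to the same retrieved documents.
ranker_a = [{"id": 52}, {"id": 49}, {"id": 56}]
ranker_b = [{"id": 49}, {"id": 52}, {"id": 45}]

common_ids = [doc["id"] for doc in intersection(ranker_a, ranker_b)]
print(common_ids)  # [52, 49]
```

Document 56 and document 45 are each missing from one of the lists, so neither survives the intersection.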

In [11]:
ranker = rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
) & rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer(
        "sentence-transformers/multi-qa-mpnet-base-cos-v1"
    ).encode,
)

In [12]:
search = retriever + ranker
search.add(documents)

Encoder ranker: 100%|████████| 2/2 [00:02<00:00,  1.43s/it]
Encoder ranker: 100%|████████| 2/2 [00:02<00:00,  1.40s/it]


Lunr retriever
	key      : id
	on       : title, article
	documents: 105
Intersection
-----
Encoder ranker
	key       : id
	on        : title, article
	normalize : True
	embeddings: 105
Encoder ranker
	key       : id
	on        : title, article
	normalize : True
	embeddings: 105
-----

In [13]:
search("Paris football")

[{'id': 20, 'similarity': 2.0588235294117645},
 {'id': 24, 'similarity': 1.0571428571428572},
 {'id': 16, 'similarity': 0.7207207207207207},
 {'id': 21, 'similarity': 0.5555555555555556},
 {'id': 22, 'similarity': 0.45263157894736844},
 {'id': 1, 'similarity': 0.3833333333333333},
 {'id': 0, 'similarity': 0.33699633699633696},
 {'id': 2, 'similarity': 0.2965116279069767},
 {'id': 25, 'similarity': 0.261437908496732},
 {'id': 6, 'similarity': 0.24878048780487805},
 {'id': 3, 'similarity': 0.22529644268774704},
 {'id': 23, 'similarity': 0.2074829931972789},
 {'id': 14, 'similarity': 0.1982905982905983},
 {'id': 7, 'similarity': 0.18541033434650456},
 {'id': 8, 'similarity': 0.18095238095238095},
 {'id': 42, 'similarity': 0.16346153846153846},
 {'id': 32, 'similarity': 0.15931372549019607},
 {'id': 17, 'similarity': 0.14747474747474748},
 {'id': 9, 'similarity': 0.1429990069513406},
 {'id': 27, 'similarity': 0.13703703703703704},
 {'id': 13, 'similarity': 0.13523809523809524},
 {'id': 12,

In [14]:
search("speciality Lyon")

[{'id': 52, 'similarity': 2.1},
 {'id': 49, 'similarity': 1.0909090909090908},
 {'id': 56, 'similarity': 0.7619047619047619},
 {'id': 45, 'similarity': 0.58},
 {'id': 48, 'similarity': 0.48695652173913045},
 {'id': 41, 'similarity': 0.41025641025641024},
 {'id': 54, 'similarity': 0.3482142857142857},
 {'id': 47, 'similarity': 0.32407407407407407},
 {'id': 50, 'similarity': 0.28888888888888886},
 {'id': 53, 'similarity': 0.2689655172413793},
 {'id': 42, 'similarity': 0.26515151515151514},
 {'id': 51, 'similarity': 0.2238095238095238},
 {'id': 46, 'similarity': 0.21266968325791857},
 {'id': 55, 'similarity': 0.20346320346320346},
 {'id': 44, 'similarity': 0.1978494623655914},
 {'id': 43, 'similarity': 0.19642857142857142},
 {'id': 32, 'similarity': 0.17027863777089783},
 {'id': 28, 'similarity': 0.16666666666666666},
 {'id': 59, 'similarity': 0.1593172119487909}]

We can automatically map document identifiers to their content.

In [15]:
search += documents

In [16]:
search("Paris football")

[{'id': 20,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.',
  'similarity': 2.0588235294117645},
 {'id': 24,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 1938 and 1998 FIFA World Cups, the 2007 Rugby World Cup, as well as the 1960, 1984 and 2016 UEFA European Championships were also held in the city.',
  'similarity': 1.0571428571428572},
 {'id': 16,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris received 12.',
  'similarity': 0.7207207207207207},
 {'id': 21,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.',
  'similarity': 0.5555555555555556},
 {'id': 22,
  'title': 'Paris',
  'url': 'https://en.wik

In [17]:
search("speciality Lyon")

[{'id': 52,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Economically, Lyon is a major centre for banking, as well as for the chemical, pharmaceutical and biotech industries.',
  'similarity': 2.1},
 {'id': 49,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Lyon was historically an important area for the production and weaving of silk.',
  'similarity': 1.0909090909090908},
 {'id': 56,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': "It ranked second in France and 40th globally in Mercer's 2019 liveability rankings.",
  'similarity': 0.7619047619047619},
 {'id': 45,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Lyon is the prefecture of the Auvergne-Rhône-Alpes region and seat of the Departmental Council of Rhône (whose jurisdiction, however, no longer extends over the Metropolis of Lyon since 2015).',
  'similarity': 0.58},
 {'id': 48,
  'title': 'Lyon',