In [1]:
import numpy as np
import pandas as pd

from sentence_splitter import split_text_into_sentences
from sentence_transformers import CrossEncoder

In [8]:
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", device='cuda')

In [16]:
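def rank_crossencoder(query, document, doc_length=1000):
    """Return the sentences of `document` most relevant to `query`, in document order.

    Sentences are taken in descending score order until their combined
    character count exceeds `doc_length`.
    """
    sentences = split_text_into_sentences(
        text=document,
        language='en',
    )

    # Drop empty sentences; return early if nothing is left to score
    sentences = [s for s in sentences if len(s) > 0]
    if not sentences:
        return ""
    sentences = np.array(sentences)
    sentences_length = [len(s) for s in sentences]

    # Score every (query, sentence) pair with the cross-encoder
    scores = model.predict([(query, s) for s in sentences])

    # Sentence indices ordered from highest to lowest score
    index = np.argsort(scores).tolist()[::-1]

    # Accumulate top-ranked sentences until the character budget is exceeded;
    # the sentence that crosses the budget is still kept
    total_length = 0
    for i in range(len(index)):
        total_length += sentences_length[index[i]]
        if total_length > doc_length:
            break

    # Restore the original document order for the selected sentences
    index = index[:i+1]
    index = np.sort(index)

    sentences = sentences[index]

    return "\n".join(sentences)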
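
The selection step inside `rank_crossencoder` can be tried without loading the model. The following is a minimal sketch, assuming the scores are already available as a plain list; `select_within_budget` is a hypothetical helper name, not part of the notebook, and the toy sentences and scores are made up for illustration. It uses a cumulative sum instead of the explicit loop but is meant to keep the same sentences, including the one that crosses the budget.

```python
import numpy as np


def select_within_budget(sentences, scores, doc_length=1000):
    """Keep top-scoring sentences until the character budget is exceeded,
    then return them in their original order (hypothetical helper)."""
    if len(sentences) == 0:
        return []
    # Indices from highest to lowest score
    order = np.argsort(scores)[::-1]
    lengths = np.array([len(s) for s in sentences])
    # Running character count in ranking order
    cumulative = np.cumsum(lengths[order])
    # Position of the first running total that exceeds the budget;
    # +1 keeps that sentence too, as the loop in rank_crossencoder does
    cutoff = int(np.searchsorted(cumulative, doc_length, side="right")) + 1
    keep = np.sort(order[:cutoff])
    return [sentences[i] for i in keep]


# Toy input, not taken from the dataset
toy_sentences = ["aaaa", "bb", "cccccc", "d"]
toy_scores = [0.1, 0.9, 0.5, 0.7]
selected = select_within_budget(toy_sentences, toy_scores, doc_length=4)
```

With a budget of 4 characters the two best sentences ("bb", "d") total 3 characters, so the third-ranked sentence is also kept and the result comes back in document order.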

In [2]:
df = pd.read_parquet('../data/articles.parquet')

In [3]:
df['article_len'] = df.article.str.len()

In [4]:
df.sort_values('article_len', ascending=False, inplace=True)

In [18]:
df.article_len.describe()

count    10898.000000
mean       819.998899
std        582.007210
min         36.000000
25%        388.000000
50%        640.000000
75%       1096.750000
max       4042.000000
Name: article_len, dtype: float64
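
With the default `doc_length` of 1000 and a 75th percentile of about 1097 characters, somewhat more than a quarter of the articles are long enough to be shortened by `rank_crossencoder`. A minimal sketch of how that share could be computed is shown here, using toy lengths rather than the real `df`; `share_over_budget` is a hypothetical helper name. Note that `article_len` also counts newlines, so this is only an approximation of the sentence-length total the function uses.

```python
import pandas as pd


def share_over_budget(lengths, doc_length=1000):
    """Fraction of entries whose length exceeds doc_length (hypothetical helper)."""
    return float((pd.Series(lengths) > doc_length).mean())


# Toy lengths for illustration only, not the real dataset
toy = pd.DataFrame({"article_len": [36, 388, 640, 1097, 4042]})
share = share_over_budget(toy.article_len)
```

On the real data the same call would be `share_over_budget(df.article_len)`.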

In [6]:
print(df.article_len.iloc[0])
print(df.iloc[0].article)

4042
# Destiny 2

Destiny 2 is a free-to-play online first-person shooter video game developed by Bungie. It was originally released as a pay to play game in 2017 for PlayStation 4, Xbox One, and Windows. It became free-to-play, utilizing the games as a service model, under the New Light title on October 1, 2019, followed by the game's release on Stadia the following month, and then PlayStation 5 and Xbox Series X/S platforms in December 2020. The game was published by Activision until December 31, 2018, when Bungie acquired the publishing rights to the franchise. It is the sequel to 2014's Destiny and its subsequent expansions.
Set in a "mythic science fiction" world, the game features a multiplayer "shared-world" environment with elements of role-playing games. Like the original, activities in Destiny 2 are divided among player versus environment (PvE) and player versus player (PvP) game types. In addition to normal story missions, PvE features three-player "strikes" and dungeons and

In [12]:
rank_crossencoder(
    "What is the title under which Destiny 2 became a free-to-play game?",
    df.iloc[0].article,
)

Rank 1: 3 - 5.554337501525879 - It became free-to-play, utilizing the games as a service model, under the New Light title on October 1, 2019, followed by the game's release on Stadia the following month, and then PlayStation 5 and Xbox Series X/S platforms in December 2020.
Rank 2: 1 - 5.551577091217041 - Destiny 2 is a free-to-play online first-person shooter video game developed by Bungie.
Rank 3: 18 - 5.45986795425415 - Released alongside this fourth expansion was a version of Destiny 2 called New Light, a free-to-play re-release of Destiny 2, which also included access to the first two expansions.
Rank 4: 20 - 4.665113925933838 - While the main Destiny 2 game has since been "free-to-play", all other content requires purchasing.
Rank 5: 29 - -0.07428227365016937 - Destiny 2 was nominated for and won various awards, such as at The Game Awards 2017 and Game Critics Awards.
Rank 6: 0 - -0.3811410069465637 - # Destiny 2
Rank 7: 14 - -0.42160502076148987 - Year One of Destiny 2 featured 

[3, 1, 18, 20, 29, 0, 14, 7, 26, 5, 21, 13, 24, 10, 2, 4, 6, 15, 9, 22]
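
The listing above (rank, sentence index, score, sentence) is not printed by `rank_crossencoder` as defined in `In [16]`. A minimal sketch of a helper that could produce output in the same format is given here, assuming the sentences and their scores are already available; `format_ranking` is a hypothetical name and the sample values are made up.

```python
import numpy as np


def format_ranking(sentences, scores):
    """Return one 'Rank r: index - score - sentence' line per sentence,
    plus the sentence indices in ranking order (hypothetical helper)."""
    # Indices from highest to lowest score
    order = np.argsort(scores)[::-1]
    lines = [
        f"Rank {rank}: {idx} - {scores[idx]} - {sentences[idx]}"
        for rank, idx in enumerate(order, start=1)
    ]
    return lines, order.tolist()


# Made-up sentences and scores, only to show the format
lines, ranked_index = format_ranking(["a", "b", "c"], [0.5, 2.0, -1.0])
print("\n".join(lines))
print(ranked_index)
```

Inspecting the scores this way makes it easier to see where relevance drops off, for example between ranks 4 and 5 in the listing above.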

In [17]:
print(rank_crossencoder(
    "What is the title under which Destiny 2 became a free-to-play game?",
    df.iloc[0].article,
))

# Destiny 2
Destiny 2 is a free-to-play online first-person shooter video game developed by Bungie.
It became free-to-play, utilizing the games as a service model, under the New Light title on October 1, 2019, followed by the game's release on Stadia the following month, and then PlayStation 5 and Xbox Series X/S platforms in December 2020.
Like the original, activities in Destiny 2 are divided among player versus environment (PvE) and player versus player (PvP) game types.
Year One of Destiny 2 featured two small expansions, Curse of Osiris (December 2017) and Warmind (May 2018).
Released alongside this fourth expansion was a version of Destiny 2 called New Light, a free-to-play re-release of Destiny 2, which also included access to the first two expansions.
While the main Destiny 2 game has since been "free-to-play", all other content requires purchasing.
Upon release, Destiny 2 received generally favorable reviews from critics.
Destiny 2 was nominated for and won various awards, suc