## <span style="color:#ff5f27;">📊🗞️ Ranking of news search results</span>

In the [previous tutorial](https://github.com/logicalclocks/hopsworks-tutorials/tree/branch-4.2/api_examples/vector_similarity_search/1_feature_group_embeddings_api.ipynb), you learned how to search news articles using natural language queries. In this tutorial, we will focus on ranking the search results to make them more useful and relevant.

To achieve this, we will use the number of views as a scoring metric for news articles, as it reflects their popularity. The steps are as follows:

1. Create a view count feature group using a sample dataset of view counts.
2. Create a feature view by joining the news feature group with the view count feature group.
3. Search news articles and rank them based on their view counts.

By the end of this tutorial, you'll be able to rank news search results effectively using view counts as a popularity indicator.

## <span style='color:#ff5f27'> 📝 Imports</span>

In [None]:
!pip install -U 'hopsworks[python]' --quiet
!pip install sentence_transformers -q

In [None]:
import random
import pandas as pd
from sentence_transformers import SentenceTransformer
import logging
logging.getLogger().setLevel(logging.WARN)

## <span style="color:#ff5f27;">📈 Create a view count feature group</span>

First, create a sample view count dataset that is the same size as the news feature group.

In [None]:
num_news = 300
df_view = pd.DataFrame(
    {
        "news_id": list(range(num_news)),
        # Random view count between 0 and 100 (inclusive) for each article
        "view_cnt": [random.randint(0, 100) for _ in range(num_news)],
    }
)
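
Because the counts are drawn at random, every run of the cell above produces a different ranking. If reproducible results are preferred, one option is to draw from a seeded generator. The following is a minimal sketch using only the standard library; the `make_view_counts` helper and the seed value `42` are illustrative assumptions, not part of the original tutorial.

```python
import random


def make_view_counts(num_news, seed=42):
    """Return one row per news id with a pseudo-random view count in [0, 100]."""
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return [
        {"news_id": news_id, "view_cnt": rng.randint(0, 100)}
        for news_id in range(num_news)
    ]


rows = make_view_counts(300)
print(rows[:3])
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)` to obtain a DataFrame with the same two columns as `df_view`.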

In [None]:
VERSION = 1

Then you create a view count feature group and ingest the data into Hopsworks.

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

In [None]:
view_fg = fs.get_or_create_feature_group(
    name="view_fg",
    primary_key=["news_id"],
    version=VERSION,
    online_enabled=True,
)

view_fg.insert(df_view)

## <span style="color:#ff5f27;">🛠️ Create a feature view</span>

To create the feature view, first retrieve the news feature group created in the previous tutorial.

In [None]:
news_fg = fs.get_or_create_feature_group(
    name="news_fg",
    version=VERSION,
)

Now create a feature view by joining the news feature group with the view count feature group. The query selects the heading from the news feature group and the view count, which is used for ranking.

In [None]:
news_fv = fs.get_or_create_feature_view(
    "news_view", 
    version=VERSION,
    query=news_fg.select(["heading"]).join(view_fg.select(["view_cnt"])),
)
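
Conceptually, the query pairs each news heading with the view count stored under the same key. The sketch below imitates that pairing with plain Python dictionaries so the shape of the joined rows is easy to see. It assumes both feature groups are keyed by `news_id`, the sample headings and counts are made up, and it illustrates the idea rather than how Hopsworks executes the join.

```python
# Made-up sample rows standing in for the two feature groups.
news_rows = [
    {"news_id": 0, "heading": "Markets rally in Europe"},
    {"news_id": 1, "heading": "Local team wins final"},
    {"news_id": 2, "heading": "New rail link opens"},
]
view_rows = [
    {"news_id": 0, "view_cnt": 17},
    {"news_id": 1, "view_cnt": 85},
    {"news_id": 2, "view_cnt": 42},
]

# Index the view counts by key, then look each news row up by its news_id.
views_by_id = {row["news_id"]: row["view_cnt"] for row in view_rows}
joined = [
    [row["heading"], views_by_id[row["news_id"]]]
    for row in news_rows
    if row["news_id"] in views_by_id  # this sketch skips ids without a view count
]

for row in joined:
    print(row)
```

Each joined row has the form `[heading, view_cnt]`, which is why the ranking helper later sorts on index 1.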

## <span style="color:#ff5f27;">🔎 Search news and rank</span>

As in the previous tutorial, the news description must first be encoded with the same language model you used to encode the news. The resulting embedding can then be used to search for similar news through the feature view.

In [None]:
model = SentenceTransformer('all-MiniLM-L6-v2')

news_description = "news about europe"

Define some helper functions that sort and print news results.

In [None]:
def print_news(feature_vectors):
    for feature_vector in feature_vectors:
        print(feature_vector)

In [None]:
def print_sort_news(feature_vectors):
    # Sort the articles by view count (index 1 of each vector), highest first
    print("⛳️ Ranked result:")
    feature_vectors = sorted(feature_vectors, key=lambda x: x[1], reverse=True)
    print_news(feature_vectors)
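
To see what the ranking step does without querying the feature store, the sort can be tried on hand-written rows. This sketch assumes each result is a list of the form `[heading, view_cnt]`, matching the two features selected in the feature view; the sample rows are invented.

```python
# Invented results in the order a similarity search might return them.
sample_vectors = [
    ["Markets rally in Europe", 17],
    ["Local team wins final", 85],
    ["New rail link opens", 42],
]

# Sort by the view count (index 1), highest first.
# sorted() returns a new list, so the original search order is preserved.
ranked = sorted(sample_vectors, key=lambda x: x[1], reverse=True)

for heading, view_cnt in ranked:
    print(f"{view_cnt:>3}  {heading}")
```

Keeping the original list intact makes it possible to print the similarity order and the popularity order side by side, as the cells below do.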

The cell below prints the top k results returned by the feature view, each containing the heading and the view count, followed by the same results ranked by view count.

In [None]:
feature_vectors = news_fv.find_neighbors(
    model.encode(news_description),
    k=5, 
    feature=news_fg.embedding_heading,
)
print_news(feature_vectors)
print_sort_news(feature_vectors)

As with the feature group, you can pass a `filter` to `find_neighbors` on the feature view. You can also combine multiple filtering conditions with `&`.

In [None]:
feature_vectors = news_fv.find_neighbors(
    model.encode(news_description),
    k=5,
    filter=(
        (news_fg.newstype == "sports") & (news_fg.article.like("europe"))
    ),
    feature=news_fg.embedding_heading,
)
print_news(feature_vectors)
print_sort_news(feature_vectors)

You can also retrieve a single result by providing its primary key, which here is the news id.

In [None]:
# A primary-key lookup returns a single feature vector, not a list of them
feature_vector = news_fv.get_feature_vector({"news_id": 10})
print_news([feature_vector])
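
A bare list of values can be hard to read at a glance. As a small sketch, the values can be paired with feature names for display. The feature order used here (`heading`, then `view_cnt`) is an assumption based on the selection made when the feature view was built, and the sample vector is invented.

```python
# Assumed feature order, following the selection used for the feature view.
feature_names = ["heading", "view_cnt"]

# Invented stand-in for the list returned by the lookup.
sample_vector = ["New rail link opens", 42]

# Pair each name with the value at the same position.
labelled = dict(zip(feature_names, sample_vector))
print(labelled)
```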

---

## <span style="color:#ff5f27;">➡️ Next step</span>

Now you are able to search articles and rank them by view count. 