## K-arm bandit based solution for a news value maximiser

Solution:

1. Arm: Each article or group of articles is treated as an arm.
2. Reward: Reward for each article is based on views generated.
3. Exploration vs. Exploitation: Use an epsilon-greedy strategy to balance between exploring new articles and exploiting those with known high views.
4. Dataset: MIND (Microsoft News Dataset)
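The epsilon-greedy rule in step 3 can be sketched in isolation, as below. This is a minimal illustration only: the function name `epsilon_greedy_choice`, the three running averages, and the seed are assumptions made for the example and are not part of the notebook.

```python
import random

def epsilon_greedy_choice(avg_rewards, epsilon, rng):
    """Return an arm index: a random arm with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        # Explore: pick any arm uniformly at random
        return rng.randrange(len(avg_rewards))
    # Exploit: pick the arm with the highest running average reward
    return max(range(len(avg_rewards)), key=lambda i: avg_rewards[i])

rng = random.Random(0)          # seeded only to make the sketch repeatable
avg_rewards = [0.2, 0.9, 0.5]   # hypothetical running averages for three arms
picks = [epsilon_greedy_choice(avg_rewards, 0.1, rng) for _ in range(1000)]
share_best = picks.count(1) / len(picks)
print(f"share of rounds spent on the best arm: {share_best:.2f}")
```

With epsilon set to 0.1, most rounds should go to the arm with the highest average, while the remaining rounds are spread across all arms.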

In [1]:
import numpy as np
import pandas as pd
import random

In [5]:
# Load news.tsv dataset
df = pd.read_csv(r"C:\Users\PRIYANKA\Desktop\SEM VII\reinforcement learning\MINDsmall_dev\news.tsv",  sep='\t', header=None)

In [8]:
df.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7
0,N55528,lifestyle,lifestyleroyals,"The Brands Queen Elizabeth, Prince Charles, an...","Shop the notebooks, jackets, and more that the...",https://assets.msn.com/labs/mind/AAGH0ET.html,"[{""Label"": ""Prince Philip, Duke of Edinburgh"",...",[]
1,N18955,health,medical,Dispose of unwanted prescription drugs during ...,,https://assets.msn.com/labs/mind/AAISxPN.html,"[{""Label"": ""Drug Enforcement Administration"", ...",[]
2,N61837,news,newsworld,The Cost of Trump's Aid Freeze in the Trenches...,Lt. Ivan Molchanets peeked over a parapet of s...,https://assets.msn.com/labs/mind/AAJgNsz.html,[],"[{""Label"": ""Ukraine"", ""Type"": ""G"", ""WikidataId..."
3,N53526,health,voices,I Was An NBA Wife. Here's How It Affected My M...,"I felt like I was a fraud, and being an NBA wi...",https://assets.msn.com/labs/mind/AACk2N6.html,[],"[{""Label"": ""National Basketball Association"", ..."
4,N38324,health,medical,"How to Get Rid of Skin Tags, According to a De...","They seem harmless, but there's a very good re...",https://assets.msn.com/labs/mind/AAAKEkt.html,"[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI...","[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI..."


In [13]:
#Assign column names
df.columns = ['news_id', 'category', 'subcategory', 'title', 'abstract', 'url', 'label1','label2']

In [32]:
df.category.unique()

array(['lifestyle', 'health', 'news', 'sports', 'weather',
       'entertainment', 'foodanddrink', 'autos', 'travel', 'video', 'tv',
       'finance', 'movies', 'music', 'kids', 'middleeast', 'games'],
      dtype=object)

In [11]:
# Load behaviors.tsv dataset
behav_df = pd.read_csv(r"C:\Users\PRIYANKA\Desktop\SEM VII\reinforcement learning\MINDsmall_train\behaviors.tsv", sep='\t', header=None)
behav_df.columns = ['impression_id', 'user_id', 'time', 'history', 'impressions']

In [12]:
behav_df.head(5)

Unnamed: 0,impression_id,user_id,time,history,impressions
0,1,U13740,11/11/2019 9:05:58 AM,N55189 N42782 N34694 N45794 N18445 N63302 N104...,N55689-1 N35729-0
1,2,U91836,11/12/2019 6:11:30 PM,N31739 N6072 N63045 N23979 N35656 N43353 N8129...,N20678-0 N39317-0 N58114-0 N20495-0 N42977-0 N...
2,3,U73700,11/14/2019 7:01:48 AM,N10732 N25792 N7563 N21087 N41087 N5445 N60384...,N50014-0 N23877-0 N35389-0 N49712-0 N16844-0 N...
3,4,U34670,11/11/2019 5:28:05 AM,N45729 N2203 N871 N53880 N41375 N43142 N33013 ...,N35729-0 N33632-0 N49685-1 N27581-0
4,5,U8125,11/12/2019 4:11:21 PM,N10078 N56514 N14904 N33740,N39985-0 N36050-0 N16096-0 N8400-1 N22407-0 N6...


In [39]:
# Define aligned articles as those in certain categories (Example: "Sports" and "Games")
aligned_categ = ["sports", "games"]
df['aligned'] = df['category'].str.lower().isin(aligned_categ)

In [40]:
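# Extract the IDs of clicked articles from one impressions string.
# Each token looks like "N55689-1": the news ID, then 1 (clicked) or 0 (not clicked).
def analyze_behav(behav):
    views = []
    for beh in behav.split():
        news_id, _, clicked = beh.rpartition('-')
        if clicked == '1':
            views.append(news_id)
    return views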
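As a quick check of the parsing logic, the sketch below applies the same idea to two impression strings taken from the `behav_df.head(5)` output above. The helper name `clicked_ids` and the use of `collections.Counter` are assumptions for illustration; the notebook itself counts views with pandas.

```python
from collections import Counter

def clicked_ids(impressions):
    """Return the news IDs whose click label is '1' in one impressions string."""
    clicked = []
    for token in impressions.split():
        # Split on the last hyphen: "N55689-1" -> ("N55689", "-", "1")
        news_id, _, label = token.rpartition('-')
        if label == '1':
            clicked.append(news_id)
    return clicked

rows = [
    "N55689-1 N35729-0",
    "N35729-0 N33632-0 N49685-1 N27581-0",
]
# Tally clicks per article across all impression rows
counts = Counter(nid for row in rows for nid in clicked_ids(row))
print(counts)
```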

In [41]:
# Count views per article based on impressions
behav_df['viewed_articles'] = behav_df['impressions'].apply(analyze_behav)
all_views = behav_df.explode('viewed_articles')['viewed_articles'].value_counts().to_dict()

In [42]:
# Merge view counts into the article DataFrame
df['views'] = df['news_id'].map(all_views).fillna(0).astype(int)
print(df.head())

  news_id   category      subcategory  \
0  N55528  lifestyle  lifestyleroyals   
1  N18955     health          medical   
2  N61837       news        newsworld   
3  N53526     health           voices   
4  N38324     health          medical   

                                               title  \
0  The Brands Queen Elizabeth, Prince Charles, an...   
1  Dispose of unwanted prescription drugs during ...   
2  The Cost of Trump's Aid Freeze in the Trenches...   
3  I Was An NBA Wife. Here's How It Affected My M...   
4  How to Get Rid of Skin Tags, According to a De...   

                                            abstract  \
0  Shop the notebooks, jackets, and more that the...   
1                                                NaN   
2  Lt. Ivan Molchanets peeked over a parapet of s...   
3  I felt like I was a fraud, and being an NBA wi...   
4  They seem harmless, but there's a very good re...   

                                             url  \
0  https://assets.msn.com/l

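#### The K-arm bandit class below uses an epsilon-greedy strategy: it mostly picks the article with the highest average simulated views and occasionally explores a random article.
#### Aligned articles receive a view boost: their simulated views are drawn from a Poisson distribution with mean 40 instead of 20.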

In [43]:
class KArmBandit:
    def __init__(self, df, epsilon=0.1):
        self.df = df
        self.epsilon = epsilon
        # Simulated views accumulated per article, and how often each was picked
        self.total_views = np.zeros(len(df))
        self.trials = np.zeros(len(df))

    def choose_article(self):
        # Explore with probability epsilon, otherwise exploit
        if random.random() < self.epsilon:
            return random.randrange(len(self.df))
        # Average views per trial. Passing `out` makes untried articles default
        # to 0; without it, np.divide leaves those entries uninitialised.
        avg_views = np.divide(self.total_views, self.trials,
                              out=np.zeros(len(self.df)), where=self.trials != 0)
        return int(np.argmax(avg_views))

    def update_views(self, article_idx):
        is_aligned = self.df.iloc[article_idx]['aligned']
        # Simulate views: aligned articles draw from a Poisson with a higher mean
        new_views = np.random.poisson(40 if is_aligned else 20)
        self.total_views[article_idx] += new_views
        self.trials[article_idx] += 1

    def run_bandit(self, rounds=1000):
        for _ in range(rounds):
            article_idx = self.choose_article()
            self.update_views(article_idx)

    def results(self):
        # Attach the simulated totals to the article DataFrame and rank by them
        self.df['total_views'] = self.total_views
        self.df['trials'] = self.trials
        return self.df.sort_values(by='total_views', ascending=False)
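One detail of the average-views computation is worth isolating. When `np.divide` is called with a `where` mask, NumPy leaves the masked-out positions untouched, so passing an initialised `out` array decides what value untried arms get. The sketch below uses three made-up arms (the arrays are assumptions for illustration) to show the safe form.

```python
import numpy as np

# Hypothetical state: arm 1 has never been tried
view_counts = np.array([80.0, 0.0, 36.0])
trials = np.array([2.0, 0.0, 1.0])

# `out` supplies the value used wherever the mask is False (here: 0.0)
avg = np.divide(
    view_counts,
    trials,
    out=np.zeros_like(view_counts),
    where=trials != 0,
)
best = int(np.argmax(avg))
print(avg, best)
```

Defaulting untried arms to 0 means they are only ever reached through exploration; a different default would change how quickly new articles get tried.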

In [47]:
bandit = KArmBandit(df, epsilon=0.1)
bandit.run_bandit(rounds=500)
results = bandit.results()
print(results[['news_id', 'category', 'aligned', 'views', 'total_views', 'trials']].head(20))

      news_id category  aligned  views  total_views  trials
8690   N17852   sports     True      0      17608.0   438.0
635    N17423   sports     True      0        195.0     5.0
11201  N27377   sports     True      0        158.0     4.0
38339  N53335   sports     True      1        118.0     3.0
29506  N27914   sports     True      0        113.0     3.0
17260   N2570   sports     True      0         80.0     2.0
42118   N2052   sports     True      0         36.0     1.0
41972  N63721   sports     True      0         36.0     1.0
32429  N29783   sports     True      0         34.0     1.0
32585   N9611   sports     True      0         31.0     1.0
2323    N5799    video    False      0         31.0     1.0
11570  N37285   sports     True      0         30.0     1.0
19467  N30949     news    False      0         30.0     1.0
9895   N26462     news    False      0         29.0     1.0
33779  N42905  finance    False      0         28.0     1.0
29173  N19863     news    False      0  

- The table above lists the articles that accumulated the most simulated views after the algorithm ran for 500 rounds, iteratively favouring the best-performing article.
- The aligned column is a Boolean indicating whether the article belongs to an "Aligned" category (Sports or Games).
- The total_views column is the number of simulated views accumulated during the bandit process, not the actual view count from the dataset (that is the views column).
- The trials column is the number of times the bandit algorithm picked the article.
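Dividing total_views by trials gives the average simulated reward per pick, which is the quantity the greedy step compares. The sketch below does this for three rows copied from the table above; the variable names are assumptions for illustration.

```python
# (news_id, total_views, trials) copied from the results table
rows = [
    ("N17852", 17608.0, 438.0),  # aligned (sports)
    ("N17423", 195.0, 5.0),      # aligned (sports)
    ("N5799", 31.0, 1.0),        # not aligned (video)
]

# Average simulated views per pick for each article
avg_per_trial = {nid: total / trials for nid, total, trials in rows}
for nid, avg in avg_per_trial.items():
    print(f"{nid}: {avg:.1f} views per trial")
```

The most-picked article averages close to 40 views per trial, which is consistent with the Poisson mean of 40 used for aligned articles. Articles picked only once have noisy averages, since a single draw can land well above or below the mean.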