# Integer Linear Programming Solver

Optimization problem: select 10 articles to display on the news feed, subject to the following constraints:
- An above-average number of images and links and an above-average keyword score (`num_imgs`, `num_hrefs`, `kw_avg_avg`)
- At least two 'Other' articles, to balance the dominance of the more common topics
- At least three different categories displayed
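
One way to write these requirements as an integer linear program is sketched below. The notation is an assumption introduced here, not taken from `news_solver`: $x_i \in \{0,1\}$ marks whether article $i$ is selected, $y_c \in \{0,1\}$ marks whether category $c$ is represented, $s_i$ is the number of shares, $o_i$ is the 'Other' indicator, $T_c$ is the set of articles in category $c$, and $a_{if}$ is the value of feature $f$ for article $i$ with dataset mean $\bar{a}_f$. The "above average" requirement is read here as applying to the mean of the selected set, and the strict inequality is relaxed to $\ge$, as linear solvers require.

$$
\begin{aligned}
\max_{x,\,y}\quad & \sum_{i=1}^{n} s_i x_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} x_i = 10, \\
& \sum_{i=1}^{n} \left(a_{if} - \bar{a}_f\right) x_i \ge 0 \quad \text{for each feature } f, \\
& \sum_{i=1}^{n} o_i x_i \ge 2, \\
& y_c \le \sum_{i \in T_c} x_i \quad \text{for each category } c, \\
& \sum_{c} y_c \ge 3, \\
& x_i \in \{0,1\}, \quad y_c \in \{0,1\}.
\end{aligned}
$$

The auxiliary variables $y_c$ are what make the category-diversity requirement linear: $y_c$ can be 1 only when at least one article from category $c$ is chosen.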

In [1]:
import pandas as pd 
import numpy as np
import sys
import os

In [2]:
df = pd.read_csv("../data/raw/online_news_popularity/OnlineNewsPopularity/OnlineNewsPopularity.csv")

## Dataset Processing

We add a categorical `topics` column and keep only the columns relevant to the linear program.

In [12]:
df_lp = df.copy()
df_lp['topics'] = 'Other'
df_lp[' data_channel_is_other'] = 0.0
df_lp.loc[df_lp[' data_channel_is_lifestyle'] == 1.0, 'topics'] = 'Lifestyle'
df_lp.loc[df_lp[' data_channel_is_entertainment'] == 1.0, 'topics'] = 'Entertainment'
df_lp.loc[df_lp[' data_channel_is_bus'] == 1.0, 'topics'] = 'Business'
df_lp.loc[df_lp[' data_channel_is_socmed'] == 1.0, 'topics'] = 'Social Media'
df_lp.loc[df_lp[' data_channel_is_tech'] == 1.0, 'topics'] = 'Technology'
df_lp.loc[df_lp[' data_channel_is_world'] == 1.0, 'topics'] = 'World'
df_lp.loc[df_lp['topics'] == 'Other', ' data_channel_is_other'] = 1.0

print(df_lp['topics'].value_counts())

topics
World            8427
Technology       7346
Entertainment    7057
Business         6258
Other            6134
Social Media     2323
Lifestyle        2099
Name: count, dtype: int64


In [14]:
df_lp = df_lp[['topics', 
               ' num_imgs', ' num_hrefs', ' kw_avg_avg',
               ' data_channel_is_tech', ' data_channel_is_other',
               ' shares']]
df_lp.head()

| | topics | num_imgs | num_hrefs | kw_avg_avg | data_channel_is_tech | data_channel_is_other | shares |
|---|---|---|---|---|---|---|---|
| 0 | Entertainment | 1.0 | 4.0 | 0.0 | 0.0 | 0.0 | 593 |
| 1 | Business | 1.0 | 3.0 | 0.0 | 0.0 | 0.0 | 711 |
| 2 | Business | 1.0 | 3.0 | 0.0 | 0.0 | 0.0 | 1500 |
| 3 | Entertainment | 1.0 | 9.0 | 0.0 | 0.0 | 0.0 | 1200 |
| 4 | Technology | 20.0 | 19.0 | 0.0 | 1.0 | 0.0 | 505 |


## Solver Implementation

In [15]:
import os, sys
sys.path.append(os.path.abspath(".."))

from src.optimization.lp_solve import news_solver

popularity = df_lp[' shares'].to_numpy()

topics = df_lp['topics'].to_numpy()
images = df_lp[' num_imgs'].to_numpy()
hrefs = df_lp[' num_hrefs'].to_numpy()
keywords = df_lp[' kw_avg_avg'].to_numpy()
tech_indicator = df_lp[' data_channel_is_tech'].to_numpy()
other_indicator = df_lp[' data_channel_is_other'].to_numpy()

In [None]:
selected_indices, status = news_solver(
    popularity=popularity,
    topics=topics,
    images=images,
    hrefs=hrefs,
    keywords=keywords,
    tech_indicator=tech_indicator,
    other_indicator=other_indicator,
    avg_images=images.mean(),
    avg_hrefs=hrefs.mean(),
    avg_keywords=keywords.mean()
)

print(f"List of articles selected: {selected_indices}")



List of articles selected: [3145, 4506, 5370, 9365, 9448, 16009, 16113, 16268, 18788, 23237]
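The internals of `news_solver` are not shown in this notebook. As a rough illustration of how a model of this kind could be posed, the sketch below uses `scipy.optimize.milp` on a small made-up instance. The function `toy_news_ilp`, its parameters, the toy data, and the reduced constraint set (a single "above average" feature, no tech term) are all assumptions for illustration, not the actual implementation.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def toy_news_ilp(popularity, topics, images, other_indicator,
                 n_select, min_other, min_topics):
    """Hypothetical stand-in for news_solver with a reduced constraint set."""
    popularity = np.asarray(popularity, dtype=float)
    images = np.asarray(images, dtype=float)
    other_indicator = np.asarray(other_indicator, dtype=float)
    topics = np.asarray(topics)
    n = len(popularity)
    labels = np.unique(topics)
    k = len(labels)

    # Decision vector: x_0..x_{n-1} (article chosen), y_0..y_{k-1} (topic shown).
    c = np.concatenate([-popularity, np.zeros(k)])  # milp minimizes, so negate

    rows, lower, upper = [], [], []

    def add_row(row, lo, hi):
        rows.append(row)
        lower.append(lo)
        upper.append(hi)

    # Exactly n_select articles.
    add_row(np.concatenate([np.ones(n), np.zeros(k)]), n_select, n_select)
    # Mean images of the selection is at least the mean over all articles.
    add_row(np.concatenate([images - images.mean(), np.zeros(k)]), 0, np.inf)
    # At least min_other 'Other' articles.
    add_row(np.concatenate([other_indicator, np.zeros(k)]), min_other, np.inf)
    # y_t may be 1 only if at least one article of topic t is selected.
    for t, label in enumerate(labels):
        row = np.zeros(n + k)
        row[:n] = (topics == label).astype(float)
        row[n + t] = -1.0
        add_row(row, 0, np.inf)
    # At least min_topics distinct topics.
    add_row(np.concatenate([np.zeros(n), np.ones(k)]), min_topics, np.inf)

    res = milp(
        c,
        constraints=LinearConstraint(np.array(rows), lower, upper),
        integrality=np.ones(n + k),  # every variable is integer
        bounds=Bounds(0, 1),         # and restricted to {0, 1}
    )
    if not res.success:
        return [], int(res.status)
    return np.flatnonzero(np.round(res.x[:n]) == 1).tolist(), int(res.status)


# Made-up instance: 8 articles, pick 4.
toy_topics = ['World', 'World', 'World', 'Tech', 'Tech', 'Other', 'Other', 'Other']
chosen, status = toy_news_ilp(
    popularity=[900, 800, 700, 600, 500, 400, 300, 200],
    topics=toy_topics,
    images=[0, 4, 2, 2, 2, 2, 2, 2],
    other_indicator=[1.0 if t == 'Other' else 0.0 for t in toy_topics],
    n_select=4, min_other=2, min_topics=3,
)
print(chosen, status)
```

In this toy instance the most-shared article has no images, so the image constraint forces the solver to pass over it; this is the same kind of trade-off the full model makes between raw popularity and the feed requirements.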


In [11]:
lp_shares = df_lp.iloc[selected_indices][' shares'].sum()
print(f"Optimal Shares by Linear Programming: {lp_shares} shares")

Optimal Shares by Linear Programming: 4928500 shares
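Before relying on the total, it can be useful to confirm that a selection actually satisfies the stated requirements. The helper below is a minimal sketch: `check_selection` and its parameters are hypothetical names, the data is made up, and "above average" is assumed to mean that the mean over the selected articles exceeds the mean over all articles.

```python
from collections import Counter


def check_selection(selected, topics, features, n_select=10,
                    min_other=2, min_topics=3):
    """Return a dict mapping each constraint name to True/False.

    `features` maps a feature name to per-article values; each feature's
    mean over the selection is compared with its mean over all articles.
    """
    if len(selected) == 0:
        raise ValueError("selection is empty")
    chosen_topics = Counter(topics[i] for i in selected)
    report = {
        "count": len(selected) == n_select,
        "other": chosen_topics["Other"] >= min_other,
        "diversity": len(chosen_topics) >= min_topics,
    }
    for name, values in features.items():
        overall_mean = sum(values) / len(values)
        selected_mean = sum(values[i] for i in selected) / len(selected)
        report[f"{name}_above_avg"] = bool(selected_mean > overall_mean)
    return report


# Made-up data: 8 articles, selections of size 4.
toy_topics = ['World', 'World', 'World', 'Tech', 'Tech', 'Other', 'Other', 'Other']
toy_features = {"images": [0, 4, 2, 2, 2, 2, 2, 2]}

good = check_selection([1, 3, 5, 6], toy_topics, toy_features, n_select=4)
bad = check_selection([0, 1, 5, 6], toy_topics, toy_features, n_select=4)
print(good)
print(bad)
```

With the notebook's arrays, a call along the lines of `check_selection(selected_indices, topics, {"images": images, "hrefs": hrefs, "keywords": keywords})` should work, provided the solver interprets the constraints the same way as this helper.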


# Sensitivity Analysis

How much tech preference is needed before the homepage becomes tech-heavy?
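
One way to probe this question is to add a bonus weight per tech article to the objective and re-solve for increasing weights, recording how many tech articles are selected each time. The sketch below illustrates the sweep on made-up data. Exhaustive search stands in for `news_solver`, and the objective form `shares + weight * is_tech`, the toy numbers, and the definition of "tech-heavy" as more than half of the slots are all assumptions.

```python
from itertools import combinations

# Made-up data: shares and a tech flag per article.
shares = [900, 850, 800, 500, 450, 400]
is_tech = [0, 0, 0, 1, 1, 1]
N_SELECT = 3


def best_subset(weight):
    """Exhaustively pick N_SELECT articles maximizing shares + weight * is_tech."""
    def score(subset):
        return sum(shares[i] + weight * is_tech[i] for i in subset)
    return max(combinations(range(len(shares)), N_SELECT), key=score)


def tech_count_by_weight(weights):
    """Map each tech bonus weight to the number of tech articles selected."""
    return {w: sum(is_tech[i] for i in best_subset(w)) for w in weights}


counts = tech_count_by_weight([0, 200, 320, 380, 420, 520])
# Smallest tested weight at which tech articles fill more than half the slots.
threshold = next((w for w, c in counts.items() if c > N_SELECT / 2), None)
print(counts, threshold)
```

The same loop could wrap the real solver if it exposes a tech weight; the reported threshold is only as fine-grained as the list of weights tested.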