## Recommendation System

### Part-4: Final Recommendation

For my recommendation algorithm, I am taking inspiration from boosting + Naive Bayes, where <br>
every item is assigned an equal weight -> weights are changed based on each item's importance [previous purchase frequency] -> weights are re-weighted with the new importance [association with current items] -> <br>
normalization to get a probability for each item -> sorting for top items -> final recommendation.

> For simplicity, we won't recommend rare item. For complete inventory recommendation, replace df with original dataset.

**Note**: <br>

By default, you might think all items are equally probable. But this is not the case. <br>
The purchase frequency of items varies widely. To get the actual popularity of an item, i.e. its frequency of occurrence, <br>
we need to find the quantity of each item bought per transaction.
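
As a rough illustration of the note above, the sketch below derives a per-item popularity score from a handful of made-up transactions. The tuple layout and the helper name `item_popularity` are hypothetical; the score uses the `log(1 + quantity / orders)` form that the item bias takes later in this notebook.

```python
import math
from collections import defaultdict

def item_popularity(transactions):
    """Return {item: log(1 + total_quantity / number_of_orders)}.

    `transactions` is an iterable of (invoice_id, item, quantity) tuples
    (a hypothetical layout chosen for this sketch).
    """
    total_qty = defaultdict(float)
    orders = defaultdict(set)
    for invoice, item, qty in transactions:
        total_qty[item] += qty
        orders[item].add(invoice)
    return {item: math.log1p(total_qty[item] / len(orders[item])) for item in total_qty}

# Toy data: item A is bought in bulk, B and C only in small quantities
toy = [("inv1", "A", 12), ("inv2", "A", 6), ("inv2", "B", 1), ("inv3", "C", 2)]
popularity = item_popularity(toy)
```

Items bought in larger quantities per order get a higher score, but the logarithm keeps bulk purchases from dominating.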

In [1]:
import pandas as pd
import numpy as np
from collections import defaultdict
import time
from sklearn.model_selection import train_test_split
import joblib

import tqdm
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
## Reading datasets
df = pd.read_parquet('../Data/data_with_features.parquet')
customer = pd.read_pickle('../Data/customer_history.pkl')
items = pd.read_pickle('../Data/item_summary.pkl')
baskets = pd.read_pickle('../Data/baskets.pkl')
itemsets = joblib.load('../Models/itemsets.joblib')
rules = pd.read_pickle('../Data/rules.pkl')
vectorizer = joblib.load('../Models/vectorizer.joblib')

unq_item = len(df['StockCode'].unique())
print("Total unique items:", unq_item)

# ## Remove rare item (Optional)
# customer = customer[customer['Customer ID'].isin(df['Customer ID'].unique())]
# items = items[items['StockCode'].isin(df['StockCode'].unique())]

Total unique items: 263


#### My Current Model scheme:
I will create a model that adjusts the probability of each item being bought based on multiple factors. <br>
This model will initialize a probability vector for each customer based on their purchase history and association rules.

This probability vector will then be used to predict the probability of each item being bought by the customer.

Let the base probability of an item be $P_0(i)$.

We want an updated $P(i)$ that reflects:

- Customer affinity: $H_i$
- Rule-based affinity: $R_i$


Formally:
$$ P(i) ∝ P_0(i) × f(H_i) × g(R_i) $$

Now, we must choose f and g — the transformation functions.

🔸 Option A: Dampened history effect with a direct rule boost

f: log1p(alpha * (qty/count)) <br>
g: (1 + beta * lift) <br>
So,
$$ P(i) = P_0(i) × log(1+\alpha \cdot \frac{qty_i}{count_i}) × (1 + \beta \cdot lift_i) $$

🔸 Option B: Use log-space additive updates <br>
Take everything into log-space, so multiplicative effects become additive. <br>
This is standard in probabilistic models (e.g., Naive Bayes, word2vec).
$$ logP(i) = logP_0(i) + \alpha \cdot log(1+\frac{qty_i}{count_i}) + \beta \cdot log(lift_i) $$

🔸 Option C: Weighted geometric mean
Blended normal multiplication
$$ P(i) = P_0(i)^{1-\alpha -\beta} × f(H_i)^\alpha × g(R_i)^\beta $$
- This keeps a balance between the base, history, and rule weights
- Good for ensemble-like blending (like XGBoost feature combination)

🔸 Option D: Bayesian update analogy
Interpret rules and history as independent evidence sources for item likelihood.
$$ P(i|history,rules) ∝ P_0(i) × P(history|i)^\alpha × P(rules|i)^\beta $$
Then define:
- $P(history|i)$ = normalized purchase frequency
- $P(rules|i)$ = lift or confidence

If we normalize both, this can be approximated as:
$$ P(i) = P_0(i) × (1 +\alpha.H_i) × (1 +\beta.R_i) $$


Comparing the options, for a scalable recommender system that needs
1) interpretability,
2) consistent probabilistic meaning, and
3) stability across scales,

the log-space additive model (Option B) is the best choice.

#### **Recommended Formula**
$$ logP(i) = logP_0(i) + \alpha \cdot log(1+\frac{qty_i}{count_i}) + \beta \cdot log(lift_i) $$
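
A minimal sketch of the log-space additive model (Option B), assuming the history and rule terms are added to the log of the base probability and the result is renormalized. The function names and toy numbers are hypothetical and only meant to show the mechanics.

```python
import numpy as np

def log_space_score(p0, qty, count, lift, alpha=0.5, beta=0.4):
    """log P(i) up to a constant: log P0 + alpha*log(1 + qty/count) + beta*log(lift)."""
    p0, qty, count, lift = (np.asarray(v, dtype=float) for v in (p0, qty, count, lift))
    return np.log(p0) + alpha * np.log1p(qty / count) + beta * np.log(lift)

def to_probabilities(log_scores):
    """Exponentiate and renormalize so the scores sum to 1."""
    shifted = log_scores - log_scores.max()  # shift for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()

p0 = np.array([0.5, 0.3, 0.2])  # base probabilities of three toy items
scores = log_space_score(p0, qty=[2, 10, 1], count=[2, 2, 1], lift=[1.0, 2.0, 1.0])
probs = to_probabilities(scores)
```

With `alpha = beta = 0` the output reduces to the base probabilities, which makes the two weights easy to interpret as "how much evidence to add".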

I also looked into two fundamental mathematical questions to find the best formula.

1) Should alpha/beta be inside the log or outside the log? <br>
ANSWER: Outside the log

- When the tuning parameter is outside, you take the log-evidence `log(1+x)` and scale its importance linearly.
  In log-space it reads as adding alpha times the evidence:
  $$ logP → logP + \alpha \cdot log(1+x) $$
  → Alpha controls how many 'units of evidence' the history contributes. It is linear in the log domain and easy to tune.

- When the tuning parameter is inside, `log(1 + alpha*x)` rescales the raw input and its effect on the score is non-linear. <br>
  For small x, `log(1 + alpha*x)` ≈ `alpha*x`, which behaves like the outside form, but for large x, `log(1 + alpha*x)` ≈ `log(alpha) + log(x)`, so alpha only shifts the score by a constant.
  i.e., its effect saturates and is coupled to the scale of x, which makes it harder to tune.

2) Lift v/s confidence*lift - which to use? <br>
Answer: There is no universal answer. Treat `lift` and `confidence` as complementary signals and combine them in log-space with separate weights.

- Lift and confidence share a common term in their formulas, but they measure different things on different scales.
- Multiplying them mixes these scales -> the product will be dominated by large lifts for very rare items or by moderately high confidence. <br>
  It is a blunt instrument and can overweight noisy rules.
- Better options: either use confidence as a weight on lift, or use both as additive terms with separate tunable weights.
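
For the first question, a small numeric check makes the difference visible. This is only a sketch with made-up values of `x` (standing in for `qty/count`) and an arbitrary `alpha`.

```python
import numpy as np

x = np.array([0.01, 1.0, 100.0, 10_000.0])  # raw evidence, e.g. qty / count
alpha = 3.0

outside = alpha * np.log1p(x)  # alpha scales the log-evidence linearly
inside = np.log1p(alpha * x)   # alpha rescales x before the log

# For large x, log(1 + alpha*x) ~ log(alpha) + log(x):
# alpha ends up as a constant offset instead of a scale factor.
offset = inside - np.log1p(x)
```

`outside / np.log1p(x)` equals `alpha` everywhere, while `offset` approaches `log(alpha)` as `x` grows.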

After multiple iterations, I have reached an even better solution, which I will use for the rest of the notebook.

#### **Formula**:
Probability vector for each item(i), for each user(u) -> $P_{u,i}$ is the probability of the item being bought by the user.
$$ P_{u,i} = Base_i + \sum{weight_{f,i} * function(feature_{u,i})} $$

-> P_{u,i} = log-space additive features * linear-space multiplicative features

#### Probability vector for each item depend upon:
##### A. log-space additive features:
1. Base probability of an item = $P_0(i)$ -> Item bias($b(i)$) <br>
   $$ b_i = log(1 + \frac{total quantity_i}{num orders_i}) $$

2. Frequency of previous purchase of customer history = $H_i$ -> Customer affinity <br>
   $$ H_i = log(1 + \frac{qty_{u, i}}{count_{u, i}}) $$

3. Boost due to association rules = $R_i$ -> Rule-based affinity <br>
   $$ R_i = confidence_i * log(1 + lift_i) $$

4. Price Affinity = $P_i$ -> effect of price relative to the customer's budget <br>
   $$ P_i = log(1 + \frac{budget_u}{price_i}) $$

5. Discount effect = $D_i$ -> effect of discount (when given) <br> 
   $$ D_i = log(1 + discount_i) $$

6. Recency time-based factor = $T_i$ -> effect of last purchase transaction <br>
   $$ T_i = log(1+ d_{effect}*exp(- k_i \Delta t_{(u,i)}) ) $$
   where $d_{effect}$ - decay effect, $k_i$ - recency factor.

   Equivalently, in linear space, $T_i = log(t_i)$ with
   $$ t_i = 1+ d_{effect}*exp(- k_i \Delta t_{(u,i)}) $$
##### B. linear-space multiplicative features:
1. Description commonality = $C_i$ -> effect of description <br>
   $$ C_i = cosine(desc_i, desc_{other}) $$

> ##### Why add 1 before multiplying Description similarity? <br>
Even after normalization, we want the minimum effect to be unity, as multiplying by the raw $C_i$ (0–1) would shrink the score for items that are not similar, possibly too aggressively.

Hence, the final score of an item for a customer is (with $\delta$ weighting recency and $\gamma$ weighting price, as in the code below):
$$ Z_{u,i} = exp(b_i + \alpha*H_i + \beta*R_i + \gamma*P_i + \eta*D_i + \delta*T_i) \cdot (1 + C_i)^{\epsilon} $$
#### OR
$$ Z_{u,i} = exp(b_i + \alpha*H_i + \beta*R_i + \gamma*P_i + \eta*D_i) \cdot t_i^{\delta} \cdot (1 + C_i)^{\epsilon} $$
Taking log on both sides,
$$ log(Z_{u,i}) = b_i + \alpha*H_i + \beta*R_i + \gamma*P_i + \eta*D_i + \delta*T_i + \epsilon*log(1 + C_i) $$
$$ P_{u,i} = Normalization( log(Z_{u,i}) )  $$
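
To make the aggregation concrete, here is a minimal sketch on three toy items. The weight names are keyed by feature rather than by Greek letter, and all arrays and values are illustrative assumptions, not outputs of this notebook.

```python
import numpy as np

def combine_features(b, H, R, P, D, t, C, w):
    """Return log Z for every item.

    b, H, R, P, D are log-space features (already normalized),
    t is the linear-space recency multiplier (>= 1) and
    C is the cosine similarity in [0, 1].
    """
    return (b + w["history"] * H + w["rules"] * R + w["price"] * P
            + w["discount"] * D + w["recency"] * np.log(t)
            + w["similarity"] * np.log1p(C))

def softmax(logits):
    z = logits - logits.max()  # shift so the largest exponent is 0
    e = np.exp(z)
    return e / e.sum()

w = {"history": 0.5, "rules": 0.4, "price": 0.2, "discount": 0.3,
     "recency": 0.1, "similarity": 0.3}
b = np.array([1.0, 0.6, 0.2])   # item bias
H = np.array([0.0, 1.0, 0.0])   # customer history
R = np.array([0.0, 0.5, 1.0])   # rule boost
P = np.array([0.5, 0.5, 0.5])   # price affinity
D = np.zeros(3)                 # no discounts
t = np.array([1.0, 1.4, 1.0])   # recency multiplier
C = np.array([0.0, 0.2, 0.9])   # description similarity

log_z = combine_features(b, H, R, P, D, t, C, w)
prob = softmax(log_z)
```

Because the similarity enters as `log(1 + C)`, an item with `C = 0` gets no penalty; it simply receives no extra boost.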

In [None]:
## Cell-1: Setup Hyper-parameters
from datetime import datetime
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## Initial weights
ALPHA = 0.5 # purchase history
BETA = 0.4 # association rules
DELTA = 0.1 # recency bias
ETA = 0.3 # discount boost
GAMMA = 0.2 # price
EPISILON = 0.3 # description similarity

# coefficients to train
params = {'alpha': 1.0, 'beta': 1.0, 'delta': 1.0, 'eta': 1.0, 'gamma': 1.0, 'epsilon': 1.0, 'd_effect': 0.5}

## All additional hyper-parameters
d_effect = 0.5 # decay factor

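def normalize_dict(d):
    # Scale values by the maximum; fall back to 1 when the dict is empty or all zeros
    max_val = (max(d.values()) if d else 1) or 1
    return {k: v / max_val for k, v in d.items()}


def Softmax_normalizer(x:dict):
    val = np.array(list(x.values()), dtype=float)
    # min-max scale the logits to [0, 1]; guard against identical logits (zero range)
    span = val.max() - val.min()
    val = (val - val.min())/span if span > 0 else np.zeros_like(val)
    e_x = np.exp(val)
    prob = e_x / e_x.sum()
    return dict(zip(x.keys(), prob))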

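A standard softmax subtracts the maximum value from every data point before exponentiating. This is a numerical trick to stabilize the function: <br>
softmax works with the exponential function, which grows very quickly, so a large logit can overflow to `inf` and one item ends up dominating.

Shifting all logits by the same constant keeps the largest value at 0 -> exp(0) = 1 is then the largest term in the normalization. <br>
`Softmax_normalizer` above bounds the exponent differently: it min-max scales the logits to [0, 1] first, so every exponent lies between exp(0) and exp(1).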
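
The max-shift trick described above can be sketched as follows. The name `stable_softmax` is hypothetical and this is the textbook variant, not the min-max version used in this notebook.

```python
import numpy as np

def stable_softmax(logits):
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()  # largest entry becomes 0, so exp() <= 1
    e = np.exp(shifted)
    return e / e.sum()

# Without the shift, np.exp(1000.0) overflows to inf and the result is nan
big = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(big)
```

Shifting does not change the result: `stable_softmax([0, 1, 2])` gives the same probabilities as the example above.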

> Each additive features should be normalized for scale invariance across factors. <br>
> Multiplicative features can remain as they are.

In [None]:
## Cell-2: Base probability / item bias b_i
def compute_bias(items_df):
    """
    Function to compute base probability for each item -> bias of an item

    Args: items_df (pd.DataFrame): DataFrame containing item information.
    """
    bias = {}
    for _,item in items_df.iterrows():
        bias[item['StockCode']] = np.log1p( item['Frequency'] )
    
    ## Normalize bias
    bias = normalize_dict(bias)
    return bias

# compute_bias(items)

In [None]:
## Cell-3: Previous customer history
def compute_history(customer_df = customer,customer_id:int=-1):
    """ Function to compute previous customer history. """
    history = {}

    customer_data = customer_df[customer_df['Customer ID'] == customer_id]
    if customer_data.empty:
        return history
    
    ## Customer dataset has another layer for dictionary -> so we have to use .values[0]
    count_dict = customer_data['Purchase count'].values[0]
    qty_dict = customer_data['Purchase quantity'].values[0]
    
    for item in count_dict.keys():
        count_item = count_dict[item]
        qty_item = qty_dict[item]
        # print(item, count_item, qty_item)
        history[item] = np.log1p(qty_item / count_item)
    
    ## Normalize history
    history = normalize_dict(history)
    return history

# compute_history(customer, customer_id=12608)

In [None]:
from itertools import combinations
## Cell-4: Association rules
def compute_rules(current_basket, rules=rules, items=items):
    """ 
    Function to compute rules association boost for each item. 
    
    Args:
        current_basket : List of items in the current basket.
        rules: Dataset containing association rules.
        items: Dataset containing item information.
    """
    rule_boost = {} # map item -> boost
    
    if isinstance(current_basket, (list,set)): # only list of items
        current_basket = set(current_basket)
    elif hasattr(current_basket, 'StockCode') or 'StockCode' in current_basket.columns: # dataframe with other info
        # hasattr for checking series
        current_basket = set(current_basket['StockCode'])
    else:
        raise TypeError("current basket must be list,set or Dataframe with 'StockCode' column.")

    # sort before combining so the tuples match the sorted antecedent keys built below
    basket_items = sorted(current_basket)
    itemsets = set(combinations(basket_items, 1)) | set(combinations(basket_items, 2))
    print("All Itemsets: ", itemsets)
    

    rules_index = {}
    for _, row in rules.iterrows():
        ant = tuple(sorted(row['antecedent']))  # ensure consistent tuple type
        rules_index.setdefault(ant, []).append(row)
    
    
    for itemset in itemsets:
        ## for each itemset, find all its association rules
        if itemset not in rules_index:
            continue
        for row in rules_index[itemset]: # for each matching rule
            # the boost depends only on the rule, so compute it once per rule
            boost = row['confidence'] * np.log1p(row['lift'])
            for item in row['consequent']:
                # if the item already received a boost through another itemset, add to it
                rule_boost[item] = rule_boost.get(item, 0) + boost
    
    # normalization
    rule_boost = normalize_dict(rule_boost)
    return rule_boost

compute_rules(baskets.iloc[2])

Budget constraints should also penalize items that are over budget
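
One possible way to get such a penalty is a signed variant of the price term. This is a sketch under the assumption that `log(budget / price)` is acceptable as the affinity (it equals `log1p` of the commented-out ratio `(budget - curr_price)/curr_price` in the cell below); `price_affinity` is a hypothetical name and this is not the function the notebook uses.

```python
import numpy as np

def price_affinity(prices, budget):
    """log(budget / price): positive within budget, zero at budget, negative over it."""
    prices = np.asarray(prices, dtype=float)
    out = np.zeros_like(prices)
    valid = prices > 0  # items with a missing/zero price get a neutral score
    out[valid] = np.log(budget / prices[valid])
    return out

aff = price_affinity([5.0, 10.0, 20.0, 0.0], budget=10.0)
```

An item at half the budget and an item at twice the budget receive scores of equal magnitude and opposite sign.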

In [None]:
## cell-5: Budget constraints
def compute_price_bias(items=items, budget=None):
    """
    Computes bias due to budget constraints
    
    Args:
        items: Dataset containing item information.
        budget: Budget for customer
    """
    
    if budget is None:
        return {}
    
    price_bias = {}
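    for _,item in items.iterrows():
        curr_price = item.get('Current_Price', 0)
        if pd.isna(curr_price) or curr_price <= 0:
            continue # skip items with a missing/zero price to avoid division by zero
        # ratio = (budget - curr_price)/curr_price
        ratio = (budget)/curr_price
        price_bias[item['StockCode']] = np.log1p(ratio)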
    
    ## Normalize bias
    price_bias = normalize_dict(price_bias)
    return price_bias

# sns.displot(compute_price_bias(items, budget=50).values(), kde=True)

> In case discount is given on items.

In [None]:
## cell-6: Discount's effect
def compute_discount_boost(discount, items=items):
    """ Function to compute discount boost for each item.
    
    Args:
        discount (dictionary): Discount percentage on each item
        items: Dataset containing item information.
    """
    
    discount_bias = {}
    for _,item in items.iterrows():
        item_name = item['StockCode']
        discount_bias[item_name] = np.log1p(discount.get(item_name, 0))
    
    ## Normalize bias
    # fall back to 1 when no item is discounted (max would be 0)
    max_val = (max(discount_bias.values()) if discount_bias else 1) or 1
    discount_bias = {k: v / max_val for k, v in discount_bias.items()}
    
    return dict(sorted(discount_bias.items(), key=lambda x: x[1], reverse=True))

#### How per-item recency variation works
The recency factor is built on the idea of a decay effect: the boost to an item's probability of being bought decays with the time since the customer last purchased it.

$$ t_i = 1 + d_{effect}*exp(- k_i \Delta t_{(u,i)}) $$

- the multiplier goes from $1 + d_{effect}$ (just purchased) down to 1 over time
- $d_{effect}$: decay factor, which determines the overall magnitude of the recency effect
- $k_i$: recency factor, which determines the steepness of the decay for each item
- the decay is driven by $\Delta t_{(u,i)}$, the number of days between the last purchase date and today

##### Formula for k:
$$ k_i = k_{min} + (k_{max} - k_{min}) * norm[ \sqrt{log(1 + quantity\ bought_i)} ] $$
where $k_{min}$ and $k_{max}$ are global.

- norm_logfreq_i ∈ [0,1], computed from sqrt(log1p(quantity bought)) divided by its maximum over all items
- Frequently bought items → norm_logfreq ≈ 1 → k_i ≈ k_max
- Rarely bought items → norm_logfreq ≈ 0 → k_i ≈ k_min

NOTE: This dataset contains transactions from 2009 to 2011, i.e. from a long time ago. <br>
So, we can't measure the difference from the actual current date. Instead, we train the model with "today" = last date in the dataset + 1 day.
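
The recency multiplier can be sketched in vectorized form as below. It assumes the `1 + d_effect * exp(-k * dt)` form, with `k_min = ln(2) / Time_period` and `k_max = ln(2) / 7` as in the cell that follows; the function name and toy inputs are hypothetical.

```python
import numpy as np

def recency_multiplier(qty, days_since, time_period, d_effect=0.5, half_life_days=7):
    """t_i = 1 + d_effect * exp(-k_i * days_since) with a per-item decay rate k_i."""
    qty = np.asarray(qty, dtype=float)
    days_since = np.asarray(days_since, dtype=float)
    k_min = np.log(2) / time_period      # slowest decay: half-life = whole dataset span
    k_max = np.log(2) / half_life_days   # fastest decay: half-life = one week
    freq = np.sqrt(np.log1p(qty))
    peak = freq.max()
    norm = freq / peak if peak > 0 else np.zeros_like(freq)
    k = k_min + (k_max - k_min) * norm
    return 1.0 + d_effect * np.exp(-k * days_since)

# Items: bought today in bulk, bought a week ago in bulk, bought a week ago once
t = recency_multiplier(qty=[50, 50, 1], days_since=[0, 7, 7], time_period=700)
```

The most frequently bought item loses half of its boost after one week, while a rarely bought item decays more slowly.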

In [None]:
Today = pd.Timestamp(df['Purchase Date'].max()) + pd.DateOffset(days=1)
Time_period = (df['Purchase Date'].max() - df['Purchase Date'].min()).days

## Cell-7: Recency bias -> users per items
def compute_RecencyBias(items=items, customer=customer, customer_id=-1, d_effect=d_effect, today=Today, Time_period=Time_period):
    """ 
    Function to compute the recency multiplier for each item: t_i = 1 + d_effect * exp(-k_i * delta)
    
    Args:
        items: Dataset containing item information.
        customer: Dataset containing customer's data
        customer_id: Customer ID (in case entire data is given)
        d_effect: strength of recency decay
        today: current date for time delta
        Time_period: for maximum recency factor calculation
    """
    
    if isinstance(customer, pd.DataFrame):
        customer_data = customer[customer['Customer ID'] == customer_id]
        if customer_data.empty:
            # unknown or anonymous customer -> there is no recency bias
            return {}
    elif customer_id == -1: # if only current customer is given
        # for anonymous customer, there is no recency bias
        return {}
    else: # not a dataframe -> direct customer data
        customer_data = customer
        
    
    kmin = np.log(2)/Time_period
    kmax = np.log(2)/7
    
    # items the customer never bought are left out; callers should treat them as t_i = 1
    recency_bias = {}
    
    freq = {item : 0 for item in items['StockCode']}
    quantity = customer_data['Purchase quantity'].values[0]
    
    for item, qty in quantity.items():
        # log(1+X) for 0 is still 0
        freq[item] = np.sqrt(np.log1p(qty))
    
    max_freq = max(freq.values()) or 1e-6
    freq = {item : f/max_freq for item, f in freq.items()} # normalization
    # print("Normalized frequency: ",freq)
    
    ## decay rate for each item (covers catalog items and everything the customer bought)
    K = {item : kmin + (kmax - kmin) * f for item, f in freq.items()}
    # print("Recency factor: ",K)
    
    purchase_date = customer_data['Last purchase date'].values[0]
    for item, date in purchase_date.items():
        # for each previously purchased item by customer, boost its recency bias
        time_delta = (pd.Timestamp(today) - pd.Timestamp(date)).days
        decay_factor = 1 + d_effect*np.exp(-K.get(item, kmin) * time_delta)
        recency_bias[item] = round(decay_factor, 8)
        
    return recency_bias


In [None]:
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

## Cell-8: Description similarity
# vectorizer = joblib.load('../Models/vectorizer.joblib')

def compute_description_similarity(current_basket, items_df=items, vectorizer=None, vectorizer_path=None):
    """
    Compute description similarity (C_i) for all items w.r.t. current basket.

    Parameters
    ----------
    current_basket : list | set | pd.DataFrame | pd.Series
        StockCodes in the current basket
    items_df : pd.DataFrame
        Must contain ['StockCode', 'Description']
    vectorizer : sklearn text vectorizer
        Pre-trained CountVectorizer or TfidfVectorizer
    vectorizer_path : str
        Path to the pre-trained vectorizer joblib file

    Returns
    -------
    dictionary
        {StockCode: C} where C ∈ [0, 1]
    """
    if isinstance(current_basket, (list,set)): # only list of items
        current_basket = set(current_basket)
    elif hasattr(current_basket, 'StockCode') or 'StockCode' in current_basket.columns: # dataframe with other info
        # hasattr for checking series
        current_basket = set(current_basket['StockCode'])
    else:
        raise TypeError("current basket must be list,set or Dataframe with 'StockCode' column.")
    
    descrip = items_df.copy()
    descrip['Clean_Description'] = descrip['Description'].str.lower().str.replace('[^A-Za-z]+', ' ', regex=True).str.strip()
    
    if vectorizer is not None:
        vocab = vectorizer
    elif vectorizer_path:
        vocab = joblib.load(vectorizer_path)
    else: # train vectorizer from scratch
        # vocab = CountVectorizer(ngram_range=(1,2), stop_words='english')
        vocab = TfidfVectorizer(ngram_range=(1,2), stop_words='english')
        vocab.fit(descrip['Clean_Description'])
    
    # transform all item descriptions
    item_desc_vector = vocab.transform(descrip['Clean_Description'])
    # print(vocab.get_feature_names_out())  # debug: inspect the vocabulary
    # get descriptions of items in current basket
    basket_desc = descrip.loc[descrip['StockCode'].isin(current_basket), 'Clean_Description']
    
    if basket_desc.empty:
        # no items in basket or basket not given
        return {}

    # transform descriptions from items in baskets
    basket_vector = vocab.transform(basket_desc)
    
    # compute cosine similarity
    similarity_matrix = cosine_similarity(item_desc_vector, basket_vector)
    # print(similarity_matrix, similarity_matrix.shape)
    # For each item in the catalog, take the maximum similarity across all basket items.
    # Idea: "How similar is this item to any item in the basket?"
    D = similarity_matrix.max(axis=1)
    # D = D - D.mean()
    # sns.displot(D)
    
    return dict(zip(descrip['StockCode'], D))

In [None]:
## cell-9: Aggregative function
Today = pd.Timestamp(df['Purchase Date'].max()).normalize() + pd.DateOffset(days=1)
Time_period = (df['Purchase Date'].max() - df['Purchase Date'].min()).days

def Recommendation(current_basket, item_data=items, customer_data=customer, association_rules=rules, 
    Coefficients=None, Id=-1, d_effect=1, current_date=Today, Time_period=Time_period, 
    budget=None, vectorizer=None, Discount=None):
    """ Function for aggregation of all effect -> probability vector of each item.
     Compute top-N recommended items for a given customer + basket using hybrid model.
    
    Args:
        current_basket : list | set | pd.DataFrame
            List of items in the current basket.
        item_data: pd.DataFrame
            Dataset containing item information.
        customer_data: pd.DataFrame | pd.Series
            Dataset containing customer's data
        association_rules: pd.DataFrame
            All association rules from Apriori algorithm
        Coefficients: dict
            Weights associated with each factor (keys: alpha, beta, delta, gamma, eta, epsilon)
        Id: int  [optional]
            Each customer's unique identifier (in case entire data is given)
        d_effect: float
            strength of recency decay
        current_date: datetime
            date at which the transaction is occurring
        Time_period: int
            Total time span in the dataset (for recency scaling)
        budget: float
            Total remaining budget for further shopping
        vectorizer: sklearn Vectorizer  [optional]
            Text vectorizer for description similarity
        Discount: dictionary
            Percentage of discount for given item
    
    returns:
        pd.DataFrame of all items with their data and probability, sorted by probability (descending)
    """
    
    if current_basket is None: # no basket yet -> only item- and customer-level effects apply
        print("No Item in basket.")
        current_basket = []
    
    ## final items probability vector
    all_items = item_data['StockCode'].to_list()
    logit = {}
    
    # Unpack factor weights (missing keys default to 1)
    if Coefficients is None:
        Coefficients = {}
    alpha, beta, delta = Coefficients.get('alpha',1), Coefficients.get('beta',1), Coefficients.get('delta',1)
    gamma, eta = Coefficients.get('gamma',1), Coefficients.get('eta',1)
    # `params` above uses the key 'epsilon'; the old spelling 'episilon' is still accepted
    episilon = Coefficients.get('epsilon', Coefficients.get('episilon',1))
    
    # --- Compute each component from each function ---
    b_i = compute_bias(item_data)
    H_i = compute_history(customer_data, Id)
    R_i = compute_rules(current_basket, association_rules, item_data)
    P_i = compute_price_bias(item_data, budget)
    
    if Discount is None: # no discount given
        discount = {}
    else:
        discount = compute_discount_boost(Discount, item_data)
    
    t_i = compute_RecencyBias(item_data, customer=customer_data, 
        customer_id=Id, d_effect=d_effect, today=current_date, Time_period=Time_period)
    Cosine = compute_description_similarity(current_basket, item_data, vectorizer=vectorizer)
    
    for i in all_items:
        # items the customer never bought have t_i = 1 -> log(1) = 0 (no recency effect)
        logit[i] = b_i.get(i,0) + (alpha*H_i.get(i,0) + beta*R_i.get(i,0) + delta*np.log(t_i.get(i,1.0)) +
            max(gamma*P_i.get(i,0), 0) + eta*discount.get(i,0) + episilon*np.log( 1+Cosine.get(i,0) ))
        ## max(..., 0) clips out a negative budget term.

    
    # --- Normalize logits to probabilities ---
    # ## Using min-max scaling
    # logits_array = np.array(list(logit.values()), dtype=float) ## strictly need to convert it to float
    # maxx = logits_array.max()
    # minn = logits_array.min()
    # Probability = {k: (v - minn + 1e-6) / (maxx - minn + 1e-6) for k, v in logit.items()}
    ## softmax normalization for probabilistic interpretation
    Probability = Softmax_normalizer(logit) 
    
    ## Store it in combination with item data
    Item = item_data[['StockCode', 'Description', 'Current_Price']].copy()
    Item['Probability'] = Item['StockCode'].map(Probability).fillna(0)
    
    # Sort by probability
    Item = Item.sort_values(by='Probability', ascending=False)
    
    return Item

#### Part-5: Parallelize -> Scalability
Our model currently works fine for a single basket, but we want to scale it to multiple baskets. <br>
We need the model to score multiple baskets in parallel so that it runs faster in deployment.

Any real-life recommendation system should be fast and able to serve multiple users at the same time. <br>
-> Hence, we now optimize all our functions

> Wrapping everything in a class keeps the data, caches, and helper functions together -> OOP concept

For compute_history_batch, for each customer we are finding the effect on each item. <br>
For our model to work, a batch must not contain baskets from the same customer while training. <br>
So, for training we plan to send in batches of baskets bought on the same day.

Our final model class works, but it is far too slow. <br>
So, we will now optimize it as much as possible for time efficiency.

1. The way we find description similarity is somewhat different from above. <br>
   Here, instead of the similarity between each item of the basket and each item of the catalog, we find the similarity between the average semantic theme of the basket and each item of the catalog. This simplifies the problem.
2. Cosine similarity automatically
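
The "average semantic theme" idea from point 1 can be sketched with plain NumPy. The tiny hand-made count vectors stand in for the TF-IDF rows of the item descriptions, and `basket_theme_similarity` is a hypothetical helper, not the method of the class below.

```python
import numpy as np

def basket_theme_similarity(item_vectors, basket_rows):
    """Cosine similarity between the mean basket vector and every catalog item."""
    item_vectors = np.asarray(item_vectors, dtype=float)
    theme = item_vectors[basket_rows].mean(axis=0)  # average description of the basket
    denom = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(theme)
    sims = np.zeros(len(item_vectors))
    ok = denom > 0  # items with an empty description keep similarity 0
    sims[ok] = item_vectors[ok] @ theme / denom[ok]
    return sims

# Columns: "red", "mug", "plate", "candle"
vectors = np.array([
    [1, 1, 0, 0],  # red mug
    [1, 0, 1, 0],  # red plate
    [0, 0, 0, 1],  # candle
    [0, 1, 1, 0],  # mug plate set
])
sims = basket_theme_similarity(vectors, basket_rows=[0, 1])
```

This needs one similarity per catalog item instead of one per (catalog item, basket item) pair, at the cost of blurring baskets that mix unrelated themes.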

In [None]:
import numpy as np
import pandas as pd
from itertools import combinations
from collections import defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy import sparse
import math

EPS = 1e-9
DEFAULT_HALF_LIFE_DAYS = 7
MAGIC_KMIN_PERIOD = 30

class Retail_Recommendation_optimal:
    """
    Batch-capable hybrid recommender combining:
      - item bias (popularity)
      - per-customer purchase history affinity
      - association-rule boosts
      - per-basket description similarity
      - per-basket / per-customer price-affordability (budget)
      - recency boost per (user,item)
      - discount boost (batch-level)
    
    This class computes item-level arrays once and then scores baskets in vectorized form.
    """

    def __init__(self, items_data, customer_data, rules, vectorizer=None, vectorizer_path=None,
                 initial_weights=None, d_effect=1, current_date=None, Time_period=None, filter_items=False):
        """
        Args:
            items_data (pd.DataFrame): must contain ['StockCode','Description','Current_Price',
                                                  'Total_quantity','Num_orders'].
            customer_data (pd.DataFrame): rows may contain dicts in columns 'Purchase count',
                                        'Purchase quantity', 'Last purchase date'.
            rules (pd.DataFrame): cols ['antecedent','consequent','confidence','lift'].
            vectorizer: optional pre-trained sklearn vectorizer for descriptions (TF-IDF).
            vectorizer_path: Path to pre-trained sklearn vectorizer for descriptions (TF-IDF).
            initial_weights: dict with keys alpha,beta,delta,gamma,epsilon,eta (defaults used if None).
            d_effect: base recency multiplier for items (float).
            current_date: pd.Timestamp or date for the "current" date; if None, the system date is used.
            Time_period: integer days span; if None computed from items_data['Last_sale'].
            filter_items: if True, drop rare items (see _remove_rare_items).
        """
        # Data
        self.item_data = items_data
        self.customer_data = customer_data
        self.rules = rules
        
        # Items initialization
        self.all_items = np.asarray(self.item_data['StockCode'].unique())
        self.item_to_index = {item:i for i,item in enumerate(self.all_items)}
        self.n_items = len(self.all_items)
        
        self.vectorizer = vectorizer
        self.vectorizer_path = vectorizer_path
        
        ## OPTIONAL: Remove rare items
        if filter_items:
            self._remove_rare_items()
        
        # Hyper-parameters & weights
        self.d_effect = float(d_effect)
        default_weights = {'alpha':0.5,'beta':0.4,'delta':0.1,'gamma':0.2,'epsilon':0.3,'eta':0.3}
        # merge so that a partial initial_weights dict still gets defaults for the missing keys
        self.weights = {**default_weights, **(initial_weights or {})}
        
        # Time params
        # keep current_date a pd.Timestamp in both branches so date arithmetic behaves the same
        self.current_date = pd.Timestamp.now().normalize() if current_date is None else pd.Timestamp(current_date)
        self.Time_period = ((items_data['Last_sale'].max() - items_data['Last_sale'].min()).days 
                            if Time_period is None else Time_period)
        
        
        # Caches -> Mapping for customer id
        self.history_cache = {}
        self.recency_cache = {}
        
        # # Description vectorizer
        # if 'Clean_Description' not in self.item_data.columns:
        #     self.item_data = self.item_data.copy()
        #     self.item_data.loc[:,'Clean_Description'] = (self.item_data.loc[:,'Description'].str.lower().str.replace('[^A-Za-z]+',' ',regex=True).str.strip())
        # if self.vectorizer is None:
        #     if self.vectorizer_path:
        #         self.vectorizer = joblib.load(self.vectorizer_path)
        #     else:
        #         self.vectorizer = TfidfVectorizer(ngram_range=(1,2), stop_words='english')
        #         self.vectorizer.fit(self.item_data['Clean_Description'])
        # self.vocab = self.vectorizer.transform(self.item_data['Clean_Description'])
        # # item-item similarity matrix -> n_item^2 memory complexity -> can remove reare items
        # self.similarity_matrix = cosine_similarity(self.vocab, dense_output=False).astype(np.float32)

        # Precompute static factors
        self.Bias = self.compute_bias()
        self.rules_lookup = self._build_rules_index()
        
        # Precompute boost for each rule
        self._prepare_rules_boost()
        

    # ----------------- Helpers -----------------
    @staticmethod
    def normalize_dict(d):
        if not d: return {}
        max_val = max(d.values())
        return {k:v/max_val for k,v in d.items()}
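    @staticmethod
    def normalize_array(a):
        # scale by the maximum; an empty or all-zero array stays all zeros
        amax = a.max() if a.size else 0
        return a / amax if amax > 0 else np.zeros_like(a)

    @staticmethod
    def softmax_dict(x):
        vals = np.array(list(x.values()), dtype=float)
        span = vals.max() - vals.min()
        # min-max scale to [0, 1]; identical logits give a uniform distribution
        vals = (vals - vals.min())/span if span > 0 else np.zeros_like(vals)
        e = np.exp(vals)
        probs = e / e.sum()
        return dict(zip(x.keys(), probs))

    # Optional: remove rare items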
    def _remove_rare_items(self, min_orders=300, min_customers=100):
        """
        Filter items to keep only those with sufficient popularity (Frequency > min_orders).
        Also cleans customer_data dictionaries to remove rare/unused items.

        Args:
            min_orders : int
                Minimum number of orders required for an item to be kept.
            min_customers : int
                Minimum number of unique customers required for an item to be kept.
        """
        # remove from items_data
        self.item_data = self.item_data[(self.item_data['Num_orders'] > min_orders) & (self.item_data['Num_customers'] > min_customers)]
        print("Number of important items: ",self.item_data.shape[0])
        
        # keep all_items an array (as in __init__) so the item order is stable
        self.all_items = np.asarray(self.item_data['StockCode'].unique())
        self.item_to_index = {item:i for i,item in enumerate(self.all_items)}
        self.n_items = len(self.all_items)
        kept_items = set(self.all_items) # set for fast membership tests
        # remove from customer data
        for ind, cust in self.customer_data.iterrows():
            pc = {k:v for k,v in cust['Purchase count'].items() if k in kept_items}
            pq = {k:v for k,v in cust['Purchase quantity'].items() if k in kept_items}
            self.customer_data.at[ind,'Purchase count'] = pc
            self.customer_data.at[ind,'Purchase quantity'] = pq
            
        # remove from rules
        def valid_rule(row):
            ant = row['antecedent']
            if not isinstance(ant, (list, tuple, set, frozenset)): ant = [ant]
            ant = set(ant)
            
            return ant.issubset(kept_items)
        self.rules = self.rules[self.rules.apply(valid_rule, axis=1)]

        
    def _build_rules_index(self):
        lookup = defaultdict(list)
        for _, r in self.rules.iterrows():
            ant = tuple(sorted(r['antecedent']))
            lookup[ant].append(r)
        return lookup

    # Precompute the boost vector for each antecedent
    def _prepare_rules_boost(self):
        self.rules_boost = defaultdict(float)
        for ant,row in self.rules_lookup.items(): # for each rule, compute its boost
            boost_vector = defaultdict(float)
            for r in row: # for each associated consequent,
                boost_val = r['confidence'] * np.log1p(r['lift'])
                for conseq in r['consequent']: # for each consequent item
                    boost_vector[conseq] += boost_val
            # For each antecedent, we have boost for each consequent
            self.rules_boost[ant] = self.normalize_dict(boost_vector)  # staticmethod, so it must be called via self

    # ----------------- Feature computations -----------------
    # ----- static factors -----
    def compute_bias(self, array=True):
        bias = {row['StockCode']: np.log1p(row['Frequency']) for _,row in self.item_data.iterrows()}
        if array:
            bias = np.array([bias.get(item, 0) for item in self.all_items], dtype=np.float32)
            return self.normalize_array(bias)
        return self.normalize_dict(bias)

    # ----- per-customer factors -----
    def compute_history(self, cust_id):
        if cust_id in self.history_cache:
            return self.history_cache[cust_id]

        cust_data = self.customer_data[self.customer_data['Customer ID']==cust_id]
        if cust_data.empty:
            self.history_cache[cust_id] = {}
            return {}

        count_dict = cust_data['Purchase count'].values[0]
        qty_dict = cust_data['Purchase quantity'].values[0]
        history = {item: np.log1p(qty_dict.get(item,0)/(count_dict.get(item,1))) 
                   for item in count_dict.keys()}
        self.history_cache[cust_id] = self.normalize_dict(history)
        return self.history_cache[cust_id]

    def compute_recency(self, cust_id):
        if cust_id in self.recency_cache:
            return self.recency_cache[cust_id]

        recency = np.zeros(self.n_items, dtype=np.float32)
        cust_data = self.customer_data[self.customer_data['Customer ID']==cust_id]
        if cust_data.empty:
            self.recency_cache[cust_id] = recency
            return recency

        qty_dict = cust_data['Purchase quantity'].values[0]
        freq = np.array([np.sqrt(np.log1p(qty_dict.get(item,0))) for item in self.all_items], dtype=np.float32)
        freq = freq / max(freq.max(), 1e-9) # normalize by the maximum; guard against an all-zero vector
        
        kmin = np.log(2)/DEFAULT_HALF_LIFE_DAYS
        kmax = np.log(2)/self.Time_period
        K = kmin + (kmax-kmin)*freq

        last_purchase = cust_data['Last purchase date'].values[0]
        for i,item in enumerate(self.all_items):
            last_date = last_purchase.get(item, None)
            if last_date is None:
                delta_days = self.Time_period
            else:
                delta_days = (self.current_date - last_date).days # Time since last purchase -> delta=0 for fresh items
            recency[i] = 1 + self.d_effect*np.exp(-K[i] * delta_days)
        
        self.recency_cache[cust_id] = recency
        return recency

    # ----- Discount boost -----
    def compute_discount_boost(self, discount_dict):
        if not discount_dict:
            return {}
        boost = {item: np.log1p(discount_dict[item]) for item in discount_dict}
        return self.normalize_dict(boost)

    # ----- Price boost -----
    def compute_price_bias(self, budget):
        if budget is None:
            return {}
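        pb = np.array([np.log1p(budget/row['Current_Price']) for _,row in self.item_data.iterrows()], dtype=np.float32)
        # Return a dict keyed by StockCode so recommend() can look items up with .get()
        return dict(zip(self.item_data['StockCode'], self.normalize_array(pb)))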

    # For each basket -> Pull out values
    def compute_description_boost(self, baskets_df):
        """Compute description similarity using precomputed item-item matrix"""
        desc_map = {}
        for idx,row in baskets_df.iterrows(): # for each basket
            basket_items = set(row['StockCode']) if isinstance(row['StockCode'],(list,set)) else set()
            if not basket_items:
                desc_map[idx] = {}
                continue
            
            # Indices of basket items
            basket_idx  = [self.item_to_index[item] for item in basket_items if item in self.item_to_index]
            
            sim_vec = self.similarity_matrix[basket_idx].toarray() # similarity vector for each basket item with all other items
            # print(sim_vec.shape)
            avg_sim_vec = sim_vec.mean(axis=0) # average similarity for entire basket with all other items
            # print(avg_sim_vec.shape)
            
            # Normalize boost
            max_val = avg_sim_vec.max()
            if max_val > 0:
                desc_map[idx] = avg_sim_vec / max_val
            else:
                desc_map[idx] = np.zeros(self.n_items, dtype=np.float32)
        
        return desc_map
    
    # ----- Rules boost -----
    def compute_association_boost(self, baskets):
        rules_map = {}
        for idx, row in baskets.iterrows():
            current_basket = set(row['StockCode']) if isinstance(row['StockCode'], (list, set)) else set()
            if not current_basket:
                rules_map[idx] = {}
                continue
            
            conseq_boost = {}
            # check for each antecedent
            for ant, boost_dict in self.rules_boost.items():
                if set(ant).issubset(current_basket): # if ant match basket itemset
                    for item,val in boost_dict.items():
                        conseq_boost[item] = conseq_boost.get(item, 0) + val
            
            # For idx basket, we have rules boost on conseq_boost
            rules_map[idx] = conseq_boost
        return rules_map
    
            
    # ----------------- Recommendation -----------------
    def recommend(self, baskets, Coefficients = None, budget_dict=None, discount_dict=None, top_n=5):
        """
        Aggregate all factors to compute item probabilities per basket.
        Supports mix of dict (sparse) and array (dense) features.
        """
        if Coefficients is not None:
            self.weights = Coefficients
        
        results = {}

        # Precompute dense features
        # Desc_map = self.compute_description_boost(baskets) # dense arrays
        rules_map = self.compute_association_boost(baskets) # sparse dicts
        Discount_dict = self.compute_discount_boost(discount_dict) # sparse dict
        price = self.compute_price_bias(budget_dict)  # sparse dict
        
        
        for idx,row in tqdm.tqdm(baskets.iterrows()):
            cid = row.get('Customer ID', -1)
            basket_items = set(row['StockCode']) if isinstance(row['StockCode'], (list, set)) else set()
            
            logit = {}
            
            # Fetch features
            H_i = self.compute_history(cid) # sparse dict
            T_i = self.compute_recency(cid) # dense array
            R_i = rules_map.get(idx, {})    # sparse dict
            ## Discount, price will work directly
            # Desc = Desc_map.get(idx, np.zeros(self.n_items, dtype=np.float32))
            
            for i, item in enumerate(self.all_items):
                logit[i] = self.Bias[i] + self.weights['alpha']*H_i.get(item, 0.0) + self.weights['beta']*R_i.get(item, 0.0) + self.weights['eta']*Discount_dict.get(item, 0.0)
                logit[i] += self.weights['delta']*price.get(item, 0) + self.weights['gamma']*T_i[i]
                # logit[i] += self.weights['epsilon']*np.log1p(Desc[i])
        
            # Softmax normalization
            Prob = self.softmax_dict(logit)
            
            # Picking out top-N items
            all_items_list = list(self.all_items)                     # convert set → list
            prob_arr = np.array([Prob[i] for i in range(len(all_items_list))])  # dict → array, np.argsort cannot sort a dict
            top_idx = np.argsort(prob_arr)[::-1][:top_n]              # indices of top-N
            top_codes = [all_items_list[i] for i in top_idx]          # get item codes
            top_probs = [prob_arr[i] for i in top_idx]                # get probabilities

            df_top = self.item_data[self.item_data['StockCode'].isin(top_codes)].copy()
            df_top['Probability'] = df_top['StockCode'].map(dict(zip(top_codes, top_probs)))
            df_top = df_top.set_index('StockCode').loc[top_codes].reset_index()
            df_top['Rank'] = range(1, len(df_top)+1)
            results[idx] = df_top

        return results

rec2 = Retail_Recommendation_optimal(items, customer, rules=rules, initial_weights=params, d_effect=1,
    current_date=Today, Time_period=100, vectorizer=vectorizer, filter_items=True)
basket1 = baskets.iloc[[2]]
# rec2.recommend(baskets, budget_dict=None, discount_dict=None, top_n=5)

Even with the best structure I could get out of plain Python, the model is far too slow. <br>
So, for optimization:
1. Avoid Python-level loops wherever possible and let NumPy run the loops in C.
2. Python dictionaries are too slow for per-item scoring. We will use arrays indexed by item position instead.
3. Use vectorised NumPy operations so that each basket is scored in one array expression.
4. Fully sorting the scores in the `recommend` function is O(n log n). Instead, we use `argpartition` to select the top N items in O(n), then `argsort` only those N items, all in C.
5. pandas DataFrames are too inefficient for the intermediate steps. Replace them with arrays.
6. Description similarity is a relevant factor, but the item-item similarity score is an O(n_items^2) operation, which is too expensive. <br>
   It would be better to leave it out.
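
Point 4 can be illustrated with a minimal sketch. The `top_n_indices` helper and the random `scores` vector are hypothetical and are not part of the classes in this notebook. The sketch assumes the scores are already a dense NumPy array, as in points 2 and 3. It selects the top-N entries with `np.argpartition` and then sorts only those N entries, and checks that the result matches a full sort.

```python
import numpy as np

def top_n_indices(scores, top_n):
    """Return indices of the top_n largest scores, best first (hypothetical helper)."""
    scores = np.asarray(scores)
    # top_n == -1 (or a value covering every item) means: rank all items with a full sort
    if top_n == -1 or top_n >= scores.size:
        return np.argsort(scores)[::-1]
    # argpartition puts the top_n largest values in the last top_n slots, unordered
    part = np.argpartition(scores, -top_n)[-top_n:]
    # sort only those top_n candidates, in descending order of score
    return part[np.argsort(scores[part])[::-1]]

rng = np.random.default_rng(0)
scores = rng.random(1000)               # made-up scores for 1000 items

fast = top_n_indices(scores, 5)         # partition, then sort 5 items
slow = np.argsort(scores)[::-1][:5]     # full sort of all 1000 items
print(np.array_equal(fast, slow))
```

The gain grows with the number of items, since only the N selected entries are ever sorted.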

Now, we will code the most optimized recommendation class that I could make.

In [None]:

import numpy as np
import pandas as pd
from collections import defaultdict
import math
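import tqdm
# Needed only when include_description=True
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity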

EPS = 1e-9
DEFAULT_HALF_LIFE_DAYS = 7

class Retail_Recommendation_Fast:
    """
    Optimized version (description boost is optional and off by default).
    Uses NumPy-vectorized scoring, cached per-customer arrays, and as few Python loops as possible.
    """

    def __init__(self, items_data, customer_data, rules, vectorizer=None, vectorizer_path=None,
                 initial_weights=None, d_effect=1, current_date=None, Time_period=None, 
                 filter_items=False, include_description=False):

        self.item_data = items_data.copy()
        self.customer_data = customer_data
        self.rules = rules

        # Items setup
        if filter_items:
            self._remove_rare_items()
        self.all_items = np.array(self.item_data['StockCode'].unique())
        self.item_to_index = {item: i for i, item in enumerate(self.all_items)}
        self.n_items = len(self.all_items)

        # Time and weights
        self.d_effect = float(d_effect)
        self.weights = (
            {'alpha': 0.5, 'beta': 0.4, 'delta': 0.1,
             'gamma': 0.2, 'eta': 0.3}
            if initial_weights is None else initial_weights
        )
        self.current_date = (
            pd.Timestamp.now().normalize()  # keep a Timestamp so both branches have the same type
            if current_date is None else pd.Timestamp(current_date)
        )
        self.Time_period = (
            100  # days, as a plain number; alternative: (items_data['Last_sale'].max() - items_data['Last_sale'].min()).days
            if Time_period is None else Time_period
        )

        # Caches
        self.history_cache = {}
        self.recency_cache = {}

        # Precompute static arrays
        self.Bias = self._compute_bias()
        self.rules_lookup = self._build_rules_index()
        self.rules_boost = self._prepare_rules_boost()
        
        self.include_description = include_description
        
        if self.include_description:
            # ----------------- Description similarity -----------------
            if 'Clean_Description' not in self.item_data.columns:
                self.item_data = self.item_data.copy()
                self.item_data['Clean_Description'] = (self.item_data['Description'].str.lower().str.replace('[^A-Za-z]+',' ', regex=True).str.strip())
            if vectorizer is None:
                if vectorizer_path:
                    import joblib
                    vectorizer = joblib.load(vectorizer_path)
                else:
                    vectorizer = TfidfVectorizer(ngram_range=(1,2), stop_words='english')
                    vectorizer.fit(self.item_data['Clean_Description'])
            self.vectorizer = vectorizer

            self.vocab = self.vectorizer.transform(self.item_data['Clean_Description'])
            self.similarity_matrix = cosine_similarity(self.vocab, dense_output=False).astype(np.float32)
        
        

    # ------------------ Helper methods ------------------
    def _remove_rare_items(self, min_orders=300, min_customers=100):
        """
        Filter items to keep only those with sufficient popularity (Frequency > min_orders).
        Also cleans customer_data dictionaries to remove rare/unused items.

        Args:
            min_orders : int
                Minimum number of orders required for an item to be kept.
            min_customers : int
                Minimum number of unique customers required for an item to be kept.
        """
        df = self.item_data
        df = df[(df['Num_orders'] > min_orders) & (df['Num_customers'] > min_customers)]
        self.item_data = df.reset_index(drop=True)
        print(f"Number of important items: {self.item_data.shape[0]}")
        

    def _build_rules_index(self):
        lookup = defaultdict(list)
        for _, r in self.rules.iterrows():
            ant = tuple(sorted(r['antecedent']))
            lookup[ant].append(r)
        return lookup

    def _prepare_rules_boost(self):
        rb = {}
        for ant, rows in self.rules_lookup.items():
            boost_vector = defaultdict(float)
            for r in rows:
                boost_val = r['confidence'] * math.log1p(r['lift'])
                for conseq in r['consequent']:
                    boost_vector[conseq] += boost_val
            # Normalize
            max_val = max(boost_vector.values(), default=1)
            rb[ant] = {k: v / max_val for k, v in boost_vector.items()}
        return rb

    def _compute_bias(self):
        freq = self.item_data['Frequency'].values
        bias = np.log1p(freq).astype(np.float32)
        return bias / bias.max()

    # ------------------ Feature computations ------------------
    def compute_history(self, cust_id):
        if cust_id in self.history_cache:
            return self.history_cache[cust_id]

        cust_data = self.customer_data[self.customer_data['Customer ID'] == cust_id]
        hist = np.zeros(self.n_items, dtype=np.float32)
        if cust_data.empty:
            self.history_cache[cust_id] = hist
            return hist

        count = cust_data.iloc[0]['Purchase count']
        qty = cust_data.iloc[0]['Purchase quantity']

        for item, q in qty.items():
            idx = self.item_to_index.get(item)
            if idx is not None:
                c = count.get(item, 1)
                hist[idx] = math.log1p(q / c)

        if hist.max() > 0:
            hist /= hist.max()
        self.history_cache[cust_id] = hist
        return hist

    def compute_recency(self, cust_id):
        if cust_id in self.recency_cache:
            return self.recency_cache[cust_id]

        cust_data = self.customer_data[self.customer_data['Customer ID'] == cust_id]
        rec = np.ones(self.n_items, dtype=np.float32)
        if cust_data.empty:
            self.recency_cache[cust_id] = rec
            return rec

        qty_dict = cust_data.iloc[0]['Purchase quantity']
        freq = np.array([math.sqrt(math.log1p(qty_dict.get(it, 0))) for it in self.all_items], dtype=np.float32)
        freq /= (freq.max() + EPS)

        kmin = np.log(2) / DEFAULT_HALF_LIFE_DAYS
        kmax = np.log(2) / self.Time_period
        K = kmin + (kmax - kmin) * freq

        last_purchase = cust_data.iloc[0]['Last purchase date']
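        # Items never purchased get delta = Time_period (as in the earlier class), not 0,
        # otherwise they would receive the maximum recency boost.
        deltas = np.array(
            [(self.current_date - last_purchase[it]).days if it in last_purchase else self.Time_period
             for it in self.all_items],
            dtype=np.float32
        )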

        rec = 1 + self.d_effect * np.exp(-K * deltas)
        self.recency_cache[cust_id] = rec
        return rec

    def compute_rules_array(self, basket):
        rules_arr = np.zeros(self.n_items, dtype=np.float32)
        current = set(basket)
        for ant, boost_dict in self.rules_boost.items():
            if set(ant).issubset(current):
                for item, val in boost_dict.items():
                    idx = self.item_to_index.get(item)
                    if idx is not None:
                        rules_arr[idx] += val
        if rules_arr.max() > 0:
            rules_arr /= rules_arr.max()
        return rules_arr

    def compute_discount_array(self, discount_dict):
        arr = np.zeros(self.n_items, dtype=np.float32)
        if discount_dict:
            for item, val in discount_dict.items():
                idx = self.item_to_index.get(item)
                if idx is not None:
                    arr[idx] = math.log1p(val)
        if arr.max() > 0:
            arr /= arr.max()
        return arr

    def compute_price_array(self, budget):
        arr = np.zeros(self.n_items, dtype=np.float32)
        if budget is None:
            return arr
        prices = self.item_data['Current_Price'].values  # same column name as used in compute_price_bias above
        arr = np.log1p(budget / (prices + EPS)).astype(np.float32)
        return arr / (arr.max() + EPS)
    
    def compute_description_boost(self, baskets_df):
        desc_map = {}
        for idx, row in baskets_df.iterrows():
            basket_items = set(row['StockCode']) if isinstance(row['StockCode'], (list, set)) else set()
            if not basket_items:
                desc_map[idx] = np.zeros(self.n_items, dtype=np.float32)
                continue
            basket_idx = [self.item_to_index[item] for item in basket_items if item in self.item_to_index]
            sim_vec = self.similarity_matrix[basket_idx].toarray()  # shape: len(basket) x n_items
            avg_vec = sim_vec.mean(axis=0)
            max_val = avg_vec.max()
            desc_map[idx] = avg_vec / max_val if max_val>0 else np.zeros(self.n_items, dtype=np.float32)
        return desc_map

    # ------------------ Recommendation ------------------
    def recommend(self, baskets, budget_dict=None, discount_dict=None, top_n=5):
        """If top_n == -1, return all items ranked by probability."""
        results = {}
        discount_arr = self.compute_discount_array(discount_dict)
        price_arr = self.compute_price_array(budget_dict)


        if self.include_description:
            Desc_map = self.compute_description_boost(baskets)

        for idx, row in tqdm.tqdm(baskets.iterrows(), total=len(baskets)):
            cid = row.get('Customer ID', -1)
            basket_items = row['StockCode'] if isinstance(row['StockCode'], (list, set)) else []

            # Precompute all dense feature arrays
            H = self.compute_history(cid)
            T = self.compute_recency(cid)
            R = self.compute_rules_array(basket_items)
            
            if self.include_description:
                Desc = Desc_map.get(idx, np.zeros(self.n_items, dtype=np.float32))

            # Vectorized scoring
            logit = (
                self.Bias
                + self.weights['alpha'] * H
                + self.weights['beta'] * R
                + self.weights['eta'] * discount_arr
                + self.weights['delta'] * price_arr
                + self.weights['gamma'] * T
            )

            if self.include_description:
                logit += self.weights.get('epsilon', 0.0) * Desc  # 'epsilon' is absent from the default weights

            # Softmax probability
            exps = np.exp(logit - logit.max())
            prob = exps / (exps.sum() + EPS)

            # Top-N items
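            if top_n == -1 or top_n >= self.n_items:
                # Return every item, ranked by probability
                top_idx = np.argsort(prob)[::-1]
            else:
                # O(n) selection of the top-N, then sort only those N
                top_idx = np.argpartition(prob, -top_n)[-top_n:]
                top_idx = top_idx[np.argsort(prob[top_idx])[::-1]]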
            top_codes = self.all_items[top_idx]
            top_probs = prob[top_idx]

            df_top = (
                self.item_data[self.item_data['StockCode'].isin(top_codes)]
                .copy()
                .set_index('StockCode')
                .loc[top_codes]
                .reset_index()
            )
            df_top['Probability'] = top_probs
            df_top['Rank'] = np.arange(1, len(df_top) + 1)
            results[idx] = df_top

        return results


In [None]:
rec_fast = Retail_Recommendation_Fast(
    items_data=items,
    customer_data=customer,
    rules=rules,
    initial_weights=params,
    d_effect=1,
    current_date=Today,
    Time_period=100,
    filter_items=True,
    include_description=False
)
basket1 = baskets.iloc[[2]]

start = time.time()
result = rec_fast.recommend(basket1, budget_dict=None, discount_dict=None, top_n=5)
print("Without description: ",time.time() - start)

In [None]:
rec_fast = Retail_Recommendation_Fast(
    items_data=items,
    customer_data=customer,
    rules=rules,
    initial_weights=params,
    d_effect=1,
    current_date=Today,
    Time_period=100,
    filter_items=False,
    include_description=True
)
basket1 = baskets.iloc[[2]]

start2 = time.time()
result = rec_fast.recommend(basket1, budget_dict=None, discount_dict=None, top_n=5)
print("With description: ",time.time() - start2)

### Saving model:
To save this model, we will create a `Model.py` file.