#### Goal of the Competition
The goal of this competition is to predict e-commerce clicks, cart additions, and orders. You'll build a multi-objective recommender system based on previous events in a user session.

Your work will help improve the shopping experience for everyone involved. Customers will receive more tailored recommendations while online retailers may increase their sales.

#### Context
Online shoppers have their pick of millions of products from large retailers. While such variety may be impressive, having so many options to explore can be overwhelming, resulting in shoppers leaving with empty carts. This neither benefits shoppers seeking to make a purchase nor retailers that missed out on sales. This is one reason online retailers rely on recommender systems to guide shoppers to products that best match their interests and motivations. Using data science to enhance retailers' ability to predict which products each customer actually wants to see, add to their cart, and order at any given moment of their visit in real-time could improve your customer experience the next time you shop online with your favorite retailer.

Current recommender systems consist of various models with different approaches, ranging from simple matrix factorization to a transformer-type deep neural network. However, no single model exists that can simultaneously optimize multiple objectives. In this competition, you’ll build a single entry to predict click-through, add-to-cart, and conversion rates based on previous same-session events.

With more than 10 million products from over 19,000 brands, OTTO is the largest German online shop. OTTO is a member of the Hamburg-based, multi-national Otto Group, whose subsidiaries also include Crate & Barrel (USA) and 3 Suisses (France).

Your work will help online retailers select more relevant items from a vast range to recommend to their customers based on their real-time behavior. Improving recommendations will ensure navigating through seemingly endless options is more effortless and engaging for shoppers.

[view in kaggle.com](https://www.kaggle.com/competitions/otto-recommender-system/overview/description)

##### Understanding the Competition

A typical e-commerce business involves millions of products and millions of customers each day viewing products, adding some of them to their carts, and buying some of them. A product can come in multiple versions (colors, sizes, etc.), but we can treat these as the same product. Some products are usually bought together: think toothbrush and toothpaste, phone and phone case, table and chair, or monitor and CPU. The idea of this challenge is to predict a customer's next step based on their previous steps within the session. A customer loads the website, interacts with products (clicks, cart adds, orders), and leaves; this entire process is called a session. Note that we can only use data from the same session to predict the next step within it. Ideally, it would also make sense to include the customer's past sessions. This competition therefore particularly improves the experience of new customers (for whom there is no past data anyway), and it also adds value when an existing customer is looking for a new product.
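
To make the session idea concrete, here is a minimal sketch, assuming each session is a dictionary with an `events` list of `aid`/`ts`/`type` records (the layout of `test.jsonl` shown further down). The helper `split_session` and its `cutoff` parameter are hypothetical: the first `cutoff` events form the observed input slice, and the remaining events are what a model would try to predict.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]


def split_session(events: List[Event], cutoff: int) -> Tuple[List[Event], List[Event]]:
    """Split one session into an observed input slice and the future events.

    Hypothetical helper: events are sorted by timestamp (`ts`) first so the
    split respects the order in which the customer acted.
    """
    ordered = sorted(events, key=lambda e: e["ts"])
    return ordered[:cutoff], ordered[cutoff:]


# Session 12899780 from test.jsonl: five clicks in one visit.
session = {
    "session": 12899780,
    "events": [
        {"aid": 1142000, "ts": 1661724000378, "type": "clicks"},
        {"aid": 582732, "ts": 1661724058352, "type": "clicks"},
        {"aid": 973453, "ts": 1661724109199, "type": "clicks"},
        {"aid": 736515, "ts": 1661724136868, "type": "clicks"},
        {"aid": 1142000, "ts": 1661724155248, "type": "clicks"},
    ],
}

observed, future = split_session(session["events"], cutoff=3)
print([e["aid"] for e in observed])  # [1142000, 582732, 973453]
print([e["aid"] for e in future])    # [736515, 1142000]
```

Item 1142000 appears both in the observed slice and in the future events, a small example of how the same product can come back within one session.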

##### Patterns the model needs to capture

When a customer clicks a product, they may:

1. Be interested in exploring its features
2. Want to look at the reviews
3. Have an immediate need for the product
4. Be a potential buyer

When a customer adds a product to the cart, they may:

1. Have high buying intent
2. Add similar items to the cart and later order only one of them
3. Be waiting for more selections so they can order them together

When a customer buys a product, it may be because:

1. It looks like the best item among all the explored options (ratings, reviews, cost, etc.)
2. It goes hand in hand with other products they bought
3. It is a seasonal product

##### Challenges

This is a multi-input problem: typical models require a fixed number of input variables, but here the number of inputs (events per session) can vary by a wide margin. There may be a number of existing models that handle this case, but I first want to try to create a new model based purely on my intuition.
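
One common way to feed variable-length sessions into a fixed-size model is to keep only the most recent `k` events and pad shorter sessions. This is not the approach pursued here; it is a minimal sketch, with a hypothetical helper `to_fixed_length` and an assumed padding value of `-1` (assumed never to be a real item id), that shows what a fixed input size forces on the data.

```python
from typing import List

PAD = -1  # assumption: -1 is never a real item id (aid)


def to_fixed_length(aids: List[int], k: int, pad: int = PAD) -> List[int]:
    """Return exactly k item ids: the last k events, left-padded with `pad`.

    Truncation drops the oldest events, so long sessions lose information,
    while padding fills short sessions with placeholders that carry none.
    """
    recent = aids[-k:] if k > 0 else []  # guard: aids[-0:] would return the whole list
    return [pad] * (k - len(recent)) + recent


print(to_fixed_length([59625], 4))                                  # [-1, -1, -1, 59625]
print(to_fixed_length([141736, 199008, 57315, 194067, 199008], 4))  # [199008, 57315, 194067, 199008]
```

Both side effects, lost history for long sessions and meaningless placeholders for short ones, are part of why a varying number of inputs is a real challenge.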

##### Simple Approach 1:

Counting co-occurrences.

The assumption is that each event independently affects the probability of the events that follow it. For example,
$P_1 = P(\text{buy item1} \mid \text{click item1}) = \dfrac{\text{number of item1 buys within 20 events of an item1 click}}{\text{total item1 clicks}}$.

For a candidate event, the total probability is $P = P_1 \cdot P_2 \cdots P_n$, where $P_i$ is the probability of the candidate given the $i$-th event and $n$ is the number of events in the input slice.

For a given session's input slice, we can rank candidate events by total probability and choose the top 20, which gives our simplest predictions.
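
The counting and ranking steps above can be sketched as follows. This is a minimal in-memory illustration, assuming each session has already been turned into a chronological list of `(aid, type)` pairs; the names `count_cooccurrences` and `rank_candidates` are hypothetical. Each target is counted at most once per source occurrence, so every ratio stays between 0 and 1.

```python
from collections import Counter, defaultdict
from math import prod
from typing import DefaultDict, Iterable, List, Tuple

Event = Tuple[int, str]  # (aid, type), e.g. (59625, "clicks")
WINDOW = 20              # "within 20 events", as in the text


def count_cooccurrences(sessions: Iterable[List[Event]], window: int = WINDOW):
    """Count source events, and how often each target event follows a source
    event within `window` events of the same session."""
    pair_counts: DefaultDict[Event, Counter] = defaultdict(Counter)
    source_counts: Counter = Counter()
    for events in sessions:
        for i, src in enumerate(events):
            source_counts[src] += 1
            # set(): count each distinct target once per source occurrence
            for tgt in set(events[i + 1 : i + 1 + window]):
                pair_counts[src][tgt] += 1
    return pair_counts, source_counts


def rank_candidates(input_slice: List[Event], pair_counts, source_counts, top_k: int = 20):
    """Score each candidate by the product of P(candidate | event) over all
    events in the input slice; return the top_k (event, score) pairs."""
    candidates = set()
    for src in input_slice:
        candidates.update(pair_counts.get(src, Counter()))
    scores = {}
    for tgt in candidates:
        probs = [
            pair_counts.get(src, Counter())[tgt] / source_counts[src] if source_counts[src] else 0.0
            for src in input_slice
        ]
        scores[tgt] = prod(probs)
    # Descending score; ties broken by the event itself for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


sessions = [
    [(1, "clicks"), (1, "orders")],
    [(1, "clicks"), (1, "orders")],
    [(1, "clicks"), (2, "clicks")],
]
pairs, sources = count_cooccurrences(sessions)
print(rank_candidates([(1, "clicks")], pairs, sources))
# [((1, 'orders'), 0.6666666666666666), ((2, 'clicks'), 0.3333333333333333)]
```

Because the score is a plain product, a single input event that never co-occurred with a candidate drives that candidate's score to zero; this follows directly from the independence assumption.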

However, there is a problem with this approach.

Clicks on the same product may repeat, but orders and cart adds of the same product often do not. A buy of item1 in the input slice can make another item1 buy in the output slice much less likely, even though the co-occurrence counts would predict one. There is a rough correction for this: for that event, $P_i$ can be replaced with

$P_i = \dfrac{\text{number of item1 buys followed by another item1 buy nearby}}{\text{total item1 buys}}$

i.e., we check how likely the buy is to be repeated.

In this case, the total probability becomes $(P_i)^n$.
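
The repeat correction can be estimated by counting as well. A minimal sketch, assuming each session is a chronological list of `(aid, type)` pairs and taking "nearby" to mean within the next 20 events (an assumption; the text does not fix the window). `repeat_probability` is a hypothetical name; it returns, for each item, the fraction of its buys that are followed by another buy of the same item.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str]  # (aid, type), e.g. (199008, "orders")


def repeat_probability(
    sessions: Iterable[List[Event]], event_type: str = "orders", window: int = 20
) -> Dict[int, float]:
    """For each aid, estimate P(another `event_type` of the same aid within
    `window` events | an `event_type` of that aid)."""
    totals: Counter = Counter()
    repeats: Counter = Counter()
    for events in sessions:
        for i, (aid, etype) in enumerate(events):
            if etype != event_type:
                continue
            totals[aid] += 1
            # Was the same action on the same item repeated nearby?
            if (aid, event_type) in events[i + 1 : i + 1 + window]:
                repeats[aid] += 1
    return {aid: repeats[aid] / totals[aid] for aid in totals}


sessions = [
    [(5, "orders"), (5, "orders")],
    [(5, "orders"), (6, "clicks")],
]
print(repeat_probability(sessions))  # {5: 0.3333333333333333}
```

Passing `event_type="carts"` gives the same estimate for cart adds.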


##### Scale of the simple approach:

Every co-occurrence has to be counted, so the number of counts is $\sum_{s} (N_s - 20) \cdot 20$, where $N_s$ is the number of events in session $s$.

The number of stored co-occurrences cannot be estimated exactly, but it should be smaller than the number of counts, because a pair that occurs repeatedly is stored only once. Even so, the model might be up to 20 times bigger than the data itself, which is a concern for storage.
Each prediction only needs the data related to the events in the input slice, which should be very small, so some kind of indexing is needed for efficient prediction. With such an index, prediction time is not a major concern.
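
As a rough sizing aid, the count formula above can be evaluated directly from session lengths. A minimal sketch: `estimate_counts` is a hypothetical name, and sessions with at most 20 events are assumed to contribute zero counts (the plain formula would go negative for them).

```python
from typing import Iterable

WINDOW = 20  # look-ahead window from the text


def estimate_counts(session_lengths: Iterable[int], window: int = WINDOW) -> int:
    """Number of co-occurrence counts: sum of (N - window) * window per session.

    Sessions with N <= window are clamped to zero (an assumption; the formula
    in the text would be negative for them).
    """
    return sum(max(n - window, 0) * window for n in session_lengths)


lengths = [10, 25, 120]  # hypothetical session lengths
counts = estimate_counts(lengths)
print(counts)                 # 2100
print(counts / sum(lengths))  # counts per event in the data
```

For long sessions, $(N - 20) \cdot 20 / N$ approaches 20 counts per event, which is where the "20 times bigger than the data" concern comes from.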







In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import json # handling json files

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/otto-recommender-system/sample_submission.csv
/kaggle/input/otto-recommender-system/test.jsonl
/kaggle/input/otto-recommender-system/train.jsonl


In [2]:
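import json
from itertools import islice

def load_jsonl(path, max_lines=100):
    """Read the first `max_lines` records of a JSON Lines file (one JSON object per line).

    Pass max_lines=None to read every line.
    """
    with open(path, 'r', encoding='utf-8') as reader:
        # islice stops after max_lines lines without reading the rest of the file
        return [json.loads(line) for line in islice(reader, max_lines)]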

In [3]:
train = load_jsonl('/kaggle/input/otto-recommender-system/train.jsonl')

In [4]:
test = load_jsonl('/kaggle/input/otto-recommender-system/test.jsonl')
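
The loaded records are nested dictionaries. For counting, it can be convenient to flatten each session into a chronological list of `(aid, type)` pairs. A minimal sketch, assuming the `session`/`events`/`aid`/`ts`/`type` layout shown in the output below; `to_event_lists` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def to_event_lists(records: List[Dict[str, Any]]) -> Dict[int, List[Tuple[int, str]]]:
    """Map each session id to its events as (aid, type) pairs, ordered by ts."""
    return {
        rec["session"]: [
            (e["aid"], e["type"]) for e in sorted(rec["events"], key=lambda ev: ev["ts"])
        ]
        for rec in records
    }


records = [
    {"session": 12899779, "events": [{"aid": 59625, "ts": 1661724000278, "type": "clicks"}]},
]
print(to_event_lists(records))  # {12899779: [(59625, 'clicks')]}
```

In the notebook this would be applied as `to_event_lists(test)` or `to_event_lists(train)`.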

In [5]:
test

[{'session': 12899779,
  'events': [{'aid': 59625, 'ts': 1661724000278, 'type': 'clicks'}]},
 {'session': 12899780,
  'events': [{'aid': 1142000, 'ts': 1661724000378, 'type': 'clicks'},
   {'aid': 582732, 'ts': 1661724058352, 'type': 'clicks'},
   {'aid': 973453, 'ts': 1661724109199, 'type': 'clicks'},
   {'aid': 736515, 'ts': 1661724136868, 'type': 'clicks'},
   {'aid': 1142000, 'ts': 1661724155248, 'type': 'clicks'}]},
 {'session': 12899781,
  'events': [{'aid': 141736, 'ts': 1661724000559, 'type': 'clicks'},
   {'aid': 199008, 'ts': 1661724022851, 'type': 'clicks'},
   {'aid': 57315, 'ts': 1661724170835, 'type': 'clicks'},
   {'aid': 194067, 'ts': 1661724246188, 'type': 'clicks'},
   {'aid': 199008, 'ts': 1661780623778, 'type': 'clicks'},
   {'aid': 199008, 'ts': 1661781274081, 'type': 'clicks'},
   {'aid': 199008, 'ts': 1661781409993, 'type': 'carts'},
   {'aid': 199008, 'ts': 1661804151788, 'type': 'clicks'},
   {'aid': 199008, 'ts': 1662060028567, 'type': 'clicks'},
   {'aid': 19