# Associative Analysis - Mee


Notion: https://www.notion.so/Associative-Analysis-Mee-2b9e3a89fc344d9f93f490dc69dbadf8


# **SMART**

**S**pecific **M**easurable **A**ttainable **R**elevant **T**imely

## Business Understanding

### What's the problem? (S)

Some of our customers have a high volume of sales on specific days, and they need the process of adding new items to current orders **to be as fast as possible**.

Currently the process is the following:

- Type the name of the item
- Search results are shown as you type
- Select the item you want to add
- Save it

### How do I measure success? (M)

We will succeed if we **minimize the number of steps and/or the time required to add new items to an order**.

### What's the proposed solution? (A)

Our insight is that **many items are commonly bought together** or in sequence. For example, a customer who starts drinking beer in the afternoon will likely ask for something to eat shortly after.

So the idea is that through associative analysis **we can provide suggestions of items that are commonly bought together** based on the items in the current order. That way we **eliminate the most costly step** in many scenarios, which is **typing**.

In a way, these suggestions act like a **smart cache**: they surface the items with the highest probability of being ordered next, so the waiter doesn't need to type anything. Just select and save.

### What's the value that this solution delivers? (R)

With a shorter attendance time we improve customer experience and increase customer satisfaction, which can lead to an increase in sales through good reviews and word-of-mouth marketing.

For our customers, it improves the perceived value of our solution as a smart, data-driven platform.

### How long will the project take? (T)

- Data exploration and model training → 2 days
- Integrate solution with current services → 1 day
- Deployment and Testing → 1 day
- Monitor → 1 week
- Report results → 1 day

In [1]:
import pandas as pd
from pymongo import MongoClient
from bson.objectid import ObjectId

# Data Understanding

In [2]:
client = MongoClient('mongodb+srv://mee-web:<password>@production-cluster-oty65.gcp.mongodb.net/mee?retryWrites=true&w=majority')
db = client['mee']
db.list_collection_names()

['financialStatements',
 'suppliers',
 'ifoodOrders',
 'users',
 'companies',
 'productionRequests',
 'purchases',
 'invoices',
 'ifoodMarketplace',
 'usersProducts',
 'products',
 'financialFunds',
 'changelog',
 'orders',
 'userBill',
 'inventory',
 'registerOperations',
 'customers']

In [3]:
# let's grab a customer with high sales volume
colchetesId = ObjectId('5fd604714afe55001cfe1ec6')

In [4]:
sales = pd.DataFrame(list(db.orders.find({ 'company': colchetesId, 'status': 'closed' }, { 'items': 1 })))
sales.shape

(1548, 2)

In [5]:
sales.describe()

Unnamed: 0,_id,items
count,1548,1548
unique,1548,1346
top,5ff60c28cdbc54001b41041c,"[{'discount': 0, 'modifiers': [], 'product': 5..."
freq,1,51


In [6]:
sales.head()

Unnamed: 0,_id,items
0,5fb444eaca90840027f4c481,"[{'discount': 0, 'modifiers': [], 'product': 5..."
1,5fb444f4ca90840027f4c485,"[{'discount': 0, 'modifiers': [], 'product': 5..."
2,5fb446e3ca90840027f4c63e,"[{'discount': 0, 'modifiers': [], 'product': 5..."
3,5fb44d54ca90840027f4d636,"[{'discount': 0, 'modifiers': [], 'product': 5..."
4,5fb44e6fca90840027f4d841,"[{'discount': 0, 'modifiers': [], 'product': 5..."


In [7]:
sales['items'][0]

[{'discount': 0,
  'modifiers': [],
  'product': ObjectId('5fb4386dca90840027f4bc9f'),
  'gtin': '2000238000177',
  'name': 'gin tônica',
  'description': None,
  'price': 20,
  'measurement': 'unit',
  'quantity': 1,
  'note': '',
  'subtotal': 20},
 {'discount': 0,
  'modifiers': [],
  'product': ObjectId('5fb2b7ab52448b0027afd2f7'),
  'gtin': '2000238000122',
  'name': 'chope german pils 350ml',
  'description': 'Copo 350ml',
  'price': 14,
  'measurement': 'unit',
  'quantity': 4,
  'note': '',
  'subtotal': 56}]

## Data Exploration


- Which products are commonly bought together?
- Does seasonality matter?
- How do we calculate and fine tune model parameters for each company automatically?
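As a first pass at the first question, pair frequencies can be counted directly before fitting any model. The sketch below assumes orders are already available as a list of lists of item names; the `count_pairs` helper and the sample orders are made up for illustration and are not part of the original notebook.

```python
from collections import Counter
from itertools import combinations


def count_pairs(transactions):
    """Count how many orders contain each unordered pair of items."""
    pairs = Counter()
    for items in transactions:
        # De-duplicate and sort so ('a', 'b') and ('b', 'a') count as one pair
        for pair in combinations(sorted(set(items)), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical orders, not taken from the real data
sample = [
    ['beer', 'fries'],
    ['beer', 'fries', 'water'],
    ['beer', 'water'],
    ['fries'],
]

pair_counts = count_pairs(sample)
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Raw counts like these favor popular items, which is why the modeling step also looks at confidence and lift rather than frequency alone.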


# Modeling

In [8]:
# Apriori expects a list of transactions, each one a list of item labels,
# so we flatten every order's items into the chosen attribute (name by default)
def transform_items_to_transactions(sales, attr='name'):
    return [[item[attr] for item in items] for items in sales['items']]

transactions = transform_items_to_transactions(sales)
print(transactions[0])

['gin tônica', 'chope german pils 350ml']


In [9]:
from apyori import apriori

In [10]:
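# Support: frequency of the item set across all transactions
# Confidence: reliability of the rule (how often the added item appears when the base item does)
# Lift: how much more often the items occur together than if they were independent
# Note: min_length is not among the options apyori documents
# (min_support, min_confidence, min_lift, max_length), so it may have no effect here

rules = apriori(transactions, min_support = 0.01, min_confidence = 0.2, min_lift = 2, min_length = 2)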
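To make the thresholds passed to `apriori` more concrete, the sketch below computes support, confidence and lift by hand for a single rule. It uses the standard textbook definitions; the `support` and `rule_metrics` helpers and the toy orders are illustrative assumptions, not apyori internals.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions if itemset <= set(items))
    return hits / len(transactions)


def rule_metrics(transactions, base, add):
    """Return (support, confidence, lift) for the rule base -> add."""
    both = support(transactions, set(base) | set(add))
    confidence = both / support(transactions, base)
    lift = confidence / support(transactions, add)
    return both, confidence, lift


# Hypothetical orders, not taken from the real data
toy = [
    ['beer', 'fries'],
    ['beer', 'fries'],
    ['beer'],
    ['water'],
]

sup, conf, lift = rule_metrics(toy, ['fries'], ['beer'])
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
# support=0.50 confidence=1.00 lift=1.33
```

A lift above 1 means the pair shows up together more often than independence would predict, so `min_lift = 2` keeps only rules where the pair is at least twice as frequent as chance.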

In [11]:
#viewing the rules
results = list(rules)
results = pd.DataFrame(results)
results.head(10)

Unnamed: 0,items,support,ordered_statistics
0,"(zn lager chope 400ml, apa 400)",0.012274,"[((apa 400), (zn lager chope 400ml), 0.5135135..."
1,"(gin tônica, aperol)",0.010336,"[((aperol), (gin tônica), 0.3137254901960784, ..."
2,"(água mineral com gás natural prata 300ml, ape...",0.010336,"[((aperol), (água mineral com gás natural prat..."
3,"(água mineral com gás natural prata 300ml, bai...",0.010336,"[((baiao executivo), (água mineral com gás nat..."
4,"(zn lager chope 400ml, bolinho de arroz com pa...",0.01615,"[((bolinho de arroz com parmesão), (zn lager c..."
5,"(croqueta de costela, bruschetta de cogumelos)",0.01292,"[((bruschetta de cogumelos), (croqueta de cost..."
6,"(miracle ipa chope 350ml, bruschetta de cogume...",0.013566,"[((bruschetta de cogumelos), (miracle ipa chop..."
7,"(croqueta de costela, burrata com tomatinhos, ...",0.011628,"[((burrata com tomatinhos, torrada e molho pes..."
8,"(dadinhos de tapioca, burrata com tomatinhos, ...",0.012274,"[((burrata com tomatinhos, torrada e molho pes..."
9,"(zn lager chope 400ml, burrata com tomatinhos,...",0.010982,"[((burrata com tomatinhos, torrada e molho pes..."


In [12]:
results.shape

(38, 3)

In [13]:
class AprioriModel:
    def __init__(self):
        # Maps a base item to a list of {'add': item, 'conf': confidence} rules
        self.model = {}

    def transform_results_to_dict(self, results):
        kv = {}
        for stats in results.ordered_statistics:
            # Only the first ordered statistic of each item set is used,
            # and only the first item on each side of the rule
            rule = stats[0]
            key = list(rule.items_base)[0]
            value = {'add': list(rule.items_add)[0], 'conf': rule.confidence}
            kv.setdefault(key, []).append(value)
        return kv

    def train(self, transactions):
        rules = apriori(transactions, min_support = 0.01, min_confidence = 0.2, min_lift = 2, min_length = 2)
        results = pd.DataFrame(list(rules))

        self.model = self.transform_results_to_dict(results)

    def get_recommendations(self, items):
        results = []
        for item in items:
            rules = self.model.get(item)
            if rules:
                # Keep the highest-confidence rule for each item in the order
                results.append(max(rules, key=lambda rule: rule['conf']))

        return results


# Deployment

To deploy this model we will create a cron job that builds these models in batches for every active user and saves the rules in our MongoDB database. Our API can then read those rules and deliver the final recommendations when the user edits an open order.
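One possible shape for the batch step is sketched below. It is an assumption, not the code from the actual deployment: the function names, the document layout and the `InMemoryCollection` stand-in are all hypothetical. The only external call it relies on is `replace_one(filter, replacement, upsert=True)`, which pymongo collections provide.

```python
def build_rules_document(company_id, model):
    """Shape one company's rules as a single document (hypothetical layout).

    `model` is a dict like {'aperol': [{'add': 'gin tônica', 'conf': 0.31}]}.
    """
    return {
        'company': company_id,
        'rules': [
            {'base': base, 'add': rule['add'], 'conf': rule['conf']}
            for base, rules in model.items()
            for rule in rules
        ],
    }


def save_rules(collection, company_id, model):
    """Upsert so each run overwrites the rules saved by the previous one."""
    doc = build_rules_document(company_id, model)
    collection.replace_one({'company': company_id}, doc, upsert=True)
    return doc


def run_batch(collection, company_ids, train):
    """`train` is any callable that returns the model dict for a company."""
    for company_id in company_ids:
        save_rules(collection, company_id, train(company_id))


class InMemoryCollection:
    """Stand-in for a pymongo collection, used only to run this example."""

    def __init__(self):
        self.docs = {}

    def replace_one(self, query, replacement, upsert=False):
        self.docs[query['company']] = replacement


store = InMemoryCollection()
run_batch(
    store,
    ['company-a'],
    lambda company_id: {'aperol': [{'add': 'gin tônica', 'conf': 0.31}]},
)
print(store.docs['company-a']['rules'])
```

Keeping one document per company means the API needs a single lookup by company when an order is opened for editing, at the cost of rewriting the whole rule set on every run.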

In [16]:
# get active users
# ...

# for every active user do:
colchetesId = ObjectId('5fd604714afe55001cfe1ec6')
sales = pd.DataFrame(list(db.orders.find({ 'company': colchetesId, 'status': 'closed' }, { 'items': 1 })))
transactions = transform_items_to_transactions(sales)

model = AprioriModel()
model.train(transactions)
model.get_recommendations(['aperol', 'baiao executivo'])

[{'add': 'gin tônica', 'conf': 0.3137254901960784},
 {'add': 'água mineral com gás natural prata 300ml',
  'conf': 0.28571428571428575}]

**You can check this commit** to see the necessary changes to deploy this model in the platform: https://github.com/somosmee/mee/commit/eb183a2ce97098b9ff358732433e61a01b79fa39

## Results and Report

In the gif below you can see the final product using the suggestions given by the rules we got from our **Association Analysis model** 🎉

**Translation:**

Chopp = Draft Beer

Costela = Ribs

The following steps are happening:

1. We check that our current order has only a draft beer
2. We click the edit button to add more items at the customer's request
3. A list of suggestions pops up, offering Ribs with 24.36% confidence in the recommendation
4. We select the recommendation and save it


![SegmentLocal](suggestions.gif "segment")

I pushed this change to a separate beta branch so we could test and validate it with a close client.

I ran this experiment for a full week and we got a **12% increase in the average ticket** and a **23% decrease in attendance time**.

Both metrics were calculated against the averages of the previous month, when this change wasn't in place.
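The comparison against the previous month's average is a plain relative change. The sketch below shows the arithmetic with placeholder numbers chosen only for illustration; they are not the experiment's data, and `percent_change` is a hypothetical helper.

```python
def percent_change(baseline, observed):
    """Relative change of observed versus baseline, in percent."""
    return (observed - baseline) / baseline * 100


# Placeholder values for illustration only, not the experiment's data
baseline_ticket, experiment_ticket = 50.0, 55.0
baseline_seconds, experiment_seconds = 120.0, 90.0

ticket_change = percent_change(baseline_ticket, experiment_ticket)
time_change = percent_change(baseline_seconds, experiment_seconds)

print(f"average ticket: {ticket_change:+.1f}%")
print(f"attendance time: {time_change:+.1f}%")
```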

**Conclusions**

- **Attendance time** was calculated from the time the attendants took to edit and add new items to open orders. We saw a significant improvement, with these suggestions acting like a **smart cache** and sparing attendants from **having to type and search so often**.
- I attributed the **average ticket increase** to this change after talking to attendants and learning that they were using these suggestions to recommend products to customers, which resulted in more items being added.

![alt text](average-ticket.png "Title")