# OneDrive Client business rules

## Motivation

The purpose of this project is the automated generation and validation of business rules for the OneDrive Client project.
File synchronization involves many parameters and is a relatively complex process. This makes the rules hard to formulate manually, which is why an automated approach was chosen. The automated approach improves the correctness and completeness of the business rules.

## Introduction

This project operates with two concepts: **conditions** and **actions**. Conditions represent the input, and actions represent the reaction of the program. Conditions and actions are comprised of variables. For example, `Local file does not exist` is a condition variable and `Download remote file` is an action variable.
Both condition and action variables can take three values: `True`, `False` and `Undefined`. `Undefined` means that the variable doesn't make sense in combination with the values of other variables. For example, the condition variable `Remote and local files are the same` does not make sense if `Local file does not exist` is `True` and `Remote file does not exist` is `False`: we can't compare files if one of them does not exist.

The general structure of the rule generation is as follows:
1. Formulation of condition and action variables.
2. Generation of all possible combinations of values of all variables (condition and action variables together).
3. Filtering out combinations that do not make sense by applying constraints.
4. Generation of actions.
5. Further removal of redundancy in the resulting rules, by determining which combinations of condition variable values always result in the same value of a given single action variable, and by other methods.
6. Validation of the non-redundant rules against the "dumb" redundant rules.
7. Presentation of the resulting rules.
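
Steps 2 and 3 of this outline can be sketched on a toy example. The two-variable setup and the single constraint function below are hypothetical simplifications chosen for illustration (the notebook itself uses twelve condition variables and a longer list of constraints), and `None` stands in for `Undefined`.

```python
import itertools

# Hypothetical toy variables; None plays the role of `Undefined`.
VARIABLES = ['LOCAL_FILE_DOES_NOT_EXIST', 'FILES_CONTENT_IS_SAME']
VALUES = [True, False, None]


def comparison_needs_local_file(combination):
    """Toy constraint: content is comparable only if the local file exists."""
    if combination['LOCAL_FILE_DOES_NOT_EXIST'] is False:
        return combination['FILES_CONTENT_IS_SAME'] is not None
    return combination['FILES_CONTENT_IS_SAME'] is None


# Step 2: every combination of values of all variables.
all_combos = [
    dict(zip(VARIABLES, values))
    for values in itertools.product(VALUES, repeat=len(VARIABLES))
]

# Step 3: keep only the combinations that satisfy the constraint.
valid_combos = [c for c in all_combos if comparison_needs_local_file(c)]

print(len(all_combos), len(valid_combos))
```

Even in this tiny case more than half of the raw combinations are discarded, which hints at why the share of valid combinations becomes very small once all twelve variables are involved.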

## Imports

In [1]:
import collections
import functools as ft
import itertools as it
import operator as op

import dask
import dask.dataframe
import numpy as np
import pandas as pd
import progressbar
import yaml
from dask.diagnostics import ProgressBar
from frozendict import frozendict
from scipy.special import comb  # `scipy.misc.comb` is no longer available in recent SciPy releases

## Common utilities

In [2]:
class Enumeration(tuple):
    """Homegrown enum class.
    
    It is useful because it can be pickled by cloudpickle, which
    is used by the dask library.
    """
    def __getattr__(self, name):
        if name in self:
            return name
        raise AttributeError(name)
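
A short usage sketch of this class may help. The `Color` enumeration is a hypothetical example, and the class body is repeated (without its docstring) only so that the snippet runs on its own.

```python
class Enumeration(tuple):
    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in self:
            return name
        raise AttributeError(name)


Color = Enumeration(['RED', 'GREEN'])  # hypothetical example members

# Members are plain strings, reachable as attributes.
red = Color.RED

# The enumeration is still a tuple, so it can be iterated and indexed.
members = list(Color)

# Unknown names raise AttributeError, as with a regular attribute.
try:
    Color.BLUE
except AttributeError:
    missing_raises = True
else:
    missing_raises = False
```

Because the members are ordinary strings, they can be used directly as dictionary keys and as DataFrame column names, which is how the rest of the notebook uses them.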

In [3]:
def safe_update(a, b):
    """Like dict.update, but never overrides keys: a key that is already
    present must map to the same value, otherwise an assertion fails."""
    for key, new_value in b.items():
        value = a.setdefault(key, new_value)
        assert value == new_value
    return a
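
The behaviour on conflicting keys is easiest to see in a small example. The dictionaries below are made-up values, and the function is repeated so the snippet is self-contained.

```python
def safe_update(a, b):
    for key, new_value in b.items():
        value = a.setdefault(key, new_value)
        assert value == new_value
    return a


# New keys are added; keys with an identical value are accepted.
merged = safe_update({'x': 1}, {'x': 1, 'y': 2})

# A conflicting value is treated as an error instead of being overwritten.
try:
    safe_update({'x': 1}, {'x': 3})
except AssertionError:
    conflict_detected = True
else:
    conflict_detected = False
```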

In [4]:
def remove_supersets(sets):
    """Remove sets from 'sets' that are supersets of any other sets."""
    sets = set(map(frozenset, sets))
    filtered = collections.deque()
    for a in sets:
        for b in sets:
            if a > b:
                break
        else:
            filtered.append(a)
    return filtered
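
As an illustration, the function can be applied to a handful of small sets. The input values are arbitrary, and the function is repeated so the snippet runs on its own.

```python
import collections


def remove_supersets(sets):
    sets = set(map(frozenset, sets))
    filtered = collections.deque()
    for a in sets:
        for b in sets:
            if a > b:  # `a` is a proper superset of `b`
                break
        else:
            filtered.append(a)
    return filtered


# {1, 2} is a proper superset of {1} and is dropped;
# the duplicate {3, 4} collapses because the input is converted to a set.
result = remove_supersets([{1}, {1, 2}, {3, 4}, {3, 4}])
```

Note that only proper supersets are removed: two equal sets never eliminate each other.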

In [5]:
def all_combinations(values):
    """All possible combinations of all lengths for `values`."""
    return it.chain.from_iterable(
        it.combinations(values, i)
        for i in range(1, len(values) + 1)
    )

In [6]:
def ncombinations(n):
    """Number of all possible combinations of all lengths for set with length `n`."""
    return sum(comb(n, r, exact=True) for r in range(1, n + 1))
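
The two helpers are consistent with each other: the number of non-empty combinations of `n` items is `2**n - 1`. The sketch below checks this; it substitutes the standard-library `math.comb` (Python 3.8+) for SciPy's `comb`, which is an assumption made only to keep the snippet dependency-free.

```python
import itertools
import math


def all_combinations(values):
    return itertools.chain.from_iterable(
        itertools.combinations(values, i)
        for i in range(1, len(values) + 1)
    )


def ncombinations(n):
    # math.comb replaces scipy's comb(..., exact=True) in this sketch.
    return sum(math.comb(n, r) for r in range(1, n + 1))


# The closed form 2**n - 1 holds for every n checked here.
for n in range(1, 13):
    assert ncombinations(n) == 2 ** n - 1

# The generator yields exactly that many combinations.
n_generated = len(list(all_combinations('abc')))

print(ncombinations(12))
```

For twelve variables this gives 4095 combinations, which is why enumerating all subsets of the condition variables is still feasible.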

Values that condition and action variables can take.

In [7]:
Value = Enumeration(['TRUE', 'FALSE', 'UNDEFINED'])


_VALUE_PYTHON_MAP = {
    Value.TRUE: True,
    Value.FALSE: False,
    Value.UNDEFINED: None
}

_PYTHON_VALUE_MAP = {
    True: Value.TRUE,
    False: Value.FALSE,
    None: Value.UNDEFINED
}

def value_to_python(value):
    return _VALUE_PYTHON_MAP[value]

def value_from_python(value):
    return _PYTHON_VALUE_MAP[value]

## Definition of conditions and actions.

### Conditions

Conditions serve as input data for the program.

Each condition can be `True`, `False` or `Undefined`. `Undefined` means that the condition doesn't make sense in combination with the other conditions.

In [8]:
Condition = Enumeration([
    'LOCAL_FILE_CHANGED',
    'REMOTE_FILE_CHANGED',
    'FILES_CONTENT_IS_SAME',
    'FILES_METADATA_IS_SAME',
    'LOCAL_FILE_DOES_NOT_EXIST',
    'REMOTE_FILE_DOES_NOT_EXIST',
    # One or more copies (matched by content) exist on the opposite side.
    'LOCAL_COPIES_DO_EXIST',
    'REMOTE_COPIES_DO_EXIST',
    'LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST',
    'REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST',
    'LOCAL_COPY_METADATA_IS_SAME',
    'REMOTE_COPY_METADATA_IS_SAME'
])

In [9]:
class Constraints:
    """Defines constraints - functions that check a single rule for validity.
    
    These functions are supposed to be applied to conditions sequentially.
    """
    @staticmethod
    def base_conditions(conditions):
        """Should be always defined."""
        assert all(
            e in conditions for e in
            [Condition.LOCAL_FILE_CHANGED,
             Condition.REMOTE_FILE_CHANGED,
             Condition.LOCAL_FILE_DOES_NOT_EXIST,
             Condition.REMOTE_FILE_DOES_NOT_EXIST]
        )

    @staticmethod
    def something_should_be_changed(conditions):
        """Either the local or the remote file should be changed."""
        assert (
            conditions.get(Condition.LOCAL_FILE_CHANGED, False) or
            conditions.get(Condition.REMOTE_FILE_CHANGED, False)
        )

    @staticmethod
    def if_local_file_does_exist_remote_copies_existence_must_be_defined(
        conditions
    ):
        """If the local file does exist, existence of remote copies
        can't be undefined.
        """
        if conditions.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False:
            assert Condition.REMOTE_COPIES_DO_EXIST in conditions

    @staticmethod
    def if_remote_file_does_exist_local_copies_existence_must_be_defined(
        conditions
    ):
        """If the remote file does exist, existence of local copies
        can't be undefined.
        """
        if conditions.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is False:
            assert Condition.LOCAL_COPIES_DO_EXIST in conditions

    @staticmethod
    def if_local_file_does_not_exist_remote_copies_are_not_possible(conditions):
        """If the local file does not exist, remote copies can't
        exist.
        """
        if conditions.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is True:
            assert Condition.REMOTE_COPIES_DO_EXIST not in conditions

    @staticmethod
    def if_remote_file_does_not_exist_local_copies_are_not_possible(
        conditions
    ):
        """If the remote file does not exist, local copies can't
        exist.
        """
        if conditions.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is True:
            assert Condition.LOCAL_COPIES_DO_EXIST not in conditions

    @staticmethod
    def if_local_copies_do_not_exist_associated_conditions_are_impossible(
        conditions
    ):
        """If the local copies do not exist, the other conditions, 
        associated with them, do not make sense.
        """
        if (conditions.get(Condition.LOCAL_COPIES_DO_EXIST) in {None, False}):
            assert (
                Condition.LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST not in
                conditions and
                Condition.LOCAL_COPY_METADATA_IS_SAME not in conditions
            )

    @staticmethod
    def if_remote_copies_do_not_exist_associated_conditions_are_impossible(
        conditions
    ):
        """If remote copies do not exist, the other conditions, 
        associated with them, do not make sense.
        """
        if conditions.get(Condition.REMOTE_COPIES_DO_EXIST) in {None, False}:
            assert (
                Condition.REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST
                not in conditions and
                Condition.REMOTE_COPY_METADATA_IS_SAME not in conditions
            )

    @staticmethod
    def if_local_copies_do_exist_associated_conditions_must_be_defined(
        conditions
    ):
        """If the local copies do exist, the other conditions, 
        associated with them, must be defined.
        """
        if conditions.get(Condition.LOCAL_COPIES_DO_EXIST):
            assert Condition.LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST in conditions
            assert Condition.LOCAL_COPY_METADATA_IS_SAME in conditions

    @staticmethod
    def if_remote_copies_do_exist_associated_conditions_must_be_defined(
        conditions
    ):
        """If the remote copies do exist, the other conditions, 
        associated with them, must be defined.
        """
        if conditions.get(Condition.REMOTE_COPIES_DO_EXIST):
            assert Condition.REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST \
                   in conditions
            assert Condition.REMOTE_COPY_METADATA_IS_SAME in conditions

    @staticmethod
    def if_file_does_not_exist_or_undefined_comparison_is_impossible(
        conditions
    ):
        """The remote and the local files can only be compared if
        they both exist.
        """
        if (conditions.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) or
            conditions.get(Condition.REMOTE_FILE_DOES_NOT_EXIST)):
            assert (
                Condition.FILES_CONTENT_IS_SAME not in conditions and
                Condition.FILES_METADATA_IS_SAME not in conditions
            )

    @staticmethod
    def if_both_files_do_exist_comparison_required(conditions):
        """If the remote and the local files exist, the comparison
        info must be present.
        """
        if (conditions.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False and
            conditions.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is False):
            assert Condition.FILES_CONTENT_IS_SAME in conditions
            if conditions.get(Condition.FILES_CONTENT_IS_SAME) is False:
                assert conditions.get(Condition.FILES_METADATA_IS_SAME) is False
            elif conditions.get(Condition.FILES_CONTENT_IS_SAME) is True:
                assert Condition.FILES_METADATA_IS_SAME in conditions

In [10]:
CONSTRAINTS = [getattr(Constraints, e) for e in Constraints.__dict__
               if not e.startswith('_') and callable(getattr(Constraints, e))]
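
The way constraints are collected and applied can be sketched with a reduced stand-in. `ToyConstraints`, `TOY_CONSTRAINTS` and `is_valid` are hypothetical names used only in this example, and plain strings replace the `Condition` members. As in the real constraints, an `Undefined` variable is represented by an absent key.

```python
class ToyConstraints:
    """Hypothetical, reduced stand-in for the Constraints class."""

    @staticmethod
    def something_should_be_changed(conditions):
        assert (
            conditions.get('LOCAL_FILE_CHANGED', False) or
            conditions.get('REMOTE_FILE_CHANGED', False)
        )

    @staticmethod
    def _private_helper(conditions):
        # Skipped by the collection below because of the leading underscore.
        raise NotImplementedError


# Same collection idiom as above: public callables, in definition order.
TOY_CONSTRAINTS = [
    getattr(ToyConstraints, name) for name in ToyConstraints.__dict__
    if not name.startswith('_') and callable(getattr(ToyConstraints, name))
]


def is_valid(conditions):
    """A combination is valid if no constraint raises AssertionError."""
    try:
        for constraint in TOY_CONSTRAINTS:
            constraint(conditions)
    except AssertionError:
        return False
    return True


print(is_valid({'LOCAL_FILE_CHANGED': True, 'REMOTE_FILE_CHANGED': False}))
print(is_valid({'LOCAL_FILE_CHANGED': False}))
```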

### Actions

Actions represent the reaction of the program to a combination of conditions with certain values.

In [11]:
Action = Enumeration([
    'DOWNLOAD_REMOTE_FILE',
    'UPLOAD_LOCAL_FILE',
    'DELETE_REMOTE_FILE',
    'DELETE_LOCAL_FILE',
    'UPDATE_LOCAL_FILE_METADATA',
    'UPDATE_REMOTE_FILE_METADATA',
    'RENAME_LOCAL_FILE',
    'COPY_LOCAL_FILE',
    'COPY_REMOTE_FILE',
    'MOVE_LOCAL_FILE',
    'MOVE_REMOTE_FILE',
    'DO_NOTHING'
])

In [12]:
class Actions:
    """Defines functions that fill actions based on conditions.
    
    The functions are supposed to be applied to the rules in round-robin fashion 
    until nothing is changed.
    """
    @staticmethod
    def _is_only_local_file_changed(rule):
        return (
            rule.get(Condition.LOCAL_FILE_CHANGED, False) and
            not rule.get(Condition.REMOTE_FILE_CHANGED, False)
        )
    
    @staticmethod
    def _is_only_remote_file_changed(rule):
        return (
            not rule.get(Condition.LOCAL_FILE_CHANGED, False) and
            rule.get(Condition.REMOTE_FILE_CHANGED, False)
        )
    
    @staticmethod
    def _are_both_files_changed(rule):
        return (
            rule.get(Condition.LOCAL_FILE_CHANGED, False) and
            rule.get(Condition.REMOTE_FILE_CHANGED, False)
        )
    
    @staticmethod
    def _make_other_undefined(actions):
        for action in Action:
            if action not in actions:
                actions[action] = None
        return actions
    
    @staticmethod
    def _get_actions(rule):
        return {
            a: rule[a] for a in Action if a in rule
        }
    
    @staticmethod
    def _make_other_false_except_do_nothing(actions):
        for action in Action:
            if (action in actions) or (action is Action.DO_NOTHING):
                continue
            actions[action] = False
        return actions
    
    @classmethod
    def _download(cls, rule):
        actions = {}
        
        assert not rule.get(Condition.FILES_CONTENT_IS_SAME)
        assert rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is False
        assert rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is not None
        assert rule.get(Condition.LOCAL_COPIES_DO_EXIST) is not None
        
        if rule.get(Condition.LOCAL_COPIES_DO_EXIST):
            assert rule.get(Condition.LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST) is not None
            if rule.get(Condition.LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST) is True:
                actions[Action.MOVE_LOCAL_FILE] = True
                actions[Action.COPY_LOCAL_FILE] = None
            elif rule.get(Condition.LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST) is False:
                actions[Action.COPY_LOCAL_FILE] = True
                actions[Action.MOVE_LOCAL_FILE] = None

            assert rule.get(Condition.LOCAL_COPY_METADATA_IS_SAME) is not None
            if rule.get(Condition.LOCAL_COPY_METADATA_IS_SAME) is False:
                actions[Action.UPDATE_LOCAL_FILE_METADATA] = True
        elif rule.get(Condition.LOCAL_COPIES_DO_EXIST) is False:
            actions[Action.DOWNLOAD_REMOTE_FILE] = True
            
            if rule.get(Condition.FILES_METADATA_IS_SAME) is False:
                assert rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False
                actions[Action.UPDATE_LOCAL_FILE_METADATA] = True
                    
        return actions
                
    @classmethod
    def _upload(cls, rule):
        actions = {}
        
        assert not rule.get(Condition.FILES_CONTENT_IS_SAME)
        assert rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False
        assert rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is not None
        assert rule.get(Condition.REMOTE_COPIES_DO_EXIST) is not None
        
        if rule.get(Condition.REMOTE_COPIES_DO_EXIST):
            assert rule.get(Condition.REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST) is not None
            if rule.get(Condition.REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST) is True:
                actions[Action.MOVE_REMOTE_FILE] = True
                actions[Action.COPY_REMOTE_FILE] = None
            elif rule.get(Condition.REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST) is False:
                actions[Action.COPY_REMOTE_FILE] = True
                actions[Action.MOVE_REMOTE_FILE] = None

            assert rule.get(Condition.REMOTE_COPY_METADATA_IS_SAME) is not None
            if rule.get(Condition.REMOTE_COPY_METADATA_IS_SAME) is False:
                actions[Action.UPDATE_REMOTE_FILE_METADATA] = True
            
        elif rule.get(Condition.REMOTE_COPIES_DO_EXIST) is False:
            actions[Action.UPLOAD_LOCAL_FILE] = True
            
            if rule.get(Condition.FILES_METADATA_IS_SAME) is False:
                assert rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is False
                actions[Action.UPDATE_REMOTE_FILE_METADATA] = True
            
        return actions
                
    @staticmethod
    def if_files_are_same_do_nothing(rule):
        if (rule.get(Condition.FILES_CONTENT_IS_SAME, False) and 
            rule.get(Condition.FILES_METADATA_IS_SAME, False)):
            return {
                Action.DO_NOTHING: True
            }
        
    @staticmethod
    def if_files_do_not_exist_do_nothing(rule):
        if (rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST, False) and 
            rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST, False)):
            return {
                Action.DO_NOTHING: True
            }
            
    @classmethod
    def if_do_nothing_other_actions_undefined(cls, rule):
        if rule.get(Action.DO_NOTHING, False):
            assert {v for k, v in cls._get_actions(rule).items() 
                    if k != Action.DO_NOTHING} <= {None}
            return {a: None for a in Action if a != Action.DO_NOTHING}
        
    @staticmethod
    def if_some_action_is_defined_do_nothing_impossible(rule):
        if any(rule.get(a) is not None for a in Action if a is not Action.DO_NOTHING):
            assert rule.get(Action.DO_NOTHING) is None
            return {Action.DO_NOTHING: None}
        
    @staticmethod
    def if_copy_file_then_move_file_is_impossible(rule):
        actions = {}
        if rule.get(Action.COPY_LOCAL_FILE):
            assert rule.get(Action.MOVE_LOCAL_FILE) is None
            actions[Action.MOVE_LOCAL_FILE] = None
        elif rule.get(Action.MOVE_LOCAL_FILE):
            assert rule.get(Action.COPY_LOCAL_FILE) is None
            actions[Action.COPY_LOCAL_FILE] = None
            
        if rule.get(Action.COPY_REMOTE_FILE):
            assert rule.get(Action.MOVE_REMOTE_FILE) is None
            actions[Action.MOVE_REMOTE_FILE] = None
        elif rule.get(Action.MOVE_REMOTE_FILE):
            assert rule.get(Action.COPY_REMOTE_FILE) is None
            actions[Action.COPY_REMOTE_FILE] = None

        return actions
    
    @staticmethod
    def rename_rules(rule):
        # Sanity check only: returns None, so it never adds actions itself.
        if (rule.get(Condition.LOCAL_FILE_CHANGED) is True and
            rule.get(Condition.REMOTE_FILE_CHANGED) is True and
            rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False and
            rule.get(Condition.FILES_CONTENT_IS_SAME) is False and
            rule.get(Action.RENAME_LOCAL_FILE) is not None):
            assert rule.get(Action.RENAME_LOCAL_FILE) is True
         
    @classmethod
    def sync_logic(cls, rule):
        actions = {}
        assert rule.get(Condition.LOCAL_FILE_CHANGED) or rule.get(Condition.REMOTE_FILE_CHANGED)
        assert rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is not None
        assert rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is not None
        
        if (rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) and 
            rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST)):
            actions[Action.DO_NOTHING] = True
            return actions
        elif rule.get(Condition.FILES_CONTENT_IS_SAME):
            assert (
                rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False and
                rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is False
            )
            
            if rule.get(Condition.FILES_METADATA_IS_SAME):
                actions[Action.DO_NOTHING] = True
                return actions
            
            actions[Action.DOWNLOAD_REMOTE_FILE] = False
            actions[Action.UPLOAD_LOCAL_FILE] = False
            actions[Action.DELETE_REMOTE_FILE] = False
            actions[Action.DELETE_LOCAL_FILE] = False
            actions[Action.COPY_LOCAL_FILE] = False
            actions[Action.COPY_REMOTE_FILE] = False
            actions[Action.MOVE_LOCAL_FILE] = False
            actions[Action.MOVE_REMOTE_FILE] = False
            actions[Action.RENAME_LOCAL_FILE] = False
            
            if (rule.get(Condition.LOCAL_FILE_CHANGED) and
                rule.get(Condition.REMOTE_FILE_CHANGED)):
                actions[Action.RENAME_LOCAL_FILE] = True
                actions[Action.COPY_LOCAL_FILE] = True
                actions[Action.MOVE_LOCAL_FILE] = None
                actions[Action.UPDATE_LOCAL_FILE_METADATA] = True
                actions[Action.UPDATE_REMOTE_FILE_METADATA] = False
                return actions
            elif rule.get(Condition.LOCAL_FILE_CHANGED):
                actions[Action.UPDATE_REMOTE_FILE_METADATA] = True
                actions[Action.UPDATE_LOCAL_FILE_METADATA] = False
                return actions
            elif rule.get(Condition.REMOTE_FILE_CHANGED):
                actions[Action.UPDATE_LOCAL_FILE_METADATA] = True
                actions[Action.UPDATE_REMOTE_FILE_METADATA] = False
                return actions
        elif (
            rule.get(Condition.LOCAL_FILE_CHANGED) and 
            rule.get(Condition.REMOTE_FILE_CHANGED) 
        ):
            if rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False:
                actions[Action.RENAME_LOCAL_FILE] = True
                
            assert rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is not None
            assert rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is not None
            if rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is False:
                actions.update(cls._download(rule))
            cls._make_other_false_except_do_nothing(actions)
        elif rule.get(Condition.LOCAL_FILE_CHANGED):
            if (rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) and 
                rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) is False):
                actions[Action.DELETE_REMOTE_FILE] = True
            else:
                actions[Action.DELETE_REMOTE_FILE] = False
                actions.update(cls._upload(rule))
            cls._make_other_false_except_do_nothing(actions)
        elif rule.get(Condition.REMOTE_FILE_CHANGED):
            if (rule.get(Condition.REMOTE_FILE_DOES_NOT_EXIST) and
                rule.get(Condition.LOCAL_FILE_DOES_NOT_EXIST) is False):
                actions[Action.DELETE_LOCAL_FILE] = True
            else:
                actions[Action.DELETE_LOCAL_FILE] = False
                actions.update(cls._download(rule))
            cls._make_other_false_except_do_nothing(actions)
        return actions

In [13]:
ACTIONS = [getattr(Actions, e) for e in Actions.__dict__
           if not e.startswith('_') and callable(getattr(Actions, e))]
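
The round-robin application mentioned in the class docstring amounts to a fixed-point iteration: every function is called in turn until a full pass adds no new keys. Below is a minimal sketch of that idea. `apply_until_stable` and the two toy action functions are hypothetical names, and plain strings stand in for the `Condition` and `Action` members.

```python
def apply_until_stable(rule, action_functions):
    """Apply the functions repeatedly until a full pass adds nothing new."""
    rule = dict(rule)
    while True:
        size_before = len(rule)
        for function in action_functions:
            result = function(rule)
            if result is None:
                continue
            for key, value in result.items():
                # A function may confirm an existing value, never contradict it.
                assert rule.get(key, value) == value, (function.__name__, key)
            rule.update(result)
        if len(rule) == size_before:
            return rule


def both_missing_means_do_nothing(rule):
    if (rule.get('LOCAL_FILE_DOES_NOT_EXIST') and
            rule.get('REMOTE_FILE_DOES_NOT_EXIST')):
        return {'DO_NOTHING': True}


def do_nothing_excludes_download(rule):
    if rule.get('DO_NOTHING'):
        return {'DOWNLOAD_REMOTE_FILE': None}


# The order is chosen so that a second pass is needed to reach the fixed point.
final = apply_until_stable(
    {'LOCAL_FILE_DOES_NOT_EXIST': True, 'REMOTE_FILE_DOES_NOT_EXIST': True},
    [do_nothing_excludes_download, both_missing_means_do_nothing],
)
print(final)
```

Iterating to a fixed point makes the result independent of the order in which the functions are listed, as long as they never contradict each other.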

## All possible conditions.

In [14]:
possible_conditions = pd.DataFrame()

for condition in Condition:
    values = pd.DataFrame({condition: list(Value)}, np.zeros(len(Value), np.int8))
    possible_conditions = possible_conditions.merge(values, right_index=True, left_index=True, how='outer')

possible_conditions.reset_index(inplace=True, drop=True)
possible_conditions.describe()

| | LOCAL_FILE_CHANGED | REMOTE_FILE_CHANGED | FILES_CONTENT_IS_SAME | FILES_METADATA_IS_SAME | LOCAL_FILE_DOES_NOT_EXIST | REMOTE_FILE_DOES_NOT_EXIST | LOCAL_COPIES_DO_EXIST | REMOTE_COPIES_DO_EXIST | LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST | REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST | LOCAL_COPY_METADATA_IS_SAME | REMOTE_COPY_METADATA_IS_SAME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 531441 | 531441 | 531441 | 531441 | 531441 | 531441 | 531441 | 531441 | 531441 | 531441 | 531441 | 531441 |
| unique | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| top | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
| freq | 177147 | 177147 | 177147 | 177147 | 177147 | 177147 | 177147 | 177147 | 177147 | 177147 | 177147 | 177147 |
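
The row count and the per-value frequency reported above follow directly from the sizes involved: twelve variables with three values each. A dependency-free sketch using `itertools.product` (not the merge-based construction used in the notebook) reproduces both numbers.

```python
import itertools

VALUES = ('TRUE', 'FALSE', 'UNDEFINED')
N_CONDITIONS = 12  # number of members in `Condition`

# Count the full Cartesian product without materialising it.
total = sum(1 for _ in itertools.product(VALUES, repeat=N_CONDITIONS))

# Each value of a given variable is paired with every combination
# of the remaining eleven variables.
per_value = len(VALUES) ** (N_CONDITIONS - 1)

print(total, per_value)
```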


## Validation of the rules using the constraints

In [15]:
def validate(condition):
    condition = {k: value_to_python(v) for k, v in condition.to_dict().items() if v != Value.UNDEFINED}
    try:
        for constraint in CONSTRAINTS:
            constraint(condition)
    except AssertionError:
        return False
    return True

# `dask.set_options(get=...)` was replaced by `dask.config.set(scheduler=...)` in newer dask releases.
with dask.config.set(scheduler='processes'), ProgressBar():
    valid_conditions_idx = (
        dask.dataframe.from_pandas(possible_conditions, npartitions=50)
                      .apply(validate, axis=1, meta=(None, bool))
                      .compute()
    )    

    
valid_conditions = possible_conditions[valid_conditions_idx]
valid_conditions.describe()

[########################################] | 100% Completed | 21.5s


| | LOCAL_FILE_CHANGED | REMOTE_FILE_CHANGED | FILES_CONTENT_IS_SAME | FILES_METADATA_IS_SAME | LOCAL_FILE_DOES_NOT_EXIST | REMOTE_FILE_DOES_NOT_EXIST | LOCAL_COPIES_DO_EXIST | REMOTE_COPIES_DO_EXIST | LOCAL_COPY_COUNTERPART_DOES_NOT_EXIST | REMOTE_COPY_COUNTERPART_DOES_NOT_EXIST | LOCAL_COPY_METADATA_IS_SAME | REMOTE_COPY_METADATA_IS_SAME |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 |
| unique | 2 | 2 | 3 | 3 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 |
| top | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE |
| freq | 172 | 172 | 150 | 150 | 240 | 240 | 192 | 192 | 96 | 96 | 96 | 96 |


## Percentage of valid conditions

In [16]:
len(valid_conditions.index) / len(possible_conditions.index) * 100

0.048547251717500156

## Generate actions

In [17]:
actions = pd.DataFrame(columns=Action, index=valid_conditions.index)

Action functions are applied in round-robin fashion until they add nothing new.

In [18]:
for row in valid_conditions.itertuples():
    i = row[0]
    rule = row[1:]
    
    rule = {
        k: value_to_python(v) 
        for k, v in zip(Condition, rule) 
        if v is not Value.UNDEFINED
    }
    while True:
        keys = list(rule.keys())
        for action in ACTIONS:
            try:
                result = action(frozendict(rule))
            except AssertionError as exc:
                raise AssertionError(action.__name__, rule, exc)

            if result is None:
                continue
        
            for k, v in result.items():
                if k in rule:
                    assert rule[k] == result[k], (action.__name__, k, rule[k], result[k], rule)
                
            rule.update(result)
        if len(rule.keys()) == len(keys):
            break
            
    action = {
        a: value_from_python(rule[a]) for a in Action if a in rule 
    }
    if not action:
        continue
   
    actions.loc[i] = action

actions.describe()

| | DOWNLOAD_REMOTE_FILE | UPLOAD_LOCAL_FILE | DELETE_REMOTE_FILE | DELETE_LOCAL_FILE | UPDATE_LOCAL_FILE_METADATA | UPDATE_REMOTE_FILE_METADATA | RENAME_LOCAL_FILE | COPY_LOCAL_FILE | COPY_REMOTE_FILE | MOVE_LOCAL_FILE | MOVE_REMOTE_FILE | DO_NOTHING |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 | 258 |
| unique | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
| top | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | UNDEFINED | FALSE | UNDEFINED |
| freq | 168 | 174 | 175 | 175 | 96 | 138 | 125 | 107 | 156 | 127 | 156 | 180 |


## Percentage of defined actions

In [19]:
actions.count().sum() / ft.reduce(op.mul, actions.shape) * 100

100.0

## Percentage of fully defined actions

In [20]:
actions.dropna().shape[0] / actions.shape[0] * 100

100.0

## Rules with missing actions

In [21]:
rules = valid_conditions.join(actions)

In [22]:
rules_with_missing_actions = rules[actions.isnull().sum(axis=1) > 0]
if len(rules_with_missing_actions.index) > 0:
    print(rules_with_missing_actions.T)
else:
    print('N/A')

N/A


## Unique actions

In [23]:
len(rules[list(Action)].drop_duplicates().index)

23

## Redundancy removal

We are not interested in `Undefined` values for the conditions, or in `Undefined` and `False` values for the actions.

In [24]:
filtered_rules = rules.copy()
# `np.nan` is used because the `np.NaN` alias is not available in NumPy 2.x.
filtered_rules[list(Condition)] = filtered_rules[list(Condition)].replace(Value.UNDEFINED, np.nan)
filtered_rules[list(Action)] = filtered_rules[list(Action)].replace(Value.UNDEFINED, np.nan)
filtered_rules[list(Action)] = filtered_rules[list(Action)].replace(Value.FALSE, np.nan)

Generate mappings from combinations of condition variable values, and from single action variable values, to the indices of the corresponding rows in the rules table.

In [25]:
conditions_cases = {}
actions_cases = {}

conditions_combinations = list(all_combinations(Condition))
actions_combinations = [(a,) for a in Action]

with progressbar.ProgressBar(
    max_value=(
        len(conditions_combinations) +
        len(actions_combinations)
    ),
    poll_interval=1
) as bar:
    i = 0
    for cases, variables_combinations in [
            (conditions_cases, conditions_combinations),
            (actions_cases, actions_combinations)
        ]:
        for variable in variables_combinations:
            for row in filtered_rules[list(variable)].drop_duplicates().itertuples():
                values = row[1:]
                value = frozenset((k, v) for k, v in zip(variable, values) if v is not np.nan)
                if len(value) == 0:
                    continue
                indices = filtered_rules.query(
                    ft.reduce(
                        lambda a, b: '%s & %s' % (a, b), 
                        ['%s == "%s"' % (k, v) for k, v in value]
                    )
                ).index
                assert len(indices) > 0
                cases[value] = frozenset(indices)
            i += 1
            bar.update(i)

100% (4107 of 4107) |#######################################################| Elapsed Time: 0:15:56 Time: 0:15:56
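
The structure of the resulting mappings can be illustrated without pandas. In the sketch below the three-row `rules` table and the variables `A` and `B` are hypothetical, and an absent key plays the role of `NaN`. Each case (a set of variable/value pairs) is mapped to the set of row indices where all of those pairs hold.

```python
import itertools

# Hypothetical toy rules table: row index -> {variable: value}.
rules = {
    0: {'A': 'TRUE', 'B': 'TRUE'},
    1: {'A': 'TRUE', 'B': 'FALSE'},
    2: {'A': 'FALSE'},
}
variables = ['A', 'B']

cases = {}
for size in range(1, len(variables) + 1):
    for subset in itertools.combinations(variables, size):
        for row in rules.values():
            # Only the defined variables of the subset form the case.
            case = frozenset((k, row[k]) for k in subset if k in row)
            if not case:
                continue
            # All rows in which every pair of the case holds.
            cases[case] = frozenset(
                i for i, other in rules.items()
                if all(other.get(k) == v for k, v in case)
            )

for case, indices in cases.items():
    print(sorted(case), sorted(indices))
```

Smaller cases cover more rows, so the row sets shrink as more variables are fixed. The matching step that follows relies on exactly this property.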


Match combinations of condition variable values to action variable values: a condition case is associated with an action value when every rule that satisfies the conditions also has that action value.

In [26]:
matched_associations = collections.defaultdict(dict)

with progressbar.ProgressBar(
    max_value=(
        len(conditions_cases)
    ),
    poll_interval=1
) as bar:
    i = 0
    for independent_variable_case, indices1 in conditions_cases.items():
        for dependent_variable_case, indices2 in actions_cases.items():
            assert indices1
            assert indices2
            if indices1 <= indices2:
                safe_update(
                    matched_associations[independent_variable_case],
                    dict(dependent_variable_case)
                )
        if independent_variable_case in matched_associations: 
            matched_associations[independent_variable_case] = frozenset(
                matched_associations[independent_variable_case].items()
            )
        
        i += 1
        bar.update(i)

100% (93031 of 93031) |#####################################################| Elapsed Time: 0:00:00 Time: 0:00:00


In [27]:
len(matched_associations)

50533
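
The matching criterion is a subset test on row indices: if every row covered by a condition case is also covered by an action case, the conditions always imply that action. A minimal sketch with hypothetical row sets:

```python
# Hypothetical case -> row-index mappings, in the same shape as
# `conditions_cases` and `actions_cases`.
conditions_cases = {
    frozenset({('LOCAL_FILE_CHANGED', 'TRUE')}): frozenset({0, 1}),
    frozenset({('REMOTE_FILE_CHANGED', 'TRUE')}): frozenset({1, 2}),
}
actions_cases = {
    frozenset({('UPLOAD_LOCAL_FILE', 'TRUE')}): frozenset({0, 1}),
    frozenset({('DOWNLOAD_REMOTE_FILE', 'TRUE')}): frozenset({2}),
}

matched = {}
for condition_case, condition_rows in conditions_cases.items():
    implied = {}
    for action_case, action_rows in actions_cases.items():
        # Subset test: the action holds in every row matching the conditions.
        if condition_rows <= action_rows:
            implied.update(dict(action_case))
    if implied:
        matched[condition_case] = frozenset(implied.items())

print(matched)
```

In this toy data only the first condition case is matched: rows 1 and 2 do not share a single action value, so the second case implies nothing on its own.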

Remove condition/action pairs with redundant condition variables, i.e. variables that do not affect the actions when compared with a smaller set of conditions that results in the same actions.
Also remove rules that can be deduced from other, smaller rules.

In [28]:
compacted_associations = collections.defaultdict(set)

with progressbar.ProgressBar() as bar:
    for i, (conditions_case1, actions_case1) in enumerate(bar(list(matched_associations.items()))):
        # If the current conditions are a superset of some other conditions - skip them.
        try:
            for conditions_case2, actions_case2 in matched_associations.items():
                if conditions_case2 is conditions_case1:
                    continue
                if (conditions_case2 < conditions_case1 and
                    actions_case2 >= actions_case1):
                    raise RuntimeError
        except RuntimeError:
            continue
        # Try to "assemble" the current conditions and actions from the other ones.
        # In case of success - we don't need the current conditions/actions 
        # since they are redundant.
        matched_conditions = set()
        matched_actions = set()
        for conditions_case2, actions_case2 in matched_associations.items():
            if conditions_case2 is conditions_case1:
                continue
            if not (conditions_case2 < conditions_case1):
                continue
            matched_conditions.update(conditions_case2)
            matched_actions.update(actions_case2)
        if not (matched_conditions == conditions_case1 and
                matched_actions == actions_case1):
            compacted_associations[conditions_case1] = actions_case1            

100% (50533 of 50533) |#####################################################| Elapsed Time: 0:00:02 Time: 0:00:02
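
The first of the two checks in the cell above (dropping a rule when a rule with strictly fewer conditions already yields at least the same actions) can be sketched as follows. The toy `rules` mapping is hypothetical, and the second check, which tries to assemble a rule from smaller ones, is deliberately left out of this sketch.

```python
# Hypothetical associations: conditions -> actions, both as frozensets of pairs.
rules = {
    frozenset({('A', 'TRUE')}):
        frozenset({('X', 'TRUE')}),
    frozenset({('A', 'TRUE'), ('B', 'FALSE')}):
        frozenset({('X', 'TRUE')}),
    frozenset({('A', 'TRUE'), ('C', 'TRUE')}):
        frozenset({('X', 'TRUE'), ('Y', 'TRUE')}),
}

compacted = {
    conditions: actions
    for conditions, actions in rules.items()
    if not any(
        # `<` is the proper-subset test, `>=` the superset-or-equal test.
        other_conditions < conditions and other_actions >= actions
        for other_conditions, other_actions in rules.items()
    )
}

print(len(rules), len(compacted))
```

The second rule is dropped because `B` adds nothing: `A` alone already implies `X`. The third rule survives because it contributes the extra action `Y`.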


Find out which other condition variables always take the same values whenever the conditions of the resulting rules hold.

In [29]:
completed_associations = collections.defaultdict(dict)

with progressbar.ProgressBar() as bar:
    for conditions1, actions in bar(list(compacted_associations.items())):
        new_conditions = dict(conditions1)
        # Merge in every condition set that shares the same conditions_cases entry.
        for conditions2, indices in conditions_cases.items():
            if conditions_cases[conditions1] == indices:
                safe_update(new_conditions, dict(conditions2))
        # Select the rows that match the merged conditions and keep the condition
        # columns that hold a single, defined value across all of those rows.
        matched_rules = filtered_rules[list(Condition)].query(
            ' & '.join('%s == "%s"' % (k, v) for k, v in new_conditions.items())
        ).to_dict('list')
        matched_rules = {k: v[0] for k, v in matched_rules.items() if len(set(v)) == 1}
        matched_rules = {k: v for k, v in matched_rules.items() if v is not np.nan}
        safe_update(new_conditions, matched_rules)

        safe_update(completed_associations[frozenset(new_conditions.items())], dict(actions))

100% (60 of 60) |###########################################################| Elapsed Time: 0:00:01 Time: 0:00:01
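The idea behind this completion step can be shown without pandas. The sketch below is an illustration under assumptions, not the notebook's implementation: rows are plain dictionaries, `None` stands in for the `Undefined` value (the notebook uses `np.nan`), and the helper name `implied_conditions` is hypothetical. Given some known conditions, it returns every variable that has one and the same defined value in all matching rows.

```python
# Toy rows (hypothetical data); None marks an undefined variable.
records = [
    {"LOCAL_FILE_DOES_NOT_EXIST": "TRUE", "REMOTE_FILE_DOES_NOT_EXIST": "FALSE",
     "REMOTE_FILE_CHANGED": "TRUE", "FILES_CONTENT_IS_SAME": None},
    {"LOCAL_FILE_DOES_NOT_EXIST": "TRUE", "REMOTE_FILE_DOES_NOT_EXIST": "FALSE",
     "REMOTE_FILE_CHANGED": "FALSE", "FILES_CONTENT_IS_SAME": None},
    {"LOCAL_FILE_DOES_NOT_EXIST": "FALSE", "REMOTE_FILE_DOES_NOT_EXIST": "FALSE",
     "REMOTE_FILE_CHANGED": "TRUE", "FILES_CONTENT_IS_SAME": "TRUE"},
]


def implied_conditions(known, records):
    """Return variables that are constant and defined in all rows matching `known`."""
    matching = [
        row for row in records
        if all(row.get(key) == value for key, value in known.items())
    ]
    implied = {}
    if not matching:
        return implied
    for key in matching[0]:
        values = {row[key] for row in matching}
        # Keep the variable only if every matching row agrees on a defined value.
        if len(values) == 1 and None not in values:
            implied[key] = values.pop()
    return implied


completed = implied_conditions({"LOCAL_FILE_DOES_NOT_EXIST": "TRUE"}, records)
```

Here `REMOTE_FILE_DOES_NOT_EXIST` is added to the known condition because it is `FALSE` in both matching rows, while `REMOTE_FILE_CHANGED` varies and `FILES_CONTENT_IS_SAME` is undefined, so neither is added.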


In [30]:
len(completed_associations)

60

Remove every rule whose conditions are a proper superset of another rule's conditions and whose actions add nothing new to that other rule's actions.

In [31]:
associations = {}

with progressbar.ProgressBar() as bar:
    for conditions1, actions1 in bar(list(completed_associations.items())):
        try:
            for i, (conditions2, actions2) in enumerate(completed_associations.items()):
                if conditions2 is conditions1:
                    continue
                if (conditions2 < conditions1 and
                    set(actions2.items()) >= set(actions1.items())):
                    raise RuntimeError
        except RuntimeError:
            continue
        assert i > 0
                
        associations[frozenset(conditions1)] = actions1      

100% (60 of 60) |###########################################################| Elapsed Time: 0:00:00 Time: 0:00:00
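A small, self-contained illustration of this pruning criterion may help. It is a sketch on made-up data (the names `completed`, `prune` and the variables `A`, `B`, `C`, `X`, `Y` are hypothetical), and it uses `any(...)` in place of the exception-based loop exit of the cell above.

```python
# Toy rules (hypothetical): frozenset of conditions -> dict of actions.
completed = {
    frozenset({("A", "TRUE")}): {"X": "TRUE"},
    # More specific conditions, same actions: adds nothing new.
    frozenset({("A", "TRUE"), ("B", "TRUE")}): {"X": "TRUE"},
    # More specific conditions, but one extra action: must be kept.
    frozenset({("A", "TRUE"), ("C", "TRUE")}): {"X": "TRUE", "Y": "TRUE"},
}


def prune(rules):
    """Drop rules that a strictly more general rule already covers."""
    kept = {}
    for conditions1, actions1 in rules.items():
        covered = any(
            # Proper subset of conditions and at least the same actions.
            conditions2 < conditions1
            and set(actions2.items()) >= set(actions1.items())
            for conditions2, actions2 in rules.items()
        )
        if not covered:
            kept[conditions1] = actions1
    return kept


pruned = prune(completed)
```

Only the rule on `A` and `B` is removed: the rule on `A` alone already yields the same action with fewer conditions.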


In [32]:
associations = dict(associations)

In [33]:
len(associations)

28

In [34]:
# associations = {frozenset(k): dict(v) for k, v in yaml.load( 
#     open('./associations.yml', 'r'),
#     Loader=yaml.CSafeLoader
# ).items()}

In [35]:
# Use a context manager so the file is flushed and closed after dumping.
with open('./associations.yml', 'w') as f:
    yaml.dump(
        {tuple(k): dict(v) for k, v in associations.items()},
        f,
        default_flow_style=False,
        Dumper=yaml.CSafeDumper
    )
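One detail worth keeping in mind when persisting the rules: the iteration order of a `frozenset` of strings is not guaranteed to be the same between interpreter runs, so `tuple(k)` may order the conditions differently each time. The sketch below shows one possible way to obtain a reproducible structure before dumping. It is an assumption, not the notebook's format: the helper `to_serializable` and the `conditions`/`actions` layout are hypothetical.

```python
def to_serializable(associations):
    """Turn {frozenset of (name, value): {name: value}} into a sorted list of mappings."""
    ordered = sorted(associations.items(), key=lambda item: sorted(item[0]))
    return [
        {
            "conditions": dict(sorted(conditions)),
            "actions": dict(sorted(actions.items())),
        }
        for conditions, actions in ordered
    ]


# Toy input (hypothetical variable names).
example = {
    frozenset({("B", "TRUE"), ("A", "FALSE")}): {"Y": "TRUE", "X": "FALSE"},
    frozenset({("A", "TRUE")}): {"X": "TRUE"},
}
serializable = to_serializable(example)
```

The resulting list contains only plain lists, dictionaries and strings, so it can be passed to a serializer without relying on set ordering.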

## Validate the associations against the actual "dumb" rules

In [36]:
with progressbar.ProgressBar() as bar:
    for actual_case in bar(filtered_rules.to_dict('records')):
        conditions_case = frozenset([
            (k, v) for k, v in actual_case.items() 
            if k in Condition and v is not np.nan
        ])
        actions_case = frozenset([
            (k, v) for k, v in actual_case.items() 
            if k in Action and v is not np.nan
        ])

        matched_condition = set()
        matched_action = set()

        for condition, action in associations.items():
            action = frozenset(action.items())
            assert len(condition) > 0
            assert len(action) > 0
            # A rule applies when all of its conditions hold in this case;
            # its actions must then be part of the case's actual actions.
            if condition <= conditions_case:
                assert action <= actions_case, (condition, conditions_case, action, actions_case)
                matched_condition.update(condition)
                matched_action.update(action)

        # Together, the applicable rules must reproduce the actual actions exactly.
        assert matched_condition <= conditions_case, (matched_condition, conditions_case)
        assert matched_action == actions_case, {
            'matched_condition': matched_condition,
            'conditions_case': conditions_case,
            'matched_action': matched_action,
            'actions_case': actions_case,
        }

100% (258 of 258) |#########################################################| Elapsed Time: 0:00:00 Time: 0:00:00
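The check performed for each row can be reduced to a small function. The following sketch runs on toy data and makes some assumptions: the action names `DOWNLOAD_REMOTE_FILE` and `DELETE_LOCAL_COPY`, the helper `check_case` and its string results are hypothetical, and the function reports a status where the notebook raises an `AssertionError`.

```python
def check_case(conditions_case, actions_case, associations):
    """Classify one actual case against the compact rules."""
    matched_actions = set()
    for conditions, actions in associations.items():
        if conditions <= conditions_case:
            if not actions <= actions_case:
                # An applicable rule demands an action the case does not have.
                return "contradiction"
            matched_actions |= actions
    # All applicable rules together must yield exactly the actual actions.
    return "ok" if matched_actions == actions_case else "incomplete"


# Toy rule set (hypothetical).
associations = {
    frozenset({("LOCAL_FILE_DOES_NOT_EXIST", "TRUE"),
               ("REMOTE_FILE_DOES_NOT_EXIST", "FALSE")}):
        frozenset({("DOWNLOAD_REMOTE_FILE", "TRUE")}),
}

conditions_case = frozenset({
    ("LOCAL_FILE_DOES_NOT_EXIST", "TRUE"),
    ("REMOTE_FILE_DOES_NOT_EXIST", "FALSE"),
    ("REMOTE_FILE_CHANGED", "TRUE"),
})

status_ok = check_case(
    conditions_case,
    frozenset({("DOWNLOAD_REMOTE_FILE", "TRUE")}),
    associations,
)
status_incomplete = check_case(
    conditions_case,
    frozenset({("DOWNLOAD_REMOTE_FILE", "TRUE"), ("DELETE_LOCAL_COPY", "TRUE")}),
    associations,
)
status_contradiction = check_case(conditions_case, frozenset(), associations)
```

The three calls cover the possible outcomes: the rules reproduce the case, the rules miss one of its actions, and a rule prescribes an action that the case does not contain.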


In [37]:
with progressbar.ProgressBar() as bar:
    for conditions, actions in bar(list(associations.items())):
        # Rows selected by the rule's conditions alone.
        index1 = filtered_rules.query(
            ' & '.join('%s == "%s"' % (k, v) for k, v in conditions)
        ).index
        # Rows selected by the conditions together with the rule's actions.
        index2 = filtered_rules.query(
            ' & '.join(
                '%s == "%s"' % (k, v)
                for k, v in it.chain(conditions, actions.items())
            )
        ).index
        # The rule holds only if adding the actions does not exclude any row.
        assert set(index1) == set(index2)

100% (28 of 28) |###########################################################| Elapsed Time: 0:00:00 Time: 0:00:00
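This second check asks whether a rule's conditions imply its actions in every row. A pandas-free sketch of the same comparison is given below. The data, the action name `DOWNLOAD_REMOTE_FILE` and the helpers `rows_matching` and `rule_holds` are hypothetical and only illustrate the idea.

```python
# Toy rows (hypothetical data).
records = [
    {"LOCAL_FILE_DOES_NOT_EXIST": "TRUE", "DOWNLOAD_REMOTE_FILE": "TRUE"},
    {"LOCAL_FILE_DOES_NOT_EXIST": "FALSE", "DOWNLOAD_REMOTE_FILE": "FALSE"},
    {"LOCAL_FILE_DOES_NOT_EXIST": "TRUE", "DOWNLOAD_REMOTE_FILE": "TRUE"},
]


def rows_matching(records, items):
    """Return the indices of rows in which every (name, value) pair holds."""
    items = list(items)
    return {
        index for index, row in enumerate(records)
        if all(row.get(name) == value for name, value in items)
    }


def rule_holds(records, conditions, actions):
    """True if every row selected by the conditions also shows the actions."""
    by_conditions = rows_matching(records, conditions)
    by_both = rows_matching(records, list(conditions) + list(actions.items()))
    return by_conditions == by_both


good_rule = rule_holds(
    records,
    frozenset({("LOCAL_FILE_DOES_NOT_EXIST", "TRUE")}),
    {"DOWNLOAD_REMOTE_FILE": "TRUE"},
)
bad_rule = rule_holds(
    records,
    frozenset({("LOCAL_FILE_DOES_NOT_EXIST", "FALSE")}),
    {"DOWNLOAD_REMOTE_FILE": "TRUE"},
)
```

If adding the actions to the filter shrinks the selection, at least one row satisfies the conditions without showing the actions, so the rule does not hold.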


## Represent the rules

In [38]:
import tabulate
from IPython.display import display, HTML

In [47]:
table = [list(it.chain(list(Condition), list(Action)))]
table.extend(sorted([
    list(it.chain(
        [dict(conditions).get(c, '') for c in Condition],
        [actions.get(a, '') for a in Action]
    ))
    for conditions, actions in associations.items()
]))
table = list(zip(*table))
table.insert(0, [''] + list(map(str, range(len(table[0]) - 1))))
table.insert(1, ['Conditions'] + ['*'] * (len(table[0]) - 1))
# Two rows were inserted above the variable rows, so the action rows start at len(Condition) + 2.
table.insert(len(Condition) + 2, ['Actions'] + ['*'] * (len(table[0]) - 1))

In [48]:
display(HTML(tabulate.tabulate(table, tablefmt='html')))

,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
Conditions,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*
LOCAL_FILE_CHANGED,,,,,,,,,,,,,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE
REMOTE_FILE_CHANGED,,,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
FILES_CONTENT_IS_SAME,,TRUE,,,,,,,FALSE,FALSE,FALSE,TRUE,,,,,,,,,FALSE,FALSE,FALSE,TRUE,,,,TRUE
FILES_METADATA_IS_SAME,,TRUE,,,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,,,,,,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,,FALSE,FALSE,FALSE
LOCAL_FILE_DOES_NOT_EXIST,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
REMOTE_FILE_DOES_NOT_EXIST,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE
LOCAL_COPIES_DO_EXIST,,,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,,,,,,,,,,,,,,,,TRUE,
REMOTE_COPIES_DO_EXIST,,,,,,,,,,,,,,FALSE,TRUE,TRUE,TRUE,,FALSE,TRUE,FALSE,TRUE,TRUE,,,,,
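The layout of this table (one row per variable, one column per rule) comes from transposing a list of per-rule rows. The sketch below rebuilds that layout on toy data without `tabulate`. It is an illustration under assumptions: the action name `DOWNLOAD_REMOTE_FILE` and the two rules are made up, and the section rows are placed by list concatenation instead of `insert`.

```python
condition_names = ["LOCAL_FILE_DOES_NOT_EXIST", "REMOTE_FILE_DOES_NOT_EXIST"]
action_names = ["DOWNLOAD_REMOTE_FILE"]  # hypothetical action variable

# Toy rules (hypothetical): (conditions, actions) pairs with missing entries allowed.
rules = [
    ({"LOCAL_FILE_DOES_NOT_EXIST": "TRUE", "REMOTE_FILE_DOES_NOT_EXIST": "FALSE"},
     {"DOWNLOAD_REMOTE_FILE": "TRUE"}),
    ({"LOCAL_FILE_DOES_NOT_EXIST": "FALSE"}, {}),
]

# One list per rule; variables a rule does not mention become empty cells.
columns = sorted(
    [conditions.get(name, "") for name in condition_names]
    + [actions.get(name, "") for name in action_names]
    for conditions, actions in rules
)

# Transpose: each row starts with the variable name, followed by one cell per rule.
rows = [list(row) for row in zip(condition_names + action_names, *columns)]

stars = ["*"] * len(columns)
table = (
    [[""] + [str(index) for index in range(len(columns))]]
    + [["Conditions"] + stars]
    + rows[:len(condition_names)]
    + [["Actions"] + stars]
    + rows[len(condition_names):]
)

for row in table:
    print(",".join(row))
```

Sorting the per-rule lists before transposing groups rules with similar leading conditions next to each other, which is why empty cells cluster on the left of the table.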


In [50]:
# open('rules.rst', 'w').write(tabulate.tabulate(table, tablefmt='rst'))