## Joining on Non-Equi Operators

In [1]:
import pandas as pd
import janitor
import numpy as np

  register_dataframe_accessor(method.__name__)(AccessorMethod)


In [2]:
#https://stackoverflow.com/q/61948103/7175713 
df1 = pd.DataFrame({'id': [1,1,1,2,2,3], 
                    'value_1': [2,5,7,1,3,4]})

df2 = pd.DataFrame({'id': [1,1,1,1,2,2,2,3], 
                    'value_2A': [0,3,7,12,0,2,3,1], 
                    'value_2B': [1,5,9,15,1,4,6,3]})

In [3]:
df1

Unnamed: 0,id,value_1
0,1,2
1,1,5
2,1,7
3,2,1
4,2,3
5,3,4


In [4]:
df2

Unnamed: 0,id,value_2A,value_2B
0,1,0,1
1,1,3,5
2,1,7,9
3,1,12,15
4,2,0,1
5,2,2,4
6,2,3,6
7,3,1,3


Join on equi and non-equi operators is possible:

In [14]:
df1.conditional_join(
        df2,
     #   ('id', 'id', '=='),
        ('value_1', 'value_2A', '>='),
        ('value_1', 'value_2B', '<='),
        sort_by_appearance = True,
        use_numba=True
    )

(array([1, 2, 3, 4, 5, 7]),
 array([ 0,  3,  7, 12,  0,  2,  3,  1]),
 array([3, 0, 4, 5, 1, 2]),
 array([0, 1, 2, 3, 4, 5, 6, 7]),
 False,
 False,
 array([1, 2, 3, 4, 5, 7]),
 array([ 1,  5,  9, 15,  1,  4,  6,  3]),
 array([3, 0, 4, 5, 1, 2]),
 array([0, 1, 2, 3, 4, 5, 6, 7]),
 True,
 False)

The default join is inner. left and right joins are supported as well:

In [6]:
df1.conditional_join(
        df2,
        ('id', 'id', '=='),
        ('value_1', 'value_2A', '>='),
        ('value_1', 'value_2B', '<='),
        how='left',
        sort_by_appearance = True
    )

(array([1, 2, 3, 4, 4]), array([1, 2, 4, 5, 6]))

In [7]:
df1.conditional_join(
        df2,
        ('id', 'id', '=='),
        ('value_1', 'value_2A', '>='),
        ('value_1', 'value_2B', '<='),
        how='right',
        sort_by_appearance = True
    )

(array([1, 2, 3, 4, 4]), array([1, 2, 4, 5, 6]))

Join on just the non-equi joins is also possible:

In [8]:
df1.conditional_join(
        df2,
        ('value_1', 'value_2A', '>'),
        ('value_1', 'value_2B', '<'),
        how='inner',
        sort_by_appearance = True
    )

(array([0, 1, 4, 5, 5]), array([7, 6, 5, 1, 6]))

Join on not equal -> !=

In [9]:
df1.conditional_join(
        df2,
        ('id', 'id', "!=")
    )

(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 3, 3, 3, 3, 4, 4, 4, 4,
        5, 5, 5, 5, 5, 5, 5]),
 array([4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 7, 7, 0, 1, 2, 3, 0, 1, 2, 3,
        0, 1, 2, 3, 4, 5, 6]))

If the columns from both dataframes have nothing in common, a single indexed column is returned:

In [10]:
(df1.select_columns('value_1')
    .conditional_join(
        df2.select_columns('val*'),
        ('value_1', 'value_2A', '>'),
        ('value_1', 'value_2B', '<'),
    )
)

(array([0, 1, 4, 5, 5]), array([7, 6, 5, 1, 6]))

Selection of relevant columns within `conditional_join`: 

In [11]:
df1.conditional_join(
        df2,
        ('id', 'id', "<"),
        df_columns = 'id',
        right_columns = 'id'
    )

(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4]),
 array([4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 7, 7]))

Column renaming is also possible:

In [12]:
df1.conditional_join(
        df2,
        ('id', 'id', "<"),
        df_columns = {'id':'df_id'},
        right_columns = {'id':'right_id'}
    )

(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4]),
 array([4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 7, 7]))