## Algorithm: Get the max consecutive days per user

    a. Convert the user's payment dates from calendar dates to days of the year in the range 1..366 (366 occurs only in leap years)
    
    b. Loop through the days up to, but not including, index n-1, where n is the total number of days on which the user made a payment
    
    c. Have a variable to store a list of lists, where each sublist is one run of consecutive payment days for the user
    
    d. Have a variable to track the index where the last list ended. Initialize to zero.
    
    e. For each index, compare the day at that index with the day at the next index. If the two days are consecutive, proceed to the next index. If they are not consecutive, slice a sublist from the index where the last list ended up to and including the current index, and append it to the list of consecutive days.
    
    f. Get the lengths of the sublists and take the maximum, which represents the maximum number of consecutive days paid by the user
    
    g. Store this in a dictionary where the key is the user_id and the value is that user's maximum number of consecutive days
    
    h. Sort the dictionary by value in descending order
    
    i. Return the requested number of user IDs from the front of the sorted dictionary
    
## Requirements not met
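1. If you need to break ties, you should choose account numbers that come first alphabetically. We can solve this with a custom ordering function.
2. If the transactions span different years, the algorithm will break, because the day of the year restarts at 1 every January. We can solve this by replacing the day of the year with a day number that keeps increasing across years, for example the number of days since a fixed reference date.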
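
The tie-breaking requirement could be sketched with a composite sort key: negate the streak length so that longer streaks come first, then fall back to the account ID in ascending order. The function name `top_customers` and the sample data are hypothetical and used only for illustration.

```python
def top_customers(max_days_by_user, n):
    """Return the n user IDs with the longest streaks.

    Ties on streak length are broken by user ID in ascending order.
    """
    ranked = sorted(
        max_days_by_user.items(),
        # -streak sorts longest first; the user ID decides ties.
        key=lambda item: (-item[1], item[0]),
    )
    return [user for user, _ in ranked[:n]]


# Hypothetical sample data: ACC1 and ACC2 tie on a streak of 5.
sample = {"ACC2": 5, "ACC1": 5, "ACC3": 7, "ACC4": 2}
print(top_customers(sample, 3))  # ['ACC3', 'ACC1', 'ACC2']
```

Negating the count works because the streak lengths are integers; the account IDs are compared as plain strings.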

In [1]:
import pandas as pd

def read_csv_file_and_clean(path):
    data_frame = pd.read_csv(path)
    data_frame['transaction_date'] = data_frame['transaction_date'].astype('datetime64[ns]')
    # Convert each date to its day of the year (1..366) so that consecutive days differ by exactly 1.
    # This ignores the year, so data that spans several years is not handled correctly.
    data_frame['transaction_day_of_year'] = data_frame['transaction_date'].dt.day_of_year
    # Sort once here so that the per-user day lists are easier to work with in pure Python.
    data_frame.sort_values(['transaction_day_of_year', 'customer_id'], inplace=True)
    return data_frame
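
One way to handle data that spans more than one year is to use a day number that never restarts. The sketch below uses the standard library's `date.toordinal()`, which counts days from a fixed origin; the helper name `to_day_numbers` is hypothetical, and wiring it into the data frame is left as an assumption.

```python
from datetime import date


def to_day_numbers(dates):
    """Map dates to sorted, de-duplicated day ordinals."""
    return sorted({d.toordinal() for d in dates})


# 2020 is a leap year, so the days of the year here would be 366, 1, 2.
days = to_day_numbers([date(2021, 1, 1), date(2020, 12, 31), date(2021, 1, 2)])

# With ordinals, consecutive calendar days differ by exactly 1,
# even across the year boundary.
assert days[1] - days[0] == 1
assert days[2] - days[1] == 1
```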

In [2]:
def is_next(x, y):
    """Return True when day y comes immediately after day x."""
    return y - x == 1

def get_consecutive_payment_days(days_list):
    sublists = []
    last_list_end = 0
    for i in range(len(days_list)-1):
        if not is_next(days_list[i], days_list[i+1]):
            sublists.append(days_list[last_list_end:i+1])
            last_list_end = i+1 
    if last_list_end != len(days_list):
        sublists.append(days_list[last_list_end:])
    return sublists

def get_max_consecutive_days(days_list):
    sub_days = get_consecutive_payment_days(days_list)
    # An empty input has no runs, so fall back to 0 instead of raising ValueError.
    return max(map(len, sub_days), default=0)

l = [1,2,3,5,6,7,8,9,11,12]

assert get_consecutive_payment_days(l) == [[1,2,3], [5,6,7,8,9], [11,12]]
assert get_max_consecutive_days(l) == 5
assert is_next(4,5)
assert not is_next(-1, 1)

In [3]:
def sort_dict_by_value(x, reverse=True):
    return dict(sorted(x.items(), key=lambda item: item[1], reverse=reverse))

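d = dict(A=12, B=8, C=34, D=123)
# Compare item lists: dict equality ignores order, so it cannot verify the sort.
assert list(sort_dict_by_value(d, False).items()) == [('B', 8), ('A', 12), ('C', 34), ('D', 123)]

d2 = dict(Apples=999, Oranges=82, Coffee=34, Dogs=3)
assert list(sort_dict_by_value(d2, True).items()) == [('Apples', 999), ('Oranges', 82), ('Coffee', 34), ('Dogs', 3)]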

In [4]:
def calculate_regular_customers(path, n):
    df = read_csv_file_and_clean(path)
    all_users_max_days = dict()
    # get unique user ids and use these to get payments specific to a user
    users = df['customer_id'].unique()
    for user in users:
        user_df = df.loc[df['customer_id']==user]
        # De-duplicate so that several payments on the same day count as a single day.
        user_days = sorted(set(user_df['transaction_day_of_year'].to_list()))
        # print(f"User : {user} Days : {user_days}")
        max_days = get_max_consecutive_days(user_days)
        # A better approach has to be found
        all_users_max_days[user] = max_days
    all_users_max_days = sort_dict_by_value(all_users_max_days)
    return list(all_users_max_days.keys())[:n]

In [5]:
calculate_regular_customers('~/Documents/lendable/data/transaction_data_3.csv', 3)

['ACC143', 'ACC418', 'ACC214']

## Improvement suggestions:

    1. Use a priority queue keyed on the number of consecutive days to pick the top users, instead of sorting the whole dictionary
    
    2. Make the day of the year aware of different years.
    
    3. In the algorithm to get max consecutive elements, implement a form of look-ahead: if the current run plus the number of remaining days cannot exceed the current maximum, exit the loop early
    
    4. To save space, track only the length of each run at step e instead of saving the sublists and measuring them at step f
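
Suggestions 1, 3 and 4 could be sketched as follows. This is a minimal illustration rather than a drop-in replacement: `longest_run` and `top_n` are hypothetical names, the input to `longest_run` is assumed to be sorted and free of duplicates, and `heapq.nsmallest` from the standard library stands in for the priority queue.

```python
import heapq


def longest_run(days):
    """Length of the longest run of consecutive integers.

    Assumes `days` is sorted in ascending order with no duplicates.
    Only counters are kept, so no sublists are stored.
    """
    if not days:
        return 0
    best = current = 1
    for i in range(1, len(days)):
        # Look-ahead: even if every remaining day extended the
        # current run, it could not beat the best run found so far.
        if current + (len(days) - i) <= best:
            break
        if days[i] - days[i - 1] == 1:
            current += 1
            best = max(best, current)
        else:
            current = 1
    return best


def top_n(max_days_by_user, n):
    """Pick the n users with the longest streaks without a full sort.

    Ties are broken by user ID in ascending order.
    """
    smallest = heapq.nsmallest(
        n,
        max_days_by_user.items(),
        key=lambda item: (-item[1], item[0]),
    )
    return [user for user, _ in smallest]


assert longest_run([1, 2, 3, 5, 6, 7, 8, 9, 11, 12]) == 5

# Hypothetical sample data.
streaks = {
    "ACC1": longest_run([1, 2, 3, 5]),
    "ACC2": longest_run([10, 11]),
    "ACC3": longest_run([4, 5, 6, 7]),
}
print(top_n(streaks, 2))  # ['ACC3', 'ACC1']
```

The early exit is safe because any run that starts among the remaining days is no longer than the number of remaining days, which is already covered by the check.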