Notebook purpose

- Understand nature of duplicate transactions, explore solutions, document decisions about what duplicates to drop

### Summary

Types of duplicates and how we handle them:

1. `['user_id', 'date', 'amount', 'account_id', 'desc']` are identical* -> drop in main analysis.

2. `['user_id', 'date', 'amount', 'account_id']` are identical and `desc` is similar**, *** -> drop from main analysis.

3. `['user_id', 'date', 'amount', 'account_id']` are identical and `desc` not similar -> keep in main analysis.

4. `['user_id', 'date', 'amount']` are identical, `desc` may or may not differ, but `account_id` differs. This is relevant if there are (many) duplicated accounts, in which case a different account number is no guarantee of a different account. -> ignore for now, ask MDB to share list of duplicated accounts.

\* This includes pairs where description for both txns is `<mdbremoved>`, in which case we assume that the same description is being masked.

\** "similar" is defined below.

\*** In this category we include cases where the desc of one txn is `<mdbremoved>` while the other isn't, so even though the descriptions are not similar, we assume that what is being masked by `<mdbremoved>` in one description is similar to what is visible in the other.

Solution steps:

- preprocess using cleaning func to eliminate extraneous chars
- drop type 1 dups
- of remaining dups, drop those for which all WORDS of short desc are contained in long desc


Todo:
- Make regex [fast](https://stackoverflow.com/questions/42742810/speed-up-millions-of-regex-replacements-in-python-3)
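
One way to approach this todo is to fold the literal suffix replacements into a single compiled alternation, so that each description is scanned once rather than once per suffix. The sketch below is an assumption about how this could look, not the project's cleaner: `SUFFIXES`, `SUFFIX_RE` and `strip_suffixes` are hypothetical names, and the suffix list is copied from the `clean_desc` cell further down.

```python
import re

# Literal suffixes taken from the clean_desc cell; longest first so that
# longer alternatives are tried before shorter ones.
SUFFIXES = [' - vis', ' - dd', ' - d/d', ' - s/o', '- pos', ' - )))']
SUFFIX_RE = re.compile(
    '|'.join(re.escape(s) for s in sorted(SUFFIXES, key=len, reverse=True))
)
MULTI_SPACE_RE = re.compile(r'\s{2,}')


def strip_suffixes(desc):
    """Remove all known suffixes in one pass, then collapse whitespace."""
    desc = SUFFIX_RE.sub('', desc)
    return MULTI_SPACE_RE.sub(' ', desc).strip()
```

On a pandas column, the same compiled pattern could be passed to `df.desc.str.replace(SUFFIX_RE, '', regex=True)`.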

In [1]:
import os
import sys

import numpy as np
import pandas as pd
import seaborn as sns

sys.path.append('/Users/fgu/dev/projects/entropy')
import entropy.helpers.aws as aws
import entropy.data.cleaners as cl

sns.set_style('whitegrid')
pd.set_option('display.max_rows', 120)
pd.set_option('display.max_columns', 120)
pd.set_option('max_colwidth', None)
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

In [2]:
df = pd.read_parquet('~/tmp/entropy_X77.parquet')

Preprocessed df

In [None]:
def clean_desc(df):
    """Removes extraneous characters that often create duplicates."""
    df['desc'] = (df.desc
                  .str.replace(r'x{2,}', '', regex=True)    # used to mask card numbers
                  .str.replace(r'\s{2,}', ' ', regex=True)  # multiple spaces
                  .str.replace(' - vis', '', regex=False)   # visa debit card suffix
                  .str.replace(' - dd', '', regex=False)    # direct debit suffix
                  .str.replace(' - d/d', '', regex=False)   # direct debit suffix
                  .str.replace(' - s/o', '', regex=False)   # standing order suffix
                  .str.replace('- pos', '', regex=False)    # point of sale suffix
                  .str.replace(' - )))', '', regex=False)   # cryptic suffix
                  .str.replace(' - ', ' ', regex=False))    # hyphen
    return df

df = clean_desc(df)

Helper functions

In [9]:
def distr(x):
    pcts = [.01, .05, .1, .25, .50, .75, .90, .95, .99]
    return x.describe(percentiles=pcts).round(2)

def duplicates_sample(df, col_subset, n=100, seed=2312):
    """Draws a sample of n groups of duplicate txns as defined by col_subset."""
    dups = df[df.duplicated(subset=col_subset, keep=False)].copy()
    dups['group'] = dups.groupby(col_subset).ngroup()
    unique_groups = np.unique(dups.group)
    rng = np.random.default_rng(seed=seed)
    # Sample without replacement so that n distinct groups are returned.
    n = min(n, len(unique_groups))
    sample = rng.choice(unique_groups, size=n, replace=False)
    return dups[dups.group.isin(sample)]

## Case studies

Below are three case studies of duplicates.

In [None]:
dh.user_date_data(df, 35177, '1 Jan 2020')

In [None]:
dh.user_date_data(df, 362977, '1 Jan 2020')

In [None]:
dh.user_date_data(df, 467877, '1 Jan 2020')

## Type 1 duplicates

### Definition
- `['user_id', 'date', 'amount', 'account_id', 'desc']` are identical.
 
- This includes transactions where desc for both is `<mdbremoved>`, where we have to make a call whether or not to assume they are the same.

- Reasons for false positives (FP): user makes two identical transactions on the same day (or on subsequent days for txns that appear with a delay). Plausible cases are coffee and betting shop txns. However, inspection suggests that the vast majority of cases are duplicates, as they are txns that are unlikely to result from multiple purchases on the same day.


### Decision

- We delete dups for the main analysis and do a robustness check without deleting them.
- We treat cases where both descriptions are `<mdbremoved>` no differently from others, even though it's somewhat more likely that they are genuinely different transactions.
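
A minimal sketch of how this decision could be wired up, assuming a boolean flag column is kept so that the robustness check can reuse the same frame. The toy data and the names `toy`, `main_sample` and `robustness_sample` are illustrative only.

```python
import pandas as pd

# Toy data (made up): rows 0 and 1 form a type 1 duplicate pair.
toy = pd.DataFrame({
    'user_id': [1, 1, 1],
    'date': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'amount': [2.5, 2.5, 2.5],
    'account_id': [10, 10, 10],
    'desc': ['coffee shop', 'coffee shop', 'coffee shop'],
})
cols = ['user_id', 'date', 'amount', 'account_id', 'desc']

# Flag every occurrence after the first as a duplicate.
toy['dup1'] = toy.duplicated(subset=cols, keep='first')

main_sample = toy[~toy.dup1]   # main analysis: duplicates dropped
robustness_sample = toy        # robustness check: duplicates kept
```

Keeping the flag rather than dropping rows immediately means both samples come from one frame and stay comparable.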

In [4]:
col_subset = ['user_id', 'date', 'amount', 'account_id', 'desc']
dup_var = 'dup1'

df[dup_var] = df.duplicated(subset=col_subset)

### Prevalence and value

How prevalent are duplicates?

In [10]:
n_df = len(df)
n_dups = len(df[df[dup_var]])
n_users_dups = df[df[dup_var]].user_id.nunique()
n_users_df = df.user_id.nunique()
txt = 'About {:.1%} of transactions across {:.0%} of users are potential dups.'
print(txt.format(n_dups / n_df, n_users_dups / n_users_df))

About 1.7% of transactions across 97% of users are potential dups.


Gross value of duplicated txns

In [11]:
gross_value = df[df[dup_var]].set_index('user_id').amount.abs().groupby('user_id').sum()
distr(gross_value)

count       417.00
mean       4622.48
std       15167.33
min           0.17
1%            4.07
5%           19.30
10%          61.06
25%         236.30
50%         836.96
75%        2736.70
90%        9293.31
95%       17393.81
99%       59012.20
max      183754.34
Name: amount, dtype: float64

Most frequent txn descriptions

In [12]:
df[df[dup_var]].desc.value_counts(dropna=False)[:10]

<mdbremoved>                         2050
<mdbremoved>                          517
<mdbremoved> ft                       357
b365 moto                             263
tfl travel charge tfl.gov.uk/cp       167
www.skybet.com cd 9317                165
<mdbremoved> - s/o                    158
<mdbremoved> so                       156
bank giro credit ref <mdbremoved>     147
betfair.-purchase                     146
Name: desc, dtype: int64

Most frequent auto tag

In [13]:
df[df[dup_var]].tag_auto.value_counts(dropna=False)[:10]

NaN                           6240
transfers                     2961
gambling                      2185
enjoyment                     1535
public transport              1076
lunch or snacks               1048
bank charges                   823
entertainment, tv, media       562
cash                           516
food, groceries, household     506
Name: tag_auto, dtype: int64

Proportion of txns per auto tag that are duplicated

In [14]:
txns_per_tag_overall = df.tag_auto.value_counts(dropna=False)
txns_per_tag_duplicated = df[df[dup_var]].tag_auto.value_counts(dropna=False) 
p_dup_per_tag = (txns_per_tag_duplicated / txns_per_tag_overall)
p_dup_per_tag.sort_values(ascending=False)[:40]

investment - other               0.214953
gambling                         0.159303
mobile app                       0.151954
isa                              0.088095
tradesmen fees                   0.076923
flights                          0.049423
parking                          0.046014
payment protection insurance     0.044776
bills                            0.040082
home appliance insurance         0.039448
games and gaming                 0.038053
supermarket                      0.035669
road charges                     0.032078
pension or investments           0.030481
pet insurance                    0.027899
bank charges                     0.026923
refunded purchase                0.026788
public transport                 0.026075
child - everyday or childcare    0.024896
fines                            0.024390
NaN                              0.024055
gym membership                   0.023636
postage / shipping               0.023256
entertainment, tv, media         0

### Inspect dups

In [15]:
duplicates_sample(df, col_subset, n=3, seed=None)

Unnamed: 0,id,date,user_id,amount,desc,merchant,tag_group,tag,user_female,user_postcode,user_registration_date,user_salary_range,user_yob,account_created,account_id,account_last_refreshed,account_provider,account_type,data_warehouse_date_created,data_warehouse_date_last_updated,debit,latest_balance,merchant_business_line,tag_auto,tag_manual,tag_up,updated_flag,ym,balance,income,savings,dup1,group
847151,456481489,2017-12-07,464477,0.5,non-stg purch fee cd 4361 deb,,spend,finance,True,nw6 3,2018-09-30,,1987.0,2018-09-30,1064847,2020-03-11 13:54:00,lloyds bank,current,2018-10-01,1900-01-01,True,1739.109985,account provider,bank charges,,bank charges,c,201712,1524.519653,63838.195312,False,False,11598
847152,456481493,2017-12-07,464477,0.5,non-stg purch fee cd 4361 deb,,spend,finance,True,nw6 3,2018-09-30,,1987.0,2018-09-30,1064847,2020-03-11 13:54:00,lloyds bank,current,2018-10-01,1900-01-01,True,1739.109985,account provider,bank charges,,bank charges,c,201712,1524.519653,63838.195312,False,True,11598
1162810,677077282,2019-07-09,560777,0.2,non-sterling transaction fee,,spend,finance,False,ng3 5,2020-01-06,,1991.0,2020-01-06,1546208,2020-04-03 03:19:00,hsbc,current,2020-01-07,1900-01-01,True,1124.959961,account provider,bank charges,,bank charges,c,201907,1274.169922,18362.550781,False,False,16011
1162811,677077284,2019-07-09,560777,0.2,non-sterling transaction fee,,spend,finance,False,ng3 5,2020-01-06,,1991.0,2020-01-06,1546208,2020-04-03 03:19:00,hsbc,current,2020-01-07,1900-01-01,True,1124.959961,account provider,bank charges,,bank charges,c,201907,1274.169922,18362.550781,False,True,16011
1292446,771161154,2019-01-25,583677,3.8,spar partney,spar,spend,household,True,pe23 5,2020-05-12,20k to 30k,1987.0,2020-05-12,1688629,2020-08-09 22:03:00,lloyds bank,current,2020-05-13,1900-01-01,True,-1083.930054,spar,"food, groceries, household",,"food, groceries, household",c,201901,28.110352,46319.640625,False,False,17791
1292447,771167224,2019-01-25,583677,3.8,spar partney,spar,spend,household,True,pe23 5,2020-05-12,20k to 30k,1987.0,2020-05-12,1688629,2020-08-09 22:03:00,lloyds bank,current,2020-05-13,1900-01-01,True,-1083.930054,spar,"food, groceries, household",,"food, groceries, household",c,201901,28.110352,46319.640625,False,True,17791


In [16]:
df = df.drop_duplicates(subset=col_subset)

## Type 2 dups

### Definition

- `['user_id', 'date', 'amount', 'account_id']` are identical, `desc` is different but similar, or one `desc` contains `<mdbremoved>` and the other doesn't.

### Decision

- Inspection suggests that in most cases, similar but different desc strings result from slight editing of the string, e.g. by removing unnecessary punctuation characters or (as discussed in MDB documentation) by revealing additional information that a new algorithm has classified as non-sensitive.
- We therefore treat such pairs as duplicates and drop them from the main analysis.

In [17]:
col_subset = ['user_id', 'date', 'amount', 'account_id']
dup_var = 'dup2'

df[dup_var] = df.duplicated(subset=col_subset)

### Prevalence and value

How prevalent are duplicates?

In [18]:
n_df = len(df)
n_dups = len(df[df[dup_var]])
n_users_dups = df[df[dup_var]].user_id.nunique()
n_users_df = df.user_id.nunique()
txt = 'About {:.1%} of transactions across {:.0%} of users are potential dups.'
print(txt.format(n_dups / n_df, n_users_dups / n_users_df))

About 1.9% of transactions across 99% of users are potential dups.


Gross value of duplicated txns

In [19]:
gross_value = df[df[dup_var]].set_index('user_id').amount.abs().groupby('user_id').sum()
distr(gross_value)

count       426.00
mean       2543.32
std        8323.64
min           2.50
1%            7.72
5%           41.88
10%          91.53
25%         295.48
50%         881.08
75%        2102.15
90%        4924.51
95%        6928.04
99%       25746.48
max      106598.39
Name: amount, dtype: float64

Most frequent txn descriptions

In [20]:
df[df[dup_var]].desc.str[:12].value_counts(dropna=False)[:10]

<mdbremoved>    3460
daily od fee    1894
int'l xxxxxx     944
card payment     490
direct debit     439
contactless      410
visa purchas     346
tfl travel c     343
tfl.gov.uk/c     288
call ref.no.     273
Name: desc, dtype: int64

### Similarity score

Similarity between the shortest desc in each group and all other descs in that group

In [230]:
import difflib
import functools
import collections

from fuzzywuzzy import fuzz

DescAndId = collections.namedtuple('DescAndId', ['desc', 'id'])
shortest_first = functools.partial(sorted, key=lambda x: len(x.desc))

def similarity_score(group):
    """Return similarity score between shortest desc in group and all others."""
    group = group.copy()  # avoid mutating the caller's frame
    cols = list(group.columns)
    group['score_difflib'] = np.nan
    group['score_fuzz'] = np.nan
    items = [DescAndId(*item) for item in zip(group.desc, group.id)]
    shortest, *others = shortest_first(items)
    for o in others:
        row = group.id == o.id
        matcher = difflib.SequenceMatcher(None, shortest.desc, o.desc)
        group.loc[row, 'score_difflib'] = matcher.ratio()
        group.loc[row, 'score_fuzz'] = fuzz.partial_ratio(shortest.desc, o.desc)
    return group[['score_difflib', 'score_fuzz'] + cols]



Unnamed: 0,id,date,user_id,amount,desc,merchant,tag_group,tag,user_female,user_postcode,user_registration_date,user_salary_range,user_yob,account_created,account_id,account_last_refreshed,account_provider,account_type,data_warehouse_date_created,data_warehouse_date_last_updated,debit,latest_balance,merchant_business_line,tag_auto,tag_manual,tag_up,updated_flag,ym,balance,income,savings,dup1,dup2,group
362325,67097053,2015-01-07,246477,5.0,"call ref.no. 0000 , to a/c xxxxxx87 - dpc",,transfers,tsransfer,True,ka6 7,2015-01-11,,1989.0,2015-03-31,412591,2015-09-27 00:00:00,royal bank of scotland (rbs),current,2015-04-01,2017-11-13,True,,personal,transfers,,,u,201501,,16663.427083,False,False,False,6133
362327,67097060,2015-01-07,246477,5.0,"call ref.no. 0000 , to a/c xxxxxx05 - dpc",,transfers,tsransfer,True,ka6 7,2015-01-11,,1989.0,2015-03-31,412591,2015-09-27 00:00:00,royal bank of scotland (rbs),current,2015-04-01,2017-11-13,True,,personal,transfers,,,u,201501,,16663.427083,False,False,True,6133
362328,67097068,2015-01-07,246477,5.0,"call ref.no. 0000 , to a/c xxxxxx56 - dpc",,transfers,tsransfer,True,ka6 7,2015-01-11,,1989.0,2015-03-31,412591,2015-09-27 00:00:00,royal bank of scotland (rbs),current,2015-04-01,2017-11-13,True,,personal,transfers,,,u,201501,,16663.427083,False,False,True,6133
384746,107464858,2015-10-12,264077,20.0,xx0459 11oct 14.13 cash withdrawal asda bideford atm,,spend,other_spend,False,ex38 8,2015-01-26,,1973.0,2015-01-26,361332,2020-03-12 00:34:00,barclays,current,2015-11-25,2018-09-03,True,64.739998,personal,cash,,cash,u,201510,-5720.017578,26249.878906,False,False,False,6590
384753,107464856,2015-10-12,264077,20.0,notemachine cash withdrawal <mdbremoved>,,spend,other_spend,False,ex38 8,2015-01-26,,1973.0,2015-01-26,361332,2020-03-12 00:34:00,barclays,current,2015-11-25,2018-04-30,True,64.739998,personal,cash,,cash,u,201510,-5720.017578,26249.878906,False,False,True,6590
839185,451044618,2018-01-15,461277,7.99,contactless payment - waterstones oxford gb,waterstones,spend,retail,False,ox4 1,2018-09-15,,1972.0,2018-09-16,1051844,2019-09-18 11:04:00,nationwide,current,2018-09-17,1900-01-01,True,7053.839844,waterstones,books / magazines / newspapers,,books / magazines / newspapers,c,201801,8417.492188,74618.742188,False,False,False,16460
839195,451044564,2018-01-15,461277,7.99,visa purchase amazon uk prime amzn.co.u - amazon uk prime amzn.co.u amzn.co.uk/pm lu,amazon prime,spend,services,False,ox4 1,2018-09-15,,1972.0,2018-09-16,1051844,2019-09-18 11:04:00,nationwide,current,2018-09-17,1900-01-01,True,7053.839844,amazon prime,enjoyment,,enjoyment,c,201801,8417.492188,74618.742188,False,False,True,16460


In [255]:
import re
import string

def clean_desc(df):
    """Removes extraneous characters that often create duplicates."""
    df = df.copy()
    number_mask = re.compile(r'x{2,}')            # used to mask card numbers
    common_suffixes = re.compile(r' - .{2,3}$')   # e.g. - vis, - p/p
    # Escape punctuation so that chars like ], \ and ^ are matched literally.
    punctuation = re.compile('[{}]+'.format(re.escape(string.punctuation)))
    multiple_spaces = re.compile(r'\s{2,}')
    separate_word_digits = re.compile(r'(?<=[a-zA-Z])(?=\d)')
    
    df['desc'] = (df.desc
                  .str.replace(common_suffixes, '', regex=True)
                  .str.replace(punctuation, ' ', regex=True)
                  .str.replace(number_mask, ' ', regex=True)
                  .str.replace(separate_word_digits, ' ', regex=True)
                  .str.replace(multiple_spaces, ' ', regex=True)
                  .str.strip())
    return df


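def partial_token(group):
    """Return fuzz.partial_ratio between shortest desc in group and all others."""
    group = group.copy()  # avoid mutating the caller's frame
    cols = list(group.columns)
    group['partial_token'] = np.nan
    items = [DescAndId(*item) for item in zip(group.desc, group.id)]
    shortest, *others = shortest_first(items)
    for o in others:
        group.loc[group.id == o.id, 'partial_token'] = fuzz.partial_ratio(shortest.desc, o.desc)
    # Note: cols already contains 'group', so it appears twice in the result.
    return group[['partial_token', 'group'] + cols]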


u = duplicates_sample(df, col_subset, n=3, seed=None).pipe(clean_desc).groupby('group').apply(partial_token)
u

Unnamed: 0,partial_token,group,id,date,user_id,amount,desc,merchant,tag_group,tag,user_female,user_postcode,user_registration_date,user_salary_range,user_yob,account_created,account_id,account_last_refreshed,account_provider,account_type,data_warehouse_date_created,data_warehouse_date_last_updated,debit,latest_balance,merchant_business_line,tag_auto,tag_manual,tag_up,updated_flag,ym,balance,income,savings,dup1,dup2,group.1
292967,,5201,204278826,2017-04-18,171677,3.9,1841 16apr 17 trainline london gb,trainline.com,spend,travel,True,b5 7,2014-10-23,,1988.0,2014-10-23,63127,2018-02-26 04:19:00,royal bank of scotland (rbs),current,2017-04-21,1900-01-01,True,3.99,trainline.com,public transport,,public transport,c,201704,-75.86998,17709.131037,False,False,False,5201
292968,67.0,5201,204278827,2017-04-18,171677,3.9,1841 15apr 17 northern rail ltd leeds 2987 gb,northern rail,spend,travel,True,b5 7,2014-10-23,,1988.0,2014-10-23,63127,2018-02-26 04:19:00,royal bank of scotland (rbs),current,2017-04-21,1900-01-01,True,3.99,northern rail,public transport,,public transport,c,201704,-75.86998,17709.131037,False,False,True,5201
503398,,8014,528975417,2019-03-25,353477,10.0,ee top up vesta cd 3540 deb,ee,spend,communication,False,b77 4,2016-08-01,,1986.0,2016-08-01,589439,2020-03-11 23:45:00,halifax personal banking,current,2019-03-27,1900-01-01,True,650.700012,ee,mobile,,mobile,c,201903,-43.310181,23365.869141,False,False,False,8014
503402,52.0,8014,527946242,2019-03-25,353477,10.0,lnk boots tamworth cd 3540 23mar 19 cpt,,spend,other_spend,False,b77 4,2016-08-01,,1986.0,2016-08-01,589439,2020-03-11 23:45:00,halifax personal banking,current,2019-03-25,1900-01-01,True,650.700012,personal,cash,,cash,c,201903,-43.310181,23365.869141,False,False,True,8014
1274343,100.0,21975,750185696,2017-11-20,579177,1.23,daily od fee 19 11,,spend,finance,False,fk9 5,2020-04-01,> 80k,1981.0,2020-04-02,1652258,2020-07-01 00:53:00,halifax personal banking,current,2020-04-03,1900-01-01,True,4.5,account provider,bank charges,,bank charges,c,201711,-864.120117,48382.90625,False,False,False,21975
1274347,,21975,750185695,2017-11-20,579177,1.23,daily od fee,,spend,finance,False,fk9 5,2020-04-01,> 80k,1981.0,2020-04-02,1652258,2020-07-01 00:53:00,halifax personal banking,current,2020-04-03,1900-01-01,True,4.5,account provider,bank charges,,bank charges,c,201711,-864.120117,48382.90625,False,False,True,21975


Issues/FP:
- Groups of three or more (e.g. daily overdraft fees): the shortest desc might not be the best comparator [1267567, 1267576, 1267577].
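
For groups with three or more rows, one alternative to comparing everything against the shortest description is to score every pair. The sketch below uses `itertools.combinations` with `difflib.SequenceMatcher`; `pairwise_scores` is a hypothetical helper, and the third example string is made up for illustration.

```python
import difflib
import itertools


def pairwise_scores(descs):
    """Return {(i, j): ratio} for every unordered pair of descriptions."""
    scores = {}
    for (i, a), (j, b) in itertools.combinations(enumerate(descs), 2):
        scores[(i, j)] = difflib.SequenceMatcher(None, a, b).ratio()
    return scores


# Second string appears in the sample output above; the third is invented.
group = ['daily od fee', 'daily od fee 19 11', 'daily od fee 20 11']
scores = pairwise_scores(group)
```

With all pairs scored, a group could be split into sub-clusters instead of being judged against a single comparator. How to threshold the scores remains an open choice.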

In [248]:
u.desc

420731                   mdbremoved 0277
420732                        mdbremoved
532585     marks spencer plc swindon orb
532591     marks spencer plc swindon orb
1294499               lnk lincolnshire c
1294500               lnk sainsburys ban
Name: desc, dtype: object

### Check whether each word in the shorter desc is in the longer desc

In [261]:
k = u[['group', 'desc']].iloc[:,1:]
k

Unnamed: 0,group,desc
292967,5201,1841 16apr 17 trainline london gb
292968,5201,1841 15apr 17 northern rail ltd leeds 2987 gb
503398,8014,ee top up vesta cd 3540 deb
503402,8014,lnk boots tamworth cd 3540 23mar 19 cpt
1274343,21975,daily od fee 19 11
1274347,21975,daily od fee


In [266]:
# Note: splitting on '' yields single characters; use .str.split() for words
k.desc.str.split('')

292967                                         [, 1, 8, 4, 1,  , 1, 6, a, p, r,  , 1, 7,  , t, r, a, i, n, l, i, n, e,  , l, o, n, d, o, n,  , g, b, ]
292968     [, 1, 8, 4, 1,  , 1, 5, a, p, r,  , 1, 7,  , n, o, r, t, h, e, r, n,  , r, a, i, l,  , l, t, d,  , l, e, e, d, s,  , 2, 9, 8, 7,  , g, b, ]
503398                                                           [, e, e,  , t, o, p,  , u, p,  , v, e, s, t, a,  , c, d,  , 3, 5, 4, 0,  , d, e, b, ]
503402                       [, l, n, k,  , b, o, o, t, s,  , t, a, m, w, o, r, t, h,  , c, d,  , 3, 5, 4, 0,  , 2, 3, m, a, r,  , 1, 9,  , c, p, t, ]
1274343                                                                                     [, d, a, i, l, y,  , o, d,  , f, e, e,  , 1, 9,  , 1, 1, ]
1274347                                                                                                       [, d, a, i, l, y,  , o, d,  , f, e, e, ]
Name: desc, dtype: object
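
Splitting on whitespace rather than on the empty string gives word tokens, which is what a word-containment check needs. The sketch below is one possible reading of the rule "every word of the shorter description appears in the longer one, as a word or as part of a word". `words_contained` is a hypothetical helper, shown on the three sample pairs from the table above.

```python
def words_contained(desc_a, desc_b):
    """True if every word of the shorter desc occurs in some word of the longer."""
    short, long_ = sorted((desc_a, desc_b), key=len)
    long_words = long_.split()
    return all(
        any(word in candidate for candidate in long_words)
        for word in short.split()
    )


pairs = [
    ('1841 16apr 17 trainline london gb',
     '1841 15apr 17 northern rail ltd leeds 2987 gb'),
    ('ee top up vesta cd 3540 deb',
     'lnk boots tamworth cd 3540 23mar 19 cpt'),
    ('daily od fee 19 11', 'daily od fee'),
]
flags = [words_contained(a, b) for a, b in pairs]
```

Under this reading only the `daily od fee` pair is flagged, which is exactly the kind of date-only difference that still needs a separate decision.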

In [229]:
fuzz.partial_token_set_ratio(u.iloc[0], u.iloc[1])

100

Duplicate if:
- All WORDS of the short desc appear in the long desc (as a WORD or as part of a WORD), e.g. (533726, 533730) and (58878, 58893), but not (510409, 510410).



- lowest same: .82 most, 72
- highest different: .71

Cases for decision:
- One desc is `<mdbremoved>` and the other, unmasked desc has a very low score -> treat as dup or not? (Could use a fuzz score of 100 to exclude.)
- `daily od fee _date_` txns differ only in the date and thus have a very high score, but are different txns -> exclude manually or handle automatically? Some other descs also contain differing dates. Could match dates and ensure they differ.
- remove `-`, `)` and `(`
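
The idea of matching dates and checking that they differ could look roughly like this. The pattern only covers the `12jul15` style (optionally with spaces, as in `16apr 17`) seen in the sample outputs and is an assumption, not an exhaustive date grammar. `DATE_RE`, `extract_dates` and `dates_differ` are hypothetical names.

```python
import re

# Matches tokens such as '12jul15' or '16apr 17' (day, month abbreviation, year).
DATE_RE = re.compile(
    r'\b(\d{1,2})\s?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s?(\d{2})\b'
)


def extract_dates(desc):
    """Return the set of (day, month, year) tuples found in desc."""
    return {(int(d), m, int(y)) for d, m, y in DATE_RE.findall(desc)}


def dates_differ(desc_a, desc_b):
    """True only if both descs contain dates and share none of them."""
    dates_a, dates_b = extract_dates(desc_a), extract_dates(desc_b)
    return bool(dates_a) and bool(dates_b) and dates_a.isdisjoint(dates_b)
```

A pair for which `dates_differ` returns `True` could then be kept even when its similarity score is high. Numeric-only dates such as `19 11` would need an additional pattern.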

In [101]:
from fuzzywuzzy import fuzz
from functools import partial

longest_first = partial(sorted, key=len, reverse=True)

for idx, data in dd.groupby('group'):
    longest, *others = longest_first(data.desc.values)
    print(longest)
    for other in others:
        print('   {}'.format(other))
        print('   {}'.format(fuzz.partial_ratio(longest, other)), end='\n\n')


lnk sk store, 44/4 cd 8050 12jul15
   sby tamworth cd 8050 13jul15
   68

lnk sk store, 44/4 cd 8050 13dec15
   lnk star news coto cd 9447 12dec15
   59

the boathouse bras cd 4720 deb
   tesco stores 6711 cd 4720 deb
   62

<mdbremoved>
   <mdbremoved>
   100

   <mdbremoved>
   100

<mdbremoved> xxxxxx xxxx5560
   <mdbremoved>
   100

bank credit <mdbremoved>
   bank credit <mdbremoved>
   100

xxxxxx xxxx0290 internet transfer
   xxxxxx xxxx8658 internet transfer
   88

32 red cd 7512 deb
   32 red cd 7512 deb
   100

non-stg purch fee cd 6710 deb
   non-stg purch fee cd 6710 deb
   100

   non-stg purch fee cd 6710 deb
   100

   non-stg purch fee cd 6710 deb
   100

card payment to iz *canopy market,2.00 gbp, rate 1.00/gbp on 10-07-2020
   card payment to iz *crosstown,2.00 gbp, rate 1.00/gbp on 10-07-2020
   85

