# Suptech Framework Detection of Illegal Payment System Service Provider App in Indonesia Listed in Google PlayStore

[By: Rafi Salman](https://www.linkedin.com/in/rafisalman/)

### Scrape payment related app from Google Play Store using google_play_scraper

First, we need to obtain a list of payment-related apps on the Google Play Store that are available for download in Indonesia.

In [1]:
import pandas as pd
import yake
from google_play_scraper import search
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

In [2]:
def crawled_ps(search_keyword):
    """Search Google Play for each keyword and return all results in one DataFrame."""
    frames = []
    for keyword in search_keyword:
        result = search(
            keyword,
            lang="en",     # Most apps have a default description in English (en)
            country="id",  # Available for download in Indonesia (id)
            n_hits=30,     # Defaults to 30 (= Google's maximum)
        )
        frames.append(pd.DataFrame(result))
    # Concatenate once at the end; an empty keyword list yields an empty DataFrame
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

In [3]:
search_keyword = ['wallet', 'dompet digital']

In [4]:
# Store the dataset as a DataFrame in the variable playstore
playstore = crawled_ps(search_keyword)

In [5]:
# Sample of the app details stored in the playstore DataFrame
playstore.head()

Unnamed: 0,appId,icon,screenshots,title,score,genre,price,free,currency,video,videoImage,description,descriptionHTML,developer,installs
0,com.droid4you.application.wallet,https://play-lh.googleusercontent.com/DqAKT8mJ...,[https://play-lh.googleusercontent.com/7K9xn0C...,Wallet: Budget Expense Tracker,4.89561,Finance,0,True,IDR,,,<b>Wallet is a market-leading personal finance...,<b>Wallet is a market-leading personal finance...,BudgetBakers.com,"5,000,000+"
1,com.wallet.crypto.trustapp,https://play-lh.googleusercontent.com/-3uTwEsZ...,[https://play-lh.googleusercontent.com/HOAibhB...,Trust: Crypto & Bitcoin Wallet,4.705442,Finance,0,True,IDR,,,Trust Wallet is the official crypto wallet of ...,Trust Wallet is the official crypto wallet of ...,"DApps Platform, Inc.","10,000,000+"
2,com.airtm.android,https://play-lh.googleusercontent.com/sTc5uAZ8...,[https://play-lh.googleusercontent.com/NSj16yx...,Airtm,3.8,Finance,0,True,IDR,https://www.youtube.com/embed/rsN3CTtU_FA?ps=p...,https://i.ytimg.com/vi/rsN3CTtU_FA/hqdefault.jpg,"Access your money when you want, wherever you ...","Access your money when you want, wherever you ...","Airtm, Inc.","1,000,000+"
3,io.metamask,https://play-lh.googleusercontent.com/8rzHJpfk...,[https://play-lh.googleusercontent.com/_bBbQOO...,MetaMask - Blockchain Wallet,4.455304,Finance,0,True,IDR,https://www.youtube.com/embed/YVgfHZMFFFQ?ps=p...,https://i.ytimg.com/vi/YVgfHZMFFFQ/hqdefault.jpg,Whether you are an experienced user or brand n...,Whether you are an experienced user or brand n...,MetaMask Web3 Wallet,"10,000,000+"
4,org.toshi,https://play-lh.googleusercontent.com/wrgUujbq...,[https://play-lh.googleusercontent.com/rCqX7hd...,Coinbase Wallet: NFTs & Crypto,3.178571,Finance,0,True,IDR,,,Coinbase Wallet is your key to what’s next in ...,Coinbase Wallet is your key to what’s next in ...,Coinbase Wallet,"5,000,000+"


In [6]:
# Details of the crawled app list, including the total number of entries
playstore.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58 entries, 0 to 57
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   appId            58 non-null     object 
 1   icon             58 non-null     object 
 2   screenshots      58 non-null     object 
 3   title            58 non-null     object 
 4   score            58 non-null     float64
 5   genre            58 non-null     object 
 6   price            58 non-null     int64  
 7   free             58 non-null     bool   
 8   currency         58 non-null     object 
 9   video            22 non-null     object 
 10  videoImage       22 non-null     object 
 11  description      58 non-null     object 
 12  descriptionHTML  58 non-null     object 
 13  developer        58 non-null     object 
 14  installs         58 non-null     object 
dtypes: bool(1), float64(1), int64(1), object(12)
memory usage: 6.5+ KB


Since different keywords might return the same app, we remove the duplicated app details.

In [7]:
# Drop duplicated app details
playstore.drop_duplicates(subset="appId", keep='first', inplace=True)
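
The collect-then-deduplicate steps above can also be sketched without pandas. This is a minimal illustration, not the notebook's implementation: `collect_unique_apps` and `fake_search` are hypothetical names, and `fake_search` is a stand-in for the real search call so that the sketch runs offline.

```python
from typing import Callable, Iterable


def collect_unique_apps(keywords: Iterable[str],
                        search_fn: Callable[[str], list]) -> list:
    """Run search_fn for each keyword and keep the first record per appId."""
    seen = set()
    unique_apps = []
    for keyword in keywords:
        for app in search_fn(keyword):
            if app["appId"] not in seen:
                seen.add(app["appId"])
                unique_apps.append(app)
    return unique_apps


# Hypothetical stand-in for google_play_scraper.search (made-up app IDs).
def fake_search(keyword: str) -> list:
    catalogue = {
        "wallet": [{"appId": "a.one"}, {"appId": "b.two"}],
        "dompet digital": [{"appId": "b.two"}, {"appId": "c.three"}],
    }
    return catalogue.get(keyword, [])


apps = collect_unique_apps(["wallet", "dompet digital"], fake_search)
print([app["appId"] for app in apps])
```

With the real library, `search_fn` could be something like `lambda kw: search(kw, lang="en", country="id", n_hits=30)`. Keeping the first record per `appId` mirrors `keep='first'` in the cell above.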

In [8]:
# The total number of entries may decrease because duplicated entries were dropped
playstore.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 53 entries, 0 to 57
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   appId            53 non-null     object 
 1   icon             53 non-null     object 
 2   screenshots      53 non-null     object 
 3   title            53 non-null     object 
 4   score            53 non-null     float64
 5   genre            53 non-null     object 
 6   price            53 non-null     int64  
 7   free             53 non-null     bool   
 8   currency         53 non-null     object 
 9   video            19 non-null     object 
 10  videoImage       19 non-null     object 
 11  description      53 non-null     object 
 12  descriptionHTML  53 non-null     object 
 13  developer        53 non-null     object 
 14  installs         53 non-null     object 
dtypes: bool(1), float64(1), int64(1), object(12)
memory usage: 6.3+ KB


### Upload the List of Legal Payment System Service Provider Companies

Next, we upload the list of payment system service provider companies registered with Bank Indonesia.

The list can be found [here](https://www.bi.go.id/PJSPQRIS/default.aspx)

In [9]:
# Upload the list
legal_bi = pd.read_csv('legal_BI.csv')

In [10]:
# Example of legal PSPs in the dataset; the DataFrame was preprocessed before being uploaded
legal_bi[:5]

Unnamed: 0,organization,product,address,telephone,category,decision_number,decision_date,permit_place,operational_date,description,status,barcode,email,website
0,"PT Pan Indonesia Bank, Tbk (PT Bank Panin)",,"Gedung Panin Centre Lt. 1-2, Jl. Jend. Sudirma...",,QRIS,24/303/DKSP/Srt/B,8/8/2022,Departemen Kebijakan Sistem Pembayaran,,QRIS Operators,Berizin (Belum Beroperasinal),,,
1,Lembaga Pelatihan Kerja Alfamart Learning Center,,"Alfa Tower Lantai 19, Alamat Sutera, Jl. Jalur...",,Job Training Institute,24/159/DPSP-GESK/Srt/B,7/27/2022,Jakarta,,Job Training Institute,Berizin (Telah Operasional),,,
2,PT Citra Abdi Valasindo,,"Wisma Bumiputera Lt. 2/M, Jl. Jendral Sudirman...",,Non Bank Money Changer Operators,24/37/KEP.PBI/Jkt/2022,7/8/2022,KPwBI Provinsi DKI Jakarta,7/11/2022,Non Bank Money Changer Operators,Berizin (Telah Operasional),1424.48-000/Jkt,,
3,PT Dewata Inter Valasindo,,"Jl. Bulungan I No. 64, Gedung Ayam Bulungan Lt...",,Non Bank Money Changer Operators,24/39/KEP.PBI/Jkt/2022,7/8/2022,KPwBI Provinsi DKI Jakarta,7/18/2022,Non Bank Money Changer Operators,Berizin (Telah Operasional),1423.47-000/Jkt,,
4,PT Luxury Valuta Perkasa,,"Jalan Panglima Polim Raya No. 105, Kel. Melawa...",,Non Bank Money Changer Operators,24/40/KEP.PBI/Jkt/2022,7/8/2022,KPwBI Provinsi DKI Jakarta,7/12/2022,Non Bank Money Changer Operators,Berizin (Telah Operasional),1422.46-001/Jkt,,


### Using NLP Fuzzy String Matching to Match Developer of Each App with Organization Listed in Bank Indonesia

We will use the Fuzzywuzzy library for this part and for later steps that need to match values between the datasets.

* Fuzzywuzzy is a Python library that uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

* We need fuzzy matching because the company or developer listed in **playstore['developer']** might be named slightly differently from the one listed in **legal_bi['organization']**, even though both are the same entity.
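
To make the idea of edit distance concrete, here is a minimal pure-Python sketch of the Levenshtein distance. It is only an illustration of the concept: the library's 0-100 scores are normalized similarity ratios rather than raw distances, and the function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# The two spellings differ only by the dot after "PT".
print(levenshtein('PT. Sprint Asia Technology', 'PT Sprint Asia Technology'))  # 1
```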

Fuzzywuzzy offers several methods for comparing two strings. We will test each method and pick the one best suited to our data.

`ratio` compares the similarity of the entire strings, in order.

In [11]:
# fuzz.ratio(playstore['developer'], legal_bi['organization'])
fuzz.ratio('PT. Sprint Asia Technology', 'PT Sprint Asia Technology')

98

In [12]:
fuzz.ratio('PT. Nusa Satu Inti Artha (DOKU)', 'PT Nusa Satu Inti Artha')

85

In [13]:
fuzz.ratio('PT.BANK PEMBANGUNAN DAERAH JAWA BARAT & BANTEN,TBK', 'PT Bank Jabar dan Banten')

24

Even though we know that **'PT.BANK PEMBANGUNAN DAERAH JAWA BARAT & BANTEN,TBK'** and **'PT Bank Jabar dan Banten'** are the same entity, `ratio` scores them at only 24 out of 100.

It turns out that the naive approach is far too sensitive to minor differences in word order, missing or extra words, and other such issues.

Next we use `partial_ratio`, which compares partial string similarity, on the same data pairs.

In [14]:
# fuzz.partial_ratio(playstore['developer'], legal_bi['organization'])
fuzz.partial_ratio('PT. Sprint Asia Technology', 'PT Sprint Asia Technology')

96

In [15]:
fuzz.partial_ratio('PT. Nusa Satu Inti Artha (DOKU)', 'PT Nusa Satu Inti Artha')

96

In [16]:
fuzz.partial_ratio('PT.BANK PEMBANGUNAN DAERAH JAWA BARAT & BANTEN,TBK', 'PT Bank Jabar dan Banten')

21

For this dataset, `partial_ratio` improves the score for one pair (from 85 to 96) but slightly lowers it for the other two.

The next method is `token_sort_ratio`, which ignores word order.

In [17]:
# fuzz.token_sort_ratio(playstore['developer'], legal_bi['organization'])
fuzz.token_sort_ratio('PT. Sprint Asia Technology', 'PT Sprint Asia Technology')

100

In [18]:
fuzz.token_sort_ratio('PT. Nusa Satu Inti Artha (DOKU)', 'PT Nusa Satu Inti Artha')

90

In [19]:
fuzz.token_sort_ratio('PT.BANK PEMBANGUNAN DAERAH JAWA BARAT & BANTEN,TBK', 'PT Bank Jabar dan Banten')

61

This method gives the best results so far. Let's take a look at the final method.

`token_set_ratio` ignores duplicated words. It is similar to `token_sort_ratio`, but a little more flexible.

In [20]:
# fuzz.token_set_ratio(playstore['developer'], legal_bi['organization'])
fuzz.token_set_ratio('PT. Sprint Asia Technology', 'PT Sprint Asia Technology')

100

In [21]:
fuzz.token_set_ratio('PT. Nusa Satu Inti Artha (DOKU)', 'PT Nusa Satu Inti Artha')

100

In [22]:
fuzz.token_set_ratio('PT.BANK PEMBANGUNAN DAERAH JAWA BARAT & BANTEN,TBK', 'PT Bank Jabar dan Banten')

74

`token_set_ratio` gives the highest score on all three test pairs, so it looks like the best fit for our dataset. Based on this, we apply `token_set_ratio` for the developer/organization matching.
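
To see why a token-set comparison is forgiving of extra words, the idea can be sketched with the standard library only. This is an approximation under stated assumptions, not Fuzzywuzzy's own code: `token_set_sketch` is a hypothetical name, tokens are split on anything that is not an ASCII letter or digit, and `difflib.SequenceMatcher` supplies the similarity ratio, so scores may differ slightly from the library's.

```python
import re
from difflib import SequenceMatcher


def _tokens(text: str) -> set:
    # Lowercase and split on anything that is not an ASCII letter or digit.
    return set(re.sub(r"[^0-9a-z]+", " ", text.lower()).split())


def _ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_sketch(s1: str, s2: str) -> int:
    t1, t2 = _tokens(s1), _tokens(s2)
    shared = " ".join(sorted(t1 & t2))   # words both names have
    only_1 = " ".join(sorted(t1 - t2))   # words only in the first name
    only_2 = " ".join(sorted(t2 - t1))   # words only in the second name
    combined_1 = (shared + " " + only_1).strip()
    combined_2 = (shared + " " + only_2).strip()
    # The best of the three comparisons wins, so a name whose words are a
    # subset of the other name's words scores 100.
    return max(_ratio(shared, combined_1),
               _ratio(shared, combined_2),
               _ratio(combined_1, combined_2))


print(token_set_sketch('PT. Nusa Satu Inti Artha (DOKU)', 'PT Nusa Satu Inti Artha'))  # 100
```

Because every word of 'PT Nusa Satu Inti Artha' also appears in the longer name, the shared-words string equals one of the combined strings and the score is 100, regardless of the extra "(DOKU)".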

We then create a function that performs the matching and fills two lists, one for the listed and one for the unlisted values of a given app attribute (developer, title, etc.).

* Based on the testing above, the default scorer of the function is `token_set_ratio`
* The default threshold is **74**: the best match must score above 74 for the two names to be treated as the same entity

In [23]:
def check_listed(match_list, source_list, unlisted_list, listed_list, method=fuzz.token_set_ratio, ratio_score=74):
    for name in match_list:
        result = process.extractOne(name, source_list, scorer=method)
        ratio = int(result[1])
        if ratio <= ratio_score:
            unlisted_list.append(name)
        else:
            listed_list.append(name)
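
The same thresholding logic can be sketched as a function that returns its two lists instead of appending to lists passed in by the caller. This is an illustrative variant, not the notebook's function: `split_by_threshold` and `simple_ratio` are hypothetical names, and `simple_ratio` is a standard-library stand-in scorer rather than a Fuzzywuzzy scorer.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    # Stand-in scorer for illustration; not Fuzzywuzzy's implementation.
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def split_by_threshold(names, references, scorer=simple_ratio, threshold=74):
    """Return (listed, unlisted); a name is listed only if its best score exceeds threshold."""
    listed, unlisted = [], []
    for name in names:
        # Best score of this name against every reference (0 if there are none)
        best = max((scorer(name, ref) for ref in references), default=0)
        (listed if best > threshold else unlisted).append(name)
    return listed, unlisted


listed, unlisted = split_by_threshold(
    ["PT. Sprint Asia Technology", "BudgetBakers.com"],
    ["PT Sprint Asia Technology", "PT Nusa Satu Inti Artha"],
)
print(listed, unlisted)
```

As in `check_listed`, the comparison is strict: a best score exactly equal to the threshold, such as the 74 seen for the Bank Jabar pair above, still counts as unlisted.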

### Checking if the developer company is listed on the legal_PSP list

In [24]:
# First, we create a new DataFrame from the crawled playstore dataset to keep the original safe
result_df = playstore[['title', 'developer', 'description', 'installs']].copy()

# We only select the few relevant columns that we can use for later analysis

In [25]:
unlisted_developer = []
listed_developer = []

# Call check_listed function on developer name
check_listed(result_df['developer'], 
             legal_bi['organization'], 
             unlisted_developer, 
             listed_developer
            )

#### Program to manually check ratio between matched value
for name in result_df['developer']:
    result = process.extractOne(name, legal_bi['organization'], scorer=fuzz.token_set_ratio)
    ratio = int(result[1])
    # Print only the developers whose best match exceeds the threshold
    if ratio > 74:
        print(name, result)

In [26]:
# Sample of developers from the crawled dataset that are NOT listed in Bank Indonesia
unlisted_developer[:10]

['BudgetBakers.com',
 'DApps Platform, Inc.',
 'Airtm, Inc.',
 'MetaMask Web3 Wallet',
 'Coinbase Wallet',
 'WEMIX PTE. LTD.',
 'BitKeep Global Inc',
 'Hellenic Republic',
 'Wallet Cards Alliance',
 'Exodus Movement, Inc.']

In [27]:
# Sample of developers from the crawled dataset that are listed in Bank Indonesia
listed_developer[:10]

['PT Espay Debit Indonesia Koe',
 'PT. Nusa Satu Inti Artha (DOKU)',
 'PT Fintek Karya Nusantara',
 'PT Visionet Internasional',
 'PT Fliptech Lentera Inspirasi Pertiwi',
 'Inti Dunia Sukses, PT',
 'Dwijaya Mandiri Sejahtera Indonesia',
 'ALFA TECHNOLOGY',
 'PayoApp Indonesia',
 'PT Visionet Internasional']

We then create a new column for this result, where developers listed in legal_bi['organization'] are set to True.

In [28]:
# Create new column named 'dev_status' where listed developer's app will have True value
result_df['dev_status'] = result_df['developer'].isin(listed_developer)

In [29]:
# Current state of our result_df DataFrame: dev_status = False means the developer of the app is not
# a listed organization in the legal_bi dataset
result_df

Unnamed: 0,title,developer,description,installs,dev_status
0,Wallet: Budget Expense Tracker,BudgetBakers.com,<b>Wallet is a market-leading personal finance...,"5,000,000+",False
1,Trust: Crypto & Bitcoin Wallet,"DApps Platform, Inc.",Trust Wallet is the official crypto wallet of ...,"10,000,000+",False
2,Airtm,"Airtm, Inc.","Access your money when you want, wherever you ...","1,000,000+",False
3,MetaMask - Blockchain Wallet,MetaMask Web3 Wallet,Whether you are an experienced user or brand n...,"10,000,000+",False
4,Coinbase Wallet: NFTs & Crypto,Coinbase Wallet,Coinbase Wallet is your key to what’s next in ...,"5,000,000+",False
5,DANA Indonesia Digital Wallet,PT Espay Debit Indonesia Koe,DANA is Indonesia's digital wallet that can be...,"50,000,000+",True
6,DOKU,PT. Nusa Satu Inti Artha (DOKU),"DOKU, layanan dompet digital yang membantu sip...","1,000,000+",True
7,WEMIX Wallet,WEMIX PTE. LTD.,"<font color=""#813ccc""><b>Integration with Vari...","1,000,000+",False
8,BitKeep: Crypto DeFi Wallet,BitKeep Global Inc,BitKeep is one of the top DeFi multi-chain cry...,"100,000+",False
9,Gov.gr Wallet,Hellenic Republic,Το Gov.gr Wallet είναι η επίσημη ελληνική εφαρ...,"500,000+",False


### Checking if the app is listed on the legal_bi dataset

After we check whether the developer is already listed or not, next we need to further investigate if the app itself is already listed in Bank Indonesia.

* We need to determine which Fuzzy String Matching method is best suited to the app names/titles in our dataset.

* Based on the previous findings, we narrow it down to two methods: `token_sort_ratio` and `token_set_ratio`

We will try the `token_sort_ratio` method first.

In [30]:
fuzz.token_sort_ratio('LinkAja / LinkAja Syariah', 'LinkAja')

47

In [31]:
fuzz.token_sort_ratio('Jeton Wallet', 'One Wallet')

82

It returns a poor score (47) for the same app under a different name on the Play Store, far below our initial minimum score, while two different apps score 82.

Next we try `token_set_ratio` again.

In [32]:
fuzz.token_set_ratio('LinkAja / LinkAja Syariah', 'LinkAja')

100

In [33]:
fuzz.token_set_ratio('Jeton Wallet', 'One Wallet')

82

While it finally returns an excellent score for the LinkAja name matching, it still gives a completely different app a score (82) that is higher than our initial threshold.

So for the app name matching, we will still use the `token_set_ratio` method, but with a higher minimum ratio score.
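
One way to reason about the new threshold is to look for the values that separate the two `token_set_ratio` scores observed above. The sketch below uses only those two scores and the same strict rule as `check_listed` (a name is accepted only if its score is above the threshold); `separating_thresholds` is a hypothetical helper name.

```python
# (score, same_app) pairs taken from the two title comparisons above.
labelled_scores = [
    (100, True),   # 'LinkAja / LinkAja Syariah' vs 'LinkAja'
    (82, False),   # 'Jeton Wallet' vs 'One Wallet'
]


def separating_thresholds(pairs):
    """Thresholds t (0-100) for which score > t holds exactly for the true matches."""
    return [t for t in range(101)
            if all((score > t) == same for score, same in pairs)]


valid = separating_thresholds(labelled_scores)
print(valid[0], valid[-1])  # 82 99
```

With only two labelled pairs this range is a rough guide at best. A larger hand-checked sample of titles would give a more trustworthy threshold.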

In [34]:
unlisted_app = []
listed_app = []

check_listed(result_df['title'], 
             legal_bi['product'], 
             unlisted_app, 
             listed_app,
             ratio_score = 83
            )

In [36]:
# Sample of apps from the crawled dataset that are NOT LISTED in legal_bi['product']
unlisted_app[:10]

['Wallet: Budget Expense Tracker',
 'Trust: Crypto & Bitcoin Wallet',
 'Airtm',
 'MetaMask - Blockchain Wallet',
 'Coinbase Wallet: NFTs & Crypto',
 'WEMIX Wallet',
 'BitKeep: Crypto DeFi Wallet',
 'Gov.gr Wallet',
 'Wallet Cards | Digital Wallet',
 'Exodus: Crypto Bitcoin Wallet']

In [37]:
# Sample of apps from the crawled dataset that are listed in legal_bi['product']
listed_app[:10]

['DANA Indonesia Digital Wallet',
 'DOKU',
 'LinkAja / LinkAja Syariah',
 'ONE Wallet - Empty your wallet',
 'OVO',
 'Flip: Transfer Without Admin',
 'i.saku',
 'OVO Merchant',
 'Sakuku',
 'Dipay']

In [38]:
# Create a new column named 'app_status' where apps listed in legal_bi['product'] are set to True
result_df['app_status'] = result_df['title'].isin(listed_app)

In [39]:
# Current state of our result_df DataFrame: app_status = False means the app is not
# a listed product/app in the legal_bi dataset
result_df

Unnamed: 0,title,developer,description,installs,dev_status,app_status
0,Wallet: Budget Expense Tracker,BudgetBakers.com,<b>Wallet is a market-leading personal finance...,"5,000,000+",False,False
1,Trust: Crypto & Bitcoin Wallet,"DApps Platform, Inc.",Trust Wallet is the official crypto wallet of ...,"10,000,000+",False,False
2,Airtm,"Airtm, Inc.","Access your money when you want, wherever you ...","1,000,000+",False,False
3,MetaMask - Blockchain Wallet,MetaMask Web3 Wallet,Whether you are an experienced user or brand n...,"10,000,000+",False,False
4,Coinbase Wallet: NFTs & Crypto,Coinbase Wallet,Coinbase Wallet is your key to what’s next in ...,"5,000,000+",False,False
5,DANA Indonesia Digital Wallet,PT Espay Debit Indonesia Koe,DANA is Indonesia's digital wallet that can be...,"50,000,000+",True,True
6,DOKU,PT. Nusa Satu Inti Artha (DOKU),"DOKU, layanan dompet digital yang membantu sip...","1,000,000+",True,True
7,WEMIX Wallet,WEMIX PTE. LTD.,"<font color=""#813ccc""><b>Integration with Vari...","1,000,000+",False,False
8,BitKeep: Crypto DeFi Wallet,BitKeep Global Inc,BitKeep is one of the top DeFi multi-chain cry...,"100,000+",False,False
9,Gov.gr Wallet,Hellenic Republic,Το Gov.gr Wallet είναι η επίσημη ελληνική εφαρ...,"500,000+",False,False


### Check Whether the App Has Already Been Listed on the Legal Lists of Other Regulatory Organizations

Indonesia has several regulatory organizations, so we also check whether each app's developer is listed with the other regulators (OJK and Bappebti).

In [40]:
# Import List of Entities Listed in OJK
legal_ojk = pd.read_csv('legal_ojk.csv')
# Show sample of legal entities in OJK
legal_ojk.head()

Unnamed: 0,organization,prev_organization,address
0,PT Ajaib Sekuritas Asia,PT Primasia Unggul Sekuritas PT Primasia Secur...,"Neo Soho @ Podomoro City Lt. 30-03, Jl. Letje..."
1,PT Aldiracita Sekuritas Indonesia,PT Aldiracita Corpotama,"Menara Tekno Lantai 9, Jl. Fachrudin No. 19, ..."
2,PT Amantara Sekuritas Indonesia,PT Amantara Securities\nPT Kwik Tjandra Martoa...,"Sinar Mas Land Plaza Menara 3 Lantai 11, Jl. M..."
3,PT Anugerah Sekuritas Indonesia,PT Anugerah Securindo Indah,"Komp. Ruko Cempaka Mas Blok M No. 1,2,3 Cempa..."
4,PT Artha Sekuritas Indonesia,PT Artha Securities Indonesia,"Rukan Mangga Dua Square Blok F No.40, Jalan Gu..."


In [41]:
listed_ojk = []
unlisted_ojk = []

check_listed(result_df['developer'], legal_ojk['organization'], unlisted_ojk, listed_ojk)

In [42]:
# Create a new column named 'listed_ojk' that is True if the developer/organization is listed in the legal_ojk list
result_df['listed_ojk'] = result_df['developer'].isin(listed_ojk)
result_df.head()

Unnamed: 0,title,developer,description,installs,dev_status,app_status,listed_ojk
0,Wallet: Budget Expense Tracker,BudgetBakers.com,<b>Wallet is a market-leading personal finance...,"5,000,000+",False,False,False
1,Trust: Crypto & Bitcoin Wallet,"DApps Platform, Inc.",Trust Wallet is the official crypto wallet of ...,"10,000,000+",False,False,False
2,Airtm,"Airtm, Inc.","Access your money when you want, wherever you ...","1,000,000+",False,False,False
3,MetaMask - Blockchain Wallet,MetaMask Web3 Wallet,Whether you are an experienced user or brand n...,"10,000,000+",False,False,False
4,Coinbase Wallet: NFTs & Crypto,Coinbase Wallet,Coinbase Wallet is your key to what’s next in ...,"5,000,000+",False,False,False


In [43]:
# Import List of Entities Listed in Bappebti
legal_bappebti = pd.read_csv('legal_BAPPEBTI.csv')
# Show sample of legal entities in Bappebti
legal_bappebti.head()

Unnamed: 0,organization,reg_number,permit_date,web,address
0,PT TUMBUH BERSAMA NANO,007/BAPPEBTI/CP-AK/03/2022,3/22/2022 0:00,nanovest.io,"Sinarmas MSIG Tower Lt. 37, Jl. Jend Sudirman ..."
1,PT. KAGUM TEKNOLOGI INDONESIA,008/BAPPEBTI/CP-AK/04/2022,4/22/2022 0:00,www.ptkagumteknologiindonesia.com,"Neo Soho @Podomoro City, Jalan Letnan Jenderal..."
2,PT. ASET DIGITAL BERKAT,001/BAPPEBTI/CP-AK/11/2019,11/21/2019 0:00,www.tokocrypto.com,"AXA TOWER, JL. PROFESOR DOKTOR SATRIO, KAV. 18..."
3,PT. ASET DIGITAL INDONESIA,005/BAPPEBTI/CP-AK/02/2022,2/11/2022 0:00,www.incrypto.co.id,"AXA Tower Lantai 36 Suite 02, Jalan Prof. Dr. ..."
4,PT. BUMI SANTOSA CEMERLANG,012/BAPPEBTI/CP-AK/4/2022,4/28/2022 0:00,https://pluang.com/produk/pluang-crypto,"Ruko Inkopal KTC Blok F No. 42, Jl. Boulevard ..."


In [44]:
listed_bappebti = []
unlisted_bappebti = []

check_listed(result_df['developer'], legal_bappebti['organization'], unlisted_bappebti, listed_bappebti)

In [45]:
# Create a new column named 'listed_bappebti' that is True if the developer/organization is listed in the legal_bappebti list
result_df['listed_bappebti'] = result_df['developer'].isin(listed_bappebti)

In [46]:
result_df

Unnamed: 0,title,developer,description,installs,dev_status,app_status,listed_ojk,listed_bappebti
0,Wallet: Budget Expense Tracker,BudgetBakers.com,<b>Wallet is a market-leading personal finance...,"5,000,000+",False,False,False,False
1,Trust: Crypto & Bitcoin Wallet,"DApps Platform, Inc.",Trust Wallet is the official crypto wallet of ...,"10,000,000+",False,False,False,False
2,Airtm,"Airtm, Inc.","Access your money when you want, wherever you ...","1,000,000+",False,False,False,False
3,MetaMask - Blockchain Wallet,MetaMask Web3 Wallet,Whether you are an experienced user or brand n...,"10,000,000+",False,False,False,False
4,Coinbase Wallet: NFTs & Crypto,Coinbase Wallet,Coinbase Wallet is your key to what’s next in ...,"5,000,000+",False,False,False,False
5,DANA Indonesia Digital Wallet,PT Espay Debit Indonesia Koe,DANA is Indonesia's digital wallet that can be...,"50,000,000+",True,True,False,False
6,DOKU,PT. Nusa Satu Inti Artha (DOKU),"DOKU, layanan dompet digital yang membantu sip...","1,000,000+",True,True,False,False
7,WEMIX Wallet,WEMIX PTE. LTD.,"<font color=""#813ccc""><b>Integration with Vari...","1,000,000+",False,False,False,False
8,BitKeep: Crypto DeFi Wallet,BitKeep Global Inc,BitKeep is one of the top DeFi multi-chain cry...,"100,000+",False,False,False,False
9,Gov.gr Wallet,Hellenic Republic,Το Gov.gr Wallet είναι η επίσημη ελληνική εφαρ...,"500,000+",False,False,False,False


### Look for Important Keywords in the App Description Using Natural Language Processing (NLP) Keyword Extraction

Next we extract important keywords from each app description, so that we can later filter out apps whose keywords are unrelated to payment system supervision.

* We create a function called `get_keywords` that extracts the important keywords. For this project we limit it to a maximum of 10 keywords to keep processing light.

In [47]:
def get_keywords(text, language='en', max_ngram_size=2, deduplication_threshold=0.5,
                 deduplication_algo='seqm', windowSize=1, numOfKeywords=10):
    """Return the top YAKE keywords of text as one space-separated string."""
    kw_extractor = yake.KeywordExtractor(lan=language,
                                         n=max_ngram_size,
                                         dedupLim=deduplication_threshold,
                                         dedupFunc=deduplication_algo,
                                         windowsSize=windowSize,
                                         top=numOfKeywords,
                                         features=None)
    # extract_keywords returns (keyword, score) pairs; only the keyword text is kept
    keywords = kw_extractor.extract_keywords(text)
    # Each keyword is followed by a space, so the result keeps a trailing space
    return "".join(word + " " for word, _score in keywords)

In [48]:
# The keywords will be stored in the list keyword_list
keyword_list = []

for desc in result_df['description']:
    keyword_list.append(get_keywords(desc))

In [49]:
# Sample of keywords list being extracted
keyword_list[:5]

['Wallet budget finances money spending save track bank synchronization personal finance accounts ',
 'wallet support Trust Wallet Wallet support Trust Binance Wallet crypto Binance Bitcoin Ethereum ',
 'money Access convert Receive local currency Airtm Receive payments wallets United States ',
 'decentralized web MetaMask web digital assets experienced user digital assets blockchain internet wallet ',
 'Coinbase Wallet Wallet crypto self-custody wallet Bitcoin Cash Ether Classic decentralized control Secure Element receive Bitcoin ']

Now we have a list of the top keywords extracted from the description of each app we crawled earlier.

### Filter Unrelated Keywords That Are Most Likely Not Under Payment Systems

From the sample keywords above, we can see that some of the apps (crypto wallets, spending trackers, etc.) are not actually e-wallet or electronic money products, which are what fall under the supervision of Bank Indonesia.

So we create a list of keywords that are unrelated to what we are trying to detect.

In [50]:
unrelated_keywords = ['crypto wallet', 'Bitcoin', 'cryptocurrency', 'Ethereum', 
                      'crypto', 'Shiba Inu', 'Dogecoin', 'Solana', 'blockchain', 
                      'XRP', 'NFT', 'track spending', 'kripto' ]

### Set Up Relevancy Score Based on the Extracted Keywords

After defining the list of potentially unrelated keywords, we do not simply drop every app whose description contains one of them. Instead, we build a scoring system called the relevancy score, which measures how closely an app's extracted keywords match the unrelated keywords.

* The lower the score, the less likely the app is out of scope for payment system supervision.

In [51]:
def relevancy(payment_keywords, unrelated_keywords):
    total_score = 0
    for key in unrelated_keywords:
        result = fuzz.token_set_ratio(key,payment_keywords)
        total_score += result
    total_score /= len(unrelated_keywords)
    total_score = round(total_score,2)
    return total_score
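
To make the averaging in `relevancy` concrete, here is a small self-contained sketch. It swaps `fuzz.token_set_ratio` for a deliberately crude stand-in scorer (`contains_all_tokens`, a hypothetical name) that returns either 100 or 0, and it uses made-up input text, so the arithmetic is easy to follow by hand.

```python
def contains_all_tokens(keyword: str, text: str) -> int:
    # Hypothetical stand-in scorer: 100 if every word of keyword occurs in text, else 0.
    text_tokens = set(text.lower().split())
    return 100 if set(keyword.lower().split()) <= text_tokens else 0


def average_score(text, unrelated, scorer=contains_all_tokens):
    # Same shape as relevancy(): sum the per-keyword scores, then take the mean.
    total = sum(scorer(key, text) for key in unrelated)
    return round(total / len(unrelated), 2)


# Two of the four unrelated keywords are present, so (100 + 100 + 0 + 0) / 4 = 50.0
print(average_score("crypto wallet bitcoin exchange", ["crypto", "Bitcoin", "NFT", "kripto"]))
```

Because the score is a mean over the whole unrelated-keyword list, adding keywords to that list changes every app's score, so scores from different keyword lists are not directly comparable.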

In [52]:
relevancy_score = []

for keywords in keyword_list:
    relevancy_scr = relevancy(keywords, unrelated_keywords)
    relevancy_score.append(relevancy_scr)

We then add relevancy_score to the result_df DataFrame to see how relevant each app is to payment system supervision.

In [53]:
result_df['relevancy_score'] = relevancy_score

We sort the list of apps so that the most relevant apps, those with the lowest scores, come first.

In [54]:
# Sort dataframe in ascending order based on relevancy score. Higher score means it's less relevant
result_df = result_df.sort_values('relevancy_score')

In [55]:
# Sample of our current final dataframe
result_df.head()

Unnamed: 0,title,developer,description,installs,dev_status,app_status,listed_ojk,listed_bappebti,relevancy_score
9,Gov.gr Wallet,Hellenic Republic,Το Gov.gr Wallet είναι η επίσημη ελληνική εφαρ...,"500,000+",False,False,False,False,7.69
6,DOKU,PT. Nusa Satu Inti Artha (DOKU),"DOKU, layanan dompet digital yang membantu sip...","1,000,000+",True,True,False,False,10.69
40,DAS Dompet Digital,ALFA TECHNOLOGY,Untuk meningkatkan pelayanan dan lebih mendeka...,100+,True,False,False,False,11.15
52,PAYFAZZ Agen: Pulsa & Transfer,PT Payfazz Teknologi Nusantara,"PAYFAZZ Agen, aplikasi PPOB super lengkap untu...","5,000,000+",True,False,False,False,11.31
56,SeaBank,PT. BANK SEABANK INDONESIA,Lebih Untung di SeaBank!\r\n \r\nBuka rekening...,"5,000,000+",True,False,True,False,11.54


In [56]:
immediate_investigation = result_df[(result_df['dev_status']==False) & (result_df['app_status']==False)]

further_checking = result_df[((result_df['dev_status']==True) & (result_df['app_status']==False)) | 
                             ((result_df['dev_status']==False) & (result_df['app_status']==True)) ]

legally_listed = result_df[(result_df['dev_status']==True) & (result_df['app_status']==True)]

In [57]:
result_df

Unnamed: 0,title,developer,description,installs,dev_status,app_status,listed_ojk,listed_bappebti,relevancy_score
9,Gov.gr Wallet,Hellenic Republic,Το Gov.gr Wallet είναι η επίσημη ελληνική εφαρ...,"500,000+",False,False,False,False,7.69
6,DOKU,PT. Nusa Satu Inti Artha (DOKU),"DOKU, layanan dompet digital yang membantu sip...","1,000,000+",True,True,False,False,10.69
40,DAS Dompet Digital,ALFA TECHNOLOGY,Untuk meningkatkan pelayanan dan lebih mendeka...,100+,True,False,False,False,11.15
52,PAYFAZZ Agen: Pulsa & Transfer,PT Payfazz Teknologi Nusantara,"PAYFAZZ Agen, aplikasi PPOB super lengkap untu...","5,000,000+",True,False,False,False,11.31
56,SeaBank,PT. BANK SEABANK INDONESIA,Lebih Untung di SeaBank!\r\n \r\nBuka rekening...,"5,000,000+",True,False,True,False,11.54
35,i.saku,"Inti Dunia Sukses, PT",Rasakan pengalaman berbelanja dengan uang elek...,"5,000,000+",True,True,False,False,11.77
46,Cek Saldo Dompet Digital,Ardhi Studio,OVO merupakan aplikasi smart yang memberikan A...,500+,False,False,False,False,11.77
31,OVO,PT Visionet Internasional,"From snack times to mealtimes, from routine bi...","10,000,000+",True,True,False,True,11.85
11,LinkAja / LinkAja Syariah,PT Fintek Karya Nusantara,LinkAja is a digital wallet from PT Fintek Kar...,"10,000,000+",True,True,False,False,12.38
39,Dompet Digital,Dwijaya Mandiri Sejahtera Indonesia,Aplikasi Transaksi Pulsa All in One\r\n\r\nIsi...,"10,000+",True,False,False,False,12.46


### Final Outcome

In the final step, we want to see the list of apps that may warrant further investigation.

Based on our method, we can group the apps into three categories:
* `Immediate Investigation` = neither the developer nor the app is listed (both set to False) in the legal_bi dataset.
* `Further Checking` = only one of the developer or the app is listed (only one is True) in the legal_bi dataset.
* `Legally Listed` = both the developer and the app are already listed in the legal_bi dataset.

Note that when the developer or the app is not listed with Bank Indonesia, we can also check whether it is listed with another regulatory organization (OJK or Bappebti).
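
The three categories above amount to a simple decision rule on the two boolean flags. A minimal sketch, with `triage` as a hypothetical helper name:

```python
def triage(dev_listed: bool, app_listed: bool) -> str:
    """Map the dev_status / app_status flags to one of the three categories."""
    if dev_listed and app_listed:
        return "Legally Listed"
    if dev_listed or app_listed:
        return "Further Checking"
    return "Immediate Investigation"


for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, "->", triage(*flags))
```

On the DataFrame, such a helper could be applied row by row, for example with `result_df.apply(lambda row: triage(row['dev_status'], row['app_status']), axis=1)`, to store the category as a single column instead of three separate filtered frames.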

In [58]:
print('List of apps that needs immediate investigation are:' )
immediate_investigation

List of apps that needs immediate investigation are:


Unnamed: 0,title,developer,description,installs,dev_status,app_status,listed_ojk,listed_bappebti,relevancy_score
9,Gov.gr Wallet,Hellenic Republic,Το Gov.gr Wallet είναι η επίσημη ελληνική εφαρ...,"500,000+",False,False,False,False,7.69
46,Cek Saldo Dompet Digital,Ardhi Studio,OVO merupakan aplikasi smart yang memberikan A...,500+,False,False,False,False,11.77
57,Dompet Mandiri,Reload Plus,Dompet Mandiri adalah aplikasi bisnis untuk pe...,100+,False,False,False,False,12.62
16,WalletPasses | Passbook Wallet,Wallet Passes Alliance,"With WalletPasses, you can use passes on your ...","10,000,000+",False,False,False,False,14.0
2,Airtm,"Airtm, Inc.","Access your money when you want, wherever you ...","1,000,000+",False,False,False,False,14.08
21,Cards - Mobile Wallet,Cards,"<font color=""#5a94f5""><b>Cards</b> is a mobile...","5,000,000+",False,False,False,False,14.31
54,Dompet Manfaat,Multi Digital Nusantara,"Saatnya berpindah ke dompet digital, satu domp...","1,000+",False,False,False,False,14.77
36,STICPAY,STICPAY,Transfer and receive money faster and easier w...,"100,000+",False,False,False,False,15.69
29,"Fearless Wallet: Polkadot, クサマ",Soramitsu,Fearless Wallet is designed for the decentrali...,"100,000+",False,False,False,False,15.92
7,WEMIX Wallet,WEMIX PTE. LTD.,"<font color=""#813ccc""><b>Integration with Vari...","1,000,000+",False,False,False,False,16.46


In [59]:
print('List of apps that needs further checking are:' )
further_checking

List of apps that needs further checking are:


Unnamed: 0,title,developer,description,installs,dev_status,app_status,listed_ojk,listed_bappebti,relevancy_score
40,DAS Dompet Digital,ALFA TECHNOLOGY,Untuk meningkatkan pelayanan dan lebih mendeka...,100+,True,False,False,False,11.15
52,PAYFAZZ Agen: Pulsa & Transfer,PT Payfazz Teknologi Nusantara,"PAYFAZZ Agen, aplikasi PPOB super lengkap untu...","5,000,000+",True,False,False,False,11.31
56,SeaBank,PT. BANK SEABANK INDONESIA,Lebih Untung di SeaBank!\r\n \r\nBuka rekening...,"5,000,000+",True,False,True,False,11.54
39,Dompet Digital,Dwijaya Mandiri Sejahtera Indonesia,Aplikasi Transaksi Pulsa All in One\r\n\r\nIsi...,"10,000+",True,False,False,False,12.46
44,BRImo BRI,PT Bank Rakyat Indonesia (Persero) Tbk.,BRImo merupakan Aplikasi Internet dan Mobile B...,"10,000,000+",True,False,False,False,12.92
45,Dipay,DIPAY,Dompet bukan sekadar dompet. Ini dompet yang a...,"5,000+",False,True,False,False,13.23
51,Allo Bank,PT. ALLO BANK INDONESIA Tbk,"Hi, kini Allo Bank versi Beta untuk Android ha...","1,000,000+",True,False,True,False,13.69
48,blu by BCA Digital,BCA Digital,Mulai Langkahmu bareng blu!\r\nUntuk hidup leb...,"1,000,000+",True,False,True,True,13.85
41,Uncang - Dompet Digital,PayoApp Indonesia,Uncang adalah akronim dari Uang Canggih adalah...,100+,True,False,False,False,14.69
20,ONE Wallet - Empty your wallet,Soosu Studio,Manage all your cards in one app.\r\n\r\nFligh...,"500,000+",False,True,False,False,15.15


In [60]:
print('List of apps that are legally listed are:' )
legally_listed

List of apps that are legally listed are:


Unnamed: 0,title,developer,description,installs,dev_status,app_status,listed_ojk,listed_bappebti,relevancy_score
6,DOKU,PT. Nusa Satu Inti Artha (DOKU),"DOKU, layanan dompet digital yang membantu sip...","1,000,000+",True,True,False,False,10.69
35,i.saku,"Inti Dunia Sukses, PT",Rasakan pengalaman berbelanja dengan uang elek...,"5,000,000+",True,True,False,False,11.77
31,OVO,PT Visionet Internasional,"From snack times to mealtimes, from routine bi...","10,000,000+",True,True,False,True,11.85
11,LinkAja / LinkAja Syariah,PT Fintek Karya Nusantara,LinkAja is a digital wallet from PT Fintek Kar...,"10,000,000+",True,True,False,False,12.38
43,Sakuku,PT Bank Central Asia Tbk.,Sakuku is an electronic money service for paym...,"1,000,000+",True,True,False,False,12.54
42,OVO Merchant,PT Visionet Internasional,"Satu Langkah Kecil, Jutaan Kesempatan untuk me...","1,000,000+",True,True,False,True,13.15
5,DANA Indonesia Digital Wallet,PT Espay Debit Indonesia Koe,DANA is Indonesia's digital wallet that can be...,"50,000,000+",True,True,False,False,13.46
33,Flip: Transfer Without Admin,PT Fliptech Lentera Inspirasi Pertiwi,"<b>Transfer money to anywhere and anyone, dome...","10,000,000+",True,True,False,False,15.23
49,NETZME,"Netzme Kreasi Indonesia, PT","Pay anything, expand your network, leveling up...","1,000,000+",True,True,False,False,15.31
53,yourpayID,PT Rpay Finansial Digital Indonesia,Need to send money to Indonesia? Send money an...,"100,000+",True,True,False,True,15.92
