Skip to content

maximsachs/phishing_classification_recurrent_nn

Repository files navigation

Phishing detection AI from scratch.

Makes use of Phishtank online valid datasets and Cisco Umbrella top 1 million domains list, to train a recurrent neural network to classify domain names as phishing or not phishing.

Run the dataset_downloader.py to download one new sample from phishtank and merge any new urls into the combined dataset. Could be automated using crontab.

Youtube video walthrough of the code:

Results

Accuracy vs decision threshold:

The neural network outputs values between 0 and 1. The threshold where the decision is made determines the various detection rates.

Sample prediction distribution for the best threshold:

The distribution of samples with their prediction outputs visualised to see TN, FP, FN and TP:

Example output

Best performance at threshold: 0.5655778894472362
Calculated 7876 predictions with a mean value of 0.4562220573425293
Evaluating using threshold 0.5655778894472362
Cut-off threshold: 0.5656
Evaluation counts: {'TN': 3891, 'FP': 506, 'FN': 435, 'TP': 3044}
+------------------+----------------+--------------------+
| Accuracy 88.052% | Predicted safe | Predicted phishing |
+------------------+----------------+--------------------+
|   Not phishing   |    TN: 3891    |      FP: 506       |
|                  |  NPV: 89.945%  |    FDR: 14.254%    |
|                  |  TNR: 88.492%  |    FPR: 11.508%    |
|   Is phishing    |    FN: 435     |      TP: 3044      |
|                  |  FOR: 10.055%  |    PPV: 85.746%    |
|                  |  FNR: 12.504%  |    TPR: 87.496%    |
+------------------+----------------+--------------------+

Examples for TN Bin range: 7.317396e-06 - 0.11298742 , Num. Samples: 3462
                                input  ground truth  prediction
0                llnw.daps.nbcuni.com             0    0.000118
1              speedtest.wispwest.net             0    0.000050
2                     m13.rsefukf.com             0    0.000032
3                     m15.dcbxlrj.org             0    0.000008
4   edge-091.sgsin.icloud-content.com             0    0.000015
5                   mail.novuscom.net             0    0.000012
6                          xghelp.com             0    0.005918
7                linuxfromscratch.org             0    0.024444
8      r1.sn-5ualdn7y.googlevideo.com             0    0.000011
9                     m40.dyopshw.com             0    0.000246
10        gaplecdn.aliyun.joyours.com             0    0.000438
11                       cdn.epica.ai             0    0.000028
12                  siorc.xfinity.com             0    0.000428
13                   110.mimilcnf.pro             0    0.000041
14                        aslroma1.it             0    0.004463

Examples for TN Bin range: 0.11298742 - 0.22596753 , Num. Samples: 157
                         input  ground truth  prediction
0                   reamaze.io             0    0.188346
1               www.wisksl.com             0    0.134943
2             pollosgar.com.co             0    0.183440
3           hm.vuicungdafa.com             0    0.161075
4      mail.guoceratile.com.my             0    0.133046
5                      ggwp.id             0    0.191724
6             oceans-nadia.com             0    0.149921
7               jieqinwang.com             0    0.148553
8               v.lsttnews.com             0    0.141278
9   liveme.zgsgllive.linkv.fun             0    0.191640
10         nws.etemavrajoip.fr             0    0.136429
11             portforward.com             0    0.122953
12     98kz-alternate.app.link             0    0.186109
13               emailsrvr.com             0    0.213246
14           static.f-list.net             0    0.124085

Examples for TN Bin range: 0.22596753 - 0.33894762 , Num. Samples: 105
                       input  ground truth  prediction
0           fastcomments.com             0    0.324313
1             elsenbruch.com             0    0.231270
2                 zahori.com             0    0.272963
3             alex.kakao.com             0    0.263239
4   www.svfree.svgame168.com             0    0.228486
5          kashra-server.com             0    0.301930
6      mail.elcharcutero.com             0    0.326268
7           stratoserver.net             0    0.256581
8          votesouthwell.com             0    0.320887
9             www.amboss.com             0    0.275090
10             mx.interia.pl             0    0.274614
11             myconvene.com             0    0.240418
12                pchouse.gr             0    0.324463
13          intraplanner.com             0    0.272231
14         elnacional.com.do             0    0.330957

Examples for TN Bin range: 0.33894762 - 0.45192775 , Num. Samples: 73
                     input  ground truth  prediction
0   acdevelopmentgroup.com             0    0.353133
1              aaronsw.com             0    0.406117
2              gazt.gov.sa             0    0.386374
3              yhcgift.com             0    0.418845
4                javdb.com             0    0.354495
5     www.trainenquiry.com             0    0.419734
6       www.devexpress.com             0    0.373720
7      anferingenieria.com             0    0.438848
8       fusion-manager.com             0    0.374282
9               mqnrqbl.in             0    0.432225
10    citi.bridgetrack.com             0    0.439842
11     xlog-va.tiktokv.com             0    0.445081
12                 ghnd.fr             0    0.371937
13   billing.oknotify2.com             0    0.382837
14              300400.net             0    0.353102

Examples for TN Bin range: 0.45192775 - 0.56490785 , Num. Samples: 93
                                       input  ground truth  prediction
0                  sfd.et0978.epichosted.com             0    0.529685
1                        surgicalscience.com             0    0.559815
2                            privatemail.com             0    0.485508
3                          asemanfredonia.it             0    0.474472
4                                  mevris.io             0    0.474573
5                         xianghuanzhang.com             0    0.546445
6                         www.madridhifi.com             0    0.490822
7                mail.motivedynamicsmgmt.com             0    0.465295
8         user-media-upload.renderforest.com             0    0.538933
9                                 ffquan.com             0    0.551031
10                   kaptivoapi.kaptivo.live             0    0.515401
11                         scipublisherj.com             0    0.494632
12  08c374edcb87ab5cbdab24dc9.js.ubembed.com             0    0.462961
13                     fashioneditorials.com             0    0.486268
14                             shoplazza.com             0    0.484051

Examples for FP Bin range: 0.5741874 - 0.6593455 , Num. Samples: 58
                      input  ground truth  prediction
0        delta-searches.com             0    0.610190
1         luckyqueenpro.com             0    0.612061
2               pollmann.at             0    0.628621
3                   zbj.com             0    0.642035
4              zastatic.com             0    0.598736
5   freeonlineconverter.net             0    0.639440
6     freemalaysiatoday.com             0    0.634647
7            www.silive.com             0    0.574187
8       ariannelingerie.com             0    0.605711
9     go.publisher-news.com             0    0.636007
10      seancodynetwork.com             0    0.637383
11           indihome.co.id             0    0.583247
12            hortimail.com             0    0.647075
13          ljlpeinture.com             0    0.586821
14       hiddencamstube.com             0    0.614998

Examples for FP Bin range: 0.6593455 - 0.74450356 , Num. Samples: 60
                        input  ground truth  prediction
0            www.senate.go.th             0    0.682710
1           blackhole9999.com             0    0.671924
2            xnxx.tubekek.com             0    0.682402
3                   nasoe.org             0    0.725112
4               local.bark.us             0    0.684137
5              www.netapp.com             0    0.741023
6           www.molottery.com             0    0.689571
7              satismeter.com             0    0.736355
8           mqsrkvcpyzvnw.com             0    0.742406
9          bbbbbbbb.846846.de             0    0.680710
10              bsh-group.com             0    0.696653
11               dlr-hatc.com             0    0.683883
12          bibliotecasma.com             0    0.731776
13  bollingplasticsurgery.com             0    0.733331
14            ystrationa.info             0    0.741842

Examples for FP Bin range: 0.74450356 - 0.82966167 , Num. Samples: 68
                                input  ground truth  prediction
0   studyinsweden.easyvirtualfair.com             0    0.801279
1        mail.veiligheidsbeleving.com             0    0.751713
2                       fotoaziti.com             0    0.805599
3                         nettv4u.com             0    0.813323
4             live.outplaygamekit.com             0    0.824369
5                uiprxlbgfrxaxrwnh.in             0    0.746816
6            recaptcha.bittorrent.com             0    0.823070
7                       www.fun2u.biz             0    0.762361
8                 freightinvestor.com             0    0.747800
9                    servizioemail.it             0    0.809356
10         kaiserswerther-diakonie.de             0    0.824186
11                   www.europcar.com             0    0.800409
12                   www.pulsapay.com             0    0.809266
13                     ecoserveis.net             0    0.778508
14                         lkr5k85.cn             0    0.769014

Examples for FP Bin range: 0.82966167 - 0.9148197 , Num. Samples: 97
                          input  ground truth  prediction
0              b6b4rhsbdawj.com             0    0.901391
1    incrediblerugsanddecor.com             0    0.838826
2                  ramweb.co.za             0    0.874516
3            cbt-charpentier.fr             0    0.841216
4              beerrightnow.com             0    0.859271
5                 bdtsales.best             0    0.857589
6      boehringer-ingelheim.com             0    0.872133
7       www.dernieredemeure.com             0    0.908550
8                  reco.wynk.in             0    0.863313
9             acielcolombia.com             0    0.873488
10     www.alaskaredkitchen.com             0    0.911707
11  lyncdiscover.wellsfargo.com             0    0.832653
12         www.jobdiagnosis.com             0    0.890343
13         zh-hk.guitarians.com             0    0.898340
14  mail.kontaktlinsen-radar.de             0    0.882129

Examples for FP Bin range: 0.9148197 - 0.9999778 , Num. Samples: 222
                              input  ground truth  prediction
0                  gampangcod.my.id             0    0.998248
1             kath-dekanat-alzey.de             0    0.987271
2                www.jdsports.co.uk             0    0.997600
3                    delightfull.eu             0    0.966502
4           bipolaraustralia.org.au             0    0.924600
5                  essentialhome.eu             0    0.999747
6   consentmanager.mgr.consensu.org             0    0.967583
7         koodomobile.telus.digital             0    0.997511
8             bomenservice-zuid.com             0    0.916765
9                  virtuallawyer.se             0    0.988618
10                     redcalfs.com             0    0.986578
11                          baua.fr             0    0.983905
12     www.childsplayclothing.co.uk             0    0.999968
13                           wdm.pl             0    0.940031
14                    cimtav.com.tr             0    0.997630

Examples for FN Bin range: 8.67373e-06 - 0.112897925 , Num. Samples: 238
                                   input  ground truth  prediction
0                             moxisq.com             1    0.001682
1                      mail-generali.com             1    0.000243
2                    stearncommunily.com             1    0.029782
3                            newmy-3.com             1    0.007560
4                                shrt.es             1    0.012051
5                         elgabalawy.com             1    0.008782
6                                 hmp.me             1    0.043992
7          nice.constantcontactsites.com             1    0.000962
8             dubailuxurypropertiess.com             1    0.020251
9                          unnobzava.net             1    0.009643
10       selfish-cheese.aerobaticapp.com             1    0.005296
11  claro-controle-downloader.m4u.com.br             1    0.011110
12                        arcomindia.com             1    0.059921
13             e.t.s.interac.ca-app.club             1    0.000054
14                          donghuong.uk             1    0.002262

Examples for FN Bin range: 0.112897925 - 0.22578716 , Num. Samples: 44
                                       input  ground truth  prediction
0                      binarybenliveload.com             1    0.213511
1                             mtfirewood.com             1    0.186709
2                             propress.co.uk             1    0.173334
3          www.videosoy.reachhealthylife.com             1    0.135371
4                       axiomatickidneys.org             1    0.179086
5               codedrop.thevisionpoints.com             1    0.201330
6                            zinextworld.com             1    0.148280
7                          getyourtx-tdy.com             1    0.147874
8                             mtfirewood.com             1    0.186709
9   ge-id-7819108955.sycoexportimportltd.com             1    0.145995
10                www.scuolascigressoney.net             1    0.196822
11                         losmentirosos.com             1    0.160719
12                               ipp-inc.com             1    0.125211
13                              wikiform.org             1    0.116855
14                            careplayit.vip             1    0.129922

Examples for FN Bin range: 0.22578716 - 0.33867642 , Num. Samples: 50
                       input  ground truth  prediction
0          mackanthem.com.pe             1    0.306885
1             365playing.com             1    0.301153
2      elhogarproperties.com             1    0.264287
3             pandaimath.com             1    0.276507
4      playfirstoftheday.com             1    0.312422
5          expertcarzone.com             1    0.282245
6   sexeducation.atspace.com             1    0.239057
7         techsysnigeria.com             1    0.235290
8            jeffreybcam.net             1    0.230963
9          www.coderllci.com             1    0.299572
10            galvarburg.com             1    0.226562
11        contraprova.com.br             1    0.292125
12            pubgmdaily.com             1    0.299808
13            pkwmobilede.de             1    0.320379
14            7colours.co.za             1    0.230266

Examples for FN Bin range: 0.33867642 - 0.45156565 , Num. Samples: 52
                                       input  ground truth  prediction
0                             lassolinks.com             1    0.368287
1               marie02zue.azurewebsites.net             1    0.381409
2                          tomeigosto.com.br             1    0.363083
3                                   gradi.ba             1    0.434554
4                              foamnflow.com             1    0.395296
5                                  kisa.link             1    0.339635
6                      bizbizeturkiyenim.com             1    0.359583
7   asecure-messagesystem.thenailcabin.co.uk             1    0.372895
8                     autodiscover.gre.ac.uk             1    0.429600
9                             snarkysoap.com             1    0.350943
10                         aegisredmedia.com             1    0.438507
11                   freey-joingrub.otzo.com             1    0.381696
12                   graphicommunication.com             1    0.410395
13                              gold-mail.ru             1    0.409111
14          redirectlyoseven.firebaseapp.com             1    0.444133

Examples for FN Bin range: 0.45156565 - 0.5644549 , Num. Samples: 50
                                       input  ground truth  prediction
0   NvbQ==Memberservices&legalshieldcorp             1    0.473352
1                       backyarddelivery.com             1    0.559786
2                        kbstitchdesigns.com             1    0.457322
3                     orthodoxresearcher.com             1    0.519094
4                           fedexvoyager.com             1    0.479571
5                         woodysportsbar.com             1    0.510820
6                  globaldoctorshospital.com             1    0.525911
7                        akannitoyegbola.com             1    0.454716
8                                fdriqtbt.cn             1    0.523340
9                     kellijophotography.com             1    0.502783
10                         plasticmonkey.com             1    0.487777
11                        www.bayernlbuk.net             1    0.563618
12                   www.ktplasmachinery.com             1    0.476686
13                            pochegroup.com             1    0.528044
14                            modesuites.com             1    0.471518

Examples for TP Bin range: 0.5688994 - 0.65511745 , Num. Samples: 66
                          input  ground truth  prediction
0            remnegocios.com.br             1    0.584870
1   neighbourhoodwatchcasey.com             1    0.569869
2      find-yourprofithere.life             1    0.633644
3               www.jfteabd.com             1    0.592988
4                www.mktbtk.com             1    0.579223
5         austinbeautyguide.com             1    0.646723
6              www.hoomokef.com             1    0.653069
7              essexminibus.com             1    0.588611
8                  olivaspa.com             1    0.624943
9              www.denartcc.org             1    0.572421
10                 tarelka67.ru             1    0.607841
11                   jagex.club             1    0.638270
12  www.tinavegaphotography.com             1    0.631984
13                  v-upd.co.uk             1    0.586614
14                  byoko.co.kr             1    0.603089

Examples for TP Bin range: 0.65511745 - 0.7413355 , Num. Samples: 65
                          input  ground truth  prediction
0            www.wonderstore.it             1    0.688639
1               pp-giftcard.com             1    0.715353
2                      ehan.org             1    0.718887
3            bonamourmarket.com             1    0.726458
4                mansdragon.com             1    0.686235
5   ded5441.inmotionhosting.com             1    0.706312
6                  dasktake.com             1    0.672902
7               patchcracks.com             1    0.663025
8                tierretyr.live             1    0.659768
9                   pubgner.com             1    0.700133
10                viewfbapp.com             1    0.721883
11                profalsam.com             1    0.730803
12           alealtaseguros.com             1    0.725690
13            limited-verify.me             1    0.681457
14                pubgmyace.com             1    0.711182

Examples for TP Bin range: 0.7413355 - 0.8275535 , Num. Samples: 110
                                  input  ground truth  prediction
0                     www.arrowcase.com             1    0.777720
1                          brighant.com             1    0.746563
2                      amazerpresce.com             1    0.813245
3             essentialshoppingmall.com             1    0.824800
4          privateinvestigatormilan.com             1    0.819198
5                    markareklamevi.com             1    0.814238
6                       epay-paxful.com             1    0.762931
7                  sakkiswonderland.com             1    0.787093
8                        hutoknepper.de             1    0.807550
9                       epay-paxful.com             1    0.762931
10                       ddotamoney.com             1    0.796305
11                      www.payinur.com             1    0.783428
12                     valeexpressa.com             1    0.742105
13  m.facebok-item-84372.vattrustbd.com             1    0.812274
14                      zeebracross.com             1    0.788122

Examples for TP Bin range: 0.8275535 - 0.91377157 , Num. Samples: 185
                         input  ground truth  prediction
0                   net-eco.fr             1    0.841986
1          ecscreditrepair.com             1    0.881374
2              www.rehrlbau.de             1    0.842446
3            destructoring.com             1    0.879459
4                    ucxuc.com             1    0.872591
5   unionheightsresidental.com             1    0.841180
6    liberbankos-prestades.com             1    0.908444
7          chirhoprecision.com             1    0.908011
8         gandjministorage.com             1    0.866434
9      ebay-payment-issues.com             1    0.847162
10       ruakunten.kadnanu.top             1    0.831892
11             boconceptla.com             1    0.883810
12       bahankuliahonline.com             1    0.851976
13      corewellnesshawaii.com             1    0.848730
14            fghjh.uioiuo.xyz             1    0.850209

Examples for TP Bin range: 0.91377157 - 0.9999896 , Num. Samples: 2613
                                       input  ground truth  prediction
0   u-dot-cedar-code-289917.nn.r.appspot.com             1    0.999989
1             netflix.error-with-billing.com             1    0.999882
2                         psych-k-online.com             1    0.999736
3                                pubg-as.com             1    0.915831
4                    unknowninfo-online.link             1    0.998208
5                 www.academiafleming.com.pe             1    0.984846
6                llojas-americanas.ezyro.com             1    0.999285
7               windowinstallationtoronto.ca             1    0.994219
8   leappsecurehalifacxappsecure.wikiamuz.ir             1    0.999947
9                  raokuten.co.jp.amozj.buzz             1    0.999732
10                     truenorthstrength.com             1    0.947873
11              services.runescape.com-zx.ru             1    0.999986
12                   www.bp-atualiza-app.com             1    0.997780
13                 fbcom-32601355.chekkos.mx             1    0.999840
14                                 hatac.net             1    0.994230

Phishing ULR examples:
Prediction on url: frgcxtmjawefgrthdcusge.dab 0.0037203317
Prediction on url: evilmadeupurl.phish 0.633814
Prediction on url: evil.madeupurl.phish 0.00034718902

Safe URL examples:
Prediction on url: google.com 0.4695877
Prediction on url: www.google.com 0.23997158
Prediction on url: gmail.google.com 6.3069674e-05
Prediction on url: mail.google.com 0.00013527935
Prediction on url: tudelft.nl 0.031315554
Prediction on url: brightspace.tudelft.nl 0.9944102
Prediction on url: colab.research.google.com 0.0001981094
Prediction on url: 00-gayrettepe-t3-8---00-gayrettepe-xrs-t2-1.statik.turktelekom.com.tr 0.0064572464

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published