# Using Hyperlink-Induced Topic Search (HITS) to find scam accounts on Twitter during the Arbitrum airdrop

HITS works here because hub scores and authority scores reinforce each other. A good authority publishes content that many good hubs reference. A good hub references content from good authorities.

In this scam, fake "official" Arbitrum accounts with lookalike handles were created. A network of other scam accounts then amplified them by retweeting them and sharing their link. That link led to a phishing site designed to steal information from users' wallets. This kind of coordinated amplification is exactly the pattern that earns a high authority score.
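Formally, this is the standard HITS formulation, not something specific to this notebook. Let $A$ be the adjacency matrix, with $A_{ij} = 1$ when there is an edge $i \to j$. We assume that an edge means account $i$ retweeted account $j$.

Each iteration updates the authority vector $a$ and the hub vector $h$ as follows, then rescales them:

$$
a_j \leftarrow \sum_{i \,:\, i \to j} h_i, \qquad
h_i \leftarrow \sum_{j \,:\, i \to j} a_j
\qquad\text{i.e.}\qquad
a \leftarrow \frac{A^\top h}{\lVert A^\top h \rVert}, \quad
h \leftarrow \frac{A\,a}{\lVert A\,a \rVert}
$$

Under mild conditions the iteration converges to the principal eigenvector of $A^\top A$ for the authorities and of $A A^\top$ for the hubs.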

```python
import pandas as pd
import networkx as nx
from tqdm import tqdm
```


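# Get the data

Download the OSoMe network export for the hashtags #ARB, #Airdrop and #Arbitrum, covering 2023-03-16 to 2023-03-30. Save it as `Twitter_ARB_Airdrop_Arbitrum_Osome.csv`:

https://osome.iu.edu/tools/networks/?hashtag=%23ARB,+%23Airdrop,+%23Arbitrum&network_type=rq&start_date=2023-03-16&end_date=2023-03-30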

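```python
file_path = 'Twitter_ARB_Airdrop_Arbitrum_Osome.csv'

# Each row is one directed edge from `source` to `target`,
# with the tweet ID and the interaction type (e.g. `retweet`)
data_df = pd.read_csv(file_path)
data_df
```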

|  | id | source | target | tweet_id | type |
|---|---|---|---|---|---|
| 0 | riderbaken444AirdropDet | riderbaken444 | AirdropDet | 1639687149871308800 | retweet |
| 1 | riderbaken444ArbitrumCharts | riderbaken444 | ArbitrumCharts | 1640509722947362816 | retweet |
| 2 | riderbaken444Lumishare_Lumi | riderbaken444 | Lumishare_Lumi | 1636601129915289600 | retweet |
| 3 | RKorneliusarbitnum | RKornelius | arbitnum | 1641470108248809475 | retweet |
| 4 | RKorneliusarbitruns | RKornelius | arbitruns | 1641168281087123459 | retweet |
| ... | ... | ... | ... | ... | ... |
| 73775 | welcomtoherearbiturum | welcomtohere | arbiturum | 1639973761347735552 | retweet |
| 73776 | welcomtoherearitbum | welcomtohere | aritbum | 1641079789246201857 | retweet |
| 73777 | globawingsCryptoTechDAO | globawings | CryptoTechDAO | 1638161749043204098 | retweet |
| 73778 | globawingsZenith_Swap | globawings | Zenith_Swap | 1641546444988624899 | retweet |


# Generate the graph

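```python
# Build a directed graph from the DataFrame: one edge per row, from `source` to `target`.
# A DiGraph stores at most one edge per (source, target) pair.
G = nx.DiGraph()

for _, row in tqdm(data_df.iterrows(), total=len(data_df)):
    G.add_edge(row['source'], row['target'], tweet_id=row['tweet_id'], type=row['type'])
```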

100%|█████████████████████████████████████████████████████████████████████████| 73780/73780 [00:02<00:00, 27147.84it/s]


```python
num_nodes = G.number_of_nodes()
num_edges = G.number_of_edges()
print('total', num_nodes, 'nodes, and', num_edges, 'edges')
```

total 20794 nodes, and 73780 edges


# In-degree analysis

In-degree analysis alone already surfaces the scam accounts. However, the genuine CryptoTechDAO account also ranks near the top, so a high in-degree by itself does not prove that an account is a scam.

```python
# In-degree: number of distinct accounts pointing at a node.
# Out-degree: number of distinct accounts a node points at.
in_degrees = G.in_degree()
out_degrees = G.out_degree()

# Rank nodes by each degree, highest first
sorted_by_in_degree = pd.DataFrame(sorted(in_degrees, key=lambda x: x[1], reverse=True), columns=['Address', 'in_degree'])
sorted_by_out_degree = pd.DataFrame(sorted(out_degrees, key=lambda x: x[1], reverse=True), columns=['Address', 'out_degree'])
```
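`G` is a simple `nx.DiGraph`, so any repeated (source, target) pair is stored as one edge. In-degree therefore counts the distinct accounts that point at a node.

Below is a minimal standard-library sketch of the same ranking. It is an illustration only. The function name `in_degree_ranking` and the handles in the edge list are hypothetical.

```python
from collections import Counter


def in_degree_ranking(edges, top=3):
    """Rank targets by the number of distinct sources pointing at them.

    Duplicate (source, target) pairs are collapsed first, mirroring how a
    simple directed graph such as nx.DiGraph stores each edge only once.
    """
    unique_edges = set(edges)
    counts = Counter(target for _, target in unique_edges)
    return counts.most_common(top)


# Hypothetical retweet edges: (retweeter, retweeted account)
edges = [
    ("alice", "lookalike1"), ("bob", "lookalike1"), ("carol", "lookalike1"),
    ("alice", "lookalike1"),  # repeated retweet, counted once
    ("bob", "projectX"), ("dave", "projectX"),
    ("carol", "newsbot"),
]
print(in_degree_ranking(edges))  # → [('lookalike1', 3), ('projectX', 2), ('newsbot', 1)]
```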

The scam accounts use handles that look like `arbitrum` but differ by a few characters, such as `arbitruns`, `arbiturum`, `aritbum`, `arbitnum` and `aribrtum`.
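One simple way to quantify "look similar" is the Levenshtein edit distance. It is the minimum number of single-character insertions, deletions and substitutions that turn one handle into another. This is an illustration and not part of the original analysis.

The sketch compares several handles from the in-degree ranking with the real handle `arbitrum`. It lowercases them first because Twitter handles are case-insensitive. The helper name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


handles = ["arbitruns", "arbiturum", "aritbum", "arbitnum", "aribrtum", "CryptoTechDAO"]
for handle in handles:
    print(handle, levenshtein(handle.lower(), "arbitrum"))
```

By this measure:

- `arbiturum` and `arbitnum` are one edit away from `arbitrum`.
- `arbitruns` and `aritbum` are two edits away.
- `aribrtum` is three edits away.
- `CryptoTechDAO` is much further away.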

```python
sorted_by_in_degree.head(10)
```

|  | Address | in_degree |
|---|---|---|
| 0 | arbitruns | 6837 |
| 1 | arbiturum | 6717 |
| 2 | aritbum | 6566 |
| 3 | arbitnum | 5945 |
| 4 | aribrtum | 3189 |
| 5 | CryptoTechDAO | 2540 |
| 6 | AirdropDet | 1510 |
| 7 | viannneey7 | 1176 |
| 8 | META_STARx | 893 |
| 9 | Web_3space | 735 |


The official `arbitrum` account, by contrast, only appears at row 94 of the in-degree ranking, with an in-degree of 113:

```python
sorted_by_in_degree[sorted_by_in_degree['Address'] == 'arbitrum']
```

|  | Address | in_degree |
|---|---|---|
| 94 | arbitrum | 113 |


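# Sorting by out-degree does not help much

The accounts with the highest out-degree are simply those that retweeted the most distinct accounts. The largest out-degree is only 22, so this ranking does not point to the accounts being amplified.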

```python
sorted_by_out_degree.head(10)
```

|  | Address | out_degree |
|---|---|---|
| 0 | Ayse1060 | 22 |
| 1 | Emblem3_Rock | 21 |
| 2 | moon15114 | 19 |
| 3 | MuhdAlMuizz_ | 18 |
| 4 | ezname30 | 17 |
| 5 | IdGypro | 17 |
| 6 | VaiCommon | 17 |
| 7 | CryptoHero22 | 16 |
| 8 | Davidlavy6 | 16 |
| 9 | SpivaJake | 16 |


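# Run the HITS algorithm

HITS (Hyperlink-Induced Topic Search) gives every node a hub score and an authority score. `nx.hits(G)` returns them as a tuple of two dicts, `(hubs, authorities)`. Each dict is keyed by node and normalized to sum to 1 by default.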
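To see why HITS can separate coordinated amplification from ordinary popularity, here is a plain power-iteration sketch of HITS in pure Python. It is not NetworkX's own implementation. The function `hits` and the toy graph are hypothetical.

In the toy graph, 10 bot accounts each retweet four lookalike accounts. Separately, 30 unrelated users each retweet one legitimate account.

```python
def hits(edges, iterations=100):
    """Minimal power-iteration HITS over a list of (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 0.0)
    for _ in range(iterations):
        # Authority of v = sum of hub scores of the accounts pointing at v
        auth = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            auth[v] += hub[u]
        top = max(auth.values())
        auth = {n: s / top for n, s in auth.items()}
        # Hub of u = sum of authority scores of the accounts u points at
        hub = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            hub[u] += auth[v]
        top = max(hub.values())
        hub = {n: s / top for n, s in hub.items()}
    # Rescale each vector to sum to 1, as nx.hits does with normalized=True
    h_total, a_total = sum(hub.values()), sum(auth.values())
    return ({n: s / h_total for n, s in hub.items()},
            {n: s / a_total for n, s in auth.items()})


# Hypothetical toy graph: 10 bots each retweet all four lookalike accounts,
# while 30 unrelated users each retweet one legitimate account.
scams = ["scam1", "scam2", "scam3", "scam4"]
edges = [(f"bot{i}", s) for i in range(10) for s in scams]
edges += [(f"user{i}", "legit") for i in range(30)]

hub, auth = hits(edges)

in_degree = {}
for _, v in edges:
    in_degree[v] = in_degree.get(v, 0) + 1

for name in scams + ["legit"]:
    print(f"{name:6} in-degree={in_degree[name]:2}  authority={auth[name]:.2e}")
```

Here `legit` has three times the in-degree of each lookalike account, yet its authority collapses toward zero. Each lookalike account ends up with about 0.25.

The bots and the lookalike accounts form a dense bipartite block that reinforces itself. On that block $A^\top A$ has eigenvalue 40, versus 30 for `legit`, so the block dominates the principal eigenvector. This is a plausible mechanism for the behaviour of accounts like CryptoTechDAO in the real graph, but it is only an illustration.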

```python
hits = nx.hits(G)  # (hubs, authorities)

# Hub scores, highest first
result_hits_hub_df = pd.DataFrame(list(hits[0].items()), columns=['Name', 'Hub_score'])
result_hits_hub_df = result_hits_hub_df.sort_values('Hub_score', ascending=False).reset_index(drop=True)
# result_hits_hub_df['Link'] = 'https://x.com/' + result_hits_hub_df['Name']
result_hits_hub_df.head(100).to_csv('hub.csv')
result_hits_hub_df.head(10)
```

|  | Name | Hub_score |
|---|---|---|
| 0 | mares_sean | 0.000165 |
| 1 | Jj06123205 | 0.000165 |
| 2 | Sue72136164 | 0.000165 |
| 3 | fofo49310106 | 0.000165 |
| 4 | bootyiseveryday | 0.000165 |
| 5 | bananakanna | 0.000165 |
| 6 | Charlie68Kirby | 0.000165 |
| 7 | 78Marcus | 0.000165 |
| 8 | SKoss1 | 0.000165 |
| 9 | John_Huey96 | 0.000165 |


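The hub ranking says little on its own, because the top ten hub scores are all tied at about 0.000165. Sorting by authority score, however, reveals the scam accounts.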

```python
# Authority scores, highest first
result_auth_root_df = pd.DataFrame(list(hits[1].items()), columns=['Name', 'Auth_score'])
result_auth_root_df = result_auth_root_df.sort_values('Auth_score', ascending=False).reset_index(drop=True)
# result_auth_root_df['Link'] = 'https://x.com/' + result_auth_root_df['Name']
result_auth_root_df.head(100).to_csv('auth.csv')
result_auth_root_df.head(10)
```

|  | Name | Auth_score |
|---|---|---|
| 0 | arbitruns | 0.221651 |
| 1 | aritbum | 0.213765 |
| 2 | arbiturum | 0.208127 |
| 3 | arbitnum | 0.193788 |
| 4 | aribrtum | 0.099525 |
| 5 | viannneey7 | 0.024563 |
| 6 | arditrums | 0.012175 |
| 7 | FFujissy | 0.009502 |
| 8 | free_ts | 0.006061 |
| 9 | mizyntochuj19 | 0.004013 |


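This time, however, CryptoTechDAO is no longer near the top. It falls to row 67, with an authority score several orders of magnitude below the lookalike accounts. Presumably the accounts retweeting it are not the hubs that amplify the lookalike accounts, so they contribute little authority.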

```python
result_auth_root_df[result_auth_root_df['Name'] == 'CryptoTechDAO']
```

|  | Name | Auth_score |
|---|---|---|
| 67 | CryptoTechDAO | 1.545342e-07 |


The official `arbitrum` account is likewise far from the top:

```python
result_auth_root_df[result_auth_root_df['Name'] == 'arbitrum']
```

|  | Name | Auth_score |
|---|---|---|
| 123 | arbitrum | 3.273933e-09 |
