## Rekt vs. Hive
This notebook has some code to investigate the difference between Rekt users and Hive influencers. 

We wanted to understand why our articles-first approach (based on quote tweets) in the Hive Ethereum cluster doesn't return the same feed as Rekt.



### Compare Users/Influencers
The feed depends on which users are used to curate it, so the first step is to compare the users in each cluster.

**note:** for Rekt, the users are listed in the [crypto parlor](https://feed.rekt.news/parlor). I was too lazy to figure out how to make Python scroll the page to load all the influencers, so I scrolled manually, then copied and pasted the page's HTML into a local file to load them in.

In [5]:
import os
from bs4 import BeautifulSoup
import requests 
from dotenv import load_dotenv

load_dotenv()

## Loading Rekt Users

# local file with the Rekt HTML of the top 260 users in the crypto parlor
with open("/home/nick/Documents/GitHub/Tweetscape/rekt_parlor.html", "r") as f:
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')
rekt_users = []
for i in soup.find_all("a"):
    rekt_users.append(i.text[1:].lower())  # drop the leading "@" and normalize case
print(len(rekt_users))
rekt_set = set(rekt_users[:250])  # top 250 influencers, to match the number pulled from Hive

260
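The loop above keeps the text of every `<a>` tag and drops its first character, which assumes every link in the saved page is a user handle starting with "@". A stricter variant is sketched below with the standard library's `html.parser` instead of BeautifulSoup; `HandleParser` and `extract_handles` are hypothetical helpers, and the "@handle" link text is an assumption about how the parlor page renders users. It skips any link whose text doesn't start with "@" and removes duplicates while keeping rank order.

```python
from html.parser import HTMLParser


class HandleParser(HTMLParser):
    """Collect the text of <a> tags that look like @handles (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buf = []
        self.handles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buf = []

    def handle_data(self, data):
        # gather all text inside the current link, including nested tags
        if self._in_link:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            text = "".join(self._buf).strip()
            # assumption: user links render as "@handle"; other links are skipped
            if text.startswith("@"):
                self.handles.append(text[1:].lower())


def extract_handles(html):
    parser = HandleParser()
    parser.feed(html)
    parser.close()
    # dict.fromkeys drops duplicates but keeps first-seen (rank) order
    return list(dict.fromkeys(parser.handles))


sample = '<a href="/x">@VitalikButerin</a><a href="/about">About</a><a>@hasufl</a><a>@vitalikbuterin</a>'
print(extract_handles(sample))  # ['vitalikbuterin', 'hasufl']
```

Deduplicating matters here because a handle listed twice would otherwise shift every rank below it by one.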


In [9]:
## Loading Hive Ethereum influencers from the Borg API

BORG_HEADER = {"Authorization": f"Token {os.environ['BORG_API_KEY']}"}
BORG_BASE_URL = "https://api.borg.id"

def get_all_cluster_influencers(
    cluster_name, sort_direction="desc", sort_by="score", max_page=None
):
    """Yield influencers for a Borg cluster, one page at a time."""
    cur_page = 0
    while True:
        print(f"requesting page {cur_page} for {cluster_name} cluster")
        res = requests.get(
            f"{BORG_BASE_URL}/influence/clusters/{cluster_name}/influencers/",
            params={"page": cur_page, "sort_by": sort_by, "sort_direction": sort_direction},
            headers=BORG_HEADER,
        )
        if res.status_code != 200:
            raise Exception(f"request failed: {res.text}")
        data = res.json()  # parse the body once instead of on every access
        yield from data["influencers"]
        cur_page += 1
        # stop when the API reports no further pages ("has_more" missing or false)
        if not data.get("has_more", False):
            break
        if max_page is not None and cur_page >= max_page:
            print("hit max page")
            break

hive_influencers = get_all_cluster_influencers("Ethereum", sort_direction="desc", sort_by="score", max_page=5)
hive_names = [x["social_account"]["social_account"]["screen_name"].lower() for x in hive_influencers]
print(len(hive_names))
hive_set = set(hive_names)

requesting page 0 for Ethereum cluster
requesting page 1 for Ethereum cluster
requesting page 2 for Ethereum cluster
requesting page 3 for Ethereum cluster
requesting page 4 for Ethereum cluster
hit max page
250
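The stopping rule of the paginated request above can be separated from the HTTP call and checked against a stub. The sketch below assumes each page looks like `{"influencers": [...], "has_more": bool}`; that shape is inferred from the code above, not from documented Borg API behavior, and `paginate` / `fetch_page` are hypothetical names.

```python
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[int], dict], max_page: Optional[int] = None
) -> Iterator:
    """Yield items page by page until "has_more" is missing/false or max_page is hit."""
    page = 0
    while True:
        data = fetch_page(page)
        yield from data["influencers"]
        page += 1
        if not data.get("has_more", False):
            break
        if max_page is not None and page >= max_page:
            break


# stub standing in for the API: two pages, the second one is the last
pages = {
    0: {"influencers": [1, 2], "has_more": True},
    1: {"influencers": [3], "has_more": False},
}
print(list(paginate(pages.__getitem__)))              # [1, 2, 3]
print(list(paginate(pages.__getitem__, max_page=1)))  # [1, 2]
```

With `max_page=5`, as in the cell above, this fetches pages 0 through 4 and then stops, matching the printed log.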


In [11]:
## Compare Top 50 Influencers from Each Cluster
rekt_top50 = set(rekt_users[:50])
hive_top50 = set(hive_names[:50])
inter = hive_top50.intersection(rekt_top50)

print(f"{len(inter)} users shared between rekt_top50 and hive_top50\n")

ranks = []
for user in inter:
    hive_rank = hive_names.index(user)
    rekt_rank = rekt_users.index(user)
    ranks.append((user, hive_rank, rekt_rank, abs(hive_rank - rekt_rank)))

# print (user, hive_rank, rekt_rank, absolute rank difference), smallest difference first; ranks are 0-indexed
for user, hive_rank, rekt_rank, diff in sorted(ranks, key=lambda x: x[3]):
    print(f"{user}, hive_rank: {hive_rank}, rekt_rank: {rekt_rank}, diff={diff}")

9 users shared between rekt_top50 and hive_top50

vitalikbuterin, hive_rank: 0, rekt_rank: 0, diff=0
tayvano_, hive_rank: 41, rekt_rank: 42, diff=1
samczsun, hive_rank: 20, rekt_rank: 17, diff=3
ethereumjoseph, hive_rank: 21, rekt_rank: 26, diff=5
sassal0x, hive_rank: 34, rekt_rank: 29, diff=5
danrobinson, hive_rank: 27, rekt_rank: 16, diff=11
gakonst, hive_rank: 4, rekt_rank: 23, diff=19
haydenzadams, hive_rank: 22, rekt_rank: 3, diff=19
hasufl, hive_rank: 24, rekt_rank: 1, diff=23
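Two small issues in the comparison code: `list.index` rescans the list on every lookup, and sorting only by `diff` leaves ties in set-iteration order, which can change between runs (ethereumjoseph and sassal0x, both diff=5, swap places between this printout and the top-250 one). A sketch that builds a handle-to-rank dict once and breaks ties by name; `rank_diffs` is a hypothetical helper, not part of the notebook.

```python
def first_ranks(names):
    """Map each name to the position of its first occurrence (0-indexed)."""
    ranks = {}
    for rank, name in enumerate(names):
        ranks.setdefault(name, rank)
    return ranks


def rank_diffs(hive_names, rekt_users, shared):
    """Return (user, hive_rank, rekt_rank, diff) rows sorted by diff, then by user."""
    hive_rank = first_ranks(hive_names)
    rekt_rank = first_ranks(rekt_users)
    rows = [
        (u, hive_rank[u], rekt_rank[u], abs(hive_rank[u] - rekt_rank[u]))
        for u in shared
    ]
    # the secondary key makes the output order deterministic
    return sorted(rows, key=lambda r: (r[3], r[0]))


print(rank_diffs(["a", "b", "c"], ["b", "a", "d"], {"a", "b"}))
# [('a', 0, 1, 1), ('b', 1, 0, 1)]
```

In the notebook this would be called as `rank_diffs(hive_names, rekt_users, inter)`.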


From the list above, only 9 of the top 50 users appear in both lists, and the only influencer that Rekt and Hive rank identically is VitalikButerin (rank 0 in both; ranks are 0-indexed).
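The 9-of-50 overlap is one point on a curve; the same measure can be computed at any cutoff k to see how agreement changes with list depth. A minimal sketch, where `overlap_at_k` is a hypothetical helper that assumes both lists have at least k entries. Called as `overlap_at_k(hive_names, rekt_users, [50])`, it should reproduce the 9 shared users above as 9/50 = 0.18.

```python
def overlap_at_k(list_a, list_b, ks):
    """Map each cutoff k to the fraction of top-k items shared by both lists."""
    # assumes len(list_a) >= k and len(list_b) >= k for every k
    return {k: len(set(list_a[:k]) & set(list_b[:k])) / k for k in ks}


print(overlap_at_k(["a", "b", "c", "d"], ["b", "a", "x", "c"], [1, 2, 4]))
# {1: 0.0, 2: 1.0, 4: 0.75}
```

Unlike the rank differences, this ignores order within the top k and only asks whether the two clusters pick the same people.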

The most striking differences to me are hasufl, gakonst, and haydenzadams: each is ranked near the top in one cluster but back in the 20s in the other.

It is clear that Hive and Rekt use different approaches to ranking their users.

This is further shown in the list below, which runs the same comparison over the top 250 users. A good example is `zhusu`, whose hive_rank is 188 but whose rekt_rank is 4, a massive disparity.
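Rank disagreements like these can be summarized in a single number with Spearman's rank correlation over the shared users: 1 means both clusters order them identically, -1 means exactly reversed. The sketch below re-ranks the shared users 0..n-1 within each list, so there are no ties and the closed form ρ = 1 − 6Σd²/(n(n²−1)) applies. `spearman_shared` is a hypothetical helper; no result on the notebook's data is claimed here.

```python
def spearman_shared(list_a, list_b):
    """Spearman's rho over items present in both ranked lists."""
    in_a, in_b = set(list_a), set(list_b)
    # shared items in each list's order, duplicates removed
    order_a = [x for x in dict.fromkeys(list_a) if x in in_b]
    order_b = [x for x in dict.fromkeys(list_b) if x in in_a]
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two shared items")
    rank_b = {x: r for r, x in enumerate(order_b)}
    d2 = sum((r - rank_b[x]) ** 2 for r, x in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


print(spearman_shared(["a", "b", "c"], ["c", "b", "a"]))  # -1.0
```

For the clusters here it would be called as `spearman_shared(hive_names, rekt_users[:250])`. Because shared users are re-ranked, users that appear in only one list do not affect the score; the overlap itself has to be reported separately.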



In [12]:
## Number of Shared Influencers in top 250
inter_all = hive_set.intersection(rekt_set)
print(f"{len(inter_all)} users shared between rekt and hive\n")

ranks_all = []
for user in inter_all:
    hive_rank = hive_names.index(user)
    rekt_rank = rekt_users.index(user)
    ranks_all.append((user, hive_rank, rekt_rank, abs(hive_rank - rekt_rank)))

# print (user, hive_rank, rekt_rank, absolute rank difference), smallest difference first; ranks are 0-indexed
for user, hive_rank, rekt_rank, diff in sorted(ranks_all, key=lambda x: x[3]):
    print(f"{user}, hive_rank: {hive_rank}, rekt_rank: {rekt_rank}, diff={diff}")

101 users shared between rekt and hive

nanexcool, hive_rank: 58, rekt_rank: 58, diff=0
vitalikbuterin, hive_rank: 0, rekt_rank: 0, diff=0
tayvano_, hive_rank: 41, rekt_rank: 42, diff=1
samczsun, hive_rank: 20, rekt_rank: 17, diff=3
sassal0x, hive_rank: 34, rekt_rank: 29, diff=5
ethereumjoseph, hive_rank: 21, rekt_rank: 26, diff=5
andrewdarmacap, hive_rank: 220, rekt_rank: 214, diff=6
simondlr, hive_rank: 119, rekt_rank: 127, diff=8
evabeylin, hive_rank: 63, rekt_rank: 74, diff=11
danrobinson, hive_rank: 27, rekt_rank: 16, diff=11
ameensol, hive_rank: 39, rekt_rank: 51, diff=12
brockjelmore, hive_rank: 166, rekt_rank: 183, diff=17
gakonst, hive_rank: 4, rekt_rank: 23, diff=19
haydenzadams, hive_rank: 22, rekt_rank: 3, diff=19
owocki, hive_rank: 65, rekt_rank: 85, diff=20
bantg, hive_rank: 55, rekt_rank: 34, diff=21
hasufl, hive_rank: 24, rekt_rank: 1, diff=23
sunnya97, hive_rank: 101, rekt_rank: 126, diff=25
zooko, hive_rank: 52, rekt_rank: 79, diff=27
blockgeekdima, hive_rank: 79, rek