# Anonymous Twitter User Collection

Jennifer Shan

1. [Collecting followers and following](#collect)
2. [Checking for alphanumeric variations of 'anonymous' and 'legion'](#filter)
3. [Applying DecisionTreeClassifier and RandomForestClassifier](#apply)
4. [Collecting Anonymous-affiliated tweets](#anon)
5. [Collecting randomly sampled tweets](#random)

In [None]:
bearer_token = ''

In [None]:
consumer_key = ''
consumer_secret = ''

In [None]:
access_token = ''
access_token_secret = ''

In [2]:
import tweepy

In [3]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit = True)

We specify our seed accounts.

In [4]:
ausers = ['AnonyOps', 'YourAnonNews', 'YourAnonCentral', 'AnonPress']

<a id="collect"></a>
**1. Collecting followers and following**  
For each seed account, we collect the IDs of its followers and the IDs of the accounts it follows.

In [6]:
import json

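We collect only the first page of follower IDs (up to 5,000) for each seed account. The lists of followed accounts are much shorter, so we collect them in full.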

In [197]:
flwrs = {}

for a in ausers:
    flwrs[a] = []
    # Stop after the first page of results (up to 5000 IDs).
    for page in tweepy.Cursor(api.get_follower_ids, screen_name = a).pages():
        flwrs[a].extend(page)
        break

In [122]:
flwng = {}

for a in ausers:
    flwng[a] = []
    for page in tweepy.Cursor(api.get_friend_ids, screen_name = a).pages():
        flwng[a].extend(page)

Let's check what we're dealing with.

In [198]:
for a in ausers:
    print(a, len(flwrs[a]), len(flwng[a]))

AnonyOps 5000 1141
YourAnonNews 5000 807
YourAnonCentral 5000 665
AnonPress 5000 101


We save the results to disk so we don't have to run the collection again.

In [199]:
with open('flwrs.json', 'w') as f:
    json.dump(flwrs, f)
with open('flwng.json', 'w') as f:
    json.dump(flwng, f)
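
The save-then-reload pattern used here can be wrapped in a small helper so that a cell only hits the API when no cached file exists. This is a minimal sketch, not part of the original notebook: `load_or_build` and the `build` callable are hypothetical names introduced for illustration.

```python
import json
import os

def load_or_build(path, build):
    """Return cached JSON from `path`, or call `build()` and cache its result."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = build()  # the expensive step, e.g. the follower collection loop
    with open(path, 'w') as f:
        json.dump(data, f)
    return data
```

With the collection loop moved into a function (say, a hypothetical `collect_followers`), the cell would become `flwrs = load_or_build('flwrs.json', collect_followers)`.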

<a id="filter"></a>
**2. Checking for alphanumeric variations of 'anonymous' and 'legion'**  
We keep only the users whose names contain a variation of 'anonymous' or 'legion', since these accounts are probably Anonymous-affiliated.

In [13]:
with open('flwrs.json') as f:
    flwrs = json.load(f)
with open('flwng.json') as f:
    flwng = json.load(f)

In [11]:
variations = ('anonymous an0nym0u5 anonymou5 an0nymous anonym0us anonym0u5 '
              'an0nymou5 an0nym0us anony legion leg1on an0ny anon l3gion legi0n '
              'l3g1on leg10n an0n le3gi0n l3g10n')
variations = variations.split()

We check whether any variation of 'anonymous' or 'legion' appears in each user's display name or screen name. The account description is not checked by the code below.
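
The matching rule can be tried offline before spending API calls on it. The sketch below copies the variation list from the cell above. `contains_variation` is a hypothetical helper name, and the sample names are made up for illustration.

```python
variations = ('anonymous an0nym0u5 anonymou5 an0nymous anonym0us anonym0u5 '
              'an0nymou5 an0nym0us anony legion leg1on an0ny anon l3gion legi0n '
              'l3g1on leg10n an0n le3gi0n l3g10n').split()

def contains_variation(text, variations):
    """True if any variation occurs as a substring of `text`, ignoring case."""
    text = text.lower()
    return any(var in text for var in variations)

print(contains_variation('An0nym0u5 Ops', variations))  # True
print(contains_variation('Canonical Ltd', variations))  # True: 'anon' is inside 'canonical'
print(contains_variation('Jane Doe', variations))       # False
```

Because this is plain substring matching, short stems such as 'anon' also match unrelated words, so some false positives should be expected in the filtered lists.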

In [25]:
anonfwr = {}

for a in ausers:
    anonfwr[a] = []
    for u in flwrs[a]:
        try:
            user = api.get_user(user_id = u)
            name = user.name.lower()
            scrname = user.screen_name.lower()
            if any(var in name for var in variations) or any(var in scrname for var in variations):
                anonfwr[a].append(u)
        except tweepy.errors.Forbidden as e:
            print(e)
        except tweepy.errors.NotFound as e:
            print(e)

Rate limit reached. Sleeping for: 421


403 Forbidden
63 - User has been suspended.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 647


404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 650
Rate limit reached. Sleeping for: 661


403 Forbidden
63 - User has been suspended.


Rate limit reached. Sleeping for: 662
Rate limit reached. Sleeping for: 711


403 Forbidden
63 - User has been suspended.
403 Forbidden
63 - User has been suspended.
404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 714


404 Not Found
50 - User not found.
404 Not Found
50 - User not found.
404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 727


404 Not Found
50 - User not found.
404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 727


404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 728


404 Not Found
50 - User not found.
403 Forbidden
63 - User has been suspended.
403 Forbidden
63 - User has been suspended.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 726


403 Forbidden
63 - User has been suspended.
403 Forbidden
63 - User has been suspended.
403 Forbidden
63 - User has been suspended.


Rate limit reached. Sleeping for: 697


404 Not Found
50 - User not found.
404 Not Found
50 - User not found.
404 Not Found
50 - User not found.
404 Not Found
50 - User not found.
403 Forbidden
63 - User has been suspended.
403 Forbidden
63 - User has been suspended.


Rate limit reached. Sleeping for: 733


404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 712


403 Forbidden
63 - User has been suspended.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 716


403 Forbidden
63 - User has been suspended.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 713


404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 710


403 Forbidden
63 - User has been suspended.
404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 708


404 Not Found
50 - User not found.
404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 683


404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 726
Rate limit reached. Sleeping for: 717
Rate limit reached. Sleeping for: 683


In [26]:
anonfwg = {}

for a in ausers:
    anonfwg[a] = []
    for u in flwng[a]:
        try:
            user = api.get_user(user_id = u)
            name = user.name.lower()
            scrname = user.screen_name.lower()
            if any(var in name for var in variations) or any(var in scrname for var in variations):
                anonfwg[a].append(u)
        except tweepy.errors.Forbidden as e:
            print(e)
        except tweepy.errors.NotFound as e:
            print(e)

Rate limit reached. Sleeping for: 691


404 Not Found
50 - User not found.


Rate limit reached. Sleeping for: 686
Rate limit reached. Sleeping for: 684
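
Most of the waiting above comes from looking up one user per request. A possible alternative is to look users up in batches. The sketch below keeps the batching logic independent of the API so it can be checked offline. `chunked`, `filter_by_name`, and the `lookup` parameter are hypothetical names. The commented call assumes that tweepy v4's `API.lookup_users` accepts up to 100 IDs per request and leaves suspended or deleted users out of the response rather than raising, so the error handling may need adjusting.

```python
def chunked(items, size=100):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def filter_by_name(user_ids, lookup, variations, size=100):
    """Return IDs whose display name or screen name contains a variation.

    `lookup` is any callable that takes a list of IDs and returns user
    objects with `id`, `name` and `screen_name` attributes.
    """
    matched = []
    for batch in chunked(user_ids, size):
        for user in lookup(batch):
            name = user.name.lower()
            scrname = user.screen_name.lower()
            if any(var in name or var in scrname for var in variations):
                matched.append(user.id)
    return matched

# Possible use in this notebook (assumption, not run here):
# anonfwr[a] = filter_by_name(
#     flwrs[a], lambda ids: api.lookup_users(user_id=ids), variations)
```

Passing the lookup in as a callable makes it possible to test the filter with stub user objects before running it against the real API.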


In [27]:
for a in ausers:
    print(a, len(anonfwr[a]), len(anonfwg[a]))

AnonyOps 140 47
YourAnonNews 53 82
YourAnonCentral 41 66
AnonPress 101 9


We save the filtered lists to disk so we don't have to run the lookups again.

In [28]:
with open('anonfwr1.json', 'w') as f:
    json.dump(anonfwr, f)
with open('anonfwg1.json', 'w') as f:
    json.dump(anonfwg, f)

<a id="apply"></a>
**3. Applying DecisionTreeClassifier and RandomForestClassifier**  
We can use accounts identified as Anonymous-affiliated to train these classifiers.

In [14]:
import pandas as pd
import sklearn

In [7]:
with open('anonfwr1.json') as f:
    anonfwr1 = json.load(f)
with open('anonfwg1.json') as f:
    anonfwg1 = json.load(f)

In [12]:
users = []
for a in ausers:
    users.extend(anonfwr1[a])
    users.extend(anonfwg1[a])
print(len(users))

539
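
The combined list may contain the same ID more than once, for example when an account follows several seed accounts or appears in both a follower list and a following list. A minimal sketch of removing repeats while keeping first-seen order follows. `unique_in_order` is a hypothetical helper and the sample IDs are made up.

```python
def unique_in_order(ids):
    """Drop repeated IDs while keeping first-seen order."""
    return list(dict.fromkeys(ids))

sample = [11, 22, 11, 33, 22]   # made-up IDs for illustration
print(unique_in_order(sample))  # [11, 22, 33]
```

Applied here, `users = unique_in_order(users)` would show whether the 539 entries are all distinct accounts.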


In [16]:
df = pd.DataFrame(users)
df

|  | 0 |
|---|---|
| 0 | 1434935899645566976 |
| 1 | 1469439375637241858 |
| 2 | 1429176781777555457 |
| 3 | 1401080776817471488 |
| 4 | 1468690875937169411 |
| ... | ... |
| 534 | 373157754 |
| 535 | 279390084 |
| 536 | 225712501 |
| 537 | 225235528 |
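
The notebook stops before either classifier is fitted. As a rough illustration of what that step could look like, the sketch below trains both models on synthetic data. The feature matrix, labels, split, and hyperparameters are all placeholders rather than the author's choices. Real features would have to be derived from the collected accounts, together with a negative class of non-affiliated accounts, and neither is defined above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for per-account features (placeholder, not real data).
n = 200
X_pos = rng.normal(loc=1.0, scale=1.0, size=(n, 3))   # label 1: affiliated
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(n, 3))  # label 0: not affiliated
X = np.vstack([X_pos, X_neg])
y = np.array([1] * n + [0] * n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    'decision_tree': DecisionTreeClassifier(random_state=0),
    'random_forest': RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

print(scores)
```

Holding out a test split keeps the comparison between the two classifiers from being based on training accuracy alone.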


<a id="anon"></a>
**4. Collecting Anonymous-affiliated tweets that include emojis**

<a id="random"></a>
**5. Collecting randomly sampled tweets that include emojis**