# Step 1: Assembling accounts for analysis

One result of my master's thesis was that training complex deep neural networks for tweet engagement prediction requires a large corpus of tweets.
Since the time span should not be extended any further, more accounts need to be examined instead.

The following steps will be undertaken in this notebook:
1. Set up the API connection
2. Get all existing users
3. Add further tech accounts
4. Add further journalism accounts
5. Add further celebrity accounts
6. Filter duplicates and save to file

## Set up the API connection

In [1]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

In [2]:
from tep.accountCollector import AccountCollector

In [3]:
ac = AccountCollector()

{"created_at": "Thu May 01 12:37:22 +0000 2014", "description": "Student of Information Systems @TUDarmstadt , co-founder of a small web agency. Interested in Machine Learning", "favourites_count": 393, "followers_count": 57, "friends_count": 221, "id": 2472450259, "id_str": "2472450259", "lang": "en", "listed_count": 7, "location": "Darmstadt, Deutschland", "name": "Felix Peters", "profile_background_color": "C0DEED", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_image_url": "http://pbs.twimg.com/profile_images/600953861629734913/7y_RkdW4_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/600953861629734913/7y_RkdW4_normal.jpg", "profile_link_color": "224F82", "profile_sidebar_border_color": "C0DEED", "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "profile_use_background_image": true, "screen_name"
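`AccountCollector` hides how the API connection is configured, and the output above appears to be the authenticated account's profile. The sketch below shows one way the setup step could load credentials before building a client. It assumes the keys are stored in a JSON file holding the four OAuth parameters that python-twitter's `twitter.Api` constructor accepts. The file name `credentials.json` and the helper `load_credentials` are hypothetical.

```python
import json
from pathlib import Path

# The four OAuth parameters accepted by python-twitter's twitter.Api constructor.
REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token_key", "access_token_secret")


def load_credentials(path):
    """Hypothetical helper: read OAuth credentials from a JSON file and
    fail early if any required key is missing or empty."""
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [k for k in REQUIRED_KEYS if not creds.get(k)]
    if missing:
        raise ValueError(f"missing credentials: {', '.join(missing)}")
    # Keep only the expected keys so the dict can be unpacked into twitter.Api.
    return {k: creds[k] for k in REQUIRED_KEYS}


# Usage (needs python-twitter and valid keys, so it is left commented out):
# import twitter
# api = twitter.Api(**load_credentials("credentials.json"))
# api.VerifyCredentials()  # returns the authenticated twitter.User
```

Validating the file up front produces a clear error message instead of an authentication failure on the first API call.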

## Get existing accounts

In [4]:
# retrieve all Twitter lists owned by this account
all_lists = ac.get_all_lists()
all_lists

[List(ID=922818913909518338, FullName='@_fpeters/us-politicians', Slug=us-politicians, User=_fpeters),
 List(ID=922802414054461440, FullName='@_fpeters/celebrities', Slug=celebrities, User=_fpeters),
 List(ID=914828785647841280, FullName='@_fpeters/fortune-500', Slug=fortune-500, User=_fpeters)]

In [5]:
# get all users from these lists
all_users = []
for twitter_list in all_lists:
    all_users += ac.get_users_from_list(twitter_list.id)
len(all_users)

772
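The loop above concatenates the members of every list, so each account's origin list is lost. The following minimal sketch also records each member's source list, which makes it easy to check how many accounts each category contributes. Both `collect_list_members` and the injected `fetch_members` callable are hypothetical. In the notebook, `ac.get_users_from_list` would play the role of `fetch_members`.

```python
from collections import Counter
from types import SimpleNamespace


def collect_list_members(lists, fetch_members):
    """Gather the members of several lists and remember each member's source.

    `fetch_members` is any callable mapping a list ID to a list of users
    (e.g. a wrapper around the Twitter API). It is injected so the logic
    can run without network access.
    """
    members, sources = [], []
    for twitter_list in lists:
        users = fetch_members(twitter_list.id)
        members.extend(users)
        # One source label per fetched user, index-aligned with `members`.
        sources.extend([twitter_list.slug] * len(users))
    return members, sources


# Offline usage with stand-in list objects and a dict as the "API":
lists = [SimpleNamespace(id=1, slug="celebrities"),
         SimpleNamespace(id=2, slug="fortune-500")]
fake_api = {1: ["user_a", "user_b"], 2: ["user_c"]}
members, sources = collect_list_members(lists, fake_api.get)
print(len(members), Counter(sources))
```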

## Add further tech accounts

In [14]:
!ls data

celebrity_accounts.txt  journalist_accounts.txt tech_accounts.txt


In [15]:
tech_accounts = ac.load_users_from_file(fname="data/tech_accounts.txt")
len(tech_accounts)

User labusque could not be loaded.


98
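The message `User labusque could not be loaded.` suggests that the loader skips names it cannot resolve rather than aborting. The following is a hypothetical stand-in for `AccountCollector.load_users_from_file`, not its actual implementation. It assumes the file holds one screen name per line and that lookups signal failure by raising an exception. The `lookup` callable and `lookup_errors` parameter are illustrative. With python-twitter, one might pass `lambda n: api.GetUser(screen_name=n)` together with the library's exception type.

```python
def load_users_from_file(fname, lookup, lookup_errors=(LookupError,)):
    """Hypothetical loader: one screen name per line (a leading '@' is allowed),
    blank lines and '#' comments are ignored. Names whose lookup raises one of
    `lookup_errors` are reported and skipped, so one deleted or renamed
    account does not abort the whole load."""
    users = []
    with open(fname, encoding="utf-8") as f:
        for line in f:
            name = line.strip().lstrip("@")
            if not name or name.startswith("#"):
                continue
            try:
                users.append(lookup(name))
            except lookup_errors:
                print(f"User {name} could not be loaded.")
    return users
```

A dictionary's `__getitem__` works as an offline `lookup`, because it raises `KeyError` (a subclass of `LookupError`) for unknown names.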

In [16]:
all_users += tech_accounts
len(all_users)

870

## Add further journalism accounts

In [17]:
# get pre-collected accounts
journalism_accounts = ac.load_users_from_file("data/journalist_accounts.txt")
len(journalism_accounts)

33

In [18]:
all_users += journalism_accounts
len(all_users)

903

In [19]:
# get accounts assembled by C-SPAN
ac.api.GetLists(screen_name="cspan")

[List(ID=983390155703832576, FullName='@cspan/house-commerce-cmte', Slug=house-commerce-cmte, User=cspan),
 List(ID=983383749613228032, FullName='@cspan/sen-commerce-judiciary', Slug=sen-commerce-judiciary, User=cspan),
 List(ID=884477830062641153, FullName='@cspan/the-cabinet', Slug=the-cabinet, User=cspan),
 List(ID=816275409931210752, FullName='@cspan/new-members-of-congress', Slug=new-members-of-congress, User=cspan),
 List(ID=234326967, FullName='@cspan/political-reporters', Slug=political-reporters, User=cspan),
 List(ID=166477976, FullName='@cspan/military-reporters', Slug=military-reporters, User=cspan),
 List(ID=105140167, FullName='@cspan/foreign-leaders', Slug=foreign-leaders, User=cspan),
 List(ID=67564101, FullName='@cspan/supreme-court-reporters', Slug=supreme-court-reporters, User=cspan),
 List(ID=42362748, FullName='@cspan/congressional-committees', Slug=congressional-committees, User=cspan),
 List(ID=34179516, FullName='@cspan/members-of-congress', Slug=members-of-cong

In [20]:
# members of the @cspan/political-reporters list
journalism_accounts = ac.get_users_from_list(list_id=234326967)
len(journalism_accounts)

138

In [21]:
all_users += journalism_accounts
len(all_users)

1041

## Add further celebrity accounts

In [22]:
celeb_accounts = ac.load_users_from_file("data/celebrity_accounts.txt")
len(celeb_accounts)

34

In [23]:
all_users += celeb_accounts
len(all_users)

1075

## Remove duplicates and save to file

In [24]:
import numpy as np

In [25]:
user_ids = [u.id for u in all_users]
len(user_ids)

1075

In [26]:
# remove duplicate IDs (np.unique also returns them sorted)
user_ids = np.unique(user_ids)
len(user_ids)

1061
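`np.unique` dropped 14 entries (1075 - 1061) but also sorted the IDs, which discards the order in which the sources were added. The following is a minimal sketch of an order-preserving alternative in plain Python. It also reports which IDs were duplicated, for example accounts that appear both in a list and in one of the text files. Both helper names are hypothetical.

```python
from collections import Counter


def dedupe_ids(user_ids):
    """Drop repeated IDs while keeping first-seen order.

    dict.fromkeys preserves insertion order (guaranteed since Python 3.7),
    unlike np.unique, which returns the values sorted."""
    return list(dict.fromkeys(user_ids))


def duplicated_ids(user_ids):
    """Return the IDs that occur more than once, mapped to their counts."""
    return {uid: n for uid, n in Counter(user_ids).items() if n > 1}


ids = [30, 10, 20, 10, 30, 30]
print(dedupe_ids(ids))      # [30, 10, 20]
print(duplicated_ids(ids))  # {30: 3, 10: 2}
```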

In [27]:
# save user IDs to file
from tep.utils import save_as_text
save_as_text(data=user_ids, filename="data/user_ids.txt")
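The implementation of `tep.utils.save_as_text` is not shown. The following is a hypothetical sketch using `np.savetxt` that writes one ID per line. The main pitfall is number formatting: `np.savetxt` defaults to `fmt='%.18e'`, which writes integers in scientific notation, whereas `fmt='%d'` keeps them exact. Reading the file back with `int()` avoids any float conversion, which matters because IDs as large as the list IDs printed above exceed the 2**53 range where floats are exact. The `load_ids` helper is also hypothetical.

```python
import os
import tempfile

import numpy as np


def save_as_text(data, filename):
    """Hypothetical version: write one integer ID per line, exactly."""
    np.savetxt(filename, np.asarray(data, dtype=np.int64), fmt="%d")


def load_ids(filename):
    """Read the IDs back as Python ints without passing through float."""
    with open(filename, encoding="utf-8") as f:
        return [int(line) for line in f if line.strip()]


# Round trip in a temporary directory, with one small and one large ID:
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "user_ids.txt")
    save_as_text([2472450259, 922818913909518338], path)
    loaded = load_ids(path)
print(loaded)
```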