# Creative extension analysis notebook

---

**Authors**

- Jérémy Bensoussan
- Ekaterina Kryukova
- Jules Triomphe

---

## Abstract

While the paper examines the exposure hypothesis for all topics, we propose to analyze and compare political and food-related tweets. To do so, we plan to obtain egos, alters and their timelines from Twitter’s API by generating $3\times30,000$ random numbers in a range from $0$ to $3,000,000,000$. Then, we intend to identify egos’ retweets using official RTs, classify tweets by topic based on hashtags (and keywords if the dataset is lacking content) and create a follower/followee graph. To calculate the probability that egos retweet alters’ tweets, we will use a more solid approach than the one described in the paper “Differences in the Mechanics of Information Diffusion Across Topics”, where the probability is equal to the number of users that were exposed $k$ times to a hashtag and retweeted before the ($k+1$)-th exposure, divided by the number of users that were exposed $k$ times to the hashtag. Finally, we plan to visualize the results and to analyze the probability by breaking down users based on betweenness, clustering coefficient and number of followees.

## Research Questions

1. Is there a significant difference between the probability of retweeting when the tweet is about food and when it is about politics?
2. Is there a significant difference in the number of times a tweet is retweeted depending on whether it is about food or politics?
3. Is there a relation between retweet probabilities and user betweenness, cluster size or number of friends?

## Proposed dataset

Self-collected (with the Twitter API) ego and alter timelines with all tweet fields from the [**GET /2/users** endpoint](https://developer.twitter.com/en/docs/twitter-api/users/lookup/api-reference/get-users) and all user fields except for `profile_image_url`.

## Methods

### Data collection

We will sign up for Twitter’s API to collect data. We will generate $3\times30,000$ random numbers in a range from $0$ to $3,000,000,000$ as in the paper and use the [**GET /2/users** endpoint](https://developer.twitter.com/en/docs/twitter-api/users/lookup/api-reference/get-users) to collect active and public user information with all tweet fields and all user fields except for `profile_image_url`.

### Building the network

We will use networkx to build a directed network of followers in which nodes are users (egos and alters) and edges are follow relationships (excluding follow relationships among alters). Next, we will build another network in which relationships among alters of active egos are included.

### Calculating retweet probability

For each ego, we will count the number of followees who have retweeted a post on a certain topic at a certain date (the ego's exposures). This gives information of the following form: an ego $i$ was exposed exactly once to $200$ posts about this topic, of which $i$ retweeted $50$ (probability $50/200 = 25\%$); $i$ was exposed twice to $100$ posts about this topic, of which $i$ retweeted $50$ (probability $50/100 = 50\%$); and so on. Finally, we will calculate a sequence of probabilities for each ego (same procedure as in the paper). If there is sufficient data, we will apply a t-test to identify whether the distribution of retweets for each exposure count is significantly different from one topic to the other. We will also try to compare the distribution of retweet probabilities between topics.
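
As a rough illustration of the exposure-based probability, here is a minimal sketch using only the standard library. It follows the definition quoted in the abstract (users who retweet after their $k$-th exposure and before the next one, divided by users exposed $k$ times). The function name `exposure_curve`, the record layout and the toy data are hypothetical, and dropping users who retweeted before a given exposure is an assumption of this sketch, not something the notebook specifies.

```python
from collections import Counter


def exposure_curve(records):
    """Estimate p(k) from (exposure_times, retweet_time) pairs.

    exposure_times: sorted timestamps at which a user saw the hashtag.
    retweet_time: timestamp of the user's retweet, or None if there was none.
    """
    exposed = Counter()    # users still eligible at their k-th exposure
    retweeted = Counter()  # users who retweeted after exposure k, before k+1
    for exposure_times, retweet_time in records:
        n = len(exposure_times)
        for k in range(1, n + 1):
            if retweet_time is not None and retweet_time < exposure_times[k - 1]:
                break  # assumption: a user who already retweeted is dropped
            exposed[k] += 1
            is_last = k == n
            if retweet_time is not None and (
                is_last or retweet_time < exposure_times[k]
            ):
                retweeted[k] += 1
                break
    return {k: retweeted[k] / exposed[k] for k in sorted(exposed)}


# Hypothetical toy data
records = [
    ([1, 2, 3], 2.5),  # retweets between the 2nd and 3rd exposure
    ([1, 2], None),    # exposed twice, never retweets
    ([1], 5),          # retweets after a single exposure
    ([4, 6], 1),       # retweeted before any exposure: ignored
]
curve = exposure_curve(records)
print(curve)
```

With this toy data, three users reach a first exposure and one of them retweets before a second one, so $p(1) = 1/3$; two users reach a second exposure and one retweets before a third, so $p(2) = 1/2$.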

### Community detection

We will use the second ego networks to compute clustering coefficients and betweenness of the active egos.
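
To make the clustering coefficient concrete, the sketch below computes it by hand for a tiny ego network. It is a plain-Python stand-in: the notebook plans to use networkx, which provides `nx.clustering` for this purpose. The helper name `local_clustering`, the edge list and the choice to treat follow edges as undirected are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(neighbors, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    nbrs = neighbors[node] - {node}
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
    return 2 * links / (len(nbrs) * (len(nbrs) - 1))


# Hypothetical follow edges; the ("a", "b") edge is a relationship among alters
edges = [("ego", "a"), ("ego", "b"), ("ego", "c"), ("a", "b")]

# Assumption: direction is ignored when counting triangles
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

for node in ("ego", "a", "c"):
    print(node, local_clustering(neighbors, node))
```

Without the edge between alters `a` and `b`, the coefficient of `ego` would be zero, which is why the second network (with relationships among alters) is the one needed here.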

### Data analysis

We will compare the retweet probabilities based on betweenness, number of followers and clustering for each topic, much like in *Figure 6* in the paper.

---

## Initialisation

Import modules.

In [1]:
# Import libraries
import random
import numpy as np
import pandas as pd
from tqdm.autonotebook import tqdm, trange
import requests
import time
from datetime import datetime
import os


Setup automatic formatting (requires the `nb-black` package).

In [2]:
# Enable auto-formatting

%load_ext lab_black

### Control center

**This is the control center. All operations are decided here to avoid memory overflow and excessive computation times. This notebook should be run FROM THE TOP once these parameters have been set.** If in doubt, ask Jules ;)

In [3]:
# Define constants

# UserID range
LOWER_ID_N = 0
UPPER_ID_N = int(3e9)

# UserID number
N_UID_PER_REQUEST = int(3e4)
N_UID_REQUESTS = 3

# --------------------------------------------------

# Choose whether to generate new UserIDs
CREATE_NEW_UIDs = False

# Choose whether to collect user data
COLLECT_USER_DATA = False
# Select the batch to query if collecting user data
REQUEST_NUMBER = 2
# Define behaviour depending on the run number.
# If this is True then COLLECT_USER_DATA must be True
FIRST_RUN = False

# Choose whether to create user subset files
CREATE_USER_SUBSETS = False

# Choose whether to create/reset data pull status
CREATE_DATA_PULL_STATUS = False
# Choose whether to pull new data and save it
PULL_NEW_TIMELINE_DATA = False
PULL_NEW_FF_DATA = True
N_RUNS_PULL_FF_DATA = 50
# Choose whether to save newly pulled data
SAVE_PULLED_DATA = True

# --------------------------------------------------

# Data folder location
DATA_FOLDER = "./data/"
# UIDs
UIDS_FILE = DATA_FOLDER + "uids.csv"
# User files
USERS_FOLDER = DATA_FOLDER + "users/"
USERS_FILE = USERS_FOLDER + "users.csv"
PUBLIC_USERS_FILE = USERS_FOLDER + "public_users.csv"
PUBLIC_USERS_W_TWEETS_FILE = USERS_FOLDER + "public_users_w_tweets.csv"
PUBLIC_USERS_W_FOLLOWERS_FILE = USERS_FOLDER + "public_users_w_followers.csv"
PUBLIC_USERS_W_FRIENDS_FILE = USERS_FOLDER + "public_users_w_friends.csv"
# Pulled data
PUBLIC_USERS_PULL_STATUS_FILE = DATA_FOLDER + "public_users_pull_status_ff.csv"
# Timeline files
TIMELINES_FOLDER = DATA_FOLDER + "timelines/"
PUBLIC_USERS_TIMELINES_FILE = TIMELINES_FOLDER + "public_users_timelines.csv"
# Network
NETWORK_FOLDER = DATA_FOLDER + "network/"
PUBLIC_USERS_FOLLOWERS_FILE = NETWORK_FOLDER + "public_users_followers.csv"
PUBLIC_USERS_FRIENDS_FILE = NETWORK_FOLDER + "public_users_friends.csv"
# File containing the bearer token
BEARER_TOKEN = DATA_FOLDER + "bearer_token.auth"

# API endpoints
API_USERS_ENDPOINT = "https://api.twitter.com/2/users?ids="
API_USER_FIELDS = "user.fields=created_at,description,entities,id,location,name,pinned_tweet_id,protected,public_metrics,url,username,verified,withheld"
API_TWEET_FIELDS = "tweet.fields=attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,non_public_metrics,public_metrics,organic_metrics,promoted_metrics,possibly_sensitive,referenced_tweets,reply_settings,source,text,withheld"
API_V1_RATE_LIMITS = "https://api.twitter.com/1.1/application/rate_limit_status.json?resources=application,statuses,followers,friends"
API_USER_TIMELINE_ENDPOINT = "https://api.twitter.com/1.1/statuses/user_timeline.json"
API_FOLLOWERS_IDS_ENDPOINT = "https://api.twitter.com/1.1/followers/ids.json"
API_FRIENDS_IDS_ENDPOINT = "https://api.twitter.com/1.1/friends/ids.json"

# Random seed
SEED = 30
random.seed(SEED)

---

## Data collection

In this part, we will generate random user IDs and collect their respective user information if they exist.

### UID generation

Let's create random UIDs in the 0-3 billion range as discussed in the abstract.

We reshape them to simplify queries due to Twitter's API's rate limits.

If they have already been generated, we load them.

In [4]:
if CREATE_NEW_UIDs:
    uids = pd.DataFrame(
        np.array(
            random.sample(
                range(LOWER_ID_N, UPPER_ID_N), N_UID_PER_REQUEST * N_UID_REQUESTS
            )
        ).reshape(N_UID_PER_REQUEST, N_UID_REQUESTS)
    )
    uids.to_csv(UIDS_FILE, index=False)
else:
    uids = pd.read_csv(UIDS_FILE)
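
The sampling and batching step can be sketched at a smaller scale with the standard library alone. The batch sizes below are small stand-ins for the notebook's values, and the local generator `rng` is a choice made for this sketch (the notebook seeds the global `random` module instead).

```python
import random

rng = random.Random(30)  # local generator, seeded for reproducibility
n_batches, batch_size = 3, 5  # small stand-ins for the notebook's 3 and 30,000

# random.sample draws without replacement, so the IDs are unique; passing a
# range avoids building a 3-billion-element list in memory.
ids = rng.sample(range(0, 3_000_000_000), n_batches * batch_size)

# Column j of a row-major (batch_size, n_batches) reshape is every
# n_batches-th element starting at offset j.
batches = [ids[j::n_batches] for j in range(n_batches)]

for j, batch in enumerate(batches):
    print(j, batch)
```

Each batch here plays the role of one column of `uids`, which is what `REQUEST_NUMBER` selects later on.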

### Token load

To query Twitter's API, we need a bearer token which we load.

In [5]:
# Load bearer token
with open(BEARER_TOKEN, "r") as file:
    token = file.readline().strip("\n")

# Define authentication header
headers = {"Authorization": "Bearer " + token}

### User data collection

In this section, we will get user data from Twitter's API.

First we define a few helper functions.

In [6]:
def wait_for_reset(r):
    print(
        "Current time: {} (UTC)".format(datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S"))
    )
    # Get reset time (Unix timestamp), expressed in UTC to match utcnow()
    ts = int(r.headers["x-rate-limit-reset"])
    reset_dt = datetime.utcfromtimestamp(ts)
    ts_str = reset_dt.strftime("%Y-%m-%d %H:%M:%S")
    # Compute difference between current time and reset time
    sleep_time = (reset_dt - datetime.utcnow()).total_seconds()
    if sleep_time > 0:
        print("Waiting until {} (UTC) for the rate limit to reset.".format(ts_str))
        time.sleep(sleep_time)
    else:
        print("Reset time was: {} (UTC)".format(ts_str))
    print("Resuming user data collection")


def get_user_data(req, headers=headers, wait=True):
    # Query the user data
    r = requests.get(req, headers=headers)
    if wait and int(r.headers["x-rate-limit-remaining"]) == 0:
        wait_for_reset(r)

        # Query the user data
        r = requests.get(req, headers=headers)

    return r


def get_user_data_df(r):
    df = pd.DataFrame(
        r.json()["data"],
        columns=[
            "id",
            "username",
            "name",
            "protected",
            "withheld",
            "verified",
            "created_at",
            "location",
            "public_metrics",
            "description",
            "url",
            "entities",
            "pinned_tweet_id",
        ],
        # Replace NaNs by empty strings to facilitate pre-processing
    ).fillna("")
    return df
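
Since the `x-rate-limit-reset` header is a Unix timestamp, the waiting time can also be computed directly in epoch seconds, which avoids any timezone conversion. The following is a minimal sketch under that assumption; the helper name `seconds_until_reset` and its `now` parameter are hypothetical and only there to make the function easy to check.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to sleep before the rate limit window resets."""
    reset_ts = int(headers["x-rate-limit-reset"])  # epoch seconds
    now = time.time() if now is None else now
    # Never return a negative duration if the reset time has already passed
    return max(0.0, reset_ts - now)


# Usage with a plain dict standing in for a response's headers
print(seconds_until_reset({"x-rate-limit-reset": "1000"}, now=940.0))
```

The result can be passed to `time.sleep` as is.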

Now, let's query the API.

In [7]:
# Get user data
if COLLECT_USER_DATA:
    for i in trange(N_UID_PER_REQUEST // 100):
        # Get 100 UserIDs (limit per request as defined by Twitter)
        users = uids.values[i * 100 : (i + 1) * 100, REQUEST_NUMBER]
        # Define the request URL
        req = (
            API_USERS_ENDPOINT
            + ",".join([str(user) for user in users])
            + "&"
            + API_USER_FIELDS
            + "&"
            + API_TWEET_FIELDS
        )

        # Create the dataframe on the first iteration
        if i == 0:
            # Query the user data
            r = get_user_data(req)

            # If the rate limit is not maximal, then wait for the reset to occur
            # (max 15 minutes)
            if int(r.headers["x-rate-limit-remaining"]) != 299:
                wait_for_reset(r)

                # Query the user data
                r = get_user_data(req)

            raw_user_data = get_user_data_df(r)
        # Append to existing dataframe on other iterations
        # but do not wait for reset for the last iteration
        elif i == N_UID_PER_REQUEST // 100 - 1:
            # Query the user data
            r = get_user_data(req, wait=False)
            # Append new data to existing dataframe
            raw_user_data = pd.concat([raw_user_data, get_user_data_df(r)])
        else:
            # Query the user data
            r = get_user_data(req)
            # Append new data to existing dataframe
            raw_user_data = pd.concat([raw_user_data, get_user_data_df(r)])
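
The batching logic of the loop can be isolated into two small helpers, sketched below. The names `chunked` and `build_users_url` are hypothetical; the endpoint URL and the 100-ID limit per request come from the notebook. Unlike the loop above, this version also handles a final batch shorter than 100.

```python
def chunked(seq, size):
    """Yield consecutive slices of seq holding at most size items."""
    for start in range(0, len(seq), size):
        yield seq[start : start + size]


def build_users_url(ids, user_fields):
    """Build a GET /2/users lookup URL for one batch of IDs."""
    return (
        "https://api.twitter.com/2/users?ids="
        + ",".join(str(i) for i in ids)
        + "&user.fields="
        + ",".join(user_fields)
    )


# 250 IDs are split into batches of 100, 100 and 50
batches = list(chunked(list(range(250)), 100))
print([len(b) for b in batches])
print(build_users_url([1, 2], ["id", "name"]))
```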

There are a few important data points we will need in the next parts, so we extract them here along with any others they are grouped with.

In [8]:
# Preprocess the data


def get_key_val(x, key):
    """Get dictionary value from key if it exists, otherwise return an empty string."""
    if key in x:
        return x[key]
    else:
        return ""


def get_public_metrics(df):
    """Extract the data from the public_metrics column"""
    for metric in ["followers_count", "following_count", "tweet_count", "listed_count"]:
        df[metric] = df.public_metrics.apply(lambda x: get_key_val(x, metric))
    df.pop("public_metrics")
    return df


def get_entities(df):
    """Extract the data from the entity column"""
    for entity in ["url", "description"]:
        df["entities_" + entity] = df.entities.apply(lambda x: get_key_val(x, entity))
    df.pop("entities")
    return df


if COLLECT_USER_DATA:
    raw_user_data = get_public_metrics(raw_user_data)
    raw_user_data = get_entities(raw_user_data)
    raw_user_data = raw_user_data.astype(
        {"id": int, "protected": bool, "verified": bool}
    )
    print("Number of valid users: {:,}".format(raw_user_data.shape[0]))
    raw_user_data
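
To show what this flattening does to a single record, here is a plain-dictionary sketch of the same idea. The function name `flatten_metrics` and the sample record are hypothetical; missing metrics fall back to an empty string, mirroring `get_key_val`.

```python
METRIC_KEYS = ("followers_count", "following_count", "tweet_count", "listed_count")


def flatten_metrics(user, keys=METRIC_KEYS):
    """Copy a user record, lifting public_metrics entries to the top level."""
    # "or {}" also covers records where the column was filled with ""
    metrics = user.get("public_metrics") or {}
    flat = {k: v for k, v in user.items() if k != "public_metrics"}
    for key in keys:
        flat[key] = metrics.get(key, "")
    return flat


record = {"id": 1, "public_metrics": {"followers_count": 3}}
print(flatten_metrics(record))
```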

We need all of the user data available for the next parts, so we append the generated data (if any) to pre-existing user data and we save the data frame.

In [9]:
# Load user data if it exists
if os.path.isfile(USERS_FILE) and not FIRST_RUN:
    user_data = pd.read_csv(
        USERS_FILE,
        dtype={"id": int, "protected": bool, "verified": bool},
        lineterminator="\n",
    )
    if COLLECT_USER_DATA:
        user_data = pd.concat([user_data, raw_user_data])
else:
    user_data = raw_user_data

if COLLECT_USER_DATA:
    # Save data to disk
    user_data.to_csv(USERS_FILE, index=False)

# Print statistics
print("Total number of valid users: {:,}".format(user_data.shape[0]))

Total number of valid users: 33,520


### User subset definition

We define and save groups of users to facilitate data manipulation later on.

In [10]:
# Extract user subsets
if CREATE_USER_SUBSETS:
    # Public users
    public_users = user_data[~user_data.protected].copy()
    # Public users with tweets
    # Tweet count includes retweets
    public_users_w_tweets = user_data[
        ~user_data.protected & (user_data.tweet_count > 0)
    ].copy()
    # Public users with followers
    public_users_w_followers = user_data[
        ~user_data.protected & (user_data.followers_count > 0)
    ].copy()
    # Public users with friends
    public_users_w_friends = user_data[
        ~user_data.protected & (user_data.following_count > 0)
    ].copy()

    print("Number of public users: {:,}".format(public_users.shape[0]))
    print(
        "Number of public users with tweets: {:,}".format(
            public_users_w_tweets.shape[0]
        )
    )
    print(
        "Number of public users with followers: {:,}".format(
            public_users_w_followers.shape[0]
        )
    )
    print(
        "Number of public users with friends: {:,}".format(
            public_users_w_friends.shape[0]
        )
    )

    public_users.to_csv(PUBLIC_USERS_FILE, index=False)
    public_users_w_tweets.to_csv(PUBLIC_USERS_W_TWEETS_FILE, index=False)
    public_users_w_followers.to_csv(PUBLIC_USERS_W_FOLLOWERS_FILE, index=False)
    public_users_w_friends.to_csv(PUBLIC_USERS_W_FRIENDS_FILE, index=False)

else:
    public_users = pd.read_csv(
        PUBLIC_USERS_FILE,
        dtype={"id": int, "protected": bool, "verified": bool},
        lineterminator="\n",
    )
    public_users_w_tweets = pd.read_csv(
        PUBLIC_USERS_W_TWEETS_FILE,
        dtype={"id": int, "protected": bool, "verified": bool},
        lineterminator="\n",
    )
    public_users_w_followers = pd.read_csv(
        PUBLIC_USERS_W_FOLLOWERS_FILE,
        dtype={"id": int, "protected": bool, "verified": bool},
        lineterminator="\n",
    )
    public_users_w_friends = pd.read_csv(
        PUBLIC_USERS_W_FRIENDS_FILE,
        dtype={"id": int, "protected": bool, "verified": bool},
        lineterminator="\n",
    )

### Data pull status generation

As there are many queries to make, we create here a dataframe to be able to keep track of what data was already pulled and what data still needs to be pulled.

In [11]:
# Create pull status dataframe
if CREATE_DATA_PULL_STATUS:
    # Use public metrics to define limits
    user_data_pull_status = public_users[
        ["id", "followers_count", "following_count", "tweet_count"]
    ].copy()

    # Define parameters for API queries
    user_data_pull_status["timeline_lowest_id"] = 0
    user_data_pull_status["timeline_tweets_pulled"] = 0

    user_data_pull_status["followers_cursor"] = -1
    user_data_pull_status["followers_pulled"] = 0

    user_data_pull_status["following_cursor"] = -1
    user_data_pull_status["following_pulled"] = 0

    # Change column order for easier visualization
    user_data_pull_status = user_data_pull_status[
        [
            "id",
            "timeline_lowest_id",
            "timeline_tweets_pulled",
            "tweet_count",
            "followers_cursor",
            "followers_pulled",
            "followers_count",
            "following_cursor",
            "following_pulled",
            "following_count",
        ]
    ]

    # Set id column to index
    user_data_pull_status = user_data_pull_status.set_index("id")

else:
    user_data_pull_status = pd.read_csv(
        PUBLIC_USERS_PULL_STATUS_FILE,
        dtype=int,
        index_col="id",
        lineterminator="\n",
    )

### User timeline collection

As part of our analysis, we need to collect users' timelines. This is what we do here.

First we define a few helper functions whose names are pretty explicit, then we move on to actually query the data before saving it along with the data pull status.

In [12]:
def get_user_timeline_rate_limit():
    r = requests.get(API_V1_RATE_LIMITS, headers=headers)
    remaining = r.json()["resources"]["statuses"]["/statuses/user_timeline"][
        "remaining"
    ]
    # Get Unix timestamp
    reset_ts = r.json()["resources"]["statuses"]["/statuses/user_timeline"]["reset"]
    # Convert to string
    reset_time = datetime.utcfromtimestamp(reset_ts).strftime("%Y-%m-%d %H:%M:%S")
    return remaining, reset_time


def get_initial_timeline_df():
    return pd.DataFrame(
        columns=[
            "user_id",
            "user",
            "id",
            "created_at",
            "text",
            "in_reply_to_status_id",
            "in_reply_to_user_id",
            "source",
            "truncated",
            "coordinates",
            "place",
            "is_quote_status",
            "quoted_status_id",
            "quoted_status",
            "quote_count",
            "retweeted_status",
            "retweet_count",
            "favorite_count",
            "entities",
            "extended_entities",
            "possibly_sensitive",
            "lang",
        ]
    )


def get_user_timeline_df(r):
    df = pd.DataFrame(
        r.json(),
        columns=[
            "user_id",
            "user",
            "id",
            "created_at",
            "text",
            "in_reply_to_status_id",
            "in_reply_to_user_id",
            "source",
            "truncated",
            "coordinates",
            "place",
            "is_quote_status",
            "quoted_status_id",
            "quoted_status",
            "quote_count",
            "retweeted_status",
            "retweet_count",
            "favorite_count",
            "entities",
            "extended_entities",
            "possibly_sensitive",
            "lang",
        ],
        # Replace NaNs by empty strings to facilitate pre-processing
    ).fillna("")
    # Fill in user_id with tweet UserID
    df.user_id = df.user.apply(lambda x: x["id"])
    return df


def get_user_timeline(query_n, user_id, max_id, count):
    req = API_USER_TIMELINE_ENDPOINT
    params = {
        "user_id": str(user_id),
        "count": str(count),
        "include_rts": "1",
    }
    if max_id > 0:
        params.update({"max_id": str(max_id - 1)})

    r = requests.get(req, headers=headers, params=params)
    df = get_user_timeline_df(r)

    n_tweets_pulled = len(r.json())
    if n_tweets_pulled < count:
        print(r.url)
        print(
            "Query {:,} -- ".format(query_n + 1).ljust(15)
            + "User {}: got {:,} tweets instead of {:,}.".format(
                user_id, n_tweets_pulled, count
            )
        )
        lowest_id = -1
    if df.shape[0] > 0:
        lowest_id = int(df.id.min())

    return df, lowest_id, n_tweets_pulled
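
The `max_id - 1` in `get_user_timeline` is there because `max_id` is inclusive: asking again with the lowest ID already seen would return that tweet a second time. The sketch below illustrates this pagination pattern against an in-memory stand-in for the endpoint. Both function names and the tweet IDs are hypothetical, and no network call is made.

```python
def fake_user_timeline(tweet_ids, count, max_id=None):
    """Stand-in for the timeline endpoint: newest first, IDs <= max_id."""
    eligible = [
        t for t in sorted(tweet_ids, reverse=True) if max_id is None or t <= max_id
    ]
    return eligible[:count]


def pull_all(tweet_ids, count):
    """Page through a timeline by lowering max_id after every request."""
    pulled, max_id = [], None
    while True:
        page = fake_user_timeline(tweet_ids, count, max_id)
        if not page:
            break
        pulled.extend(page)
        # max_id is inclusive, so step just below the oldest tweet seen
        max_id = min(page) - 1
    return pulled


print(pull_all([5, 1, 9, 3, 7, 11, 2], count=3))
# → [11, 9, 7, 5, 3, 2, 1]
```

In the notebook, the role of `max_id` across runs is played by the `timeline_lowest_id` column of the pull status dataframe.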

Having defined our helper functions, we now create an empty dataframe for our user timeline data and query the API for as much data as possible until we hit the rate limit. Similar sections are run multiple times, over several days, to query all of the necessary data.

In [13]:
tmp_user_timeline_data = get_initial_timeline_df()

if PULL_NEW_TIMELINE_DATA:

    # Get the number of available queries and rate limit reset time
    query_quota, reset_time = get_user_timeline_rate_limit()

    # Get users with tweets left to pull
    user_timelines_to_pull = user_data_pull_status[
        (user_data_pull_status.tweet_count > 0)
        & (
            user_data_pull_status.timeline_tweets_pulled
            < user_data_pull_status.tweet_count
        )
    ]

    n_queries = min(query_quota, user_timelines_to_pull.shape[0])
    print("Executing {:,} queries.".format(n_queries))
    for query_n in trange(n_queries):

        # Get query parameters
        user_id = user_timelines_to_pull.index[query_n]
        max_id = user_timelines_to_pull.loc[user_id, "timeline_lowest_id"]
        # A 200-tweet limit is set by Twitter per request
        count = min(
            user_timelines_to_pull.loc[user_id, "tweet_count"]
            - user_timelines_to_pull.loc[user_id, "timeline_tweets_pulled"],
            200,
        )

        # Get user timeline data and statistics
        raw_user_timeline_data, lowest_id, n_tweets_pulled = get_user_timeline(
            query_n, user_id, max_id, count
        )

        # Append to existing user timeline data
        tmp_user_timeline_data = pd.concat(
            [tmp_user_timeline_data, raw_user_timeline_data]
        )

        # Update pull status
        user_data_pull_status.loc[user_id, "timeline_lowest_id"] = lowest_id
        if user_timelines_to_pull.loc[user_id, "timeline_tweets_pulled"] == 0:
            user_data_pull_status.loc[user_id, "timeline_tweets_pulled"] = count
        else:
            user_data_pull_status.loc[user_id, "timeline_tweets_pulled"] += count

    print("Next reset time: {} (UTC)".format(reset_time))

Now we append the temporary dataframe to our dataset.

In [14]:
# # Define user timelines dataframe
# if os.path.isfile(PUBLIC_USERS_TIMELINES_FILE) and not CREATE_DATA_PULL_STATUS:
#     user_timeline_data = pd.read_csv(
#         PUBLIC_USERS_TIMELINES_FILE,
#         dtype={"coordinates": "object", "place": "object", "quoted_status": "object"},
#         lineterminator="\n",
#     )
#     user_timeline_data = user_timeline_data.append(tmp_user_timeline_data)
# else:
#     user_timeline_data = tmp_user_timeline_data

We are interested in obtaining some statistics on the pulled data.

For consistency, we check whether the collected tweets are unique. As a measure of progress in our data collection efforts, we also report how many unique user timelines have been queried. Because we cycle through users, however, we will have queried part of every user's timeline before we have queried all of the tweets from their timelines.

In [15]:
# if PULL_NEW_TIMELINE_DATA:
#     print(
#         "Number of collected tweets: {:,} ({:,} unique) out of {:,}.\nNumber of unique users: {:,} (out of {:,}).".format(
#             user_timeline_data.shape[0],
#             len(np.unique(user_timeline_data.id.values)),
#             np.sum(user_data_pull_status.tweet_count.values),
#             len(np.unique(user_timeline_data.user_id.values)),
#             public_users_w_tweets.shape[0],
#         )
#     )

To be able to use and share this data, we save it to disk.

In [16]:
# if SAVE_PULLED_DATA:
#     # Save data to disk
#     user_timeline_data.to_csv(PUBLIC_USERS_TIMELINES_FILE, index=False)
#     user_data_pull_status.to_csv(PUBLIC_USERS_PULL_STATUS_FILE)

### Follower & friends IDs collection

To build our ego network, we need to collect users' followers' & friends' IDs.

First we define a few helper functions, then we move on to actually query the data before saving it along with the data pull status.

Functions to query rate limits.

In [17]:
def get_followers_ids_rate_limit():
    r = requests.get(API_V1_RATE_LIMITS, headers=headers)
    remaining = r.json()["resources"]["followers"]["/followers/ids"]["remaining"]
    # Get Unix timestamp
    reset_ts = r.json()["resources"]["followers"]["/followers/ids"]["reset"]
    # Convert to string
    reset_time = datetime.utcfromtimestamp(reset_ts).strftime("%Y-%m-%d %H:%M:%S")
    return remaining, reset_time


def get_friends_ids_rate_limit():
    r = requests.get(API_V1_RATE_LIMITS, headers=headers)
    remaining = r.json()["resources"]["friends"]["/friends/ids"]["remaining"]
    # Get Unix timestamp
    reset_ts = r.json()["resources"]["friends"]["/friends/ids"]["reset"]
    # Convert to string
    reset_time = datetime.utcfromtimestamp(reset_ts).strftime("%Y-%m-%d %H:%M:%S")
    return remaining, reset_time

Functions to create initial dataframes.

In [18]:
def get_initial_followers_df():
    return pd.DataFrame(
        columns=[
            "user_id",
            "ids",
            "next_cursor",
        ]
    )


def get_initial_friends_df():
    return pd.DataFrame(
        columns=[
            "user_id",
            "ids",
            "next_cursor",
        ]
    )

Functions to get dataframes from request body.

In [19]:
def get_followers_df(r, user_id):
    df = pd.DataFrame(
        r.json(),
        columns=[
            "user_id",
            "ids",
            "next_cursor",
        ],
    ).fillna("")
    # Fill in user_id
    df.user_id = user_id
    return df


def get_friends_df(r, user_id):
    df = pd.DataFrame(
        r.json(),
        columns=[
            "user_id",
            "ids",
            "next_cursor",
        ],
    ).fillna("")
    # Fill in user_id
    df.user_id = user_id
    return df

Functions to create queries and return dataframes with the data.

In [20]:
def get_followers(query_n, user_id, cursor, count):
    req = API_FOLLOWERS_IDS_ENDPOINT
    params = {
        "user_id": str(user_id),
        "cursor": str(cursor),
        "count": str(count),
    }

    r = requests.get(req, headers=headers, params=params)
    df = get_followers_df(r, user_id)

    n_followers_pulled = df.shape[0]
    if n_followers_pulled < count:
        print(r.url)
        print(
            "Query {:,} -- ".format(query_n + 1).ljust(15)
            + "User {}: got {:,} followers instead of {:,}.".format(
                user_id, n_followers_pulled, count
            )
        )
        next_cursor = -1
    if df.shape[0] > 0:
        next_cursor = df.next_cursor[0]

    return df, next_cursor


def get_friends(query_n, user_id, cursor, count):
    req = API_FRIENDS_IDS_ENDPOINT
    params = {
        "user_id": str(user_id),
        "cursor": str(cursor),
        "count": str(count),
    }

    r = requests.get(req, headers=headers, params=params)
    df = get_friends_df(r, user_id)

    n_friends_pulled = df.shape[0]
    if n_friends_pulled < count:
        print(r.url)
        print(
            "Query {:,} -- ".format(query_n + 1).ljust(15)
            + "User {}: got {:,} friends instead of {:,}.".format(
                user_id, n_friends_pulled, count
            )
        )
        next_cursor = -1
    if df.shape[0] > 0:
        next_cursor = df.next_cursor[0]

    return df, next_cursor
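
Both ID endpoints are cursored: the first request uses a cursor of `-1`, and a `next_cursor` of `0` signals that there are no more pages. The sketch below illustrates that loop against an in-memory stand-in. The function names are hypothetical, and using list offsets as cursors is an assumption made for illustration only, since real cursors are opaque values.

```python
def fake_ids_endpoint(all_ids, cursor, count):
    """Stand-in for an IDs endpoint: -1 starts, next_cursor 0 means done."""
    start = 0 if cursor == -1 else cursor  # assumption: cursor is an offset
    page = all_ids[start : start + count]
    end = start + len(page)
    next_cursor = end if end < len(all_ids) else 0
    return {"ids": page, "next_cursor": next_cursor}


def pull_all_ids(all_ids, count):
    """Follow next_cursor until the endpoint reports the last page."""
    ids, cursor = [], -1
    while cursor != 0:
        body = fake_ids_endpoint(all_ids, cursor, count)
        ids.extend(body["ids"])
        cursor = body["next_cursor"]
    return ids


print(pull_all_ids(list(range(12)), count=5))
```

Storing the last `next_cursor` per user, as the pull status dataframe does with `followers_cursor` and `following_cursor`, is what allows the collection to resume in a later run.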

In [21]:
# Load or create user followers or friends dataframe
def load_user_ff_df(
    file,
    create_data_status,
    init_df_func,
):
    if os.path.isfile(file) and not create_data_status:
        user_ff_data = pd.read_csv(
            file,
            dtype=int,
            lineterminator="\n",
        )
    else:
        user_ff_data = init_df_func()

    return user_ff_data

Functions to run the pull sequence.

In [22]:
def pull_new_follower_data(user_data_pull_status, user_followers_data):

    # Get the number of available queries and rate limit reset time
    query_quota, reset_time = get_followers_ids_rate_limit()

    # Get users with followers left to pull
    user_followers_to_pull = user_data_pull_status[
        (user_data_pull_status.followers_count > 0)
        & (
            user_data_pull_status.followers_pulled
            < user_data_pull_status.followers_count
        )
    ]

    n_queries = min(query_quota, user_followers_to_pull.shape[0])
    print("Executing {:,} queries for followers.".format(n_queries))
    for query_n in trange(n_queries):

        # Get query parameters
        user_id = user_followers_to_pull.index[query_n]
        cursor = user_followers_to_pull.loc[user_id, "followers_cursor"]
        # A 5000-UserIDs limit is set by Twitter per request
        count = min(
            user_followers_to_pull.loc[user_id, "followers_count"]
            - user_followers_to_pull.loc[user_id, "followers_pulled"],
            5000,
        )

        # Get user followers data and statistics
        raw_user_followers_data, next_cursor = get_followers(
            query_n, user_id, cursor, count
        )

        # Append to existing user followers data
        user_followers_data = pd.concat([user_followers_data, raw_user_followers_data])

        # Update pull status
        user_data_pull_status.loc[user_id, "followers_cursor"] = next_cursor
        if user_followers_to_pull.loc[user_id, "followers_pulled"] == 0:
            user_data_pull_status.loc[user_id, "followers_pulled"] = count
        else:
            user_data_pull_status.loc[user_id, "followers_pulled"] += count

    print("Next reset time: {} (UTC)".format(reset_time))

    return user_data_pull_status, user_followers_data, reset_time


def pull_new_friend_data(user_data_pull_status, user_friends_data):

    # Get the number of available queries and rate limit reset time
    query_quota, reset_time = get_friends_ids_rate_limit()

    # Get users with friends left to pull
    user_friends_to_pull = user_data_pull_status[
        (user_data_pull_status.following_count > 0)
        & (
            user_data_pull_status.following_pulled
            < user_data_pull_status.following_count
        )
    ]

    n_queries = min(query_quota, user_friends_to_pull.shape[0])
    print("Executing {:,} queries for friends.".format(n_queries))
    for query_n in trange(n_queries):

        # Get query parameters
        user_id = user_friends_to_pull.index[query_n]
        cursor = user_friends_to_pull.loc[user_id, "following_cursor"]
        # A 5000-UserIDs limit is set by Twitter per request
        count = min(
            user_friends_to_pull.loc[user_id, "following_count"]
            - user_friends_to_pull.loc[user_id, "following_pulled"],
            5000,
        )

        # Get user friends data and statistics
        raw_user_friends_data, next_cursor = get_friends(
            query_n, user_id, cursor, count
        )

        # Append to existing user friends data
        user_friends_data = pd.concat([user_friends_data, raw_user_friends_data])

        # Update pull status
        user_data_pull_status.loc[user_id, "following_cursor"] = next_cursor
        if user_friends_to_pull.loc[user_id, "following_pulled"] == 0:
            user_data_pull_status.loc[user_id, "following_pulled"] = count
        else:
            user_data_pull_status.loc[user_id, "following_pulled"] += count

    print("Next reset time: {} (UTC)".format(reset_time))

    return user_data_pull_status, user_friends_data, reset_time

Now that we have defined our helper functions, we can query the data, print out some statistics and save it.

In [23]:
if PULL_NEW_FF_DATA:
    # Load existing data
    user_followers_data = load_user_ff_df(
        file=PUBLIC_USERS_FOLLOWERS_FILE,
        create_data_status=CREATE_DATA_PULL_STATUS,
        init_df_func=get_initial_followers_df,
    )
    user_friends_data = load_user_ff_df(
        file=PUBLIC_USERS_FRIENDS_FILE,
        create_data_status=CREATE_DATA_PULL_STATUS,
        init_df_func=get_initial_friends_df,
    )

    # Pull data
    user_data_pull_status, user_followers_data, reset_time = pull_new_follower_data(
        user_data_pull_status, user_followers_data
    )
    # Overwriting reset_time with the value from the friends pull adds a few
    # seconds of lag but ensures that all 30 requests (15 for followers, 15 for
    # friends) go through on each run
    user_data_pull_status, user_friends_data, reset_time = pull_new_friend_data(
        user_data_pull_status, user_friends_data
    )

    # Repeat pull sequence
    if N_RUNS_PULL_FF_DATA > 1:
        for run in range(N_RUNS_PULL_FF_DATA - 1):
            # Save data
            if SAVE_PULLED_DATA:
                # Save data to disk
                user_followers_data.to_csv(PUBLIC_USERS_FOLLOWERS_FILE, index=False)
                user_friends_data.to_csv(PUBLIC_USERS_FRIENDS_FILE, index=False)
                user_data_pull_status.to_csv(PUBLIC_USERS_PULL_STATUS_FILE)
                print("Data saved!")

            # Print statistics
            print(
                "Number of collected followers: {:,} (out of {:,}).\nNumber of unique users: {:,} (out of {:,}).".format(
                    user_followers_data.shape[0],
                    np.sum(user_data_pull_status.followers_count.values),
                    len(np.unique(user_followers_data.user_id.values)),
                    public_users_w_followers.shape[0],
                )
            )
            print(
                "Number of collected friends: {:,} (out of {:,}).\nNumber of unique users: {:,} (out of {:,}).".format(
                    user_friends_data.shape[0],
                    np.sum(user_data_pull_status.following_count.values),
                    len(np.unique(user_friends_data.user_id.values)),
                    public_users_w_friends.shape[0],
                )
            )

            # Sleep until rate limit reset (clamped at zero in case the reset
            # time has already passed)
            sleep_time = max(
                (
                    datetime.strptime(reset_time, "%Y-%m-%d %H:%M:%S")
                    - datetime.utcnow()
                ).total_seconds(),
                0,
            )
            print(
                "Waiting for {:,.0f} seconds to continue (until {} (UTC)).\n".format(
                    sleep_time, reset_time
                )
            )
            time.sleep(sleep_time)

            # Pull data
            (
                user_data_pull_status,
                user_followers_data,
                reset_time,
            ) = pull_new_follower_data(user_data_pull_status, user_followers_data)
            user_data_pull_status, user_friends_data, reset_time = pull_new_friend_data(
                user_data_pull_status, user_friends_data
            )

    print("\nPull is done!\n")

    if SAVE_PULLED_DATA:
        # Save data to disk
        user_followers_data.to_csv(PUBLIC_USERS_FOLLOWERS_FILE, index=False)
        user_friends_data.to_csv(PUBLIC_USERS_FRIENDS_FILE, index=False)
        user_data_pull_status.to_csv(PUBLIC_USERS_PULL_STATUS_FILE)
        print("Data saved!")

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=2937707710&cursor=-1&count=697
Query 1 --     User 2937707710: got 693 followers instead of 697.

Next reset time: 2020-12-12 15:54:21 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=158416486&cursor=-1&count=251
Query 9 --     User 158416486: got 250 friends instead of 251.

Next reset time: 2020-12-12 15:54:25 (UTC)
Data saved!
Number of collected followers: 260,369 (out of 1,662,736).
Number of unique users: 3,291 (out of 18,689).
Number of collected friends: 351,558 (out of 2,208,697).
Number of unique users: 3,291 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 15:54:25 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=109168983&cursor=-1&count=653
Query 1 --     User 109168983: got 652 followers instead of 653.

Next reset time: 2020-12-12 16:09:25 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 16:09:29 (UTC)
Data saved!
Number of collected followers: 261,491 (out of 1,662,736).
Number of unique users: 3,306 (out of 18,689).
Number of collected friends: 352,646 (out of 2,208,697).
Number of unique users: 3,306 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 16:09:29 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=2938715356&cursor=-1&count=41
Query 5 --     User 2938715356: got 40 followers instead of 41.
https://api.twitter.com/1.1/followers/ids.json?user_id=2777340436&cursor=-1&count=41
Query 10 --    User 2777340436: got 40 followers instead of 41.

Next reset time: 2020-12-12 16:24:29 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 16:24:33 (UTC)
Data saved!
Number of collected followers: 262,138 (out of 1,662,736).
Number of unique users: 3,321 (out of 18,689).
Number of collected friends: 354,708 (out of 2,208,697).
Number of unique users: 3,321 (out of 22,099).
Waiting for 895 seconds to continue (until 2020-12-12 16:24:33 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 16:39:33 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 16:39:37 (UTC)
Data saved!
Number of collected followers: 262,808 (out of 1,662,736).
Number of unique users: 3,336 (out of 18,689).
Number of collected friends: 356,235 (out of 2,208,697).
Number of unique users: 3,336 (out of 22,099).
Waiting for 895 seconds to continue (until 2020-12-12 16:39:37 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=2799953977&cursor=-1&count=227
Query 11 --    User 2799953977: got 226 followers instead of 227.

Next reset time: 2020-12-12 16:54:38 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 16:54:42 (UTC)
Data saved!
Number of collected followers: 263,634 (out of 1,662,736).
Number of unique users: 3,351 (out of 18,689).
Number of collected friends: 356,629 (out of 2,208,697).
Number of unique users: 3,351 (out of 22,099).
Waiting for 895 seconds to continue (until 2020-12-12 16:54:42 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 17:09:42 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=1854687852&cursor=-1&count=352
Query 4 --     User 1854687852: got 351 friends instead of 352.

Next reset time: 2020-12-12 17:09:46 (UTC)
Data saved!
Number of collected followers: 263,739 (out of 1,662,736).
Number of unique users: 3,366 (out of 18,689).
Number of collected friends: 358,292 (out of 2,208,697).
Number of unique users: 3,366 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 17:09:46 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=995825869&cursor=-1&count=16
Query 9 --     User 995825869: got 15 followers instead of 16.

Next reset time: 2020-12-12 17:24:46 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 17:24:50 (UTC)
Data saved!
Number of collected followers: 264,118 (out of 1,662,736).
Number of unique users: 3,381 (out of 18,689).
Number of collected friends: 358,865 (out of 2,208,697).
Number of unique users: 3,381 (out of 22,099).
Waiting for 895 seconds to continue (until 2020-12-12 17:24:50 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 17:39:50 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=1338033456&cursor=-1&count=245
Query 13 --    User 1338033456: got 243 friends instead of 245.

Next reset time: 2020-12-12 17:39:54 (UTC)
Data saved!
Number of collected followers: 265,206 (out of 1,662,736).
Number of unique users: 3,396 (out of 18,689).
Number of collected friends: 359,860 (out of 2,208,697).
Number of unique users: 3,396 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 17:39:54 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 17:54:55 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=2723420134&cursor=-1&count=1385
Query 6 --     User 2723420134: got 1,384 friends instead of 1,385.
https://api.twitter.com/1.1/friends/ids.json?user_id=336763131&cursor=-1&count=1623
Query 10 --    User 336763131: got 1,621 friends instead of 1,623.

Next reset time: 2020-12-12 17:54:59 (UTC)
Data saved!
Number of collected followers: 265,596 (out of 1,662,736).
Number of unique users: 3,411 (out of 18,689).
Number of collected friends: 363,581 (out of 2,208,697).
Number of unique users: 3,411 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 17:54:59 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 18:09:59 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=550406434&cursor=-1&count=1198
Query 2 --     User 550406434: got 1,195 friends instead of 1,198.
https://api.twitter.com/1.1/friends/ids.json?user_id=2355027054&cursor=-1&count=479
Query 9 --     User 2355027054: got 478 friends instead of 479.

Next reset time: 2020-12-12 18:10:03 (UTC)
Data saved!
Number of collected followers: 266,058 (out of 1,662,736).
Number of unique users: 3,426 (out of 18,689).
Number of collected friends: 365,891 (out of 2,208,697).
Number of unique users: 3,426 (out of 22,099).
Waiting for 895 seconds to continue (until 2020-12-12 18:10:03 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 18:25:03 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 18:25:07 (UTC)
Data saved!
Number of collected followers: 266,398 (out of 1,662,736).
Number of unique users: 3,441 (out of 18,689).
Number of collected friends: 366,486 (out of 2,208,697).
Number of unique users: 3,441 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 18:25:07 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 18:40:07 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 18:40:11 (UTC)
Data saved!
Number of collected followers: 266,748 (out of 1,662,736).
Number of unique users: 3,456 (out of 18,689).
Number of collected friends: 367,209 (out of 2,208,697).
Number of unique users: 3,456 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 18:40:11 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=153932195&cursor=-1&count=305
Query 2 --     User 153932195: got 304 followers instead of 305.

Next reset time: 2020-12-12 18:55:11 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=2715558879&cursor=-1&count=368
Query 5 --     User 2715558879: got 365 friends instead of 368.
https://api.twitter.com/1.1/friends/ids.json?user_id=330057778&cursor=-1&count=1341
Query 12 --    User 330057778: got 1,340 friends instead of 1,341.

Next reset time: 2020-12-12 18:55:15 (UTC)
Data saved!
Number of collected followers: 267,135 (out of 1,662,736).
Number of unique users: 3,471 (out of 18,689).
Number of collected friends: 369,398 (out of 2,208,697).
Number of unique users: 3,471 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 18:55:15 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 19:10:15 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=485702821&cursor=-1&count=111
Query 7 --     User 485702821: got 110 friends instead of 111.
https://api.twitter.com/1.1/friends/ids.json?user_id=113375289&cursor=-1&count=1
Query 14 --    User 113375289: got 0 friends instead of 1.

Next reset time: 2020-12-12 19:10:19 (UTC)
Data saved!
Number of collected followers: 267,611 (out of 1,662,736).
Number of unique users: 3,486 (out of 18,689).
Number of collected friends: 369,822 (out of 2,208,697).
Number of unique users: 3,485 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 19:10:19 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 19:25:19 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 19:25:23 (UTC)
Data saved!
Number of collected followers: 268,239 (out of 1,662,736).
Number of unique users: 3,501 (out of 18,689).
Number of collected friends: 370,515 (out of 2,208,697).
Number of unique users: 3,500 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 19:25:23 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=2252344758&cursor=-1&count=361
Query 10 --    User 2252344758: got 360 followers instead of 361.

Next reset time: 2020-12-12 19:40:23 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=2315664816&cursor=-1&count=713
Query 15 --    User 2315664816: got 712 friends instead of 713.

Next reset time: 2020-12-12 19:40:27 (UTC)
Data saved!
Number of collected followers: 269,424 (out of 1,662,736).
Number of unique users: 3,516 (out of 18,689).
Number of collected friends: 371,696 (out of 2,208,697).
Number of unique users: 3,515 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 19:40:27 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 19:55:27 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 19:55:31 (UTC)
Data saved!
Number of collected followers: 270,165 (out of 1,662,736).
Number of unique users: 3,531 (out of 18,689).
Number of collected friends: 372,693 (out of 2,208,697).
Number of unique users: 3,530 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 19:55:31 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=2876767006&cursor=-1&count=184
Query 5 --     User 2876767006: got 183 followers instead of 184.
https://api.twitter.com/1.1/followers/ids.json?user_id=377241899&cursor=-1&count=35
Query 9 --     User 377241899: got 34 followers instead of 35.
https://api.twitter.com/1.1/followers/ids.json?user_id=2806332151&cursor=-1&count=972
Query 12 --    User 2806332151: got 969 followers instead of 972.

Next reset time: 2020-12-12 20:10:31 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 20:10:35 (UTC)
Data saved!
Number of collected followers: 271,434 (out of 1,662,736).
Number of unique users: 3,546 (out of 18,689).
Number of collected friends: 373,207 (out of 2,208,697).
Number of unique users: 3,545 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 20:10:35 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=49476144&cursor=-1&count=59
Query 1 --     User 49476144: got 58 followers instead of 59.
https://api.twitter.com/1.1/followers/ids.json?user_id=1459767343&cursor=-1&count=284
Query 11 --    User 1459767343: got 283 followers instead of 284.

Next reset time: 2020-12-12 20:25:35 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=622736525&cursor=-1&count=87
Query 2 --     User 622736525: got 86 friends instead of 87.
https://api.twitter.com/1.1/friends/ids.json?user_id=741823496&cursor=-1&count=216
Query 5 --     User 741823496: got 214 friends instead of 216.
https://api.twitter.com/1.1/friends/ids.json?user_id=710553329&cursor=-1&count=57
Query 9 --     User 710553329: got 56 friends instead of 57.
https://api.twitter.com/1.1/friends/ids.json?user_id=831523723&cursor=-1&count=536
Query 12 --    User 831523723: got 534 friends instead of 536.

Next reset time: 2020-12-12 20:25:39 (UTC)
Data saved!
Number of collected followers: 271,955 (out of 1,662,736).
Number of unique users: 3,561 (out of 18,689).
Number of collected friends: 375,014 (out of 2,208,697).
Number of unique users: 3,560 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 20:25:39 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=68621351&cursor=-1&count=391
Query 8 --     User 68621351: got 390 followers instead of 391.

Next reset time: 2020-12-12 20:40:39 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=867575724&cursor=-1&count=35
Query 10 --    User 867575724: got 34 friends instead of 35.

Next reset time: 2020-12-12 20:40:43 (UTC)
Data saved!
Number of collected followers: 272,869 (out of 1,662,736).
Number of unique users: 3,576 (out of 18,689).
Number of collected friends: 376,320 (out of 2,208,697).
Number of unique users: 3,575 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 20:40:43 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 20:55:43 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=2243843301&cursor=-1&count=101
Query 6 --     User 2243843301: got 100 friends instead of 101.
https://api.twitter.com/1.1/friends/ids.json?user_id=438568537&cursor=-1&count=2480
Query 7 --     User 438568537: got 2,479 friends instead of 2,480.
https://api.twitter.com/1.1/friends/ids.json?user_id=2839516955&cursor=-1&count=776
Query 15 --    User 2839516955: got 775 friends instead of 776.

Next reset time: 2020-12-12 20:55:47 (UTC)
Data saved!
Number of collected followers: 273,479 (out of 1,662,736).
Number of unique users: 3,591 (out of 18,689).
Number of collected friends: 381,507 (out of 2,208,697).
Number of unique users: 3,590 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 20:55:47 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=390193652&cursor=-1&count=422
Query 3 --     User 390193652: got 421 followers instead of 422.

Next reset time: 2020-12-12 21:10:47 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 21:10:51 (UTC)
Data saved!
Number of collected followers: 274,130 (out of 1,662,736).
Number of unique users: 3,606 (out of 18,689).
Number of collected friends: 382,212 (out of 2,208,697).
Number of unique users: 3,605 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 21:10:51 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 21:25:51 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 21:25:55 (UTC)
Data saved!
Number of collected followers: 275,375 (out of 1,662,736).
Number of unique users: 3,621 (out of 18,689).
Number of collected friends: 383,208 (out of 2,208,697).
Number of unique users: 3,620 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 21:25:55 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 21:40:55 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=1148749495&cursor=-1&count=40
Query 8 --     User 1148749495: got 39 friends instead of 40.

Next reset time: 2020-12-12 21:40:59 (UTC)
Data saved!
Number of collected followers: 275,645 (out of 1,662,736).
Number of unique users: 3,636 (out of 18,689).
Number of collected friends: 383,715 (out of 2,208,697).
Number of unique users: 3,635 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 21:40:59 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 21:55:59 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 21:56:03 (UTC)
Data saved!
Number of collected followers: 275,901 (out of 1,662,736).
Number of unique users: 3,651 (out of 18,689).
Number of collected friends: 383,927 (out of 2,208,697).
Number of unique users: 3,650 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 21:56:03 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=2426714385&cursor=-1&count=139
Query 1 --     User 2426714385: got 138 followers instead of 139.

Next reset time: 2020-12-12 22:11:03 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 22:11:07 (UTC)
Data saved!
Number of collected followers: 276,619 (out of 1,662,736).
Number of unique users: 3,666 (out of 18,689).
Number of collected friends: 387,269 (out of 2,208,697).
Number of unique users: 3,665 (out of 22,099).
Waiting for 893 seconds to continue (until 2020-12-12 22:11:07 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 22:26:07 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 22:26:11 (UTC)
Data saved!
Number of collected followers: 276,985 (out of 1,662,736).
Number of unique users: 3,681 (out of 18,689).
Number of collected friends: 391,063 (out of 2,208,697).
Number of unique users: 3,680 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 22:26:11 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 22:41:11 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=189715999&cursor=-1&count=373
Query 5 --     User 189715999: got 372 friends instead of 373.

Next reset time: 2020-12-12 22:41:15 (UTC)
Data saved!
Number of collected followers: 277,129 (out of 1,662,736).
Number of unique users: 3,696 (out of 18,689).
Number of collected friends: 391,685 (out of 2,208,697).
Number of unique users: 3,695 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 22:41:15 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=41183224&cursor=-1&count=119
Query 1 --     User 41183224: got 118 followers instead of 119.
https://api.twitter.com/1.1/followers/ids.json?user_id=2778642032&cursor=-1&count=438
Query 3 --     User 2778642032: got 437 followers instead of 438.

Next reset time: 2020-12-12 22:56:15 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 22:56:19 (UTC)
Data saved!
Number of collected followers: 278,350 (out of 1,662,736).
Number of unique users: 3,711 (out of 18,689).
Number of collected friends: 392,176 (out of 2,208,697).
Number of unique users: 3,710 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 22:56:19 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=2897565236&cursor=-1&count=83
Query 1 --     User 2897565236: got 82 followers instead of 83.

Next reset time: 2020-12-12 23:11:19 (UTC)
Executing 15 queries for friends.



https://api.twitter.com/1.1/friends/ids.json?user_id=338753999&cursor=-1&count=959
Query 11 --    User 338753999: got 958 friends instead of 959.
https://api.twitter.com/1.1/friends/ids.json?user_id=53133119&cursor=-1&count=406
Query 14 --    User 53133119: got 402 friends instead of 406.

Next reset time: 2020-12-12 23:11:24 (UTC)
Data saved!
Number of collected followers: 278,908 (out of 1,662,736).
Number of unique users: 3,726 (out of 18,689).
Number of collected friends: 395,339 (out of 2,208,697).
Number of unique users: 3,725 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 23:11:24 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 23:26:24 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 23:26:28 (UTC)
Data saved!
Number of collected followers: 279,241 (out of 1,662,736).
Number of unique users: 3,741 (out of 18,689).
Number of collected friends: 397,021 (out of 2,208,697).
Number of unique users: 3,740 (out of 22,099).
Waiting for 893 seconds to continue (until 2020-12-12 23:26:28 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=1113988416&cursor=-1&count=10
Query 4 --     User 1113988416: got 9 followers instead of 10.

Next reset time: 2020-12-12 23:41:28 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 23:41:32 (UTC)
Data saved!
Number of collected followers: 280,188 (out of 1,662,736).
Number of unique users: 3,756 (out of 18,689).
Number of collected friends: 397,539 (out of 2,208,697).
Number of unique users: 3,755 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 23:41:32 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-12 23:56:32 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-12 23:56:36 (UTC)
Data saved!
Number of collected followers: 281,960 (out of 1,662,736).
Number of unique users: 3,771 (out of 18,689).
Number of collected friends: 398,938 (out of 2,208,697).
Number of unique users: 3,770 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-12 23:56:36 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=830786712&cursor=-1&count=172
Query 1 --     User 830786712: got 171 followers instead of 172.

Next reset time: 2020-12-13 00:11:36 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-13 00:11:40 (UTC)
Data saved!
Number of collected followers: 282,556 (out of 1,662,736).
Number of unique users: 3,786 (out of 18,689).
Number of collected friends: 399,082 (out of 2,208,697).
Number of unique users: 3,785 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 00:11:40 (UTC)).

Executing 15 queries for followers.



https://api.twitter.com/1.1/followers/ids.json?user_id=464596938&cursor=-1&count=16
Query 2 --     User 464596938: got 15 followers instead of 16.
https://api.twitter.com/1.1/followers/ids.json?user_id=312362331&cursor=-1&count=45
Query 11 --    User 312362331: got 44 followers instead of 45.

Next reset time: 2020-12-13 00:26:40 (UTC)
Executing 15 queries for friends.




Next reset time: 2020-12-13 00:26:44 (UTC)
Data saved!
Number of collected followers: 282,836 (out of 1,662,736).
Number of unique users: 3,801 (out of 18,689).
Number of collected friends: 400,152 (out of 2,208,697).
Number of unique users: 3,800 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 00:26:44 (UTC)).

Executing 15 queries for followers.




Next reset time: 2020-12-13 00:41:44 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 00:41:48 (UTC)
Data saved!
Number of collected followers: 283,493 (out of 1,662,736).
Number of unique users: 3,816 (out of 18,689).
Number of collected friends: 402,176 (out of 2,208,697).
Number of unique users: 3,815 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 00:41:48 (UTC)).

Executing 15 queries for followers.


https://api.twitter.com/1.1/followers/ids.json?user_id=2304642019&cursor=-1&count=671
Query 8 --     User 2304642019: got 669 followers instead of 671.
https://api.twitter.com/1.1/followers/ids.json?user_id=1623933194&cursor=-1&count=213
Query 10 --    User 1623933194: got 212 followers instead of 213.

Next reset time: 2020-12-13 00:56:48 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 00:56:52 (UTC)
Data saved!
Number of collected followers: 284,842 (out of 1,662,736).
Number of unique users: 3,831 (out of 18,689).
Number of collected friends: 403,446 (out of 2,208,697).
Number of unique users: 3,830 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 00:56:52 (UTC)).

Executing 15 queries for followers.



Next reset time: 2020-12-13 01:11:52 (UTC)
Executing 15 queries for friends.


https://api.twitter.com/1.1/friends/ids.json?user_id=2343847324&cursor=-1&count=981
Query 2 --     User 2343847324: got 980 friends instead of 981.

Next reset time: 2020-12-13 01:11:56 (UTC)
Data saved!
Number of collected followers: 285,813 (out of 1,662,736).
Number of unique users: 3,846 (out of 18,689).
Number of collected friends: 407,012 (out of 2,208,697).
Number of unique users: 3,845 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 01:11:56 (UTC)).

Executing 15 queries for followers.


https://api.twitter.com/1.1/followers/ids.json?user_id=205252246&cursor=-1&count=624
Query 4 --     User 205252246: got 622 followers instead of 624.
https://api.twitter.com/1.1/followers/ids.json?user_id=1383432720&cursor=-1&count=106
Query 13 --    User 1383432720: got 105 followers instead of 106.

Next reset time: 2020-12-13 01:26:56 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 01:27:00 (UTC)
Data saved!
Number of collected followers: 290,218 (out of 1,662,736).
Number of unique users: 3,861 (out of 18,689).
Number of collected friends: 408,997 (out of 2,208,697).
Number of unique users: 3,860 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 01:27:00 (UTC)).

Executing 15 queries for followers.



Next reset time: 2020-12-13 01:42:00 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 01:42:04 (UTC)
Data saved!
Number of collected followers: 290,757 (out of 1,662,736).
Number of unique users: 3,876 (out of 18,689).
Number of collected friends: 409,657 (out of 2,208,697).
Number of unique users: 3,875 (out of 22,099).
Waiting for 893 seconds to continue (until 2020-12-13 01:42:04 (UTC)).

Executing 15 queries for followers.


https://api.twitter.com/1.1/followers/ids.json?user_id=716423898&cursor=-1&count=15
Query 5 --     User 716423898: got 14 followers instead of 15.

Next reset time: 2020-12-13 01:57:04 (UTC)
Executing 15 queries for friends.


https://api.twitter.com/1.1/friends/ids.json?user_id=1223367151&cursor=-1&count=901
Query 13 --    User 1223367151: got 900 friends instead of 901.

Next reset time: 2020-12-13 01:57:08 (UTC)
Data saved!
Number of collected followers: 291,050 (out of 1,662,736).
Number of unique users: 3,891 (out of 18,689).
Number of collected friends: 410,899 (out of 2,208,697).
Number of unique users: 3,890 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 01:57:08 (UTC)).

Executing 15 queries for followers.


https://api.twitter.com/1.1/followers/ids.json?user_id=938043882&cursor=-1&count=21
Query 12 --    User 938043882: got 20 followers instead of 21.
https://api.twitter.com/1.1/followers/ids.json?user_id=281247721&cursor=-1&count=3480
Query 14 --    User 281247721: got 3,478 followers instead of 3,480.

Next reset time: 2020-12-13 02:12:08 (UTC)
Executing 15 queries for friends.


https://api.twitter.com/1.1/friends/ids.json?user_id=2937707710&cursor=-1&count=564
Query 4 --     User 2937707710: got 562 friends instead of 564.

Next reset time: 2020-12-13 02:12:12 (UTC)
Data saved!
Number of collected followers: 294,936 (out of 1,662,736).
Number of unique users: 3,906 (out of 18,689).
Number of collected friends: 411,650 (out of 2,208,697).
Number of unique users: 3,905 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 02:12:12 (UTC)).

Executing 15 queries for followers.



Next reset time: 2020-12-13 02:27:12 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 02:27:16 (UTC)
Data saved!
Number of collected followers: 295,486 (out of 1,662,736).
Number of unique users: 3,921 (out of 18,689).
Number of collected friends: 414,330 (out of 2,208,697).
Number of unique users: 3,920 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 02:27:16 (UTC)).

Executing 15 queries for followers.



Next reset time: 2020-12-13 02:42:16 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 02:42:20 (UTC)
Data saved!
Number of collected followers: 296,156 (out of 1,662,736).
Number of unique users: 3,936 (out of 18,689).
Number of collected friends: 415,041 (out of 2,208,697).
Number of unique users: 3,935 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 02:42:20 (UTC)).

Executing 15 queries for followers.


https://api.twitter.com/1.1/followers/ids.json?user_id=1373862175&cursor=-1&count=215
Query 8 --     User 1373862175: got 214 followers instead of 215.

Next reset time: 2020-12-13 02:57:20 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 02:57:24 (UTC)
Data saved!
Number of collected followers: 297,301 (out of 1,662,736).
Number of unique users: 3,951 (out of 18,689).
Number of collected friends: 416,412 (out of 2,208,697).
Number of unique users: 3,950 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 02:57:24 (UTC)).

Executing 15 queries for followers.


https://api.twitter.com/1.1/followers/ids.json?user_id=2176551135&cursor=-1&count=469
Query 8 --     User 2176551135: got 468 followers instead of 469.
https://api.twitter.com/1.1/followers/ids.json?user_id=415709718&cursor=-1&count=1494
Query 14 --    User 415709718: got 1,489 followers instead of 1,494.

Next reset time: 2020-12-13 03:12:24 (UTC)
Executing 15 queries for friends.


https://api.twitter.com/1.1/friends/ids.json?user_id=302797718&cursor=-1&count=1151
Query 3 --     User 302797718: got 742 friends instead of 1,151.

Next reset time: 2020-12-13 03:12:28 (UTC)
Data saved!
Number of collected followers: 299,354 (out of 1,662,736).
Number of unique users: 3,966 (out of 18,689).
Number of collected friends: 417,340 (out of 2,208,697).
Number of unique users: 3,965 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 03:12:28 (UTC)).

Executing 15 queries for followers.



Next reset time: 2020-12-13 03:27:28 (UTC)
Executing 15 queries for friends.


https://api.twitter.com/1.1/friends/ids.json?user_id=23074339&cursor=-1&count=72
Query 9 --     User 23074339: got 71 friends instead of 72.

Next reset time: 2020-12-13 03:27:32 (UTC)
Data saved!
Number of collected followers: 302,054 (out of 1,662,736).
Number of unique users: 3,981 (out of 18,689).
Number of collected friends: 418,012 (out of 2,208,697).
Number of unique users: 3,980 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 03:27:32 (UTC)).

Executing 15 queries for followers.



Next reset time: 2020-12-13 03:42:32 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 03:42:36 (UTC)
Data saved!
Number of collected followers: 302,551 (out of 1,662,736).
Number of unique users: 3,996 (out of 18,689).
Number of collected friends: 418,395 (out of 2,208,697).
Number of unique users: 3,995 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 03:42:36 (UTC)).

Executing 15 queries for followers.



Next reset time: 2020-12-13 03:57:37 (UTC)
Executing 15 queries for friends.


https://api.twitter.com/1.1/friends/ids.json?user_id=293429956&cursor=-1&count=476
Query 12 --    User 293429956: got 475 friends instead of 476.

Next reset time: 2020-12-13 03:57:41 (UTC)
Data saved!
Number of collected followers: 302,770 (out of 1,662,736).
Number of unique users: 4,011 (out of 18,689).
Number of collected friends: 419,362 (out of 2,208,697).
Number of unique users: 4,010 (out of 22,099).
Waiting for 894 seconds to continue (until 2020-12-13 03:57:41 (UTC)).

Executing 15 queries for followers.


https://api.twitter.com/1.1/followers/ids.json?user_id=2213780530&cursor=-1&count=131
Query 5 --     User 2213780530: got 129 followers instead of 131.
https://api.twitter.com/1.1/followers/ids.json?user_id=330811177&cursor=-1&count=295
Query 7 --     User 330811177: got 293 followers instead of 295.
https://api.twitter.com/1.1/followers/ids.json?user_id=287530268&cursor=-1&count=173
Query 12 --    User 287530268: got 172 followers instead of 173.

Next reset time: 2020-12-13 04:12:41 (UTC)
Executing 15 queries for friends.



Next reset time: 2020-12-13 04:12:45 (UTC)

Pull is done!

Data saved!


---

# Break

In [24]:
break  # Presumably intentional: the SyntaxError halts "Run All" before the test section

SyntaxError: 'break' outside loop (<ipython-input-24-6aaf1f276005>, line 1)

# Test section

In [None]:
user_data_pull_status[user_data_pull_status.tweet_count > 0].head(15)

In [None]:
get_initial_followers_df().shape[0]

In [None]:
r = requests.get(
    "https://api.twitter.com/1.1/followers/ids.json?user_id=555533734&cursor=-1&count=311",
    headers=headers,
)

In [None]:
test = pd.DataFrame(
    r.json(),
    columns=[
        "user_id",
        "ids",
        "next_cursor",
    ],
    dtype=int,
).fillna("")
test

In [None]:
len(r.json()["ids"])
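
The pull log reports lines such as `got 171 followers instead of 172`, which suggests that the number of returned IDs is compared with the count stored in the user profile. A minimal sketch of such a check follows. The helper name `count_mismatch_message` and the sample payload are hypothetical and are not taken from the notebook's own functions.

```python
def count_mismatch_message(user_id, received_ids, expected_count, kind="followers"):
    """Return a log line when the number of IDs differs from the profile count, else None."""
    received = len(received_ids)
    if received == expected_count:
        return None
    return f"User {user_id}: got {received:,} {kind} instead of {expected_count:,}."


# Illustrative payload shaped like a followers/ids response (the IDs are made up).
payload = {"ids": list(range(171)), "next_cursor": 0}
message = count_mismatch_message(830786712, payload["ids"], 172)
print(message)
```

Small gaps like these are plausible when accounts are suspended, deleted or protected between the profile lookup and the ID pull, although the log alone does not confirm the cause.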

In [None]:
np.unique(user_timeline_data.id.values).shape[0]

In [None]:
import os

os.path.isfile(PUBLIC_USERS_FILE)

In [None]:
pd.DataFrame(
    columns=[
        "user_id",
        "user",
        "id",
        "created_at",
        "text",
        "in_reply_to_status_id",
        "in_reply_to_user_id",
        "source",
        "truncated",
        "coordinates",
        "place",
        "is_quote_status",
        "quoted_status_id",
        "quoted_status",
        "quote_count",
        "retweeted_status",
        "retweet_count",
        "favorite_count",
        "entities",
        "extended_entities",
        "possibly_sensitive",
        "lang",
    ],
    # Replace NaNs with empty strings to facilitate pre-processing
).fillna("")

In [None]:
user_data_pull_status = user_data[
    ["id", "followers_count", "following_count", "tweet_count"]
].copy()

In [None]:
user_data_pull_status["timeline_lowest_id"] = -1
user_data_pull_status["timeline_tweets_pulled"] = -1

user_data_pull_status["followers_cursor"] = -1
user_data_pull_status["followers_pulled"] = -1

user_data_pull_status["following_cursor"] = -1
user_data_pull_status["following_pulled"] = -1
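
The `-1` stored in the cursor columns matches the initial cursor value used in the `followers/ids` and `friends/ids` requests logged above (`cursor=-1`). In Twitter's cursored endpoints, `-1` asks for the first page and a `next_cursor` of `0` signals that no pages remain. The sketch below illustrates that loop under stated assumptions: `collect_ids`, `fake_fetch_page` and `FAKE_PAGES` are hypothetical stand-ins, and no real request is made.

```python
def collect_ids(fetch_page, user_id):
    """Follow next_cursor values until the API reports that no pages remain."""
    ids = []
    cursor = -1  # -1 requests the first page
    while cursor != 0:  # a next_cursor of 0 marks the end of the list
        page = fetch_page(user_id, cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids


# Hypothetical stand-in for the API: two pages keyed by the cursor that requests them.
FAKE_PAGES = {
    -1: {"ids": [1, 2, 3], "next_cursor": 42},
    42: {"ids": [4, 5], "next_cursor": 0},
}


def fake_fetch_page(user_id, cursor):
    """Return a canned page instead of calling the network."""
    return FAKE_PAGES[cursor]


all_ids = collect_ids(fake_fetch_page, user_id=783214)
print(all_ids)
```

Persisting the last cursor per user, as the status table appears to do, would let an interrupted pull resume from the page where it stopped.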

In [None]:
user_data_pull_status = user_data_pull_status[
    [
        "id",
        "timeline_lowest_id",
        "timeline_tweets_pulled",
        "tweet_count",
        "followers_cursor",
        "followers_pulled",
        "followers_count",
        "following_cursor",
        "following_pulled",
        "following_count",
    ]
]
user_data_pull_status

In [None]:
req = "https://api.twitter.com/1.1/application/rate_limit_status.json"
payload = {"resources": "application,statuses,followers,friends"}
print(req)
r = requests.get(req, headers=headers, params=payload)
print(r.url)

In [None]:
r.json()["resources"]["statuses"]["/statuses/user_timeline"]
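
Each endpoint entry in the `rate_limit_status` response carries `limit`, `remaining` and `reset` fields, where `reset` is a Unix timestamp in seconds. The sketch below shows one way to read them and format the reset time like the log lines above. The `sample_status` values are illustrative rather than captured output, and `describe_limit` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Illustrative response fragment; the numbers are made up for the example.
sample_status = {
    "resources": {
        "statuses": {
            "/statuses/user_timeline": {
                "limit": 1500,
                "remaining": 1497,
                "reset": 1607818296,
            }
        }
    }
}


def describe_limit(status, family, endpoint):
    """Return the remaining calls and the reset time formatted in UTC."""
    entry = status["resources"][family][endpoint]
    reset_at = datetime.fromtimestamp(entry["reset"], tz=timezone.utc)
    return entry["remaining"], reset_at.strftime("%Y-%m-%d %H:%M:%S (UTC)")


remaining, reset_text = describe_limit(
    sample_status, "statuses", "/statuses/user_timeline"
)
print(remaining, reset_text)
```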

In [None]:
req = API_USER_TIMELINE_ENDPOINT + "?user_id=783214" + "&count=10"
print(req)
r = requests.get(req, headers=headers)

In [None]:
r.json()

In [None]:
test = get_user_timeline_df(r)

In [None]:
test

In [None]:
test.created_at = pd.to_datetime(test.created_at)

In [None]:
# Use .iloc to take the first row after sorting; [0] would select by index label instead
test.sort_values(by="id").user.iloc[0]

In [None]:
int(r.headers["x-rate-limit-remaining"])

In [None]:
wait_for_reset(r)
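
The log lines of the form `Waiting for 894 seconds to continue` are consistent with sleeping until the time given in the `x-rate-limit-reset` response header, which Twitter reports as a Unix timestamp. The following is a minimal sketch of that idea, not the notebook's `wait_for_reset` implementation: `seconds_until_reset`, `wait_if_exhausted` and `headers_example` are hypothetical names, and the `sleep` and `now` parameters exist only so the example runs without actually waiting.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds left until the rate-limit window resets (never negative)."""
    now = time.time() if now is None else now
    reset_epoch = int(headers["x-rate-limit-reset"])
    return max(0, int(reset_epoch - now))


def wait_if_exhausted(headers, sleep=time.sleep, now=None):
    """Sleep until the reset time only when no calls remain; return the delay used."""
    if int(headers["x-rate-limit-remaining"]) > 0:
        return 0
    delay = seconds_until_reset(headers, now=now)
    sleep(delay)
    return delay


# Illustrative headers; slept.append records the delay instead of really sleeping.
headers_example = {"x-rate-limit-remaining": "0", "x-rate-limit-reset": "1607818296"}
slept = []
waited = wait_if_exhausted(headers_example, sleep=slept.append, now=1607817402)
print(waited)
```

Injecting `sleep` keeps the helper testable; in a real pull the default `time.sleep` would be used.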

In [None]:
df = pd.DataFrame(
    r.json()["data"],
    columns=[
        "id",
        "username",
        "name",
        "protected",
        "withheld",
        "verified",
        "created_at",
        "location",
        "public_metrics",
        "description",
        "url",
        "entities",
        "pinned_tweet_id",
    ],
).fillna("")
df

In [None]:
df.public_metrics[0]

In [None]:
783214 in user_data.id.values.astype(int)

In [None]:
timeline_cols = ["timeline_lowest_id", "timeline_tweets_pulled", "tweet_count"]
other_cols = [x for x in user_data_pull_status.columns if x not in timeline_cols]

In [None]:
user_data_pull_status_timeline = user_data_pull_status[timeline_cols].copy()
user_data_pull_status_timeline.head()

In [None]:
user_data_pull_status_timeline.to_csv(
    DATA_FOLDER + "public_users_pull_status_timeline.csv"
)

In [None]:
user_data_pull_status_ff = user_data_pull_status[other_cols].copy()
user_data_pull_status_ff.head()

In [None]:
user_data_pull_status_ff.to_csv(DATA_FOLDER + "public_users_pull_status_ff.csv")

In [None]:
user_data_pull_status_timeline.join(user_data_pull_status_ff)

In [None]:
user_data_pull_status.head()

In [None]:
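# Count the cells that differ after splitting and re-joining the status table.
# join() places the timeline columns first, so restore the original column order before comparing.
rejoined = user_data_pull_status_timeline.join(user_data_pull_status_ff)
np.sum(
    ~np.equal(
        user_data_pull_status.values,
        rejoined[user_data_pull_status.columns].values,
    )
)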
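
A cell-by-cell comparison of `.values` is sensitive to column order, and splitting a table into two column groups and joining them back generally changes that order. The toy example below sketches a round-trip check that realigns the columns first and then uses `DataFrame.equals`. The frame contents are made up and only mirror a few of the column names used above.

```python
import pandas as pd

# Toy status table with a subset of the notebook's column names (values are made up).
status = pd.DataFrame(
    {
        "id": [783214, 15994119],
        "timeline_tweets_pulled": [-1, -1],
        "followers_cursor": [-1, -1],
    }
)

timeline_cols = ["timeline_tweets_pulled"]
other_cols = [c for c in status.columns if c not in timeline_cols]

# Split by column group, then join back on the shared row index.
rejoined = status[timeline_cols].join(status[other_cols])

# The join puts the timeline columns first, so the order no longer matches.
order_changed = list(rejoined.columns) != list(status.columns)

# Realign the columns before checking that the round trip preserved every value.
round_trip_ok = rejoined[status.columns].equals(status)
print(order_changed, round_trip_ok)
```

`DataFrame.equals` also treats NaNs in the same position as equal, which a plain `np.equal` comparison does not.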