# Twitter Scraper

[`Twitter Scraper`](https://github.com/bisguzar/twitter-scraper) is a `Python` library that lets you collect Twitter data without using the API.

## Import libraries & setup

In [None]:
import pandas as pd
from twitter_scraper import get_tweets
from twitter_scraper import Profile

The file [twitter_accounts.csv](./data/twitter_accounts.csv) in the `data` folder of this repository contains a few Twitter screen names (in the `Screen Name` column), which we use in the following examples.

In [None]:
accounts = pd.read_csv('data/twitter_accounts.csv')
accounts = accounts['Screen Name'].tolist()
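
If the CSV file was edited by hand, the screen names may carry stray whitespace or a leading `@`. The following is a minimal sketch of one way to normalize them, run on made-up in-memory data rather than the real file; the two example names are placeholders, not accounts from the repository.

```python
import io

import pandas as pd

# Toy stand-in for data/twitter_accounts.csv; both rows are made up.
csv_text = "Screen Name\nexample_user\n@another_user \n"

accounts_demo = pd.read_csv(io.StringIO(csv_text))

# Strip surrounding whitespace and a leading '@' so that each entry
# is a bare screen name.
screen_names = (
    accounts_demo["Screen Name"]
    .astype(str)
    .str.strip()
    .str.lstrip("@")
    .tolist()
)
print(screen_names)
```

The same chain of string methods could be applied to the `Screen Name` column of the real file before calling `tolist()`.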

In [None]:
accounts

## Profile information

The following code collects the profile information for each account in the list we imported above.

In [None]:
account_info = []
for account in accounts:
    # Fetch the profile and convert it to a plain dictionary
    account_info.append(Profile(account).to_dict())
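
A single account that cannot be scraped (for example, because it was renamed or deleted) would stop the loop above. The sketch below shows one possible way to skip such accounts and record them instead. `collect_profiles` and `fake_fetch` are hypothetical helpers written for this illustration; they are not part of `twitter_scraper`. In the notebook, the stub could be replaced by `lambda name: Profile(name).to_dict()`.

```python
def collect_profiles(screen_names, fetch_profile):
    """Return (profiles, failed), skipping accounts whose lookup raises."""
    profiles, failed = [], []
    for name in screen_names:
        try:
            profiles.append(fetch_profile(name))
        except Exception as exc:  # broad on purpose: the failure type is not known here
            failed.append((name, repr(exc)))
    return profiles, failed


def fake_fetch(name):
    """Stub standing in for a real profile lookup (assumption for this demo)."""
    if name == "missing_user":
        raise ValueError("profile not found")
    return {"username": name}


profiles, failed = collect_profiles(["example_user", "missing_user"], fake_fetch)
print(profiles)
print(failed)
```

Keeping the failed names in a separate list makes it easy to retry them later or to check them by hand.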

In [None]:
account_info

We can transform this list of dictionaries into a `pandas` dataframe.

In [None]:
account_info_df = pd.DataFrame(account_info)
account_info_df
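
As a small illustration of what `pd.DataFrame` does with a list of dictionaries: each dictionary becomes a row, the keys become columns, and a key that is missing from one dictionary shows up as `NaN` in that row. The field names below are placeholders chosen for the example and are not necessarily the fields returned by `Profile.to_dict()`.

```python
import pandas as pd

# Made-up records; the second one lacks the 'followers_count' key.
records = [
    {"username": "example_user", "followers_count": 10},
    {"username": "another_user"},
]

demo_df = pd.DataFrame(records)

# Columns are the union of all keys; missing values are filled with NaN.
print(demo_df)
print(demo_df["followers_count"].isna().tolist())
```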

If you want to, you can export this dataframe as a `CSV` file.

In [None]:
account_info_df.to_csv('data/account_info_ts.csv', index=False)

## Tweets

We can also use `Twitter Scraper` to collect the tweets from specific accounts. The `pages` parameter specifies how many pages of results to fetch: although the frontend of [Twitter Search](https://twitter.com/explore) loads additional results continuously, the backend still divides them into pages. To collect more tweets, increase `pages` or omit the parameter from the `get_tweets()` call.

In [None]:
account_tweets = []
for account in accounts:
    # Increase 'pages', or remove the parameter, to collect more tweets
    for tweet in get_tweets(account, pages=1):
        account_tweets.append(tweet)
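
Since `pages` controls pages rather than tweets, the number of tweets per account can vary. If a fixed upper bound per account is preferred, one option is to wrap the call in `itertools.islice`. This is a sketch under the assumption that `get_tweets` yields one dictionary per tweet, as the loop above suggests. `fake_get_tweets` is a hypothetical stand-in so that the example runs without network access.

```python
from itertools import islice


def fake_get_tweets(account, pages=1):
    """Hypothetical stand-in for get_tweets: yields 3 tweets per page."""
    for i in range(pages * 3):
        yield {"account": account, "text": f"tweet {i}"}


max_per_account = 2  # illustrative cap, not a library setting
capped_tweets = []
for account in ["example_user", "another_user"]:
    # islice stops consuming the generator after max_per_account items
    capped_tweets.extend(islice(fake_get_tweets(account, pages=1), max_per_account))

print(len(capped_tweets))
```

In the notebook, `fake_get_tweets` would be swapped for `get_tweets` and the list of names for `accounts`.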

Again, we can convert the resulting list of dictionaries into a `pandas` dataframe.

In [None]:
account_tweets_df = pd.DataFrame(account_tweets)

Before exporting the dataframe, it helps to split the `entries` column, which contains dictionaries, into separate columns that contain comma-separated strings. The code cell below may raise an error if one of the resulting columns (e.g., the `videos` column) does not contain any values; in that case, you can safely ignore the error message.

In [None]:
account_tweets_df = pd.concat([account_tweets_df.drop(['entries'], axis=1), account_tweets_df['entries'].apply(pd.Series)], axis=1)
account_tweets_df['hashtags'] = account_tweets_df['hashtags'].apply(', '.join)
account_tweets_df['urls'] = account_tweets_df['urls'].apply(', '.join)
account_tweets_df['photos'] = account_tweets_df['photos'].apply(', '.join)
account_tweets_df['videos'] = account_tweets_df['videos'].apply(', '.join)
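
A more defensive variant of this step can avoid the error altogether. The sketch below runs on made-up data in which one tweet has no `videos` key. `join_list` is a hypothetical helper that returns an empty string for anything that is not a list and converts each item to a string before joining. The structure of `entries` is assumed to match the four keys used above.

```python
import pandas as pd

# Toy data: the second tweet has no 'videos' key in its entries.
tweets_demo = pd.DataFrame({
    "text": ["first", "second"],
    "entries": [
        {"hashtags": ["#a", "#b"], "urls": [], "photos": ["p.jpg"], "videos": []},
        {"hashtags": [], "urls": ["https://example.com"], "photos": []},
    ],
})


def join_list(value):
    """Join list items into one string; anything else (e.g. NaN) becomes ''."""
    if isinstance(value, list):
        return ", ".join(str(item) for item in value)
    return ""


# Expand the dictionaries into columns, keeping the original row index.
expanded = pd.DataFrame(tweets_demo["entries"].tolist(), index=tweets_demo.index)
flat = pd.concat([tweets_demo.drop(columns="entries"), expanded], axis=1)

for column in ["hashtags", "urls", "photos", "videos"]:
    if column in flat.columns:
        flat[column] = flat[column].apply(join_list)
    else:
        flat[column] = ""  # column absent from every tweet

print(flat)
```

Checking for the column first and guarding against non-list values means the cell finishes even when some tweets lack one of the keys.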

To check whether everything worked, look at the first five rows of the resulting dataframe.

In [None]:
account_tweets_df.head()

Now you can store the resulting dataframe as a `CSV` file in the `data` folder.

In [None]:
account_tweets_df.to_csv('data/tweets_ts.csv', index=False)