# Twitter trolls - how Russia meddles with Western democracies


## Abstract
Over the last several years, Russian trolls have attempted to spread propaganda and fake news over social media in order to promote political ideas among the general population, both nationally and internationally. Can these attempts be regarded as undermining the democracy of the affected countries?

In this project we analyze a large number of these Russian tweets and look into the motivations behind the meddling. We mainly look into the trolls' overall political goals in the US and examine how these goals change over time. Have the trolls achieved their goals? We also look into how the trolls operate and organize themselves, trying to find patterns in the madness. Such patterns can hopefully help the general population recognize when a tweet originates from a troll. As the Russian efforts are increasing every year, a solution is needed to defend democracy.

## Research questions

- Which themes does the propaganda mainly revolve around? About which issues should people be particularly careful not to believe everything they read?

Darren: "A lot of this has been done . . . at least with the English language data. More needs to be done on the European data."

- Do the trolls advocate a common political stance in each specific country? If so, which leaning do they have? If not, how polarized are the tweets between left-leaning and right-leaning?

Darren comment: Again, this has been done on the English language data. I'm unaware to what extent it has been done on the European data. I would highly recommend looking closely at the German data. There is a good volume there and I have not spoken to anyone working with it. I tried to get a story in Der Spiegel and spoke at length with a journalist there, but nothing came of it.

- Were the trolls united with a common political leaning in the period after the primaries in the US elections?

Darren comment: 
Some of this analysis appears in the fivethirtyeight article.

- Was the original mission of the Russian trolls for the US election to make sure that Clinton was not elected, or to get Trump elected?

Darren: done

- Are the trolls organized as a unit? Do they interact with each other (retweets, etc.)?

Darren: interesting

- Is there a way for people without a technical background to determine if a tweet is coming from a Russian troll?

## Dataset

IRA Russian Twitter trolls: three million tweets amounting to 175 MB, along with a detailed description of the dataset

This dataset contains around three million tweets and retweets from 2848 unique Twitter users. Each tweet has several attributes. Some are extracted from the tweet itself, for instance the author, content, and time stamp. Others were added later, for example the category of the troll (RightTroll, NewsFeed, etc.).

There are many factors that we can examine in the dataset. Given features such as timing could be very useful for finding patterns. We also intend to add features in order to make the dataset more suitable for our analysis. As every data point contains the whole tweet itself, it is possible to perform a broad analysis of the content. This could be used to determine whether a certain person or word is mentioned, and to add the overall theme as a feature. Another possible extra feature states whether a tweet is a retweet of another troll, unique, or identical to another tweet in the set.
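The kind of derived features described above can be sketched as follows. This is a minimal illustration on toy records, not the project's pipeline: the function name `add_content_features`, the feature names `mentions` and `is_duplicate`, and the keyword list are all assumptions made for the example.

```python
import re
from collections import Counter


def add_content_features(tweets, keywords):
    """Return a copy of each tweet dict with two extra, illustrative features."""
    # Count how often each normalized text occurs, to spot identical tweets.
    text_counts = Counter(t["content"].strip().lower() for t in tweets)
    enriched = []
    for t in tweets:
        text = t["content"]
        words = set(re.findall(r"[a-z0-9_']+", text.lower()))
        enriched.append({
            **t,
            # Keywords (expected in lower case) that appear as whole words.
            "mentions": sorted(k for k in keywords if k in words),
            # True when another tweet in the set has the same normalized text.
            "is_duplicate": text_counts[text.strip().lower()] > 1,
        })
    return enriched


# Toy records standing in for rows of the tweet table.
sample = [
    {"author": "a", "content": "Clinton did it again"},
    {"author": "b", "content": "clinton did it again"},
    {"author": "c", "content": "Nice weather today"},
]
features = add_content_features(sample, keywords={"clinton", "trump"})
```

On the real data the same idea would more likely be expressed with pandas string methods on the `content` column; the plain-Python version is only meant to make the logic explicit.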

The data set is fairly small, so it should be manageable to process with pandas, but Spark could also be used.

## Requirements

The second task is to intimately acquaint yourself with the data, preprocess it and complete all the necessary descriptive statistics tasks. We expect you to have a pipeline in place, fully documented in a notebook, and show us that you’ve advanced with your understanding of the project goals by updating its README description.

When describing the data, in particular, you should show (non-exhaustive list):

- That you can handle the data in its size.
- That you understand what’s in the data (formats, distributions, missing values, correlations, etc.).
- That you considered ways to enrich, filter, transform the data according to your needs.
- That you have updated your plan in a reasonable way, reflecting your improved knowledge after data acquaintance. In particular, discuss how your data suits your project needs and discuss the methods you’re going to use, giving their essential mathematical details in the notebook.
- That your plan for analysis and communication is now reasonable and sound, potentially discussing alternatives to your choices that you considered but dropped.

We will evaluate this milestone according to how well these previous steps (or other reasonable ones) have been done and documented, the quality of the code and its documentation, the feasibility and critical awareness of the project.

## Description of second data set: 

- rus_troll_user.csv : Contains user specific features. (nickname, description field, follower count etc.)
- rus_troll_tweet_text.csv: Contains text and language of the given tweet. You will use this if you are doing text classification, sentiment analysis, topic detection etc. 
- rus_troll_tweet_metadata.csv: Contains features that are user specific, but may change tweet to tweet.
- rus_troll_tweet_stats.csv: Contains other (imo important) tweet features

In [1]:
import pandas as pd
import numpy as np
from zipfile import ZipFile
import scipy as sp
from pyspark.sql import *
import matplotlib.pyplot as plt
from statistics import median 
%matplotlib inline
from pyspark.sql.functions import *
from pyspark.sql.types import *
#import pyspark.sql.SQLContext

spark = SparkSession.builder.getOrCreate()

In [2]:
data_folder = './data/'

In [3]:
zip_file = ZipFile('russian-troll-tweets.zip')
zip_file_new = ZipFile("New_russian_tweets.zip")
data = pd.DataFrame()
new_data = pd.DataFrame()


In [4]:
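# DataFrame.append was removed in pandas 2.0; read the parts and concatenate once.
parts = [pd.read_csv(zip_file.open("IRAhandle_tweets_" + str(i) + ".csv")) for i in range(1, 9)]
data = pd.concat(parts)
data = data.reset_index()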

In [20]:
new_troll_user = pd.read_csv(zip_file_new.open("rus_troll_user.csv"), header = 1)
# NOTE: the next two file names look swapped relative to the variable names:
# new_troll_text ends up holding the six metadata columns (see the shapes below).
new_troll_text = pd.read_csv(zip_file_new.open("rus_troll_tweet_metadata.csv"))
new_troll_metadata = pd.read_csv(zip_file_new.open("rus_troll_tweet_text.csv"))
new_troll_stats = pd.read_csv(zip_file_new.open("rus_troll_tweet_stats.csv"))



The same files are also loaded as Spark dataframes:

In [6]:
spark_rus_text = spark.read.csv("New_russian_tweets/rus_troll_tweet_text.csv")
spark_rus_metadata = spark.read.csv("New_russian_tweets/rus_troll_tweet_metadata.csv")
spark_rus_stats = spark.read.csv("New_russian_tweets/rus_troll_tweet_stats.csv")
spark_rus_user = spark.read.csv("New_russian_tweets/rus_troll_user.csv")

In [17]:
spark_rus_user.show()
data["publish_date"] = pd.to_datetime(data["publish_date"], format='%m/%d/%Y %H:%M')

+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+----------------+--------------+---------------+----------------+
|                 _c0|                 _c1|                 _c2|                 _c3|                 _c4|                 _c5|                 _c6|             _c7|           _c8|            _c9|            _c10|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+----------------+--------------+---------------+----------------+
|              userid|   user_display_name|    user_screen_name|user_reported_loc...|user_profile_desc...|    user_profile_url|account_creation_...|account_language|follower_count|following_count|   last_tweet_at|
|004c1875a5f3a8ddf...|004c1875a5f3a8ddf...|004c1875a5f3a8ddf...|            New York|                null|                null|          2014-05

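Write the Spark dataframes to Parquet files once, then read them back from there (the write calls are commented out after the first run).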

In [13]:
#spark_rus_text.write.parquet("spark_rus_text.parquet2")
#spark_rus_stats.write.parquet("spark_rus_stats.parquet")
#spark_rus_metadata.write.parquet("spark_rus_metadata.parquet")
#spark_rus_user.write.parquet("spark_rus_user.parquet")
spark_rus_text = spark.read.parquet("spark_rus_text.parquet2")
spark_rus_metadata = spark.read.parquet("spark_rus_metadata.parquet")
spark_rus_user = spark.read.parquet("spark_rus_user.parquet")
spark_rus_stats = spark.read.parquet("spark_rus_stats.parquet")

In [16]:
spark_rus_metadata.show()

+------------------+--------------+---------------+--------+---------+------------------+
|               _c0|           _c1|            _c2|     _c3|      _c4|               _c5|
+------------------+--------------+---------------+--------+---------+------------------+
|           tweetid|follower_count|following_count|latitude|longitude| tweet_client_name|
|849295393867399169|          4042|           1470|    null|     null|Twitter Web Client|
|567280957913587713|           272|            390|    null|     null|          iziaslav|
|493095247690612736|            89|            223|    null|     null|          vavilonX|
|493892174069903360|            89|            223|    null|     null|          vavilonX|
|512503798506721280|            89|            223|    null|     null|          vavilonX|
|499624206246871041|            89|            223|    null|     null|          vavilonX|
|491828568251707392|            89|            223|    null|     null|          vavilonX|
|493768356

In [18]:
new_troll_user.head(20)
display(new_troll_user.shape)
new_troll_text.shape

(3667, 11)

(9041308, 6)

In [19]:
display(new_troll_user["account_language"].unique())
#new_troll_user["account_language"].drop
new_troll_user

array(['en', 'ru', 'es', 'id', 'de', 'ar', 'fr', 'it', 'uk', 'en-gb',
       'zh-cn'], dtype=object)

Unnamed: 0,userid,user_display_name,user_screen_name,user_reported_location,user_profile_description,user_profile_url,account_creation_date,account_language,follower_count,following_count,last_tweet_at
0,004c1875a5f3a8ddfd2044b857a81c5d458882ac5cdf67...,004c1875a5f3a8ddfd2044b857a81c5d458882ac5cdf67...,004c1875a5f3a8ddfd2044b857a81c5d458882ac5cdf67...,New York,,,2014-05-23,en,63,77,2014-09-18 08:18
1,005b6c0f7e3371b1cacced2890fead3d5543694ab21372...,005b6c0f7e3371b1cacced2890fead3d5543694ab21372...,005b6c0f7e3371b1cacced2890fead3d5543694ab21372...,"New York, NY",,,2014-08-05,en,112,153,2015-11-27 00:57
2,005c20d3604f7b90f0d5b22f60226d60cfd9c7bdf5c728...,005c20d3604f7b90f0d5b22f60226d60cfd9c7bdf5c728...,005c20d3604f7b90f0d5b22f60226d60cfd9c7bdf5c728...,"Macon, GA",,,2016-04-14,en,64,825,2016-05-10 16:52
3,00b6194ca3359e2f37000037e20223ee91cfffcf15c7b9...,00b6194ca3359e2f37000037e20223ee91cfffcf15c7b9...,00b6194ca3359e2f37000037e20223ee91cfffcf15c7b9...,"Хабаровск, Россия",Непрост,,2013-11-11,ru,181,255,2016-09-23 09:14
4,00bd49f19d4096b1f47e6e7702dddd746cf7021795f234...,00bd49f19d4096b1f47e6e7702dddd746cf7021795f234...,00bd49f19d4096b1f47e6e7702dddd746cf7021795f234...,United States,,,2014-05-15,en,71,81,2015-11-27 03:28
5,00bf159b7380421c64abf4edcb220d10626cbd29dad0b8...,00bf159b7380421c64abf4edcb220d10626cbd29dad0b8...,00bf159b7380421c64abf4edcb220d10626cbd29dad0b8...,,,https://t.co/s5jE6mUsZn,2016-11-20,ru,3,181,2016-11-20 06:34
6,00e1726daf96a4451c536ef5c891263b8e306d32ddf8c3...,00e1726daf96a4451c536ef5c891263b8e306d32ddf8c3...,00e1726daf96a4451c536ef5c891263b8e306d32ddf8c3...,Санкт-Петербург,"У сердца есть причины, которые разуму не понять",,2014-03-19,en,164,348,2015-12-28 17:41
7,00e60b6efcc4135bb73e81ba1e385786894899da0a9633...,00e60b6efcc4135bb73e81ba1e385786894899da0a9633...,00e60b6efcc4135bb73e81ba1e385786894899da0a9633...,United States,Every child is an artist. The problem is how ...,,2014-06-07,en,41,302,2015-01-21 16:35
8,0105e08292e57e3bc201e097ac85752a7a1e17e6d2c188...,0105e08292e57e3bc201e097ac85752a7a1e17e6d2c188...,0105e08292e57e3bc201e097ac85752a7a1e17e6d2c188...,,I love you. https://t.co/02XzbFcEAc,,2017-11-06,ru,17,193,2017-12-09 09:25
9,010659f988d882af388da498e705d5d49d1540211f9955...,010659f988d882af388da498e705d5d49d1540211f9955...,010659f988d882af388da498e705d5d49d1540211f9955...,Atlanta,TroublemakerXD,,2013-12-19,en,70,142,2014-09-14 07:13


I am not keen to keep the accounts with language zh-cn (Chinese), ar (Arabic), id (Indonesian), uk (Ukrainian) and ru (Russian). But what if they tweet in English?

In [21]:
new_troll_metadata.head(100)
new_troll_metadata.shape

(9041308, 3)

In [22]:
new_troll_text.head(30)
new_troll_text.shape

(9041308, 6)

In [24]:
new_troll_stats.head(30)

Unnamed: 0,tweetid,userid,tweet_time,in_reply_to_tweetid,in_reply_to_userid,quoted_tweet_tweetid,is_retweet,retweet_userid,retweet_tweetid,quote_count,reply_count,like_count,retweet_count,hashtags,urls,user_mentions,poll_choices
0,877919995476496385,249064136b1c5cb00a705316ab73dd9b53785748ab757f...,2017-06-22 16:03,,,,True,2572896396.0,8.779172e+17,0,0,0,0,[],[http://ru-open.livejournal.com/374284.html],[2572896396],
1,492388766930444288,0974d5dbee4ca9bd6c3b46d62a5cbdbd5c0d86e196b624...,2014-07-24 19:20,,,,False,,,0,0,0,0,,[http://pyypilg33.livejournal.com/11069.html],,
2,719455077589721089,bda40f262856eee77c48a332e5eb23bc4f1943d600867d...,2016-04-11 09:20,7.194399e+17,40807205.0,,False,,,0,0,0,0,[],[https://www.youtube.com/watch?v=9GvpImWxTJc],[40807205],
3,536179342423105537,bda40f262856eee77c48a332e5eb23bc4f1943d600867d...,2014-11-22 15:28,,,,False,,,0,0,0,0,[STOPNazi],,,
4,841410788409630720,a53ed619f1dea6015c7c878bf744b0eefe8f7272dccf34...,2017-03-13 22:08,,,,False,,,0,0,3,4,[],[https://goo.gl/fBp94X],,
5,834365760776630272,a53ed619f1dea6015c7c878bf744b0eefe8f7272dccf34...,2017-02-22 11:34,,,,False,,,0,0,3,5,[],[https://goo.gl/9w5hso],,
6,577490527299457024,95b3aba6b9140f5dda993148de174ff57d62f4a6e68e88...,2015-03-16 15:24,,,,True,2599775719.0,5.774854e+17,0,0,0,0,[],[http://nahnews.com.ua/180774-na-xarkovshhine-...,[2599775719],
7,596522755379560448,5744c546bdf9e81ea0aad223c9db4b702ccba7c81d4c11...,2015-05-08 03:51,,,,True,2518710111.0,5.965142e+17,0,0,0,0,[],[http://bit.ly/1Rizso9],[2518710111],
8,567357519547207680,2b0d7525bed1df5119b7956f9be4888b45686172d68006...,2015-02-16 16:19,,,,False,,,0,0,0,0,,,,
9,665533117369876480,b88fd4fc4b169f0a98eb38d3f5ef72a1eb3f6861cb3e81...,2015-11-14 14:13,,,,True,72525490.0,6.655222e+17,0,0,0,0,[],[http://vesti.ru/t?2686779],[72525490],


In [25]:
data["language"].unique()

array(['English', 'Russian', 'Serbian', 'Ukrainian', 'Tagalog (Filipino)',
       'Albanian', 'Italian', 'Romanian', 'Spanish', 'Catalan', 'German',
       'Estonian', 'French', 'Norwegian', 'Vietnamese', 'Dutch', 'Arabic',
       'Uzbek', 'Bulgarian', 'Macedonian', 'Farsi (Persian)', 'Turkish',
       'LANGUAGE UNDEFINED', 'Czech', 'Somali', 'Lithuanian', 'Croatian',
       'Slovak', 'Icelandic', 'Slovenian', 'Japanese', 'Indonesian',
       'Pushto', 'Hungarian', 'Finnish', 'Latvian', 'Portuguese',
       'Danish', 'Swedish', 'Malay', 'Polish', 'Korean', 'Hebrew', 'Urdu',
       'Kurdish', 'Hindi', 'Greek', 'Simplified Chinese', 'Thai',
       'Bengali', 'Traditional Chinese', 'Gujarati', 'Kannada', 'Tamil',
       'Telugu', 'Malayalam'], dtype=object)

HÅKON

To do: join the Twitter user IDs with the tweets to check which language each account tweets in. Can accounts set to Arabic tweet in English?
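A possible shape for that check, sketched on toy data: join the tweets to the user table and cross-tabulate the account language against the language of each tweet. The column name `tweet_language` is an assumption (the text file is only described as containing "text and language"), and on the real files the `userid` would first have to be brought in from `rus_troll_tweet_stats.csv` via `tweetid`.

```python
import pandas as pd

# Toy stand-ins for rus_troll_user.csv and the tweet-level files.
# `tweet_language` is an assumed column name, used here for illustration.
users = pd.DataFrame({
    "userid": ["u1", "u2", "u3"],
    "account_language": ["ar", "ru", "en"],
})
tweets = pd.DataFrame({
    "tweetid": [1, 2, 3, 4],
    "userid": ["u1", "u1", "u2", "u3"],
    "tweet_language": ["en", "ar", "ru", "en"],
})

# Attach each tweet to its author's profile language.
joined = tweets.merge(users, on="userid", how="left")

# Rows: account language; columns: language the tweet was written in.
language_table = pd.crosstab(joined["account_language"], joined["tweet_language"])
print(language_table)
```

Off-diagonal cells in such a table would answer the question directly, for example how many tweets from accounts set to `ar` are written in `en`.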

## Old dataset

In [32]:
# Drop bookkeeping columns that are not needed for the analysis.
datad = data.drop(columns=["retweet", "index", "new_june_2018", "harvested_date"])

In [33]:
datad

Unnamed: 0,external_author_id,author,content,region,language,publish_date,following,followers,updates,post_type,account_type,account_category
0,9.060000e+17,10_GOP,"""We have a sitting Democrat US Senator on tria...",Unknown,English,2017-10-01 19:58:00,1052,9636,253,,Right,RightTroll
1,9.060000e+17,10_GOP,Marshawn Lynch arrives to game in anti-Trump s...,Unknown,English,2017-10-01 22:43:00,1054,9637,254,,Right,RightTroll
2,9.060000e+17,10_GOP,Daughter of fallen Navy Sailor delivers powerf...,Unknown,English,2017-10-01 22:50:00,1054,9637,255,RETWEET,Right,RightTroll
3,9.060000e+17,10_GOP,JUST IN: President Trump dedicates Presidents ...,Unknown,English,2017-10-01 23:52:00,1062,9642,256,,Right,RightTroll
4,9.060000e+17,10_GOP,"19,000 RESPECTING our National Anthem! #StandF...",Unknown,English,2017-10-01 02:13:00,1050,9645,246,RETWEET,Right,RightTroll
5,9.060000e+17,10_GOP,"Dan Bongino: ""Nobody trolls liberals better th...",Unknown,English,2017-10-01 02:47:00,1050,9644,247,,Right,RightTroll
6,9.060000e+17,10_GOP,🐝🐝🐝 https://t.co/MorL3AQW0z,Unknown,English,2017-10-01 02:48:00,1050,9644,248,RETWEET,Right,RightTroll
7,9.060000e+17,10_GOP,'@SenatorMenendez @CarmenYulinCruz Doesn't mat...,Unknown,English,2017-10-01 02:52:00,1050,9644,249,,Right,RightTroll
8,9.060000e+17,10_GOP,"As much as I hate promoting CNN article, here ...",Unknown,English,2017-10-01 03:47:00,1050,9646,250,,Right,RightTroll
9,9.060000e+17,10_GOP,After the 'genocide' remark from San Juan Mayo...,Unknown,English,2017-10-01 03:51:00,1050,9646,251,,Right,RightTroll


In [34]:
display(datad["account_category"].unique())
display(datad["account_type"].unique())
display(datad["post_type"].unique())
display(datad["region"].unique())

array(['RightTroll', 'NonEnglish', 'Fearmonger', 'LeftTroll', 'Unknown',
       'HashtagGamer', 'NewsFeed', 'Commercial'], dtype=object)

array(['Right', 'Russian', 'Koch', 'Italian', 'left', '?', 'German',
       'Spanish', 'Hashtager', 'Arabic', 'local', 'Commercial', 'French',
       'Ukranian', 'ZAPOROSHIA', 'news', 'right', 'Uzbek', 'Ebola ', nan,
       'Portuguese'], dtype=object)

array([nan, 'RETWEET', 'QUOTE_TWEET'], dtype=object)

array(['Unknown', nan, 'United States', 'Italy', 'United Arab Emirates',
       'Japan', 'Israel', 'Azerbaijan', 'Egypt', 'United Kingdom',
       'Russian Federation', 'Turkey', 'Iraq', 'Germany', 'France',
       'Ukraine', 'Serbia', 'Hong Kong', 'Austria', 'Belarus', 'Malaysia',
       'Spain', 'Samoa', 'India', 'Afghanistan', 'Saudi Arabia',
       'Iran, Islamic Republic of', 'Mexico', 'Canada', 'Greece',
       'Czech Republic', 'Finland', 'Latvia', 'Estonia', 'Sweden',
       'Denmark', 'Switzerland'], dtype=object)

Since tweets with a NaN post_type are always original tweets, I want to change these values to ORIGINAL.

In [35]:
datad["post_type"] = datad["post_type"].fillna("ORIGINAL")
datad 


Unnamed: 0,external_author_id,author,content,region,language,publish_date,following,followers,updates,post_type,account_type,account_category
0,9.060000e+17,10_GOP,"""We have a sitting Democrat US Senator on tria...",Unknown,English,2017-10-01 19:58:00,1052,9636,253,ORIGINAL,Right,RightTroll
1,9.060000e+17,10_GOP,Marshawn Lynch arrives to game in anti-Trump s...,Unknown,English,2017-10-01 22:43:00,1054,9637,254,ORIGINAL,Right,RightTroll
2,9.060000e+17,10_GOP,Daughter of fallen Navy Sailor delivers powerf...,Unknown,English,2017-10-01 22:50:00,1054,9637,255,RETWEET,Right,RightTroll
3,9.060000e+17,10_GOP,JUST IN: President Trump dedicates Presidents ...,Unknown,English,2017-10-01 23:52:00,1062,9642,256,ORIGINAL,Right,RightTroll
4,9.060000e+17,10_GOP,"19,000 RESPECTING our National Anthem! #StandF...",Unknown,English,2017-10-01 02:13:00,1050,9645,246,RETWEET,Right,RightTroll
5,9.060000e+17,10_GOP,"Dan Bongino: ""Nobody trolls liberals better th...",Unknown,English,2017-10-01 02:47:00,1050,9644,247,ORIGINAL,Right,RightTroll
6,9.060000e+17,10_GOP,🐝🐝🐝 https://t.co/MorL3AQW0z,Unknown,English,2017-10-01 02:48:00,1050,9644,248,RETWEET,Right,RightTroll
7,9.060000e+17,10_GOP,'@SenatorMenendez @CarmenYulinCruz Doesn't mat...,Unknown,English,2017-10-01 02:52:00,1050,9644,249,ORIGINAL,Right,RightTroll
8,9.060000e+17,10_GOP,"As much as I hate promoting CNN article, here ...",Unknown,English,2017-10-01 03:47:00,1050,9646,250,ORIGINAL,Right,RightTroll
9,9.060000e+17,10_GOP,After the 'genocide' remark from San Juan Mayo...,Unknown,English,2017-10-01 03:51:00,1050,9646,251,ORIGINAL,Right,RightTroll


In [36]:
# Boolean Series: True for every column that contains at least one NaN.
isna_columns = datad.isna().any(axis=0)
column_nan_list = isna_columns[isna_columns].index.tolist()
column_nan_list

['external_author_id', 'content', 'region', 'account_type']

We can see that four columns contain NaN values. We want to know the number of NaN values in each of these columns.

In [37]:
for x in column_nan_list:
    print(x, ":", datad[x].isnull().sum())

external_author_id : 4
content : 1
region : 8774
account_type : 363


Since account_type already has "?" as a type, we set the missing account_type values to this. For region we do the same, setting missing values to the existing "Unknown". We drop the tweets where content or external_author_id is NaN.

In [38]:
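datad["account_type"] = datad["account_type"].fillna("?")
datad["region"] = datad["region"].fillna("Unknown")
# Drop rows that lack content or an external author id. Assigning the result of
# dropna() back to a column subset would realign on the index and keep every row.
datad = datad.dropna(subset=["content", "external_author_id"])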
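Note that assigning `df[cols] = df[cols].dropna()` leaves the row count unchanged, because pandas realigns the shorter right-hand side on the index. A small sketch on toy data (the frame `toy` and its values are made up for illustration) shows the difference from `dropna(subset=...)`:

```python
import pandas as pd

toy = pd.DataFrame({
    "content": ["hello", None, "bye"],
    "external_author_id": [1.0, 2.0, None],
    "region": ["Unknown", "Italy", "Spain"],
})

# Pitfall: the right-hand side has fewer rows, but assignment realigns it on
# the index, so the frame still has all three rows afterwards.
realigned = toy.copy()
realigned[["content", "external_author_id"]] = (
    realigned[["content", "external_author_id"]].dropna()
)

# Dropping on the whole frame removes the incomplete rows.
cleaned = toy.dropna(subset=["content", "external_author_id"])

print(len(realigned), len(cleaned))
```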


In [39]:
datad

Unnamed: 0,external_author_id,author,content,region,language,publish_date,following,followers,updates,post_type,account_type,account_category
0,9.060000e+17,10_GOP,"""We have a sitting Democrat US Senator on tria...",Unknown,English,2017-10-01 19:58:00,1052,9636,253,ORIGINAL,Right,RightTroll
1,9.060000e+17,10_GOP,Marshawn Lynch arrives to game in anti-Trump s...,Unknown,English,2017-10-01 22:43:00,1054,9637,254,ORIGINAL,Right,RightTroll
2,9.060000e+17,10_GOP,Daughter of fallen Navy Sailor delivers powerf...,Unknown,English,2017-10-01 22:50:00,1054,9637,255,RETWEET,Right,RightTroll
3,9.060000e+17,10_GOP,JUST IN: President Trump dedicates Presidents ...,Unknown,English,2017-10-01 23:52:00,1062,9642,256,ORIGINAL,Right,RightTroll
4,9.060000e+17,10_GOP,"19,000 RESPECTING our National Anthem! #StandF...",Unknown,English,2017-10-01 02:13:00,1050,9645,246,RETWEET,Right,RightTroll
5,9.060000e+17,10_GOP,"Dan Bongino: ""Nobody trolls liberals better th...",Unknown,English,2017-10-01 02:47:00,1050,9644,247,ORIGINAL,Right,RightTroll
6,9.060000e+17,10_GOP,🐝🐝🐝 https://t.co/MorL3AQW0z,Unknown,English,2017-10-01 02:48:00,1050,9644,248,RETWEET,Right,RightTroll
7,9.060000e+17,10_GOP,'@SenatorMenendez @CarmenYulinCruz Doesn't mat...,Unknown,English,2017-10-01 02:52:00,1050,9644,249,ORIGINAL,Right,RightTroll
8,9.060000e+17,10_GOP,"As much as I hate promoting CNN article, here ...",Unknown,English,2017-10-01 03:47:00,1050,9646,250,ORIGINAL,Right,RightTroll
9,9.060000e+17,10_GOP,After the 'genocide' remark from San Juan Mayo...,Unknown,English,2017-10-01 03:51:00,1050,9646,251,ORIGINAL,Right,RightTroll


Maybe we can plot a histogram of how many tweets each user has posted?
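One way to get the numbers behind such a histogram is to count tweets per `author` and then count how many accounts share each tweet count. The following is a sketch on toy data, assuming only that the tweet table has an `author` column as shown above:

```python
import pandas as pd

# Toy stand-in for the tweet table; only the `author` column matters here.
toy = pd.DataFrame({"author": ["a", "a", "a", "b", "c"]})

# Number of tweets per account, largest first.
tweets_per_user = toy["author"].value_counts()

# How many accounts posted exactly n tweets (the bar heights of the histogram).
accounts_per_count = tweets_per_user.value_counts().sort_index()
print(accounts_per_count)
```

With matplotlib available, `tweets_per_user.plot.hist(bins=50)` would draw the histogram itself; a logarithmic axis may be worth trying if a few accounts dominate.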

In [40]:
datade = datad.loc[datad['language'] == "English"]
datade

Unnamed: 0,external_author_id,author,content,region,language,publish_date,following,followers,updates,post_type,account_type,account_category
0,9.060000e+17,10_GOP,"""We have a sitting Democrat US Senator on tria...",Unknown,English,2017-10-01 19:58:00,1052,9636,253,ORIGINAL,Right,RightTroll
1,9.060000e+17,10_GOP,Marshawn Lynch arrives to game in anti-Trump s...,Unknown,English,2017-10-01 22:43:00,1054,9637,254,ORIGINAL,Right,RightTroll
2,9.060000e+17,10_GOP,Daughter of fallen Navy Sailor delivers powerf...,Unknown,English,2017-10-01 22:50:00,1054,9637,255,RETWEET,Right,RightTroll
3,9.060000e+17,10_GOP,JUST IN: President Trump dedicates Presidents ...,Unknown,English,2017-10-01 23:52:00,1062,9642,256,ORIGINAL,Right,RightTroll
4,9.060000e+17,10_GOP,"19,000 RESPECTING our National Anthem! #StandF...",Unknown,English,2017-10-01 02:13:00,1050,9645,246,RETWEET,Right,RightTroll
5,9.060000e+17,10_GOP,"Dan Bongino: ""Nobody trolls liberals better th...",Unknown,English,2017-10-01 02:47:00,1050,9644,247,ORIGINAL,Right,RightTroll
6,9.060000e+17,10_GOP,🐝🐝🐝 https://t.co/MorL3AQW0z,Unknown,English,2017-10-01 02:48:00,1050,9644,248,RETWEET,Right,RightTroll
7,9.060000e+17,10_GOP,'@SenatorMenendez @CarmenYulinCruz Doesn't mat...,Unknown,English,2017-10-01 02:52:00,1050,9644,249,ORIGINAL,Right,RightTroll
8,9.060000e+17,10_GOP,"As much as I hate promoting CNN article, here ...",Unknown,English,2017-10-01 03:47:00,1050,9646,250,ORIGINAL,Right,RightTroll
9,9.060000e+17,10_GOP,After the 'genocide' remark from San Juan Mayo...,Unknown,English,2017-10-01 03:51:00,1050,9646,251,ORIGINAL,Right,RightTroll


In [44]:
datade = datade.sort_values("publish_date")
# Compare the column itself (not the string "publish_date") to the cutoff to get a row mask.
after_primaries = datade[datade["publish_date"] > "2018-05-24 00:00:00"]
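Filtering by date needs a boolean mask built from the column. An expression such as `"publish_date" > "2018-05-24 00:00:00"` compares two string literals, evaluates to a single `True`, and indexing with it raises `KeyError: True`. A minimal sketch on toy data, where the cutoff date is a hypothetical value chosen only for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "content": ["early", "middle", "late"],
    "publish_date": pd.to_datetime(
        ["2016-01-15 10:00", "2016-06-20 12:30", "2016-11-09 08:00"]
    ),
})

cutoff = pd.Timestamp("2016-06-07")  # hypothetical cutoff, for illustration only

# Comparing two plain strings yields one bool, not a per-row mask.
string_comparison = "publish_date" > "2016-06-07"

# Comparing the column itself yields one boolean per row.
mask = toy["publish_date"] > cutoff
after_cutoff = toy[mask].sort_values("publish_date")
print(after_cutoff)
```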