### Fake News Datasets
#### Combine Multiple Datasets
Tweets/News from BuzzFeed, PolitiFact, GossipCop, COVID-19, Disasters

The full PolitiFact and GossipCop data comes from the FakeNewsNet repository:
https://github.com/KaiDMML/FakeNewsNet

#### Other Data Sources
- BuzzFeed Data: https://github.com/mdani38/Fake-News-Detection
- COVID-19 Data: https://raw.githubusercontent.com/susanli2016/NLP-with-Python/master/data/corona_fake.csv
- Disasters Data: https://www.kaggle.com/c/nlp-getting-started/notebooks 

In [1]:
# general packages
import pandas as pd
import numpy as np
from scipy import stats
import pickle
import re
%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
from scipy.stats import zscore
sns.set()

#### Load BuzzFeed Data

In [2]:
# load BuzzFeed data
buzz_fake = pd.read_csv('new_data/BuzzFeed_fake_news_content.csv', delimiter=',')
buzz_real = pd.read_csv('new_data/BuzzFeed_real_news_content.csv', delimiter=',')
print(buzz_fake.shape)
print(buzz_real.shape)

buzz_fake.head()

(91, 12)
(91, 12)


Unnamed: 0,id,title,text,url,top_img,authors,source,publish_date,movies,images,canonical_link,meta_data
0,Fake_1-Webpage,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,http://www.addictinginfo.org/2016/09/19/proof-...,http://addictinginfo.addictinginfoent.netdna-c...,Wendy Gittleson,http://www.addictinginfo.org,{'$date': 1474243200000},,"http://i.imgur.com/JeqZLhj.png,http://addictin...",http://addictinginfo.com/2016/09/19/proof-the-...,"{""publisher"": ""Addicting Info | The Knowledge ..."
1,Fake_10-Webpage,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,http://eaglerising.com/36899/charity-clinton-f...,http://eaglerising.com/wp-content/uploads/2016...,View All Posts,http://eaglerising.com,{'$date': 1474416521000},,http://constitution.com/wp-content/uploads/201...,http://eaglerising.com/36899/charity-clinton-f...,"{""description"": ""The possibility that CHAI dis..."
2,Fake_11-Webpage,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,http://eaglerising.com/36880/a-hillary-clinton...,http://eaglerising.com/wp-content/uploads/2016...,"View All Posts,Tony Elliott",http://eaglerising.com,{'$date': 1474416638000},,http://constitution.com/wp-content/uploads/201...,http://eaglerising.com/36880/a-hillary-clinton...,"{""description"": ""Hillary Clinton may be the fi..."
3,Fake_12-Webpage,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",http://www.addictinginfo.org/2016/09/19/trumps...,http://addictinginfo.addictinginfoent.netdna-c...,John Prager,http://www.addictinginfo.org,{'$date': 1474243200000},,"http://i.imgur.com/JeqZLhj.png,http://2.gravat...",http://addictinginfo.com/2016/09/19/trumps-lat...,"{""publisher"": ""Addicting Info | The Knowledge ..."
4,Fake_13-Webpage,Website is Down For Maintenance,Website is Down For Maintenance,http://www.proudcons.com/clinton-foundation-ca...,,,http://www.proudcons.com,,,,,"{""og"": {""url"": ""http://www.proudcons.com"", ""ty..."


In [3]:
# set targets
buzz_fake['target'] = 'fake'
buzz_real['target'] = 'real'

In [4]:
# combine real/fake
buzz_full = pd.concat([buzz_fake,buzz_real], ignore_index=True)
print(buzz_full.shape)
buzz_full.head()

(182, 13)


Unnamed: 0,id,title,text,url,top_img,authors,source,publish_date,movies,images,canonical_link,meta_data,target
0,Fake_1-Webpage,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,http://www.addictinginfo.org/2016/09/19/proof-...,http://addictinginfo.addictinginfoent.netdna-c...,Wendy Gittleson,http://www.addictinginfo.org,{'$date': 1474243200000},,"http://i.imgur.com/JeqZLhj.png,http://addictin...",http://addictinginfo.com/2016/09/19/proof-the-...,"{""publisher"": ""Addicting Info | The Knowledge ...",fake
1,Fake_10-Webpage,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,http://eaglerising.com/36899/charity-clinton-f...,http://eaglerising.com/wp-content/uploads/2016...,View All Posts,http://eaglerising.com,{'$date': 1474416521000},,http://constitution.com/wp-content/uploads/201...,http://eaglerising.com/36899/charity-clinton-f...,"{""description"": ""The possibility that CHAI dis...",fake
2,Fake_11-Webpage,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,http://eaglerising.com/36880/a-hillary-clinton...,http://eaglerising.com/wp-content/uploads/2016...,"View All Posts,Tony Elliott",http://eaglerising.com,{'$date': 1474416638000},,http://constitution.com/wp-content/uploads/201...,http://eaglerising.com/36880/a-hillary-clinton...,"{""description"": ""Hillary Clinton may be the fi...",fake
3,Fake_12-Webpage,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",http://www.addictinginfo.org/2016/09/19/trumps...,http://addictinginfo.addictinginfoent.netdna-c...,John Prager,http://www.addictinginfo.org,{'$date': 1474243200000},,"http://i.imgur.com/JeqZLhj.png,http://2.gravat...",http://addictinginfo.com/2016/09/19/trumps-lat...,"{""publisher"": ""Addicting Info | The Knowledge ...",fake
4,Fake_13-Webpage,Website is Down For Maintenance,Website is Down For Maintenance,http://www.proudcons.com/clinton-foundation-ca...,,,http://www.proudcons.com,,,,,"{""og"": {""url"": ""http://www.proudcons.com"", ""ty...",fake


In [5]:
buzz_full['source'][0].replace('https://', '').replace('http://', '').strip()

'www.addictinginfo.org'

In [6]:
buzz_full['source'][14]

nan

In [7]:
buzz_full['source'] = buzz_full['source'].fillna('')
buzz_full['source'][14]

''

In [8]:
# clean source field
source_strip = lambda x: x["source"].replace('https://', '').replace('http://', '').strip()

buzz_full["source2"] = buzz_full.apply(source_strip, axis=1)
buzz_full.head()

Unnamed: 0,id,title,text,url,top_img,authors,source,publish_date,movies,images,canonical_link,meta_data,target,source2
0,Fake_1-Webpage,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,http://www.addictinginfo.org/2016/09/19/proof-...,http://addictinginfo.addictinginfoent.netdna-c...,Wendy Gittleson,http://www.addictinginfo.org,{'$date': 1474243200000},,"http://i.imgur.com/JeqZLhj.png,http://addictin...",http://addictinginfo.com/2016/09/19/proof-the-...,"{""publisher"": ""Addicting Info | The Knowledge ...",fake,www.addictinginfo.org
1,Fake_10-Webpage,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,http://eaglerising.com/36899/charity-clinton-f...,http://eaglerising.com/wp-content/uploads/2016...,View All Posts,http://eaglerising.com,{'$date': 1474416521000},,http://constitution.com/wp-content/uploads/201...,http://eaglerising.com/36899/charity-clinton-f...,"{""description"": ""The possibility that CHAI dis...",fake,eaglerising.com
2,Fake_11-Webpage,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,http://eaglerising.com/36880/a-hillary-clinton...,http://eaglerising.com/wp-content/uploads/2016...,"View All Posts,Tony Elliott",http://eaglerising.com,{'$date': 1474416638000},,http://constitution.com/wp-content/uploads/201...,http://eaglerising.com/36880/a-hillary-clinton...,"{""description"": ""Hillary Clinton may be the fi...",fake,eaglerising.com
3,Fake_12-Webpage,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",http://www.addictinginfo.org/2016/09/19/trumps...,http://addictinginfo.addictinginfoent.netdna-c...,John Prager,http://www.addictinginfo.org,{'$date': 1474243200000},,"http://i.imgur.com/JeqZLhj.png,http://2.gravat...",http://addictinginfo.com/2016/09/19/trumps-lat...,"{""publisher"": ""Addicting Info | The Knowledge ...",fake,www.addictinginfo.org
4,Fake_13-Webpage,Website is Down For Maintenance,Website is Down For Maintenance,http://www.proudcons.com/clinton-foundation-ca...,,,http://www.proudcons.com,,,,,"{""og"": {""url"": ""http://www.proudcons.com"", ""ty...",fake,www.proudcons.com
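
The row-wise `apply` above can also be written with pandas string methods. This is a minimal sketch, not the notebook's own code: `strip_scheme` and the `demo` series are hypothetical names used for illustration, and the pattern only removes a scheme at the start of the value, whereas the lambda replaces it anywhere in the string.

```python
import pandas as pd

def strip_scheme(sources: pd.Series) -> pd.Series:
    """Remove a leading http:// or https:// from each value (hypothetical helper)."""
    return (
        sources.fillna('')                               # missing sources become ''
               .str.strip()                              # trim surrounding whitespace first
               .str.replace(r'^https?://', '', regex=True)
    )

# Small stand-in for buzz_full['source'], including one missing value
demo = pd.Series(['http://www.addictinginfo.org', 'https://howafrica.com', None])
cleaned = strip_scheme(demo)
print(cleaned.tolist())
```

Working on the whole column at once avoids passing every row through a Python lambda and handles the `fillna` step in the same place.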


In [9]:
# overwrite the per-article id with a dataset tag
buzz_full['id'] = 'buzzfeed'

In [10]:
buzz_full2 = buzz_full[['id', 'title', 'text', 'source2', 'target']]
buzz_full2 = buzz_full2.rename(columns = {'source2':'source'})
buzz_full2.head()

Unnamed: 0,id,title,text,source,target
0,buzzfeed,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,www.addictinginfo.org,fake
1,buzzfeed,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,eaglerising.com,fake
2,buzzfeed,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,eaglerising.com,fake
3,buzzfeed,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",www.addictinginfo.org,fake
4,buzzfeed,Website is Down For Maintenance,Website is Down For Maintenance,www.proudcons.com,fake


#### Load COVID-19 Data

In [11]:
# load COVID-19 data
covid = pd.read_csv('new_data/corona_fake.csv', delimiter=',')
print(covid.shape)

covid.head()

(1164, 4)


Unnamed: 0,title,text,source,label
0,Due to the recent outbreak for the Coronavirus...,"You just need to add water, and the drugs and ...",coronavirusmedicalkit.com,Fake
1,,Hydroxychloroquine has been shown to have a 10...,RudyGiuliani,Fake
2,,Fact: Hydroxychloroquine has been shown to hav...,CharlieKirk,Fake
3,,The Corona virus is a man made virus created i...,JoanneWrightForCongress,Fake
4,,Doesn’t @BillGates finance research at the Wuh...,JoanneWrightForCongress,Fake


In [12]:
covid['id'] = 'covid'
covid['label'] = covid['label'].str.lower()
covid = covid.rename(columns = {'label':'target'})
covid.head()

Unnamed: 0,title,text,source,target,id
0,Due to the recent outbreak for the Coronavirus...,"You just need to add water, and the drugs and ...",coronavirusmedicalkit.com,fake,covid
1,,Hydroxychloroquine has been shown to have a 10...,RudyGiuliani,fake,covid
2,,Fact: Hydroxychloroquine has been shown to hav...,CharlieKirk,fake,covid
3,,The Corona virus is a man made virus created i...,JoanneWrightForCongress,fake,covid
4,,Doesn’t @BillGates finance research at the Wuh...,JoanneWrightForCongress,fake,covid


In [13]:
covid2 = covid[['id', 'title', 'text', 'source', 'target']]

In [14]:
# combine with buzzfeed data
combined = pd.concat([buzz_full2,covid2], ignore_index=True)
print(combined.shape)
combined.head()

(1346, 5)


Unnamed: 0,id,title,text,source,target
0,buzzfeed,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,www.addictinginfo.org,fake
1,buzzfeed,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,eaglerising.com,fake
2,buzzfeed,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,eaglerising.com,fake
3,buzzfeed,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",www.addictinginfo.org,fake
4,buzzfeed,Website is Down For Maintenance,Website is Down For Maintenance,www.proudcons.com,fake


#### Load Disaster Data

In [15]:
# load disaster data
natural = pd.read_csv('new_data/train_disaster.csv', delimiter=',')
print(natural.shape)

natural.head()

(7613, 5)


Unnamed: 0,id,keyword,location,text,target
0,1,,,Our Deeds are the Reason of this #earthquake M...,1
1,4,,,Forest fire near La Ronge Sask. Canada,1
2,5,,,All residents asked to 'shelter in place' are ...,1
3,6,,,"13,000 people receive #wildfires evacuation or...",1
4,7,,,Just got sent this photo from Ruby #Alaska as ...,1


In [16]:
natural['id'] = 'natural_disaster'
natural['title'] = ''
natural['source'] = ''
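# Kaggle labels: 1 = tweet about a real disaster, 0 = not a disaster;
# here 0 is mapped to 'fake' and 1 to 'real'
natural['target'] = np.where(natural['target']==0, 'fake', 'real')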
natural.head()

Unnamed: 0,id,keyword,location,text,target,title,source
0,natural_disaster,,,Our Deeds are the Reason of this #earthquake M...,real,,
1,natural_disaster,,,Forest fire near La Ronge Sask. Canada,real,,
2,natural_disaster,,,All residents asked to 'shelter in place' are ...,real,,
3,natural_disaster,,,"13,000 people receive #wildfires evacuation or...",real,,
4,natural_disaster,,,Just got sent this photo from Ruby #Alaska as ...,real,,


In [17]:
# combine with buzzfeed, covid data
natural2 = natural[['id', 'title', 'text', 'source', 'target']]
combined = pd.concat([combined,natural2], ignore_index=True)
print(combined.shape)
combined.head()

(8959, 5)


Unnamed: 0,id,title,text,source,target
0,buzzfeed,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,www.addictinginfo.org,fake
1,buzzfeed,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,eaglerising.com,fake
2,buzzfeed,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,eaglerising.com,fake
3,buzzfeed,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",www.addictinginfo.org,fake
4,buzzfeed,Website is Down For Maintenance,Website is Down For Maintenance,www.proudcons.com,fake


#### Load PolitiFact and GossipCop Data

In [18]:
# load PolitiFact data - partial data
polit_fake = pd.read_csv('new_data/politifact_fake.csv', delimiter=',')
polit_real = pd.read_csv('new_data/politifact_real.csv', delimiter=',')
print(polit_fake.shape)
print(polit_real.shape)

polit_fake.head()

(432, 4)
(624, 4)


Unnamed: 0,id,news_url,title,tweet_ids
0,politifact15014,speedtalk.com/forum/viewtopic.php?t=51650,BREAKING: First NFL Team Declares Bankruptcy O...,937349434668498944\t937379378006282240\t937380...
1,politifact15156,politics2020.info/index.php/2018/03/13/court-o...,Court Orders Obama To Pay $400 Million In Rest...,972666281441878016\t972678396575559680\t972827...
2,politifact14745,www.nscdscamps.org/blog/category/parenting/467...,UPDATE: Second Roy Moore Accuser Works For Mic...,929405740732870656\t929439450400264192\t929439...
3,politifact14355,https://howafrica.com/oscar-pistorius-attempts...,Oscar Pistorius Attempts To Commit Suicide,886941526458347521\t887011300278194176\t887023...
4,politifact15371,http://washingtonsources.org/trump-votes-for-d...,Trump Votes For Death Penalty For Being Gay,915205698212040704\t915242076681506816\t915249...


In [19]:
# load GossipCop data - partial data
goss_fake = pd.read_csv('new_data/gossipcop_fake.csv', delimiter=',')
goss_real = pd.read_csv('new_data/gossipcop_real.csv', delimiter=',')
print(goss_fake.shape)
print(goss_real.shape)

goss_fake.head()

(5323, 4)
(16817, 4)


Unnamed: 0,id,news_url,title,tweet_ids
0,gossipcop-2493749932,www.dailymail.co.uk/tvshowbiz/article-5874213/...,Did Miley Cyrus and Liam Hemsworth secretly ge...,284329075902926848\t284332744559968256\t284335...
1,gossipcop-4580247171,hollywoodlife.com/2018/05/05/paris-jackson-car...,Paris Jackson & Cara Delevingne Enjoy Night Ou...,992895508267130880\t992897935418503169\t992899...
2,gossipcop-941805037,variety.com/2017/biz/news/tax-march-donald-tru...,Celebrities Join Tax March in Protest of Donal...,853359353532829696\t853359576543920128\t853359...
3,gossipcop-2547891536,www.dailymail.co.uk/femail/article-3499192/Do-...,Cindy Crawford's daughter Kaia Gerber wears a ...,988821905196158981\t988824206556172288\t988825...
4,gossipcop-5476631226,variety.com/2018/film/news/list-2018-oscar-nom...,Full List of 2018 Oscar Nominations – Variety,955792793632432131\t955795063925301249\t955798...


In [20]:
# load full GossipCop data
goss_fake2 = pd.read_csv('new_data/gossipcop_fake_full.csv', delimiter=',')
goss_real2 = pd.read_csv('new_data/gossipcop_real_full.csv', delimiter=',')
print(goss_fake2.shape)
print(goss_real2.shape)

goss_fake2.head()

(4900, 4)
(14243, 4)


Unnamed: 0.1,Unnamed: 0,text,source,id
0,0,Star magazine has released an explosive report...,http://www.newidea.com.au,gossipcop-1000240645
1,1,"Earlier this year, the buzz around Megyn Kelly...",https://web.archive.org,gossipcop-1009248558
2,2,For the first time since his involvement in a ...,http://hollywoodlife.com,gossipcop-1012123555
3,3,"Those heels were cute, but they didn't last lo...",http://www.x17online.com,gossipcop-1014383679
4,4,American reality television personality and re...,http://en.wikipedia.org,gossipcop-1014616559


In [21]:
# load full PolitiFact data
polit_fake2 = pd.read_csv('new_data/politifact_fake_full.csv', delimiter=',')
polit_real2 = pd.read_csv('new_data/politifact_real_full.csv', delimiter=',')
print(polit_fake2.shape)
print(polit_real2.shape)

polit_fake2.head()

(396, 4)
(559, 4)


Unnamed: 0.1,Unnamed: 0,text,source,id
0,0,Sponsored LinksRepublican attacks on transgend...,http://www.occupydemocrats.com,politifact11773
1,1,,https://www.facebook.com,politifact13038
2,2,Mental Images“My pictures ask where does the t...,http://www.alison-jackson.co.uk,politifact13467
3,3,But it appears not all Mr Zuckerberg's own emp...,http://www.bbc.com,politifact13468
4,4,Fake story here…http://abcnews.com.co/donald-t...,http://genius.com,politifact13475


In [22]:
# join GossipCop datasets (default inner join: rows without a match in the full-text file are dropped)
goss_fake_full = goss_fake.merge(goss_fake2, on=["id"])
goss_real_full = goss_real.merge(goss_real2, on=["id"])

print(goss_fake_full.shape)
print(goss_real_full.shape)

goss_fake_full.head()

(4900, 7)
(14243, 7)


Unnamed: 0.1,id,news_url,title,tweet_ids,Unnamed: 0,text,source
0,gossipcop-2493749932,www.dailymail.co.uk/tvshowbiz/article-5874213/...,Did Miley Cyrus and Liam Hemsworth secretly ge...,284329075902926848\t284332744559968256\t284335...,796,Congratulations might be in order for Miley Cy...,https://www.dailymail.co.uk
1,gossipcop-4580247171,hollywoodlife.com/2018/05/05/paris-jackson-car...,Paris Jackson & Cara Delevingne Enjoy Night Ou...,992895508267130880\t992897935418503169\t992899...,1922,Paris Jackson and Cara Delevingne were spotted...,http://hollywoodlife.com
2,gossipcop-941805037,variety.com/2017/biz/news/tax-march-donald-tru...,Celebrities Join Tax March in Protest of Donal...,853359353532829696\t853359576543920128\t853359...,4567,Thousands are taking the streets to protest Pr...,http://variety.com
3,gossipcop-2547891536,www.dailymail.co.uk/femail/article-3499192/Do-...,Cindy Crawford's daughter Kaia Gerber wears a ...,988821905196158981\t988824206556172288\t988825...,829,We'd venture to say that Cindy Crawford's daug...,http://www.dailymail.co.uk
4,gossipcop-5476631226,variety.com/2018/film/news/list-2018-oscar-nom...,Full List of 2018 Oscar Nominations – Variety,955792793632432131\t955795063925301249\t955798...,2421,Oscar nominations for the 90th annual awards w...,http://variety.com
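
The merged fake set has fewer rows (4900) than the partial file (5323) because `merge` defaults to an inner join. One way to see which ids were lost is a left join with `indicator=True`. This is only a sketch on made-up frames: `meta`, `body`, and their contents are hypothetical stand-ins for `goss_fake` and `goss_fake2`, and `validate='one_to_one'` is an optional extra check that assumes ids are unique on both sides.

```python
import pandas as pd

# Hypothetical stand-ins: metadata rows and the subset that has full text
meta = pd.DataFrame({'id': ['a-1', 'a-2', 'a-3'], 'title': ['t1', 't2', 't3']})
body = pd.DataFrame({'id': ['a-1', 'a-3'], 'text': ['x', 'y']})

# Left join keeps every metadata row; '_merge' records where each row came from
audit = meta.merge(body, on='id', how='left', indicator=True, validate='one_to_one')

# Rows marked 'left_only' had no full-text match and would vanish in an inner join
unmatched = audit[audit['_merge'] == 'left_only']
print(len(meta), len(body), len(unmatched))
```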


In [23]:
# join PolitiFact datasets (default inner join: rows without a match in the full-text file are dropped)
polit_fake_full = polit_fake.merge(polit_fake2, on=["id"])
polit_real_full = polit_real.merge(polit_real2, on=["id"])

print(polit_fake_full.shape)
print(polit_real_full.shape)

polit_fake_full.head()

(396, 7)
(559, 7)


Unnamed: 0.1,id,news_url,title,tweet_ids,Unnamed: 0,text,source
0,politifact15156,politics2020.info/index.php/2018/03/13/court-o...,Court Orders Obama To Pay $400 Million In Rest...,972666281441878016\t972678396575559680\t972827...,292,"The West Texas Federal Appeals Court, operatin...",https://web.archive.org
1,politifact14745,www.nscdscamps.org/blog/category/parenting/467...,UPDATE: Second Roy Moore Accuser Works For Mic...,929405740732870656\t929439450400264192\t929439...,211,Read original article hereLiberals sure are af...,http://www.nscdscamps.org
2,politifact14355,https://howafrica.com/oscar-pistorius-attempts...,Oscar Pistorius Attempts To Commit Suicide,886941526458347521\t887011300278194176\t887023...,140,The former Paralympic athlete reportedly tried...,https://howafrica.com
3,politifact15371,http://washingtonsources.org/trump-votes-for-d...,Trump Votes For Death Penalty For Being Gay,915205698212040704\t915242076681506816\t915249...,342,The Structure says states get to resolve how y...,http://washingtonsources.org
4,politifact14404,gloria.tv/video/yRrtUtTCfPga6cq2VDJPcgQe4,Putin says: ‘Pope Francis Is Not A Man Of God’...,893290900637483009\t893290950700802048\t893290...,149,Putin says: ‘Pope Francis Is Not A Man Of God’...,http://gloria.tv


In [24]:
goss_fake_full['target'] = 'fake'
goss_fake_full[['id', 'title', 'text', 'source', 'target']].head()

Unnamed: 0,id,title,text,source,target
0,gossipcop-2493749932,Did Miley Cyrus and Liam Hemsworth secretly ge...,Congratulations might be in order for Miley Cy...,https://www.dailymail.co.uk,fake
1,gossipcop-4580247171,Paris Jackson & Cara Delevingne Enjoy Night Ou...,Paris Jackson and Cara Delevingne were spotted...,http://hollywoodlife.com,fake
2,gossipcop-941805037,Celebrities Join Tax March in Protest of Donal...,Thousands are taking the streets to protest Pr...,http://variety.com,fake
3,gossipcop-2547891536,Cindy Crawford's daughter Kaia Gerber wears a ...,We'd venture to say that Cindy Crawford's daug...,http://www.dailymail.co.uk,fake
4,gossipcop-5476631226,Full List of 2018 Oscar Nominations – Variety,Oscar nominations for the 90th annual awards w...,http://variety.com,fake


In [25]:
goss_fake_full2 = goss_fake_full[['id', 'title', 'text', 'source', 'target']]
combined = pd.concat([combined,goss_fake_full2], ignore_index=True)
print(combined.shape)
combined.head()

(13859, 5)


Unnamed: 0,id,title,text,source,target
0,buzzfeed,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,www.addictinginfo.org,fake
1,buzzfeed,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,eaglerising.com,fake
2,buzzfeed,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,eaglerising.com,fake
3,buzzfeed,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",www.addictinginfo.org,fake
4,buzzfeed,Website is Down For Maintenance,Website is Down For Maintenance,www.proudcons.com,fake


In [26]:
goss_real_full['target'] = 'real'
goss_real_full[['id', 'title', 'text', 'source', 'target']].head()

Unnamed: 0,id,title,text,source,target
0,gossipcop-882573,Teen Mom Star Jenelle Evans' Wedding Dress Is ...,Exclusive: How Ruby’s Kente Cloth Wedding Dres...,https://www.brides.com,real
1,gossipcop-875924,Kylie Jenner refusing to discuss Tyga on Life ...,Kylie Jenner reportedly doesn't want to talk a...,https://www.dailymail.co.uk,real
2,gossipcop-894416,Quinn Perkins,Character on American television series Scanda...,https://en.wikipedia.org,real
3,gossipcop-857248,I Tried Kim Kardashian's Butt Workout & Am For...,"From there, you transition to the leg press ma...",https://www.refinery29.com,real
4,gossipcop-884684,Celine Dion donates concert proceeds to Vegas ...,(CNN) An emotional Celine Dion returned to the...,https://www.cnn.com,real


In [27]:
goss_real_full2 = goss_real_full[['id', 'title', 'text', 'source', 'target']]
combined = pd.concat([combined,goss_real_full2], ignore_index=True)
print(combined.shape)
combined.head()

(28102, 5)


Unnamed: 0,id,title,text,source,target
0,buzzfeed,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,www.addictinginfo.org,fake
1,buzzfeed,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,eaglerising.com,fake
2,buzzfeed,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,eaglerising.com,fake
3,buzzfeed,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",www.addictinginfo.org,fake
4,buzzfeed,Website is Down For Maintenance,Website is Down For Maintenance,www.proudcons.com,fake


In [28]:
polit_real_full['target'] = 'real'
polit_real_full[['id', 'title', 'text', 'source', 'target']].head()

Unnamed: 0,id,title,text,source,target
0,politifact14984,National Federation of Independent Business,SMALL BUSINESS ECONOMIC TRENDSThe NFIB Optimis...,http://www.nfib-sbet.org,real
1,politifact12944,comments in Fayetteville NC,Need help? Contact the CQ Hotline at (800) 678...,http://www.cq.com,real
2,politifact333,"Romney makes pitch, hoping to close deal : Ele...","Romney makes pitch, hoping to close dealPhoto ...",https://web.archive.org,real
3,politifact4358,Democratic Leaders Say House Democrats Are Uni...,Democratic Leaders Say House Democrats Are Uni...,https://web.archive.org,real
4,politifact779,"Budget of the United States Government, FY 2008",THE NATION’S FISCAL OUTLOOKThe President’s 200...,https://web.archive.org,real


In [29]:
polit_fake_full['target'] = 'fake'
polit_fake_full[['id', 'title', 'text', 'source', 'target']].head()

Unnamed: 0,id,title,text,source,target
0,politifact15156,Court Orders Obama To Pay $400 Million In Rest...,"The West Texas Federal Appeals Court, operatin...",https://web.archive.org,fake
1,politifact14745,UPDATE: Second Roy Moore Accuser Works For Mic...,Read original article hereLiberals sure are af...,http://www.nscdscamps.org,fake
2,politifact14355,Oscar Pistorius Attempts To Commit Suicide,The former Paralympic athlete reportedly tried...,https://howafrica.com,fake
3,politifact15371,Trump Votes For Death Penalty For Being Gay,The Structure says states get to resolve how y...,http://washingtonsources.org,fake
4,politifact14404,Putin says: ‘Pope Francis Is Not A Man Of God’...,Putin says: ‘Pope Francis Is Not A Man Of God’...,http://gloria.tv,fake


In [30]:
polit_real_full2 = polit_real_full[['id', 'title', 'text', 'source', 'target']]
polit_fake_full2 = polit_fake_full[['id', 'title', 'text', 'source', 'target']]
combined = pd.concat([combined,polit_real_full2], ignore_index=True)
print(combined.shape)
combined = pd.concat([combined,polit_fake_full2], ignore_index=True)
print(combined.shape)
combined.head()

(28661, 5)
(29057, 5)


Unnamed: 0,id,title,text,source,target
0,buzzfeed,Proof The Mainstream Media Is Manipulating The...,I woke up this morning to find a variation of ...,www.addictinginfo.org,fake
1,buzzfeed,Charity: Clinton Foundation Distributed “Water...,Former President Bill Clinton and his Clinton ...,eaglerising.com,fake
2,buzzfeed,A Hillary Clinton Administration May be Entire...,After collapsing just before trying to step in...,eaglerising.com,fake
3,buzzfeed,Trump’s Latest Campaign Promise May Be His Mos...,"Donald Trump is, well, deplorable. He’s sugges...",www.addictinginfo.org,fake
4,buzzfeed,Website is Down For Maintenance,Website is Down For Maintenance,www.proudcons.com,fake
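
The repeated select-then-concat steps could be folded into one call. The following is a sketch under assumptions, not the notebook's method: `COMMON_COLUMNS`, `stack_frames`, and the two demo frames are hypothetical. Note that `reindex` fills a missing column with NaN rather than `''`, so the later `fillna('')` step would still be needed.

```python
import pandas as pd

# The five columns every dataset is reduced to in this notebook
COMMON_COLUMNS = ['id', 'title', 'text', 'source', 'target']

def stack_frames(frames):
    """Align each frame to the common columns, then concatenate once (hypothetical helper)."""
    aligned = [frame.reindex(columns=COMMON_COLUMNS) for frame in frames]
    return pd.concat(aligned, ignore_index=True)

# Demo frames: one with an extra column, one with missing columns
demo_a = pd.DataFrame({'id': ['buzzfeed'], 'title': ['t'], 'text': ['x'],
                       'source': ['s'], 'target': ['fake'], 'url': ['u']})
demo_b = pd.DataFrame({'id': ['natural_disaster'], 'text': ['y'], 'target': ['real']})

stacked = stack_frames([demo_a, demo_b])
print(stacked.shape)
```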


In [31]:
# inspect target values before fixing them
combined['target'].unique()

array(['fake', 'real', nan, 'true'], dtype=object)

In [34]:
combined[combined['target'].isna()]

Unnamed: 0,id,title,text,source,target
187,covid,CORONA UNMASKED: Chinese Intelligence Officer ...,,,
197,covid,You can recover from the coronavirus disease (...,Most of the people who catch COVID-19 can reco...,https://www.who.int/emergencies/diseases/novel...,
225,covid,Pandemic Bio-Weapon – 9. Supervirus Created by...,,,
313,covid,Why the Coronavirus Seems to Hit Men Harder Th...,The coronavirus that originated in China has s...,,
424,covid,The Coronavirus 5G Connection and Coverup,THE STORY:The China Coronavirus COVID-19 rose ...,https://www.wakingtimes.com/,


In [36]:
combined[combined['target'] == 'true'].shape

(584, 5)

In [37]:
combined['target'] = np.where(combined['target']=='true', 'real', combined['target'])
combined['target'].unique()

array(['fake', 'real', nan], dtype=object)
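
The same clean-up can be expressed as an explicit label map, which also turns any unexpected value into NaN so it is caught by the null check that follows. This is an illustrative sketch: `LABEL_MAP` and `normalize_labels` are hypothetical, and the `'false'` entry is an assumption, since that value does not appear in the data above.

```python
import pandas as pd

# Hypothetical mapping; only 'fake', 'real' and 'true' occur in the notebook output
LABEL_MAP = {'fake': 'fake', 'false': 'fake', 'real': 'real', 'true': 'real'}

def normalize_labels(labels: pd.Series) -> pd.Series:
    """Lower-case, trim, and map labels; anything unmapped or missing becomes NaN."""
    return labels.str.strip().str.lower().map(LABEL_MAP)

demo = pd.Series(['Fake', 'TRUE', None, 'real'])
normalized = normalize_labels(demo)
print(normalized.tolist())
```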

In [38]:
# drop where null label
combined.drop(combined[combined['target'].isna()].index, inplace = True)
combined[combined['target'].isnull()].index

Int64Index([], dtype='int64')

In [39]:
combined['target'].value_counts()

real    18748
fake    10304
Name: target, dtype: int64

In [40]:
combined['source'] = combined['source'].fillna('')
combined['title'] = combined['title'].fillna('')
combined['text'] = combined['text'].fillna('')

In [41]:
combined.isnull().sum()

id        0
title     0
text      0
source    0
target    0
dtype: int64

In [42]:
combined.to_csv('combined_tweets.csv', index=False) 
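
One thing to keep in mind when reloading this file: the empty strings written by `fillna('')` come back as NaN under the default `read_csv` settings. A small sketch of the effect, using an in-memory buffer and a made-up one-row frame instead of `combined_tweets.csv`:

```python
import io
import pandas as pd

# Made-up row with an empty title, like the rows produced by fillna('')
frame = pd.DataFrame({'id': ['covid'], 'title': [''], 'text': ['some text'], 'target': ['fake']})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# Default parsing treats the empty field as missing
buffer.seek(0)
default_read = pd.read_csv(buffer)

# keep_default_na=False preserves the empty string
buffer.seek(0)
keep_empty = pd.read_csv(buffer, keep_default_na=False)

print(default_read['title'].isna().tolist(), keep_empty['title'].tolist())
```

Passing `keep_default_na=False` (or repeating the `fillna('')` step after loading) keeps the reloaded data consistent with what was saved.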