## Springboard Data Science Track: Capstone Two

# Bot Detection on Reddit

<i>Objective: To build a classifier that identifies Reddit users as 'bots' or 'non-bots'</i>

Using the Pushshift API, I will collect at least 100,000 comments from confirmed non-bot Reddit users and 10,000 comments made by bots.

## 1. Data Collection

### Step One: Import packages and authenticate via OAuth

In [3]:
#import packages
import pandas as pd
import requests
from pushshift_py import PushshiftAPI
import time
import csv
import json
import praw
import credentials
import datetime
import re
from collections import Counter
from urllib.request import urlopen
from bs4 import BeautifulSoup
import lxml


#settings
pd.set_option('display.max_colwidth', None)

In [4]:
#Authenticate with the Reddit API via PRAW (secrets live in a local credentials module)
reddit = praw.Reddit(client_id=credentials.client_id,
                     client_secret=credentials.client_secret,
                     password=credentials.password,
                     user_agent="Comment Classification App 1.0 by Diat0nic",
                     username=credentials.username)

### Step Two: Collect a list of Non-Bot Reddit users


In [5]:
authors = ['mvea', 'peteINC_', 'unibeech', 'OldDeadEyez', 'rogersimon10', 'ZDTreefur', 'sweetbabette','reverse_friday','throwawayMBA3', 'queenoreo', 'sisandniecesituation', 'buttonsarethebomb','garlicdeath','Truji11o','recluseMeteor','MeInYourPocket','zachinoz','nopamo','fatalist-shadow','redditKMC','AmKindaAnonymousyh', 'Merry_Little_Liberal', 'dogedriver','ColChrisHadfield','InterestingCloud9', 'QSquared','TheBlimpMan','ReginaldJohnston','sorry_wasntlistening', 'chefmattmatt','verifiedson', 'SFLoridan', 'vernetroyer', 'poem_for_your_sprog', 'ramsesthepigeon','sincewedidthedo','LurkerNan','drewiepoodle','38LeaguesUnderTheSea', '' 'Missfitsin','Sembaka','Mitchell_Findle','Depuis78','fredlikesfire', 'KittyPitty','backtoleddit','drtvmaniacphd','azzwhole/', 'DeceptiveFrost','nagitoe_','Vinyl_BunBuns', 'Palifaith','Portarossa','_always_sunny_/', 'ScumBunny/', 'ms_horseshoe', 'Terrible__Ted', 'Holy-Report50','StickleyMan','Glitch5970','Romainvicta476','An_Open_Field_Ned','StefFeldman','enbenlen,''DoremusJessup','OverlyAdorable','bestem','jimmyjohnssandwiches','wonderingsocrates', 'wasborninafactory_','thesoundandthefury', 'mepper','imagepoem', 'ashy_slashyyy', 'relevantlife', 'm0rris0n_hotel','8bithihat', 'ttyabish','TannedCroissant', 'SlimJones123', 'maxwellhill', 'iBleeedorange', 'spacecadetcenttal', 'FriendlyAM', 'Warlizard', 'hyperchrisz', 'vernetroyer', 'KevlarYarmulke', 'ImNotJesus', 'tooshiftyforyou', 'hennomeister', 'pseudolobster', 'poopellar', 'se7en_sinner', 'journalisto', 'Warlizard', 'ILL_Show_Myself_Out', 'Kijafa',
          'ggggbabybabybaby', 'Drunken_Economist', 'ProbablyHittingOnYou', 'TheAtomicPlayboy', 'red321red321', 'Trapped_in_Reddit',
          'Shitty_Watercolour', 'NotAMethAddict', 'andrewsmith1986', 'Apostolate', 'dickfromaccounting', 'way_fairer', 'spez', 'gallowboob', 'rogersimon10',
          '_vargas_ ', 'presidentobama', 'wolfhunterzz', 'PeterMayhew', 'hawaiinshiirts','Here_Comes_The_King', 'GovSchwarzenegger', 'mistborn', 'StanGibson18', 'janellemonae']

In [7]:
#Visual Check
len(authors)

122
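
Hand-typed lists like this one are easy to get subtly wrong: a missing comma between two adjacent string literals silently joins them into one name, and stray slashes, spaces, or repeated names change what is sent to the API. Below is a minimal sketch of a sanitizing pass. The helper `clean_usernames` and the sample input are hypothetical and not part of the original run, and the validity pattern assumes Reddit usernames are 3 to 20 characters of letters, digits, underscores, and hyphens.

```python
import re

# Assumption: valid Reddit usernames are 3-20 chars of letters, digits, "_" or "-"
VALID_NAME = re.compile(r'^[A-Za-z0-9_-]{3,20}$')

def clean_usernames(names):
    """Strip stray whitespace and slashes, drop blanks and repeats, flag invalid names."""
    seen, cleaned, rejected = set(), [], []
    for raw in names:
        name = raw.strip().strip('/')
        if not name or name in seen:
            continue  # skip empty strings and duplicates
        seen.add(name)
        if VALID_NAME.match(name):
            cleaned.append(name)
        else:
            rejected.append(name)  # e.g. two names fused by a missing comma
    return cleaned, rejected

sample = ['mvea', 'azzwhole/', '_vargas_ ', 'Warlizard', 'Warlizard', '',
          'enbenlen,DoremusJessup']
cleaned, rejected = clean_usernames(sample)
print(cleaned)
print(rejected)
```

Anything that lands in `rejected` is worth fixing by hand before the collection loop runs.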

### Step Three: Collect comments from each user.

I will collect a total of 10,000 comments. For each comment, I will also record the following fields:
<ul>
    <li>Unique id </li>
    <li>Author name </li>
    <li>Subreddit</li>
    <li>Score</li>
    <li>Time</li>
    <li>Author Flair</li>
</ul>
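
As a rough illustration of how these fields map onto one Pushshift comment record, here is a small sketch. The helper `comment_to_row` and the example dictionary are made up for illustration. The key names are the ones this notebook reads from the API response, and `.get()` is used so that a missing optional field (such as flair) becomes `None` instead of raising a `KeyError`.

```python
# Keys read from each comment record, in the order they are stored
FIELDS = ['body', 'subreddit', 'score', 'created_utc', 'author_flair_text']

def comment_to_row(author, comment):
    """Return [author, body, subreddit, score, created_utc, flair]; missing keys become None."""
    return [author] + [comment.get(field) for field in FIELDS]

# Hypothetical record with no flair field
example = {'id': 'abc123', 'body': 'hello', 'subreddit': 'science',
           'score': 2, 'created_utc': 1603921249}
row = comment_to_row('mvea', example)
print(row)
```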

In [8]:
#Request data through Pushshift API 
def get_author_comments(**kwargs):
    r = requests.get("https://api.pushshift.io/reddit/comment/search/", params=kwargs)
    data = r.json()
    return data['data']

rcomments = {}


i = 0

#Loop through the author list and store each comment's fields in rcomments, keyed by comment id
while i < len(authors):
    comments = get_author_comments(author=authors[i], size=10000, sort='desc', sort_type='created_utc')
    for comment in comments:
        #Row order: author, body, subreddit, score, created_utc, flair
        rcomments[comment['id']] = [
            authors[i],
            comment['body'],
            comment['subreddit'],
            comment['score'],
            comment['created_utc'],
            comment['author_flair_text'],
        ]

    i += 1

    #Pause between requests to stay under the API rate limit
    time.sleep(1)

In [9]:
print(i)

122


### Step Three (continued): Convert dictionary to dataframe, and save to CSV

In [11]:
#Convert to dataframe
df = pd.DataFrame.from_dict(rcomments, orient='index', columns= ['Author', 'Comment', 'Subreddit', 'Score', 'Time', 'Flair'])
#Visual Check
df.head(2)

Unnamed: 0,Author,Comment,Subreddit,Score,Time,Flair
gafc3xa,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,2,1603921249,MD-PhD-MBA | Clinical Professor/Medicine
gadhpkc,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,1,1603888595,MD-PhD-MBA | Clinical Professor/Medicine


In [12]:
len(df)

10424

In [13]:
#Save complete data set CSV before cleaning
df.to_csv('Data/SB_Reddit_NB_Comments_Pre-Clean.csv')
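
Because the comment ids live in the dataframe index, they are written to the CSV as an unnamed first column and come back as `Unnamed: 0` on re-import. One optional alternative, sketched here on a tiny made-up frame with an in-memory buffer instead of a file, is to name the index at write time with `index_label`:

```python
import io
import pandas as pd

# Tiny stand-in for the comments frame, keyed by comment id
df_demo = pd.DataFrame.from_dict(
    {'gafc3xa': ['mvea', 'science'], 'gadhpkc': ['mvea', 'science']},
    orient='index', columns=['Author', 'Subreddit'])

buffer = io.StringIO()                     # stands in for a CSV file on disk
df_demo.to_csv(buffer, index_label='ID')   # name the index column when writing
buffer.seek(0)

reloaded = pd.read_csv(buffer)
print(list(reloaded.columns))
```

With this approach the later rename step becomes unnecessary, though renaming after the fact gives the same result.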

### Step Four: Clean Non-Bot Comments Dataframe

In [14]:
#Re-import csv
df1 = pd.read_csv('Data/SB_Reddit_NB_Comments_Pre-Clean.csv')

In [16]:
#Visual Check
df1.head(2)

Unnamed: 0.1,Unnamed: 0,Author,Comment,Subreddit,Score,Time,Flair
0,gafc3xa,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,2,1603921249,MD-PhD-MBA | Clinical Professor/Medicine
1,gadhpkc,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,1,1603888595,MD-PhD-MBA | Clinical Professor/Medicine


In [17]:
#Rename ID Column
df1.rename(columns={"Unnamed: 0": "ID"}, inplace=True)

In [18]:
#Check null counts per column
df1.isnull().sum()

ID              0
Author          0
Comment         0
Subreddit       0
Score           0
Time            0
Flair        8137
dtype: int64

In [19]:
#Replace Null Values with 'None'
df1.fillna('None', inplace=True)

#Visual Check
df1.isnull().sum()

ID           0
Author       0
Comment      0
Subreddit    0
Score        0
Time         0
Flair        0
dtype: int64
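
Calling `fillna` on the whole dataframe works here because `Flair` is the only column with nulls. If a numeric column ever contained nulls, the same call would write the string `'None'` into it as well. A more targeted variant, sketched on a small made-up frame, fills only the column that needs it:

```python
import pandas as pd

# Made-up frame: one numeric column, one text column with a missing value
demo = pd.DataFrame({'Score': [2, 1], 'Flair': ['MD-PhD-MBA', None]})

# Fill only the Flair column so other columns keep their dtypes
demo['Flair'] = demo['Flair'].fillna('None')
print(demo)
```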

In [20]:
#Convert Epoch Time to Datetime
df1['Time'] = df1['Time'].apply(lambda x: datetime.datetime.fromtimestamp(x))
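
One caveat about this conversion: `datetime.datetime.fromtimestamp` with no time zone argument returns the time in the local zone of the machine running the notebook, while `created_utc` is an epoch value in UTC. If time-of-day features matter later, a zone-explicit conversion may be safer. A minimal sketch using pandas, with the first two epoch values from the data above:

```python
import pandas as pd

epochs = pd.Series([1603921249, 1603888595])

# Interpret the integers as seconds since the epoch and keep the result in UTC
utc_times = pd.to_datetime(epochs, unit='s', utc=True)
print(utc_times)
```

The resulting column has a timezone-aware dtype, so its values do not depend on where the notebook is run.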

In [21]:
#Check dtypes
df1.dtypes

ID                   object
Author               object
Comment              object
Subreddit            object
Score                 int64
Time         datetime64[ns]
Flair                object
dtype: object

In [22]:
#Check for unique time values
df1.Time.nunique()

10412

In [23]:
#Visual Check
df1.head()

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair
0,gafc3xa,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,2,2020-10-28 14:40:49,MD-PhD-MBA | Clinical Professor/Medicine
1,gadhpkc,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,1,2020-10-28 05:36:35,MD-PhD-MBA | Clinical Professor/Medicine
2,gadh860,mvea,"A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. \n\nEuroSurveillance. 2020;25(42):pii=2001624. \n\n22 October 2020\n\nDOI: https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624\n\nAbstract\n\nAn outbreak of 59 cases of coronavirus disease (COVID-19) originated with 13 cases linked by a 7 h, 17% occupancy flight into Ireland, summer 2020. The flight-associated attack rate was 9.8–17.8%. Spread to 46 non-flight cases occurred country-wide. Asymptomatic/pre-symptomatic transmission in-flight from a point source is implicated by 99% homology across the virus genome in five cases travelling from three different continents. Restriction of movement on arrival and robust contact tracing can limit propagation post-flight.",science,1,2020-10-28 05:31:02,MD-PhD-MBA | Clinical Professor/Medicine
3,gadgmyy,mvea,"Comparative ACE2 variation and primate COVID-19 risk. \n\nCommunications Biology 3, 641 (2020). \n\nDOI: https://doi.org/10.1038/s42003-020-01370-w\n\nAbstract\n\nThe emergence of SARS-CoV-2 has caused over a million human deaths and massive global disruption. The viral infection may also represent a threat to our closest living relatives, nonhuman primates. The contact surface of the host cell receptor, ACE2, displays amino acid residues that are critical for virus recognition, and variations at these critical residues modulate infection susceptibility. Infection studies have shown that some primate species develop COVID-19-like symptoms; however, the susceptibility of most primates is unknown. Here, we show that all apes and African and Asian monkeys (catarrhines), exhibit the same set of twelve key amino acid residues as human ACE2. Monkeys in the Americas, and some tarsiers, lemurs and lorisoids, differ at critical contact residues, and protein modeling predicts that these differences should greatly reduce SARS-CoV-2 binding affinity. Other lemurs are predicted to be closer to catarrhines in their susceptibility. Our study suggests that apes and African and Asian monkeys, and some lemurs, are likely to be highly susceptible to SARS-CoV-2. Urgent actions have been undertaken to limit the exposure of great apes to humans, and similar efforts may be necessary for many other primate species.",science,1,2020-10-28 05:24:09,MD-PhD-MBA | Clinical Professor/Medicine
4,gadg4o2,mvea,"Association of Country-wide Coronavirus Mortality with Demographics, Testing, Lockdowns, and Public Wearing of Masks\n\nChristopher T. Leffler, Edsel Ing, Joseph D. Lykins, Matthew C. Hogan, Craig A. McKeown and Andrzej Grzybowski\n\nThe American Journal of Tropical Medicine and Hygiene \n\nAvailable online: 26 October 2020\n\nDOI: https://doi.org/10.4269/ajtmh.20-1015\n\nAbstract\n\nWe studied sources of variation between countries in per-capita mortality from COVID-19 (caused by the SARS-CoV-2 virus). Potential predictors of per-capita coronavirus-related mortality in 200 countries by May 9, 2020 were examined, including age, gender, obesity prevalence, temperature, urbanization, smoking, duration of the outbreak, lockdowns, viral testing, contact-tracing policies, and public mask-wearing norms and policies. Multivariable linear regression analysis was performed. In univariate analysis, the prevalence of smoking, per-capita gross domestic product, urbanization, and colder average country temperature was positively associated with coronavirus-related mortality. In a multivariable analysis of 196 countries, the duration of the outbreak in the country, and the proportion of the population aged 60 years or older were positively associated with per-capita mortality, whereas duration of mask-wearing by the public was negatively associated with mortality (all P &lt; 0.001). Obesity and less stringent international travel restrictions were independently associated with mortality in a model which controlled for testing policy. Viral testing policies and levels were not associated with mortality. Internal lockdown was associated with a nonsignificant 2.4% reduction in mortality each week (P = 0.83). The association of contact-tracing policy with mortality was not statistically significant (P = 0.06). 
In countries with cultural norms or government policies supporting public mask-wearing, per-capita coronavirus mortality increased on average by just 16.2% each week, as compared with 61.9% each week in remaining countries. Societal norms and government policies supporting the wearing of masks by the public, as well as international travel controls, are independently associated with lower per-capita mortality from COVID-19.",science,1,2020-10-28 05:18:09,MD-PhD-MBA | Clinical Professor/Medicine


In [25]:
#Add column specifying non-bot
df1['Class'] = 'Non-Bot'

#Visual Check
df1.head(2)

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair,Class
0,gafc3xa,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,2,2020-10-28 14:40:49,MD-PhD-MBA | Clinical Professor/Medicine,Non-Bot
1,gadhpkc,mvea,"Teran RA, Ghinai I, Gretsch S, et al. \n\nCOVID-19 Outbreak Among a University’s Men’s and Women’s Soccer Teams — Chicago, Illinois, July–August 2020. \n\nMMWR Morb Mortal Wkly Rep. ePub: 27 October 2020. \n\nDOI: http://dx.doi.org/10.15585/mmwr.mm6943e5\n\nSummary\n\nWhat is already known about this topic?\n\nSARS-CoV-2 transmission occurs in congregate settings, including colleges and universities.\n\nWhat is added by this report?\n\nInvestigation of 17 COVID-19 cases among a university’s men’s and women’s soccer team identified numerous social gatherings as possible transmission events. Minimal mask use and social distancing resulted in rapid spread among students who live, practice, and socialize together.\n\nWhat are the implications for public health practice?\n\nColleges and universities are at risk for COVID-19 outbreaks because of shared housing and social gatherings where recommended prevention guidance is not followed. Schools should consider conducting periodic repeat testing of asymptomatic students to identify outbreaks early and implementing policies and improving messaging to promote mask use and social distancing.",science,1,2020-10-28 05:36:35,MD-PhD-MBA | Clinical Professor/Medicine,Non-Bot


In [26]:
#Save complete data set CSV
df1.to_csv('Data/SB_Reddit_NB_Comments_Clean.csv', date_format='%Y-%m-%d %H:%M:%S')
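
A CSV stores timestamps as plain text, so the `Time` column will come back as strings unless it is parsed on import. A small sketch of the round trip, using a made-up one-row frame and an in-memory buffer in place of the file:

```python
import io
import pandas as pd

demo = pd.DataFrame({'ID': ['gafc3xa'],
                     'Time': pd.to_datetime(['2020-10-28 14:40:49'])})

buffer = io.StringIO()   # stands in for the CSV file
demo.to_csv(buffer, index=False, date_format='%Y-%m-%d %H:%M:%S')
buffer.seek(0)

# parse_dates restores the datetime dtype when reading the file back
reloaded = pd.read_csv(buffer, parse_dates=['Time'])
print(reloaded.dtypes)
```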

### Step Five: Collect Bots

Next, I need to collect comments from users known to be bots. I've chosen to use bots identified on the Reddit <a href='https://www.reddit.com/r/botwatch/comments/1xojwh/list_of_320_reddit_bots/'>r/botwatch</a> subreddit.

I'll use Beautiful Soup to scrape user names from this list.

In [82]:
#Fetch the page with requests and parse the HTML with Beautiful Soup
url = "https://www.reddit.com/r/autowikibot/wiki/redditbots"
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

In [132]:
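#Loop through links, appending bot users to list
bots = []
links = soup.find_all("a", href=re.compile("/u/"))
for link in links:
    userh = link['href']
    #Take the text after "/u/"; str.strip("/u/") would also remove leading or trailing "u" characters from the name
    user = userh.split("/u/")[-1].strip("/")
    if user not in bots:
        bots.append(user)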
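
One thing to watch when pulling names out of `href` values: `str.strip` takes a set of characters, not a prefix, so stripping `"/u/"` also removes any leading or trailing `u` from the username itself. The sketch below shows the pitfall and a regex-based alternative. The helper `username_from_href` and the example links are hypothetical.

```python
import re

# Matches "/u/<name>" or "/user/<name>" and captures the name
USER_LINK = re.compile(r'/u(?:ser)?/([A-Za-z0-9_-]+)')

def username_from_href(href):
    """Return the username in a Reddit user link, or None if there is none."""
    match = USER_LINK.search(href)
    return match.group(1) if match else None

# Character-set stripping eats the leading "u" of the name
print('/u/unibot'.strip('/u/'))

# The regex keeps the full name
print(username_from_href('/u/unibot'))
print(username_from_href('https://www.reddit.com/user/AAbot/'))
```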

In [133]:
#Visual Check
print(bots[0:10])

['A858DE45F56D9BC9', 'AAbot', 'ADHDbot', 'ALTcointip', 'AVR_Modbot', 'A_random_gif', 'AltCodeBot', 'Antiracism_Bot', 'ApiContraption', 'AssHatBot']


In [134]:
#Check number of bots
len(bots)

393

### Step Six: Collect Bot Comments

In [137]:
#Request data through Pushshift API 
def get_author_comments(**kwargs):
    r = requests.get("https://api.pushshift.io/reddit/comment/search/", params=kwargs)
    data = r.json()
    return data['data']

bcomments = {}


i = 0

#Loop through bot list and save comments and id to bcomments dictionary
while i < len(bots):
    try:
        #Only one page (up to 100 comments) is requested per bot, so before is always None when the request is sent
        before = None
        comments = get_author_comments(author=bots[i], size=100, before=before, sort='desc', sort_type='created_utc')

        for comment in comments:
            before = comment['created_utc']
            key = comment['id']
            bcomments[key] = []

            value1 = bots[i]
            value2 = comment['body']
            value3 = comment['subreddit']
            value4 = comment['score']
            value5 = comment['created_utc']
            value6 = comment['author_flair_text']

            #Append to bcomments dict
            bcomments[key].append(value1)
            bcomments[key].append(value2)
            bcomments[key].append(value3)
            bcomments[key].append(value4)
            bcomments[key].append(value5)
            bcomments[key].append(value6)

    except Exception:
        #Skip bots whose request or response could not be processed
        pass

    i += 1

    #Pause between requests to stay under the API rate limit
    time.sleep(1)
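
To gather more than one page per bot, the `before` timestamp can be fed back into the next request. The sketch below shows that pagination pattern in isolation. `collect_author_comments` and `fake_fetch` are hypothetical names: the fetch function is passed in so the logic can be exercised offline, and in this notebook `get_author_comments` could be supplied instead, assuming the endpoint honors `before` as in the call above. Comments that share the exact timestamp of a page boundary could be skipped by this simple approach.

```python
def collect_author_comments(fetch_page, author, max_comments=1000, page_size=100):
    """Page backwards in time until max_comments are collected or no results remain."""
    collected = []
    before = None
    while len(collected) < max_comments:
        page = fetch_page(author=author, size=page_size, before=before,
                          sort='desc', sort_type='created_utc')
        if not page:
            break  # no older comments left
        collected.extend(page)
        before = page[-1]['created_utc']  # oldest timestamp seen so far
    return collected[:max_comments]

def fake_fetch(author, size, before, **_):
    """Offline stand-in for the API: 250 comments with timestamps 250 down to 1."""
    data = [{'id': str(t), 'created_utc': t} for t in range(250, 0, -1)]
    older = [c for c in data if before is None or c['created_utc'] < before]
    return older[:size]

comments = collect_author_comments(fake_fetch, 'some_bot', max_comments=230)
print(len(comments))
```

When pointing this at the live API, a `time.sleep` between pages keeps the request rate comparable to the loop above.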

In [138]:
#Convert to dataframe
df2 = pd.DataFrame.from_dict(bcomments, orient='index', columns= ['Author', 'Comment', 'Subreddit', 'Score', 'Time', 'Flair'])
#Visual Check
df2.head()

Unnamed: 0,Author,Comment,Subreddit,Score,Time,Flair
e7wig3e,ADHDbot,Just dropping by to show that OP is completely misrepresenting the conversation\n\n&amp;#x200B;\n\n1. [https://i.imgur.com/Q9ePFI4.png](https://i.imgur.com/Q9ePFI4.png)\n\n&amp;#x200B;\n\n2. [https://i.imgur.com/jKiVMVc.png](https://i.imgur.com/jKiVMVc.png)\n\n&amp;#x200B;\n\n3. [https://i.imgur.com/JasrHdV.png](https://i.imgur.com/JasrHdV.png),unpopularopinion,4,1539733648,
cww687m,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447214638,
cwvv15o,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447195889,
cwvh2g3,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447175249,
cwv7db0,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447154078,


In [139]:
df2.shape

(28962, 6)

In [140]:
#Save complete data set CSV before cleaning
df2.to_csv('Data/Reddit_Bot_Comments_Pre-Clean.csv')

### Step Seven: Clean Bot Comments Dataframe

In [4]:
#Re-import csv
df3 = pd.read_csv('Data/Reddit_Bot_Comments_Pre-Clean.csv')

In [5]:
df3.head()

Unnamed: 0.1,Unnamed: 0,Author,Comment,Subreddit,Score,Time,Flair
0,e7wig3e,ADHDbot,Just dropping by to show that OP is completely misrepresenting the conversation\n\n&amp;#x200B;\n\n1. [https://i.imgur.com/Q9ePFI4.png](https://i.imgur.com/Q9ePFI4.png)\n\n&amp;#x200B;\n\n2. [https://i.imgur.com/jKiVMVc.png](https://i.imgur.com/jKiVMVc.png)\n\n&amp;#x200B;\n\n3. [https://i.imgur.com/JasrHdV.png](https://i.imgur.com/JasrHdV.png),unpopularopinion,4,1539733648,
1,cww687m,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447214638,
2,cwvv15o,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447195889,
3,cwvh2g3,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447175249,
4,cwv7db0,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,1447154078,


In [6]:
#Rename ID Column
df3.rename(columns={"Unnamed: 0": "ID"}, inplace=True)

In [7]:
#Check for null values
df3.isna().sum()

ID               0
Author           0
Comment          0
Subreddit        0
Score            0
Time             0
Flair        23123
dtype: int64

In [8]:
#Replace Null Values with 'None'
df3.fillna('None', inplace=True)

#Visual Check
df3.isnull().sum()

ID           0
Author       0
Comment      0
Subreddit    0
Score        0
Time         0
Flair        0
dtype: int64
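
Filling the whole dataframe works here because `Flair` is the only column with nulls (23123 of them). A narrower alternative is to fill just that column, so a stray null in a numeric column would not silently become the string 'None'. The sketch below uses a small made-up frame (`toy` and its values are hypothetical, not taken from the dataset).

```python
import pandas as pd

# Hypothetical toy frame standing in for df3; the values are made up for illustration.
toy = pd.DataFrame({
    "ID": ["a1", "b2", "c3"],
    "Score": [1, 4, 2],
    "Flair": [None, "Moderator", None],
})

# Fill only the column known to contain nulls; other columns are left untouched.
toy["Flair"] = toy["Flair"].fillna("None")

# Same check as in the notebook: count remaining nulls per column.
null_counts = toy.isna().sum()
print(null_counts)
```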

In [9]:
df3.dtypes

ID           object
Author       object
Comment      object
Subreddit    object
Score         int64
Time          int64
Flair        object
dtype: object

In [10]:
#Convert Epoch Time to Datetime (fromtimestamp returns the machine's local time, not UTC)
df3['Time'] = df3['Time'].apply(lambda x: datetime.datetime.fromtimestamp(x))

In [11]:
df3.dtypes

ID                   object
Author               object
Comment              object
Subreddit            object
Score                 int64
Time         datetime64[ns]
Flair                object
dtype: object
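
Because `datetime.datetime.fromtimestamp` converts to the local timezone of the machine running the notebook, the same epoch value can display differently elsewhere. For example, epoch 1447195889 appears in this notebook as 14:51:29, while in UTC it is 22:51:29. A possible timezone-independent alternative is the vectorized `pd.to_datetime` with `unit="s"`. This is only a sketch on two epoch values copied from the `Time` column, not the conversion used in the rest of the notebook.

```python
import pandas as pd

# Two epoch values copied from the Time column shown earlier in this notebook.
epochs = pd.Series([1447195889, 1447175249])

# unit="s" interprets the integers as seconds since 1970-01-01 UTC;
# utc=True makes the result timezone-aware instead of machine-local.
utc_times = pd.to_datetime(epochs, unit="s", utc=True)
print(utc_times)
```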

In [12]:
#Visual Check
df3.head()

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair
0,e7wig3e,ADHDbot,Just dropping by to show that OP is completely misrepresenting the conversation\n\n&amp;#x200B;\n\n1. [https://i.imgur.com/Q9ePFI4.png](https://i.imgur.com/Q9ePFI4.png)\n\n&amp;#x200B;\n\n2. [https://i.imgur.com/jKiVMVc.png](https://i.imgur.com/jKiVMVc.png)\n\n&amp;#x200B;\n\n3. [https://i.imgur.com/JasrHdV.png](https://i.imgur.com/JasrHdV.png),unpopularopinion,4,2018-10-16 16:47:28,
1,cww687m,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 20:03:58,
2,cwvv15o,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 14:51:29,
3,cwvh2g3,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 09:07:29,
4,cwv7db0,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 03:14:38,


In [13]:
#Check for unique time values
df3.Time.nunique()

28801

In [14]:
df3.Time.max()

Timestamp('2020-10-29 16:00:13')

In [15]:
df3.Time.min()

Timestamp('2009-12-18 14:21:05')
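
The minimum and maximum above can be turned into a single figure for how much time the bot comments cover. A minimal sketch, using the two timestamps reported above (the variable names are my own):

```python
import pandas as pd

# The two endpoints reported by df3.Time.min() and df3.Time.max() above.
earliest = pd.Timestamp("2009-12-18 14:21:05")
latest = pd.Timestamp("2020-10-29 16:00:13")

# Subtracting two Timestamps yields a Timedelta.
span = latest - earliest

# Approximate length in years, using an average year of 365.25 days.
span_years = span.days / 365.25
print(span.days, round(span_years, 1))
```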

In [None]:
#Check for Removed Posts (Reddit shows removed or deleted comments as placeholder text)
df3.loc[df3['Comment'].isin(['[removed]', '[deleted]'])]

In [153]:
#Add column specifying Bot
df3['Class'] = 'Bot'
#Visual Check
df3.head()

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair,Class
0,e7wig3e,ADHDbot,Just dropping by to show that OP is completely misrepresenting the conversation\n\n&amp;#x200B;\n\n1. [https://i.imgur.com/Q9ePFI4.png](https://i.imgur.com/Q9ePFI4.png)\n\n&amp;#x200B;\n\n2. [https://i.imgur.com/jKiVMVc.png](https://i.imgur.com/jKiVMVc.png)\n\n&amp;#x200B;\n\n3. [https://i.imgur.com/JasrHdV.png](https://i.imgur.com/JasrHdV.png),unpopularopinion,4,2018-10-16 16:47:28,,Bot
1,cww687m,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 20:03:58,,Bot
2,cwvv15o,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 14:51:29,,Bot
3,cwvh2g3,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 09:07:29,,Bot
4,cwv7db0,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 03:14:38,,Bot


In [154]:
#Save to csv
df3.to_csv('Data/Reddit_Bot_Comments_Clean.csv', sep=',', date_format='%Y-%m-%d %H:%M:%S')

### Step Seven: Join Bots and Non-Bots into a single dataframe

In [155]:
#Re-import Bots csv
bots = pd.read_csv('Data/Reddit_Bot_Comments_Clean.csv', index_col=0)

#Check
bots.head()

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair,Class
0,e7wig3e,ADHDbot,Just dropping by to show that OP is completely misrepresenting the conversation\n\n&amp;#x200B;\n\n1. [https://i.imgur.com/Q9ePFI4.png](https://i.imgur.com/Q9ePFI4.png)\n\n&amp;#x200B;\n\n2. [https://i.imgur.com/jKiVMVc.png](https://i.imgur.com/jKiVMVc.png)\n\n&amp;#x200B;\n\n3. [https://i.imgur.com/JasrHdV.png](https://i.imgur.com/JasrHdV.png),unpopularopinion,4,2018-10-16 16:47:28,,Bot
1,cww687m,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 20:03:58,,Bot
2,cwvv15o,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 14:51:29,,Bot
3,cwvh2g3,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 09:07:29,,Bot
4,cwv7db0,ADHDbot,"We cannot and will not diagnose anyone. You need to speak with a trained professional to determine if you have ADHD or not. Read the [Wiki page on the Diagnosis Process](https://np.reddit.com/r/ADHD/wiki/diagnosis) for more information on finding a doctor and other steps.\n\nIt is unsafe to self-diagnose based solely on a list of symptoms, and only in speaking with a psychiatrist or ADHD specialist will you be able to get an objective view on whether you have ADHD or not.\n\nPlease see [this rule](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_if_you_have_adhd) and the two after it.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 03:14:38,,Bot


In [156]:
#Check Bots Data Types
bots.dtypes

ID           object
Author       object
Comment      object
Subreddit    object
Score         int64
Time         object
Flair        object
Class        object
dtype: object

In [157]:
#Change time to datetime
bots['Time'] = pd.to_datetime(bots['Time'], format='%Y-%m-%d %H:%M:%S')

In [158]:
#Check Bots Data Types
bots.dtypes

ID                   object
Author               object
Comment              object
Subreddit            object
Score                 int64
Time         datetime64[ns]
Flair                object
Class                object
dtype: object
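
The `Time` column comes back as `object` after re-import because CSV files store dates as plain text. One way to skip the separate conversion step is to ask `read_csv` to parse the column on load with `parse_dates`. The sketch below round-trips a small made-up frame through an in-memory buffer (`toy` and `buffer` are hypothetical stand-ins for the dataframe and the file on disk).

```python
import io
import pandas as pd

# Hypothetical two-row frame; the timestamps are copied from rows shown above.
toy = pd.DataFrame({
    "ID": ["a1", "b2"],
    "Time": pd.to_datetime(["2015-11-10 20:03:58", "2018-10-16 16:47:28"]),
})

# Write to an in-memory buffer instead of a file, using the notebook's date format.
buffer = io.StringIO()
toy.to_csv(buffer, date_format="%Y-%m-%d %H:%M:%S")
buffer.seek(0)

# parse_dates converts the named column to datetime64 while reading.
restored = pd.read_csv(buffer, index_col=0, parse_dates=["Time"])
print(restored.dtypes)
```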

In [159]:
#Check length
bots_size = len(bots)
print(bots_size)

28962


In [160]:
#Re-import Non-Bots csv
nbots = pd.read_csv('Data/Reddit_NB_Comments_Clean.csv', index_col=0)

#Check
nbots.head()

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair,Class
0,g75vw68,mvea,"Link to study: https://wwwnc.cdc.gov/eid/article/26/12/20-3910_article\n\nSpeake H, Phillips A, Chong T, Sikazwe C, Levy A, Lang J, et al. \n\nFlight-associated transmission of severe acute respiratory syndrome coronavirus 2 corroborated by whole-genome sequencing. \n\nEmerg Infect Dis. 2020 Dec \n\nhttps://doi.org/10.3201/eid2612.203910\n\nDOI: 10.3201/eid2612.203910\n\nAbstract\n\nTo investigate potential transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during a domestic flight within Australia, we performed epidemiologic analyses with whole-genome sequencing. Eleven passengers with PCR-confirmed SARS-CoV-2 infection and symptom onset within 48 hours of the flight were considered infectious during travel; 9 had recently disembarked from a cruise ship with a retrospectively identified SARS-CoV-2 outbreak. The virus strain of those on the cruise and the flight was linked (A2-RP) and had not been previously identified in Australia. For 11 passengers, none of whom had traveled on the cruise ship, PCR-confirmed SARS-CoV-2 illness developed between 48 hours and 14 days after the flight. Eight cases were considered flight associated with the distinct SARS-CoV-2 A2-RP strain; the remaining 3 cases (1 with A2-RP) were possibly flight associated. All 11 passengers had been in the same cabin with symptomatic persons who had primary, culture-positive, A2-RP cases. This investigation provides evidence of flight-associated SARS-CoV-2 transmission.",science,1,2020-09-30 04:56:42,MD-PhD-MBA | Clinical Professor/Medicine,Non-Bot
1,g74lsnc,mvea,"Genopo: a nanopore sequencing analysis toolkit for portable Android devices\n\nHiruna Samarakoon, Sanoj Punchihewa, […]Ira W. Deveson \n\nCommunications Biology volume 3, Article number: 538 (2020) \n\nDOI: https://doi.org/10.1038/s42003-020-01270-z\n\nAbstract\n\nThe advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo compacts popular bioinformatics tools to an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable to run many popular bioinformatics tools and accommodate small or large genomes. As the first ever smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.",science,1,2020-09-29 18:25:12,MD-PhD-MBA | Clinical Professor/Medicine,Non-Bot
2,g74dalf,mvea,"Low risk of SARS-CoV-2 transmission by fomites in real-life conditions\n\nMario U Mondelli\nMarta Colaneri\nElena M Seminari\nFausto Baldanti\nRaffaele Bruno \n\nLancet Infectious Diseases\n\nPublished:September 29, 2020\n\nDOI:https://doi.org/10.1016/S1473-3099(20)30678-2\n\nWe have done two sequential studies4, 5 seeking to determine on one hand the extent, if any, of contamination of inanimate surfaces in a standard infectious disease ward of a major referral hospital in northern Italy, and on the other hand whether the risk of contamination was higher in emergency rooms and sub-intensive care wards than on ordinary wards. Cleaning procedures were standard. A number of objects and surfaces were swabbed. Remarkably, only the continuous positive airway pressure helmet of one patient was positive for SARS-CoV-2 RNA. More importantly, attempts to culture the positive swabs on Vero E6 cells were unsuccessful,5 suggesting that patient fomites and surfaces are not contaminated with viable virus. \n\nOur findings suggest that environmental contamination leading to SARS-CoV-2 transmission is unlikely to occur in real-life conditions, provided that standard cleaning procedures and precautions are enforced. These data would support Goldman's point that the chance of transmission through inanimate surfaces is less frequent than hitherto recognised.",science,3,2020-09-29 17:02:25,MD-PhD-MBA | Clinical Professor/Medicine,Non-Bot
3,g74d5dd,mvea,"Safety and Immunogenicity of SARS-CoV-2 mRNA-1273 Vaccine in Older Adults\n\nEvan J. Anderson, M.D., Nadine G. Rouphael, M.D., Alicia T. Widge, M.D., Lisa A. Jackson, M.D., M.P.H., et al., for the mRNA-1273 Study Group*\n\nNEJM September 29, 2020\n\nDOI: 10.1056/NEJMoa2028436\n\nAbstract\n\nBACKGROUND\n\nTesting of vaccine candidates to prevent infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in an older population is important, since increased incidences of illness and death from coronavirus disease 2019 (Covid-19) have been associated with an older age.\n\nMETHODS\n\nWe conducted a phase 1, dose-escalation, open-label trial of a messenger RNA vaccine, mRNA-1273, which encodes the stabilized prefusion SARS-CoV-2 spike protein (S-2P) in healthy adults. The trial was expanded to include 40 older adults, who were stratified according to age (56 to 70 years or ≥71 years). All the participants were assigned sequentially to receive two doses of either 25 μg or 100 μg of vaccine administered 28 days apart.\n\nRESULTS\n\nSolicited adverse events were predominantly mild or moderate in severity and most frequently included fatigue, chills, headache, myalgia, and pain at the injection site. Such adverse events were dose-dependent and were more common after the second immunization. Binding-antibody responses increased rapidly after the first immunization. By day 57, among the participants who received the 25-μg dose, the anti–S-2P geometric mean titer (GMT) was 323,945 among those between the ages of 56 and 70 years and 1,128,391 among those who were 71 years of age or older; among the participants who received the 100-μg dose, the GMT in the two age subgroups was 1,183,066 and 3,638,522, respectively. After the second immunization, serum neutralizing activity was detected in all the participants by multiple methods. 
Binding- and neutralizing-antibody responses appeared to be similar to those previously reported among vaccine recipients between the ages of 18 and 55 years and were above the median of a panel of controls who had donated convalescent serum. The vaccine elicited a strong CD4 cytokine response involving type 1 helper T cells.\n\nCONCLUSIONS\n\nIn this small study involving older adults, adverse events associated with the mRNA-1273 vaccine were mainly mild or moderate. The 100-μg dose induced higher binding- and neutralizing-antibody titers than the 25-μg dose, which supports the use of the 100-μg dose in a phase 3 vaccine trial. (Funded by the National Institute of Allergy and Infectious Diseases and others; mRNA-1273 Study ClinicalTrials.gov number, NCT04283461. opens in new tab.)",science,1,2020-09-29 17:00:59,MD-PhD-MBA | Clinical Professor/Medicine,Non-Bot
4,g71t8na,mvea,"Helfand BKI, Webb M, Gartaganis SL, Fuller L, Kwon C, Inouye SK. \n\nThe Exclusion of Older Persons From Vaccine and Treatment Trials for Coronavirus Disease 2019—Missing the Target. \n\nJAMA Intern Med. \n\nPublished online September 28, 2020. \n\ndoi:10.1001/jamainternmed.2020.5084\n\nOlder adults are at greatest risk of severe disease and death due to coronavirus disease 2019 (COVID-19). Globally, persons older than 65 years comprise 9% of the population,1 yet account for 30% to 40% of cases and more than 80% of deaths.2\n\nUnfortunately, there is a long history of exclusion of older adults from clinical trials. In response, the National Institutes of Health instituted the Inclusion Across the Lifespan policy, requiring the inclusion of older adults in clinical trials.3 Thus, we reviewed all COVID-19 treatment and vaccine trials on http://www.clinicaltrials.gov to evaluate their risk for exclusion of older adults (≥65 years).\n\nMethods\n\nDetails of our approach, methods, and description of included clinical trials are shown in the eMethods in the Supplement.\n\nEach of the 847 clinical trials was abstracted by at least 1 trained research associate, with reliability checks of all ratings. Age exclusions were identified by viewing all of the eligibility and exclusionary criteria. Specific age exclusions were classified into 5-year categories from ages 55 to 80 years; our focus was on exclusion of the 65 to 80 years age group most affected by COVID-19. Informed consent was waived because all data were deidentified and came from previously published studies.\n\nResults\n\nTable 1 identifies clinical trials by treatment with an exclusion by age. We found large variability in the age exclusions. Among the 847 trials, 195 (23%) included an age cut-off.\n\nTable 2 displays indirect age-related exclusions preferentially affecting older adults; each trial could have multiple exclusions. 
The most common age-related exclusion was compliance concerns (213 trials), and 129 of these were related to consent. Next, were broad nonspecified exclusions, specific comorbidities, requirement of technology, and other reasons. A total of 366 (43%) trials had any exclusions, of which 252 (30%) did not have an age-based exclusion. Combining the results of age-based exclusions (Table 1) and exclusions preferentially affecting older adults (Table 2), 447 (53%) trials were considered high risk for excluding older adults.\n\nIn 232 phase 3 clinical trials, 38 (16%) included age cut-offs and 77 (33%) had exclusions preferentially affecting older adults; thus, 115 (50%) were considered high risk for excluding older adults. Of 18 vaccine trials, 11 (61%) included age cut-offs, and the remaining 7 had broad nonspecified exclusions; thus, 100% were considered high risk for excluding older adults.\n\n\nDiscussion\n\nOur findings indicate that older adults are likely to be excluded from more than 50% of COVID-19 clinical trials and 100% of vaccine trials. Such exclusion will limit the ability to evaluate the efficacy, dosage, and adverse effects of the intended treatments. We acknowledge that some exclusions for severe or uncontrolled comorbidities will be essential to protect the health and safety of older adults. However, caution must be taken to avoid excluding otherwise eligible participants for reasons that are not well-justified.",science,1,2020-09-29 04:46:13,MD-PhD-MBA | Clinical Professor/Medicine,Non-Bot


In [161]:
#Check non-bot datatypes
nbots.dtypes

ID           object
Author       object
Comment      object
Subreddit    object
Score         int64
Time         object
Flair        object
Class        object
dtype: object

In [162]:
#Change time to datetime
nbots['Time'] = pd.to_datetime(nbots['Time'], format='%Y-%m-%d %H:%M:%S')

In [163]:
#Check non-bot datatypes
nbots.dtypes

ID                   object
Author               object
Comment              object
Subreddit            object
Score                 int64
Time         datetime64[ns]
Flair                object
Class                object
dtype: object

In [164]:
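#Create new dataframe by stacking nbots under bots (DataFrame.append was removed in pandas 2.0, so use pd.concat)
#Note: this rebinds `reddit`, which previously held the praw client created during authentication
reddit = pd.concat([bots, nbots])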

In [165]:
#Visual Check
reddit.head(2)

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair,Class
0,e7wig3e,ADHDbot,Just dropping by to show that OP is completely misrepresenting the conversation\n\n&amp;#x200B;\n\n1. [https://i.imgur.com/Q9ePFI4.png](https://i.imgur.com/Q9ePFI4.png)\n\n&amp;#x200B;\n\n2. [https://i.imgur.com/jKiVMVc.png](https://i.imgur.com/jKiVMVc.png)\n\n&amp;#x200B;\n\n3. [https://i.imgur.com/JasrHdV.png](https://i.imgur.com/JasrHdV.png),unpopularopinion,4,2018-10-16 16:47:28,,Bot
1,cww687m,ADHDbot,"As per the rules in the side bar, yes or no questions such as ""Does anyone else"" or ""Has anyone else"" (or variants thereof) are not allowed in post titles. Please repost with a more specific question, such as ""How do you manage this symptom?"" instead of ""Does anyone else have this symptom."" You'll get better answers and more replies. \n\nPlease see the rule explanation [here](http://www.reddit.com/r/adhd/wiki/rules#wiki_ask_a_question_that_can_be_answered_with_simply_yes_or_no).\n\nWe appreciate your understanding, thank you.\n\n\n*[I am a bot](/r/AutoModerator/comments/q11pu/what_is_automoderator/), and this action was performed automatically. No humans get notified of replies to this comment. Please [contact the moderators of this subreddit](/message/compose?to=%2Fr%2FADHD) if you have any questions or concerns.*",ADHD,1,2015-11-10 20:03:58,,Bot


In [166]:
#Visual Check
reddit.tail(2)

Unnamed: 0,ID,Author,Comment,Subreddit,Score,Time,Flair,Class
10278,dgga1g2,janellemonae,❤️,pics,3,2017-04-18 19:49:01,,Non-Bot
10279,ddur7h1,janellemonae,Trying to get him to get an account so he can answer directly.,hiphopheads,253,2017-02-16 22:22:57,THE QUEEN,Non-Bot


In [167]:
#Check length
reddit.shape

(39242, 8)
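
The shape is consistent with the two inputs: 28962 bot rows plus 10280 non-bot rows gives 39242. That kind of check can be made explicit, along with a count per class. The sketch below does this on two made-up frames (`bots_toy` and `nbots_toy` are hypothetical). It also passes `ignore_index=True`, which is an optional choice the notebook does not make, to avoid the repeated index values that otherwise carry over from the two inputs.

```python
import pandas as pd

# Hypothetical stand-ins for the bots and nbots dataframes.
bots_toy = pd.DataFrame({"ID": ["a1", "b2", "c3"], "Class": "Bot"})
nbots_toy = pd.DataFrame({"ID": ["d4", "e5"], "Class": "Non-Bot"})

# ignore_index=True renumbers rows 0..n-1 instead of keeping each frame's own index.
combined = pd.concat([bots_toy, nbots_toy], ignore_index=True)

# Row count of the result should equal the sum of the inputs.
assert len(combined) == len(bots_toy) + len(nbots_toy)

# Number of comments per class, useful for seeing the class imbalance.
class_counts = combined["Class"].value_counts()
print(class_counts)
```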

In [168]:
reddit.dtypes

ID                   object
Author               object
Comment              object
Subreddit            object
Score                 int64
Time         datetime64[ns]
Flair                object
Class                object
dtype: object
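
Since each Reddit comment has its own ID, one possible sanity check on the combined data is to look for repeated IDs, which would indicate the same comment was collected twice. This is a sketch on a made-up frame (`toy` is hypothetical) and is not a step the notebook runs.

```python
import pandas as pd

# Hypothetical frame with one repeated comment ID.
toy = pd.DataFrame({
    "ID": ["a1", "b2", "b2", "c3"],
    "Class": ["Bot", "Bot", "Bot", "Non-Bot"],
})

# duplicated() marks every repeat after the first occurrence of an ID.
n_duplicates = int(toy.duplicated(subset="ID").sum())

# Keep the first occurrence of each ID and renumber the rows.
deduped = toy.drop_duplicates(subset="ID", keep="first").reset_index(drop=True)
print(n_duplicates, len(deduped))
```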

In [170]:
#Save to csv
reddit.to_csv('Data/All_Comments_Clean.csv', index=False, date_format='%Y-%m-%d %H:%M:%S')