# Reddit Data Collection — With PRAW

One way to collect Reddit data is with the Reddit API and [PRAW](https://praw.readthedocs.io/en/latest/getting_started/quick_start.html), the **P**ython **R**eddit **A**PI **W**rapper.

In [None]:
!pip install praw

In [3]:
import praw
import pandas as pd
pd.set_option("display.max_colwidth", 500)  # show up to 500 characters per cell

# Apply for Reddit API Access

To apply for Reddit API access, [read the instructions here](https://www.reddit.com/wiki/api) and then [fill out an application and agree to the Terms of Use here](https://docs.google.com/forms/d/e/1FAIpQLSezNdDNK1-P8mspSbmtC2r86Ee9ZRbC66u929cG2GX0T9UMyw/viewform). Once you have a Reddit developer account, create a PRAW instance with your client ID, your client secret, and a user agent (here, your Reddit user name).

In [None]:
reddit = praw.Reddit(client_id='your client id',
                     client_secret='your client secret',
                     user_agent='your reddit user name')
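
Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the keyword arguments above could instead be read from environment variables. The variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USERNAME`), the helper `load_reddit_credentials`, and the user-agent wording are assumptions made for illustration; none of them come from PRAW itself.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Build keyword arguments for praw.Reddit from environment variables.

    The variable names are hypothetical; use whatever names you prefer.
    """
    required = ["REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USERNAME"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        # A descriptive user agent that names the script and the account running it
        "user_agent": f"python:reddit-data-collection:v0.1 (by u/{env['REDDIT_USERNAME']})",
    }


# Demonstration with placeholder values instead of real credentials
example_env = {
    "REDDIT_CLIENT_ID": "your client id",
    "REDDIT_CLIENT_SECRET": "your client secret",
    "REDDIT_USERNAME": "your_reddit_user_name",
}
credentials = load_reddit_credentials(example_env)
print(credentials["user_agent"])

# With real values set in the environment, the instance could be created with:
# reddit = praw.Reddit(**load_reddit_credentials())
```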

# Get Reddit Posts From a Subreddit

The following code draws from [TannerGilbert's PRAW tutorial code](https://github.com/TannerGilbert/Tutorials/blob/master/Reddit%20Webscraping%20using%20PRAW/Reddit%20API.ipynb).

## Hot Posts

In [100]:
hot_posts = reddit.subreddit('AmItheAsshole').hot(limit=10)

In [101]:
for reddit_post in hot_posts:
    print(reddit_post.title)

Check out /r/choosemyalignment for a D&D-themed judging experience!
Coronavirus Post Moratorium: Updated
AITA for dming a girl who bullied me relentlessly in high school regarding her instagram post?
AITA for telling my stepdaughter she’s absolutely under no circumstances allowed to switch out my cats food for vegan food even though she’s losing weight because the sight of normal cat food makes her sick?
AITA for 'ruining' a marriage because I made them sleep separately?
AITA for demanding my brother gives me thousands of pounds for next few years after he crashed into my car when he had a seizure?
AITA for telling my girlfriend she doesn't get special treatment on Mother's Day?
WIBTA if I ask my wife to stop listening to crime podcasts
AITA for decreasing my review for my computer repair guy from 5 stars to 2 stars?
AITA for getting upset at my in laws after they put a Disney-style family portrait which includes the woman my husband cheated on me with/mother of his illegitimate son in

In [102]:
for reddit_post in reddit.subreddit('AmItheAsshole').hot(limit=10):
    print(f"✨Title✨\n{reddit_post.title}\n✨Text✨\n{reddit_post.selftext}\n")

✨Title✨
Check out /r/choosemyalignment for a D&D-themed judging experience!
✨Text✨
Greetings my judgmental friends! I would like to bring a rising subreddit to your attention: /r/choosemyalignment. 

CMA is a fresh take on the AITA/AITB formula where instead of being called a dick, you can submit a situation and the users will vote on your D&D alignment. If you aren’t familiar with alignments, here is a chart:

https://wp-media.patheos.com/blogs/sites/124/2019/09/dnd-alignment-chart.jpg

Once your post has been judged, the bot will poop out a neat heatmap showing the break down of judgments you received like this: 

https://i.imgur.com/CbwqX1W.jpg

Just like for AITA and AITB,  the mods have crafted an intricate flair system where you gain XP by making posts and leaving judgments. As you gain XP, you will level up and get to pick a D&D class and earn ranks. There are prestige classes available at the higher ranks and we are planning some class-based events for everyone to participate i

## Top Posts

### By Day

In [103]:
for reddit_post in reddit.subreddit('AmItheAsshole').top("day", limit=5):
    print(f"✨Title✨\n{reddit_post.title}\n✨Text✨\n{reddit_post.selftext}\n")

✨Title✨
AITA for refusing to help my boyfriend with his business after he publicly declared it was "just him"
✨Text✨
My boyfriend runs a small food business, has done for the last few years. I have my own job but since he's begin cooking professionally, I have always been involved. From helping with menus, doing his finances and packaging orders for delivery etc, I have always felt like it was partly mine. I have always assumed he saw it the same way as it's never been "hey, can you help me with this" but rather "you need to do this today for me" 

Although I have put some of my own money in this business, I never expected to be an equal partner but as I spend around 20 hours a week doing work for him, some credit would be nice. Up until recently, he would usually say "we" in most social media posts and I assumed that was both of us. It is really all the credit I needed.

The other day, he made an Instagram business post which was partly about how difficult it is to operate right now a

### By Month

In [104]:
for reddit_post in reddit.subreddit('AmItheAsshole').top("month", limit=5):
    print(f"✨Title✨\n{reddit_post.title}\n✨Text✨\n{reddit_post.selftext}\n")

✨Title✨
AITA for telling my girlfriend to shut the fuck up after she insulted my sisters thighs?
✨Text✨
I’m 30 and my 12 year old sister is living with me right now because mom and pops are vulnerable so it made more sense for me to care for my sis for the time being. 

She is a really great kid and tbh I feel in a lot of ways like she’s my own kid because my mom and dad don’t speak English so I kind of had to raise my sis in ways that they couldn’t. Hard to explain but I’m sure anyone with a secondary culture will get what I mean- my mom and dad are great parents but having an English speaking person to guide you through shit when you live in an English speaking country is invaluable imo and my sister trusts me with stuff she won’t necessarily trust my parents with. 

Anyway my girlfriend was FaceTiming me and my sister walked past in shorts and a t shirt cuz it’s hot. My ~~sister~~ gf waited til my sister had left the area ( but not the room) and made a face and said ‘maybe feed her 

### By Year

In [105]:
for reddit_post in reddit.subreddit('AmItheAsshole').top("year", limit=5):
    print(f"✨Title✨\n{reddit_post.title}\n✨Text✨\n{reddit_post.selftext}\n")

✨Title✨
META: This sub is moving towards a value system that frequently doesn't align with the rest of the world
✨Text✨
I’ve enjoyed reading and posting on this sub for many months now, and I feel like I’ve noticed a disconcerting trend, lately. Over time, more and more of the posts seem to have A- a universal consensus on every post, with any dissenters massively downvoted and B- a shift towards judgments that seem (to me at least) to be out of step with how people in the real world judge situations.

Given that, I think it’s important to remember that even though the sub is not intended to be for validation posts or to be an echo chamber or to give advice on how people should behave in specific situations- in practice, a lot of times it is.

So just as a reminder- offline, people in your real life will think you’re an asshole if you take the last cookie when you know the child behind you wants it.

They’ll think you’re an asshole if you don’t stand up for an elderly person on a bus. 

### By All Time

In [106]:
for reddit_post in reddit.subreddit('AmItheAsshole').top("all", limit=5):
    print(f"✨Title✨\n{reddit_post.title}\n✨Text✨\n{reddit_post.selftext}\n")

✨Title✨
META: This sub is moving towards a value system that frequently doesn't align with the rest of the world
✨Text✨
I’ve enjoyed reading and posting on this sub for many months now, and I feel like I’ve noticed a disconcerting trend, lately. Over time, more and more of the posts seem to have A- a universal consensus on every post, with any dissenters massively downvoted and B- a shift towards judgments that seem (to me at least) to be out of step with how people in the real world judge situations.

Given that, I think it’s important to remember that even though the sub is not intended to be for validation posts or to be an echo chamber or to give advice on how people should behave in specific situations- in practice, a lot of times it is.

So just as a reminder- offline, people in your real life will think you’re an asshole if you take the last cookie when you know the child behind you wants it.

They’ll think you’re an asshole if you don’t stand up for an elderly person on a bus. 

To get Reddit posts from a specific date range, see the tutorial on using Pushshift.io.
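
For a small listing that PRAW has already returned, one rough alternative is to filter posts locally on `created_utc`, the Unix timestamp attached to each post. This is a minimal sketch, not a replacement for a true date-range search: it can only narrow down posts that a listing such as `.top()` or `.hot()` has already returned. The helper name `in_date_range` and the stand-in post objects are hypothetical.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def in_date_range(post, start, end):
    """Return True if the post was created in the half-open range [start, end)."""
    created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
    return start <= created < end


# Stand-in objects that mimic the one attribute used here (created_utc).
# With PRAW, these would be the posts yielded by a listing.
posts = [
    SimpleNamespace(title="posted in April", created_utc=1586694776),  # 2020-04-12 UTC
    SimpleNamespace(title="posted in May", created_utc=1588515008),    # 2020-05-03 UTC
]

start = datetime(2020, 4, 1, tzinfo=timezone.utc)
end = datetime(2020, 5, 1, tzinfo=timezone.utc)

april_posts = [post for post in posts if in_date_range(post, start, end)]
print([post.title for post in april_posts])
```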

# Get Reddit Posts From a Subreddit and Make a DataFrame

To see all the attributes that you can retrieve from a single Reddit post, consult [PRAW's "Submission" documentation](https://praw.readthedocs.io/en/latest/code_overview/models/submission.html#praw.models.Submission).

In [107]:
reddit_posts = []
aita_subreddit = reddit.subreddit('AmItheAsshole')

for reddit_post in aita_subreddit.top("month", limit=10):
    reddit_posts.append([reddit_post.title, reddit_post.score, reddit_post.id, reddit_post.subreddit, reddit_post.url, reddit_post.num_comments, reddit_post.selftext, reddit_post.created_utc])

reddit_posts = pd.DataFrame(reddit_posts, columns=['title', 'upvote_score', 'post_id', 'subreddit', 'post_url', 'num_comments', 'post_body', 'full_date'])

# Format date: convert the Unix timestamp to a datetime, then to a YYYY-MM-DD string
reddit_posts['full_date'] = pd.to_datetime(reddit_posts['full_date'], utc=True, unit='s')
reddit_posts['date'] = reddit_posts['full_date'].dt.strftime("%Y-%m-%d")
reddit_posts

Unnamed: 0,title,upvote_score,post_id,subreddit,post_url,num_comments,post_body,full_date,date
0,AITA for telling my girlfriend to shut the fuck up after she insulted my sisters thighs?,41726,fzvxw7,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/fzvxw7/aita_for_telling_my_girlfriend_to_shut_the_fuck/,2273,I’m 30 and my 12 year old sister is living with me right now because mom and pops are vulnerable so it made more sense for me to care for my sis for the time being. \n\nShe is a really great kid and tbh I feel in a lot of ways like she’s my own kid because my mom and dad don’t speak English so I kind of had to raise my sis in ways that they couldn’t. Hard to explain but I’m sure anyone with a secondary culture will get what I mean- my mom and dad are great parents but having an English speak...,2020-04-12 12:32:56+00:00,2020-04-12
1,AITA For Not Baking Much For My Family,31688,gcr7vr,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/gcr7vr/aita_for_not_baking_much_for_my_family/,1444,"I [16M] started dating my girlfriend 2 years ago. I also got super super into baking around that time. I bake a lot. My girlfriend loves desserts. So I've given her a ton of stuff I bake, all kinds of different stuff. I often try to bake something new and then she gets to try something new. I honestly love baking way more than eating it. My girlfriend is the opposite.\n\n\nWell recently she gave me a scrapbook she made. She had counted every thing I baked her apparently, and she gave me this...",2020-05-03 14:10:08+00:00,2020-05-03
2,AITA for telling my autistic brother the truth when he asked me why women don’t like him?,29297,gbl3wk,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/gbl3wk/aita_for_telling_my_autistic_brother_the_truth/,2342,I’ve got a younger brother (24) with Aspergers and he’s very high functioning albeit with his quirks. \n\nRecently I’ve moved back home during the stay at home orders to look after my parents. My brother still lives with them. I find out he’s been trying to date in recent months and confessed it’s been pretty unsuccessful for him. He even got to go on a first date but his date literally got up and left after about a half hour. \n\nI know exactly the reason why and it’s not flattering. For on...,2020-05-01 16:18:13+00:00,2020-05-01
3,"AITA for disowning my brother when he came out as gay, because of how he's treating his wife?",28634,ggmc1h,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/ggmc1h/aita_for_disowning_my_brother_when_he_came_out_as/,1809,"I (21f) have a brother (28) who came out as gay last month. He has been married to my best friend's big sister (24) for four years, they have 2yo twin daughters together. I'm really close with her, so I've been trying to stay neutral in what has become a messy separation. \n\nMy brother told his wife he's gay by sitting her down, and saying he had been sleeping with two different men for about six months. He said he is now sure that he feels romantic feelings for men, and also told her he ha...",2020-05-09 19:52:25+00:00,2020-05-09
4,AITA? My parents took most of my wardrobe away as punishment and I said I didn't want the clothes back because it's obvious they're not actually mine.,27963,g5hta8,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/g5hta8/aita_my_parents_took_most_of_my_wardrobe_away_as/,2946,"I got in trouble at school this fall, I'm a junior in high school. \n\n(Edit to add what I got in trouble for since a couple people asked... I smoked weed with a guy in the woods after school once and got caught. I also made out with him a couple times and my parents found out about that too)\n\nAs punishment my parents took away a lot of my things; all my clothes except 3 pairs of plain jeans and 3 plain black shirts and my coat. And all my makeup and hair stuff, purses and shoes; saying I ...",2020-04-21 15:56:36+00:00,2020-04-21
5,AITA for telling my wife that we're BOTH pregnant?,27467,ge2hgf,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/ge2hgf/aita_for_telling_my_wife_that_were_both_pregnant/,1558,"I know this sounds bad, but hear me out. im using a throwaway cuz my wife uses reddit, so please don't upvote this. I just want honest feedback.\n\nMy wife (29f) and I (27m) do well financially, so we decided to have our fourth child. Every single pregnancy we've been through my wife has been a complete nightmare. Some things I can deal with, like waking up to the sound of her puking her guts out every morning, but when she starts demanding I go to the store every day to get her snacks or se...",2020-05-05 17:49:48+00:00,2020-05-05
6,AITA for changing my name? my parents named me Qur'stylle (Chrystal)?,27009,g1bn71,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/g1bn71/aita_for_changing_my_name_my_parents_named_me/,2616,"So my asshole scumbag parents named me Qur'stylle and my whole life i have gotten shit like ""are you muslim"" ""what language is your name originated from?"" ""what country are you from"" and people butchering its pronunciation, for obvious reasons. I have always told people to just spell it as Chrystal and my parents (mainly mom) would take huge offense to it and would email my teachers every year to make sure they pronounce my name correctly. \n\nMy mom even grounded me once because I told peop...",2020-04-14 19:04:53+00:00,2020-04-14
7,AITA for telling my mom to not have any more kids?,26865,g95klk,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/g95klk/aita_for_telling_my_mom_to_not_have_any_more_kids/,1927,"I(16F) live with my parents and 11 siblings. I'm the 4th kid, and the ages range from 20 to 1. We live in a 4 bedroom house, but it is so cramped with everyone in bunks and no privacy. My parents also put most of the responsibility of the younger kids on us while they lay down and watch TV. True, they can have their breaks but they take them so often that I don't really get to be a teenager. \n Last night, after my mom told me to but J+A(3) to bed, I told her to do it herself as I need ...",2020-04-27 17:47:50+00:00,2020-04-27
8,AITA for putting my dog's wee-wee pads on the bathroom floor b/c my BF has bad aim and keeps missing the toilet?,25167,gdhfdi,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/gdhfdi/aita_for_putting_my_dogs_weewee_pads_on_the/,4044,"My boyfriend (30/M) and I (28/F) have been together about a year and when our city implemented lockdown for the virus, he decided to quarantine with me at my apartment as I live alone and have a nicer apartment and he has several roommates.\n\nMostly it has been going well and thankfully we both still have jobs, except in the first few weeks I started noticing that the bathroom floor was suddenly always wet around the toilet. (The sink is across the room from the toilet so it's unlikely to b...",2020-05-04 18:44:19+00:00,2020-05-04
9,UPDATE: AITA for throwing away my husband's Xbox after he refused to look for our lost dog?,28905,ghaqsu,AmItheAsshole,https://www.reddit.com/r/AmItheAsshole/comments/ghaqsu/update_aita_for_throwing_away_my_husbands_xbox/,1888,"Original post, here: https://www.reddit.com/r/AmItheAsshole/comments/g64rsj/aita_for_throwing_away_my_husbands_xbox_after_he/?utm_medium=android_app&utm_source=share\n\n\nFirst of all, thank you everyone for your immensely kind and considerate responses. I am thankful to each and everyone of you to give me such beautiful and encouraging messages. These kept me going, no joke. \n\n\nTippy was found 2 miles away from our house, a day after I posted ads and posters on several platforms and webs...",2020-05-10 22:10:14+00:00,2020-05-10
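
The row-building step can also be written with one dictionary per post, which keeps each column name next to the attribute it comes from. The following is a minimal sketch under stated assumptions: `COLUMNS`, `post_to_row`, and the stand-in `example_post` are hypothetical names, and the stand-in only mimics the attributes used above (in PRAW, `subreddit` is an object, not a plain string).

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Column name -> post attribute name, matching the columns used above
COLUMNS = {
    "title": "title",
    "upvote_score": "score",
    "post_id": "id",
    "subreddit": "subreddit",
    "post_url": "url",
    "num_comments": "num_comments",
    "post_body": "selftext",
    "full_date": "created_utc",
}


def post_to_row(post):
    """Convert one post-like object into a dictionary keyed by column name."""
    row = {column: getattr(post, attribute) for column, attribute in COLUMNS.items()}
    # Convert the Unix timestamp into a timezone-aware datetime and a date string
    row["full_date"] = datetime.fromtimestamp(row["full_date"], tz=timezone.utc)
    row["date"] = row["full_date"].strftime("%Y-%m-%d")
    return row


# Stand-in for a post, using values from row 1 of the table above
example_post = SimpleNamespace(
    title="AITA For Not Baking Much For My Family",
    score=31688,
    id="gcr7vr",
    subreddit="AmItheAsshole",
    url="https://www.reddit.com/r/AmItheAsshole/comments/gcr7vr/aita_for_not_baking_much_for_my_family/",
    num_comments=1444,
    selftext="(post text)",
    created_utc=1588515008,
)

row = post_to_row(example_post)
print(row["post_id"], row["date"])
```

A list of such dictionaries can be passed to `pd.DataFrame`, which takes the column names from the dictionary keys.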


## Save to CSV File

In [108]:
reddit_posts.to_csv("top-reddit-aita-posts.csv", encoding="utf-8", index=False)

# Get Comments From a Post

To work with the comments on a post, see [PRAW's comment extraction tutorial](https://praw.readthedocs.io/en/latest/tutorials/comments.html).

In [109]:
submission = reddit.submission(id="gcr7vr")

In [110]:
# Replace every "load more comments" placeholder with the comments it stands for
submission.comments.replace_more(limit=None)

for comment in submission.comments:
    print(f"\nAuthor:\n{comment.author}\n\nComment:\n{comment.body}\n\n-------------------------------------")


Author:
AutoModerator

Comment:

Welcome to /r/AmITheAsshole. Please view our [voting guide here](https://www.reddit.com/r/AmItheAsshole/wiki/faq#wiki_what.2019s_with_these_acronyms.3F_what_do_they_mean.3F), and remember to use **only one** judgement in your comment.

Help keep the sub engaging!  

#Don’t downvote assholes! 


Do upvote interesting posts!

[Click Here For Our Rules](https://www.reddit.com/r/AmItheAsshole/about/rules/) and [Click Here For Our FAQ](https://www.reddit.com/r/AmItheAsshole/wiki/faq)


---


*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/AmItheAsshole) if you have any questions or concerns.*

-------------------------------------

Author:
vandajoy

Comment:
NTA - you and your girlfriend sound sweet and your family sounds bitter

-------------------------------------

Author:
pisspot718

Comment:
And look how much GF appreciated what you had been doing. She secretly made a s

In [111]:
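def get_comments(row):
    """Return the text of each top-level comment on the post in this DataFrame row."""
    submission = reddit.submission(id=row['post_id'])
    # Replace every "load more comments" placeholder with the comments it stands for
    submission.comments.replace_more(limit=None)
    comments = [comment.body for comment in submission.comments]
    return comments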
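
Iterating over `submission.comments` yields only top-level comments, so `get_comments` leaves out replies. PRAW's comment tutorial describes `submission.comments.list()` for a flattened list of all comments, along with a queue-based traversal of the comment tree. The sketch below shows the queue-based idea on stand-in objects so that it runs without API access; `make_comment` and `flatten_comments` are hypothetical names, and the stand-ins model only `body` and `replies`.

```python
from collections import deque
from types import SimpleNamespace


def make_comment(body, replies=()):
    """Stand-in for a PRAW comment: only `body` and `replies` are modeled."""
    return SimpleNamespace(body=body, replies=list(replies))


def flatten_comments(top_level_comments):
    """Return the bodies of all comments and replies in breadth-first order."""
    queue = deque(top_level_comments)
    bodies = []
    while queue:
        comment = queue.popleft()
        bodies.append(comment.body)
        # Queue the replies so they are visited after the current level
        queue.extend(comment.replies)
    return bodies


thread = [
    make_comment("first top-level comment", [
        make_comment("reply to first", [make_comment("nested reply")]),
    ]),
    make_comment("second top-level comment"),
]

all_bodies = flatten_comments(thread)
print(all_bodies)
```

With PRAW, the equivalent after `replace_more` would likely be `[comment.body for comment in submission.comments.list()]`.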

In [112]:
reddit_posts = []
datasets_subreddit = reddit.subreddit('Datasets')

for reddit_post in datasets_subreddit.top("month", limit=10):
    reddit_posts.append([reddit_post.title, reddit_post.score, reddit_post.id, reddit_post.subreddit, reddit_post.url, reddit_post.num_comments, reddit_post.selftext, reddit_post.created_utc])

reddit_posts = pd.DataFrame(reddit_posts, columns=['title', 'upvote_score', 'post_id', 'subreddit', 'post_url', 'num_comments', 'post_body', 'full_date'])

# Format date: convert the Unix timestamp to a datetime, then to a YYYY-MM-DD string
reddit_posts['full_date'] = pd.to_datetime(reddit_posts['full_date'], utc=True, unit='s')
reddit_posts['date'] = reddit_posts['full_date'].dt.strftime("%Y-%m-%d")
reddit_posts

Unnamed: 0,title,upvote_score,post_id,subreddit,post_url,num_comments,post_body,full_date,date
0,We've updated our database... malicious online activity related to Covid-19,141,g6d1cr,datasets,https://www.reddit.com/r/datasets/comments/g6d1cr/weve_updated_our_database_malicious_online/,16,"Shared this data last week and got some really great feedback. We've now got a partnership with a new WHOIS provider allowing us to paint an incredibly detailed picture of malicious online activity throughout the pandemic. \n\n\nI'm certain more can be done with the data we've pulled together. Please download it, play with it, let me know if you have any thoughts.\n\n [https://github.com/ProPrivacy/covid-19](https://github.com/ProPrivacy/covid-19) \n\n [https://proprivacy.com/tools/scam-w...",2020-04-23 00:39:12+00:00,2020-04-23
1,Introducing the Spotify Podcast Dataset,114,g2qqrq,datasets,https://labs.spotify.com/2020/04/16/introducing-the-spotify-podcast-dataset-and-trec-challenge-2020/amp/,2,,2020-04-16 23:52:14+00:00,2020-04-16
2,I've scraped around 800 million characters worth of comments from the top 50 subreddits,110,gaukz5,datasets,https://www.reddit.com/r/datasets/comments/gaukz5/ive_scraped_around_800_million_characters_worth/,41,"Hi,\n\n&#x200B;\n\nI've been working on a machine learning side project amidst the quarantine, and for that, I have scraped around the 1000 top posts from the top 50 most subscribed subreddits, and saved 100 comments of each into a data set.\n\nI ended up going with different data for my project, but decided that I might as well share it.\n\n[You can find the dataset here](https://github.com/CrakenHUN/RedditCommentsDataset)\n\n&#x200B;\n\n[And in case you want to toy around with the scraper ...",2020-04-30 12:18:31+00:00,2020-04-30
3,U.S. Supreme Court Case Dataset,109,g6ulbg,datasets,https://www.reddit.com/r/datasets/comments/g6ulbg/us_supreme_court_case_dataset/,15,"Hi Everyone,\n\nI created a dataset of cleaned Supreme Court transcripts (speaker name, speaker duration, court details, etc.) and information on Supreme Court justices (place of birth, age, race, parent's occupation, religion, etc.). The data was supposed to be used for a research project, but it ended up falling through. I wanted to share it here in case anyone could find any good uses for it. Here's a link to the [GitHub repo](https://github.com/EricWiener/supreme-court-cases). Please let...",2020-04-23 20:47:13+00:00,2020-04-23
4,Free graphical CSV file editor for Windows 10,99,gd9evj,datasets,https://www.reddit.com/r/datasets/comments/gd9evj/free_graphical_csv_file_editor_for_windows_10/,13,"I wrote a graphical CSV file editor for my own needs and then made it user friendly, robust and fast enough so I could sell it on Microsoft Store. Unfortunately my marketing skills are not up to my coding and engineering skills, so not very many people are buying it... so I thought I could just as well give it away here on Reddit for free now. There's no catch, no ads or other annoyances - I really just want it to be put to use wherever it makes sense.\n\nIt's different from other CSV edito...",2020-05-04 10:51:17+00:00,2020-05-04
5,500+ Datasets for Machine Learning,95,g2ayde,datasets,https://lionbridge.ai/datasets/ultimate-dataset-aggregator-for-machine-learning/,2,,2020-04-16 08:21:36+00:00,2020-04-16
6,TED Talks – Ultimate Dataset,91,gfioy3,datasets,https://www.reddit.com/r/datasets/comments/gfioy3/ted_talks_ultimate_dataset/,5,[Kaggle link](https://www.kaggle.com/miguelcorraljr/ted-ultimate-dataset)\n\nI created a Python scraper – [(TEDscraper)](https://github.com/corralm/TEDscraper) – to scrape TED talk data including transcripts in over 100 languages from TED.com. \n\nI published the datasets for 12 languages on Kaggle [TED - Ultimate Dataset](https://www.kaggle.com/miguelcorraljr/ted-ultimate-dataset).\n\nHopefully this is useful for someone looking to do some NLP or other analysis.,2020-05-08 00:23:38+00:00,2020-05-08
7,400 NLP Datasets,90,gedfjp,datasets,https://datasets.quantumstat.com/,3,,2020-05-06 04:40:43+00:00,2020-05-06
8,The British Museum Just Made 1.9M Stunningly Detailed Images Free Online,91,gakear,datasets,https://www.vice.com/en_us/article/akw57z/the-british-museum-just-made-19m-stunningly-detailed-images-free-online,2,,2020-04-29 23:31:45+00:00,2020-04-29
9,"County Data Scraped from 3,142 Wikipedia Pages into 214 Columns",84,g8e9cq,datasets,https://github.com/dbabbitt/notebooks/blob/master/covid19/saves/csv/counties_df.csv,7,,2020-04-26 13:22:13+00:00,2020-04-26


In [113]:
reddit_posts['comments'] = reddit_posts.apply(get_comments, axis='columns')

In [115]:
reddit_posts[['title','post_body', 'num_comments', 'comments']]

Unnamed: 0,title,post_body,num_comments,comments
0,We've updated our database... malicious online activity related to Covid-19,"Shared this data last week and got some really great feedback. We've now got a partnership with a new WHOIS provider allowing us to paint an incredibly detailed picture of malicious online activity throughout the pandemic. \n\n\nI'm certain more can be done with the data we've pulled together. Please download it, play with it, let me know if you have any thoughts.\n\n [https://github.com/ProPrivacy/covid-19](https://github.com/ProPrivacy/covid-19) \n\n [https://proprivacy.com/tools/scam-w...",16,"[Very nice! Thank you for sharing., Super cool! But having trouble understanding what the threshold is for counting a domain as 'malicious'. In ProPrivacy\_VirusTotal.csv there's not one domain which has more reports of it being malicious than reports of it being harmless. Does it just take one malicious report to for the domain to be counted as malicious?, That's a lot of data crunching man. I thought VirusTotal's public API had a 1k a day limit?, This is amazing, I’m studying national secu..."
1,Introducing the Spotify Podcast Dataset,,2,"[Neat! I haven't seen a timeseries of text dataset of this magnitude before. (+- News data), Interesting! What a fascinating body of information to explore.]"
2,I've scraped around 800 million characters worth of comments from the top 50 subreddits,"Hi,\n\n&#x200B;\n\nI've been working on a machine learning side project amidst the quarantine, and for that, I have scraped around the 1000 top posts from the top 50 most subscribed subreddits, and saved 100 comments of each into a data set.\n\nI ended up going with different data for my project, but decided that I might as well share it.\n\n[You can find the dataset here](https://github.com/CrakenHUN/RedditCommentsDataset)\n\n&#x200B;\n\n[And in case you want to toy around with the scraper ...",41,"[This is the reason why I delete all my comments after I am done with reddit 😆, That's good. There are a lot of ""\[removed\]"" comments, though.\n\n[https://i.imgur.com/qd5HQD3.png](https://i.imgur.com/qd5HQD3.png), thanks for you publicly sharing it, I just forked it haha.\n\nMe coming from a very different domain, aspiring to have something to do productively with gathering and analyzing data, always am stuck with what people mean when they say “machine learning.” I’m still studying algebra..."
3,U.S. Supreme Court Case Dataset,"Hi Everyone,\n\nI created a dataset of cleaned Supreme Court transcripts (speaker name, speaker duration, court details, etc.) and information on Supreme Court justices (place of birth, age, race, parent's occupation, religion, etc.). The data was supposed to be used for a research project, but it ended up falling through. I wanted to share it here in case anyone could find any good uses for it. Here's a link to the [GitHub repo](https://github.com/EricWiener/supreme-court-cases). Please let...",15,"[Great dataset. I'm going to import it into Dolt ([https://www.dolthub.com](https://www.dolthub.com)) so people can query across it using SQL., Awesome! I started the same thing (justice demographics) a year ago and ran out of steam halfway through because I'm an idiot.\n\nSorry your project fell through, but it's great that you're sharing this work., Thanks so much for this. I have a question. We are interested in looking at individual federal judge decisions in criminal cases and controlli..."
4,Free graphical CSV file editor for Windows 10,"I wrote a graphical CSV file editor for my own needs and then made it user friendly, robust and fast enough so I could sell it on Microsoft Store. Unfortunately my marketing skills are not up to my coding and engineering skills, so not very many people are buying it... so I thought I could just as well give it away here on Reddit for free now. There's no catch, no ads or other annoyances - I really just want it to be put to use wherever it makes sense.\n\nIt's different from other CSV edito...",13,"[You definitely deserve to be paid for making that tool, I'm sure it'll be useful to so many people in this sub and other Business intelligence type subs. Awesome work!, This is awesome. Will definitely be downloading, That is a really impressive program you’ve built! I hope this post gets it some exposure and some downloads.\n\nCheers, This is really cool! and very neat User Interface. It looks a bit like a cheaper version of tableau. Does it work only with timed data or any kind of data ca..."
5,500+ Datasets for Machine Learning,,2,[Thanks for this suggestion. Led me to the dataset I am using for a school project.]
6,TED Talks – Ultimate Dataset,[Kaggle link](https://www.kaggle.com/miguelcorraljr/ted-ultimate-dataset)\n\nI created a Python scraper – [(TEDscraper)](https://github.com/corralm/TEDscraper) – to scrape TED talk data including transcripts in over 100 languages from TED.com. \n\nI published the datasets for 12 languages on Kaggle [TED - Ultimate Dataset](https://www.kaggle.com/miguelcorraljr/ted-ultimate-dataset).\n\nHopefully this is useful for someone looking to do some NLP or other analysis.,5,"[Worth spreading ;), Good job!, Going on my list of datasets to tackle, when I start my NLP journey. Interesting to see what the Big Ideas across regions and language., I wonder if there is a way to measure impact or spread of a TED talk. For example: related key words being mentioned \_after\_ versus \_before\_ the talk.]"
7,400 NLP Datasets,,3,"[This looks like a goldmine. I only wish most of them had more metadata associated with each text., Is there a way to download all of these bad boys at once?]"
8,The British Museum Just Made 1.9M Stunningly Detailed Images Free Online,,2,"[[Link](https://www.britishmuseum.org/collection) to said collection rather than Vice.., does anyone know if these images can be used in a blog that has ads on it?]"
9,"County Data Scraped from 3,142 Wikipedia Pages into 214 Columns",,7,[I started with the columns and rows in [https://www2.census.gov/programs-surveys/popest/tables/2010-2019/counties/totals/co-est2019-annres.xlsx](https://www2.census.gov/programs-surveys/popest/tables/2010-2019/counties/totals/co-est2019-annres.xlsx) and split the county/parish/borough/city names from their state names. Then I used the wikipedia python library (an API wrapper) to find the best URL for each row. Then I used the API wrapper and BeautifulSoup to scrape the infobox to get the co...
