# Collecting Data from the Subreddits

In this notebook, we aim to collect about 1,000 posts per subreddit using PRAW (the Python Reddit API Wrapper).

In [None]:
# Imports
import praw

In [9]:

# Defining my API credentials
client_id = "my_client_id"
client_secret = "my_password"
user_agent = "data collection by u/my_user"

# Initializing PRAW
reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     user_agent=user_agent)
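
Hard-coded credentials are easy to leak when a notebook is shared. One alternative is to read them from environment variables. The sketch below is a minimal illustration, not part of PRAW: the helper `load_reddit_credentials` and the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` are assumptions chosen for this example.

```python
import os


def load_reddit_credentials(env=None):
    """Read Reddit API credentials from environment variables.

    The variable names below are illustrative assumptions; use whatever
    names your own environment defines.
    """
    env = os.environ if env is None else env
    names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a clear message if any variable is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

The returned dictionary uses the same keyword names as the `praw.Reddit(...)` call above, so it could be unpacked as `praw.Reddit(**load_reddit_credentials())`.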

In [19]:
# Defining a function to collect posts from r/askscience and r/explainlikeimfive
def collect_posts(subreddit_name, limit=1000):
    subreddit = reddit.subreddit(subreddit_name)
    posts = []
    
    # Collecting posts from the subreddit
    for post in subreddit.hot(limit=limit):
        post_data = {
            'title': post.title,
            'body': post.selftext,
            'subreddit': subreddit_name,
            'post_id': post.id,
        }
        posts.append(post_data)
    
    return posts

askscience_posts = collect_posts('askscience', limit=1000)
explainlikeim5_posts = collect_posts('explainlikeimfive', limit=1000)

# Combining the posts into a single list
all_posts = askscience_posts + explainlikeim5_posts
print(f"Collected {len(all_posts)} posts from both subreddits.")

Collected 1335 posts from both subreddits.


In [29]:
import pandas as pd

# Converting the list of posts to a DataFrame
all_posts_df = pd.DataFrame(all_posts)

In [31]:
all_posts_df.head()

| | title | body | subreddit | post_id |
|---|---|---|---|---|
| 0 | AskScience Panel of Scientists XXVII | **Please read this entire post carefully and f... | askscience | 1i4r3rj |
| 1 | Meta: What's going on with funding for science... | Funding and support for science in the United ... | askscience | 1it5o2t |
| 2 | Why are saturns rings seen as “flat” and not d... | B | askscience | 1jbh4au |
| 3 | Does the microbiome of the human skin (eyelash... | There are a lot of things that live on the hum... | askscience | 1janev9 |
| 4 | Flu shots are a product of eggs. Is the curre... | Obviously the egg shortage is currently a prob... | askscience | 1jam6a8 |


In [33]:
# Redefining collect_posts to pull from the 'top' listing instead of 'hot'
def collect_posts(subreddit_name, limit=1000):
    subreddit = reddit.subreddit(subreddit_name)
    posts = []
    
    # Collecting posts from the subreddit (using 'top' instead of 'hot')
    for post in subreddit.top(limit=limit):
        post_data = {
            'title': post.title,
            'body': post.selftext,
            'subreddit': subreddit_name,
            'post_id': post.id,
        }
        posts.append(post_data)
    
    return posts

explainlikeim5_posts = collect_posts('explainlikeimfive', limit=1000)
askscience_posts = collect_posts('askscience', limit=1000)


# Combining the posts into a single list and printing the total
all_posts = askscience_posts + explainlikeim5_posts
print(f"Collected {len(all_posts)} posts from both subreddits.")

Collected 1966 posts from both subreddits.
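
The two collection cells above differ only in the listing they read from (`hot` versus `top`), so the listing name could be passed as a parameter instead of redefining the function. The sketch below shows one way to do that. The helper name `collect_posts_from` and the stand-in objects are hypothetical; the stand-ins exist only so the example runs without network access or credentials. Reddit listings are generally capped at roughly 1,000 items and a listing can return fewer, which would be consistent with the totals printed above falling short of 2,000.

```python
from types import SimpleNamespace


def collect_posts_from(subreddit, subreddit_name, listing="hot", limit=1000):
    """Collect posts from one listing ('hot', 'top', ...) of a subreddit object."""
    fetch = getattr(subreddit, listing)  # e.g. subreddit.hot or subreddit.top
    return [
        {
            "title": post.title,
            "body": post.selftext,
            "subreddit": subreddit_name,
            "post_id": post.id,
        }
        for post in fetch(limit=limit)
    ]


# Stand-in objects (assumptions for illustration) that mimic the attributes
# used above: .title, .selftext, .id on posts and .hot/.top on the subreddit.
fake_posts = [SimpleNamespace(title="Why is the sky blue?", selftext="", id="abc123")]
fake_subreddit = SimpleNamespace(
    hot=lambda limit: iter(fake_posts[:limit]),
    top=lambda limit: iter(fake_posts[:limit]),
)

rows = collect_posts_from(fake_subreddit, "askscience", listing="top", limit=10)
print(rows)
```

With the `reddit` object from this notebook, the equivalent call would be `collect_posts_from(reddit.subreddit('askscience'), 'askscience', listing='top')`.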


In [35]:
# Converting the list of posts to a DataFrame
new_posts_df = pd.DataFrame(all_posts)

In [39]:
# Combining the original dataframe with the new posts and removing the duplicates
all_posts_df = pd.concat([all_posts_df, new_posts_df], ignore_index=True).drop_duplicates(subset=['post_id'])

# Checking the final dataframe...
print(f"Total number of unique posts: {len(all_posts_df)}")
print(all_posts_df.head())

Total number of unique posts: 3292
                                               title  \
0               AskScience Panel of Scientists XXVII   
1  Meta: What's going on with funding for science...   
2  Why are saturns rings seen as “flat” and not d...   
3  Does the microbiome of the human skin (eyelash...   
4  Flu shots are a product of eggs.  Is the curre...   

                                                body   subreddit  post_id  
0  **Please read this entire post carefully and f...  askscience  1i4r3rj  
1  Funding and support for science in the United ...  askscience  1it5o2t  
2                                                  B  askscience  1jbh4au  
3  There are a lot of things that live on the hum...  askscience  1janev9  
4  Obviously the egg shortage is currently a prob...  askscience  1jam6a8  
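
Because a post can appear in both the `hot` and `top` listings, the concatenated frame contains repeated `post_id` values. The toy example below uses made-up rows, not the collected data, to illustrate how `pd.concat` followed by `drop_duplicates(subset=['post_id'])` behaves, and why the index has gaps afterwards.

```python
import pandas as pd

# Two small frames sharing one post ("id2"), standing in for the hot and top pulls.
hot_df = pd.DataFrame(
    {"title": ["A", "B"], "subreddit": ["askscience"] * 2, "post_id": ["id1", "id2"]}
)
top_df = pd.DataFrame(
    {"title": ["B", "C"], "subreddit": ["askscience"] * 2, "post_id": ["id2", "id3"]}
)

combined = pd.concat([hot_df, top_df], ignore_index=True)
# keep='first' is the default, so the earlier (hot) copy of "id2" survives.
deduped = combined.drop_duplicates(subset=["post_id"])

print(len(combined), len(deduped))  # 4 3
print(deduped.index.tolist())       # [0, 1, 3]
```

The dropped row leaves a gap in the index (label 2 is missing), which is why calling `reset_index(drop=True)` after removing rows is a common follow-up.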


In [47]:
all_posts_df['subreddit'].value_counts()

subreddit
askscience           1378
explainlikeimfive    1314
Name: count, dtype: int64

In [59]:
all_posts_df.head()

Unnamed: 0,title,body,subreddit,post_id
0,Why do planets keep rotating and revolving around the sun?,I mean why are they not being pulled in by Sun's gravity or where are they getting the energy to keep rotating? And what will happen if they suddenly stop revolving?,askscience,1fe2aa1
1,How do lasers intially start?,"For an atom/molecule in an excited state to release a photon, a photon of a similar wavelength must pass nearby. My question is where does the original photon come from to start the chain reaction?",askscience,1fd7lpa
2,"When the Andromeda Galaxy ""collides"" with the Milky Way, I understand it's highly unlikely that any celestial bodies will actually collide, but therefore I don't understand why a ""new"" combined galaxy will be formed. Why won't Andromeda just keep moving through us and carry on its way?",,askscience,1fct4nn
3,"AskScience AMA Series: We are students and faculty of the Molecular Engineering & Sciences Institute at the University of Washington. The field of Molecular Engineering is growing quickly. As one of only two US schools offering this program, we wish to spread awareness about our exciting field! AUA!","We are graduate students and faculty from the University of Washington Molecular Engineering and Science (MolES) PhD program. Molecular Engineering is a new field; we were one of the first Molecular Engineering graduate programs to appear in the world, and one of only two in the United States. Though our program only began in 2014, we have had many discoveries to share!\n\nMolecular engineering itself is a broad and evolving field that seeks to understand how molecular properties and interactions can be manipulated to design and assemble better materials, systems, and processes for specific functions. Any time you attempt to change the behavior of something by precisely altering it on a molecular level - given knowledge of how the molecules in that ""something"" interact with one another - you're engaging in a type of molecular engineering. The applications are limited only by your imagination! \n\nMolecular engineering is recognized by the National Academy of Engineering as one of the areas of education and research most critical to ensuring the future economic, environmental and medical health of the U.S. We would like to spread awareness about its applications, as well as the exciting opportunities that come with it. \n\nAs a highly interdisciplinary field spanning across the science and engineering space, students of Molecular Engineering have produced numerous impactful scientific discoveries. 
We specifically believe that Molecular Engineering could be an exciting avenue for up-and-coming young scientists, and thus we would like to broaden the general awareness of our discipline!\n\nHere to answer your questions are:\n\n+ **Suzie Pun** - ( /u/MolESAMA-SuziePun ) - Professor of Bioengineering, Director of MolES Institute \n + Research area: drug delivery, biomaterials, aptamers\n+ **Cole DeForest** - ( /u/profcole ) - Associate Professor of Bioengineering, MolES Director of Education\n + Research area: biomaterials, tissue engineering, drug delivery, protein engineering\n+ **Andre Berndt** - ( /u/Mystic_Scientist ) - Assistant Professor of Bioengineering\n + Research area: protein engineering, optogenetics, neuroscience\n+ **Jeff Nivala** - ( /u/technomolecularprof ) - Assistant Professor of Computer Science and Engineering\n + Research area: nanopore, synbio, molecular data storage and computing\n+ **David Bergsman** - ( /u/ProfBergsman ) - Assistant Professor of Chemical Engineering\n + Research area: thin films, atomic layer deposition, nanomaterials, membrane separations, catalysis, interfacial engineering\n+ **Doug Ballard** ( /u/UW-MolES ) - MolES Graduate Program Advisor\n+ **Justin Daho Lee** ( /u/MolES-Justin ) - Sixth Year PhD Student\n + Research area: protein engineering, optogenetics, neuroscience, stem cells\n+ **Evan Pepper** ( /u/evanpepper ) - Fifth Year PhD Student\n + Research area: microbiology, tuberculosis, antibiotic resistance\n+ **Ben Nguyen** ( /u/nguyencd296 ) - Fifth Year PhD Student\n + Research area: polymer chemistry, drug delivery\n+ **Gaby Balistreri** ( /u/GB_2022 ) - Fourth Year PhD Student\n + Research area: drug delivery, nanomedicine, nanoparticles, green engineering\n+ **Ariel Lin** ( /u/MolEgradstudent ) - Third Year PhD Student\n + Research area: open microfluidics, tissue engineering, bioanalytical chemistry, cell co-culture",askscience,1fcmmw7
4,"Does a vacuum have entropy? If so, is it high or low?","I'm not sure if this is the right way to ask the question. I've learnt that a vacuum isn't really empty or still, but quantum waves constantly fluctuate and virtual particles pop in and out of existence. With so much hidden activity going on that seems like high entropy. But there aren't any real particles, which seems like it would be the lowest possible entropy. So now I'm wondering if it makes sense to apply entropy to a vacuum at all?",askscience,1fcpcuy


In [57]:
# Reset the index after removing posts
all_posts_df.reset_index(drop=True, inplace=True)

# Check the dataframe
print(all_posts_df.head())

                                                        title  \
0  Why do planets keep rotating and revolving around the sun?   
1                               How do lasers intially start?   
2  When the Andromeda Galaxy "collides" with the Milky Way, I und

In [68]:
# Saving the combined DataFrame
#all_posts_df.to_csv('data/final_reddit_posts.csv', index=False)
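
The save line above is commented out in this notebook. As a hedged illustration of what a save and reload round trip looks like, the sketch below writes a tiny sample frame to a temporary file; the file name `sample_posts.csv` and the first body string are made up for the example. It also shows one thing to watch for: posts with an empty body are read back as missing values by default.

```python
import os
import tempfile

import pandas as pd

sample_df = pd.DataFrame(
    {
        "title": ["Does a vacuum have entropy?", "How do lasers intially start?"],
        "body": ["First line\nsecond line, with a comma", ""],
        "subreddit": ["askscience", "askscience"],
        "post_id": ["1fcpcuy", "1fd7lpa"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_posts.csv")  # hypothetical file name
    # index=False keeps the row index out of the file, so no extra
    # unnamed column appears when the file is read back.
    sample_df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())
# Empty strings come back as NaN unless handled explicitly.
print(reloaded["body"].isna().tolist())
reloaded["body"] = reloaded["body"].fillna("")
```

Filling missing bodies with an empty string after loading keeps later text processing from having to handle NaN values.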

Now that we have our data, the next notebook will visualize some preliminary insights.