### Scraping Reddit's Posts and Articles



In this project, we focus on extracting posts and articles from Reddit using web scraping techniques. Reddit is a vast platform with a wealth of user-generated content, making it an excellent resource for various applications, including sentiment analysis, trend monitoring, and topic modeling.

#### Objectives
- **Data Collection**: Retrieve posts and articles from health-related subreddits (e.g., r/Health).
- **Data Processing**: Clean and preprocess the extracted data to prepare it for analysis.
- **Data Storage**: Store the scraped data in a structured format, such as CSV or a database, for further analysis.

#### Tools and Technologies
- **Python**: The primary programming language for web scraping.
- **Beautiful Soup**: A library for parsing HTML and extracting data.
- **Requests**: A library for making HTTP requests to access web pages.
- **Selenium**: A browser automation library, used here to load and scroll Reddit's dynamically rendered pages.
- **PRAW**: The Python Reddit API Wrapper, used here to collect posts through Reddit's API.
- **Pandas**: A data manipulation library to handle and analyze the scraped data.

#### Getting Started
1. **Set Up the Environment**: Install the necessary libraries using pip.
2. **Define Scraping Logic**: Write functions to scrape data from specific subreddits.
3. **Run the Scraper**: Execute the scraping script and monitor the data collection process.
4. **Analyze the Data**: Use Pandas to analyze the collected posts and articles for insights.

#### Conclusion
This project serves as a practical introduction to web scraping and data analysis using Python, providing valuable experience in handling real-world data from a popular online community.


<p style="color:#FE4406;text-align:center;font-size:30px"> Scraping Reddit's Posts And Articles </p>

In [2]:
!pip install bs4
!pip install selenium






In [57]:
# importing packages
import requests
from bs4 import BeautifulSoup

### Scraping Reddit

In [58]:
## importing libraries
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup

### Scraping New Reddit

In [3]:
indexOfUrl = 0
consecutiveFails = 0  # Number of consecutive attempts that found no new posts
postsLinks = []  # To track found links

In [4]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from bs4 import BeautifulSoup
from datetime import datetime

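# Read your Reddit credentials from environment variables; never hardcode them in a notebook.
# The variable names below are a convention chosen here, set them in your shell first.
import os
REDDIT_USERNAME = os.environ['REDDIT_USERNAME']
REDDIT_PASSWORD = os.environ['REDDIT_PASSWORD']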

# URLs to scrape
urls = [
    'https://www.reddit.com/r/Health/top/?t=year',
    'https://www.reddit.com/r/Health/top/?t=hour',
    'https://www.reddit.com/r/Health/top/?t=week',
    'https://www.reddit.com/r/Health/top/?t=month',
    'https://www.reddit.com/r/Health/top/?t=all',
    'https://www.reddit.com/r/Health/hot/',
    'https://www.reddit.com/r/Health/new/',
    'https://www.reddit.com/r/Health/top/',
    'https://www.reddit.com/r/Health/rising/'
]

posts = []  # List to store post data
consecutiveFails = 0  # Initialize the fail counter


def login(driver):
    """Log in to Reddit using the provided credentials."""
    driver.get('https://www.reddit.com/login/')
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, 'username')))

    # Locate and input username and password
    driver.find_element(By.NAME, 'username').send_keys(REDDIT_USERNAME)
    driver.find_element(By.NAME, 'password').send_keys(REDDIT_PASSWORD)

    # Submit login form
    driver.find_element(By.NAME, 'password').send_keys(Keys.RETURN)
    time.sleep(5)  # Wait for login to complete


def scrapReddit(index):
    """Scrape Reddit posts from the specified URL index."""
    global consecutiveFails
    driver = webdriver.Edge()

    try:
        # Log in to Reddit
        login(driver)

        # Load the page
        driver.get(urls[index])
        time.sleep(5)

        while True:
            page_source = driver.page_source

            # Scroll down to load more content
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(5)

            # Parse page with BeautifulSoup
            soup = BeautifulSoup(page_source, 'html.parser')

            # Find all posts: each one is rendered as a <shreddit-post> custom element (a tag, not a class)
            post_elements = soup.find_all('shreddit-post')
            new_content_found = False

            for post_element in post_elements:
                # Extract subreddit ID
                subredditId = post_element.get('subreddit-id')

                # Extract post ID
                postId = post_element.get('id')

                # Extract post title and link
                postTitle = post_element.get('post-title')
                postLink = post_element.get('content-href')  # Get the post link from content-href

                # Extract number of comments and comments link
                commentsNumber = post_element.get('comment-count', '0')
                # Default to '' so a missing permalink attribute does not raise a TypeError
                commentsLink = "https://www.reddit.com" + post_element.get('permalink', '')


                # Extract creation time (if available)
                createdAt = post_element.get('created-timestamp')

                # Record the time when data was collected
                collectedAt = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

                # Ensure uniqueness by checking if the post already exists
                if postLink not in [post['postLink'] for post in posts]:
                    new_content_found = True

                    # Add post to the list
                    post_data = {
                        "postId": postId,
                        "postTitle": postTitle,
                        "postLink": postLink,  # Keep this unchanged
                        "commentsNumber": commentsNumber,
                        "commentsLink": commentsLink,  # Keep this unchanged
                        "createdAt": createdAt,
                        "collectedAt": collectedAt
                    }
                    posts.append(post_data)
                    print(post_data)

            if not new_content_found:
                print("No new content found, waiting...")
                time.sleep(2)
                consecutiveFails += 1

                if consecutiveFails >= 10:
                    consecutiveFails = 0
                    driver.quit()  # Close WebDriver before moving to the next URL

                    if index + 1 < len(urls):
                        scrapReddit(index + 1)  # Scrape the next URL
                    else:
                        print("All URLs processed.")
                        break
                    return
            else:
                consecutiveFails = 0  # Reset fail count if new content is found
                time.sleep(1)

    finally:
        # Ensure WebDriver is closed
        driver.quit()


# Start scraping from the first URL (index 0)
scrapReddit(0)


{'postId': 't3_1d2129h', 'postTitle': 'Chinese scientists develop cure for diabetes, insulin patient becomes medicine-free in just 3 months', 'postLink': 'https://economictimes.indiatimes.com/industry/healthcare/biotech/healthcare/chinese-scientists-develop-cure-for-diabetes-insulin-patient-becomes-medicine-free-in-just-3-months/articleshow/110466659.cms?from=mdr', 'commentsNumber': '442', 'commentsLink': 'https://www.reddit.com/r/Health/comments/1d2129h/chinese_scientists_develop_cure_for_diabetes/', 'createdAt': '2024-05-27T20:13:21.856000+0000', 'collectedAt': '2024-10-16 11:21:25'}
{'postId': 't3_18br0gv', 'postTitle': 'Man dies on way home from Panera after having three “charged” lemonades', 'postLink': 'https://arstechnica.com/health/2023/12/man-dies-on-way-home-from-panera-after-having-three-charged-lemonades/', 'commentsNumber': '757', 'commentsLink': 'https://www.reddit.com/r/Health/comments/18br0gv/man_dies_on_way_home_from_panera_after_having/', 'createdAt': '2023-12-06T00:2
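
The scraper above checks uniqueness by rebuilding a list of every stored `postLink` for each candidate post, which gets slower as `posts` grows. A minimal sketch of the same idea with a set of seen links follows. The helper name `add_unique_posts` and the sample rows are hypothetical, and unlike the original check this version also skips posts that have no link at all.

```python
def add_unique_posts(collected, seen_links, candidates):
    """Append candidates whose postLink is new; return how many were added."""
    added = 0
    for post in candidates:
        link = post.get("postLink")
        # Skip posts without a link and posts already collected
        if not link or link in seen_links:
            continue
        seen_links.add(link)
        collected.append(post)
        added += 1
    return added


# Hypothetical sample batch: two posts share a link, one has no link
collected_posts = []
seen_links = set()
batch = [
    {"postId": "t3_a", "postLink": "https://example.com/article-1"},
    {"postId": "t3_b", "postLink": "https://example.com/article-1"},
    {"postId": "t3_c", "postLink": "https://example.com/article-2"},
    {"postId": "t3_d", "postLink": None},
]

first_pass = add_unique_posts(collected_posts, seen_links, batch)
second_pass = add_unique_posts(collected_posts, seen_links, batch)
print(first_pass, second_pass)
```

The return value can play the role of `new_content_found`: a pass that adds zero posts counts as a failed attempt.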

In [7]:
import pandas as pd 
redditPost=pd.DataFrame(posts)
oldRedditPosts=pd.read_csv("../data/redditPosts.csv")


In [22]:
collectedWebsites=pd.concat([redditPost,oldRedditPosts],ignore_index=True)

In [24]:
collectedWebsites=collectedWebsites.drop(["id"],axis=1)

In [25]:
collectedWebsites.head()

Unnamed: 0,postId,postTitle,postLink,commentsNumber,commentsLink,createdAt,collectedAt,source
0,t3_1d2129h,"Chinese scientists develop cure for diabetes, ...",https://economictimes.indiatimes.com/industry/...,442,https://www.reddit.com/r/Health/comments/1d212...,2024-05-27T20:13:21.856000+0000,2024-10-16 11:21:25,new Reddit
1,t3_18br0gv,Man dies on way home from Panera after having ...,https://arstechnica.com/health/2023/12/man-die...,757,https://www.reddit.com/r/Health/comments/18br0...,2023-12-06T00:23:50.579000+0000,2024-10-16 11:21:25,new Reddit
2,t3_17kdvry,"1 in 4 US medical students consider quitting, ...",https://thehill.com/policy/healthcare/4283643-...,630,https://www.reddit.com/r/Health/comments/17kdv...,2023-10-31T05:31:31.851000+0000,2024-10-16 11:21:25,new Reddit
3,t3_18uz6x3,Man who didn't sleep for a record 264 hours su...,https://www.unilad.com/community/life/man-didn...,364,https://www.reddit.com/r/Health/comments/18uz6...,2023-12-31T04:22:42.395000+0000,2024-10-16 11:21:25,new Reddit
4,t3_1avfyex,Measles erupts in Florida school where 11% of ...,https://arstechnica.com/science/2024/02/measle...,267,https://www.reddit.com/r/Health/comments/1avfy...,2024-02-20T11:58:27.393000+0000,2024-10-16 11:21:25,new Reddit


In [26]:
## save collected data to .csv file
collectedWebsites.to_csv("../data/redditPostsAll.csv")  # to_csv returns None, so do not reassign the DataFrame

## Scraping Old Reddit

In [2]:
# List to hold unique posts (to avoid duplicates)
found_posts = []

In [17]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
from datetime import datetime

# Set up the Selenium WebDriver
driver = webdriver.Chrome()  # Ensure you have the correct WebDriver

# URL to scrape
url = 'https://old.reddit.com/r/Health/top/?sort=top&t=all'
driver.get(url)

# Allow the page to load
time.sleep(5)

def scrape_current_page():
    # Get the page source
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')

    # Find all posts in the page (use the common class or structure to target posts)
    post_elements = soup.find_all('div', class_='thing')

    new_content_found = False

    for post_element in post_elements:
        post_data = {}
        try:
            # Extract relevant attributes
            commentsCount = post_element.get('data-comments-count', '0')  # Number of comments
            postLink = post_element.get('data-url')  # Post URL
            commentsLink = "https://www.reddit.com" + post_element.get('data-permalink', '')  # Comments link
            postId = post_element.get('data-fullname', '')  # Post ID
            createdAt = post_element.get('data-timestamp')  # Created timestamp
            subreddit = post_element.get('data-subreddit', '')  # Subreddit name

            # Extract post title from the appropriate inner element
            postTitle = post_element.find('a', class_='title').text.strip()

            # Convert timestamp to readable format (if available)
            if createdAt:
                createdAt = datetime.fromtimestamp(int(createdAt) / 1000).strftime('%Y-%m-%d %H:%M:%S')

            # Record the time when data was collected
            collectedAt = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

            # Ensure uniqueness by checking if the post already exists
            if postLink and postLink not in [post['postLink'] for post in posts]:
                new_content_found = True

                # Add post to the list
                post_data = {
                    "postId": postId,
                    "subreddit": subreddit,
                    "postTitle": postTitle,
                    "postLink": postLink,
                    "commentsCount": commentsCount,
                    "commentsLink": commentsLink,
                    "createdAt": createdAt,
                    "collectedAt": collectedAt
                }
                posts.append(post_data)
                print(post_data)
        except Exception as e:
            print(f"Error processing post: {e}")

    return new_content_found

while True:
    # Scrape content from the current page
    content_found = scrape_current_page()

    # Scroll down to the bottom of the page, where the navigation buttons are
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)

    # Find the "next" button and click it to navigate to the next page.
    # find_element raises NoSuchElementException when the button is absent (last page).
    try:
        next_button = driver.find_element(By.CSS_SELECTOR, 'div.nav-buttons .next-button a')
        print("Clicking 'next' to load more content...")
        next_button.click()
        time.sleep(5)  # Allow time for new content to load after clicking
    except Exception as e:
        print(f"Could not click 'next', stopping: {e}")
        break

    # Pause longer when the page yielded no new posts
    if not content_found:
        print("No new posts found, waiting for more content...")
        time.sleep(5)
    else:
        time.sleep(2)  # Shorter wait if new content is found

# Close the browser once pagination ends
driver.quit()


{'postId': 't3_1fknlzi', 'subreddit': 'u_SASSoftware_Official', 'postTitle': 'Built for developers and modelers, Viya Workbench is a self-service, on-demand environment for analytical development, including building AI and ML!', 'postLink': 'https://alb.reddit.com/cr?za=WRynO0olptj88HcBhYy5BQiTQAJ9wQ6WYZIDpzwxOwElkigQGZ_HsSbOCHrSTLpK47H-cfL2gJP51XE5Q2vZ-kI0HuUdHLTyfblCPueSiP1d57_b4F4tHENr-EePfxhlxIW4U6oPO8PEyVfHXQllv9l2O5as8cBW72cTHoT-19I5CnMIZ7hni2lg0rfKXiiwJ6VomHRMw7Ygz9Z72iCuC30zgI-G9BdjwW1VrXA5TUEWFA-wUpGm5HrI7e8JD42fJNBk8U9mif4HSBAspeeqLyybvlLzHlxaNjEumcvqAA1VX0Vs_-W2PWZ8HMq9uu7vr-Pt9Dm8_I3Aw6k5VhzeGqN4tUv7vFkYmPlSZ1Wfj6zoDarVTqbbinLpnCy4XrvPqhfPzCEs4BSCTFgM4fJShfFspk3zTsSRnKwWoVgG1B-85yGx_ep9cr-mB06tiKBHhBnATDrrJvJw0M6V8iXuf6M2R371--jZNuBSiHK2wHQSYMLECXSBiBwgrkbN_8FxfZllCp5ikqCaf8Xjil6yArWuVw&zp=BaR4A3AcnhtufrWjULkVfpgS4HTr5ioRrCnTHSVdZ4TjOuGCo4q8L9ddSvJJtFWfv5wXsVsN45Cxy-hivbf3MjF4N3YFwM3g2Zy1MKu3DPQA0iXOtWp2YODsqCbHMGnWhj9HXQQYB3z145IZgtQCR04O7FrMKWejQRNp2wr5s3sfPOpTqhrP02Duk_2

In [57]:
# saving the old Reddit posts to the dataset
import pandas as pd
dataset = pd.read_csv("../data/redditPostsAll.csv")
# convert the collected posts to a DataFrame (new name, so the `posts` list stays usable)
oldPosts = pd.DataFrame(posts)
oldPosts = oldPosts.rename(columns={'commentsCount': 'commentsNumber'})
oldPosts["source"] = "old Reddit"
oldPosts = oldPosts.drop(columns=["subreddit"])
dataset = pd.concat([dataset, oldPosts], ignore_index=True)
# drop the index columns left behind by earlier to_csv/read_csv round trips
dataset = dataset.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'], errors='ignore')
dataset.to_csv("../data/redditPostsAll.csv")  # to_csv returns None, so do not reassign the DataFrame

In [66]:
dataset=pd.read_csv("../data/redditPostsAll.csv")

In [69]:
dataset.tail()

Unnamed: 0,index,postId,postTitle,postLink,commentsNumber,commentsLink,createdAt,collectedAt,source
6304,6304,t3_1alam2z,Explore Rabbit Air - Your Ultimate Answer for ...,https://alb.reddit.com/cr?za=7m3mXW_HTPEV9y3Pj...,0,https://www.reddit.com/user/RabbitAir/comments...,2024-02-07 19:59:28,2024-10-17 07:20:03,old Reddit
6305,6305,t3_1fzxllt,The Yakity Yak - A weekly newsletter for thera...,https://alb.reddit.com/cr?za=BDYOx7VrKZkoCqbEp...,0,https://www.reddit.com/user/PracticeYak/commen...,2024-10-09 18:53:09,2024-10-17 07:20:17,old Reddit
6306,6306,t3_1alamy2,Discover the game-changing world of Rabbit Air...,https://alb.reddit.com/cr?za=2WynUYT5qsQkW3J0M...,0,https://www.reddit.com/user/RabbitAir/comments...,2024-02-07 20:00:23,2024-10-17 07:20:31,old Reddit
6307,6307,t3_1g17u9e,Join us on October 23rd for the ToughTech even...,https://i.redd.it/4j6beg4k74ud1.png,2,https://www.reddit.com/user/toughlex/comments/...,2024-10-11 12:46:18,2024-10-17 07:20:45,old Reddit
6308,6308,t3_1fzxlhj,Oh great...ANOTHER AD 🙄. Yea...we get it. Ads ...,https://alb.reddit.com/cr?za=R-hfS3-jcnCfgkXpj...,0,https://www.reddit.com/user/PracticeYak/commen...,2024-10-09 18:53:02,2024-10-17 07:20:59,old Reddit
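
The rows above are promoted posts rather than r/Health submissions: their links point to `alb.reddit.com` and their comment links live under `/user/`. A minimal sketch of a filter for such rows follows. The rule is a heuristic inferred only from the output shown here, not an official Reddit marker, and the helper name `is_promoted` and the sample rows are hypothetical.

```python
from urllib.parse import urlparse


def is_promoted(post):
    """Heuristic: ad click-tracking host, or a comments link under a user profile."""
    link_host = urlparse(post.get("postLink") or "").netloc
    comments_path = urlparse(post.get("commentsLink") or "").path
    return link_host == "alb.reddit.com" or comments_path.startswith("/user/")


# Hypothetical rows shaped like the scraped records
sample_posts = [
    {
        "postId": "t3_ad1",
        "postLink": "https://alb.reddit.com/cr?za=EXAMPLE",
        "commentsLink": "https://www.reddit.com/user/ExampleAdvertiser/comments/ad1/example/",
    },
    {
        "postId": "t3_ad2",
        "postLink": "https://i.redd.it/example.png",
        "commentsLink": "https://www.reddit.com/user/ExampleAdvertiser/comments/ad2/example/",
    },
    {
        "postId": "t3_ok1",
        "postLink": "https://example.com/health-news",
        "commentsLink": "https://www.reddit.com/r/Health/comments/ok1/example/",
    },
]

organic_posts = [post for post in sample_posts if not is_promoted(post)]
print([post["postId"] for post in organic_posts])
```

Applying the filter before the `subreddit` column is dropped would allow a second check, since the promoted record shown earlier carries a subreddit name starting with `u_`.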


## Scraping Reddit Posts using API and PRAW

In [1]:
!pip install praw




In [47]:
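import os
import praw
import datetime

# Read your own API credentials from environment variables; never hardcode them in a notebook.
# The variable names below are a convention chosen here, set them in your shell first.
client_id = os.environ['REDDIT_CLIENT_ID']
client_secret = os.environ['REDDIT_CLIENT_SECRET']
user_agent = 'meCare'

# Initialize a read-only Reddit instance; listing public posts needs no username or password
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    user_agent=user_agent,
)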
# Select the subreddit
subreddit = reddit.subreddit('health')  # Replace 'health' with your desired subreddit

# Function to fetch posts from a specific listing method (hot, new, top, rising)
def fetch_posts(subreddit, method, limit=None):
    # `subreddit` is kept for call-site readability; `method` is already bound to it
    posts = []
    for submission in method(limit=limit):  # limit=None requests as many posts as the listing provides
        posts.append({
            'title': submission.title,
            'score': submission.score,
            'id': submission.id,
            'url': submission.url,
            'num_comments': submission.num_comments,
            'comments_link': submission.permalink,
            'created': submission.created_utc,  # Unix time (UTC) of creation
            'body': submission.selftext
        })
    return posts

# Fetch posts from all categories
all_posts = []
all_posts.extend(fetch_posts(subreddit, subreddit.hot, limit=None))   # Hot posts
all_posts.extend(fetch_posts(subreddit, subreddit.new, limit=None))   # New posts
all_posts.extend(fetch_posts(subreddit, subreddit.top, limit=None))   # Top posts
all_posts.extend(fetch_posts(subreddit, subreddit.rising, limit=None))# Rising posts

# Report how many posts were fetched (the listings can overlap, so this count may include duplicates)
print(f"Total posts fetched: {len(all_posts)}")
foundPosts = []

for post in all_posts:
    foundPosts.append({
        "postId": post['id'],
        "postTitle": post['title'],
        "postLink": post['url'],
        "commentsNumber": post['num_comments'],
        # permalink is relative, so prefix the domain as the Selenium scrapers do
        "commentsLink": "https://www.reddit.com" + post['comments_link'],
        "createdAt": post['created'],
        "collectedAt": datetime.datetime.now(),
    })


Total posts fetched: 1749
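
The total above sums four listings (hot, new, top, rising) of the same subreddit, so a post that appears in more than one listing is counted more than once. A minimal sketch of collapsing such repeats by submission id follows. The helper name `dedupe_by_id` and the sample records are hypothetical; the `id` key matches the dictionaries built by `fetch_posts`.

```python
def dedupe_by_id(posts):
    """Keep the first record seen for each submission id, preserving order."""
    unique = {}
    for post in posts:
        # setdefault only stores the post if its id has not been seen yet
        unique.setdefault(post["id"], post)
    return list(unique.values())


# Hypothetical records: "abc" appears in two listings
sample_all_posts = [
    {"id": "abc", "title": "First post", "listing": "hot"},
    {"id": "def", "title": "Second post", "listing": "new"},
    {"id": "abc", "title": "First post", "listing": "top"},
]

unique_posts = dedupe_by_id(sample_all_posts)
print(len(sample_all_posts), len(unique_posts))
```

Running the same step on `all_posts` before building `foundPosts` would keep the saved CSV free of repeated rows.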


##### Converting the posts collected through Reddit's API to a .csv file

In [56]:
# Converting the posts collected from Reddit through the API to a .csv file
import pandas as pd
collectedPosts = pd.DataFrame(foundPosts)
collectedPosts.to_csv("../data/redditPosts.csv")  # to_csv returns None, so do not reassign the DataFrame
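
The three collection methods record `createdAt` in different shapes: the new Reddit scraper stores an ISO 8601 string with a UTC offset, the old Reddit scraper reads a millisecond timestamp from `data-timestamp`, and the API returns Unix time in seconds. A minimal sketch of normalizing these to one UTC ISO format before merging follows. The helper name `to_utc_iso` and the magnitude threshold used to tell seconds from milliseconds are assumptions of this sketch.

```python
from datetime import datetime, timezone


def to_utc_iso(value):
    """Convert an epoch number (seconds or milliseconds) or an ISO string to UTC ISO 8601."""
    if isinstance(value, (int, float)):
        # Assumption: values above 1e11 are milliseconds, smaller ones are seconds
        seconds = value / 1000 if value > 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()
    # Format used by the created-timestamp attribute, e.g. 2024-05-27T20:13:21.856000+0000
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.astimezone(timezone.utc).isoformat()


iso_from_string = to_utc_iso("2024-05-27T20:13:21.856000+0000")
iso_from_seconds = to_utc_iso(1_700_000_000)
iso_from_millis = to_utc_iso(1_700_000_000_000)
print(iso_from_string)
print(iso_from_seconds == iso_from_millis)
```

Strings that the old Reddit scraper has already formatted with `strftime` carry no offset, so converting the raw `data-timestamp` value is the safer input for this helper.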

<p style="color:#28A745;text-align:center;font-size:30px"> Scraping Reddit's Questions And Answers </p>

<p style="color:#FFC107;text-align:left;font-size:20px"> Searching for Reddit's health-related topics </p>

In [16]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup

# Set up the Selenium WebDriver
driver = webdriver.Chrome()  # Ensure you have the correct WebDriver
communities = []

# URL to scrape
url = 'https://www.reddit.com/r/Health/wiki/communities/'
driver.get(url)

# Allow the page to load
time.sleep(5)

def scrape_current_page():
    # Get the page source
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')

    # Find the rendered wiki content container(s) that hold the community list
    post_elements = soup.find_all('div', class_='md wiki')

    new_content_found = False

    for post_element in post_elements:
        communities_list = None
        try:
            communities_list = post_element.find("ul")
            if communities_list:  # Ensure the list exists
                for community_link in communities_list.find_all("li"):
                    topic={}
                    link = community_link.find("a")
                    if link:  # Ensure the link exists
                        topic["topicLink"] = "https://www.reddit.com" + link.get("href")  # Wiki links are relative, so prefix the domain
                        topic["topicName"] = link.get_text(strip=True)  # Visible community name
                        communities.append(topic)
                        new_content_found = True  # Mark that new content was found
        except Exception as e:
            print(f"Error processing post: {e}")

    return new_content_found


# Scrape content from the current page
content_found = scrape_current_page()


if not content_found:
    print("No communities found on the wiki page.")


# Close the driver when done
driver.quit()

# Optionally print the communities collected
print(communities)


[{'topicLink': 'https://www.reddit.com/r/ADHD', 'topicName': 'ADHD'}, {'topicLink': 'https://www.reddit.com/r/alternativeHealth', 'topicName': 'AlternativeHealth'}, {'topicLink': 'https://www.reddit.com/r/Amblyopia', 'topicName': 'Amblyopia'}, {'topicLink': 'https://www.reddit.com/r/anxiety', 'topicName': 'Anxiety'}, {'topicLink': 'https://www.reddit.com/r/thritis', 'topicName': 'Arthritis'}, {'topicLink': 'https://www.reddit.com/r/AskDocs', 'topicName': 'Ask Docs'}, {'topicLink': 'https://www.reddit.com/r/aspergers', 'topicName': 'Aspergers'}, {'topicLink': 'https://www.reddit.com/r/Asthma', 'topicName': 'Asthma'}, {'topicLink': 'https://www.reddit.com/r/Autoimmune', 'topicName': 'Autoimmune'}, {'topicLink': 'https://www.reddit.com/r/BipolarReddit', 'topicName': 'BipolarReddit'}, {'topicLink': 'https://www.reddit.com/r/BPD', 'topicName': 'Borderline Personality Disorder'}, {'topicLink': 'https://www.reddit.com/r/BreastCancer', 'topicName': 'Breast Cancer'}, {'topicLink': 'https://www.
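
Each `topicLink` above embeds the subreddit name, and some wiki links end with a trailing slash. A minimal sketch of pulling a clean name out of a link follows, which is useful when the names feed later scraping steps. The helper name `subreddit_name` is hypothetical.

```python
from urllib.parse import urlparse


def subreddit_name(topic_link):
    """Return the subreddit name from a /r/<name> link, or None if the link has another shape."""
    # Splitting the path and dropping empty parts ignores leading and trailing slashes
    parts = [part for part in urlparse(topic_link).path.split("/") if part]
    if len(parts) >= 2 and parts[0].lower() == "r":
        return parts[1]
    return None


names = [
    subreddit_name("https://www.reddit.com/r/ADHD"),
    subreddit_name("https://www.reddit.com/r/CrohnsDisease/"),
    subreddit_name("https://www.reddit.com/wiki/index"),
]
print(names)
```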

In [17]:
# saving topics to .csv file
import pandas as pd
communities_df = pd.DataFrame(communities)
communities_df.to_csv("../data/redditHealthTopics.csv")  # to_csv returns None, so do not reassign


['ADHD',
 'alternativeHealth',
 'Amblyopia',
 'anxiety',
 'thritis',
 'AskDocs',
 'aspergers',
 'Asthma',
 'Autoimmune',
 'BipolarReddit',
 'BPD',
 'BreastCancer',
 'cancer',
 'CaregiverSupport',
 'cfs',
 'ChronicPain',
 'cleftlip',
 'chd',
 'CrohnsDisease/',
 'dementia/',
 'Dentistry/',
 'Depression',
 'diabetes',
 'Diagnosed',
 'dysautonomia',
 'dystonia',
 'eczema',
 'fibro',
 'Fitness',
 'flu',
 'GravesDisease',
 'Green',
 'Gastroparesis',
 'globalhealth',
 'Gutscience',
 'healthcare',
 'Hemophilia',
 'hepc',
 'HIIT',
 'Hypothyroidism',
 'ibs',
 'Infertility',
 'itsneverlupus',
 'interstitialcystitis',
 'juicing',
 'Keratoconus',
 'kidneystones',
 'Kinesiology',
 'lactoseintolerant',
 'longevity/',
 'maculardegeneration',
 'MadOver30',
 'malesupportnetwork',
 'Massage',
 'Medical_Students/',
 'medicine',
 'menieres',
 'MentalHealth',
 'MultipleSclerosis',
 'NaturalBeauty',
 'NoMoreGaming',
 'nutrition',
 'optometry',
 'Paleo',
 'PancreaticCancer',
 'parkinsons',
 'Pescetarian',
 'p