To study the impact of ChatGPT on higher education, we selected subreddits that discuss AI and education technology. The goal was to collect data that is relevant, representative, and rich in content.

We selected subreddits based on the following criteria:
- **Relevance**: Subreddits discussing AI or education technology topics.
- **Activity Level**: Subreddits with high recent activity (e.g., number of active users).
- **Community Size**: Subreddits with a large number of subscribers.
- **Engagement**: Subreddits with high engagement levels (e.g., comments per post).
- **Language**: English-language subreddits.
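
The criteria above can be read as a simple filter over candidate subreddits. The following is a minimal sketch only: the thresholds (`MIN_SUBSCRIBERS`, `MIN_ACTIVE_USERS`, `MIN_AVG_COMMENTS`), the `SubredditStats` class, and the example rows are hypothetical placeholders, not values or results from this study.

```python
from dataclasses import dataclass


@dataclass
class SubredditStats:
    """Hypothetical container for the metrics named in the criteria."""
    name: str
    subscribers: int
    active_users: int
    avg_comments_per_post: float
    is_english: bool
    is_relevant: bool


# Hypothetical cut-offs, chosen only for illustration.
MIN_SUBSCRIBERS = 10_000
MIN_ACTIVE_USERS = 50
MIN_AVG_COMMENTS = 20.0


def meets_criteria(s: SubredditStats) -> bool:
    """Return True if a subreddit passes every selection criterion."""
    return (
        s.is_relevant
        and s.is_english
        and s.subscribers >= MIN_SUBSCRIBERS
        and s.active_users >= MIN_ACTIVE_USERS
        and s.avg_comments_per_post >= MIN_AVG_COMMENTS
    )


def shortlist(candidates: list[SubredditStats]) -> list[SubredditStats]:
    """Keep passing subreddits, most engaged first."""
    kept = [s for s in candidates if meets_criteria(s)]
    return sorted(kept, key=lambda s: s.avg_comments_per_post, reverse=True)


# Invented placeholder rows, not real subreddit statistics.
candidates = [
    SubredditStats("example_large_active", 500_000, 1_200, 85.0, True, True),
    SubredditStats("example_small", 2_000, 10, 40.0, True, True),
    SubredditStats("example_quiet", 80_000, 300, 3.5, True, True),
    SubredditStats("example_engaged", 60_000, 150, 120.0, True, True),
]

selected = [s.name for s in shortlist(candidates)]
print(selected)
```

Relevance and language are modeled here as booleans set by a human reviewer, since both require reading a subreddit's description rather than comparing numbers.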

We used PRAW, the Python Reddit API Wrapper, to collect data on candidate subreddits. For each one we gathered the subscriber count, the number of active users, the average number of comments per post, and the public description, and used these to evaluate its relevance and activity level.

In [3]:
import praw
import pandas as pd
from prawcore.exceptions import NotFound, Forbidden

# Initialize the Reddit instance using the configuration from praw.ini
reddit = praw.Reddit('DEFAULT')

# Candidate subreddits, grouped by category
subreddits = {
    "AI-related": [
        "OpenAI", "MachineLearning", "artificial", "ChatGPT",
        "ArtificialInteligence", "Automate", "technology", "singularity",
        "agi", "CharacterAI", "Midjourney", "ChatGPTPro", "ChatGPTCoding",
        "weirddalle", "ChatGPTPromptGenius", "Futurology", "GPT3",
    ],
    "Education Technology-related": [
        "education", "edtech", "Teachers", "highereducation", "aiclass",
        "teaching", "CSEducation", "learnpython", "AskAcademia",
        "AskAcademiaUK", "Indian_Academia", "GradSchool", "Professors",
        "College", "Python", "PhD",
    ],
}

def get_subreddit_data(subreddit_name):
    try:
        subreddit = reddit.subreddit(subreddit_name)
        # Average comments per post over the top 50 posts.
        # Note: top() defaults to time_filter="all", so these are all-time top posts.
        top_posts = list(subreddit.top(limit=50))
        avg_comments_per_post = (
            sum(post.num_comments for post in top_posts) / len(top_posts)
            if top_posts else 0
        )
        
        return {
            'name': subreddit.display_name,
            'title': subreddit.title,
            'description': subreddit.public_description,
            'subscribers': subreddit.subscribers,
            'active_users': subreddit.accounts_active,
            'url': subreddit.url,
            'avg_comments_per_post': avg_comments_per_post
        }
    except (NotFound, Forbidden):
        print(f"Subreddit '{subreddit_name}' not found or access is forbidden.")
        return None

# Collect data for each subreddit in the lists
all_subreddit_data = []
for category, subs in subreddits.items():
    for subreddit_name in subs:
        data = get_subreddit_data(subreddit_name)
        if data:
            data['category'] = category
            all_subreddit_data.append(data)

# Convert to DataFrame
df = pd.DataFrame(all_subreddit_data)

# Display DataFrame
print(df)

# Save DataFrame to a CSV file
df.to_csv('subreddit_data.csv', index=False)


                     name                                              title  \
0                  OpenAI                                             OpenAI   
1         MachineLearning                                   Machine Learning   
2              artificial                            Artificial Intelligence   
3                 ChatGPT                                            ChatGPT   
4   ArtificialInteligence                    Artificial Intelligence Gateway   
5                Automate                         The future is automation !   
6              technology                                     /r/Technology    
7             singularity                                        Singularity   
8                     agi  Artificial General Intelligence - Strong AI Re...   
9             CharacterAI                                       Character.AI   
10             Midjourney                                         midjourney   
11             ChatGPTPro               