### Step 1: Importing Necessary Libraries

We start by importing the libraries that are essential for our project. These libraries will help us work with data, create visualizations, and interact with the Reddit API.

- `import numpy as np`: NumPy is a library for numerical computations in Python. We use it for efficient array operations.
- `import pandas as pd`: Pandas is a library for data manipulation and analysis. It provides data structures like DataFrames.
- `import matplotlib.pyplot as plt`: Matplotlib is a popular library for creating visualizations, and `plt` is a common alias for it.
- `import praw`: PRAW stands for "Python Reddit API Wrapper," and it enables us to interact with Reddit's API programmatically.
- `import re`: The `re` library provides support for regular expressions, which we'll use for text pattern matching and extraction.

Now that we have our libraries loaded, we can proceed with the rest of our project.



In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import praw
import re
import time

### Step 2: Authenticating with the Reddit API

To access Reddit's data programmatically, we need to authenticate our Python application using the `praw` library. Here's what each part of the code does:

- `import praw`: We import the `praw` library, which is a Python wrapper for the Reddit API. This library simplifies the process of interacting with Reddit's data.

- `reddit = praw.Reddit(...)`: We create a Reddit API client object named `reddit`. This object is used for making authenticated requests to the Reddit API. The constructor takes the following parameters:

   - `client_id`: This should be replaced with the unique identifier of your Reddit Developer Application, which you obtained when you created the application on the Reddit Developer Portal. It's used to identify your application when making API requests.

   - `client_secret`: This key should be replaced with the secret key provided during the creation of your Reddit Developer Application. It's a secret key that, when combined with the client ID, allows your application to securely authenticate with the Reddit API.

   - `user_agent`: The user agent is a string that identifies your application and its purpose. It's important to provide a user agent that follows Reddit's guidelines, typically including the name of your application and a version number. For personal projects, you can include your Reddit username or any other descriptive information.

With this authenticated `reddit` object, we can now access various Reddit data and perform operations like fetching posts, comments, and more, which will be an essential part of our project.
```

In [15]:
reddit = praw.Reddit(
    client_id='DAuF7LHCr_OM-_PGf-UBaw',
    client_secret='uor61HS6MgW5yHYBk8LcOmTlW5j5xQ',
    user_agent='Dry_Try8800',
)

### Step 3: Data Retrieval

We will use the Python Reddit API Wrapper (PRAW) library to access Reddit data. Before running the code, make sure you have installed the required libraries and configured your Reddit API credentials.

### Parameters

- **Subreddit**: We will retrieve posts from the 'all' subreddit, but you can replace it with a specific subreddit name of your choice.

- **Total Posts to Retrieve**: Adjust the `total_posts_to_retrieve` variable to set the number of posts you want to fetch. In this example, we aim to retrieve 500 posts.

- **Time Filter**: We will filter posts to cover a specific time period (e.g., 'year'). Modify the `time_filter` variable to adjust the time frame.

- **Batch Size**: Define the batch size for retrieving posts. The code will paginate through posts to respect Reddit's API rate limits. Adjust the `batch_size` variable as needed.

### Data Retrieval Loop

We use a `while` loop to retrieve posts in batches until we reach the desired number (`total_posts_to_retrieve`). Here's what each step does:

1. Calculate the remaining posts to retrieve in the current batch.
2. Introduce a 2-second delay between API requests to avoid rate limiting.
3. Make an API request to search for posts using the 'before' parameter for pagination.
4. Check if there are more posts to retrieve. If not, exit the loop.
5. Extend the `all_posts` list with the retrieved posts.
6. Update counters for retrieved posts and current batch.

## Data Processing

Once we have retrieved the data, we process it and create a DataFrame for analysis. Here's what the code does:

1. Initialize an empty list `post_data` to store post-related information.
2. Loop through each retrieved post:
   - Extract information such as author username, post title, URL, number of upvotes, and more.
   - Handle cases where author information might not be available (AttributeError).
   - Append the extracted data as a dictionary to the `post_data` list.

3. Create a DataFrame (`df`) using the `post_data` list, where each row represents a Reddit post, and each column represents a specific attribute.

## Data Analysis

With the DataFrame (`df`) in place, you can perform various analyses on the Reddit data, such as:

- Analyzing the distribution of upvotes and downvotes.
- Investigating trends in post creation times.
- Examining the most common words or hashtags used in post titles or text.
- Exploring relationships between variables, such as the number of comments and upvotes.

You can visualize these insights using Python libraries like Matplotlib, Seaborn, or NetworkX, depending on your analysis goals.

Feel free to expand on this notebook to perform more in-depth analyses and create visualizations based on your research questions related to the Ukraine conflict data on Reddit.


In [16]:
# Subreddit and data retrieval parameters
subreddit = reddit.subreddit('all')
total_posts_to_retrieve = 500  # Increase this as needed
time_filter = 'year'
batch_size = 50  # Adjust as needed

# Initialize variables
all_posts = []
retrieved_posts = 0
current_batch = 0

while retrieved_posts < total_posts_to_retrieve:
    remaining_posts = total_posts_to_retrieve - retrieved_posts
    posts_to_retrieve = min(remaining_posts, batch_size)

    time.sleep(2)

    # Use 'before' parameter to paginate through posts
    ukraine_posts = list(subreddit.search('Ukraine', limit=posts_to_retrieve, time_filter=time_filter, sort='top', params={'before': f'{current_batch}y'}))

    # Check if there are no more posts to retrieve
    if not ukraine_posts:
        break

    all_posts.extend(ukraine_posts)
    retrieved_posts += len(ukraine_posts)
    current_batch += 1

# Continue with post_data and DataFrame creation as before
post_data = []
for post in all_posts:
    try:
        author_name = post.author.name
        author_id = post.author.id
    except AttributeError:
        author_name = None
        author_id = None

    post_data.append({
        'Author Username': author_name,
        'Author ID': author_id,
        'Post Title': post.title,
        'Post ID': post.id,
        'Post URL': post.url,
        'Post Text': post.selftext,
        'Number of Upvotes': post.score,
        'Number of Downvotes': post.downs,
        'Number of Comments': len(post.comments),
        'Post Creation Time': post.created_utc,
        'Post Score': post.score,
        'URL of Post\'s Permalink': post.permalink,
        'Subreddit Name': post.subreddit.display_name,
        'Subreddit ID': post.subreddit_id,
        'Post Flair': post.link_flair_text,
        'Is the Post a Crosspost': post.is_crosspostable,
        'Number of Awards': len(post.all_awardings),
        'Author\'s Comment Karma': getattr(post.author, 'comment_karma', None),
        'Author\'s Post Karma': getattr(post.author, 'link_karma', None),
        'Post Domain': post.domain
    })

df = pd.DataFrame(post_data)

# Now, 'df' contains the data from all the retrieved Reddit posts in a DataFrame


In [17]:
df

Unnamed: 0,Author Username,Author ID,Post Title,Post ID,Post URL,Post Text,Number of Upvotes,Number of Downvotes,Number of Comments,Post Creation Time,Post Score,URL of Post's Permalink,Subreddit Name,Subreddit ID,Post Flair,Is the Post a Crosspost,Number of Awards,Author's Comment Karma,Author's Post Karma,Post Domain
0,qpgmr,5imsx,Miss Ukraine at the Miss Universe pageant,zwpt13,https://i.redd.it/azjgupq27i8a1.jpg,,140777,0,79,1.672174e+09,140784,/r/pics/comments/zwpt13/miss_ukraine_at_the_mi...,pics,t5_2qh0u,,False,0,91895.0,18545.0,i.redd.it
1,FuturisticFighting,34payxdx,A man invades the pitch at the World Cup in Qa...,z787py,https://www.reddit.com/gallery/z787py,,109884,0,70,1.669669e+09,109885,/r/interestingasfuck/comments/z787py/a_man_inv...,interestingasfuck,t5_2qhsa,/r/ALL,False,0,31133.0,1117785.0,reddit.com
2,bildo72,10rm74,President Biden Makes Surprise Visit to Ukraine,1172vx1,https://www.rollingstone.com/politics/politics...,,89266,0,78,1.676887e+09,89266,/r/worldnews/comments/1172vx1/president_biden_...,worldnews,t5_2qh13,Russia/Ukraine,False,0,221605.0,160413.0,rollingstone.com
3,HydrolicKrane,1wc7lrta,Analysis of Twitter algorithm code reveals soc...,129gpui,https://www.yahoo.com/news/analysis-twitter-al...,,83740,0,60,1.680430e+09,83741,/r/worldnews/comments/129gpui/analysis_of_twit...,worldnews,t5_2qh13,Russia/Ukraine,False,0,46600.0,969778.0,yahoo.com
4,Lithium321,4nivitpo,This is what the front line in Ukraine looks l...,zm2k3w,https://v.redd.it/4zef3278kx5a1,,83518,0,80,1.671053e+09,83511,/r/interestingasfuck/comments/zm2k3w/this_is_w...,interestingasfuck,t5_2qhsa,/r/ALL,False,0,44485.0,75941.0,v.redd.it
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
495,Disastrous_Gate_8193,l1yfa22h,"At night, Ukraine struck at the Kremlin with a...",136j12y,https://v.redd.it/9tgotoqd0mxa1,,50438,0,112,1.683117e+09,50433,/r/interestingasfuck/comments/136j12y/at_night...,interestingasfuck,t5_2qhsa,,False,0,2418.0,44622.0,v.redd.it
496,Gopu_17,ruz49h6l,"Ukraine's Zelensky Urges ""Complete Isolation"" ...",zm47ts,https://www.ndtv.com/world-news/russia-ukraine...,,50260,0,78,1.671057e+09,50256,/r/worldnews/comments/zm47ts/ukraines_zelensky...,worldnews,t5_2qh13,Russia/Ukraine,False,0,46760.0,111507.0,ndtv.com
497,Loolom,j0wc8,Ukraine claims its first kill of Russia's 'Ter...,10zmx2r,https://www.businessinsider.com/ukraine-claims...,,49515,0,47,1.676120e+09,49517,/r/worldnews/comments/10zmx2r/ukraine_claims_i...,worldnews,t5_2qh13,Behind Soft Paywall,False,0,240.0,22814.0,businessinsider.com
498,DmitriyBragin,7l373nrz,My hometown Kharkov in Ukraine 2022-2022,zh9wiw,https://www.reddit.com/gallery/zh9wiw,,49065,0,109,1.670623e+09,49068,/r/OldPhotosInRealLife/comments/zh9wiw/my_home...,OldPhotosInRealLife,t5_3exv5,Gallery,False,0,1113.0,168790.0,reddit.com
