<a href="https://colab.research.google.com/github/jben-hun/colab_notebooks/blob/master/algorithms/breadthDepthCommentTraversal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reddit comment traversal
Implementing breadth-first and depth-first traversal, and using both to fetch every comment from a Reddit comment forest.



In [1]:
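!pip install -q praw

from collections import deque

import praw

# Credentials of a Reddit API application (created at https://www.reddit.com/prefs/apps)
client_id = "" #@param {type:"string"}
client_secret = "" #@param {type:"string"}
user_agent = "" #@param {type:"string"}

# Without a username and password PRAW creates a read-only instance,
# which is all that fetching public comments requires
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    user_agent=user_agent)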


In [2]:
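def traverse_comments(comments, *, breadth_first=False):
    """Return the bodies of all comments in a comment forest.

    One deque serves as both stack and queue: items are always popped from
    the right end. Pushing children onto the right gives LIFO order
    (depth-first); pushing them onto the left gives FIFO order (breadth-first).
    """
    queue = deque(comments[:])
    result = []
    while queue:
        e = queue.pop()
        if isinstance(e, praw.models.MoreComments):
            # A "load more comments" placeholder: fetch the comments it hides
            # (this issues an API request) and schedule them like any others
            if breadth_first:
                queue.extendleft(e.comments())
            else:
                queue.extend(e.comments())
        else:
            # A real comment: schedule its replies, then record its text
            if breadth_first:
                queue.extendleft(e.replies)
            else:
                queue.extend(e.replies)
            result.append(e.body)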
    return result
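
Running the cell above requires API credentials, so here is an offline sketch of the same single-deque idea on a toy forest. `Node` is a hypothetical stand-in for a PRAW comment (only `body` and `replies`), and `MoreComments` handling is left out; the sketch only illustrates the visiting order and is not a replacement for `traverse_comments`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a PRAW comment: a body plus a list of replies."""
    body: str
    replies: list = field(default_factory=list)


def traverse(forest, *, breadth_first=False):
    """Single-deque traversal as in traverse_comments, without MoreComments handling."""
    queue = deque(forest)
    result = []
    while queue:
        node = queue.pop()  # always take from the right end
        if breadth_first:
            queue.extendleft(node.replies)  # FIFO: replies wait behind everything already queued
        else:
            queue.extend(node.replies)  # LIFO: replies are visited next
        result.append(node.body)
    return result


# Two top-level comments; "a" has replies "a1" and "a2", and "a1" has the reply "a1x"
forest = [
    Node("a", [Node("a1", [Node("a1x")]), Node("a2")]),
    Node("b"),
]

print(traverse(forest))                      # ['b', 'a', 'a2', 'a1', 'a1x']
print(traverse(forest, breadth_first=True))  # ['b', 'a', 'a1', 'a2', 'a1x']
```

Both runs reach the same five comments. The depth-first run goes from `a` all the way down before leaving its subtree, while the breadth-first run returns the comments level by level (`b`, `a`, then `a1`, `a2`, then `a1x`). Sibling order is partly reversed because items are always popped from the right end, which is harmless here since the notebook compares sets.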

**Supply a submission URL**

Preferably an archived submission, so that its comments do not change while the traversals run.

In [3]:
submission_url = "https://www.reddit.com/r/aww/comments/fo6q11/his_favorite_place_is_his_bed/" #@param {type:"string"}

**Our depth-first traversal**

In [4]:
%%time
comments_depthfirst = set(traverse_comments(
    reddit.submission(url=submission_url).comments))


**Our breadth-first traversal**

In [5]:
%%time
comments_breadthfirst = set(traverse_comments(
    reddit.submission(url=submission_url).comments, breadth_first=True))


**Built-in breadth-first traversal**
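
PRAW documents `CommentForest.list()` as returning a flattened, breadth-first list of the forest, and `replace_more(limit=None)` first replaces every `MoreComments` placeholder. The following is not PRAW's implementation; it is a minimal sketch of a conventional breadth-first flatten (append replies on the right, pop from the left) on a toy tree, assuming all placeholders are already gone. `Node` and `flatten_breadth_first` are hypothetical names for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a PRAW comment whose MoreComments were already replaced."""
    body: str
    replies: list = field(default_factory=list)


def flatten_breadth_first(forest):
    """Conventional BFS: replies join the right end, items leave from the left end."""
    queue = deque(forest)
    flat = []
    while queue:
        node = queue.popleft()
        flat.append(node)
        queue.extend(node.replies)
    return flat


forest = [
    Node("a", [Node("a1", [Node("a1x")]), Node("a2")]),
    Node("b"),
]

print([node.body for node in flatten_breadth_first(forest)])  # ['a', 'b', 'a1', 'a2', 'a1x']
```

Unlike the single-deque version in `traverse_comments`, this variant keeps siblings in their original left-to-right order, because the deque is used strictly as a FIFO queue.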

In [6]:
%%time
submission = reddit.submission(url=submission_url)
# Replace every MoreComments placeholder, then flatten the whole forest
submission.comments.replace_more(limit=None)
comments_builtin = {comment.body for comment in submission.comments.list()}


**Result validation**

In [7]:
print("Results are equivalent:",
      comments_depthfirst ==
      comments_breadthfirst ==
      comments_builtin)

Results are equivalent: True
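
A caveat on this check: it compares sets of comment bodies, so comments with identical text (for instance several `[deleted]` comments) collapse into one element, and a traversal that missed one of them would still pass. Counting comment IDs is stricter (PRAW comments expose an `id` attribute). The sketch below uses hypothetical `(id, body)` pairs instead of live data to show the difference; `body_set` and `id_counts` are illustrative names, not part of the notebook.

```python
from collections import Counter

# Hypothetical (id, body) pairs standing in for what two traversals might collect
depth_first = [("c1", "[deleted]"), ("c2", "nice"), ("c3", "[deleted]")]
breadth_first = [("c3", "[deleted]"), ("c1", "[deleted]"), ("c2", "nice")]
missing_one = [("c1", "[deleted]"), ("c2", "nice")]  # c3 was never reached


def body_set(pairs):
    """The notebook's check: the set of comment bodies."""
    return {body for _, body in pairs}


def id_counts(pairs):
    """A stricter check: how many times each comment ID was collected."""
    return Counter(comment_id for comment_id, _ in pairs)


print(body_set(depth_first) == body_set(missing_one))      # True: the missing comment goes unnoticed
print(id_counts(depth_first) == id_counts(missing_one))    # False: the ID counts reveal it
print(id_counts(depth_first) == id_counts(breadth_first))  # True: visiting order is irrelevant
```

Counting IDs also exposes a comment collected twice, which a set would hide. With PRAW, the same idea would mean appending `comment.id` rather than `comment.body` in each traversal.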


# References

*   https://praw.readthedocs.io/en/latest/tutorials/comments.html
*   https://en.wikipedia.org/wiki/Depth-first_search
*   https://en.wikipedia.org/wiki/Breadth-first_search