# Find Comment IDs from Submission IDs
In this notebook, I will show you how to use the `search_submission_comment_ids` method from `PMAW` to retrieve all the Reddit comment IDs for an array of submission IDs. You can view details about this endpoint in the Pushshift [documentation](https://github.com/pushshift/api#get-all-comment-ids-for-a-particular-submission).

In [1]:
import pandas as pd
from pmaw import PushshiftAPI

In [2]:
# instantiate
api = PushshiftAPI()

## Data Preparation

In [3]:
# import test data into a dataframe
posts_df = pd.read_csv('./test_data.csv', delimiter=';', header=0)
posts_df.head(5)

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,...,author_cakeday,distinguished,suggested_sort,crosspost_parent,crosspost_parent_list,category,top_awarded_type,poll_data,steward_reports,comment_ids
0,[],False,nf_hades,,[],,text,t2_hriq1b,False,False,...,,,,,,,,,,"gjacwx5,gjad2l6,gjadatw,gjadc7w,gjadcwh,gjadgd..."
1,[],False,MyLittleDeku,,[],,text,t2_7dj62vj2,False,False,...,,,,,,,,,,"gjacn1r,"
2,[],False,lilirucaarde12,,[],,text,t2_6i04uaxw,False,False,...,,,,,,,,,,"gjac5fb,gjacdy5,gjaco45,gjasj4f,gjbxfeg,"
3,[],False,[deleted],,,,,,,,...,,,,,,,,,,"gjac9d6,"
4,[],False,sirdimpleton,,[],,text,t2_bznmn4i,False,False,...,,,,,,,,,,"gjaocmg,gjb2jsj,gjbisrw,gjbjbk8,"


In [4]:
len(posts_df)

2500

`posts_df` contains 2500 submissions and their metadata, extracted from a subreddit submission search. The `comment_ids` column was added after the search using additional requests.

In [5]:
# create submission ID list
post_ids = list(posts_df.loc[:, 'id'])
post_ids[:10]

['kxi2w8',
 'kxi2g1',
 'kxhzrl',
 'kxhyh6',
 'kxhwh0',
 'kxhv53',
 'kxhm7b',
 'kxhm3s',
 'kxhg37',
 'kxhak9']

## Comment IDs for a Single Submission

In [6]:
comment_id_dict = api.search_submission_comment_ids(ids=post_ids[0])

Total Success Rate: 100.00% -- Total Reqs: 1 -- Num Retries: 0


In [7]:
comment_id_dict

{'kxi2w8': ['gjacwx5',
  'gjad2l6',
  'gjadatw',
  'gjadc7w',
  'gjadcwh',
  'gjadgd7',
  'gjadlbc',
  'gjadnoc',
  'gjadog1',
  'gjadphb',
  'gjadtz3',
  'gjaduck',
  'gjadxa0',
  'gjaeb3p',
  'gjaeb5o',
  'gjaeg5d',
  'gjaegdn',
  'gjaemkt',
  'gjaenva',
  'gjaerpm',
  'gjaex2y',
  'gjaf5nv',
  'gjaim0d',
  'gjapx5s',
  'gjaqruo',
  'gjarqic']}
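
The result is a plain dictionary that maps each submission ID to a list of comment IDs, so ordinary Python tools can be used on it. As a minimal sketch, using a shortened hand-written sample shaped like the output above rather than a live API call (the names `sample_result`, `comment_counts`, and `all_comment_ids` are illustrative only):

```python
# Shortened sample shaped like the dictionary shown above (not a live API response)
sample_result = {
    'kxi2w8': ['gjacwx5', 'gjad2l6', 'gjadatw'],
    'kxi2g1': ['gjacn1r'],
}

# number of comment IDs found for each submission
comment_counts = {post_id: len(ids) for post_id, ids in sample_result.items()}

# flatten every list into a single list of comment IDs
all_comment_ids = [cid for ids in sample_result.values() for cid in ids]

print(comment_counts)
print(len(all_comment_ids))
```

A flattened list like this could then be passed on to whatever step retrieves the comments themselves.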

## Comment IDs for Multiple Submissions

In [8]:
%%time
comment_id_dict = api.search_submission_comment_ids(ids=post_ids)

Total Success Rate: 89.76% -- Total Reqs: 2500 -- Num Retries: 0
Total Success Rate: 89.44% -- Total Reqs: 2756 -- Num Retries: 1
Total Success Rate: 89.32% -- Total Reqs: 2791 -- Num Retries: 2
Total Success Rate: 89.35% -- Total Reqs: 2798 -- Num Retries: 3
Wall time: 41min 55s


At the default rate limit of 60 requests per minute (one request per second), we would expect 2500 requests to take 41min 40s. The run completed in 41min 55s (2515 seconds), including the rejected requests that had to be retried (89.35% success rate over 2798 total requests). This works out to about 1.006 seconds per submission, roughly 0.6% slower than expected.
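
The timing estimate can be reproduced with a few lines of arithmetic. This is only a sketch based on the figures reported in the run above; the variable names are chosen here for illustration:

```python
# Figures taken from the run above
num_submissions = 2500
rate_limit_per_min = 60          # default rate limit stated in the text
wall_time_s = 41 * 60 + 55       # wall time of 41min 55s, in seconds

# time the run would take at exactly the rate limit with no rejections
expected_s = num_submissions * 60 / rate_limit_per_min

# observed time per submission and the relative slowdown
seconds_per_submission = wall_time_s / num_submissions
slowdown_pct = (wall_time_s / expected_s - 1) * 100

print(expected_s)
print(round(seconds_per_submission, 3))
print(round(slowdown_pct, 1))
```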

### Save Comment IDs

In [13]:
# convert arrays to comma-separated strings
for index, post_id in enumerate(posts_df['id']):
    comment_arr = comment_id_dict.get(post_id, [])
    posts_df.loc[index, 'comment_ids'] = ",".join(comment_arr)
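
The loop above uses `enumerate` positions as `.loc` labels, which works when the dataframe has the default 0..n-1 index. As an alternative sketch that does not rely on that, the same column can be built with `Series.map`. The small `demo_df` and `demo_comment_ids` below are made-up stand-ins for `posts_df` and `comment_id_dict`:

```python
import pandas as pd

# stand-in for posts_df, deliberately given a non-default index
demo_df = pd.DataFrame({'id': ['kxi2w8', 'kxi2g1', 'kxhzrl']}, index=[10, 20, 30])

# stand-in for comment_id_dict
demo_comment_ids = {
    'kxi2w8': ['gjacwx5', 'gjad2l6'],
    'kxi2g1': ['gjacn1r'],
    # 'kxhzrl' is intentionally absent to mimic a submission with no result
}

# look up each submission ID and join its comment IDs; missing IDs become ''
demo_df['comment_ids'] = demo_df['id'].map(
    lambda post_id: ",".join(demo_comment_ids.get(post_id, []))
)
print(demo_df)
```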

In [15]:
posts_df.head(3)

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,...,author_cakeday,distinguished,suggested_sort,crosspost_parent,crosspost_parent_list,category,top_awarded_type,poll_data,steward_reports,comment_ids
0,[],False,nf_hades,,[],,text,t2_hriq1b,False,False,...,,,,,,,,,,"gjacwx5,gjad2l6,gjadatw,gjadc7w,gjadcwh,gjadgd..."
1,[],False,MyLittleDeku,,[],,text,t2_7dj62vj2,False,False,...,,,,,,,,,,gjacn1r
2,[],False,lilirucaarde12,,[],,text,t2_6i04uaxw,False,False,...,,,,,,,,,,"gjac5fb,gjacdy5,gjaco45,gjasj4f,gjbxfeg"


In [16]:
posts_df.to_csv('./test_data.csv', sep=';', header=True, index=False)
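
When the file is loaded again, `comment_ids` comes back as a comma-separated string, and empty cells are read as NaN. A minimal sketch of turning the column back into lists is shown here, using an in-memory CSV in place of `./test_data.csv`; the helper name `split_ids` and the column name `comment_id_list` are hypothetical:

```python
import io
import pandas as pd

# small in-memory stand-in for the saved file, using the same ';' delimiter
csv_text = "id;comment_ids\nkxi2w8;gjacwx5,gjad2l6\nkxi2g1;gjacn1r,\nkxhzrl;\n"
demo_df = pd.read_csv(io.StringIO(csv_text), delimiter=';', header=0)

def split_ids(value):
    # empty cells come back as NaN, so anything that is not a string means "no comments"
    if not isinstance(value, str):
        return []
    # drop empty pieces left behind by a trailing comma
    return [cid for cid in value.split(',') if cid]

demo_df['comment_id_list'] = demo_df['comment_ids'].apply(split_ids)
print(demo_df['comment_id_list'].tolist())
```

Filtering out empty pieces also handles values with a trailing comma, such as the `"gjacn1r,"` entries visible in the originally loaded data.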