# Search Submissions
In this notebook, I will show you how to use the `search_submissions` method from `PMAW` to retrieve submissions from the Reddit Pushshift API. For more details about the Search Submissions endpoint, see the Pushshift [documentation](https://github.com/pushshift/api#searching-submissions).

In [1]:
import pandas as pd
from pmaw import PushshiftAPI

In [2]:
# instantiate the Pushshift API client
api = PushshiftAPI()

## Data Preparation

In [3]:
# import test data into a dataframe
posts_df = pd.read_csv('./test_data.csv', delimiter=';', header=0)
posts_df.head(5)

|  | all_awardings | allow_live_comments | author | author_flair_css_class | author_flair_richtext | author_flair_text | author_flair_type | author_fullname | author_patreon_flair | author_premium | ... | author_cakeday | distinguished | suggested_sort | crosspost_parent | crosspost_parent_list | category | top_awarded_type | poll_data | steward_reports | comment_ids |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | [] | False | nf_hades |  | [] |  | text | t2_hriq1b | False | False | ... |  |  |  |  |  |  |  |  |  | gjacwx5,gjad2l6,gjadatw,gjadc7w,gjadcwh,gjadgd... |
| 1 | [] | False | MyLittleDeku |  | [] |  | text | t2_7dj62vj2 | False | False | ... |  |  |  |  |  |  |  |  |  | gjacn1r |
| 2 | [] | False | lilirucaarde12 |  | [] |  | text | t2_6i04uaxw | False | False | ... |  |  |  |  |  |  |  |  |  | gjac5fb,gjacdy5,gjaco45,gjasj4f,gjbxfeg |
| 3 | [] | False | [deleted] |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  | gjac9d6 |
| 4 | [] | False | sirdimpleton |  | [] |  | text | t2_bznmn4i | False | False | ... |  |  |  |  |  |  |  |  |  | gjaocmg,gjb2jsj,gjbisrw,gjbjbk8 |


In [4]:
len(posts_df)

2500

The data in `posts_df` contains 2500 submissions and their respective metadata, extracted from a subreddit submission search; the `comment_ids` column was added after the search with additional requests. For demonstration purposes, submission IDs are taken from this dataframe, even though the data has already been retrieved.
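In the preview above, `comment_ids` appears as a single comma-separated string per submission. If you need the individual IDs, a minimal sketch along these lines may help. The helper name `split_comment_ids` is hypothetical, and I am assuming that missing values arrive as non-string cells (for example `NaN`), which is how pandas usually reads empty CSV fields.

```python
def split_comment_ids(cell):
    """Hypothetical helper: turn one comma-separated cell into a list of IDs."""
    # Assumption: empty CSV fields are read as non-string values such as NaN.
    if not isinstance(cell, str):
        return []
    # Drop surrounding whitespace and skip empty fragments.
    return [part.strip() for part in cell.split(',') if part.strip()]


# Sample value copied from the dataframe preview above.
print(split_comment_ids('gjac5fb,gjacdy5,gjaco45,gjasj4f,gjbxfeg'))
print(split_comment_ids(float('nan')))
```

With the dataframe loaded, this could be applied as `posts_df['comment_ids'].apply(split_comment_ids)`.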

In [5]:
# create submission ID list
post_ids = posts_df['id'].tolist()
post_ids[:10]

['kxi2w8',
 'kxi2g1',
 'kxhzrl',
 'kxhyh6',
 'kxhwh0',
 'kxhv53',
 'kxhm7b',
 'kxhm3s',
 'kxhg37',
 'kxhak9']
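Before sending a long ID list to the API, it can be worth removing duplicates and obviously malformed entries. The sketch below is one way to do that in plain Python. `clean_ids` is a hypothetical helper, not part of `PMAW`, and the check assumes that submission IDs are lowercase base-36 strings like the ones shown above.

```python
import string

# Assumption: submission IDs use only digits and lowercase letters (base 36).
BASE36_CHARS = set(string.digits + string.ascii_lowercase)


def clean_ids(ids):
    """Hypothetical helper: drop duplicates and malformed IDs, keep first-seen order."""
    seen = set()
    cleaned = []
    for raw in ids:
        candidate = str(raw).strip().lower()
        is_valid = bool(candidate) and set(candidate) <= BASE36_CHARS
        if is_valid and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


# Made-up sample mixing real-looking IDs with a duplicate and bad entries.
sample_ids = ['kxi2w8', 'kxi2g1', 'kxi2w8', ' KXHZRL ', 'not-an-id', '']
print(clean_ids(sample_ids))
```

The order-preserving loop keeps the cleaned list aligned with the original dataframe order, which a plain `set()` would not.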

## Search Submissions by ID

### Using a Single Submission ID

In [6]:
post = api.search_submissions(ids=post_ids[0])
post

Total Success Rate: 100.00% -- Total Reqs: 1 -- Num Retries: 0


[{'all_awardings': [],
  'allow_live_comments': False,
  'author': 'nf_hades',
  'author_flair_richtext': [],
  'author_flair_type': 'text',
  'author_fullname': 't2_hriq1b',
  'author_patreon_flair': False,
  'author_premium': False,
  'awarders': [],
  'can_mod_post': False,
  'content_categories': ['entertainment'],
  'contest_mode': False,
  'created_utc': 1610668203,
  'domain': 'self.anime',
  'full_link': 'https://www.reddit.com/r/anime/comments/kxi2w8/stop_complaining_about_the_thighs_in/',
  'gildings': {},
  'id': 'kxi2w8',
  'is_crosspostable': True,
  'is_meta': False,
  'is_original_content': False,
  'is_reddit_media_domain': False,
  'is_robot_indexable': True,
  'is_self': True,
  'is_video': False,
  'link_flair_background_color': '#7193ff',
  'link_flair_css_class': 'discussion',
  'link_flair_richtext': [],
  'link_flair_template_id': 'eeafce2a-7ef5-11e8-a46a-0e47aad96570',
  'link_flair_text': 'Discussion',
  'link_flair_text_color': 'light',
  'link_flair_type': 't

### Using Multiple Submission IDs

In [7]:
%%time
posts_arr = api.search_submissions(ids=post_ids)

Total Success Rate: 66.67% -- Total Reqs: 3 -- Num Retries: 0
Total Success Rate: 75.00% -- Total Reqs: 4 -- Num Retries: 1
Wall time: 3.55 s


In [8]:
print(f'{len(posts_arr)} submissions returned by Pushshift')

2500 submissions returned by Pushshift
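A matching count is a good sign, but it does not by itself show that every requested ID came back. Since some requests needed a retry, I find it useful to compare the two ID sets directly. This is a minimal sketch: `find_missing_ids` is a hypothetical helper, and it assumes each result is a dict with an `'id'` key, as in the output shown earlier.

```python
def find_missing_ids(requested_ids, results):
    """Hypothetical helper: list requested IDs that have no matching result."""
    # Assumption: every result is a dict that normally carries an 'id' key.
    returned = {item['id'] for item in results if 'id' in item}
    return [post_id for post_id in requested_ids if post_id not in returned]


# Made-up sample: three IDs requested, two returned in a different order.
requested = ['kxi2w8', 'kxi2g1', 'kxhzrl']
results = [{'id': 'kxhzrl'}, {'id': 'kxi2w8'}]
print(find_missing_ids(requested, results))
```

In this notebook the call would be `find_missing_ids(post_ids, posts_arr)`; an empty list means nothing was lost.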


In [9]:
posts_arr[:3]

[{'all_awardings': [],
  'allow_live_comments': False,
  'author': 'tsundere_yanji',
  'author_flair_richtext': [],
  'author_flair_type': 'text',
  'author_fullname': 't2_93pnz1yg',
  'author_patreon_flair': False,
  'author_premium': False,
  'awarders': [],
  'can_mod_post': False,
  'content_categories': ['entertainment'],
  'contest_mode': False,
  'created_utc': 1610083181,
  'domain': 'self.anime',
  'full_link': 'https://www.reddit.com/r/anime/comments/ksvwhz/i_just_want_to_share_my_anime_songs_playlist/',
  'gildings': {},
  'id': 'ksvwhz',
  'is_crosspostable': False,
  'is_meta': False,
  'is_original_content': False,
  'is_reddit_media_domain': False,
  'is_robot_indexable': False,
  'is_self': True,
  'is_video': False,
  'link_flair_background_color': '#646d73',
  'link_flair_css_class': 'misc',
  'link_flair_richtext': [],
  'link_flair_template_id': '06c1953e-7ef6-11e8-8fad-0eb8e5dc3b5c',
  'link_flair_text': 'Misc.',
  'link_flair_text_color': 'light',
  'link_flair_ty

### Convert to Dataframe

In [10]:
# convert submissions to dataframe
new_posts_df = pd.DataFrame(posts_arr)

In [11]:
new_posts_df.head(3)

|  | all_awardings | allow_live_comments | author | author_flair_richtext | author_flair_type | author_fullname | author_patreon_flair | author_premium | awarders | can_mod_post | ... | author_flair_css_class | author_flair_text_color | media_metadata | author_flair_template_id | author_flair_text | banned_by | edited | author_cakeday | distinguished | gilded |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | [] | False | tsundere_yanji | [] | text | t2_93pnz1yg | False | False | [] | False | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | [] | False | CytoPlasm129 | [] | text | t2_3c4ctvpa | False | False | [] | False | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | [] | False | TheTealAnusLearn | [] | text | t2_8910u693 | False | False | [] | False | ... |  |  |  |  |  |  |  |  |  |  |
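The `created_utc` field in the results is a Unix timestamp in seconds, which is hard to read at a glance. As a rough sketch, the records can be given a readable UTC timestamp using only the standard library. The helper `add_created_datetime` and the `created_iso` key are hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


def add_created_datetime(records):
    """Hypothetical helper: copy each record and add an ISO 8601 UTC timestamp."""
    enriched = []
    for record in records:
        item = dict(record)  # shallow copy, so the input records stay unchanged
        created = datetime.fromtimestamp(item['created_utc'], tz=timezone.utc)
        item['created_iso'] = created.isoformat()  # 'created_iso' is an illustrative key
        enriched.append(item)
    return enriched


# Sample record using the id and created_utc values from the single-ID result above.
sample = [{'id': 'kxi2w8', 'created_utc': 1610668203}]
print(add_created_datetime(sample)[0]['created_iso'])  # 2021-01-14T23:50:03+00:00
```

On the dataframe itself, `pd.to_datetime(new_posts_df['created_utc'], unit='s', utc=True)` should give the equivalent column in one step.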
