# Introduction
This Jupyter notebook tests out the PRAW library to get an idea of how to use it for scraping Reddit. It requires the `praw` library, which can be installed directly (e.g. `pip install praw`) or in a conda environment.

# Set up a reddit instance
First, we need to set up a Reddit instance, which requires registering a Reddit application. Head over to [the authorized applications page](https://www.reddit.com/prefs/apps) and create a new script at the bottom. A title, description and redirect URI are required (use `http://localhost:8080` for the redirect URI). Once your app has been created, you will have the credentials needed to start using the Reddit API. The `user_agent` is the name of the application; the `client_id` is the line of gibberish next to the icon of the application; and the `client_secret` is the gibberish labelled `secret`.

In [1]:
import praw
reddit = praw.Reddit(
    user_agent="IsItMould?",
    client_id="EXRL43UtmsdPymRwqZErkg",
    client_secret="JrkEkIWSzI_srcUvJeh24Q__qvS3VQ",
)
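Hard-coding credentials in a notebook makes them easy to leak. As a minimal sketch, the three values could instead be read from environment variables; the variable names (`REDDIT_USER_AGENT`, `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`) and the helper `load_reddit_credentials` are assumptions made for this example, not something PRAW prescribes.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Collect the keyword arguments used above from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    keys = {
        "user_agent": "REDDIT_USER_AGENT",
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
    }
    missing = [var for var in keys.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in keys.items()}


# Usage (needs praw installed and the variables set):
# import praw
# reddit = praw.Reddit(**load_reddit_credentials())
```

Passing a plain dict as `env` makes the helper easy to try out without touching the real environment.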

# Get and filter data from subreddit
Next, let's try to obtain some data from our subreddit of interest: `r/kombucha`. First, let's print the attributes and methods for the latest post. Then we will get the last 5 posts by date and output some basic information like the post id, timestamp and title.

## Available attributes
Let's have a look at the available attributes and methods for the objects that are returned by `praw`.

In [105]:
# Get the attributes for a subreddit
print(dir(reddit.subreddit('kombucha')))

# Get the attributes for a post
last_posts = reddit.subreddit('kombucha').new(limit=1)
for post in last_posts:
    print(vars(post))

['MESSAGE_PREFIX', 'STR_FIELD', 'VALID_TIME_FILTERS', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_convert_to_fancypants', '_create_or_update', '_fetch', '_fetch_data', '_fetch_info', '_fetched', '_kind', '_parse_xml_response', '_path', '_reddit', '_reset_attributes', '_safely_add_arguments', '_submission_class', '_submit_media', '_subreddit_collections_class', '_subreddit_list', '_upload_inline_media', '_upload_media', '_url_parts', '_validate_gallery', '_validate_inline_media', '_validate_time_filter', 'banned', 'collections', 'comments', 'contributor', 'controversial', 'display_name', 'emoji', 'filters', 'flair', 'fullname', 'gilded', 'hot', 'message', 'mod', 'moderator', 

## Return the last posts
Seems that the last posts can be returned using `.new()`. We can then extract useful information like the post id, the timestamp and the post title.

In [48]:
# Import datetime
import datetime

# Grab the last 5 posts
last_posts = reddit.subreddit('kombucha').new(limit=5)

# Function to determine date and time posted
# courtesy of https://www.reddit.com/r/learnprogramming/comments/37kr5n/praw_is_it_possible_to_get_post_time_and_date/
def get_date(submission):
    # created_utc is a Unix timestamp; fromtimestamp() converts it to local time
    timestamp = submission.created_utc
    return datetime.datetime.fromtimestamp(timestamp)

# Print post id, date and title
print("Last 5 posts from the kombucha reddit:\n")
print("id\ttimestamp\ttitle")
for post in last_posts:
    attrs = (post.id, get_date(post), post.title)
    print("%s\t%s\t%s" % attrs)

Last 5 posts from the kombucha reddit:

id	timestamp	title
t7rlvt	2022-03-06 06:07:52	Scoby hotel question
t7qbhc	2022-03-06 04:48:15	Kratom as a singular base in 1f
t7o1p7	2022-03-06 02:38:07	Is this normal??
t7kec5	2022-03-05 23:23:52	Third batch, four days in, but what is that dark area within the scoby — and what are these dark thread-like things in the liquid? Newbie here, still learning.
t7k2i9	2022-03-05 23:07:08	5 Days into F1 (second batch), do these white bumps look anything like mold to you?
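The timestamps above are printed in the local time zone of the machine running the notebook, so they will differ between machines. As a small sketch, the conversion could be pinned to UTC instead. The helper name `to_utc` is made up for this example, and the epoch value is the `created_utc` of one of the Pushshift results shown later in this notebook.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


print(to_utc(1644436132).isoformat())  # 2022-02-09T19:48:52+00:00
```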


## Filter by flair
To create our scraper, we are looking for posts that contain images of pellicles. Specifically, we want images of pellicles that have been classified as either "mold!" or "not mold" by their flair. It seems to be possible to filter posts by flair.

In [61]:
# Obtain the last 5 mold posts
mold_posts = reddit.subreddit('kombucha').search(query='flair:"mold!"', sort='new', limit = 5, syntax='lucene')
print("Last 5 mold posts from the kombucha reddit:\n")
print("id\ttimestamp\ttitle")
for post in mold_posts:
    attrs = (post.id, get_date(post), post.title)
    print("%s\t%s\t%s" % attrs)

Last 5 mold posts from the kombucha reddit:

id	timestamp	title
t6a39m	2022-03-04 05:03:15	Mold?
t4k01s	2022-03-01 23:18:08	hey guys, that looks normal? or it's kind of mold?
t3i3pd	2022-02-28 16:26:36	Any idea if this is normal pellicle formation, or mold?
t2cb8t	2022-02-27 02:17:09	My first time making kombucha. This does look like mold? Should I throw it out?
t11wwj	2022-02-25 12:34:45	Ignored this for… a year? Definitely has a dried leathery top that extends about an inch down. Should I (try) take it out and cut off the dried part and then add more sweet tea?


In [68]:
# Obtain the last 5 not mold posts
not_mold_posts = reddit.subreddit('kombucha').search(query='flair:"not mold"', sort='new', limit = 5, syntax='lucene')
print("\nLast 5 not mold posts from the kombucha reddit:\n")
print("id\ttimestamp\ttitle")
for post in not_mold_posts:
    attrs = (post.id, get_date(post), post.title)
    print("%s\t%s\t%s" % attrs)


Last 5 not mold posts from the kombucha reddit:

id	timestamp	title
t6a39m	2022-03-04 05:03:15	Mold?
t3i3pd	2022-02-28 16:26:36	Any idea if this is normal pellicle formation, or mold?
t2cb8t	2022-02-27 02:17:09	My first time making kombucha. This does look like mold? Should I throw it out?
t11wwj	2022-02-25 12:34:45	Ignored this for… a year? Definitely has a dried leathery top that extends about an inch down. Should I (try) take it out and cut off the dried part and then add more sweet tea?
t0n4vh	2022-02-24 23:20:59	First time making kombucha. Is this mold?


That didn't seem to work: the flairs are different, yet the posts overlap. Switching between the `lucene`, `cloudsearch` and `plain` syntaxes also didn't do the trick. On the Reddit website, `flair:"not mold"` returns just the `not mold` posts. As an alternative, can we get all `mold` posts and then separate them by their flair?

In [86]:
mold_posts = reddit.subreddit('kombucha').search(
    query='flair:"*mold*"', 
    sort='new', 
    limit = 5, 
    syntax='lucene')
print("\nLast 5 mold or not mold posts from the kombucha reddit:\n")
print("id\ttimestamp\tflair\ttitle")
for post in mold_posts:
    attrs = (post.id, get_date(post), post.link_flair_text, post.title)
    print("%s\t%s\t%s\t%s" % attrs)


Last 5 mold or not mold posts from the kombucha reddit:

id	timestamp	flair	title
t6a39m	2022-03-04 05:03:15	not mold	Mold?
t4k01s	2022-03-01 23:18:08	mold!	hey guys, that looks normal? or it's kind of mold?
t3i3pd	2022-02-28 16:26:36	not mold	Any idea if this is normal pellicle formation, or mold?
t2cb8t	2022-02-27 02:17:09	not mold	My first time making kombucha. This does look like mold? Should I throw it out?
t11wwj	2022-02-25 12:34:45	not mold	Ignored this for… a year? Definitely has a dried leathery top that extends about an inch down. Should I (try) take it out and cut off the dried part and then add more sweet tea?


That worked! These posts match the results on the reddit website, and this seems sufficient to scrape the posts (and hopefully images) we need for our classifier.
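Since the search returns both classes at once, the split into `mold!` and `not mold` has to happen on our side. A minimal sketch of that step, using stand-in objects instead of real PRAW submissions: `group_by_flair` is a hypothetical helper, and the third sample post (`abc123`) is made up to show that other flairs are dropped.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_flair(posts, wanted=("mold!", "not mold")):
    """Map each wanted flair to the ids of the posts carrying it."""
    groups = defaultdict(list)
    for post in posts:
        flair = getattr(post, "link_flair_text", None)
        if flair in wanted:
            groups[flair].append(post.id)
    return dict(groups)


# Stand-ins for PRAW submissions; only the attributes used above are present
sample = [
    SimpleNamespace(id="t6a39m", link_flair_text="not mold"),
    SimpleNamespace(id="t4k01s", link_flair_text="mold!"),
    SimpleNamespace(id="abc123", link_flair_text="question"),  # made-up post
]
print(group_by_flair(sample))
```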

## Image from post
Next, we need to obtain the URLs to images that have been attached to these posts. First, let's try to get these from the post URL.

In [89]:
mold_posts = reddit.subreddit('kombucha').search(
    query='flair:"*mold*"', 
    sort='new', 
    limit = 5, 
    syntax='lucene')
print("\nLast 5 mold or not mold posts from the kombucha reddit:\n")
print("id\ttimestamp\tflair\turl")
for post in mold_posts:
    attrs = (post.id, get_date(post), post.link_flair_text, post.url)
    print("%s\t%s\t%s\t%s" % attrs)


Last 5 mold or not mold posts from the kombucha reddit:

id	timestamp	flair	url
t6a39m	2022-03-04 05:03:15	not mold	https://www.reddit.com/gallery/t6a39m
t4k01s	2022-03-01 23:18:08	mold!	https://www.reddit.com/gallery/t4k01s
t3i3pd	2022-02-28 16:26:36	not mold	https://www.reddit.com/gallery/t3i3pd
t2cb8t	2022-02-27 02:17:09	not mold	https://i.redd.it/sry5fwcf1ak81.jpg
t11wwj	2022-02-25 12:34:45	not mold	https://www.reddit.com/gallery/t11wwj


It seems that in some posts, the URL points directly to an image, whereas in others, it points to a gallery. Perhaps the `media_metadata` attribute allows us to access all images within the gallery?

In [107]:
mold_posts = reddit.subreddit('kombucha').search(
    query='flair:"*mold*"', 
    sort='new', 
    limit = 5, 
    syntax='lucene')
print("\nLast 5 mold or not mold posts from the kombucha reddit:\n")
print("id\ttimestamp\tflair\turl")
for post in mold_posts:
    #print(dir(post))
    #break
    attrs = (post.id, get_date(post), post.link_flair_text, post.media_metadata)
    print("%s\t%s\t%s\t%s\n\n" % attrs)


Last 5 mold or not mold posts from the kombucha reddit:

id	timestamp	flair	url
t6a39m	2022-03-04 05:03:15	not mold	{'5ik11nnmjal81': {'status': 'valid', 'e': 'Image', 'm': 'image/jpg', 'p': [{'y': 192, 'x': 108, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=108&crop=smart&auto=webp&s=a9a3d6ad8eebfc9cd31e14e400b1f691d7d5372f'}, {'y': 384, 'x': 216, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=216&crop=smart&auto=webp&s=4de944ecc985afad908ca87a1d0cd4ed4840d78b'}, {'y': 568, 'x': 320, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=320&crop=smart&auto=webp&s=de608d065ee8d9249989fff3f0d5bef50803501b'}, {'y': 1137, 'x': 640, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=640&crop=smart&auto=webp&s=66884928f830099b8ba674e953a4ff63d6d82b77'}, {'y': 1706, 'x': 960, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=960&crop=smart&auto=webp&s=27453143cb7b5dc12b48543c99c914eea151d657'}, {'y': 1920, 'x': 1080, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?widt

AttributeError: 'Submission' object has no attribute 'media_metadata'

This returns an error, because one of the submissions doesn't have `media_metadata`. It seems that this is the post whose URL pointed directly to an image. Can we avoid this problem by checking if the URL points to an image?

In [109]:
mold_posts = reddit.subreddit('kombucha').search(
    query='flair:"*mold*"', 
    sort='new', 
    limit = 5, 
    syntax='lucene')
print("\nLast 5 mold or not mold posts from the kombucha reddit:\n")
print("id\ttimestamp\tflair\turl")
for post in mold_posts:
    if post.url.endswith(('.jpg', '.png', '.gif', '.jpeg')):
        attrs = (post.id, get_date(post), post.link_flair_text, post.url)
    else:
        attrs = (post.id, get_date(post), post.link_flair_text, post.media_metadata)
    print("%s\t%s\t%s\t%s\n\n" % attrs)


Last 5 mold or not mold posts from the kombucha reddit:

id	timestamp	flair	url
t6a39m	2022-03-04 05:03:15	not mold	{'5ik11nnmjal81': {'status': 'valid', 'e': 'Image', 'm': 'image/jpg', 'p': [{'y': 192, 'x': 108, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=108&crop=smart&auto=webp&s=a9a3d6ad8eebfc9cd31e14e400b1f691d7d5372f'}, {'y': 384, 'x': 216, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=216&crop=smart&auto=webp&s=4de944ecc985afad908ca87a1d0cd4ed4840d78b'}, {'y': 568, 'x': 320, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=320&crop=smart&auto=webp&s=de608d065ee8d9249989fff3f0d5bef50803501b'}, {'y': 1137, 'x': 640, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=640&crop=smart&auto=webp&s=66884928f830099b8ba674e953a4ff63d6d82b77'}, {'y': 1706, 'x': 960, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?width=960&crop=smart&auto=webp&s=27453143cb7b5dc12b48543c99c914eea151d657'}, {'y': 1920, 'x': 1080, 'u': 'https://preview.redd.it/5ik11nnmjal81.jpg?widt

Success! Although it's probably safest to still check whether the `media_metadata` attribute exists once we develop our true scraper.
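One way to make that check explicit is `getattr` with a default, which avoids the `AttributeError` altogether. The sketch below sorts a post into one of three kinds; `classify_post` and the labels it returns are assumptions for illustration, and the stand-in objects only mimic the two attributes used here.

```python
from types import SimpleNamespace

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def classify_post(post):
    """Return 'image', 'gallery' or 'other' for a submission-like object."""
    if post.url.lower().endswith(IMAGE_SUFFIXES):
        return "image"
    # getattr with a default returns None instead of raising AttributeError
    if getattr(post, "media_metadata", None):
        return "gallery"
    return "other"


direct = SimpleNamespace(url="https://i.redd.it/sry5fwcf1ak81.jpg")
gallery = SimpleNamespace(
    url="https://www.reddit.com/gallery/t6a39m",
    media_metadata={"5ik11nnmjal81": {"status": "valid"}},
)
print(classify_post(direct), classify_post(gallery))
```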

## Parse the `media_metadata`
Now that we know that the image URLs can be stored in the `media_metadata` attribute, it's time to get the image URLs out. It seems that the `media_metadata` attribute is a dict, which contains one key:value pair per image, and that for each image, links to several resized versions are stored under the `p` key. Another link is stored under the `s` key, which appears not to be scaled, so I assume that this is the original submission. For now, let's select these URLs, although for our classifier purposes, they are much too large, and we may choose to select a resized image (e.g. `x = 1080`).

In [175]:
mold_posts = reddit.subreddit('kombucha').search(
    query='flair:"*mold*"', 
    sort='new', 
    limit = 10, 
    syntax='lucene')
print("\nLast 5 mold or not mold posts from the kombucha reddit:\n")
print("id\ttimestamp\tflair\turl")
for post in mold_posts:
    if post.url.endswith(('.jpg', '.png', '.gif', '.jpeg')):
        attrs = (post.id, get_date(post), post.link_flair_text, post.url)
        print("%s\t%s\t%s\t%s" % attrs)
    else:
        attrs = (post.id, get_date(post), post.link_flair_text)
        try:
            media_metadata = post.media_metadata
            for id in media_metadata.keys():
                url = media_metadata[id]["s"]["u"]
                sub_attrs = attrs + (url,)
                print("%s\t%s\t%s\t%s" % sub_attrs)
        except AttributeError:
            # Most likely a video - skip
            continue


Last 5 mold or not mold posts from the kombucha reddit:

id	timestamp	flair	url
t6a39m	2022-03-04 05:03:15	not mold	https://preview.redd.it/5ik11nnmjal81.jpg?width=2268&format=pjpg&auto=webp&s=bc64f6eeb7a9a510df20a3fcb9c93ae882a8f0fe
t6a39m	2022-03-04 05:03:15	not mold	https://preview.redd.it/hdkxlonmjal81.jpg?width=2268&format=pjpg&auto=webp&s=1890eced03d9199751ca813fac984138367ce3e5
t4k01s	2022-03-01 23:18:08	mold!	https://preview.redd.it/eh1lhbe8kuk81.jpg?width=3024&format=pjpg&auto=webp&s=b6c10d625d5cd71b2e1239897535fe29bd607555
t4k01s	2022-03-01 23:18:08	mold!	https://preview.redd.it/hp1gmbe8kuk81.jpg?width=3024&format=pjpg&auto=webp&s=cee7d91729704155afdba4eb641f9d2d785c561b
t3i3pd	2022-02-28 16:26:36	not mold	https://preview.redd.it/brdgv5rwdlk81.jpg?width=3024&format=pjpg&auto=webp&s=7cec490c2602a8f1e0db0bf374da6deb0d026fca
t3i3pd	2022-02-28 16:26:36	not mold	https://preview.redd.it/bysx86rwdlk81.jpg?width=3024&format=pjpg&auto=webp&s=6f5a7182eb7ce0d7a2fa6e36da5634dd23ad1b6b
t
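If the originals turn out to be too large, a resized preview could be picked instead. The sketch below chooses the entry whose width `x` is closest to a target. It assumes that the `s` entry carries `x` and `u` keys just like the `p` entries do, and the sample URLs are placeholders rather than real Reddit links.

```python
def pick_preview(item, target_width=1080):
    """Return the URL of the variant whose width is closest to target_width.

    `item` is one value of the media_metadata dict (one image).
    """
    candidates = list(item.get("p", []))
    if "s" in item:
        candidates.append(item["s"])
    # Keep only entries that have both a width and a URL
    candidates = [c for c in candidates if "x" in c and "u" in c]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(c["x"] - target_width))
    return best["u"]


# Placeholder data shaped like the output above
sample_item = {
    "p": [
        {"y": 192, "x": 108, "u": "https://example.invalid/w108.jpg"},
        {"y": 1137, "x": 640, "u": "https://example.invalid/w640.jpg"},
        {"y": 1920, "x": 1080, "u": "https://example.invalid/w1080.jpg"},
    ],
    "s": {"y": 4032, "x": 2268, "u": "https://example.invalid/original.jpg"},
}
print(pick_preview(sample_item))
```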

# Download and store images
Now we have URLs to a bunch of images (sometimes multiple from the same post) and their classification. Let's see if we can download these images using `urllib` requests.

In [171]:
import urllib.request
import os.path
import time

# Downloads an image to path
def download_img(url, path, name):
    img_path = os.path.join(path, name)
    with open(img_path, "wb") as f:
        f.write(urllib.request.urlopen(url).read())

# Simplifies the classification
def get_class(flair):
    cl = "mold_1" if flair == "mold!" else "not_mold_0"
    return cl

# Define path
path = "/Users/guus/Downloads"

# Get the last 5 mold/not mold posts
mold_posts = reddit.subreddit('kombucha').search(
    query='flair:"*mold*"', 
    sort='new', 
    limit = 5, 
    syntax='lucene')

# Loop through posts and save the image
for post in mold_posts:
    if post.url.endswith(('.jpg', '.png', '.gif', '.jpeg')):
        url = post.url
        id = post.id
        cl = get_class(post.link_flair_text)
        img_name, img_type = os.path.splitext(url)
        img_name = os.path.basename(img_name)
        name = id + "_" + img_name + "_" + cl + img_type
        download_img(url, path, name)
        
        # 1 request per 2 seconds allowed (apparently)
        time.sleep(2)

    else:
        try:
            media_metadata = post.media_metadata
            for image_id in media_metadata.keys():
                url = media_metadata[image_id]["s"]["u"]
                cl = get_class(post.link_flair_text)
                # The MIME type looks like "image/jpg"; turn it into ".jpg"
                img_type = "." + media_metadata[image_id]["m"].replace("image/", "")
                name = post.id + "_" + image_id + "_" + cl + img_type
                download_img(url, path, name)

                # 1 request per 2 seconds allowed (apparently)
                time.sleep(2)
        except AttributeError:
            # No media_metadata (most likely a video) - skip
            continue


AttributeError: 'Submission' object has no attribute 'media_metadata'
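The file names are built from the post id, the image id, the class and the extension. The sketch below isolates that naming logic so it can be checked without downloading anything. The helper names (`label_for`, `name_from_url`, `name_from_metadata`) are made up for this example; parsing the URL first drops any `?width=...` query string before the extension is read.

```python
import os.path
from urllib.parse import urlparse


def label_for(flair):
    """Same mapping as get_class above: 'mold!' is the positive class."""
    return "mold_1" if flair == "mold!" else "not_mold_0"


def name_from_url(post_id, flair, url):
    """File name for a post whose URL points directly to an image."""
    path = urlparse(url).path  # ignores any query string
    stem, ext = os.path.splitext(os.path.basename(path))
    return f"{post_id}_{stem}_{label_for(flair)}{ext}"


def name_from_metadata(post_id, flair, image_id, mime):
    """File name for a gallery image, with the extension taken from the MIME type."""
    ext = "." + mime.split("/", 1)[1]
    return f"{post_id}_{image_id}_{label_for(flair)}{ext}"


print(name_from_url("t2cb8t", "not mold", "https://i.redd.it/sry5fwcf1ak81.jpg"))
print(name_from_metadata("t6a39m", "not mold", "5ik11nnmjal81", "image/jpg"))
```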

# Number of available datapoints
Now that we know that we can download images with a classification, let's inspect how large our dataset will be.

In [167]:
# Get all available mold/not mold posts
mold_posts = reddit.subreddit('kombucha').search(
    query='flair:"*mold*"', 
    sort='new', 
    limit = None, 
    time_filter = "all",
    syntax='lucene')

n = 0
print("id\ttimestamp\tflair\ttitle")
for post in mold_posts:
    attrs = (post.id, get_date(post), post.link_flair_text, post.title)
    print("%s\t%s\t%s\t%s" % attrs)
    n += 1
print("Number of posts:\t" + str(n))

id	timestamp	flair	title
t6a39m	2022-03-04 05:03:15	not mold	Mold?
t4k01s	2022-03-01 23:18:08	mold!	hey guys, that looks normal? or it's kind of mold?
t3i3pd	2022-02-28 16:26:36	not mold	Any idea if this is normal pellicle formation, or mold?
t2cb8t	2022-02-27 02:17:09	not mold	My first time making kombucha. This does look like mold? Should I throw it out?
t11wwj	2022-02-25 12:34:45	not mold	Ignored this for… a year? Definitely has a dried leathery top that extends about an inch down. Should I (try) take it out and cut off the dried part and then add more sweet tea?
t0n4vh	2022-02-24 23:20:59	not mold	First time making kombucha. Is this mold?
t05i3u	2022-02-24 08:58:56	mold!	is this mold?
sz7n44	2022-02-23 04:58:16	not mold	Not sure what to do next.
sz3x3e	2022-02-23 02:05:50	mold!	It’s definitely mold 😭
syzoqz	2022-02-22 23:01:48	not mold	Is this mold?
swjncj	2022-02-19 21:54:24	mold!	Looks like a couple mold spots, is the black mold too?
sw522h	2022-02-19 09:13:23	mold!	RIP Scoby
svp

There appears to be a limit on the number of posts returned: only 248 posts were returned, but the oldest date is not very far in the past.
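Besides the total, the balance between the two classes matters for a classifier. A small sketch of counting posts per flair with `collections.Counter`; the stand-in objects carry the flairs of the first five rows of the output above, and `count_flairs` is a hypothetical helper name.

```python
from collections import Counter
from types import SimpleNamespace


def count_flairs(posts):
    """Count how many posts carry each flair (None if a post has no flair)."""
    return Counter(getattr(post, "link_flair_text", None) for post in posts)


# Flairs of the first five rows printed above
flairs = ["not mold", "mold!", "not mold", "not mold", "not mold"]
sample_posts = [SimpleNamespace(link_flair_text=f) for f in flairs]

for flair, n in count_flairs(sample_posts).most_common():
    print(f"{flair}\t{n}")
```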

# Moving from PRAW to Pushshift
Pushshift maintains a copy of all Reddit posts and comments, together with an API intended for big queries. This means that it can be used to query further back than PRAW allows. Below is an API call that queries the kombucha subreddit for the 5 latest posts. The output is a JSON object with all the familiar attributes.

```
https://api.pushshift.io/reddit/search/submission/?subreddit=kombucha&sort=desc&sort_type=created_utc&size=5
```

However, there are a few issues that need to be overcome when using Pushshift:

* Pushshift doesn't have any method for filtering by flair, though flairs are available in the output. This means we'll need to devise a different way of finding mould-related posts;
* Pushshift stores posts as they were submitted. This means that any later update to a post might not be included in the database. Thus, any flair update to `mold!` or `not mold` will not have been included, and such posts are likely to still be labeled as `what's wrong!?` or `question`.

Since we're looking for images, we can use the query `.jpg` to find any posts with images. This will likely return many posts not related to mould, so we will need to come up with a way to identify mould-related posts. We could maybe do this by using Pushshift to identify all submissions with one or more images, and then use PRAW to obtain the flair for those submission IDs.

In a way, only having the original flair from Pushshift is an advantage, because some users initially label their question about mould as `mold!`, despite not knowing whether it is mould. Those labels could be wrong if not updated by the user. By filtering those submissions out, we improve our final data set.

In [180]:
import json

q = 'https://api.pushshift.io/reddit/search/submission/?subreddit=kombucha&sort=desc&sort_type=created_utc&size=5&q=".jpg"'
response = urllib.request.urlopen(q)
data = response.read()
result = json.loads(data)
print(result)


{'data': [{'all_awardings': [], 'allow_live_comments': False, 'author': 'LWYMMD_1989', 'author_flair_css_class': None, 'author_flair_richtext': [], 'author_flair_text': None, 'author_flair_type': 'text', 'author_fullname': 't2_573lddsh', 'author_is_blocked': False, 'author_patreon_flair': False, 'author_premium': False, 'awarders': [], 'can_mod_post': False, 'contest_mode': False, 'created_utc': 1644436132, 'domain': 'self.Kombucha', 'full_link': 'https://www.reddit.com/r/Kombucha/comments/som5vk/any_ideas_whats_going_on_here/', 'gildings': {}, 'id': 'som5vk', 'is_created_from_ads_ui': False, 'is_crosspostable': True, 'is_meta': False, 'is_original_content': False, 'is_reddit_media_domain': False, 'is_robot_indexable': True, 'is_self': True, 'is_video': False, 'link_flair_background_color': '', 'link_flair_richtext': [], 'link_flair_text_color': 'dark', 'link_flair_type': 'text', 'locked': False, 'media_metadata': {'o75euy2d3vg81': {'e': 'Image', 'id': 'o75euy2d3vg81', 'm': 'image/jpg'
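The query string above is written by hand, including unescaped quotes. As a sketch, the same URL could be assembled with `urllib.parse.urlencode`, which percent-encodes the values. The helper `pushshift_url` is made up for this example; the parameter names are the ones used in the call above.

```python
from urllib.parse import urlencode

PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def pushshift_url(subreddit, query=None, size=5, sort="desc"):
    """Build a Pushshift submission-search URL with encoded parameters."""
    params = {
        "subreddit": subreddit,
        "sort": sort,
        "sort_type": "created_utc",
        "size": size,
    }
    if query is not None:
        params["q"] = query
    return PUSHSHIFT_SUBMISSIONS + "?" + urlencode(params)


print(pushshift_url("kombucha", query='".jpg"'))
```

The quotes around `.jpg` end up as `%22` in the resulting URL.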

In [192]:
ids = ["t3_som5vk", "t3_soekp2"]
full_ids = [i if i.startswith('t3_') else f't3_{i}' for i in ids]
posts = reddit.info(full_ids)
for post in posts:
    print(post.link_flair_text)

None
kahm!


That seems to work. We first fetched the posts using Pushshift, and then batch queried PRAW using the post ids. We could do this in small batches to then fetch all the data. Next question: can we fetch posts from further back? To test this, let's sort the posts ascending, rather than descending. This should get us the first ever image posts in the subreddit. Because flairs may not have been in use back then, let's just print the titles.

In [193]:
q = 'https://api.pushshift.io/reddit/search/submission/?subreddit=kombucha&sort=asc&sort_type=created_utc&size=5&q=".jpg"'
response = urllib.request.urlopen(q)
data = response.read()
result = json.loads(data)
print(result)

{'data': [{'author': 'dadankskunk', 'author_flair_css_class': None, 'author_flair_text': None, 'created_utc': 1338358063, 'domain': 'self.Kombucha', 'full_link': 'https://www.reddit.com/r/Kombucha/comments/ubql2/made_my_first_kombucha_just_bottled_pics/', 'id': 'ubql2', 'is_self': True, 'media_embed': {}, 'num_comments': 8, 'over_18': False, 'permalink': '/r/Kombucha/comments/ubql2/made_my_first_kombucha_just_bottled_pics/', 'score': 5, 'selftext': "So I made the kombucha from GT's No 9, which is blueberry I think.  I made the tea from Gen Maicha, which I had some left over of.  I purchased it from Teavana I think... Good tea company.  Anyway, it's been 2-3 weeks and this mushroom was a lot thicker than I expected!  It looked thin and floppy from the outside, but the moment I picked it up I knew it was a big one, thick and firm... That sounded wrong but it's the truth.    I threw in some lemon, lime, and orange peels in the first fermentation, I don't know how it'll taste, but I threw 

In [195]:
ids = ["ubql2", "227d3r"]
full_ids = [i if i.startswith('t3_') else f't3_{i}' for i in ids]
posts = reddit.info(full_ids)
for post in posts:
    print(post.title)

Made my first kombucha, just bottled, pics!  
Any Boston area Kombuchittors?
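The two-step approach used in this section (Pushshift for the ids, PRAW for the current flair) could be wired together roughly as follows. This is only a sketch: the helper names and the batch size of 100 are assumptions, and PRAW may well split large requests internally already.

```python
def ids_from_pushshift(result):
    """Extract the submission ids from a parsed Pushshift response."""
    return [entry["id"] for entry in result.get("data", [])]


def to_fullnames(ids):
    """Prefix ids with 't3_' (the submission prefix) unless already present."""
    return [i if i.startswith("t3_") else f"t3_{i}" for i in ids]


def chunked(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage with the objects from this notebook (needs network access):
# for batch in chunked(to_fullnames(ids_from_pushshift(result))):
#     for post in reddit.info(batch):
#         print(post.id, post.link_flair_text)
```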
