# Getting started

## Prerequisites (from the PRAW documentation):

 - **Python knowledge**: You need to know at least a little Python to use PRAW; it's a Python wrapper after all. PRAW supports Python 2.7 and Python 3.3 to 3.6. If you are stuck on a problem, `/r/learnpython` is a great place to ask for help.
 - **Reddit knowledge**: A basic understanding of how `reddit.com` works is a must. If you are not already familiar with Reddit, start with their FAQ.
 - **Reddit Account**: A Reddit account is required to access Reddit’s API. Create one at reddit.com.
 - **Client ID & Client Secret**: These two values are needed to access Reddit’s API as a script application (see Authenticating via OAuth for other application types). If you don’t already have a client ID and client secret, follow Reddit’s [First Steps Guide](https://github.com/reddit/reddit/wiki/OAuth2-Quick-Start-Example#first-steps) to create them.
 - **User Agent**: A user agent is a unique identifier that helps Reddit determine the source of network requests. To use Reddit's API, you need a unique and descriptive user agent. The recommended format is `<platform>:<app ID>:<version string> (by /u/<Reddit username>)`. For example, `android:com.example.myredditapp:v1.2.3 (by /u/kemitche)`. Read more about user-agents at [Reddit's API wiki page](https://github.com/reddit/reddit/wiki/API).
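
The recommended user agent format above can be assembled with a small helper. This is only a sketch: `build_user_agent` is a hypothetical name, not part of PRAW, and the values are the ones from the example in the list.

```python
def build_user_agent(platform, app_id, version, username):
    """Assemble '<platform>:<app ID>:<version string> (by /u/<Reddit username>)'."""
    return "{}:{}:{} (by /u/{})".format(platform, app_id, version, username)

# Values taken from the example in the prerequisites list.
user_agent = build_user_agent("android", "com.example.myredditapp", "v1.2.3", "kemitche")
print(user_agent)
```

The resulting string can be passed as the `user_agent` argument when creating the `praw.Reddit` object.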

## Installing PRAW

For Python in general:

```
pip install praw
```

If running inside Anaconda:

```
conda install --name [myenv] praw
```

OK, the `conda` command above does **not** work. Just install PRAW using `pip`, even inside an Anaconda environment.
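
To confirm that the installation succeeded, you can check whether Python is able to find the package. The sketch below assumes Python 3.4 or newer, because it relies on `importlib.util.find_spec`; `is_installed` is a hypothetical helper name.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None

# Prints True once `pip install praw` has succeeded in the active environment.
print(is_installed("praw"))
```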


## Using PRAW

We'll be using a script application as it is the simplest to work with. You must first register an application of the appropriate type on Reddit.

### Start by authenticating yourself

You'll need four important pieces of information. Fill in each empty string below with your own values.

In [None]:
# The client ID is the 14 character string listed just under “personal use script” 
# for the desired developed application.
client_id = ""

# The client secret is the 27 character string listed adjacent to secret for the application.
client_secret = ""

# The password for the Reddit account used to register the script application.
password = ""

# The username of the Reddit account used to register the script application.
username = ""
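
If you would rather not type your credentials into the notebook, one option is to read them from environment variables. This is a minimal sketch: the `REDDIT_*` variable names and the `load_credentials` helper are assumptions made for illustration, not something PRAW or Reddit defines.

```python
import os

def load_credentials(env=None):
    """Read the four credentials from REDDIT_* variables (hypothetical names)."""
    env = os.environ if env is None else env
    names = ("client_id", "client_secret", "username", "password")
    creds = {name: env.get("REDDIT_" + name.upper(), "") for name in names}
    # Fail early with a clear message instead of failing later during login.
    missing = [name for name in names if not creds[name]]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return creds

# Example usage, once the variables are set in your shell:
# creds = load_credentials()
# client_id, client_secret = creds["client_id"], creds["client_secret"]
# username, password = creds["username"], creds["password"]
```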

Now, with that information, you can perform the authentication:

In [None]:
import praw

reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     password=password,
                     user_agent='praw_playground by /u/' + username,
                     username=username)

# Print some information about the authenticated user to verify the auth process
print(reddit.user.me())

## Get data from a subreddit

In [None]:
r_glasgow = reddit.subreddit('glasgow')
hot_posts = r_glasgow.hot(limit=5)
print("Submissions to r/glasgow:")
for submission in hot_posts:
    print(submission.title)

If you print the type of `hot_posts`, you'll see that it is not a list or an array, as you may have thought. Instead, it is a type defined by PRAW:

In [None]:
print(type(hot_posts))

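It is a `praw.models.listing.generator.ListingGenerator`! It fetches results lazily and can only be iterated over once, which is why later cells call `hot()` again before looping. For now, though, think of it as a list. From now on, I'll be referring to this datatype as a *list*.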
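
The practical difference between a generator and a real list shows up when you loop twice. The sketch below uses a plain Python generator as a stand-in for a PRAW listing (`fake_listing` is a made-up helper), so it runs without any network access.

```python
def fake_listing(titles):
    """Stand-in for a PRAW listing: yields one item at a time."""
    for title in titles:
        yield title

posts = fake_listing(["first", "second", "third"])
first_pass = [title for title in posts]
second_pass = [title for title in posts]  # the generator is already exhausted

print(first_pass)
print(second_pass)

# Wrapping the generator in list() keeps the items around for reuse.
saved = list(fake_listing(["first", "second", "third"]))
print(len(saved))
```

With a real listing, `list(r_glasgow.hot(limit=5))` should give you an actual list that can be indexed and looped over as often as you like.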

### The `Submission` model
Each element of our list is a `Submission`. The properties/methods/members of each submission are: 'approved_at_utc', 'approved_by', 'archived', 'author', 'author_flair_css_class', 'author_flair_text', 'banned_at_utc', 'banned_by', 'brand_safe', 'can_gild', 'can_mod_post', 'clear_vote', 'clicked', 'comment_limit', 'comment_sort', 'comments', 'contest_mode', 'created', 'created_utc', 'crosspost', 'delete', 'disable_inbox_replies', 'distinguished', 'domain', 'downs', 'downvote', 'duplicates', 'edit', 'edited', 'enable_inbox_replies', 'flair', 'fullname', 'gild', 'gilded', 'hidden', 'hide', 'hide_score', 'id', 'id_from_url', 'is_crosspostable', 'is_reddit_media_domain', 'is_self', 'is_video', 'likes', 'link_flair_css_class', 'link_flair_text', 'locked', 'media', 'media_embed', 'mod', 'mod_reports', 'name', 'num_comments', 'num_crossposts', 'num_reports', 'over_18', 'parent_whitelist_status', 'parse', 'permalink', 'pinned', 'quarantine', 'removal_reason', 'reply', 'report', 'report_reasons', 'save', 'saved', 'score', 'secure_media', 'secure_media_embed', 'selftext', 'selftext_html', 'shortlink', 'spoiler', 'stickied', 'subreddit', 'subreddit_id', 'subreddit_name_prefixed', 'subreddit_type', 'suggested_sort', 'thumbnail', 'thumbnail_height', 'thumbnail_width', 'title', 'unhide', 'unsave', 'ups', 'upvote', 'url', 'user_reports', 'view_count', 'visited', 'whitelist_status'.


Crazy, so many. Try printing some of them.

In [None]:
hot_posts = r_glasgow.hot(limit=5)
for submission in hot_posts:
    print(submission.author, submission.is_self, submission.ups, submission.created_utc)
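
If you want to keep a few of these attributes around for later analysis, one option is to copy them into a plain dictionary. The sketch below is an illustration only: `submission_to_dict` is a hypothetical helper, and a `SimpleNamespace` with made-up values stands in for a real `Submission` so the code runs offline.

```python
from types import SimpleNamespace

FIELDS = ("title", "author", "is_self", "ups", "created_utc")

def submission_to_dict(submission, fields=FIELDS):
    """Copy the chosen attributes of a submission-like object into a dict."""
    # getattr with a default avoids an AttributeError if a field is absent.
    return {name: getattr(submission, name, None) for name in fields}

# Stand-in object with made-up values, not real Reddit data.
example = SimpleNamespace(title="Example post", author="example_user",
                          is_self=True, ups=42, created_utc=1510185484.0)
row = submission_to_dict(example)
print(row)
```

With real data you could build one dictionary per submission, for example `[submission_to_dict(s) for s in r_glasgow.hot(limit=5)]`.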

### Searching a subreddit
So, Glasgow is fun, but let's look at a more active subreddit, one with more links than self posts. Let's try r/news:

In [None]:
r_news = reddit.subreddit('news')

Now that we have an interesting subreddit, we can search within it using the `search` method. You can try printing the `help` for the search method; if that's too much for you, just keep reading.

The search method takes one very important parameter, `query`, which is the query string we'll be looking for. The other parameters have default values, so we can ignore them... just for now, though.

In [None]:
q = 'nuclear' # Change the query to something less dangerous.
nuclear_submissions = r_news.search(query=q)
for submission in nuclear_submissions:
    print(submission.title, submission.url)

So many results (PRAW returns up to 100 by default)! Try limiting the search by setting another parameter of the search method, `limit`:

In [None]:
limited_nuclear_submissions = r_news.search(query=q, limit=5)
for submission in limited_nuclear_submissions:
    print(submission.title, submission.url)
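
Besides `limit`, the search method also accepts `sort` and `time_filter`, which narrow results without writing a custom query. The helper below is a small sketch: `search_titles` is a hypothetical name, and the default values chosen here are assumptions for illustration rather than PRAW's own defaults.

```python
def search_titles(subreddit, query, limit=5, sort="new", time_filter="month"):
    """Return the titles of the submissions that match `query`."""
    results = subreddit.search(query, sort=sort, time_filter=time_filter, limit=limit)
    return [submission.title for submission in results]

# Example usage with the `r_news` object created earlier (needs network access):
# print(search_titles(r_news, "nuclear"))
```

Returning a list instead of printing inside the loop makes the results easy to reuse in later cells.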

By default the search is performed using the Lucene syntax, which I don't fully understand yet. What I do understand is that it has certain limitations when searching the Reddit database. One of the main ones is that we cannot specify a timeframe for the posts we search. To overcome this, you can specify which syntax to use for the search. There are three supported syntaxes: `cloudsearch`, `lucene`, and `plain`.

In this case we'll use `cloudsearch`, which you select via the `syntax` parameter. Now let's add a time window and perform our search again.

In [None]:
# Query within a specific time window:
q = "'nuclear' (and timestamp:1420027200..1420070400)"
#    2014-12-31 12:00:00 (UTC)^           ^2015-01-01 00:00:00 (UTC)

timed_nuclear_submissions = r_news.search(q, syntax='cloudsearch', limit=10)
for submission in timed_nuclear_submissions:
    print(submission.title, submission.url)
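
Typing Unix timestamps by hand is error-prone, so here is a sketch that builds the same kind of query from `datetime` objects. `to_unix` and `timestamp_query` are hypothetical helpers; the query layout simply copies the one used above, and it assumes Reddit's search still accepts the `cloudsearch` syntax.

```python
import calendar
from datetime import datetime

def to_unix(moment):
    """Convert a naive datetime, interpreted as UTC, to a Unix timestamp."""
    return calendar.timegm(moment.timetuple())

def timestamp_query(term, start, end):
    """Build a cloudsearch-style query restricted to the window [start, end]."""
    return "'{}' (and timestamp:{}..{})".format(term, to_unix(start), to_unix(end))

# Same window as in the cell above: 2014-12-31 12:00 to 2015-01-01 00:00 (UTC).
q = timestamp_query("nuclear", datetime(2014, 12, 31, 12), datetime(2015, 1, 1))
print(q)
```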

In [13]:
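import datetime

def date_ms(timestamp):
    # Format a Unix timestamp (in seconds) as a UTC date-time string.
    utc_time = datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc)
    return utc_time.strftime('%Y-%m-%d %H:%M:%S')

def strip_time(timestamp):
    # Return the first and last second of the UTC day that contains `timestamp`.
    first = timestamp // 86400 * 86400
    second = first + ((24*60*60) - 1)
    return first, second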

In [16]:
time_to_consider = 1510185484
first, second = strip_time(time_to_consider)
print(date_ms(time_to_consider))
print(date_ms(first))
print(date_ms(second))

2017-11-08 23:58:04
2017-11-08 00:00:00
2017-11-08 23:59:59


In [26]:
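import numpy as np

# Build 20 bins of width 0.1 covering [-1, 1]; each row is [lower, upper, midpoint].
x = []
values = np.linspace(-1, 1, 21)
for a, b in zip(values[:-1], values[1:]):
    x.append([a, b, (a+b) / 2])
x = np.array(x)

def get_categorical_value(value):
    # Map `value` to the midpoint of the bin (lower, upper] that contains it.
    if value == -1:
        # -1 sits on the open lower edge of the first bin, so handle it explicitly.
        return -0.95
    filter1 = x[(x[:,0] < value)]
    filter1 = filter1[value <= filter1[:,1]]
    return filter1[0,2]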

print(get_categorical_value(0.2323))
print(get_categorical_value(1))
print(get_categorical_value(-1))
print(get_categorical_value(-0.32))

0.25
0.95
-0.95
-0.35
