# 1) Objective

Analyze the feasibility of adding WordPress as a new data source, and outline the technical strategy for integrating it.

# 2) Types of Access

Realtime Access: Gnip provides realtime access to WordPress data via the following products:
- **PowerTrack** - Filtered access to the combined WordPress and IntenseDebate firehoses via Gnip’s PowerTrack filtering language
- **Firehose** - Unfiltered delivery of the WordPress firehoses

For the Encore application, we will use **PowerTrack**.

# 3) Activities

References:
- http://support.gnip.com/sources/wordpress/overview.html
- http://support.gnip.com/sources/wordpress/data_format.html


## 3.1) Types of Activities

* **Article Post** (*verb*: "post", *objectType*: "article")
* **Comment Post** (*verb*: "post", *objectType*: "comment")
* **Article Like** (*verb*: "like", *objectType*: "article")
* **Article Unlike** (*verb*: "unlike", *objectType*: "article")
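
The four combinations above can be told apart with a simple lookup. The following is a minimal sketch, assuming each activity has already been parsed into a Python dict; the names `ACTIVITY_TYPES` and `classify_activity` are hypothetical helpers, not part of any Gnip library.

```python
from typing import Optional

# (verb, object type) pairs mapped to the activity names listed above.
ACTIVITY_TYPES = {
    ("post", "article"): "Article Post",
    ("post", "comment"): "Comment Post",
    ("like", "article"): "Article Like",
    ("unlike", "article"): "Article Unlike",
}


def classify_activity(activity: dict) -> Optional[str]:
    """Return the activity type name, or None for an unknown combination."""
    verb = activity.get("verb")
    object_type = (activity.get("object") or {}).get("objectType")
    return ACTIVITY_TYPES.get((verb, object_type))
```

Returning `None` for unknown combinations keeps the consumer tolerant of activity types that are not covered in this analysis.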

## 3.2) Most Relevant Activity Attributes

* **postedTime** - The time at which the activity was posted. Present in every activity.
* **displayName** - The title of the article being posted or commented on. Not present for likes and unlikes.
* **actor** - Actor who authored the activity (posting the article or comment, liking or unliking the article)
* **object** - An object representing the comment or article that is being posted, liked or unliked.

### 3.2.1) More details on Actor entity

The actor entity's attributes are as follows:
* **id** - example "person:wordpress.com:30144526"
* **link** - example "http://gravatar.com/coffeeandconfidence"
* **wpEmailMd5** - example "3ea634fa6c83749c4e1ecee8f22356f9"
* **displayName** - example "coffeeandconfidence"
* **objectType** - example "person"
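
Because **id** and **link** are not guaranteed to be present (the comment example in 3.3.2 has neither), any code that groups activities by actor needs a fallback. The sketch below shows one possible approach; the function name `actor_key` and the fallback order are assumptions made for illustration only.

```python
from typing import Optional


def actor_key(actor: dict) -> Optional[str]:
    """Pick an identifier for an actor, prefixed with the field it came from.

    Assumed preference order: id, then wpEmailMd5, then displayName.
    """
    for field in ("id", "wpEmailMd5", "displayName"):
        value = actor.get(field)
        if value:
            return f"{field}:{value}"
    return None
```

Prefixing the key with the field name avoids accidental collisions between, for example, a display name and an email hash.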


### 3.2.2) More details on object attribute

The activity's object attribute sub-attributes are as follows:
* **id** - example "comment:wordpress.com:11591264:105729:981031"
* **wpBlogId** - The WordPress Blog ID
* **wpPostId** - The WordPress Post ID
* **link** - example "http://news.blogs.cnn.com/2011/12/20/overheard-on-cnn-com-mad-as-hell-with-congress/comment-page-2/#comment-981031"
* **updatedTime** - A timestamp for when the article was last updated
* **wpCommentCount** - The number of comments on the article
* **content** - example "I'm liking that idea"
* **inReplyTo** - Link to the original comment, if this comment is a reply.
* **summary** - A short summary of the article (if article), html encoded, containing no markup.
* **tags** - list of *tag* entities. Only present in articles.

### 3.2.3) More details on tags attributes

* **objectType** - example "category"
* **displayName** - example "marketing"
* **link** - example "http://en.wordpress.com/tag/marketing"
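
Reading the tag list from an article activity could look like the following sketch. It assumes the activity is already a Python dict; `extract_tags` is a hypothetical helper, and lowercasing the names is a normalization choice made here, not something the data format requires.

```python
from typing import List


def extract_tags(activity: dict) -> List[str]:
    """Return the lowercased display names of the tags on an activity's object."""
    tags = (activity.get("object") or {}).get("tags") or []
    return [tag["displayName"].lower() for tag in tags if tag.get("displayName")]
```

Activities without tags (comments, likes and unlikes) simply yield an empty list.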

## 3.3) Examples of Activities

### 3.3.1) Article Post Activity

{
  "verb":"post",
  "target":{
    "summary":"Just another WordPress.com site",
    "link":"http://coffeeandconfidence.wordpress.com/",
    "wpBlogId":30631836,
    "displayName":"coffeeandconfidence",
    "objectType":"blog"
  },
  "postedTime":"2011-12-20T18:14:51.000Z",
  "provider":{
    "link":"http://im.wordpress.com:8008/posts.json",
    "displayName":"WordPress",
    "objectType":"service"
  },
  "object":{
    "id":"article:wordpress.com:30631836:5",
    "content":"<a title=\"Girls- 'Honey Bunny'\" href=\"http://youtu.be/IxuDoYhQI2o\" target=\"_blank\">http://youtu.be/IxuDoYhQI2o</a>\r\n\r\nI don't know what made me love him. Androgyny is a powerful magnet...And he's using it to the best of his abilities. Never before has a too short, black, crushed velvet shirt entertained me so.\r\n\r\nMaybe it's the cult he was in. Mystery wrapped inside enigmas and all that. Lots of flowers everywhere.\r\n\r\nThen again it could be that the music itself is good. But I highly doubt that.\r\n\r\n<a title=\"Girls 'Morning Light'\" href=\"http://www.youtube.com/watch?v=kJVN3QoGMe0\" target=\"_blank\">http://www.youtube.com/watch?v=kJVN3QoGMe0</a>\r\n\r\nFeel the need to run around with beautiful girls and eat entire apples? Your soundtrack has been found.\r\n\r\n<a title=\"Girls 'Laura'\" href=\"http://www.youtube.com/watch?v=O5Oa6ih0kgA\" target=\"_blank\">http://www.youtube.com/watch?v=O5Oa6ih0kgA</a>\r\n\r\nAnd lastly, a love song. Or In-Need-Of-Love song. Something of that sort, figure it out for yourselves am I supposed to do everything for you?\r\n\r\n&nbsp;\r\n\r\nEnd Transmission\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n&nbsp;",
    "summary":"http://youtu.be/IxuDoYhQI2o I don&#8217;t know what made me love him. Androgyny is a powerful magnet&#8230;And he&#8217;s using it to the best of his abilities. Never before has a too short, black, crushed velvet shirt entertained me so. Maybe it&#8217;s the cult he was in. Mystery wrapped inside enigmas and all that. Lots of flowers everywhere. Then [...]",
    "link":"http://coffeeandconfidence.wordpress.com/2011/12/20/first-post-more-to-come-so-much-more/",
    "postedTime":"2011-12-20T18:14:49Z",
    "wpBlogId":30631836,
    "displayName":"First Post, More to come. So much more...",
    "objectType":"article",
    "updatedTime":"2011-12-20T18:14:49Z",
    "tags": [
      {
        "objectType": "category",
        "displayName": "music",
        "link": "http:\/\/en.wordpress.com\/tag\/music"
      }
    ]
  },
  "actor":{
    "id":"person:wordpress.com:30144526",
    "link":"http://gravatar.com/coffeeandconfidence",
    "wpEmailMd5":"3ea634fa6c83749c4e1ecee8f22356f9",
    "displayName":"coffeeandconfidence",
    "objectType":"person"
  },
  "displayName":"First Post, More to come. So much more...",
  "location": {
    "geo": {
      "type": "Point",
      "coordinates": [
        -108.70545,
        35.529095
      ]
    }
  },
  "gnip":{
    "publisher":{
      "name":"wordpresscom"
    }
  }
}

### 3.3.2) Comment Post Activity

{
  "body":"I'm liking that idea",
  "verb":"post",
  "target":{
    "id":"article:wordpress.com:11591264:105729",
    "summary":"Editor&#039;s note: This post is part of the Overheard on CNN.com series, a regular feature that examines interesting comments and thought-provoking conversations posted by the community. Congress showed little sign of resolving its partisan standoff Tuesday over the payroll tax cut extension as the Republican-controlled House of Representatives passed a measure expressing disapproval of a [...]",
    "author":{
      "id":"person:wordpress.com:4480249",
      "link":"http://gravatar.com/masimon",
      "wpEmailMd5":"8d115c727dd66f465f38f3237770de8d",
      "displayName":"Mallory Simon, CNN News blog editor",
      "objectType":"person"
    },
    "wpCommentCount":141,
    "link":"http://news.blogs.cnn.com/2011/12/20/overheard-on-cnn-com-mad-as-hell-with-congress/",
    "wpBlogId":11591264,
    "displayName":"Overheard on CNN.com: Congress' lack of action sparks anger",
    "objectType":"article",
    "wpPostId":105729
  },
  "postedTime":"2011-12-20T18:17:30.000Z",
  "provider":{
    "link":"http://im.wordpress.com:8008/comments.json",
    "displayName":"WordPress",
    "objectType":"service"
  },
  "object":{
    "id":"comment:wordpress.com:11591264:105729:981031",
    "content":"I'm liking that idea",
    "inReplyTo":{
      "link":"http://news.blogs.cnn.com/2011/12/20/overheard-on-cnn-com-mad-as-hell-with-congress/comment-page-2/#comment-980849"
    },
    "link":"http://news.blogs.cnn.com/2011/12/20/overheard-on-cnn-com-mad-as-hell-with-congress/comment-page-2/#comment-981031",
    "postedTime":"2011-12-20T18:17:30.000Z",
    "displayName":"Comment on Overheard on CNN.com: Congress&#039; lack of action sparks anger by carla",
    "objectType":"comment"
  },
  "actor":{
    "wpEmailMd5":"f4f519816c4b3b77a61ae52e2339886a",
    "displayName":"carla",
    "objectType":"person"
  },
  "displayName":"Overheard on CNN.com: Congress&#039; lack of action sparks anger",
  "gnip":{
    "publisher":{
      "name":"wordpresscom"
    }
  }
}

### 3.3.3) Like Article Activity

{
  "verb":"like",
  "target":{
    "link":"http://theheartisunpolluted.wordpress.com/",
    "wpBlogId":8066970,
    "displayName":"I AM here",
    "objectType":"blog"
  },
  "postedTime":"2011-12-20T18:22:06.000Z",
  "provider":{
    "link":"http://im.wordpress.com:8008/likes.json",
    "displayName":"WordPress",
    "objectType":"service"
  },
  "object":{
    "id":"article:wordpress.com:8066970:2991",
    "link":"http://theheartisunpolluted.wordpress.com/2011/12/20/for-those-who-appreciate-street-art/",
    "wpBlogId":8066970,
    "displayName":"For those who appreciate street art",
    "objectType":"article"
  },
  "actor":{
    "id":"person:wordpress.com:29967749",
    "link":"http://gravatar.com/endiosconfiamos",
    "wpEmailMd5":"241229424a0e27d0321367bb5505aae5",
    "displayName":"endiosconfiamos",
    "objectType":"person"
  },
  "gnip":{
    "publisher":{
      "name":"wordpresscom"
    }
  }
}

### 3.3.4) Unlike Article Activity

{
  "verb": "unlike",
  "id": "tag:gnip.wordpress.com:2012:blog\/53682150\/post\/1552\/unlike\/51441893",
  "postedTime": "2013-10-31T20:00:54.000Z",
  "provider": {
    "objectType": "service",
    "displayName": "WordPress",
    "link": "http:\/\/im.wordpress.com:8008\/likes.json"
  },
  "actor": {
    "objectType": "person",
    "displayName": "Merrissa",
    "id": "person:wordpress.com:51441893",
    "wpEmailMd5": "e2e9cb6fa644989b99e4c95b26d24d96",
    "link": "http:\/\/gravatar.com\/merrissapalmer"
  },
  "target": {
    "objectType": "blog",
    "displayName": "Rad Maverix",
    "link": "http:\/\/radmaverix.com\/",
    "wpBlogId": 53682150
  },
  "object": {
    "objectType": "article",
    "displayName": "Comfort Food Dinner",
    "link": "http:\/\/radmaverix.com\/2013\/10\/30\/comfort-food-dinner\/",
    "wpBlogId": 53682150,
    "wpPostId": 1552,
    "id": "article:wordpress.com:53682150:1552"
  },
  "gnip": {}
}

# 4) Operators and Rules

References:
- http://support.gnip.com/sources/wordpress/powertrack_operators.html
- http://support.gnip.com/apis/powertrack/rules.html

## 4.1) WordPress PowerTrack Operators

* **keyword** - Matches a keyword within an activity's tokenized body.
* **contains:** - Matches a keyword within an activity's body just as a substring operation, with no tokenization.
* **exact phrase match** - Matches an exact phrase within the body of an activity.
* **proximity operator** - Example: *"keyword1 keyword2"~N* where specified keywords must be no more than N tokens from each other. N cannot be greater than 6.
* **from:** - Matches any activity from the specified user.
* **url_contains:** - Matches activities with URLs that literally contain the given phrase or keyword. Considers both shortened and expanded URLs.
* **has:links** - Matches activities which contain one or more links in the body.
* **lang:** - Matches activities that have been classified by Gnip as being of a particular language (if, and only if, the activity has been classified).
* **has:lang** - Matches activities which Gnip has classified as any language.
* **sample:** - Returns a random sample of activities that match a rule rather than the entire set of activities.
* **post_title:** - Matches an exact phrase within the title of a posted article.
* **post_title_contains:** - Matches a substring within the title of a posted article.
* **activity_url_contains:** - Matches activities where the activity URL (i.e. permalink) contains the given phrase or keyword. URL encodings are not encoded at this time. To search for patterns with punctuation in them (e.g. cnn.com), enclose the search term in quotes.
* **is:like** - Matches only activities that are Likes/Un-Likes.
* **is:article** - Matches only activities that are posted articles.
* **is:comment** - Matches only activities that are comments.
* **publisher:** - Matches the Automattic publisher: WordPress.com, WordPress.org, or IntenseDebate.
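
As an illustration of how these operators combine into a rule, here is a minimal sketch that only assembles rule strings. The helper names `proximity` and `build_rule` are hypothetical, the lower bound on N and the use of a space as a logical AND between clauses are assumptions, and the result should be checked against the rules reference above before being submitted to Gnip.

```python
def proximity(first: str, second: str, distance: int) -> str:
    """Build a proximity clause such as "keyword1 keyword2"~N."""
    # The upper bound of 6 comes from the operator list; the lower bound of 1
    # is an assumption of this sketch.
    if not 1 <= distance <= 6:
        raise ValueError("proximity distance N must be between 1 and 6")
    return f'"{first} {second}"~{distance}'


def build_rule(*clauses: str) -> str:
    """Join non-empty clauses with spaces (assumed to behave as a logical AND)."""
    return " ".join(clause for clause in clauses if clause)


rule = build_rule(proximity("street", "art", 3), "is:article", "lang:en")
```

Here `rule` holds the string `"street art"~3 is:article lang:en`, which would restrict matches to English-language article posts where both keywords appear within three tokens of each other.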

# 5) Conclusions

## 5.1) Trending Tags/URLs

Both tags and URLs may be included in articles, thus allowing identification of trends. 

Tags are included in a separate list attribute within article activities, whereas URLs must either be extracted from the activities' content or matched directly with the **url_contains:** operator, which considers both shortened and expanded URLs.
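
Extracting URLs from an article's HTML content and counting them across activities could be sketched as follows. The regular expression is deliberately simplified (it may, for instance, keep trailing punctuation), and `extract_urls` and `trending_urls` are hypothetical helper names.

```python
import re
from collections import Counter
from html import unescape

# Simplified pattern: an http(s) URL running up to whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(content: str) -> set:
    """Return the distinct URLs found in an activity's HTML content."""
    return set(URL_PATTERN.findall(unescape(content)))


def trending_urls(activities, top_n=10):
    """Count in how many activities each URL appears and return the most common."""
    counts = Counter()
    for activity in activities:
        content = (activity.get("object") or {}).get("content") or ""
        counts.update(extract_urls(content))
    return counts.most_common(top_n)
```

Using a set per activity means a URL that appears both in an `href` and as link text is counted once for that activity.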

## 5.2) Keywords and phrases

General keywords and phrases can be matched with the **keyword**, **contains:** or **exact phrase match** operators.

The operator **post_title_contains** can be used for matching substrings in the title.

## 5.3) Popularity

In addition to the signals in **5.1** and **5.2**, article activities carry a **wpCommentCount** attribute, which indicates the total number of comments the article has received.

**Likes** and **Un-likes** may also be added to the popularity analysis.
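
One way to combine these signals per article is sketched below. It assumes the activities are parsed dicts shaped like the examples in 3.3, that a like counts as +1 and an unlike as -1, and that the highest comment count seen is the most recent one; `popularity` is a hypothetical helper name.

```python
from collections import defaultdict


def popularity(activities):
    """Return {article_id: {"net_likes": int, "comments": int}}."""
    stats = defaultdict(lambda: {"net_likes": 0, "comments": 0})
    for activity in activities:
        verb = activity.get("verb")
        obj = activity.get("object") or {}
        target = activity.get("target") or {}
        # Likes add one and unlikes subtract one for the liked article.
        if verb in ("like", "unlike") and obj.get("id"):
            stats[obj["id"]]["net_likes"] += 1 if verb == "like" else -1
        # The comment count may appear on the article itself or on the article
        # that a comment targets; keep the highest value seen so far.
        for entity in (obj, target):
            count = entity.get("wpCommentCount")
            if entity.get("objectType") == "article" and entity.get("id") and count is not None:
                entry = stats[entity["id"]]
                entry["comments"] = max(entry["comments"], count)
    return dict(stats)
```

How the two numbers are weighted into a single score is left open, since the source data does not suggest a particular formula.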

## 5.4) Influencers

To infer the influence of an actor who authored a post, **Likes** and **Un-likes** may be used.

The documentation states that these two types of feedback activities are addressed to articles, saying *nothing about comments*. 

Therefore, under a simple assumption, the more *likes* an actor's articles receive, the more that actor may be considered an *influencer*.

If the documentation is merely incomplete and comments can also be targets of likes/un-likes, then a whole other kind of influence can be measured or inferred.

The only information available about actors is what is detailed in **3.2.1**; note that the attributes **id** and **link** are *not always present*.
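
In the like and unlike examples in 3.3, the actor is the person who liked the article, not its author, so likes have to be joined back to the original article post through the article id. The sketch below shows that join under stated assumptions: the article post was seen in the same stream, a like counts as +1 and an unlike as -1, and authors are keyed by **id** with a fallback to **wpEmailMd5** or **displayName**. The names `likes_per_author` and `_actor_key` are hypothetical.

```python
from collections import Counter


def _actor_key(actor: dict):
    """Identify an actor by id, falling back to wpEmailMd5 or displayName."""
    return actor.get("id") or actor.get("wpEmailMd5") or actor.get("displayName")


def likes_per_author(activities):
    """Return a Counter mapping each article author to the net likes received."""
    authors = {}           # article id -> author key
    net_likes = Counter()  # article id -> likes minus unlikes
    for activity in activities:
        verb = activity.get("verb")
        obj = activity.get("object") or {}
        if obj.get("objectType") != "article" or not obj.get("id"):
            continue
        if verb == "post":
            authors[obj["id"]] = _actor_key(activity.get("actor") or {})
        elif verb == "like":
            net_likes[obj["id"]] += 1
        elif verb == "unlike":
            net_likes[obj["id"]] -= 1
    totals = Counter()
    for article_id, likes in net_likes.items():
        author = authors.get(article_id)
        if author is not None:  # skip likes for articles whose post was never seen
            totals[author] += likes
    return totals
```

Likes for articles whose original post was not observed are dropped here; storing them for a later join would be an alternative design.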
