What are Activity Streams?

Consider that Activity Streams are a rather recent development on the Web. It might be illustrative to take a quick stroll down memory lane and peek at what came before.

From documents to blogs

HTML documents were the original hotness, drafted by "high energy physicists to share data, news, and documentation". But, soon, a broader population of academics and friends became webmasters and webmistresses. They transformed their .plan files into home pages and discovered animated GIFs for the first time.

Eventually, frequently updated pages evolved into the weblog—or "blog" for short. The blog format typically consists of pages with serial entries listed in reverse-chronological order—so the new stuff always shows up at the top of the page. Early blogs served as news desks, as collections of pointers to cool stuff, or even as personal journals open to the public.

Though handcrafted at first, blogs soon benefited from a growing category of software built to manage publishing on the web. As software took over the basic tasks, blogs could take on more complexity. In fact, back in the early 2000s, some of the first web service APIs were created to support publishing blog posts from desktop editors and other tools.

From blogs to feeds

Along with APIs for publishing, XML syndication feeds began to take off for distributing and consuming content. Whereas HTML pages were mainly intended for human consumption by way of a browser, syndication feeds wrapped web content in machine-friendly structures.

Syndication feeds are XML documents that consist mainly of a parent document describing a site, and a collection of recent content items from the site. The individual content items generally offer the following fields (a quick code sketch follows the list):

pubDate

Publication date and time

title

Short title, generally a single sentence or less

description

Longer description consisting of paragraphs or a fragment of HTML content

link

The URL where content described by the feed or item can be found
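
To make this concrete, here’s a minimal sketch using the Python feedparser library to pull those four fields out of each item in a feed. The feed URL is just a placeholder, and note that feedparser normalizes an item’s description into a field it calls summary.

[source,python]
----
# A minimal sketch: fetch a feed and print the four basic fields of
# each item. Requires the feedparser package (pip install feedparser);
# the URL below is a placeholder.
import feedparser

feed = feedparser.parse("http://example.com/index.rss")
for entry in feed.entries:
    print(entry.get("published"))  # pubDate
    print(entry.get("title"))      # title
    print(entry.get("summary"))    # description (normalized to "summary")
    print(entry.get("link"))       # link
----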

This ignores many fields defined for feeds and items in RSS, and it doesn’t even consider the alternate feed format called Atom (a.k.a. RFC 4287). But it gives you a general idea of the data model behind feeds.

Tip
Cheap plug ahead: If you’re interested in reading more in-depth about syndication feeds, then look for my first book: Hacking RSS and Atom, published in 2005 by Wiley. In a sense, this is the spiritual successor to that book.

So: date & time, title, description, and link—these are the basic components found in an item from a syndication feed. If you think about it, this schema closely mirrors the shape of blogs. And, just like blogs, they generally contain items in reverse-chronological order—so the new stuff shows up at the top. In other words, feeds are blogs for bots.

But, what kind of bots read syndication feeds? Web portals and personal news readers.

Imagine, if you will: In the days before feeds, we relied on browser bookmarks to make daily rounds to popular news sites and blogs. But, with the availability of feeds, software could make those daily rounds—even hourly rounds—on our behalf. That is, software fetched feeds from the sites we used to visit in person, and their content was assembled into a single combined display, arranged in reverse-chronological order—also known as a River of News. Automated surfing, the ultimate in information addiction convenience!
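
As a rough sketch of such an automated surfer, the following Python fragment (again using feedparser, with placeholder URLs) pools the entries from several feeds and sorts them newest-first into a single river:

[source,python]
----
# A toy River of News: fetch several feeds, pool their entries, and
# sort the combined list in reverse-chronological order.
import feedparser

FEED_URLS = [
    "http://example.com/news.rss",
    "http://example.org/blog/atom.xml",
]

river = []
for url in FEED_URLS:
    river.extend(feedparser.parse(url).entries)

# feedparser normalizes dates into time.struct_time tuples under
# published_parsed; undated entries fall back to a tuple of zeros
# and sink to the bottom of the river.
river.sort(key=lambda e: tuple(e.get("published_parsed") or [0] * 9),
           reverse=True)

for entry in river:
    print(entry.get("published", "(undated)"), "-", entry.get("title"))
----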

From high energy physicists to blogosphere techies

So, physics papers and homepages evolved into blogs. And blogs, in turn, were used to publish global news, link collections, personal journals, photo galleries, and even audiobooks in podcast form.

Blogs also sprouted sophisticated feedback mechanisms and interconnections: Comment threads appeared alongside entries. Referers (sic), trackbacks, and pingbacks were used to automatically form links between entries posted on different sites. Sidebar widgets offered links to affiliated blogs (see also: "blogroll"), opened up live chat rooms, and presented readership rosters. The network established between blogs—known as the blogosphere—formed a powerful and lively social space with plenty of subdivisions and activity.

But, the thing is, this social network grew almost entirely in spite of the technologies involved. That is to say, people signed up for web hosting and learned how to configure blogs in anticipation of promised expressiveness and community. Blogs presented a technical challenge that filtered out all but the most adventurous—thus, the blogosphere has largely been an exclusive community, weighted heavily with techies.

The social web beyond blogs

That’s not to say there hasn’t been work to address that exclusivity, though. Throughout the latter half of the 2000s, a fresh wave of web startups arrived that many labeled "Web 2.0".

Though there were plenty of fresh ideas, many of these new sites and services picked up leads and loose ends from the blogosphere. Rather than require the purchase of web hosting, most opted for ad-supported business models and free registration. Along with sponsored hosting, a focus on usability got the technology out of the way and removed the need for social humans to learn an alphabet soup of arcana like HTTP, FTP, PHP, and SQL. This made the next generation of social media accessible to so many more people.

Rather than set up a photo gallery theme on a blog, you could sign up for Flickr. Instead of posting lists of interesting links, you could share your bookmarks on del.icio.us. Videos could be easily uploaded, processed, and viewed via YouTube. The news of the day could be shared and amplified over on Digg. Short missives, or asides, were easily absorbed and broadcast by Twitter. There’s even been a movie made about the much more accessible and inclusive—though more contained—cousin to the blogosphere offered by Facebook.

Human beings add texture

Blogs and feeds inherit a lot of DNA from news publishing and content management systems. Human beings aren’t just content generators, though. What we do and how we share experiences is not always captured in the data model of feeds. If anything, the problem is that feeds are too free-form: You can shove almost anything into the description field of a feed item, while title and link are handy for describing just about any page on the web.

Consider that the social media sites of Web 2.0 refined many of the purposes and patterns first pioneered in blogs. Along with those refinements came more interesting—and more textured—data. And even though that data can find rough expression in existing HTML and feed formats, it can be difficult to parse meaning and significant data from those rough expressions.

More precise representations can open up new applications and opportunities. Recall what made feeds so interesting in the first place: They made frequently-updated web pages more machine-friendly. This, in turn, opened the way for automated web surfers that could gather news for us.

Well, consider: what automated web agents could we build if we had more machine-readable information about people and what they do?

Streaming your life away

Even older than blogs and feeds is the notion of lifestreams, a term coined by Eric Freeman and David Gelernter as "a time-ordered stream of documents that functions as a diary of your electronic life".

The data model of a lifestream, as described in Freeman’s 1997 dissertation, consists of a set of flexibly structured personal documents, each of which is collected into one or more chronologically ordered streams. These streams can be curated by hand or, more interestingly, produced by dynamic filtering agents equipped with pattern-matching functions.

Feeds produced by social media tools look a lot like lifestreams. There’ve even been platforms built to collect, filter, archive, summarize, and republish content carried in RSS and Atom feeds. But there are things missing from the data model. Or, at least, from the data available it can be difficult for a machine to determine things such as the following (a sketch of a more explicit item follows the list):

  • Does this item represent a location, photo, event, song, or essay?

  • Does this location indicate where I’ve been, or where I plan to be?

  • Did I create this item, or is it something I liked or shared?

  • Am I inviting you to this event, planning to attend myself, or sending regrets?

  • From where did this item come? To whom or where am I directing it?
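
To make the contrast concrete, here’s an illustrative, made-up structure (expressed as a Python dict, and not yet the Activity Streams schema that later chapters develop) showing how explicit fields could answer questions like those above:

[source,python]
----
# A hypothetical, more explicit stream item. Every field name here is
# illustrative; the point is that type, verb, and direction are spelled
# out rather than buried in a blob of HTML description.
activity = {
    "published": "2011-07-04T12:30:00Z",
    "actor": {"objectType": "person",      # who did this?
              "displayName": "Les"},
    "verb": "share",                       # created, liked, or shared?
    "object": {"objectType": "photo",      # location, photo, event, song?
               "url": "http://example.com/photos/123"},
    "target": {"objectType": "group",      # to whom or where is it directed?
               "displayName": "Toronto trip"},
}
----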

Missing this kind of data in a machine-friendly form makes it difficult to build the powerful substream agents and summarization facilities described as part of lifestreams—and those are very interesting and powerful features. They can enable the construction of stream views like the following (a toy filtering agent follows the list):

  • What are people in San Francisco talking about?

  • What have my friends found interesting in the news today?

  • Show me a gallery of photos taken by my friends who have been to Toronto.

  • Give me a playlist of music my co-workers have been listening to.

  • Who’s coming to my party?
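
As a toy illustration of such a substream agent, the following Python sketch (assuming items shaped like the activity dict above) carves a filtered view out of one combined stream using a simple pattern match:

[source,python]
----
# A toy substream agent in the lifestreams sense: a pattern-matching
# function that yields only the items matching every field given.
# It assumes flat, dict-shaped items like the earlier sketch.
def substream(stream, **pattern):
    """Yield items whose fields equal every key/value in the pattern."""
    for item in stream:
        if all(item.get(key) == value for key, value in pattern.items()):
            yield item

# For example, "things my friends shared" over a merged stream might
# start with a filter like this (merged_stream is hypothetical):
# for item in substream(merged_stream, verb="share"):
#     render(item)
----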

Behind each of these examples, there’s bound to be one or more social media sites offering it as a feature. But you and your friends have to live that slice of life through that particular site to get the benefits. Remember when feeds let us build robots to do all our surfing between a laundry list of sites? Wouldn’t it be nice if we could pull everything back into one common stream again and get these views all in one place, or, better yet, through any tools we like?

A Theory of Activities