
Finish draft of social networking doc

Signed-off-by: Rick Copeland <rick@arborian.com>
1 parent 954b714 commit 8bedbafaa568117f4a74e78a0754d47d35fface9 @rick446 committed
Showing with 236 additions and 62 deletions.
  1. +236 −62 source/applications/use-cases/social-user-profile.txt
298 source/applications/use-cases/social-user-profile.txt
@@ -78,8 +78,7 @@ along with the user's profile data:
... },
...}
},
- blocked: ['gh1...0d'],
- pages: { wall: 4, news: 3 }
+ blocked: ['gh1...0d']
}
There are a few things to note about this schema:
@@ -97,10 +96,6 @@ There are a few things to note about this schema:
appear on the user's wall or news feed.
- The particular profile data stored for the user is isolated into the
``profile`` subdocument, allowing you to evolve the schema as necessary without
- worrying about introducing bugs into the social graph.
-- The ``pages`` property is used to store the number of pages in the
- ``social.wall``, and ``social.news`` collections for this
- particular user. These will be used below when creating new posts.
Of course, to make the network interesting, it's necessary to add various types of
posts. These are stored in the ``social.post`` collection:
@@ -162,8 +157,7 @@ follows.
{
_id: ObjectId(...),
user_id: "T4Y...AE",
- page: 4,
- num_posts: 42,
+ month: '201204',
posts: [
{ id: ObjectId(...),
ts: ISODateTime(...),
@@ -171,14 +165,15 @@ follows.
circles: [ '*public*' ],
type: 'status',
detail: { text: 'Loving MongoDB' },
+ comments_shown: 3,
comments: [
{ by: { id: "T4Y...AG", name: 'Dwight',
ts: ISODateTime(...),
text: 'Right on!' },
- ... only last X comments listed ...
+ ... only last 3 comments listed ...
]
},
- { id: ObjectId(...),
+ { id: ObjectId(...),
ts: ISODateTime(...),
by: { id: "T4Y...AE", name: 'Max' },
circles: [ '*circles*' ],
@@ -188,11 +183,12 @@ follows.
geo: [ 40.724348,-73.997308 ],
name: '10gen Office',
photo: 'http://....' },
+ comments_shown: 1,
comments: [
{ by: { id: "T4Y...AD", name: 'Jared' },
ts: ISODateTime(...),
text: 'Wrong coast!' },
- ... only last X comments listed ...
+ ... only last 1 comment listed ...
]
},
{ id: ObjectId(...),
@@ -202,11 +198,12 @@ follows.
type: 'status',
detail: {
text: 'So when do you crush Oracle?' },
+ comments_shown: 2,
comments: [
{ by: { id: "T4Y...AE", name: 'Max' },
ts: ISODateTime(...),
text: 'Soon... ;-)' },
- ... only last X comments listed ...
+ ... only last 2 comments listed ...
]
},
...
@@ -220,12 +217,13 @@ There are a few things to note about this schema:
display more comments on a post, you would then query the ``social.post``
collection for full details.
- There are actually multiple ``social.wall`` documents for each ``social.user``
- document. This allows the system to keep a "page" of recent posts in the
- initial page view, fetching older "pages" if requested. A ``page`` property
- keeps track of the position of this page of posts on the user's overall wall
- timeline along with the timestamps on individual posts.
+ document, one wall document per month. This allows the system to keep a "page" of
+ recent posts in the initial page view, fetching older months if requested.
- Once again, the ``by`` properties store only the minimal author information for
display, helping to keep this document small.
+- The number of comments shown for each post is stored in ``comments_shown``.
+ This lets later updates find posts with more than a certain number of
+ comments, since the ``$size`` query operator does not allow inequality comparisons.
The other dependent collection you'll use is ``social.news``, posts from people
the user follows. This schema includes much of the same information as the
@@ -237,8 +235,7 @@ clarity:
{
_id: ObjectId(...),
user_id: "T4Y...AE",
- page: 3,
- num_posts: 42,
+ month: '201204',
posts: [ ... ]
}
@@ -265,29 +262,29 @@ similar operations, and can be supported by the same code:
.. code-block:: python
- def get_posts(collection, user_id, page=None):
+ def get_posts(collection, user_id, month=None):
spec = { 'user_id': user_id }
- if page is not None:
- spec['page'] = {'$lte': page}
+ if month is not None:
+ spec['month'] = {'$lte': month}
cur = collection.find(spec)
- cur = cur.sort('page', -1)
+ cur = cur.sort('month', -1)
for page in cur:
for post in reversed(page['posts']):
- yield page['page'], post
+ yield page['month'], post
The function ``get_posts`` above will retrieve all the posts on a particular user's
wall or news feed in reverse-chronological order. Some special handling is
required to efficiently achieve the reverse-chronological ordering:
-- The ``posts`` within a page are actually stored in chronological order, so the
+- The ``posts`` within a month are actually stored in chronological order, so the
order of these posts must be reversed before displaying.
- As a user pages through her wall, it's preferable to avoid fetching the first
- few pages from the server each time. To achieve this, the code above specifies
- the first page to fetch in the ``page`` argument, passing this in as an
+ few months from the server each time. To achieve this, the code above specifies
+ the first month to fetch in the ``month`` argument, passing this in as an
``$lte`` expression in the query.
-- Rather than only yielding the post itself, the post's page is also yielded from
- the generator. This provides the ``page`` argument used in any subsequent calls
- to ``get_posts``.
+- Rather than only yielding the post itself, the post's month is also yielded from
+ the generator. This provides the ``month`` argument to be used in any
+ subsequent calls to ``get_posts``.
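The paging behaviour described in the list above can be sketched without a database. The following is only an in-memory stand-in for ``get_posts``: the name ``iter_posts`` and the sample documents are assumptions made for this illustration, and a plain list takes the place of the collection.

```python
def iter_posts(docs, user_id, month=None):
    """In-memory stand-in for get_posts: newest month first, newest post first."""
    selected = [d for d in docs
                if d['user_id'] == user_id
                and (month is None or d['month'] <= month)]
    # 'YYYYMM' strings sort chronologically, so a reverse sort gives newest first.
    for doc in sorted(selected, key=lambda d: d['month'], reverse=True):
        # Posts are stored oldest-first inside each month document.
        for post in reversed(doc['posts']):
            yield doc['month'], post

# Hypothetical sample data: two month documents for one user.
docs = [
    {'user_id': 'u1', 'month': '201203', 'posts': [{'id': 1}, {'id': 2}]},
    {'user_id': 'u1', 'month': '201204', 'posts': [{'id': 3}, {'id': 4}]},
]

first_page = list(iter_posts(docs, 'u1'))[:3]
# Resume from the month of the last post shown instead of from the top.
resume_month = first_page[-1][0]
resumed = list(iter_posts(docs, 'u1', month=resume_month))
```

Note that the resumed call starts at the beginning of that month, so a caller would still need to skip any posts from that month that were already displayed.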
There is one other issue that needs to be considered in selecting posts for
display: privacy settings. In order to handle privacy issues effectively, you'll
@@ -341,32 +338,113 @@ Index Support
`````````````
In order to quickly retrieve the pages in the desired order, you'll need an index
-on (``user_id``, ``page``) in both the ``social.news`` and ``social.wall``
-collections. Since this combination is in fact unique, you should go ahead and
-specify ``unique=True`` for the index (this will become important later).
+on (``user_id``, ``month``) in both the ``social.news`` and ``social.wall``
+collections.
.. code-block:: pycon
- >>> db.social.news.ensure_index([
- ... ('user_id', 1),
- ... ('page', -1)],
- ... unique=True)
- >>> db.social.wall.ensure_index([
- ... ('user_id', 1),
- ... ('page', -1)],
- ... unique=True)
+ >>> for collection in (db.social.news, db.social.wall):
+ ... collection.ensure_index([
+ ... ('user_id', 1),
+ ... ('month', -1)])
+
+Commenting on a Post
+~~~~~~~~~~~~~~~~~~~~
+
+Other than viewing walls and news feeds, creating new posts is the next most
+common action taken on social networks. To create a comment by ``user`` on a
+given ``post`` containing the given ``text``, you'll need to execute code similar
+to the following:
+
+.. code-block:: python
+
+ from datetime import datetime
+
+ def comment(user, post_id, text):
+ ts = datetime.utcnow()
+ month = ts.strftime('%Y%m')
+ comment = {
+ 'by': { 'id': user['id'], 'name': user['name'] },
+ 'ts': ts,
+ 'text': text }
+ # Update the social.post collection
+ db.social.post.update(
+ { '_id': post_id },
+ { '$push': { 'comments': comment } } )
+ # Update social.wall and social.news collections; the positional
+ # operator targets the embedded post matched by 'posts.id'
+ db.social.wall.update(
+ { 'posts.id': post_id },
+ { '$push': { 'posts.$.comments': comment },
+ '$inc': { 'posts.$.comments_shown': 1 } },
+ multi=True)
+ db.social.news.update(
+ { 'posts.id': post_id },
+ { '$push': { 'posts.$.comments': comment },
+ '$inc': { 'posts.$.comments_shown': 1 } },
+ multi=True)
+
+.. note::
+
+ One thing to note in this function is the presence of a couple of ``multi=True``
+ update statements. Since these can potentially take quite a long time, this
+ function is a good candidate for processing 'out of band' with the regular
+ request-response flow of your application.
+
+The code above can actually result in an unbounded number of comments being
+inserted into the ``social.wall`` and ``social.news`` collections. To compensate
+for this, you should periodically run the following update statement to truncate
+the number of displayed comments and keep the size of the news and wall documents
+manageable:
+
+.. code-block:: python
+
+ COMMENTS_SHOWN = 3
+
+ def truncate_extra_comments():
+ db.social.news.update(
+ { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } },
+ { '$pop': { 'posts.$.comments': -1 },
+ '$inc': { 'posts.$.comments_shown': -1 } },
+ multi=True)
+ db.social.wall.update(
+ { 'posts.comments_shown': { '$gt': COMMENTS_SHOWN } },
+ { '$pop': { 'posts.$.comments': -1 },
+ '$inc': { 'posts.$.comments_shown': -1 } },
+ multi=True)
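As a rough, database-free illustration of what one pass of the truncation update does to a single document, consider the sketch below. It assumes that the positional operator targets only the first matching post in each document and that ``$pop`` with ``-1`` removes the first (oldest) array element; the helper names ``truncate_once`` and ``truncate_fully`` are hypothetical and exist only for this example.

```python
COMMENTS_SHOWN = 3

def truncate_once(doc, limit=COMMENTS_SHOWN):
    """Mimic one application of the update to a single document.

    Only the first post over the limit is touched, and its oldest
    comment is removed. Returns True if the document was modified.
    """
    for post in doc['posts']:
        if post['comments_shown'] > limit:
            post['comments'].pop(0)
            post['comments_shown'] -= 1
            return True
    return False

def truncate_fully(doc, limit=COMMENTS_SHOWN):
    """Repeat until no post exceeds the limit; returns the number of passes."""
    passes = 0
    while truncate_once(doc, limit):
        passes += 1
    return passes

# Hypothetical document with two posts that show too many comments.
doc = {'posts': [
    {'comments_shown': 5, 'comments': ['c1', 'c2', 'c3', 'c4', 'c5']},
    {'comments_shown': 4, 'comments': ['d1', 'd2', 'd3', 'd4']},
]}
passes = truncate_fully(doc)
```

Under these assumptions a single pass removes at most one comment per document, which suggests the periodic job may need to run repeatedly until nothing more is modified.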
+Index Support
+`````````````
+In order to execute the updates to the ``social.news`` and ``social.wall``
+collections shown above efficiently, you'll need to be able to quickly locate both
+of the following types of documents:
+
+- Documents containing a given post
+- Documents containing posts displaying too many comments
+
+To quickly execute these updates, then, you'll need to create the following
+indexes:
+
+.. code-block:: pycon
+
+ >>> for collection in (db.social.news, db.social.wall):
+ ... collection.ensure_index('posts.id')
+ ... collection.ensure_index('posts.comments_shown')
Creating a New Post
~~~~~~~~~~~~~~~~~~~
+Creating a new post fills out the content-creation activities on a social
+network:
+
.. code-block:: python
from datetime import datetime
- POSTS_PER_PAGE=25
def post(user, dest_user, type, detail, circles):
ts = datetime.utcnow()
+ month = ts.strftime('%Y%m')
post = {
'ts': ts,
'by': { 'id': user['id'], 'name': user['name'] },
@@ -376,43 +454,139 @@ Creating a New Post
'comments': [] }
# Update global post collection
db.social.post.insert(post)
- if dest_user in user['followers']
- result = db.social.wall.update(
- { 'user_id': user['id'], 'page': user['wall_pages'] }
-
+ # Copy to dest user's wall
+ if user['id'] not in dest_user['blocked']:
+ append_post(db.social.wall, [dest_user['id']], month, post)
+ # Copy to followers' news feeds
+ if circles == ['*public*']:
+ dest_userids = set(user['followers'].keys())
+ else:
+ dest_userids = set()
+ if circles == [ '*circles*' ]:
+ circles = user['circles'].keys()
+ for circle in circles:
+ dest_userids.update(user['circles'][circle])
+ append_post(db.social.news, dest_userids, month, post)
+
+The basic sequence of operations in the code above is the following:
+
+#. The post is first saved into the "system of record," the ``social.post``
+ collection.
+#. The recipient's wall is updated with the post.
+#. The news feeds of everyone who is 'circled' in the post are updated with the
+ post.
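The fan-out step, deciding whose news feeds receive the post, can be isolated and exercised on plain dictionaries. This is a minimal sketch that mirrors the branching in ``post``; the function name ``news_recipients`` and the sample user document are assumptions for illustration only.

```python
def news_recipients(user, circles):
    """Return the set of user ids whose news feeds should receive a post."""
    if circles == ['*public*']:
        # Public posts go to every follower.
        return set(user['followers'].keys())
    if circles == ['*circles*']:
        # '*circles*' expands to all of the user's circles.
        circles = list(user['circles'].keys())
    recipients = set()
    for circle in circles:
        # Each circle maps user ids to minimal profile data.
        recipients.update(user['circles'][circle])
    return recipients

# Hypothetical user document with three followers and two circles.
max_user = {
    'followers': {'a': {}, 'b': {}, 'c': {}},
    'circles': {'work': {'a': {}}, 'family': {'b': {}}},
}
public = news_recipients(max_user, ['*public*'])
all_circles = news_recipients(max_user, ['*circles*'])
work_only = news_recipients(max_user, ['work'])
```

Keeping this logic in a pure function makes it straightforward to test separately from the database writes.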
+
+Updating a particular wall or group of news feeds is then accomplished using the
+``append_post`` function:
-Commenting on a Post
-~~~~~~~~~~~~~~~~~~~~
+.. code-block:: python
-Adding a User to a Circle
-~~~~~~~~~~~~~~~~~~~~~~~~~
+ def append_post(collection, dest_userids, month, post):
+ collection.update(
+ { 'user_id': { '$in': sorted(dest_userids) },
+ 'month': month },
+ { '$push': { 'posts': post } },
+ multi=True)
-Removing a User from a Circle
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Index Support
+`````````````
+
+In order to quickly update the ``social.wall`` and ``social.news`` collections,
+you'll once again need an index on both ``user_id`` and ``month``. This time,
+however, the optimal order on the indexes is (``month``, ``user_id``). This is
+due to the fact that updates to these collections will always be for the current
+month; having month appear first in the index makes the index *right-aligned*,
+requiring significantly less memory to store the active part of the index.
-Viewing a User's Profile
-~~~~~~~~~~~~~~~~~~~~~~~~
+To actually create this index, you'll need to execute the following commands:
-Another common read operation on social networks is viewing a user's profile,
-including their wall posts. The code is actually quite similar to the code for
+.. code-block:: pycon
-Operation 1
-~~~~~~~~~~~
+ >>> for collection in (db.social.news, db.social.wall):
+ ... collection.ensure_index([
+ ... ('month', 1),
+ ... ('user_id', 1)])
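The right-alignment argument relies on month keys sorting in chronological order, so that new entries always land at the high end of the index. A small sketch of that property follows; the helper name ``month_key`` and the sample timestamps are assumptions for illustration.

```python
from datetime import datetime

def month_key(ts):
    """Format a timestamp as the 'YYYYMM' string used in the month field."""
    return ts.strftime('%Y%m')

# Hypothetical timestamps spanning a year boundary.
timestamps = [datetime(2011, 12, 31), datetime(2012, 1, 1), datetime(2012, 4, 15)]
keys = [month_key(ts) for ts in timestamps]
```

Because the year and a zero-padded month are concatenated, comparing the strings gives the same order as comparing the dates.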
-TODO: describe what the operation is (optional)
-Query
-`````
+Maintaining the Social Graph
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-TODO: describe query
+In your social network, maintaining the social graph is an infrequent but
+essential operation. To add a user ``other`` to the current user ``self``\'s
+circles, you'll need to run the following function:
+
+.. code-block:: python
+
+ def circle_user(self, other, circle):
+ circles_path = 'circles.%s.%s' % (circle, other['_id'])
+ db.social.user.update(
+ { '_id': self['_id'] },
+ { '$set': { circles_path: { 'name': other['name' ]} } })
+ follower_circles = 'followers.%s.circles' % self['_id']
+ follower_name = 'followers.%s.name' % self['_id']
+ db.social.user.update(
+ { '_id': other['_id'] },
+ { '$push': { follower_circles: circle },
+ '$set': { follower_name: self['name'] } })
+
+Note that in this solution, previous posts of the ``other`` user are not added to
+the ``self`` user's news feed or wall. To actually include these past posts would
+be an expensive and complex operation, and goes beyond the scope of this use case.
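To make the dotted field paths used by ``circle_user`` concrete, the sketch below builds the same two update documents without touching a database. The function name ``circle_updates`` and the short ids ``u1`` and ``u2`` are hypothetical.

```python
def circle_updates(self_user, other, circle):
    """Build the two (query, update) pairs that circle_user would issue."""
    circles_path = 'circles.%s.%s' % (circle, other['_id'])
    follower_circles = 'followers.%s.circles' % self_user['_id']
    follower_name = 'followers.%s.name' % self_user['_id']
    return [
        # Record 'other' in one of self's circles.
        ({'_id': self_user['_id']},
         {'$set': {circles_path: {'name': other['name']}}}),
        # Record self as a follower of 'other'.
        ({'_id': other['_id']},
         {'$push': {follower_circles: circle},
          '$set': {follower_name: self_user['name']}}),
    ]

updates = circle_updates(
    {'_id': 'u1', 'name': 'Max'},
    {'_id': 'u2', 'name': 'Dwight'},
    'work')
```

Since a dot separates path components in these field names, this approach presumes that user ids and circle names do not themselves contain a ``.`` character.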
+
+Of course, you'll also need to support *removing* users from circles:
+
+.. code-block:: python
+
+ def uncircle_user(self, other, circle):
+ circles_path = 'circles.%s.%s' % (circle, other['_id'])
+ db.social.user.update(
+ { '_id': self['_id'] },
+ { '$unset': { circles_path: 1 } })
+ follower_circles = 'followers.%s.circles' % self['_id']
+ db.social.user.update(
+ { '_id': other['_id'] },
+ { '$pull': { follower_circles: circle } })
+ # Special case -- 'other' is completely uncircled
+ db.social.user.update(
+ { '_id': other['_id'], follower_circles: {'$size': 0 } },
+ { '$unset': { 'followers.' + self['_id']: 1 } })
Index Support
`````````````
-TODO: describe indexes to optimize this query
+In both the circling and uncircling cases, the ``_id`` is included in the update
+queries, so no additional indexes are required.
Sharding
--------
+In order to scale beyond the capacity of a single replica set, you will need to
+shard each of the collections mentioned above. Since the ``social.user``,
+``social.wall``, and ``social.news`` collections contain documents which are
+specific to a given user, the user's ``_id`` field is an appropriate shard key:
+
+.. code-block:: pycon
+
+ >>> db.command('shardcollection', 'social.user', {
+ ... 'key': {'_id': 1 } } )
+ { "collectionsharded": "social.user", "ok": 1 }
+ >>> db.command('shardcollection', 'social.wall', {
+ ... 'key': {'user_id': 1 } } )
+ { "collectionsharded": "social.wall", "ok": 1 }
+ >>> db.command('shardcollection', 'social.news', {
+ ... 'key': {'user_id': 1 } } )
+ { "collectionsharded": "social.news", "ok": 1 }
+
+It turns out that using the posting user's ``_id`` is actually *not* the best
+choice for a shard key for ``social.post``. This is due to the fact that queries
+and updates to this collection are done using the ``_id`` field, and sharding on
+``by.id``, while tempting, would require these updates to be *broadcast* to all
+shards. To shard the ``social.post`` collection on ``_id``, then, you'll need to
+execute the following command:
+
+.. code-block:: pycon
+
+ >>> db.command('shardcollection', 'social.post', {
+ ... 'key': {'_id': 1 } } )
+ { "collectionsharded": "social.post", "ok": 1 }
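The difference between a targeted and a broadcast operation can be illustrated with a toy routing model. This is not how MongoDB's router is implemented: the hash-based placement, the ``NUM_SHARDS`` value, and the function names are all assumptions chosen to keep the sketch short.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size for this sketch

def shard_for(value):
    """Toy placement rule: hash the shard key value onto a shard number."""
    digest = hashlib.md5(str(value).encode('utf-8')).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def shards_to_contact(query, shard_key):
    """Shards a query must visit in this toy model."""
    if shard_key in query:
        # The query names the shard key, so one shard can answer it.
        return [shard_for(query[shard_key])]
    # Otherwise every shard has to be asked.
    return list(range(NUM_SHARDS))

by_id_query = {'_id': 'p1'}
targeted = shards_to_contact(by_id_query, '_id')
broadcast = shards_to_contact(by_id_query, 'by.id')
```

In this model a lookup by ``_id`` reaches a single shard when the collection is sharded on ``_id``, but reaches all of them when it is sharded on ``by.id``.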
+
.. seealso:: ":doc:`/faq/sharding`" and the ":wiki:`Sharding`" wiki
page.
