Mongo schema design #146

Closed
marcofiset opened this issue Feb 12, 2014 · 3 comments

@marcofiset
Contributor

Is there a particular reason why Comments and Votes belong to their own collection?
I'd rather have comments and votes as nested properties of the NewsItem document, as it seems that they are pretty much always queried together.

I believe comments aren't really useful on their own, and neither are votes. They really are tied to a particular item. NewsItems could have a votes property as an array of embedded documents, and comments could be embedded in the NewsItem in the same way. Comments would also have their own array of votes.
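
For illustration, the embedded layout described above might look like the following sketch. It uses plain Python dicts rather than a real database, and every field name (`title`, `voter`, `author`, `body`) is an assumption, not something taken from the project's schema.

```python
# Hypothetical embedded layout: votes and comments live inside the NewsItem.
# All field names are illustrative assumptions.
news_item = {
    "_id": 1,
    "title": "Example story",
    "votes": [{"voter": "alice"}, {"voter": "bob"}],
    "comments": [
        {
            "author": "carol",
            "body": "Nice read.",
            "votes": [{"voter": "alice"}],  # each comment carries its own votes
        },
    ],
}


def summarize(item):
    """Build a front-page row from the single document, with no extra lookups."""
    return {
        "title": item["title"],
        "vote_count": len(item["votes"]),
        "comment_count": len(item["comments"]),
    }


print(summarize(news_item))
```

The appeal is that one read returns everything the front page needs; the cost is that the document grows with every vote and comment.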

I know it's kind of late to question such fundamental design decisions, but I'm concerned we might run into performance problems eventually. Plus it leads to pretty awful code for the front page, which has to query each item's votes and comments in a separate database request.

This is not supposed to be a rant in any way, I'm just trying to understand the reasoning behind the current design. English is my second language so please bear with me if I am not careful enough with my words, I don't mean to be offensive at all.

@treygriffith
Collaborator

Hey @marco-fiset, I created the separate collections for Votes and Comments, so I'm probably the best one to answer as to the reasoning why. (and don't worry, I'm not offended).

The way the database is designed now is really more of a relational design, and doesn't take much advantage of MongoDB's document-based design. However, I've found that the document-based databases are the most useful when each document is self-contained. It starts to break down when the connections between documents become important, as is the case with both Votes and Comments. Specific reasoning behind each one is below.

I separated Comments into their own collection for two reasons:

  1. It will likely be useful to query comments by author to create a page of submitted comments (like on HN)
  2. The comments for any single article might grow fairly large, and we probably won't want to retrieve all of the comments for every article just to build the front page. You can exclude the comments from the query, but then you'd have to do a separate count query anyway.
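
As a rough sketch of the two reasons above, a separate comments collection can be modeled as a flat list of records that each point back to their news item. This is an in-memory stand-in, not the project's code, and the field names (`item_id`, `author`, `body`) are assumptions.

```python
from collections import Counter

# Hypothetical separate "comments" collection, modeled as a list of dicts.
comments = [
    {"_id": 1, "item_id": 10, "author": "alice", "body": "First!"},
    {"_id": 2, "item_id": 10, "author": "bob", "body": "Interesting."},
    {"_id": 3, "item_id": 11, "author": "alice", "body": "Agreed."},
]


def comments_by_author(author):
    """Reason 1: list one user's comments without touching any news item."""
    return [c for c in comments if c["author"] == author]


def comment_counts(item_ids):
    """Reason 2: the front page needs only counts, never the comment bodies."""
    wanted = set(item_ids)
    return Counter(c["item_id"] for c in comments if c["item_id"] in wanted)


print([c["_id"] for c in comments_by_author("alice")])
print(dict(comment_counts([10, 11])))
```

With embedded comments, the by-author page would instead have to scan inside every news item.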

I separated Votes into their own collection because of disadvantages of the other two options (or at least the only other two options I thought of).

  1. Store a vote count on the news item
    • If a simple vote count is stored on the news item, the user profile will have to include every news item that a user has voted for so that we can prevent voting for an item more than once. This keeps the data in two places, which is prone to problems, and updating both is by its nature a non-atomic operation.
  2. Store all of the voters on the news item in an array
    This has multiple drawbacks:
    • It boosts the size of the news items on the front page (a similar, although less serious problem than the comments problem above)
    • To avoid conflicts when users simultaneously vote for an item you'd have to employ Mongo's $addToSet or $push operator, which means that you can't use subdocuments (since each subdoc would have a unique _id). Without subdocuments, the votes can't have any metadata, like the magnitude or direction of a vote (i.e. a downvote). Without that metadata, you'd need an entirely new array for each change (e.g. a downvotes array and an upvotes array).
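
The subdocument problem in option 2 can be seen in a small in-memory model. The `add_to_set` helper below is a hypothetical stand-in that imitates the "add only if an identical element is absent" behavior of `$addToSet`; it is not a driver call, and the `voter` and `direction` fields are assumptions.

```python
import itertools

_ids = itertools.count(1)  # stands in for automatically generated unique _id values


def add_to_set(array, value):
    """Model of $addToSet: append only if an identical element is absent."""
    if value not in array:
        array.append(value)


# Plain user ids: a repeated vote is a no-op.
voters = []
add_to_set(voters, "alice")
add_to_set(voters, "alice")
print(voters)  # ['alice']

# Subdocuments that each receive a fresh _id never compare equal,
# so the same user ends up recorded twice.
votes = []
add_to_set(votes, {"_id": next(_ids), "voter": "alice", "direction": 1})
add_to_set(votes, {"_id": next(_ids), "voter": "alice", "direction": 1})
print(len(votes))  # 2
```

Deduplication only works when the stored value is identical on every attempt, which is why the array would be limited to bare user ids with no per-vote metadata.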

I'm definitely open to discussing the alternatives, especially for Votes, but the pros of separate collections seemed to outweigh the cons to me, with the added benefit of being familiar to users coming from a relational DB background.

For reference, this reddit thread has over 10,000 comments. And the most popular HN post got over 4,000 upvotes.

@treygriffith
Collaborator

Oh, and something I forgot to mention is that nested comments (#153) would result in pretty deeply nested comment subdocuments and vote arrays.

@marcofiset
Contributor Author

@treygriffith Thanks for the clarification, I see you gave it a lot more thought than I did! It makes a lot more sense to me now.

I thought exactly the same thing about multi-level comment nesting when I looked at the new issues this morning and saw #153.
