Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Database schema #1

Closed
mplorentz opened this Issue · 13 comments

4 participants

@mplorentz
Owner

I think it would be really cool if we could have a table of users and auth tokens (or however that works) and store everything else in a use'rs Tent server. Is that possible?

We could create tagged posts in the users Tent servers for their preferences, I can't think of anything that couldn't be stored that way.

The upshot is that we would have a lightweight database (SQLite) which makes things easier for us and anyone who wants to self-host the app.

The downside is that we'd need more code to prevent people who tinker with their Tent databases from getting smoke-signals to behave in undesirably ways (i.e. creating an rss feed out of their tent posts and then making tent posts out of the rss feed).

@seanmonstar
Collaborator

We certainly could. What sort of data do you imagine needing to be stored?

Keeping a local list of [RSS feeds]:[accounts publishing them] will let us show people before they create yet another account publishing the same feed.

@mplorentz
Owner

Hmm ya, that would be a great feature. Maybe we could just tag the preferences onto the users table. Just have columns like entity, auth token, type (rss2tent or tent2rss), and feed URL, etc.

@bnjbvr
Collaborator

I thought about this for the rss2tent part:

User:

  • app_id
  • app_mac_key
  • app_mac_key_id
  • user_mac_key
  • user_mac_key_id
  • entity

Feed:

  • name char256
  • URL char256
  • lastFetchDate date

FeedItem:

  • guid text
  • feed_id: id of belonging feed

UserFeed : many to many table

@seanmonstar
Collaborator

@BenjBouv and how do you determine a new item? Feed items that have a published date of later than lastFetchDate?

For the tent2rss part:

Most of the above, but add to User last_post_id.

@bnjbvr
Collaborator

@seanmonstar New items are determined the following way:

  • either the item contains a pubDate and we compare pubDate and lastFetchDate. FYI, RSS2 feed items are described over there: http://cyber.law.harvard.edu/rss/rss.html
  • or it doesn't, and in this case we just compare the guid with the guid of feed items belonging to the current RSS feed. A guid which is not present in the set of guids of retrieved items should be considered to be a new item ( http://www.詹姆斯.com/blog/2006/08/rss-dup-detection ). Of course, a dictionnary should help finding guid for a given feed, so as to avoid fetching the database everytime.
@seanmonstar
Collaborator

Since we know some people are horrid at making RSS feeds, what if pubDates change to be later than lastFetchDate? And for guid resolution: some feeds use the same guid for every item :(. And some use the same URL for every item too.

Also, would there ever be concern of storage space if we hold on to every feed item? Maybe we could clean up feed items every month, removing all that are older than a month (or pick a better time span)?

@bnjbvr
Collaborator

The uniqueness problem seems particularly complicated to deal with. Having an algorithm that decides which field to use in the feed seems to be overkill. What about just using the link as a unique identifier? Are there some stats about how bad people make RSS feeds? :)

Cleaning up feed items regularly seems indeed a good idea. A beta test with real world data should show us how fast the database gets filled and give us an estimate of a right time span.

@seanmonstar
Collaborator

An addition to the Feed table: add an etag field

@florianjacob
Collaborator

Hm, how do other feed readers solve the uniqueness problem? What about throwing all fields togehter and generate a single hash out of it? That's hopefully unique in every case. ;) And it would give us a post update detection mechanism e.g. when typos got corrected.

@mplorentz
Owner

I'm going to write up an sqlite3 schema based on our discussion. I'll ask for feedback when it's up.

@mplorentz
Owner

Should the mac_key_ids be char256 columns in the database? I've never worked with auth tokens before.

While I'm at it, what is the mac_key and mac_key_id. Related to the MAC address?

@seanmonstar
Collaborator
@mplorentz
Owner

Ok guys the initial database schema is up. The feed, feeditems, and the user's preferences will all be stored in the user's Tent server. We can go ahead and create tables for these as we need them since we don't have 0.3 servers to test against yet.

As a followup on determining new RSS posts here's my proposal:
We make a dict of the last 200 or so FeedItems we've seen from a particular feed, by hashing together a bunch of the RSS item's fields. We store this (pickled) dict in the user's Tent server and when we fetch the RSS feed we search for each of the items in the dict. Does anyone see major problems with that? It solves the uniqueness problem without requiring us to sift through a large amount of posts.

@mplorentz mplorentz closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.