
Datastore notes #3

Closed
aaronlidman opened this issue Jul 13, 2014 · 2 comments

Comments

@aaronlidman
Contributor

This has been a mess, some notes:

  • Initial idea: everything in S3 behind a micro server, with server requests mapping to simple s3cmd commands.
    • Importing to S3 turned out to be very slow for ~500k objects of only a few bytes each. Bad idea; too much hammer, not enough nail.
  • Parse.com: import a CSV, grab a random object.
    • No random query. Would love to use Parse or Firebase to host this, though.
  • Current idea: each user gets assigned to a namespace and only grabs the next object from their namespace, hopefully greatly reducing the chance of clashing (i.e. multiple users editing the same thing at the same time, which would happen with a single namespace).
    • What happens when one namespace runs out of errors? Or when they all start running low and everyone is clashing? Assign multiple namespaces (5) to a user and cycle through them? How do we ensure even distribution?
    • Do we need a checkout system again? Very much want to avoid that.
    • Maybe rotate through namespaces by combining userid + time (sketch after this list)? Each request from each user would then hopefully hit a different namespace. Not sure what happens as we approach zero remaining errors.
  • I also tried a Redis approach: generate the CSV, import it into Redis, and deal with it there. But I don't want to maintain an instance, and it feels limited.
  • Fallback: RDS and a micro instance.
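
A minimal sketch of that userid + time rotation, just to make the idea concrete. The window length, function name, and namespace count are assumptions here, nothing settled above:

```python
import hashlib
import time

NUM_NAMESPACES = 5  # assumed count, echoing the "multiple namespaces (5)" idea above

def pick_namespace(user_id, window_seconds=60):
    # Bucket time into fixed windows so each user's namespace rotates every
    # window, and hash userid + bucket so users spread evenly across namespaces.
    bucket = int(time.time()) // window_seconds
    digest = hashlib.sha1("{}:{}".format(user_id, bucket).encode()).hexdigest()
    return int(digest, 16) % NUM_NAMESPACES
```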

Stopgap for today: run directly off the already-running Postgres. I hate it, but I need to get away from this for now. I'll figure out namespacing later.
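
Something like this is all the stopgap needs; psycopg2 and the errors table/column names are assumptions for illustration, not the actual schema:

```python
import psycopg2

def random_error(conn):
    with conn.cursor() as cur:
        # ORDER BY random() scans the whole table: slow on big tables,
        # but tolerable as a stopgap against the already-running Postgres.
        cur.execute("SELECT id, detail FROM errors ORDER BY random() LIMIT 1")
        return cur.fetchone()

conn = psycopg2.connect("dbname=osm")  # hypothetical connection string
print(random_error(conn))
```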

@aaronlidman
Contributor Author

Another idea:

  • Have the client download the entire CSV for a particular error type on initial load (they're surprisingly small: deadendoneways = 80946 items -> 4 MB), or split them up to some reasonable size; we could just put them in a gist.
  • Store the status for each item in Parse, which would push updates out to all other clients (sketch after this list).
    • Could we use more real-timey stuff like this elsewhere?
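
Roughly what the Parse write could look like, from memory of its REST API; the ItemStatus class and field names are invented for this sketch:

```python
import requests  # assumed dependency

PARSE_URL = "https://api.parse.com/1/classes/ItemStatus"  # hypothetical class

def mark_status(item_id, status, app_id, rest_key):
    # Creates a status object; other clients would poll or subscribe to
    # this class to pick up the change.
    resp = requests.post(
        PARSE_URL,
        headers={
            "X-Parse-Application-Id": app_id,
            "X-Parse-REST-API-Key": rest_key,
        },
        json={"itemId": item_id, "status": status},
    )
    resp.raise_for_status()
    return resp.json()
```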

@aaronlidman aaronlidman mentioned this issue Jul 14, 2014
@aaronlidman
Contributor Author

More or less figured out using Redis for now. The challenging part was finding a way to randomly select items in an efficient manner, and Redis sets allow for this. I'm also storing values in Redis for now, though that will likely change since they can be large. Depending on how many items we end up having I might have to rethink parts of this, since Redis is all in memory, but it's fine for ~10M right now and likely good for 5-10x that.
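
The core of it, as a sketch; the key names are assumptions, but SPOP is the real mechanism:

```python
import redis  # redis-py

r = redis.Redis(decode_responses=True)

def load_items(namespace, item_ids):
    # One set per namespace holds the ids of the remaining errors.
    r.sadd("errors:" + namespace, *item_ids)

def next_item(namespace):
    # SPOP removes and returns a random set member in O(1), which is
    # the efficient random selection mentioned above.
    return r.spop("errors:" + namespace)
```

SRANDMEMBER would do the same without removing the member, if items should stay in the set until someone confirms the fix.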
