
Datastore notes #3

Closed
aaronlidman opened this issue Jul 13, 2014 · 2 comments

Comments

@aaronlidman
Contributor

This has been a mess, some notes:

  • Initial idea: everything in S3 behind a micro server, with server requests mapping to simple s3cmd commands.
    • Importing to S3 turned out to be very slow for ~500k objects of only a few bytes each. Bad idea; too much hammer, not enough nail.
  • Parse.com: import a CSV, grab a random object.
    • No random query. Would love to use Parse or Firebase to host this, though.
  • Current idea: each user gets assigned to a namespace and only grabs the next object from their namespace, hopefully greatly reducing the chance of clashing (i.e. multiple users editing the same thing at the same time, which would happen with a single namespace).
    • What happens when one namespace runs out of errors? Or when they all start running low and everyone is clashing? Assign multiple namespaces (5) to a user and cycle through them? How do we ensure even distribution?
    • Do we need a checkout system again? Very much want to avoid that.
    • Maybe rotate through namespaces by combining userid + time (sketch after this list)? Each request from each user would then hopefully hit a different namespace. Not sure what happens as we approach zero remaining errors.
  • I also tried a Redis approach: generate the CSV, import it into Redis, and deal with it there. But I don't want to maintain an instance, and it feels limited.
  • Fallback: RDS and a micro instance.
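
A minimal sketch of that userid + time rotation, just to make the idea concrete. The window length, function name, and namespace count are assumptions here, nothing settled above:

```python
import hashlib
import time

NUM_NAMESPACES = 5  # assumed count, echoing the "multiple namespaces (5)" idea above

def pick_namespace(user_id, window_seconds=60):
    # Bucket time into fixed windows so each user's namespace rotates every
    # window, and hash userid + bucket so users spread evenly across namespaces.
    bucket = int(time.time()) // window_seconds
    digest = hashlib.sha1("{}:{}".format(user_id, bucket).encode()).hexdigest()
    return int(digest, 16) % NUM_NAMESPACES
```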

Stopgap for today: run directly off the already-running Postgres. I hate it, but I need to get away from this for now. I'll figure out namespacing later.
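
Something like this is all the stopgap needs; psycopg2 and the errors table/column names are assumptions for illustration, not the actual schema:

```python
import psycopg2

def random_error(conn):
    with conn.cursor() as cur:
        # ORDER BY random() scans the whole table: slow on big tables,
        # but tolerable as a stopgap against the already-running Postgres.
        cur.execute("SELECT id, detail FROM errors ORDER BY random() LIMIT 1")
        return cur.fetchone()

conn = psycopg2.connect("dbname=osm")  # hypothetical connection string
print(random_error(conn))
```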

@aaronlidman
Contributor Author

Another idea:

  • Have the client download the entire CSV for a particular error type on initial load (they're surprisingly small: deadendoneways = 80946 items -> 4 MB), or split them up to some reasonable size; we could just put them in a gist.
  • Store the status for each item in Parse, which would push updates out to all other clients (sketch after this list).
    • Could we use more real-timey stuff like this elsewhere?
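
Roughly what the Parse write could look like, from memory of its REST API; the ItemStatus class and field names are invented for this sketch:

```python
import requests  # assumed dependency

PARSE_URL = "https://api.parse.com/1/classes/ItemStatus"  # hypothetical class

def mark_status(item_id, status, app_id, rest_key):
    # Creates a status object; other clients would poll or subscribe to
    # this class to pick up the change.
    resp = requests.post(
        PARSE_URL,
        headers={
            "X-Parse-Application-Id": app_id,
            "X-Parse-REST-API-Key": rest_key,
        },
        json={"itemId": item_id, "status": status},
    )
    resp.raise_for_status()
    return resp.json()
```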

@aaronlidman aaronlidman mentioned this issue Jul 14, 2014
@aaronlidman
Contributor Author

More or less figured out using Redis for now. The challenging part was finding a way to randomly select items in an efficient manner, and Redis sets allow for this. I'm also storing values in Redis for now, though that will likely change since they can be large. Depending on how many items we end up having I might have to rethink parts of this, since Redis is all in memory, but it's fine for ~10M right now and likely good for 5-10x that.
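
The core of it, as a sketch; the key names are assumptions, but SPOP is the real mechanism:

```python
import redis  # redis-py

r = redis.Redis(decode_responses=True)

def load_items(namespace, item_ids):
    # One set per namespace holds the ids of the remaining errors.
    r.sadd("errors:" + namespace, *item_ids)

def next_item(namespace):
    # SPOP removes and returns a random set member in O(1), which is
    # the efficient random selection mentioned above.
    return r.spop("errors:" + namespace)
```

SRANDMEMBER would do the same without removing the member, if items should stay in the set until someone confirms the fix.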
