Smarter checking for new repos #236

Open
cespare opened this Issue May 4, 2012 · 4 comments

Comments

Projects
None yet
2 participants
Collaborator

cespare commented May 4, 2012

Right now, we have this call at the beginning of several high-frequency routes:

MetaRepo.instance.scan_for_new_repos

This can do two things:

  1. Compare the set of files in the repos directory to the list of repo names in the DB
  2. If these do not match, do expensive stuff (find or create every repo in the DB and do some grit calls)

This caused us a big performance issue because we had an empty directory in the repos folder, which caused 1. to always indicate that there are new repos, so 2. was being run every time MetaRepo.instance.scan_for_new_repos was called. This is not a hard pathological case to hit -- if you try to add a new repo that happens to be empty, for instance, you'll end up with just such an empty directory in the repos folder.

More generally, we should try to remove as much disk-touching as possible from the critical path of important frequently-called routes (api calls, /saved_search/:id, etc).

Contributor

philc commented May 4, 2012

We can debate more in person. I added this because without it, we have poor usability around new or deleted repos -- Commit objects will exist in the database and our code assumes commit.grit_commit is always valid, but it can be nil if we haven't scanned for new repos. I considered scanning for new repos when we discover that commit.grit_commit is nil, but there are enough code paths that access metarepo.grit_commit that I thought the current implementation was of similar complexity.

The common case where there are no new repos is very light and will not actually touch disk, but rather the disk cache. Was the empty directory detection broken by 792bc88?

This check can be made asynchronous in another thread, but it needs to be done in-process. The actual work that needs to be done is loading a Grit instance for each repo, and that needs to be done in all of our running process -- all unicorn workers and all Resque workers.

In the current usage, metarepo.scan_for_new_repos should be at the head of any route which loads commit information. The fact that it's not in /commits is an oversight.

Collaborator

cespare commented May 4, 2012

Can't we do the scan in Commit#grit_commit whenever we would return nil? Any solution here that requires us to figure out whether a route might load commit information and remember to add the scan method at the beginning has a smell to me.

This check seems fragile, because there are lots of ways in which it can be tricked into thinking that something changed every time we scan. A repo delete could fail in some way and leave a directory lying around. The repo delete could delete the directory, but fail before deleting the entry from the git_repos table. Somebody could just mkdir foo. Any of these will cause the same nasty performance problem. As an alternative approach, what about adding repos by first cloning, then adding the repo to the DB, and then "scanning for new repos" in the workers by just seeing if any new entries have been added to the DB? That way we won't depend on the file structure of repos folder as the source of the list of repos in the system.

Anyway, now that I see your reasoning here, I'll give some thought to it. Tomorrow I'll see if 792bc88 changed the behavior in an undesirable way, and add more comments here if I come to some more conclusions.

Contributor

philc commented May 4, 2012

If we move this logic into MetaRepo, it looks like there are 4 places that need to be patched. We can probably reduce this number by refactoring client code:

  • MetaRepo.all_repos (this isn't a method that exists, but we'll need something that detects new repos before handing back a list of all repos for iteration. The fetchCommits resque job uses it).
  • MetaRepo.get_grit_repo
  • MetaRepo.grit_commit
  • MetaRepo.db_commit

Actually, any method in MetaRepo that uses @repo_names_and_ids_to_repos, @repo_name_to_id or @repos.

Collaborator

cespare commented Oct 4, 2012

We had this same issue bite us again. ugh...

cespare was assigned Oct 25, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment