This repository has been archived by the owner on Sep 14, 2018. It is now read-only.

Fails on horribly huge inboxes #4

Open
gregose opened this issue Jun 23, 2014 · 20 comments

Comments

@gregose

gregose commented Jun 23, 2014

My inbox has 5k+ messages. Hit a rate limit with the script:

Message details
Service invoked too many times in a short time: gmail rateMax. Try Utilities.sleep(1000) between calls. (line 311, file "labeler")
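The error message's own suggestion, `Utilities.sleep`, can be wrapped so every Gmail call backs off and retries when the quota trips. This is only a sketch (the helper name and retry policy are my own); `sleepMs` is injected so the logic runs outside Apps Script, where you would pass `function (ms) { Utilities.sleep(ms); }`:

```javascript
// Hypothetical retry wrapper: call fn, and if Gmail throws its
// "rateMax" quota error, sleep and try again with a growing delay.
function withRetry(fn, attempts, delayMs, sleepMs) {
  for (var i = 0; i < attempts; i++) {
    try {
      return fn();
    } catch (e) {
      // Give up on the last attempt or on unrelated errors.
      if (i === attempts - 1 || String(e).indexOf("rateMax") === -1) throw e;
      sleepMs(delayMs * (i + 1)); // linear backoff: 1s, 2s, 3s, ...
    }
  }
}
```

Usage would look like `withRetry(function () { return thread.getMessages(); }, 3, 1000, function (ms) { Utilities.sleep(ms); })`.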
@mhagger

mhagger commented Jul 9, 2014

I'm getting the same error (many times, rolled up in an email from Google on most days), except that my inbox only has 58 items in it. My "All Mail" folder has 2700, so even that isn't all that enormous. I set the script to run every 10 minutes. I just decreased that to every 15 minutes to see if it makes a difference.

I am using something very close to version dbb4a2d.

@btoews
Owner

btoews commented Jul 9, 2014

@mhagger does it give a line number for your error message? I'm curious which API it is hitting the limit for.

@mhagger

mhagger commented Jul 10, 2014

@mastahyeti: here's the start of the email:

(screenshot of the start of the failure email)

Line 319 in my version of the script (my customization probably changed the line numbers) is

parts = this._message.getRawContent().split("\r\n\r\n", 2);

@technicalpickles

Seeing this as well after being offline a few days:

(screenshot: "Summary of failures for Google Apps Script: technicalpickles OctoGAS" email)

@mislav

mislav commented Sep 16, 2014

For me it fails on an inbox that has less than 200 emails. See Google's quota limits here: https://script.google.com/dashboard

It claims "50000 Gmail operations / day" for Apps for Business accounts but doesn't go into detail about what this means. It's possible that we all share the same daily quota. Or it's possible that there is a finer-grained quota that would be alleviated by actually putting sleep() calls in the script?

@matthewmccullough
Contributor

I'm getting the same/similar error every day now. My inbox typically has 5-20 emails in it (I process email into the archive hourly on weekdays). What can I do to adapt?

  • ✔️ Small number of items in inbox
  • ✔️ Script set to run only every 15 minutes (also tried 30m)

(screenshot of the failure notice)

@mislav

mislav commented Apr 6, 2015

I wrote my own simpler version of OctoGAS that optimizes a lot of queries over this one, but even that wasn't enough to get rid of query limit failures. I'll experiment with one more level of optimization and post my findings here.

@btoews
Owner

btoews commented Apr 6, 2015

@matthewmccullough Looks like you're actually running into trouble with the muter script and not the labeler script. Google had changed some behavior which broke this script, but I fixed that in #11. Can you try copy-pasting the current version of https://github.com/mastahyeti/OctoGAS/blob/master/muter.gs into your copy of the script?

You might need to run the script manually once (they seem to rate limit manual script runs differently) to clear out your backlog of muted messages. Ping me if you need help with any of this.

@btoews
Owner

btoews commented Apr 6, 2015

As for the rate limit problem with the labeler script, @josh had a good idea that I was meaning to follow up on. If the user adds a Gmail filter to add an octogas-queue label to each new message from GitHub, OctoGAS can search for that label instead of searching for all unarchived messages from GitHub. It can then do its labeling and remove the octogas-queue label. This would cut down on the number of messages that are processed for each run.
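That queue idea can be sketched as follows. All naming here is hypothetical; `mail` stands in for the few `GmailApp` operations the real script would use, injected so the flow can be exercised outside Apps Script:

```javascript
// Drain the octogas-queue label: label each queued thread, then
// remove the queue label so the next run only sees new mail.
function drainQueue(mail, labelFor) {
  var queued = mail.getThreadsByLabel("octogas-queue");
  queued.forEach(function (thread) {
    var label = labelFor(thread); // e.g. derived from X-GitHub-Reason
    if (label) mail.labelThread(thread, label);
    mail.unlabelThread(thread, "octogas-queue"); // dequeue
  });
  return queued.length; // threads handled this run
}
```

In Apps Script itself the search would be `GmailApp.search("label:octogas-queue")` and the label juggling would go through `GmailLabel.addToThread` / `removeFromThread`; the point is just that each run touches only the queued threads instead of the whole inbox.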

@matthewmccullough
Contributor

try copy-pasting the current version

👍 Whoops. Can do!

@technicalpickles

While this is being figured out, I ended up creating a filter to just ignore the email notifications about hitting rate limits.

@mislav

mislav commented Apr 6, 2015

In my script, I cache the timestamp when the script last ran and then grab only threads that were updated since that time. Avoids iterating over threads that have already been processed:

  var query = 'in:inbox AND ( from:"notifications@github.com" OR from:"notifications@support.github.com" OR from:"noreply@github.com" )'
    , lastRunAt = cache.getLastRun()
    , newLastRun = new Date()

  if (lastRunAt) {
    query += " after:" + lastRunAt
  }
  cache.recordLastRun(newLastRun)

However, I think individual message.getPlainBody() calls are what is hitting the rate limits when there are long threads in progress in your inbox. Whenever a thread gets bumped, the GAS script needs to process it again, and does so from the beginning.

My plan was to experiment with saving the ID of the last message in the thread that was already processed, then start processing only new messages after that one. That will save on a lot of unnecessary getPlainBody() calls.
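One wrinkle with the `after:` trick above: Gmail's search operator takes a `YYYY/MM/DD` date, so the cached timestamp has to be serialized at day granularity. A sketch (the helper names are mine):

```javascript
// Format a Date the way Gmail's after: operator expects (YYYY/MM/DD).
function formatAfter(date) {
  function pad(n) { return (n < 10 ? "0" : "") + n; }
  return date.getFullYear() + "/" + pad(date.getMonth() + 1) + "/" + pad(date.getDate());
}

// Append after: only when a previous run was recorded.
function buildQuery(base, lastRunAt) {
  return lastRunAt ? base + " after:" + formatAfter(lastRunAt) : base;
}
```

Because `after:` is day-granular, runs within the same day will still re-see that day's threads, so this narrows the search rather than making it exact.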

@technicalpickles

As for the rate limit problem with the labeler script, @josh had a good idea that I was meaning to follow up on. If the user adds a Gmail filter to add an octogas-queue label to each new message from GitHub, OctoGAS can search for that label instead of searching for all unarchived messages from GitHub. It can then do its labeling and remove the octogas-queue label. This would cut down on the number of messages that are processed for each run.

Mentioned this to @ross earlier today, and he said he had been doing something similar in his own copy of the script. Please to share? 😁

@ross

ross commented Apr 6, 2015

Mentioned this to @ross earlier today, and he said he had been doing something similar in his own copy of the script. Please to share?

i just pr'd the customizations i made. they're directly in the labeler.gs file since i didn't know anything about coffeescript and i was in the middle of onboarding when i did it.

the nice thing about this route is that it only takes a single gmail filter to move all notifications into GitHub/Pending, and once the script processes them it moves them out, so the label doesn't grow over time.

i actually did it this way b/c my android notifications were happening almost immediately after receiving the messages, way before OctoGAS had a chance to run, so my phone was constantly showing dozens of notifications.

@mislav

mislav commented Apr 7, 2015

I've upgraded my "simpler OctoGAS" script to cache last read message index for all processed threads and, when new replies arrive, process only new messages in a thread rather than starting from the beginning of the thread.

    log("fetching messages for %d threads", todoThreads.length)
    forEach(GmailApp.getMessagesForThreads(todoThreads), function(messages, i){
      var message
        , thread = todoThreads[i]
        , i = cache.getStartingMessageIndex(thread)

      log("fetching body for %d messages starting from index %d", messages.length - i, i)

      for (; i < messages.length; i++) {
        message = messages[i]
        // ...

@josh

josh commented Apr 7, 2015

Here's one of my labeling scripts.

function processQueue() { 
  var githubReasonLabels = {
    "assign": GmailApp.getUserLabelByName("GitHub/Assign"),
    "author": GmailApp.getUserLabelByName("GitHub/Author"),
    "comment": GmailApp.getUserLabelByName("GitHub/Comment"),
    "mention": GmailApp.getUserLabelByName("GitHub/Mention"),
    "team_mention": GmailApp.getUserLabelByName("GitHub/Team Mention"),
    "manual": GmailApp.getUserLabelByName("GitHub/Manual")
  };

  function processThread(thread, messages) {
    for (var i = 0; i < messages.length; i++) {
      if (!messages[i].isUnread()) continue;

      var rawContents = messages[i].getRawContent();
      var match = rawContents.match(/^X-GitHub-Reason: ((.|\r\n\s)+)\r\n/m);
      if (match) {
        var reasonLabel = githubReasonLabels[match[1]];
        if (reasonLabel) reasonLabel.addToThread(thread);
      }
    }
  }

  var label = GmailApp.getUserLabelByName("Queue");
  var threads = label.getThreads();
  var messages = GmailApp.getMessagesForThreads(threads); 

  for (var i = 0; i < threads.length; i++) {
    Logger.log("Process Thread[" + i + "]");
    processThread(threads[i], messages[i]);
    threads[i].removeLabel(label);
  }
}
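For reference, here is the `X-GitHub-Reason` match from the script above run against a hand-written raw message (the sample content is invented; in the script it comes from `getRawContent()`):

```javascript
// Gmail returns raw content with CRLF line endings; the /m flag lets
// ^ anchor at the start of the header line, and the (.|\r\n\s) branch
// tolerates folded (wrapped) header values.
var raw = "From: notifications@github.com\r\n" +
          "X-GitHub-Reason: mention\r\n" +
          "Subject: Re: something\r\n" +
          "\r\n" +
          "body";
var match = raw.match(/^X-GitHub-Reason: ((.|\r\n\s)+)\r\n/m);
// match[1] is "mention", the key looked up in githubReasonLabels
```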

@ross

ross commented Apr 7, 2015

In my script, I cache the timestamp when the script last ran

seems like anything relying on the cache is eventually going to run into trouble when the cache is wiped/key is evicted.

to address that, it seems like it would have to process some number of things each pass and stop (recording the cache key), then pick up at that point the next time.

another option might be to label the last processed item and use that non-ephemeral marker.

@mislav

mislav commented Apr 7, 2015

seems like anything relying on the cache is eventually going to run in to trouble when the cache is wiped/key is evicted.

I set my cache TTL to 2 hours and renew it when I run the script multiple times within that period. In my experience I haven't seen the caches get wiped arbitrarily. But I agree, that's a downside.

@josh Pretty cool trick with checking for isUnread 👍

@btoews
Owner

btoews commented Apr 13, 2015

Sorry for not responding here sooner. Looking at the labeler script again, we do have caching of which threads have already been processed, and threads should only be processed once:

class Thread
  # Queue all threads to have the appropriate labels applied given our reason
  # for receiving them.
  #
  # Returns nothing.
  @labelAllForReason: ->
    @all[id].labelForReason() for id in @ids when !@all[id].alreadyDone()

  # Load a list of Thread ids that have already been labeled. Because the ids
  # are based on the messages in the thread, new messages in a thread will
  # trigger relabeling.
  #
  # Returns nothing.
  @loadDoneFromCache: ->
    cached = CACHE.get @doneKey
    @done = JSON.parse(cached) if cached

  # Save the list of ids that we have already labeled.
  #
  # Returns nothing.
  @dumpDoneToCache: ->
    CACHE.put @doneKey, JSON.stringify(@done)

  # Has this thread already been labeled?
  #
  # Returns a bool.
  alreadyDone: ->
    Thread.done.indexOf(@id) >= 0

...

Label.loadPersisted()
Thread.loadFromSearch QUERY
Thread.loadDoneFromCache()
Message.loadReasonsFromCache()
try
  Thread.labelAllForReason()
  Thread.archiveAll() if SHOULD_ARCHIVE
catch error
  Logger.log error
finally
  try
    Label.applyAll()
  catch error
    Logger.log error
  finally
    Thread.dumpDoneToCache()
    Message.dumpReasonsToCache()

Assuming that the error handling is correct, this should be able to process a large inbox over the course of many runs, even if it hits rate limit issues. I don't have a large inbox to test this in, so maybe it isn't working. They've updated the cache API a bit, so I made a few changes in dcd1086 and edd8ac8.

@ross

ross commented Jan 27, 2016

I have a setup where I wrote a general rule that all incoming GitHub notifications go into a GitHub/Pending label and skip the inbox, and then modified the script to only process things in that label and remove the label when done. That has seemed to work even with hundreds of messages when I've been gone for a while.
