(Fast) Gmail's scraper to extract conversations and emails
Ruby
Switch branches/tags
Nothing to show
Latest commit 7e7a529 Feb 25, 2010 unknown unknown gem bug fixed
Permalink
Failed to load latest commit information.
lib gem bug fixed Feb 26, 2010
spec init Jan 6, 2010
README bug fixed Feb 25, 2010
gmail-scraper.gemspec gem bug fixed Feb 26, 2010
test.rb gem bug fixed Feb 26, 2010

README

# Install
 sudo gem install gmail-scraper

## Description
HTML Scraper to extract a summary of the conversations or/and also the emails inside.

## Features
 * Fast method to extract gmail's data compared to SMTP/IMAP = we don't download attached files etc.   
 * extract the list of summaries of each conversation
 * extract the emails of each conversation (email text = only the new text due to Gmail Diff method)

## Problems
 * too fast.. : If you extract all your emails (thousand of conversations) in few minutes, Gmail could detect an "abnormal behavior" and block your account for 24h if you.
  To solve that use sleep(n) method sometimes in your iterative block.

## Examples code

    gmail=Gmail.new("email","password")
    if gmail.connect
     conv_start=10 # start the extract at the conversation index =0
     conv_end= nil # no specific conversation index for the end of the interval

     #default method gmail.list() with conv_start=0
     gmail.list(conv_start, conv_end) {|thread_summary,i|

        # a summary of a conversation
        thread_summary.subject
        thread_summary.nb_emails
        thread_summary.uid
        thread_summary.tags

        # a complete conversation
        thread=gmail.fetch_conversation(thread_summary) # scrapt the whole conversation (=with the emails inside) from its summary
        thread.created_at
        thread.updated_at
        thread.subject
        thread.nb_emails
        thread.uid
        thread.tags

        #an email in the conversation
        email=thread.emails[0]
        email.uid
        email.sender
        emails.receivers
        email.created_at
        email.text    #= only the new text
        email.subject #= subject of the conversation

     }

    end


     # for all the emails related to a specific tag 
     gmail.search(tag) {|thread_summary,i|
        ...
     }

 #  TODO
* multi-thread scraping

 # Author
Nicolas Maisonneuve (n.maisonneuve@gmail.com)