A Rails app for crawling Google, Yahoo, and Bing search results for trademarks, along with the associated sponsored links.
Ruby
app
bin
config
db
doc
lib/tasks
public
script
spec
vendor
.gems
.gitignore
.rvmrc
Gemfile
Rakefile
bing_styling.css
debug.log
global_crawler.rb
readme.md
runner.rb
trademark_errored.txt
trademarks.txt
trademarks_old.txt
yahoo_styling.css


Trademark Search

Purpose

Professor David Hyman wants to search Google, Bing, and Yahoo for a trademark and collect the top organic search results and the top sponsored links (5 to 10 of each). The end product is an Excel file with all of these results organized, plus a 'screenshot' of each search results page and of each organic result and sponsored link.

Requirements

Be sure to install wkhtmltopdf if you need to export PDFs of the webpages; follow the installation tutorial on GitHub.
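Because PDF export depends on an external binary rather than a gem, it can help to confirm that wkhtmltopdf is reachable before running the PDF step. The following is a minimal sketch using only the Ruby standard library; the helper name `executable_on_path?` is hypothetical and is not part of this repository.

```ruby
# Hypothetical helper (not part of this repository): returns true when an
# executable called `name` is found in one of the PATH directories.
def executable_on_path?(name)
  # On Windows, PATHEXT lists extensions such as .EXE; elsewhere it is usually unset.
  exts = [""] + ENV.fetch("PATHEXT", "").split(";")
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    exts.any? do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

if executable_on_path?("wkhtmltopdf")
  puts "wkhtmltopdf found on PATH"
else
  warn "wkhtmltopdf not found on PATH; PDF export (Trademark.pdfs) will likely fail"
end
```

This only checks PATH; if wkhtmltopdf is installed elsewhere and configured explicitly, the check may report a false negative.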

Install the required Ruby gems by running:

  • gem install bundler

  • bundle install

Instructions

Run the scraper:

  1. Enter the Rails Console by running: script/console
  2. Import Trademarks from the trademarks.txt file: Trademark.import
  3. Search Google, Yahoo, and Bing for those search terms: Trademark.scrape
  4. Compute where the organic links and sponsored links are on each page: Trademark.links
  5. (optional) Export all search result pages to PDF in the "TRADEMARKS/" folder: Trademark.pdfs
  6. (optional) Export all search results to a CSV file: Trademark.excel
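Step 2 reads trademarks.txt, which holds one trademark per line. The sketch below shows one plausible way to turn that file into a list of search terms. It is not the repository's `Trademark.import`; the method name `read_trademarks` and the choice to skip blank lines and drop duplicates are assumptions.

```ruby
# Hypothetical sketch, not the repository's Trademark.import:
# parse one trademark per line into a list of search terms.
def read_trademarks(text)
  text.each_line
      .map(&:strip)      # drop trailing newlines and surrounding spaces
      .reject(&:empty?)  # ignore blank lines
      .uniq              # avoid searching the same term twice
end

sample = "Coca-Cola\n\nNike\n  Apple  \nNike\n"
p read_trademarks(sample)
# prints ["Coca-Cola", "Nike", "Apple"]
```

Against the real file this would be called as `read_trademarks(File.read("trademarks.txt"))`.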

Code Overview

  • Read a text file (trademarks.txt) that has each trademark on a new line
  • For each trademark:
    • Run a search in Google
      • Get the top organic search results
      • Get the top x sponsored links
      • Put a screenshot into a folder
      • Put URLs to results into a CSV (for Excel)
    • Run a search in Bing
      • Get the top organic search results
      • Get the top x sponsored links
      • Put a screenshot into a folder
      • Put URLs to results into a CSV (for Excel)
    • Run a search in Yahoo
      • Get the top organic search results
      • Get the top x sponsored links
      • Put a screenshot into a folder
      • Put URLs to results into a CSV (for Excel)
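Each trademark is run through the same three engines. As a sketch of the "run a search" step, the snippet below builds a results-page URL per engine with the standard library's `URI.encode_www_form`. The base URLs and query parameter names (`q` for Google and Bing, `p` for Yahoo) reflect the engines' public search pages and are assumptions here, not values taken from global_crawler.rb.

```ruby
require "uri"

# Assumed endpoints and query parameters; the crawler itself may use others.
SEARCH_ENDPOINTS = {
  google: ["https://www.google.com/search", "q"],
  bing:   ["https://www.bing.com/search", "q"],
  yahoo:  ["https://search.yahoo.com/search", "p"]
}.freeze

# Returns { engine => results-page URL } for one trademark, with the term
# form-encoded (spaces become "+", "&" becomes "%26", and so on).
def search_urls(trademark)
  SEARCH_ENDPOINTS.transform_values do |(base, param)|
    "#{base}?#{URI.encode_www_form(param => trademark)}"
  end
end

search_urls("Coca-Cola Zero").each { |engine, url| puts "#{engine}: #{url}" }
```

Encoding the term rather than interpolating it directly matters for trademarks that contain characters such as `&` or `#`.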

Usage

  • modify the trademarks.txt file with the search terms you want to run
  • run: ruby runner.rb
  • a trademark_results.csv file will be created, along with one folder per trademark
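To illustrate the output described above, here is a minimal sketch that formats result rows as CSV with Ruby's `csv` library and derives a filesystem-safe folder name per trademark. The column layout (`trademark`, `engine`, `kind`, `rank`, `url`) and the folder-naming rule in `folder_for` are hypothetical; runner.rb may organize its output differently.

```ruby
require "csv"

# Hypothetical column layout for trademark_results.csv.
HEADERS = %w[trademark engine kind rank url].freeze

# Hypothetical rule: replace each run of characters other than letters,
# digits, and hyphens with "_" so the trademark can be used as a folder name.
def folder_for(trademark)
  trademark.strip.gsub(/[^0-9A-Za-z\-]+/, "_")
end

# Builds CSV text from an array of hashes keyed by the HEADERS names.
def results_csv(rows)
  CSV.generate do |csv|
    csv << HEADERS
    rows.each { |r| csv << r.values_at(*HEADERS.map(&:to_sym)) }
  end
end

rows = [
  { trademark: "Acme", engine: "google", kind: "organic",   rank: 1, url: "http://example.com/a" },
  { trademark: "Acme", engine: "bing",   kind: "sponsored", rank: 1, url: "http://example.com/b" }
]
puts results_csv(rows)
puts folder_for("Coca-Cola Zero") # prints Coca-Cola_Zero
```

Writing the file would then be `File.write("trademark_results.csv", results_csv(rows))`, and `Dir.mkdir(folder_for(name))` (or `FileUtils.mkdir_p`) would create the per-trademark folder.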

Required Gems and Libraries

  • PDFKit
  • Capybara
  • Nokogiri
  • wkhtmltopdf (external binary, installed separately)
