# Tutorial: using the Gerrit scraper

The Gerrit scraper downloads data from Gerrit instances and stores it in a file or a database.

The stored model is the standard Gerrit REST API JSON model, with the data for each change combined into a single dictionary.
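For orientation, the data comes from Gerrit's `GET /changes/` REST endpoint. The sketch below is not part of the scraper; it only illustrates two documented details of that API: the repeated `o` options (such as `ALL_FILES` or `MESSAGES`, the same options the scraping query further down requests) that control which fields each change includes, and the `)]}'` line Gerrit prepends to every JSON response to prevent cross-site script inclusion. The helper names `build_changes_url` and `parse_gerrit_json` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Gerrit prepends this line to JSON responses to defeat XSSI attacks
XSSI_PREFIX = ")]}'"


def build_changes_url(base_url, query, options, limit, start=0):
    """Build a URL for Gerrit's GET /changes/ endpoint (hypothetical helper).

    q = search query, n = page size, S = number of changes to skip,
    o = one entry per requested option.
    """
    params = [("q", query), ("n", limit), ("S", start)]
    params += [("o", opt) for opt in options]
    return f"{base_url.rstrip('/')}/changes/?{urlencode(params)}"


def parse_gerrit_json(body):
    """Strip the XSSI prefix (if present) and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


print(build_changes_url("https://android-review.googlesource.com",
                        "status:merged", ["ALL_FILES", "MESSAGES"], limit=5))
# https://android-review.googlesource.com/changes/?q=status%3Amerged&n=5&S=0&o=ALL_FILES&o=MESSAGES
```

To fetch the next page, `start` would be increased by `limit`; Gerrit sets `_more_changes: true` on the last returned change when further results exist.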

## Creating a store

The first step is to create an instance of a store, which determines where the downloaded reviews are saved.

### JSON store

The simplest way to save Gerrit reviews is to dump them into a JSON file.

```python
from gerrit.store import JSONFileStore

json_store = JSONFileStore("example.json")
```
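Conceptually, a store only has to accept change dictionaries and persist them. The class below is a hypothetical stand-in, not the real `JSONFileStore`, whose on-disk layout is not described here: `SimpleJSONStore`, its `store`/`close` methods and the choice of writing a single JSON array are all assumptions made for illustration.

```python
import json


class SimpleJSONStore:
    """Hypothetical minimal store: collects change dicts and dumps them as one JSON array."""

    def __init__(self, path):
        self.path = path
        self.changes = []

    def store(self, change):
        # Each change is a single dictionary built from the Gerrit REST API model
        self.changes.append(change)

    def close(self):
        # Write all collected changes to disk in one go
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.changes, f, indent=2)


def load_changes(path):
    """Read back the list of changes written by SimpleJSONStore."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Buffering in memory keeps the file a valid JSON document at all times, at the cost of holding every change until `close()` is called.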

### MongoDB store

Reviews can also be stored in a MongoDB database.

```python
from gerrit.store import MongoDBStore

mongo_store = MongoDBStore(db_name="testdb_gerrit", clear_before=False, skip_existing=True)
```
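The two flags are not documented here. Judging by their names and by the `Skipping change` line in the log further down, `clear_before` presumably empties the collection before scraping and `skip_existing` leaves changes that are already stored untouched. The in-memory sketch below models that assumed behaviour only; `InMemoryStore` and its `store` method are hypothetical, and changes are keyed by Gerrit's numeric `_number` field.

```python
class InMemoryStore:
    """Hypothetical store illustrating the assumed clear_before/skip_existing semantics."""

    def __init__(self, existing=None, clear_before=False, skip_existing=True):
        # Changes already "in the database", keyed by change number
        self.changes = dict(existing or {})
        self.skip_existing = skip_existing
        if clear_before:
            # Start from an empty store, discarding previously saved changes
            self.changes.clear()

    def store(self, change):
        """Save a change; return False if it was skipped because it already exists."""
        number = change["_number"]
        if self.skip_existing and number in self.changes:
            return False  # analogous to a "Skipping change ..." log line
        self.changes[number] = change
        return True
```

With `clear_before=False, skip_existing=True`, as in the call above, re-running the scraper only adds changes that are not yet in the database.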

The store assumes that the database contains a collection named `reviews`, which is used to store the changes.

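You can run the code below to create the database. It drops any existing `testdb_gerrit` database and then creates an empty `reviews` collection. Because MongoDB creates databases and collections lazily, a placeholder document is inserted and immediately deleted to make the collection exist.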

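```python
from pymongo import MongoClient

# Connect to a local MongoDB server on the default port
client = MongoClient('localhost', 27017)

# Start from a clean database (this deletes any existing data!)
client.drop_database('testdb_gerrit')
db = client.get_database('testdb_gerrit')

# Insert and remove a placeholder document so that the 'reviews' collection exists
reviews = db.get_collection('reviews')
reviews.insert_one({})
reviews.delete_many({})
```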

## Scraping Gerrit reviews

First, create the scraper object, passing the store (or stores) that should be used to save the reviews.

```python
from gerrit.scraper import GerritScraper

# Create a scraper. It can write to several stores; the workers are used to download file contents.
gerrit_url = "https://android-review.googlesource.com"
scraper = GerritScraper(gerrit_url, stores=[mongo_store], workers=6, sleep_between_pages=5)
```
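The `workers` argument controls how many file downloads run at once. The sketch below shows that pattern with `concurrent.futures.ThreadPoolExecutor`, which is a swapped-in choice (threads suit I/O-bound downloads; the real scraper keeps its own pool in `scraper.p` and may work differently). `fetch_file`, `process_change` and the `file_contents` key are hypothetical, stores are modelled as plain lists, and only the current revision is handled, mirroring `last_revision_only=True`.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_file(path):
    """Hypothetical stand-in for downloading one file's contents over HTTP."""
    return f"<contents of {path}>"


def process_change(change, stores, workers=6):
    """Download the current revision's files concurrently, then save the change to every store."""
    # current_revision and revisions are ChangeInfo fields returned by Gerrit
    revision = change["revisions"][change["current_revision"]]
    paths = sorted(revision.get("files", {}))
    # map() runs fetch_file on up to `workers` threads and keeps results in path order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        contents = list(pool.map(fetch_file, paths))
    # Hypothetical key: attach the downloaded contents to the revision
    revision["file_contents"] = dict(zip(paths, contents))
    # The same change dictionary is handed to every configured store
    for store in stores:
        store.append(change)
    return change
```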

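```python
import logging
import sys

# Send log messages to stdout so that the scraper's progress is visible
logger = logging.getLogger()
logger.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)

# Create a formatter and add it to the handler
formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# Fetch one page (pages=1) of 5 changes (n=5) that are open, merged or abandoned.
# Extra Gerrit REST options (o=...) are appended to the query so that files, revisions,
# labels, accounts and messages are included; only the last revision of each change is processed.
scraper.scrap_and_store_changes(
    q="status:open OR status:merged OR status:abandoned&"
      "o=ALL_FILES&o=ALL_REVISIONS&o=LABELS&o=DETAILED_LABELS&"
      "o=DETAILED_ACCOUNTS&o=MESSAGES",
    n=5, pages=1, last_revision_only=True)

# Close the scraper's pool of download workers once scraping is done
scraper.p.close()
```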

gerrit.scraper.GerritScraper - INFO - Page 1, Changes = 5
gerrit.scraper.GerritScraper - INFO - #1: Processing change 766962
gerrit.scraper.GerritScraper - INFO - Processing change 766962, revision 3: files 3
gerrit.scraper.GerritScraper - INFO - Storing change 766962
gerrit.scraper.GerritScraper - INFO - #2: Processing change 766826
gerrit.scraper.GerritScraper - INFO - Processing change 766826, revision 1: files 0
gerrit.scraper.GerritScraper - INFO - Storing change 766826
gerrit.scraper.GerritScraper - INFO - #3: Processing change 763086
gerrit.scraper.GerritScraper - INFO - Processing change 763086, revision 2: files 3
gerrit.scraper.GerritScraper - INFO - Skipping change 763086
gerrit.scraper.GerritScraper - INFO - #4: Processing change 765107
gerrit.scraper.GerritScraper - INFO - Processing change 765107, revision 2: files 1
gerrit.scraper.GerritScraper - INFO - Storing change 765107
gerrit.scraper.GerritScraper - INFO - #5: Processing change 764922
gerrit.scraper.GerritScraper -