# Using the "greenCall" Python package

The greenCall Python package currently requires a series of function calls
to work through its data pipeline. The pipeline consists of the following steps:

1. Read the CSV file formatted as (unique id, query term)
2. Request information from the Search API
3. Write results to disk in JSON format
4. Bulk upload results to Elasticsearch

This notebook provides a concise example of how to work through the data pipeline.

## Setting Variables

In [1]:
# Importing secret keys 
from examples.secret import secret_key #, secret_cx

In [2]:
# Google Custom Search API id
#CX = secret_cx

# Secret Key required to access the API
SECRET_KEY = secret_key

# Maximum number of query items to request from API
QUERY_LIMIT = 5

# Maximum number of requests deferred
MAX_RUN = 20

# Number of seconds to wait between requests
RATE_LIMIT = 1

# Path to the original Excel file, converted to CSV
filepath = 'examples/finance_demo.csv'

# Path to converted file to be used for API requests
outpath = 'examples/ipython_demo.json'

# Path to the results returned from the API via the networking engine
resultspath = 'results.json'

# Specify a document template for Elasticsearch
esformat = {
            "_index": "ipythonsearch",
            "_type": "website",
            "_id": None,
            "_source": ""
        }
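
The template above is shared by every document, so each result needs its own copy with `_id` and `_source` filled in. The following is a minimal sketch of that idea, not greenCall's implementation; `fill_template` and the sample query terms are hypothetical names introduced only for illustration.

```python
import copy

# Same shape as the `esformat` template defined above.
template = {
    "_index": "ipythonsearch",
    "_type": "website",
    "_id": None,
    "_source": "",
}


def fill_template(template, doc_id, source):
    """Return a new document built from the template (hypothetical helper)."""
    doc = copy.deepcopy(template)  # copy so the shared template is never mutated
    doc["_id"] = doc_id
    doc["_source"] = source
    return doc


# Sample data, invented for this sketch.
docs = [
    fill_template(template, doc_id, {"query": term})
    for doc_id, term in enumerate(["acme corp", "globex"], start=1)
]
```

Copying the template for each document avoids the common mistake of appending the same dictionary object repeatedly, which would leave every entry pointing at the last `_id`.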


## Start Logging

In [3]:
from greencall.utils.utilityBelt import enable_log

# Log everything, always.
enable_log('crawlah')


## Step 1 (Reading the CSV file)

In [4]:
from greencall.csvclean.inputCsv import tojson

# Convert the input file from CSV to JSON
tojson(filepath, outpath, QUERY_LIMIT)
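
To make the expected input concrete, here is a rough standard-library sketch of reading (unique id, query term) rows and serializing them to JSON. It is not how `tojson` is implemented, and the JSON layout greenCall writes may differ; `csv_to_records` and the sample rows are assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_records(handle, limit):
    """Read (unique id, query term) rows, keeping at most `limit` of them."""
    records = {}
    for row in csv.reader(handle):
        if len(records) >= limit:
            break
        if len(row) < 2:
            continue  # skip blank or malformed rows
        unique_id, term = row[0].strip(), row[1].strip()
        records[unique_id] = term
    return records


# In-memory stand-in for a CSV file; the rows are invented sample data.
sample = io.StringIO("101,acme corp\n102,globex\n103,initech\n")
records = csv_to_records(sample, limit=2)
payload = json.dumps(records, indent=2)
```

Here `limit` plays the role of `QUERY_LIMIT`: only the first two rows are kept, and the third is never read.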

## Step 2 (Request information from the Search API)

In [5]:
from greencall.csvclean.clientConversion import runConversion
#from examples.secret import secret_key

# Use the API client to convert query terms into the correct format
# for API requests. Currently hard-coded for the Google Search API.
adict = runConversion(jsonpath=outpath,
                      secretKey=SECRET_KEY)

## Step 3 (Write results to disk in JSON format)

In [6]:
from twisted.internet import reactor
from greencall.crawlah import getPages

# Load the network engine which handles API requests (gas & brakes)
gp = getPages(adict, MAX_RUN, RATE_LIMIT)

# Start the networking engine; reactor.run() blocks until the reactor stops
gp.start()
reactor.run()
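
The "gas & brakes" idea can be illustrated without a network connection. The sketch below swaps Twisted for the standard-library `asyncio` module and assumes that `MAX_RUN` caps the number of requests in flight while `RATE_LIMIT` spaces out their start times; that reading of the two settings is an assumption, and `fetch` and `run_all` are hypothetical stand-ins rather than part of greenCall.

```python
import asyncio
import time


async def fetch(term):
    """Stand-in for one API request (hypothetical; performs no network I/O)."""
    await asyncio.sleep(0)
    return term.upper()


async def run_all(terms, max_run, rate_limit):
    """Launch one request every `rate_limit` seconds, at most `max_run` in flight."""
    semaphore = asyncio.Semaphore(max_run)
    started = []

    async def worker(term):
        async with semaphore:  # the cap on concurrent requests
            started.append(time.monotonic())
            return await fetch(term)

    tasks = []
    for term in terms:
        tasks.append(asyncio.create_task(worker(term)))
        await asyncio.sleep(rate_limit)  # the "brakes": pause between launches
    results = await asyncio.gather(*tasks)
    return results, started


results, started = asyncio.run(
    run_all(["a", "b", "c"], max_run=2, rate_limit=0.05)
)
```

Inside a notebook that already has a running event loop, `asyncio.run` raises an error; calling `await run_all(...)` directly in the cell is the usual alternative.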


## Step 4 (Bulk upload into Elasticsearch)

In [7]:
from greencall.utils.google import GoogleParse
from greencall.utils.loadelastic import load_elastic, read_json
from greencall.csvclean.inputCsv import read_csv

# Set the Elasticsearch document id to 1 (assumes a new index)
gp = GoogleParse(es_id=1)

# Load the API results and the original CSV rows
resultsdict = read_json(resultspath)
accountdict = read_csv(filepath, QUERY_LIMIT)

# Build the documents to load from the results, CSV rows, and template
load_docs = gp.update_es_doc_id(resultsdict, accountdict, esformat)

# Bulk load the documents into Elasticsearch
load_elastic(load_docs)
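
For readers curious about what a bulk upload looks like on the wire, the sketch below turns documents shaped like `esformat` into the newline-delimited JSON body accepted by the Elasticsearch `_bulk` REST endpoint. How `load_elastic` performs the upload is not shown in this notebook, so this is only an illustration; `to_bulk_body` and the sample documents are hypothetical. Note that recent Elasticsearch releases no longer accept `_type`.

```python
import json


def to_bulk_body(docs):
    """Serialize documents into the newline-delimited body of a `_bulk` request."""
    lines = []
    for doc in docs:
        # Each document becomes two lines: an action line and a source line.
        action = {
            "index": {
                "_index": doc["_index"],
                "_type": doc["_type"],
                "_id": doc["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc["_source"]))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


# Sample documents, invented for this sketch.
docs = [
    {"_index": "ipythonsearch", "_type": "website", "_id": 1,
     "_source": {"title": "first"}},
    {"_index": "ipythonsearch", "_type": "website", "_id": 2,
     "_source": {"title": "second"}},
]
body = to_bulk_body(docs)
```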
