## DOCUMENTATION

**AUTHOR**: Rebecca Wright

**DATE**: 6/20/21

<br></br>

**PURPOSE**

To create a dummy dataset of Google Jobs API search results, which is used to build the dummy trackerlog dataset for TRKR App product development.
The program runs an API query call up to 100 times in a FOR loop.
Each iteration of the FOR loop offsets the query results pagination by 10 (10 is the default number of results returned by each query call).
Because no system wait command is used, the results of each query call are appended to a CSV file as soon as they are returned, to guard against loss of query results due to timing errors.
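
The append-per-page pattern described above can be sketched as follows. This is a minimal illustration, not the notebook's actual cell: the helper `append_page`, the file name `demo_log.csv`, and the two stand-in pages are hypothetical, and no API call is made.

```python
import os
import tempfile

import pandas as pd


def append_page(rows, csv_path):
    """Append one page of results to csv_path, writing the header only once."""
    df = pd.DataFrame(rows)
    write_header = not os.path.exists(csv_path)  # header only for the first page
    df.to_csv(csv_path, mode="a", header=write_header, index=False)


# Hypothetical stand-in pages; real pages would come from the API query calls.
pages = [
    [{"job_id": "a1", "title": "UX Designer"}],
    [{"job_id": "b2", "title": "Senior UX Designer"}],
]

csv_path = os.path.join(tempfile.mkdtemp(), "demo_log.csv")
for page_number, rows in enumerate(pages):
    start = page_number * 10  # offset that would be sent as the "start" parameter
    append_page(rows, csv_path)

logged = pd.read_csv(csv_path)
```

Because each page is flushed to disk before the next query, a failure partway through the loop leaves every earlier page intact in the CSV.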

After the Google Jobs API query calls are finished, the CSV results are imported as a new sample dataframe.
A second FOR loop then queries the Google Jobs Listing API once for each job result, using its job_id.
The first apply_options link value from each call is stored in a temporary app_url_list.
The app_url_list is appended to the sample dataframe as a new column.
The sample dataframe then contains each job listing joined with a valid application URL value.
The dataframe is saved as the CSV file 'job_listing_result_log_with_links.csv', to be imported by generate_dummy_trackerlog.ipynb.

**DEPENDENCIES**

A valid secret_api_key must be obtained through <strong>https://serpapi.com/</strong>. 

**DEVELOPMENT NOTES**

This program does not protect against timing errors within the API query FOR loop.
Pausing between query calls with a system sleep call would be more efficient, and the explicit write of each search result to the CSV should be reconsidered.
The current iteration of the program logged 34 successful API query calls, yielding 340 job listing results, which was sufficient for initial development needs.
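
One way the sleep idea in these notes might look is sketched below. It is an assumption about a possible future design, not part of the current program: the helper `call_with_retry` and its parameters are hypothetical, and the demo uses a fake fetch function and a recorded sleep so that it runs instantly without calling the API.

```python
import time


def call_with_retry(fetch, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing between failed attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts - 1:
                sleep(delay_seconds * (attempt + 1))  # wait a little longer each time
    raise last_error


# Demo with a fake fetch that fails once, then succeeds.
demo_state = {"calls": 0}


def fake_fetch():
    demo_state["calls"] += 1
    if demo_state["calls"] == 1:
        raise RuntimeError("temporary failure")
    return {"jobs_results": []}


demo_waits = []  # records the pauses instead of actually sleeping
demo_result = call_with_retry(fake_fetch, sleep=demo_waits.append)
```

In the query loop, `fetch` could be something like `lambda: GoogleSearch(params).get_dict()`, with the default `time.sleep` left in place.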

<br></br>
***
#### IMPORTS

In [None]:
from serpapi import GoogleSearch
import pandas as pd
from csv import writer

#import api_config_file    #NOT INCLUDED in github repo

<br></br>
***
#### VARIABLES

### API Key

<div class="alert alert-danger">
  <strong>API KEY needed to move forward:</strong><br>
    The next block of code WILL NOT EXECUTE unless the code is updated to include a valid SerpAPI key.  An API key is free for a 15-day trial period and can be obtained by visiting <strong>https://serpapi.com/</strong>
</div>
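
Since `api_config_file` is not included in the repository, one possible way to supply the key is sketched here. It assumes the key is stored in an environment variable; the variable name `SERPAPI_API_KEY` and the helper `load_api_key` are hypothetical and are not part of SerpAPI or this project.

```python
import os


def load_api_key(env_var="SERPAPI_API_KEY"):
    """Read the API key from an environment variable (hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to a valid key.")
    return key


# Possible usage in place of the config-file lookup:
# api_key = load_api_key()
```

Keeping the key out of the notebook itself serves the same goal as the excluded config file: the secret never reaches the repository.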

In [None]:
api_key = api_config_file.api_key

<br></br>
***
#### MAIN

**Execute Google Jobs API query call**

In [None]:
for x in range(0, 100):
    start_var = x * 10  # pagination offset: each query call returns 10 results
    params = {
        "engine": "google_jobs",
        "q": "UX designer new york",
        "hl": "en",
        "start": start_var,
        "api_key": api_key
    }

    search = GoogleSearch(params)
    results = search.get_dict()
    jobs_results = results.get('jobs_results')
    if not jobs_results:
        break  # stop when a query call returns no job results
    df = pd.DataFrame(jobs_results)
    df.drop(columns='thumbnail', errors='ignore', inplace=True)  # drop thumbnail column if present

    if x == 0:
        # first page: create the CSV and write the header row
        df.to_csv('job_listing_result_log.csv', index=False)
    else:
        # later pages: append rows without repeating the header
        df.to_csv('job_listing_result_log.csv', mode='a', header=False, index=False)

    print(x)  # used to monitor FOR loop progress

**Import Google Jobs API query results CSV**

In [None]:
sample_df = pd.read_csv('job_listing_result_log.csv')
sample_df

**Temporary app_url_list container**

In [None]:
app_url_list = []

**Temporary dataframe size variable**

In [None]:
sample_df_size = sample_df.shape[0]
sample_df_size

**Execute Google Jobs Listing API query call**

In [None]:
for x in range(sample_df_size):
    job_id = sample_df.loc[x, 'job_id']  # job_id of the listing in row x
    params2 = {
        "engine": "google_jobs_listing",
        "q": job_id,    # q variable holds job_id number
        "hl": "en",
        "api_key": api_key
    }
    search2 = GoogleSearch(params2)
    results2 = search2.get_dict()
    app_url_list.append(results2['apply_options'][0]['link'])
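
The loop above assumes every response contains at least one entry under `apply_options`; a missing entry would raise an error and stop the loop. A more defensive lookup might look like the sketch below. The helper `first_apply_link` is hypothetical, and the sample dictionaries only mimic the `apply_options` / `link` shape used in the cell above.

```python
def first_apply_link(listing):
    """Return the first apply_options link, or None when the listing has none."""
    options = listing.get("apply_options") or []
    if not options:
        return None
    return options[0].get("link")


# Hypothetical stand-in responses; real ones would come from search2.get_dict().
with_link = {"apply_options": [{"title": "Apply", "link": "https://example.com/apply"}]}
without_link = {"error": "no listing found"}

demo_links = [first_apply_link(with_link), first_apply_link(without_link)]
```

Appending `None` for listings without a link keeps `app_url_list` the same length as `sample_df`, at the cost of leaving some rows without a valid application URL.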

**Append app_url_list to sample_df as new column**

In [None]:
sample_df['app_url'] = app_url_list

In [None]:
sample_df

**Save sample job listing dataframe dummy data to CSV file**

In [None]:
sample_df.to_csv('job_listing_result_log_with_links.csv', index=False)