## DOCUMENTATION

**AUTHOR**: Rebecca Wright

**DATE**: 6/20/21

<br></br>

**PURPOSE**

Creates a dummy dataset of mock user "trackerlog" data for use in TRKR App product development.
The program uses an existing CSV file of Google Jobs API query results to create trackerlog entries.
It creates 50 dummy user IDs, then generates between 1 and 50 trackerlog entries for each user.
Data points describing the job posting are copied unchanged from the source listing, while the following data points describing the log entry are randomly generated and vary between entries:
- is_active status
- end_result status
- date_created
- date_updated

**DEPENDENCIES**

A CSV file must already exist containing the Google Jobs API query results.
This program iteration used a file called 'job_listing_result_log_with_links.csv' containing 340 job listings.

**DEVELOPMENT NOTES**

This program does not prevent the same randomly chosen job_id from being assigned more than once to the same user_id.
Additional validation must be added to the FOR loop to protect against this.
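
One possible form for that validation, sketched here rather than taken from the program, is to draw each user's job rows without replacement. The helper name `pick_unique_job_rows` and its parameters are hypothetical. It de-duplicates by row index, so it only guarantees unique job_id values if the source CSV itself has no repeated job_id.

```python
import random

def pick_unique_job_rows(num_jobs, num_entries, rng=random):
    """Return num_entries distinct row indices drawn from range(num_jobs)."""
    # random.sample draws without replacement, so no index can repeat.
    # Cap the request so it never exceeds the number of available listings.
    count = min(num_entries, num_jobs)
    return rng.sample(range(num_jobs), count)

# Example: one hypothetical user with 10 entries drawn from 340 listings.
rows = pick_unique_job_rows(num_jobs=340, num_entries=10)
```

In the main loop, the inner FOR could then iterate over the returned indices instead of calling `random.choice` once per entry.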

<br></br>
***
#### IMPORTS

In [None]:
import datetime
import random
import pandas as pd

<br></br>
***
#### FUNCTIONS

In [None]:
# returns a random datetime.date from 2020-06-01 through 2021-03-31 (prints as e.g. 2020-11-15)
def random_apply_date():
    start_date = datetime.date(2020, 6, 1)
    end_date = datetime.date(2021, 4, 1)   # exclusive upper bound

    time_between_dates = end_date - start_date
    days_between_dates = time_between_dates.days
    random_number_of_days = random.randrange(days_between_dates)
    random_date = start_date + datetime.timedelta(days=random_number_of_days)
    return random_date

In [None]:
# returns a datetime.date (prints as e.g. 2020-12-01)
# adds a random lapse of 1 to 59 days to the apply date (randrange(1,60) never draws 0)
def random_lapse_date(start_date):
    random_number_of_days = random.randrange(1,60)
    random_date = start_date + datetime.timedelta(days=random_number_of_days)
    return random_date
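
To make the bounds of the two helpers concrete, the sketch below restates them with the dates and maximum lapse passed as parameters and draws a batch of values. The names `random_date_between` and `random_lapse_after` are illustrative, not part of the program, and parameterizing the bounds is an assumption (the originals hard-code them).

```python
import datetime
import random

def random_date_between(start_date, end_date):
    """Return a random date in the half-open range [start_date, end_date)."""
    span_days = (end_date - start_date).days
    return start_date + datetime.timedelta(days=random.randrange(span_days))

def random_lapse_after(start_date, max_days=59):
    """Return a date between 1 and max_days days after start_date."""
    return start_date + datetime.timedelta(days=random.randint(1, max_days))

start = datetime.date(2020, 6, 1)
end = datetime.date(2021, 4, 1)

# Draw a batch of apply dates and a follow-up date for each one.
applied = [random_date_between(start, end) for _ in range(1000)]
updated = [random_lapse_after(d) for d in applied]
```

Every apply date falls on or after 2020-06-01 and before 2021-04-01, and every follow-up date is 1 to 59 days later.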

<br></br>
***
#### VARIABLES

**Import CSV file with archived API query results**

<div class="alert alert-danger">
  <strong>Next cell WILL NOT EXECUTE as is:</strong><br>
    The next code cell references a local file that is not included in the GitHub repo. See the <strong>Dependencies</strong> section in the documentation above for more info.
</div>

In [None]:
jobs_df = pd.read_csv('job_listing_result_log_with_links.csv', index_col=False)

In [None]:
jobs_df.info()
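
Because the source CSV is not in the repo, one way to exercise the rest of the notebook is to generate a small stand-in file. The sketch below is an assumption, not the original data: the file name `sample_job_listings.csv`, the helper `write_sample_jobs_csv`, and all row values are placeholders. It includes only the five columns the main loop reads, and the real file may contain more.

```python
import csv
import os
import tempfile

# Column names taken from the lookups in the MAIN loop.
COLUMNS = ['job_id', 'company_name', 'title', 'location', 'app_url']

def write_sample_jobs_csv(path, n_rows=5):
    """Write n_rows placeholder job listings to path."""
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for i in range(n_rows):
            writer.writerow({
                'job_id': f'job-{i}',
                'company_name': f'Company {i}',
                'title': f'Position {i}',
                'location': 'Remote',
                'app_url': f'https://example.com/jobs/{i}',
            })

# Write the placeholder file to the system temp directory.
sample_path = os.path.join(tempfile.gettempdir(), 'sample_job_listings.csv')
write_sample_jobs_csv(sample_path)
```

Passing `sample_path` to `pd.read_csv(..., index_col=False)` in place of the original file name should then give a `jobs_df` with the expected columns.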

**Dummy user id values**

In [None]:
user_id_vals = range(1,51)

**Dummy job id values from row count of imported CSV file**

In [None]:
sample_jobs_id_vals = range(jobs_df.shape[0])

**Dummy is_active values**

In [None]:
is_active_vals = [0,1]

**Dummy range from 1 to 50 used to randomly determine number of log entries to be generated for each user**

In [None]:
num_of_apps_vals = range(1,51)

<br></br>
***
#### MAIN

**Populate empty tracker_log dictionary using FOR loop**

In [None]:
tracker_log = {}
index = 0
for user in user_id_vals:
    for _ in range(random.choice(num_of_apps_vals)):
        job = jobs_df.loc[random.choice(sample_jobs_id_vals)]   # one random job listing row
        is_active = random.choice(is_active_vals)
        date_create = random_apply_date()   # application creation date
        if is_active == 0:   # app is inactive and some resolution was reached
            end_result = random.choice([1,2,3])   # 1 is a yes, 2 is a no, 3 is a maybe
            date_updated = str(random_lapse_date(date_create))
        else:   # app is still active or pending result (is_active status is 1)
            end_result = 0
            date_updated = ""
        tracker_log[index] = {
            'user_id': user,
            'job_id': job['job_id'],
            'company': job['company_name'],
            'position': job['title'],
            'location': job['location'],
            'application_url': job['app_url'],
            'is_active': is_active,
            'end_result': end_result,
            'date_created': str(date_create),
            'date_updated': date_updated
        }
        index += 1
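
The loop encodes a few rules: active entries carry an end_result of 0 and an empty date_updated, while inactive entries carry an end_result of 1, 2, or 3 and a date_updated later than date_created. A minimal sketch of a check for those rules follows. The function `check_entry` and the two sample entries are hypothetical and not part of the program.

```python
def check_entry(entry):
    """Return a list of rule violations for one tracker_log entry."""
    problems = []
    if entry['is_active'] == 1:
        if entry['end_result'] != 0:
            problems.append('active entry has an end_result')
        if entry['date_updated'] != '':
            problems.append('active entry has a date_updated')
    else:
        if entry['end_result'] not in (1, 2, 3):
            problems.append('inactive entry lacks an end_result of 1, 2, or 3')
        # ISO date strings (YYYY-MM-DD) sort correctly as plain strings.
        if not entry['date_updated'] > entry['date_created']:
            problems.append('date_updated is not after date_created')
    return problems

# Hand-written sample entries holding only the fields the check reads.
good = {'is_active': 0, 'end_result': 2,
        'date_created': '2020-07-01', 'date_updated': '2020-07-15'}
bad = {'is_active': 1, 'end_result': 3,
       'date_created': '2020-07-01', 'date_updated': ''}
```

Running `check_entry` over `tracker_log.values()` before the conversion to a dataframe would flag any entry that breaks these rules.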

**Convert tracker_log dictionary to dataframe**

In [None]:
final_df = pd.DataFrame.from_dict(tracker_log, orient='index')

**Preview dataframe**

In [None]:
final_df

**Save tracker_log dataframe dummy data to CSV file**

In [None]:
final_df.to_csv('dummy_data_tracker_log.csv', index=False)