LGCT-schedule-scraper

Uses the Google Docs API to recursively read a specified Employee's assigned Tour and Supervisor times into lists and dictionaries, then parses those data structures to write only the relevant information about the Employee's work schedule to a text document.

Relevant information about the Employee's work schedule includes:

Employee is scheduled as Supervisor
Employee has one or more booked Tours
Employee has one or more unbooked Tours
Employee has a booked High Ropes
Employee has an unbooked High Ropes
Employee has a task assigned to them

Note: for ease of understanding, High Ropes and Tours will both be referred to as "Tours" for the rest of this document.

Motivation

Assigned Tours and tasks for all Employees are located in one Google Doc (Work Schedule). This means an Employee cannot glance at the Work Schedule and see their hours, as they need to sift through the Tours and tasks assigned to every employee at the company. This application aims to fix that by scraping the Work Schedule for only a specified Employee's Tours and tasks and writing only those times to a text file.

Current Work Schedule Example:

Note: Employee names removed for privacy.

Dependencies

Google Docs API v1
Pip (to install Google Docs API)

Project Setup

Setup Google Docs API v1:

See the Google Docs API Python Quickstart Guide for in-depth instructions on installing the Google Docs API, generating the credentials and tokens it requires, and troubleshooting.

Config File:

In config.py, set up the following fields:

DOCUMENT_ID = "<document-id>"
PATH = "<text-file-save-path>"

Note: this program is written to only correctly parse data gathered from Docs formatted in a very specific way. It is highly likely any attempt to use this program on a document not formatted like Work Schedule will fail. Rewriting large portions of extractText.py and writeFile.py would be required to use this program on another Doc.

Project Files

extractText.py recursively reads text from the Schedule Doc into various data structures (lists and dictionaries containing dates, occurrences of Employee, Employee work assignments, etc.)
writeFile.py parses the text data in the data structures created by extractText.py and writes Employee-relevant information to a text file
application.py checks the user's Docs API credentials and runs main to generate the file containing the Employee's schedule
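
The recursive read can be sketched as follows. This is a minimal sketch in the spirit of the Google "Extract the text from a document" sample named in Credits, not the contents of extractText.py. It assumes the input is the list found under body.content in a documents.get response, and it returns one string rather than filling the project's lists and dictionaries.

```python
def read_paragraph_element(element):
    """Return the text of one paragraph element, or '' if it has no text run."""
    text_run = element.get("textRun")
    if not text_run:
        return ""
    return text_run.get("content", "")


def read_structural_elements(elements):
    """Recursively collect text from a list of Docs structural elements."""
    text = ""
    for value in elements:
        if "paragraph" in value:
            for elem in value["paragraph"].get("elements", []):
                text += read_paragraph_element(elem)
        elif "table" in value:
            # Table cells hold their own lists of structural elements,
            # so the function calls itself on each cell.
            for row in value["table"].get("tableRows", []):
                for cell in row.get("tableCells", []):
                    text += read_structural_elements(cell.get("content", []))
        elif "tableOfContents" in value:
            toc = value["tableOfContents"]
            text += read_structural_elements(toc.get("content", []))
    return text
```

Splitting the returned string on newlines gives the individual lines that the later parsing steps work on.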

Challenges

The format in which data is entered into the Work Schedule is not standardized. Several individuals are responsible for entering information into the Work Schedule, and each does it in a slightly different way.

An Employee can be assigned as a Supervisor in several different ways:

Employee appears on same line as "Supervisor:"
Employee appears on same line as "Supervisor:" with the name of a different employee
Employee appears on line below "Supervisor:"

Additionally, the Supervisor may be unassigned, either left blank or marked with the word "unassigned".

Thus, several checks must be made when determining if Employee is scheduled as Supervisor.

def create_sup_list(text):
    """
    Only writes to the SupData list if the Supervisor entry contains the employee name
    :param text: single line of paragraph content
    :return: True if text is a Supervisor line (or the line directly below one), otherwise False
    """
    sup = 'Supervisor:'
    # check line below occurrence of sup contains employee
    if SupData.sup_line:
        if EmployeeTourData.employee in text:
            SupData.s_list.append(text)
            SupData.s_list.append(TextData.text_index)
        SupData.sup_line = False
    # finds if sup is on the same line as employee name
    elif sup in text and EmployeeTourData.employee in text:
        # add only employee name to s_list
        SupData.s_list.append(text[12:])
        SupData.s_list.append(TextData.text_index)
        SupData.sup_line = False
    elif sup in text:
        # assume name of supervisor is on next line
        SupData.sup_line = True
    else:
        # text does not contain sup, return False
        return False
    # sup in text, return True
    return True
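
For experimenting with these cases outside the project, the same checks can be written without shared state. The sketch below is an illustration only: find_supervisor_lines and its arguments are hypothetical names, not part of the project. Unlike create_sup_list, it only looks at the line below when nothing follows "Supervisor:", so a line reading "Supervisor: unassigned" never causes the next line to be matched. It uses plain substring matching, so a short name can match inside a longer one.

```python
SUP = "Supervisor:"


def find_supervisor_lines(lines, employee):
    """Return (index, text) pairs for lines that schedule employee as Supervisor."""
    matches = []
    expect_name_below = False
    for index, line in enumerate(lines):
        text = line.strip()
        if SUP in text:
            # Everything after "Supervisor:" on the same line.
            names = text.split(SUP, 1)[1].strip()
            if employee in names:
                matches.append((index, names))
            # A blank remainder means the name may be on the next line.
            expect_name_below = not names
        else:
            if expect_name_below and employee in text:
                matches.append((index, text))
            expect_name_below = False
    return matches
```

Called with a list of lines and an Employee name, it returns the matching line indexes together with the names found on each.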

Employee Tours can be configured in several different ways:

Tour Time
Employee
Amount

Tour Time
Employee
Other Employee
Amount

Tour Time
Other Employee
Employee
Amount

Tour Time
Employee
Other Employee
Other Employee
Amount

Tour Time
Other Employee
Employee
Other Employee
Amount

Tour Time
Other Employee
Other Employee
Employee
Amount

Note: Amount can be an integer, a question mark, or empty space.
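
All six layouts share one shape: a Tour time, one to three names, then an optional Amount. A minimal sketch of parsing a single block under that assumption is shown below. The names parse_tour_block and is_amount are hypothetical, the time strings are made up for illustration, and the sketch assumes the block has already been separated from the rest of the Doc.

```python
def is_amount(text):
    """True if text looks like an Amount: digits, a question mark, or blank."""
    stripped = text.strip()
    return stripped == "" or stripped == "?" or stripped.isdigit()


def parse_tour_block(lines):
    """Split one Tour block into its time, employee names, and Amount."""
    time = lines[0].strip()
    rest = [line.strip() for line in lines[1:]]
    amount = None
    if rest and is_amount(rest[-1]):
        # A blank Amount is stored as None rather than an empty string.
        amount = rest.pop() or None
    return {"time": time, "employees": rest, "amount": amount}


def employee_in_block(block, employee):
    """True if employee is one of the names listed in the parsed block."""
    return employee in block["employees"]
```

Because the names are kept as a list, the Employee's position among the other Employees does not matter when checking whether a Tour belongs to them.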

Demo

Running the program will prompt the user to enter the Employee name to build the schedule for. After the Employee name is entered, the program will create a text file containing the Employee's schedule.
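
The final write step can be sketched as below. This is an assumption-laden illustration, not the logic in writeFile.py: write_schedule, the (date, description) entry shape, and the line layout of the output are all hypothetical, and path stands in for the PATH value from config.py.

```python
from pathlib import Path


def write_schedule(employee, entries, path):
    """Write one line per (date, description) entry to a text file at path."""
    lines = [f"Schedule for {employee}", ""]
    for date, description in entries:
        lines.append(f"{date}: {description}")
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(entries)
```

In an interactive run, the employee argument would come from the name entered at the prompt, for example via input().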

Example Output:

Note: Employee name removed for privacy. A typical output file is substantially longer; the example above shows only the first few entries.

Future Work

Rewrite using a data-parsing library
Write to a CSV file instead of text file (see here)
Improve program efficiency and accuracy of writing Employee Tours to file
Account for additional "Short Course" text under Tour time when writing Employee Tours to file
Collect and write occurrences of "No [Employee]" to file, under appropriate dates
Add GUI to configure Employee search settings
Write an error message to the file if leftover Employee instances remain in EmployeeTourData

Credits

The Google Docs API "Extract the text from a document" sample was used to recursively visit each structural element in a Doc.
