Feature: new meeting subject line mining functions #173

Merged
martinctc merged 32 commits into main from feature/subject_scan on Aug 27, 2021

@martinctc commented on Jul 13, 2021

Summary

This branch introduces several new features for text mining subject lines.

Changes

The changes made in this PR are:

  1. Added subject_scan(). (Feature request: top subject words by hour #172)
  2. Added subject_classify() (Feature request: meeting_classify() by subject line #93)
  3. Added the ability to specify ngram tokenization settings in tm_clean().
  4. Refreshed the look and feel of meeting_tm_report() and switched it to the new generate_report2() underlying implementation. Note that this merge does not include the proposed changes listed below.
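
To illustrate what n-gram tokenization means for subject lines (item 3), here is a minimal base R sketch. It is not the implementation inside tm_clean(); the helper name ngram_tokens() and the sample subjects are hypothetical and exist only for this example.

```r
# Hypothetical helper (not part of the package): split each subject line
# into word n-grams using base R only.
ngram_tokens <- function(x, n = 2) {
  # Lowercase, replace punctuation with spaces, then split on whitespace
  words <- strsplit(tolower(gsub("[[:punct:]]", " ", x)), "\\s+")
  lapply(words, function(w) {
    w <- w[nzchar(w)]                      # drop empty strings
    if (length(w) < n) return(character(0))
    # Paste each run of n consecutive words into a single token
    vapply(
      seq_len(length(w) - n + 1),
      function(i) paste(w[i:(i + n - 1)], collapse = " "),
      character(1)
    )
  })
}

# Made-up subject lines for illustration
subjects <- c("Weekly team sync", "Monthly business review: Q3")
bigrams <- ngram_tokens(subjects, n = 2)
```

With n = 1 the helper falls back to single-word tokens, which is the usual default in word-frequency analysis; larger n keeps phrases such as "business review" together.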

Next steps: proposed changes

  1. In meeting_tm_report(), have word clouds coloured by HR attribute, or time of day, or day of week
  2. Include word clustering - either by supervised subject classification or unsupervised text clustering - in meeting_tm_report()
  3. Include the ability to weight keywords in subject_scan()
  4. Have meeting_tm_report() classify pages by 'Total', 'ROB', 'Admin', 'Other', etc., and replicate tabs of text visualization for each page

Proposed Flow for meeting_tm_report()

  • Step 1: Start with all words
  • Step 2: Remove stop words -> Total Words
  • Step 3: Classify Total Words into mutually exclusive categories:
    • Ways of Working: Weekly, Monthly, Annual, Review, Catch-up, Management, Stand-up, 1:1, etc.
    • Admin: Benefits, Invoice, Travel, Flight
    • Training and Coaching: Learning, Coursera
  • Step 4: Visualize in the report with a page for total words and a page per subset
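
The proposed flow could be sketched roughly as follows in base R. This is only an illustration of the four steps, not the planned implementation: the sample subjects, the stop word list, the "Other" fallback category, and the classify_word() helper are all assumptions made for this example.

```r
# Made-up inputs; the stop words and keyword lists are illustrative assumptions
subjects <- c("Weekly team stand-up",
              "Travel invoice for the offsite",
              "Coursera learning hour")
stop_words <- c("the", "for", "a", "an", "of", "and")

categories <- list(
  "Ways of Working"       = c("weekly", "monthly", "annual", "review",
                              "catch-up", "management", "stand-up", "1:1"),
  "Admin"                 = c("benefits", "invoice", "travel", "flight"),
  "Training and Coaching" = c("learning", "coursera")
)

# Step 1: start with all words
all_words <- unlist(strsplit(tolower(subjects), "\\s+"))

# Step 2: remove stop words -> total words
total_words <- all_words[!all_words %in% stop_words]

# Step 3: the first matching category wins, so categories stay mutually
# exclusive; unmatched words fall into a hypothetical "Other" bucket
classify_word <- function(w) {
  for (category in names(categories)) {
    if (w %in% categories[[category]]) return(category)
  }
  "Other"
}
word_class <- vapply(total_words, classify_word, character(1),
                     USE.NAMES = FALSE)

# Step 4: one word set for the total page and one per subset page
pages <- c(list(Total = total_words), split(total_words, word_class))
```

Each element of pages could then feed one page of text visualizations in the report, with the Total page covering every word that survives stop word removal.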

Example

The following visual is produced with this code:

mt_data %>% subject_scan(hrvar = "Organizer_Organization")

image

Checks

  • All R CMD checks pass
  • roxygen2::roxygenise() has been run prior to merging to ensure that .Rd and NAMESPACE files are up to date.
  • NEWS.md has been updated.

Notes

This fixes #93, #172.

@martinctc added the enhancement label ("New feature or request") on Jul 13, 2021
@martinctc self-assigned this on Jul 13, 2021
@martinctc marked this pull request as ready for review on Aug 27, 2021, 15:04
@martinctc merged commit 844c8cb into main on Aug 27, 2021
@martinctc deleted the feature/subject_scan branch on Aug 27, 2021, 15:41