Feature: new meeting subject line mining functions #173

martinctc · 2021-07-13T12:32:16Z

Summary

This branch introduces several new features for text mining subject lines.

Changes

The changes made in this PR are:

- Added `subject_scan()` (feature request #172: top subject words by hour).
- Added `subject_classify()` (feature request #93: `meeting_classify()` by subject line).
- Added the ability to specify n-gram tokenization settings in `tm_clean()`.
- Refurbished the look and feel of `meeting_tm_report()`, which now uses the new `generate_report2()` implementation under the hood. Note that this merge does not include the proposed changes listed below.
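To illustrate what an n-gram setting changes, here is a minimal base R sketch. `make_ngrams()` is a hypothetical helper, not the `tm_clean()` implementation (which may rely on a tokenizer package instead). It lower-cases a subject line, splits it on non-alphanumeric characters and joins consecutive words into n-grams of length `n`.

```r
# Hypothetical helper, not part of wpa: split one subject line into
# lower-case word n-grams of length `n`.
make_ngrams <- function(text, n = 2) {
  # Split on any run of characters that are not lower-case letters or digits
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words <- words[words != ""]  # drop empty tokens from leading separators
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  # Join each window of `n` consecutive words with a single space
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

make_ngrams("Weekly Team Catch-up", n = 2)
```

With `n = 1` the same input yields single words ("weekly", "team", "catch", "up"), while `n = 2` keeps short phrases such as "catch up" together.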

Next steps: proposed changes

- In `meeting_tm_report()`, colour word clouds by HR attribute, time of day, or day of week.
- Include word clustering in `meeting_tm_report()`, either by supervised subject classification or by unsupervised text clustering.
- Add the ability to weight keywords in `subject_scan()`.
- Have `meeting_tm_report()` classify pages into 'Total', 'ROB', 'Admin', 'Other', etc., and replicate the text visualization tabs for each page.

Proposed Flow for `meeting_tm_report()`

Step 1: Start with all words.
Step 2: Remove stop words to obtain 'Total Words'.
Step 3: Classify 'Total Words' into mutually exclusive categories:
- Ways of Working: Weekly, Monthly, Annual, Review, Catch-up, Management, Stand-up, 1:1, etc.
- Admin: Benefits, Invoice, Travel, Flight
- Training and Coaching: Learning, Coursera
Step 4: Visualize in the report, with one page for total words and one page per subset.
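Step 3 could be prototyped with ordered keyword rules. The base R sketch below is an illustration under stated assumptions, not how `subject_classify()` works: the keyword lists come from the categories above, while `classify_subject()`, the first-match-wins ordering (which keeps the categories mutually exclusive) and the 'Other' fallback (borrowed from the page names proposed earlier) are hypothetical.

```r
# Hypothetical sketch, not the wpa implementation: assign each subject
# line to exactly one category using ordered keyword rules.
categories <- list(
  "Ways of Working" = c("weekly", "monthly", "annual", "review", "catch-up",
                        "management", "stand-up", "1:1"),
  "Admin" = c("benefits", "invoice", "travel", "flight"),
  "Training and Coaching" = c("learning", "coursera")
)

classify_subject <- function(subject, rules = categories) {
  subject <- tolower(subject)
  vapply(subject, function(s) {
    # Rules are checked in order; the first matching category wins,
    # so every subject line ends up in exactly one category.
    for (category in names(rules)) {
      # fixed = TRUE treats each keyword as a literal substring, not a regex
      hits <- vapply(rules[[category]], grepl, logical(1), x = s, fixed = TRUE)
      if (any(hits)) return(category)
    }
    "Other"  # fallback when no keyword matches
  }, character(1), USE.NAMES = FALSE)
}

classify_subject(c("Weekly Sync", "Flight booking", "Coursera session", "Lunch"))
```

Because `grepl(fixed = TRUE)` matches substrings, a keyword such as "review" would also match "preview"; word-boundary regular expressions would be a stricter alternative.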

Example

The following visual is produced with this code:

mt_data %>% subject_scan(hrvar = "Organizer_Organization")

Checks

- All R CMD checks pass.
- roxygen2::roxygenise() has been run prior to merging to ensure that .Rd and NAMESPACE files are up to date.
- NEWS.md has been updated.

Notes

This fixes #93 and #172.


martinctc added 7 commits July 9, 2021 11:55

feat: add subject_scan (#172)

9a4cd2d

fix: syntax error

75b3222

fix: unused variable called

8d0a8b3

feat: compute long and wide table

c973064

feat: add return options

1fbc317

feat: ability to vary ngrams + weights

1a0c1d5

using `...` to pass `n = 4`, etc.

fix: suppress anti-join messages

7517274

martinctc added the enhancement New feature or request label Jul 13, 2021

martinctc self-assigned this Jul 13, 2021

martinctc added 21 commits July 13, 2021 14:13

feat: add subject_classify (#93)

177b1f2

fix: error in examples for subject_scan

cf1943c

fix: namespace errors

5a48532

feat: add alias tm_scan

cb962fd

docs: add examples

44a89d5

format: fix plot elements

49a2710

fix: pass R CMD checks

f19de4c

feat: add data return option

9eeec72

Also adding more clarity in argument documentation.

chore: increment version

b525503

docs: update NEWS.md

c439ebe

feat: improve messages for subject_classify()

f3038cd

docs: update subject_classify() on NEWS.md

4da3ca4

docs: documentation for subject_classify()

bb402bb

fix: sizing ratios for tm_wordcloud

3561f98

docs: update example subject_scan

b5256d1

chore: format code

223f2d7

feat: refurbish meeting_tm_report

9cfb8c6

docs: update NEWS.md

15cedcc

Merge branch 'main' into feature/subject_scan

e35e685

feat: make stopwords an explicit argument

b33d3cc

docs: spell out wpa

dc20e06

martinctc added 3 commits August 27, 2021 15:39

feat: add stopwords for meeting_tm_report

e740284

docs: move example to section

9602dec

docs: update NEWS.md

8c9b382

martinctc marked this pull request as ready for review August 27, 2021 15:04

docs: update cran-comments.md

8cefe24

martinctc merged commit 844c8cb into main Aug 27, 2021

martinctc deleted the feature/subject_scan branch August 27, 2021 15:41

