29 test category mapping techniques #37

Draft
wants to merge 19 commits into
base: dev
from

Conversation

cmbrennan002
Contributor


Description

This is a draft PR, created to hold work over between sprints. So far, the code:

  • Expands the keyword search, then uses the keywords to extract all sentences that mention a particular subcategory
  • Creates matching phrases from these sentences (done manually)
  • Calculates the cosine similarity between every phrase labelled as 'benefits' in the binary-labelled dataset and the phrases for each subcategory
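
The extraction and scoring steps above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names are hypothetical, keywords are assumed to be single words, and plain term-count vectors stand in for whatever text representation the PR actually uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with_keywords(text, keywords):
    """Return the sentences containing at least one single-word keyword."""
    wanted = {k.lower() for k in keywords}
    return [s for s in split_sentences(text) if wanted & set(tokenize(s))]


def cosine_similarity(phrase_a, phrase_b):
    """Cosine similarity between the term-count vectors of two phrases."""
    va, vb = Counter(tokenize(phrase_a)), Counter(tokenize(phrase_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty phrase has no direction to compare
    return dot / (norm_a * norm_b)


def score_against_subcategories(benefit_phrases, subcategory_phrases):
    """For each subcategory, score every benefit phrase against every
    subcategory phrase. Returns {subcategory: [[score, ...], ...]} with one
    inner list per benefit phrase."""
    return {
        sub: [[cosine_similarity(b, p) for p in phrases] for b in benefit_phrases]
        for sub, phrases in subcategory_phrases.items()
    }
```

Term counts keep the sketch dependency-free; swapping in TF-IDF or embedding vectors would only change how `cosine_similarity` builds its two vectors.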

Next steps:

  • Iterate through the phrases to compute cosine similarity for the full dataset
  • Use the maximum and average scores to predict the final category
  • Increase the range of categories
  • Improve the efficiency of the score calculation
  • Switch to using the benefit classifier output (rather than the labelled input used currently)
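
One possible shape for the planned prediction step is sketched below. The PR does not say how the max and average scores will be combined, so the weighted blend, the `max_weight` parameter, and the function name are all assumptions made for illustration.

```python
from statistics import mean


def predict_category(scores_by_category, max_weight=0.5):
    """Pick the category with the highest blended score.

    scores_by_category maps a category name to the list of cosine
    similarity scores between one input phrase and that category's phrases.
    The blend of max and mean is a hypothetical choice, not the PR's method.
    """
    combined = {}
    for category, scores in scores_by_category.items():
        if not scores:
            continue  # skip categories that have no phrases yet
        combined[category] = (
            max_weight * max(scores) + (1 - max_weight) * mean(scores)
        )
    if not combined:
        return None, {}
    best = max(combined, key=combined.get)
    return best, combined
```

Setting `max_weight=1.0` ranks categories by their single best-matching phrase, while `max_weight=0.0` ranks by average similarity, so both options in the list above can be compared with the same function.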

Fixes # (issue)

Instructions for Reviewer

In order to test the code in this PR you need to ...

Please pay special attention to ...

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained this PR above
  • I have requested a code review
