This repository has been archived by the owner on May 19, 2021. It is now read-only.

syn: a package for generating synonyms and antonyms 📘 🔡 #9

Open
njtierney opened this issue Nov 8, 2018 · 6 comments

Comments

@njtierney
Collaborator

What it says on the tin!

I've had this idea for a little while, mainly to stop me from going to Google to look for synonyms. I haven't made any progress, but a stub of a package is here: https://github.com/njtierney/syn

The goal of syn is to provide two main functions:

  • syn - generate synonyms
  • ant - generate antonyms

There are other packages that do this, but they usually do this in the context of other text-related work.

In terms of applications, I would use this all the time to output a set of (syn/ant)onyms for words in the terminal, but I imagine it could also be useful for the type of text analysis where you might want to search for similar words? I have 0 experience with text analysis, so perhaps there are better tools for that already.

@ekothe

ekothe commented Nov 9, 2018

@njtierney So you're thinking of something where you'd provide a word and the package would report the syn or ant based on some pre-specified dictionary?

So ant("good") would return [1] bad [2] wicked?

@njtierney
Collaborator Author

Yup! Exactly that! I think that the trick is finding a good quality open source thesaurus that can be downloaded or provided with the package. This would mean that we avoid internet API calls so it would be fast, and not require an API key or internet.

But yes, I imagine it would be something like this:

syn("good")
[1] great fantastic excellent happy
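
A minimal sketch of the offline lookup described above, assuming the thesaurus is stored as a named list shipped with the package. The `toy_thesaurus` data and the `lookup_words()` helper are hypothetical and only for illustration; a real version would load a proper open source thesaurus instead.

```r
# Hypothetical toy thesaurus; a real package would bundle an open source one.
toy_thesaurus <- list(
  good = list(syn = c("great", "fantastic", "excellent"), ant = c("bad", "wicked")),
  fast = list(syn = c("quick", "rapid"), ant = c("slow"))
)

# Hypothetical helper: return the "syn" or "ant" entry for a word.
lookup_words <- function(word, relation, thesaurus = toy_thesaurus) {
  entry <- thesaurus[[tolower(word)]]
  if (is.null(entry)) return(character(0))  # unknown word: empty result
  entry[[relation]]
}

syn <- function(word) lookup_words(word, "syn")
ant <- function(word) lookup_words(word, "ant")

syn("good")
ant("good")
syn("hufflepuff")  # not in the toy thesaurus
```

Because everything is a local list lookup, there are no API calls, no key, and no internet needed, which is the point made above.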

@Lingtax
Contributor

Lingtax commented Nov 9, 2018 via email

@RPanczak

RPanczak commented Nov 9, 2018

Really cool idea. In the long run, that could be a useful thing for editing longer prose inside markdown perhaps?

Are you aware of any publicly available data that could be used for that? Or API?

@markdly
Contributor

markdly commented Nov 9, 2018

Nice idea. Would something like the Wiktionary Thesaurus be suitable as a data source?

A while back I had some mixed success downloading quotes for word lists from the Quotations Wiktionary. I imagine this might be similar to accessing the thesaurus information.

@markdly
Contributor

markdly commented Nov 9, 2018

FWIW, here's the old code I used for downloading quotes in case it's useful

####
# Wiktionary quotes
####
# Description: Obtain phrases from wiktionary for given words.
# References:
# https://en.wiktionary.org/wiki/Wiktionary:Quotations
# https://en.wiktionary.org/wiki/Wiktionary:Entry_layout#Example_sentences

library(httr)
library(stringr)

wiki_quote <- function(some_word) {
  # Fetch the raw wikitext of the wiktionary page for `some_word`
  response <- GET(paste0("https://en.wiktionary.org/w/index.php?title=", some_word, "&action=raw"))
  some_text <- content(response, "text")  # e.g. text content of a wiktionary page
  if (nchar(some_text) == 0) return(NA)  # empty page: nothing to search

  some_pattern <- paste0("#:[^\n]+?'''", some_word, "'''.+?\n")  # e.g. a wiktionary quote "#: There was a dark storm brewing.\n"
  raw_match <- regexpr(some_pattern, some_text)  # regexpr() finds the first match only
  if (all(raw_match == -1)) return(NA)  # no quote found

  matched_substrings <- regmatches(some_text, raw_match)
  lapply(matched_substrings, tidy_quote)
}

tidy_quote <- function(quote) {
  temp <- str_replace(quote, "\\{\\{ux\\|en\\|",  "")
  temp <- str_replace(temp, "\\}\\}", "")
  temp <- str_replace(temp, "#:", "")
  trimws(temp) 
}

wiki_quote("storm")
#> [[1]]
#> [1] "''The proposed reforms have led to a political '''storm'''.''"
wiki_quote("sunshine")
#> [[1]]
#> [1] "We were warmed by the bright '''sunshine'''."
wiki_quote("hufflepuff")  # nonsense word - should return NA
#> [1] NA

Created on 2018-11-09 by the reprex package (v0.2.0).
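
One thing to note about the code above: `regexpr()` only returns the first match, so `wiki_quote()` gives at most one quote per word. Below is a rough sketch of collecting every match with `gregexpr()` instead, run on a made-up sample string rather than a live page. The `all_quotes()` helper and `sample_text` are hypothetical and not part of the original code, and `perl = TRUE` is my own choice here so the lazy `+?` quantifiers behave predictably.

```r
# Sample wikitext written for this sketch (not fetched from wiktionary).
sample_text <- paste0(
  "# A disturbed state of the atmosphere.\n",
  "#: There was a dark '''storm''' brewing.\n",
  "# A violent commotion.\n",
  "#: The reforms led to a political '''storm'''.\n"
)

# Hypothetical helper: return every quote line that contains `some_word`.
all_quotes <- function(some_text, some_word) {
  some_pattern <- paste0("#:[^\n]+?'''", some_word, "'''.+?\n")
  # gregexpr() returns all matches; regmatches() gives one vector per input string
  matches <- regmatches(some_text, gregexpr(some_pattern, some_text, perl = TRUE))[[1]]
  trimws(sub("^#:", "", matches))  # drop the "#:" marker and surrounding whitespace
}

all_quotes(sample_text, "storm")
all_quotes(sample_text, "hufflepuff")  # no matches: zero-length character vector
```

Something similar might work for pulling synonym lists out of the thesaurus pages, but I haven't checked how those pages are marked up.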

@njtierney njtierney changed the title syn: a package for generating synonyms and antonyms syn: a package for generating synonyms and antonyms 📘 🔡 Nov 19, 2018