Awlify

A very basic tool that takes a sentence of text and returns the same text, annotated with which of its words appear in the Academic Word List (AWL).

installing

pip install awlify

If you haven't used spaCy on your system before, you'll also need to install the model used here with the command below:

python -m spacy download en_core_web_sm

tests

python -m unittest

usage inside a file

from awlify import awlify

result = awlify('please inform me of the academic words in this sentence')

print(result)
{"data": {"sentence": "please inform me of the academic words in this sentence", "awl_words": [{"index": 5, "word": "academic", "meta": {"head": "academy", "sublist": 5}}]}}

usage from the command line

python -m awlify 'this is a sentence to check'

{"data": {"sentence": "this is a sentence to check", "awl_words": []}}

expected input / output

format for output:

{
  "data": {
    "sentence": "THIS IS THE ORIGINAL SENTENCE",
    "awl_words": [
      {
        "index": INDEX_OF_AWL_WORD_FOUND,
        "word": "AWL_WORD_FOUND",
        "meta": {
          "head": "THE_HEADWORD_FROM_THE_AWL",
          "sublist": THE_AWL_SUBLIST_OF_THE_WORD
        }
      }
    ]
  }
}
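
As a rough illustration of the format above, here is a small shape check for a parsed payload. The function name looks_like_awlify_output is hypothetical and not part of awlify, and the types (integer index and sublist, string word and head) are inferred from the examples in this README rather than from a formal schema.

```python
def looks_like_awlify_output(payload):
    """Return True if a parsed payload has the keys and types described above.

    Hypothetical helper, not part of awlify. Types are inferred from the
    README examples.
    """
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return False
    if not isinstance(data.get("sentence"), str):
        return False
    words = data.get("awl_words")
    if not isinstance(words, list):
        return False
    for entry in words:
        if not isinstance(entry, dict):
            return False
        meta = entry.get("meta")
        if not (
            isinstance(entry.get("index"), int)
            and isinstance(entry.get("word"), str)
            and isinstance(meta, dict)
            and isinstance(meta.get("head"), str)
            and isinstance(meta.get("sublist"), int)
        ):
            return False
    return True


example = {"data": {"sentence": "this is a sentence", "awl_words": []}}
print(looks_like_awlify_output(example))
```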

example input for a simple sentence (no AWL words):

simple_sentence = awlify('this is a sentence')

example output for a simple sentence (no AWL words):

{
  "data": {
    "sentence": "this is a sentence",
    "awl_words": []
  }
}

example input for a complex sentence (a few AWL words):

complex_sentence = awlify('the economic recovery is ongoing and potentially problematic')

example output for a complex sentence (a few AWL words):

{
  "data": {
    "sentence": "the economic recovery is ongoing and potentially problematic",
    "awl_words": [
      {
        "index": 1,
        "word": "economic",
        "meta": {
          "head": "economy",
          "sublist": 1
        }
      },
      {
        "index": 2,
        "word": "recovery",
        "meta": {
          "head": "recover",
          "sublist": 6
        }
      },
      {
        "index": 6,
        "word": "potentially",
        "meta": {
          "head": "potential",
          "sublist": 2
        }
      }
    ]
  }
}
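
One thing the index field makes possible is marking the AWL words back in the sentence. The sketch below does that for the complex-sentence output above. It assumes that index is a zero-based token position (which matches the examples) and that splitting on whitespace lines up with the tokenizer, which should hold for simple sentences without punctuation. mark_awl_words is a hypothetical helper, not part of awlify.

```python
import json

# Output copied from the complex-sentence example above.
raw = (
    '{"data": {"sentence": "the economic recovery is ongoing and potentially problematic", '
    '"awl_words": ['
    '{"index": 1, "word": "economic", "meta": {"head": "economy", "sublist": 1}}, '
    '{"index": 2, "word": "recovery", "meta": {"head": "recover", "sublist": 6}}, '
    '{"index": 6, "word": "potentially", "meta": {"head": "potential", "sublist": 2}}]}}'
)


def mark_awl_words(raw_json):
    """Return the sentence with each AWL word shown as [word:sublist]."""
    data = json.loads(raw_json)["data"]
    tokens = data["sentence"].split()
    for entry in data["awl_words"]:
        i = entry["index"]
        # Only mark the token if the whitespace split agrees with the index.
        if i < len(tokens) and tokens[i] == entry["word"]:
            tokens[i] = "[{}:{}]".format(entry["word"], entry["meta"]["sublist"])
    return " ".join(tokens)


marked = mark_awl_words(raw)
print(marked)
# the [economic:1] [recovery:6] is ongoing and [potentially:2] problematic
```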

NOTES

The current implementation tokenizes the sentence with spaCy, which makes it a bit heavier than absolutely necessary, since none of the package's more advanced features are used.

In theory, a simple regex could probably perform about 98% as well, so I might add that as an option in the future if no real use case turns up that needs the full weight of spaCy.
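
To give an idea of what the regex route could look like, here is a rough sketch. It is not the current implementation: awlify_with_regex and SAMPLE_AWL are hypothetical names, the lookup table holds only the four words that appear in the examples in this README (the real list is far larger), and the word pattern is just one plausible choice.

```python
import json
import re

# Tiny illustrative lookup built only from the examples in this README.
# Hypothetical stand-in for the full Academic Word List data.
SAMPLE_AWL = {
    "academic": {"head": "academy", "sublist": 5},
    "economic": {"head": "economy", "sublist": 1},
    "recovery": {"head": "recover", "sublist": 6},
    "potentially": {"head": "potential", "sublist": 2},
}

# Words are runs of letters, optionally followed by an apostrophe part ("don't").
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def awlify_with_regex(sentence, awl=SAMPLE_AWL):
    """Tokenize with a regex and return JSON in the documented output format."""
    tokens = WORD_PATTERN.findall(sentence)
    awl_words = []
    for index, token in enumerate(tokens):
        meta = awl.get(token.lower())
        if meta is not None:
            awl_words.append({"index": index, "word": token, "meta": meta})
    return json.dumps({"data": {"sentence": sentence, "awl_words": awl_words}})


output = awlify_with_regex("the economic recovery is ongoing and potentially problematic")
print(output)
```

One difference to keep in mind: this pattern skips punctuation entirely, whereas spaCy treats punctuation marks as tokens of their own, so the index values could differ between the two approaches for sentences that contain punctuation.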

REFERENCES

Coxhead, Averil (2000) A New Academic Word List. TESOL Quarterly, 34(2): 213-238.

