whAnalysis is a project that tags sentences with a clauseType and a questType so that the frequencies of those tags can be analyzed.
The linguistic work which brought this project to life was presented at XPRAG on June 13th, 2019.
Read more about the motivation behind this project here.
To use our tagger, we recommend installing it with git clone:
git clone https://github.com/rangat/whAnalysis.git
To run the tagger, use the command below. It writes the tagged JSON to a new file in your current directory.
python tagger.py relative/dir/to/data.json
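Since the point of tagging is to analyze frequencies, here is a minimal sketch of tallying tag counts from the tagger's output. It assumes, without confirmation from this README, that each tagged object stores its tags as string values under "clauseType" and "questType" keys; the function name tag_frequencies is hypothetical and not part of whAnalysis.

```python
import json
from collections import Counter


def tag_frequencies(tagged_path, keys=("clauseType", "questType")):
    """Count how often each tag value occurs in a tagged JSON file.

    Hypothetical helper. ASSUMPTION: every tagged object stores its tags as
    strings under the keys listed in `keys`. Objects that lack a key are
    simply skipped when counting that key.
    """
    with open(tagged_path, encoding="utf-8") as infile:
        records = json.load(infile)

    # One Counter per tag key, e.g. counts["clauseType"]["<some tag>"] -> int
    counts = {key: Counter() for key in keys}
    for record in records:
        for key in keys:
            if key in record:
                counts[key][record[key]] += 1
    return counts
```

For example, passing the path of the file the tagger wrote and calling .most_common() on counts["clauseType"] lists the clause types from most to least frequent.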
The corpus_handlers/ directory includes a few simple functions for converting corpora to our data format. The BNC handler is the best example.
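If your corpus is not covered by an existing handler, a converter can be quite small. The sketch below assumes a plain-text corpus with one sentence per line and writes the list-of-objects format with a "sentence" key that the tagger expects. The function text_to_whanalysis_json is hypothetical and is not one of the handlers in corpus_handlers/.

```python
import json


def text_to_whanalysis_json(text_path, json_path):
    """Convert a one-sentence-per-line text file into a JSON list of
    {"sentence": ...} objects for the tagger.

    Hypothetical helper, not part of corpus_handlers/. ASSUMPTION: the input
    already has exactly one sentence per line (no sentence splitting is done).
    Returns the number of sentences written.
    """
    sentences = []
    with open(text_path, encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if line:  # skip blank lines
                sentences.append({"sentence": line})

    with open(json_path, "w", encoding="utf-8") as outfile:
        json.dump(sentences, outfile, ensure_ascii=False, indent=2)
    return len(sentences)
```

The resulting file can then be passed directly to tagger.py.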
The data must be in a .json file containing a single JSON list with one object per sentence. Each object must have a "sentence" key whose value is the sentence text. The tagger will not work unless the objects are wrapped in a list, as in this example:
[
{
"sentence": "Why did I need to include this json as a part of my readme?"
},
{
"sentence": "It's helpful to have examples to follow!"
}
]
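Before running the tagger on a large file, it can help to check that the file matches the format above. The following is a minimal sketch of such a check; load_sentences is a hypothetical name and this is not the tagger's own validation logic. It avoids f-strings so that it also runs on Python 3.5.

```python
import json


def load_sentences(path):
    """Check that `path` holds a JSON list of objects, each with a string
    "sentence" key, and return the sentence strings in order.

    Hypothetical helper, not part of tagger.py. Raises ValueError if the
    file does not match the expected format.
    """
    with open(path, encoding="utf-8") as infile:
        data = json.load(infile)

    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list")

    sentences = []
    for index, item in enumerate(data):
        if not isinstance(item, dict) or not isinstance(item.get("sentence"), str):
            raise ValueError(
                "item {} must be an object with a string 'sentence' key".format(index)
            )
        sentences.append(item["sentence"])
    return sentences
```

Running it on the example above would return the two sentence strings, while a file whose top level is a single object instead of a list raises a ValueError.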
This project is designed and developed entirely in Python, using:
- Python - version 3.5 or greater
- NLTK - The Natural Language Toolkit
- BeautifulSoup4 - Python library for scraping and navigating HTML/XML parse trees
- JSON - Python's built-in json module
- Multiprocessing - Python's built-in multiprocessing module
- Rangaraj Tirumala - Current Work
- Morgan Moyer - Linguistic Guidance
- Divya Appasamy - Current Work
- Knyckolas Sutherland - Initial work
GNU General Public License v3.0
See COPYING for the full text.