In a nutshell: The slangID project tries to detect slang phrases. Something literally no one asked for...
slangID consists of two programs:
- slangID_demo.py lets you train a selection of classifiers, and prints out a test set of phrases with their predicted types (slang or normal).
- slangID_predict.py also lets you train a selection of classifiers, then predicts the type of a phrase you enter.
All the models are pre-trained.
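To give a rough idea of what "train a selection of classifiers and predict the type of a phrase" can look like with scikit-learn, here is a minimal sketch. It is not the code from slangID_demo.py or slangID_predict.py: the toy phrases, the two classifiers, and the helper names `train_all` and `predict_all` are all made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy data for illustration only, NOT the project's dataset.
phrases = [
    "that party was lit",
    "no cap",
    "he is a real one",
    "the meeting starts at noon",
    "please close the door",
    "i bought some bread",
]
labels = ["slang", "slang", "slang", "normal", "normal", "normal"]

# Hypothetical selection of classifiers; the project may use others.
classifiers = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}


def train_all(phrases, labels):
    """Fit one bag-of-words pipeline per classifier."""
    models = {}
    for name, clf in classifiers.items():
        model = make_pipeline(CountVectorizer(), clf)
        model.fit(phrases, labels)
        models[name] = model
    return models


def predict_all(models, phrase):
    """Return each model's predicted type for a single phrase."""
    return {name: str(model.predict([phrase])[0]) for name, model in models.items()}


models = train_all(phrases, labels)
predictions = predict_all(models, "no cap")
print(predictions)
```

With only six training phrases the predictions mean very little; the point is just the shape of the train-then-predict loop.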
Due to a lack of data, the results are not good enough right now, regardless of the classifier used. Two-word slang phrases such as "(a) real one" are harder to detect because the provided models do not take n-grams into consideration.
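To illustrate the n-gram limitation, the sketch below compares a unigram-only vocabulary with one that also includes bigrams, using scikit-learn's `CountVectorizer` and its `ngram_range` parameter. The example phrases are made up, and this is only one possible way to add bigram features, not something the project currently does.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up example phrases, not taken from the project's data.
phrases = ["he is a real one", "this is the real price", "only one left"]

# Unigrams only: "real" and "one" are separate, unrelated features.
unigrams = CountVectorizer(ngram_range=(1, 1)).fit(phrases)

# Unigrams plus bigrams: "real one" becomes a feature of its own.
with_bigrams = CountVectorizer(ngram_range=(1, 2)).fit(phrases)

unigram_features = set(unigrams.vocabulary_)
bigram_features = set(with_bigrams.vocabulary_)

print("real one" in unigram_features)
print("real one" in bigram_features)
```

With unigrams alone, a classifier only sees "real" and "one", both of which also occur in perfectly normal phrases. Adding bigrams gives it a dedicated "real one" feature, at the cost of a larger vocabulary and an even greater need for training data.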
- Install Python 3.9 (3.8 and 3.10 are probably fine too; I used 3.9.12).
- Install the required packages by running
pip install -r requirements.txt
in your shell of choice. Make sure you are in the project directory.
- And then run
python slangID_demo.py
or
python slangID_predict.py
- Follow the displayed instructions.
Most of the phrases come from archive.org's Twitter Stream of June 6th, some come from me personally.
- scikit-learn
- pandas