
Slang Identification via Machine Learning using sklearn.


m4cit/slangID


Introducing slangID

In a nutshell: The slangID project tries to detect slang phrases. Something literally no one asked for...

slangID consists of two programs:

  1. slangID_demo.py lets you train a selection of classifiers and prints a test set of phrases with their predicted types (slang or normal).
  2. slangID_predict.py also lets you train a selection of classifiers, and predicts the type of a phrase you enter.

All the models are pre-trained.

Challenges

Due to a lack of data, the results are not good enough right now, regardless of the classifier used. Certain two-word slang expressions (bigrams) such as "(a) real one" are harder to resolve because the provided models do not take n-grams into consideration.
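To illustrate the n-gram issue, here is a minimal pure-Python sketch of what unigram-plus-bigram features could look like. The function name `extract_ngrams` and its parameters are hypothetical and not part of slangID; the project's own feature extraction may differ. With scikit-learn, a similar effect is usually achieved through the `ngram_range` parameter of `CountVectorizer`.

```python
def extract_ngrams(phrase, n_min=1, n_max=2):
    """Return all word n-grams of a phrase for n_min <= n <= n_max.

    Hypothetical helper for illustration only; not part of slangID.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokens = phrase.lower().split()
    ngrams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the phrase.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


features = extract_ngrams("He is a real one")
print(features)
```

With unigrams only, "real" and "one" are seen as two unrelated, ordinary words. Once bigrams are included, "real one" becomes a feature of its own that a classifier could learn to associate with slang, given enough training examples.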

How to run slangID_demo and slangID_predict

  1. Install Python 3.9 (3.8 and 3.10 are probably fine too; I used 3.9.12).
  2. Make sure you are in the project directory, then install the required packages by running pip install -r requirements.txt in your shell of choice.
  3. Run python slangID_demo.py or python slangID_predict.py.
  4. Follow the displayed instructions.

What you will be greeted with when you run slangID_demo

[Screenshot: demo]

What you will be greeted with when you run slangID_predict

[Screenshot: predict]

Source of the data

Most of the phrases come from archive.org's Twitter Stream of June 6th; some come from me personally.

Recognition of Open Source use

  • scikit-learn
  • pandas
