# DEMO ToxicityFilter

---

## ToxicityFilter

The ToxicityFilter submodule has **two components**:
1. **ToxicityFilter**: A two-tiered approach that uses *word lists* as a quick, rule-based first stage and a **more sophisticated approach** based on the [**detoxify**](https://github.com/unitaryai/detoxify) module as a second stage.
2. **SpanDetector**: Returns the spans of toxic parts in a text. Can also **sanitize, tag and highlight** toxic spans.
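
The two-tiered idea can be illustrated with a minimal sketch. This is not the module's actual implementation: `two_stage_filter`, `demo_words` and `fake_score` are hypothetical names, and the scoring function is a stand-in for a real model such as detoxify. The point is only that the cheap word list check runs first and the slower scorer is called only when the word list finds nothing.

```python
import re
from typing import Callable, Optional, Set


def two_stage_filter(
    text: str,
    word_list: Set[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return 1 if either stage flags the text, otherwise None."""
    # Stage 1: cheap rule-based lookup against the word list.
    tokens = re.findall(r"[a-z']+", text.lower())
    if any(token in word_list for token in tokens):
        return 1
    # Stage 2: only reached when stage 1 found nothing; calls the slower scorer.
    if score_fn(text) > threshold:
        return 1
    return None


# Hypothetical stand-ins, for illustration only.
demo_words = {"stupid"}
calls = []


def fake_score(text: str) -> float:
    """Pretend model: records each call and returns a made-up score."""
    calls.append(text)
    return 0.9 if "only good for" in text else 0.0


print(two_stage_filter("This is stupid!", demo_words, fake_score))
print(two_stage_filter("Women are only good for cleaning the house!", demo_words, fake_score))
print(two_stage_filter("This is great!", demo_words, fake_score))
print(f"Scorer was called {len(calls)} times")
```

Because the first text is caught by the word list, the scorer is never invoked for it, which is where the speed advantage of the first stage comes from.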

The first import of ToxicityFilter might take some time.

In [2]:
from TextProcessingModule.toxicity_filter import ToxicityFilter

In [4]:
# When initializing ToxicityFilter() for the first time, a model is downloaded by detoxify
# which might take a little while as well.
toxicity_filter = ToxicityFilter()

In [5]:
# This will trigger the first filter stage.
print(toxicity_filter.apply("This is stupid!"))
print(toxicity_filter.apply("This is stupid!", verbose=True))

1
(1, {'This is stupid!': 'stupid'})


In [6]:
# This will trigger the second filter stage
print(toxicity_filter.apply("Women are only good for cleaning the house!"))
print(toxicity_filter.apply("Women are only good for cleaning the house!", verbose=True))

1
(1, {'Detoxify: Toxicity above threshold!': {'Toxicity': 0.64452136, 'Threshold': 0.5}})


In [7]:
# You can also adjust the threshold of the toxicity filter (default: 0.5).
# Tuning it for your specific use case is strongly encouraged.
threshold = 0.1
print(toxicity_filter.apply("Sheep are only good for cleaning the house!", threshold))
print(toxicity_filter.apply("Sheep are only good for cleaning the house!", threshold, verbose=True))

1
(1, {'Detoxify: Toxicity above threshold!': {'Toxicity': 0.23198009, 'Threshold': 0.1}})
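
One possible way to tune the threshold is to sweep a few candidate values over a small labelled sample and count the errors at each one. The sketch below assumes you already have toxicity scores paired with your own labels; the sample values and the helper `sweep_thresholds` are hypothetical and not part of the module.

```python
from typing import List, Sequence, Tuple


def sweep_thresholds(
    scored: Sequence[Tuple[float, bool]],
    thresholds: Sequence[float],
) -> List[Tuple[float, int, int]]:
    """For each threshold, return (threshold, false_positives, false_negatives)."""
    rows = []
    for t in thresholds:
        # Flagged (score > t) although labelled non-toxic.
        fp = sum(1 for score, is_toxic in scored if score > t and not is_toxic)
        # Not flagged (score <= t) although labelled toxic.
        fn = sum(1 for score, is_toxic in scored if score <= t and is_toxic)
        rows.append((t, fp, fn))
    return rows


# Hypothetical labelled sample: (toxicity score, human label "is toxic").
sample = [(0.95, True), (0.64, True), (0.23, False), (0.08, True), (0.02, False)]

for threshold, fp, fn in sweep_thresholds(sample, [0.05, 0.1, 0.5, 0.9]):
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

A lower threshold catches more toxic texts at the cost of more false alarms, so the right value depends on which kind of error is more costly in your application.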


In [8]:
# This will *NOT* trigger any filter stage (which is a bit unfortunate imho)
print(toxicity_filter.apply("Men are only good for cleaning the house!"))
print(toxicity_filter.apply("Men are only good for cleaning the house!", verbose=True))

None
None


In [9]:
# However, if you want to know how much toxicity was detected
# even though it was below the threshold you set, you can
# query it like so:
print(toxicity_filter.detoxify_filter.get_toxicity())  # Returns only the toxicity value

# Or in even more detail:
print(toxicity_filter.detoxify_filter.get_scores())  # Additionally returns the likelihood for different toxic categories

# Or in a beautified form as pandas data frame:
toxicity_filter.detoxify_filter.get_scores(as_dataframe=True)

{'Men are only good for cleaning the house!': {'toxicity': 0.018119397}}
{'Men are only good for cleaning the house!': {'toxicity': 0.018119397, 'severe_toxicity': 0.00012027769, 'obscene': 0.00041868672, 'threat': 0.00018928485, 'insult': 0.00071454945, 'identity_attack': 0.0004367168}}


| | toxicity | severe_toxicity | obscene | threat | insult | identity_attack |
|---|---|---|---|---|---|---|
| Men are only good for cleaning the house! | 0.01812 | 0.00012 | 0.00042 | 0.00019 | 0.00071 | 0.00044 |


## Separate usage of filter stages
You can also use each filter stage on its own.

### Word List Filter

In [11]:
from TextProcessingModule.toxicity_filter import WordListFilter

In [12]:
demo_text = "This is stupid!"
word_list_filter = WordListFilter()  # You can also pass the path to your own word list.
print(word_list_filter.apply(demo_text))  # Returns 1 when the filter is triggered and None otherwise.
print(word_list_filter.apply(demo_text, verbose=True))

1
(1, {'This is stupid!': 'stupid'})
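
To make the idea of a custom word list concrete, here is a rough sketch of how such a list could be loaded and matched. It assumes a plain text file with one term per line, which may differ from the format `WordListFilter` actually expects; `load_word_list` and `find_toxic_token` are hypothetical helpers, not part of the module.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional, Set


def load_word_list(path) -> Set[str]:
    """Read one term per line, ignoring blank lines and case."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def find_toxic_token(text: str, words: Set[str]) -> Optional[str]:
    """Return the first token found in the word list, or None."""
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in words:
            return token
    return None


# Write a tiny hypothetical word list to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_words.txt"
    path.write_text("stupid\nidiot\n", encoding="utf-8")
    words = load_word_list(path)

print(find_toxic_token("This is stupid!", words))
print(find_toxic_token("This is great!", words))
```

Exact token matching like this is fast but easy to evade, for example with masked spellings, which is one reason a model-based second stage is useful.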


### Detoxify Filter
When initialized for the first time, detoxify **downloads a model from Hugging Face**, which might take a little while; later runs reuse the downloaded file automatically. You can also download the model separately and pass its path as the `model` argument.

You can load **different models** to use for this filter (see: https://github.com/unitaryai/detoxify).
Default: `model='original'`

In [13]:
from TextProcessingModule.toxicity_filter import DetoxifyFilter

In [15]:
detoxify_filter = DetoxifyFilter()

In [16]:
print(detoxify_filter.apply("This is stupid!"))
print(detoxify_filter.apply("This is stupid!", verbose=True))

# Returns 1 if toxicity is greater than `threshold`, else None (toxicity <= `threshold`)

1
(1, {'Detoxify: Toxicity above threshold!': {'Toxicity': 0.9581592, 'Threshold': 0.5}})


After applying the filter you can also view **toxicity score** and the scores of **different toxicity categories** like so:

In [17]:
# Show toxicity score
detoxify_filter.get_toxicity()

{'This is stupid!': {'toxicity': 0.9581592}}

In [18]:
# Show scores for different toxicity categories
detoxify_filter.get_scores()

{'This is stupid!': {'toxicity': 0.9581592,
  'severe_toxicity': 0.0062750294,
  'obscene': 0.3832781,
  'threat': 0.0018581518,
  'insult': 0.11622658,
  'identity_attack': 0.0015853597}}
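
Given a result of the shape shown above, it can be handy to see which sub-category contributes most to a flagged text. The following sketch works on a plain dictionary copied from the output above; `dominant_category` is a hypothetical helper, not a method of `DetoxifyFilter`.

```python
from typing import Dict, Tuple

# Scores copied from the output above.
scores = {
    "This is stupid!": {
        "toxicity": 0.9581592,
        "severe_toxicity": 0.0062750294,
        "obscene": 0.3832781,
        "threat": 0.0018581518,
        "insult": 0.11622658,
        "identity_attack": 0.0015853597,
    }
}


def dominant_category(category_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring sub-category, ignoring the overall 'toxicity'."""
    sub = {name: value for name, value in category_scores.items() if name != "toxicity"}
    name = max(sub, key=sub.get)
    return name, sub[name]


for text, category_scores in scores.items():
    name, value = dominant_category(category_scores)
    print(f"{text!r}: {name} ({value:.5f})")
```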

In [19]:
# Show scores as data frame
detoxify_filter.get_scores(as_dataframe=True)

| | toxicity | severe_toxicity | obscene | threat | insult | identity_attack |
|---|---|---|---|---|---|---|
| This is stupid! | 0.95816 | 0.00628 | 0.38328 | 0.00186 | 0.11623 | 0.00159 |


### Comparison of both filter stages
For this test, we keep track of the results of both filters applied to several small texts using variables `results1` and `results2`.

To retain more detailed low-level results, we initialize both filters with `keep_results=True`, which records the detected tokens and toxicity scores. This can also be set on an existing instance via `set_keep_result(True)`.

> [!NOTE]
> Note that for `text3` the **word list filter triggers** while the **detoxify filter does not**, which might change when using a different model for detoxify.

> [!NOTE]
> Also note that for `text6` the **detoxify filter triggers**, although the text is more ironic than toxic.

> [!NOTE]
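> Also note that the **word list filter is much faster** than the detoxify filter (roughly 80 ms versus 490 ms wall time for the seven texts in the run below).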

In [21]:
from TextProcessingModule.toxicity_filter import WordListFilter, DetoxifyFilter

word_list_filter = WordListFilter(keep_results=True)
detoxify_filter = DetoxifyFilter(keep_results=True)

In [24]:
# Examples:
text1 = "This is shit!"
text2 = "This is great!"
text3 = "I wish for dead people in the future."
text4 = "I wish for dead people in the future and a lot of blood!"
text5 = "What you say is total bull s***."
text6 = "Your mother is a big pile of feathers!"
text7 = "I like a big pile of feathers!"

texts = [text1, text2, text3, text4, text5, text6, text7]

# Running the texts through the first filter:
%time results1 = [word_list_filter.apply(text) for text in texts]

# Running the texts through the second filter:
%time results2 = [detoxify_filter.apply(text) for text in texts]

print("")
print("Results:")
print(f" * WordListFilter: {results1}")
print(f" * DetoxifyFilter: {results2}")
# The detoxify filter returns 1 (toxic) or None (non-toxic).
# Further insight into the classification is demonstrated below.
# You can also specify 

CPU times: user 14 ms, sys: 4.8 ms, total: 18.8 ms
Wall time: 79.5 ms
CPU times: user 2.35 s, sys: 428 ms, total: 2.78 s
Wall time: 492 ms

Results:
 * WordListFilter: [1, None, 1, 1, None, None, None]
 * DetoxifyFilter: [1, None, None, 1, 1, 1, None]
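
To see at a glance where the two stages disagree, the two result lists can be compared element by element. The sketch below uses the texts and results printed above as plain data; `disagreements` is a hypothetical helper and does not call either filter.

```python
from typing import List, Optional, Sequence, Tuple

texts = [
    "This is shit!",
    "This is great!",
    "I wish for dead people in the future.",
    "I wish for dead people in the future and a lot of blood!",
    "What you say is total bull s***.",
    "Your mother is a big pile of feathers!",
    "I like a big pile of feathers!",
]
# Results copied from the output above.
results1 = [1, None, 1, 1, None, None, None]  # WordListFilter
results2 = [1, None, None, 1, 1, 1, None]     # DetoxifyFilter


def disagreements(
    texts: Sequence[str],
    first: Sequence[Optional[int]],
    second: Sequence[Optional[int]],
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (text, first_result, second_result) wherever the filters differ."""
    return [(t, a, b) for t, a, b in zip(texts, first, second) if a != b]


for text, word_list_result, detoxify_result in disagreements(texts, results1, results2):
    print(f"{text!r}: word list={word_list_result}, detoxify={detoxify_result}")
```

The texts listed this way are the ones worth inspecting by hand when deciding on a word list or threshold.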


**Since we set up the filters to keep the results, we can access them afterwards:**

In [25]:
word_list_filter.get_results()

{'This is shit!': {'toxic_token': 'shit'},
 'This is great!': {'toxic_token': None},
 'I wish for dead people in the future.': {'toxic_token': 'dead'},
 'I wish for dead people in the future and a lot of blood!': {'toxic_token': 'dead'},
 'What you say is total bull s***.': {'toxic_token': None},
 'Your mother is a big pile of feathers!': {'toxic_token': None},
 'I like a big pile of feathers!': {'toxic_token': None}}

In [26]:
detoxify_filter.get_toxicity(as_dataframe=True)

| | toxicity |
|---|---|
| This is shit! | 0.97914 |
| This is great! | 0.00073 |
| I wish for dead people in the future. | 0.0988 |
| I wish for dead people in the future and a lot of blood! | 0.72205 |
| What you say is total bull s***. | 0.92656 |
| Your mother is a big pile of feathers! | 0.91677 |
| I like a big pile of feathers! | 0.00215 |


In [27]:
detoxify_filter.get_scores(as_dataframe=True)

| | toxicity | severe_toxicity | obscene | threat | insult | identity_attack |
|---|---|---|---|---|---|---|
| This is shit! | 0.97914 | 0.0447 | 0.92385 | 0.00223 | 0.11734 | 0.00252 |
| This is great! | 0.00073 | 0.00012 | 0.0002 | 0.00011 | 0.00018 | 0.00014 |
| I wish for dead people in the future. | 0.0988 | 0.00128 | 0.00168 | 0.03233 | 0.00238 | 0.00569 |
| I wish for dead people in the future and a lot of blood! | 0.72205 | 0.01523 | 0.01691 | 0.45062 | 0.0216 | 0.04721 |
| What you say is total bull s***. | 0.92656 | 0.01389 | 0.78472 | 0.00122 | 0.25074 | 0.00429 |
| Your mother is a big pile of feathers! | 0.91677 | 0.0041 | 0.1361 | 0.00082 | 0.7478 | 0.00612 |
| I like a big pile of feathers! | 0.00215 | 0.0001 | 0.0002 | 0.00012 | 0.00022 | 0.00019 |
