Hi dear reviewer! Because of the size of the model (BERT weighs around 1 GB), we are using this notebook as a web application to present it.
* **All you have to do is run the entire notebook (click on "run everything")** [setup takes about 15 s]
* **A link should appear at the bottom of the notebook** (we are using [Dash](https://github.com/plotly/jupyter-dash) to make this possible 😉)
* **Click on it and ENJOY OUR WORK!**

# FIGHT GBV Web application

  ![](https://unwomen.org.au/wp-content/uploads/2020/11/Orange-the-world-banner.jpg)

In [None]:
!pip install ktrain

Collecting ktrain
  Downloading https://files.pythonhosted.org/packages/99/67/31cab9d7c0e23333aebc28b082659c1528f9ab7e22d00e7237efe4fc14f6/ktrain-0.26.2.tar.gz (25.3MB)
Collecting scikit-learn==0.23.2
  Downloading https://files.pythonhosted.org/packages/f4/cb/64623369f348e9bfb29ff898a57ac7c91ed4921f228e9726546614d63ccb/scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8MB)
Collecting langdetect
  Downloading https://files.pythonhosted.org/packages/56/a3/8407c1e62d5980188b4acc45ef3d94b933d14a2ebc9ef3505f22cf772570/langdetect-1.0.8.tar.gz (981kB)
Collecting cchardet
  Downloading https://files.pythonhosted.org/packages/80/72/a4fba7559978de00cf44081c548c5d294bf00ac7dcda2db405d2baa8c67a/cchardet-2.1.7-cp37-cp37m-manylinux2010_x86_64.whl (263kB)

In [None]:
import ktrain

In [None]:
!pip install jupyter-dash

Collecting jupyter-dash
  Downloading https://files.pythonhosted.org/packages/46/21/d3893ad0b7a7061115938d6c38f5862522d45c4199fb7e8fde0765781e13/jupyter_dash-0.4.0-py3-none-any.whl
Collecting dash
  Downloading https://files.pythonhosted.org/packages/bc/b4/0bd5c94fdcb0eccb93c3c8068fe10f5607e542337d0b8f6e2d88078316a9/dash-1.19.0.tar.gz (75kB)
Collecting ansi2html
  Downloading https://files.pythonhosted.org/packages/c6/85/3a46be84afbb16b392a138cd396117f438c7b2e91d8dc327621d1ae1b5dc/ansi2html-1.6.0-py3-none-any.whl
Collecting flask-compress
  Downloading https://files.pythonhosted.org/packages/c6/d5/69b13600230d24310b98a52da561113fc01a5c17acf77152761eef3e50f1/Flask_Compress-1.9.0-py3-none-any.whl
Collecting dash_renderer==1.9.0
  Downloading https://files.pythonhosted.org/packages/be/a6/dd1edfe7b1102274e93991736c35b2a5e1a63b524c8d9f41bbb30f17340b/dash_renderer-1.9.0.tar.gz (1.0MB)

## Downloading the model (way too heavy [999M] to be downloaded locally)

In [None]:
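import re

def remove_users(x):
  """Replace every @mention with the generic token ' @user'."""
  return re.sub(r"@\w+",' @user',x)

def remove_all_nonalphanumeric(text):
  """Collapse every run of characters outside the allowed set into one space.

  Note: `?-ö` inside the class is a range (U+003F to U+00F6), so `@` and
  the other characters in that range are kept as well.
  """
  return re.sub(r"[^a-z'À-ÖØ!?-öø-ÿ,]+", ' ', text)

def remove_hashtag(x):
  """Delete hashtags such as #word."""
  return re.sub(r"#\w+","",x)

def remove_hyperlinks(text):
  """Delete http(s) links and bare www links."""
  text=re.sub(r'http\S+', '', text, flags=re.MULTILINE)
  return re.sub(r'www\S+', '', text, flags=re.MULTILINE)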

In [None]:
def preprocess(s):
  """Lowercase the text, then strip hashtags, mentions, links and stray symbols."""
  s=s.lower()
  s=remove_hashtag(s)
  s=remove_users(s)
  s=remove_hyperlinks(s)
  s=remove_all_nonalphanumeric(s)
  return s
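
To make the effect of these steps concrete, the sketch below restates the same regular expressions compactly and runs one made-up sentence through them. The sample sentence and the name `preprocess_demo` are illustrative assumptions, not part of the notebook. Note that `@user` survives the final cleanup because the character class contains the range `?-ö`, which includes `@`.

```python
import re

def preprocess_demo(s):
    """Compact restatement of the notebook's preprocessing steps (illustrative)."""
    s = s.lower()
    s = re.sub(r"#\w+", "", s)        # drop hashtags
    s = re.sub(r"@\w+", " @user", s)  # replace mentions with a generic token
    s = re.sub(r"http\S+", "", s)     # drop http(s) links
    s = re.sub(r"www\S+", "", s)      # drop bare www links
    # Same character class as remove_all_nonalphanumeric: every run of
    # characters outside it (spaces included) collapses to one space.
    return re.sub(r"[^a-z'À-ÖØ!?-öø-ÿ,]+", " ", s)

sample = "@Marie Regarde ça https://example.com #GBV !"  # made-up input
cleaned = preprocess_demo(sample)
print(repr(cleaned))  # prints ' @user regarde ça !'
```

The leading space comes from the `' @user'` replacement; the accented `ç` is kept because it falls inside the allowed ranges.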

In [None]:
import requests
import os

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

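def download_file_from_google_drive(file_id, destination):
    # `file_id` is the Google Drive file identifier (renamed from `id`
    # to avoid shadowing the Python builtin).
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : file_id }, stream = True)
    token = get_confirm_token(response)

    # Large files trigger a confirmation step; repeat the request with the token.
    if token:
        params = { 'id' : file_id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)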
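
Google Drive can answer with an HTML page (for example a confirmation or quota notice) instead of the requested file, in which case `save_response_content` would write that page to disk without complaint. A minimal sketch of a sanity check follows; the helper name `looks_like_html` and the 512-byte sniffing window are assumptions for illustration, not part of the original notebook.

```python
import os
import tempfile

def looks_like_html(path, sniff_bytes=512):
    """Return True if the file at `path` starts like an HTML document."""
    with open(path, "rb") as f:
        head = f.read(sniff_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")

# Demonstration on two temporary files rather than on a real download.
tmp_dir = tempfile.mkdtemp()
html_path = os.path.join(tmp_dir, "error_page.bin")
json_path = os.path.join(tmp_dir, "config.json")
with open(html_path, "wb") as f:
    f.write(b"<!DOCTYPE html><html><body>Quota exceeded</body></html>")
with open(json_path, "wb") as f:
    f.write(b'{"model_type": "bert"}')

html_flag = looks_like_html(html_path)
json_flag = looks_like_html(json_path)
print(html_flag, json_flag)  # prints: True False
```

In the notebook, such a check could run after each download call and before the predictor is loaded, so that a failed download is reported early instead of surfacing as a confusing load error.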

In [None]:
!mkdir -p sexism_classifier

In [None]:
download_file_from_google_drive('1-0hFZUA2ZcoR6Pn_MRVBkSnSQjs7dbvA','sexism_classifier/config.json')

In [None]:
download_file_from_google_drive('1-57sNgARicmhteb9W4co8AKm1oAl8JzA','sexism_classifier/tf_model.h5')

In [None]:
download_file_from_google_drive('1-Bm0mxz_KR3pImHW74qgAt1_S0pmrjAV','sexism_classifier/tf_model.preproc')

In [None]:
predictor=ktrain.load_predictor('sexism_classifier')

## Setting up application

In [None]:
# imports
from jupyter_dash import JupyterDash
import dash
from dash.dependencies import Input, Output, State
import dash_core_components as dcc
import dash_html_components as html
from dash import no_update
import base64
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = JupyterDash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div([
    html.H1(children="Gender-Based Violence Detection", style={'textAlign': 'center'}),
    dcc.Markdown('''
                ![](https://teentalk.ca/wp-content/uploads/2020/11/ened-gbv-702x330.jpg)
                ###### Step 1: Type a sentence (preferably a short one, **and in FRENCH**).
                ###### Step 2: Click submit
                ###### Step 3: Wait for your prediction to appear!

                example: "va à la cuisine" (go to the kitchen) vs "femme vas à la cuisine" (woman, go to the kitchen)
                
    ''',style={'textAlign': 'center'}),
    
    dcc.Input(id='username',type='text',style={'height':'50px','width':'250px'}),
    html.Button(id='submit-button', type='submit', children='Submit'),
    html.Div(id='output_div')
    ],style={'textAlign': 'center','justify':'center','align':'middle','verticalAlign':'middle'})

@app.callback(Output('output_div', 'children'),
                [Input('submit-button', 'n_clicks')],
                [State('username', 'value')],
                )
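def update_output(clicks, input_value):
    # Before the first click there is nothing to display.
    if clicks is None:
        return no_update
    # Reject missing or whitespace-only input.
    if input_value is None or not input_value.strip():
        return 'Please enter a valid sentence'
    # Keep inputs short: fewer than 70 space-separated tokens.
    if len(input_value.split(' ')) >= 70:
        return 'The sentence is too long!'
    # Apply the same preprocessing to both calls so that the label and the
    # probabilities describe the same text.
    text = preprocess(input_value)
    answ = predictor.predict([text])
    answ_proba = predictor.predict_proba([text])
    answ_proba = {
        'DOUBTFUL': answ_proba[0][0],
        'NON-SEXIST': answ_proba[0][1],
        'SEXIST': answ_proba[0][2]
    }
    return html.Div([
        html.H2(children=str(answ[0]), style={'textAlign': 'center'}),
        html.H4(children=str(answ_proba), style={'textAlign': 'center'})
    ])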
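
The callback hard-codes the label order `DOUBTFUL`, `NON-SEXIST`, `SEXIST`. A slightly more defensive variant pairs each probability with its label by position and rounds the values for display. The sketch below is standalone: `label_probabilities` is a hypothetical helper, and the numbers are invented for illustration rather than model output. In the notebook, the label list could presumably be read from the predictor itself (ktrain predictors expose `get_classes()`), but that is an assumption to verify against the installed version.

```python
def label_probabilities(classes, probabilities, ndigits=3):
    """Pair each class label with its rounded probability, by position."""
    if len(classes) != len(probabilities):
        raise ValueError("classes and probabilities must have the same length")
    return {label: round(float(p), ndigits)
            for label, p in zip(classes, probabilities)}

classes = ["DOUBTFUL", "NON-SEXIST", "SEXIST"]  # order used in the callback
fake_row = [0.1234, 0.8, 0.0766]                # invented values, not model output
scores = label_probabilities(classes, fake_row)
top_label = max(scores, key=scores.get)         # label with the highest score
print(scores, top_label)
```

Rounding keeps the text shown in the `html.H4` element short, and the length check fails loudly if the model ever returns a different number of classes than expected.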
           


# APPLICATION 👇🐱‍🏍

In [None]:
# After running the whole notebook, a link should appear just below; click on it
app.run_server(mode='external')


This app demonstrates how machine learning can help make the internet a better place for women who experience gratuitous violence on social media. Like the early-stage bots that large companies have set up to detect hate speech, it offers a way to spot and report gender-based violent speech. Because they are our mothers, our sisters and our wives, we must protect them; together we will succeed.

Learn more about GBV in Africa through these links:

* [Online gender-based violence affects 45% of women on social media in West and Central Africa](https://internetwithoutborders.org/iwd2019-online-gender-based-violence-affects-45-of-women-on-social-media-in-west-and-central-africa/)
* [GBV prevention in Africa](https://preventgbvafrica.org/)
* [The silent epidemic](https://blogs.worldbank.org/africacan/silent-epidemic)
* [Hate on social networks in Cameroon](https://defyhatenow.org/wp-content/uploads/2020/09/1_dhn-Cameroon_FG_FR_FINAL_ONLINE.pdf)