# Basic Needs Basic Rights Kenya - Tech4MentalHealth (Zindi Hackathon)


![Alt text](https://assets.zindi.africa/media/00aa4a928f3c44f881834d47fe624d91.png)


####  ***Leaderboard Rank*** :  136/809

### Context :

Around 1 in 4 people will experience a mental health problem this year. Low-income countries have an estimated treatment gap of 85% (as compared with high-income countries with a gap of 35% to 50%). While Kenya has a mental illness prevalence rate that is comparable to that of high-income countries, there are still fewer than 500 healthcare professionals serving the country.

In Kenya, there are growing concerns about mental health among young people, particularly university students, who face a unique and challenging combination of stressors that puts them at risk of problems such as depression and substance abuse.

From the use of app-based solutions for screening to electronically delivered therapies, the use of technologies including machine learning and AI will potentially transform the delivery of mental health services in the coming years.

The objective of this challenge is to develop a machine learning model that classifies statements and questions expressed by university students in Kenya when speaking about the mental health challenges they struggle with. The four categories are depression, suicide, alcoholism, and drug abuse.

This solution will be used for a prototype of a mental health chatbot designed specifically for university students. This initiative is a first step in leveraging technology to make mental health services more accessible and more user-friendly for young people in Kenya and around the world.

![Alt text](https://zindpublic.blob.core.windows.net/public/uploads/image_attachment/image/393/24364b84-71d1-4f15-a8a2-4fae8e94fc39.png)

This challenge is sponsored by Basic Needs Basic Rights (BNBR) Kenya. BNBR supports people with or at increased risk of mental health problems to live and work successfully in their communities by facilitating access to mental health care and social support services.


### Objective :

Classify text from university students in Kenya towards a mental health chatbot


### Data Description :

The data consists of statements and questions expressed by students from multiple universities across Kenya who reported suffering from these different mental health challenges.

The wording of the statements is intended to respond to the prompting question, “What is on your mind?”

The labels for the training set are contained in Train.csv, corresponding to one of the four categories of mental health problems (depression, suicide, alcoholism, and drug abuse). Your task is to develop a machine learning model to predict the labels for the test set, following the format in sample_submission.csv.

In [1]:
!pip install simpletransformers

Collecting simpletransformers
  Downloading https://files.pythonhosted.org/packages/d2/5e/19374f874fe2aaf417ea92b41bdbe70978aac9d440973e93c07bc0b19b46/simpletransformers-0.40.0-py3-none-any.whl (190kB)

In [2]:
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize
import psutil
import humanize
import os
import GPUtil as GPU
GPUs = GPU.getGPUs()
# Colab provides at most one GPU, and it isn't guaranteed.
gpu = GPUs[0] if GPUs else None
def printm():
    """Print free system RAM, process size and GPU memory usage."""
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
          " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    if gpu is None:
        print("No GPU detected.")
        return
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
        gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))
printm()

Collecting gputil
  Downloading https://files.pythonhosted.org/packages/ed/0e/5c61eedde9f6c87713e89d794f01e378cfd9565847d4576fa627d758c554/GPUtil-1.4.0.tar.gz
Building wheels for collected packages: gputil
  Building wheel for gputil (setup.py) ... done
  Created wheel for gputil: filename=GPUtil-1.4.0-cp36-none-any.whl size=7413 sha256=d0f837e71c5c69661175279e3c6e76f067d2b4c4a6789ba10000ba13bbe89a87
  Stored in directory: /root/.cache/pip/wheels/3d/77/07/80562de4bb0786e5ea186911a2c831fdd0018bda69beab71fd
Successfully built gputil
Installing collected packages: gputil
Successfully installed gputil-1.4.0
Gen RAM Free: 12.7 GB  | Proc size: 160.7 MB
GPU RAM Free: 11441MB | Used: 0MB | Util   0% | Total 11441MB


In [None]:
# Kills every process in the Colab VM to force a runtime restart; run only when a reset is needed.
!kill -9 -1

In [3]:
import torch

# If there's a GPU available...
if torch.cuda.is_available():    

    # Tell PyTorch to use the GPU.    
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
We will use the GPU: Tesla K80


In [4]:
import gc
import random
import re
import warnings

import numpy as np
import pandas as pd
import sklearn
import torch
from google.colab import files
from scipy.special import softmax
from simpletransformers.classification import ClassificationModel
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold, StratifiedKFold, train_test_split
from tqdm import tqdm

warnings.simplefilter('ignore')
pd.options.display.max_colwidth = 1000

In [15]:
from google.colab import files
uploaded = files.upload()

Saving Train.csv to Train (2).csv


In [16]:
# Load the training set
import io
# files.upload() returns a dict of file name -> bytes, so wrap the bytes in BytesIO for read_csv.
train = pd.read_csv(io.BytesIO(uploaded['Train.csv']))
train.head()

Unnamed: 0,ID,Text,label
0,SUAVK39Z,I feel that it was better I dieAm happy,Depression
1,9JDAGUV3,Why do I get hallucinations?,Drugs
2,419WR1LQ,I am stresseed due to lack of financial support in school,Depression
3,6UY7DX6Q,Why is life important?,Suicide
4,FYC0FTFB,How could I be helped to go through the depression?,Depression


In [17]:
ord_lab = {'Depression':0,'Alcohol':1,'Suicide':2,'Drugs':3}

train['target']=train['label'].map(ord_lab)
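
`Series.map` with a dict silently produces `NaN` for any label missing from the dict, so a quick check after the mapping can catch typos or unexpected categories. A minimal sketch on made-up data (the `toy` frame and the `encode_labels` helper are hypothetical, not part of the original notebook):

```python
import pandas as pd

# Same mapping as in the notebook.
ord_lab = {'Depression': 0, 'Alcohol': 1, 'Suicide': 2, 'Drugs': 3}

def encode_labels(labels: pd.Series, mapping: dict) -> pd.Series:
    """Map string labels to integer ids, failing loudly on unknown labels."""
    encoded = labels.map(mapping)
    unknown = labels[encoded.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped labels: {list(unknown)}")
    return encoded.astype(int)

# Hypothetical toy data standing in for Train.csv.
toy = pd.DataFrame({'label': ['Depression', 'Drugs', 'Suicide', 'Alcohol']})
toy['target'] = encode_labels(toy['label'], ord_lab)
print(toy['target'].tolist())
```

Raising on unknown labels is one possible design; dropping or logging such rows would work too, depending on how the data was collected.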

In [18]:
train=train[['Text','target']]
train

Unnamed: 0,Text,target
0,I feel that it was better I dieAm happy,0
1,Why do I get hallucinations?,3
2,I am stresseed due to lack of financial support in school,0
3,Why is life important?,2
4,How could I be helped to go through the depression?,0
...,...,...
611,What should I do to stop alcoholism?,1
612,How to become my oldself again,2
613,How can someone stop it?,1
614,I feel unworthy,0


## **RoBERTa large, 3 epochs**

---



In [20]:
# Hold out 30% of the training data for validation, preserving class proportions.
train_df, train_val = train_test_split(train, test_size=0.3, random_state=9, stratify=train['target'])

model = ClassificationModel('roberta', 'roberta-large', num_labels=4, use_cuda=True,
                            args={'fp16': False,
                                  'learning_rate': 3e-5,
                                  'do_lower_case': True,
                                  'max_seq_length': 128,
                                  'regression': False,
                                  'overwrite_output_dir': True,
                                  'num_train_epochs': 3,
                                  'manual_seed': 9
                                  })
model.train_model(train_df)
scores1, model_outputs, wrong_predictions = model.eval_model(train_val)

# eval_model returns raw model outputs; convert them to class probabilities before scoring.
raw_outputs_val = softmax(model_outputs, axis=1)
print(f"Log_Loss: {log_loss(train_val['target'], raw_outputs_val)}")

Running Epoch 0
Running loss: 0.281972

Running Epoch 1
Running loss: 0.034460

Running Epoch 2
Running loss: 0.223860

Running Evaluation

Log_Loss: 0.3431942176627549
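
The score above is the multiclass log loss of the softmax-normalised model outputs. As a rough illustration of what those two steps compute, here is a NumPy-only sketch on made-up logits (the numbers and the helper names `softmax_rows` and `multiclass_log_loss` are illustrative, not taken from the notebook; scikit-learn's `log_loss` may differ in clipping details):

```python
import numpy as np

def softmax_rows(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def multiclass_log_loss(y_true: np.ndarray, probs: np.ndarray, eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class."""
    clipped = np.clip(probs, eps, 1 - eps)
    true_class_probs = clipped[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_probs)))

# Made-up logits for three samples over the four classes
# (0=Depression, 1=Alcohol, 2=Suicide, 3=Drugs).
logits = np.array([[4.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 5.0],
                   [1.0, 1.0, 1.0, 1.0]])
y_true = np.array([0, 3, 2])

probs = softmax_rows(logits)
print(probs.sum(axis=1))                    # every row sums to 1
print(multiclass_log_loss(y_true, probs))   # lower is better
```

A confident wrong prediction is penalised far more than an uncertain one, which is why the metric rewards well-calibrated probabilities rather than hard labels.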


In [21]:
from google.colab import files
uploaded = files.upload()

Saving Test.csv to Test (1).csv


In [22]:
# Load the test set
import io
# files.upload() returns a dict of file name -> bytes, so wrap the bytes in BytesIO for read_csv.
test = pd.read_csv(io.BytesIO(uploaded['Test.csv']))
test.head()

Unnamed: 0,ID,Text
0,02V56KMO,How to overcome bad feelings and emotions
1,03BMGTOK,I feel like giving up in life
2,03LZVFM6,I was so depressed feel like got no strength to continue
3,0EPULUM5,I feel so low especially since I had no one to talk to
4,0GM4C5GD,can i be successful when I am a drug addict?


In [23]:
id_cols = test['ID']   # keep the IDs for the submission file
test = test[['Text']]  # single-column DataFrame holding the text to classify
test

Unnamed: 0,Text
0,How to overcome bad feelings and emotions
1,I feel like giving up in life
2,I was so depressed feel like got no strength to continue
3,I feel so low especially since I had no one to talk to
4,can i be successful when I am a drug addict?
...,...
304,Yes
305,My girlfriend dumped me
306,How can I go back to being my old self?
307,Is it true bhang is medicinal?


In [24]:
# Pass a plain list of strings to predict().
predictions, raw_output = model.predict(test['Text'].tolist())
raw_output_test = softmax(raw_output,axis=1)


In [25]:
final=raw_output_test
final=pd.DataFrame(final)
final.columns=['Depression','Alcohol','Suicide','Drugs']
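
The column names assigned above have to follow the integer ids in `ord_lab`, because column `k` of the softmax output is the probability of class `k`. One way to avoid a mismatch is to derive the names from the mapping itself. A sketch with made-up values (the `probs` array and the IDs are placeholders, not model output):

```python
import numpy as np
import pandas as pd

ord_lab = {'Depression': 0, 'Alcohol': 1, 'Suicide': 2, 'Drugs': 3}

# Sort label names by their integer id so that column k names class k.
columns = sorted(ord_lab, key=ord_lab.get)

# Placeholder probabilities and IDs standing in for the model output and Test.csv.
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])
ids = pd.Series(['ID_A', 'ID_B'], name='ID')

submission = pd.concat([ids, pd.DataFrame(probs, columns=columns)], axis=1)

# Each row should be a probability distribution over the four classes.
assert np.allclose(submission[columns].sum(axis=1), 1.0)
print(submission)
```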

In [26]:
final = pd.concat([id_cols, final], axis=1)
final

Unnamed: 0,ID,Depression,Alcohol,Suicide,Drugs
0,02V56KMO,0.989248,0.000767,0.009097,0.000888
1,03BMGTOK,0.998249,0.000123,0.001421,0.000207
2,03LZVFM6,0.998570,0.000099,0.001116,0.000215
3,0EPULUM5,0.998697,0.000097,0.001030,0.000176
4,0GM4C5GD,0.002362,0.031442,0.007554,0.958643
...,...,...,...,...,...
304,Z9A6ACLK,0.995233,0.000414,0.003594,0.000760
305,ZDUOIGKN,0.994671,0.000308,0.004668,0.000353
306,ZHQ60CCH,0.710301,0.042809,0.235755,0.011134
307,ZVIJMA4O,0.001465,0.001619,0.003123,0.993794


In [27]:
final.to_csv('roberta-3-epoch.csv', index=False)  # index=False leaves the pandas row index out of the file

In [28]:
from google.colab import files
files.download('roberta-3-epoch.csv') 

