# Sentiment bias towards countries

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [Malaya/example/sentiment-bias-towards-countries](https://github.com/huseinzol05/Malaya/tree/master/example/sentiment-bias-towards-countries).
    
</div>

<div class="alert alert-info">

This module was trained on both standard and local (including social media) language structures, so it is safe to use for both.
    
</div>

In [1]:
%%time
import malaya

CPU times: user 5.82 s, sys: 1.13 s, total: 6.96 s
Wall time: 8.81 s


This notebook tests the bias of the sentiment model by changing only the country name in a fixed text template,

`movie ni dirakam di <negara>`.

In [2]:
model = malaya.sentiment.transformer(model = 'bert')

In [3]:
model.predict_proba(['movie ni dirakam di Malaysia',
                    'movie ni dirakam di Israel'])

[{'negative': 0.93227524, 'neutral': 0.065377586, 'positive': 0.0023471666},
 {'negative': 0.102990896, 'neutral': 0.8959816, 'positive': 0.0010273907}]
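
To make the output above easier to compare, each probability dictionary can be reduced to its most likely label. The following is a minimal sketch that hard-codes the two results shown above instead of calling the model; `top_label` is a hypothetical helper and not part of Malaya.

```python
def top_label(probabilities):
    """Return the (label, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda item: item[1])


# The two inputs and their outputs, copied from the cell above.
texts = ['movie ni dirakam di Malaysia', 'movie ni dirakam di Israel']
outputs = [
    {'negative': 0.93227524, 'neutral': 0.065377586, 'positive': 0.0023471666},
    {'negative': 0.102990896, 'neutral': 0.8959816, 'positive': 0.0010273907},
]

for text, probabilities in zip(texts, outputs):
    label, probability = top_label(probabilities)
    print(f'{text!r}: {label} ({probability:.3f})')
```

The two sentences differ only in the country name, yet the most likely label differs between them.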

In [4]:
# !wget https://datahub.io/core/geo-countries/r/countries.geojson

In [5]:
import json

with open('countries.geojson') as fopen:
    countries_json = json.load(fopen)

In [6]:
from tqdm import tqdm

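reviews = []
country_names = []
sentiments = []
for feature in tqdm(countries_json['features']):
    # 'ADMIN' holds the country name in this GeoJSON file.
    country_name = feature['properties']['ADMIN']
    country_names.append(country_name)
    # Insert the country name into the fixed template.
    text = f'movie ni dirakam di {country_name}'
    reviews.append(text)
    # Keep only the probability of the 'positive' class.
    sentiments.append(model.predict_proba([text])[0]['positive'])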

100%|██████████| 255/255 [00:09<00:00, 26.69it/s]
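
The loop above calls `predict_proba` once per country. Since `predict_proba` accepts a list of strings, as in the two-sentence call earlier, the texts could also be scored in batches. The following is a minimal sketch, assuming the method returns one dictionary per input in the same order. `positive_scores` and `fake_predict_proba` are hypothetical names, and the stand-in exists only so the example runs without loading the model.

```python
def positive_scores(texts, predict_proba, batch_size=32):
    """Score texts in batches and return the 'positive' probability of each."""
    scores = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Assumes predict_proba returns one dict per input, in input order.
        scores.extend(result['positive'] for result in predict_proba(batch))
    return scores


def fake_predict_proba(batch):
    """Hypothetical stand-in for model.predict_proba; returns dummy values."""
    return [
        {'negative': 0.0, 'neutral': 0.0, 'positive': len(text) / 100}
        for text in batch
    ]


demo_texts = [f'movie ni dirakam di {name}'
              for name in ['Aruba', 'Angola', 'Albania']]
demo_scores = positive_scores(demo_texts, fake_predict_proba, batch_size=2)
print(demo_scores)
```

With the real model the call would be `positive_scores(reviews, model.predict_proba)`. Whether batching is actually faster depends on the model and the hardware.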


In [7]:
import pandas as pd
pd.set_option('display.max_rows', None)

In [8]:
df = pd.DataFrame({'Country': country_names,
                   'Positive class probability': sentiments})
df

| | Country | Positive class probability |
|---|---|---|
| 0 | Aruba | 0.995413 |
| 1 | Afghanistan | 0.000171 |
| 2 | Angola | 0.134541 |
| 3 | Anguilla | 0.000227 |
| 4 | Albania | 0.00018 |
| 5 | Aland | 0.000204 |
| 6 | Andorra | 0.002523 |
| 7 | United Arab Emirates | 0.000249 |
| 8 | Argentina | 0.000221 |
| 9 | Armenia | 0.001122 |
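
One way to read a table like this is to rank the countries by score. The sketch below uses plain Python on the ten rows shown above, copied by hand. The 0.5 threshold is an arbitrary assumption chosen for illustration, not a value from the model.

```python
# (country, positive class probability) pairs copied from the rows above.
rows = [
    ('Aruba', 0.995413),
    ('Afghanistan', 0.000171),
    ('Angola', 0.134541),
    ('Anguilla', 0.000227),
    ('Albania', 0.00018),
    ('Aland', 0.000204),
    ('Andorra', 0.002523),
    ('United Arab Emirates', 0.000249),
    ('Argentina', 0.000221),
    ('Armenia', 0.001122),
]

# Rank from most to least positive.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)

# Countries whose positive probability exceeds the (assumed) threshold.
threshold = 0.5
flagged = [country for country, score in ranked if score > threshold]

for country, score in ranked:
    print(f'{country:<22} {score:.6f}')
print('Above threshold:', flagged)
```

On the full DataFrame, `df.sort_values('Positive class probability', ascending=False)` gives the same ordering.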
