# Introduction

Sentiment analysis of MS Glow skincare products is an interesting research subject given the growth of the modern beauty industry. In recent years, skincare products from many brands have drawn the attention of consumers who care more and more about skin care. MS Glow, one of the brands that has gained significant popularity, has attracted researchers interested in analyzing consumer sentiment toward its products. Sentiment analysis is an important tool for understanding consumers' emotional responses and opinions about a brand, and it helps companies improve their products in line with customer needs and expectations.

This study is set against growing interest in MS Glow skincare products, which signals a shift in consumer behavior toward skin care. The sentiment study involves collecting data from various sources, including online reviews, social media, and e-commerce platforms that review MS Glow products. The main goal is to analyze the sentiment patterns that emerge from consumer comments and reviews about the quality, effectiveness, and usage experience of MS Glow skincare products. The study is therefore expected to give deeper insight into how consumers perceive the brand, and to contribute to better marketing strategies and to improving the quality of MS Glow skincare products in the future.


Data Source:

[Ms.Glow_Shopee](https://shopee.co.id/MS-GLOW-Paket-Wajah-i.17566419.541734403?sp_atk=8a14f9d8-c119-4029-aadc-10127ed44b74&xptdk=8a14f9d8-c119-4029-aadc-10127ed44b74)


**Project Goals:**

The goal of this project is to find out public sentiment or opinion about Ms.Glow, a well-known skincare product in Indonesia.

## Importing Libraries

In [1]:
# required libraries
import pandas as pd
import re
import string
from nltk.tokenize import word_tokenize
import demoji
import nltk
from nltk.corpus import stopwords
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
from sklearn.feature_extraction.text import CountVectorizer
nltk.download('words')
nltk.download('stopwords')
nltk.download('punkt')
pd.set_option('display.max_colwidth', 1)


[nltk_data] Downloading package words to /Users/apa/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package stopwords to /Users/apa/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /Users/apa/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## Loading and Understanding the Data

In [2]:
# load the data
data = pd.read_csv('dataset/msglow_shopee.csv')
# lexicons used for labeling
lex_pos = pd.read_csv('dataset/text-library/lexicon_positive_ver1.csv')
lex_neg = pd.read_csv('dataset/text-library/lexicon_negative_ver1.csv')
# key normalization data
key_norm = pd.read_csv('dataset/text-library/key_norm.csv')

In [3]:
# show the first 5 rows
data.head()

Unnamed: 0,web-scrapper-order,web-scrapper-start-url,review,date
0,1703673315-1,https://shopee.co.id/MS-GLOW-Paket-Wajah-i.17566419.541734403?sp_atk=8a14f9d8-c119-4029-aadc-10127ed44b74&xptdk=8a14f9d8-c119-4029-aadc-10127ed44b74,Kemasan BB cream sama Day cream yg sblumnya sama aja gak ada bedanya aku kira salah tpi dalamnya beda \nUntuk yg lainnya oke\nPengiriman cepat cman 1 hari pakai sicepat 🥰🥰🥰😘😘\nThanks MSglow,2021-11-26 15:04
1,1703673315-2,https://shopee.co.id/MS-GLOW-Paket-Wajah-i.17566419.541734403?sp_atk=8a14f9d8-c119-4029-aadc-10127ed44b74&xptdk=8a14f9d8-c119-4029-aadc-10127ed44b74,Alhamdulillah baru banget sampe nih barang nya. Dalem nya aman buble nya banyak ga ada yang cacat sesuai baru order di toko ini sih tapi Alhamdulillah packing an nya oke. Pengiriman nya oke. Semoga cocok deh,2022-05-07 09:53
2,1703673315-3,https://shopee.co.id/MS-GLOW-Paket-Wajah-i.17566419.541734403?sp_atk=8a14f9d8-c119-4029-aadc-10127ed44b74&xptdk=8a14f9d8-c119-4029-aadc-10127ed44b74,"Kualitas produk sangat baik. Harga produk sangat baik. Kecepatan pengiriman sangat baik. Kecepatan pengiriman sangat baik. Respon penjual sangat baik. Semua baik, bahkan orang jahatpun aslinya baik, hanya saja mungkin lingkungan dan kondisilah yg merubah mereka menjadi tidak baik...",2023-07-11 17:35
3,1703673315-4,https://shopee.co.id/MS-GLOW-Paket-Wajah-i.17566419.541734403?sp_atk=8a14f9d8-c119-4029-aadc-10127ed44b74&xptdk=8a14f9d8-c119-4029-aadc-10127ed44b74,"Produk original , packing rapih double2 babble warp ,pengiriman cepat ,pengemasan cepat... Trim's MSGlow Official 😘",2021-10-29 06:09
4,1703673315-5,https://shopee.co.id/MS-GLOW-Paket-Wajah-i.17566419.541734403?sp_atk=8a14f9d8-c119-4029-aadc-10127ed44b74&xptdk=8a14f9d8-c119-4029-aadc-10127ed44b74,"Pengemasan cepat, packing aman. Produk ori, expired date masih lama. Tekstur kental, tp akan berasa watery begitu di aplikasikan ke wajah, cepat meresap. First time coba, semoga cocok & bisa terus pake.",2021-04-22 17:14


In [4]:
# general information about the data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3006 entries, 0 to 3005
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   web-scrapper-order      2440 non-null   object
 1   web-scrapper-start-url  2440 non-null   object
 2   review                  2281 non-null   object
 3   date                    3006 non-null   object
dtypes: object(4)
memory usage: 94.1+ KB


The `date` column still has the `object` dtype, so in the next stage we convert it to `datetime` and extract the year and month from it. This supports the Exploratory Data Analysis (EDA) stage. We also drop the columns that are not used.

In [5]:
# check for duplicates
data.duplicated().sum()

0

In [6]:
# check for missing values
missing_report = data.isna().sum().to_frame()
missing_report.columns = ['count']
missing_report['percentage'] = (missing_report['count'] / len(data) * 100).round(2).astype('str')+'%'
missing_report

Unnamed: 0,count,percentage
web-scrapper-order,566,18.83%
web-scrapper-start-url,566,18.83%
review,725,24.12%
date,0,0.0%


The dataset contains *missing values*. Because this is a sentiment analysis project, we cannot fill missing review text with a substitute such as the `mode` or a placeholder such as `unknown`. We therefore drop the rows that contain *missing values*.
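To make the effect of dropping rows concrete, here is a minimal sketch on a toy DataFrame. The rows and the `order` column name are made up for illustration and are not taken from the MS Glow dataset. It contrasts a bare `dropna()`, which removes a row when any column is missing, with `dropna(subset=[...])`, which only looks at the text column.

```python
import pandas as pd

# Toy data for illustration only; not rows from the MS Glow dataset
toy = pd.DataFrame({
    "order": ["a-1", None, "a-3"],
    "review": ["produk bagus", "pengiriman cepat", None],
})

# Drop rows only when the text column itself is missing
kept = toy.dropna(subset=["review"])

# Drop rows with a missing value in any column (what a bare dropna() does)
strict = toy.dropna()

print(len(kept), len(strict))
```

Which variant is appropriate depends on whether the other columns are still needed. In this notebook they are dropped right afterwards, so only the review text and date matter.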

# Data Pre-Processing

## Removing *Missing Values* and Unused Columns

In [7]:
# drop rows with missing values
data.dropna(inplace=True)
# drop columns that are not needed
data = data.drop(['web-scrapper-order','web-scrapper-start-url'], axis=1)

### Improving Data Quality

In [8]:
# convert the date column to datetime
data['date'] = data['date'].astype(str).str.strip()
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d %H:%M')
# add the years and month columns
data['years'] = data['date'].dt.year
data['month'] = data['date'].dt.month
# drop the date column
data = data.drop(['date'], axis=1)

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2281 entries, 0 to 2439
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   review  2281 non-null   object
 1   years   2281 non-null   int32 
 2   month   2281 non-null   int32 
dtypes: int32(2), object(1)
memory usage: 53.5+ KB


## Data Cleaning

### Case Folding  

In [10]:
# Convert all letters to lower case
# Remove URLs, selected punctuation and symbols, line breaks, and emoji (digits are kept)
def casefolding(text):
    text = text.lower() # convert the text to lower case
    text = re.sub(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*', '', text) # remove URLs
    text = re.sub(r'[?|$|.|@#%^&*=!_:")(-+,]', '', text) # remove selected punctuation and symbols
    text = text.replace('\n', '') # remove newline characters (no space is inserted, so the words on either side are joined)
    text = demoji.replace(text, '') # remove emoji
    text = text.strip(" ") # strip spaces from the left and right of the text

    return text


In [11]:
# apply the function
data['review'] = data['review'].apply(casefolding)
data.head()

Unnamed: 0,review,years,month
0,kemasan bb cream sama day cream yg sblumnya sama aja gak ada bedanya aku kira salah tpi dalamnya beda untuk yg lainnya okepengiriman cepat cman 1 hari pakai sicepat thanks msglow,2021,11
1,alhamdulillah baru banget sampe nih barang nya dalem nya aman buble nya banyak ga ada yang cacat sesuai baru order di toko ini sih tapi alhamdulillah packing an nya oke pengiriman nya oke semoga cocok deh,2022,5
2,kualitas produk sangat baik harga produk sangat baik kecepatan pengiriman sangat baik kecepatan pengiriman sangat baik respon penjual sangat baik semua baik bahkan orang jahatpun aslinya baik hanya saja mungkin lingkungan dan kondisilah yg merubah mereka menjadi tidak baik,2023,7
3,produk original packing rapih double2 babble warp pengiriman cepat pengemasan cepat trim's msglow official,2021,10
4,pengemasan cepat packing aman produk ori expired date masih lama tekstur kental tp akan berasa watery begitu di aplikasikan ke wajah cepat meresap first time coba semoga cocok bisa terus pake,2021,4
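The first row of the output contains the fused token `okepengiriman`, which comes from deleting a line break without leaving a space. The sketch below is one possible variant, not the function used in this notebook, that turns line breaks into spaces before removing punctuation. The function name `casefold_keep_word_breaks` and the simplified URL pattern are assumptions, and emoji removal with `demoji` is left out so the example runs with the standard library only.

```python
import re

def casefold_keep_word_breaks(text):
    """Lower-case, strip URLs and listed punctuation, keep word boundaries."""
    text = text.lower()
    # remove URLs such as https://example.com/path (simplified pattern)
    text = re.sub(r'\w+://\S+', '', text)
    # turn line breaks into spaces so neighbouring words stay separate
    text = re.sub(r'[\r\n]+', ' ', text)
    # remove the same punctuation marks and symbols as the notebook's pattern
    text = re.sub(r'[?|$.@#%^&*=!_:")(+,]', '', text)
    # collapse repeated whitespace and trim the ends
    return re.sub(r'\s+', ' ', text).strip()

print(casefold_keep_word_breaks("Untuk yg lainnya oke\nPengiriman cepat!"))
# untuk yg lainnya oke pengiriman cepat
```

With this variant the `okepengiriman` entry in the stop-word list further down would probably no longer be needed, although that has not been checked against the full dataset.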


### Data Normalization

Normalization converts abbreviations, slang, and similar non-standard forms in the text into their standard words.

In [12]:
# function that normalizes the words in a sentence
def WordNormalization(text):
    text = ' '.join([key_norm[key_norm['singkat'] == word]['hasil'].values[0] if (key_norm['singkat'] == word).any() else word for word in text.split()])
    text = str.lower(text)
    return text


In [13]:
# apply the function
data['review'] = data['review'].apply(WordNormalization)
data.head()

Unnamed: 0,review,years,month
0,kemasan bb cream sama hari cream yang sebelumnya sama saja tidak ada bedanya saya kira salah tapi dalamnya beda untuk yang lainnya pengiriman cepat cuma 1 hari pakai sicepat terimakasih msglow,2021,11
1,alhamdulillah baru banget sampai nih barang nya dalem nya aman buble nya banyak tidak ada yang cacat sesuai baru order di toko ini sih tapi alhamdulillah packing an nya oke pengiriman nya oke semoga cocok deh,2022,5
2,kualitas produk sangat baik harga produk sangat baik kecepatan pengiriman sangat baik kecepatan pengiriman sangat baik respon penjual sangat baik semua baik bahkan orang jahatpun aslinya baik hanya saja mungkin lingkungan dan kondisilah yang merubah mereka menjadi tidak baik,2023,7
3,produk original packing rapih double2 bubble warp pengiriman cepat pengemasan cepat terimakasih msglow official,2021,10
4,pengemasan cepat packing aman produk original expired date masih lama tekstur kental tetapi akan berasa watery begitu di aplikasikan ke wajah cepat meresap pertama waktu coba semoga cocok bisa terus pakai,2021,4
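The same lookup can be sketched with a plain dictionary, which avoids scanning the whole `key_norm` table for every word. This is a sketch under assumptions: `norm_map` and `normalize_words` are hypothetical names, and the four pairs below are illustrative stand-ins for the contents of `key_norm.csv`.

```python
# Hypothetical slang-to-standard pairs; the real pairs live in key_norm.csv
norm_map = {"yg": "yang", "gak": "tidak", "tpi": "tapi", "cman": "cuma"}

def normalize_words(text, mapping):
    """Replace each token found in `mapping`; leave other tokens unchanged."""
    return " ".join(mapping.get(word, word) for word in text.split()).lower()

print(normalize_words("yg lainnya gak ada bedanya tpi cepat", norm_map))
# yang lainnya tidak ada bedanya tapi cepat
```

With the real table, the mapping could be built once as `dict(zip(key_norm['singkat'], key_norm['hasil']))`. If `singkat` contains duplicate keys, a dictionary keeps the last match while `WordNormalization` takes the first, so the two could differ on such rows.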


### Tokenizing

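In this step we tokenize each sentence into words and remove the words that appear in the NLTK English word list. As the output shows, this filter also drops Indonesian words that happen to be in that list, such as `yang`, `baru`, and `toko`.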

In [14]:
def remove_english_words(sentence):
    words = word_tokenize(sentence)  # tokenize the sentence into words
    english_words = set(nltk.corpus.words.words()) # set of English words from NLTK
    words_to_remove = set(['double2', 'buble', 'bubble'])  # extra words to remove, as a set
    non_english_words = [word for word in words if word not in english_words and word not in words_to_remove]  # keep words that are neither in the English word list nor in the extra removal set
    return non_english_words

In [15]:
data.review = data.review.apply(remove_english_words)
data.head()

Unnamed: 0,review,years,month
0,"[kemasan, bb, sama, hari, sebelumnya, sama, saja, tidak, ada, bedanya, kira, salah, tapi, dalamnya, beda, untuk, lainnya, pengiriman, cepat, cuma, 1, hari, pakai, sicepat, terimakasih, msglow]",2021,11
1,"[alhamdulillah, banget, sampai, nih, barang, nya, dalem, nya, aman, nya, banyak, tidak, ada, cacat, sesuai, ini, sih, tapi, alhamdulillah, packing, nya, oke, pengiriman, nya, oke, semoga, cocok, deh]",2022,5
2,"[kualitas, produk, sangat, baik, harga, produk, sangat, baik, kecepatan, pengiriman, sangat, baik, kecepatan, pengiriman, sangat, baik, respon, penjual, sangat, baik, semua, baik, bahkan, jahatpun, aslinya, baik, hanya, saja, mungkin, lingkungan, kondisilah, merubah, mereka, menjadi, tidak, baik]",2023,7
3,"[produk, packing, rapih, pengiriman, cepat, pengemasan, cepat, terimakasih, msglow]",2021,10
4,"[pengemasan, cepat, packing, aman, produk, expired, masih, tekstur, kental, tetapi, akan, berasa, begitu, aplikasikan, ke, wajah, cepat, meresap, pertama, waktu, coba, semoga, cocok, bisa, terus, pakai]",2021,4
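`remove_english_words` rebuilds the English word set on every call, which repeats the same work for each of the 2,281 reviews. A minimal sketch of building the lookup sets once is shown below. The tiny `ENGLISH_WORDS` set is a stand-in so the example runs without the NLTK corpus, `str.split` replaces `word_tokenize`, and `drop_listed_words` is a hypothetical name.

```python
# Tiny stand-in vocabulary; the notebook uses nltk.corpus.words instead
ENGLISH_WORDS = frozenset({"cream", "day", "order", "first", "time", "official"})
EXTRA_REMOVALS = frozenset({"double2", "buble", "bubble"})

def drop_listed_words(sentence):
    """Split on whitespace and drop tokens found in either set."""
    return [
        word for word in sentence.split()
        if word not in ENGLISH_WORDS and word not in EXTRA_REMOVALS
    ]

print(drop_listed_words("first time coba semoga cocok"))
# ['coba', 'semoga', 'cocok']
```

In the notebook the equivalent change would be to assign `set(nltk.corpus.words.words())` to a module-level name once and reference it inside the function.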


### Filtering Stop Words

In [16]:
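# Remove words that carry little or no meaning
def stopword_removal(text):
    # NLTK stop words for Indonesian and English; words() takes file ids, so the two lists are loaded separately
    filtering = stopwords.words('indonesian') + stopwords.words('english')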
    filtering.extend(["🏻","🙂","🙏","👉","🇮🇩","🎉","➡️","🤣","🤭",
                        "🙈","⬆️","♀️","🤦🏻","😎","🙌🏻","😊","👇🏼","🧐",
                        "😁","🙏","😂","😅","🤔","👍","🚭","✌️","‼️",
                        "🖤","🤍","😍","🔃","❤️","💬","😠","😡","😑",
                        "😉","⚠️","🙊","😭","💥","✨","🙄","😔","✋",
                        "🤒","😖","💆","💉","👇","🙂","🤩","🇺🇸","�",
                        "🅸","🅽","🅵","🅾","🅿","🅴","🆃","🅶","✅","😳","🙌","gpp🙌🏻","1",
                        "🏿","🏻","gw","aja","enggak","ya","udah","nya","dgn","gak",
                      "ga","yg","gkaja","yah","mgkn","haha","klo","silahkn",
                      "ngg","hp","si","dah","iya","drb","g","nggak","bbrp",
                      "deh","sbg","dlm","krn","sj","msg","jgn","ttg","lu",
                      "ku","mrk","lg","gk","bs","tk","lbh","pd","blm",
                      "ngak","mb","donk","knp","td","dg","sy","kalo","ato",
                      "tp","jd","kyk","org","pake","udh","lha","tdk",
                      "ortu","jg","ndak","sih","pas","mu","la","dr","sdh",
                      "kl","bgt","rt","rw","utk","dpt","adlh","bw","yak",
                      "hmm","kbm","hehehe","wkwkwkwk","yuk","tdak","cpt",
                      "to","spp","smg","pln","ttp","loh","bro","spt",
                      "min","tuji","skrg","gin","indo","5g","4g","dll",
                      "org2","tip","mah","smp","sma","sd","pjj","psbb",
                      "pin","yeah","un","inna","lillahi","inna","ilaihi",
                      "raji","wa","duh","bkn","cuap","tri","anjiiiinnng",
                      "mi","klas","trs","pel","habibi","ptm","mbak","dng",
                      "seko","mai","klw","anak2","ge","moga","bb","okepengiriman",
                      "merestart","nak","un","raji","innalilahi","ni","nih",
                      "allh","amiiiin","rpp","lah","lo","kagak","kaga","gpp",
                      "Ferguso","tak","gini","dr","bhw","lho","kok","mak","mmg","2","bgtau",
                      "tak","tgkp","gmbr","ape","tnjuk","dri","mcm","tu","da","rse","Knpe","bg","bln","u/","jt",
                      "mulu","jul","aug","btw","ngga","kmrn","cm","yth","nahh","au ah","puyeng","selinting","ig","drp",
                      "je","UK","wa","ndroooo","dapet","nyesel","wooow","ampul","ribet","di","mui","tdk","jg","btw","klo",
                      "cus","kopit","krn","yuuuk","njih","lur","knp","sdh","yg","stlh","iaitu","tsb","guys","kes","rep",
                      "yaa","ga","kat","tu","sbg","kpd","bkn","utk","skg","deh","eh","sih","org","mak","aeh","neh","sm",
                      "doi","dgan","smua","dng","hp","so","skrg","lo","bln","wow","lbh","gpp","jlsnya","pd","in","orang","baru",
                      "berbeza","sukatan","ter","di","meng","nya","se","an","dos","kesihatan","waduh","udah","gak","bahawa",
                      "plih","gimana","nunggu","pahang","bawak","nak","korang","mana","sahaja","kemana","senarai","makin","lepas",
                      "mengawan","je","jepun","tak","dibazirkan","piye", "toh","gue","bikin","semaleman","typo","malah","aku","keren",
                      "agak","teteh","aa","silakan","daptar","aja","jaksel","harusss","sesapa","komuniti","nyesel","kagak",
                      "kemarenan","dah","megap-megap","sesek","monggo","sang","kipi","gue","anjmmm","nak","syedeeppp",
                      "syekali","jgn","vial","sih","mmg","lgsg","aja","tdk","az","org","yg","karna","jg","uda","gausa","wkwk",
                      "blm","dlm","tak","bce","hbis","mjlis","indon","diaorang","distop","thu","tpi","mngatakan","kerana","dn",
                      "fc","patennta","byk","dapet","gini","amat","başqanı","sırada","önə ","keçməmək","üçün", "vurdurmayıb","sən",
                      "yalanı", "at", "inanacaq","və", "hətta", "düşünməyəcək","sorğulamayacaq", "çox", "insan", "var","euy","ni",
                      "nih","lah","lo","kaga","kemaren-kemarenan","megapmegap","sesek","napas","dapetin","sih","dos","nyesel",
                      "skrg","kagak","dah","sesetengah","mentaliti","kaya","bahawa","guys","la","dose","ore", "kb","ko","pks",
                      "dtg","ampang","org","but","je","lg","sinopharm","utk","ngga","cepet","diteken","abis","pake","kalo",
                      "kerasa","klo","malem2","ngadep","bkn", "ni","si","pastu","dok","bgtaw","nk","annoying","mcm","dpt","xde",
                      "sapa","pn","pentaksuk","tuh","nehh","doi","ga","mikirin","cuan","sm","kena","sdh","manakala","sihat","perbezaan",
                      "az","dlm","sbb", "n","eh","berasa","je","lol","so","geng","mana","ambik","bgtau","kaya","len","resipi","mrna",
                      "sila","va","yaaa","temen","makasi","yuk", "ngasi","dtg","puskes","buldep","bansos","pkh","nunjukin","sertif", 
                      "belingsatan","digebukin","akn","ia","slps","tulah","tu","jugak","gituhh","perli","ckp","amik","lg","pekak",
                      "je","masup","emoll","gpp","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","oiii","dh","geng","megelitat","kome","sikit","korang","kecoh","faham","nyari","sbg","faskes","by","kpd"])
    # keep only the tokens that are not in the stop-word list
    stop_set = set(filtering)
    return [token for token in text if token not in stop_set]

In [17]:
# apply the function
data['review'] = data['review'].apply(stopword_removal)
data.head()

Unnamed: 0,review,years,month
0,"[kemasan, bedanya, salah, dalamnya, beda, pengiriman, cepat, pakai, sicepat, terimakasih, msglow]",2021,11
1,"[alhamdulillah, banget, barang, dalem, aman, cacat, sesuai, alhamdulillah, packing, oke, pengiriman, oke, semoga, cocok]",2022,5
2,"[kualitas, produk, harga, produk, kecepatan, pengiriman, kecepatan, pengiriman, respon, penjual, jahatpun, aslinya, lingkungan, kondisilah, merubah]",2023,7
3,"[produk, packing, rapih, pengiriman, cepat, pengemasan, cepat, terimakasih, msglow]",2021,10
4,"[pengemasan, cepat, packing, aman, produk, expired, tekstur, kental, aplikasikan, wajah, cepat, meresap, coba, semoga, cocok, pakai]",2021,4


### Stemming

In [18]:
# Initialize the Sastrawi stemmer once and reuse it for every review
factory = StemmerFactory()
stemmer = factory.create_stemmer()

def stemming(texts):
    # stem each token, then join the tokens back into a single string
    return " ".join(stemmer.stem(text) for text in texts)

In [19]:
data.review = data['review'].apply(stemming)
data.head()

Unnamed: 0,review,years,month
0,kemas beda salah dalam beda kirim cepat pakai sicepat terimakasih msglow,2021,11
1,alhamdulillah banget barang dalem aman cacat sesuai alhamdulillah packing oke kirim oke moga cocok,2022,5
2,kualitas produk harga produk cepat kirim cepat kirim respon jual jahat asli lingkung kondisi rubah,2023,7
3,produk packing rapih kirim cepat emas cepat terimakasih msglow,2021,10
4,emas cepat packing aman produk expired tekstur kental aplikasi wajah cepat resap coba moga cocok pakai,2021,4
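Reviews repeat many of the same words (`cepat`, `kirim`, `produk`), so the stemmer is asked about the same token again and again. One way to avoid repeated work is to memoize the per-word stemming call, sketched below with `functools.lru_cache`. The helper `make_cached_stemmer` is a hypothetical name, and `toy_stem` is only a stand-in for `stemmer.stem` so the example runs without Sastrawi. It is not a real Indonesian stemmer.

```python
from functools import lru_cache

def make_cached_stemmer(stem_fn, maxsize=None):
    """Wrap a single-word stemming function with an in-memory cache."""
    cached = lru_cache(maxsize=maxsize)(stem_fn)

    def stem_tokens(tokens):
        # stem each token (cache hits skip the real stemmer) and re-join
        return " ".join(cached(token) for token in tokens)

    return stem_tokens, cached

# Stand-in for stemmer.stem, used only so the sketch runs without Sastrawi
calls = []
def toy_stem(word):
    calls.append(word)
    return word[:-3] if word.endswith("nya") else word

stem_tokens, cached = make_cached_stemmer(toy_stem)
print(stem_tokens(["bedanya", "cepat", "bedanya", "cepat"]))
print(len(calls))  # the underlying function ran once per distinct word
```

In the notebook this would be used as `make_cached_stemmer(stemmer.stem)`. Whether it gives a noticeable speed-up on this dataset has not been measured here.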


In [20]:
# save the cleaned data
data.to_csv('dataset/cleaned_data.csv', index=False)