# Extract aspects from sentences using dependency parsing and POS tags
> Extract aspects from sentences for aspect-based sentiment analysis using dependency parsing and POS tags.

- toc: true 
- badges: false
- comments: true
- categories: [implementation]
- image: images/DP.jpg

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Install required modules

In [None]:
%%capture
!pip install pyvi
!pip install https://github.com/trungtv/vi_spacy/raw/master/packages/vi_spacy_model-0.2.1/dist/vi_spacy_model-0.2.1.tar.gz
!pip install spacy==2.1.4

**Note:** In this post we use spaCy version 2, because version 3 changed the way models are loaded and I don't know how to make it work with `vi_spacy_model` :).
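
Since the code depends on a 2.x release, a small guard can make a version mismatch visible early. The following is only a sketch: `is_spacy_v2` is a hypothetical helper introduced here for illustration, and the check reads the installed package metadata instead of importing spaCy.

```python
from importlib import metadata


def is_spacy_v2(version: str) -> bool:
    """Return True when a version string such as '2.1.4' has major version 2."""
    return version.split(".")[0] == "2"


try:
    installed = metadata.version("spacy")  # version of the installed package
except metadata.PackageNotFoundError:
    installed = None  # spaCy is not installed in this environment

if installed is None:
    print("spaCy is not installed")
elif not is_spacy_v2(installed):
    print(f"spaCy {installed} found; this post was written against 2.1.4")
else:
    print(f"spaCy {installed} is a 2.x release")
```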

In [None]:
import pandas as pd
import numpy as np
import spacy
from string import punctuation

In [None]:
# Do not limit the column width when displaying DataFrames
pd.set_option("display.max_colwidth", None)

# Load dataset

In [None]:
df = pd.read_csv("/content/drive/MyDrive/SLSOPS/dataset/vietnamese/spacy.csv")

In [None]:
df.head()

| Unnamed: 0 | db_id | raw | tokenized_spacy | spacy_no_sw | star_rating |
|---|---|---|---|---|---|
| 0 | 1 | Shipper giao đúng hẹn, tận tình mang vào đến cửa căn hộ chung cư. | shipper giao đúng hẹn tận_tình mang vào đến cửa căn_hộ chung_cư | shipper giao hẹn tận_tình cửa căn_hộ chung_cư | 5 |
| 1 | 1 | Nhân viên lắp đặt thân thiện, làm việc tốt, tuy nhiên hơi làm dơ tường và đồ đạc trong nhà. | nhân_viên lắp_đặt thân_thiện làm_việc tốt tuy_nhiên hơi làm dơ tường và đồ_đạc trong nhà | nhân_viên lắp_đặt thân_thiện làm_việc tốt hơi dơ tường đồ_đạc | 5 |
| 2 | 1 | Sản phẩm đẹp, giá cạnh tranh. | sản_phẩm đẹp giá cạnh_tranh | sản_phẩm đẹp giá cạnh_tranh | 5 |
| 3 | 1 | Nhưng giá Tiki bán là ko tặng vật tư giống ở ngoài bán. | nhưng giá tiki bán là không tặng_vật_tư giống ở ngoài bán | giá tiki tặng_vật_tư | 5 |
| 4 | 1 | Lúc máy mình lắp xong thì giá vật tư là khoảng hơn 1 triệu. | lúc máy mình lắp xong thì giá vật_tư là khoảng hơn 1 triệu | máy lắp xong giá vật_tư 1 triệu | 5 |


# Extract aspects using dependency parsing and POS tags

In [None]:
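from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AsOp:
    # Aspect/opinion pair plus the tokens that modify it
    aspect: Any = None
    opinion: Any = None
    adv: Any = None
    verb: Any = None
    negative: bool = False
    aspect_compound: Any = None


@dataclass
class Aspect_Opinion:
    # Plain-string version of an aspect/opinion pair
    aspect: Optional[str] = None
    opinion: Optional[str] = None
    verb: Optional[str] = None
    negative: bool = False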



In [None]:
# Test the library
import spacy
nlp = spacy.load('vi_spacy_model')
doc = nlp('Cộng đồng xử lý ngôn ngữ tự nhiên')
for token in doc:
    print(token.text, token.tag_, token.dep_)

Cộng_đồng N nsubj
xử_lý V ROOT
ngôn_ngữ N obj
tự_nhiên A compound


In [None]:
# Load the preprocessed sentences (stop words removed)
data = list(df['spacy_no_sw'])
data[0]

'shipper giao hẹn tận_tình cửa căn_hộ chung_cư'

In [None]:
from spacy import displacy # module for visualizing the dependency parse
displacy.render(nlp(df['tokenized_spacy'][2]), style="dep", jupyter=True)

In [None]:
displacy.render(nlp(df['tokenized_spacy'][20]), style="dep", jupyter=True)

## Rules

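We check two cases for every token:
1. The token's POS tag is `N` (noun) and its dependency tag is not `nsubj`. The opinion is searched among the token's children (`amod` and `conj`), and `compound` children are merged into the aspect.
2. The token's POS tag is `N` (noun) and its dependency tag is `nsubj`. The opinion is taken from the token's head (if it is an adjective) and from the head's other children tagged `A` or `R`.

**Note:** I came up with these rules by trying many samples =)) There is no magic behind them.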
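
To make the two cases concrete without loading the model, here is a simplified sketch. Everything in it is an assumption for illustration: `Tok`, `attach` and `simple_pairs` are hypothetical names, the two parses are built by hand (they are not output of `vi_spacy_model`), and only the core of each case is covered (an `amod` child for case 1, an adjective head for case 2), not the compound, `conj` or adverb handling of the full code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Tok:
    # Hypothetical stand-in for a spaCy token: text, POS tag, dependency label
    text: str
    tag: str
    dep: str
    head: Optional["Tok"] = field(default=None, repr=False)
    children: List["Tok"] = field(default_factory=list, repr=False)


def attach(head: Tok, child: Tok) -> None:
    """Link a child token to its head in the hand-built tree."""
    child.head = head
    head.children.append(child)


def simple_pairs(tokens: List[Tok]) -> List[Tuple[str, str]]:
    """Return (aspect, opinion) pairs using a reduced form of the two cases."""
    pairs = []
    for tok in tokens:
        if tok.tag != "N":
            continue
        if tok.dep != "nsubj":
            # Case 1: the noun is not a subject; take its adjectival modifiers
            for child in tok.children:
                if child.dep == "amod":
                    pairs.append((tok.text, child.text))
        elif tok.head is not None and tok.head.tag == "A":
            # Case 2: the noun is a subject; take its adjective head
            pairs.append((tok.text, tok.head.text))
    return pairs


# Hand-built parse for case 2: "sản_phẩm" is the subject of the adjective "đẹp"
dep_root = Tok("đẹp", "A", "ROOT")
san_pham = Tok("sản_phẩm", "N", "nsubj")
attach(dep_root, san_pham)

# Hand-built parse for case 1: "mới" modifies the non-subject noun "máy"
may = Tok("máy", "N", "ROOT")
moi = Tok("mới", "A", "amod")
attach(may, moi)

case2_pairs = simple_pairs([san_pham, dep_root])
case1_pairs = simple_pairs([may, moi])
print(case1_pairs, case2_pairs)
```

The real parser may attach these words differently, which is exactly why the rules had to be tuned on samples.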

In [None]:
aspects = []
doc = nlp(df['tokenized_spacy'][25])
print(doc)
for i, token in enumerate(doc):
    # Case 1: noun that is not a subject; look for opinions among its children
    if token.tag_ == "N" and token.dep_ != "nsubj":
        aspect = token.text
        for child in token.children:
            if child.dep_ == "compound":
                if child in token.lefts:
                    aspect = f"{child.text} {aspect}"
                elif child in token.rights:
                    aspect = f"{aspect} {child.text}"
            elif child.dep_ == "amod":
                opinion = child.text
                for gchild in child.children:
                    if gchild.dep_ == "advmod":
                        if gchild in child.lefts:
                            opinion = f"{gchild.text} {opinion}"
                        elif gchild in child.rights:
                            opinion = f"{opinion} {gchild.text}"
                asp_obj = AsOp(aspect=aspect, opinion=opinion)
                aspects.append(asp_obj)
            elif child.dep_ == "conj":
                asp_obj = None
                if child.tag_ == "N":
                    aspect = child.text
                    opinion = ""
                    for gchild in child.rights:
                        if gchild.dep_ == "xcomp" and gchild.tag_ == "A":
                            opinion = gchild.text
                    if opinion:
                        asp_obj = AsOp(aspect=aspect, opinion=opinion)
                elif child.tag_ == "A":
                    asp_obj = AsOp(aspect=aspect, opinion=child.text)
                if asp_obj: 
                    aspects.append(asp_obj)
            
    # Case 2: noun subject; look for opinions on its head and the head's children
    elif token.tag_ == "N" and token.dep_ == "nsubj":
        opinion = ""
        if token.head.tag_ == "A":
            opinion = token.head.text
            
        for child in token.head.children:
            if child.text == token.text:
                continue
            elif child.tag_ == "A" or child.tag_ == "R":
                if opinion:
                    if child in token.head.lefts: 
                        opinion = f"{child.text} {opinion}"
                    else:
                        opinion = f"{opinion} {child.text}"
                else:
                    opinion = child.text
                    for gchild in child.children:
                        if gchild.dep_ == "advmod":
                            if gchild in child.lefts:
                                opinion = f"{gchild.text} {opinion}"
                            elif gchild in child.rights:
                                opinion = f"{opinion} {gchild.text}"
                asp_obj = AsOp(aspect=token.text, opinion=opinion)
                aspects.append(asp_obj)
print(aspects)

đến chán giờ cứ sử_dụng là bị cúp nguồn điện thì ai dám bật
[]


In [None]:
from dataclasses import dataclass
from typing import Any
@dataclass
class AsOp:
    aspect: Any = None
    opinion: Any = None

In [None]:
from pyvi import ViTokenizer, ViPosTagger
sentence = df['tokenized_spacy'][6]
# print(sentence)
sentences = list(df['tokenized_spacy'])[:100]
aspects = []
cnt = 0
for sentence in sentences:
    try:
        doc = nlp(sentence)
        for i, token in enumerate(doc):
            if token.tag_ == "N" and token.dep_ != "nsubj":
                aspect = token.text
                for child in token.children:
                    if child.dep_ == "compound":
                        if child in token.lefts:
                            aspect = f"{child.text} {aspect}"
                        elif child in token.rights:
                            aspect = f"{aspect} {child.text}"
                    elif child.dep_ == "amod":
                        opinion = child.text
                        for gchild in child.children:
                            if gchild.dep_ == "advmod":
                                if gchild in child.lefts:
                                    opinion = f"{gchild.text} {opinion}"
                                elif gchild in child.rights:
                                    opinion = f"{opinion} {gchild.text}"
                        asp_obj = AsOp(aspect=aspect, opinion=opinion)
                        aspects.append(asp_obj)
                    elif child.dep_ == "conj":
                        asp_obj = None
                        if child.tag_ == "N":
                            aspect = child.text
                            opinion = ""
                            for gchild in child.rights:
                                if gchild.dep_ == "xcomp" and gchild.tag_ == "A":
                                    opinion = gchild.text
                            if opinion:
                                asp_obj = AsOp(aspect=aspect, opinion=opinion)
                        elif child.tag_ == "A":
                            asp_obj = AsOp(aspect=aspect, opinion=child.text)
                        if asp_obj: 
                            aspects.append(asp_obj)
                    
            elif token.tag_ == "N" and token.dep_ == "nsubj":
                opinion = ""
                if token.head.tag_ == "A":
                    opinion = token.head.text
                    
                for child in token.head.children:
                    if child.text == token.text:
                        continue
                    elif child.tag_ == "A" or child.tag_ == "R":
                        if opinion:
                            if child in token.head.lefts: 
                                opinion = f"{child.text} {opinion}"
                            else:
                                opinion = f"{opinion} {child.text}"
                        else:
                            opinion = child.text
                            for gchild in child.children:
                                if gchild.dep_ == "advmod":
                                    if gchild in child.lefts:
                                        opinion = f"{gchild.text} {opinion}"
                                    elif gchild in child.rights:
                                        opinion = f"{opinion} {gchild.text}"
                        asp_obj = AsOp(aspect=token.text, opinion=opinion)
                        aspects.append(asp_obj)
    except Exception:
        # Print the sentence that failed and count the failure
        print(sentence)
        cnt += 1

In [None]:
aspects

[AsOp(aspect='shipper', opinion='vào'),
 AsOp(aspect='nhân_viên', opinion='tốt'),
 AsOp(aspect='sản_phẩm', opinion='đẹp'),
 AsOp(aspect='giá', opinion='không tặng_vật_tư'),
 AsOp(aspect='giá', opinion='không tặng_vật_tư giống'),
 AsOp(aspect='giá', opinion='khoảng'),
 AsOp(aspect='giá', opinion='hơn khoảng'),
 AsOp(aspect='giá', opinion='hơn'),
 AsOp(aspect='tiền', opinion='hơn'),
 AsOp(aspect='tiền', opinion='không hơn'),
 AsOp(aspect='hàng', opinion='đúng'),
 AsOp(aspect='tiki giao_nhận', opinion='tốt'),
 AsOp(aspect='máy', opinion='tốt'),
 AsOp(aspect='máy', opinion='rất êm'),
 AsOp(aspect='hàng', opinion='nhanh'),
 AsOp(aspect='hàng', opinion='nhanh ổn'),
 AsOp(aspect='sản_phẩm', opinion='tạm'),
 AsOp(aspect='hàng', opinion='nhanh'),
 AsOp(aspect='máy_lạnh', opinion='chưa'),
 AsOp(aspect='máy_lạnh', opinion='chưa được'),
 AsOp(aspect='máy', opinion='chỉ'),
 AsOp(aspect='tiền', opinion='ồn_ào'),
 AsOp(aspect='âm_thanh', opinion='ồn_ào'),
 AsOp(aspect='hàng', opinion='mới'),
 AsOp(as
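
The list above contains several entries per aspect, including exact repeats such as `hàng` / `nhanh`. One possible follow-up step, sketched here as an assumption and not part of the original pipeline, is to group the opinions by aspect and drop exact duplicates. `group_opinions` is a hypothetical helper, and `sample` copies a few pairs from the output above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class AsOp:
    aspect: Any = None
    opinion: Any = None


def group_opinions(pairs: List[AsOp]) -> Dict[Any, List[Any]]:
    """Collect the distinct opinions of each aspect, keeping first-seen order."""
    grouped = defaultdict(list)
    for pair in pairs:
        if pair.opinion not in grouped[pair.aspect]:
            grouped[pair.aspect].append(pair.opinion)
    return dict(grouped)


# A few pairs copied from the extracted list
sample = [
    AsOp(aspect='giá', opinion='khoảng'),
    AsOp(aspect='giá', opinion='hơn khoảng'),
    AsOp(aspect='hàng', opinion='nhanh'),
    AsOp(aspect='hàng', opinion='nhanh'),
]
grouped = group_opinions(sample)
print(grouped)
```

Grouping does not fix wrong pairs such as `shipper` / `vào`; it only makes the result easier to inspect.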

# Discussion

The quality of this approach depends on the accuracy of the dependency parsing and POS tagging models.

# Tags


    A - Adjective
    C - Coordinating conjunction
    E - Preposition
    I - Interjection
    L - Determiner
    M - Numeral
    N - Common noun
    Nc - Noun Classifier
    Ny - Noun abbreviation
    Np - Proper noun
    Nu - Unit noun
    P - Pronoun
    R - Adverb
    S - Subordinating conjunction
    T - Auxiliary, modal words
    V - Verb
    X - Unknown
    F - Filtered out (punctuation)



**ccomp** (clausal complement): a dependent clause that functions like an object of the verb or adjective.

**xcomp** (open clausal complement): a clausal complement without its own subject. Its subject is controlled, that is, it must be the same as the higher subject or object, with no other possible interpretation.

**predicative**: a part of a sentence containing a verb that makes a statement about the subject of the verb, such as *went home* in *John went home*.