# WordNet Class
## 1. Description
A *WordNet* class is defined in *ChilectoUtility.processor*. This class is used to process wordnet data file in XML format.

__Note:__


## 2. Initialize
The following section is required if the required package(s) is not located in system search path or the current folder.

In [2]:
import os
import sys
package_dir = '../../' # the location of the ChilectoUtility package
sys.path.insert(0, os.path.abspath(package_dir))

In [3]:
# import libs
from ChilectoUtility.processor import WordNet

## 3. Process example
### Set file path

In [4]:
# root path of file
root_dir = '/home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/'

# file of frequency list
vocab_file_name = root_dir + 'zbn_sg.type.vocab'

# data file of wordnet
chinese_wordnet_file = root_dir + 'ChineseWordnet/cwn_16_201002.xml'
# output file for wordnet
chinese_wordnet_output = root_dir + 'ChineseWordnet_output.csv'

# data file of open wordnet
open_wordnet_file = root_dir + 'OpenWordnet/wn-cmn-lmf.xml'
# output file for open wordnet
open_wordnet_output = root_dir + 'OpenWordnet_output.csv'

### Process

In [5]:
#Initial the WordNet object
chinese_wordnet = WordNet(chinese_wordnet_file)
#Read entry from wordnet XML file
chinese_wordnet.read_entry()
#Read entry frequency
chinese_wordnet.get_entry_freq(vocab_file_name)
#Sort entry by frquency
chinese_wordnet.sort_entry_freq()
#Write entry data to CSV file
chinese_wordnet.write_to_file(chinese_wordnet_output)

2019-03-05 21:29:46,879 - cu.processor - INFO - Reading entry file: /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/ChineseWordnet/cwn_16_201002.xml


FileNotFoundError: [Errno 2] No such file or directory: '/home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/ChineseWordnet/cwn_16_201002.xml'

In [15]:
chinese_wordnet.convert_to_dataframe(500)

2019-01-15 15:25:20,874 - cu.processor - INFO - Convert entry list to dataframe, minimum frq is 500. 
100%|██████████| 8349/8349 [00:01<00:00, 5058.72it/s]


## 4. Process both ChineseWordNet and OpenWordNet
In this section, both ChineseWordNet and OpenWordNet data will be processed based on frequency files from **ml/sg/tw** corpus.

### Set file path
The *land_str* is defined corresponding to freq lists from different corpus. It will be used latter in the loop to generated input and output names. 

In [16]:
# root path of file
root_dir = '/home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/'

# strings for three different freq files. 
land_str = ['ml', 'sg', 'tw']

# data file of wordnet
chinese_wordnet_file = root_dir + 'ChineseWordnet/cwn_16_201002.xml'

# data file of open wordnet
open_wordnet_file = root_dir + 'OpenWordnet/wn-cmn-lmf.xml'


### Process

In [18]:
#Initial the WordNet object
chinese_wordnet = WordNet(chinese_wordnet_file)
#Read entry from wordnet XML file (only need to read once)
chinese_wordnet.read_entry()

2019-01-15 15:26:14,192 - cu.processor - INFO - Reading entry file: /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/ChineseWordnet/cwn_16_201002.xml
100%|██████████| 30899/30899 [00:02<00:00, 14656.23it/s]
2019-01-15 15:26:17,233 - cu.processor - INFO - 8349 entries imported.


In [19]:
#Initial the WordNet object
open_wordnet = WordNet(open_wordnet_file)
#Read entry from wordnet XML file
open_wordnet.read_entry()

2019-01-15 15:26:24,908 - cu.processor - INFO - Reading entry file: /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/OpenWordnet/wn-cmn-lmf.xml
100%|██████████| 76967/76967 [00:33<00:00, 2287.38it/s] 
2019-01-15 15:27:00,834 - cu.processor - INFO - 71456 entries imported.


In [21]:
# A loop to run process for ml/sg/tw
for land in land_str:
    #The freq file name, used for both ChineseWordnet and OpenWordnet
    vocab_file_name = root_dir + 'wordnet_concept/giga_' + land + '.target.vocab'
    
    #For ChineseWordnet------------
    #Output file name
    output_file_name = root_dir + 'output/ChineseWordNet_' + land + '.csv'
    #Read entry frequency
    chinese_wordnet.get_entry_freq(vocab_file_name)
    #Sort entry by frquency
    # chinese_wordnet.sort_entry_freq()
    #Write entry data to CSV file
    chinese_wordnet.write_to_file(output_file_name)
    
    #For OpenWordnet------------
    #Output file name
    output_filename = root_dir + 'output/OpenWordNet_' + land + '.csv'
    #Read entry frequency
    open_wordnet.get_entry_freq(vocab_file_name)
    #Sort entry by frquency
    # open_wordnet.sort_entry_freq()
    #Write entry data to CSV file
    open_wordnet.write_to_file(output_file_name)    

2019-01-15 15:27:59,302 - cu.processor - INFO - Reading freq list from /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/wordnet_concept/giga_ml.target.vocab
2019-01-15 15:27:59,590 - cu.processor - INFO - Write entry data to /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/output/ChineseWordNet_ml.csv
2019-01-15 15:27:59,590 - cu.processor - INFO - Reading freq list from /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/wordnet_concept/giga_ml.target.vocab
2019-01-15 15:28:00,157 - cu.processor - INFO - Write entry data to /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/output/ChineseWordNet_ml.csv
2019-01-15 15:28:00,158 - cu.processor - INFO - Reading freq list from /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/wordnet_concept/giga_sg.target.vocab
2019-01-15 15:28:00,250 - cu.processor - INFO - Write entry data to /home/Documents/01_Personal/05_Weiwei/01_Corpus/04_Wordnet/output/ChineseWordNet_sg.csv
2019-01-15 15:28:00,251 - cu.pr