OpCluster-PT is a customized version of OpCluster for the Portuguese language.
OpCluster is an algorithm for extracting and hierarchically clustering implicit and explicit fine-grained opinions (also called aspects) from web consumer reviews. The method organizes similar implicit and explicit aspects, considering their context of use, inside a tree. For example, in the review “she considers the price of the camera very expensive”, the consumer used the term “price” to evaluate an aspect (property) of the camera. However, consumers may also use the terms “cost”, “value”, “investment”, “cost-benefit”, etc. to refer to that same aspect. In addition, consumers may refer to the same aspect either explicitly or implicitly, e.g., “she got calls at the São Francisco river” and “working anywhere” were used in smartphone reviews to implicitly evaluate the aspect “signal”. It is also worth noting that, in a wide range of domains, proper names may be used to refer to aspects. For instance, the proper names “Sony” and “Nikon” may be used to evaluate the “product brand” aspect of digital cameras. This variety of surface forms is what makes the task hard.
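To make the idea of grouping mentions inside a tree more concrete, here is a small illustrative sketch. It is not the OpCluster implementation: the class `AspectNode`, its `find` method, and the two toy trees are hypothetical, and only the terms themselves come from the examples above.

```python
class AspectNode:
    """Hypothetical tree node: one aspect plus the mentions that refer to it."""

    def __init__(self, name, explicit=(), implicit=()):
        self.name = name
        self.explicit = set(explicit)   # explicit terms, e.g. "price", "cost"
        self.implicit = set(implicit)   # implicit expressions, e.g. "working anywhere"
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def find(self, mention):
        """Depth-first search: name of the first aspect listing `mention`, else None."""
        if mention in self.explicit or mention in self.implicit:
            return self.name
        for child in self.children:
            found = child.find(mention)
            if found is not None:
                return found
        return None


# Toy tree for the camera domain, using the terms from the text above.
camera = AspectNode("camera")
camera.add_child(AspectNode(
    "price",
    explicit=["price", "cost", "value", "investment", "cost-benefit"]))
camera.add_child(AspectNode("product brand", explicit=["Sony", "Nikon"]))

# Toy tree for the smartphone domain, with implicit mentions of "signal".
smartphone = AspectNode("smartphone")
smartphone.add_child(AspectNode(
    "signal",
    explicit=["signal"],
    implicit=["she got calls at the São Francisco river", "working anywhere"]))
```

With these toy trees, `camera.find("cost")` resolves to the "price" aspect and `smartphone.find("working anywhere")` resolves to "signal". The real algorithm builds such groupings automatically from the reviews rather than from hand-written lists.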
- Download (or clone) the repository folder;
- Open the file "OpClusterPT.py" (this requires any IDE and Python 2 or 3 installed);
- Check if all the input files are in the same folder as the "OpClusterPT.py" file;
- Extract the archives "OntoPT.tar.xz" and "corp_xml_reli.zip";
- Run the algorithm.
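The extraction and file-check steps above can also be scripted. The following is a minimal sketch, assuming Python 3 (reading ".tar.xz" with `tarfile` needs the `lzma` module) and assuming the archives sit in the same folder as "OpClusterPT.py". The helper names `extract_archives` and `missing_inputs` are hypothetical and are not part of the repository.

```python
import os
import tarfile
import zipfile

# Archive names taken from the steps above.
ARCHIVES = ["OntoPT.tar.xz", "corp_xml_reli.zip"]


def extract_archives(folder="."):
    """Extract each known archive found in `folder`; return the names extracted."""
    extracted = []
    for name in ARCHIVES:
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # archive not present, nothing to extract
        if name.endswith(".zip"):
            with zipfile.ZipFile(path) as zf:
                zf.extractall(folder)
        else:
            with tarfile.open(path, "r:xz") as tf:
                tf.extractall(folder)
        extracted.append(name)
    return extracted


def missing_inputs(folder=".", required=("OpClusterPT.py",) + tuple(ARCHIVES)):
    """Return the required file names that are not present in `folder`."""
    return [n for n in required if not os.path.exists(os.path.join(folder, n))]
```

Calling `missing_inputs(".")` before running the script gives a quick check that nothing is missing, and `extract_archives(".")` unpacks both archives in place.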
We also provide the set of aspect taxonomies and annotated reviews that were used in this master's research. However, to apply the algorithm to other data, you need to download the CORP system, desktop version (available here: https://www.inf.pucrs.br/linatural/wordpress/recursos-e-ferramentas/), and run it on the new set of reviews. It will generate a set of XML files with the labeled reviews, which are used as input to OpCluster-PT. The OpCluster-PT 2.0 web version will be available soon. Finally, additional information can be found in my full Master's thesis, available here: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-31072018-170236/en.php.
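Before feeding new CORP output to OpCluster-PT, it can help to confirm that the generated XML files are well formed. The sketch below only parses each file and keeps its root element. It assumes nothing about the CORP schema, which is not described here, and the function name `load_corp_xml` and the folder argument are hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET


def load_corp_xml(folder):
    """Parse every .xml file in `folder`; return {file name: root element}."""
    parsed = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        try:
            parsed[os.path.basename(path)] = ET.parse(path).getroot()
        except ET.ParseError as err:
            # Report malformed files instead of stopping at the first one.
            print("Skipping malformed file %s: %s" % (path, err))
    return parsed
```

For example, `load_corp_xml("corp_xml_reli")` would return one entry per readable XML file, assuming the archive extracts to a folder with that name.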
The OpCluster-PT web version is available here: http://www.nilc.icmc.usp.br/opcluster/
Vargas, F.A. and Pardo, T.A.S. (2018). Aspect clustering methods for sentiment analysis. Proceedings of the 13th International Conference on the Computational Processing of Portuguese (PROPOR). pp. 365-374. Canela-RS/Brazil.
@inproceedings{DBLP:conf/propor/VargasAndPardo18,
  author    = {Francielle A. Vargas and Thiago A. S. Pardo},
  title     = {Aspect Clustering Methods for Sentiment Analysis},
  booktitle = {Proceedings of the 13th International Conference on the Computational Processing of Portuguese, {PROPOR}},
  pages     = {365–374},
  year      = {2018},
  address   = {Canela, Brazil},
  url       = {https://link.springer.com/chapter/10.1007/978-3-319-99722-3_37}
}