# Analysis of Text (aText) is a Natural Language Processing (NLP) package

The package covers most fundamental aspects of deep linguistic processing:

- parsing (tokenization, stemming, POS tagging)
- named-entity extraction
- co-reference resolution
- supervised classification (NBC, kNBC, SVM, Fisher Kernel)
- unsupervised classification (LSA, pLSA, LDA)
- sentiment and social network analyses
- text summarization at the sentence and clause levels
- subject-verb-object triple extraction
- web-scraping and social media analyses (Facebook, Twitter)

The original package is written in platform-independent Java with a graphical user interface, and is accessible via an API to the jar file. A Python wrapper exposes most of the functions to data scientists and engineers. This file shows example usage of aText: basic sentiment analysis, sentence-level summarization, and a Naive Bayesian Classifier (NBC).

Requirements: Java 1.8 installed; the Python packages py4j and requests (subprocess is also used, but it ships with the Python standard library); and 400MB of free space in the Python package directory.
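The requirements above can be checked before installing. The following is a minimal sketch, not part of aText: the helper name `check_requirements` is hypothetical, and it only tests that a `java` executable is on the PATH and that the Python modules are importable. It does not verify the Java version or the free disk space.

```python
import importlib.util
import shutil


def check_requirements(modules=("py4j", "requests")):
    """Return a dict mapping each requirement name to True or False."""
    # shutil.which returns None when no `java` executable is on the PATH
    status = {"java": shutil.which("java") is not None}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    return status
```

Calling `check_requirements()` and printing the entries that map to `False` gives a quick list of what is still missing.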

In [10]:
# pip install atext
import aText

import os.path
import sys
print(sys.executable)
python_dir, python_exe = os.path.split(sys.executable)
print(python_dir)

C:\Users\sjskd\Anaconda3\python.exe
C:\Users\sjskd\Anaconda3


In [14]:
# the first call to aText takes time because it downloads some necessary files
# make sure you have write permission for the Lib/site-packages directory
aText.start_atext()
aText.test()

aText started


'This is a test!'
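Before connecting from Python, it can help to confirm that the Java side is actually listening. The sketch below is not part of aText: `gateway_is_listening` is a hypothetical helper that uses only the standard library. It assumes the gateway listens on py4j's default port, 25333, which is what a bare `JavaGateway()` call connects to; whether aText can be configured to use a different port is not stated here.

```python
import socket


def gateway_is_listening(host="127.0.0.1", port=25333, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the given address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A `False` result after `aText.start_atext()` suggests the JVM did not start, for example because Java is not on the PATH.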

In [3]:
from py4j.java_gateway import JavaGateway
gateway = JavaGateway()                 # connect to the JVM
misc_app = gateway.entry_point          # get the application instance

In [15]:
# addition routine to test that the JVM is spawned
random = gateway.jvm.java.util.Random()   # create a java.util.Random instance
number1 = random.nextInt(10)              # call the Random.nextInt method
number2 = random.nextInt(10)
print(number1, number2)
value = misc_app.addition(number1, number2) # call the addition method
print(value)

5 8
13


In [5]:
# sentiment method
good_review = "Wonderful for families - a find   We were lucky enough to learn about the Hotel Suisse from friends in our hometown and it was such a treat  We loved staying there  The location could not be better and the family room was huge  Our friends stayed at the Hassler during the smae time up the block and our room was 3 times the size  I loved the security the place had as well with only a few guests at a time and our kids loved Rome  The hotel is a little tricky as it is on the second floor of a mixed use building after several design shops  This hotel was the best value of all of the hotels we stayed in on our 3 week trip  I would not go back to Rome and not stay at the Suisse  "
topics = misc_app.sentiment_topics()
print(*topics)
prob = misc_app.sentiment(good_review) # call the sentiment method
print(*prob)
bad_review = "awful  had to move out   This was our third trip to Rome in the last year and we thought we would stay a bit nearer the centre of the city  The Forum is not a four star hotel by any stretch of the imagination  We were given a room on the fourth floor  which we thought would be good because of the view over the forum site  Unfortunately the air conditioning did not work  and this necessitated the windows being left open at night  The street noise and bar opposite made sleeping difficult  The room was cramped  the breakfast very sub-standard  The second night we were awoken at midnight by staff scraping tables across the restaurant floor until the early hours of the morning  despite several complaints to the hotel desk  We arranged to move hotel and the final straw was the presence of a large number of homeless people outside the hotel who used the area as a toilet and dosshouse  we saw several rats running around  Do yourselves a favour  look elsewhere"
prob = misc_app.sentiment(bad_review) # call the sentiment method
print(*prob)

Positive Negative
0.9 0.09999999999999998
0.2222222222222222 0.7777777777777778
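The printed scores are easier to use once each one is paired with its label. The following is a minimal sketch, assuming that `sentiment_topics()` and `sentiment()` return sequences in the same order, as the output above suggests. The helper `label_scores` is hypothetical and not part of aText; the numbers are copied from the output above so the block runs without the JVM.

```python
def label_scores(topics, probabilities):
    """Pair each label with its score and return (best_label, scores)."""
    if len(topics) != len(probabilities):
        raise ValueError("topics and probabilities must have the same length")
    scores = dict(zip(topics, probabilities))
    # pick the label with the highest score
    best_label = max(scores, key=scores.get)
    return best_label, scores


# values copied from the printed output above
topics = ["Positive", "Negative"]
good = [0.9, 0.09999999999999998]
bad = [0.2222222222222222, 0.7777777777777778]

print(label_scores(topics, good)[0])  # Positive
print(label_scores(topics, bad)[0])   # Negative
```

Under the same ordering assumption, the helper would also work for the class names and probabilities returned by the NBC calls later in this file.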


In [7]:
# summarization method
input_text = "On two separate occasions, when making a deposit on two separate HSBC ATMs, \
the first malfunction took place at their Ocean Side branch, in Long Island, NY. \
This machine took my money, then shut down. I went inside the bank to complain and \
when I went back to this machine, the money door on this ATM was wide open with my \
money fully exposed and if someone had used this ATM behind me, my money would have \
stolen. The 2nd malfunction took place recently on 08/23/14. I made a sizable deposit at 10 PM. This ATM located in Lynbrook Long Island, NY took my deposit, shut down and showed a error message and didn't deposit my money and gave me a blank receipt. I contacted HSBC Security. They claim they made a report of this (I'm not sure if they did or not). Anyway, they also said it would take up 10 business days to expect any help in this matter. I'm still waiting for help. I'm done with this lousy bank. THIS BANK IS A DOG!"
s = misc_app.summary(input_text, 25, 3) # call the summary method
print(s)

TOPICS: I, my money, someone

SUMMARY: This machine took my money, then shut down.  I went inside the bank to complain and when I went back to this machine, the money door on this ATM was wide open with my money fully exposed and if someone had used this ATM behind me, my money would have stolen.  
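The summary comes back as a single string with a `TOPICS:` part and a `SUMMARY:` part. A minimal sketch for splitting it into a topic list and the summary text follows. It assumes the layout is always the one printed above, and `parse_summary` is a hypothetical helper, not an aText function.

```python
def parse_summary(text):
    """Split a 'TOPICS: ... SUMMARY: ...' string into (topics, summary)."""
    topics_part, sep, summary_part = text.partition("SUMMARY:")
    topics_part = topics_part.strip()
    if not sep or not topics_part.startswith("TOPICS:"):
        raise ValueError("unexpected summary format")
    topics_text = topics_part[len("TOPICS:"):]
    # topics are assumed to be comma-separated, as in the output above
    topics = [t.strip() for t in topics_text.split(",") if t.strip()]
    # collapse runs of whitespace in the summary text
    return topics, " ".join(summary_part.split())
```

Applied to the output above, this yields the three topics as a list and the summary with its doubled spaces collapsed. A topic that itself contains a comma would be split incorrectly.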


In [8]:
# NBC classifier
excelFile = 'C:/MY FILES/Java/aText/python-package/package/aText/AlcoholPython.xlsx'
# call NBC with the Excel filename, sheet name, and MI (mutual information) threshold
# the MI threshold ranges from 0 to 1: a higher value keeps fewer nodes, and 0.0 keeps all of them
misc_app.NBC(excelFile, "Training", 0.0)
classes = misc_app.nbc_classes()
print(*classes)
size = misc_app.nbc_size()
print(size)
input_text = "Italy is the largest producer wine"
prob = misc_app.nbc_infer(input_text)
print(*prob)
input_text = "Whiskey has higher alcohol content than wine"
prob = misc_app.nbc_infer(input_text)
print(*prob)

Beer Whiskey Wine
56
0.10373405244544728 3.7344258880361027e-06 0.8962622131286646
0.9652509652509654 0.017374517374517378 0.017374517374517378
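To illustrate how a mutual information (MI) threshold can reduce the number of nodes, here is a minimal sketch of MI-based term selection. It is not aText's implementation: the functions and the toy training data are hypothetical, and it assumes MI is measured in bits between a term's presence and the class label. That quantity lies between 0 and 1, which is consistent with the threshold range mentioned above, but how aText computes it is not stated here.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI in bits between presence of `term` in a document and its label."""
    n = len(docs)
    joint = Counter(
        (term in doc.lower().split(), label) for doc, label in zip(docs, labels)
    )
    presence_counts, label_counts = Counter(), Counter()
    for (present, label), count in joint.items():
        presence_counts[present] += count
        label_counts[label] += count
    mi = 0.0
    for (present, label), count in joint.items():
        # p(x, c) * log2( p(x, c) / (p(x) * p(c)) ), written with raw counts
        ratio = count * n / (presence_counts[present] * label_counts[label])
        mi += (count / n) * math.log2(ratio)
    return max(mi, 0.0)  # guard against tiny negative rounding error


def select_terms(docs, labels, threshold):
    """Keep the terms whose MI with the class label is at least `threshold`."""
    vocab = {word for doc in docs for word in doc.lower().split()}
    return sorted(t for t in vocab if mutual_information(docs, labels, t) >= threshold)


# hypothetical toy training set, not the data in AlcoholPython.xlsx
docs = ["red wine", "white wine", "cold beer", "dark beer"]
labels = ["Wine", "Wine", "Beer", "Beer"]

print(len(select_terms(docs, labels, 0.0)))  # 6: every term is kept
print(select_terms(docs, labels, 0.5))       # ['beer', 'wine']
```

In this toy data, "wine" and "beer" each predict the class perfectly (MI of 1 bit), while terms such as "red" appear in only one document and fall below a threshold of 0.5.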


In [12]:
# kill the spawned JVM process
aText.stop_atext()