========================================== SemanticTextDB - When NLP meets databases.
A database for document-storage/retrieval with automated curation and structure discovery, so that documents may be efficiently organized and queried not only based on human-labeled attributes/metadata, but also using a variety of optional automatically-inferred latent features including: semantics, topics, sentiment, eloquence, and entities of interest.
Inference of these properties is done using various statistical models and NLP algorithms stored and run inside the database.
========================================== What makes SemanticTextDB so cool?
We support augmented PostgreSQL SELECT statements via the semanticSelect() API. This method gives you the power of cutting-edge NLP algorithms with no additional coding. It's as easy as:
semanticSelect(table_name, postgreSQL_SELECT_statment, NLP_feature, feature_param)
For example, we can estimate President Obama's approval rating from a Twitter table as follows:
statement = "SELECT COUNT(*) FROM tweets WHERE content LIKE '%Barack Obama%' AND tweets.country = 'US'"
posCount = semanticSelect('twitter_text', statement, 'positive_only', 0.8)
negCount = semanticSelect('twitter_text', statement, 'negative_only', -0.8)
approval_rating = posCount / negCount  # ratio of positive to negative tweets; assumes negCount != 0
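The ratio above is undefined when negCount is 0. As a minimal sketch of one alternative, the two counts can be turned into a share of positive tweets instead. The helper name `approval_share` is hypothetical and not part of SemanticTextDB; it only assumes that the two semanticSelect() calls return numeric counts, as the division above implies.

```python
def approval_share(pos_count, neg_count):
    """Return the fraction of strongly opinionated tweets that are positive.

    Hypothetical helper for illustration only. Returns None when there are
    no positive or negative tweets at all, instead of dividing by zero.
    """
    total = pos_count + neg_count
    if total == 0:
        return None
    return pos_count / total


if __name__ == "__main__":
    # Stand-in numbers; in practice these would be posCount and negCount.
    print(approval_share(60, 40))
    print(approval_share(0, 0))
```

A share between 0 and 1 is also easier to compare across queries than a raw ratio, which grows without bound as negCount shrinks.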
========================================== Use Cases: The power of SemanticTextDB
SELECT documents by topic. (e.g. lawyers can search for laws pertaining to the topic "transportation safety.")
SELECT documents with a summary view. A short summary of each document lets you view query results in a concise form.
Discover population trends with sentiment analysis. (e.g. determining approval of candidates in upcoming elections)
Educational purposes - spelling correction and grading of student homework documents added to the database.
Future NLP use cases. word_counts, word_frequencies, etc. can be selected for each document.
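To make the word_counts and word_frequencies features concrete, here is a minimal sketch of what such per-document values could look like. This is an assumption for illustration: the README does not say how SemanticTextDB tokenizes text or stores these features, and the function names below simply mirror the feature names.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens in a document (naive regex tokenizer)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def word_frequencies(text):
    """Return each word's share of all tokens in the document."""
    counts = word_counts(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


if __name__ == "__main__":
    doc = "The cat saw the dog."
    print(word_counts(doc).most_common(1))
    print(word_frequencies(doc)["the"])
```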
You will need Python 3.2 and pip installed.
See the next two sections for server and client installation.
Server (where the PostgreSQL database is running) Installation
The PostgreSQL server requires:
Python 3 installed
PL/Python installed. Enable it by running the following in PostgreSQL on the server:
CREATE OR REPLACE LANGUAGE plpython3u;
You can also run this from the client in Python using the psycopg2 library:
cur = conn.cursor()  # conn is the psycopg2.connect() connection to the database
cur.execute("CREATE OR REPLACE LANGUAGE plpython3u;")
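One detail worth noting when running this from Python: psycopg2 opens a transaction implicitly and does not autocommit by default, so the statement only takes effect after a commit. The sketch below wraps the step in a small function. The name `ensure_plpython` is hypothetical and not part of SemanticTextDB, and the language name used is `plpython3u`, which is what PostgreSQL calls untrusted PL/Python for Python 3. The function accepts any DB-API style connection, so it can be tried without a live database.

```python
def ensure_plpython(conn):
    """Register PL/Python (Python 3) in the connected database.

    `conn` is assumed to be a DB-API connection such as the one returned
    by psycopg2.connect(). Hypothetical helper for illustration only.
    """
    cur = conn.cursor()
    try:
        cur.execute("CREATE OR REPLACE LANGUAGE plpython3u;")
        # Without an explicit commit the statement is rolled back
        # when the connection closes (unless autocommit is enabled).
        conn.commit()
    finally:
        cur.close()
```

With a real database this would be called as `ensure_plpython(psycopg2.connect(...))`, using whatever connection parameters your server requires. The database role typically needs sufficient privileges to create an untrusted language.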
Other library dependencies:
numpy
$ [sudo] pip install numpy
scipy
$ [sudo] pip install scipy
psycopg2
$ [sudo] pip install psycopg2
Client Installation - clients use the SemanticTextDB library, which is built on the psycopg2 Python database driver.
Clients using SemanticTextDB require:
psycopg2
$ [sudo] pip install psycopg2
NLTK - download its data from within a Python terminal by calling nltk.download(). A GUI will pop up. Click Download.
textblob (and its dependencies)
$ [sudo] pip install -U textblob
sumy (and its dependencies)
$ [sudo] pip install sumy
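After installing, it can help to confirm that the client dependencies listed above are importable before running the tutorial. The following is a minimal sketch, not part of SemanticTextDB; `missing_modules` and `CLIENT_MODULES` are hypothetical names, and the list simply uses the import names of the packages mentioned in this section.

```python
import importlib

# Import names of the client-side packages listed above.
CLIENT_MODULES = ["psycopg2", "nltk", "textblob", "sumy"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(CLIENT_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All client dependencies are importable.")
```

This only checks that each package imports; it does not verify that the NLTK data has been downloaded.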
Simply clone the repo and refer to SemanticTextDB_Tutorial.py for documentation.
We strongly recommend viewing the tutorial in IPython Notebook: open SemanticTextDB_Tutorial.ipynb rather than SemanticTextDB_Tutorial.py. The notebook makes the tutorial much easier to follow.