Eventually, this will be a web application for annotating relationships between entities in text in a semi-automated way. For example, you can use it to extract a social network from a novel in which characters are nodes and edges indicate how often two characters speak to each other throughout the novel.
Currently, only the character extraction is supported. The interface allows the user to upload a text for processing, then displays the characters with alias groups (all the different ways a character is mentioned). The user can then alter the groups by splitting, merging, and deleting alias groups. Each alias group will make up a node in the final social network (the extraction of which has yet to be implemented).
Clone this directory onto a webserver that has PHP 5.5+ and Java JDK 8 or higher installed.
Copy config/settings-EXAMPLE.jsonc to config/settings.jsonc and edit it.
Note that you must choose where to store original texts (must be writable by the
apache user) and what database to use to store metadata. Currently supported
databases:
- sqlite
- PostgreSQL
No property should be defined more than once; for any repeated property, the
last value set is used. You may use // to comment out lines.
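To review the settings file without its commented-out lines, something like the following may help. It is only a sketch: strip_comment_lines is a hypothetical helper, not part of EntiTies, and it drops only lines that begin with //. It says nothing about how the application itself parses the file.

```shell
# Print a JSONC file with whole-line // comments removed.
# Only lines that START with // (after optional whitespace) are dropped,
# so a "//" inside a string value (e.g. a URL) is never touched.
strip_comment_lines() {
    grep -v '^[[:space:]]*//' "$1"
}

# Usage, with the path from the step above:
#   strip_comment_lines config/settings.jsonc
```

Reading the output top to bottom makes repeated properties easier to spot, which matters because only the last value set is used.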
Run the database migrations:
php bin/migrate-database.php up
See Database Migrations for details on migrations and troubleshooting.
Install https://github.com/dbamman/book-nlp with the supporting models in the
entities directory:
git clone https://github.com/dbamman/book-nlp
## As per the BookNLP README, acquire and install the models:
cd book-nlp
curl https://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip -O
unzip stanford-corenlp-full-2017-06-09.zip
mv stanford-corenlp-full-2017-06-09/stanford-corenlp-3.8.0-models.jar lib/
cd ..
Make a copy of, or a symlink to, the book-nlp/files/ directory in the entities
directory. E.g.,
ln -s book-nlp/files .
Download remaining dependencies and compile the Java side of things by running:
bin/make.sh
Start the BookNLPServer by running the run-java-server.sh script:
bin/run-java-server.sh
To start up a development server, go into the web directory and use the
built-in PHP server:
cd web/
php -S localhost:3535 routes.php
Then open a browser and go to: http://localhost:3535. You can change the port number to something other than 3535 if you wish.
To route everything through web/routes.php, add the config/apache.conf
configuration file to your sites (e.g., in /etc/apache2/sites-available),
modify it to use your domain, adjust the DOCUMENT_ROOT path accordingly, then
enable it:
## Note: this is the location on Ubuntu.
sudo cp apache.conf /etc/apache2/sites-available/entities.conf
## Edit the file.
sudo vim /etc/apache2/sites-available/entities.conf
## Enable the site.
sudo a2ensite entities.conf
We strongly recommend that you use SSL to serve requests. An easy and
free way to do this is to use EFF's certbot.
Follow the instructions to install and run certbot. Choose to secure the
(sub)domain specified in entities.conf. We also recommend that you choose to
redirect all non-SSL traffic to SSL; certbot will take care of updating your
entities.conf file.
Apache runs as a special user (e.g., www-data on Debian systems), which means
that files and folders created via PHP will be owned by that user and a group
of the same name. The data directory where texts and annotation
information are stored needs to be writable by both the Apache user and
the user that the Java server is running under. One way to make this work
is as follows (with Ubuntu commands):
- Make a new system user named entities (or whatever you'd like):
sudo useradd entities
## Create a password.
sudo passwd entities
- Add entities to the Apache user group (e.g., www-data on Ubuntu):
sudo usermod -a -G www-data entities
- This user also needs to be able to write to some log files in the book-nlp
directory, so make entities the owner of that directory:
sudo chown -R entities book-nlp
- Run the Java server as the entities user:
su entities
bin/run-java-server.sh
- If you are using sqlite3 for the database (not a good idea in production), make sure to make it group writable once it's created (e.g., after making an initial user account)
## Supposing your sqlite3 database is in data/database.sqlite3
sudo chmod g+w data/database.sqlite3
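Because both the Apache user and the Java server user need write access through their shared group, it can help to confirm that a file or directory is actually group-writable before debugging further. The following is a minimal sketch, not part of EntiTies: is_group_writable is a hypothetical helper name, and it assumes GNU stat (as shipped on Ubuntu).

```shell
# Succeeds (exit 0) if the given path is writable by its group.
# Assumes GNU stat; BSD/macOS stat uses different flags.
is_group_writable() {
    mode=$(stat -c '%A' "$1") || return 2   # e.g. drwxrwxr-x
    case "$mode" in
        ?????w*) return 0 ;;   # 6th character is the group write bit
        *)       return 1 ;;
    esac
}

# Usage, with the data/ path from the sqlite example above:
#   is_group_writable data || sudo chmod g+w data
```

The check only reads permission bits; it does not verify that the group owning the path is the Apache group.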
The migrate-database.php script contains the code to create and modify
database tables. It allows adapting existing tables (e.g., due to new
development) without losing data. It also allows for down migration, that is,
tearing down tables and removing columns. This is helpful if you want to start
over, usually while developing.
To perform a normal up migration, run:
php migrate-database.php up [<num-migrations>]
<num-migrations> is an optional argument giving the number of migrations to
perform. If omitted, all migrations are performed.
To undo migrations, run:
php migrate-database.php down <num-migrations>
<num-migrations> is required for down migration; use -1 for "all". To roll back
to the previous migration, do:
php migrate-database.php down 1
You'll be asked to confirm before a down migration (which will result in the loss of data) is performed.
If you plan to develop EntiTies and you need to add new columns or tables,
create a new function at the bottom of migrate-database.php, then add the
function name as a string to the end of the $migrations array near the top of
the file.
If you encounter an error while running migrate-database.php, here are
a few things to know. The id of the current migration of the database is stored
in .entities-migration. If you've manually removed your database or changed
databases, remove this file before running migrate-database.php.
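The reset described above can be sketched as a small shell function. This is an illustration under assumptions: reset_migration_marker is a hypothetical helper, and the default path assumes .entities-migration sits in the current working directory, which may not match your setup.

```shell
# Remove the stored migration id so the next run starts from scratch.
# The optional first argument overrides the assumed marker location.
reset_migration_marker() {
    marker="${1:-.entities-migration}"
    if [ -f "$marker" ]; then
        rm -- "$marker" && echo "removed $marker"
    else
        echo "nothing to remove at $marker"
    fi
}

# Usage, after manually deleting or switching databases:
#   reset_migration_marker && php migrate-database.php up
```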
Annotations are stored in JSON organized as follows:
{
    ## These are the distinct entities prior to co-reference resolution (so
    ## Peter Pan, Peter, and Pan are each an entity). It is not required that
    ## an entity have any corresponding location.
    entities: {
        id: {
            name: "",
            group_id: ""
        }
    },
    ## Each group is a set of entities that refer to the same canonical entity.
    ## E.g., Peter Pan, Peter, and Pan might all belong to the same group.
    groups: {
        id: {
            name: ""
        }
    },
    ## These are the locations where entities (not groups) are mentioned.
    ## The id is in the format: start_end.
    locations: {
        id: {
            start: 0,
            end: 0,
            entity_id: ""
        }
    },
    ## These describe relationships or edges between two entities (not groups).
    ## Ideally, these are marked in the text (start and end) and involve two
    ## entity locations. However, an explicit location in the text is not
    ## required.
    ties: {
        id: {
            start: 0, // optional
            end: 0, // optional
            source_entity: {
                // ONE of the following two.
                location_id: ""
                entity_id: ""
            },
            target_entity: {
                // ONE of the following two.
                location_id: ""
                entity_id: ""
            },
            label: "",
            weight: 0.0, // optional; default 1.0
            directed: false // optional; default: false
        }
    }
}
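A filled-in instance may make the layout above easier to follow. The sketch below writes a small, made-up annotation file. Everything in it is invented for illustration: the helper name write_sample_annotation, the ids (e1, g1, t1), the names, the offsets, and the label. The one tie uses a location for its source and a bare entity for its target, and leaves out the optional start, end, and weight fields.

```shell
# Write a tiny, hypothetical annotation file to the path given as $1.
write_sample_annotation() {
    cat > "$1" <<'EOF'
{
  "entities": {
    "e1": {"name": "Peter Pan", "group_id": "g1"},
    "e2": {"name": "Peter", "group_id": "g1"},
    "e3": {"name": "Wendy", "group_id": "g2"}
  },
  "groups": {
    "g1": {"name": "Peter Pan"},
    "g2": {"name": "Wendy"}
  },
  "locations": {
    "0_9": {"start": 0, "end": 9, "entity_id": "e1"}
  },
  "ties": {
    "t1": {
      "source_entity": {"location_id": "0_9"},
      "target_entity": {"entity_id": "e3"},
      "label": "speaks to",
      "directed": true
    }
  }
}
EOF
}

# Usage:
#   write_sample_annotation /tmp/sample-annotation.json
```

Here e1 and e2 are separate entities that share group g1, so they would collapse into a single node in the final network, while the location id 0_9 follows the start_end format.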