Step 1: Install Docker.
Step 2: Install Python.
Step 3: Install RabbitMQ.
Step 4: Choosing a branch
The master branch is the lightweight version of biryani, which runs only the specified annotators (annotators: tokenize, ssplit, pos, parse).
If you want to run biryani with all annotators, refer to the branch All-Annotators.
If you want to run biryani with all annotators and also dynamically decide how many documents are processed at a time using a Kalman filter, refer to the branch kalman_filter_all_anno.
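How the kalman_filter_all_anno branch sizes its batches is not described in this guide. The following is only a minimal sketch of the general idea, assuming a scalar Kalman filter that smooths the observed throughput (documents per second) and derives the next batch size from it. The class `ScalarKalman`, the function `next_batch_size`, and every parameter value are hypothetical and are not taken from the biryani code.

```python
class ScalarKalman:
    """One-dimensional Kalman filter for a slowly drifting quantity."""

    def __init__(self, estimate, variance, process_var, measurement_var):
        self.estimate = estimate                # current best guess
        self.variance = variance                # uncertainty of that guess
        self.process_var = process_var          # how much the true value may drift per step
        self.measurement_var = measurement_var  # noise of each measurement

    def update(self, measurement):
        # Predict: the value is assumed constant, but uncertainty grows.
        self.variance += self.process_var
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.variance / (self.variance + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= (1.0 - gain)
        return self.estimate


def next_batch_size(kf, docs_per_second, target_seconds, lo=1, hi=1000):
    """Pick a batch size that should take about target_seconds to process."""
    rate = kf.update(docs_per_second)
    return max(lo, min(hi, int(round(rate * target_seconds))))


if __name__ == "__main__":
    kf = ScalarKalman(estimate=0.0, variance=1.0, process_var=1e-3, measurement_var=0.5)
    for observed in (9.0, 11.0, 10.5, 9.5, 10.0):
        print(next_batch_size(kf, observed, target_seconds=20))
```

The filter reacts quickly at first, while its uncertainty is high, and then settles, so a single slow batch does not swing the batch size sharply.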
Step 5: Sending documents to the RabbitMQ queue
Download a local copy of the biryani repo. Open the file that sends documents to the queue and make the necessary changes according to how your RabbitMQ server is set up.
Once the changes are complete, run the file using the following command:
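To illustrate what sending documents to a RabbitMQ queue involves, here is a minimal sketch using the third-party pika client. It is not the script shipped with biryani: the queue name `corenlp_queue`, the JSON message layout, and the function names are assumptions made for the example, and the host, port and queue should be replaced with the values of your own RabbitMQ setup.

```python
import json


def build_message(doc_id, text):
    """Serialize one document as UTF-8 JSON bytes (hypothetical message layout)."""
    return json.dumps({"doc_id": doc_id, "text": text}).encode("utf-8")


def publish_documents(documents, host="localhost", port=5672, queue="corenlp_queue"):
    """Publish (doc_id, text) pairs to a RabbitMQ queue."""
    import pika  # third-party client, imported here so the helper above works without it

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host, port=port))
    try:
        channel = connection.channel()
        # Declaring is idempotent only if the queue does not already exist
        # with different settings.
        channel.queue_declare(queue=queue, durable=True)
        for doc_id, text in documents:
            channel.basic_publish(
                exchange="",            # default exchange routes by queue name
                routing_key=queue,
                body=build_message(doc_id, text),
                properties=pika.BasicProperties(delivery_mode=2),  # persistent message
            )
    finally:
        connection.close()


if __name__ == "__main__":
    publish_documents([("doc-1", "The quick brown fox jumps over the lazy dog.")])
```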
Step 6: Making changes to the corenlp.json and log4j.xml files
In the biryani/corenlp/ folder you can find corenlp.json and log4j.xml.
The corenlp.json file contains the RabbitMQ server configuration and the name of the queue in which the documents are present.
Make sure you make the necessary changes to corenlp.json according to how you set up your RabbitMQ server and queue name.
The log4j.xml file contains the logging configuration details. Change the IP address and port to the ones you want to use for logging:
<Socket name="socket" host="logstash server host" port="5000">
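Since the same host and port have to be used again in later steps, it can help to read them back from the logging configuration rather than retyping them. The sketch below assumes the file is well-formed XML with `Socket` elements like the one shown above. The function name `socket_appenders` and the sample configuration are hypothetical.

```python
import xml.etree.ElementTree as ET


def socket_appenders(xml_text):
    """Return (name, host, port) for every Socket element in a log4j XML config."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("name"), el.get("host"), int(el.get("port")))
        for el in root.iter("Socket")
    ]


# Hypothetical sample; the real log4j.xml in biryani/corenlp/ may differ.
SAMPLE = """
<Configuration>
  <Appenders>
    <Socket name="socket" host="logstash.example" port="5000"></Socket>
  </Appenders>
</Configuration>
"""

if __name__ == "__main__":
    for name, host, port in socket_appenders(SAMPLE):
        print(name, host, port)
```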
Step 7:
Download the docker-elk repo from
Go to the directory docker-elk/logstash/config, where you will find logstash.conf.
Add the following line below the tcp entry in logstash.conf:
port => the port number you added in the log4j.xml file
Note: Make sure that you also add the port number to the docker-compose.yml file in the root directory.
That file has a ports section; add the port you used in logstash.conf there.
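Because the same port has to appear in log4j.xml, logstash.conf and docker-compose.yml, a quick consistency check can catch a mismatch before the containers are started. The following sketch only covers the logstash.conf side and assumes the port is set inside a `tcp { ... }` block with no nested braces. The function names and the sample text are hypothetical.

```python
import re

# Matches "tcp { ... port => 5000" as long as no nested braces come before the port.
_TCP_PORT = re.compile(r"\btcp\s*\{[^{}]*?\bport\s*=>\s*(\d+)")


def tcp_input_ports(conf_text):
    """Return every port configured in a tcp block of a Logstash config."""
    return [int(port) for port in _TCP_PORT.findall(conf_text)]


def port_is_configured(conf_text, port):
    """True if the given port (e.g. the one from log4j.xml) has a tcp input."""
    return port in tcp_input_ports(conf_text)


if __name__ == "__main__":
    sample = "input {\n  tcp {\n    port => 5000\n  }\n}\n"
    print(tcp_input_ports(sample))
    print(port_is_configured(sample, 5000))
```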
Step 8:
Go to the corenlp folder and run the following command:
docker build -t image-name .
Note: The period after image-name tells Docker that the Dockerfile is in the current directory.
Example: docker build -t phani/ccnlp:1.0 .
Step 9: Running the image
docker run image-name java -cp ".:lib/*" corenlp_worker <threads> <batch-size> <log-token> <database-name>
Note: Be careful with the image name you give. If the image is not present locally, Docker searches Docker Hub for it, and if an image with that name exists there, Docker downloads it and runs it for you.
Example: docker run phani/ccnlp:1.0 java -cp ".:lib/*" corenlp_worker 16 200 logging test_database
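If the worker is started from a script rather than typed by hand, passing the arguments as a list avoids shell quoting problems around the classpath. This is a minimal sketch that mirrors the command above. The function names are hypothetical, and the image name with a forward slash is an assumption about the intended repository name.

```python
import subprocess


def build_run_command(image, threads, batch_size, log_token, database):
    """Build the argument list for running the CoreNLP worker container."""
    return [
        "docker", "run", image,
        "java", "-cp", ".:lib/*",   # no shell involved, so the * reaches java unchanged
        "corenlp_worker",
        str(threads), str(batch_size), log_token, database,
    ]


def run_worker(image, threads, batch_size, log_token, database):
    """Run the container and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_run_command(image, threads, batch_size, log_token, database),
        check=True,
    )


if __name__ == "__main__":
    print(" ".join(build_run_command("phani/ccnlp:1.0", 16, 200, "logging", "test_database")))
```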
Step 10: Install Petrarch2
Install Petrarch2 using the following command:
pip install git+
Step 11: Extracting phrases from the CoreNLP parse trees and storing them in MongoDB
Once the container has parsed all the documents, copy the database file to biryani/utilities/
In that directory you can find the script that extracts the phrases. Run it with the following command:
python corenlp_databasefile.db # documents to be processed per batch #threads
python test_database.db 5000 16
The extracted phrases are stored in test_database_petrarch.db
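The two numeric arguments (documents per batch and number of threads) suggest that the documents are processed batch by batch with a pool of workers. How the biryani utility does this internally is not shown here; the sketch below only illustrates that general pattern with a thread pool, and `batches`, `process_in_batches` and the `worker` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(items, batch_size, threads, worker):
    """Apply `worker` to every item, one batch at a time, using a thread pool."""
    results = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in batches(items, batch_size):
            # pool.map keeps the input order within each batch.
            results.extend(pool.map(worker, batch))
    return results


if __name__ == "__main__":
    parsed = ["(ROOT (S (NP ...)))", "(ROOT (S (VP ...)))"]
    # Stand-in worker: a real one would extract phrases from each parse tree.
    print(process_in_batches(parsed, batch_size=5000, threads=16, worker=len))
```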