This repository has been archived by the owner on Jul 7, 2021. It is now read-only.

I no longer have time to maintain this project and am looking for collaborators to help maintain it. To sign up, send a pull request that fixes a bug or adds a feature.

ITU Turkish NLP Pipeline Caller


A Python 3 wrapper that simplifies calling the ITU Turkish NLP Pipeline API.

For details of the pipeline, please check the pipeline page and the sources below.

Gülşen Eryiğit. ITU Turkish NLP Web Service. EACL, 2014.

Gülşen Eryiğit, Joakim Nivre, and Kemal Oflazer. Dependency Parsing of Turkish. Computational Linguistics, 34(3), 2008.

Usage

To use the pipeline, you need an authentication token (see the API web page for details).

If you experience any problems, please contact me via the Gitter chat room.

Setup

This repository has been tested with Python 3.4, 3.5 and 3.6; prefer the most up-to-date version available.

Recommended way

Install from PyPI by running pip3 install ITU-Turkish-NLP-Pipeline-Caller

Alternative way

Download the latest release, extract the archive, and run python3 ./setup.py install inside the extracted directory.

As a Command Line Tool

By default, the tool reads the token from a file named pipeline.token in the same directory as the tool.
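For illustration, reading such a token file could look like the sketch below. The helper name read_token is hypothetical and not part of this package, and the sketch assumes the file holds only the token as plain text; here a relative path is resolved against the current working directory, which may differ from how the tool itself locates the file.

```python
from pathlib import Path


def read_token(token_path="pipeline.token"):
    """Return the API token stored in token_path, stripped of whitespace.

    Hypothetical helper: assumes the file contains nothing but the token.
    """
    path = Path(token_path)
    if not path.is_file():
        raise FileNotFoundError(f"Token file not found: {path}")
    token = path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file is empty: {path}")
    return token
```

Stripping whitespace guards against a trailing newline that many editors add when the token is pasted into the file.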

Running pipeline_caller <filename> reads the input file and writes the output to ./output/output<system_time>

Select the pipeline tool with the -t option: pipeline_caller <filename> --tool <tool_name>. The default is "pipelineNoisy".

Force the I/O encoding with the -e option: pipeline_caller <filename> -e <encoding>. The default is your system locale's encoding.

Switch the processing type with the -p option. Input text can be processed whole at once, sentence by sentence, or word by word. Some tools in the pipeline (isturkish, for example) currently require word-by-word processing. The default is whole at once. Example: pipeline_caller <filename> --tool isturkish -p word sends the input text to the isturkish tool word by word.
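The three processing types can be pictured with the rough sketch below, which splits a text into the chunks that would each be sent as one request. This is not the tool's actual code: the function name split_for_processing and the mode names "whole" and "sentence" are assumptions (only "word" appears in the example above), and the delimiter class is the default_sentence_split_delimiter_class value listed under Defaults.

```python
import re

# Delimiter class copied from the DEFAULTS listed in this README.
SENTENCE_DELIMITERS = r"[\.\?:;!]"


def split_for_processing(text, mode="whole"):
    """Split text into the chunks that would each be sent as one request.

    Hypothetical helper; the mode names "whole" and "sentence" are assumed.
    """
    if mode == "whole":
        chunks = [text]
    elif mode == "sentence":
        # re.split drops the delimiter characters themselves.
        chunks = re.split(SENTENCE_DELIMITERS, text)
    elif mode == "word":
        chunks = text.split()
    else:
        raise ValueError(f"Unknown processing mode: {mode!r}")
    # Drop empty pieces left by trailing delimiters or repeated whitespace.
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Note that in this sketch sentence mode discards the punctuation, while word mode keeps it attached to the neighbouring word; the real tool may treat either case differently.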

Change the output directory with the -o option: pipeline_caller <filename> -o <another_directory>. The default is "output".

Run pipeline_caller --help to show the help menu.

Using As a Module

import pipeline_caller

caller = pipeline_caller.PipelineCaller()

result = caller.call(<tool_name>, <text>, <api_token>)
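A slightly fuller sketch of the same call is shown below. The wrapper name analyze and its caller parameter are hypothetical conveniences, not part of the package; the only package names used are PipelineCaller and call, as shown above. The default tool name is taken from the command line default, and the shape of the returned value is not specified here, so the sketch passes it through unchanged.

```python
def analyze(text, api_token, tool_name="pipelineNoisy", caller=None):
    """Send text to one pipeline tool and return whatever the caller returns.

    Hypothetical wrapper. Pass `caller` to reuse an existing PipelineCaller
    (or a stand-in object with the same call method, e.g. in tests).
    """
    if caller is None:
        # Imported lazily so the function can be defined without the package;
        # this branch requires ITU-Turkish-NLP-Pipeline-Caller to be installed.
        import pipeline_caller

        caller = pipeline_caller.PipelineCaller()
    # Argument order follows the usage shown above: tool, text, token.
    return caller.call(tool_name, text, api_token)
```

With the package installed and a valid token, analyze("Merhaba dünya", token) would send the text to the default tool over the network.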

Defaults (Optional)

See the DEFAULTS block in the source code if you need to change any of the following (you generally won't):

api_url = "http://tools.nlp.itu.edu.tr/SimpleApi"

pipeline_encoding = 'UTF-8'

token_path = "pipeline.token" (for the command line tool)

default_output_dir = "output"

default_enconding = locale.getpreferredencoding(False) (the default encoding of your OS, used for I/O operations in the command line tool)

default_sentence_split_delimiter_class = "[\.\?:;!]" (for the command line tool, to separate sentences when processing sentence by sentence)
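The encoding default above can be illustrated with a short sketch that reads an input file using either a forced encoding or the OS preferred encoding. The function name read_input is hypothetical, and this is an assumption about how such a fallback might be written rather than the tool's own code.

```python
import locale


def read_input(filename, encoding=None):
    """Read filename using `encoding`, or the OS preferred encoding if None.

    Hypothetical helper mirroring the -e option and its documented default.
    """
    if encoding is None:
        # Same call as the documented default for I/O operations.
        encoding = locale.getpreferredencoding(False)
    with open(filename, encoding=encoding) as handle:
        return handle.read()
```

Forcing the encoding matters for Turkish text saved in a legacy encoding such as ISO-8859-9, where characters like ş and ğ would otherwise be misread on a system with a different locale.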

Special Thanks

Special thanks to Asst. Prof. Dr. Peter Schüller for his great suggestions!

Author, Copyright & License

This work was a part of a KnowLP research project.

Copyright 2015-2018 Maintainers:

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.