This project is a simple tokenizer and parser for the Java programming language.
- Breaks text into tokens using regular expressions
- Retrieves Java identifiers
- Retrieves Java comment tags
- Removes comments
- Retrieves class names
- Retrieves keyword frequencies
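To illustrate the features listed above, here is a minimal sketch of regex-based tokenizing, comment removal, class-name lookup, and keyword counting. It is not the project's actual code: every name in it (`remove_comments`, `tokenize`, `class_names`, `keyword_frequency`, `JAVA_KEYWORDS`, `SAMPLE`) is hypothetical, the keyword set is a small subset of Java's, and the patterns cover only simple cases.

```python
import re
from collections import Counter

# Illustrative subset of Java keywords (assumption: not the full list).
JAVA_KEYWORDS = {
    "abstract", "boolean", "class", "else", "extends", "final", "for", "if",
    "import", "int", "new", "private", "public", "return", "static", "void",
    "while",
}

# String/char literals are matched alongside comments so that comment
# markers inside a literal (e.g. "http://...") are left untouched.
COMMENT_OR_LITERAL = re.compile(
    r'"(?:\\.|[^"\\])*"'       # string literal
    r"|'(?:\\.|[^'\\])*'"      # char literal
    r"|//[^\n]*"               # line comment
    r"|/\*.*?\*/",             # block comment
    re.DOTALL,
)

TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'            # string literal
    r"|'(?:\\.|[^'\\])*'"           # char literal
    r"|[A-Za-z_$][A-Za-z0-9_$]*"    # identifier or keyword
    r"|\d+(?:\.\d+)?"               # integer or decimal number
    r"|\S"                          # any other single non-space character
)

IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def remove_comments(source):
    """Replace each comment with a space; keep string and char literals."""
    def replace(match):
        text = match.group(0)
        return " " if text.startswith("/") else text
    return COMMENT_OR_LITERAL.sub(replace, source)


def tokenize(source):
    """Strip comments, then split the remaining text into tokens."""
    return TOKEN.findall(remove_comments(source))


def identifiers(tokens):
    """Return identifier tokens that are not in the keyword subset."""
    return [t for t in tokens
            if IDENTIFIER.fullmatch(t) and t not in JAVA_KEYWORDS]


def class_names(tokens):
    """Return the token that follows each 'class' keyword."""
    return [tokens[i + 1] for i, t in enumerate(tokens[:-1]) if t == "class"]


def keyword_frequency(tokens):
    """Count how often each keyword from the subset appears."""
    return Counter(t for t in tokens if t in JAVA_KEYWORDS)


SAMPLE = '''
/* Demo class */
public class Hello {
    // entry point
    public static void main(String[] args) {
        String url = "http://example.com"; // trailing comment
        int count = 2;
    }
}
'''

tokens = tokenize(SAMPLE)
print(class_names(tokens))
print(keyword_frequency(tokens))
print(identifiers(tokens))
```

Matching literals in the same pattern as comments is a deliberate choice in this sketch: a pattern that looks only for `//` would cut the URL string in the sample in half.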
To get a local copy up and running, follow these steps.
- Python 3.10.0
- First, clone the repo:
  git clone https://github.com/leFos-95/Python-Parser-and-Tokenizer.git
- Open cmd and install NLTK:
  pip install nltk
- In cmd, type python to start the interpreter, then run:
  import nltk
  nltk.download('wordnet')
- Run the application
Distributed under the MIT License. See LICENSE for more information.
Lefteris Soulis - lefteris95.soulis@gmail.com
Project Link: https://github.com/leFos-95/Python-Parser-and-Tokenizer