Word frequency checker based on Wikipedia corpus written in Rust

Intsights/pywordfreq

About The Project

A library, written in Rust, for checking words against the Wikipedia word frequency corpus. The library is fast, memory efficient, and secure. Full-word lookups use a hash map, and a suffix array is used for quick lookups of sub-patterns across the dictionary.
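
To illustrate the two data structures named above, here is a minimal pure-Python sketch. It is not the library's Rust implementation: the toy corpus, the names `exact_count` and `pattern_count`, and the choice to sum the counts of every distinct word containing the pattern are all assumptions made for illustration only.

```python
import bisect

# Hypothetical toy corpus (word -> count); not the real Wikipedia data.
WORD_COUNTS = {"the": 100, "inter": 5, "internet": 20, "winter": 7, "cat": 3}

SEP = "\n"  # separator that never occurs inside a word
WORDS = list(WORD_COUNTS)
TEXT = SEP.join(WORDS) + SEP

# Start offset of each word inside TEXT, used to map a match back to its word.
STARTS = []
offset = 0
for w in WORDS:
    STARTS.append(offset)
    offset += len(w) + len(SEP)

# Suffix array: start offsets of all suffixes of TEXT in lexicographic order.
# Naive construction, fine for a toy corpus.
SUFFIX_ARRAY = sorted(range(len(TEXT)), key=lambda i: TEXT[i:])


def exact_count(word):
    """Full-word lookup through the hash map (a Python dict here)."""
    return WORD_COUNTS.get(word, 0)


def _bound(pattern, upper):
    """Binary search for the lower or upper edge of suffixes starting with pattern."""
    lo, hi = 0, len(SUFFIX_ARRAY)
    while lo < hi:
        mid = (lo + hi) // 2
        start = SUFFIX_ARRAY[mid]
        prefix = TEXT[start:start + len(pattern)]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def pattern_count(pattern):
    """Sum the counts of every distinct word that contains pattern (assumed semantics)."""
    if not pattern or SEP in pattern:
        return 0
    first, last = _bound(pattern, False), _bound(pattern, True)
    # Each matching suffix offset falls inside exactly one word.
    hits = {bisect.bisect_right(STARTS, SUFFIX_ARRAY[i]) - 1 for i in range(first, last)}
    return sum(WORD_COUNTS[WORDS[i]] for i in hits)


print(exact_count("the"))      # 100
print(pattern_count("inter"))  # 32: inter (5) + internet (20) + winter (7)
```

The hash map answers exact queries in constant time on average, while the sorted suffix array lets a binary search find every occurrence of a pattern anywhere inside a word without scanning the whole dictionary.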

Built With

Installation

pip3 install pywordfreq

Usage

import pywordfreq


# On first use of the library, the engine is loaded with the dictionary.
# Note that the engine carries a significant amount
# of memory overhead.

# This function checks the frequency of the word "the" in the corpus
pywordfreq.full_frequency(
    word="the",
)
# This function checks the frequency of the word "inter" as a pattern
# in other words of the dictionary.
pywordfreq.partial_frequency(
    pattern="inter",
)
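
As a possible follow-up to the calls above, the sketch below ranks candidate words by their looked-up frequency. The helper `rank_by_frequency` is hypothetical and not part of pywordfreq. It assumes the lookup returns a number where larger means more frequent, which this README does not state. The lookup is passed in as a parameter, so the example runs with a stand-in function and no dictionary loaded.

```python
def rank_by_frequency(words, lookup):
    """Return (word, frequency) pairs sorted from most to least frequent.

    `lookup` is any callable accepting a `word=` keyword argument, matching
    the call shape shown for pywordfreq.full_frequency above. The return
    type of that function is an assumption: a comparable number.
    """
    scored = [(w, lookup(word=w)) for w in words]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in lookup with made-up values, used only to demonstrate the helper.
FAKE_COUNTS = {"the": 100, "internet": 20, "winter": 7}


def fake_lookup(word):
    return FAKE_COUNTS.get(word, 0)


ranking = rank_by_frequency(["winter", "the", "internet"], fake_lookup)
print(ranking)  # [('the', 100), ('internet', 20), ('winter', 7)]

# With the library installed, the same helper could be called as:
# rank_by_frequency(["winter", "the"], pywordfreq.full_frequency)
```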

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/intsights/pywordfreq