🐶 Murre 🐕

The amazing Murre (genitive Murren 🐕) will normalize non-standard, spoken Finnish (puhekieli) to standard written Finnish (kirjakieli). This repository is maintained by Mika Hämäläinen.

Installation

This library is designed for Python 3 and it may not work on Python 2.

pip3 install murre
python3 -m murre.download

Usage

To normalize Finnish, all you need to do is to run:

from murre import normalize_sentence

print(normalize_sentence("mä syön paljo karkkii".split(" ")))
>> minä syön paljon karkkia

To use the same chunk-level BRNN model described in the paper cited below, pass wnut19_model=True. Note that this model might only work on Linux.
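Because the WNUT19 model might only work on Linux, one option is to enable it conditionally. This is a minimal sketch, assuming wnut19_model is accepted as a keyword argument by normalize_sentence and normalize_sentences; the helper name wnut19_kwargs is hypothetical and not part of murre.

```python
import platform


def wnut19_kwargs(system=None):
    """Return extra keyword arguments for murre's normalizers.

    Hypothetical helper (not part of murre): it enables wnut19_model=True
    only on Linux, where the chunk-level BRNN model is expected to work,
    and returns an empty dict elsewhere so the default model is used.
    """
    if system is None:
        # platform.system() returns e.g. "Linux", "Darwin" or "Windows"
        system = platform.system()
    return {"wnut19_model": True} if system == "Linux" else {}


if __name__ == "__main__":
    print(wnut19_kwargs())
```

The result could then be unpacked into a call such as normalize_sentence(tokens, **wnut19_kwargs()), again assuming the keyword is accepted there.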

You can normalize multiple sentences at once by running:

from murre import normalize_sentences

sents = ["kissa syö karkkii", "jok laulaa tuol puole", "en tiiä oikee et kuka se o", "kyl on hölömöö"]
sentences = [x.split(" ") for x in sents]  # tokenize each sentence: [["kissa", "syö", "karkkii"], ["jok", "laulaa", ...], ...]

print(normalize_sentences(sentences))
>> ['kissa syö karkkia', 'joka laulaa tuolla puolen', 'en tiedä oikein että kuka se on', 'kyllä on hölmöä']
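Both functions expect tokenized input, so a small wrapper can take care of tokenization and keep each original sentence next to its normalized form. The helper normalize_texts below is hypothetical and not part of murre; it assumes normalize_sentences returns one string per input sentence, as in the example output above. It uses str.split() rather than split(" ") so that repeated or trailing spaces do not produce empty tokens.

```python
def normalize_texts(texts, normalizer=None):
    """Normalize raw sentence strings and pair each with its result.

    Hypothetical helper (not part of murre). `normalizer` should behave
    like murre's normalize_sentences: take a list of token lists and
    return one normalized string per sentence. It defaults to
    murre.normalize_sentences, imported lazily so the helper can also be
    tried with a stand-in function.
    """
    if normalizer is None:
        from murre import normalize_sentences as normalizer
    # str.split() with no argument splits on any run of whitespace, so
    # doubled or trailing spaces do not yield empty tokens.
    tokenized = [text.split() for text in texts]
    normalized = normalizer(tokenized)
    return list(zip(texts, normalized))


if __name__ == "__main__":
    # Stand-in normalizer that only rejoins the tokens; omit the second
    # argument to use murre's normalize_sentences instead.
    def rejoin(token_lists):
        return [" ".join(tokens) for tokens in token_lists]

    for original, fixed in normalize_texts(["kyl on hölömöö", "jok  laulaa tuol puole "], rejoin):
        print(repr(original), "->", fixed)
```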

Cite

Niko Partanen, Mika Hämäläinen, and Khalid Alnajjar. 2019. Dialect Text Normalization to Normative Standard Finnish. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT).