## Normalizer - Use Cases
---

### Use Case 1: Convert text to lowercase
---
This use case demonstrates how the lower_case function converts Turkish text to lowercase. The function handles the Turkish characters that Python's built-in str.lower() method does not handle properly: for example, the uppercase 'İ' is converted to 'i' and 'I' is converted to 'ı'. The result is the string in lowercase form.



In [1]:
from mintlemon import Normalizer

text = "Merhaba Dünya! İyi Günler İstanbul'a"

# convert the text to lowercase
lowercase_text = Normalizer.lower_case(text)
print(lowercase_text.split())

['merhaba', 'dünya!', 'iyi', 'günler', "istanbul'a"]
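
To see why a Turkish-aware function is needed, the sketch below compares Python's built-in str.lower() with a hand-written replacement step. It is a minimal illustration of the idea, not mintlemon's implementation; turkish_lower is a hypothetical helper name introduced here.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish casing rules (illustrative sketch)."""
    # Map the two problematic capitals first, then let str.lower() do the rest.
    return text.replace("İ", "i").replace("I", "ı").lower()


# Built-in behavior: 'İ' becomes 'i' plus a combining dot (two code points),
# and 'I' becomes a dotted 'i' rather than the Turkish dotless 'ı'.
print(len("İ".lower()))          # 2
print("ISPARTA".lower())         # isparta
print(turkish_lower("ISPARTA"))  # ısparta
print(turkish_lower("İyi Günler İstanbul'a"))  # iyi günler istanbul'a
```

Replacing the two capitals before calling lower() keeps the sketch to a single line while leaving every other character to Python's standard case mapping.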


### Use Case 2: Removing Punctuations
----
This use case demonstrates how the remove_punctuations function removes all punctuation characters from a given text. The function uses Python's built-in string.punctuation to define the set of punctuation characters and the re.sub() method to replace them with an empty string. The result is the text with all punctuation removed.

In [2]:
from mintlemon import Normalizer

# a string containing punctuation marks
text = "#Merhaba, Dünya! Bu bir örnek metin."

# remove the punctuations from the text
text_without_punctuations = Normalizer.remove_punctuations(text)
print(text_without_punctuations.split())

['Merhaba', 'Dünya', 'Bu', 'bir', 'örnek', 'metin']
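
The approach described above (string.punctuation plus re.sub()) can be sketched with the standard library alone. This is an assumption-based illustration rather than the library's source; strip_punctuation and PUNCTUATION_PATTERN are hypothetical names.

```python
import re
import string

# Character class built from string.punctuation; re.escape keeps characters
# such as ']' and '\' from being read as regex syntax.
PUNCTUATION_PATTERN = re.compile(f"[{re.escape(string.punctuation)}]")


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters from text (illustrative sketch)."""
    return PUNCTUATION_PATTERN.sub("", text)


print(strip_punctuation("#Merhaba, Dünya! Bu bir örnek metin."))
# Merhaba Dünya Bu bir örnek metin
```

Note that string.punctuation contains only ASCII punctuation, so characters such as curly quotes are left untouched by this sketch.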


### Use Case 3: Removing Accent Marks
---
This use case demonstrates how the remove_accent_marks function removes accent marks from a given string. The function uses a dictionary that maps each accented letter to its unaccented counterpart and applies the replace() method for each entry. The result is the string with its accent marks removed.

In [3]:
from mintlemon import Normalizer

# a string containing accent marks
text = "merhâbâ îyi günler"

# remove the accent marks from the text
text_without_accents = Normalizer.remove_accent_marks(text)
print(text_without_accents.split())

['merhaba', 'iyi', 'günler']
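
A dictionary-plus-replace() approach like the one described above might look as follows. The mapping is an assumption made for this sketch; the dictionary that mintlemon actually uses may contain different or additional entries, and strip_accent_marks is a hypothetical name.

```python
import unicodedata

# Hypothetical mapping for this sketch; the library's own dictionary may differ.
ACCENT_MAP = {"â": "a", "î": "i", "û": "u", "Â": "A", "Û": "U"}


def strip_accent_marks(text: str) -> str:
    """Replace circumflexed vowels with their plain forms (illustrative sketch)."""
    # Compose characters first so 'a' + combining circumflex matches 'â'.
    text = unicodedata.normalize("NFC", text)
    for accented, plain in ACCENT_MAP.items():
        text = text.replace(accented, plain)
    return text


print(strip_accent_marks("merhâbâ îyi günler"))  # merhaba iyi günler
```

An explicit dictionary leaves letters such as 'ü' and 'ö' alone, which a generic "strip all diacritics" routine would not.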


### Use Case 4: Convert numbers in a text to words
---
This use case demonstrates how the convert_text_numbers function converts the numbers in a given text to Turkish words. The function uses regular expressions to find and extract the numbers in the text, then uses the number_to_word function to convert each number to words. Two situations trigger a warning: the number is too large, or a decimal number is written with a period (in Turkish, the decimal separator is a comma). The function returns the final text, with the numbers converted to words. As the output shows, the function follows the Turkish number format, which uses a comma instead of a period for decimal numbers; the comma after "yılında" appears in the output as the word 'virgül' ("comma").




In [4]:
from mintlemon import Normalizer

# convert numbers in the text to words
text_with_number_to_words = Normalizer.convert_text_numbers("2021 yılında, İstanbul toplam nüfusu 15840900")
print(text_with_number_to_words.split())

['iki', 'bin', 'yirmi', 'bir', 'yılında', 'virgül', 'İstanbul', 'toplam', 'nüfusu', 'on', 'beş', 'milyon', 'sekiz', 'yüz', 'kırk', 'bin', 'dokuz', 'yüz']
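
The two steps described above (find the digit runs with a regular expression, then spell each one out) can be sketched as below. This is a simplified illustration under stated assumptions, not mintlemon's code: number_to_words and convert_digits_in_text are hypothetical names, only non-negative integers up to the trillions are handled, and commas and decimals are ignored.

```python
import re

ONES = ["", "bir", "iki", "üç", "dört", "beş", "altı", "yedi", "sekiz", "dokuz"]
TENS = ["", "on", "yirmi", "otuz", "kırk", "elli", "altmış", "yetmiş", "seksen", "doksan"]
SCALES = ["", "bin", "milyon", "milyar", "trilyon"]


def _three_digits(n: int) -> list:
    """Return the Turkish words for a number between 1 and 999."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        if hundreds > 1:  # 100 is "yüz", not "bir yüz"
            words.append(ONES[hundreds])
        words.append("yüz")
    tens, ones = divmod(rest, 10)
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return words


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer in Turkish (illustrative sketch)."""
    if n == 0:
        return "sıfır"
    groups = []  # three-digit groups, least significant first
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)
    if len(groups) > len(SCALES):
        raise ValueError("number too large for this sketch")
    words = []
    for index in range(len(groups) - 1, -1, -1):
        group = groups[index]
        if group == 0:
            continue
        if not (index == 1 and group == 1):  # 1000 is "bin", not "bir bin"
            words.extend(_three_digits(group))
        if SCALES[index]:
            words.append(SCALES[index])
    return " ".join(words)


def convert_digits_in_text(text: str) -> str:
    """Replace every run of ASCII digits in text with its Turkish spelling."""
    return re.sub(r"[0-9]+", lambda m: number_to_words(int(m.group())), text)


print(convert_digits_in_text("2021 yılında İstanbul toplam nüfusu 15840900"))
# iki bin yirmi bir yılında İstanbul toplam nüfusu on beş milyon sekiz yüz kırk bin dokuz yüz
```

Splitting the number into three-digit groups keeps the special cases ("yüz" and "bin" without a leading "bir") in one place each.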


#### Use Case 4: Convert numbers in a text to words -> If a decimal number is written with a period, a warning is issued (in Turkish, the decimal separator is a comma).
---

In [5]:
from mintlemon import Normalizer

text = "2021 İstanbul toplam nüfusu: 15.840.900"

# a warning is issued here because the number is written with periods
text_with_number_to_words = Normalizer.convert_text_numbers(text)
print(text_with_number_to_words.split())

['iki', 'bin', 'yirmi', 'bir', 'İstanbul', 'toplam', 'nüfusu:', 'on', 'beş']
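
If the periods in an input are thousands separators rather than decimal points, one option is to strip them before the conversion. The sketch below is a hypothetical preprocessing step and not part of mintlemon; it assumes that a period followed by exactly three digits is a grouping separator, which would misread a decimal value such as 3.141.

```python
import re

# A period that sits between a digit and a group of exactly three digits.
THOUSANDS_SEPARATOR = re.compile(r"(?<=[0-9])\.(?=[0-9]{3}(?![0-9]))")


def strip_thousands_separators(text: str) -> str:
    """Remove periods used as thousands separators (illustrative sketch)."""
    return THOUSANDS_SEPARATOR.sub("", text)


print(strip_thousands_separators("2021 İstanbul toplam nüfusu: 15.840.900"))
# 2021 İstanbul toplam nüfusu: 15840900
```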




#### Use Case 4: Convert numbers in a text to words -> If the number is too large, a warning is issued.
---

In [7]:
from mintlemon import Normalizer

text = "2021 İstanbul toplam nüfusu: 1500008400900"

# this number is still within the supported range and is converted in full
text_with_number_to_words = Normalizer.convert_text_numbers(text)
print(text_with_number_to_words.split())

['iki', 'bin', 'yirmi', 'bir', 'İstanbul', 'toplam', 'nüfusu:', 'bir', 'trilyon', 'beş', 'yüz', 'milyar', 'sekiz', 'milyon', 'dört', 'yüz', 'bin', 'dokuz', 'yüz']


In [8]:
# The number above was not large enough to trigger the warning.
# Let's try a much larger one.

In [9]:
from mintlemon import Normalizer

# the number is too large: a warning is issued and the number is left out of the result
text_with_number_to_words = Normalizer.convert_text_numbers("2021 İstanbul toplam nüfusu: 1500008400900000000000")
print(text_with_number_to_words.split())

['iki', 'bin', 'yirmi', 'bir', 'İstanbul', 'toplam', 'nüfusu:']
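
Warnings like the ones in this use case can be captured programmatically with the standard library's warnings.catch_warnings(record=True). In the sketch below, convert_or_warn is a hypothetical stand-in for the library call, and MAX_SUPPORTED, the warning category, and the message are assumptions; the actual limit and warning type used by mintlemon are not stated in this document.

```python
import warnings

MAX_SUPPORTED = 10**15 - 1  # hypothetical limit for this sketch


def convert_or_warn(number_text: str) -> str:
    """Stand-in converter: warns and returns '' when the number is too large."""
    if int(number_text) > MAX_SUPPORTED:
        warnings.warn(f"number too large to convert: {number_text}", UserWarning)
        return ""
    return f"<words for {number_text}>"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is filtered out
    result = convert_or_warn("1500008400900000000000")

print(repr(result))                 # ''
print(len(caught))                  # 1
print(caught[0].category.__name__)  # UserWarning
```

To apply the same pattern to the library, the call inside the with block would be replaced by Normalizer.convert_text_numbers(text), and the recorded entries in caught could then be inspected.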




In [10]:
from mintlemon import Normalizer
Normalizer().deasciify("O sirada bahcede cıcekleri kokluyorduk. Hersey bahcıvanın islik calmasiyla yasandi...")

'O sırada bahçede çiçekleri kokluyorduk. Herşey bahçıvanın ıslık çalmasıyla yaşandı...'

### Use Case 5: Normalize Turkish chars
---
This use case demonstrates how the normalize_turkish_chars function normalizes the Turkish characters in a given text. It takes the text to be normalized (text : str) and returns a str: the normalized text, with Turkish characters replaced by their ASCII equivalents.

In [11]:
from mintlemon import Normalizer
Normalizer().normalize_turkish_chars("Türkiye'de 85 milyon insan vardır.")

"Turkiye'de 85 milyon insan vardir."

In [12]:
Normalizer().normalize_turkish_chars("Gazi Üniversitesine Hoşgeldiniz.")

'Gazi Universitesine Hosgeldiniz.'
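
Replacing Turkish characters with their ASCII equivalents can be sketched with str.maketrans() and str.translate(). This is one possible approach, shown for illustration; it is not taken from mintlemon's source, and to_ascii and TURKISH_TO_ASCII are hypothetical names.

```python
# Turkish-specific letters and their closest ASCII letters, position by position.
TURKISH_TO_ASCII = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def to_ascii(text: str) -> str:
    """Replace Turkish-specific letters with ASCII letters (illustrative sketch)."""
    return text.translate(TURKISH_TO_ASCII)


print(to_ascii("Türkiye'de 85 milyon insan vardır."))  # Turkiye'de 85 milyon insan vardir.
print(to_ascii("Gazi Üniversitesine Hoşgeldiniz."))    # Gazi Universitesine Hosgeldiniz.
```

A translation table handles all twelve letters in a single pass over the string, instead of one replace() call per letter.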