Various Kurdi related work done by Kurdish developers.

kurdi

There are some hopefully useful files/scripts/chunks etc. to share with Kurdi developers.

kurdi_words.txt: a list of Kurdish words (currently 1,668,692), unique and alphabetically ordered (thanks to @dolanskurd). Note that in the bar chart below, each of (و) and (ی) counted as both vowel and consonant.
unicode_list.txt: list of unicode values for Kurdish alphabet (Arabic script) standard accepted and published on http://unicode.ekrg.org/ku_unicodes.html
gettext translations, includes ku.po for Drupal. Most of the translations come from https://localize.drupal.org/translate/languages/ku (now almost dead
KRG health institutions data (lat/lng and names) throughout KRG (see health)

Now that we have some good unique nad cleaned up wordlist. We can do some statistics on them (in R for now):

w = readLines("https://raw.githubusercontent.com/layik/kurdi/master/corpus/kurdi_words.txt")

## Warning in readLines("https://raw.githubusercontent.com/layik/kurdi/
## master/corpus/kurdi_words.txt"): incomplete final line found on 'https://
## raw.githubusercontent.com/layik/kurdi/master/corpus/kurdi_words.txt'

length(unique(w)) == length(w)

## [1] TRUE

length(w)

## [1] 1668692

# sample of those including ئا
length(grep("ئا", w))

## [1] 49401

# read in list of Kurdi chars
ku_v = readLines("https://raw.githubusercontent.com/layik/kurdi/master/corpus/letters_lines.txt")
message("Kurdish alphabet: ",  length(ku_v), " letters.")

## Kurdish alphabet: 34 letters.

letters_used = sapply(ku_v, function(x){
  length(grep(x, w))
})

# change h to doucheshme
names(letters_used)[names(letters_used) == 'ه'] = "ھ"
letters_used = sort(letters_used, decreasing = TRUE)

library(ggplot2)
ggplot() + geom_bar(aes(x=names(letters_used),y=letters_used), stat='identity') + xlab('Alphabet') + ylab('Frequency') + theme(axis.text.x = element_text(face = "bold", size = 18)) + scale_y_continuous(labels = scales::comma) + 
  scale_x_discrete(limits=names(letters_used))

letters_used['ە']

##       ە 
## 1255122

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
README_files/figure-gfm		README_files/figure-gfm
corpus		corpus
health		health
markdown		markdown
translations		translations
unicode		unicode
.gitignore		.gitignore
README.Rmd		README.Rmd
README.md		README.md
ckb-kurdi.R		ckb-kurdi.R
kurdi.Rproj		kurdi.Rproj
support-Kurdi.md		support-Kurdi.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Various Kurdi related work done by Kurdish developers.

kurdi

About

Releases 1

Packages

Contributors 3

Languages

layik/kurdi

Folders and files

Latest commit

History

Repository files navigation

Various Kurdi related work done by Kurdish developers.

kurdi

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 3

Languages

Packages