Add the Swedish language.

There is a script in wordlists/generators/swedish.rb to regenerate the wordlist from a CC-licensed source. The build_lang_from_wordlists script ignores the generators directory.
commit 6b4684af908432721e072ee9af387132f94060a5 1 parent 8d35a1f
Henrik Nyh henrik authored
3  build_lang_from_wordlists.rb
@@ -6,7 +6,8 @@
 wordlists_folder = File.join(File.dirname(__FILE__), "wordlists")
 
 Dir.entries(wordlists_folder).grep(/\w/).each do |lang|
+  next if lang == 'generators'
   puts "Doing #{lang}"
   filter = WhatLanguage.filter_from_dictionary(File.join(wordlists_folder, lang))
   File.open(File.join(languages_folder, lang + ".lang"), 'w') { |f| f.write filter.dump }
-end
+end
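The hunk above skips the generators directory by name. As a minimal sketch of a more general alternative, assuming every wordlist is a regular file directly inside the wordlists folder, subdirectories could be filtered out by type instead. The helper name `wordlist_names` is hypothetical and is not part of this commit; the temporary directory only stands in for the real wordlists folder.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of the commit): return the names of the
# wordlist files in a folder, skipping any subdirectory such as "generators".
def wordlist_names(wordlists_folder)
  Dir.children(wordlists_folder)                                   # excludes "." and ".."
     .select { |name| File.file?(File.join(wordlists_folder, name)) }
     .sort
end

# Demonstration against a throwaway directory laid out like wordlists/.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "swedish"))
  FileUtils.touch(File.join(dir, "french"))
  FileUtils.mkdir(File.join(dir, "generators"))
  p wordlist_names(dir)
end
```

Filtering by file type means a second helper directory would not need its own `next if` line, at the cost of depending on `Dir.children`, which requires Ruby 2.5 or later.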
BIN  lang/swedish.lang
Binary file not shown
4 test/test_whatlanguage.rb
@@ -18,6 +18,10 @@ def test_french
   def test_spanish
     assert_equal :spanish, @wl.language("La palabra mezquita se usa en español para referirse a todo tipo de edificios dedicados.")
   end
+
+  def test_swedish
+    assert_equal :swedish, @wl.language("Den spanska räven rev en annan räv alldeles lagom.")
+  end
 
   def test_nothing
     assert_nil @wl.language("")
30 wordlists/generators/swedish.rb
@@ -0,0 +1,30 @@
+#!/usr/bin/env ruby
+
+# Run this script to regenerate the Swedish wordlist.
+
+# Data is from http://www.dsso.se/download.html
+# under a Creative Commons ShareAlike license (http://creativecommons.org/licenses/sa/1.0/).
+
+URL = "http://hem.bredband.net/dsso1/dsso-1.29.txt"
+WORDLIST = File.join(File.dirname(__FILE__), '../swedish')
+
+require "open-uri"
+require "iconv"
+
+puts "Fetching source data..."
+data = open(URL)
+
+puts "Writing to word list..."
+open(WORDLIST, 'w') do |file|
+  data.each do |line|
+    next unless line =~ /^\d+r\d+<.+?>([^:]+)/
+    line = $1
+
+    line.gsub!(/\s*,\s*/, "\n") # Some word variations are written like "word, variation"
+    line = Iconv.iconv('UTF-8', 'ISO-8859-1', line) # Convert Latin-1 to UTF-8
+
+    file.puts(line)
+  end
+end
+
+puts "All done."
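The per-line step of the generator (match an entry, split "word, variation" pairs, convert Latin-1 to UTF-8) can be sketched without `iconv`, which was removed from the standard library in Ruby 2.0. This is a sketch under stated assumptions, not the author's script: the helper name `words_from_line` is hypothetical, the sample line is made up to fit the script's regular expression rather than copied from the DSSO data, and fetching the file over the network is left out.

```ruby
# Same entry pattern as the generator script in this commit.
ENTRY = /^\d+r\d+<.+?>([^:]+)/

# Hypothetical helper (not part of the commit): take one raw Latin-1 line
# and return the UTF-8 words it contributes to the wordlist.
def words_from_line(latin1_line)
  # Label the raw bytes as ISO-8859-1, then transcode them to UTF-8.
  line = latin1_line.dup.force_encoding("ISO-8859-1").encode("UTF-8")

  match = ENTRY.match(line)
  return [] unless match

  # Some word variations are written like "word, variation".
  match[1].split(/\s*,\s*/).map(&:strip).reject(&:empty?)
end

# Made-up sample line; "\xE4" is the Latin-1 byte for "ä".
sample = "1r2<substantiv>r\xE4v, r\xE4ven:rest".b
words_from_line(sample).each { |word| puts word }
```

Returning an array per line keeps the parsing testable on its own; a caller would write each returned word to the wordlist file with `file.puts`.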
54,818 wordlists/swedish
54,818 additions, 0 deletions not shown
