cognates-and-false-friends-tools

Automatically exported from code.google.com/p/cognates-and-false-friends-tools

The project aims to provide a set of tools for extracting cognates and false friends from text corpora. Currently it supports Bulgarian, Russian and partially English.

More information: http://www.nakov.com/blog/2009/09/30/open-source-toolkit-for-extraction-of-cognates-and-false-friends-tecff/

= Implemented Algorithms =

The toolkit provides implementations of the following algorithms:

  • MMEDR: measures modified orthographic similarity between Bulgarian and Russian words
  • SemSim: measures semantic similarity between words by searching in Google and analysing the returned text snippets (supports Bulgarian, Russian and English)
  • CrossSim: measures cross-lingual semantic similarity by searching in Google and analysing the returned text snippets (supports Bulgarian and Russian)
  • FFExtract: extracts false friends from a parallel corpus by determining candidates through the MMEDR algorithm and combining statistical and semantic evidence to distinguish between cognates and false friends
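The two kinds of evidence named in the list above, orthographic and semantic, can be illustrated with a minimal Python sketch. It is not the toolkit's C# code and does not reproduce MMEDR, SemSim or CrossSim. It assumes plain Levenshtein distance normalized by the longer word as a stand-in for a modified orthographic measure, and cosine similarity over bag-of-words counts as a stand-in for snippet analysis. The function names are hypothetical, and snippets are passed in as plain strings rather than fetched from Google.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs (an assumption, not MMEDR's weights)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def orthographic_similarity(a: str, b: str) -> float:
    """1 - distance / longer length; 1.0 means identical spelling."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def context_similarity(snippets_a, snippets_b) -> float:
    """Cosine similarity between word-count vectors built from two snippet lists."""
    va = Counter(w for s in snippets_a for w in s.lower().split())
    vb = Counter(w for s in snippets_b for w in s.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Under this simplified reading, a word pair with high `orthographic_similarity` but low `context_similarity` would be a false-friend candidate, while a pair scoring high on both would look like a cognate. Any thresholds would have to be chosen empirically.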

= Source Code =

The source code is written in C# for .NET Framework 3.5 and can be compiled with Visual Studio 2008 Express Edition. The source code is available from its public SVN repository: [http://cognates-and-false-friends-tools.googlecode.com/svn/trunk/].

= Project Roadmap =

The project is still in an early alpha version.
