sanchosteven/SemWiktionary

 
 

SemWiktionary

Java API to access data from Wiktionary. The specific target is the French Wiktionary (http://fr.wiktionary.org).

Vision

Given a text written in French about a defined domain, extract semantic information about its content.

Objectives

Primary objective

Provide a Java API giving access to the following information about a word of the French language:

  1. part of speech (or “lexical class”);
  2. relations with other words in the database (non-exhaustive list: synonymy, homonymy…);
  3. definition.
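The three pieces of information listed above could be exposed along the following lines. This is only a minimal sketch, not the project's actual API: every name in it (`LexicalClass`, `RelationType`, `WordEntry`, `InMemoryDictionary`) is hypothetical, and the in-memory map merely stands in for the database-backed store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lexical classes; the real set would come from the data source. */
enum LexicalClass { NOUN, VERB, ADJECTIVE, ADVERB, OTHER }

/** Hypothetical relation types; only the two examples named in the objectives. */
enum RelationType { SYNONYM, HOMONYM }

/** One word with its lexical class, definition and relations (hypothetical type). */
final class WordEntry {
	private final String word;
	private final LexicalClass lexicalClass;
	private final String definition;
	private final Map<RelationType, List<String>> relations = new EnumMap<>(RelationType.class);

	WordEntry(String word, LexicalClass lexicalClass, String definition) {
		this.word = word;
		this.lexicalClass = lexicalClass;
		this.definition = definition;
	}

	String getWord() {
		return word;
	}

	LexicalClass getLexicalClass() {
		return lexicalClass;
	}

	String getDefinition() {
		return definition;
	}

	/** Records that `other` is related to this word, then returns this entry for chaining. */
	WordEntry relate(RelationType type, String other) {
		relations.computeIfAbsent(type, t -> new ArrayList<>()).add(other);
		return this;
	}

	/** Returns a read-only view of the related words; empty when none are known. */
	List<String> getRelated(RelationType type) {
		return Collections.unmodifiableList(relations.getOrDefault(type, Collections.emptyList()));
	}
}

/** In-memory stand-in for the database lookup (assumption, for illustration only). */
final class InMemoryDictionary {
	private final Map<String, WordEntry> entries = new HashMap<>();

	void add(WordEntry entry) {
		entries.put(entry.getWord(), entry);
	}

	/** Returns the entry for `word`, or an empty `Optional` when the word is unknown. */
	Optional<WordEntry> lookup(String word) {
		return Optional.ofNullable(entries.get(word));
	}
}
```

A caller would add entries with `dictionary.add(new WordEntry(...).relate(RelationType.SYNONYM, "matou"))` and read them back through `lookup`. Returning an `Optional` keeps unknown words explicit instead of relying on `null`.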

Secondary objectives

Performance. Maximum acceptable durations: 1 day to load the database; 5 minutes to execute a query.
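One way to make these limits explicit in code is to keep them as named constants and run each query under a time budget. The sketch below assumes nothing about the actual implementation: `PerformanceBudget`, `MAX_LOAD_TIME`, `MAX_QUERY_TIME` and `runWithin` are hypothetical names, and it uses only the standard `java.util.concurrent` classes.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder for the durations stated in the secondary objectives. */
final class PerformanceBudget {
	static final Duration MAX_LOAD_TIME = Duration.ofDays(1);
	static final Duration MAX_QUERY_TIME = Duration.ofMinutes(5);

	private PerformanceBudget() {
	}

	/**
	 * Runs `task` on a worker thread and waits at most `budget` for its result.
	 *
	 * Throws `java.util.concurrent.TimeoutException` when the budget is exceeded,
	 * and `java.util.concurrent.ExecutionException` when the task itself fails.
	 */
	static <T> T runWithin(Duration budget, Callable<T> task) throws Exception {
		ExecutorService executor = Executors.newSingleThreadExecutor();
		try {
			Future<T> future = executor.submit(task);
			return future.get(budget.toMillis(), TimeUnit.MILLISECONDS);
		} finally {
			// Interrupts the worker if it is still running, then releases the thread.
			executor.shutdownNow();
		}
	}
}
```

A query would then be wrapped as `PerformanceBudget.runWithin(PerformanceBudget.MAX_QUERY_TIME, () -> ...)`, so that exceeding the limit surfaces as an exception rather than an unbounded wait.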

Constraints

  1. Data source: Wiktionnaire (the French Wiktionary).
  2. Technologies: API available in Java. Non-relational database, preferably Neo4j.

License

GNU General Public License.

Equivalent projects and rationale

Coding Style, Philosophy & Implemented standards

Repo management

  • SemVer, Semantic Versioning.
  • README-Driven Development. Definitely applies to branches too.
  • GitHub-flow-like, that is:
    • master branch should always be deployable.
    • one branch per functionality, with explicit branch naming. Once functionality is implemented and tested, it is merged into master, and the branch is deleted.
  • code is considered valid only once it has been documented and tested. Automated tests are not mandatory for UI code (maintenance cost too high).
  • atomic commits: a commit is one change. It may be a documentation change, an API change, an implementation change, it may be split across several files or stand in one line, but it changes only one aspect of the application.

File hierarchy

  • src contains all source files.
  • test contains all test source files.
  • doc contains all documentation, except this README. Markdown is to be used for documentation formatting.
  • lib contains all third-party libraries.
  • bin contains class files.
  • build contains all build products packaged in JARs.
  • dist contains deliverables.

Coding style

OOP. DRY. Dynamicity. TDD with JUnit.

Writing

  • Follow Oracle's Java Code Conventions.
  • Scope-opening braces go on the same line as the control element that opens the scope; scope-closing braces go on their own line, except in an if / else if / else construct, where we want to achieve a … } else { … look.
  • Indent with tabs. Spaces are allowed in very specific contexts only, such as aligning multi-line arguments. For converters, one tab equals four spaces.
  • Javadoc-style comments with Markdown instead of HTML. That seems to be Markdown-doclet-parsable, but the main goal is to have the most usable documentation in the code itself. Public documentation elements (i.e. parts that describe variables and methods rather than algorithmic details) should be in double-star comments (/**).
  • inline comments (//) and single-star comments (/*) comment a specific part of the implementation, and do not give any public-interest information.
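The layout rules above can be illustrated with a short sketch. The class and its thresholds are hypothetical and exist only to show brace placement, tab indentation and the three kinds of comments; they are not taken from the project's source.

```java
/**
 * Sorts words into rough length categories.
 *
 * **Hypothetical helper**: it exists only to demonstrate the layout rules.
 */
final class StyleExample {
	/** Words strictly shorter than this are `"short"`. */
	static final int MEDIUM_THRESHOLD = 4;

	/** Words strictly shorter than this, but not short, are `"medium"`. */
	static final int LONG_THRESHOLD = 8;

	private StyleExample() {
	}

	/**
	 * Returns `"short"`, `"medium"` or `"long"` depending on the length of `word`.
	 *
	 * `word` must not be `null`.
	 */
	static String classify(String word) {
		/* Implementation note: thresholds are named constants, not hardcoded literals. */
		int length = word.length();
		if (length < MEDIUM_THRESHOLD) {
			return "short";
		} else if (length < LONG_THRESHOLD) {
			// Closing and opening braces share the line with "else if" and "else".
			return "medium";
		} else {
			return "long";
		}
	}
}
```

Here the double-star comments carry the public documentation in Markdown, while the single-star and inline comments only annotate the implementation.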

Evils

  • code duplication;
  • hardcoded stuff;
  • coupling;
  • bad documentation;
  • non-explicit function names.

Basically, everything that will end up biting you bad later on.

Credits

Authors

Tutors

  • Michel Gautero
  • Carine Fédèle

Used projects
