Java API to access data from Wiktionary. The specific target is the French Wiktionary (Wiktionnaire).
Starting from a text written in French about a given domain, extract semantic information about its content.
Provide a Java API giving access to the following information about a word of the French language:
- part of speech (or “lexical class”);
- relations with other words in the database (non-exhaustive list: synonymy, homonymy…);
- definition.
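The three pieces of information listed above could be exposed along the following lines. This is only a minimal sketch: every name in it (`LexicalClass`, `RelationType`, `Word`, `Dictionary`, `InMemoryDictionary`) is hypothetical and not taken from the project, and the in-memory class merely stands in for the database so that the example runs on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Lexical class of a word. Hypothetical, non-exhaustive values. */
enum LexicalClass {
	NOUN, VERB, ADJECTIVE, ADVERB, OTHER
}

/** Kind of relation between two words. Hypothetical, non-exhaustive values. */
enum RelationType {
	SYNONYM, HOMONYM
}

/** The information exposed for one word. */
interface Word {
	LexicalClass getLexicalClass();

	/** Spellings of the words linked to this one by `type`; empty if none. */
	List<String> getRelated(RelationType type);

	String getDefinition();
}

/** Entry point of the sketch; a real implementation would query the database. */
interface Dictionary {
	/** The entry for `spelling`, or `null` if the word is unknown. */
	Word lookup(String spelling);
}

/** In-memory stand-in, present only to make the sketch runnable. */
class InMemoryDictionary implements Dictionary {
	private final Map<String, Word> entries = new HashMap<>();

	void add(String spelling, final LexicalClass lexicalClass, final String definition,
	         final Map<RelationType, List<String>> relations) {
		entries.put(spelling, new Word() {
			@Override
			public LexicalClass getLexicalClass() {
				return lexicalClass;
			}

			@Override
			public List<String> getRelated(RelationType type) {
				List<String> related = relations.get(type);
				return related == null ? Collections.<String>emptyList() : related;
			}

			@Override
			public String getDefinition() {
				return definition;
			}
		});
	}

	@Override
	public Word lookup(String spelling) {
		return entries.get(spelling);
	}
}
```

Client code would then depend on `Dictionary` and `Word` only, so the storage behind them can change without touching callers.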
Performance. Maximum acceptable durations: 1 day to load the database; 5 minutes to run a query.
- Data source: the French Wiktionary (Wiktionnaire).
- Technologies: API available in Java. Non-relational database, preferably Neo4j.
- SemVer, Semantic Versioning.
- README-Driven Development. Definitely applies to branches too.
- GitHub-flow-like, that is:
  - the `master` branch should always be deployable;
  - one branch per functionality, with explicit branch naming. Once functionality is implemented and tested, it is merged into `master`, and the branch is deleted.
- code is considered valid only once it has been documented and tested. Automated tests are not mandatory for UI code (maintenance cost too high).
- atomic commits: a commit is one change. It may be a documentation change, an API change, an implementation change, it may be split across several files or stand in one line, but it changes only one aspect of the application.
- `src` contains all source files.
- `test` contains all test source files.
- `doc` contains all documentation, except this README. Markdown is to be used for documentation formatting.
- `lib` contains all third-party libraries.
- `bin` contains `class` files.
- `build` contains all build products packaged in JARs.
- `dist` contains deliverables.
OOP. DRY. Dynamicity. TDD with JUnit.
- Follow Oracle's Java Code Conventions.
- Scope opening brackets are on the same line as the control element that opens the scope; scope closing brackets are on their own line, except when in an `if / else if / else` construct, where we want to achieve a `… } else { …` look.
- tabs. Spaces allowed in very specific contexts only, such as aligning multi-line arguments. For converters, tab:space ratio is set to 1:4.
- Javadoc-style comments with Markdown instead of HTML. That seems to be Markdown-doclet-parsable, but the main goal is to have the most usable documentation in the code itself. Public documentation elements (i.e. parts that provide details about variables, methods rather than algorithmic details) should be in double-star comments (`/**`).
- inline comments (`//`) and single-star comments (`/*`) comment a specific part of the implementation, and do not give any public-interest information.
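As an illustration of the bracket, indentation and comment conventions above, here is a short sketch. `ResultSummary` and `describe` are hypothetical names invented for this example; they are not part of the project.

```java
import java.util.Locale;

/**
 * Formats the outcome of a query for display.
 *
 * `ResultSummary` is a **hypothetical** class, used only to illustrate the conventions.
 */
class ResultSummary {
	/**
	 * Describes how many results a query returned.
	 *
	 * - `count`: number of results, never negative;
	 * - `elapsedMillis`: duration of the query, in milliseconds.
	 */
	static String describe(int count, long elapsedMillis) {
		/* Opening brackets stay on the line of the control element;
		 * `else` sits between the closing and the opening bracket. */
		if (count < 0) {
			throw new IllegalArgumentException("count must not be negative: " + count);
		} else if (count == 0) {
			return "no result";
		} else {
			// Indentation uses tabs; the spaces below only align the arguments.
			return String.format(Locale.ROOT,
			                     "%d result%s in %d ms",
			                     count,
			                     count == 1 ? "" : "s",
			                     elapsedMillis);
		}
	}
}
```

The double-star comments carry what a caller needs to know, written in Markdown; the single-star and inline comments only explain implementation choices.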
- code duplication;
- hardcoded stuff;
- coupling;
- bad documentation;
- non-explicit function names.
Basically, everything that will end up biting you bad later on.
- Matti Schneider-Ghibaudo
- Fabien Brossier
- Ngoc Nguyen Thinh Dong
- Steven Sancho
- Michel Gautero
- Carine Fédèle