Big global search update for 1.9. Will resolve major bug points in Global Search component section.
1 parent 68c5857 commit 323a68467c2527bc2f2233df9b4d4ff6c7120c88 diml committed Dec 17, 2008
Showing with 801 additions and 581 deletions.
  1. +40 −40 search/LISEZMOI.txt
  2. +91 −152 search/README.txt
  3. +153 −0 search/README_ARCHIVE.txt
  4. +38 −35 search/add.php
  5. +1 −5 search/cron.php
  6. +38 −33 search/delete.php
  7. +42 −31 search/indexer.php
  8. +6 −10 search/indexersplash.php
  9. +2 −1 search/indexlib.php
  10. +85 −49 search/lib.php
  11. +137 −103 search/query.php
  12. +93 −74 search/querylib.php
  13. +34 −0 search/searchtypes.php
  14. +4 −12 search/stats.php
  15. +37 −36 search/update.php
@@ -2,59 +2,56 @@ Cette distribution partielle contient une refonte du moteur de
recherche globale de Moodle.
Le moteur de recherche est capable d'indexer et de rechercher
-des informations dans un grand nombre de contenus stockés
-dans la plate-forme à travers la manipulation des activités et
+des informations dans un grand nombre de contenus stockés
+dans la plate-forme à travers la manipulation des activités et
des blocs.
-Le moteur de recherche procède à une première indexation des
+Le moteur de recherche procède à une première indexation des
ressources disponibles par action de l'administrateur. Une fois
-cette indexation effectuée, le moteur maintient régulièrement les
-indexes, en ajoutant les nouvelles entrées et en nettoyant les
-entrées obsolètes.
+cette indexation effectuée, le moteur maintient régulièrement les
+indexes, en ajoutant les nouvelles entrées et en nettoyant les
+entrées obsolètes.
-La recherche permet d'obtenir des références d'accès au contexte
+La recherche permet d'obtenir des références d'accès au contexte
qui diffuse cette information, au nom de l'utilisateur courant.
-Le filtrage des résultats enlève de la liste des réponses toute
-ressource que la situation de l'utilisateur empêcherait de voir
-s'il y accédait dans son contexte habituel.
+Le filtrage des résultats enlève de la liste des réponses toute
+ressource que la situation de l'utilisateur empêcherait de voir
+s'il y accédait dans son contexte habituel.
Mise en oeuvre
##############
-Pour déployer le moteur :
+La distribution fait désormais partie du noyau de Moodle.
+Il sera probablement nécessaire d'ajouter un certain nombre de librairies additionnelles
+pour la conversion de documents physiques en vue de leur indexation. Ces librairies sont
+actuellement fournies dans le CVS dans la rubrique contrib/patches/global_search_libraries
+(antiword et xpdf). La prise en charge des fichiers "shockwave" est assurée, sous réserve
+de l'obtention des librairies de conversion auprès d'Adobe (http://www.adobe.com/licensing/developer/)
-* Copie de fichiers
+1. Allez sur le bloc d'administration et réglez les paramètres du bloc Recherche Globale.
+Ceci initialisera un certain nombre de fonctions dans le moteur.
-1. Ajouter les deux librairies fournies aux librairies de Moodle
-2. Ecraser le répertoire "search" par le répertoire fourni
-3. Ecraser le bloc "blocs/search" par le bloc fourni.
+2. Insérer un nouveau bloc de recherche globale dans la plate-forme
-* Installation logique
+3. Effectuer une recherche vide (en administrateur)
-4. Aller dans les notifications administratives et dérouler la procédure d'installation/mise à jour du bloc. L'installation crée la table image
-des documents indexés et utilisés dans le module search.
+4. Aller sur la page des statistiques
-5. Insérer un nouveau bloc de recherche globale dans la plate-forme
+5. Activer l'indexation (indexersplash.php). Attention, si la plate-forme contient beaucoup de contenus cette indexation peut être TRES LONGUE.
-6. Effectuer une recherche vide (en administrateur)
+Pour effectuer des recherches, une fois la première indexation terminée, retourner au bloc de recherche et tenter une recherche.
-7. Aller sur la page des statistiques
-
-8. Activer l'indexation (indexsplash.php). Attention, si la plate-form contient beaucoup de contenus cette indexation peut être TRES LONGUE.
-
-Pour effectuer des recherches, une fois la première indexation terminée, retourner au bloc de recherche et tenter une recherche.
-
-Eléments pris en charge
+Eléments pris en charge
#######################
-Dans l'état actuel, les éléments indexés par le moteur sont :
+Dans l'état actuel, les éléments indexés par le moteur sont :
-- les entrées de forum
-- les fiches de base de données
-- les commentaires sur fiches de données
-- les entrées de glossaire
-- les commentaires sur entrées de glossaire
+- les entrées de forum
+- les fiches de base de données
+- les commentaires sur fiches de données
+- les entrées de glossaire
+- les commentaires sur entrées de glossaire
- les ressources natives Moodle
- les ressources physiques de type MSWord
- les ressources physiques de type PDF
@@ -63,27 +60,30 @@ Dans l'état actuel, les éléments indexés par le moteur sont :
- les ressources physiques de type XML (.xml)
- les ressources physiques de type (Microsoft) Powerpoint (.ppt)
- les pages de wiki
-- les entités de projet technique
- les sessions de chat
+Des modules tiers ont été rendus indexables
+
+- Techproject
+
Extensions
##########
-L'API du moteur de recherche permet désormais :
+L'API du moteur de recherche permet désormais :
- l'indexation de contenus de blocs.
- l'indexation de modules contenant une information complexe ou de plusieurs types distincts
-- la sécurisation des informations indexées lors des extractions de résultats
-- l'indexation de tout module tiers par ajout d'un fichier php calibré
-- l'indexation de toute nouvelle resource physique par ajout d'un fichier php calibré
+- la sécurisation des informations indexées lors des extractions de résultats
+- l'indexation de tout module tiers par ajout d'un fichier php calibré
+- l'indexation de toute nouvelle ressource physique par ajout d'un fichier php calibré
Extensions futures
##################
- De nouvelles prises en charge de contenus tels que les attachements des forums, les attachement des glossaires, ainsi que d'autres modules non encore
-implémentés.
+implémentés.
-- l'extension mnet de la recherche dans un réseau de moodle interconnectés.
+- l'extension mnet de la recherche dans un réseau de moodle interconnectés.
@@ -1,153 +1,92 @@
-2006/09/08
-----------
-Google Summer of Code is finished, spent a couple of weeks away from
-the project to think about it and also to take a break. Working on it
-now I discovered bugs in the query parser (now fixed), and I also
-un-convoluted the querylib logic (well, slightly).
+This directory contains the central implementation of
+Moodle's Global Search Engine.
-Updated ZFS files to latest SVN.
-
-2006/08/21
-----------
-Fixed index document count, and created new config variable to store
-the size. (Search now has 3 global vars in $CFG, date, size and complete,
-see indexer.php for var names). Index size is cached to provide an always
-current value for the index - this is to take into account the fact that
-deleted documents are in fact not removed from the index, but instead just
-marked as deleted and not returned in search results. The actual document
-still features in the index, and skews sizes. When the index optimiser is
-completed in ZFS, then these deleted documents will be pruned, thus
-correctly modifying the index size.
-
-Additional commenting added.
-
-Query page logic very slightly modified to clean up GET string a bit (removed
-'p' variable).
-
-Add/delete functions added to other document types.
-
-A few TODO fields added to source, indicating changes still to come (or at
-least to be considered).
-
-2006/08/16
-----------
-Add/delete/update cron functions finished - can be called separately
-or all at once via cron.php.
-
-Document date field added to index and database summary.
-
-Some index db functionality abstracted out to indexlib.php - can
-use IndexDBControl class to add/del documents from database, and
-to make sure the db table is functioning.
-
-DB sql files changed to add some extra fields.
-
-Default 'simple' query modified to search title and author, as well
-as contents of document, to provide better results for users.
-
-2006/08/14
-----------
-First revision of the advanced search page completed. Functional,
-but needs a date search field still.
-
-2006/08/02
-----------
-Added resource search type, and the ability to specify custom 'virtual'
-models to search - allowing for non-module specific information to be
-indexed. Specify the extra search types to use in lib.php.
-
-2006/07/28
-----------
-Added delete logic to documents; the moodle database log is checked
-and any found delete events are used to remove the referenced documents
-from the database table and search index.
-
-Added database table name constant to lib.php, must change files using
-the static table name.
-
-Changed documents to use 'docid' instead of 'id' to reference the moodle
-instance id, since Zend Search adds its own internal 'id' field. Noticed
-this whilst working on deletions.
-
-Added some additional fields to the permissions checking method, must still
-implement it though.
-
-2006/07/25
-----------
-Query logic moved into the SearchQuery class in querylib.php. Should be able
-to include this file in any page and run a query against the index (PHP 5
-checks must be added to those pages then, though).
-
-Index info can be retrieved using IndexInfo class in indexlib.php.
-
-Abstracted some stuff away, to reduce redundancy and decrease the
-likelihood of errors. Improved the stats.php page to include some
-diagnostics for administrators.
-
-delete.php skeleton created for removing deleted documents from the
-index. cron.php will contain the logic for running delete.php,
-update.php and eventually add.php.
-
-2006/07/11
-----------
-(Warning: It took me 1900 seconds to index the forum, go make coffee
-whilst you wait.) [Moodle.org forum data]
-
-Forum search functions changed to use 'get_recordset' instead of
-'get_records', for speed reasons. This provides a significant improvement,
-but indexing is still slow - getting data from the database and Zend's
-tokenising _seem_ to be the prime suspects at the moment.
-
-/search/tests/ added - index.php can be used to see which modules are
-ready to be included in the search index, and it informs you of any
-errors - should be a prerequisite for indexing.
-
-Search result pagination added to query.php, will default to 20 until
-an admin page for the search module is written.
-
-2006/07/07
-----------
-Search-enabling functions moved out of the mod's lib.php files and into
-/search/documents/mod_document.php - this requires the search module to
-operate without requiring modification of lib files.
-
-SearchDocument base class improved, and the way module documents extend
-it. A custom-data field has been added to allow modules to add any custom
-data they wish to be stored in the index - this field is serialised into
-the index as a binary field.
-
-Database field 'type' renamed to 'doctype' to match the renaming in the
-index, 'type' seems to be a reserved word in Lucene. Several index field
-names change to be more descriptive (cid -> course_id). URLs are now
-stored in the index, and don't have to be generated on the fly during
-display of query results.
-
-2006/07/05
-------
-Started cleaning and standardising things.
-
-cvs v1.1
---------
-This is the initial release (prototype) of Moodle's new search module -
-so basically watch out for sharp edges.
-
-The structure has not been finalised, but this is what is working at the
-moment, when I start looking at other content to index, it will most likely
-change. I don't recommend trying to make your own content modules indexable,
-at least not until the whole flow is finalised. I will be implementing the
-functions needed to index all of the default content modules on Moodle, so
-expect that around mid-August.
-
-Wiki pages were my goal for this release, they can be indexed and searched,
-but not updated or deleted at this stage (was waiting for ZF 0.14 actually).
-
-I need to check the PostgreSQL sql file, I don't have a PG7 install lying
-around to test on, so the script is untested.
-
-To index for the first time, login as an admin user and browse to /search/index.php
-or /search/stats.php - there will be a message and a link telling you to go index.
-
--- Michael Champanis (mchampan)
- email: cynnical@gmail.com
- skype: mchampan
- Summer of Code 2006
+The Global Search Engine indexes a large quantity
+of information from modules, blocks and resources stored
+by Moodle either in the database or on the file system.
+
+The administrator initially indexes the existing content. Once this
+first indexing pass has completed, the search engine maintains the indexes
+regularly, adding new entries, deleting obsolete ones and updating
+those that have changed.
+
+Search results link to the information in the same context
+in which it is usually accessed, from the current user's point of view.
+Result filtering removes any link to information the
+current user would not be allowed to access directly.
+
+Deployment
+##########
+
+The search engine is now part of the Moodle core distribution.
+
+Some extra libraries may need to be added to convert physical documents to text
+so that they can be indexed. Moodle CVS (entry contrib/patches/global_search_libraries)
+provides packs for the antiword and xpdf GPL libraries. The search engine is ready for
+shockwave indexing, but does not ship the Adobe Search converters, which must be
+obtained at http://www.adobe.com/licensing/developer/
+
+1. Go to the block administration panel and set up the Global Search
+block once. This will initialize parameters used by the global search engine.
+
+2. Insert a new Global Search block somewhere in a course or top-level screen.
+
+3. Launch an empty search (you must be administrator).
+
+4. Go to the statistics screen.
+
+5. Activate indexing (indexersplash.php). Beware: if your Moodle has
+a large amount of content, the indexing process may be VERY LONG.
+
+To search, go back to the search block and try a query.
+
+Handled information for indexing
+################################
+
+In its current state, the engine indexes the following information:
+
+- assignment descriptions
+- forum posts
+- database records (using textual fields only)
+- comments on database records
+- glossary entries
+- comments on glossary entries
+- Moodle native resources
+- physical MSWord files as resources (.doc)
+- physical PowerPoint files as resources (.ppt)
+- physical PDF files as resources
+- physical text files as resources (.txt)
+- physical HTML files as resources (.htm and .html)
+- physical XML files as resources (.xml)
+- wiki pages
+- chat sessions
+- lesson pages
+
+Some third-party plugins are also searchable through the new Search API implementation:
+
+- Techproject
+
+Extensions
+##########
+
+The revised search engine API allows:
+
+- indexing of block contents
+- indexing of modules or blocks containing a complex information model
+- securing access to the results
+- adding indexing support for additional modules and plugins by adding a calibrated PHP script
+- adding support for additional physical file types by adding a calibrated PHP script
+
+Global Search on NFS-mounted clusters
+#####################################
+
+This version contains a patched Zend Lucene implementation that allows the Global Search engine to be used on an NFS-mounted shared volume for Web clustering. This implementation
+remains highly experimental and has not been fully tested. Some changes may
+still occur in the SoftLockManager that was added to the Lucene engine.
+
+Future extensions
+#################
+
+- More information should be indexed, such as forum and glossary attachments,
+ as well as the contents of other standard modules.
+- extending search to an mnet network information space by aggregating remote search responses.
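
As a rough illustration of the behaviour the new README describes (incremental add/update/delete of index entries, plus filtering results by what the current user may access), here is a minimal Python sketch. It is not Moodle's PHP code and does not use Zend Lucene; `Doc`, `TinyIndex`, and the `can_view` callback are hypothetical names invented for this example, and the substring match stands in for real full-text search.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Doc:
    docid: int      # id of the indexed item (hypothetical field name)
    doctype: str    # e.g. "forum" or "wiki"
    text: str       # searchable content
    course_id: int  # used here only to decide visibility


class TinyIndex:
    """Toy in-memory index standing in for the real search index."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, int], Doc] = {}

    def add(self, doc: Doc) -> None:
        self._docs[(doc.doctype, doc.docid)] = doc

    def delete(self, doctype: str, docid: int) -> None:
        # Deleting an entry that is not indexed is a no-op.
        self._docs.pop((doctype, docid), None)

    def update(self, doc: Doc) -> None:
        # Update is modelled as: drop the stale entry, then add the new one.
        self.delete(doc.doctype, doc.docid)
        self.add(doc)

    def search(self, term: str, can_view: Callable[[Doc], bool]) -> List[Doc]:
        needle = term.lower()
        hits = [d for d in self._docs.values() if needle in d.text.lower()]
        # Filter after matching: drop anything the current user may not see.
        return [d for d in hits if can_view(d)]


index = TinyIndex()
index.add(Doc(1, "forum", "Welcome to the course", course_id=10))
index.add(Doc(2, "wiki", "Course wiki start page", course_id=20))
index.update(Doc(1, "forum", "Welcome to the updated course", course_id=10))

# Assumption for the example: the user can only see content of course 10.
enrolled = {10}
visible = index.search("course", lambda d: d.course_id in enrolled)
```

Both documents match the query, but only the forum post is returned, because the visibility check runs on every hit before results are handed back.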