A simple university exercise with Apache Lucene 3.1. This application indexes and searches a set of files that represent posts from some newsgroups.

LuceneNewsgroups

Hi fellows,
LuceneNewsgroups is a simple command line tool that I wrote for a university exercise. Its sole purpose is to index a bunch of mailing list posts (stored locally) and to run query-based searches on them. It is developed in Java with NetBeans and uses the Lucene library (v 3.1) as its indexing and search engine.

Dependencies

You need Java 1.5 or higher installed on your system.

Installation

LuceneNewsgroups does not require any kind of setup. It's distributed as a jar file (you can find it in the dist folder). You can copy the jar file wherever you want and run it from the console with the command java -jar path/to/LuceneNewsgroups.jar. Obviously you should replace path/to/ with the real path where you put the jar file.
As it is command line software, you have to type every single command. Yeah, this sounds really boooring! You can ease things a bit by creating an alias to shorten the ugly java -jar path/to/LuceneNewsgroups.jar command. On Unix-based systems you can do this with the command alias lns='java -jar path/to/LuceneNewsgroups.jar'. This creates an alias called lns, so you can just type lns instead of java -jar path/to/LuceneNewsgroups.jar.
(Thanks to my great friend saro for the hint.)

The workflow

To use LuceneNewsgroups you need a set of files that represent newsgroup posts. You can find some of them in the data.zip file. These files should be placed in a directory and can also be organized in subdirectories if needed. Each subdirectory is treated as a category. The default workflow follows these steps:

  1. Create the index for a set of posts placed inside a directory
  2. Perform searches on the indexed directory of posts
  3. Open some of the matched files

So generally you would write commands like these:

cd path/to/my/posts/directory
lns createIndex
lns search god OR hell
lns open 21580

(N.B. I'm assuming you've created an alias called lns as described above.)
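To make the "each subdirectory is a category" idea a bit more concrete, here is a minimal Java sketch of how a posts directory could be walked and each file paired with a category. It is not the code from this repository: the class name PostScanner, the Post holder and the choice to join nested subdirectory names with a slash are all assumptions made just for illustration. Files sitting directly in the root folder get an empty category here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of LuceneNewsgroups itself.
class PostScanner {

    /** A post file paired with the category derived from its directory. */
    static final class Post {
        final File file;
        final String category;

        Post(File file, String category) {
            this.file = file;
            this.category = category;
        }
    }

    /** Returns every regular file below root together with its category. */
    static List<Post> scan(File root) {
        List<Post> posts = new ArrayList<Post>();
        collect(root, "", posts);
        return posts;
    }

    private static void collect(File dir, String category, List<Post> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or not readable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                // Assumption: nested subdirectories extend the category name.
                String sub = category.length() == 0
                        ? entry.getName()
                        : category + "/" + entry.getName();
                collect(entry, sub, out);
            } else {
                out.add(new Post(entry, category));
            }
        }
    }

    public static void main(String[] args) {
        // Like createIndex, fall back to the current directory if no folder is given.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (Post post : scan(root)) {
            System.out.println(post.category + "\t" + post.file.getPath());
        }
    }
}
```

Running it against an unzipped posts folder prints one line per file, with the category in the first column. A real indexer would hand each file and its category to Lucene instead of printing them.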

Commands

Here's the complete list of available commands:

version Shows information about the version of the software.
createIndex [<folder>] Creates an index for all the files contained in the given folder. If you don't specify a folder, it uses the current working directory.
search <query> Searches the index of the current directory for documents that match the given query.
open <id> Displays the content of the document indexed with the given id.
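The sketch below shows one way a command line like these could be interpreted in Java. It is only an illustration under stated assumptions, not the actual implementation: the class CommandLineSketch and its describe method are made up, and so is the idea that the words after search are joined back into a single query string (which would explain why lns search god OR hell works without quotes). The only behavior taken from the list above is that createIndex falls back to the current working directory.

```java
// Hypothetical dispatcher, not part of LuceneNewsgroups itself.
class CommandLineSketch {

    /** Returns a description of what the given arguments would trigger. */
    static String describe(String[] args) {
        if (args.length == 0) {
            return "error: no command given";
        }
        String command = args[0];
        if ("version".equals(command)) {
            return "version";
        }
        if ("createIndex".equals(command)) {
            // No folder given: use the current working directory.
            String folder = args.length > 1 ? args[1] : System.getProperty("user.dir");
            return "createIndex " + folder;
        }
        if ("search".equals(command)) {
            if (args.length < 2) {
                return "error: search needs a query";
            }
            // Assumption: the shell splits the query into words, so join them again.
            StringBuilder query = new StringBuilder(args[1]);
            for (int i = 2; i < args.length; i++) {
                query.append(' ').append(args[i]);
            }
            return "search " + query;
        }
        if ("open".equals(command)) {
            if (args.length != 2) {
                return "error: open needs exactly one id";
            }
            return "open " + args[1];
        }
        return "error: unknown command " + command;
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

For example, describe(new String[] {"search", "god", "OR", "hell"}) returns the string "search god OR hell", which a real tool would then pass on to the query parser.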

That's all

As the title says, that's all for the moment! If you have any questions feel free to ask: lmammino [at] oryzone [dot] com.
In the meantime please cross your fingers hoping I'll pass the exam :)
