Search numbers and dates #12

Open
dtosato opened this issue Mar 18, 2015 · 1 comment

dtosato commented Mar 18, 2015

Hi, it is clear that BoboBrowse.Net works only with strings, but how do you deal with this limitation when searching numbers and dates?
Let me give a concrete example.
Suppose you index a field like this:

int number = 19;
string numberFormatted = number.ToString("000000", System.Globalization.CultureInfo.InvariantCulture);
Field f = new Field("number", numberFormatted, Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS);
f.OmitTermFreqAndPositions = true;
doc.Add(f); 

and the index writer uses a standard analyzer (which is the best choice for analyzing a number as text):

IndexWriter modifier = new IndexWriter(directory, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), true, Lucene.Net.Index.IndexWriter.MaxFieldLength.UNLIMITED);

Now suppose you want to retrieve the indexed value (19). You can do that only by writing a (Lucene) query like field='number' text='000019', which assumes that the user knows the correct format for the number field in advance. In fact, if you write field='number' text='19' (which is much more intuitive), it returns no results. Thus, the only option is to format the user's query the same way as the indexed value. Is that right?
I ask because I usually build Lucene queries with MultiFieldQueryParser, which cannot be used if you format the input numbers or dates before indexing.

@NightOwl888
Owner

The simplest solution is to store two fields with the same data in two different formats. An index is not a database; there is no rule that says duplicating data is bad. You just have to keep in mind that BoboBrowse.Net has these requirements and design the index to work with these formatted string values as well as any other types you require. If you need native data types for some other reason, just add another field to the document.

int number = 19;

string numberFormatted = number.ToString("000000", System.Globalization.CultureInfo.InvariantCulture);
Field f = new Field("numberBobo", numberFormatted, Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS);
f.OmitTermFreqAndPositions = true;
doc.Add(f); 

// NumericField derives from AbstractField, not Field, so declare it with its own type.
NumericField f2 = new NumericField("number").SetIntValue(number);
doc.Add(f2);

In the above case, you would use the "numberBobo" field with one or more facet handlers. The index still tracks the numeric field "number" in case you need to use it in another way, such as for search and/or feeding it back to the application after a hit is made.
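
If you do want to match user input such as 19 against the padded "numberBobo" terms, one option is to normalize the input with the same format string used at index time before building the query. The following is only a minimal sketch under that assumption: the class name NumberTermFormatter and the method ToIndexTerm are hypothetical helpers for illustration and are not part of BoboBrowse.Net or Lucene.Net.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of BoboBrowse.Net or Lucene.Net).
public static class NumberTermFormatter
{
    // Assumption: this must match the format string used when indexing ("000000" above).
    private const string IndexFormat = "000000";

    // Largest value that still fits in the six-digit format.
    private const int MaxValue = 999999;

    // Returns the zero-padded term for a user-entered number, or null when the
    // input is not a non-negative integer that fits in the index format.
    public static string ToIndexTerm(string userInput)
    {
        int value;
        // Allow surrounding whitespace only; signs, separators and decimals are rejected.
        NumberStyles styles = NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite;
        if (!int.TryParse(userInput, styles, CultureInfo.InvariantCulture, out value))
        {
            return null;
        }
        if (value > MaxValue)
        {
            return null;
        }
        return value.ToString(IndexFormat, CultureInfo.InvariantCulture);
    }
}
```

With this in place, both "19" and "000019" normalize to the term 000019, which can then be used in a term query against "numberBobo"; input that cannot be normalized returns null so the caller can decide how to handle it.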
