Allow Wildcards in Searches #441
Comments
Sorry mate, I just tried such searches and Lucene finds nothing, for reasons unknown. E.g. I tried searching for 見? in the hope of finding miru, but Lucene found nothing. So I have to mark this as won't fix :( |
I see the wontfix and the reasoning. For the above "見?" example, would it be possible to do a "..見.." search and then filter the results through some code, presenting only those that fit the "見?" pattern, rather than relying on the function being built into Lucene? |
Hmm, so basically if you put 見* or 見? into the search box, this will override the search setting and will activate the ..見.. search? This is a great idea! Even better: |
Sounds good to me! |
Great, thanks, I will implement this. Unfortunately, ? or |
Hmm. Fair enough. Though I can imagine such searches being useful in the case of a poor photocopy or handwriting, those are fairly rare occurrences in my line of work. Certainly, you could make wildcards 'short-cuts' to the "ends-with" and "begins-with" functions, and I'm sure that would be much faster to implement. However, assuming you have any way of filling an array of strings with the results of a "contains X" search, it should be fairly trivial to filter that array based on whether the strings match the pattern in the search box. Regular expression matches like this are easy to implement in Java, though I'll admit I've never done so for Japanese text (maybe the reference http://stackoverflow.com/questions/13876955/regex-that-allows-chinese-characters is useful). To be totally clear, this is how I imagine the code (though I've no idea how this might fit into the existing code; my apologies if this is entirely useless to you):
import java.util.regex.*;
// The following variable would be the search string at the moment the search was initiated.
String example_search = "電??学"; // should match "電磁気学"
//String example_search = "電*学"; // should also match "電磁気学"
String temporary_search = "";
// Check if there are wildcards.
if (example_search.contains("?") || example_search.contains("*")) {
    // There are wildcards. Get the longest contiguous string of searchable characters.
    String[] s = example_search.split("[*?]"); // splits the string using * and ? as delimiters
    for (int i = 0; i < s.length; i++) {
        if (s[i].length() > temporary_search.length()) {
            temporary_search = s[i];
        }
    }
    aedictSearchFor(temporary_search); // do a normal search on the longest contiguous normal string
    // Get a list of the search results:
    String[] contains_X_search_results = getAllSearchResultsAsStringArray();
    // In a Java regex, ? and * are quantifiers rather than wildcards, so translate them first:
    // ? becomes . (exactly one character) and * becomes .* (any run of characters).
    // This assumes the query contains no other regex metacharacters.
    String example_regex = example_search.replace("?", ".").replace("*", ".*");
    for (int i = 0; i < contains_X_search_results.length; i++) {
        // String.matches() tests the whole string against the regex.
        if (contains_X_search_results[i].matches(example_regex)) {
            results_to_display.add(contains_X_search_results[i]);
        }
    }
    displayResults();
} else carryOnAsUsual(); |
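For anyone who wants to try the filtering step on its own, here is a minimal self-contained sketch of it. The class and method names (`WildcardFilter`, `toPattern`, `filter`) are made up for illustration and are not part of Aedict; the candidate list stands in for whatever the "contains X" search returns. Literal parts of the query are escaped with `Pattern.quote` so that characters such as `.` or `(` in a query are not treated as regex syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardFilter {
    /** Translates a wildcard query into a regex: '?' is one character, '*' is any run of characters. */
    public static Pattern toPattern(String query) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '?' || c == '*') {
                // Flush the literal text collected so far, escaped as a whole.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Keeps only the candidates that match the whole wildcard query. */
    public static List<String> filter(List<String> candidates, String query) {
        Pattern pattern = toPattern(query);
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (pattern.matcher(candidate).matches()) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical results of a "contains 電" search.
        List<String> candidates = Arrays.asList("電磁気学", "電気", "電学", "電子工学科");
        System.out.println(filter(candidates, "電??学"));
        System.out.println(filter(candidates, "電*学"));
    }
}
```

Note that under this translation `*` also matches an empty run, so "電*学" would accept "電学" as well as "電磁気学"; whether that is desirable is a design choice.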
Hmm, this is a good idea. Instead of searching for the longest string, I can perhaps search for all components in s[], say, 電 AND 学, and then filter out any entries not matching 電??学. Let me play with this a bit. |
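A rough sketch of that two-stage idea, under stated assumptions: the in-memory `containsAll` below only stands in for an index query such as 電 AND 学 (how Aedict actually queries Lucene is not shown in this thread), and the names `ComponentSearch`, `literalParts`, `containsAll`, and `search` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ComponentSearch {
    /** Non-empty literal pieces of the query, e.g. "電??学" gives ["電", "学"]. */
    public static List<String> literalParts(String query) {
        List<String> parts = new ArrayList<>();
        for (String part : query.split("[*?]")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    /** Stand-in for the index query: keeps entries that contain every literal part. */
    public static List<String> containsAll(List<String> entries, List<String> parts) {
        List<String> kept = new ArrayList<>();
        for (String entry : entries) {
            boolean ok = true;
            for (String part : parts) {
                if (!entry.contains(part)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                kept.add(entry);
            }
        }
        return kept;
    }

    /** Narrows by literal parts first, then applies the wildcard pattern to the survivors. */
    public static List<String> search(List<String> entries, String query) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            if (cp == '?') {
                regex.append('.');
            } else if (cp == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(new String(Character.toChars(cp))));
            }
            i += Character.charCount(cp);
        }
        Pattern pattern = Pattern.compile(regex.toString(), Pattern.DOTALL);
        List<String> result = new ArrayList<>();
        for (String entry : containsAll(entries, literalParts(query))) {
            if (pattern.matcher(entry).matches()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("電磁気学", "電学", "学電気", "電気");
        System.out.println(search(entries, "電??学"));
    }
}
```

The first stage cannot check order or spacing (it would keep "学電気" for the query above), which is why the regex pass is still needed afterwards.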
Well, the motivation for using the longest string was just to minimize the size of the last for-loop. You could use any non-wildcard component of the query to perform a search, and then only show those that pass the .matches() test; however, I imagine some characters will lead to a very long list of results to filter. |
The regexp matching is inevitable anyway; to trim down the search results, it is good to include all strings in the query I believe. |
If you're saying that the aedict query function can accept multiple arguments in the query to narrow down the results list, then that's great, and indeed removes the need for searching for the longest string. Otherwise, what I mean is that (for example) if the query was "あとう", then using the longest string "とう" is likely to have fewer results than "あ"; thus, when one does get as far as running .matches("あとう"), there will be fewer calls to .matches(). That said, I'm not sure whether the speedup would be noticeable or not. |
Implemented in Aedict 3.37 |
Some tips to use the wildcards:
|
Original issue reported on code.google.com by
jebj...@gmail.com
on 8 Mar 2015 at 2:40