
Retrieve input name or input path #16

Closed
ghost opened this issue Sep 21, 2016 · 14 comments

ghost commented Sep 21, 2016

I'm searching through two .txt files for words. Is there any way to retrieve, in addition to the matching row, the name of the file it came from?
Currently I am running the job like this:

 hadoop jar solr-hadoop-job-2.2.5.jar \
    com.lucidworks.hadoop.ingest.IngestJob \
    -Dlww.commit.on.close=true -DcsvDelimiter= \
    -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c spyros1 \
    -i /usr/local/hadoop/input \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
    -s http://127.0.1.1:8983/solr

Is this somehow possible?

@ctargett

I'm not sure I understand what you would like to see. Are you saying you just want to see the row number and file name?


ghost commented Sep 21, 2016

@ctargett Yes, I also want to see the name of the file the results came from.


acesar commented Sep 21, 2016

I'm searching through 2 txt files for words.

You are ingesting the two .txt files and then querying Solr to do the search, correct?

Are you searching for a specific word?

You can try the RegexIngestMapper.

  • The com.lucidworks.hadoop.ingest.RegexIngestMapper.regex argument is a regular expression; in the example, "\\w+" will add every matched word as a field value.
   hadoop jar solr-hadoop-job-2.2.5.jar com.lucidworks.hadoop.ingest.IngestJob \
    -Dlww.commit.on.close=true \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.regex="\\w+" \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.groups_to_fields=0=match_s \
    -cls com.lucidworks.hadoop.ingest.RegexIngestMapper \
    -c collection1 -i /path/* -zk 127.0.0.1:2181/solr \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat

The resulting Solr document:

id=/path/file-name.txt-<ROW>, match_s=[wordA, wordB, wordC], path=/path/file-name.txt
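As a rough illustration (plain Python, not the mapper's actual code), this is the kind of per-row extraction the "\\w+" regex performs: one document per row, where every regex match becomes one value of a multivalued field, as in the example output above. The file contents and field names here are hypothetical.

```python
import re

# Hypothetical contents standing in for one ingested .txt file.
rows = [
    "wordA wordB wordC",
    "the quick brown fox",
]

# The same pattern passed via RegexIngestMapper.regex in the command above.
pattern = re.compile(r"\w+")

# One document per row: every match becomes a value of the multivalued
# field (match_s in the example), and the row number goes into the id.
docs = [
    {"id": f"/path/file-name.txt-{row_num}", "match": pattern.findall(line)}
    for row_num, line in enumerate(rows, start=1)
]

for doc in docs:
    print(doc["id"], doc["match"])
```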


ghost commented Sep 21, 2016

@acesar When I try your command with -zk I get this message:

Solr server not available on: http://127.0.1.1:9983/solr
Make sure that collection [spyros44] exists

[screenshot: solr]

When I use -s http://127.0.1.1:8983/solr everything runs fine, but no data is sent to Solr...
I created the collections with: bin/solr -e cloud
Any ideas?


acesar commented Sep 21, 2016

@SpyrosAv -zk is for the ZooKeeper connection string; your port is 9983.

everything runs fine, but no data is sent to Solr...

Can you share the output of the job?
Do you have a field named path in your Solr schema?


ghost commented Sep 21, 2016

@acesar I have the field path in my schema.
With this:
-zk http://127.0.1.1:9983/solr
[screenshots: solr1, sol2]

With this:
-s http://127.0.1.1:8983/solr
[screenshots: solr4, solr5]


acesar commented Sep 21, 2016

@SpyrosAv

-zk http://127.0.1.1:9983/solr

The ZooKeeper string should be -zk 127.0.1.1:9983. But if you only have one Solr node, it should behave the same as using the -s option.

The output seems to be correct.

It has a DOCS_ADDED value of 12761, so you should have those docs in Solr.

Go to http://127.0.1.1:8983/solr and look at the collection.


ghost commented Sep 21, 2016

@acesar In this case I have two nodes, but I have tried with one as well.
Searching for the word "book":
[screenshot: quer1]

No data retrieved.


acesar commented Sep 21, 2016

@SpyrosAv Can you try again with the field match_ss?

 hadoop jar solr-hadoop-job-2.2.5.jar com.lucidworks.hadoop.ingest.IngestJob \
    -Dlww.commit.on.close=true \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.regex="\\w+" \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.groups_to_fields=0=match_ss \
    -cls com.lucidworks.hadoop.ingest.RegexIngestMapper \
    -c collection1 -i /path/* -zk 127.0.0.1:9983 \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat

Can you first search with *:* to see how many docs you have?

The next step should be to search with match_ss:book
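As a sketch of the two suggested queries (assuming Solr's standard /select request handler, and the host, port, and collection name mentioned earlier in the thread), the request URLs could be built like this:

```python
from urllib.parse import urlencode

# Base select handler for the collection; host/port/collection are taken
# from the thread and are assumptions, not verified values.
base = "http://127.0.1.1:8983/solr/collection1/select"

# 1) Match-all query to count how many documents were indexed.
count_url = base + "?" + urlencode({"q": "*:*", "rows": 0})

# 2) Search the multivalued match_ss field for the word "book".
book_url = base + "?" + urlencode({"q": "match_ss:book"})

print(count_url)
print(book_url)
```

The numFound value in the first response tells you whether ingestion worked at all, before you debug the field-level query.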


ghost commented Sep 21, 2016

@acesar match_ss worked!!!
Thanks for your help and your time!!!


acesar commented Sep 21, 2016

Sure np!

@acesar acesar closed this as completed Sep 21, 2016

ghost commented Sep 21, 2016

@acesar If you could provide some information on why match_ss worked and match_s didn't, I would appreciate it.


acesar commented Sep 22, 2016

@SpyrosAv The RegexIngestMapper adds the matches from the regex to an array of values (multivalued in Solr). A dynamic multivalued string field in Solr has the suffix _ss by default, while _s is single-valued.
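For reference, this is roughly what the relevant dynamic field rules look like in a default Solr managed-schema (exact definitions vary by Solr version, so treat this as a sketch):

```xml
<!-- single-valued dynamic string field: matches e.g. match_s -->
<dynamicField name="*_s"  type="string"  indexed="true" stored="true"/>
<!-- multivalued dynamic string field: matches e.g. match_ss -->
<dynamicField name="*_ss" type="strings" indexed="true" stored="true" multiValued="true"/>
```

Since the mapper emits an array of matches, only a field matching a multivalued rule like *_ss can accept all of them.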


ghost commented Oct 7, 2016

@acesar Which IngestMapper should I use for .pdf or .doc files? Is GrokIngestMapper the right one?
Also, is there any documentation for the project that would help me figure out queries like this?
Thanks in advance.
(I am not opening a new issue, but if you feel one is needed just tell me...)
