
Retrieve input name or input path #16

Closed
ghost opened this issue Sep 21, 2016 · 14 comments

ghost commented Sep 21, 2016

I'm searching through two .txt files for words. Is there any way to retrieve, in addition to the matching row, the name of the file it came from?
Currently I am running the job like this:

 hadoop jar solr-hadoop-job-2.2.5.jar \
    com.lucidworks.hadoop.ingest.IngestJob \
    -Dlww.commit.on.close=true -DcsvDelimiter= \
    -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c spyros1 \
    -i /usr/local/hadoop/input \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat \
    -s http://127.0.1.1:8983/solr

Is this somehow possible?

@ctargett

I'm not sure I understand what you would like to see. Are you saying you just want to see the row number and file name?


ghost commented Sep 21, 2016

@ctargett Yes, I also want to see the name of the file the results came from.


acesar commented Sep 21, 2016

I'm searching through 2 txt files for words.

You are ingesting the two .txt files and then querying Solr to do the search, correct?

Are you searching for a specific word?

You can try the RegexIngestMapper.

  • The com.lucidworks.hadoop.ingest.RegexIngestMapper.regex argument is a regular expression; in the example, "\\w+" will add every matched word as a field value.
   hadoop jar solr-hadoop-job-2.2.5.jar com.lucidworks.hadoop.ingest.IngestJob \
    -Dlww.commit.on.close=true \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.regex="\\w+" \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.groups_to_fields=0=match_s \
    -cls com.lucidworks.hadoop.ingest.RegexIngestMapper \
    -c collection1 -i /path/* -zk 127.0.0.1:2181/solr \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat

The resulting Solr document:

id=/path/file-name.txt-<ROW>, match_s=[wordA, wordB, wordC], path=/path/file-name.txt
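As a rough illustration (plain Python, not the mapper's actual code), this is the kind of per-row extraction the "\\w+" regex performs: one document per row, where every regex match becomes one value of a multivalued field, as in the example output above. The file contents and field names here are hypothetical.

```python
import re

# Hypothetical contents standing in for one ingested .txt file.
rows = [
    "wordA wordB wordC",
    "the quick brown fox",
]

# The same pattern passed via RegexIngestMapper.regex in the command above.
pattern = re.compile(r"\w+")

# One document per row: every match becomes a value of the multivalued
# field (match_s in the example), and the row number goes into the id.
docs = [
    {"id": f"/path/file-name.txt-{row_num}", "match": pattern.findall(line)}
    for row_num, line in enumerate(rows, start=1)
]

for doc in docs:
    print(doc["id"], doc["match"])
```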


ghost commented Sep 21, 2016

@acesar When I try your command with -zk I get this message:

Solr server not available on: http://127.0.1.1:9983/solr
Make sure that collection [spyros44] exists

[screenshot: solr]

When I use -s http://127.0.1.1:8983/solr everything runs fine, but no data is sent to Solr...
I created the collections with: bin/solr -e cloud
Any ideas?


acesar commented Sep 21, 2016

@SpyrosAv -zk is for the ZooKeeper connection string; your port is 9983.

everything runs fine, but no data is sent to Solr...

Can you share the output of the job?
Do you have a field named path in your Solr schema?


ghost commented Sep 21, 2016

@acesar I have the field path in my schema.
With this:
-zk http://127.0.1.1:9983/solr
[screenshots: solr1, sol2]

With this:
-s http://127.0.1.1:8983/solr
[screenshots: solr4, solr5]


acesar commented Sep 21, 2016

@SpyrosAv

-zk http://127.0.1.1:9983/solr

The ZooKeeper string should be -zk 127.0.1.1:9983. But if you only have one Solr node, it should behave the same as using the -s option.

The output seems to be correct.

It has a DOCS_ADDED value of 12761, so you should have those docs in Solr.

Go to http://127.0.1.1:8983/solr and look at the collection.


ghost commented Sep 21, 2016

@acesar In this case I have two nodes, but I have tried with one as well.
Searching for the word "book":
[screenshot: quer1]

No data retrieved.


acesar commented Sep 21, 2016

@SpyrosAv Can you try again with the field match_ss?

 hadoop jar solr-hadoop-job-2.2.5.jar com.lucidworks.hadoop.ingest.IngestJob \
    -Dlww.commit.on.close=true \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.regex="\\w+" \
    -Dcom.lucidworks.hadoop.ingest.RegexIngestMapper.groups_to_fields=0=match_ss \
    -cls com.lucidworks.hadoop.ingest.RegexIngestMapper \
    -c collection1 -i /path/* -zk 127.0.0.1:9983 \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat

Can you first search with *:* to see how many docs you have?

The next step should be to search with match_ss:book
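As a sketch of the two suggested queries (assuming Solr's standard /select request handler, and the host, port, and collection name mentioned earlier in the thread), the request URLs could be built like this:

```python
from urllib.parse import urlencode

# Base select handler for the collection; host/port/collection are taken
# from the thread and are assumptions, not verified values.
base = "http://127.0.1.1:8983/solr/collection1/select"

# 1) Match-all query to count how many documents were indexed.
count_url = base + "?" + urlencode({"q": "*:*", "rows": 0})

# 2) Search the multivalued match_ss field for the word "book".
book_url = base + "?" + urlencode({"q": "match_ss:book"})

print(count_url)
print(book_url)
```

The numFound value in the first response tells you whether ingestion worked at all, before you debug the field-level query.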


ghost commented Sep 21, 2016

@acesar match_ss worked!!!
Thanks for your help and your time!!!


acesar commented Sep 21, 2016

Sure np!

@acesar acesar closed this as completed Sep 21, 2016

ghost commented Sep 21, 2016

@acesar If you could provide some information on why match_ss worked and match_s didn't, I would appreciate it.


acesar commented Sep 22, 2016

@SpyrosAv The RegexIngestMapper adds the matches from the regex to an array of values (multivalued in Solr). A dynamic multivalued string field in Solr has the suffix _ss by default, while _s is single-valued.
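For reference, this is roughly what the relevant dynamic field rules look like in a default Solr managed-schema (exact definitions vary by Solr version, so treat this as a sketch):

```xml
<!-- single-valued dynamic string field: matches e.g. match_s -->
<dynamicField name="*_s"  type="string"  indexed="true" stored="true"/>
<!-- multivalued dynamic string field: matches e.g. match_ss -->
<dynamicField name="*_ss" type="strings" indexed="true" stored="true" multiValued="true"/>
```

Since the mapper emits an array of matches, only a field matching a multivalued rule like *_ss can accept all of them.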


ghost commented Oct 7, 2016

@acesar Which IngestMapper should I use for .pdf or .doc files? Is GrokIngestMapper the right one?
Also, is there any documentation for the project that would help me figure out queries like this?
Thanks in advance.
(I am not opening a new issue, but if you feel one is needed just tell me...)
