NullPointerException thrown from MultithreadedMapper doing a forest import. #4

Closed
jxchen-us opened this issue Jul 20, 2016 · 12 comments

Comments

@jxchen-us
Contributor

mlcp-9.0/bin/mlcp.sh import -username admin -password admin -host jchen -port 5275 -input_file_type forest -input_file_path /space/projects/head/xdmp/src/Data2/Forests/Documents

16/07/20 15:23:01 ERROR contentpump.MultithreadedMapper:
java.lang.NullPointerException
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:285)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:376)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I have a forest with properties fragments. Not sure why it's not a problem with other errors. This seems like something we are not covering in regression tests.

@sravanr

sravanr commented Sep 1, 2016

I still see this issue happening with the latest mlcp.zip from the QA directory.

Here is the stack trace:

/space/Head/qa/mlcp/mlcp-9.0/bin/mlcp.sh IMPORT -input_file_path /var/opt/MarkLogic/Forests/mlcp-f15a -input_file_type forest -host localhost -port 5275 -username admin -password admin -output_collections A -mode local
16/08/31 19:31:40 INFO contentpump.LocalJobRunner: Content type is set to MIXED. The format of the inserted documents will be determined by the MIME type specification configured on MarkLogic Server.
16/08/31 19:31:41 INFO contentpump.ContentPump: Job name: local_1755926875_1
16/08/31 19:31:41 INFO input.FileInputFormat: Total input paths to process : 5
16/08/31 19:31:41 ERROR mapreduce.ForestReader: Unexpected error occurred reading forest data
java.lang.NullPointerException
at com.marklogic.tree.CompressedTreeDecoder.decode(CompressedTreeDecoder.java:503)
at com.marklogic.mapreduce.ForestReader.getNextTree(ForestReader.java:375)
at com.marklogic.mapreduce.ForestReader.nextKeyValue(ForestReader.java:159)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/08/31 19:31:41 INFO contentpump.LocalJobRunner: completed 100%
16/08/31 19:31:41 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
16/08/31 19:31:41 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 8
16/08/31 19:31:41 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 8
16/08/31 19:31:41 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 8
16/08/31 19:31:41 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/08/31 19:31:41 INFO contentpump.LocalJobRunner: Total execution time: 0 sec

I have a test, mlcp-da-test15.xml, which covers this scenario.

@sravanr sravanr assigned mattsunsjf and unassigned sravanr Sep 1, 2016
@sravanr sravanr added new and removed test labels Sep 1, 2016
@jxchen-us
Contributor Author

Sravan,

Please file a separate bug. This seems to be a different stack. I'll take a look.

Jane
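
One way to confirm that two reports carry different stacks is to compare the top frame of each trace. The following is a minimal sketch in Python, assuming the frame format shown in the traces in this thread; `top_frame` is a hypothetical helper name and is not part of mlcp.

```python
import re

# Matches a Java stack frame such as
# "at com.example.Foo$Bar.method(Foo.java:123)".
FRAME_RE = re.compile(
    r"^\s*at\s+([\w.$]+)\.([\w$<>]+)\(([\w$]+\.java):(\d+)\)\s*$"
)


def top_frame(trace):
    """Return (class, method, file, line) for the first frame, or None."""
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            cls, method, source_file, line_no = match.groups()
            return cls, method, source_file, int(line_no)
    return None


# First frames copied from the two reports in this thread.
original = """java.lang.NullPointerException
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:285)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)"""

later = """java.lang.NullPointerException
at com.marklogic.tree.CompressedTreeDecoder.decode(CompressedTreeDecoder.java:503)
at com.marklogic.mapreduce.ForestReader.getNextTree(ForestReader.java:375)"""

print(top_frame(original))
print(top_frame(later))
print("same failure point:", top_frame(original) == top_frame(later))
```

Comparing only the top frame is a rough heuristic: it separates these two reports, but the same root cause can surface at different frames.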

@jxchen-us
Contributor Author

Your forest is generated from an 8.0 database?

@mattsunsjf mattsunsjf added test and removed new labels Sep 1, 2016
@mattsunsjf mattsunsjf assigned sravanr and unassigned mattsunsjf Sep 1, 2016
@jxchen-us
Contributor Author

If this works against a 9.0 forest, file a separate bug; otherwise, you can reopen this bug and assign it to me.

@sravanr

sravanr commented Sep 1, 2016

I ran the test against a 9.0 trunk build. Apart from properties, the forests also contain permissions and collections, if that makes a difference here.

@sravanr sravanr added the fix label Sep 1, 2016
@sravanr sravanr removed their assignment Sep 1, 2016
@jxchen-us jxchen-us added test and removed fix labels Sep 1, 2016
@jxchen-us jxchen-us assigned sravanr and unassigned jxchen-us Sep 1, 2016
@jxchen-us
Contributor Author

Don't understand what you mean.

When you ran it against a 9.0 forest, did you reproduce NullPointerException?

If not, what did you get after the import?

@sravanr sravanr added fix and removed test labels Sep 1, 2016
@sravanr sravanr assigned sravanr and jxchen-us and unassigned sravanr Sep 1, 2016
@sravanr

sravanr commented Sep 1, 2016

I ran the test mlcp-da-test15.xml against the 9.0 nightly and found mlcp throwing the error below:
16/09/01 11:29:17 ERROR mapreduce.ForestReader: Unexpected error occurred reading forest data
java.lang.NullPointerException
at com.marklogic.tree.CompressedTreeDecoder.decode(CompressedTreeDecoder.java:503)
at com.marklogic.mapreduce.ForestReader.getNextTree(ForestReader.java:375)
at com.marklogic.mapreduce.ForestReader.nextKeyValue(ForestReader.java:159)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at com.marklogic.contentpump.MultithreadedMapper.run(MultithreadedMapper.java:215)
at com.marklogic.contentpump.LocalJobRunner$LocalMapTask.call(LocalJobRunner.java:378)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

mlcp-da-test14.xml is a new test that I wrote yesterday based on the bug description. After running the mlcp command, I can see documents being imported but no properties.

The command that I ran:

/space/Head/qa/mlcp/marklogic-contentpump/bin/mlcp.sh IMPORT -input_file_path /var/opt/MarkLogic/Forests/mlcp-f15a -input_file_type forest -host localhost -port 5275 -username admin -password admin -output_collections A -output_uri_prefix /Test1 -mode local

Please find more details in the test.

@jxchen-us
Contributor Author

I tried against both a 9.0 and an 8.0 forest. Both worked. Please give me access to the forest which failed for you. Make sure the forest is offline when you copy it.

@sravanr

sravanr commented Sep 2, 2016

I am setting this up dynamically. Here are my steps:

  1. Create 2 forests f1 and f2 and attach f1 to DB
  2. Load documents with permissions, collections, properties and naked properties into DB
  3. Now detach the forest from the database
  4. Attach f2 to the DB
  5. Now run mlcp command, which throws null pointer exception

Am I missing anything here? Am I supposed to add another step to set the forest offline? Is detaching it from the database not enough?
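
Step 5 above can be scripted so that the import is run the same way each time. This is a minimal sketch in Python that only assembles the argument list from the flags quoted earlier in this thread; `build_import_command` is a hypothetical helper, and steps 1 to 4 are left out because the admin tooling they need is not shown here.

```python
import shlex


def build_import_command(mlcp, forest_path, host="localhost", port=5275,
                         username="admin", password="admin",
                         collections="A", uri_prefix=None):
    """Assemble the mlcp forest-import argument list used in this thread."""
    args = [
        mlcp, "IMPORT",
        "-input_file_path", forest_path,
        "-input_file_type", "forest",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-output_collections", collections,
    ]
    # The URI prefix appears in only some of the commands in this thread.
    if uri_prefix is not None:
        args += ["-output_uri_prefix", uri_prefix]
    args += ["-mode", "local"]
    return args


cmd = build_import_command(
    "/space/Head/qa/mlcp/marklogic-contentpump/bin/mlcp.sh",
    "/var/opt/MarkLogic/Forests/mlcp-f15a",
    uri_prefix="/Test1",
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` to execute the import; printing it first makes it easy to compare against the command pasted in a bug report.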

@jxchen-us
Contributor Author

The above steps sound fine. But you need to give me access to the forest, so that I can reproduce. Basically with my forests, the NPE isn't reproducible.

@jxchen-us
Contributor Author

jxchen-us commented Sep 2, 2016

There's a test Sravan told me about: mlcp-da-test15. Thank you. Here's what I got:

[jchen@jchen-z620 mlcp]$ mlcp-9.0/bin/mlcp.sh IMPORT -input_file_path /var/opt/MarkLogic/Forests/mlcp-f15a -input_file_type forest -host localhost -port 5275 -username admin -password admin -output_collections A -output_uri_prefix /Test1 -mode local
16/09/02 11:33:48 INFO contentpump.LocalJobRunner: Content type is set to MIXED. The format of the inserted documents will be determined by the MIME type specification configured on MarkLogic Server.
16/09/02 11:33:48 INFO contentpump.ContentPump: Job name: local_1801332137_1
16/09/02 11:33:48 INFO input.FileInputFormat: Total input paths to process : 5
16/09/02 11:33:50 WARN contentpump.ImportDocumentMapper: Skipped record: () from /Test/mlcp-export-xmlquery-filter/text/2.txt in file:/var/opt/MarkLogic/Forests/mlcp-f15a/00000000/TreeData, reason: fragment or link
16/09/02 11:33:50 ERROR mapreduce.ForestReader: Unexpected error occurred reading forest data
java.lang.NullPointerException
at com.marklogic.tree.CompressedTreeDecoder.decode(CompressedTreeDecoder.java:487)
at com.marklogic.mapreduce.ForestReader.getNextTree(ForestReader.java:367)
at com.marklogic.mapreduce.ForestReader.nextKeyValue(ForestReader.java:153)
at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:444)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:379)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: completed 100%
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 8
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 7
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 7
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: Total execution time: 1 sec
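
Note that the run above reports "completed 100%" and OUTPUT_RECORDS_FAILED: 0 even though an ERROR was logged and INPUT_RECORDS and OUTPUT_RECORDS differ. A regression test could check for this automatically. The following is a minimal sketch in Python, assuming the log line format shown in this thread; `summarize` is a hypothetical helper and the sample is an excerpt of the output above.

```python
import re

# Counter lines look like "... contentpump.LocalJobRunner: INPUT_RECORDS: 8".
COUNTER_RE = re.compile(r"LocalJobRunner: ([A-Z_]+): (\d+)\s*$")
# Log lines start with a date, a time, then the level.
LEVEL_RE = re.compile(r"^\S+ \S+ (ERROR|WARN) ")


def summarize(log_text):
    """Return (counters, problems) parsed from mlcp console output."""
    counters = {}
    problems = []
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
        elif LEVEL_RE.match(line):
            problems.append(line)
    return counters, problems


SAMPLE_LOG = """\
16/09/02 11:33:50 WARN contentpump.ImportDocumentMapper: Skipped record: () from /Test/mlcp-export-xmlquery-filter/text/2.txt in file:/var/opt/MarkLogic/Forests/mlcp-f15a/00000000/TreeData, reason: fragment or link
16/09/02 11:33:50 ERROR mapreduce.ForestReader: Unexpected error occurred reading forest data
java.lang.NullPointerException
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: completed 100%
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 8
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 7
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 7
16/09/02 11:33:50 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
"""

counters, problems = summarize(SAMPLE_LOG)
not_written = counters["INPUT_RECORDS"] - counters["OUTPUT_RECORDS"]
print("counters:", counters)
print("records read but not written:", not_written)
for line in problems:
    print("logged problem:", line)
```

A test that asserts only on OUTPUT_RECORDS_FAILED would pass this run, so checking the logged ERROR lines alongside the counters seems the safer choice.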

jxchen-us added a commit that referenced this issue Sep 4, 2016
jxchen-us added a commit that referenced this issue Sep 4, 2016
jxchen-us added a commit that referenced this issue Sep 5, 2016
@jxchen-us jxchen-us added test and removed fix labels Sep 5, 2016
@jxchen-us jxchen-us assigned sravanr and unassigned jxchen-us Sep 5, 2016
@sravanr sravanr modified the milestones: 9.0-ea4, 9.0-ea3 Sep 7, 2016
@sravanr sravanr added ship and removed test labels Sep 8, 2016
@sravanr

sravanr commented Sep 8, 2016

Verified on the latest build and updated the key. Looks good to me.

@sravanr sravanr closed this as completed Sep 8, 2016