
Unable to load more than 50 GB of data into HDFS through the tpcds script #14

Closed
ktania opened this issue Feb 23, 2017 · 8 comments

ktania commented Feb 23, 2017

Hi,
I am running HAWQ TPC-DS benchmarks. I have a cluster with 1 master (512 GB RAM) and 6 segments (130 GB RAM per segment). I am using the tpcds.sh script to load data into HDFS. I am able to run the tests successfully up to 50 GB, but beyond that I am getting many errors in segmentdd/pg_log. The log is attached.

_"3rd party error log:
2017-02-21 13:39:21.899176, p205200, th139795568425216, ERROR Pipeline: Failed to build pipeline for block [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832818_272724] file /hawq_default/16385/16508/18823/11, new generation stamp is 0,
Pipeline.cpp: 240: HdfsIOException: Failed to add new datanode into pipeline for block: [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832818_272724] file /hawq_default/16385/16508/18823/11, set ""output.replace-datanode-on-failure"" to ""false"" to disable this feature.
@ Hdfs::Internal::PipelineImpl::
hawq-2017-02-21_130521.txt****

buildForAppendOrRecovery(bool)
@ Hdfs::Internal::PipelineImpl::send(std::shared_ptrHdfs::Internal::Packet)
@ Hdfs::Internal::OutputStreamImpl::sendPacket(std::shared_ptrHdfs::Internal::Packet)
@ Hdfs::Internal::OutputStreamImpl::appendInternal(char const*, long)
@ Hdfs::Internal::OutputStreamImpl::append(char const*, long)
@ hdfsWrite**
--------------------------------------------------_**
"3rd party error log:
2017-02-21 13:39:21.977826, p205198, th139795568425216, ERROR Failed to flush pipeline on datanode ILDSS8(192.168.4.12) for block [block pool ID: BP-1649228503-10.200.6.49-1487239694963 block ID 1073832864_273847] file /hawq_default/16385/16508/18775/10.
TcpSocket.cpp: 69: HdfsNetworkException: Read 8 bytes failed from ""192.168.4.12:50010"": (errno: 104) Connection reset by peer
@ Hdfs::Internal::TcpSocketImpl::read(char*, int)
@ Hdfs::Internal::BufferedSocketReaderImpl::readVarint32(int, int)
@ Hdfs::Internal::PipelineImpl::processResponse()
@ Hdfs::Internal::PipelineImpl::checkResponse(bool)
@ Hdfs::Internal::PipelineImpl::waitForAcks(bool)
@ Hdfs::Internal::OutputStreamImpl::flushInternal(bool)
@ Hdfs::Internal::OutputStreamImpl::sync()

Do I need to set any parameters in the Hadoop or TPC-DS configuration files, or do my system parameters need tuning? I am not sure if I am missing anything. Please help.

Thanks,
Tania

hawq-2017-02-21_130521.txt

RunningJon (Owner) commented

HDFS can't keep up with the write load, which fails the datanode. Eventually this will make other nodes "fail" too, and then there won't be enough datanodes left to handle the insert, so it fails. To avoid this, you basically want a failed write to HDFS to retry rather than fail the node. That is why the error message from HAWQ is telling you to change output.replace-datanode-on-failure to false.

If you are using Ambari, go to HAWQ, Advanced, Advanced hdfs-client and set output.replace-datanode-on-failure to false (uncheck the box).

If you are not using Ambari, go to /usr/local/hawq/etc/ and change the hdfs-client.xml file so that output.replace-datanode-on-failure is set to false. Next, copy that new configuration file to every node in the cluster.
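
The non-Ambari edit should amount to a property stanza along these lines in /usr/local/hawq/etc/hdfs-client.xml (excerpt only; the rest of the file stays as shipped with your install):

```xml
<!-- hdfs-client.xml excerpt: retry the write instead of replacing the
     failed datanode in the pipeline, per the libhdfs3 error message. -->
<property>
  <name>output.replace-datanode-on-failure</name>
  <value>false</value>
</property>
```

After copying the file to every node, you will likely need to restart HAWQ for the client to pick up the change.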


ktania commented Feb 24, 2017

Thanks, Jon, for your quick response. I am not using Ambari, and I tried the changes you mentioned above. It made things better, but I can see a new set of exceptions in the Hadoop logs and pg_log. Probably Hadoop is failing to load the data. I am attaching the logs here for your reference. Thank you once again for your help.

hadoop-bigdata-datanode-ILDSS2_errors.txt
pg_log_ILDSS2_errors.txt

RunningJon (Owner) commented

Yes, you have something wrong in HDFS. Maybe it is a misconfiguration.

Review the settings here:
http://hdb.docs.pivotal.io/211/hdb/install/install-cli.html
http://hdb.docs.pivotal.io/211/hawq/requirements/system-requirements.html

Are you using hash distribution or random? This is set in the tpcds_variables.sh file. You should be using random with HAWQ. You could also then reduce the number of virtual segments, which would decrease the load on HDFS; this is done by changing hawq_rm_nvseg_perquery_perseg_limit from the default of 6 to 4.
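
As a rough sketch of both changes (the variable name in tpcds_variables.sh is assumed here, so verify it against your copy of the script; hawq config is one common way to change the GUC):

```sh
# tpcds_variables.sh (excerpt) -- assumed variable name; verify against the script
RANDOM_DISTRIBUTION="true"   # random distribution instead of hash distribution

# Drop the virtual segment limit from the default of 6 to 4 to reduce HDFS load,
# then restart so the new value takes effect.
hawq config -c hawq_rm_nvseg_perquery_perseg_limit -v 4
hawq restart cluster
```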


ktania commented Feb 27, 2017

I am using Random distribution. I will review the settings and rerun the test. Thanks a lot for your help.


ktania commented Mar 3, 2017

There were a few discrepancies in the system requirements. I have corrected those and it is loading the data now. Thanks Jon. :)

ktania closed this as completed Mar 3, 2017

ktania commented Mar 6, 2017

Hi, I am very surprised to see that when I give a scale factor of 1000 (1 TB), the data loaded into HDFS is only 55-56% of the scale factor (total DFS used 560 GB, total non-DFS 1254 GB).
Afterloading1TBdata.pdf

ktania reopened this Mar 6, 2017
RunningJon (Owner) commented

Compression! All tables are stored in Parquet format, and the medium and large tables are also compressed with Snappy. You can look at the size of the raw files stored on the POSIX filesystem to see how large the generated data is; it is located in the pivotalguru subdirectory of each segment directory. That should total 1 TB across all nodes.
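
For example, something along these lines would total the raw files across the cluster (the segment data directory and host names below are placeholders; adjust them to your layout):

```sh
# Placeholder sketch: sum the raw TPC-DS flat files on each segment host.
SEG_DATA_DIR=/data/hawq/segment      # example path -- use your segment data directory
for host in seg01 seg02 seg03 seg04 seg05 seg06; do
  ssh "$host" du -sh "${SEG_DATA_DIR}/pivotalguru"
done
```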


ktania commented Mar 7, 2017

Oh Sure! I missed that. Thanks for your help.

ktania closed this as completed Mar 7, 2017