Storage labs #2

Open
rgaleanog opened this issue Nov 15, 2016 · 4 comments
@rgaleanog (Owner) commented:

No description provided.

@rgaleanog rgaleanog added this to the Labs milestone Nov 15, 2016
@rgaleanog rgaleanog self-assigned this Nov 15, 2016
@rgaleanog (Owner, Author) commented:

Completed 'HDFS Lab: Test HDFS performance'.
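The issue doesn't record the commands used for this lab; a common way to benchmark HDFS read/write throughput is the TestDFSIO tool shipped in Hadoop's MapReduce client test jar. The jar path below is an assumption (CDH parcel layout, matching the cluster in this thread); locate yours with `find / -name 'hadoop-mapreduce-client-jobclient*tests.jar'`.

```shell
# Jar path is distribution-specific (CDH parcel layout assumed here)
JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar

# Write test: 10 files of 128 MB each; throughput is reported at the end
hadoop jar "$JAR" TestDFSIO -write -nrFiles 10 -fileSize 128MB

# Read the same files back to measure read throughput
hadoop jar "$JAR" TestDFSIO -read -nrFiles 10 -fileSize 128MB

# Remove the benchmark data from HDFS when done
hadoop jar "$JAR" TestDFSIO -clean
```

TestDFSIO also appends its results to a local `TestDFSIO_results.log` file, which is handy for comparing runs.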

@rgaleanog (Owner, Author) commented:

'HDFS Lab: Replicate to another cluster'

I need a partner in the same subnet in AWS. The cluster's instances are in Frankfurt, in Availability Zone eu-central-1a.

@rgaleanog (Owner, Author) commented:

I tried to run distcp to a partner's cluster, but I get this error when writing data into it:

hadoop distcp hdfs:///user/rgaleanog/teragenDiscp hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/.
16/11/16 07:15:27 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs:/user/rgaleanog/teragenDiscp], targetPath=hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/, targetPathExists=true, filtersFile='null'}
16/11/16 07:15:27 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/11/16 07:15:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/11/16 07:15:27 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 3; dirCnt = 1
16/11/16 07:15:27 INFO tools.SimpleCopyListing: Build file listing completed.
16/11/16 07:15:27 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
16/11/16 07:15:27 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
16/11/16 07:15:27 INFO tools.DistCp: Number of paths in the copy list: 3
16/11/16 07:15:27 INFO tools.DistCp: Number of paths in the copy list: 3
16/11/16 07:15:27 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/11/16 07:15:27 INFO mapreduce.JobSubmitter: number of splits:1
16/11/16 07:15:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1703093952_0001
16/11/16 07:15:27 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/16 07:15:27 INFO tools.DistCp: DistCp job-id: job_local1703093952_0001
16/11/16 07:15:27 INFO mapreduce.Job: Running job: job_local1703093952_0001
16/11/16 07:15:27 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/16 07:15:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/16 07:15:28 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.tools.mapred.CopyCommitter
16/11/16 07:15:28 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/16 07:15:28 INFO mapred.LocalJobRunner: Starting task: attempt_local1703093952_0001_m_000000_0
16/11/16 07:15:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/16 07:15:28 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/11/16 07:15:28 INFO mapred.MapTask: Processing split: file:/tmp/hadoop-rgaleanog/mapred/staging/rgaleanog2133302670/.staging/_distcp-312733630/fileList.seq:0+701
16/11/16 07:15:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/16 07:15:28 INFO mapred.CopyMapper: Copying hdfs://ip-172-31-25-141.eu-central-1.compute.internal:8020/user/rgaleanog/teragenDiscp to hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/teragenDiscp
16/11/16 07:15:28 INFO mapred.CopyMapper: Copying hdfs://ip-172-31-25-141.eu-central-1.compute.internal:8020/user/rgaleanog/teragenDiscp/_SUCCESS to hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/teragenDiscp/_SUCCESS
16/11/16 07:15:28 INFO mapred.CopyMapper: Skipping copy of hdfs://ip-172-31-25-141.eu-central-1.compute.internal:8020/user/rgaleanog/teragenDiscp/_SUCCESS to hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/teragenDiscp/_SUCCESS
16/11/16 07:15:28 INFO mapred.CopyMapper: Copying hdfs://ip-172-31-25-141.eu-central-1.compute.internal:8020/user/rgaleanog/teragenDiscp/part-m-00000 to hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/teragenDiscp/part-m-00000
16/11/16 07:15:28 INFO mapred.RetriableFileCopyCommand: Creating temp file: hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/.distcp.tmp.attempt_local1703093952_0001_m_000000_0
16/11/16 07:15:28 INFO mapreduce.Job: Job job_local1703093952_0001 running in uber mode : false
16/11/16 07:15:28 INFO mapreduce.Job:  map 0% reduce 0%
16/11/16 07:15:34 INFO mapred.LocalJobRunner: 1.1% Copying hdfs://ip-172-31-25-141.eu-central-1.compute.internal:8020/user/rgaleanog/teragenDiscp/part-m-00000 to hdfs://ec2-52-3-88-79.compute-1.amazonaws.com:8022/tmp/teragenDiscp/part-m-00000 [5.1M/476.8M] > map
16/11/16 07:15:34 INFO mapreduce.Job:  map 100% reduce 0%
16/11/16 07:16:29 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.0.0.250:50010]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1850)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1593)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1546)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:688)
16/11/16 07:16:29 WARN hdfs.DFSClient: Abandoning BP-91629980-10.0.0.253-1479262320120:blk_1073745893_5069
16/11/16 07:16:29 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[10.0.0.250:50010,DS-9b762b7f-ee37-4c0b-b457-6c9854709001,DISK]
16/11/16 07:17:29 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.0.0.251:50010]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1850)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1593)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1546)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:688)
16/11/16 07:17:29 WARN hdfs.DFSClient: Abandoning BP-91629980-10.0.0.253-1479262320120:blk_1073745897_5073
16/11/16 07:17:29 WARN hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[10.0.0.251:50010,DS-21fe7e75-5a22-4a14-926b-33f095c42c9c,DISK]
16/11/16 07:18:29 INFO hdfs.DFSClient: Exception in createBlockOutputStream

DistCp creates the directory inside the partner's /tmp directory, but the job fails after the map phase reaches 100%. The ConnectTimeoutException entries above show the client trying to write blocks directly to the remote datanodes' private addresses (10.0.0.250, 10.0.0.251), which are not reachable from our cluster.

@rgaleanog (Owner, Author) commented:

It's working now. We had to enable the dfs.client.use.datanode.hostname property and add the remote nodes' IPs to /etc/hosts on the local cluster.
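For reference, a sketch of the fix described above. The property name comes from the comment; the file locations and the example /etc/hosts entry are assumptions to adapt to your own clusters. In the HDFS client configuration (hdfs-site.xml) on the local cluster:

```
<!-- hdfs-site.xml on the local (source) cluster's client nodes -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <!-- Connect to remote datanodes by hostname instead of the
       private IP that the remote NameNode reports -->
</property>
```

Then map each remote datanode hostname to a routable IP in /etc/hosts on every local node, for example (hypothetical entry, using the public hostname seen in the log above):

```
# /etc/hosts on local cluster nodes (example entry)
52.3.88.79   ec2-52-3-88-79.compute-1.amazonaws.com
```

With hostname-based connections, name resolution on the client side decides which address is used, so the local /etc/hosts entries steer block writes to the reachable public IPs.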

2 participants