notes on logging, thinking it might be helpful to have a http filesystem for doing a directory listing...? workflows need a method for getting/setting logdir
1 parent c308ef7 commit 791fc1bc60d06ff4ecb6969ae64e4ffd6ed32a4e @thedatachef thedatachef committed Feb 16, 2011
Showing with 20 additions and 0 deletions.
  1. +20 −0 notes.txt
@@ -0,0 +1,20 @@
+1. All output from the launched workflow should go to a workflow log file
+2. Hadoop output is special and should be pulled down from the jobtracker
+ - jobconf.xml
+ - job details page
+Workflow should specify a logdir, defaults to workdir + '/logs'
+Fetching hadoop job stats:
+1. Get job id
+2. Use curl to fetch the latest logs listing: "http://jobtracker:50030/logs/history/"
+3. Parse the logs listing and pull out the two urls we want (something-jobid.xml, something-jobid....)
+4. Fetch the two urls we care about and dump into the workflow's log dir.
+5. Possibly parse the results into an ongoing workflow-statistics.tsv file
+Other output:
+Output that would otherwise go to the terminal (nohup.out or some such) should be collected and dumped into the logdir as well.
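
A rough Python sketch of the logdir default and of fetch steps 2-4 from the notes above. Nothing here is from the commit itself: it assumes the history listing is an HTML page of links and that the wanted file names contain the job id, and it uses urllib from the standard library in place of curl. The names resolve_logdir, find_job_urls and fetch_job_logs are hypothetical.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

HISTORY_URL = "http://jobtracker:50030/logs/history/"  # URL from the notes


def resolve_logdir(workdir, logdir=None):
    """Return the explicit logdir, or workdir + '/logs' when none is given."""
    return logdir if logdir else os.path.join(workdir, "logs")


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_job_urls(listing_html, job_id, base_url=HISTORY_URL):
    """Return absolute URLs of listing entries whose href contains job_id.

    Assumption: a plain substring match is enough to pick out the job's files.
    """
    collector = _LinkCollector()
    collector.feed(listing_html)
    return [urljoin(base_url, h) for h in collector.hrefs if job_id in h]


def fetch_job_logs(job_id, logdir, base_url=HISTORY_URL):
    """Download each matching history file into logdir; return saved paths."""
    os.makedirs(logdir, exist_ok=True)
    with urlopen(base_url) as resp:
        listing = resp.read().decode("utf-8", errors="replace")
    saved = []
    for url in find_job_urls(listing, job_id, base_url):
        dest = os.path.join(logdir, os.path.basename(url))
        with urlopen(url) as resp, open(dest, "wb") as out:
            out.write(resp.read())
        saved.append(dest)
    return saved
```

A workflow might call fetch_job_logs(job_id, resolve_logdir(workdir)) once the job finishes. The parsing is kept separate from the network calls so it can be checked against a saved copy of the listing page.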
