All output files are written in a directory provided with option -o.
Sébastien Boisvert committed Sep 6, 2011
1 parent e5f0ac7 commit e77388e
Showing 32 changed files with 175 additions and 109 deletions.
57 changes: 30 additions & 27 deletions MANUAL_PAGE.txt
@@ -9,6 +9,9 @@ DESCRIPTION:
-help
Displays this help page.

-version
Displays Ray version and compilation options.

K-mer length

-k kmerLength
@@ -32,16 +35,16 @@ DESCRIPTION:

Outputs

-o outputPrefix
Specifies the prefix for outputted files.
-o outputDirectory
Specifies the directory for outputted files. Default is RayOutput

-amos
Writes the AMOS file called PREFIX.AMOS.afg
Writes the AMOS file called RayOutput/AMOS.afg
An AMOS file contains read positions on contigs.
Can be opened with software with graphical user interface.

-write-kmers
Writes k-mer graph to PREFIX.kmers.txt
Writes k-mer graph to RayOutput/kmers.txt
The resulting file is not utilised by Ray.
The resulting file is very large.

@@ -144,80 +147,80 @@ FILES

Scaffolds

PREFIX.Scaffolds.fasta
RayOutput/Scaffolds.fasta
The scaffold sequences in FASTA format
PREFIX.ScaffoldComponents.txt
RayOutput/ScaffoldComponents.txt
The components of each scaffold
PREFIX.ScaffoldLengths.txt
RayOutput/ScaffoldLengths.txt
The length of each scaffold
PREFIX.ScaffoldLinks.txt
RayOutput/ScaffoldLinks.txt
Scaffold links

Contigs

PREFIX.Contigs.fasta
RayOutput/Contigs.fasta
Contiguous sequences in FASTA format
PREFIX.ContigLengths.txt
RayOutput/ContigLengths.txt
The lengths of contiguous sequences

Summary

PREFIX.OutputNumbers.txt
RayOutput/OutputNumbers.txt
Overall numbers for the assembly

de Bruijn graph

PREFIX.CoverageDistribution.txt
RayOutput/CoverageDistribution.txt
The distribution of coverage values
PREFIX.CoverageDistributionAnalysis.txt
RayOutput/CoverageDistributionAnalysis.txt
Analysis of the coverage distribution
PREFIX.degreeDistribution.txt
RayOutput/degreeDistribution.txt
Distribution of ingoing and outgoing degrees
PREFIX.kmers.txt
RayOutput/kmers.txt
k-mer graph, required option: -write-kmers
The resulting file is not utilised by Ray.
The resulting file is very large.

Assembly steps

PREFIX.SeedLengthDistribution.txt
RayOutput/SeedLengthDistribution.txt
Distribution of seed length
PREFIX.<rank>.RaySeeds.fasta
RayOutput/<rank>.RaySeeds.fasta
Seed DNA sequences, required option: -write-seeds
PREFIX.<rank>.RayExtensions.fasta
RayOutput/<rank>.RayExtensions.fasta
Extension DNA sequences, required option: -write-extensions

Paired reads

PREFIX.LibraryStatistics.txt
RayOutput/LibraryStatistics.txt
Estimation of outer distances for paired reads
PREFIX.Library<LibraryNumber>.txt
RayOutput/Library<LibraryNumber>.txt
Frequencies for observed outer distances (insert size + read lengths)

Partition

PREFIX.NumberOfSequences.txt
RayOutput/NumberOfSequences.txt
Number of reads in each file
PREFIX.SequencePartition.txt
RayOutput/SequencePartition.txt
Sequence partition

Ray software

PREFIX.RayVersion.txt
RayOutput/RayVersion.txt
The version of Ray
PREFIX.RayCommand.txt
RayOutput/RayCommand.txt
The exact same command provided

AMOS

PREFIX.AMOS.afg
RayOutput/AMOS.afg
Assembly representation in AMOS format, required option: -amos

Communication

PREFIX.MessagePassingInterface.txt
RayOutput/MessagePassingInterface.txt
Number of messages sent
PREFIX.NetworkTest.txt
RayOutput/NetworkTest.txt
Latencies in microseconds
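
With this change, every file in the FILES list lives inside the directory given to -o instead of sharing a file-name prefix. A minimal sketch of how such a path could be composed is shown here. `outputPath` is a hypothetical helper, not part of Ray, and it assumes POSIX-style `/` separators.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Ray: joins the -o directory and a file name.
// Assumes '/' is the path separator.
std::string outputPath(const std::string& directory, const std::string& fileName) {
    assert(!fileName.empty());
    if (directory.empty()) {
        // no directory given: the file lands in the working directory
        return fileName;
    }
    if (directory[directory.size() - 1] == '/') {
        // the directory already ends with a separator
        return directory + fileName;
    }
    return directory + "/" + fileName;
}
```

For example, `outputPath("RayOutput", "Contigs.fasta")` and `outputPath("RayOutput/", "Contigs.fasta")` both yield `RayOutput/Contigs.fasta`.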

DOCUMENTATION
12 changes: 6 additions & 6 deletions README.md
@@ -99,7 +99,7 @@ Tested C++ compilers:
mpirun -np 512 Ray-Large-k-mers/Ray -k 63 -p lib1_1.fastq lib1_2.fastq \
-p lib2_1.fastq lib2_2.fastq -o DeadlyBug,Assembler=Ray,K=63
# wait
ls DeadlyBug,Assembler=Ray,K=63.Scaffolds.fasta
ls DeadlyBug,Assembler=Ray,K=63/Scaffolds.fasta

## Compilation options

@@ -116,14 +116,14 @@ see the Makefile for more.

To run Ray on paired reads:

mpirun -np 25 Ray -p lib1.left.fasta lib1.right.fasta -p lib2.left.fasta lib2.right.fasta -o prefix
ls prefix.Contigs.fasta
ls prefix.Scaffolds.fasta
ls prefix.*
mpirun -np 25 Ray -p lib1.left.fasta lib1.right.fasta -p lib2.left.fasta lib2.right.fasta -o RayOutput
ls RayOutput/Contigs.fasta
ls RayOutput/Scaffolds.fasta
ls RayOutput/

# Outputted files

PREFIX.Contigs.fasta and PREFIX.Scaffolds.fasta
RayOutput/Contigs.fasta and RayOutput/Scaffolds.fasta

type Ray -help for a full list

4 changes: 2 additions & 2 deletions code/assembler/Partitioner.cpp
@@ -83,7 +83,7 @@ void Partitioner::masterMethod(){
/* write the number of sequences */
ostringstream fileName;
fileName<<m_parameters->getPrefix();
fileName<<".NumberOfSequences.txt";
fileName<<"NumberOfSequences.txt";
ofstream f2(fileName.str().c_str());

f2<<"Files: "<<m_parameters->getNumberOfFiles()<<endl;
@@ -116,7 +116,7 @@ void Partitioner::masterMethod(){
/* write the partition */
ostringstream fileName2;
fileName2<<m_parameters->getPrefix();
fileName2<<".SequencePartition.txt";
fileName2<<"SequencePartition.txt";
ofstream f3(fileName2.str().c_str());
uint64_t perRank=totalSequences/m_parameters->getSize();
f3<<"#Rank FirstSequence LastSequence NumberOfSequences"<<endl;
2 changes: 1 addition & 1 deletion code/assembler/SeedExtender.cpp
@@ -105,7 +105,7 @@ int minimumCoverage,OpenAssemblerChooser*oa,bool*edgesReceived,int*m_mode){
/** write extensions for debugging purposes */
if(m_parameters->hasOption("-write-extensions")){
ostringstream fileName;
fileName<<m_parameters->getPrefix()<<"."<<m_parameters->getRank()<<".RayExtensions.fasta";
/* the manual names this file <directory>/<rank>.RayExtensions.fasta */
fileName<<m_parameters->getPrefix()<<m_parameters->getRank()<<".RayExtensions.fasta";
ofstream f(fileName.str().c_str());
for(int i=0;i<(int)ed->m_EXTENSION_identifiers.size();i++){
uint64_t id=ed->m_EXTENSION_identifiers[i];
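
The manual lists the per-rank files as RayOutput/<rank>.RayExtensions.fasta and RayOutput/<rank>.RaySeeds.fasta. A small sketch of building such a name follows. `perRankFileName` is a hypothetical helper, not Ray code, and it assumes the prefix already ends with `/`, as the other hunks in this commit suggest.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Ray.
// Assumes `prefix` is a directory that already ends with '/'.
std::string perRankFileName(const std::string& prefix, int rank, const std::string& suffix) {
    assert(rank >= 0);
    std::ostringstream fileName;
    // <directory>/<rank>.<suffix>: the dot separates the rank from the suffix
    fileName << prefix << rank << "." << suffix;
    return fileName.str();
}
```

With this layout the dot belongs between the rank and the suffix; a dot placed directly after the directory prefix would produce a hidden file on POSIX systems.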
2 changes: 1 addition & 1 deletion code/assembler/SeedingData.cpp
@@ -322,7 +322,7 @@ void SeedingData::sendSeedLengths(){
void SeedingData::writeSeedStatistics(){
ostringstream file;
file<<m_parameters->getPrefix();
file<<".SeedLengthDistribution.txt";
file<<"SeedLengthDistribution.txt";
ofstream f(file.str().c_str());
for(map<int,int>::iterator i=m_masterSeedLengths.begin();i!=m_masterSeedLengths.end();i++){
int length=i->first;
2 changes: 1 addition & 1 deletion code/communication/NetworkTest.cpp
@@ -141,7 +141,7 @@ void NetworkTest::masterWork(){
}else if(m_doneWithNetworkTest==m_size){
ostringstream file;
file<<m_parameters->getPrefix();
file<<".NetworkTest.txt";
file<<"NetworkTest.txt";
ofstream f(file.str().c_str());
f<<"# average latency in microseconds (10^-6 seconds) when requesting a reply for a message of "<<MAXIMUM_MESSAGE_SIZE_IN_BYTES<<" bytes"<<endl;
f<<"# Message passing interface rank\tName\tLatency in microseconds"<<endl;
40 changes: 33 additions & 7 deletions code/core/Machine.cpp
@@ -56,15 +56,19 @@ Machine::Machine(int argc,char**argv){
string param=argv[1];
if(param.find("help")!=string::npos){
m_parameters.showUsage();
m_messagesHandler.destructor();
exit(EXIT_NEEDS_ARGUMENTS);
}else if(param.find("usage")!=string::npos){
m_parameters.showUsage();
m_messagesHandler.destructor();
exit(EXIT_NEEDS_ARGUMENTS);
}else if(param.find("man")!=string::npos){
m_parameters.showUsage();
m_messagesHandler.destructor();
exit(EXIT_NEEDS_ARGUMENTS);
}else if(param.find("-version")!=string::npos){
showRayVersionShort();
m_messagesHandler.destructor();
exit(EXIT_NEEDS_ARGUMENTS);
}
}
@@ -237,11 +241,32 @@ void Machine::start(){
}

/** only show the version. */
if(fullReport)
if(fullReport){
m_messagesHandler.destructor();
return;
}

m_parameters.constructor(m_argc,m_argv,getRank());


/** create the directory for the assembly */

string directory=m_parameters.getPrefix();
if(fileExists(directory.c_str())){
if(m_parameters.getRank() == MASTER_RANK)
cout<<"Error, "<<directory<<" already exists, change the -o parameter to another value."<<endl;

m_messagesHandler.destructor();
return;
}

m_messagesHandler.barrier();

if(m_parameters.getRank() == MASTER_RANK)
createDirectory(directory.c_str());



m_seedExtender.constructor(&m_parameters,&m_directionsAllocator,m_ed,&m_subgraph,&m_inbox);
ostringstream prefixFull;
prefixFull<<m_parameters.getMemoryPrefix()<<"_Main";
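
The hunk above refuses to run when the -o path already exists and otherwise creates the directory. A single-process sketch of that check-then-create step is shown here. `prepareOutputDirectory` is a hypothetical name, and the sketch deliberately leaves out MPI: in Ray every rank performs the check, and the barrier presumably keeps the master rank from creating the directory before the other ranks have looked.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical single-process sketch (no MPI), POSIX only.
// Returns true only if the directory did not exist and was created.
bool prepareOutputDirectory(const std::string& directory) {
    assert(!directory.empty());
    struct stat st;
    if (stat(directory.c_str(), &st) == 0) {
        // something (file or directory) already exists at this path
        return false;
    }
    // read, write and search for owner and group, as in createDirectory
    return mkdir(directory.c_str(), S_IRWXU | S_IRWXG) == 0;
}
```

A caller would report an error and stop when the function returns false, which mirrors the message asking the user to change the -o parameter.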
@@ -396,7 +421,7 @@ m_seedingData,
cout<<endl;
cout<<"Rank "<<getRank()<<" wrote "<<m_parameters.getOutputFile()<<endl;
cout<<"Rank "<<getRank()<<" wrote "<<m_parameters.getScaffoldFile()<<endl;
cout<<"Check for "<<m_parameters.getPrefix()<<".*"<<endl;
cout<<"Check for "<<m_parameters.getPrefix()<<"*"<<endl;
cout<<endl;
if(m_parameters.useAmos()){
cout<<"Rank "<<getRank()<<" wrote "<<m_parameters.getAmosFile()<<" (reads mapped onto contiguous sequences in AMOS format)"<<endl;
@@ -580,6 +605,7 @@ void Machine::call_RAY_SLAVE_MODE_SEND_SEED_LENGTHS(){
}

void Machine::call_RAY_MASTER_MODE_LOAD_CONFIG(){

if(m_argc==2 && m_argv[1][0]!='-'){
ifstream f(m_argv[1]);
if(!f){
@@ -759,7 +785,7 @@ void Machine::call_RAY_MASTER_MODE_SEND_COVERAGE_VALUES(){

ostringstream g;
g<<m_parameters.getPrefix();
g<<".CoverageDistributionAnalysis.txt";
g<<"CoverageDistributionAnalysis.txt";
ofstream outputFile(g.str().c_str());
outputFile<<"k-mer length:\t"<<m_parameters.getWordSize()<<endl;
outputFile<<"Lowest coverage observed:\t"<<lowestCoverage<<endl;
@@ -883,7 +909,7 @@ void Machine::call_RAY_MASTER_MODE_WRITE_KMERS(){
}else if(m_numberOfRanksDone==m_parameters.getSize()){
if(m_parameters.writeKmers()){
cout<<endl;
cout<<"Rank "<<getRank()<<" wrote "<<m_parameters.getPrefix()<<".kmers.txt"<<endl;
cout<<"Rank "<<getRank()<<" wrote "<<m_parameters.getPrefix()<<"kmers.txt"<<endl;
}

m_master_mode=RAY_MASTER_MODE_TRIGGER_INDEXING;
@@ -892,7 +918,7 @@ void Machine::call_RAY_MASTER_MODE_WRITE_KMERS(){
return;

ostringstream edgeFile;
edgeFile<<m_parameters.getPrefix()<<".degreeDistribution.txt";
edgeFile<<m_parameters.getPrefix()<<"degreeDistribution.txt";
ofstream f(edgeFile.str().c_str());

f<<"# Most of the vertices should have an ingoing degree of 1 and an outgoing degree of 1."<<endl;
@@ -1373,7 +1399,7 @@ void Machine::call_RAY_MASTER_MODE_KILL_RANKS(){
void Machine::call_RAY_SLAVE_MODE_DIE(){
/** write message-passing interface file */
ostringstream file;
file<<m_parameters.getPrefix()<<".MessagePassingInterface.txt";
file<<m_parameters.getPrefix()<<"MessagePassingInterface.txt";
/* keep the string alive: c_str() on the temporary returned by str() would dangle */
string outputFile=file.str();
m_messagesHandler.appendStatistics(outputFile.c_str());

@@ -1416,7 +1442,7 @@ void Machine::call_RAY_MASTER_MODE_KILL_ALL_MPI_RANKS(){

/** empty the file if it exists */
ostringstream file;
file<<m_parameters.getPrefix()<<".MessagePassingInterface.txt";
file<<m_parameters.getPrefix()<<"MessagePassingInterface.txt";

FILE*fp=fopen(file.str().c_str(),"w+");
fprintf(fp,"# Source\tDestination\tTag\tCount\n");
38 changes: 34 additions & 4 deletions code/core/OperatingSystem.cpp
@@ -191,15 +191,23 @@ void getMicroSeconds(uint64_t*seconds,uint64_t*microSeconds){
*
* \see http://msdn.microsoft.com/en-us/library/aa363855(v=vs.85).aspx
*/
void createDirectory(char*directory){
void createDirectory(const char*directory){
#ifdef OS_POSIX

/* read, write for owner */
/* read, write for group */
mode_t mode=S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP;
/*
* S_IRWXU
* read, write, execute/search by owner
*
* S_IRWXG
* read, write, execute/search by group
* */
mode_t mode=S_IRWXU | S_IRWXG;

int status=mkdir(directory,mode);

#ifdef ASSERT
assert(status==0);
#endif

#elif defined(OS_WIN)

@@ -209,6 +217,28 @@ void createDirectory(char*directory){
#endif
}
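
The mode change in createDirectory matters because, on POSIX systems, the execute bit on a directory is the search permission: without it, files inside the directory cannot be opened even by the owner. The sketch below compares the old and new masks. The function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <sys/types.h>

// Old mask from the diff: read and write for owner and group, no search bit.
mode_t oldDirectoryMode() {
    return S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP;
}

// New mask from the diff: read, write and search for owner and group.
mode_t newDirectoryMode() {
    const mode_t mode = S_IRWXU | S_IRWXG;
    assert((mode & S_IRWXO) == 0); // nothing is granted to "other"
    return mode;
}

// True if the owner may traverse a directory created with this mode.
bool ownerCanSearch(mode_t mode) {
    return (mode & S_IXUSR) != 0;
}
```

In octal the old mask is 0660 and the new one is 0770. The permissions that mkdir actually applies are further restricted by the process umask.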

/** \see http://pubs.opengroup.org/onlinepubs/009695399/functions/stat.html
* \see http://blog.kowalczyk.info/article/Check-if-file-exists-on-Windows.html */
bool fileExists(const char*file){
#ifdef OS_POSIX
struct stat st;
int returnValue=stat(file,&st);

bool theFileExists=(returnValue == 0);
return theFileExists;

#elif defined(OS_WIN)
/* Return true if 'file' exists */
DWORD fileAttr = GetFileAttributesA(file);
if(INVALID_FILE_ATTRIBUTES == fileAttr)
return false;
return true;

#else
/* not implemented: report the file as missing */
return false;
#endif
}
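
On POSIX, fileExists returns true for anything stat() can resolve, including directories, which is what the -o check relies on. A sketch of the same test next to a stricter variant is shown here. `pathExists` mirrors the POSIX branch above, while `isDirectory` is a hypothetical addition that is not in the commit.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <sys/types.h>

// Same test as the POSIX branch of fileExists: true if stat() can resolve the path.
bool pathExists(const char* path) {
    assert(path != nullptr);
    struct stat st;
    return stat(path, &st) == 0;
}

// Hypothetical extension, not in the commit: true only if the path is a directory.
bool isDirectory(const char* path) {
    assert(path != nullptr);
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller could use `isDirectory` to tell an existing output directory apart from a regular file that happens to have the same name.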

void showRayVersionShort(){
cout<<"Ray version "<<RAY_VERSION<<" ";

4 changes: 3 additions & 1 deletion code/core/OperatingSystem.h
@@ -47,7 +47,9 @@ void showMemoryUsage(int rank);
void getMicroSeconds(uint64_t*seconds,uint64_t*microSeconds);

/** create a directory */
void createDirectory(char*directory);
void createDirectory(const char*directory);

bool fileExists(const char*file);

void showRayVersion(MessagesHandler*messagesHandler,bool fullReport);
