
Abacus file requirements for Qspec #471

Closed
Ellior2 opened this Issue Feb 7, 2017 · 41 comments

4 participants

Ellior2 (Contributor) commented Feb 7, 2017

@emmats
I am running into a few problems with Qspec http://www.nesvilab.org/qspec.php/

Here is my abacus file that I want to use.

  • The file size limit for this program is 10 MB. My file is 29 MB.
  • I'm not sure if my header looks like what is required to run the program.
  • There are no values in my "protLen" column...
kubu4 (Collaborator) commented Feb 7, 2017

Have you tried installing/running Qspec locally?

https://sourceforge.net/projects/qprot/

According to the website you linked, Qspec is now a component of QPROT. I'm fairly certain there won't be a file size limitation when running the software on your own computer (vs. uploading to their servers via the web).

kubu4 (Collaborator) commented Feb 7, 2017

Also, I see in your ABACUS parameters file that your output is set to "default". This can be changed to set the formatting specifically for entry in Qspec. See the Abacus manual on this page: https://sourceforge.net/projects/abacustpp/files/
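The parameter change suggested above can be made with a one-line `sed`. This is a toy sketch: the `output=` parameter name and the `ProtQspec` value are taken from later in this thread, so verify them against your own `Abacus_parameters.txt` and the Abacus manual before relying on this.

```shell
# Create a one-line stand-in parameters file (your real file has many lines)
printf 'output=Default\n' > Abacus_parameters.txt

# Flip the output format from Default to ProtQspec (value per this thread)
sed -i 's/^output=Default/output=ProtQspec/' Abacus_parameters.txt

# Confirm the change took effect
grep '^output=' Abacus_parameters.txt    # output=ProtQspec
```

Note the thread below discusses case sensitivity of this value, so match the exact capitalization your Abacus version expects.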

kubu4 (Collaborator) commented Feb 7, 2017

To add to my last post, it's possible that setting your output format to Qspec will reduce your file size, thus allowing you to use the Qspec web interface.

emmats (Contributor) commented Feb 7, 2017

I can't see your abacus file, but my guess is that you didn't format it to the specifications on the website. You should only have columns for protein length, name, and then spectral counts for your "control" and "treatment" replicates. The online version is also picky about file name (no spaces) and you need to make sure it is saved in the correct txt format.
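The layout described above can be sketched as a tiny tab-delimited file. The header names below are hypothetical placeholders, not the official Qspec names, so check them against the Qspec webpage before using them.

```shell
# Hypothetical Qspec input sketch: protein name, protein length, then one
# spectral-count column per control/treatment replicate. Header names and
# values are made-up placeholders.
printf 'protid\tprotLen\tcontrol_1\tcontrol_2\ttreatment_1\ttreatment_2\n' >  qspec_input.txt
printf 'PROT_00001\t250\t12\t15\t30\t28\n'                                 >> qspec_input.txt
printf 'PROT_00002\t410\t5\t7\t2\t1\n'                                     >> qspec_input.txt

# Sanity check: tab-delimited with a consistent column count
# (and note the file name itself contains no spaces)
awk -F'\t' 'NR==1 {print NF " columns"}' qspec_input.txt    # 6 columns
```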

Ellior2 (Contributor, Author) commented Feb 8, 2017

I tried changing my parameters file like Sam suggested by changing the output format to Qspec instead of default: Abacus_parameters_qspec.txt

For some reason, it looks like it's still using the default output format...

image

I also see this error message occurring...

image

kubu4 (Collaborator) commented Feb 8, 2017

Can you please 'cat' your parameters file and post a screenshot?

Ellior2 (Contributor, Author) commented Feb 8, 2017

image
image

It didn't all fit in one screenshot, but I overlapped the images so you can tell that nothing is missing.

kubu4 (Collaborator) commented Feb 8, 2017

Thanks. Hmmm, @emmats how do you get your file formatted for Qspec analysis?

kubu4 (Collaborator) commented Feb 8, 2017

It might be worth trying to specify qspec as output file format using all lowercase. I notice that "default" is all lowercase in this Abacus parameters file you have.

This is confusing because the confirmation received when you initiate a run capitalizes "Default" and the manual also uses capital letters when describing output options. However, in regards to the latter, that manual is specifically referring to the Graphical User Interface (GUI) and the GUI uses capitalized letters. So, the manual is written to accurately reflect the options presented to the user when using the GUI...

Ellior2 (Contributor, Author) commented Feb 8, 2017

I'll give that a go when I have decent internet at home tonight. Thanks Sam!

kubu4 (Collaborator) commented Feb 8, 2017

I'll give that a go when I have decent internet at home tonight.

If you're so inclined, you could use a program that's installed on Emu called: tmux

This allows you to exit an SSH session w/o killing any jobs that you initiated during that session. That means you could hop onto Emu (even with a janky internet connection), get the jobs running, and then exit the Emu connection. You can hop back onto Emu at any time to check on the status of your jobs - it'll be like you never left; all the terminal outputs will be displayed as normal and all of that.

Here's an example of how I initiate my SSH sessions (I always use tmux to prevent problems that would be caused by a disconnection):

  1. SSH into the computer.
  2. After connecting, type in the following (and then press Enter): tmux
  • The terminal prompt won't look different, but you should notice a green bar is now displayed across the bottom of your terminal screen; this indicates you're in a tmux session.
  3. Initiate any computing stuff you want.

After they're started, you can exit the SSH session in the following fashion.

  1. Press ctrl-b on your keyboard (hold the control button and press the b key on the keyboard). This is a special tmux key combination to tell it to interpret the next command.
  2. Press d on your keyboard. This "detaches" your tmux session. All your stuff that you started in tmux is still running in that session. When you do this, you should see a message indicating that you've detached and which tmux session you've detached from (indicated by a number; usually 0 if this is your first/only tmux session).
  3. Exit your SSH session (type exit and press enter).

You've successfully initiated long-running computing junk over a janky internet connection and no longer have to worry about a bad connection interrupting your computing jobs!

To come back to your running jobs, do the following:

  1. SSH into the same computer as before.
  2. Attach your previous tmux session where your jobs are running (type the following and press Enter): tmux attach -t0

This tells tmux to attach terminal session 0. (If you happen to have additional tmux sessions running, you can see which one you want by listing your tmux sessions with the following command: tmux ls)

emmats (Contributor) commented Feb 8, 2017

By the way, I just download the default output and then create a new file with the correct columns. This can be done in R, ipython notebook, or excel pretty easily.

kubu4 (Collaborator) commented Feb 8, 2017

Do you have R code or a Jupyter notebook that you've used to do this in the past that you can share?

emmats (Contributor) commented Feb 8, 2017

If you are using ipython or R, it is just a simple column-select command where you list the columns you want. But I really just do this in Excel. As I mentioned previously, you want the following columns (with the appropriate headers listed on the qspec webpage):

  • protein id
  • protein length
  • spectral counts for each sample (multiple columns)

All of these columns can be found in your abacus file.
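The column selection described above can also be done on the command line. This is a hypothetical sketch: the column names below are assumptions, so check them against your own header (e.g. with head -1) before adapting it.

```shell
# Build a tiny stand-in for an Abacus default-output TSV (column names are
# made up for illustration; your real file will differ).
printf 'PROTID\tPROTLEN\tS1_NUMSPECSTOT\tS1_NUMSPECSUNIQ\tS2_NUMSPECSTOT\tS2_NUMSPECSUNIQ\n' >  abacus_output.tsv
printf 'PROT_A\t250\t12\t10\t30\t25\n'                                                       >> abacus_output.tsv

# Select columns by header name (rather than position), so the command still
# works if Abacus reorders columns between versions.
awk -F'\t' -v cols='PROTID,PROTLEN,S1_NUMSPECSTOT,S2_NUMSPECSTOT' '
NR==1 {
    n = split(cols, want, ",")            # the columns we want to keep
    for (i = 1; i <= NF; i++) idx[$i] = i # map header name -> column index
}
{
    out = ""
    for (i = 1; i <= n; i++) out = out (i > 1 ? "\t" : "") $(idx[want[i]])
    print out
}' abacus_output.tsv > qspec_ready.txt
```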

kubu4 (Collaborator) commented Feb 8, 2017

I've chatted with @emmats about this. @Ellior2, you should have a column in your ABACUS output file called: NUMSPECTOT

Here's what @emmats' file header looks like:

[screenshot: @emmats' ABACUS file header]

However, your ABACUS file header looks like this:

[screenshot: @Ellior2's ABACUS file header]

It appears something is wrong with the way your ABACUS output file was generated.

kubu4 (Collaborator) commented Feb 8, 2017

@Ellior2 - What would help us the most in figuring this out is for us to be able to see what instructions you're following to use this pipeline.

Before playing with this data any further, could you please post a link to the instructions you're following so that we can review them?

Ellior2 (Contributor, Author) commented Feb 8, 2017

I have been following Sean's notebook

Here are my Jupyter notebooks

Here is specifically my abacus notebook, although Jupyter notebook on Emu has been really clunky, so I have just been using the command line.

sr320 (Owner) commented Feb 8, 2017

@Ellior2 please create a single, canonical markdown document to follow (what we discussed last week).

Likely revising and editing: https://github.com/sr320/LabDocs/wiki/DDA-data-Analyses

Ellior2 (Contributor, Author) commented Feb 8, 2017

sr320 (Owner) commented Feb 8, 2017

kubu4 (Collaborator) commented Feb 8, 2017

Here's the TL;DR - Something's wrong with your input files for ABACUS and I think you should start over using a single file as a practice to see if you can complete the entire pipeline. @sr320 will advise.

OK here is a bunch of stuff I noticed, as well as some suggestions:

  1. Your Jupyter Notebooks leading up to the Abacus step are too large to open (they fail to load in GitHub and probably crash the browser). This is good info to have about using Jupyter notebooks with these programs - it means stdout should be written to external files (instead of to the screen; see the bottom of this post for how to do this) when using these programs.

  2. Something is clearly different between your ProteinProphet run to combine samples and Sean's test run. Here's his test using one of your samples. Notice the highlighted line and the resulting output:

[screenshot: Sean's RPubs proteomics walkthrough, Abacus step]

Now, notice the same file in your run and the highlighted line:

[screenshot: @Ellior2's 006-abacus notebook, same file highlighted]

And, look at your output:

[screenshot: @Ellior2's 006-abacus notebook output]

The "read in" values don't match Sean's, for the same file; that's a red flag.

Also, you have a ton of errors, indicating that something has already gone wrong.

For long-term time savings and improved documentation, I'd recommend taking a single sample all the way through the process and verifying that you can get it to work (I'd try to repeat exactly what Sean did). Document the output file headers at each step (use the head -1 command to see just the header) so we know what the files should look like. Write the stdout from each step to a separate file so that we can easily view the outputs (you can do this by running: your_bash_command_goes_here &> custom_output_filename_of_your_choosing).
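The two habits above (logging stdout, checking headers) can be sketched like this. File names and the stand-in commands are placeholders, not part of the real pipeline; '> file 2>&1' is the portable spelling of bash's '&>' shorthand.

```shell
# Capture a step's stdout AND stderr to a log file (trivial command stands in
# for a real pipeline step).
echo "step 1 finished" > step1_stdout.log 2>&1

# Stand-in for a step's output file, with a header row.
printf 'PROTID\tPROTLEN\n' > example_output.tsv

head -1 example_output.tsv   # inspect just the header row
cat step1_stdout.log         # review the captured output later
```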

kubu4 (Collaborator) commented Feb 9, 2017

@Ellior2 - I think the source of your issue lies in the first step (converting .raw to .mzXML).

Here's the command you indicated you used to do that conversion:

for file in ~/Documents/rhonda2016oyster/trochophora/C_gigas_proteome2016/20161205Sample*.raw
do
  no_path=${file##*/}
  no_ext=${no_path%.raw}
  WINEPREFIX=~/.wine32 wine ReAdW.2016010.msfilereader.exe "no_ext".mzXML
done

Firstly, your command doesn't specify an input file. In order for this to work, you need "$file" after ReAdW.2016010.msfilereader.exe.

Secondly, I believe the "no_ext".mzXML should be "$no_ext".mzXML.

I think this is partly my fault, as I helped out with getting a functional for loop to process files. Sorry about that.

I've updated the DDA data Analyses wiki page to reflect this.
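The corrected loop shape can be sketched like this, with echo standing in for the wine/ReAdW command so the expansions are visible. The dummy file names are placeholders; the two fixes are the ones described above: pass "$file" as the input, and expand "$no_ext" rather than the literal string no_ext.

```shell
# Dummy input files for illustration
touch 20161205Sample01.raw 20161205Sample02.raw

for file in ./20161205Sample*.raw
do
    no_path=${file##*/}      # strip any leading directories
    no_ext=${no_path%.raw}   # strip the .raw extension
    # 'echo' stands in for: WINEPREFIX=~/.wine32 wine ReAdW... "$file" "$no_ext".mzXML
    echo "convert $file -> $no_ext.mzXML"
done
```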

emmats (Contributor) commented Feb 9, 2017

I'm still not sure why you have to convert your files to mzXML since Comet does this automatically if you give it raw files.

Ellior2 (Contributor, Author) commented Feb 9, 2017

I will say Emu's performance (whatever this means) has deteriorated over the past month since I have had to redo the whole pipeline with the contaminant files added to my database. The first time I went through this whole pipeline in late December- early January, I actually had everything in one Jupyter notebook and it worked perfectly! I was able to run all of the cells seamlessly without interruptions or hiccups. I just let it run all night with no problems. I did not see the errors like I did this time around.

I learned my lesson the hard way, in that I mistakenly decided to just rerun all the code with my updated database and overwrite the files I had created before. I know now that I should have made a new directory and saved all my previous work. This time around, however, I had to divide my one notebook into six, and even now I have to wait several minutes for each one to load. I had to run every cell one by one for the TPP process (44 samples) because sometimes it took 5-6 attempts per cell until I finally got confirmation that it completed. This required exiting out of the notebook, pressing "shutdown", reopening it, and rerunning the cell. It seems to keep getting interrupted and displaying all sorts of errors. This second time around has been painstakingly slow. I thought it was due to my crappy internet connection, but it seems like the same issue exists when you are sitting in front of Emu.

sr320 (Owner) commented Feb 9, 2017

@Ellior2 please note I indicated that you should not run the pipeline remotely, nor do any more analysis at this time. Focus on writing up the instructions on the wiki page.

@sr320 sr320 closed this Feb 9, 2017

Ellior2 (Contributor, Author) commented Feb 14, 2017

@emmats

@sr320 went through the pipeline again and gave me the abacus_output.tsv file for my data. I'm running into the same problems with file size. The .tsv file Steven made is 8.23 MB, and after I convert it to .txt it turns into a 25.5 MB file. The maximum size allowed for the Qspec web-based version is 10 MB.

Emma gave me an example of what an abacus output should look like. The .tsv file she gave me is 2.89 MB, and converted to .txt it is 7.17 MB.

I have 44 samples included in my document so it is rather large. Do I need to download software instead of using the web-based version? I also noticed that the "protLen" column just contains zeros.
image

@Ellior2 Ellior2 reopened this Feb 14, 2017

sr320 (Owner) commented Feb 14, 2017

emmats (Contributor) commented Feb 14, 2017

I'm worried that something is wrong with how Rhonda is running abacus, so I sent her the most recent parameters file I used and the instructions from the UWPR page to double-check what she is doing. Attached are that parameters file and an example of a qspec input file.
Abacus_parameters.txt
EFvsLF.txt

Ellior2 (Contributor, Author) commented Feb 14, 2017

Hmmm.... it looks like I am supposed to keep just one column for every sample; it looks like it's the 2nd of the 7 columns for each sample. This should reduce file size. It still doesn't answer the question about protein length, though.

emmats (Contributor) commented Feb 14, 2017

It should be the column with the addendum "NUMSPECSTOT"

Ellior2 (Contributor, Author) commented Feb 14, 2017

I've got numbers ranging from 0-54 in that column

sr320 (Owner) commented Feb 14, 2017

Thanks Emma!
We will redo abacus, as there are three differences:

[screenshot: diff of the two Abacus parameters files]

I presume they're important?

emmats (Contributor) commented Feb 14, 2017

@Ellior2 that sounds about right for spectral counts of proteins. Although I would expect some above 54, but then again, that may be because the Lumos doesn't collect data in the same way as the mass specs that I am used to.

@sr320 I'm wondering if NSAF=false prevents protein length from being calculated because NSAF requires protein length.

sr320 (Owner) commented Feb 14, 2017

re Abacus - your prior instructions said

You can change the output format so that the file is formatted for use in Qspec (the subsequent step). To do this, change this line in the Abacus parameters file: output=Default to output=ProtQspec (per the Abacus Google Group thread).

I just changed it back to Default based on the file you provided as an example.

emmats (Contributor) commented Feb 14, 2017

My instructions? I've never done that in abacus. I'm not sure where those instructions came from.

It's pretty easy to pare down the abacus file with the default selection. And it is nice to have it calculate NSAF for you as well. It sounds like the qspec option wasn't even giving a usable qspec format anyway.

sr320 (Owner) commented Feb 14, 2017

Sorry! It is at
https://github.com/sr320/LabDocs/wiki/DDA-data-Analyses

@Ellior2 Where might these instructions have come from? They do not appear to be from Emma.

specifically

You can change the output format so that the file is formatted for use in Qspec (the subsequent step). To do this, change this line in the Abacus parameters file: output=Default to output=ProtQspec (per the Abacus Google Group thread).

Ellior2 (Contributor, Author) commented Feb 14, 2017

I have no idea where those specific instructions came from. The sentences surrounding those statements in the wiki are from what Emma wrote in the original GitHub issue. I don't know who inserted that.

I was looking at https://dronedata.dl.sourceforge.net/project/abacustpp/Abacus_Manual.pdf for my output options and thought I should use "Qspec" based on what I read on page 5.

sr320 (Owner) commented Feb 15, 2017

sr320 (Owner) commented Feb 15, 2017

The stdout (what shows up on the screen) and the parameter files for the entire pipeline are at
http://owl.fish.washington.edu/halfshell/index.php?dir=working-directory%2F17-02-14b%2F

kubu4 (Collaborator) commented Feb 15, 2017

@sr320 sr320 added the derailed label Feb 15, 2017

sr320 (Owner) commented Feb 15, 2017

A bit off-topic - I have created a new issue, #487, indicating where we are.
