
Missing required input fields #15

Closed
erleholgersen opened this issue Aug 17, 2018 · 3 comments

Comments

@erleholgersen

Hi again Amaro,

Thanks for all your help so far! I've now successfully run deTiN, but I ran into a few errors with missing input fields (not mentioned on the Wiki) that I figured I'd report here.

First, I got an error from the mutation statistics file:

Error reading call stats skipping first two rows and trying again
Traceback (most recent call last):
  File "/scratch/DBC/BCRBIOIN/SHARED/software/deTiN/20180816/deTiN/deTiN.py", line 588, in <module>
    main()
  File "/scratch/DBC/BCRBIOIN/SHARED/software/deTiN/20180816/deTiN/deTiN.py", line 518, in main
    di.read_and_preprocess_data()
  File "/scratch/DBC/BCRBIOIN/SHARED/software/deTiN/20180816/deTiN/deTiN.py", line 216, in read_and_preprocess_data
    self.read_and_preprocess_SSNVs()
  File "/scratch/DBC/BCRBIOIN/SHARED/software/deTiN/20180816/deTiN/deTiN.py", line 196, in read_and_preprocess_SSNVs
    self.read_call_stats_file()
  File "/scratch/DBC/BCRBIOIN/SHARED/software/deTiN/20180816/deTiN/deTiN.py", line 111, in read_call_stats_file
    comment='#', skiprows=2, usecols=fields, dtype=fields_type)
  File "/home/breakthr/eholgersen/.local/lib/python2.7/site-packages/pandas-0.23.4-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/breakthr/eholgersen/.local/lib/python2.7/site-packages/pandas-0.23.4-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/breakthr/eholgersen/.local/lib/python2.7/site-packages/pandas-0.23.4-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/home/breakthr/eholgersen/.local/lib/python2.7/site-packages/pandas-0.23.4-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/breakthr/eholgersen/.local/lib/python2.7/site-packages/pandas-0.23.4-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1749, in __init__
    _validate_usecols_names(usecols, self.orig_names)
  File "/home/breakthr/eholgersen/.local/lib/python2.7/site-packages/pandas-0.23.4-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1134, in _validate_usecols_names
    "columns expected but not found: {missing}".format(missing=missing)
ValueError: Usecols do not match columns, columns expected but not found: ['alt_allele', 't_ref_sum', 'n_alt_count', 'tumor_name', 'normal_name', 'n_ref_count', 'judgement', 't_alt_sum', 't_alt_count', 'position', 'contig', 'ref_allele', 't_ref_count', 'failure_reasons']

Adding dummy columns t_ref_sum and t_alt_sum fixed this issue. I used MuTect2 rather than MuTect to call variants, and thus had to assemble my own input file rather than using a pre-made call_stats file.
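The workaround above can be sketched as follows. This is a minimal stdlib-only sketch that assumes a tab-separated call-stats-style table whose header is the first line. Real MuTect call_stats files begin with comment lines, which deTiN skips. The helper name `add_dummy_columns` and the fill value `"0"` are hypothetical placeholders and are not part of deTiN.

```python
import csv
import io


def add_dummy_columns(tsv_text, columns, fill="0"):
    """Hypothetical helper: append placeholder columns to a TSV table.

    Assumes the header is the first line of tsv_text. The fill value "0"
    is an arbitrary placeholder, not something deTiN prescribes.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = list(reader.fieldnames or [])
    # Only add columns that are genuinely missing, keeping the existing order.
    new_cols = [c for c in columns if c not in header]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header + new_cols,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for c in new_cols:
            row[c] = fill
        writer.writerow(row)
    return out.getvalue()


example = "contig\tposition\tt_alt_count\n1\t12345\t7\n"
print(add_dummy_columns(example, ["t_ref_sum", "t_alt_sum"]), end="")
```

Columns that are already present are left untouched, so running the helper twice on the same table does not duplicate headers.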

The other error I got was from the aSCNA segmentation file:

changing header of seg file from Start to Start.bp
changing header of seg file from End to End.bp
missing required header n_probes and could not replace with any one of alternates

I fixed this by adding an n_probes column to the input, set equal to Num_SNPs from the Allelic CNV output (I wasn't sure whether I should use Num_SNPs or Num_Targets).

Thanks again!

@amarotaylor
Collaborator

amarotaylor commented Aug 17, 2018

Hey Erle,

Sorry for the errors. I will add your headers and fix the wiki to reflect the code. For clarification, Num_Targets is the correct equivalent to n_probes. I just pushed a fix that removes the requirement for the ref sum and alt sum columns and automatically fixes the header for the seg file.

Thanks for pointing these out!

Best
Amaro
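The seg-header behaviour discussed in this thread can be sketched as below. Required names are filled in from alternates: `Start` becomes `Start.bp` and `End` becomes `End.bp`, and the function fails when `n_probes` has no usable alternate. This is a hypothetical reconstruction from the quoted log messages, not deTiN's code. The `REQUIRED_ALTERNATES` table is an assumption, with `Num_Targets` mapped to `n_probes` per the maintainer's clarification.

```python
# Hypothetical table: required seg header -> acceptable alternate names.
# Modelled on the log messages in this issue; not deTiN's actual table.
REQUIRED_ALTERNATES = {
    "Start.bp": ["Start"],
    "End.bp": ["End"],
    "n_probes": ["Num_Targets"],  # per the maintainer: Num_Targets == n_probes
}


def normalise_seg_header(header):
    """Return a copy of header with required names filled in from alternates."""
    header = list(header)
    for required, alternates in REQUIRED_ALTERNATES.items():
        if required in header:
            continue
        for alt in alternates:
            if alt in header:
                print("changing header of seg file from {} to {}".format(alt, required))
                header[header.index(alt)] = required
                break
        else:
            # No alternate found: mirror the error message quoted in the issue.
            raise ValueError("missing required header {} and could not replace "
                             "with any one of alternates".format(required))
    return header


print(normalise_seg_header(["Chromosome", "Start", "End", "Num_Targets", "Num_SNPs"]))
```

The `for ... else` clause runs only when no alternate triggered `break`, which is what makes the missing-header error fire.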

amarotaylor pushed a commit that referenced this issue Aug 17, 2018
@Diogopell

Hi, I was getting stuck on the same problem. According to the "Description of inputs" page on the wiki, those columns shouldn't be required, and I got the impression that they aren't used in the code at all.

I edited one line in the read_call_stats_file function in deTiN/deTiN.py, as below:
def read_call_stats_file(self):
    # In 'fields' I removed several field requirements:
    # they weren't on the wiki's "Description of inputs" and apparently weren't actually being used.
    fields = ['contig', 'position',
              't_alt_count', 't_ref_count', 'n_alt_count', 'n_ref_count', 'failure_reasons', 'judgement']
    # ... continues as normal

It appears to fix those problems for good.

@amarotaylor
Collaborator

Hi, it seems that, compared with the most recent version, you just removed the alt/ref alleles and the tumor and normal sample names? You're right, these aren't required by deTiN, so that should work just fine. I only include them because they are useful later on.

fields = ['contig', 'position', 'ref_allele', 'alt_allele', 'tumor_name', 'normal_name',
          't_alt_count', 't_ref_count', 'n_alt_count', 'n_ref_count', 'failure_reasons', 'judgement']
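The split drawn here, between columns deTiN needs and columns kept only because they are useful later on, can be expressed as a column selector. This is a hypothetical sketch, not deTiN's implementation. The names `REQUIRED`, `OPTIONAL` and `choose_usecols` are assumptions, while the column names come from the field lists quoted in this thread.

```python
# Columns the thread identifies as needed by deTiN.
REQUIRED = ['contig', 'position', 't_alt_count', 't_ref_count',
            'n_alt_count', 'n_ref_count', 'failure_reasons', 'judgement']
# Columns carried along only because they are "useful later on".
OPTIONAL = ['ref_allele', 'alt_allele', 'tumor_name', 'normal_name']


def choose_usecols(header):
    """Hypothetical helper: pick the columns to load from a call-stats header."""
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError("missing required call-stats columns: {}".format(missing))
    # Keep optional columns only when the file actually provides them.
    return REQUIRED + [c for c in OPTIONAL if c in header]


print(choose_usecols(REQUIRED + ['tumor_name', 'some_other_column']))
```

The returned list could be passed as the `usecols` argument of `pandas.read_csv`, which deTiN already calls according to the traceback above. A MuTect2-derived file without the optional columns would then load instead of raising `Usecols do not match columns`.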


3 participants