limix_converter problem1 & dataset.getPhenotypes problem2 #22

Open
JonLJ opened this issue Mar 2, 2016 · 1 comment

JonLJ commented Mar 2, 2016

Hi

PROBLEM1
I am having a problem following the 'loading files into LIMIX' tutorial, specifically with using limix_converter to convert my phenotype file 'phenotypes.csv' into HDF5 format.

If I use:

limix_converter -O ./my_file.hdf5 -C ./phenotypes.csv

I get the following warning:

/home/jon/anaconda2/lib/python2.7/site-packages/limix/io/conversion.py:78: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support sep=None with delim_whitespace=False; you can avoid this warning by specifying engine='python'.
  C = pandas.io.parsers.read_csv(csv_file,sep=sep,header=None,index_col=False,*args,**kw_args)

On the other hand, if I use:

limix_converter -O ./my_file.hdf5 -C ./phenotypes.csv -D ,

in order to avoid this warning, I obtain:

......,21478,21479,21480,21481,21482,21483,21484,21485,21486,21487,21488,21489,21490,21491,21492,21493,21494,21495,21496,21497,21498,21499,21500,21501,21502,21503,21504,21505,21506,21507,21508,21509,21510,21511,21512,21513,21514,21515,21516,21517,21518,21519,21520,21521,21522,21523,21524,21525,21526,21527,21528,21529,21530,21531,21532,21533,21534,21535,21536,21537,21538,21539,21540,21541,21542,21543,21544,21545,21546,21547,21548,21549,21550,21551,21552,21553,21554,21555,21556,21557,21558,21559,21560,21561,21562,21563,21564,21565,21566,21567,21568,21569,21570,21571,21572,21573,21574,21575,21576,21577,21578,21579,21580,21581,21582,21583,21584,21585,21586,21587,21588,21589,21590,21591,21592,21593,21594,21595,21596,21597,21598,21599,21600,21601,21602,21603,21604,21605,21606,21607,21608,21609,21610,21611,21612,21613,21614,21615,21616,21617,21618,21619,21620,21621,21622,21623,21624,21625,21626,21627,21628,21629,21630,21631,21632,21633,21634,21635,21636,21637,21638,21639,21640,21641,21642,21643,21644,21645,21646,21647,21648,21649,21650,21651,21652,21653,21654,21655,21656,21657,21658,21659,21660,21661,21662,21663,21664,21665,21666,21667,21668,21669,21670,21671,21672,21673,21674,21675,21676,21677,21678,21679,21680,21681,21682,21683,21684,21685,21686,21687,21688,21689,21690,21691,21692,21693,21694) have mixed types. Specify dtype option on import or set low_memory=False.

In both cases, the file seems to convert to HDF5 correctly. However, I do not know whether this last warning is due to the massive phenotype table I am using (around 21000 genes and 184 samples).
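For reference, both warnings come from pandas rather than from LIMIX itself, and each one names its own workaround. The snippet below is only a minimal sketch of those two options on a made-up three-column table (the data and variable names are hypothetical); it does not show how limix_converter forwards its arguments internally.

```python
import io

import pandas as pd

# Hypothetical toy table standing in for phenotypes.csv (no header row).
csv_text = "s1,0.5,1.2\ns2,0.7,0.9\n"

# sep=None asks pandas to sniff the delimiter, which only the 'python'
# engine supports. Naming the engine explicitly avoids the ParserWarning.
sniffed = pd.read_csv(io.StringIO(csv_text), sep=None, header=None,
                      engine="python")

# An explicit delimiter lets pandas use the C engine. low_memory=False
# parses the file in one pass instead of in chunks, which is the option
# the "mixed types" warning suggests.
explicit = pd.read_csv(io.StringIO(csv_text), sep=",", header=None,
                       low_memory=False)

print(sniffed.shape, explicit.shape)
```

Note that low_memory=False only silences the mixed-types warning. If a column really does mix strings and numbers, it is still read as a non-numeric column.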

PROBLEM2
Anyway, after converting the file, I run:

geno_reader  = gr.genotype_reader_tables('my_file.hdf5')
pheno_reader = phr.pheno_reader_tables('my_file.hdf5')
dataset = data.QTLData(geno_reader=geno_reader,pheno_reader=pheno_reader)

If I look at:

pheno_reader.pheno_matrix
pheno_reader.sample_ID
pheno_reader.phenotype_ID

Everything seems to be OK. However, when I do:

phenotypes,sample_idx=pheno_reader.getPhenotypes()

I get the following warnings:

/home/jon/anaconda2/lib/python2.7/site-packages/numpy/core/_methods.py:59: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
/home/jon/anaconda2/lib/python2.7/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)

'phenotypes' is then an empty DataFrame with only the column names defined ([0 rows x 21694 columns]). I do not understand why, but the same thing happens when I use your sample phenotype file.
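As a side note, the two NumPy warnings are what NumPy emits whenever a mean or standard deviation is taken over zero rows, so they are consistent with the empty result rather than a separate problem. The sketch below reproduces them on a hypothetical zero-row array; it says nothing about why getPhenotypes returns no rows.

```python
import warnings

import numpy as np

# Hypothetical stand-in for a phenotype matrix with 0 samples and 3 phenotypes.
empty = np.empty((0, 3))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    col_means = empty.mean(axis=0)  # triggers "Mean of empty slice."
    col_stds = empty.std(axis=0)    # triggers "Degrees of freedom <= 0 for slice"

messages = [str(w.message) for w in caught]
print(messages)
print(col_means)
```

Checking pheno_reader.pheno_matrix.shape and the shape of the returned 'phenotypes' side by side would show at which step the rows disappear.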

Many thanks,
Jon


jeffhsu3 commented Mar 3, 2016

Can you load your phenotypes csv in pandas fine? Do you have a header row in your phenotypes data? That might be causing the mixed type errors.
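A quick way to check this, as a rough sketch: the traceback above shows the converter calling read_csv with header=None, so a header row would be read as data. The toy CSV, its layout (gene IDs in the first row, sample IDs in the first column), and the variable names below are all assumptions for illustration.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy file with a header row and a sample ID column.
csv_text = "sample,gene1,gene2\ns1,0.5,1.2\ns2,0.7,0.9\n"

# header=None treats the header row as data, so each column mixes
# strings and numbers and none of them is parsed as numeric.
raw = pd.read_csv(io.StringIO(csv_text), sep=",", header=None)
raw_numeric = [is_numeric_dtype(raw[c]) for c in raw.columns]

# Declaring the header row and the index column gives numeric columns.
clean = pd.read_csv(io.StringIO(csv_text), sep=",", header=0, index_col=0)
clean_numeric = [is_numeric_dtype(clean[c]) for c in clean.columns]

print(raw_numeric)    # no column is numeric
print(clean_numeric)  # every column is numeric
```

If the second read gives all-numeric columns on the real file, the header row is the likely source of the mixed-types warning.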
