Different shape/ length error #3

lianyunhuang · 2021-02-17T16:24:40Z

Hey There,

Thanks for reading. I'm using this package and get the following error:

[WARNING] 8634629 SNPs are found in the annotation files and in all the sumstats files
[INFO] reading M files...
100%|██████████████████████████████████████████████████████████████████████| 22/22 [16:14<00:00, 44.29s/it]
/S-PCGC/pcgc_main.py:397: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself.
Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
gencov_arr = np.empty((len(pcgc_data_list), len(pcgc_data_list)), dtype=np.object)
Traceback (most recent call last):
  File "/S-PCGC/pcgc_main.py", line 857, in <module>
    pcgc_obj = SPCGC(args)
  File "/S-PCGC/pcgc_main.py", line 402, in __init__
    cov_ij = self.create_cov_obj(args, oi, oj,
  File "/pcgc_main.py", line 628, in create_cov_obj
    self.compute_taus(args, oi, oj,
  File "/S-PCGC/pcgc_main.py", line 753, in compute_taus
    z1_anno = df_annotations_sumstats_noneg.values * sumstats1[:, np.newaxis] * np.sqrt(trace_ratios1)
ValueError: operands could not be broadcast together with shapes (8634629,97) (8636723,1)

With the same data and the same code, a colleague gets a different error message:

[WARNING] 8636723 SNPs are found in the annotation files and in all the sumstats files
[INFO] reading M files...
[INFO] reading annot files...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [21:54<00:00, 59.77s/it]
Traceback (most recent call last):
  File "pcgc_main.py", line 857, in <module>
    pcgc_obj = SPCGC(args)
  File "pcgc_main.py", line 394, in __init__
    self.load_annotations_data(args, df_prodr2, index_intersect)
  File "pcgc_main.py", line 488, in load_annotations_data
    is_same = (df.index == index_intersect).all()
  File "/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 123, in cmp_method
    raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare

Do you have any idea what causes this error and how to solve it? Thanks a lot, and I look forward to your reply :))
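
A side note on the DeprecationWarning in the first log: it is separate from the crash. The warning text itself says that NumPy 1.20 deprecated np.object as an alias for the builtin object and that switching to object does not change behavior. A minimal sketch of the equivalent allocation, where the list contents are placeholders and not S-PCGC's real objects:

```python
import numpy as np

# Placeholder standing in for the per-study objects named in the log.
pcgc_data_list = ["study_a", "study_b"]

n = len(pcgc_data_list)
# dtype=object replaces the deprecated dtype=np.object alias.
gencov_arr = np.empty((n, n), dtype=object)
```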


omerwe · 2021-02-17T18:53:46Z

Hi,

Can you please let me know if you manage to run the tl;dr example from the main GitHub page? If you can, we need to figure out what's the difference between my example data and the data that you're using. Can you please send me the exact command that you used to generate this output?

Thanks,

Omer

lianyunhuang · 2021-02-17T22:30:54Z

Hey Omer,

Thanks a lot for your reply. The toy example from the main page runs fine. How can I show you my data structure? Would the head of each file be enough? If so, which files should I include?

As for the command, the error occurs in the step that calculates h2 and the genetic correlation (between cases and controls of phenotype A2 in my case). The main command looks like this:

python $dir_softpcgc/pcgc_main.py \
--annot-chr $dir_data/baselineLD. \
--sync $dir_data/baselineLD. \
--sumstats-chr $dir_data/Case_A2.chr,$dir_data/Control_A2.chr \
--prodr2-chr $dir_data/baselineLD.goodSNPs. \
--out $wdir/pcgc

Thanks!
Lianyun

omerwe · 2021-02-18T08:22:00Z

Hi Lianyun,

Since the example data works well, there must be something off in your input files. Could you send me a small sample (just the first few lines) of each of these files, so that I can try to figure out what's wrong? I'll also update the code to give a more meaningful error message if this happens in the future. If it's ok, please send these to oweissbrod@hsph.harvard.edu

Thanks,

Omer
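
For illustration, a "more meaningful error message" could come from a length check made before the arrays are multiplied. This is only a sketch under assumptions: check_same_snps is a hypothetical helper, not part of S-PCGC, and the toy shapes stand in for the real (8634629,97) and (8636723,1) arrays.

```python
import numpy as np
import pandas as pd


def check_same_snps(df_annotations, sumstats, name="sumstats"):
    """Raise a descriptive error if the inputs cover different numbers of SNPs.

    Hypothetical helper for illustration; not part of S-PCGC.
    """
    n_annot, n_stats = df_annotations.shape[0], len(sumstats)
    if n_annot != n_stats:
        raise ValueError(
            f"annotations have {n_annot} SNPs but {name} has {n_stats}; "
            "check that all input files list the same SNPs"
        )


# Toy data: 5 annotated SNPs but 6 summary statistics.
df_annotations = pd.DataFrame(np.ones((5, 3)))
sumstats = np.arange(6, dtype=float)

try:
    check_same_snps(df_annotations, sumstats)
except ValueError as err:
    message = str(err)
    print(message)

# Without the check, NumPy reports only the mismatched shapes.
try:
    df_annotations.values * sumstats[:, np.newaxis]
except ValueError as err:
    numpy_message = str(err)
    print(numpy_message)
```

The first message names the inputs and the counts, while the second only says that the shapes cannot be broadcast together.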

lianyunhuang · 2021-02-18T11:06:11Z

Hey Omer,

I've sent you the email. Thanks! :))

Lianyun

omerwe · 2021-02-19T09:39:01Z

Hi Lianyun,

Thanks for sending me the files. It looks like there's a problem in the .prodr2 files: some of the annotations are missing from the header line (e.g. FetalDHS_Trynka). I also see some annotations that appear only in the .prodr2 files (e.g. FetalDHS_TrynkaFetalDHS).

Do you have any idea how this happened? Maybe you used slightly different annotation files in different parts of the pipeline? If you're sure you haven't, can you please send me a small reproducible example that I can run from scratch (using e.g. small/fake files)?

Thanks,

Omer
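
A header comparison of this kind can be sketched with plain set operations. The helper below is hypothetical, and it assumes that both files carry one column name per annotation and that the annotation file additionally has the CHR, BP, SNP and CM columns; the toy headers reuse the two names mentioned above.

```python
def compare_annotation_names(annot_columns, prodr2_columns,
                             non_annot=("CHR", "BP", "SNP", "CM")):
    """Return (names only in the annot header, names only in the prodr2 header).

    Hypothetical helper for illustration; not part of S-PCGC.
    """
    annot = set(annot_columns) - set(non_annot)
    prodr2 = set(prodr2_columns)
    return sorted(annot - prodr2), sorted(prodr2 - annot)


# Toy headers; real ones would be read from the first line of each file.
annot_header = ["CHR", "BP", "SNP", "CM", "base", "FetalDHS_Trynka"]
prodr2_header = ["base", "FetalDHS_TrynkaFetalDHS"]

only_annot, only_prodr2 = compare_annotation_names(annot_header, prodr2_header)
print("only in annot: ", only_annot)
print("only in prodr2:", only_prodr2)
```

Two empty lists would mean the headers agree on the annotation names.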

lianyunhuang · 2021-02-19T12:44:07Z

Hey Omer,

I see, that is quite interesting. I will check the whole procedure and maybe re-run it before sending you an example, which might take a while. I will let you know how it goes.

Thanks!

Best,
Lianyun

lianyunhuang · 2021-02-21T14:58:52Z

Hey Omer,

I checked the annotations and they look fine. The prodr2 file has dimensions 97 x 97. The annotations are the same in the annot file and the prodr2 file, except that the annotation file has 4 extra columns: CHR, BP, SNP and CM. Maybe the difference you saw comes from imperfect formatting in the files I sent.
I re-ran step 2 to generate the prodr2 file on another cluster and got the same prodr2 file as before.
I'm now re-running step 3 to generate the sumstats files, which might take a long time.
If I then send you a small example to run, how should I subset the data to make sure it includes all the necessary information?

Thanks!

Best,
Lianyun

omerwe · 2021-02-21T19:43:44Z

Hi Lianyun,

Thanks for the update. For my understanding, can you please say which of these annotations appeared in the original annotation files: (1) FetalDHS_Trynka; (2) FetalDHS_TrynkaFetalDHS; or (3) both?

I think the simplest option is to subset a small number of SNPs (e.g. 5000) and run the pipeline on only those SNPs. If you can reproduce the problem, I can work on files derived from these small files.

Thanks,

Omer
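
For the tabular inputs, subsetting to a small SNP set could look roughly like the sketch below. It assumes each table has an SNP column holding the rsid (as the annotation files in this thread do); the helper names and toy tables are hypothetical, and the genotype files would need the same subset applied with whatever tool produced them.

```python
import pandas as pd


def first_n_snps(df, n, snp_col="SNP"):
    """Return the first n distinct SNP ids in file order."""
    return df[snp_col].drop_duplicates().head(n)


def subset_by_snps(df, snp_ids, snp_col="SNP"):
    """Keep only the rows whose SNP id is in snp_ids."""
    return df[df[snp_col].isin(snp_ids)]


# Toy tables standing in for an annotation file and a sumstats file.
annot = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3", "rs4"], "base": [1, 1, 1, 1]})
sumstats = pd.DataFrame({"SNP": ["rs2", "rs3", "rs4", "rs5"], "z": [0.1, -0.2, 0.3, 0.4]})

keep = first_n_snps(annot, n=2)  # use n=5000 on real data
annot_small = subset_by_snps(annot, keep)
sumstats_small = subset_by_snps(sumstats, keep)
```

Choosing the ids once and filtering every table with the same list keeps the small files consistent with each other.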

lianyunhuang · 2021-02-22T13:18:53Z

Hey Omer,

I've sent you the details of the annotations as well as the data link by email. Please check. Thanks!

Best,
Lianyun

lianyunhuang · 2021-02-22T13:31:17Z

Hey Omer,

In addition, I just got a new error in step 3 (creating the sumstats files):

Traceback (most recent call last):
  File "/softwares/spcgc/pcgc_sumstats_creator.py", line 590, in <module>
    sumstats_creator.compute_all_sumstats(args.chunk_size)
  File "/softwares/spcgc/pcgc_sumstats_creator.py", line 271, in compute_all_sumstats
    self.set_locus(snp1, snp2)
  File "/softwares/spcgc/pcgc_sumstats_creator.py", line 318, in set_locus
    snp_maf = self.mafs[snp1+j]
  File "/anaconda3/envs/xyb/lib/python3.8/site-packages/pandas/core/series.py", line 821, in __getitem__
    return self._values[key]
IndexError: index 116914 is out of bounds for axis 0 with size 116914

Best,
Lianyun

omerwe · 2021-02-28T12:25:58Z

Hi,

Apparently the problem was due to duplicate rsids in the input files. I modified the code to allow better handling of this situation. Can you please git pull the latest code and try again?
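
To illustrate how duplicate rsids can lead to mismatched row counts like those in the first post, here is a small pandas sketch. It is an illustration on toy data only, not a description of S-PCGC's actual code path: selecting by label from a table whose index contains a duplicated id returns every matching row, so the result is longer than the list of ids.

```python
import pandas as pd

# Toy annotation table in which rs2 appears twice.
annot = pd.DataFrame(
    {"base": [1, 1, 1, 1]},
    index=pd.Index(["rs1", "rs2", "rs2", "rs3"], name="SNP"),
)
sumstats_ids = ["rs1", "rs2", "rs3"]

# Label-based selection keeps both rs2 rows: 4 rows for 3 requested ids.
selected = annot.loc[sumstats_ids]
print(len(sumstats_ids), len(selected))

# Find the offending ids, then keep the first occurrence of each.
dup_mask = annot.index.duplicated(keep=False)
duplicated_ids = annot.index[dup_mask].unique()
deduped = annot[~annot.index.duplicated(keep="first")]
print(list(duplicated_ids), len(deduped))
```

Running a check like this on the input files before starting the pipeline shows which rsids need to be removed or renamed.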

lianyunhuang · 2021-02-28T13:29:07Z

Hi Omer,

Thanks a lot! I will try and let you know. :))

Lianyun

lianyunhuang · 2021-03-05T10:15:01Z

Hi Omer,

A quick update: the new code seems to be working well. I get the final result files, although with a lot of warning messages. I'm now running everything again on the data excluding the duplicated rsids. I will let you know if there is any news.

Thanks a lot!

Lianyun

lianyunhuang · 2021-04-01T15:34:39Z

Hi Omer,

I've finished a new run on the same data and still get some weird results. I've emailed you the details. Please check.
Thanks a lot for your help!

Lianyun

omerwe closed this as completed Apr 18, 2021
