Different shape/length error #3
Hi, Can you please let me know whether you can run the tl;dr example from the main GitHub page? If you can, we need to figure out what differs between my example data and the data you're using. Can you please send me the exact command you used to generate this output? Thanks, Omer
Hey Omer, Thanks a lot for your reply. The toy example from the main page runs nicely. How can I show you my data structure? For example, the head of each file? And which files do you need to see? As for the command, the error occurs in the step that calculates h2 and the genetic correlation (between case/control of phenotype A2 in my case). The main command looks like this: python $dir_softpcgc/pcgc_main.py Thanks!
Hi Lianyun, Since the example data works well, there must be something off in your input files. Could you send me a small sample (just the first few lines) of each of these files so that I can try to figure out what's wrong? I'll also update the code to give a more meaningful error message if this happens in the future. If it's ok, please send these to oweissbrod@hsph.harvard.edu Thanks, Omer
Hey Omer, I've sent you the email. Thanks! :)) Lianyun
Hi Lianyun, Thanks for sending me the files. It looks like there's a problem in the .prodr2 files: some of the annotations are missing from the header line (e.g. FetalDHS_Trynka). I also see some annotations that appear only in the .prodr2 files (e.g. FetalDHS_TrynkaFetalDHS). Do you have any idea how this happened? Maybe you used slightly different annotation files in different parts of the pipeline? If you're sure you haven't, can you please send me a small reproducible example that I can run from scratch (using e.g. small/fake files)? Thanks, Omer
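A quick way to spot this kind of header mismatch is to compare the annotation names in each file header as sets. The sketch below assumes tab-separated files whose header line lists the annotation names, and treats CHR, BP, SNP and CM as hypothetical non-annotation columns to ignore; the real S-PCGC file layouts may differ.

```python
import io

import pandas as pd

# Assumption: identifier columns that may appear in an annotation header but are not annotations.
ID_COLUMNS = {"CHR", "BP", "SNP", "CM"}


def read_header(source, sep="\t"):
    """Return the column names of a delimited file without loading any rows."""
    # nrows=0 parses only the header line, which stays cheap even for huge files.
    return list(pd.read_csv(source, sep=sep, nrows=0).columns)


def compare_annotation_headers(annot_cols, prodr2_cols):
    """List annotation names that appear in only one of the two headers."""
    annot = set(annot_cols) - ID_COLUMNS
    prodr2 = set(prodr2_cols) - ID_COLUMNS
    return {
        "missing_from_prodr2": sorted(annot - prodr2),
        "only_in_prodr2": sorted(prodr2 - annot),
    }


# Toy headers mirroring the mismatch described in the comment above.
annot_file = io.StringIO("CHR\tBP\tSNP\tCM\tbase\tFetalDHS_Trynka\n")
prodr2_file = io.StringIO("base\tFetalDHS_TrynkaFetalDHS\n")
report = compare_annotation_headers(read_header(annot_file), read_header(prodr2_file))
print(report)
```

On real data, `read_header` would be pointed at the actual annotation and .prodr2 files for the same chromosome; any non-empty list in the report means the pipeline steps did not see the same annotation set.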
Hey Omer, I see, that's quite interesting. I'll check the whole procedure and maybe rerun it before sending you an example, which might take a while. I'll let you know how it goes. Thanks! Best,
Hey Omer,
Thanks! Best,
Hi Lianyun, Thanks for the update. To help me understand, can you please say which of these annotations appeared in the original annotation files: (1) FetalDHS_Trynka; (2) FetalDHS_TrynkaFetalDHS; or (3) both? I think the simplest option for you is to subset a small number of SNPs (e.g. 5000) and run the pipeline on only these SNPs. If you can reproduce the problem that way, I can work on files derived from these small files. Thanks, Omer
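The key point when subsetting is to keep the *same* SNPs in every input that lists SNPs, otherwise the small run can fail for unrelated reasons. A minimal sketch, assuming each input has been loaded into a pandas DataFrame with a hypothetical `SNP` column of rsids, and taking the first `n` ids from the first frame as the reference set:

```python
import pandas as pd


def subset_snps(frames, snp_col="SNP", n=5000):
    """Keep the same first n SNP ids (taken from the first frame) in every frame."""
    # unique() guards against a duplicated rsid using up two of the n slots.
    keep = pd.Index(frames[0][snp_col]).unique()[:n]
    return [df[df[snp_col].isin(keep)].reset_index(drop=True) for df in frames]


# Toy inputs: the same ten SNPs, listed in a different order in each frame.
annot = pd.DataFrame({"SNP": [f"rs{i}" for i in range(10)], "base": 1})
sumstats = pd.DataFrame({"SNP": [f"rs{i}" for i in range(9, -1, -1)], "Z": range(10)})

small_annot, small_sumstats = subset_snps([annot, sumstats], n=3)
print(small_annot["SNP"].tolist())              # ['rs0', 'rs1', 'rs2']
print(sorted(small_sumstats["SNP"].tolist()))   # ['rs0', 'rs1', 'rs2']
```

Each subsetted frame would then be written back out in its original file format before rerunning the pipeline; that step depends on the actual file layouts and is not shown.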
Hey Omer, I've emailed you the annotation details as well as a link to the data. Please take a look. Thanks! Best,
Hey Omer, Also, I just got a new error in step 3 (creating the sumstats files): Traceback (most recent call last): Best,
Hi, Apparently the problem was due to duplicate rsids in the input files. I modified the code to allow better handling of this situation. Can you please git pull the latest code and try again? |
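The thread does not say how the updated code handles duplicates. One conservative option on the user side, and the one Lianyun later describes (rerunning on data excluding duplicated rsids), is to drop every copy of a duplicated rsid before running the pipeline. A sketch, assuming a pandas DataFrame with a hypothetical `SNP` column:

```python
import pandas as pd


def drop_duplicate_rsids(df, snp_col="SNP"):
    """Drop every row whose rsid appears more than once; return (clean_df, dup_ids)."""
    # keep=False flags all copies, since we cannot tell which duplicate is correct.
    dup_mask = df[snp_col].duplicated(keep=False)
    dup_ids = sorted(df.loc[dup_mask, snp_col].unique())
    return df.loc[~dup_mask].reset_index(drop=True), dup_ids


sumstats = pd.DataFrame({"SNP": ["rs1", "rs2", "rs2", "rs3"], "Z": [0.5, 1.2, -0.3, 2.0]})
clean, dup_ids = drop_duplicate_rsids(sumstats)
print(clean["SNP"].tolist())  # ['rs1', 'rs3']
print(dup_ids)                # ['rs2']
```

The same rsids would need to be excluded from every input file, not just one, so that the SNP sets remain consistent across the pipeline.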
Hi Omer, Thanks a lot! I will try and let you know. :)) Lianyun
Hi Omer, A quick update. It seems the new code is working well. I get the final result files despite a lot of warning messages. I'm now rerunning everything on the data excluding duplicated rsids. I'll let you know if there's any news. Thanks a lot! Lianyun
Hi Omer, I've finished a new run on the same data but still get some weird results. I've emailed you the details. Please take a look. Lianyun
Hey there,
Thanks for reading. I'm using this package and get the following error:
[WARNING] 8634629 SNPs are found in the annotation files and in all the sumstats files
[INFO] reading M files...
100%|███████████████████████████████████████████████████████████████████████| 22/22 [16:14<00:00, 44.29s/it]
/S-PCGC/pcgc_main.py:397: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
gencov_arr = np.empty((len(pcgc_data_list), len(pcgc_data_list)), dtype=np.object)
Traceback (most recent call last):
File "/S-PCGC/pcgc_main.py", line 857, in <module>
pcgc_obj = SPCGC(args)
File "/S-PCGC/pcgc_main.py", line 402, in __init__
cov_ij = self.create_cov_obj(args, oi, oj,
File "/pcgc_main.py", line 628, in create_cov_obj
self.compute_taus(args, oi, oj,
File "/S-PCGC/pcgc_main.py", line 753, in compute_taus
z1_anno = df_annotations_sumstats_noneg.values * sumstats1[:, np.newaxis] * np.sqrt(trace_ratios1)
ValueError: operands could not be broadcast together with shapes (8634629,97) (8636723,1)
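NumPy can only broadcast two arrays when each dimension is either equal or 1. Here the annotation matrix has 8,634,629 rows (the count in the WARNING line) while the summary-statistic vector has 8,636,723, so 2,094 extra entries were never aligned to the annotation rows. The log alone does not show why; duplicate rsids, the cause of the earlier error in this thread, are one possibility. The sketch below is not S-PCGC code. It assumes the annotations and z-scores are pandas objects indexed by rsid, and aligns the z-scores by SNP id before multiplying; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def align_to_annotations(df_annot, z):
    """Return z reordered to df_annot's SNP index, failing loudly on mismatches."""
    # Duplicate ids make a by-id alignment ambiguous, so refuse them up front.
    if not df_annot.index.is_unique or not z.index.is_unique:
        raise ValueError("duplicate SNP ids; deduplicate before aligning")
    missing = df_annot.index.difference(z.index)
    if len(missing) > 0:
        raise ValueError(f"{len(missing)} annotated SNPs lack a summary statistic")
    # reindex drops SNPs absent from the annotations and fixes the row order.
    return z.reindex(df_annot.index)


# Toy data: 3 annotated SNPs, 4 summary statistics in a different order.
annot = pd.DataFrame(np.ones((3, 2)), index=["rs1", "rs2", "rs3"], columns=["base", "A1"])
z = pd.Series([2.0, 1.0, 3.0, 9.0], index=["rs2", "rs1", "rs3", "rs4"])

try:
    annot.values * z.values[:, np.newaxis]  # shapes (3, 2) and (4, 1) cannot broadcast
except ValueError as err:
    print("raw product failed:", err)

z_aligned = align_to_annotations(annot, z)
product = annot.values * z_aligned.values[:, np.newaxis]
print(product[:, 0])  # [1. 2. 3.]
```

Separately, the DeprecationWarning earlier in the log is harmless on NumPy 1.20 to 1.23, but NumPy 1.24 removed the `np.object` alias. `dtype=object` is the drop-in replacement.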
With the same data and the same code, another person gets a different error message:
[WARNING] 8636723 SNPs are found in the annotation files and in all the sumstats files
[INFO] reading M files...
[INFO] reading annot files...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [21:54<00:00, 59.77s/it]
Traceback (most recent call last):
File "pcgc_main.py", line 857, in <module>
pcgc_obj = SPCGC(args)
File "pcgc_main.py", line 394, in __init__
self.load_annotations_data(args, df_prodr2, index_intersect)
File "pcgc_main.py", line 488, in load_annotations_data
is_same = (df.index == index_intersect).all()
File "/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 123, in cmp_method
raise ValueError("Lengths must match to compare")
ValueError: Lengths must match to compare
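The element-wise comparison `df.index == index_intersect` raises as soon as the two indexes differ in length, which hides *how* they differ. A sketch of a length-safe check plus a small diagnostic follows. It is hypothetical and not a patch to `load_annotations_data`, whose surrounding logic is not shown in the traceback. It uses `pandas.Index.equals`, which returns False for indexes of different lengths instead of raising.

```python
import pandas as pd


def same_snps_same_order(df, index_intersect):
    """Length-safe version of (df.index == index_intersect).all()."""
    # Index.equals returns False when the lengths differ instead of raising.
    return df.index.equals(pd.Index(index_intersect))


def describe_index_mismatch(df, index_intersect):
    """Summarise how df.index differs from the expected SNP intersection."""
    expected = pd.Index(index_intersect)
    return {
        "n_rows": len(df.index),
        "n_expected": len(expected),
        "n_duplicated_ids": int(df.index.duplicated().sum()),
        "n_unexpected_ids": len(df.index.difference(expected)),
        "n_missing_ids": len(expected.difference(df.index)),
    }


# Toy annotation frame in which rs2 appears twice.
df = pd.DataFrame({"base": [1, 1, 1, 1]}, index=["rs1", "rs2", "rs2", "rs3"])
index_intersect = pd.Index(["rs1", "rs2", "rs3"])

try:
    (df.index == index_intersect).all()
except ValueError as err:
    print("element-wise compare failed:", err)

print(same_snps_same_order(df, index_intersect))  # False
print(describe_index_mismatch(df, index_intersect))
```

In this toy case the diagnostic attributes the extra row entirely to a duplicated id, with no unexpected or missing SNPs. That is the pattern one would look for in the real files.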
Do you have any idea what causes these errors and how to fix them? Thanks a lot, and looking forward to your reply :))