Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loess error in compute-score after subsampling #25

Closed
twoneu opened this issue Aug 3, 2022 · 5 comments
Closed

Loess error in compute-score after subsampling #25

twoneu opened this issue Aug 3, 2022 · 5 comments

Comments

@twoneu
Copy link

twoneu commented Aug 3, 2022

Hi scDRS team,

I am encountering a loess error when trying to run a subsampled, lognormalized h5ad file through scdrds compute-score:

Call: scdrs compute-score
--h5ad-file ordovas_immune_downsampled.h5ad
--h5ad-species human
--cov-file None
--gs-file all.gs
--gs-species human
--ctrl-match-opt mean_var
--weight-opt vs
--adj-prop None
--flag-filter-data False
--flag-raw-count False
--n-ctrl 1000
--flag-return-ctrl-raw-score False
--flag-return-ctrl-norm-score True
--out-folder scDRS/immune

Loading data:
--h5ad-file loaded: n_cell=10727, n_gene=5000 (sys_time=0.2s)
First 3 cells: ['N51.LPB.TCAGCAACATACGCTA-0', 'N111.LPA2.CTTCTCTGTGGCAAAC-0', 'N20.LPA.TGGATGTGCACTTT-0']
First 5 genes: ['A2M', 'A2M-AS1', 'A2ML1', 'AAED1', 'AAK1']
--gs-file loaded: n_trait=6 (sys_time=0.2s)
Print info for first 3 traits:
First 3 elements for 'ukbb_CRC': ['CLTCL1', 'DAB2IP', 'NBPF1'], [4.0019, 3.5014, 3.4717]
First 3 elements for 'ukbb_IBD': ['ZNF236', 'C3orf38', 'LAMB2'], [3.17, 3.1623, 3.1358]
First 3 elements for 'alkes_UC': ['FCGR2A', 'IL23R', 'DLD'], [7.964, 7.0309, 6.897]

Preprocessing:
/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py:323: RuntimeWarning: invalid value encountered in log10
x = np.log10(df_gene["ct_mean"].values[not_const])
Traceback (most recent call last):
File "/Users/user/opt/anaconda3/bin/scdrs", line 723, in
fire.Fire()
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/Users/user/opt/anaconda3/bin/scdrs", line 211, in compute_score
scdrs.preprocess(adata, cov=df_cov, adj_prop=ADJ_PROP, n_mean_bin=20, n_var_bin=20, copy=False)
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py", line 205, in preprocess
df_gene, df_cell = compute_stats(
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py", line 325, in compute_stats
model.fit()
File "_loess.pyx", line 899, in _loess.loess.fit
ValueError: b'Extrapolation not allowed with blending'

I used the following code to generate the subsampled dataset from the original data:

crc_immune_adata = sc.read_h5ad("ordovas_immune_processed.h5ad")
target_cells = 250

crc_immune_adata_ds = [crc_immune_adata[crc_immune_adata.obs.Cluster == clust] for clust in crc_immune_adata.obs.Cluster.cat.categories]

for dat in crc_immune_adata_ds:
    if dat.n_obs > target_cells:
         sc.pp.subsample(dat, n_obs=target_cells)

crc_immune_adata_downsampled = crc_immune_adata_ds[0].concatenate(*crc_immune_adata_ds[1:])
crc_immune_adata_downsampled.write("ordovas_immune_downsampled_250.h5ad")

The subsampled dataset is attached here:
ordovas_immune_downsampled_250.h5ad.zip

Thanks for all your time and help!

@KangchengHou
Copy link
Collaborator

/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py:323: RuntimeWarning: invalid value encountered in log10
x = np.log10(df_gene["ct_mean"].values[not_const])

looks like there is a log(0) problem (which is quite strange). Somehow I am not able to read h5ad file. Could you send a raw text file (perhaps .csv)?

@twoneu
Copy link
Author

twoneu commented Aug 3, 2022

ordovas_immune_downsampled.zip
Hopefully this works! I'm very sorry if it ends up being a data format issue - I really appreciate your time and patience.

@KangchengHou
Copy link
Collaborator

KangchengHou commented Aug 3, 2022

@twoneu I was being confusing. I was referring to the expression matrix in .csv format (rows are cell, columns are genes). I can not open the h5ad matrix somehow.

Using df = pd.DataFrame(data=adata.X, index=adata.obs_names, column=adata.var_names) should work.

PS. within scdrs, there is a step for calculating the ct_mean (gene-wise count matrix average) where scdrs transform your data in log-transformed space to count space using np.expm1. One thing to check is to make sure your log-transformed expression values are all >= 0.

@twoneu
Copy link
Author

twoneu commented Aug 3, 2022

Thank you so much - it turns out there was one NaN within my matrix that was throwing the error. This package is great and so are you!

@KangchengHou
Copy link
Collaborator

that's great! we will do some more checking on the input matrix in next version (including NaN, expression values >= 0 for log-transformed matrix).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants