Loess error in compute-score after subsampling #25

twoneu · 2022-08-03T03:08:34Z

Hi scDRS team,

I am encountering a loess error when trying to run a subsampled, lognormalized h5ad file through scdrds compute-score:

Call: scdrs compute-score
--h5ad-file ordovas_immune_downsampled.h5ad
--h5ad-species human
--cov-file None
--gs-file all.gs
--gs-species human
--ctrl-match-opt mean_var
--weight-opt vs
--adj-prop None
--flag-filter-data False
--flag-raw-count False
--n-ctrl 1000
--flag-return-ctrl-raw-score False
--flag-return-ctrl-norm-score True
--out-folder scDRS/immune

Loading data:
--h5ad-file loaded: n_cell=10727, n_gene=5000 (sys_time=0.2s)
First 3 cells: ['N51.LPB.TCAGCAACATACGCTA-0', 'N111.LPA2.CTTCTCTGTGGCAAAC-0', 'N20.LPA.TGGATGTGCACTTT-0']
First 5 genes: ['A2M', 'A2M-AS1', 'A2ML1', 'AAED1', 'AAK1']
--gs-file loaded: n_trait=6 (sys_time=0.2s)
Print info for first 3 traits:
First 3 elements for 'ukbb_CRC': ['CLTCL1', 'DAB2IP', 'NBPF1'], [4.0019, 3.5014, 3.4717]
First 3 elements for 'ukbb_IBD': ['ZNF236', 'C3orf38', 'LAMB2'], [3.17, 3.1623, 3.1358]
First 3 elements for 'alkes_UC': ['FCGR2A', 'IL23R', 'DLD'], [7.964, 7.0309, 6.897]

Preprocessing:
/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py:323: RuntimeWarning: invalid value encountered in log10
x = np.log10(df_gene["ct_mean"].values[not_const])
Traceback (most recent call last):
File "/Users/user/opt/anaconda3/bin/scdrs", line 723, in
fire.Fire()
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/Users/user/opt/anaconda3/bin/scdrs", line 211, in compute_score
scdrs.preprocess(adata, cov=df_cov, adj_prop=ADJ_PROP, n_mean_bin=20, n_var_bin=20, copy=False)
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py", line 205, in preprocess
df_gene, df_cell = compute_stats(
File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py", line 325, in compute_stats
model.fit()
File "_loess.pyx", line 899, in _loess.loess.fit
ValueError: b'Extrapolation not allowed with blending'

I used the following code to generate the subsampled dataset from the original data:

crc_immune_adata = sc.read_h5ad("ordovas_immune_processed.h5ad")
target_cells = 250

crc_immune_adata_ds = [crc_immune_adata[crc_immune_adata.obs.Cluster == clust] for clust in crc_immune_adata.obs.Cluster.cat.categories]

for dat in crc_immune_adata_ds:
    if dat.n_obs > target_cells:
         sc.pp.subsample(dat, n_obs=target_cells)

crc_immune_adata_downsampled = crc_immune_adata_ds[0].concatenate(*crc_immune_adata_ds[1:])
crc_immune_adata_downsampled.write("ordovas_immune_downsampled_250.h5ad")

The subsampled dataset is attached here:
ordovas_immune_downsampled_250.h5ad.zip

Thanks for all your time and help!

The text was updated successfully, but these errors were encountered:

KangchengHou · 2022-08-03T03:46:00Z

/Users/user/opt/anaconda3/lib/python3.9/site-packages/scdrs/pp.py:323: RuntimeWarning: invalid value encountered in log10
x = np.log10(df_gene["ct_mean"].values[not_const])

looks like there is a log(0) problem (which is quite strange). Somehow I am not able to read h5ad file. Could you send a raw text file (perhaps .csv)?

twoneu · 2022-08-03T03:54:36Z

ordovas_immune_downsampled.zip
Hopefully this works! I'm very sorry if it ends up being a data format issue - I really appreciate your time and patience.

KangchengHou · 2022-08-03T03:59:10Z

@twoneu I was being confusing. I was referring to the expression matrix in .csv format (rows are cell, columns are genes). I can not open the h5ad matrix somehow.

Using df = pd.DataFrame(data=adata.X, index=adata.obs_names, column=adata.var_names) should work.

PS. within scdrs, there is a step for calculating the ct_mean (gene-wise count matrix average) where scdrs transform your data in log-transformed space to count space using np.expm1. One thing to check is to make sure your log-transformed expression values are all >= 0.

twoneu · 2022-08-03T04:44:21Z

Thank you so much - it turns out there was one NaN within my matrix that was throwing the error. This package is great and so are you!

KangchengHou · 2022-08-03T04:58:00Z

that's great! we will do some more checking on the input matrix in next version (including NaN, expression values >= 0 for log-transformed matrix).

martinjzhang mentioned this issue Aug 3, 2022

Check inputs #26

Closed

martinjzhang closed this as completed Aug 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Loess error in compute-score after subsampling #25

Loess error in compute-score after subsampling #25

twoneu commented Aug 3, 2022 •

edited

KangchengHou commented Aug 3, 2022

twoneu commented Aug 3, 2022

KangchengHou commented Aug 3, 2022 •

edited

twoneu commented Aug 3, 2022

KangchengHou commented Aug 3, 2022

Loess error in compute-score after subsampling #25

Loess error in compute-score after subsampling #25

Comments

twoneu commented Aug 3, 2022 • edited

KangchengHou commented Aug 3, 2022

twoneu commented Aug 3, 2022

KangchengHou commented Aug 3, 2022 • edited

twoneu commented Aug 3, 2022

KangchengHou commented Aug 3, 2022

twoneu commented Aug 3, 2022 •

edited

KangchengHou commented Aug 3, 2022 •

edited