CiteSeqDataset #40

Merged
merged 9 commits into from
Jun 26, 2018

Edouard360
Contributor

I suggest this change to complete the CbmcDataset information.

  • Change the CbmcDataset dataset to CiteSeqDataset, with a name option ("cbmc" or "pbmc") and the ADT counts (protein markers) added as attributes of the dataset.

The ADT counts are epitope data, useful as additional labelling information.

@Edouard360
Contributor Author

I haven't had time to create the preprocessed files for the unit tests, but:

CiteSeqDataset("pbmc") and CiteSeqDataset("cbmc") both work, and both expose the epitope information as attributes.
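
As a rough illustration of the idea described here (one dataset class, selected by a name option, carrying protein-marker counts as extra attributes), a minimal sketch might look like the following. Everything in it is an assumption made for illustration: the class name `CiteSeqDatasetSketch`, the attribute names `adt_counts` and `protein_markers`, and the random placeholder counts are hypothetical and are not taken from the scVI code in this pull request.

```python
import random


class CiteSeqDatasetSketch:
    """Hypothetical stand-in for a name-selected dataset; not the scVI class."""

    # Names accepted by this sketch, mirroring the two mentioned in the PR.
    AVAILABLE_NAMES = ("cbmc", "pbmc")

    def __init__(self, name="cbmc", n_cells=5, n_genes=4, n_proteins=3, seed=0):
        if name not in self.AVAILABLE_NAMES:
            raise ValueError(
                f"name must be one of {self.AVAILABLE_NAMES}, got {name!r}"
            )
        self.name = name
        rng = random.Random(seed)
        # Placeholder gene counts; a real dataset would parse downloaded files.
        self.X = [
            [rng.randint(0, 10) for _ in range(n_genes)] for _ in range(n_cells)
        ]
        # ADT (protein marker) counts stored as a plain attribute (assumed name).
        self.adt_counts = [
            [rng.randint(0, 50) for _ in range(n_proteins)] for _ in range(n_cells)
        ]
        # One label per ADT column (assumed name, made-up labels).
        self.protein_markers = [f"protein_{i}" for i in range(n_proteins)]


dataset = CiteSeqDatasetSketch("pbmc")
print(dataset.name, len(dataset.adt_counts), dataset.protein_markers)
```

Keeping the ADT counts as attributes rather than folding them into the gene matrix leaves the gene-expression interface unchanged while still making the protein information available for labelling.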

@@ -30,7 +30,7 @@ def test_retina():


 def test_cbmc():
-    run_benchmarks("cbmc", n_epochs=1, show_batch_mixing=False, save_path='tests/data/')
+    run_benchmarks("cite_seq_cbmc", n_epochs=1, show_batch_mixing=False, save_path='tests/data/')
Contributor

Is cite_seq_cbmc the same data as cbmc once it's loaded? Can we just continue to call it cbmc then?

jeff-regier and others added 2 commits June 19, 2018 16:17
- Datasets might have multiple URLs to download from (e.g. CITE-seq data): we can specify either the `url` and `download_name` attributes or `urls` and `download_names`. The check for whether a file already exists moves into `_download`.

- `CbmcDataset()` -> `CiteSeqDataset('cbmc')`, with information about epitopes.

- `PbmcDataset()` can be obtained with:

```
gene_dataset = concat_datasets(
     Dataset10X("pbmc8k", save_path=save_path),
     Dataset10X("pbmc4k", save_path=save_path)
)
```

So I removed data/PBMC

- The CITE-seq methods actually provide 3 datasets (cbmc, pbmc, and cd8). Since there are also 10X pbmc datasets, the `pbmc` name is misleading in the `load_datasets` function. For now we keep Romain's initial pbmc dataset as the default, which consists of the concatenation of `pbmc8k` and `pbmc4k`.

- `concat_datasets` test
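
The first item above (a single `url`/`download_name` pair or plural `urls`/`download_names`, with the existence check living in `_download`) could be sketched roughly as follows. This is only a sketch under stated assumptions: the class `DownloadableDataset`, the injected `fetch` callable, the `example.com` URLs, and the file names are all hypothetical, and the actual scVI implementation may differ.

```python
import os
import tempfile


class DownloadableDataset:
    """Hypothetical base class: subclasses set either the singular pair
    (url, download_name) or the plural pair (urls, download_names)."""

    url = None
    download_name = None
    urls = None
    download_names = None

    def __init__(self, save_path, fetch):
        self.save_path = save_path
        # fetch(url, destination) is injected so the sketch runs offline.
        self._fetch = fetch

    def download(self):
        if self.urls is not None:
            if len(self.urls) != len(self.download_names):
                raise ValueError("urls and download_names must have equal length")
            pairs = list(zip(self.urls, self.download_names))
        else:
            pairs = [(self.url, self.download_name)]
        for url, name in pairs:
            self._download(url, name)

    def _download(self, url, download_name):
        destination = os.path.join(self.save_path, download_name)
        # The existence check lives here, so it applies to every file.
        if os.path.exists(destination):
            return False
        os.makedirs(self.save_path, exist_ok=True)
        self._fetch(url, destination)
        return True


class TwoFileDataset(DownloadableDataset):
    # Placeholder URLs and names, for illustration only.
    urls = ["https://example.com/rna.csv", "https://example.com/adt.csv"]
    download_names = ["rna.csv", "adt.csv"]


fetched = []


def fake_fetch(url, destination):
    # Records the request and writes a placeholder file instead of downloading.
    fetched.append(url)
    with open(destination, "w") as handle:
        handle.write("placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    dataset = TwoFileDataset(tmp, fake_fetch)
    dataset.download()
    dataset.download()  # second call finds both files and fetches nothing
    n_files = len(os.listdir(tmp))
```

Putting the check inside `_download` means callers never need to know how many files a dataset consists of; each file is skipped or fetched independently.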

@codecov-io

codecov-io commented Jun 26, 2018

Codecov Report

Merging #40 into master will increase coverage by 2.33%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #40      +/-   ##
==========================================
+ Coverage   89.08%   91.42%   +2.33%     
==========================================
  Files          33       32       -1     
  Lines        1393     1388       -5     
==========================================
+ Hits         1241     1269      +28     
+ Misses        152      119      -33
| Impacted Files | Coverage Δ |
|---|---|
| scvi/dataset/cite_seq.py | 100% <100%> (ø) |
| scvi/dataset/dataset10X.py | 100% <100%> (ø) ⬆️ |
| scvi/dataset/dataset.py | 93.51% <100%> (+9.68%) ⬆️ |
| scvi/dataset/__init__.py | 100% <100%> (ø) ⬆️ |
| scvi/dataset/utils.py | 95.65% <100%> (+17.87%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 597ccd3...dfb610d. Read the comment docs.

@Edouard360
Contributor Author

closes #47

@jeff-regier
Contributor

Very nice!

@jeff-regier jeff-regier merged commit 6f2a934 into master Jun 26, 2018
@jeff-regier jeff-regier deleted the citeSeq branch June 26, 2018 16:41