Scheme of r-hyperspec packages #5

Closed
GegznaV opened this issue Jul 14, 2020 · 10 comments

@GegznaV
Member

GegznaV commented Jul 14, 2020

For me, it is a bit unclear where we are going with this project and how it should look at the end of this summer. A vision in the form of a scheme/flowchart would be helpful. The scheme should contain the names of the r-hyperspec family packages and the other non-package repositories we are going to create, along with the dependencies between them. The scheme may change later.

I'm preparing a draft scheme that could serve as a starting point for the discussion.

@GegznaV
Member Author

GegznaV commented Jul 14, 2020

[image]

@GegznaV
Member Author

GegznaV commented Jul 14, 2020

@r-hyperspec/r-hyperspec These schemes are for discussion at Wednesday's meeting. They show how I understand our vision of the r-hyperspec family (two versions of the vision are presented). Please study them and prepare your suggestions.


UPDATE: I updated the schemes and moved them to the message below.


The graphs are implemented with GraphViz via DiagrammeR in RStudio. Sources: r-hyperspec-schemes-GraphViz.zip. Unzip, open in RStudio, and press the "Preview" button. Modify and press "Preview" again.

@GegznaV changed the title from "Sheme of r-hyperspec packages" to "Scheme of r-hyperspec packages" on Jul 14, 2020

@eoduniyi

Great job @GegznaV

@bryanhanson
Collaborator

A couple of comments.

  • First, and this applies globally to the project: the key goal for GSOC is to make things easier to maintain. The overall idea of breaking into pieces is a good one. At the same time, to me, another way to conceptualize making things easier to maintain is to make things simpler. I think we should aim for as few pieces as possible.
  • Version 1 is closer to how I understand what we were/are aiming for.
  • I don't recall discussion of hySpc.io previously, but I may have forgotten. If it is a wrapper for importing, should it be positioned between hyperSpec and hySpc.read.*?
  • To date, the concept of hySpc.pkgs was to serve the data packages that are too large for CRAN. We could of course have it serve anything, but it would be simpler and certainly more standard if development versions are installed from their respective repos via remotes::install_github.
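
Installing development versions directly from their repositories could look roughly like the sketch below. It assumes that each package lives in a repository of the same name under the `r-hyperspec` GitHub organization; the two package names are only examples from this thread, and `install_dev()` is a hypothetical helper, not an existing function.

```r
# Sketch only: repository locations are an assumption (organization
# "r-hyperspec", repository name equal to the package name).
dev_pkgs <- c("hySpc.ggplot2", "hySpc.dplyr")
repos <- paste0("r-hyperspec/", dev_pkgs)

# Hypothetical helper: installs only when explicitly asked to, so that
# sourcing this script has no side effects.
install_dev <- function(repos, run = FALSE) {
  if (run) {
    if (!requireNamespace("remotes", quietly = TRUE)) {
      stop("Package 'remotes' is required: install.packages('remotes')")
    }
    remotes::install_github(repos)
  }
  invisible(repos)
}

install_dev(repos)                # dry run: nothing is installed
# install_dev(repos, run = TRUE)  # installs the packages from GitHub
```

The dry-run default is a design choice for the sketch: it lets the repository list be checked before anything touches the library.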

@GegznaV
Member Author

GegznaV commented Jul 15, 2020

I corrected some issues in the schemes.

Scheme 1

[image: Scheme 1]

Scheme 2

[image: Scheme 2]

Legend
Font color:

  • Black: already implemented packages
  • Red: not implemented yet
  • Green: non-package repos

Lines/Arrows:

  • red: automatic relationship via CI
  • blue: package dependencies (e.g., via "imports")
  • dashed purple: package dependencies (e.g., via "suggests"; used only if installed on the user's computer):
    a. The destination package is used to load the other installed packages.
    b. The destination package may also re-export functions from the other packages.
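
For readers without the zip file, the legend conventions can be expressed in GraphViz DOT roughly as follows. This is a minimal sketch, not the actual scheme: the nodes, the edges, and the arrow directions are illustrative assumptions, and only the color coding follows the legend. Rendering uses `DiagrammeR::grViz()` if DiagrammeR is installed.

```r
# Sketch only: nodes, edges and arrow directions are illustrative
# assumptions; only the colour coding follows the legend above.
dot <- '
digraph hySpc {
  node [shape = box]

  // Font colour: black = implemented, red = not implemented yet,
  // green = non-package repository
  "hyperSpec"         [fontcolor = black]
  "hySpc.ggplot2"     [fontcolor = black]
  "hySpc.matrixStats" [fontcolor = red]
  "hySpc.pkgs"        [fontcolor = green]

  // Blue: hard package dependency (e.g., via "imports")
  "hySpc.ggplot2"     -> "hyperSpec" [color = blue]
  "hySpc.matrixStats" -> "hyperSpec" [color = blue]

  // Dashed purple: optional dependency (e.g., via "suggests")
  "hyperSpec" -> "hySpc.ggplot2" [color = purple, style = dashed]

  // Red: automatic relationship via CI
  "hySpc.ggplot2" -> "hySpc.pkgs" [color = red]
}
'

# Build the widget only if DiagrammeR is available; print `graph` in an
# interactive session (e.g., RStudio) to display it.
graph <- NULL
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  graph <- DiagrammeR::grViz(dot)
}
```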

@cbeleites
Collaborator

cbeleites commented Jul 17, 2020

Here's my proposal, edited from @GegznaV's list on slack:

Main package

  • hyperSpec CRAN

Bridge packages

... connect hyperSpec with other packages where interaction does not work automatically. They can go on CRAN since we don't need huge test data sets.

  • hySpc.ggplot2 CRAN
  • hySpc.dplyr CRAN
  • hySpc.matrixStats CRAN (future)
  • hySpc.baseline CRAN (future)
  • hySpc.EMSC CRAN (future)

Data packages:

  • hySpc.chondro GH only

Helper/Utility packages:

  • hySpc.skeleton GH only/no actual publication as package
  • hySpc.testthat CRAN

Helper GH repos:

  • r-hyperspec.github.io/
  • hySpc.pkgs/

Input/Output packages:

This is where things are more complicated...

If we want to cut down dependencies (cbeleites/hyperSpec#215), at least some file import packages should go by file format rather than manufacturer:

  • hySpc.read.mat for Matlab .mat based file formats (read.mat.Witec() and read.mat.Cytospec())
    depends on R.matlab.
  • hySpc.read.spe (for Princeton Instruments/Winspec) depends on xml2.
  • hySpc.read.JCAMP.DX will rather be a bridge package to @bryanhanson 's readJDX package, so obviously depends on readJDX.

There are import filters that do not add dependencies for binary formats:

  • read.ENVI.*() -> hySpc.read.ENVI
  • read.spc.*() -> hySpc.read.spc

These two file formats are sufficiently widespread and well-known that I believe each should go into its own package.

There are import filters for a large variety of ASCII/text based formats:

  • files that usually have .asc ending: read.asc.Andor(), read.asc.PerkinElmer()
  • files that usually have .txt ending: read_txt_Witec(), read_txt_Witec_TrueMatch(), read.txt.Horiba(), read.txt.Renishaw(), read.txt.Shimadzu()
  • the Witec multi-ASCII-file formats: read_dat_Witec(), read_txt_Witec_Graph()

Should these be bundled into, say, hySpc.read.txt?
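
To make the dependency argument concrete, the proposed layout can be written down as a small lookup table. This is only a sketch of the proposal in this comment: `hySpc.read.txt` is still an open question and its empty dependency list is an assumption, and `deps_for()` is a hypothetical helper.

```r
# Sketch only: external dependencies per proposed import package, as listed
# in this comment. The entry for hySpc.read.txt is an assumption.
read_pkg_deps <- list(
  hySpc.read.mat      = "R.matlab",
  hySpc.read.spe      = "xml2",
  hySpc.read.JCAMP.DX = "readJDX",
  hySpc.read.ENVI     = character(0),
  hySpc.read.spc      = character(0),
  hySpc.read.txt      = character(0)
)

# Hypothetical helper: external packages pulled in by a chosen set of
# import packages.
deps_for <- function(pkgs, deps_table = read_pkg_deps) {
  unknown <- setdiff(pkgs, names(deps_table))
  if (length(unknown) > 0) {
    stop("Unknown package(s): ", paste(unknown, collapse = ", "))
  }
  sort(unique(as.character(unlist(deps_table[pkgs], use.names = FALSE))))
}

deps_for(c("hySpc.read.spc", "hySpc.read.ENVI"))  # no external dependencies
deps_for(c("hySpc.read.mat", "hySpc.read.spe"))   # R.matlab and xml2
```

A user who only reads ENVI and .spc files would then install nothing beyond hyperSpec's own dependencies.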

Last but not least, there are a number of file formats where we have example data but no import functions yet. At least some of them will have their own dependencies.

  • Import of Shimadzu .spc files (cbeleites/hyperSpec#102, "Support for SPC files from Shimadzu instruments") will require OLE reading.
    (Unfortunately, Shimadzu uses a file ending here that coincides with the well-known and widely used Thermo Galactic .spc file format ending, but the formats are completely different.)

  • .jaz is ASCII, I don't think we'll have dependencies here

  • .pz2 is ASCII, I don't think we'll have dependencies here (we have some import code here that would need polishing)

  • Diffrac .uxd: ASCII, I don't think we'll have dependencies here

  • Perkin Elmer .sp: binary. Not sure about inner structure or dependencies

  • Gasmet .spe: binary, not the same as Princeton Instruments/Winspec .spe. Not sure about inner structure or dependencies

  • Trivista .tvf: XML-based

  • Renishaw WiRE .wdf binary, probably doable without dependencies.

  • Witec .WIP: binary, not sure about dependencies. Witec refused to discuss their file format with me, but see e.g. Gwyddion

  • Bruker Opus .0, .1, ...: binary (we don't actually have example data, but I could easily obtain some). Bruker has let me have their file format whitepaper in the past, but not recently. I.e.


@bryanhanson, @GegznaV, @eoduniyi, @ximeg: What do you think?

It may be better to have the file import packages named consistently and have them all by file format name.
This would mean that we drop hySpc.read.Witec (or rather, rename it to hySpc.read.txt). We have, e.g., several manufacturers exporting in Thermo Galactic .spc format, and their files are slightly different, so we have not only read.spc() but also read.spc.KaiserMap() etc. Putting the latter into a package hySpc.read.Kaiser would make that package depend on hySpc.read.spc, which I'd like to avoid.

@ximeg

ximeg commented Jul 17, 2020

As long as the end user can easily install all r-hyperSpec packages and easily (automatically) load all of them, we can split the file format functions between packages however we want. It is important to remove this burden from the end user. I like and support the idea of packaging based on dependencies, trying to minimize them.

My point is that as an end user (data analyst/spectroscopist) I want to be able to

  • set up and update my working environment easily, something like install.packages('hyperSpec-EVERYTHING')
  • load all available functions into my environment without thinking about all these granularities. I'd prefer to just call library(hyperSpec) as opposed to
```r
library(hySpc.ggplot2)
library(hySpc.chondro)
library(hySpc.matrixStats)
library(hySpc.baseline)
library(hySpc.read.ENVI)
library(hySpc.read.spc)
library(hySpc.read.txt)

# Now I can finally write a line of code that reads a file, subtracts a baseline, and makes a plot
# ...
# ???
# Wait, I forgot to load a package that provides the `filter()` function...
# What was its name? ... Google it... Ah, `dplyr`
library(hySpc.dplyr)
# ...
# works!
```
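
One way to get this "load everything" behavior is a helper that attaches every installed package whose name starts with `hySpc.`, similar in spirit to how the tidyverse package attaches its members. The sketch below is an assumption about how this could work, not an existing hyperSpec feature; `hy_attach_all()` is a hypothetical name.

```r
# Hypothetical helper, not part of hyperSpec: attaches every installed
# package whose name matches `pattern` (by default, names starting with
# "hySpc.") and returns the attached names invisibly.
hy_attach_all <- function(pattern = "^hySpc\\.",
                          installed = rownames(utils::installed.packages())) {
  pkgs <- sort(grep(pattern, installed, value = TRUE))
  for (pkg in pkgs) {
    suppressPackageStartupMessages(
      library(pkg, character.only = TRUE)
    )
  }
  invisible(pkgs)
}

# Attaches whatever hySpc.* packages happen to be installed (possibly none).
attached <- hy_attach_all()
```

Because the helper only looks at installed packages, the family members could stay in "suggests" rather than "imports", which keeps the hard dependency list short.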

@eoduniyi

eoduniyi commented Jul 18, 2020

vision-model

@GegznaV this is still useful:
[image: screenshot, via RGSOC_2020_Proposal]

io

@cbeleites It sounds like hySpc.read.Witec will be turned into hySpc.read.txt, which means it will be a larger package that supports the import filters for Witec, Renishaw, Andor, PerkinElmer, and Horiba. The remaining file I/O packages will support reading spectral data from MATLAB, Winspec, Shimadzu(?), and JCAMP.DX. Additionally, there will be dedicated packages for ENVI and spc.

ux/ui

@ximeg I totally agree with you on this; I wonder about other ways to support the friendliness/experience for typical spectroscopic work.

maintainability

@bryanhanson I think the documentation on functionality and contributing/style has made it easier to maintain

@eoduniyi

Looking at this over time, I think we've gotten more specific about the implementation details: @cbeleites 2011 hyperSpec figure -> RGSOC_2020 figures -> @GegznaV figures

@bryanhanson
Collaborator

Closing, as we have pretty much settled on a naming scheme and the issue is old.


5 participants