Sheet file format #175

Closed
nokome opened this Issue Apr 24, 2016 · 8 comments

nokome commented Apr 24, 2016

This issue is to discuss changes to the file format for sheets as proposed in https://stenci.la/stencila/blog/humane-sheets/

Please feel free to add comments and suggestions below. I'll probably start a branch for it soon.

An example of what the proposed format might look like follows. It may be a useful reference against which alternatives can be compared. I've added comment lines starting with -- to mark the sections.

-- Meta attributes of the sheet (don't appear in cells)
#title An example sheet for proposed syntax
#summary Demonstrates the use of alternative syntax elements
#environ r
#requires ggplot2

-- Cell source
A1 6
A2 7
A3 = A1*A2
A4 ? A3==42
B1 _ A normal distribution |N ~(42,5)|
B2 data = simulate(1000)
B3 ggplot(data) + 
    geom_bar(aes(x=answer))
C1 = A1*2
C2:C3 == C1

-- Extra source (for code that you don't want in cells)
`simulate <- function(n) data.frame(answer=rnorm(n,6,2) * 7)

-- Cell styling
&A1 color:"grey"

-- Cell height and width
%A 10cm

-- Cell display (i.e. clipped, expanded, or overlay)
*B1 ove
*C10:D13 cli
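
To make the prefix-symbol idea concrete, here is a minimal Python sketch of how a reader might classify each line by its leading marker. The function name `classify`, the `PREFIXES` table, and the rule that an indented line continues the previous one are assumptions made for illustration; none of this is Stencila's actual parser.

```python
# Hypothetical sketch: classify lines of the prefix-symbol format by their
# leading marker. The marker-to-kind mapping follows the example above.
PREFIXES = {
    "#": "meta",
    "`": "extra",
    "&": "style",
    "%": "size",
    "*": "display",
}


def classify(text):
    """Return a list of (kind, content) pairs, one per logical line."""
    items = []
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("--"):
            continue  # skip blank lines and "--" section comments
        if raw[0].isspace() and items:
            # Assumption: an indented line continues the previous logical line
            kind, content = items[-1]
            items[-1] = (kind, content + "\n" + raw.strip())
            continue
        kind = PREFIXES.get(raw[0], "source")
        content = raw[1:] if kind != "source" else raw
        items.append((kind, content.strip()))
    return items


example = """\
-- Meta attributes
#title An example sheet
#environ r

-- Cell source
A1 6
A3 = A1*A2
B3 ggplot(data) +
    geom_bar(aes(x=answer))

-- Cell styling
&A1 color:"grey"
%A 10cm
*B1 ove
"""

items = classify(example)
for kind, content in items:
    print(kind, repr(content))
```

Because every line carries its own marker, the classifier needs no state beyond the previous line (for continuations), which is part of why this variant is simple to implement even though it is harder to read.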

nokome commented Apr 24, 2016

So, I'll kick this off with a bit of self-criticism. For a format that purports to be human friendly, it contains quite a lot of codified hieroglyphics, e.g. &A, *B1.

The symbols within the cell source section (e.g. the ? in the line A4 ? A3==42) are OK because they are what the user enters into the cell (the ? indicates a test assertion).

But the other lines are prefixed with symbols that indicate their purpose, e.g. the backtick to indicate extra source and the ampersand to indicate styling. A more human-friendly syntax could instead use explicitly named sections. This would look something like this:

#meta
title An example sheet for proposed syntax
summary Demonstrates the use of alternative syntax elements
environ r
requires ggplot2

#source
A1 6
A2 7
A3 = A1*A2
A4 ? A3==42
B1 _ A normal distribution |N ~(42,5)|
B2 data = simulate(1000)
B3 ggplot(data) + 
    geom_bar(aes(x=answer))
C1 = A1*2
C2:C3 == C1

#extra
simulate <- function(n) data.frame(answer=rnorm(n,6,2) * 7)

#style
A1 color:"grey"

#size
A 10cm

#display
B1 ove
C10:D13 cli

That is certainly easier to read and understand. I can't see any downsides to it, except that it is probably a bit harder to implement.
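
As a rough gauge of the implementation effort, the following Python sketch parses the named-section variant. The names `parse_sections` and `split_cells`, and the rule that an indented line continues the previous one, are assumptions for illustration only and do not reflect any existing Stencila code.

```python
def parse_sections(text):
    """Group lines under '#name' headers; returns {section: [logical lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines only separate sections visually
        if raw.startswith("#"):
            current = raw[1:].strip()
            sections.setdefault(current, [])
        elif current is None:
            raise ValueError(f"line outside any section: {raw!r}")
        elif raw[0].isspace() and sections[current]:
            # Assumption: an indented line continues the previous logical line
            sections[current][-1] += "\n" + raw.strip()
        else:
            sections[current].append(raw.rstrip())
    return sections


def split_cells(lines):
    """Split 'A1 6' style lines into {reference_or_key: rest_of_line}."""
    cells = {}
    for line in lines:
        ref, _, source = line.partition(" ")
        cells[ref] = source
    return cells


doc = """\
#meta
title An example sheet
environ r

#source
A1 6
A2 7
A3 = A1*A2
B3 ggplot(data) +
    geom_bar(aes(x=answer))

#style
A1 color:"grey"
"""

sections = parse_sections(doc)
cells = split_cells(sections["source"])
print(cells["A3"])
```

The extra work compared with per-line prefixes is mainly the `current` state that tracks which section a line belongs to. One limitation of this sketch is that a line of extra R code beginning with # (an R comment) would be read as a section header, so a real implementation would need an escaping rule or a fixed list of section names.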

sirinath commented Apr 25, 2016

Is it possible to choose a homoiconic format, as in Curl? Also see #182.

Also, don't limit this to sheets. I think it would be best to have one format to rule them all! (Maybe you can borrow some ideas from orgmode on this.) Sheets could be a structure in the main language.

nokome commented Apr 26, 2016

Thanks @sirinath, I wasn't aware of Curl; it looks neat. I will consider your ideas.

nokome commented Apr 26, 2016

Elsewhere, @blahah suggested putting metadata such as author and title in a separate file, e.g. out/meta.json, instead of out/out.tsv. Note that metadata currently doesn't get stored anywhere; out/out.tsv is really just a cell value cache that allows a sheet to be opened and viewed without an execution context (although an execution context is usually present). To make that more explicit, out/out.tsv could be renamed to out/cache.tsv. The topic of a meta.json is then really a question of export format (rather than this issue, which is mainly about source file formats) and could be implemented as an option to the export method of Sheets, e.g. mysheet.export('my-sheet.csv',meta='meta.json')
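
A minimal Python sketch of what such an export option could look like follows. The `Sheet` class here is a hypothetical stand-in written for illustration, not Stencila's actual class, and the choice of CSV for values and JSON for metadata simply mirrors the file names in the call above.

```python
import csv
import json
import os
import tempfile


class Sheet:
    """Hypothetical stand-in for a sheet; not Stencila's actual class."""

    def __init__(self, values, meta=None):
        self.values = values    # rows of cell values
        self.meta = meta or {}  # e.g. {"title": ..., "author": ...}

    def export(self, path, meta=None):
        """Write cell values as CSV to `path`.

        If `meta` is given, also write the sheet's metadata as JSON to it.
        """
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(self.values)
        if meta is not None:
            with open(meta, "w") as f:
                json.dump(self.meta, f, indent=2)


# Usage: export to a temporary directory and read both files back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "my-sheet.csv")
    meta_path = os.path.join(tmp, "meta.json")
    mysheet = Sheet([[6, 7], [42, ""]], meta={"title": "An example sheet"})
    mysheet.export(csv_path, meta=meta_path)
    with open(meta_path) as f:
        exported_meta = json.load(f)
    with open(csv_path, newline="") as f:
        exported_rows = list(csv.reader(f))
```

Keeping `meta` optional means a plain mysheet.export('my-sheet.csv') still produces a single file, so nobody is forced to split content across files.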

nokome commented Apr 26, 2016

I just came across @manns's pyspread file format (http://manns.github.io/pyspread/first_steps.html ; I should have checked before, and have added a section to the blog post). It looks a lot like the version in my comment above in that it has section delimiters:

[Pyspread save file version]
0.1
[shape]
1000 100 3
[grid]
7 22 0 'Testcode1'
8 9 0 'Testcode2'
[attributes]
[] [] [] [] [(0, 0)] 0 'textfont' u'URW Chancery L'
[] [] [] [] [(0, 0)] 0 'pointsize' 20
[row_heights]
0 0 56.0
7 0 25.0
[col_widths]
0 0 80.0
[macros]
Macro text

sirinath commented Apr 27, 2016

The ability to split content across files based on logic would be great, but being forced to split because certain metadata has to go into a particular file may not be.

sirinath commented Apr 27, 2016

Also, please consider homoiconicity, as in Curl, and Turing completeness, as in TeX.

nokome referenced this issue Jun 16, 2016: Comments : syntax #201 (closed, 0 of 3 tasks complete)

nokome removed the "1 - Ready" label Jul 3, 2017

nokome commented Dec 21, 2017

The current internal format for sheets is XML. Plain-text formats for sheets are still an important feature, but they will be handled in the stencila/convert repo.

nokome closed this Dec 21, 2017
