New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell syntax #136

Closed
nokome opened this Issue Feb 25, 2016 · 2 comments

Comments

Projects
None yet
2 participants
@nokome
Member

nokome commented Feb 25, 2016

Stencila sheets, like other spreadsheet software, define a syntax for cell expressions. This cell syntax differs from most spreadsheet syntax because it combines the usual cell reference expressions e.g. A1, A1:B3 with the syntax of the host language e.g. plot(A1:A15,ylab="Weight (kg)").

This issue is for proposing and discussing extensions to the cell syntax to allow for different types of cell execution. This issue thus anticipates features that may, or may not, be added in the future. This is necessary to create an initial syntax design with some syntactic consistency for different cell types.

This issue overlaps with #109 and #116. Some changes in terminology may be used given the expansion in scope. As much as possible, we should use terms commonly used in general programming. This issue is likely to be converted into documentation sometime.

Cell kind

We use the word kind to refer to different kinds of cell execution and dependency (the word type is used to to refer to different types of cell values e.g. numeric, string, image). We use the terms "predecessors" and "successors" to describes relationships between cells in the dependency graph.

Literal cells

Cells can be literals (i.e. constants) either:

  • valid literals in the host context language (e.g "a string", 3.14, 42), or
  • a literal for which we add parsing and conversion to a native type (e.g. '25 Feb 2016' becomes a Python datetime.datetime object).

Literals are "data" cells, they don't have any predecessors, and in that sense they are "static". Nonetheless, they are inserted into the dependency graph because they may have successors.

Expression cells =

Prefixing a cell with = indicates that the cell contains an expression which requires evaluation in the execution context. These cells are inserted as nodes in the dependency graph and are automatically updated when their predecessors are updated. In this sense they are "dynamic" cells. (Note that the other types of cells below are also expressions but that the "expression" kind refers to this simplest case).

Mapping cells :

Prefixing a cell with : indicates an object in the execution context should be mapped to the adjacent cells. For example, if cell A1 was set to : data.frame(y=1:10,y=rep(3,10)) the resulting R data frame would be displayed in cells A1:B11 with row A containing the headers x and y and cells A2:B11 containing the values. Any changes to the values in those cells would get mapped into the data.frame object in the context. For example, editing the value in A3 to "99" would actually execute the equivalent of A1[2,1] = 99 in the context (i.e. set the second row, first column).

Cell mapping is in effect a projection of native objects onto the spreadsheet. It allows for more efficient, and convenient, representation of large amounts of data.

Requirement cells ^

Prefixing a cell with ^ indicates that it should be evaluated once, before other cells, and not inserted into the dependency graph. Usage: importing packages required elsewhere in the sheet e.g. in R ^ library(ggplot2), in Python ^ import numpy. Right now we automatically recognise these imports but using the ^ prefix makes it more explicit and allows for some front-end optimisations.

Independent cells |

Prefixing a cell with | indicates that the cell should not be inserted into the dependency tree and should only be updated explicitly by the user. This can be used for applying functions that have side effects i.e. alter other cell values and/or are computationally expensive. For example, to replicate the functionality of "solver" in Excel a sheet could make use of the R function optim to do non-linear optimization subject to constraints.

Test cells ?

Prefixing a cell with ? indicates that it is a test assertion e.g. ? sum(A1:A10)==100 could be used as a test that some calculated percentages add to 100%.

A common criticism of spreadsheets is that they are not amenable to testing. This provides a very quick way to define a test. Using the dependency graph we can report to the user which tests have failed and provide test coverage statistics (e.g. 5 out of 10 dynamic cells (those with evaluated expressions) are covered by 2 tests).

Visualization cells ~

Prefixing a cell with ~ indicates that it contains an expression defining a browser based visualization. These cells don't require any execution in the context (they just require client side rendering) but are still part of the dependency graph (they will have predecessors but no successors).

Currently, any data visualizations need to be written in the host language, executed in the context and a PNG (or SVG) returned to the browser. Browser based visualizations may be faster (don't require server roundtrips) and allow for interactivity (e.g. select points on a plot and see which sheet cells they represent).

Furthermore, there is potential to create a more concise visualisation syntax for sheet cells. For example, the following R expression can be used to create a facetted scatterplot using ggplot2:

= ggplot(data.frame(height=A2:A10,weight=B2:B10,ethnicity=C2:C10,gender=D2:D10)) + geom_point(aes(x=height,y=weight,color=gender)) + facet_grid(ethnicity~.)

A specific visualisation syntax might be able to be a lot shorter and have some visualisation "smarts" like recognizing (as Excel does) that the first row of the cell range contains labels that can be used as axis labels:

~ A1:D10 as points with x=A,y=B,color=C,row=D

Currently experimenting with implementing a shorthand visualization language which gets transformed into a Vega-lite JSON spec which in turn gets transformed to a Vega JSON spec which in turn gets rendered using D3. This approach allows for alternative levels of accessibility versus sophistication. The viz shorthand would be integrated into stencils as well (hence the Cila like "as" and "with" syntax in the example above).

Cila cells _

Cila cells start with a leading underscore, are parsed and converted into HTML to provide for rich text annotations (see #117). Content cells will usually be used for providing explanations about the sheet. For example, a content cell might provide a link to the source of some data:

Data obtained from [Vincent, Van and Goh 1988](http://doi.org/10.1000/182)

If the user wants to "protect" the cell from parsing as markdown that can explicitly mark it as a string literal:

"A string with _underscores_ not to be interpreted as italics" 

Text cells

If the characters in a cell do not match the regexes for any of the above cell kinds it is considered a text cell. Text cells are converted into a string expression.

Cell names

All cells, regardless of their kind, can have a name. A cell name (e.g. exchange_rate) is, in effect, an alias for its id (e.g. A1). e.g.

exchange_rate = 0.75
data : getURL("https://raw.github.com/..../data.csv")
figure_xy ~ A1:B100 as points

At this stage, we are not thinking of implementing named cell ranges as in Excel, since mapping cells (described above) provide a better approach (given the integration with native objects like R data.frames) to the same idiom.

@nokome

This comment has been minimized.

Show comment
Hide comment
@nokome

nokome Mar 6, 2016

Member

Changed syntax so that mapping cells use ":", visualization uses "~" and manual cells use "|". Added Cila cells which use leading underscore.

Member

nokome commented Mar 6, 2016

Changed syntax so that mapping cells use ":", visualization uses "~" and manual cells use "|". Added Cila cells which use leading underscore.

@michael

This comment has been minimized.

Show comment
Hide comment
@michael

michael Dec 20, 2017

Member

@nokome it would be good if we could get rid of all the old (outdated) issues. Could you do such a round please?

Member

michael commented Dec 20, 2017

@nokome it would be good if we could get rid of all the old (outdated) issues. Could you do such a round please?

@nokome nokome closed this Dec 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment