Table type #1

Open
crawshaw opened this Issue Jan 3, 2017 · 0 comments

crawshaw commented Jan 3, 2017

This issue covers the proposed design and tradeoffs in the neugram table type.

A data table, or frame, is a two-dimensional data set arranged in columns.
It is intended to be used like an R data.frame, Matlab table, or an SQL table.
An ideal table has a small number of columns and any number of rows.

Columns can be named. The intention is that a typical program knows its set of columns statically, but this is not required (and so is not baked into the static type system).

Table is an abstract type. Any concrete type satisfying an equivalent method set can be used as a table.

The table does not have to be in memory or on the local machine. Operations can return errors. Neugram will ship with a table implementation built on SQL. In general, tables can be understood as language-specific syntax for working with SQL tables. Far from all SQL features are supported, and some important semantic differences mean that only a subset of SQL tables are suitable for treatment as neugram tables. Specifically, rows are ordered in a neugram table, and neugram has no notion of primary keys.

Type and slice syntax

The syntax for a table type is [|]T, where T is the type of the table cells.
A table that can hold any type is [|]interface{}.

For slicing, a table uses the same comma-separated syntax described in the Go multi-dimensional slice proposal: https://github.com/golang/proposal/blob/master/design/6282-table-data.md

Methods

Every table over data of type T must implement:

interface {
	Cols() []string          // reports number of columns and their names
	Len() (int, error)       // reports the number of rows
	Get(x, y int) (T, error) // gets the value of the cell at (x, y)
}
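To make the required method set concrete, here is a minimal sketch in plain Go. Go has no `[|]T` type, so the sketch uses a generic interface instead. The names `Table`, `MemTable`, and `NewMemTable` are hypothetical, and treating `x` as the column index and `y` as the row index is an assumption inferred from the slicing examples in this proposal.

```go
package main

import "fmt"

// Table mirrors the required method set, written with Go generics.
// The name Table is an assumption; the proposal spells the type [|]T.
type Table[T any] interface {
	Cols() []string          // reports number of columns and their names
	Len() (int, error)       // reports the number of rows
	Get(x, y int) (T, error) // gets the value of a cell
}

// MemTable is a hypothetical in-memory table stored column-major.
type MemTable[T any] struct {
	names []string
	cols  [][]T // cols[x][y] is column x, row y (assumed orientation)
}

// NewMemTable builds a table with the given row count and column names.
// Cells start at the zero value of T.
func NewMemTable[T any](rows int, names ...string) *MemTable[T] {
	cols := make([][]T, len(names))
	for i := range cols {
		cols[i] = make([]T, rows)
	}
	return &MemTable[T]{names: append([]string(nil), names...), cols: cols}
}

// Cols returns a copy of the column names.
func (m *MemTable[T]) Cols() []string { return append([]string(nil), m.names...) }

// Len never fails for an in-memory table, but keeps the error result so
// that remote or SQL-backed tables can share the same method set.
func (m *MemTable[T]) Len() (int, error) {
	if len(m.cols) == 0 {
		return 0, nil
	}
	return len(m.cols[0]), nil
}

// Get returns the cell at column x, row y, or an error when out of range.
func (m *MemTable[T]) Get(x, y int) (T, error) {
	var zero T
	if x < 0 || x >= len(m.cols) {
		return zero, fmt.Errorf("column %d out of range [0,%d)", x, len(m.cols))
	}
	if y < 0 || y >= len(m.cols[x]) {
		return zero, fmt.Errorf("row %d out of range [0,%d)", y, len(m.cols[x]))
	}
	return m.cols[x][y], nil
}

// Compile-time check that *MemTable[int] satisfies Table[int].
var _ Table[int] = (*MemTable[int])(nil)
```

Every accessor that could touch remote storage returns an error, which is why even the in-memory `Len` carries one.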

A table can optionally implement more methods:

interface { Set(x, y int, value T) error }
interface { Slice(xlow, xhigh, ylow, yhigh int) [|]T }
interface { SliceCol(name ...string) [|]T }
interface { CopyFrom(src [|]T) (int, error) }
interface { CopyTo(dst [|]T) (int, error) }
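One way a caller could probe for an optional method is a Go type assertion, falling back to an error when the table does not provide it. The sketch below shows this for `Set` only. `Setter`, `SetCell`, `ErrReadOnly`, and the two toy table types are hypothetical names introduced for illustration, not part of the proposal.

```go
package main

import "errors"

// Table is the required method set, written with Go generics.
type Table[T any] interface {
	Cols() []string
	Len() (int, error)
	Get(x, y int) (T, error)
}

// Setter is the optional Set method from the list above.
type Setter[T any] interface {
	Set(x, y int, value T) error
}

// ErrReadOnly is a hypothetical error for tables without Set.
var ErrReadOnly = errors.New("table does not implement Set")

// SetCell writes through Set when the table offers it.
func SetCell[T any](t Table[T], x, y int, value T) error {
	s, ok := t.(Setter[T])
	if !ok {
		return ErrReadOnly
	}
	return s.Set(x, y, value)
}

// readOnly is a toy table whose cells all hold the same value.
type readOnly struct {
	names []string
	rows  int
	value int
}

func (r readOnly) Cols() []string            { return r.names }
func (r readOnly) Len() (int, error)         { return r.rows, nil }
func (r readOnly) Get(x, y int) (int, error) { return r.value, nil }

// writable is a toy column-major table that also implements Set.
// Bounds checks are omitted for brevity.
type writable struct {
	names []string
	cells [][]int
}

func (w *writable) Cols() []string { return w.names }

func (w *writable) Len() (int, error) {
	if len(w.cells) == 0 {
		return 0, nil
	}
	return len(w.cells[0]), nil
}

func (w *writable) Get(x, y int) (int, error) { return w.cells[x][y], nil }

func (w *writable) Set(x, y int, value int) error {
	w.cells[x][y] = value
	return nil
}
```

Calling `SetCell[int]` on a `readOnly` value yields `ErrReadOnly`, while the same call on a `*writable` stores the value.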

Builtin make: in-memory tables

The builtin function make can be used to create in-memory data tables.

x := make([|]T, 9, "Grape", "Vintage")
// x is a table with two columns named "Grape" and "Vintage", and 9 rows.

Composite literals

presidents := [|]interface{}{
	{|"ID", "Name", "Term1", "Term2"|},
	{1, "George Washington", 1789, 1792},
	{2, "John Adams", 1797, 0},
	{3, "Thomas Jefferson", 1800, 1804},
	{4, "James Madison", 1808, 1812},
}
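As a rough model of what such a literal carries, the sketch below stores the header row and the data rows in plain Go and rejects rows whose width does not match the header. `Literal`, `NewLiteral`, and `Presidents` are hypothetical names, and the width check is an assumption: the proposal does not say how a malformed literal is handled.

```go
package main

import "fmt"

// Literal is a hypothetical in-memory form of a table literal:
// a header row of column names followed by data rows.
type Literal struct {
	Names []string
	Rows  [][]any
}

// NewLiteral checks that every row has exactly one cell per column.
func NewLiteral(names []string, rows ...[]any) (*Literal, error) {
	for i, r := range rows {
		if len(r) != len(names) {
			return nil, fmt.Errorf("row %d has %d cells, want %d", i, len(r), len(names))
		}
	}
	return &Literal{Names: names, Rows: rows}, nil
}

// Get returns the cell at column x, row y (assumed orientation).
func (l *Literal) Get(x, y int) (any, error) {
	if y < 0 || y >= len(l.Rows) {
		return nil, fmt.Errorf("row %d out of range [0,%d)", y, len(l.Rows))
	}
	if x < 0 || x >= len(l.Names) {
		return nil, fmt.Errorf("column %d out of range [0,%d)", x, len(l.Names))
	}
	return l.Rows[y][x], nil
}

// Presidents reproduces the literal above using plain Go values.
func Presidents() (*Literal, error) {
	return NewLiteral(
		[]string{"ID", "Name", "Term1", "Term2"},
		[]any{1, "George Washington", 1789, 1792},
		[]any{2, "John Adams", 1797, 0},
		[]any{3, "Thomas Jefferson", 1800, 1804},
		[]any{4, "James Madison", 1808, 1812},
	)
}
```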

TODO

  • append: Don't support it? We do want an equivalent to SQL insert.
  • Extend slicing/indexing syntax to allow using column names.
  • [|]interface{} is extremely common but clunky. ([|]val, [|]any?)
  • potential slicing variants:
presidents["Name"] == presidents[1] == [|]interface{}{
	{|"Name"|},
	{"George Washington"},
	{"John Adams"},
	{"Thomas Jefferson"},
	{"James Madison"},
}

x = [|]num{
	{|"Col0", "Col1", "Col2"|},
	{0.0, 0.1, 0.2},
	{1.0, 1.1, 1.2},
	{2.0, 2.1, 2.2},
}

x[1] == x["Col1"] == [|]num{
	{|"Col1"|},
	{0.1},
	{1.1},
	{2.1},
}

x[,2] == [|]num{
	{|"Col0", "Col1", "Col2"|},
	{2.0, 2.1, 2.2},
}

x[0|2] == x["Col0"|"Col2"] == [|]num{
	{|"Col0", "Col2"|},
	{0.0, 0.2},
	{1.0, 1.2},
	{2.0, 2.2},
}

x[0:1] == x[0|1] == x["Col0"|"Col1"]

x[1,0:1] == [|]num{
	{|"Col1"|},
	{0.1},
	{1.1},
}
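The column-selection and single-row variants above can be modelled in plain Go as ordinary methods. This is only a sketch of the intended results, not of the proposed syntax: `NumTable`, `SelectCols`, `SelectNames`, `Row`, and `Sample` are hypothetical names, and `float64` stands in for `num`. Range forms such as `x[0:1]` are left out because the sketch does not try to settle their bounds.

```go
package main

import "fmt"

// NumTable is a hypothetical column-major table of numbers.
type NumTable struct {
	Names []string
	Cols  [][]float64 // Cols[x][y] is column x, row y
}

// ColIndex resolves a column name to its position.
func (t *NumTable) ColIndex(name string) (int, error) {
	for i, n := range t.Names {
		if n == name {
			return i, nil
		}
	}
	return 0, fmt.Errorf("no column named %q", name)
}

// SelectCols models x[a|b]: a new table holding the listed columns in order.
func (t *NumTable) SelectCols(idx ...int) (*NumTable, error) {
	out := &NumTable{}
	for _, x := range idx {
		if x < 0 || x >= len(t.Cols) {
			return nil, fmt.Errorf("column %d out of range [0,%d)", x, len(t.Cols))
		}
		out.Names = append(out.Names, t.Names[x])
		out.Cols = append(out.Cols, append([]float64(nil), t.Cols[x]...))
	}
	return out, nil
}

// SelectNames models x["Col0"|"Col2"] by resolving names to indexes first.
func (t *NumTable) SelectNames(names ...string) (*NumTable, error) {
	idx := make([]int, 0, len(names))
	for _, n := range names {
		x, err := t.ColIndex(n)
		if err != nil {
			return nil, err
		}
		idx = append(idx, x)
	}
	return t.SelectCols(idx...)
}

// Row models x[,y]: every column, restricted to the single row y.
func (t *NumTable) Row(y int) (*NumTable, error) {
	out := &NumTable{Names: append([]string(nil), t.Names...)}
	for _, c := range t.Cols {
		if y < 0 || y >= len(c) {
			return nil, fmt.Errorf("row %d out of range [0,%d)", y, len(c))
		}
		out.Cols = append(out.Cols, []float64{c[y]})
	}
	return out, nil
}

// Sample builds the 3x3 table x used in the examples above.
func Sample() *NumTable {
	return &NumTable{
		Names: []string{"Col0", "Col1", "Col2"},
		Cols: [][]float64{
			{0.0, 1.0, 2.0},
			{0.1, 1.1, 2.1},
			{0.2, 1.2, 2.2},
		},
	}
}
```

With this model, `SelectCols(0, 2)` and `SelectNames("Col0", "Col2")` on the sample table produce the same two-column result, matching the equality written as `x[0|2] == x["Col0"|"Col2"]`.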

(The first comment of this issue is kept up-to-date with the current proposal.
When commenting on it, quote any relevant sections and respond to the quote.)
