[pull] main from kelindar:main #2

Open: wants to merge 26 commits into main from kelindar:main


pull[bot] commented May 7, 2022

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

pull[bot] added the ⤵️ pull label May 7, 2022
siennathesane and others added 24 commits May 29, 2022 21:05

Added a matrix of supported Go versions so that every supported version of Go is tested.

* Refactor numeric types
* Bump to 1.18

* Fixed duplicate primary keys
* Fix typo
* Expose the race condition with an Ubuntu container
* Guard column apply with a mutex
* Cover the case where ID < last commit ID
* Alter the double-apply test
* Switch the `readChunk` fill lock from a read lock to a non-read lock
* Remove all test changes

This PR generalizes the `Merge()` function and removes the now-redundant `Add()` functions for numeric types. When `Merge()` is called on a numeric column that has no custom merge option, it behaves like the old `Add()` (i.e., it increments the value by default).
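
The fallback rule can be sketched as follows. This is a minimal illustration, assuming a hypothetical `numberColumn` type and `mergeFunc` signature rather than the library's real generic column and merge option: when no custom merge function is configured, `Merge` falls back to addition.

```go
package main

// mergeFunc combines the stored value with an incoming delta.
// Hypothetical type, not the library's actual merge option.
type mergeFunc func(value, delta float64) float64

// numberColumn is a hypothetical, simplified numeric column.
type numberColumn struct {
	data  []float64
	merge mergeFunc // nil means no custom merge option was configured
}

// Merge combines delta into the value at idx. Without a custom merge
// option it falls back to addition, i.e. the behavior of the old Add().
func (c *numberColumn) Merge(idx int, delta float64) {
	fn := c.merge
	if fn == nil {
		fn = func(v, d float64) float64 { return v + d }
	}
	c.data[idx] = fn(c.data[idx], delta)
}
```

With a custom option, for example one that keeps the larger of the two values, the same `Merge` call would no longer sum.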

Introduces `WithUnion`, which works like `Union` but allocates a separate bitmap to compute the multi-OR, then applies the result to the transaction's current index with an AND. This could be used to solve #55.
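
The OR-then-AND step can be illustrated with a minimal sketch. It assumes a hypothetical `bitmap` type (a plain `[]uint64` bitset) and a hypothetical `withUnion` function instead of the library's own bitmap and transaction types: the union goes into a scratch bitmap first, which is then AND-ed into the current index, so the result narrows the current selection rather than widening it.

```go
package main

// bitmap is a minimal fixed-width bitset, a hypothetical stand-in for the
// library's bitmap type.
type bitmap []uint64

func (b bitmap) set(i int) { b[i/64] |= 1 << (uint(i) % 64) }

func (b bitmap) contains(i int) bool { return b[i/64]&(1<<(uint(i)%64)) != 0 }

// withUnion keeps only the entries of current that appear in at least one
// of the given indexes. The multi-OR is computed into a separate scratch
// bitmap, and only then AND-ed into current.
func withUnion(current bitmap, indexes ...bitmap) {
	scratch := make(bitmap, len(current))
	for _, idx := range indexes {
		for i := range scratch {
			if i < len(idx) {
				scratch[i] |= idx[i]
			}
		}
	}
	for i := range current {
		current[i] &= scratch[i]
	}
}
```

Writing the OR directly into `current` would instead add rows that the transaction had already filtered out, which is why the separate allocation is needed.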

This PR contains some refactoring and adds a new `record` column that can store any type implementing the standard `BinaryMarshaler` and `BinaryUnmarshaler` interfaces. As a result, it supports any type that uses this standard encoding, for example `time.Time`.


```go
col := NewCollection()
col.CreateColumn("timestamp", ForRecord(func() *time.Time {
	return new(time.Time)
}, nil))

// Insert the time, it implements binary marshaler
idx, _ := col.Insert(func(r Row) error {
	now := time.Unix(1667745766, 0)
	r.SetRecord("timestamp", &now)
	return nil
})

// We should be able to read back the time
col.QueryAt(idx, func(r Row) error {
	ts, ok := r.Record("timestamp")
	assert.True(t, ok)
	assert.Equal(t, "November", ts.(*time.Time).UTC().Month().String())
	return nil
})
```

This PR fixes merge options on numeric columns, where the option argument was not being passed through properly. It also adds a few tests and removes the mandatory merge function from the record column to make the API more consistent.

## Bugfixes
* `Swap...` did not swap correctly when the underlying buffer was reallocated by `append()`.
* The index for `Record` types did not work when strings were merged, because it could not see the newly appended values.

## Optimizations
* When swapping strings, a string of exactly the same size as the previous value is now swapped in place (as was already done for other value types).
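
Both the bugfix and the optimization concern how a string is replaced in a shared buffer. A minimal sketch, assuming a hypothetical `stringBuffer` that stores strings back to back in one byte slice (not the library's actual layout): a same-size string is overwritten in place, and anything else is appended, which is the path where `append()` may reallocate the buffer.

```go
package main

// stringBuffer is a hypothetical byte buffer holding strings, each
// addressed by an (offset, length) pair.
type stringBuffer struct {
	buf  []byte
	offs []int
	lens []int
}

// Add appends s and returns its index.
func (b *stringBuffer) Add(s string) int {
	b.offs = append(b.offs, len(b.buf))
	b.lens = append(b.lens, len(s))
	b.buf = append(b.buf, s...)
	return len(b.offs) - 1
}

// Swap replaces the string at idx. A string of exactly the same length is
// overwritten in place; otherwise the new bytes are appended and the
// offset is updated. Because append may reallocate b.buf, any slice of the
// old buffer taken before this call must not be reused afterwards.
func (b *stringBuffer) Swap(idx int, s string) {
	if len(s) == b.lens[idx] {
		copy(b.buf[b.offs[idx]:], s)
		return
	}
	b.offs[idx] = len(b.buf)
	b.lens[idx] = len(s)
	b.buf = append(b.buf, s...)
}

// Get returns the current string at idx.
func (b *stringBuffer) Get(idx int) string {
	return string(b.buf[b.offs[idx] : b.offs[idx]+b.lens[idx]])
}
```

Note that this sketch never reclaims the bytes left behind by an appended swap; it only illustrates the in-place versus append decision.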

This PR adds `trigger` functionality for cases where the built-in capabilities aren't sufficient. Triggers let you register a callback function that is called whenever a value is `inserted`, `updated` or `deleted`. They work similarly to the bitmap index.


```go
players.CreateTrigger("on_balance", "balance", func(r Reader) {
	switch {
	case r.IsDelete():
		updates = append(updates, fmt.Sprintf("delete %d", r.Index()))
	case r.IsUpsert():
		updates = append(updates, fmt.Sprintf("upsert %d=%v", r.Index(), r.Float()))
	}
})

// Perform a few deletions and insertions
for i := 0; i < 3; i++ {
	players.DeleteAt(uint32(i))
	players.Insert(func(r Row) error {
		r.SetFloat64("balance", 50.0)
		return nil
	})
}

// Must keep track of all operations
assert.Len(t, updates, 6)
assert.Equal(t, []string{
	"delete 0",
	"upsert 500=50",
	"delete 1",
	"upsert 501=50",
	"delete 2",
	"upsert 502=50",
}, updates)
```
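
To make the callback flow concrete, here is a minimal, self-contained sketch of a column that fires triggers on every change. The `triggeredColumn`, `event` and `describe` names are hypothetical; the real library delivers changes through its `Reader` interface (as in the example above), and this sketch ignores transactions and commit ordering.

```go
package main

import "fmt"

// eventKind is hypothetical; the library exposes the kind of change through
// Reader methods such as IsDelete() and IsUpsert() instead.
type eventKind int

const (
	upsert eventKind = iota
	deleteEvent
)

// event describes one change to a column value.
type event struct {
	Kind  eventKind
	Index uint32
	Value float64
}

// triggeredColumn is a hypothetical float column that notifies registered
// callbacks synchronously after every change.
type triggeredColumn struct {
	values   map[uint32]float64
	triggers map[string]func(event)
}

func newTriggeredColumn() *triggeredColumn {
	return &triggeredColumn{
		values:   make(map[uint32]float64),
		triggers: make(map[string]func(event)),
	}
}

// CreateTrigger registers fn under name.
func (c *triggeredColumn) CreateTrigger(name string, fn func(event)) {
	c.triggers[name] = fn
}

func (c *triggeredColumn) fire(e event) {
	for _, fn := range c.triggers {
		fn(e)
	}
}

// Set inserts or updates a value and fires an upsert event.
func (c *triggeredColumn) Set(idx uint32, v float64) {
	c.values[idx] = v
	c.fire(event{Kind: upsert, Index: idx, Value: v})
}

// Delete removes a value and fires a delete event.
func (c *triggeredColumn) Delete(idx uint32) {
	delete(c.values, idx)
	c.fire(event{Kind: deleteEvent, Index: idx})
}

// describe formats an event like the strings asserted in the test above.
func describe(e event) string {
	if e.Kind == deleteEvent {
		return fmt.Sprintf("delete %d", e.Index)
	}
	return fmt.Sprintf("upsert %d=%v", e.Index, e.Value)
}
```

Since Go map iteration order is unspecified, several triggers registered on the same column would fire in no particular order in this sketch.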

This PR introduces a new Sorted Index that keeps an always-sorted b-tree (github.com/tidwall/btree) for a column of the user's choosing (currently limited to string columns). The index holds a single b-tree that is not copied between transactions; instead, access to it is guarded by a mutex.

Future work could add sorting for other column types, primary-key sorting, and custom `Less()` functions supplied by users.
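
The shared, mutex-guarded design can be sketched as follows. This is a simplified illustration, not the PR's code: it assumes a hypothetical `sortedIndex` type and swaps the b-tree for a sorted slice (so insertion is O(n) rather than O(log n)), while keeping the same ordering behavior and the single instance shared by all transactions.

```go
package main

import (
	"sort"
	"sync"
)

// entry pairs an indexed string value with the row that holds it.
type entry struct {
	key string
	row uint32
}

// sortedIndex is a hypothetical sorted index over a string column. One
// instance is shared by all transactions and guarded by a mutex instead
// of being copied per transaction.
type sortedIndex struct {
	mu    sync.RWMutex
	items []entry // always kept sorted by key
}

// Insert places (key, row) at its sorted position; equal keys keep their
// insertion order.
func (s *sortedIndex) Insert(key string, row uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	i := sort.Search(len(s.items), func(j int) bool { return s.items[j].key > key })
	s.items = append(s.items, entry{})
	copy(s.items[i+1:], s.items[i:])
	s.items[i] = entry{key: key, row: row}
}

// Ascend visits entries in ascending key order until fn returns false.
func (s *sortedIndex) Ascend(fn func(key string, row uint32) bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, e := range s.items {
		if !fn(e.key, e.row) {
			return
		}
	}
}
```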

I recently found an issue when trying to load a snapshot created with a large number of rows (10 million, in my case): loading it resulted in an index-out-of-range panic. I was able to reproduce this in the unit tests with around 3 million players.

```
Running tool: /usr/local/go/bin/go test -timeout 30s -run ^TestLargeSnapshot$ github.com/kelindar/column

--- FAIL: TestLargeSnapshot (4.20s)
panic: runtime error: index out of range [128] with length 128 [recovered]
	panic: runtime error: index out of range [128] with length 128

goroutine 4 [running]:
testing.tRunner.func1.2({0x1053a25c0, 0x14059d74930})
	/usr/local/go/src/testing/testing.go:1389 +0x1c8
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1392 +0x384
panic({0x1053a25c0, 0x14059d74930})
	/usr/local/go/src/runtime/panic.go:838 +0x204
github.com/kelindar/column.(*Collection).readState.func1.1(0x14010b50090)
	/Users/sgosiaco/repos/column/snapshot.go:188 +0x274
github.com/kelindar/column.(*Collection).Query(0x1400012c370, 0x1400005bd18)
	/Users/sgosiaco/repos/column/collection.go:354 +0x48
github.com/kelindar/column.(*Collection).readState.func1(0x1400005bd88?, 0x105250968?)
	/Users/sgosiaco/repos/column/snapshot.go:184 +0x78
github.com/kelindar/iostream.(*Reader).ReadRange(0x1402a07f020, 0x1400010fe10)
	/Users/sgosiaco/go/pkg/mod/github.com/kelindar/iostream@v1.3.0/reader.go:227 +0x8c
github.com/kelindar/column.(*Collection).readState(0x1400012c370, {0x1053c3380?, 0x14000163900?})
	/Users/sgosiaco/repos/column/snapshot.go:183 +0x1d4
github.com/kelindar/column.(*Collection).Restore(0x1400012c370, {0x1053c3080, 0x140001014a0})
	/Users/sgosiaco/repos/column/snapshot.go:44 +0x54
github.com/kelindar/column.TestLargeSnapshot(0x140001151e0)
	/Users/sgosiaco/repos/column/snapshot_test.go:229 +0x174
testing.tRunner(0x140001151e0, 0x1053bf558)
	/usr/local/go/src/testing/testing.go:1439 +0x110
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1486 +0x300
FAIL	github.com/kelindar/column	4.339s
FAIL
```

Looking at the code, I saw that at most 128 chunks can currently be read, because that is the size the commits array is initialized with. I've changed it to a map so that an arbitrary number of chunks/commits can be loaded, since saving a snapshot doesn't seem to limit how many chunks can be written.
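
The difference between the two approaches can be shown with a minimal sketch. The `commit` type and both restore functions are hypothetical simplifications, not the snapshot code itself: collecting commits into a fixed array of 128 panics as soon as chunk 128 is reached (the same class of error as in the trace above), while a map keyed by chunk number accepts any number of chunks.

```go
package main

// commit is a hypothetical stand-in for the per-chunk commit data read
// back from a snapshot.
type commit struct {
	chunk uint32
	rows  int
}

// restoreFixed mirrors the original limitation in simplified form:
// indexing the fixed array with chunk 128 or above panics at runtime.
func restoreFixed(decoded []commit) (commits [128]commit) {
	for _, c := range decoded {
		commits[c.chunk] = c
	}
	return commits
}

// restoreMap keys commits by chunk number, so any number of chunks fits.
func restoreMap(decoded []commit) map[uint32]commit {
	commits := make(map[uint32]commit, len(decoded))
	for _, c := range decoded {
		commits[c.chunk] = c
	}
	return commits
}
```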

This fixes issue #89, which was caused by an incorrectly initialized capacity for new columns. It occurred when enough rows were inserted to exceed the initial capacity and a column was then created: the new column was sized with the initial capacity instead of the collection's current capacity, so an incorrect number was passed to the `Grow()` method.
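
A minimal sketch of the corrected sizing rule, assuming a hypothetical `collection` type that only tracks row counts and capacities and uses `make` in place of the library's `Grow()`: a column created after the collection has grown must be sized from the current capacity, not from the initial one.

```go
package main

// collection is a hypothetical, simplified collection that tracks only
// row counts and capacities.
type collection struct {
	initialCap int
	capacity   int // current capacity; grows as rows are inserted
	count      int
	columns    map[string][]float64
}

func newCollection(initialCap int) *collection {
	if initialCap < 1 {
		initialCap = 1
	}
	return &collection{
		initialCap: initialCap,
		capacity:   initialCap,
		columns:    make(map[string][]float64),
	}
}

// insert adds n rows, doubling the capacity as needed.
func (c *collection) insert(n int) {
	c.count += n
	for c.capacity < c.count {
		c.capacity *= 2
	}
}

// createColumn sizes a new column from the collection's current capacity.
// Using c.initialCap here instead reproduces the bug: after enough
// inserts, the new column is too small for rows that already exist.
func (c *collection) createColumn(name string) []float64 {
	col := make([]float64, c.capacity)
	c.columns[name] = col
	return col
}
```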

This PR fixes issue #87, where `WithValue()` was not unmarshaling the record.

This PR adds an `Index()` function to the transaction, which returns the current cursor index.
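
A minimal sketch of the cursor idea, assuming a hypothetical `txn` type whose `Range` takes a no-argument callback (the library's actual `Range` signature may differ): the transaction records which row it is visiting, and `Index()` simply reports that position.

```go
package main

// txn is a hypothetical transaction that tracks the row currently being
// visited by Range.
type txn struct {
	rows   []uint32 // row indexes matched by the current query
	cursor uint32
}

// Range visits every matched row, updating the cursor before each call.
func (t *txn) Range(fn func()) {
	for _, idx := range t.rows {
		t.cursor = idx
		fn()
	}
}

// Index returns the current cursor index.
func (t *txn) Index() uint32 {
	return t.cursor
}
```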