use kotlin4example
jillesvangurp committed Dec 24, 2023
1 parent 6f7c5ea commit a6a8c7e
Showing 6 changed files with 321 additions and 7 deletions.
124 changes: 118 additions & 6 deletions README.md
@@ -1,16 +1,18 @@
# Pg Doc Store

[![Process Pull Request](https://github.com/formation-res/pg-docstore/actions/workflows/pr_master.yaml/badge.svg)](https://github.com/formation-res/pg-docstore/actions/workflows/pr_master.yaml)

Pg-docstore is a Kotlin library that allows you to use postgres as a json document store.

## Why

Document stores are very useful in some applications. While they are popular in the nosql world, sql databases like
postgres provide a lot of nice functionality and performance and are therefore a popular choice for data storage.
Using postgres as a document store makes a lot of sense. Transactionality ensures that your data is safe. You can use any
of a wide range of managed and cloud based service providers or just host it yourself.

At FORMATION, we use pg-docstore to store millions of documents of different types. We use Elasticsearch for querying, aggregations, and other functionality and therefore have no need
for the elaborate querying support in databases. But we do like to keep our data safe, which is why we like postgres.
Additionally, we like having a clear separation between what we store and what we query on. So, our architecture includes
an ETL pipeline that builds our search index from the raw data in pg-docstore.

@@ -21,24 +23,134 @@ an ETL pipeline that builds our search index from the raw data in pg-docstore.
- serialization is done using kotlinx.serialization
- efficient bulk insert/updates
- efficient querying and dumping of the content of the store with database scrolls. We use this for our ETL pipeline.
- nice Kotlin API with suspend functions, flows, strong typing, etc.

This library builds on jasync-postgresql, which is one of the few database drivers out there that is written in Kotlin
and that uses non-blocking IO.
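
To illustrate the idea behind dumping a whole table without loading it all at once, here is a minimal sketch that lazily pulls fixed-size pages. It is an assumption-laden stand-in: pg-docstore uses database scrolls and flows, whereas this sketch uses offset paging and a plain `Sequence` from the standard library. The names `scroll` and `fetchPage` are hypothetical and not part of the library.

```kotlin
// Hypothetical helper, not part of pg-docstore: lazily walks a data source
// page by page. fetchPage returns at most `limit` items starting at `offset`.
fun <T> scroll(
  pageSize: Int,
  fetchPage: (offset: Int, limit: Int) -> List<T>,
): Sequence<T> = sequence {
  require(pageSize > 0) { "pageSize must be positive" }
  var offset = 0
  while (true) {
    val page = fetchPage(offset, pageSize)
    yieldAll(page)
    // a short (or empty) page means the source is exhausted
    if (page.size < pageSize) break
    offset += page.size
  }
}

fun main() {
  // an in-memory list stands in for the database table
  val table = List(201) { "doc-$it" }
  val all = scroll(pageSize = 50) { offset, limit -> table.drop(offset).take(limit) }
  println("Total documents: ${all.count()}")
}
```

Only one page is held in memory at a time, which is the property that makes this style of iteration suitable for an ETL job over millions of documents.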

## Usage

```kotlin
// jasync suspending connection
val connection = PostgreSQLConnectionBuilder
  .createConnectionPool(
    ConnectionPoolConfiguration(
      host = "localhost",
      port = 5432,
      database = "docstore",
      username = "postgres",
      password = "secret",
    )
  ).asSuspending

// recreate the docs table
connection.reCreateDocStoreSchema("docs")

@Serializable
data class MyModel(
  val title: String,
  val description: String,
  val categories: List<String> = listOf(),
  val id: String = UUID.randomUUID().toString(),
)

// create a store for the docs table
val store = DocStore(
  connection = connection,
  serializationStrategy = MyModel.serializer(),
  tableName = "docs",
  idExtractor = MyModel::id,
  // optional, used for text search
  textExtractor = {
    "${it.title} ${it.description}"
  },
  // optional, used for tag search
  tagExtractor = {
    it.categories
  }
)

// do some crud
val doc1 = MyModel("Number 1", "a document", categories = listOf("foo"))
store.create(doc1)
store.getById(doc1.id)?.let {
  println("Retrieved ${it.title}")
}
// update by id
store.update(doc1.id) {
  it.copy(title = "Number One")
}
// or just pass in the document
store.update(doc1) {
  it.copy(title = "Numero Uno")
}
// Retrieve it again
store.getById(doc1.id)?.let {
  println("Retrieved ${it.title}")
}

// you can also do bulk inserts using flows or lists
flow {
  repeat(200) {
    emit(
      MyModel(
        title = "Bulk $1",
        description = "A bulk inserted doc",
        categories = listOf("bulk")
      )
    )
  }
}.let { f ->
  // bulk insert 40 at a time
  store.bulkInsert(flow = f, chunkSize = 40)
}


// and of course we can query
println(store.documentsByRecency(limit = 5).map { it.title })
// or we can scroll through the entire table
// and count the number of documents in the flow
println("Total documents: ${
  store.documentsByRecencyScrolling().count()
}")

// and we can restrict the search using tags
println("Just the bulk documents: ${
  store
    .documentsByRecencyScrolling(
      tags = listOf("bulk")
    )
    .count()
}")
```

Captured Output:

```
Retrieved Number 1
Retrieved Numero Uno
[Bulk $1, Bulk $1, Bulk $1, Bulk $1, Bulk $1]
Total documents: 201
Just the bulk documents: 200
```
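
To make the `chunkSize` parameter in the bulk insert above more concrete, here is a minimal, library-free sketch of chunked inserting. It only uses the Kotlin standard library. `Doc`, `insertInChunks`, and the `insertBatch` callback are hypothetical names for illustration and are not part of pg-docstore, whose `bulkInsert` may well be implemented differently.

```kotlin
// Hypothetical document type, for illustration only.
data class Doc(val id: Int, val title: String)

// Splits the documents into batches of at most chunkSize and hands each
// batch to insertBatch. Returns the number of batches that were written.
fun insertInChunks(
  docs: List<Doc>,
  chunkSize: Int,
  insertBatch: (List<Doc>) -> Unit,
): Int {
  require(chunkSize > 0) { "chunkSize must be positive" }
  var batches = 0
  docs.chunked(chunkSize).forEach { batch ->
    insertBatch(batch)
    batches++
  }
  return batches
}

fun main() {
  val docs = List(200) { Doc(it, "Bulk $it") }
  // a mutable list stands in for the database table
  val stored = mutableListOf<Doc>()
  val batches = insertInChunks(docs, chunkSize = 40) { stored.addAll(it) }
  println("$batches batches, ${stored.size} documents")
}
```

With 200 documents and a chunk size of 40, this performs five batch writes instead of 200 single inserts.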

## Future work

As FORMATION grows, we will no doubt need more features. Some features that come to mind are sharding, utilizing some of
the json features in postgres, or even its text search and geospatial features.

## Development status

This is a relatively new project, so there may be some bugs, design flaws, etc. However, I've implemented similar
stores many times before in past projects and I think I know what I'm doing. If it works for us, it might also work for you.

Give it a try!

## License and contributing

The code is provided as is under the [MIT](LICENSE.md) license. If you are planning to make a contribution, please
reach out via the issue tracker first.

This readme is generated with [kotlin4example](https://github.com/jillesvangurp/kotlin4example)

7 changes: 7 additions & 0 deletions build.gradle.kts
@@ -15,6 +15,12 @@ plugins {

repositories {
    mavenCentral()
    maven(url = "https://jitpack.io") {
        content {
            includeGroup("com.github.jillesvangurp")
        }
    }

}

dependencies {
@@ -34,6 +40,7 @@ dependencies {
    testImplementation("org.slf4j:jul-to-slf4j:_")
    testImplementation("org.apache.logging.log4j:log4j-to-slf4j:_") // es seems to insist on log4j2
    testImplementation("ch.qos.logback:logback-classic:_")
    testImplementation("com.github.jillesvangurp:kotlin4example:_")
}

configure<ComposeExtension> {
147 changes: 147 additions & 0 deletions src/test/kotlin/com/tryformation/pgdocstore/docs/DocGenerationTest.kt
@@ -0,0 +1,147 @@
package com.tryformation.pgdocstore.docs

import com.github.jasync.sql.db.ConnectionPoolConfiguration
import com.github.jasync.sql.db.asSuspending
import com.github.jasync.sql.db.interceptor.LoggingInterceptorSupplier
import com.github.jasync.sql.db.postgresql.PostgreSQLConnectionBuilder
import com.jillesvangurp.kotlin4example.SourceRepository
import com.tryformation.pgdocstore.*
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.count
import kotlinx.coroutines.flow.flow
import kotlinx.serialization.Serializable
import org.junit.jupiter.api.Test
import java.io.File
import java.util.*
import kotlin.random.Random
import kotlin.random.nextULong

const val githubLink = "https://github.com/formation-res/pg-docstore"

val sourceGitRepository = SourceRepository(
  repoUrl = githubLink,
  sourcePaths = setOf("src/main/kotlin", "src/test/kotlin")
)

class DocGenerationTest {

  @Test
  fun `generate docs`() {
    File(".", "README.md").writeText(
      """
        # Pg Doc Store
      """.trimIndent().trimMargin() + "\n\n" + readmeMd.value
    )
  }
}

val readmeMd = sourceGitRepository.md {
  includeMdFile("intro.md")

  section("Usage") {

    suspendingBlock {
      // jasync suspending connection
      val connection = PostgreSQLConnectionBuilder
        .createConnectionPool(
          ConnectionPoolConfiguration(
            host = "localhost",
            port = 5432,
            database = "docstore",
            username = "postgres",
            password = "secret",
          )
        ).asSuspending

      // recreate the docs table
      db.reCreateDocStoreSchema("docs")

      @Serializable
      data class MyModel(
        val title: String,
        val description: String,
        val categories: List<String> = listOf(),
        val id: String = UUID.randomUUID().toString(),
      )

      // create a store for the docs table
      val store = DocStore(
        connection = connection,
        serializationStrategy = MyModel.serializer(),
        tableName = "docs",
        idExtractor = MyModel::id,
        // optional, used for text search
        textExtractor = {
          "${it.title} ${it.description}"
        },
        // optional, used for tag search
        tagExtractor = {
          it.categories
        }
      )

      // do some crud
      val doc1 = MyModel("Number 1", "a document", categories = listOf("foo"))
      store.create(doc1)
      store.getById(doc1.id)?.let {
        println("Retrieved ${it.title}")
      }
      // update by id
      store.update(doc1.id) {
        it.copy(title = "Number One")
      }
      // or just pass in the document
      store.update(doc1) {
        it.copy(title = "Numero Uno")
      }
      // Retrieve it again
      store.getById(doc1.id)?.let {
        println("Retrieved ${it.title}")
      }

      // you can also do bulk inserts using flows or lists
      flow {
        repeat(200) {
          emit(
            MyModel(
              title = "Bulk $1",
              description = "A bulk inserted doc",
              categories = listOf("bulk")
            )
          )
        }
      }.let { f ->
        // bulk insert 40 at a time
        store.bulkInsert(flow = f, chunkSize = 40)
      }


      // and of course we can query
      println(store.documentsByRecency(limit = 5).map { it.title })
      // or we can scroll through the entire table
      // and count the number of documents in the flow
      println("Total documents: ${
        store.documentsByRecencyScrolling().count()
      }")

      // and we can restrict the search using tags
      println("Just the bulk documents: ${
        store
          .documentsByRecencyScrolling(
            tags = listOf("bulk")
          )
          .count()
      }")
    }


  }

  includeMdFile("outro.md")

  +"""
    This readme is generated with [kotlin4example](https://github.com/jillesvangurp/kotlin4example)
  """.trimIndent()

}
27 changes: 27 additions & 0 deletions src/test/kotlin/com/tryformation/pgdocstore/docs/intro.md
@@ -0,0 +1,27 @@
[![Process Pull Request](https://github.com/formation-res/pg-docstore/actions/workflows/pr_master.yaml/badge.svg)](https://github.com/formation-res/pg-docstore/actions/workflows/pr_master.yaml)

Pg-docstore is a Kotlin library that allows you to use postgres as a json document store.

## Why

Document stores are very useful in some applications. While they are popular in the nosql world, sql databases like
postgres provide a lot of nice functionality and performance and are therefore a popular choice for data storage.
Using postgres as a document store makes a lot of sense. Transactionality ensures that your data is safe. You can use any
of a wide range of managed and cloud based service providers or just host it yourself.

At FORMATION, we use pg-docstore to store millions of documents of different types. We use Elasticsearch for querying, aggregations, and other functionality and therefore have no need
for the elaborate querying support in databases. But we do like to keep our data safe, which is why we like postgres.
Additionally, we like having a clear separation between what we store and what we query on. So, our architecture includes
an ETL pipeline that builds our search index from the raw data in pg-docstore.

## Features

- document store with crud operations for storing and retrieving json documents
- update function that retrieves, applies your lambda to the retrieved document, and then stores in a transaction.
- serialization is done using kotlinx.serialization
- efficient bulk insert/updates
- efficient querying and dumping of the content of the store with database scrolls. We use this for our ETL pipeline.
- nice Kotlin API with suspend functions, flows, strong typing, etc.

This library builds on jasync-postgresql, which is one of the few database drivers out there that is written in Kotlin
and that uses non-blocking IO.
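
The update feature listed above (retrieve, apply your lambda, store, all in one transaction) can be pictured with a small in-memory sketch. This is not how pg-docstore is implemented: `InMemoryStore` and `Note` are hypothetical names, and a JVM lock stands in for the database transaction that the library relies on.

```kotlin
// Hypothetical in-memory stand-in for a document table, for illustration only.
class InMemoryStore<T : Any>(private val idOf: (T) -> String) {
  private val docs = mutableMapOf<String, T>()
  private val lock = Any()

  fun create(doc: T) {
    synchronized(lock) { docs[idOf(doc)] = doc }
  }

  fun getById(id: String): T? = synchronized(lock) { docs[id] }

  // Read, transform, and write back as one atomic step. The lock plays the
  // role that a database transaction plays in a real store.
  fun update(id: String, transform: (T) -> T): T = synchronized(lock) {
    val current = docs[id] ?: error("no document with id $id")
    val updated = transform(current)
    docs[id] = updated
    updated
  }
}

data class Note(val id: String, val title: String)

fun main() {
  val store = InMemoryStore<Note> { it.id }
  store.create(Note("1", "Number 1"))
  store.update("1") { it.copy(title = "Number One") }
  println("Retrieved ${store.getById("1")?.title}")
}
```

Because the read and the write happen under the same lock, two concurrent updates cannot overwrite each other's changes, which is the guarantee the transactional update is meant to give.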
16 changes: 16 additions & 0 deletions src/test/kotlin/com/tryformation/pgdocstore/docs/outro.md
@@ -0,0 +1,16 @@
## Future work

As FORMATION grows, we will no doubt need more features. Some features that come to mind are sharding, utilizing some of
the json features in postgres, or even its text search and geospatial features.

## Development status

This is a relatively new project, so there may be some bugs, design flaws, etc. However, I've implemented similar
stores many times before in past projects and I think I know what I'm doing. If it works for us, it might also work for you.

Give it a try!

## License and contributing

The code is provided as is under the [MIT](LICENSE.md) license. If you are planning to make a contribution, please
reach out via the issue tracker first.