make tenant field optional #355

Merged
merged 2 commits into from
Mar 31, 2022
2 changes: 1 addition & 1 deletion doc/03_configuration.md
@@ -62,7 +62,7 @@ ingest:

## Event schema definition

In this section field types and names should be defined for metadata, impression and interaction events. Metarank supports
In this section field types and names should be defined for metadata, ranking and interaction events. Metarank supports
the following types of fields:
1. string: a regular UTF-8 string
2. number: a double-precision floating-point format
2 changes: 1 addition & 1 deletion doc/features/counters.md
@@ -67,7 +67,7 @@ You can configure it in the following way:
// name of the feature extractor used as a dividend
top: click
// name of the feature extractor that is used as a divider
bottom: impression
bottom: ranking
scope: item
bucket: 24h
periods: [7,30]
4 changes: 2 additions & 2 deletions doc/tutorial_ranklens.md
@@ -13,7 +13,7 @@ This tutorial reproduces the system running on [demo.metarank.ai](https://demo.m

For the data input, we are using a repackaged copy of the ranklens dataset [available here](https://github.com/metarank/metarank/tree/master/src/test/resources/ranklens/events).
The only difference with the original dataset is that we have converted it to a metarank-compatible data model with
metadata/interaction/impression [event format](./xx_event_schema.md).
metadata/interaction/ranking [event format](./xx_event_schema.md).

### Configuration

@@ -83,7 +83,7 @@ features:
- name: ctr
type: rate
top: click
bottom: impression
bottom: examine
scope: item
bucket: 24h
periods: [7,30]
2 changes: 1 addition & 1 deletion doc/xx_api_schema.md
@@ -46,7 +46,7 @@ Ranking endpoint does the real work of personalizing items that are passed to it
}
```

- `id`: a request identifier later used to join impression and interaction events. This will be the same value that you will pass to feedback endpoint for impression and ranking events.
- `id`: a request identifier later used to join ranking and interaction events. This will be the same value that you will pass to feedback endpoint for impression and ranking events.
- `user`: unique visitor identifier.
- `session`: session identifier, a single visitor may have multiple sessions.
- `fields`: an optional array of extra fields that you can use in your model, for more information refer to [Supported events](xx_event_schema.md).
6 changes: 3 additions & 3 deletions doc/xx_event_schema.md
@@ -84,7 +84,7 @@ This information is used by personalization algorithms to understand which items
}
```

- `id`: a request identifier later used to join impression and interaction events. Should match the value that is sent to [ranking API](xx_api_schema.md).
- `id`: a request identifier later used to join ranking and interaction events. Should match the value that is sent to [ranking API](xx_api_schema.md).
- `user`: unique visitor identifier.
- `session`: session identifier, a single visitor may have multiple sessions.
- `fields`: an optional array of extra fields that you can use in your model, as described above.
@@ -105,7 +105,7 @@ The `type` field must match the `name` provided in the [Configuration](03_config
{
"event": "interaction",
"id": "0f4c0036-04fb-4409-b2c6-7163a59f6b7d",// required
"impression": "81f46c34-a4bb-469c-8708-f8127cd67d27", //required
"ranking": "81f46c34-a4bb-469c-8708-f8127cd67d27", //required
"timestamp": "1599391467000",// required
"user": "user1",// required
"session": "session1",// required
@@ -119,7 +119,7 @@ The `type` field must match the `name` provided in the [Configuration](03_config
```

- `id`: a request identifier.
- `impression`: an identifier of the parent ranking event
- `ranking`: an identifier of the parent ranking event
- `user`: unique visitor identifier.
- `session`: session identifier, a single visitor may have multiple sessions.
- `type`: internal name of the event.
4 changes: 2 additions & 2 deletions src/main/scala/ai/metarank/mode/inference/api/RankApi.scala
@@ -98,8 +98,8 @@ case class RankApi(
}

object RankApi {

implicit val requestDecoder: EntityDecoder[IO, RankingEvent] = jsonOf
import ai.metarank.model.Event.EventCodecs._
implicit val requestDecoder: EntityDecoder[IO, RankingEvent] = jsonOf[IO, RankingEvent]
implicit val itemScoreEncoder: EntityEncoder[IO, RankResponse] = jsonEncoderOf

object ExplainParamDecoder extends OptionalQueryParamDecoderMatcher[Boolean]("explain")
26 changes: 16 additions & 10 deletions src/main/scala/ai/metarank/model/Event.scala
@@ -20,8 +20,8 @@ object Event {
id: EventId,
item: ItemId,
timestamp: Timestamp,
fields: List[Field],
tenant: String
fields: List[Field] = Nil,
tenant: String = "default"
) extends Event {}

sealed trait FeedbackEvent extends Event {
@@ -34,9 +34,9 @@ object Event {
timestamp: Timestamp,
user: UserId,
session: SessionId,
fields: List[Field],
fields: List[Field] = Nil,
items: List[ItemRelevancy],
tenant: String
tenant: String = "default"
) extends FeedbackEvent

case class InteractionEvent(
@@ -47,20 +47,26 @@ object Event {
user: UserId,
session: SessionId,
`type`: String,
fields: List[Field],
tenant: String
fields: List[Field] = Nil,
tenant: String = "default"
) extends FeedbackEvent

case class ItemRelevancy(id: ItemId, relevancy: Option[Double] = None)
object ItemRelevancy {
def apply(id: ItemId, relevancy: Double) = new ItemRelevancy(id, Some(relevancy))
}

implicit val relevancyCodec: Codec[ItemRelevancy] = deriveCodec
implicit val metadataCodec: Codec[MetadataEvent] = deriveCodec
implicit val rankingCodec: Codec[RankingEvent] = deriveCodec
implicit val interactionCodec: Codec[InteractionEvent] = deriveCodec
object EventCodecs {
implicit val conf = Configuration.default.withDefaults
implicit val relevancyCodec: Codec[ItemRelevancy] = deriveCodec
implicit val metadataCodec: Codec[MetadataEvent] = deriveConfiguredCodec
implicit val rankingCodec: Codec[RankingEvent] = deriveConfiguredCodec
implicit val interactionCodec: Codec[InteractionEvent] = deriveConfiguredCodec
}

import EventCodecs.metadataCodec
import EventCodecs.rankingCodec
import EventCodecs.interactionCodec
implicit val conf = Configuration.default
.withDiscriminator("event")
.withKebabCaseMemberNames
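
The hunk above makes `fields` and `tenant` optional by giving them case class defaults and deriving the codecs with `Configuration.default.withDefaults`. As a rough, dependency-free model of the resulting behaviour (not the circe derivation itself), the sketch below decodes a plain `Map` instead of JSON. `SimpleEvent` and `fromMap` are hypothetical names used only for illustration, and the comma-separated `fields` encoding is an assumption made to keep the example short.

```scala
// Hypothetical, simplified stand-in for MetadataEvent: plain strings instead of
// EventId/ItemId/Timestamp, and a raw Map instead of parsed JSON.
case class SimpleEvent(
    id: String,
    item: String,
    fields: List[String] = Nil,
    tenant: String = "default"
)

object SimpleEvent {
  // Required keys yield a Left when absent. Optional keys fall back to the
  // case class defaults, mirroring what the new tests in this PR expect.
  def fromMap(raw: Map[String, String]): Either[String, SimpleEvent] =
    for {
      id   <- raw.get("id").toRight("missing required field 'id'")
      item <- raw.get("item").toRight("missing required field 'item'")
    } yield {
      // Build the event with defaults first, then override only what was sent.
      val base = SimpleEvent(id, item)
      base.copy(
        // Assumption: fields arrive as one comma-separated string.
        fields = raw.get("fields").map(_.split(",").toList).getOrElse(base.fields),
        tenant = raw.getOrElse("tenant", base.tenant)
      )
    }
}
```

With this model, `SimpleEvent.fromMap(Map("id" -> "1", "item" -> "product1"))` produces an event with `fields = Nil` and `tenant = "default"`, while a payload that carries `"tenant" -> "foo"` keeps the supplied value.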
8 changes: 6 additions & 2 deletions src/main/scala/ai/metarank/source/FileEventSource.scala
@@ -14,12 +14,16 @@ import io.circe.parser._
import java.io.{ByteArrayOutputStream, InputStream}
import ai.metarank.flow.DataStreamOps._
import ai.metarank.util.Logging
import org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator

case class FileEventSource(path: String) extends EventSource {
override def eventStream(env: StreamExecutionEnvironment)(implicit ti: TypeInformation[Event]): DataStream[Event] =
env
.fromSource(
source = FileSource.forRecordStreamFormat(EventStreamFormat(), new Path(path)).processStaticFileSet().build(),
source = FileSource
.forRecordStreamFormat(EventStreamFormat(), new Path(path))
.processStaticFileSet()
.build(),
watermarkStrategy = EventWatermarkStrategy(),
sourceName = "events-source"
)
@@ -53,7 +57,7 @@ object FileEventSource {
if (line != null) decode[Event](line) match {
case Left(value) =>
logger.error(s"cannot decode line ${line}", value)
throw new IllegalArgumentException("json decoding error")
throw new IllegalArgumentException(s"json decoding error '$value' on line '${line}'")
case Right(value) =>
value
}
49 changes: 46 additions & 3 deletions src/test/scala/ai/metarank/model/EventJsonTest.scala
@@ -14,7 +14,6 @@ class EventJsonTest extends AnyFlatSpec with Matchers {
| "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
| "item": "product1",
| "timestamp": "1599391467000",
| "tenant": "default",
| "fields": [
| {"name": "title", "value": "Nice jeans"},
| {"name": "price", "value": 25.0},
@@ -37,6 +36,52 @@ class EventJsonTest extends AnyFlatSpec with Matchers {
)
)
}
it should "decode metadata with tenant" in {
val json = """{
| "event": "metadata",
| "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
| "item": "product1",
| "timestamp": "1599391467000",
| "fields": [
| {"name": "title", "value": "Nice jeans"},
| {"name": "price", "value": 25.0},
| {"name": "color", "value": ["blue", "black"]},
| {"name": "availability", "value": true}
| ],
| "tenant": "foo"
|}""".stripMargin
decode[Event](json) shouldBe Right(
MetadataEvent(
id = EventId("81f46c34-a4bb-469c-8708-f8127cd67d27"),
item = ItemId("product1"),
timestamp = Timestamp(1599391467000L),
fields = List(
StringField("title", "Nice jeans"),
NumberField("price", 25),
StringListField("color", List("blue", "black")),
BooleanField("availability", true)
),
tenant = "foo"
)
)
}
it should "decode metadata with empty fields" in {
val json = """{
| "event": "metadata",
| "id": "81f46c34-a4bb-469c-8708-f8127cd67d27",
| "item": "product1",
| "timestamp": "1599391467000"
|}""".stripMargin
decode[Event](json) shouldBe Right(
MetadataEvent(
id = EventId("81f46c34-a4bb-469c-8708-f8127cd67d27"),
item = ItemId("product1"),
timestamp = Timestamp(1599391467000L),
fields = Nil,
tenant = "default"
)
)
}

it should "decode ranking" in {
val json = """{
@@ -45,7 +90,6 @@ class EventJsonTest extends AnyFlatSpec with Matchers {
| "timestamp": "1599391467000",
| "user": "user1",
| "session": "session1",
| "tenant": "default",
| "fields": [
| {"name": "query", "value": "jeans"},
| {"name": "source", "value": "search"}
@@ -85,7 +129,6 @@ class EventJsonTest extends AnyFlatSpec with Matchers {
| "timestamp": "1599391467000",
| "user": "user1",
| "session": "session1",
| "tenant": "default",
| "type": "purchase",
| "item": "product1",
| "fields": [