dlvenable changed the title from "Support complex representations of DocumentDb data types." to "Support complex & extended representations of DocumentDb data types." on May 3, 2024
dlvenable changed the title from "Support complex & extended representations of DocumentDb data types." to "Support complex & relaxed representations of DocumentDb data types." on May 3, 2024
dlvenable changed the title from "Support complex & relaxed representations of DocumentDb data types." to "Support alternative representations of DocumentDb data types." on May 3, 2024
Problem/Background
Data Prepper has an upcoming documentdb source. The issue #4458 proposes a simple data type solution. However, sometimes we need to get all the data that is available.
Solution
Provide options for complex and extended types coming out of DocumentDB.
Data Prepper can support the following mapping options for types.
simple - For complex BSON types, this loses some subtype information. See DocumentDb simple representations of BSON types #4458 for more details.
relaxed - Uses the MongoDB relaxed JSON format.
extended - Uses the MongoDB extended/canonical JSON format.
complex - An alternative mapping provided by Data Prepper which does not include BSON type information, but does include all the data from complex objects.
If you use the relaxed mapping, then we will use extended for any type that does not support relaxed. The relaxed mapping is more closely related to extended than to simple types.
Options
We will add a new type_mappings option group within the documentdb source. It will have the following options.
default - Can be simple, extended, or relaxed. All types will use this form.
object_id - Can be simple, extended, or complex. This configures how BSON ObjectIds are mapped. When configured, this overrides default for BSON ObjectIds.
bindata - Can be simple, extended, or complex. This configures how BSON BinData is mapped. When configured, this overrides default for BSON BinData fields.
timestamp - Can be simple, extended, or complex. This configures how BSON Timestamps are mapped. When configured, this overrides default for BSON Timestamps.
Complex types
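The complex forms described in this section are selected through the per-type options (object_id, bindata, timestamp), which take precedence over default. A minimal Python sketch of that precedence rule follows. It is an illustration only, not Data Prepper's implementation: the function name resolve_mapping, the dictionary representation of type_mappings, and the fallback to simple when default is absent are all assumptions.

```python
from typing import Dict

# Allowed values as listed in the proposal's Options section.
ALLOWED_DEFAULT = {"simple", "extended", "relaxed"}
ALLOWED_OVERRIDE = {"simple", "extended", "complex"}
OVERRIDE_KEYS = ("object_id", "bindata", "timestamp")


def resolve_mapping(type_mappings: Dict[str, str], bson_type: str) -> str:
    """Return the mapping form to use for one BSON type (hypothetical helper)."""
    # Assumption: fall back to "simple" when no default is configured.
    default = type_mappings.get("default", "simple")
    if default not in ALLOWED_DEFAULT:
        raise ValueError(f"unsupported default mapping: {default}")

    # Only the three listed types can carry a per-type override.
    override = type_mappings.get(bson_type) if bson_type in OVERRIDE_KEYS else None
    if override is None:
        return default
    if override not in ALLOWED_OVERRIDE:
        raise ValueError(f"unsupported {bson_type} mapping: {override}")
    return override


config = {"default": "relaxed", "object_id": "complex"}
print(resolve_mapping(config, "object_id"))  # per-type override wins
print(resolve_mapping(config, "timestamp"))  # no override, so default applies
```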
ObjectId
For BSON ObjectId, the complex form would include the timestamp.
Input:
Output:
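As background on where that timestamp comes from: the first four bytes of a BSON ObjectId encode its creation time in seconds since the Unix epoch. The Python sketch below shows only that extraction. The helper name object_id_timestamp is hypothetical, and nothing here assumes what field names or layout the complex form would use.

```python
from datetime import datetime, timezone


def object_id_timestamp(object_id_hex: str) -> datetime:
    """Extract the creation time embedded in a 24-character ObjectId hex string."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are big-endian seconds since the epoch.
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Example ObjectId value chosen for illustration only.
print(object_id_timestamp("507f1f77bcf86cd799439011").isoformat())
```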
BinData
The complex BinData will include the subtype. It solves this by making the field an object which will translate into a nested field in OpenSearch.
Input:
Output:
Timestamp
The complex Timestamp will include the ordinal. It solves this by making the field an object which will translate into a nested field in OpenSearch.
Input:
Output:
Relaxed types
Configuring the relaxed types will also provide BSON type information. These mappings will look similar to the MongoDB relaxed format.
BinData
Input:
Output:
Timestamp
Input:
Output:
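For comparison, MongoDB Extended JSON v2 wraps these two types in $binary and $timestamp objects, and the relaxed and canonical forms are the same for both. The Python sketch below builds those wrapper shapes using only the standard library. The helper names are hypothetical, and since the proposal says its mappings will only look similar to the relaxed format, Data Prepper's actual output may differ. What the proposal calls the ordinal corresponds to the i (increment) field here.

```python
import base64
from typing import Any, Dict


def binary_extended_json(data: bytes, subtype: int = 0) -> Dict[str, Any]:
    """Build the Extended JSON v2 wrapper for a BSON BinData value."""
    return {
        "$binary": {
            "base64": base64.b64encode(data).decode("ascii"),
            # The subtype is written as a hex string, e.g. "00" for generic binary.
            "subType": format(subtype, "02x"),
        }
    }


def timestamp_extended_json(seconds: int, increment: int) -> Dict[str, Any]:
    """Build the Extended JSON v2 wrapper for a BSON Timestamp value."""
    return {"$timestamp": {"t": seconds, "i": increment}}


print(binary_extended_json(b"hi", subtype=0))
print(timestamp_extended_json(1714694400, 1))
```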
Extended
Additionally we can include extended as an option to include all type information.
Alternative
The original proposal had complex_ boolean fields.
However, I've changed the proposal to use an enum option since we want to have three options: simple, complex, and relaxed.