131 changes: 120 additions & 11 deletions docs/reference/dsl_how_to_guides.md
Expand Up @@ -630,7 +630,7 @@ For more comprehensive examples have a look at the [DSL examples](https://github

### Document [doc_type]

If you want to create a model-like wrapper around your documents, use the `Document` class. It can also be used to create all the necessary mappings and settings in elasticsearch (see `life-cycle` for details).
If you want to create a model-like wrapper around your documents, use the `Document` class (or the equivalent `AsyncDocument` for asynchronous applications). It can also be used to create all the necessary mappings and settings in Elasticsearch (see [Document life cycle](#life-cycle) below for details).

```python
from datetime import datetime
Expand Down Expand Up @@ -721,9 +721,19 @@ class Post(Document):
    published: bool # same as published = Boolean(required=True)
```

It is important to note that when using `Field` subclasses such as `Text`, `Date` and `Boolean`, they must be given in the right-side of an assignment, as shown in examples above. Using these classes as type hints will result in errors.
::::{note}
When using `Field` subclasses such as `Text`, `Date` and `Boolean` to define attributes, these classes must be given on the right-hand side of an assignment.

```python
class Post(Document):
    title = Text() # correct
    subtitle: Text # incorrect
```

Python types are mapped to their corresponding field types according to the following table:
Using a `Field` subclass as a Python type hint will result in errors.
::::

Python types are mapped to their corresponding `Field` types according to the following table:

| Python type | DSL field |
| --- | --- |
Expand All @@ -735,7 +745,7 @@ Python types are mapped to their corresponding field types according to the foll
| `datetime` | `Date(required=True)` |
| `date` | `Date(format="yyyy-MM-dd", required=True)` |

To type a field as optional, the standard `Optional` modifier from the Python `typing` package can be used. When using Python 3.10 or newer, "pipe" syntax can also be used, by adding `| None` to a type. The `List` modifier can be added to a field to convert it to an array, similar to using the `multi=True` argument on the field object.
To type a field as optional, the standard `Optional` modifier from the Python `typing` package can be used. When using Python 3.10 or newer, "pipe" syntax can also be used, by adding `| None` to a type. The `List` modifier can be added to a field to convert it to an array, similar to using the `multi=True` argument on the `Field` object.

```python
from typing import Optional, List
Expand Down Expand Up @@ -763,7 +773,7 @@ class Post(Document):
    comments: List[Comment] # same as comments = Nested(Comment, required=True)
```
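To illustrate how `Optional`, `| None` and `List` hints can be reduced to a base type plus "required" and "multi" flags, here is a minimal standard-library sketch. The `describe_hint()` helper is hypothetical and is not part of the DSL module; it only mirrors the behavior described above.

```python
import types
from typing import List, Optional, Union, get_args, get_origin

# `types.UnionType` only exists on Python 3.10+, where `str | None` produces it.
_UNION_TYPES = (Union, getattr(types, "UnionType", Union))


def describe_hint(hint):
    """Hypothetical helper: return (base_type, required, multi) for a type hint."""
    required, multi = True, False
    if get_origin(hint) in _UNION_TYPES:
        args = get_args(hint)
        non_none = [a for a in args if a is not type(None)]
        # Only treat `X | None` (exactly two members) as an optional field.
        if len(args) == 2 and len(non_none) == 1:
            required = False
            hint = non_none[0]
    if get_origin(hint) is list:
        # List[X] (or list[X]) marks the field as an array of X.
        multi = True
        hint = get_args(hint)[0]
    return hint, required, multi


print(describe_hint(str))
print(describe_hint(Optional[str]))
print(describe_hint(Optional[List[str]]))
```

In this sketch a plain `str` stays required and single-valued, `Optional[str]` clears the required flag, and `Optional[List[str]]` clears the required flag and sets the multi flag.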

Unfortunately it is impossible to have Python type hints that uniquely identify every possible Elasticsearch field type. To choose a field type that is different than the one that is assigned according to the table above, the desired field instance can be added explicitly as a right-side assignment in the field declaration. The next example creates a field that is typed as `Optional[str]`, but is mapped to `Keyword` instead of `Text`:
Unfortunately it is impossible to have Python type hints that uniquely identify every possible Elasticsearch `Field` type. To choose a type that is different than the one that is assigned according to the table above, the desired `Field` instance can be added explicitly as a right-side assignment in the field declaration. The next example creates a field that is typed as `Optional[str]`, but is mapped to `Keyword` instead of `Text`:

```python
class MyDocument(Document):
Expand All @@ -787,7 +797,7 @@ class MyDocument(Document):
    category: str = mapped_field(Keyword(), default="general")
```

When using the `mapped_field()` wrapper function, an explicit field type instance can be passed as a first positional argument, as the `category` field does in the example above.
The `mapped_field()` wrapper function optionally accepts an explicit field type instance as its first positional argument. In the example above, the `category` field uses this to be mapped as `Keyword` instead of the default `Text`.

Static type checkers such as [mypy](https://mypy-lang.org/) and [pyright](https://github.com/microsoft/pyright) can use the type hints and the dataclass-specific options added to the `mapped_field()` function to improve type inference and provide better real-time code completion and suggestions in IDEs.

Expand Down Expand Up @@ -829,17 +839,17 @@ s = MyDocument.search().sort(-MyDocument.created_at, MyDocument.title)

When specifying sorting order, the `+` and `-` unary operators can be used on the class field attributes to indicate ascending and descending order.
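As a rough illustration of how unary operators on class attributes can express sort order, the sketch below overloads `__pos__` and `__neg__` on a hypothetical `SortableField` class. This class is an assumption made for the example and is not the DSL's actual implementation; it emits sort clauses in the standard Elasticsearch `{"field": {"order": ...}}` form.

```python
class SortableField:
    """Hypothetical stand-in for a document class attribute (not the DSL's real class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __pos__(self) -> dict:
        # +field requests ascending order
        return {self.name: {"order": "asc"}}

    def __neg__(self) -> dict:
        # -field requests descending order
        return {self.name: {"order": "desc"}}


created_at = SortableField("created_at")
title = SortableField("title")

# Newest first, then by title in ascending order.
sort_clauses = [-created_at, +title]
print(sort_clauses)
```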

Finally, the `ClassVar` annotation can be used to define a regular class attribute that should not be mapped to the Elasticsearch index:
Finally, it is also possible to define class attributes that are ignored when building the Elasticsearch mapping. One way is to type the attributes with the `ClassVar` annotation. Alternatively, the `mapped_field()` wrapper function accepts an `exclude` argument that can be set to `True`:

```python
from typing import ClassVar

class MyDoc(Document):
    title: M[str]
    created_at: M[datetime] = mapped_field(default_factory=datetime.now)
    my_var: ClassVar[str] # regular class variable, ignored by Elasticsearch
    another_custom_var: int = mapped_field(exclude=True) # also ignored by Elasticsearch
```
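The following standard-library sketch shows one way such exclusions could be detected when collecting the attributes to map. The `fake_mapped_field()` and `mapped_names()` helpers are hypothetical stand-ins written for this illustration; they are not the DSL's own code, and the class here deliberately does not inherit from `Document`.

```python
from typing import ClassVar, get_origin, get_type_hints


def fake_mapped_field(exclude=False):
    """Hypothetical stand-in for mapped_field(); returns a plain marker dict."""
    return {"exclude": exclude}


class MyDoc:
    title: str
    my_var: ClassVar[str] = "not mapped"
    another_custom_var: int = fake_mapped_field(exclude=True)


def mapped_names(cls):
    """Return the annotated attribute names that would be included in a mapping."""
    names = []
    for name, hint in get_type_hints(cls).items():
        if get_origin(hint) is ClassVar:
            continue  # ClassVar attributes are regular class variables
        value = cls.__dict__.get(name)
        if isinstance(value, dict) and value.get("exclude"):
            continue  # explicitly excluded through the marker
        names.append(name)
    return names


print(mapped_names(MyDoc))
```

Only `title` survives both checks in this sketch.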


#### Note on dates [_note_on_dates]

The DSL module will always respect the timezone information (or lack thereof) on the `datetime` objects passed in or stored in Elasticsearch. Elasticsearch itself interprets all datetimes with no timezone information as `UTC`. If you wish to reflect this in your Python code, you can specify `default_timezone` when instantiating a `Date` field:
Expand Down Expand Up @@ -878,7 +888,7 @@ first.meta.id = 47
first.save()
```

All the metadata fields (`id`, `routing`, `index` etc) can be accessed (and set) via a `meta` attribute or directly using the underscored variant:
All the metadata fields (`id`, `routing`, `index`, etc.) can be accessed (and set) via a `meta` attribute or directly using the underscored variant:

```python
post = Post(meta={'id': 42})
Expand Down Expand Up @@ -961,12 +971,111 @@ first = Post.get(id=42)
first.delete()
```

#### Integration with Pydantic models

::::{warning}
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
::::

::::{note}
This feature is available in the Python Elasticsearch client starting with release 9.2.0.
::::

Applications that define their data models using [Pydantic](https://docs.pydantic.dev/latest/) can combine these
models with Elasticsearch DSL annotations. To take advantage of this option, Pydantic's `BaseModel` base class
needs to be replaced with `BaseESModel` (or `AsyncBaseESModel` for asynchronous applications), and then the model
can include type annotations for both Pydantic and Elasticsearch, as demonstrated in the following example:

```python
from typing import Annotated
from pydantic import Field
from elasticsearch import dsl
from elasticsearch.dsl.pydantic import BaseESModel

class Quote(BaseESModel):
    quote: str
    author: Annotated[str, dsl.Keyword()]
    tags: Annotated[list[str], dsl.Keyword(normalizer="lowercase")]
    embedding: Annotated[list[float], dsl.DenseVector()] = Field(init=False, default=[])

    class Index:
        name = "quotes"
```

In this example, the `quote` attribute is annotated with a `str` type hint. Both Pydantic and Elasticsearch use this
annotation.

The `author` and `tags` attributes have a Python type hint and an Elasticsearch annotation, both wrapped with
Python's `typing.Annotated`. When using the `BaseESModel` class, the typing information intended for Elasticsearch needs
to be defined inside `Annotated`.

The `embedding` attribute includes a base Python type and an Elasticsearch annotation in the same format as the
other fields, but it adds Pydantic's `Field` definition as a right-hand side assignment.

Finally, any other items that need to be defined for the Elasticsearch document class, such as `class Index` and
`class Meta` entries (discussed later), can be added as well.
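To make the role of `Annotated` more concrete, the sketch below uses only the standard library to separate the base Python type from the extra metadata carried by an annotation. The `Keyword` class here is a hypothetical stand-in for `dsl.Keyword`, and `split_annotation()` is an illustrative helper, not a function provided by the client.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class Keyword:
    """Hypothetical stand-in for dsl.Keyword, used only for this illustration."""

    def __init__(self, **options):
        self.options = options


class Quote:
    quote: str
    author: Annotated[str, Keyword()]
    tags: Annotated[list[str], Keyword(normalizer="lowercase")]


def split_annotation(hint):
    """Return (base_type, metadata_list) for a possibly Annotated type hint."""
    if get_origin(hint) is Annotated:
        base, *metadata = get_args(hint)
        return base, metadata
    return hint, []


# include_extras=True keeps the Annotated wrapper instead of stripping it.
hints = get_type_hints(Quote, include_extras=True)
for name, hint in hints.items():
    base, metadata = split_annotation(hint)
    print(name, base, [type(m).__name__ for m in metadata])
```

A plain hint such as `quote: str` yields no metadata, so both Pydantic and Elasticsearch rely on the base type alone, while `author` and `tags` carry an extra object that only the Elasticsearch side needs to interpret.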

The next example demonstrates how to define `Object` and `Nested` fields:

```python
from typing import Annotated
from pydantic import BaseModel, Field
from elasticsearch import dsl
from elasticsearch.dsl.pydantic import BaseESModel

class Phone(BaseModel):
    type: Annotated[str, dsl.Keyword()] = Field(default="Home")
    number: str

class Person(BaseESModel):
    name: str
    main_phone: Phone # same as Object(Phone)
    other_phones: list[Phone] # same as Nested(Phone)

    class Index:
        name = "people"
```

Note that inner classes do not need to be defined with a custom base class; these should be standard Pydantic model
classes. The attributes defined in these classes can include Elasticsearch annotations, as long as they are given
in an `Annotated` type hint.

All model classes that are created as described in this section function like normal Pydantic models and can be used
anywhere standard Pydantic models are used, but they have some added attributes:

- `_doc`: a class attribute that is a dynamically generated `Document` class to use with the Elasticsearch index.
- `meta`: an attribute added to all models that includes Elasticsearch document metadata items such as `id`, `score`, etc.
- `to_doc()`: a method that converts the Pydantic model to an Elasticsearch document.
- `from_doc()`: a class method that accepts an Elasticsearch document as an argument and returns an equivalent Pydantic model.

These are demonstrated in the examples below:

```python
from elasticsearch.dsl.query import Match

# create a Pydantic model
quote = Quote(
    quote="An unexamined life is not worth living.",
    author="Socrates",
    tags=["philosophy"]
)

# save the model to the Elasticsearch index
quote.to_doc().save()

# get a document from the Elasticsearch index as a Pydantic model
quote = Quote.from_doc(Quote._doc.get(id=42))

# run a search and print the Pydantic models
s = Quote._doc.search().query(Match(Quote._doc.quote, "life"))
for doc in s:
    quote = Quote.from_doc(doc)
    print(quote.meta.id, quote.meta.score, quote.quote)
```

#### Analysis [_analysis]

To specify `analyzer` values for `Text` fields you can just use the name of the analyzer (as a string) and either rely on the analyzer being defined (like built-in analyzers) or define the analyzer yourself manually.

Alternatively you can create your own analyzer and have the persistence layer handle its creation, from our example earlier:
Alternatively, you can create your own analyzer and have the persistence layer handle its creation, as in our earlier example:

```python
from elasticsearch.dsl import analyzer, tokenizer
Expand Down Expand Up @@ -1622,7 +1731,7 @@ for response in responses:

### Asynchronous Documents, Indexes, and more [_asynchronous_documents_indexes_and_more]

The `Document`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and `FacetedSearch` classes all have asynchronous versions that use the same name with an `Async` prefix. These classes expose the same interfaces as the synchronous versions, but any methods that perform I/O are defined as coroutines.
The `Document`, `BaseESModel`, `Index`, `IndexTemplate`, `Mapping`, `UpdateByQuery` and `FacetedSearch` classes all have asynchronous versions that use the same name with an `Async` prefix. These classes expose the same interfaces as the synchronous versions, but any methods that perform I/O are defined as coroutines.

Auxiliary classes that do not perform I/O do not have asynchronous versions. The same classes can be used in synchronous and asynchronous applications.

Expand Down
2 changes: 1 addition & 1 deletion docs/reference/dsl_tutorials.md
Expand Up @@ -134,7 +134,7 @@ In this example you can see:
* retrieving and saving the object into Elasticsearch
* accessing the underlying client for other APIs

You can see more in the `persistence` chapter.
You can see more in the [persistence](dsl_how_to_guides.md#_persistence_2) chapter.


## Pre-built Faceted Search [_pre_built_faceted_search]
Expand Down
51 changes: 38 additions & 13 deletions elasticsearch/dsl/document_base.py
Expand Up @@ -34,6 +34,8 @@
overload,
)

from typing_extensions import _AnnotatedAlias

try:
from types import UnionType
except ImportError:
Expand Down Expand Up @@ -343,6 +345,10 @@ def __init__(self, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]):
# the field has a type annotation, so next we try to figure out
# what field type we can use
type_ = annotations[name]
type_metadata = []
if isinstance(type_, _AnnotatedAlias):
type_metadata = type_.__metadata__
type_ = type_.__origin__
skip = False
required = True
multi = False
Expand Down Expand Up @@ -389,6 +395,12 @@ def __init__(self, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]):
# use best field type for the type hint provided
field, field_kwargs = self.type_annotation_map[type_] # type: ignore[assignment]

# if this field does not have a right-hand value, we look in the metadata
# of the annotation to see if we find it there
for md in type_metadata:
if isinstance(md, (_FieldMetadataDict, Field)):
attrs[name] = md

if field:
field_kwargs = {
"multi": multi,
Expand All @@ -401,17 +413,20 @@ def __init__(self, name: str, bases: Tuple[type, ...], attrs: Dict[str, Any]):
# this field has a right-side value, which can be field
# instance on its own or wrapped with mapped_field()
attr_value = attrs[name]
if isinstance(attr_value, dict):
if isinstance(attr_value, _FieldMetadataDict):
# the mapped_field() wrapper function was used so we need
# to look for the field instance and also record any
# dataclass-style defaults
if attr_value.get("exclude"):
# skip this field
continue
attr_value = attrs[name].get("_field")
default_value = attrs[name].get("default") or attrs[name].get(
"default_factory"
)
if default_value:
field_defaults[name] = default_value
if attr_value:
if isinstance(attr_value, Field):
value = attr_value
if required is not None:
value._required = required
Expand Down Expand Up @@ -490,12 +505,19 @@ def __delete__(self, instance: Any) -> None: ...
M = Mapped


class _FieldMetadataDict(dict[str, Any]):
"""This class is used to identify metadata returned by the `mapped_field()` function."""

pass


def mapped_field(
field: Optional[Field] = None,
*,
init: bool = True,
default: Any = None,
default_factory: Optional[Callable[[], Any]] = None,
exclude: bool = False,
**kwargs: Any,
) -> Any:
"""Construct a field using dataclass behaviors
Expand All @@ -505,22 +527,25 @@ def mapped_field(
options.

:param field: The instance of ``Field`` to use for this field. If not provided,
an instance that is appropriate for the type given to the field is used.
an instance that is appropriate for the type given to the field is used.
:param init: a value of ``True`` adds this field to the constructor, and a
value of ``False`` omits it from it. The default is ``True``.
value of ``False`` omits it from it. The default is ``True``.
:param default: a default value to use for this field when one is not provided
explicitly.
explicitly.
:param default_factory: a callable that returns a default value for the field,
when one isn't provided explicitly. Only one of ``default`` and
``default_factory`` can be used.
when one isn't provided explicitly. Only one of ``default`` and
``default_factory`` can be used.
:param exclude: Set to ``True`` to exclude this field from the Elasticsearch
index.
"""
return {
"_field": field,
"init": init,
"default": default,
"default_factory": default_factory,
return _FieldMetadataDict(
_field=field,
init=init,
default=default,
default_factory=default_factory,
exclude=exclude,
**kwargs,
}
)


@dataclass_transform(field_specifiers=(mapped_field,))
Expand Down