Add unique constraint to list? #296

Open
samuelcolvin opened this issue Oct 17, 2022 · 14 comments

@samuelcolvin
Member

I'm not sure about this; it would need to have minimal or no impact on performance when switched off.

Some questions if we are to support it:

  • Should we implement unique only on the list validator, or also on tuple, set, frozenset and general iterables as we build the vec?
  • Should we implement a unique check on the set and frozenset validators? This would have minimal performance impact, as it would simply involve comparing the length of the input with the length of the resulting set, but sets lose order, so it won't be sufficient for some scenarios.
  • What's the most performant way to implement this in Rust? I guess for list[int] and a few other types we could do something pretty performant with an AHashSet, but the general case will require some fairly slow Python.
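
As a rough Python illustration of the trade-off in the last bullet (this is not the Rust implementation, and `find_duplicate` is a hypothetical helper, not part of pydantic or pydantic-core): a check can take a fast path through a set when every item is hashable and fall back to pairwise equality when not.

```python
from typing import Any, Optional, Sequence


def find_duplicate(items: Sequence[Any]) -> Optional[int]:
    """Return the index of the first item equal to an earlier one, else None.

    Hypothetical helper for illustration only.
    """
    try:
        # Fast path: hashable items, O(n) on average.
        seen = set()
        for index, item in enumerate(items):
            if item in seen:
                return index
            seen.add(item)
        return None
    except TypeError:
        # Slow path: unhashable items (e.g. lists or dicts),
        # O(n^2) equality comparisons, restarted from the beginning.
        for index, item in enumerate(items):
            if any(item == earlier for earlier in items[:index]):
                return index
        return None
```

For example, `find_duplicate([1, 2, 2, 3])` returns `2`, and `find_duplicate([[1], [2], [1]])` also returns `2` via the slow path.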
@odiseo0

odiseo0 commented Oct 20, 2022

I may be confused, but why set and frozenset? Those collections don't allow duplicate values.

About the unique implementation, would items be compared by equality or by identity?

@samuelcolvin
Member Author

equality.
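
To make the distinction concrete, here is a small sketch contrasting the two notions; both function names are made up for this illustration.

```python
def unique_by_equality(items: list) -> bool:
    """True if no item compares equal (==) to an earlier item."""
    return not any(x == y for i, x in enumerate(items) for y in items[:i])


def unique_by_identity(items: list) -> bool:
    """True if no object appears twice (compared by identity, via id())."""
    return len({id(x) for x in items}) == len(items)


a = [1, 2]
b = [1, 2]  # equal to `a`, but a separate object

print(unique_by_equality([a, b]))  # False: a == b
print(unique_by_identity([a, b]))  # True: a is not b
```

Since validated input is usually freshly parsed data, equal values are almost always distinct objects, so an identity-based check would rarely catch anything.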

> I may be confused, but why set and frozenset? Those collections don't allow duplicate values.

Well, the question is whether [1, 2, 2, 3] as input to a set should raise an error. I think that's a useful option, but it doesn't solve all problems since sets lose order.

@odiseo0

odiseo0 commented Oct 20, 2022

Ok, I understood. I think it's okay and is up to the developer whether to use it or not.

I'll try to do an implementation as a draft or as a reference.

@samuelcolvin
Member Author

Great, thank you.

@ybressler

> Ok, I understood. I think it's okay and is up to the developer whether to use it or not.
>
> I'll try to do an implementation as a draft or as a reference.

Q: Are you doing this in Python or in Rust? (Asking out of curiosity; I want to learn how this stuff works.)

@samuelcolvin
Member Author

This needs to be in Rust, since the list validator is written entirely in Rust; see

impl Validator for ListValidator {

@sasanjac

After the removal of unique_items in v2 and this issue there is no way in v2 to specify a unique array when serializing the model.

@PaarthShah

> After the removal of unique_items in v2 and this issue there is no way in v2 to specify a unique array when serializing the model.

Yeah, this one was an unfortunate loss for me; I just want a List[T] with unique, sorted elements.

@dmontagu
Collaborator

dmontagu commented Jul 5, 2023

I think this is a way you can achieve this in v2:

from typing import Annotated

from pydantic import BaseModel, AfterValidator, PlainSerializer


def require_sorted_unique(v):
    if v != sorted(set(v)):
        raise ValueError('not sorted unique')
    return v


RequireSortUniqueDuringValidation = AfterValidator(require_sorted_unique)
SortUniqueDuringValidation = AfterValidator(lambda v: sorted(set(v)))
SortUniqueDuringSerialization = PlainSerializer(lambda v: sorted(set(v)))


class Model(BaseModel):
    x: Annotated[list[int], SortUniqueDuringValidation]
    y: Annotated[list[int], SortUniqueDuringSerialization]
    z: Annotated[list[int], RequireSortUniqueDuringValidation]


some_ints = [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
m = Model(x=some_ints, y=some_ints, z=[])
print(m)
# > x=[1, 2, 3, 4, 5] y=[5, 5, 4, 4, 3, 3, 2, 2, 1, 1] z=[]
print(m.model_dump())
# > {'x': [1, 2, 3, 4, 5], 'y': [1, 2, 3, 4, 5], 'z': []}

Model(x=[], y=[], z=[5, 4])
"""
pydantic_core._pydantic_core.ValidationError: 1 validation error for Model
z
  Value error, not sorted unique [type=value_error, input_value=[5, 4], input_type=list]
    For further information visit https://errors.pydantic.dev/2.0.2/v/value_error
"""

Does that work for you?

@orfisko

orfisko commented Jul 14, 2023

@dmontagu the simplicity of the conlist implementation vs your solution is difficult to beat. I have some experience in pydantic now and it still took me too long to understand what your solution actually entails. I am definitely in favour of getting the conlist working again as it was.

@AIGeneratedUsername

AIGeneratedUsername commented Jul 27, 2023

Here is a real use case. MongoDB + Pydantic.

class Document:
    unique_elements: set[int]

Using set will give an error from pymongo:

bson.errors.InvalidDocument: cannot encode object: set(), of type: <class 'set'>

It would be convenient to have something like

class Document:
    unique_elements: Annotated[list[int], ListConstraints(unique_items=True)]

Of course I can add a validator (or use the workaround above, #296 (comment)):

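from pydantic import BaseModel, model_validator
class Document(BaseModel):
    unique_elements: list[int]

    @model_validator(mode='after')
    def check_unique(self):
        # validate here that `unique_elements` has only unique elements...
        if len(set(self.unique_elements)) != len(self.unique_elements):
            raise ValueError('unique_elements must not contain duplicates')
        return self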

but restricting a list/tuple seems to be within the scope of Pydantic.

In the worst case, #296 (comment) can be added to the package, with a warning in the documentation that this is a potentially slow operation (pure Python).

adriangb self-assigned this Jul 30, 2023

@grabov

grabov commented Aug 8, 2023

In my case I used unique_items for list[], as I need to keep the items in the same order as they were provided. unique_items was a perfect option for my case.

Now,

  • if I use set[], the items are in random order.
  • if I use list[], there is no unique-items constraint in the schema and no validation.

Are there any critical issues with this option? I can try to create a PR to bring it back.
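
For reference, a minimal sketch of an order-preserving check, assuming hashable items. `require_unique` is a hypothetical name, not a pydantic API; it returns the list untouched so the original order survives.

```python
from collections import Counter


def require_unique(v: list) -> list:
    """Return `v` unchanged (order preserved) or raise if any item repeats.

    Hypothetical validator function; assumes items are hashable.
    """
    duplicates = [item for item, count in Counter(v).items() if count > 1]
    if duplicates:
        raise ValueError(f'duplicate items: {duplicates}')
    return v
```

It could presumably be attached in the same way as the validators in the earlier comment, e.g. `Annotated[list[int], AfterValidator(require_unique)]`. As far as I can tell this only covers validation; it does not by itself add a unique-items keyword to the generated schema.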

@adriangb
Member

adriangb commented Aug 9, 2023

See #820 (comment), I think this solves all of the requirements and will be about as fast as it can be.

@caniko

caniko commented Nov 14, 2023

Before we had unique=True, but now we need several lines of code. If this is absolutely necessary, can we at least have some syntactic sugar for it and bring unique back that way?

Note that set is unordered. If we really have to use set, we will have to use an ordered implementation from outside the standard library.

Edit: I changed my mind, we should follow @adriangb's example.
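
On the ordering point: for hashable items, the standard library can already deduplicate while keeping order, because `dict` preserves insertion order (guaranteed since Python 3.7). A small sketch, with `dedupe_keep_order` as a made-up helper name; this is not necessarily the approach in the linked comment.

```python
def dedupe_keep_order(items: list) -> list:
    """Drop repeated items, keeping the first occurrence of each.

    Relies on dict preserving insertion order; items must be hashable.
    """
    return list(dict.fromkeys(items))


print(dedupe_keep_order([5, 5, 4, 4, 3, 1, 3]))  # [5, 4, 3, 1]
```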
