# Standardizing Strings

Often, when dealing with strings, we have to worry about things like case or leading/trailing whitespace.

For example, suppose we receive the following two strings in our data:

In [1]:
s1 = "   python"
s2 = " python   \t"

Obviously, these strings are not equal:

In [2]:
s1 == s2

False

The question is whether we want `s1` and `s2` to compare equal.

Usually, the answer is yes, and we want to "ignore" the leading and trailing whitespace.

In plain Python we would do this:

In [3]:
s1.strip() == s2.strip()

True

We could use custom validators to help work around this issue, but Pydantic offers us a way to set this up at the model level:

In [4]:
from pydantic import BaseModel, ConfigDict

In [5]:
class Model(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)

    field: str

In [6]:
m1 = Model(field="   python")
m2 = Model(field="  python   \t")

In [7]:
m1

Model(field='python')

In [8]:
m2

Model(field='python')

In [9]:
m1 == m2

True

As you can see, the leading and trailing whitespace was stripped when the models were created, so the two instances now compare equal.
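Because the setting lives in the model configuration, it applies to every `str` field in the model, and it only touches the ends of each string. The sketch below illustrates this with a hypothetical `Person` model (the class and field names are made up for this example and are not part of the original notebook):

```python
from pydantic import BaseModel, ConfigDict


class Person(BaseModel):
    # Hypothetical model: the setting applies to all str fields declared below
    model_config = ConfigDict(str_strip_whitespace=True)

    first_name: str
    last_name: str


p = Person(first_name="  Isaac ", last_name="\tNewton \n")
print(repr(p.first_name))  # 'Isaac'
print(repr(p.last_name))   # 'Newton'

# Only leading and trailing whitespace is removed; inner whitespace is kept
q = Person(first_name=" Mary  Ann ", last_name="Smith")
print(repr(q.first_name))  # 'Mary  Ann'
```

If inner whitespace also needs to be collapsed, that falls outside what this setting does and would need a custom validator.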

Another issue we may face is whether we want to standardize certain string fields to be all lowercase or all uppercase.

Pydantic has a way to specify that all string types in a model should be converted to either lowercase or uppercase.

If we want a different standardization, then we'll have to use custom validators, which we'll look at soon!

But depending on your needs, using Pydantic's built-in auto lower or upper case conversions might be helpful.

In [10]:
class Model(BaseModel):
    model_config = ConfigDict(str_to_lower=True)

    field: str

In [11]:
m = Model(field="PYTHON")
m

Model(field='python')

As you can see, the string was converted to lowercase when the data was deserialized (that is, when the model was created and validated).
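Note that the conversion happens as part of validation. By default, Pydantic does not re-validate when an attribute is assigned later, so a value set after creation is stored as given. The sketch below contrasts that default with a model that also sets `validate_assignment=True`; the class names `Unchecked` and `Revalidated` are hypothetical and chosen only for this illustration:

```python
from pydantic import BaseModel, ConfigDict


class Unchecked(BaseModel):
    # Default behavior: assignments after creation are not validated
    model_config = ConfigDict(str_to_lower=True)

    field: str


class Revalidated(BaseModel):
    # validate_assignment=True re-runs validation on every assignment
    model_config = ConfigDict(str_to_lower=True, validate_assignment=True)

    field: str


p = Unchecked(field="PYTHON")
p.field = "PYTHON"  # stored as-is, no conversion
print(p.field)      # PYTHON

s = Revalidated(field="PYTHON")
s.field = "PYTHON"  # validated again, so it is lowercased
print(s.field)      # python
```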

Alternatively, we can choose uppercase conversion:

In [12]:
class Model(BaseModel):
    model_config = ConfigDict(str_to_upper=True)

    field: str

In [13]:
m = Model(field="Python")
m

Model(field='PYTHON')
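These settings can also be combined in a single `ConfigDict`. As a minimal sketch, assuming we want values that differ only in case and surrounding whitespace to compare equal, we could enable both stripping and lowercasing (the `NormalizedModel` name is hypothetical):

```python
from pydantic import BaseModel, ConfigDict


class NormalizedModel(BaseModel):
    # Strip surrounding whitespace and lowercase every str field
    model_config = ConfigDict(str_strip_whitespace=True, str_to_lower=True)

    field: str


m1 = NormalizedModel(field="   PYTHON")
m2 = NormalizedModel(field=" Python   \t")

print(m1)        # field='python'
print(m1 == m2)  # True
```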