marshmallow-dataframe

marshmallow-dataframe is a library that helps you generate marshmallow Schemas for Pandas DataFrames.

Usage

Let's start by creating an example dataframe for which we want to create a Schema. This dataframe has four columns: two of them are of string type, one is a float, and the last one is an integer.

import pandas as pd
import numpy as np
from marshmallow_dataframe import SplitDataFrameSchema

animal_df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0, 2),
        ("parrot", "bird", 24.0, 2),
        ("lion", "mammal", 80.5, 4),
        ("monkey", "mammal", np.nan, 4),
    ],
    columns=["name", "class", "max_speed", "num_legs"],
)

You can then create a marshmallow schema that validates and loads dataframes with the same structure as the one above, provided they have been serialized by DataFrame.to_json with orient="split". The dtypes attribute of the Meta class is required; other marshmallow Schema options can also be passed as attributes of Meta:

class AnimalSchema(SplitDataFrameSchema):
    """Automatically generated schema for animal dataframe"""

    class Meta:
        dtypes = animal_df.dtypes

When passing a valid payload for a new animal, this schema will validate it and build a dataframe:

animal_schema = AnimalSchema()

new_animal = {
    "data": [("leopard", "mammal", 58.0, 4), ("ant", "insect", 0.288, 6)],
    "columns": ["name", "class", "max_speed", "num_legs"],
    "index": [0, 1],
}

new_animal_df = animal_schema.load(new_animal)

print(type(new_animal_df))
# <class 'pandas.core.frame.DataFrame'>
print(new_animal_df)
#       name   class  max_speed  num_legs
# 0  leopard  mammal     58.000         4
# 1      ant  insect      0.288         6

However, if we pass a payload that doesn't conform to the schema, it will raise a marshmallow ValidationError exception with an informative message about the errors:

invalid_animal = {
    "data": [("leopard", "mammal", 58.0, "four")],  # num_legs is not an int
    "columns": ["name", "class", "num_legs"],  # missing max_speed column
    "index": [0],
}

animal_schema.load(invalid_animal)

# Raises:
# marshmallow.exceptions.ValidationError: {
#     'columns': ["Must be equal to ['name', 'class', 'max_speed', 'num_legs']."],
#     'data': {0: {3: ['Not a valid integer.']}}
# }

marshmallow-dataframe can also generate schemas for the orient="records" format: follow the steps above, but use marshmallow_dataframe.RecordsDataFrameSchema as the superclass for AnimalSchema.
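To see how the records orientation differs from split, the following sketch serializes a one-row dataframe with pandas alone. It only illustrates what DataFrame.to_json produces; how RecordsDataFrameSchema expects those records to be wrapped in a payload is not shown in this README, so check the library before relying on a particular shape.

```python
import json

import pandas as pd

animal_df = pd.DataFrame(
    [("falcon", "bird", 389.0, 2)],
    columns=["name", "class", "max_speed", "num_legs"],
)

# With orient="records", each row becomes one dict keyed by column name,
# and the index is not included in the output
records = json.loads(animal_df.to_json(orient="records"))

print(records)
```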

Installation

marshmallow-dataframe requires Python >= 3.6 and marshmallow >= 3.0. You can install it with pip:

pip install marshmallow-dataframe

Contributing

Contributions are welcome!

You can report a problem or feature request in the issue tracker. If you feel that you can fix it or implement it, please submit a pull request referencing the issues it solves.

Unit tests written using the pytest framework are in the tests directory, and are run using tox on Python 3.6 and 3.7. You can run the tests by installing tox:

pip install tox

and then run the linters and tests for all Python versions with the tox command, or for a specific Python version with:

tox -e py36

We format the code with black. You can format your checkout of the code before committing it by running:

tox -e black -- .
