🗄️ df-and-order

Yeah, it's just like Law & Order, but Dataframe & Order!

pip install df_and_order

With df-and-order, your interactions with dataframes become clean and predictable.

Tired of absolute file paths to data in shared notebooks in your repository?
Can't remember how your datasets were generated?
Want to have safe and reproducible data transformations?
Like declarative config-based solutions?

Good news for you!

How does it look in code?

Imagine a world where reading a dataframe takes just a few lines:

reader = MagicDfReader()
df = reader.read(df_id='user_activity_may_2020')

Maybe you are interested in some transformed version of that dataframe? No problem!

reader = MagicDfReader()
# ready to fit a model on!
model_input_df = reader.read(df_id='user_activity_may_2020', transform_id='model_input')
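
One way a reader could avoid absolute paths is to derive every file location from the identifiers alone. The sketch below is not df-and-order's implementation: `resolve_df_path` and the directory layout it assumes are hypothetical, chosen only to show the idea.

```python
from pathlib import Path
from typing import Optional


def resolve_df_path(
    base_dir: Path,
    df_id: str,
    transform_id: Optional[str] = None,
    df_format: str = "csv",
) -> Path:
    """Build a dataset path from identifiers (hypothetical layout)."""
    # Assumed layout: <base_dir>/<df_id>/<df_id>.<format> for the initial
    # dataset, <base_dir>/<df_id>/<transform_id>_<df_id>.<format> otherwise.
    prefix = f"{transform_id}_" if transform_id else ""
    return base_dir / df_id / f"{prefix}{df_id}.{df_format}"


# A path relative to the repository root, so notebooks stay portable.
data_root = Path("data")
print(resolve_df_path(data_root, "user_activity_may_2020"))
print(resolve_df_path(data_root, "user_activity_may_2020", transform_id="model_input"))
```

With a convention like this, notebooks only ever mention `df_id` and `transform_id`, and the data root is configured once.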

Wow. Is it really magic?

df-and-order works with YAML configs. Every config contains metadata about a dataset as well as all desired transformations. Here's an example:

df_id: user_activity_may_2020  # here's the dataframe identifier
initial_df_format: csv
metadata:  # this section contains some useful information about the dataset
  author: Data Man
  data_collection_date: 2020-05-01
transforms:
  model_input:  # here's the transform identifier
    df_format: csv
    in_memory:  # perform the transformations in memory on every call; permanent transforms are supported as well
    - module_path: df_and_order.steps.pd.DropColsTransformStep  # where to find the class describing a transformation; this one drops columns
      params:  # init params for the transformation class
        cols:
        - redundant_col
    - module_path: df_and_order.steps.DatesTransformStep  # another transformation that converts str to datetime
      params:
        cols:
        - date_col
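
To illustrate how a `module_path` plus `params` entry could be turned into a live object, here is a minimal sketch using `importlib`. It assumes the last dotted component is the class name. `load_class` and `build_steps` are hypothetical helpers, not the library's API, and standard-library classes stand in for real transform steps so the example runs without df-and-order installed.

```python
import importlib
from typing import Any, Dict, List


def load_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_steps(step_configs: List[Dict[str, Any]]) -> List[Any]:
    """Instantiate every configured step, passing 'params' as keyword arguments."""
    return [
        load_class(cfg["module_path"])(**cfg.get("params", {}))
        for cfg in step_configs
    ]


# Stand-in config: these entries mirror the shape of the YAML list items,
# but point at standard-library classes instead of transform steps.
steps = build_steps([
    {"module_path": "fractions.Fraction", "params": {"numerator": 3, "denominator": 4}},
    {"module_path": "collections.OrderedDict"},
])
print(steps)
```

Because the class and its arguments both come from the config, changing a transformation means editing the config rather than the code that reads the data.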

Okay, what exactly is a df-and-order transform?

A transformation changes an initial dataset in some way.

A transformation is made of one or many steps. Each step represents some operation. Here are examples of such operations:

dropping cols
adding cols
transforming existing cols
etc

df-and-order uses subclasses of DfTransformStepConfig to describe a step. It's possible and highly recommended to declare init parameters for any step in the config. By following the Single Responsibility Principle, we get granular control over the entire transformation.

Just by looking at the config, you can tell how the transformed dataframe was created.
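
The idea of small, single-purpose steps applied in order can be sketched as follows. This is an illustration only: `DropColsStep`, `UppercaseStep`, and `apply_steps` are hypothetical names, and a plain list of dicts stands in for a dataframe to keep the sketch dependency-free.

```python
from typing import Dict, List

Row = Dict[str, object]
Table = List[Row]  # stand-in for a dataframe


class DropColsStep:
    """Hypothetical step: removes the given columns."""

    def __init__(self, cols: List[str]) -> None:
        self.cols = set(cols)

    def transform(self, table: Table) -> Table:
        return [{k: v for k, v in row.items() if k not in self.cols} for row in table]


class UppercaseStep:
    """Hypothetical step: upper-cases string values in the given columns."""

    def __init__(self, cols: List[str]) -> None:
        self.cols = set(cols)

    def transform(self, table: Table) -> Table:
        return [
            {k: v.upper() if k in self.cols and isinstance(v, str) else v
             for k, v in row.items()}
            for row in table
        ]


def apply_steps(table: Table, steps: list) -> Table:
    """Run each step in order; every step returns a new table."""
    for step in steps:
        table = step.transform(table)
    return table


rows = [{"name": "ann", "redundant_col": 1}, {"name": "bob", "redundant_col": 2}]
result = apply_steps(rows, [DropColsStep(["redundant_col"]), UppercaseStep(["name"])])
print(result)
```

Each step does one thing and takes its parameters at construction time, which is what lets a config describe the whole pipeline.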

Take a look at the more detailed overview to find more exciting stuff.

I also wrote an article describing the benefits. Check it out! There are lemurs and stuff.

I hope the library helps somebody boost their productivity.
