327 add documentation for dataframeschema transformations #333


abyz0123
Contributor

Here is some basic documentation for the verb functions.

I also added some error handling to a few of the other functions, because for some of them, passing a non-existent column would just return the original DataFrameSchema instead of raising an error.
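A minimal sketch of the kind of check described here, assuming a schema's columns can be modelled as a plain dict keyed by column name. The helper name `remove_columns_checked` and the choice of `KeyError` are hypothetical and for illustration only; they are not pandera's actual implementation or exception type.

```python
import copy
from typing import Any, Dict, List


def remove_columns_checked(
    columns: Dict[str, Any], cols_to_remove: List[str]
) -> Dict[str, Any]:
    """Return a copy of ``columns`` without ``cols_to_remove``.

    Hypothetical helper: raises instead of silently returning an
    unchanged copy when a requested column does not exist.
    """
    missing = [c for c in cols_to_remove if c not in columns]
    if missing:
        raise KeyError(f"Columns {missing} not found in schema columns")
    # Deep-copy so the caller's original mapping is never mutated.
    new_columns = copy.deepcopy(columns)
    for col in cols_to_remove:
        del new_columns[col]
    return new_columns


schema_cols = {"col1": {"dtype": "int"}, "col2": {"dtype": "str"}}
print(remove_columns_checked(schema_cols, ["col2"]))  # {'col1': {'dtype': 'int'}}
```

Validating all requested names up front means the error lists every missing column at once, rather than failing on the first one.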

@cosmicBboy
Collaborator

Thanks @ktroutman! Looks like a few lines need to be reformatted to stay within the 79-character limit:
https://travis-ci.org/github/pandera-dev/pandera/jobs/745757418#L637-L647
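The CI job presumably enforces this limit through a linter. As a stdlib-only sketch for finding offending lines locally (the `long_lines` helper is hypothetical and not part of pandera's tooling):

```python
from typing import Iterable, List, Tuple

MAX_LINE_LENGTH = 79  # the limit mentioned in the review comment


def long_lines(
    lines: Iterable[str], limit: int = MAX_LINE_LENGTH
) -> List[Tuple[int, int]]:
    """Return (line_number, length) for each line longer than ``limit``."""
    return [
        (lineno, len(line.rstrip("\n")))
        for lineno, line in enumerate(lines, start=1)
        if len(line.rstrip("\n")) > limit
    ]


sample = ["x = 1\n", "y = " + "2" * 80 + "\n"]
print(long_lines(sample))  # [(2, 84)]
```

Trailing newlines are stripped before measuring, so a line of exactly 79 visible characters passes.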

I think we should also edit this part of the Sphinx docs to point users to the API reference:
https://github.com/pandera-dev/pandera/blob/master/docs/source/dataframe_schemas.rst#dataframeschema-transformations

I'll leave specifics up to you, but something like:

Pandera supports transforming a schema using :func:`~pandera.schemas.DataFrameSchema.add_columns` and :func:`~pandera.schemas.DataFrameSchema.remove_columns`.

Once you've defined a schema, you can add columns to it to create a new schema object with the desired modifications.

Then, at the end of the section:

The available schema transformation methods are:

  • :func:`~pandera.schemas.DataFrameSchema.add_columns`
  • :func:`~pandera.schemas.DataFrameSchema.remove_columns`
  • ...
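The copy-on-transform behaviour described in the suggested docs text can be sketched without any dependencies. `MiniSchema` is a hypothetical stand-in, not pandera's class; in pandera itself the corresponding call is `DataFrameSchema.add_columns`, which returns a new schema object rather than modifying the original.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MiniSchema:
    """Hypothetical stand-in for a DataFrameSchema: column name -> spec."""

    columns: Dict[str, Any] = field(default_factory=dict)

    def add_columns(self, extra_cols: Dict[str, Any]) -> "MiniSchema":
        # Work on a deep copy so the original schema is left untouched.
        new_schema = copy.deepcopy(self)
        new_schema.columns = {**new_schema.columns, **extra_cols}
        return new_schema


schema = MiniSchema({"col1": {"dtype": "int"}})
transformed = schema.add_columns({"col2": {"dtype": "str"}})
print(list(schema.columns))       # ['col1']
print(list(transformed.columns))  # ['col1', 'col2']
```

Returning a fresh object keeps the base schema reusable, so several derived schemas can be built from it without interfering with each other.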

@cosmicBboy
Collaborator

Hey, FYI @ktroutman: I'm about to merge a big changeset from dev -> master in the next few days! Unless you're almost done with the changes discussed in #330, I think we should merge the changes in this PR and make a new PR for the update_columns method.

The issues discussed in #333 (comment) should also be addressed in this PR. Let me know if you need any help!

@abyz0123
Contributor Author

abyz0123 commented Dec 1, 2020

Hi @cosmicBboy, OK, I'm going to submit the latest changes to this PR tonight. They include a bunch of docs updates, a possible bug fix I found in rename_columns, and the update_columns function. I just have to iron out the tests for update_columns and then it should be good to go. If that's too much to pack into this merge, I can split update_columns and the bug fix into separate PRs.

… update columns add, some suggested helpful hints for helping with documentation in the CONTRIBUTING.md
@@ -19,6 +19,21 @@ create a development environment that is separate from your existing Python
environment so that you can make and test changes without compromising your
own work environment.

### Contributing documentation
Contributor Author

Just a thought

)

new_columns: Dict[str, Dict[str, Any]] = {}
for col in new_schema.columns:
Contributor Author

Open to sleeker implementations
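One possible more compact shape for the loop is a single dict comprehension. This sketch assumes columns are plain dicts of properties and that `update_dict` maps column names to the properties to overwrite, mirroring the `Dict[str, Dict[str, Any]]` annotation in the snippet; the function operating on dicts and the `KeyError` for unknown names are hypothetical choices, not pandera's implementation.

```python
import copy
from typing import Any, Dict


def update_columns(
    columns: Dict[str, Dict[str, Any]],
    update_dict: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of ``columns`` with per-column properties updated.

    Hypothetical dict-based sketch, not pandera's implementation.
    """
    missing = [name for name in update_dict if name not in columns]
    if missing:
        raise KeyError(f"Columns {missing} not found in schema columns")
    # One comprehension replaces the explicit loop: columns without an
    # entry in update_dict are copied through unchanged.
    return {
        name: {**copy.deepcopy(attrs), **update_dict.get(name, {})}
        for name, attrs in columns.items()
    }


cols = {"a": {"dtype": "int", "nullable": False}, "b": {"dtype": "str"}}
print(update_columns(cols, {"a": {"nullable": True}}))
# {'a': {'dtype': 'int', 'nullable': True}, 'b': {'dtype': 'str'}}
```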

rename_dict[col_name] if col_name in rename_dict else col_name
): col_attrs
for col_name, col_attrs in self.columns.items()
(rename_dict[col_name] if col_name in rename_dict else col_name): (
Contributor Author

The previous implementation left the column's name unchanged: if you renamed col1 to col2, the column's name parameter remained col1. I understand why changing names might be undesirable, but surely this is not intended?
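A minimal sketch of the fix being discussed, assuming a column object that stores its own name. `Col` and this dict-based `rename_columns` are hypothetical stand-ins, not pandera's classes; the point is that the dict key and the column's `name` attribute are renamed together.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Col:
    """Hypothetical column spec that stores its own name."""

    name: str
    dtype: str


def rename_columns(
    columns: Dict[str, Col], rename_dict: Dict[str, str]
) -> Dict[str, Col]:
    # Rename the dict key AND the column's own ``name`` attribute, so the
    # two never disagree (the buggy version only changed the key).
    return {
        rename_dict.get(key, key): replace(col, name=rename_dict.get(key, key))
        for key, col in columns.items()
    }


cols = {"col1": Col("col1", "int")}
renamed = rename_columns(cols, {"col1": "col2"})
print(renamed)  # {'col2': Col(name='col2', dtype='int')}
```

`rename_dict.get(key, key)` is equivalent to `rename_dict[col_name] if col_name in rename_dict else col_name` from the diff, and `dataclasses.replace` returns a new object, so the original columns are not mutated.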

Collaborator

ah, nice! thanks for catching this

Collaborator

@cosmicBboy left a comment

lgtm, thanks @ktroutman! 🚀

@codecov-io

codecov-io commented Dec 4, 2020

Codecov Report

Merging #333 (4839b04) into master (ac41212) will decrease coverage by 0.09%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##           master     #333      +/-   ##
==========================================
- Coverage   98.79%   98.70%   -0.10%     
==========================================
  Files          18       18              
  Lines        1747     1780      +33     
==========================================
+ Hits         1726     1757      +31     
- Misses         21       23       +2     

Impacted Files                 Coverage Δ
pandera/schemas.py             97.80% <94.44%> (-0.32%) ⬇️
pandera/model.py               100.00% <0.00%> (ø)
pandera/typing.py              100.00% <0.00%> (ø)
pandera/model_components.py    100.00% <0.00%> (ø)
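The headline figures follow from coverage = hits / lines. A small check, assuming Codecov truncates rather than rounds to two decimals (an assumption that happens to reproduce the 98.79% and 98.70% shown in the coverage diff):

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals.

    Truncation is an assumption; it reproduces the figures in this report.
    """
    return math.floor(hits / lines * 10000) / 100


before = coverage_pct(1726, 1747)  # master (ac41212)
after = coverage_pct(1757, 1780)   # PR #333 (4839b04)
print(before, after)  # 98.79 98.7

# The unrounded drop is about 0.09 percentage points, the headline figure.
drop = (1726 / 1747 - 1757 / 1780) * 100
```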

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ac41212...4839b04.

@cosmicBboy merged commit 3335e1a into unionai-oss:master Dec 4, 2020
@abyz0123 deleted the 327_add_documentation_for_dataframeschema_transformations branch February 3, 2021 21:22

3 participants