read_spss doesn't return the metadata #54264

Closed
1 of 3 tasks
HadarEliyahu39 opened this issue Jul 26, 2023 · 4 comments · Fixed by #55472
Labels: Enhancement, IO Data, metadata

Comments

@HadarEliyahu39

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

`def read_spss(
    path: str | Path,
    usecols: Sequence[str] | None = None,
    convert_categoricals: bool = True,
) -> DataFrame:
    """
    Load an SPSS file from the file path, returning a DataFrame.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : str or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default is True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    DataFrame
    """
    pyreadstat = import_optional_dependency("pyreadstat")

    if usecols is not None:
        if not is_list_like(usecols):
            raise TypeError("usecols must be list-like.")
        else:
            usecols = list(usecols)  # pyreadstat requires a list

    df, _ = pyreadstat.read_sav(
        stringify_path(path), usecols=usecols, apply_value_formats=convert_categoricals
    )
    return df`

pandas has this function, which uses pyreadstat to read `.sav` files and returns a DataFrame, but for some reason it discards the metadata that pyreadstat returns alongside the data. I guess some users need that metadata; otherwise, why use SPSS in the first place?

Feature Description

Return the metadata along with the dataframe

Alternative Solutions

`def read_spss(
    path: str | Path,
    usecols: Sequence[str] | None = None,
    convert_categoricals: bool = True,
) -> tuple[DataFrame, object]:
    """
    Load an SPSS file from the file path, returning a DataFrame and its metadata.

    .. versionadded:: 0.25.0

    Parameters
    ----------
    path : str or Path
        File path.
    usecols : list-like, optional
        Return a subset of the columns. If None, return all columns.
    convert_categoricals : bool, default is True
        Convert categorical columns into pd.Categorical.

    Returns
    -------
    tuple
        The DataFrame and the metadata object returned by pyreadstat.
    """
    pyreadstat = import_optional_dependency("pyreadstat")

    if usecols is not None:
        if not is_list_like(usecols):
            raise TypeError("usecols must be list-like.")
        else:
            usecols = list(usecols)  # pyreadstat requires a list

    # Keep the metadata instead of discarding it
    df, metadata = pyreadstat.read_sav(
        stringify_path(path), usecols=usecols, apply_value_formats=convert_categoricals
    )
    return df, metadata`

Additional Context

No response

@HadarEliyahu39 added the Enhancement and Needs Triage labels Jul 26, 2023
@lithomas1 added the IO Data and metadata labels and removed the Needs Triage label Jul 31, 2023
@lithomas1
Member

The best way is probably to store that in df.attrs.

Contributions are welcome if you'd like to take a look.
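
As a rough illustration of the `df.attrs` idea, here is a minimal sketch, not the change that was eventually merged. It assumes the metadata object returned by pyreadstat exposes its fields as ordinary instance attributes; `metadata_to_attrs` is a hypothetical helper name, and the `SimpleNamespace` below is only a stand-in so the snippet runs without an SPSS file.

```python
from types import SimpleNamespace


def metadata_to_attrs(metadata):
    """Copy the public attributes of a metadata object into a plain dict.

    Hypothetical helper: assumes the object stores its fields as normal
    instance attributes, so that vars() can see them.
    """
    return {
        name: value
        for name, value in vars(metadata).items()
        if not name.startswith("_")
    }


# Stand-in for the second value returned by pyreadstat.read_sav.
# The field names are illustrative, not a complete list.
fake_metadata = SimpleNamespace(
    column_names=["age", "sex"],
    column_labels=["Age in years", "Sex of respondent"],
    file_encoding="UTF-8",
)

attrs = metadata_to_attrs(fake_metadata)
print(sorted(attrs))

# Inside read_spss, the result could then be attached before returning:
#     df.attrs = metadata_to_attrs(metadata)
#     return df
```

Storing the metadata on `df.attrs` keeps the return type of `read_spss` a single DataFrame, so existing callers would not need to change.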

@rmhowe425
Contributor

@lithomas1 Is it okay for me to work on this issue, or do other members need to approve this issue as well?

@rmhowe425
Contributor

take

@lithomas1
Member

lithomas1 commented Aug 18, 2023

> @lithomas1 Is it okay for me to work on this issue, or do other members need to approve this issue as well?

Sure, feel free to give it a go. I think this has already been done with parquet, so it shouldn't be too controversial.
