Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Refactor string special methods to common decorator + safe unicode everywhere #4090

Closed
jtratner opened this issue Jul 1, 2013 · 13 comments · Fixed by #4092
Closed

CLN: Refactor string special methods to common decorator + safe unicode everywhere #4090

jtratner opened this issue Jul 1, 2013 · 13 comments · Fixed by #4092
Milestone

Comments

@jtratner
Copy link
Contributor

jtratner commented Jul 1, 2013

I was implementing some new objects for another PR and noticed that string methods are duplicated throughout (particularly on objects that don't inherit from each other). Wrote this up on this branch - https://github.com/jtratner/pandas/tree/refactor_string_magic_methods .

If you thing this is worthwhile, I'll split it up a little, add a bit better documentation and submit.

This is used multiple times:

    def __str__(self):
        """
        Return a string representation for a particular object.

        Invoked by str(obj) in both py2/py3.
        Yields Bytestring in Py2, Unicode String in py3.
        """

        if py3compat.PY3:
            return self.__unicode__()
        return self.__bytes__()

    def __bytes__(self):
        """
        Return a string representation for a particular object.

        Invoked by bytes(obj) in py3 only.
        Yields a bytestring in both py2/py3.
        """
        from pandas.core.config import get_option

        encoding = get_option("display.encoding")
        return self.__unicode__().encode(encoding, 'replace')

    def __repr__(self):
        """
        Return a string representation for a particular object.

        Yields Bytestring in Py2, Unicode String in py3.
        """
        return str(self)

Unicode tends to vary, but often is like this:

    def __unicode__(self):
        """
        Return a string representation for a particular object.

        Invoked by unicode(obj) in py2 only. Yields a Unicode String in both
        py2/py3.
        """
        prepr = pprint_thing(self, escape_chars=('\t', '\r', '\n'), quote_strings=True)
        return '%s(%s)' % (type(self).__name__, prepr)

Additionally, a number of objects aren't using a unicode-safe representation of themselves, so this would resolve that as well. Would this be useful to include?

@jreback
Copy link
Contributor

jreback commented Jul 1, 2013

I had this change in my refactor of series - but u r welcome to do separately

should be a top level pandas base class ( really should be PandasObject but that is used now)

that PandasObject, NDFrame, Index and such inherit from

with methods like this

@jtratner
Copy link
Contributor Author

jtratner commented Jul 1, 2013

One problem is that core/generic imports MultiIndex...should the baseclass go in a different file to avoid the circular import?

@jreback
Copy link
Contributor

jreback commented Jul 1, 2013

should be like a mix-in

I think Index makes sense as well here

On Jun 30, 2013, at 8:41 PM, Jeff Tratner notifications@github.com wrote:

@jreback So would you want to make Index inherit from this separately too?


Reply to this email directly or view it on GitHub.

@jtratner
Copy link
Contributor Author

jtratner commented Jul 1, 2013

Okay, should be totally simple to adapt.

@jreback
Copy link
Contributor

jreback commented Jul 1, 2013

yes maybe core/base.py ?

@jtratner
Copy link
Contributor Author

jtratner commented Jul 1, 2013

@jreback haha, I had just starting editing core/base right before I saw your note.

@jtratner
Copy link
Contributor Author

jtratner commented Jul 1, 2013

@jreback we could rename PandasObject and call the other PandasContainer or something like that...

@jreback
Copy link
Contributor

jreback commented Jul 1, 2013

that works

@jreback
Copy link
Contributor

jreback commented Jul 1, 2013

FYI there are also some other public objects that prob should inherit from this as well

Categorical, HDFStore (maybe)

others I am sure

@jtratner
Copy link
Contributor Author

jtratner commented Jul 1, 2013

Yeah, I'd caught those. Covered by Series, Frame and Panel via
PandasObject. Will be inherited from by Index, Int64Index and MultiIndex.
Also Sparse* and a number of items in HDFStore. Would you want HDFStore to
inherit from PandasObject or would it make more sense to make a separate
StringMixin (that requires a __unicode__ method)

On Sun, Jun 30, 2013 at 8:57 PM, jreback notifications@github.com wrote:

FYI there are also some other public objects that prob should inherit from
this as well

Categorical, HDFStore (maybe)

others I am sure


Reply to this email directly or view it on GitHubhttps://github.com//issues/4090#issuecomment-20259037
.

@jreback
Copy link
Contributor

jreback commented Jul 1, 2013

there be a real base class mixin
and maybe some other base classes (eg for HDStore)

shouldn't be too complicated

@jtratner
Copy link
Contributor Author

jtratner commented Jul 1, 2013

@jreback what? Slightly confused. I was thinking fo:

class StringMixin(object): # adds string methods

class PandasObject(StringMixin) # baseclass for Index, Categorical, PandasContainer, etc.

class PandasContainer(PandasObject): # baseclass for NDFrame

@jreback
Copy link
Contributor

jreback commented Jul 1, 2013

yep that's good

first 2 in core/base

3rd replaces current PandasObject def

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants