series str as an iterator #3638

Closed
hayd opened this Issue May 17, 2013 · 9 comments

3 participants
Contributor

hayd commented May 17, 2013

This was noticed on SO by @dsm. It seems that iterating over the Series str accessor never terminates. I have no idea whether this should be classified as a bug or just misuse. But here goes:

In [224]: g = (i for i in ds.str)

In [225]: next(g)
Out[225]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [226]: next(g)
Out[226]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [227]: next(g)
Out[227]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [228]: next(g)
Out[228]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [229]: list(g)  # lalalala
Contributor

jreback commented May 17, 2013

Not sure if this is a bug or a feature:

the reason __getitem__ exists in str (and thus it's invoked when used in list context) is to support slicing each string, like this:

In [18]: ds
Out[18]: 
google        40
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64

In [19]: Series(ds.index).str[0:5]
Out[19]: 
0    googl
1    wikim
2    wikip
3    wikit
dtype: object

not sure how useful that is though
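
For context on why a `__getitem__` alone can produce endless iteration: when an object defines `__getitem__` but not `__iter__`, Python falls back to calling `__getitem__` with 0, 1, 2, ... and stops only when `IndexError` is raised. The sketch below uses plain Python with two hypothetical classes (`NeverRaises` and `Bounded` are made up for illustration and are not pandas code) to show both outcomes.

```python
from itertools import islice


class NeverRaises:
    """Hypothetical stand-in: has __getitem__, no __iter__, never raises IndexError."""

    def __getitem__(self, i):
        # Every integer index "succeeds", so the fallback iteration
        # protocol never sees the IndexError that would end the loop.
        return "item %d" % i


class Bounded:
    """Same protocol, but indexing past the end raises IndexError."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, i):
        # List indexing raises IndexError when i is out of range,
        # which is what tells Python to stop iterating.
        return self._data[i]


# list(NeverRaises()) would never return, so cap it with islice.
first_three = list(islice(NeverRaises(), 3))
bounded = list(Bounded("abc"))

print(first_three)
print(bounded)
```

This would explain why every `next(g)` in the report succeeds: each integer position is simply handed to `__getitem__`, which returns a Series (all NaN in the output shown) instead of raising.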

Member

cpcloud commented May 17, 2013

You could have it iterate like zip and just raise StopIteration when the last character of the shortest string is reached, or you could iterate to the last character of the longest and provide NaNs for the rest. The latter seems more pandas-like.
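
The two options can be compared on plain Python strings. This is only an illustration with made-up data, using `zip` and `itertools.zip_longest` from the standard library rather than anything in pandas.

```python
from itertools import zip_longest

names = ["google", "wiki", "abc"]  # made-up data for illustration

# Option 1: stop at the last character of the shortest string.
shortest = list(zip(*names))

# Option 2: run to the last character of the longest string,
# filling exhausted strings with NaN.
nan = float("nan")
longest = list(zip_longest(*names, fillvalue=nan))

print(len(shortest))  # one tuple per position of the shortest string
print(len(longest))   # one tuple per position of the longest string
print(longest[3])     # "abc" is exhausted at position 3, so it contributes NaN
```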

Contributor

hayd commented May 17, 2013

I'm a bit confused, the next(g) above is always the same (length of indexes doesn't seem to change).

Member

cpcloud commented May 17, 2013

@hayd @jreback check out my branch series-str-iter-3638 what do u think?

Member

cpcloud commented May 17, 2013

Could change any to all to get the first type of behavior I mentioned. Haven't tested it, gotta run out; will test tomorrow or later tonight.
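
A rough sketch of how the any/all choice controls where iteration stops. This is not the code from the branch; it is written in plain Python over a list of strings, and the names `get`, `iter_positions`, and `keep_going` are hypothetical.

```python
def get(strings, i):
    """Return character i of each string, or None where the string is too short."""
    return [s[i] if i < len(s) else None for s in strings]


def iter_positions(strings, keep_going=any):
    """Yield one list of characters per position.

    keep_going=any: continue while at least one string still has a character
                    (exhausted strings show up as None).
    keep_going=all: continue only while every string still has a character
                    (zip-like, stops at the shortest string).
    """
    i = 0
    chars = get(strings, i)
    # The `chars and` guard matters: all() over an empty sequence is True,
    # which would loop forever when no strings are given.
    while chars and keep_going(c is not None for c in chars):
        yield chars
        i += 1
        chars = get(strings, i)


names = ["google", "wiki"]  # made-up data for illustration
with_any = list(iter_positions(names, any))
with_all = list(iter_positions(names, all))

print(len(with_any), with_any[-1])
print(len(with_all), with_all[-1])
```

Because the loop only ever asks for integer positions, it can call a `get`-style helper directly and never needs to check whether the index is a slice.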

Contributor

hayd commented May 18, 2013

I think any would be best, but... would anyone ever use this?

I suppose it makes sense, analogous to iterating through a string. I certainly have no idea what else it could/should do, perhaps more of an egg than a feature... :)

Member

cpcloud commented May 18, 2013

I agree. I'm not sure this is that useful but it's less surprising than infinite iteration and it's consistent with the semantics of the rest of vectorized strings.

Contributor

hayd commented May 18, 2013

I'm sold. pr?

Member

cpcloud commented May 18, 2013

Sure. Also, I will revert to self.get to avoid branching every time (no need to check for a slice when iterating).

jreback closed this in #3645 May 19, 2013
