series str as an iterator #3638

Closed
hayd opened this issue May 17, 2013 · 9 comments · Fixed by #3645

@hayd
Contributor

hayd commented May 17, 2013

This was noticed on Stack Overflow by @dsm. It seems that iterating over Series.str never ends. I have no idea whether this should be classified as a bug or just misuse, but here goes:

In [224]: g = (i for i in ds.str)

In [225]: next(g)
Out[225]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [226]: next(g)
Out[226]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [227]: next(g)
Out[227]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [228]: next(g)
Out[228]:
google       NaN
wikimedia    NaN
wikipedia    NaN
wikitravel   NaN
dtype: float64

In [229]: list(g)  # lalalala
@jreback
Contributor

jreback commented May 17, 2013

Not sure if this is a bug or a feature.

The reason __getitem__ exists on str (and thus it's invoked when the accessor is iterated over, e.g. in a list or generator comprehension) is that it allows vectorized slicing:

In [18]: ds
Out[18]: 
google        40
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64

In [19]: Series(ds.index).str[0:5]
Out[19]: 
0    googl
1    wikim
2    wikip
3    wikit
dtype: object

Not sure how useful that is, though.
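Why does the generator never stop? When a class defines __getitem__ but no __iter__, Python's iter() falls back to the legacy sequence protocol: it calls obj[0], obj[1], obj[2], ... and stops only when __getitem__ raises IndexError (or StopIteration). Since ds.str[i] returns a Series (NaN where there is no character) instead of raising, the loop runs forever. A minimal pure-Python sketch of that mechanism, using a hypothetical FakeStrAccessor class as a stand-in for the pandas accessor (not pandas code):

```python
import math
from itertools import islice


class FakeStrAccessor:
    """Hypothetical stand-in for a vectorized string accessor.

    It defines __getitem__ but no __iter__, so iter() falls back to the
    legacy sequence protocol (call __getitem__ with 0, 1, 2, ...).
    """

    def __init__(self, values):
        self.values = values

    def __getitem__(self, i):
        # Positional indexing only: out-of-range positions become NaN
        # instead of raising IndexError, so iteration never terminates.
        return [s[i] if i < len(s) else math.nan for s in self.values]


acc = FakeStrAccessor(["google", "wiki"])
g = (x for x in acc)  # same pattern as in the issue: (i for i in ds.str)

# islice keeps the demo finite; list(g) would loop forever.
first_eight = list(islice(g, 8))
print(first_eight[0])  # ['g', 'w']
print(first_eight[7])  # [nan, nan]
```

Every position past the end of the longest string keeps producing all-NaN rows, which is exactly the "lalalala" behavior above.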

@cpcloud
Member

cpcloud commented May 17, 2013

You could have it iterate like zip and just raise StopIteration when the last character of the shortest string is reached, or you could iterate to the last character of the longest string and fill in NaNs for the rest. The latter seems more pandas-like.
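To make the two options concrete, here is a plain-Python sketch on an ordinary list of strings (no pandas involved; the names list is made up for illustration). The built-in zip stops at the shortest string, while itertools.zip_longest runs to the end of the longest string and pads with NaN:

```python
import math
from itertools import zip_longest

names = ["google", "wiki"]

# Option 1: zip-like, stop once the shortest string is exhausted.
shortest = [list(chars) for chars in zip(*names)]

# Option 2: run to the end of the longest string and pad with NaN.
longest = [list(chars) for chars in zip_longest(*names, fillvalue=math.nan)]

print(len(shortest), len(longest))  # 4 6
print(longest[5])                   # ['e', nan]
```

Option 2 mirrors how the other vectorized string methods return NaN for missing results, which is why it reads as the more pandas-like choice.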

@hayd
Contributor Author

hayd commented May 17, 2013

I'm a bit confused: next(g) above always returns the same result (the length of the index doesn't seem to change).

@cpcloud
Member

cpcloud commented May 17, 2013

@hayd @jreback check out my branch series-str-iter-3638. What do you think?

@cpcloud
Member

cpcloud commented May 17, 2013

Could change any to all to get the first type of behavior I mentioned. I haven't tested it yet and have to run out; I'll test tomorrow or later tonight.

@hayd
Contributor Author

hayd commented May 18, 2013

I think any would be best, but... would anyone ever use this?

I suppose it makes sense, analogous to iterating through a string. I certainly have no idea what else it could or should do; perhaps it's more of an easter egg than a feature... :)

@cpcloud
Member

cpcloud commented May 18, 2013

I agree. I'm not sure this is that useful, but it's less surprising than infinite iteration and it's consistent with the semantics of the rest of the vectorized string methods.

@hayd
Contributor Author

hayd commented May 18, 2013

I'm sold. PR?

@cpcloud
Member

cpcloud commented May 18, 2013

Sure. I'll also revert to self.get to avoid branching on every step (there's no need to check for a slice when iterating).
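The design discussed in this thread can be sketched in plain Python. This is a hypothetical ToyStrMethods class, not the code from the series-str-iter-3638 branch or from #3645: __getitem__ has to branch on whether the key is a slice, while __iter__ calls get directly (no per-step slice check) and keeps going while any string still has a character at position i. Swapping any for all would give the zip-like behavior instead.

```python
import math


class ToyStrMethods:
    """Hypothetical, simplified model of a vectorized string accessor."""

    def __init__(self, values):
        self.values = list(values)

    def get(self, i):
        # Character at position i of each string, NaN where it is too short.
        return [s[i] if i < len(s) else math.nan for s in self.values]

    def slice(self, start=None, stop=None, step=None):
        # Slice every string the same way.
        return [s[start:stop:step] for s in self.values]

    def __getitem__(self, key):
        # General indexing must distinguish slices from integer positions.
        if isinstance(key, slice):
            return self.slice(key.start, key.stop, key.step)
        return self.get(key)

    def __iter__(self):
        # Iteration only ever uses integer positions, so call get() directly
        # and skip the slice check. Stop once no string has a character at i;
        # all() instead of any() would stop at the shortest string
        # (for a non-empty list, since all() of an empty sequence is True).
        i = 0
        while any(i < len(s) for s in self.values):
            yield self.get(i)
            i += 1


acc = ToyStrMethods(["google", "wiki"])
print(acc[0:5])        # ['googl', 'wiki']
print(len(list(acc)))  # 6
```

Because __iter__ is defined, Python no longer falls back to calling __getitem__ with 0, 1, 2, ..., so iteration terminates after the longest string.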


3 participants