Conversation

@davidastephens
Member

Fixes #28

@jreback
Contributor

jreback commented Mar 27, 2015

setting the timezone is a bit odd and makes working with these times much harder for the user
tz-aware datetimes are generally not as efficiently represented in the pandas structures

so unless you have a really compelling reason I would not set the tz

@davidastephens
Member Author

Ya, I'm on the fence about the second one. I was unaware that it wasn't efficiently represented. I think we should just have the timezone on the quote time (first commit).

@jreback
Contributor

jreback commented Mar 27, 2015

look at the dtypes of the frame before/after

@davidastephens
Member Author

I see - datetime64 vs object. Looks like it doesn't matter if it's in the index (both are DatetimeIndex)?
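
The dtype check suggested above can be sketched as follows. This is a minimal illustration, not code from the pull request: the frame and its values are made up, and only the `Quote_Time` column name comes from this thread. On pandas 0.17 and later a tz-aware column gets a dedicated timezone-aware datetime dtype; on older versions, as noted above, it fell back to `object`.

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the thread.
quotes = pd.DataFrame(
    {"Quote_Time": pd.to_datetime(["2015-03-27 16:00:00", "2015-03-27 16:00:05"])}
)

# dtype before attaching a timezone (tz-naive datetime64)
naive_dtype = quotes["Quote_Time"].dtype

# dtype after attaching a timezone
localized = quotes["Quote_Time"].dt.tz_localize("US/Eastern")
aware_dtype = localized.dtype

print("naive:", naive_dtype)
print("aware:", aware_dtype)
```

If the second line prints `object`, the installed pandas predates the tz-aware dtype and the efficiency concern raised here still applies.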

I'm fine with leaving them naive. Thoughts @aisthesis?

@aisthesis

The problem with naive is that applications will typically just interpret it as UTC, which it definitely isn't in the case of quote time. I feel like quote time at least should have a clear timezone. As for performance in getting the data, the overhead in retrieval looks to me like it's coming from connecting to various Yahoo sources and not from pandas putting everything together. But I could be wrong there, as I only see it as a consumer.

It's also clear that the expirations are in a different spot both substantively (there's no actual time but just a date) and programmatically (they're pandas Timestamps within the index, whereas quote time is a numpy datetime64 column, which numpy stores as 64-bit integers inside the array).

What do you guys think of making quote time have a timezone but leaving expiration tz-naive?

@aisthesis

I disagree with jreback that setting the timezone is odd, and most definitely that it makes working with these times more difficult for the user. Speaking as a user, not having the timezone creates problems for me. If you decide to leave it naive, I'll have to put in my own code to set the timezone in both cases. I can't speak to the internal efficiency issue within pandas.

@davidastephens
Member Author

As I see it, we have 3 options:

  1. Leave both naive
  2. Add timezone to quote time
  3. Add timezone to expiry date & quote time.

I'm inclined to the 2nd one. The timezone for the quote time has a meaning and would be useful for people not on the east coast. I'm iffy on adding a timezone to the expiry date. However, if we start adding option quotes for European exchanges, then timezones for expiry dates might start mattering.

@jreback @aisthesis Thoughts?

@jreback
Contributor

jreback commented Apr 7, 2015

you have to be consistent across the various methods. E.g. if you choose to put a timezone on, say, quote time, then the other data methods should do the same (I mean for, say, stock data). But to be honest I suspect most people either keep this data as relative and naive (e.g. 4pm, i.e. the reported time but as a naive time), or convert to UTC.

@aisthesis

@jreback I suspect very few do anything with it as naive. You run into problems very quickly, as I have, unless the machine doing the processing happens to be on the U.S. East Coast. Naive isn't a problem as long as you're just pulling it from the command line to check it out. But as soon as the date matters in a program, you're forced (as I was) to insert the proper timezone yourself. Moreover, for those who keep the data tz-naive, it wouldn't matter if the timezone were correctly specified, whereas not specifying it definitely creates problems for some. In other words, even if it doesn't particularly matter for some users, specifying the timezone hurts no one and helps most users who actually use the time programmatically.

Yahoo! presumably is just providing it as a string with no tz specification, right?
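
For reference, the client-side workaround described above looks roughly like this. It is a sketch under the assumption that the naive quote time is U.S. Eastern wall-clock time, which is what this thread suggests but which the data itself does not state. The sample value is made up.

```python
import pandas as pd

# Hypothetical tz-naive quote time, as currently returned.
quote_time = pd.Series(pd.to_datetime(["2015-03-27 16:00:00"]))

# Assumption: the naive value is U.S. Eastern wall-clock time.
eastern = quote_time.dt.tz_localize("US/Eastern")

# Once the zone is attached, conversion to UTC is unambiguous.
utc = eastern.dt.tz_convert("UTC")

print(eastern.iloc[0])
print(utc.iloc[0])
```

Every consumer that needs an absolute time has to repeat the `tz_localize` step and has to guess the zone, which is the cost being argued about here.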

@aisthesis

Maybe the least common denominator would be to put everything in UTC. The problem I have with naive times is that there is no reliable contract as to what the time actually is, so I can't write reliable client code comparing times. Common use cases: Has the option expired? Is the given Quote_Time at market close? This will become a horrible mess if you ever expand the library beyond U.S. markets, where we would then have naive times from clearly distinct timezones.

I can live with any of the following solutions:

  1. Everything has a timezone and correct time for that zone. I'm ok with 'UTC' or 'US/Eastern' for U.S. markets. I agree with @jreback that the choice should be consistent.
  2. Timezone is None but there is a clear contract that times are actually UTC. I find this suboptimal, but then I know what I'm dealing with and can write my code accordingly.

I find it unsatisfactory to have a naive timezone with no way to tell what the corresponding UTC time actually is.
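
The "has the option expired?" use case can be illustrated with the standard library alone. This is only a sketch: `has_expired` is a hypothetical helper, the timestamps are made up, and a fixed UTC-4 offset stands in for U.S. Eastern daylight time instead of a real timezone database entry.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset approximates U.S. Eastern daylight time.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4), "EDT")


def has_expired(expiry_close, now):
    """Hypothetical helper: True once `now` is at or past the expiry close."""
    return now >= expiry_close


expiry_close = datetime(2015, 3, 27, 16, 0, tzinfo=EASTERN_DAYLIGHT)

# Two aware datetimes compare correctly even across zones:
# 19:30 UTC is 15:30 at UTC-4, so the option has not expired yet.
now_utc = datetime(2015, 3, 27, 19, 30, tzinfo=timezone.utc)
expired = has_expired(expiry_close, now_utc)

# Mixing a naive datetime with an aware one cannot be ordered at all.
naive_now = datetime(2015, 3, 27, 19, 30)
try:
    has_expired(expiry_close, naive_now)
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

print(expired, mixed_comparison_failed)
```

With two naive values the comparison would run without error, but its result would depend on an unstated assumption about which zone each value is in.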

@davidastephens
Member Author

Just to continue this discussion: what if we give objects that have a time attached to them (i.e., currently just the option quote time) the correct timezone, while leaving objects that are just dates naive? Does that work? Or do we need to be all or none?
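
A rough sketch of that compromise, for concreteness. `localize_quote_time` is a hypothetical helper, not part of this pull request, the sample frame is made up, and the `Expiry` column name and the `US/Eastern` default are assumptions.

```python
import pandas as pd


def localize_quote_time(frame, tz="US/Eastern"):
    """Hypothetical helper: attach a timezone to Quote_Time only.

    Date-only columns such as Expiry are left tz-naive.
    """
    out = frame.copy()
    out["Quote_Time"] = out["Quote_Time"].dt.tz_localize(tz)
    return out


# Made-up sample data for illustration.
options = pd.DataFrame(
    {
        "Expiry": pd.to_datetime(["2015-04-17", "2015-05-15"]),
        "Quote_Time": pd.to_datetime(["2015-03-27 16:00:00", "2015-03-27 16:00:00"]),
    }
)

result = localize_quote_time(options)
print(result.dtypes)
```

The original frame is copied rather than modified, so callers who rely on naive values are not affected unless they opt in.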

@aisthesis

That would work for my use case.

@davidastephens davidastephens added this to the 0.2.0 milestone Aug 23, 2015
@davidastephens davidastephens modified the milestones: 0.2.1, 0.2.0 Oct 7, 2015
@davidastephens davidastephens deleted the issue28 branch November 26, 2015 05:44
@aisthesis

What was the final verdict here? Are the Timestamps all in EST now?

@davidastephens
Member Author

They are still all naive. @jreback is your concern about performance alleviated now in 0.17+?

http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0170-tz

@jreback
Contributor

jreback commented Dec 6, 2015

yep putting them in tz would be ok now
(though an API change)

Development

Successfully merging this pull request may close these issues.

Options quotes are missing timezone

4 participants