Pandas dataframe values return as numpy.datetime64 objects in local time zone. parse_time does not understand these objects. #798
Comments
This is important for the LightCurve object, where you can see the same behaviour, e.g.
|
ping @DanRyanIrish this is what we were talking about at SIPWork no? |
Hi @Cadair. Apologies for my late reply. I have been away for much of the last month and I am just catching up on things. This is not exactly what we were talking about. What we were talking about is that when you pass a lightcurve object's time index to parse_time(), it returns a Timestamp object, not a datetime object. Therefore, parse_time does not consistently return the same type of object. For example:
In [1]: import sunpy.lightcurve
In [2]: from sunpy.time import parse_time
In [3]: glc = sunpy.lightcurve.GOESLightCurve.create("2014-01-01", "2014-01-02")
In [4]: lc_time = parse_time(glc.data.index[0])
In [5]: lc_time
Out[5]: Timestamp('2014-01-01 00:00:00.421999', tz=None)
Meanwhile, for other inputs to parse_time() you get a datetime object.
In [6]: str_time = parse_time("2014-01-01")
In [7]: str_time
Out[7]: datetime.datetime(2014, 1, 1, 0, 0) |
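One way to make the return type consistent is to normalise Timestamp-like input before returning it. The following is only a minimal sketch, not the fix that went into sunpy: the helper name ensure_datetime is hypothetical, and it relies on pandas.Timestamp exposing a to_pydatetime() method.

```python
from datetime import datetime


def ensure_datetime(value):
    """Return a plain datetime.datetime for datetime-like input.

    Hypothetical helper for illustration; not part of sunpy.
    """
    # pandas.Timestamp provides to_pydatetime(); prefer it when present,
    # because a Timestamp is itself a datetime subclass and would
    # otherwise pass the isinstance check below unchanged.
    if hasattr(value, "to_pydatetime"):
        return value.to_pydatetime()
    if isinstance(value, datetime):
        return value
    raise TypeError(f"cannot convert {type(value).__name__} to datetime")


# A plain datetime passes through untouched.
print(ensure_datetime(datetime(2014, 1, 1)))
```

With pandas installed, ensure_datetime(glc.data.index[0]) would be expected to give a datetime.datetime rather than a Timestamp.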
@aringlis @DanRyanIrish can you check that this is fixed now. |
@aringlis @Cadair: This is now fixed for my situation, i.e. if you do lc_time = parse_time(glc.data.index[0]) then lc_time is a datetime.datetime object. However, @aringlis's situation remains the same. If you do lc_time = glc.data.index.values[0] the result is still a numpy.datetime64 in the local time zone. And entering this to |
@DanRyanIrish interesting, it should have solved the second one as well. Thanks for checking, I will look into it again. |
@aringlis @Cadair I am a prospective GSOC 2015 student and I'd like to work on this issue. In fact, if I may, this might work: https://github.com/ankitkmr/sunpy/blob/master/patch.py |
@ankitkmr could you open a pull request with the changes within the files that are affected? Then in the message of the PR (not in the title) you can link to this issue by using # followed by the number (i.e. #798). |
@dpshelio Yeah sure, I will do that. I think tzlocal.get_localzone() won't be a problem: as you can see, I save its value in a variable right before converting the data into a pandas DataFrame. The problem is that the conversion into pd.DataFrame brings the current local time zone into the data, and we need that local zone in times.tz_localize(tz) to convert back to UTC. Hope that helps. |
@ankitkmr I've just realised that Also, it seems Though.. I'm kind of lost now on what this needs to fix... I've found the following.
>>> parse_time('2015-03-18T12:49:22.979471000+0000')
ValueError: 2015-03-18T12:49:22.979471000+ is not a valid time string!
It seems it takes the timezone away... This should be fixed.
>>> test_pandas.index.tz_convert?
...
tz : ....
None will remove timezone holding UTC time.
...
>>> test_pandas.index.tz_convert()
TypeError: tz_convert() takes exactly 2 arguments (1 given)
...
|
@dpshelio needs to be made So I can add that support, but I don't know how to match the trailing 000 at the end of +000, like '%T': '(?P\d{4})', # assuming I use T for the trailing 000, what goes after the colon? I need some help here. Let me work out another alternative where we convert +000 to Zulu format, because that is supported by parse_time. Can I convert all local-time data into the corresponding UTC and then pass it like sunpy.time.parse_time('2005-08-04T00:01:02.000Z')? |
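The "convert to UTC, then pass a Zulu string" idea can be sketched with the standard library alone. This is an illustration under assumptions, not code from the thread: the helper name to_zulu_string is hypothetical, and the -05:00 offset is chosen only to mirror the UT-5 example in the issue.

```python
from datetime import datetime, timedelta, timezone


def to_zulu_string(moment):
    """Format a timezone-aware datetime as a UTC string ending in 'Z'.

    Hypothetical helper for illustration; not part of sunpy.
    """
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    # Keep millisecond precision to match the '...:02.000Z' pattern.
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


# 19:01:02 at UTC-5 is 00:01:02 on the following day in UTC.
local = datetime(2005, 8, 3, 19, 1, 2, tzinfo=timezone(timedelta(hours=-5)))
print(to_zulu_string(local))  # 2005-08-04T00:01:02.000Z
```

The resulting string has the same shape as the '2005-08-04T00:01:02.000Z' input mentioned above.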
@dpshelio Second bullet , and str(): yes, true, it's just that I would again have to add support for the +000-formatted time_string. Third bullet: that's what I used. And as I explained above, I don't think astropy.times will be any new help. |
@dpshelio Also, I didn't get the PR thing... Should I start contributing to unifiedDownloader now, or is it something to get familiar with and base my proposal around? A to-do list before the application would clear up my doubts. Also, should I start my proposal now? I have done about a quarter of it, but I would like to focus on it after I've completed all the prerequisites. Thanks a lot. |
>>> parse_time('2015-03-18T12:49:22.979471000+0000')
ValueError: 2015-03-18T12:49:22.979471000+ is not a valid time string!
OK, I fixed the issue here; run this script: https://github.com/ankitkmr/sunpy/blob/master/parse_time.py
How do I add this correction to the original source code now? I am new to open source dev :( |
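For readers who cannot open the linked script, here is a standard-library sketch of one way to read a string such as '2015-03-18T12:49:22.979471000+0000'. It is not the contents of that script and not sunpy's implementation; the function name parse_with_offset and the regular expression are assumptions made for illustration. Fractional digits beyond microseconds are truncated, since datetime cannot store them.

```python
import re
from datetime import datetime, timedelta

# Date and time, optional fractional seconds, then a +HHMM or +HH:MM offset.
_PATTERN = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d+))?"
    r"(?P<sign>[+-])(?P<hh>\d{2}):?(?P<mm>\d{2})$"
)


def parse_with_offset(text):
    """Parse an ISO-like string with a UTC offset into a naive UTC datetime.

    Hypothetical helper for illustration; not part of sunpy.
    """
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"{text} is not a valid time string!")
    base = datetime.strptime(match["stamp"], "%Y-%m-%dT%H:%M:%S")
    # Truncate or pad the fraction to exactly six digits (microseconds).
    frac = (match["frac"] or "")[:6].ljust(6, "0")
    base = base.replace(microsecond=int(frac))
    offset = timedelta(hours=int(match["hh"]), minutes=int(match["mm"]))
    if match["sign"] == "-":
        offset = -offset
    # Local time minus its offset gives the corresponding UTC time.
    return base - offset


print(parse_with_offset("2015-03-18T12:49:22.979471000+0000"))
```

Returning a naive datetime that holds UTC matches what the thread shows parse_time returning for other inputs, for example datetime.datetime(2014, 1, 1, 0, 0).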
Workaround to this problem : https://github.com/ankitkmr/sunpy/blob/master/patch.py |
This offers a solution to the issue sunpy#798
If lightcurve is dead, does this need to be open? Does it affect time series? Should I add a timeseries label? |
ping @Alex-Ian-Hamilton |
I think I might have fixed this with #2370 ? |
Came across this issue today when using a pandas DataFrame. When you explicitly ask for the values of the indices in a DataFrame, they can be returned as numpy.datetime64 objects. These time objects have the timezone attached to the end of them (see example below). parse_time at the moment cannot understand these objects.
The following example explains what I'm on about...
If you now print the values from the pandas dataframe, they are displayed in another time zone! (not UT). In the following example, it displays a numpy.datetime64 in UT-5.
Also, parse_time can't read this format at the moment.
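Until parse_time accepts these objects, a numpy.datetime64 can be converted by hand. The sketch below is an illustration, not sunpy code: the name datetime64_to_datetime is hypothetical, and it assumes a NumPy version in which datetime64 values are timezone-naive counts from the Unix epoch, so the result is a naive datetime holding UTC regardless of how the value is printed.

```python
from datetime import datetime, timedelta

import numpy as np


def datetime64_to_datetime(value):
    """Convert a numpy.datetime64 scalar to a naive datetime holding UTC.

    Hypothetical helper for illustration; not part of sunpy.
    """
    # datetime resolves microseconds at best, so cast down from whatever
    # unit the value carries (pandas index values use nanoseconds).
    micros = value.astype("datetime64[us]").astype("int64")
    return datetime(1970, 1, 1) + timedelta(microseconds=int(micros))


stamp = np.datetime64("2014-01-01T00:00:00.421999")
print(datetime64_to_datetime(stamp))
```

Working from the integer offset since the epoch avoids depending on the string representation, which is where the local time zone showed up in the example above.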