convert TimeSeries into Series #10
Comments
I should also modify get_usgs to return data as a Series instead of a DataFrame. The 'name' attribute should be able to act as a key (dv01585200 or iv01585200, for example) to match with filenames if the data gets saved, to match up with metadata that might get collected, or to match up with a baseflow time series for the same site. |
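A minimal sketch of how the `name` attribute of a Series could act as such a key. It assumes plain pandas; the discharge values and the `metadata` dictionary are made up for illustration, and `get_usgs` itself is not shown.

```python
import pandas as pd

# Made-up daily discharge values; only the key format comes from the comment above.
index = pd.date_range("2016-01-01", periods=3, freq="D")
flow = pd.Series([1.2, 1.5, 1.1], index=index, name="dv01585200")

# The name can be reused as a filename stem when the data gets saved...
filename = f"{flow.name}.csv"

# ...or as a lookup key into metadata (hypothetical structure).
metadata = {"dv01585200": {"site": "01585200", "service": "dv"}}
site_info = metadata[flow.name]

# When Series are combined, each name becomes a column label,
# so a baseflow series for the same site can sit next to the discharge.
baseflow = (flow * 0.5).rename("baseflow_dv01585200")  # placeholder values
combined = pd.concat([flow, baseflow], axis=1)
```

Because `pd.concat` uses each Series name as the column label, the key survives when several series are joined into one DataFrame.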
Would it be an option to use DataFrames as the base for the hydropy datatype of the HydroAnalysis class (currently its .data attribute is a DataFrame)? A DataFrame as the .data attribute provides the option (but also the difficulty) of deriving information from the time series of multiple gauges at the same time (e.g. get_peaks should provide the peaks for all of them). Or should we opt for a Series as the base, making all the methods work on a single gauge time series, and make a separate class for the 'multiple-gauges' option? |
This is a good question! It never occurred to me to use the Series, so when I saw your usage of it, I thought it was such a good idea that I immediately wanted to use it too. I've also been thinking that it would be nice to have a data format that allowed you to save the baseflow time series along with the discharge timeseries. But that leads to the question: Should the discharge and baseflow each make a column in a dataframe, and then each additional site could add an additional dataframe to create a Panel? Or should the baseflow go into the Panel, and each site is a new column in a dataframe? I think to answer this we would have to try each, and see which is easiest to implement, and which is easiest for the user. If I can find some time, I have started playing around with this idea. I will post a branch as soon as I can.... ⌚️ |
I found some time to work on this some more. |
Some first thoughts on this, but I think it needs to distill a bit more:
As a first braindump, I would think of the following setup:
Some methods will need both flow and rain, and these could be specified at the Analysis level. As I said, this is more an opinion for the moment, and I am looking forward to feedback on it. Doing it like this, we could for the moment focus on HydroAnalysis and FlowAnalysis and create sufficient methods for these. An interesting point you make is dailymean and realtime as different examples for storing. Actually, I think we should not make these different custom datatypes, as this kind of logic is largely captured by Pandas itself. The frequency is stored inside a DatetimeIndex. Moreover, Pandas distinguishes between an instantaneous Timestamp and a Period (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#overview). So using DataFrames basically supports both. However, some methods won't work on all frequencies and all spans. Hence, I think we should provide this kind of 'logic' to the methods (e.g. some decorators that can be reused to specify and check whether the current time series frequency/characteristics comply with the method requirements). What do you think? |
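A rough sketch of what such a reusable decorator might look like. The names `requires_freq` and `annual_max` are hypothetical and not part of hydropy; the check relies only on the frequency that pandas stores on (or can infer from) the index.

```python
import functools

import pandas as pd


def requires_freq(*allowed):
    """Hypothetical decorator: only run the method if the index frequency is allowed."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(data, *args, **kwargs):
            index = data.index
            if not isinstance(index, (pd.DatetimeIndex, pd.PeriodIndex)):
                raise TypeError("data needs a DatetimeIndex or PeriodIndex")
            # Use the stored frequency if present, otherwise try to infer it.
            freq = index.freqstr or pd.infer_freq(index)
            if freq not in allowed:
                raise ValueError(
                    f"{method.__name__} requires frequency in {allowed}, got {freq!r}"
                )
            return method(data, *args, **kwargs)
        return wrapper
    return decorator


@requires_freq("D")
def annual_max(data):
    """Example method that is only meant for daily values."""
    return data.groupby(data.index.year).max()


daily = pd.Series(range(730), index=pd.date_range("2015-01-01", periods=730, freq="D"))
peaks = annual_max(daily)  # one value per calendar year
```

Calling `annual_max` on a Series with, say, a monthly index would raise a `ValueError` instead of silently returning misleading results. The same pattern could be extended to check the span of the record, but that is left out here.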
@mroberge, with respect to the usage of Panels: pandas-dev/pandas#13563 |
Sorry that I've been away... As you say above, how should we organize the dataframe? Each column could either be a different variable or a different station. By having each column be a different station, you could perform a baseflow separation on the dataframe and produce a new dataframe with baseflow, and each column corresponds to a different station from the original dataframe. I like this approach because:
As far as the different Analysis ( |
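To make the column-per-station layout described above concrete, here is a small sketch. The second station ID and all values are made up, and `rolling_min_baseflow` is only a stand-in (a centered rolling minimum), not a real baseflow separation method.

```python
import pandas as pd

index = pd.date_range("2016-01-01", periods=6, freq="D")

# One column per station; the second station ID and all values are made up.
discharge = pd.DataFrame(
    {
        "01585200": [3.0, 5.0, 4.0, 2.0, 6.0, 3.0],
        "01581700": [10.0, 12.0, 9.0, 8.0, 11.0, 7.0],
    },
    index=index,
)


def rolling_min_baseflow(flow, window=3):
    """Placeholder 'baseflow': a centered rolling minimum, NOT a real separation."""
    return flow.rolling(window, center=True, min_periods=1).min()


# The result has the same index and the same station columns as the input.
baseflow = rolling_min_baseflow(discharge)

# Without Panel, both variables can still live in one object by using
# a column MultiIndex of (variable, station).
combined = pd.concat({"discharge": discharge, "baseflow": baseflow}, axis=1)
station_baseflow = combined["baseflow"]["01585200"]
```

Because the operation is applied column-wise, the same function works unchanged for one station or many, and `combined` keeps discharge and baseflow aligned on the shared index.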
I just fixed the TimeSeries issue, but before I close it, there is a lot of discussion going on here that should be preserved somehow... |
Apparently Pandas has deprecated TimeSeries, but all of its functionality is contained in Series.