Issue with converting to dataframe #45
Comments
Thanks for filing this! I'm looking into this problem now.
I'm still working on this!
It turns out that the request is 31.2 MB, and that's without zip compression! I added a few lines of code that check for duplicated rows and get rid of them. This request works now, but it takes forever to combine all of the data series into one table. Your error message had 546 data series in it, but it was just getting started when it choked on the bad data! The final dataframe has 2618 columns! Many of these are for temperature readings, which get summarized with a daily max, a daily min, and a third column too. BTW, this is a much smaller request that duplicates your problem:
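For readers who want to see the general idea, here is a minimal pandas sketch of dropping rows with repeated timestamps before joining several series into one table. It is not the code from the bugfix; the function name `combine_series`, the column names, and the sample values are all made up for illustration.

```python
import pandas as pd

def combine_series(series_list):
    """Drop rows with repeated timestamps, then join the series column-wise."""
    cleaned = []
    for s in series_list:
        # Keep the first reading for each timestamp; discard later repeats.
        cleaned.append(s[~s.index.duplicated(keep="first")])
    # Outer join on the index; timestamps missing from a series become NaN.
    return pd.concat(cleaned, axis=1)

# Hypothetical sample data: the first series repeats the 2019-01-01 row.
idx_a = pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-02"])
idx_b = pd.to_datetime(["2019-01-01", "2019-01-02"])
discharge = pd.Series([10.0, 10.0, 12.0], index=idx_a, name="discharge")
temp_max = pd.Series([4.5, 5.0], index=idx_b, name="temp_max")

table = combine_series([discharge, temp_max])
print(table.shape)
```

Removing the repeats first matters because a column-wise join has to align rows by timestamp, and that alignment is ambiguous when a timestamp appears more than once.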
Closed with merged pull request #47.
@taataam You can install the new version directly from GitHub, however. Try using:
I'm about to merge the bugfix into develop now too.
@mroberge Thank you for your quick response and help. I will give it a try. All the other states worked fine. I think it took about half an hour for the data of all the states over a period of one year to be downloaded and saved to an HDF file. My final goal is to get the data for a period of 20 or 30 years.
@taataam So you are trying to download all of the data from all of the states for the past 20 to 30 years? One thing you can do is limit your requests to only the discharge data. You probably don't want the temperature or chemistry data, for example. Also, you might want to reconsider storing all of the data locally. Why not use the internet as your hard drive, and request the data at the moment you need it? For example, if you wanted to calculate a flow duration chart for every station, you could download all of the data for one station, create your chart, and then move on to the next station. If you include all of the EPA chemistry data, there are over a million data collection sites!
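The one-station-at-a-time approach could look roughly like the sketch below. The `fetch` argument is a placeholder for whatever download call you use, the station names and flows are invented, and the exceedance values use the Weibull plotting position (rank divided by n + 1), which is one common choice and not necessarily what hydrofunctions does.

```python
def flow_duration(discharges):
    """Return (exceedance_percent, discharge) pairs, largest flow first."""
    ranked = sorted(discharges, reverse=True)
    n = len(ranked)
    # Weibull plotting position: rank / (n + 1), expressed as a percentage.
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]

def process_stations(station_ids, fetch):
    """Fetch, summarize, and move on, one station at a time."""
    results = {}
    for station in station_ids:
        flows = fetch(station)  # only this station's raw data is in memory
        results[station] = flow_duration(flows)
    return results

# Stand-in for a real download; station names and values are hypothetical.
def fake_fetch(station):
    return {"A": [3.0, 1.0, 2.0], "B": [5.0, 5.0, 8.0, 1.0]}[station]

curves = process_stations(["A", "B"], fake_fetch)
print(curves["A"])
```

Only the small summary for each station is kept, so memory use stays flat no matter how many stations are processed.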
@mroberge I think I read somewhere in your documentation that by default it downloads only the discharge data. In the final data that I got with my code, there were only two columns other than the date: discharge and the qualification. So do I have to explicitly give the data type in the request line? The reason that I download it locally is exactly because of the large amount of computation that I am planning to do with the data. The files act as checkpoints, so if something goes wrong somewhere in the code, whether a bug or a hardware issue (especially on a cluster), I don't have to do everything from the beginning.
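A checkpointing loop of this kind might be sketched as follows. This is only an illustration: it saves JSON instead of HDF to stay dependency-free, and `download_with_checkpoints`, `fake_fetch`, and the sample values are hypothetical names and data, not part of hydrofunctions.

```python
import json
import tempfile
from pathlib import Path

def download_with_checkpoints(states, fetch, checkpoint_dir):
    """Fetch each state once, reusing any result already saved on disk."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for state in states:
        path = checkpoint_dir / f"{state}.json"
        if path.exists():
            # A previous run finished this state; load it instead of refetching.
            results[state] = json.loads(path.read_text())
            continue
        data = fetch(state)
        # Write under a temporary name, then rename, so a crash mid-write
        # never leaves a partial file that looks like a finished checkpoint.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(data))
        tmp.replace(path)
        results[state] = data
    return results

calls = []

# Stand-in for a real download; it records which states were requested.
def fake_fetch(state):
    calls.append(state)
    return {"state": state, "discharge": [1.0, 2.0]}

with tempfile.TemporaryDirectory() as folder:
    first = download_with_checkpoints(["MD", "PA"], fake_fetch, folder)
    second = download_with_checkpoints(["MD", "PA"], fake_fetch, folder)

print(calls)
```

The second call finds both files already on disk, so `fake_fetch` is only invoked during the first pass.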
In the new versions, the software will request every variable that gets measured at a site unless you specify which parameter you want. So, for example, if you only want discharge, then you can do this:
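As a rough illustration of what restricting a request to discharge means at the USGS web-service level (this is not hydrofunctions' own call), the sketch below builds a daily-values query URL with and without a parameter code. The helper name `daily_values_url` and the dates are assumptions for the example; `00060` is the USGS parameter code for discharge.

```python
from urllib.parse import urlencode

def daily_values_url(state, start, end, parameter_cd=None):
    """Build a USGS daily-values query; omit parameter_cd to leave it unfiltered."""
    query = {"format": "json", "stateCd": state, "startDT": start, "endDT": end}
    if parameter_cd is not None:
        # "00060" is the USGS parameter code for discharge.
        query["parameterCd"] = parameter_cd
    return "https://waterservices.usgs.gov/nwis/dv/?" + urlencode(query)

# Illustrative request: one year of Pennsylvania data, discharge only.
url = daily_values_url("PA", "2018-01-01", "2018-12-31", parameter_cd="00060")
print(url)
```

The code only constructs the URL and sends nothing over the network, so it is safe to run while experimenting with different filters.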
I'm sorry that the User's Guide is in such a woeful state! The docstrings do a much better job of explaining the parameters, and I've kept them up to date better. You can access them in IPython by typing
I haven't been updating the User's Guide much lately because the code has been going through some major changes. Now that I've merged everything into my
Please feel free to contact me by email too.
-Marty
Thanks for the tip. Then, I will check the help for now. The library is very useful, thanks for the time and effort.
My pleasure!
Please let me know if there are any features that you think should be included. And of course, I would love to have you contribute some code or a test or a change to the documentation! It looks good for a project to have multiple contributors, and it helps me feel like the software is useful to someone!
-Marty
________________________________
Martin Roberge · Professor
Geography and Environmental Planning<http://www.towson.edu/cla/departments/geography/>
Towson University<http://www.towson.edu/> · 8000 York Road · Towson, Maryland, 21252-0001
p. 410-704-5011
Thank you. Sure, I would be happy to contribute as much as I can.
Description
I tried to get the streamflow data for PA but when I tried to make a dataframe I got the following error:
It works fine for the other states, though; only PA fails.
What I Did