Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Adding '.' as an na_value for FRED. #3469

Merged
4 commits merged into from Apr 27, 2013

Conversation

@TomAugspurger
Copy link
Contributor

commented Apr 27, 2013

The St. Louis Fed's FRED uses '.' as an NaN marker. I don't think the user has any way to specify the na_value via DataReader, so this just hard codes '.' as an na_value in the fred data-fetching function.

Before:

In [4]: from pandas.io.data import DataReader

In [5]: df = DataReader('DFII5', data_source="fred")

In [6]: df.head()
Out[6]:
           DFII5
DATE
2010-01-01     .
2010-01-04  0.52
2010-01-05  0.44
2010-01-06  0.44
2010-01-07  0.43

After

In [2]: from pandas.io.data import DataReader

In [3]: df = DataReader('DFII5', data_source="fred")

In [4]: df.head()
Out[4]:
            DFII5
DATE
2010-01-01    NaN
2010-01-04   0.52
2010-01-05   0.44
2010-01-06   0.44
2010-01-07   0.43

In [5]: pd.__version__
Out[5]: '0.11.0rc1'

Hopefully I've done everything correctly with the pull request. Let me know if anything needs fixing.

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

I see you haven't got travis going, is this covered by the test suite? if not, please add a test to
monitor this in the future, after that I have no problem merging.

thanks.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2013

Thanks, I'll get a test going added and I'll try to get travis set up also. I've never used it before.

One question: I could be missing it, but I'm not sure the fred code is covered by any tests. I greped through the pandas/io/tests/ directory and I didn't see any references to fred. Should I start a new file similar to the ones for yahoo and wb? I can add some more tests while I'm in there too.

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

Yeah, that's not a part of the code I'm familiar with, it's possible. In which case
you would be doing pandas an even greater service by throwing together some tests.

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

CONTRIBUTING.MD has a link to the travis instructions, and other stuff if you
have the patience to read through it.

remember to tag your tests @network(+@slow?)

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

related #3413

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2013

OK I think Travis is testing my branch now. Assuming that passes, do you want me to try to stuff both my changes (handling the NaNs and adding the tests) into a single PR? Or should I do them separately?

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

Same PR makes sense, feel free to add as many commits as you like to the PR branch and
push, just make it clear whether it's ready for merging or still WIP.

If your git-fu is good, you can rewrite and force push and all that in PR branches, that's
acceptable, if not, just add commits and push and I or another dev will squash things
down for you before merging, that's not a problem.

Thanks.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2013

I added a check for if the name you gave is a valid FRED series name, and a test that goes along with it. I'm worried that it will break if the St. Louis Fed decides to change their response to an invalid name. But the way I coded it, it should only occur if we already failed to return a dataframe. It's basically just a more informative error message.

I'd like to take a crack at #3413, but I might not have time until later this week. It should be something as simple as changing the fred branch of the main DataReader function to

names = list(name)  # if string is passed for a single series
return pd.concat([get_data_fred(name, start, end) for name in names])

I may let someone else handle the history rewriting. Last time I tried that over on statsmodels, I think I gave Joseph a headache. Travis is testing now. I'll add the final bit and mark it as ready to merge when it's done.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2013

OK This should be ready to go if you think it looks acceptable.

Thanks for the help y-p.

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

One last thing, could you add a note to RELEASE.rst, to let users know you "fixed it ™"?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2013

Will do. I've just noticed that I'm on 0.11.0rc1 for my master, so I've got an out of date RELEASE.rst; CONTRIBUTING.md says "Don't merge upstream into a branch you're going to submit as a PR". Am I ok doing

git checkout master
git rebase upstream/master
git checkout fred_na  # my branch
git rebase master

And then adding the changes to RELEASE.rst and pushing again? Sorry.

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

Exactly right. some people have a hard time grokking rebase. but that's the way
you should it. You can also just leave it in, and let us handle the conflicts if you don't
feel confident.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2013

Well I think I messed something up. After fetching and rebaseing my master onto upstream/master, when I did git checkout fred_na and get rebase master things seemed to work correctly, but now I'm told that

# Your branch and 'origin/fred_na' have diverged,
# and have 121 and 3 different commits each, respectively.
#   (use "git pull" to merge the remote branch into yours)

So I did git pull but now it looks like all of those other commits are going to be part of this PR when I push, which we don't want right?

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

first, don't panic.

Most important git command in existence:

git reflog

Coupled with

git reset --hard commithash
@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

that error message is actually fine. origin has the previous state based on 0.11.0rc1,
your local branch has been rebased. so they have diverged.
your local is correct, and the remote needs to be clobbered.

write down the commit hash that's "safe" to go back to: a80acc4
then clobber the GH branch with your updated local branch (only allowed
on your own PR branches, not generally) with:

git push origin *branchname* --force
Adding some tests to cover the FRED data fetching code.
This a very minimal, but it wasn't previously covered.
I've added a new to contain the tests and more or less copied code
from the test_yahoo file.
@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Apr 27, 2013

I panicked :)

Seems to look ok now I think, once Travis is done?

ghost pushed a commit that referenced this pull request Apr 27, 2013
y-p
Merge pull request #3469 from TomAugspurger/fred_na
ENH: Adding '.' as an na_value for FRED.

@ghost ghost merged commit a254d35 into pandas-dev:master Apr 27, 2013

@ghost

This comment has been minimized.

Copy link

commented Apr 27, 2013

Everything passes locally on my box. in it goes...

The log shows this is your first PR to pandas. good job! 👍

@TomAugspurger TomAugspurger deleted the TomAugspurger:fred_na branch Nov 3, 2016

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant
You can’t perform that action at this time.