Added webscraper for Thrift Savings Plan #157

Merged
merged 1 commit into pydata:master from e2thenegpii:master on Feb 6, 2016

Conversation

@e2thenegpii
Contributor

e2thenegpii commented Jan 5, 2016

I added a webscraper to put TSP fund data into a DataFrame. To import pandas_datareader on my machine I had to add a requirement of requests>=2.3.0.

@davidastephens

Member

davidastephens commented Jan 7, 2016

Thanks!

Looks like the data is somewhat unreliable (see the Travis errors; I re-ran them and got the same result). Maybe add a check to ensure the scraper actually got data, plus a couple of retries. You can then raise RemoteDataError and add a

try:
    # test code here
except RemoteDataError as e:  # pragma: no cover
    raise nose.SkipTest(e)

in the test.

Also, can you add a note in the what's new?
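A minimal sketch of the check-plus-retries idea (the `fetch_with_retries` helper and its signature are hypothetical, not part of the PR; `RemoteDataError` here is a stand-in for pandas_datareader's exception):

```python
import time


class RemoteDataError(IOError):
    """Stand-in for pandas_datareader's RemoteDataError."""


def fetch_with_retries(fetch, retries=3, delay=1.0):
    """Call ``fetch`` up to ``retries`` times and return its result,
    treating empty data or an IOError as a failed attempt."""
    last_error = None
    for _ in range(retries):
        try:
            data = fetch()
            if data is not None and len(data) > 0:
                return data
            last_error = "empty response"
        except IOError as exc:
            last_error = exc
        time.sleep(delay)
    raise RemoteDataError("no data after %d attempts (%s)" % (retries, last_error))
```

A test can then wrap the call in the try/except RemoteDataError pattern suggested above and skip when the remote source is flaky.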

@e2thenegpii

Contributor

e2thenegpii commented Jan 10, 2016

I looked at why the test was failing: the CSV file I was downloading
from TSP.gov was rather irregular in its placement of whitespace and
newlines. pandas 0.14.1 and older didn't handle the sloppy whitespace well,
which caused the error. I added a callback in the DataReader class that is
called when extracting the text from the response object. This should
let subclasses fix up sloppy responses if needed, as is the case for TSP.
The default is to return the response object's content unchanged.

I also added a what's new 2.3 document.
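Roughly, the hook described above could look like this (the class and method names are illustrative, not necessarily the PR's exact code; the response object is assumed to expose `.content`):

```python
class BaseReader(object):
    @classmethod
    def _sanitize_response(cls, response):
        # Default: return the response content untouched.
        return response.content

    def _read_response(self, response):
        # Every reader extracts text through the overridable hook.
        return self._sanitize_response(response)


class TSPReader(BaseReader):
    @classmethod
    def _sanitize_response(cls, response):
        # TSP.gov's CSV has stray whitespace and blank lines; strip
        # each line and drop empty ones before parsing.
        lines = (line.strip() for line in response.content.splitlines())
        return "\n".join(line for line in lines if line)
```

Because the override lives on the subclass, the base reader's parsing path needs no TSP-specific special cases.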


.. ipython:: python

    import pandas_datareader.tsp as tsp
    tspreader = tsp.TSPReader(start='2015-10-1',end='2015-12-31')

@femtotrader

femtotrader Jan 10, 2016

Contributor

A space is missing after the comma, before end:

tspreader = tsp.TSPReader(start='2015-10-1', end='2015-12-31')

see https://www.python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-statements

        out.seek(0)
        return out

    @classmethod
    def _sanitize_response(cls,response):

@femtotrader

femtotrader Jan 10, 2016

Contributor
    def _sanitize_response(cls, response):
tspdata = tsp.TSPReader(start='2015-11-2',end='2015-11-2').read()
assert len(tspdata == 1)
assert round(tspdata['I Fund'][dt.date(2015,11,2)],5) == 25.0058

@femtotrader

femtotrader Jan 10, 2016

Contributor

Fix the space after the comma.

You could probably use pandas.util.testing.assert_almost_equal here.
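pandas.util.testing is where assert_almost_equal lived at the time (newer pandas exposes its testing helpers under pandas.testing instead). As a library-agnostic sketch, the underlying idea is tolerance-based comparison rather than rounding; `assert_close` below is a hypothetical helper, not a pandas API:

```python
import math


def assert_close(actual, expected, tol=1e-5):
    # A tolerance check sidesteps the surprises of round()
    # (e.g. banker's rounding on exact halves in Python 3).
    assert math.isclose(actual, expected, abs_tol=tol), (actual, expected)


assert_close(25.005805, 25.0058)
```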

        return {'startdate':self.start.strftime('%m/%d/%Y'),'enddate':self.end.strftime('%m/%d/%Y'),
                'fundgroup':self.symbols,'whichButton':'CSV'}

    @classmethod

@davidastephens

davidastephens Jan 11, 2016

Member

Why is this a classmethod?

@e2thenegpii

e2thenegpii Jan 11, 2016

Contributor

The _sanitize_response method shouldn't change the state of the reader
object; it provides a service for the object and should only operate on
the content of the response object.
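In other words, the argument is that the method is a pure transformation of the response, so it needs no instance at all. A tiny illustration of that property (hypothetical names, not the PR's code):

```python
class Reader(object):
    @classmethod
    def _sanitize_response(cls, content):
        # No instance attributes are read or written: the method is a
        # pure transformation, so it can be called on the class itself.
        return content.strip()


# Callable without constructing a Reader, and through an instance alike:
assert Reader._sanitize_response("  raw csv  ") == "raw csv"
assert Reader()._sanitize_response("  raw csv  ") == "raw csv"
```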


@davidastephens

Member

davidastephens commented Jan 30, 2016

Thanks - can you squash this and we can merge?

@davidastephens

Member

davidastephens commented Feb 6, 2016

I think this got squashed before the lint PR - can you rebase so we can merge?

Thanks.

@e2thenegpii e2thenegpii force-pushed the e2thenegpii:master branch from 4439ccf to e400816 Feb 6, 2016

@e2thenegpii e2thenegpii force-pushed the e2thenegpii:master branch from e400816 to 315d175 Feb 6, 2016

@e2thenegpii

Contributor

e2thenegpii commented Feb 6, 2016

Everything should be set to go. Thanks for your help.

davidastephens added a commit that referenced this pull request Feb 6, 2016

Merge pull request #157 from e2thenegpii/master
Added webscraper for Thrift Savings Plan

@davidastephens davidastephens merged commit 7cd1a8d into pydata:master Feb 6, 2016

2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage increased (+0.3%) to 95.356%
@davidastephens

Member

davidastephens commented Feb 6, 2016

Thanks!
