Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Fama/French #56

Merged
merged 1 commit into from Aug 29, 2015

Conversation

@0x0L
Copy link
Contributor

0x0L commented Aug 14, 2015

  • Added get_datasets_famafrench (requires bs4) : list all available datasets from FF
  • Complete rewrite of get_data_famafrench. Can grab and parse all (so called) csv files from FF website. Also uses metadata in the files to generate documentation (see DESCR attribute)
  • Moved tests for FF to their own file (#59)

API change : indexes are now pandas.PeriodIndex for annual and monthly data, and pandas.DatetimeIndex otherwise.

Should fix #55 #20 #21 see also pandas-dev/pandas#8842

@0x0L 0x0L referenced this pull request Aug 14, 2015
@0x0L 0x0L force-pushed the 0x0L:famafrench branch 4 times, most recently from a7ca65c to b69c110 Aug 14, 2015
except: pass
try: return dt.datetime.strptime(x, '%Y%m')
except: pass
return to_datetime(x)

This comment has been minimized.

Copy link
@0x0L

0x0L Aug 15, 2015

Author Contributor

This looks terrible

This comment has been minimized.

Copy link
@davidastephens

davidastephens Aug 22, 2015

Member

I can't think of any other way to do that unless we import dateutil.

@0x0L 0x0L force-pushed the 0x0L:famafrench branch 2 times, most recently from 4d0f191 to 665a1c7 Aug 15, 2015
root = BeautifulSoup(socket.read(), 'html.parser')

l = filter(lambda x: x.startswith(_FF_PREFIX) and x.endswith(_FF_SUFFIX),
[e.attrs['href'] for e in root.findAll('a') if 'href' in e.attrs])

This comment has been minimized.

Copy link
@0x0L

0x0L Aug 15, 2015

Author Contributor

any substitute for findAll which does not involve any import ?

This comment has been minimized.

Copy link
@davidastephens

davidastephens Aug 22, 2015

Member

We already have a dependency on lxml, so maybe use that?

This comment has been minimized.

Copy link
@davidastephens

davidastephens Aug 22, 2015

Member

Check this gist out:
https://gist.github.com/yejianye/5895954

Also, we could pull the Options._parse_url method out into a common function, as it does the importing and error checking for lxml.

descr = '{0}\n{1}\n\n'.format(name.replace('_', ' '), len(name) * '-')
if doc_chunks: descr += ' '.join(doc_chunks).replace(2 * ' ', ' ') + '\n\n'

table_descr = map(lambda x: '{0:3} : {1}'.format(*x), enumerate(table_desc))

This comment has been minimized.

Copy link
@0x0L

0x0L Aug 15, 2015

Author Contributor

I suck at string formatting. Any tips on how to do that the right way ?

IMO, for later usability improvement, datasets should also implement __getindex__ with table names instead in addition to integers, the table names should also be accessible in a list and datasets['DESCR'] should read datasets.DESCR.

@davidastephens davidastephens added this to the 0.2.0 milestone Aug 23, 2015
@0x0L

This comment has been minimized.

Copy link
Contributor Author

0x0L commented Aug 23, 2015

@davidastephens

This comment has been minimized.

Copy link
Member

davidastephens commented Aug 23, 2015

Thanks. We will need to rebase/squash after #69 is merged, but other than that, looks good to me.

@davidastephens

This comment has been minimized.

Copy link
Member

davidastephens commented Aug 23, 2015

#69 merged, can you rebase?

@0x0L 0x0L force-pushed the 0x0L:famafrench branch from ba7fbf7 to 7507ca5 Aug 24, 2015
@0x0L

This comment has been minimized.

Copy link
Contributor Author

0x0L commented Aug 24, 2015

Done

I also added a test which fails on python 2.6.
It seems like it's a bug in pandas 0.13.1: docstring claims period inference should work but it doesn't

>>> df[0].to_period()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python/2.6.9/lib/python2.6/site-packages/pandas/core/frame.py", line 4258, in to_period
    freq = self.index.freqstr or self.index.inferred_freq
  File "/opt/python/2.6.9/lib/python2.6/site-packages/pandas/tseries/index.py", line 1393, in freqstr
    return self.offset.freqstr
AttributeError: 'NoneType' object has no attribute 'freqstr'
>>> df[0].to_period('M')
         Mkt-RF   SMB    HML    RF
1926-07    2.96 -2.30  -2.87  0.22
1926-08    2.64 -1.40   4.19  0.25
1926-09    0.36 -1.32   0.01  0.23
1926-10   -3.24  0.04   0.51 ...
...
>>> print df[0].to_period.__doc__

        Convert DataFrame from DatetimeIndex to PeriodIndex with desired
        frequency (inferred from index if not passed)

        Parameters
        ----------
        freq : string, default
        axis : {0, 1}, default 0
            The axis to convert (the index by default)
        copy : boolean, default True
            If False then underlying input data is not copied

        Returns
        -------
        ts : TimeSeries with PeriodIndex
@davidastephens

This comment has been minimized.

Copy link
Member

davidastephens commented Aug 25, 2015

@jreback @jorisvandenbossche Thoughts?

Need to keep supporting pandas 13.1 or move that one to an 'allowed failure'?

@0x0L 0x0L force-pushed the 0x0L:famafrench branch from 7507ca5 to 13e8be1 Aug 25, 2015
* add get_available_datasets (requires lxml)
* complete rewrite of get_data_famafrench
@0x0L 0x0L force-pushed the 0x0L:famafrench branch from aa566cd to 896fc02 Aug 28, 2015
@0x0L

This comment has been minimized.

Copy link
Contributor Author

0x0L commented Aug 28, 2015

@davidastephens i wrote a workaround for 13.1
(tests pass for famafrench, failed on Fred on HTTP 503 error)

@davidastephens

This comment has been minimized.

Copy link
Member

davidastephens commented Aug 29, 2015

Thanks!

davidastephens added a commit that referenced this pull request Aug 29, 2015
ENH: Fama/French re-write.
@davidastephens davidastephens merged commit f7b92a2 into pydata:master Aug 29, 2015
1 of 2 checks passed
1 of 2 checks passed
coverage/coveralls Coverage decreased (-0.1%) to 90.775%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@0x0L 0x0L deleted the 0x0L:famafrench branch Oct 3, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.