# fecon235: Introductory documentation

Here we discuss the usage of the *fecon235* repository 
for the casual user (while the development of the API 
is still in progress). 

**As of v4, all *modules* will work with Python 2.7 and 3 series.** 
However, some of the pre-2016 *notebooks* may still have Python 2 
idioms and Linux dependencies. Those are easy to fix as we update. 
Our goal is cross-platform performance (Linux, Mac, and Windows) 
as well as compliance with both Python kernels available to 
Jupyter notebooks (forked from IPython).

To see examples of code, please pick out a subject of interest under 
the **nb** directory, and view that notebook at GitHub. 
Better yet, fork this project, and execute the notebook locally as 
you interactively experiment. 
For developers, the main modules are located under the **lib** directory.

## Importing the project

For Python 3 conformity, we have adopted absolute_import 
throughout this project. 
So first be sure that your PYTHONPATH can lead up to 
the fecon235 directory. Then the following import 
permits *easy command access*. The top-level module is 
customarily given the same name as the project. In our case, 
it conveniently unifies and exposes our essential lib modules 
(older notebooks imported yi-prefixed modules individually).

In [1]:
#  Call the MAIN module: 
from fecon235.fecon235 import *
#  This loose import style is acceptable only within 
#  interactive environments outside of any fecon235 packages.
#  (Presence of __init__.py in a directory indicates 
#  it is a "package.") 
#
#  These directories: nb and tests, are explicitly NOT packages.

To use fecon235 in other projects, here are some proper examples:

    from fecon235 import fecon235 as fe
    from fecon235.lib import yi_secform
    
If we had used the first example in our notebooks, 
a function would require extra typing, 
e.g. *fe.get()* instead of plain *get()*. 
Any lib module can be imported directly 
if specialized procedures are required. 
The second example is used to parse SEC forms. 
An inventory of available procedures is 
provided below as Appendix 1.
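
The imports above assume that Python can find the directory which *contains* the fecon235 directory. As a minimal sketch (the clone location and the helper name `add_parent_to_path` are hypothetical, not part of the project), the search path can also be extended at runtime instead of editing PYTHONPATH:

```python
import os
import sys


def add_parent_to_path(repo_dir):
    """Put the directory CONTAINING repo_dir on sys.path, if not already there.

    repo_dir is a hypothetical location of a local fecon235 clone.
    Returns the parent directory, which then serves as the import root.
    """
    parent = os.path.dirname(os.path.abspath(repo_dir))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


#  Hypothetical clone location; adjust to your own filesystem.
clone = os.path.join(os.path.expanduser("~"), "ipy", "fecon235")
parent = add_parent_to_path(clone)
print(" ::  Import root:", parent)
```

Setting PYTHONPATH in the shell remains the more permanent choice; the runtime approach only lasts for the current session.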

### Every notebook states its dependencies and changes:

*Dependencies:*

- fecon235 repository https://github.com/rsvp/fecon235
- Python: matplotlib, numpy, pandas
     
*CHANGE LOG*

    2015-12-30  Add inventory of lib procedures as Appendix 1.
    2015-12-28  First version of README notebook in docs.

In [2]:
#  PREAMBLE-p6.15.1223 :: Settings and system details
from __future__ import absolute_import, print_function
system.specs()
pwd = system.getpwd()   # present working directory as variable.
print(" ::  $pwd:", pwd)
#  If a module is modified, automatically reload it:
%load_ext autoreload
%autoreload 2
#       Use 0 to disable this feature.

#  Notebook DISPLAY options:
#      Represent pandas DataFrames as text; not HTML representation:
import pandas as pd
pd.set_option( 'display.notebook_repr_html', False )
#  Beware, for MATH display, use %%latex, NOT the following:
#                   from IPython.display import Math
#                   from IPython.display import Latex
from IPython.display import HTML # useful for snippets
#  e.g. HTML('<iframe src=http://en.mobile.wikipedia.org/?useformat=mobile width=700 height=350></iframe>')
from IPython.display import Image 
#  e.g. Image(filename='holt-winters-equations.png', embed=True) # url= also works
from IPython.display import YouTubeVideo
#  e.g. YouTubeVideo('1j_HxD4iLn8', start='43', width=600, height=400)
from IPython.core import page
get_ipython().set_hook('show_in_pager', page.as_hook(page.display_page), 0)
#  Or equivalently in config file: "InteractiveShell.display_page = True", 
#  which will display results in secondary notebook pager frame in a cell.

#  Generate PLOTS inside notebook, "inline" generates static png:
%matplotlib inline   
#          "notebook" argument allows interactive zoom and resize.

 ::  Python 2.7.10
 ::  IPython 4.0.0
 ::  jupyter 1.0.0
 ::  notebook 4.0.6
 ::  matplotlib 1.4.3
 ::  numpy 1.10.1
 ::  pandas 0.17.1
 ::  pandas_datareader 0.2.0
 ::  Repository: fecon235 v3.15.1216 develop
 ::  Timestamp: 2015-12-30, 19:22:33 UTC
 ::  $pwd: /media/yaya/virt15h/virt/dbx/Dropbox/ipy/fecon235/docs


## Preamble for settings

The preamble contains the latest shortcuts for notebook commands, 
but more importantly, it lists the specific dependencies 
which make research **reproducible**. The "Repository:" line 
should indicate the annotated tag associated with the last good state 
of repository at the time of execution. The branch is then stated, 
which completes the analogue of *requirements.txt* for notebooks.

The "Timestamp:" will indicate the staleness of the data. 
Notebooks have executed properly at the indicated time, 
and when committed to the repository. 
If notebooks are re-executed, the most current data 
will be intentionally downloaded. 
Thus many observations in notebooks include their date. 
Changes upstream, in the meantime, can possibly generate  
errors in re-executed fecon235 notebooks 
(esp. deprecated pandas functions).

Notebooks implicitly function as integration tests 
of the underlying code, and thus reveal technical failures. 
Another notebook will cover unit tests in the *tests* directory 
for developers.
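
To illustrate how a preamble can record dependency versions, here is a minimal sketch in the spirit of the version helpers listed in Appendix 1. It is not the project's actual implementation; the names `version_of` and `version_tuple` are hypothetical, and the sketch assumes that a module exposes a `__version__` attribute.

```python
import importlib


def version_of(module_name):
    """Return a module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_tuple(version_string):
    """Parse the leading 'major.minor.patch' of a version string into integers."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


#  Report the dependencies named in this notebook; None means not installed.
for name in ("matplotlib", "numpy", "pandas"):
    print(" :: ", name, version_of(name))
```

Comparing tuples, e.g. `version_tuple("0.17.1") >= (0, 17, 0)`, avoids the pitfalls of comparing version strings alphabetically.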

## Internal queries and documentation

Notebooks have a wonderful feature: **?** and **??** 
which give further information on variables, functions, 
classes, etc., including exactly where to look for the source.

The second question mark gives more verbose answers. 
All our code has detailed docstrings and comments, 
so we strive to be self-documenting.
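
Outside of a notebook, the standard library module *inspect* offers a rough equivalent of these queries. The following is only a sketch (the helper `describe` is hypothetical, and json.dumps is merely a convenient stand-in for any function of interest):

```python
import inspect
import json


def describe(obj):
    """Collect roughly what `obj?` displays: type, docstring, and source file."""
    try:
        source_file = inspect.getsourcefile(obj)
    except TypeError:
        #  Built-in objects have no Python source file.
        source_file = None
    return {
        "type": type(obj).__name__,
        "docstring": inspect.getdoc(obj),
        "file": source_file,
    }


info = describe(json.dumps)
print("Type:     ", info["type"])
print("File:     ", info["file"])
print("Docstring:", (info["docstring"] or "").split("\n")[0])
#  For the full source, as with ??, one could call inspect.getsource(json.dumps)
```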

In [3]:
#  What the heck is "system" mentioned in the preamble?
system?

Type:        module
String form: <module 'fecon235.lib.yi_0sys' from '/home/yaya/Dropbox/ipy/fecon235/lib/yi_0sys.pyc'>
File:        ~/Dropbox/ipy/fecon235/lib/yi_0sys.py
Docstring:
_______________|  yi_0sys.py : system and date functions including specs.

Code in this module must be compatible with both Python 2 and 3.
It is a bridge and a guardian between the two Pythons.

For example, it is used in the preamble of fecon235 Jupyter notebooks.


REFERENCES:
- Compatible IDIOMS: http://python-future.org/compatible_idioms.html
                     Nice presentation.

- SIX module is exhaustive: https://pythonhosted.org/six/
        Single file source: https://bitbucket.org/gutworth/six


CHANGE LOG  For latest version, see https://github.com/rsvp/fecon235
2015-12-29  For errf in gitinfo(), our dev_null instead of os.devnull
               Add minimumPandas variable esp. for tests.
2015-12-27  Get jupyter version among specs().
               F

## Getting data

Our project currently has free access to data on equities, 
government bonds, commodities, and futures, as well as 
a full range of economic statistics. The key is finding 
the string which will retrieve the desired time-series. 
(A detailed *docs* notebook dedicated to data retrieval is forthcoming.)

### Sample: Unemployment rate

Let's go through an example. The function **get** is 
designed as an overlord over specialized get functions. 

In [4]:
#  Assign a name to a dataframe
#  that will contain monthly unemployment rates.

unem = get( m4unemp )
#           m4 implies monthly frequency.

In [5]:
#  But what does m4unemp really represent?
m4unemp?

Type:        str
String form: UNRATE
Length:      6
Docstring:
str(object='') -> string

Return a nice string representation of the object.
If the argument is a string, the return value is the same object.

#### Variables for data

So we see that m4unemp is our variable holding a string "UNRATE". 
That string is the internal code used by FRED, the database 
at the Federal Reserve Bank in St. Louis. Our variables are 
generally easier to remember, and mention the frequency. 

If there is no special variable, one can 
always get("string") to directly retrieve data.

Sometimes a variable for a data set may trigger a 
subroutine which post-processes the original data 
(e.g. see our inflation measures), or brings offline 
data into memory (for example, our compressed CSV files may 
contain synthetic data, e.g. the euro exchange rate 
years prior to its official circulation).
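
The idea that some variables are plain codes while others trigger a subroutine can be sketched as a small dispatcher. Everything here is hypothetical and for illustration only: `get_sketch`, `fake_fetch`, `average_inflation`, the code `m4infl_demo`, and the canned numbers are not the project's implementation or real data.

```python
def get_sketch(code, special_cases, fetch):
    """Run the subroutine registered for code, if any;
    otherwise pass the string straight to the generic fetch function."""
    if code in special_cases:
        return special_cases[code](fetch)
    return fetch(code)


def fake_fetch(code):
    """Stand-in for a real download: canned numbers, no network access."""
    canned = {"UNRATE": [5.3, 5.2, 5.2], "CPI_A": [2.0, 3.0], "CPI_B": [1.0, 2.0]}
    return canned[code]


def average_inflation(fetch):
    """Hypothetical post-processing: average two made-up inflation measures."""
    a, b = fetch("CPI_A"), fetch("CPI_B")
    return [(x + y) / 2.0 for x, y in zip(a, b)]


m4unemp = "UNRATE"              # plain variable: just a string code
m4infl_demo = "m4infl_demo"     # hypothetical code which triggers a subroutine
special = {m4infl_demo: average_inflation}

print(get_sketch(m4unemp, special, fake_fetch))
print(get_sketch(m4infl_demo, special, fake_fetch))
```

The caller uses the same call in both cases, which is the convenience that a single entry point is meant to provide.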

In [6]:
#  Illustrate slicing: 1997 <= unem <= 2007:
unem07 = unem['1997':'2007']
#  Verify below by Head and Tail.
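
The slicing above relies on pandas partial date strings. Here is a self-contained sketch on synthetic monthly data (not actual unemployment rates), using *.loc* as the explicit form of the same row selection:

```python
import pandas as pd

#  Synthetic monthly series (NOT real data), dated at month start.
index = pd.date_range("1996-01-01", periods=48, freq="MS", name="T")
demo = pd.DataFrame({"Y": [float(i) for i in range(48)]}, index=index)

#  Partial date strings select whole years, and BOTH endpoints are inclusive.
demo9798 = demo.loc["1997":"1998"]

print(demo9798.index[0].date(), demo9798.index[-1].date(), len(demo9798))
```

Unlike ordinary Python slices, the end label "1998" is included through its last observation, so two full years of monthly data are returned.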

In [7]:
#  Quick summary:
stat( unem07 )

                Y
count  132.000000
mean     4.908333
std      0.646073
min      3.800000
25%      4.400000
50%      4.700000
75%      5.500000
max      6.300000


In [8]:
#  More verbose statistical summary:
stats( unem07 )

                Y
count  132.000000
mean     4.908333
std      0.646073
min      3.800000
25%      4.400000
50%      4.700000
75%      5.500000
max      6.300000

 ::  Index on min:
Y   2000-04-01
dtype: datetime64[ns]

 ::  Index on max:
Y   2003-06-01
dtype: datetime64[ns]

 ::  Head:
              Y
T              
1997-01-01  5.3
1997-02-01  5.2
1997-03-01  5.2
1997-04-01  5.1
1997-05-01  4.9
1997-06-01  5.0
1997-07-01  4.9

 ::  Tail:
              Y
T              
2007-06-01  4.6
2007-07-01  4.7
2007-08-01  4.6
2007-09-01  4.7
2007-10-01  4.7
2007-11-01  4.7
2007-12-01  5.0

 ::  Correlation matrix:
   Y
Y  1


The correlation matrix has only one entry above. 
This is because *stats()* is designed to take 
a dataframe with multiple columns as argument.
Let's see how the function is written 
and where we can find it in the filesystem.

Indeed *stats()* calls our *cormatrix()* to compute 
the correlation matrix. And one can go on 
further to query that function... eventually 
that query could reach a core numerical package 
such as numpy.
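
To see what the verbose summary offers for several columns, here is a sketch using the pandas methods directly on a made-up dataframe. The column values are invented so that the correlations are easy to predict.

```python
import pandas as pd

#  Made-up columns: X is an exact linear function of Y; Z moves opposite to Y.
demo = pd.DataFrame({
    "Y": [1.0, 2.0, 3.0, 4.0],
    "X": [3.0, 5.0, 7.0, 9.0],
    "Z": [4.0, 3.0, 2.0, 1.0],
})

print(demo.describe())          # summary table: count, mean, std, quantiles
print(demo.idxmin())            # index label of each column's minimum
print(demo.idxmax())            # index label of each column's maximum
corr = demo.corr(method="pearson")
print(corr)                     # pairwise correlations, one row/column per series
```

With three columns the correlation matrix is 3-by-3, which is where it becomes informative.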

In [9]:
stats??

Signature: stats(dataframe)
Source:
def stats( dataframe ):
     '''VERBOSE statistics on given dataframe; CORRELATIONS without regression.'''
     print(dataframe.describe())
     print()
     print(" ::  Index on min:")
     print(dataframe.idxmin())
     print()
     print(" ::  Index on max:")
     print(dataframe.idxmax())
     print()
     print(" ::  Head:")
     print(head( dataframe ))
     print()
     print(" ::  Tail:")
     print(tail( dataframe ))
     print()
     print(" ::  Correlation matrix:")
     print(cormatrix( dataframe ))
     return
File:      ~/Dropbox/ipy/fecon235/lib/yi_1tools.py
Type:      function

In [10]:
# #  Uncomment to see how numpy computes something as simple as absolute value:
# np.abs??

## Computing from the data

The analysis of data is at the heart of this project. 
Specific computational tools will be covered in 
other notebooks under the *docs* directory.

To follow up on the unemployment example, see https://git.io/fed 
which scores the Federal Reserve on their dual mandate. 
Visualization is provided by our plot tools, 
which as a by-product discredits the Phillips curve 
as adequate causal theory.

## Questions or bugs

- Chat with fellow users at Gitter: https://gitter.im/rsvp/fecon235

- Report an issue at https://github.com/rsvp/fecon235/issues

- Summarize your usage solution at our wiki: https://github.com/rsvp/fecon235/wiki

- Blame the lead developer: *Adriano* [rsvp.github.com](https://rsvp.github.com)

## Appendix 1: Procedures defined in lib modules

As of 2015-12-30, many of these procedures and functions 
are unified by the top level module **fecon235.py** 
which also simplifies their usage, for example, 
get() and plot():

#### yi_0sys.py

     getpwd():
         Get present working directory (Linux command is pwd).
     program():
         Get name of present script; works cross-platform.
     warn( message, stub="WARNING:", prefix=" !. "):
         Write warning solely to standard error.
     die( message, errcode=1, prefix=" !! "):
         Gracefully KILL script, optionally specifying error code.
     date( hour=True, utc=True, localstr=' Local' ):
         Get date, and optionally time, as ISO string representation.
     pythontup():
         Represent invoked Python version as an integer 3-tuple.
     versionstr( module="IPython" ):
         Represent version as a string, or None if not installed.
     versiontup( module="IPython" ):
         Parse version string into some integer 3-tuple.
     version( module="IPython" ):
         Pretty print Python or module version info.
     utf( immigrant, xnl=True ):
         Convert to utf-8, and possibly delete new line character.
     run( command, xnl=True, errf=None ):
         RUN **quote and space insensitive** SYSTEM-LEVEL command.
     gitinfo():
         From git, get repo name, current branch and annotated tag.
     specs():
         Show ecosystem specifications, including execution timestamp.
     ROSETTA STONE FUNCTIONS approximately bridging Python 2 and 3.
     endmodule():
         Procedure after __main__ conditional in modules.

#### yi_1tools.py

     nona( df ):
          Eliminate any row in a dataframe containing NA, NaN nulls.
     head( dfx, n=7 ):
          Quick look at the INITIAL data point(s).
     tail( dfx, n=7 ):
          Quick look at the LATEST data point(s).
     tailvalue( df, pos=0, row=1 ):
          Seek (last) row of dataframe, then the element at position pos.
     div( numerator, denominator, floor=False ):
          Division via numpy for pandas, Python 2 and 3 compatibility.
     dif( dfx, freq=1 ):
          Lagged difference for pandas series.
     pcent( dfx, freq=1 ):
          PERCENTAGE CHANGE method for pandas.
     georet( dfx, yearly=256 ):
          Compute geometric mean return in a summary list.
     zeroprice( rate, duration=9, yearly=2, face=100 ):
          Compute price of zero-coupon bond given its duration.
     ema( y, alpha=0.20 ):
          EXPONENTIAL MOVING AVERAGE using traditional weight arg.
     normalize( dfy ):
          Center around mean zero and standardize deviation.
     correlate( dfy, dfx, type='pearson' ):
          CORRELATION FUNCTION between series using pandas method.
     cormatrix( dataframe, type='pearson' ):
          PAIRWISE CORRELATIONS within a dataframe using pandas method.
     regressformula( df, formula ):
          Helper function for statsmodel linear regression using formula.
     regressTIME( dfy, col='Y' ):
          Regression on time since such index cannot be an independent variable.
     regresstime( dfy, col='Y' ):
          Regression on time since such index cannot be an independent variable.
     regresstimeforecast( dfy, h=24, col='Y' ):
          Forecast h-periods ahead based on linear regression on time.
     detrend( dfy, col='Y' ):
          Detrend using linear regression on time.
     detrendpc( dfy, col='Y' ):
          Detrend using linear regression on time; percent deviation.
     detrendnorm( dfy, col='Y' ):
          Detrend using linear regression on time, then normalize.
     regress( dfy, dfx ):
         Perform LINEAR REGRESSION, a.k.a. Ordinary Least Squares.
     stat2( dfy, dfx ):
          Quick STATISTICAL SUMMARY and regression on two variables
     stat( dataframe, pctiles=[0.25, 0.50, 0.75] ):
          QUICK summary statistics on given dataframe.
     stats( dataframe ):
          VERBOSE statistics on given dataframe; CORRELATIONS without regression.
     todf( data, col='Y' ):
          CONVERT (list, Series, or DataFrame) TO DataFrame, NAMING single column.
     paste( df_list ):
          Merge dataframes (not Series) across their common index values.
     writefile( dataframe, filename='tmp-yi_1tools.csv', separator=',' ):
         Write dataframe to disk file using UTF-8 encoding.
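
As an illustration of one entry above, an exponential moving average can be sketched in plain Python. This is not the library's code: `ema_sketch` is a hypothetical name, and it assumes that the "traditional weight" alpha applies to the newest observation and that the average is seeded with the first observation.

```python
def ema_sketch(values, alpha=0.20):
    """Exponential moving average: s[t] = alpha*y[t] + (1-alpha)*s[t-1].

    Seeding s[0] with the first observation is an assumption of this sketch.
    """
    smoothed = []
    for y in values:
        if not smoothed:
            smoothed.append(float(y))
        else:
            smoothed.append(alpha * y + (1.0 - alpha) * smoothed[-1])
    return smoothed


print(ema_sketch([10.0, 20.0, 20.0], alpha=0.5))
```

A larger alpha tracks recent data more closely, while a smaller alpha yields a smoother series.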

#### yi_fred.py

     readfile( filename, separator=',', compress=None ):
         Read file (CSV default) as pandas dataframe.
     makeURL( fredcode ):
         Create http address to access FRED's CSV files.
     getdata_fred( fredcode ):
         Download CSV file from FRED and read it as pandas DATAFRAME.
     plotdf( dataframe, title='tmp' ):
         Plot dataframe where its index are dates.
     daily( dataframe ):
          Resample data to daily using only business days.
     monthly( dataframe ):
          Resample data to FRED's month start frequency.
     quarterly( dataframe ):
          Resample data to FRED's quarterly start frequency.
     getm4eurusd( fredcode=d4eurusd ):
          Make monthly EURUSD, and try to prepend 1971-2002 archive.
     getspx( fredcode=d4spx ):
          Make daily S&P 500 series, and try to prepend 1957-archive.
     gethomepx( fredcode=m4homepx ):
          Make Case-Shiller 20-city, and try to prepend 1987-2000 10-city.
     getinflations( inflations=ml_infl ):
          Normalize and average all inflation measures.
     getdeflator( inflation=m4infl ):
          Construct a de-inflation dataframe suitable as multiplier.
     getm4infleu( ):
          Normalize and average Eurozone Consumer Prices.
     getfred( fredcode ):
          Retrieve from FRED in dataframe format, INCL. SPECIAL CASES.
     plotfred( data, title='tmp', maxi=87654321 ):
          Plot data, whether it is given as dataframe or fredcode.
     holtfred( data, h=24, alpha=ts.hw_alpha, beta=ts.hw_beta ):
          Holt-Winters forecast h-periods ahead (fredcode aware).

#### yi_plot.py

     plotn( dataframe, title='tmp' ):
         Plot dataframe where the index is numbered (not dates).
     boxplot( data, title='tmp', labels=[] ):
          Make boxplot from data which could be a dataframe.
     scatter( dataframe, title='tmp', col=[0, 1] ):
         Scatter plot for dataframe by zero-based column positions.
     scats( dataframe, title='tmp' ):
         All pair-wise scatter plots for dataframe.
     scat( dfx, dfy, title='tmp', col=[0, 1] ):
         Scatter plot between two pasted dataframes.

#### yi_quandl.py

     setQuandlToken( API_key ):
          Generate authtoken.p in the local directory for API access.
     cotr_get( futures='GC', type='FO' ):
          Get CFTC Commitment of Traders Report COTR.
     cotr_position( futures='GC' ):
          Extract market position from CFTC Commitment of Traders Report.
     cotr_position_usd():
          Market position for USD from COTR of JY and EC.
     cotr_position_metals():
          Market position for precious metals from COTR of GC and SI.
     cotr_position_bonds():
          Market position for bonds from COTR of TY and ED.
     cotr_position_equities():
          Market position for equities from COTR of both SP and ES.
     fut_decode( slang ):
         Validate and translate slang string into vendor futures code.
     getfut( slang, maxi=512, col='Settle' ):
          slang string retrieves single column for one futures contract.
     getqdl( quandlcode, maxi=87654321 ):
          Retrieve from Quandl in dataframe format, INCL. SPECIAL CASES.
     plotqdl( data, title='tmp', maxi=87654321 ):
          Plot data, whether it is given as dataframe or quandlcode.
     holtqdl( data, h=24, alpha=ts.hw_alpha, beta=ts.hw_beta ):
          Holt-Winters forecast h-periods ahead (quandlcode aware).

#### yi_secform.py

     parse13f( url=druck150814 ):
          Parse SEC form 13F into a pandas dataframe.
     pcent13f( url=druck150814, top=7654321 ):
          Prune, then sort SEC 13F by percentage allocation, showing top N.

#### yi_simulation.py

     GET_simu_spx_pcent():
          Retrieve normalized SPX daily percent change 1957-2014.
     SHAPE_simu_spx_pcent( mean=MEAN_PC_SPX, std=STD_PC_SPX ):
          Generate SPX percent change (defaults are ACTUAL annualized numbers).
     SHAPE_simu_spx_returns( mean=MEAN_PC_SPX, std=STD_PC_SPX ):
          Convert percent form to return form.
     array_spx_returns( mean=MEAN_PC_SPX, std=STD_PC_SPX ):
          Array of SPX in return form.
     bootstrap( N, yarray ):
          Randomly pick out N without replacement from yarray.
     simu_prices( N, yarray ):
          Convert bootstrap returns to price time-series into pandas DATAFRAME.
     simu_plots_spx( charts=1, N=N_PC_SPX, mean=MEAN_PC_SPX, std=STD_PC_SPX ):
          Display simulated SPX price charts of N days, given mean and std.
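
The simulation steps listed above (draw returns, then compound them into prices) can be sketched in plain Python. The function names and the return pool are hypothetical, and "return form" is assumed here to mean a gross return such as 1.01 for a one percent gain. Following the docstring, the draw is without replacement; a conventional bootstrap would instead draw with replacement, e.g. via random.choices.

```python
import random


def bootstrap_sketch(n, returns, seed=None):
    """Draw n gross returns from `returns` WITHOUT replacement (n <= len)."""
    rng = random.Random(seed)
    return rng.sample(list(returns), n)


def prices_from_returns(returns, start=1.0):
    """Compound gross returns (e.g. 1.01 for +1%) into a price path."""
    prices = [start]
    for r in returns:
        prices.append(prices[-1] * r)
    return prices


#  Made-up daily gross returns, NOT actual SPX data.
pool = [1.01, 0.99, 1.02, 0.98, 1.00, 1.03]
path = prices_from_returns(bootstrap_sketch(4, pool, seed=235))
print(path)
```

The price path has one more element than the number of returns because it includes the starting price.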

#### yi_stocks.py

     stock_decode( slang ):
         Validate and translate slang string into vendor stock code.
     stock_all( slang, maxi=3650 ):
          slang string retrieves ALL columns for single stock.
     stock_one( slang, maxi=3650, col='Close' ):
          slang string retrieves SINGLE column for said stock.
     getstock( slang, maxi=3650 ):
          Retrieve stock data from Yahoo Finance or Google Finance.

#### yi_timeseries.py

     holt_winters_growth( y, alpha=hw_alpha, beta=hw_beta ):
          Helper for Holt-Winters growth (linear) model using numpy arrays.
     holt( data, alpha=hw_alpha, beta=hw_beta ):
          Holt-Winters growth (linear) model outputs workout dataframe.
     holtlevel( data, alpha=hw_alpha, beta=hw_beta ):
          Just smoothed Level dataframe from Holt-Winters growth model.
     holtgrow( data, alpha=hw_alpha, beta=hw_beta ):
          Just the Growth dataframe from Holt-Winters growth model.
     holtpc( data, yearly=256, alpha=hw_alpha, beta=hw_beta ):
          Annualized percentage growth dataframe from H-W growth model.
     holtforecast( holtdf, h=12 ):
          Given a dataframe from holt, forecast ahead h periods.
     plotholt( holtdf, h=12 ):
          Given a dataframe from holt, plot forecasts h periods ahead.
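
For orientation, the textbook form of the Holt linear (level plus growth) recursion can be sketched in plain Python. This is not the module's numpy implementation: the names are hypothetical, and the initial conditions (level equal to the first observation, growth equal to zero) are assumptions of the sketch.

```python
def holt_sketch(values, alpha, beta):
    """Holt's linear smoothing; returns the final (level, growth).

    Initial level = first observation and initial growth = 0 are assumptions.
    """
    level, growth = float(values[0]), 0.0
    for y in values[1:]:
        previous = level
        #  New level blends the observation with the prior one-step forecast.
        level = alpha * y + (1.0 - alpha) * (previous + growth)
        #  New growth blends the latest change in level with the prior growth.
        growth = beta * (level - previous) + (1.0 - beta) * growth
    return level, growth


def forecast_sketch(level, growth, h=12):
    """Point forecasts 1..h periods ahead: level + k*growth."""
    return [level + k * growth for k in range(1, h + 1)]


#  With alpha = beta = 1 the model simply extrapolates the last change.
level, growth = holt_sketch([1.0, 2.0, 3.0], alpha=1.0, beta=1.0)
print(forecast_sketch(level, growth, h=2))
```

Smaller values of alpha and beta make the level and growth estimates respond more slowly to new observations.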
