# Real or simulated 'real' data set

I already have some simulated data in my implementation.

# Choice of comparison algorithm

### Indian Buffet Process (IBP) vs Chinese Restaurant Process (CRP)
Chinese Restaurant Process (CRP): This is a stochastic process that describes how customers are seated in a Chinese restaurant with infinitely many tables. The first customer sits at an empty table with probability 1. From time 2 onward, each new customer chooses uniformly at random to sit either directly to the left of one of the previous customers or at a new, unoccupied table, so an occupied table is joined with probability proportional to the number of customers already seated at it.
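A minimal sketch of the seating rule, using only the standard library. The function name `crp_seating` and the concentration parameter `alpha` are my own choices for illustration; `alpha = 1` corresponds to the rule described above, where the new table counts as one extra seat.

```python
import random

def crp_seating(n_customers, alpha=1.0, seed=None):
    """Simulate table assignments under a Chinese Restaurant Process.

    Returns a list whose i-th entry is the table index of customer i.
    `alpha` (assumed > 0) is the weight given to opening a new table.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []
    for _ in range(n_customers):
        # Existing tables are weighted by occupancy, a new table by alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)       # open a new table
        else:
            table_sizes[table] += 1     # join an occupied table
        assignments.append(table)
    return assignments

seats = crp_seating(10, alpha=1.0, seed=0)
print(seats)  # one table index per customer
```

Each customer ends up at exactly one table, which is what makes the CRP a clustering prior.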

Similarities
- Both algorithms model latent structure and perform dimensionality reduction.
- Both algorithms allow an unbounded number of latent components (features for the IBP, tables for the CRP).

Differences
- As the figure below shows, the Indian Buffet Process (IBP) allows each customer to be assigned to multiple components (dishes), while the Chinese Restaurant Process (CRP) assigns each customer to a single component (table).
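To make the difference concrete, here is a rough sketch of the standard IBP generative process: customer i takes each previously sampled dish k with probability m_k / i (m_k is the number of earlier customers who took it) and then tries Poisson(alpha / i) new dishes. The names `ibp_matrix` and `sample_poisson` are hypothetical helpers written for this note, not taken from my implementation or from the packages listed below.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def ibp_matrix(n_customers, alpha=2.0, seed=None):
    """Sample a binary customer-by-dish matrix Z from the IBP prior."""
    rng = random.Random(seed)
    dish_counts = []   # dish_counts[k] = customers who have taken dish k
    rows = []
    for i in range(1, n_customers + 1):
        # Existing dishes: take dish k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        # New dishes: Poisson(alpha / i) of them, all taken by customer i.
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    # Pad earlier rows with zeros so every row has one column per dish.
    n_dishes = len(dish_counts)
    return [r + [0] * (n_dishes - len(r)) for r in rows]

Z = ibp_matrix(8, alpha=2.0, seed=1)
for row in Z:
    print(row)
```

A row of Z may contain several ones (a customer with several dishes), whereas a CRP assignment would have exactly one nonzero entry per row.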

### Indian Buffet Process (IBP) code from other people
- https://github.com/davidandrzej/PyIBP: This looks much better organized than mine, but mine produces output images and this one does not.
- http://www.mit.edu/~ilkery/: MATLAB code with output images. Contains a lot of redundant M calculations.
- A Makefile is important for readers. Write Makefiles for these comparison implementations.
- Benchmarking (speed test): naive / Cythonized version / MATLAB

### Deleted
- https://github.com/kzhai/PyNPB: ImportError: No module named ibp.gs $\Rightarrow$ I am not going to use this.

In [10]:
from IPython.display import Image

In [11]:
Image(url='IBP_vs_CRP.png') # From: "A tutorial on Bayesian nonparametric models" by Samuel J. Gershman, David M. Blei

# Draft of Makefile

### Goal for the code
Make it run. Make it correct. Make it fast.

### Original simulated dataset
- plt.figure(tight_layout=True)
- plt.savefig('heatmap.png')

### My code results
- plt.figure(tight_layout=True)
- plt.savefig('heatmap.png')

### Speed test: Cythonized version vs original version
- with open('table.tex', 'w') as f: f.write(tabulate(df, headers=list(df.columns), tablefmt="latex", floatfmt=".4f"))
- Save the total time / computation time for each operation
- LaTeX: \input{table} => table.tex
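A possible shape for the speed test, sketched with the standard library only so it runs without `tabulate` or pandas. `naive_version` and `fast_version` are hypothetical stand-ins for the original and Cythonized samplers, and the two-column table layout is an assumption rather than the final report format.

```python
import timeit

def time_call(func, *args, repeat=5, number=10):
    """Return the best-of-`repeat` average time per call, in milliseconds."""
    timer = timeit.Timer(lambda: func(*args))
    return 1000 * min(timer.repeat(repeat=repeat, number=number)) / number

def write_latex_table(rows, path="table.tex"):
    """Write (name, milliseconds) pairs as a LaTeX tabular for \\input{table}."""
    lines = [r"\begin{tabular}{lr}", r"\hline",
             r"Version & Time per call (ms) \\", r"\hline"]
    for name, ms in rows:
        lines.append(f"{name} & {ms:.4f} \\\\")
    lines += [r"\hline", r"\end{tabular}"]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Hypothetical stand-ins for the two versions being compared.
def naive_version(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_version(n):
    return sum(i * i for i in range(n))

results = [("naive", time_call(naive_version, 1000)),
           ("optimized", time_call(fast_version, 1000))]
write_latex_table(results)
```

Taking the minimum over repeats reduces noise from other processes; the per-operation timings mentioned above could be added as extra rows.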

### To-Do
- calInverse for M
- Real data?!
- Convert all code pieces to .py files
- Benchmarking + write Makefiles for the comparison algorithms!

In [None]:
IBP_report.pdf: IBP_report.tex \
    IBP_MATLABcode/Fig1_latent.png IBP_MATLABcode/Fig2_data.png IBP_MATLABcode/Fig3_histK.png IBP_MATLABcode/Fig4_results.png
	pdflatex IBP_report.tex
	bibtex IBP_report.aux
	pdflatex IBP_report.tex
	pdflatex IBP_report.tex

python functions.py
python naive.py
python usable.py

python Cython_setup.py build_ext --inplace
# Don't run `python Cython_functions.pyx`: Cython_setup.py already builds Cython_functions.pyx
python Cythonized.py

# Convert the .ipynb notebook to other formats (use --to latex or --to html):
jupyter nbconvert --to latex Notebook1.ipynb

### Sample Makefile

In [None]:
report.pdf: report.tex table.tex heatmap.png
	pdflatex report
	pdflatex report
	pdflatex report

table.tex: cases.csv ctrls.csv
	python prepare_results.py

heatmap.png: cases.csv ctrls.csv
	python prepare_results.py

cases.csv: 
	python prepare_data.py

ctrls.csv: 
	python prepare_data.py

.PHONY: all clean allclean test

all: report.pdf 

clean:
	rm -rf *csv *png *aux *log table.tex *pytxcode tests/__pycache__ tests/*pyc

allclean: clean
	rm -f *pdf

test:
	py.test