Big Data Architecture
fcrimins edited this page Apr 21, 2017
- Why not just use TensorFlow for everything?
  - Dask creates its own execution graphs, but why is this necessary when TF already has them?
  - In particular, TF even supports reading input directly from files. If that is the case, why not just write the data to files and start the TF graph there?
- `.tfrecords` file format: all records for an entire training/validation/test set are intended to be written to a single file. See example here (which also includes good example usage of `argparser` and `tf.app`).
- Dask
  - Out-of-core functional, NumPy-style, and DataFrame collections promoted by @jakevdp, so it must be good.
- Xray + Dask: Out-of-Core, Labeled Arrays in Python
  - Xray seems to have a clunky interface.
  - And doesn't Dask have the same functionality?
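On the `.tfrecords` note above: a TFRecord file is just a sequence of length-prefixed, checksummed byte strings, so nothing stops a whole split from going into one file. The sketch below writes and reads that framing with the standard library only, assuming the layout described in TensorFlow's TFRecord documentation (8-byte little-endian length, masked CRC-32C of the length bytes, the payload, masked CRC-32C of the payload). The helper names (`crc32c`, `masked_crc32c`, `write_records`, `read_records`) are hypothetical. In real code you would use `tf.io.TFRecordWriter` and `tf.data.TFRecordDataset`, and each payload would typically be a serialized `tf.train.Example`.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List


def _make_crc32c_table() -> List[int]:
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_records(stream: BinaryIO, records: Iterable[bytes]) -> None:
    # All records go one after another into the same stream (i.e. a single file)
    for record in records:
        length = struct.pack("<Q", len(record))
        stream.write(length)
        stream.write(struct.pack("<I", masked_crc32c(length)))
        stream.write(record)
        stream.write(struct.pack("<I", masked_crc32c(record)))


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    while True:
        length = stream.read(8)
        if not length:
            return  # clean end of file
        if len(length) != 8:
            raise ValueError("truncated length header")
        (n,) = struct.unpack("<Q", length)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc32c(length):
            raise ValueError("corrupt length header")
        record = stream.read(n)
        (record_crc,) = struct.unpack("<I", stream.read(4))
        if record_crc != masked_crc32c(record):
            raise ValueError("corrupt record")
        yield record


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records(buf, [b"first example", b"second example"])
    buf.seek(0)
    print(list(read_records(buf)))
```

Each record costs 16 bytes of framing, and the per-record checksums let a reader detect corruption without any index, which is what makes streaming one large file practical.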
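On the question of why Dask builds its own execution graphs: a Dask graph is a plain dict mapping keys to values or tasks, where a task is a tuple whose first element is a callable and whose remaining elements may be keys of other entries. The toy evaluator below (hypothetical names `get`, `build_summary_graph`, `load_block`) runs such a dict and uses it for an out-of-core-style blocked reduction: each block is "loaded", reduced to partial statistics, and the partials are combined. It is a sketch of the idea, not Dask's scheduler. `load_block` is an assumed stand-in for reading rows from disk, and unlike Dask's schedulers (e.g. `dask.threaded.get`) this one caches every intermediate result, so it does not actually bound memory.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Stats = Tuple[int, int, int, int]  # (count, total, min, max)


def is_task(x: Any) -> bool:
    # Dask convention: a task is a tuple whose first element is callable
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def get(dsk: Dict[Hashable, Any], key: Hashable) -> Any:
    """Evaluate `key` in a Dask-style graph, computing each key at most once."""
    cache: Dict[Hashable, Any] = {}

    def resolve(arg: Any) -> Any:
        if is_task(arg):
            func, *args = arg
            return func(*[resolve(a) for a in args])
        if isinstance(arg, list):
            return [resolve(a) for a in arg]
        try:
            is_key = arg in dsk
        except TypeError:  # unhashable literals cannot be graph keys
            is_key = False
        return compute(arg) if is_key else arg

    def compute(k: Hashable) -> Any:
        if k not in cache:
            cache[k] = resolve(dsk[k])
        return cache[k]

    return compute(key)


def load_block(start: int, stop: int) -> List[int]:
    # Hypothetical loader: stands in for reading rows [start, stop) from disk
    return list(range(start, stop))


def partial_stats(block: List[int]) -> Stats:
    return len(block), sum(block), min(block), max(block)


def combine(*parts: Stats) -> Stats:
    return (
        sum(p[0] for p in parts),
        sum(p[1] for p in parts),
        min(p[2] for p in parts),
        max(p[3] for p in parts),
    )


def mean_from_stats(stats: Stats) -> float:
    count, total, _, _ = stats
    return total / count


def build_summary_graph(n_rows: int, block_size: int,
                        loader: Callable[[int, int], List[int]]) -> Dict[Hashable, Any]:
    # One load task and one partial-stats task per block, then a single combine
    dsk: Dict[Hashable, Any] = {}
    partials = []
    for i, start in enumerate(range(0, n_rows, block_size)):
        stop = min(start + block_size, n_rows)
        dsk[("load", i)] = (loader, start, stop)
        dsk[("partial", i)] = (partial_stats, ("load", i))
        partials.append(("partial", i))
    dsk["total"] = (combine, *partials)
    dsk["mean"] = (mean_from_stats, "total")
    return dsk


if __name__ == "__main__":
    graph = build_summary_graph(n_rows=100, block_size=7, loader=load_block)
    print(get(graph, "mean"))
```

With real Dask, the same kind of blocked reduction over a NumPy array is roughly `dask.array.from_array(x, chunks=...).mean().compute()`, and the library builds a graph like this one for you.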
Big Data Architecture Patterns (10/3/16)
  - Good YouTube talk tracing the history of data stores from relational databases (SQL) to semi-structured data to document stores (NoSQL), explaining the differences at each step, and describing Hadoop (an architecture paradigm) along the way.
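Hadoop's original processing model, MapReduce, can be sketched in a single process as a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step applied per key. Word count below is the conventional illustration. The function names are hypothetical, and this is only the programming model: in Hadoop the map and reduce tasks run in parallel across a cluster over data blocks stored in HDFS, with the shuffle moving data over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word; in Hadoop each mapper sees one input split
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
```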