Tales of Science and Data

WIP: The whole repo is under rework

This book is a collection of notes on Data Science, from Statistics to Machine Learning, passing through all sorts of related areas.

I've decided to give form to a rather disorderly collection of notes I had about data science and all sorts of related areas, which is how this project came about. You can read more in the Meta page about the hows and the whys of this.

Machine learning: fundamental algorithms

This chapter is pretty much a page for each algorithm in "shallow learning", that is, everything that is not "deep". Neural networks, even when shallow, are not presented here, as there is a dedicated chapter on them, which is the same chapter that dives into deep learning. The division here is into the main learning paradigms.

Machine learning: model assessment

This part deals with how to assess the quality of a model and diagnose problems.

Artificial neural networks

Digging into the world of Artificial Neural Networks, a fascinating area of Machine Learning particularly on the rise these days. This deserved its own chapter.

Natural language processing

Natural Language Processing (NLP) is the field (a part of Machine Learning) which deals with text, an unstructured data source. NLP tries to put text into numerical representations and to extract information from it.
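To give a flavour of what "putting text into numerical representations" can mean, here is a minimal sketch of a bag-of-words count, one of the simplest such representations. The function name `bag_of_words` and the whitespace tokenisation are assumptions made for illustration only; this is not code taken from the book's notebooks.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn a list of strings into count vectors over a shared vocabulary."""
    # Naive tokenisation (an assumption): lowercase and split on whitespace.
    # Real pipelines would also handle punctuation, stop words and so on.
    tokenised = [doc.lower().split() for doc in documents]

    # Sort the vocabulary so that the column order is deterministic.
    vocabulary = sorted({token for tokens in tokenised for token in tokens})

    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # One entry per vocabulary word: how often it occurs in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)     # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each document becomes a list of integers of the same length, which is the kind of input most learning algorithms expect.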

Computer vision

Images, seen by the machine. This section deals with using computers to extract and use information from visual data. We will illustrate a whole set of methods, which may or may not encompass the use of Neural Networks.

The Computer science appendix

Some (non-comprehensive) notes on Computer Science fundamentals.

The mathematics appendix

Some (non-comprehensive) notes on mathematics, used everywhere in data work. Useful little bits.

Toolbox

(Some) software tools used in Data Science, high-level overviews.

About the code parts

Several pages contain snippets of code. I've been using Python (3), and each of those pages links to the related Jupyter notebook in the GitHub repo for this book, for your perusal if you want to play around. The overall repo is reachable on GitHub, and you can also view the notebooks prettified via the Jupyter Notebooks viewer.

The libraries used in the notebooks are usually (unless specified) those of the Python data stack (NumPy, SciPy, scikit-learn, pandas, ...). The plots presented here have been customised; the repo contains all styling files.

Notify me of mistakes

Mistakes happen. Inaccuracies and oversights as well, from the content point of view to the rendering/graphics one (e.g., a TeX formula that doesn't render). You are more than welcome, encouraged in fact, to submit issues to the repo for these things.

License

This book is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).

Contents

Meta and resources

Probability, statistics & data analysis

Machine learning: concepts and procedures

Machine learning: fundamental algorithms

Machine learning: model assessment

Artificial neural networks

Natural language processing

Computer vision

The Computer science appendix

The mathematics appendix

Toolbox

About the code parts

Notify me of mistakes

License
