*Python Data Science Handbook* by Jake VanderPlas (O'Reilly). Copyright 2017 Jake VanderPlas, 978-1-491-91295-8.

# Preface

Data science comprises three distinct and overlapping areas:
1. **Statistics** - used to model and summarize datasets
2. **Computer science** - used to design and use algorithms that efficiently store, process, and visualize data
3. **Domain expertise** - necessary to formulate the right questions and to put their answers in context

Important libraries: 
1. NumPy: manipulation of homogeneous data
2. Pandas: manipulation of heterogeneous data
3. SciPy: common scientific computing tasks
4. Matplotlib: visualizations
5. Scikit-Learn: machine learning

# Chapter 1: IPython: Beyond Normal Python

Getting help: `?` shows documentation, `??` shows source code, and the Tab key autocompletes.

Every Python object contains a reference to a string, known as a docstring, which usually holds a concise summary of the object and how to use it. Python has a built-in `help()` function that prints the docstring. This works even for functions or objects you create yourself: to give a function a docstring, place a string literal as the first statement of the function body.
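In plain Python the docstring is stored on the object's `__doc__` attribute, and that is the text `help()` (and IPython's `?`) displays. A minimal sketch; `cube` and `no_doc` are hypothetical example functions, and `inspect.getdoc` is the standard-library helper that returns a cleaned-up docstring:

```python
import inspect

# The string literal as the first statement of the body becomes the docstring
def cube(x):
    """Return the cube of x."""
    return x ** 3

# No leading string literal, so this function has no docstring
def no_doc(x):
    return x

print(cube.__doc__)         # Return the cube of x.
print(no_doc.__doc__)       # None
print(inspect.getdoc(len))  # the same text help(len) shows
```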

Shortcuts (IPython terminal):
- `Ctrl-a` to move the cursor to the beginning of the line.
- `Ctrl-e` to move the cursor to the end of the line.
- `Ctrl-k` to cut text from the cursor to the end of the line.
- `Ctrl-p` to access the previous command in history.
- `Ctrl-n` to access the next command in history.
    - Note: you can use Ctrl-p/Ctrl-n or the up/down arrow keys to search through history. If you type a few characters first, only commands that begin with those characters are matched.
- `Ctrl-l` to clear the terminal screen.
- `Ctrl-c` to interrupt the current Python command.

```python
# In [1]:
help(len)
# Help on built-in function len in module builtins:
#
# len(obj, /)
#     Return the number of items in a container.
```

```python
# In [5]:
len?
```

```python
# In [8]:
def square(a):
    """Return the square of a."""
    return a**2
```

Because Python is so readable, you can often gain another level of insight by reading the source code of the object you're curious about, and `??` gives a quick look at these under-the-hood details. Sometimes `??` does not display any source code. This is generally because the object in question is implemented not in Python but in C or some other language; in that case `??` gives the same output as `?`.

```python
# In [9]:
square??
```
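Outside IPython, the standard library's `inspect.getsource` offers a comparable view of source code, and it fails for objects implemented in C, which mirrors why `??` sometimes shows no code. A sketch; the helper `has_python_source` is hypothetical and only approximates when `??` can show source, it is not how IPython decides:

```python
import inspect
import json


def has_python_source(obj):
    """Return True if inspect can retrieve Python source code for obj."""
    try:
        inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: C-implemented built-ins such as len have no source file
        # OSError: a source file is expected but could not be found
        return False
    return True


print(has_python_source(json.dumps))  # True: json.dumps is written in Python
print(has_python_source(len))         # False: len is implemented in C
```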

Every Python object has various attributes and methods associated with it. Python has a built-in `dir` function that returns a list of these, but the tab completion interface is much easier to use in practice. To see a list of all available attributes of an object, type the name of the object followed by a `.` and then press the Tab key. If there is only a single option, pressing Tab completes the line for you. Tab completion is also useful when importing objects from packages.
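For comparison, here is the `dir` route in plain Python. `dir` returns an alphabetically sorted list of names; filtering out the underscore-prefixed names gives roughly the list tab completion offers before you type an underscore (an approximation, not IPython's exact completion logic). The list object is just an example:

```python
# Any object works; a small list is used as an example
numbers = [3, 1, 2]

# dir() returns a sorted list of attribute and method names
all_names = dir(numbers)

# Drop private and special ("dunder") names that start with an underscore
public = [name for name in all_names if not name.startswith("_")]
print(public)
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
```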

```python
# In [10]:
# Wildcard matching: list every object in the namespace whose name ends with "Warning"
*Warning?
```

```python
# In [12]:
# List every string method that contains "find" somewhere in its name
str.*find*?
```
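The same shell-style wildcard matching can be reproduced in plain Python with the standard `fnmatch` module applied to `dir()`. A sketch, assuming the `builtins` module as a stand-in for the interactive namespace; the helper `wildcard_names` is hypothetical:

```python
import builtins
import fnmatch


def wildcard_names(obj, pattern):
    """Return the attribute names of obj that match a shell-style wildcard pattern."""
    # fnmatchcase keeps matching case-sensitive on every platform
    return [name for name in dir(obj) if fnmatch.fnmatchcase(name, pattern)]


print(wildcard_names(builtins, "*Warning"))  # built-in names ending in "Warning"
print(wildcard_names(str, "*find*"))         # ['find', 'rfind']
```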

Magic commands are prefixed by `%`. They are designed to succinctly solve various common problems in standard data analysis. There are two kinds of magic commands: line magics, denoted by a single `%`, which operate on a single line of input, and cell magics, denoted by `%%`, which operate on multiple lines of input.

`%paste` (IPython terminal) takes a block of code from the clipboard, dedents it, and executes it, so it runs without indentation errors. This way you can copy code from online sources and paste it with no trouble.

`%cpaste` (also IPython terminal) opens an interactive multiline prompt in which you can paste one or more chunks of code to be executed in a batch.

`%run` executes an external Python script from within IPython or Jupyter: if you have created a file `myscript.py`, run it with `%run myscript.py`. Any functions defined in the `.py` file are then available for use in the session.
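Outside IPython, a rough analogue of `%run` is the standard library's `runpy.run_path`, which executes a script and returns its global namespace. This is a sketch for illustration, not necessarily how IPython implements `%run`; the script contents stand in for a hypothetical `myscript.py` written to a temporary directory:

```python
import os
import runpy
import tempfile

# Contents of a hypothetical myscript.py
SCRIPT = '''
def square(a):
    """Return the square of a."""
    return a ** 2
'''

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "myscript.py")
    with open(path, "w") as f:
        f.write(SCRIPT)
    # Execute the file; run_path returns a copy of the script's global namespace
    namespace = runpy.run_path(path)

# Functions defined in the script are now usable here
square = namespace["square"]
print(square(4))  # 16
```

Unlike `%run`, `run_path` does not inject the names into your current namespace automatically; you pull them out of the returned dictionary.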

`%timeit` measures the execution time of the single-line Python statement that follows it; the cell magic `%%timeit` times a whole multi-line cell.

`%magic` to access a general description of available magic functions 

`%lsmagic` to list all available magic functions

```python
# In [16]:
%timeit L = [n**2 for n in range(100)]
# 31.8 µs ± 858 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```


Note that list comprehensions are typically faster than the equivalent `for` loop construction.
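To check this claim outside IPython, the standard `timeit` module can time both versions. A sketch; the function names are hypothetical, and the absolute timings (and the size of the gap) depend on your machine and Python version, so no results are shown:

```python
import timeit

NUMBER = 1_000  # calls per timing run


def with_loop():
    """Build the list of squares with an explicit for loop."""
    L = []
    for n in range(100):
        L.append(n ** 2)
    return L


def with_comprehension():
    """Build the same list with a list comprehension."""
    return [n ** 2 for n in range(100)]


# Make sure both versions produce the same list before comparing speed
assert with_loop() == with_comprehension()

# repeat() runs each callable NUMBER times, 5 times over; the minimum is
# the least noisy estimate of the achievable speed
loop_time = min(timeit.repeat(with_loop, number=NUMBER, repeat=5))
comp_time = min(timeit.repeat(with_comprehension, number=NUMBER, repeat=5))

print(f"for loop:      {loop_time / NUMBER * 1e6:.2f} microseconds per call")
print(f"comprehension: {comp_time / NUMBER * 1e6:.2f} microseconds per call")
```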

The `In` object is a list that keeps track of the commands in order. The `Out` object is not a list but a dictionary mapping input numbers to their outputs.

Note: not all operations have outputs. For example, import statements and print statements don't affect the output (yes, print statements too!). This makes sense once you recall that `print` is a function that returns `None`; for brevity, any command that returns `None` is not added to `Out`. Keeping past results is useful when you want to interact with them again, for example after running a very expensive computation whose result you want to reuse without recomputing it.
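The relationship between `In` and `Out` can be modelled in a few lines of plain Python. This is a toy model for illustration, not IPython's implementation; `run_cell` is a hypothetical helper that only handles single expressions:

```python
import math

In = []   # source of each cell, in the order it was run
Out = {}  # cell number -> result, only for results that are not None
namespace = {"math": math, "In": In, "Out": Out}


def run_cell(source):
    """Evaluate a one-expression 'cell' and record its input and output."""
    In.append(source)
    number = len(In)  # the first cell is number 1
    result = eval(source, namespace)
    if result is not None:  # print(...) returns None, so nothing is stored
        Out[number] = result
    return result


run_cell("math.sin(2)")      # cell 1
run_cell("math.cos(2)")      # cell 2
run_cell("print('hello')")   # cell 3: prints, but its result is None
run_cell("Out[1] * Out[2]")  # cell 4: reuses earlier results

print(sorted(Out))  # [1, 2, 4]
```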

```python
# In [19]:
import math
math.sin(2)
# Out[19]: 0.9092974268256817
```

```python
# In [20]:
math.cos(2)
# Out[20]: -0.4161468365471424
```

```python
# In [21]:
Out[19]*Out[20]
# Out[21]: -0.37840124765396416
```

Underscore shortcuts and previous outputs:
the variable `_` (a single underscore) is kept updated with the previous output. A double underscore `__` gives the second-to-last output, and a triple underscore `___` gives the third-to-last output. It stops there!

A shorthand for `Out[X]` is `_X` (i.e., a single underscore followed by the input number).

```python
# In [22]:
print(_)
# -0.37840124765396416
```

```python
# In [24]:
_19
# Out[24]: 0.9092974268256817
```
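The single-underscore shortcut is not unique to IPython. The standard Python interpreter also stores the last non-`None` result in `_`, through `sys.displayhook`. The `__`, `___`, and `_X` forms are IPython additions. A minimal sketch that calls the original hook directly; `sys.__displayhook__` is used so the example still works if a tool such as IPython has replaced `sys.displayhook`:

```python
import builtins
import sys

# The standard hook prints repr(value) and saves a non-None value as builtins._
sys.__displayhook__(0.5)   # prints 0.5 and stores it in builtins._
sys.__displayhook__(None)  # None is neither printed nor stored

print(builtins._)  # 0.5, still the last non-None value
```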