# Scipy Lecture Notes
> Ch-4, 8, 9, 10, 11

## Ch-4 Numpy: creating and manipulating numerical data

In [1]:
import numpy as np

In [3]:
L = range(1000)
%timeit [i**2 for i in L]

195 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [4]:
a = np.arange(1000)
%timeit a**2

1.02 µs ± 14.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [5]:
np.diag(np.array([1,2,3]))

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [6]:
a = np.ones((3,3))
a.dtype

dtype('float64')

The following data types are available (`float96` and `float128` exist only on some platforms):
* int8
* int16
* int32
* int64
* float32
* float64
* float96
* float128
* uint8
* uint16
* uint32
* uint64
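
As a rough illustration of what these names mean, the sketch below prints the size of a few types and the value range of `int8`. The variable names (`int8_info`, `has_float128`) are only illustrative and not part of the lecture notes.

```python
import numpy as np

# The number in each dtype name is its size in bits; itemsize reports bytes.
for name in ["int8", "int16", "int32", "int64", "uint8", "float32", "float64"]:
    print(name, np.dtype(name).itemsize, "bytes")

# iinfo reports the representable range of an integer type.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)

# Extended-precision floats depend on the platform, so check before using them.
has_float128 = hasattr(np, "float128")
print("float128 available:", has_float128)
```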

When we slice an array, we are creating a **view** and not copying the original array. When modifying the view the original array is also modified.
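
A minimal sketch of the view behaviour described above, using small illustrative arrays (`a`, `b`, `c` are just example names):

```python
import numpy as np

a = np.arange(10)
b = a[::2]            # basic slicing returns a view onto a's memory
b[0] = 12             # writing through the view ...
print(a[0])           # ... also changes the original array
print(np.shares_memory(a, b))

c = a[::2].copy()     # an explicit copy owns its own memory
c[0] = -1             # this write does not reach a
print(a[0])
```

`np.shares_memory` is a convenient way to check whether two arrays refer to the same data.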

NumPy arrays can also be indexed with **boolean masks**.

In [15]:
a = np.random.randn(2, 4)

In [16]:
a>.2

array([[ True, False,  True,  True],
       [ True,  True, False, False]])

In [17]:
a[a>.3]

array([0.66508899, 0.67441907, 1.2684965 , 1.57253334, 1.19223281])

Unlike slicing, indexing with a boolean mask returns a copy, not a view.
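
The difference can be checked with a small sketch. The array values here are fixed for illustration rather than random:

```python
import numpy as np

a = np.array([[0.5, -1.0, 0.7],
              [0.1, 0.9, -0.3]])
mask = a > 0.3

selected = a[mask]        # 1-D copy of the matching elements
selected[:] = 0.0         # modifying the copy leaves a untouched
unchanged = a.copy()      # snapshot to show a still holds its old values
print(np.shares_memory(a, selected))

a[mask] = 0.0             # assigning through the mask does modify a in place
print(a)
```

Reading with a mask copies, while assigning with a mask on the left-hand side writes into the original array.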

To reshape an array, use **.reshape**. It returns a view when the memory layout allows it and a copy otherwise.
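
A small sketch of both cases, assuming a simple contiguous array (names are illustrative):

```python
import numpy as np

a = np.arange(6)

b = a.reshape(2, 3)       # contiguous data: reshape can return a view
print(np.shares_memory(a, b))

t = b.T                   # transposed view, no longer contiguous in memory
c = t.reshape(6)          # flattening it needs a reordering, so NumPy copies
print(np.shares_memory(a, c))
print(c)
```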

### Assignment never changes the dtype.

In [18]:
a = np.array([1])
a.dtype

dtype('int64')

In [20]:
a[0] = 1.2
a.dtype

dtype('int64')

In [21]:
a

array([1])

### Change the dtype using astype

In [24]:
a = a.astype(np.float32)
a

array([1.], dtype=float32)

In [35]:
# Exercise 1
a = np.arange(1,16, dtype=np.int32).reshape((3,5)).T
a

array([[ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14],
       [ 5, 10, 15]], dtype=int32)

In [41]:
b = np.array((a[1], a[3]))
b

array([[ 2,  7, 12],
       [ 4,  9, 14]], dtype=int32)

## Ch-8 Advanced Python Constructs

### Iterators
An iterator has a `__next__` method (called through the built-in `next`) and raises `StopIteration` when it is exhausted.

In [42]:
nums = [1,2,3]
it = iter(nums)

In [43]:
next(it)

1

In [44]:
next(it)

2

In [45]:
next(it)

3

In [46]:
next(it)

StopIteration: 

In [48]:
(i for i in nums)

<generator object <genexpr> at 0x7f848c75d200>

In [49]:
[i for i in nums]

[1, 2, 3]

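Square brackets turn the expression into a list comprehension, which builds the whole list immediately. Parentheses give a generator expression, which produces its values lazily.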

### Generators
Each call to `next` runs the function body up to the next `yield` statement.

In [50]:
def f():
    print('First line of function')
    yield 1
    print('Second line of function')
    yield 2
    
f()

<generator object f at 0x7f848c7bdc10>

In [51]:
gen = f()

In [52]:
next(gen)

First line of function


1

In [53]:
next(gen)

Second line of function


2

In [54]:
next(gen)

StopIteration: 

A generator is just a different way to create an iterator object.
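
To make that concrete, here is a sketch of the same iteration written twice: once as a class implementing the iterator protocol by hand, and once as a generator function. `CountUp` and `count_up` are hypothetical names chosen for this example.

```python
class CountUp:
    """Iterator written by hand: yields 1, 2, ..., limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


def count_up(limit):
    """Generator version: Python builds the iterator object for us."""
    current = 0
    while current < limit:
        current += 1
        yield current


from_class = list(CountUp(3))
from_generator = list(count_up(3))
print(from_class, from_generator)
```

The generator keeps its state in local variables between `yield` statements, which is why it needs far less code than the class.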

## Ch-9 Advanced NumPy

ndarray = block of memory + indexing scheme + data type descriptor
* raw data
* how to locate an element
* how to interpret an element

```c
typedef struct PyArrayObject{
    PyObject_HEAD
    
    // block of memory
    char *data;
    
    // Data type descriptor
    PyArray_Descr *descr;

    // Indexing scheme
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    
    // Other stuff
    PyObject *base;
    int flags;
    PyObject *weakreflist;
} PyArrayObject;
```

In [55]:
x = np.array([1,2,3], dtype=np.float32)

In [56]:
# raw data
x.data

<memory at 0x7f848e14b940>

In [57]:
bytes(x.data)

b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@'
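
The raw bytes only become numbers once a data type descriptor is applied to them. The sketch below (variable names are illustrative) reads the same 12 bytes first as `float32` and then as `int32`:

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.float32)
raw = x.tobytes()                     # 3 elements * 4 bytes = 12 bytes

# Same bytes, interpreted with two different dtypes.
as_float = np.frombuffer(raw, dtype=np.float32)
as_int = np.frombuffer(raw, dtype=np.int32)
print(as_float)
print(as_int)

# .view reinterprets the existing memory without copying it.
same_memory = x.view(np.int32)
print(np.shares_memory(x, same_memory))
```

The integer values are the IEEE 754 bit patterns of 1.0, 2.0 and 3.0, which shows that the memory block itself carries no type information.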

In [58]:
x.__array_interface__

{'data': (94285341631824, False),
 'strides': None,
 'descr': [('', '<f4')],
 'typestr': '<f4',
 'shape': (3,),
 'version': 3}

In [62]:
np.array([[1, 2.], [3, 4.]], dtype=np.dtype([("first",np.int32), ("second", np.float32)]))

array([[(1, 1.), (2, 2.)],
       [(3, 3.), (4, 4.)]], dtype=[('first', '<i4'), ('second', '<f4')])

In [63]:
# You can also define the dtype of your subarrays
a = np.array([[1, 2.], [3, 4.]], 
             dtype=np.dtype([
                 ("first", np.int32),
                 ("second", np.float32),
             ]))
a

array([[(1, 1.), (2, 2.)],
       [(3, 3.), (4, 4.)]], dtype=[('first', '<i4'), ('second', '<f4')])
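
In the cells above, the nested lists are read as a 2×2 array and each scalar fills both fields, which is why every element shows the same value twice. If the intent is one record per pair, a sketch using tuples (with illustrative names `pair_dtype` and `records`) might look like this:

```python
import numpy as np

pair_dtype = np.dtype([("first", np.int32), ("second", np.float32)])

# Each tuple becomes one record with two fields.
records = np.array([(1, 2.0), (3, 4.0)], dtype=pair_dtype)
print(records.shape)
print(records.dtype.itemsize)   # 4 bytes (int32) + 4 bytes (float32)

# Fields are accessed by name and are views onto the same memory.
first = records["first"]
second = records["second"]
first[0] = 10
print(records)
```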

### Strides
A stride is the number of bytes to skip in memory to reach the next element along a dimension. There is one stride per dimension.

In [71]:
x = np.random.randint(0, 10, (3,3), dtype=np.int32)
x

array([[4, 2, 1],
       [0, 3, 8],
       [4, 3, 5]], dtype=int32)

In [72]:
x.strides

(12, 4)

In [74]:
# To reach x[1, 2], compute the byte offset from the strides: 12*1 + 4*2
flat_index = (12*1 + 4*2)//4 # Divide by 4 because each int32 value takes 4 bytes
x.flat[flat_index]

8

This shows why a **view** is so cheap to create: the data stays where it is and only the strides change.

In [75]:
y = x.T
y.strides

(4, 12)
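
The offset arithmetic and the transpose can be checked together in a short sketch. It uses `np.arange` instead of random values so the result is predictable, and the variable names are illustrative:

```python
import numpy as np

x = np.arange(9, dtype=np.int32).reshape(3, 3)
i, j = 1, 2

# Byte offset of x[i, j] from the start of the data block.
offset_bytes = i * x.strides[0] + j * x.strides[1]

# Read one int32 directly at that offset in the raw bytes.
value = np.frombuffer(x.tobytes(), dtype=np.int32, count=1, offset=offset_bytes)[0]
print(offset_bytes, value, x[i, j])

# Transposing swaps the strides and leaves the data untouched.
y = x.T
print(x.strides, y.strides)
print(np.shares_memory(x, y))
```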

## Ch-10 Debugging Code

* Write code with testing in mind
* Don't repeat yourself. If you find yourself repeating the same thing more than twice, it is a good idea to write a function for it.
* Try to limit interdependencies of your code
* Give variables, functions, modules meaningful names.

### Debugging Workflow
1. Make it fail reliably. Find a test case that always fails.
2. Now isolate the failing code (which module, which function, which line of code).
3. Change one thing at a time and re-run the failing test case.
4. Use a debugger.
5. Take notes and be patient.
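
As a rough sketch of steps 1 to 3, the example below pins a bug down to the smallest input that triggers it and keeps that input as a test. `scale_to_unit` is a hypothetical function invented for this illustration, not something from the lecture notes.

```python
import numpy as np


def scale_to_unit(values):
    """Hypothetical function under test: scale values into [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # Fix added after isolating the failure: constant input
        # previously divided by zero and produced NaN values.
        return np.zeros_like(values)
    return (values - values.min()) / span


def test_constant_input():
    # Smallest input that reproduced the failure on every run.
    result = scale_to_unit([3.0, 3.0, 3.0])
    assert np.all(np.isfinite(result))


test_constant_input()
print(scale_to_unit([0.0, 5.0, 10.0]))
```

Keeping the failing case as a test means each single change can be checked by re-running it.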

### Invoking the debugger
* `%debug` (in IPython, after an exception)
* `python -m pdb file.py`
* `pdb.set_trace()` -> place this on the line where you want to drop into the debugger

pdb cannot help with a segmentation fault, because the Python interpreter crashes before it can drop into the debugger. In that case, use GDB.

## Ch-11 Optimizing code

### Optimizing Workflow
1. Make the code work.
2. Write automated test cases that can check the correctness of the code.
3. Optimize the code: profile simple use cases to find the bottlenecks, then speed them up by finding a better algorithm or implementation.

### Profiling 

#### %timeit, %time
For long-running calls, use `%time` instead of `%timeit`. `%time` runs the statement only once, so it is less precise but faster.

In [76]:
a = np.arange(1000)

In [77]:
%timeit a**3

3.62 µs ± 170 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
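
Outside IPython, a similar measurement can be sketched with the standard-library `timeit` module. The loop and repeat counts below are arbitrary choices for illustration, and the timings will differ from machine to machine:

```python
import timeit

import numpy as np

a = np.arange(1000)
n_loops = 1000

# Run the statement n_loops times, and repeat that measurement 5 times.
runs = timeit.repeat(lambda: a**3, repeat=5, number=n_loops)

# The minimum is the least disturbed by other activity on the machine.
best_per_loop = min(runs) / n_loops
print(f"{best_per_loop * 1e6:.2f} µs per loop (best of {len(runs)} runs)")
```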


#### Profiler
Useful when you have a large program to profile.

1. Store the code in a `.py` file.
2. `%run -t demo.py` to time the script
3. `%run -p demo.py` to profile the script
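
Outside IPython, the standard-library `cProfile` and `pstats` modules give comparable output. The sketch below profiles two hypothetical functions (`slow_sum_of_squares` and `fast_sum_of_squares`, invented for this example) to show how a bottleneck appears in the report:

```python
import cProfile
import io
import pstats

import numpy as np


def slow_sum_of_squares(n):
    # Pure-Python loop: expected to dominate the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    # Vectorized NumPy version of the same computation.
    values = np.arange(n, dtype=np.int64)
    return int(np.sum(values * values))


def main():
    slow_sum_of_squares(200_000)
    fast_sum_of_squares(200_000)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Both functions return the same result, so the profile can guide which one to keep without changing behaviour.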

### Cython
Once the other optimizations are exhausted and you still need more speed, you can move the hotspots (the few lines or functions where most of the time is spent) into compiled code with Cython.