In [None]:
import numpy as np
from IPython.display import Math, HTML, display
display(HTML("<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/"
               "latest.js?config=default'></script>"))

# **Numpy Arrays** 

In [None]:
x = np.array([1,2,3],dtype = np.float64)
x.itemsize #bytes of each element

8

Numpy provides unary universal functions (ufuncs), such as np.sin, that operate element-wise on entire arrays:

In [None]:
x = np.sin(np.array([1,2,3],dtype = np.float32))
# instead of looping with math.sin in pure Python, which is also slower
from math import sin
y = [sin(i) for i in [1,2,3]]

print(x)
print(y)

[0.84147096 0.9092974  0.14112   ]
[0.8414709848078965, 0.9092974268256817, 0.1411200080598672]


Numpy arrays follow the usual Python slicing rules in multiple dimensions, as shown below, where the colon character (:) selects all elements along a particular axis.

In [None]:
x = np.array([ [1,2,3],[4,5,6]])
print(x[:,0])
print(x[0,:])
print(x[:,::2])
print(x[:,::-1])

[1 4]
[1 2 3]
[[1 3]
 [4 6]]
[[3 2 1]
 [6 5 4]]


Numpy uses pass-by-reference semantics so that slice operations are views into the array without implicit copying. This is particularly helpful with large arrays that already strain available memory. In Numpy terminology, slicing creates views (no copying) and advanced indexing creates copies.



In [None]:
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
y = x[:,[0,1,2,2]] # advanced indexing --> creates new copy
z = x[:,0] #a reference to x
print(y)
x[0,0] = 9999
print(x)
print(y)
print(z)


[[1 2 3 3]
 [4 5 6 6]
 [7 8 9 9]]
[[9999    2    3]
 [   4    5    6]
 [   7    8    9]]
[[1 2 3 3]
 [4 5 6 6]
 [7 8 9 9]]
[9999    4    7]


Note that if you want to explicitly force a copy without any indexing tricks, you can do y=x.copy(). The code below works through another example of advanced indexing versus slicing.


In [None]:
x = np.arange(5)
print(x)
y = x[[0,1,2]]
print(y)
z = x[:3]
print(z)
x[0] = 99999
print(x)
print(y) #unaffected
print(z)

[0 1 2 3 4]
[0 1 2]
[0 1 2]
[99999     1     2     3     4]
[0 1 2]
[99999     1     2]
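The view-versus-copy distinction shown above can also be checked programmatically rather than by mutating the array. The following is a minimal sketch using np.shares_memory and the .base attribute; the variable names view, fancy and forced are illustrative only.

```python
import numpy as np

x = np.arange(5)
view = x[:3]            # basic slicing: a view onto x's buffer
fancy = x[[0, 1, 2]]    # advanced (integer-list) indexing: a new array
forced = x[:3].copy()   # explicit copy of a slice

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, fancy))   # False
print(view.base is x)               # True: the view records its parent array
print(forced.base is None)          # True: the copy owns its own data
```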


Manipulating memory using views is particularly powerful for signal and image processing algorithms that require overlapping fragments of memory. The following is an example of how to use advanced Numpy to create overlapping blocks that do not actually consume additional memory.  The important part is that memory is re-used in the resulting 7x4 Numpy array.

In [None]:
from numpy.lib.stride_tricks import as_strided

In [None]:
x = np.arange(16,dtype = np.int64) #8 byte int
y = as_strided(x,(7,4),(16,8)) # strides in bytes: 16 to the next row (skip two elements), 8 to the next column (one element)
print(x)
print(y)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
[[ 0  1  2  3]
 [ 2  3  4  5]
 [ 4  5  6  7]
 [ 6  7  8  9]
 [ 8  9 10 11]
 [10 11 12 13]
 [12 13 14 15]]


In [None]:
#same memory shared
x[0] = 999999
print(y)

[[999999      1      2      3]
 [     2      3      4      5]
 [     4      5      6      7]
 [     6      7      8      9]
 [     8      9     10     11]
 [    10     11     12     13]
 [    12     13     14     15]]


as_strided does not check that you stay within memory block bounds.

In [None]:
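def _as_strided(x,k):
  n = x.size # use the array's own length instead of a global n
  # row i holds x[i], x[i+1], ..., x[i+n-k], so every index stays below n
  return as_strided(x,(k,n-k+1),(x.itemsize,)*2)


n = 8
x = np.arange(n)
k = 5 #number of rows desired
y = _as_strided(x,k)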
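Because as_strided performs no bounds checking, a bounds-checked alternative may be preferable when it fits the access pattern. The sketch below assumes NumPy 1.20 or later, where sliding_window_view is available in numpy.lib.stride_tricks, and rebuilds the 7x4 overlapping-block array from earlier; the names manual and windows are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

x = np.arange(16, dtype=np.int64)

# Hand-written strides, as in the earlier example (no safety checks).
manual = as_strided(x, (7, 4), (16, 8))

# All length-4 windows (13 of them), then keep every second starting position.
windows = sliding_window_view(x, 4)[::2]

print(windows.shape)                    # (7, 4)
print(np.array_equal(manual, windows))  # True
print(np.shares_memory(x, windows))     # True: still a view, no data copied
```

By default sliding_window_view returns a read-only view, which guards against accidentally writing through overlapping windows.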

# **Numpy Matrices**

Matrices in Numpy are similar to Numpy arrays but they can only have two dimensions. They implement row–column matrix multiplication as opposed to element-wise multiplication.

In [None]:
A = np.matrix([[1,2,3],[4,5,6],[7,8,9]])
x = np.matrix([[1],[0],[0]])
print(A @ x)

[[1]
 [4]
 [7]]


It is unnecessary to cast all multiplicands to matrices for multiplication.

In [None]:
A = np.ones((3,3)) #np array
print(type(A))
x = np.ones((3,1))
print(A*x) #element-wise
print(np.matrix(A,copy=False)*x) #unnecessary to cast x

<class 'numpy.ndarray'>
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[3.]
 [3.]
 [3.]]
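NumPy's own documentation no longer recommends np.matrix for new code, so it may be worth noting that the same row-column product is available on plain arrays through the @ operator. A minimal sketch, reusing the A and x shapes from the example above:

```python
import numpy as np

A = np.ones((3, 3))
x = np.ones((3, 1))

elementwise = A * x   # broadcasting: x is stretched across the columns of A
product = A @ x       # matrix product on ordinary ndarrays

print(elementwise.shape)  # (3, 3)
print(product.shape)      # (3, 1)
print(product.ravel())    # [3. 3. 3.]
```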


# **Numpy Broadcasting** 

Numpy broadcasting is a powerful way to make implicit multidimensional grids for expressions. It is probably the single most powerful feature of Numpy and the most difficult to grasp. Proceeding by example, consider the vertices of a two-dimensional unit square as shown below

In [None]:
X,Y = np.meshgrid(np.arange(2),np.arange(2))
print(X)
print(Y)
print(X+Y)

[[0 1]
 [0 1]]
[[0 0]
 [1 1]]
[[0 1]
 [1 2]]



Numpy’s meshgrid creates two-dimensional grids. Corresponding entries of the X and Y arrays match the coordinates of the vertices of the unit square (e.g., (0, 0), (0, 1), (1, 0), (1, 1)).
It turns out we can skip a step here and not bother with meshgrid: broadcasting implicitly produces the same grid, as shown below.

In [None]:
x = np.array([0,1])
y = np.array([0,1])


print(x + y[:,None])

[[0 1]
 [1 2]]


In [None]:
x = np.array([0,1])
y = np.array([0,1,2])
X,Y = np.meshgrid(x,y)
print(X)
print(Y)
print(X+Y)

[[0 1]
 [0 1]
 [0 1]]
[[0 0]
 [1 1]
 [2 2]]
[[0 1]
 [1 2]
 [2 3]]


In [None]:
print(x + y[:,None])

[[0 1]
 [1 2]
 [2 3]]


Broadcasting works in multiple dimensions also

In [None]:
x = np.array([0,1])
y = np.array([0,1,2])
z = np.array([0,1,2,3])
print(x + y[:,None] + z[:,None,None])

[[[0 1]
  [1 2]
  [2 3]]

 [[1 2]
  [2 3]
  [3 4]]

 [[2 3]
  [3 4]
  [4 5]]

 [[3 4]
  [4 5]
  [5 6]]]
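One way to reason about the three-dimensional result above is to look at the shapes involved: trailing dimensions are aligned, and any dimension of length 1 is stretched to match. The sketch below assumes NumPy 1.20 or later for np.broadcast_shapes; the names shapes and total are illustrative.

```python
import numpy as np

x = np.array([0, 1])
y = np.array([0, 1, 2])
z = np.array([0, 1, 2, 3])

# Shapes of the three operands after inserting the new axes with None.
shapes = [x.shape, y[:, None].shape, z[:, None, None].shape]
print(shapes)                         # [(2,), (3, 1), (4, 1, 1)]
print(np.broadcast_shapes(*shapes))   # (4, 3, 2)

total = x + y[:, None] + z[:, None, None]
# Entry [k, j, i] of the result equals z[k] + y[j] + x[i].
print(total[3, 2, 1] == z[3] + y[2] + x[1])  # True
```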


# **Numpy Masked Array**


Numpy provides a powerful method to temporarily hide array elements without changing the shape of the array itself. As the second cell below shows, the masked array shares memory with the original array.

In [None]:
from numpy import ma

In [None]:
x = np.arange(10)
y = ma.masked_array(x,x<5)
print(y)

[-- -- -- -- -- 5 6 7 8 9]


In [None]:
print(y.shape)
x[-1] = 999999
print(y)

(10,)
[-- -- -- -- -- 5 6 7 8 999999]
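Masked entries are also skipped by reductions, which is often the reason for masking in the first place. A short sketch on the same kind of array as above, using the standard mean, count, filled and compressed methods of masked arrays:

```python
import numpy as np
from numpy import ma

x = np.arange(10)
y = ma.masked_array(x, x < 5)   # hide the entries 0..4

print(y.mean())        # 7.0, computed over the visible values 5..9 only
print(y.count())       # 5 unmasked elements
print(y.filled(-1))    # regular ndarray with masked slots replaced by -1
print(y.compressed())  # regular ndarray holding only the unmasked values
```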


# **Floating Point Numbers** 


There are precision limitations when representing floating-point numbers on a computer with finite memory. For example, the following shows these limitations when adding two simple numbers,

In [None]:
print(0.1+0.2)

0.30000000000000004


So, then, why is the output not 0.3? The issue is the floating-point representation of the two numbers and the algorithm that adds them. To represent an integer in binary, we just write it out in powers of 2. For example, 230 = (11100110)_2. Python can do this conversion using string formatting,

In [None]:
print('{0:b}'.format(230))

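def int_to_bin(x): # named to avoid shadowing Python's built-in bin()
  b = []
  while x>0:
    x,r = divmod(x,2) # peel off the least-significant bit
    b.append(r)
  b = b[::-1] # most-significant bit first
  return ''.join([str(i) for i in b])

print(int_to_bin(230))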



11100110
11100110


Representing floating point is trickier because we have to represent these numbers as binary fractions. The IEEE 754 standard requires that floating-point numbers be represented as ±C × 2^E where C is the significand (mantissa) and E is the exponent.
To represent a regular decimal fraction as a binary fraction, we need to compute the expansion of the fraction in the following form: a1/2 + a2/4 + a3/8 + ...

In [None]:
#divmod(x,y) = (x//y, x%y)

# N = a1/2 + a2/4 + a3/8 + ...
# 2N = a1 + a2/2 + a3/4 + ... , so a1 = floor(2N)
# keep the fractional part r1 = 2N mod 1 = a2/2 + a3/4 + ...
# 2*r1 = a2 + a3/2 + ... , so a2 = floor(2*r1)
# and so on until the remainder is zero
def frac_bin(a):
  bits = []
  while a > 0:
    q,a = divmod(a*2,1)
    bits.append(q)

  return ''.join(['%d' %i for i in bits])


print(frac_bin(0.1))


0001100110011001100110011001100110011001100110011001101


In [None]:
Math(r'1.\overline{1001}*2^{-4}')


<IPython.core.display.Math object>


Note that the representation has an infinitely repeating pattern.

Per the IEEE 754 standard, the single-precision float type has a 24-bit significand, of which 23 bits are stored explicitly as the fractional part (the leading 1 is implicit). Because we cannot represent the infinitely repeating sequence, we have to round off at 23 bits, 10011001100110011001101. Thus, whereas the significand used to represent 1.6 exactly, with this rounding it is now:

In [None]:
b = '10011001100110011001101'
print(1+sum([int(i)/(2**n) for n,i in enumerate(b,1)]))

1.600000023841858
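The 23-bit pattern used above can be cross-checked against the bits actually stored for a single-precision 0.1. The sketch below uses the standard struct module to reinterpret the four bytes as an integer; the variable names raw, bits, sign, exponent and fraction are illustrative.

```python
import struct

# Pack 0.1 as a big-endian 32-bit float, then read the same bytes as an unsigned int.
raw, = struct.unpack('>I', struct.pack('>f', 0.1))
bits = format(raw, '032b')

# IEEE 754 single precision layout: 1 sign bit, 8 exponent bits, 23 fraction bits.
sign, exponent, fraction = bits[0], bits[1:9], bits[9:]

print(sign)                             # 0
print(exponent, int(exponent, 2) - 127) # 01111011 -4  (bias of 127 removed)
print(fraction)                         # 10011001100110011001101
```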


In [None]:
print(frac_bin(0.2))
print(frac_bin(0.1))

001100110011001100110011001100110011001100110011001101
0001100110011001100110011001100110011001100110011001101


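So, with both significands aligned to the exponent 2^-3 (which shifts the significand of 0.1 right by one bit):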

0.1 + 0.2 =   

              0.11001100110011001100110 +
              1.10011001100110011001101 
              -------------------------- 
              10.01100110011001100110011

In [None]:
# the sum, renormalized to 1.00110011001100110011010 x 2^-2 and rounded to 23 fraction bits, gives 0.3 as:
k = '00110011001100110011010'
print('%0.12f'%((1+sum([int(i)/(2**n) for n,i in enumerate(k,1)]))/2**2))
print('%0.12f' % (np.float32(0.1) + np.float32(0.2)))

0.300000011921
0.300000011921


The entire process proceeds the same for 64-bit floats. Python has fractions and decimal modules that allow more exact number representations. The decimal module is particularly important for certain financial computations.
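As a brief illustration of those two standard-library modules, the following sketch repeats the 0.1 + 0.2 sum with exact rationals and with decimal arithmetic. Note that the Decimal values are built from strings; building them from floats would carry the binary rounding error along.

```python
from decimal import Decimal
from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                      # False with binary floats
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True with exact rationals
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))     # True with decimal arithmetic

# Fraction(0.1) exposes the exact value of the 64-bit float nearest to 0.1;
# its denominator is a power of two, which is why it cannot equal 1/10.
exact = Fraction(0.1)
print(exact.denominator == 2**55)                            # True
```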

 ***Round off Error***

In [None]:
print('{0:b}'.format(100000000))
print('{0:b}'.format(10))


101111101011110000100000000
1010


              1.01111101011110000100000000 +
              0.00000000000000000000001010 
              -----------------------------
              1.01111101011110000100001010

In [None]:
print(format(np.float32(100000000)+np.float32(10),'10.3f'))

100000008.000


In [None]:
import math
x = math.fsum([np.float32(100000000),np.float32(10)])
print(x)

100000010.0
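The result 100000008 follows from the spacing between neighbouring 32-bit floats near 10^8, which can be queried directly with np.spacing. A minimal sketch (the names big, gap, total32 and total64 are illustrative):

```python
import numpy as np

big = np.float32(100000000)
gap = np.spacing(big)            # distance from big to the next representable float32
print(float(gap))                # 8.0

total32 = big + np.float32(10)   # 10 is not a multiple of 8, so the sum is rounded
print(float(total32))            # 100000008.0

total64 = np.float64(big) + np.float64(10)  # 64-bit floats have room for the exact sum
print(float(total64))            # 100000010.0
```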


# **Pandas**

In [None]:
import pandas as pd

x = pd.Series(index = range(5),data=[1,3,9,11,12])
print(x)

0     1
1     3
2     9
3    11
4    12
dtype: int64


The main thing to keep in mind with Pandas is that these data structures were originally designed to work with time-series data. In that case, the index in the data structures corresponds to a sequence of ordered time stamps. In the general case, the index must be a sortable array-like entity. For example:


In [None]:
x=pd.Series(index = ['a','b','d','z','z'],data=[1,3,9,11,12])
print(x)

a     1
b     3
d     9
z    11
z    12
dtype: int64


In [None]:
print(x.a)
print(x.z)
print(x.iloc[:3])
print(x.loc['a':'d'])
print(x['a':'d'])

1
z    11
z    12
dtype: int64
a    1
b    3
d    9
dtype: int64
a    1
b    3
d    9
dtype: int64
a    1
b    3
d    9
dtype: int64


The main strength of Pandas comes from its ability to aggregate and group data.

In [None]:
x = pd.Series(range(5),[1,2,11,9,10])

grp=x.groupby(lambda i:i%2) # odd or even
print(grp.get_group(0))
print(grp.get_group(1))
print(grp.max())

2     1
10    4
dtype: int64
1     0
11    2
9     3
dtype: int64
0    4
1    3
dtype: int64
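Several aggregations can also be computed per group in one pass with agg. The sketch below reuses the same Series and odd/even grouping on the index as above; the name summary is illustrative.

```python
import pandas as pd

x = pd.Series(range(5), index=[1, 2, 11, 9, 10])

# Group by parity of the index label, then compute three statistics per group.
summary = x.groupby(lambda i: i % 2).agg(['min', 'max', 'sum'])
print(summary)

# Group 0 holds the values at even labels (2, 10); group 1 those at odd labels (1, 11, 9).
print(summary.loc[0, 'sum'], summary.loc[1, 'sum'])
```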


The Pandas DataFrame is an encapsulation of the Series that extends to two dimensions. One way to create a DataFrame is with dictionaries as in the following:

In [None]:
df = pd.DataFrame({'col1': [1,3,11,2], 'col2': [9,23,0,2]})
df.iloc[:2,:2]

   col1  col2
0     1     9
1     3    23


In [None]:
df['col1']

0     1
1     3
2    11
3     2
Name: col1, dtype: int64

In [None]:
grp = df.groupby('col1')
print(grp.get_group(1))

   col1  col2
0     1     9


In [None]:
df['sum_col']=df.eval('col1+col2')

In [None]:
df

   col1  col2  sum_col
0     1     9       10
1     3    23       26
2    11     0       11
3     2     2        4


In [None]:
grp =  df.groupby(['sum_col','col1'])
res = grp.sum()
res

              col2
sum_col col1      
4       2        2
10      1        9
11      11       0
26      3       23


# **Sympy**

In [None]:
import sympy as S 

In [None]:
x = S.symbols('x')
p=sum(x**i for i in range(3))
p

x**2 + x + 1

In [None]:
S.solve(p)

[-1/2 - sqrt(3)*I/2, -1/2 + sqrt(3)*I/2]

In [None]:
from sympy.abc import a,b,c

In [None]:
p = a* x**2 + b*x + c
S.solve(p,x)

[(-b + sqrt(-4*a*c + b**2))/(2*a), -(b + sqrt(-4*a*c + b**2))/(2*a)]

In [None]:
a = S.symbols('a',real=False)
S.expand_complex(S.exp(S.I*a))

I*exp(-im(a))*sin(re(a)) + exp(-im(a))*cos(re(a))

A powerful way to use Sympy is to construct complicated expressions that you can later evaluate using Numpy via the lambdify function. For example,


In [None]:
y = S.tan(x)*x + x**2
yf = S.lambdify(x,y,'numpy')
yf(.1)

0.020033467208545055

In [None]:
y.subs(x,.1)

0.0200334672085451
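The practical benefit of lambdify is that the generated function accepts whole Numpy arrays, whereas subs evaluates one point at a time. A minimal sketch on the same expression, comparing the two approaches on a small grid (the names xs, vals and exact are illustrative):

```python
import numpy as np
import sympy as S

x = S.symbols('x')
y = S.tan(x)*x + x**2
yf = S.lambdify(x, y, 'numpy')

xs = np.linspace(0, 1, 5)
vals = yf(xs)   # vectorized: one call evaluates the whole grid

# Point-by-point symbolic substitution, for comparison only.
exact = np.array([float(y.subs(x, float(v))) for v in xs])

print(vals.shape)                # (5,)
print(np.allclose(vals, exact))  # True
```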

# **Quick Guide to Performance and Parallel Programming**

Run the following as a script in your local Python environment; it does not work inside Jupyter.

In [None]:
# filename: multiprocessing_demo.py
import multiprocessing
import time
from concurrent import futures

def worker(k):
  'worker function'
  print('am starting process %d' % (k))
  time.sleep(10) # wait ten seconds
  print('am done waiting!')
  return

def main():
  with futures.ProcessPoolExecutor(max_workers=3) as executor:
    list(executor.map(worker,range(10)))

# the guard must sit at module level so that worker processes can import this file safely
if __name__ == '__main__':
  main()

# run from a terminal with: python multiprocessing_demo.py

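For experimenting inside a notebook, a thread pool from the same concurrent.futures module can stand in for the process pool. This is a swapped-in technique, not the process-based demo above: threads share one interpreter, so they help mainly with tasks that wait (such as sleeping or I/O) rather than with CPU-bound work. The worker below is a hypothetical stand-in that sleeps briefly and returns a value.

```python
import time
from concurrent import futures

def worker(k):
    'hypothetical stand-in task: wait briefly, then return k squared'
    time.sleep(0.2)
    return k * k

start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map preserves input order even though the tasks run concurrently
    results = list(executor.map(worker, range(4)))
elapsed = time.perf_counter() - start

print(results)        # [0, 1, 4, 9]
print(elapsed < 0.8)  # the four waits overlap instead of running back to back
```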