# Numerical Python (NumPy)

## Using NumPy arrays

In [1]:
import numpy as np

## NumPy statistical operations

In [3]:
arr = np.random.randn(2,5)
arr

array([[ 2.39586248,  1.55766698,  0.35335064, -0.21701862, -1.15431304],
       [ 0.34268217, -1.90526739,  0.27375168,  0.09558193,  0.90133209]])

In [9]:
# compute the mean for all elements
print(arr.mean())

0.2643628922454529


In [8]:
# compute the mean by row
print(arr.mean(axis=1))

[ 0.58710969 -0.0583839 ]


In [7]:
# compute the mean by column
print(arr.mean(axis=0))

[ 1.36927233 -0.17380021  0.31355116 -0.06071834 -0.12649047]
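
To make the axis convention concrete, here is a small sketch on a fixed array. The array `demo` is made up for illustration and is not one of the random arrays above. Passing `axis=0` collapses the rows and gives one value per column, while `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

# Hypothetical fixed 2x3 array so the results are reproducible
demo = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_means = demo.mean(axis=0)   # one mean per column, shape (3,)
row_means = demo.mean(axis=1)   # one mean per row, shape (2,)
overall = demo.mean()           # mean of all six elements

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
print(overall)    # 3.5
```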


In [10]:
# sum of all elements in the array
print(arr.sum())

2.643628922454529


In [11]:
# compute the median of an array by row
print(np.median(arr, axis=1))

[0.35335064 0.27375168]


In [12]:
unsorted = np.random.randn(10)
print(unsorted)

[-0.34878377 -1.71926106 -1.12673504 -0.25790475 -0.18456874 -0.2625458
  1.04787669 -2.45731785  0.83187864 -1.54358134]


In [13]:
# copy the unsorted array, then sort the copy in place
# (named sorted_arr to avoid shadowing the built-in sorted())
sorted_arr = np.array(unsorted)
sorted_arr.sort()
print(sorted_arr)
print(unsorted)

[-2.45731785 -1.71926106 -1.54358134 -1.12673504 -0.34878377 -0.2625458
 -0.25790475 -0.18456874  0.83187864  1.04787669]
[-0.34878377 -1.71926106 -1.12673504 -0.25790475 -0.18456874 -0.2625458
  1.04787669 -2.45731785  0.83187864 -1.54358134]
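
As an alternative to copying and then sorting in place, `np.sort` returns a sorted copy directly. The sketch below uses a small made-up array (`values` is a hypothetical name) to contrast the two behaviours.

```python
import numpy as np

values = np.array([3, 1, 2])   # hypothetical sample data

sorted_copy = np.sort(values)  # returns a new sorted array
print(sorted_copy)             # [1 2 3]
print(values)                  # [3 1 2]  (original unchanged)

values.sort()                  # sorts the array in place, returns None
print(values)                  # [1 2 3]
```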


## Set operations in NumPy

In [14]:
arr = np.array([1,2,3,1,2,3,2,1,3,4])

# np.unique returns the sorted unique elements
print(np.unique(arr))
print()


[1 2 3 4]


In [20]:
s1 = np.array(['desk', 'chair', 'bulb'])
s2 = np.array(['lamp', 'bulb', 'chair'])

# intersecting the two sets
print(np.intersect1d(s1,s2))

# combining two sets
print(np.union1d(s1, s2))

# set difference: elements of s2 that are not in s1
print(np.setdiff1d(s2,s1))

# boolean mask: whether each element of s1 is also in s2
print(np.in1d(s1,s2))

['bulb' 'chair']
['bulb' 'chair' 'desk' 'lamp']
['lamp']
[False  True  True]
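
The boolean mask can also be used to index the array. The sketch below uses `np.isin`, which computes the same membership test as `np.in1d` for 1-D input. `np.in1d` is deprecated as of NumPy 2.0 in favor of `np.isin`, so this form is likely the safer choice on recent versions.

```python
import numpy as np

s1 = np.array(['desk', 'chair', 'bulb'])
s2 = np.array(['lamp', 'bulb', 'chair'])

# True where an element of s1 also appears in s2
mask = np.isin(s1, s2)
print(mask)       # [False  True  True]

# Boolean indexing keeps the matching elements in their original order
print(s1[mask])   # ['chair' 'bulb']
```

Unlike `np.intersect1d`, boolean indexing does not sort the result.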


## Some Unix functionality for data science

In [25]:
ls

02_UNIX_reading.pdf  Icon  unix/  UNIX-Jupyter-Notebook-Example.ipynb


In [26]:
!ls ./unix/

Icon  shakespeare.txt


In [28]:
%env filename=./unix/shakespeare.txt

env: filename=./unix/shakespeare.txt


In [29]:
!echo $filename

./unix/shakespeare.txt


In [32]:
!head -n 5 $filename

This is the 100th Etext file presented by Project Gutenberg, and
is presented in cooperation with World Library, Inc., from their
Library of the Future and Shakespeare CDROMS.  Project Gutenberg
often releases Etexts that are NOT placed in the Public Domain!!



In [34]:
!tail -n 5 $filename


End of this Etext of The Complete Works of William Shakespeare





In [43]:
!wc $filename

 124505  901447 5583442 ./unix/shakespeare.txt


In [49]:
!wc -w $filename

901447 ./unix/shakespeare.txt


In [50]:
!cat $filename | wc -l

124505


In [10]:
!grep -i 'parchment' $filename

  If the skin were parchment, and the blows you gave were ink,
  Ham. Is not parchment made of sheepskins?
    of the skin of an innocent lamb should be made parchment? That
    parchment, being scribbl'd o'er, should undo a man? Some say the
    Upon a parchment, and against this fire
    But here's a parchment with the seal of Caesar;  
    With inky blots and rotten parchment bonds;
    Nor brass, nor stone, nor parchment, bears not one,


In [9]:
!grep -i 'liberty' $filename | wc -l

72


__sed (stream editor)__
sed is a powerful stream editor. Like _grep_, it matches lines against a regular expression, but it can also rewrite the matched text with a replacement string.

For example: 
    s/from/to/g
    
    - s - the substitute command
    - from - the regular expression to match
    - to - the replacement string
    - g - apply the substitution to all occurrences on a line, not just the first
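
For comparison, the same substitution can be sketched in Python with `re.sub`. The input string below is made up for illustration. Python's regular expression syntax also differs from sed's in some details, so this is an analogy and not an exact equivalent.

```python
import re

line = "from here, and from there"   # hypothetical input line

# Like sed 's/from/to/g': replace every match on the line
all_replaced = re.sub(r"from", "to", line)

# Like sed 's/from/to/' (no g flag): replace only the first match
first_replaced = re.sub(r"from", "to", line, count=1)

print(all_replaced)    # to here, and to there
print(first_replaced)  # to here, and from there
```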

In [5]:
%env filename=./unix/shakespeare.txt

env: filename=./unix/shakespeare.txt


In [6]:
!head -n 5 $filename

This is the 100th Etext file presented by Project Gutenberg, and
is presented in cooperation with World Library, Inc., from their
Library of the Future and Shakespeare CDROMS.  Project Gutenberg
often releases Etexts that are NOT placed in the Public Domain!!



In [19]:
!head -n 5 $filename | sort -t' ' -k2


This is the 100th Etext file presented by Project Gutenberg, and
Library of the Future and Shakespeare CDROMS.  Project Gutenberg
is presented in cooperation with World Library, Inc., from their
often releases Etexts that are NOT placed in the Public Domain!!


In [20]:
!sort $filename | wc -l

124505


In [22]:
!sort $filename | uniq -u | wc -l

110834
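
Note that `uniq -u` prints only the lines that occur exactly once, which is not the same as the number of distinct lines (plain `uniq`, or `sort -u`). A minimal Python sketch of the difference, using a made-up list of lines:

```python
from collections import Counter

lines = ["a", "b", "a", "c"]   # hypothetical input lines
counts = Counter(lines)

# Like `sort | uniq -u`: keep lines that appear exactly once
only_once = sorted(line for line, n in counts.items() if n == 1)

# Like `sort | uniq` (or `sort -u`): one copy of every distinct line
distinct = sorted(counts)

print(only_once)  # ['b', 'c']
print(distinct)   # ['a', 'b', 'c']
```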


In [28]:
!sed -e 's/ /\n/g' -e 's/\r//g' $filename | sed '/^$/d' | sort | uniq -c | sort -nr | head -n 10

  23244 the
  19542 I
  18302 and
  15623 to
  15551 of
  12532 a
  10824 my
   9576 in
   9081 you
   7851 is
sort: write failed: 'standard output': Broken pipe
sort: write error
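
The pipeline above splits the text into one word per line, strips carriage returns, drops empty lines, counts identical words, and lists the most frequent ones. A rough Python sketch of the same steps follows. It runs on a small made-up string instead of the Shakespeare file, and `text`, `words`, and `top` are hypothetical names.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents
text = "the cat and the dog\r\nand the bird\r\n"

words = []
for line in text.split("\n"):        # sed processes input line by line
    line = line.replace("\r", "")    # like sed 's/\r//g'
    for token in line.split(" "):    # like sed 's/ /\n/g'
        if token:                    # like sed '/^$/d'
            words.append(token)

counts = Counter(words)              # like sort | uniq -c
top = counts.most_common(2)          # like sort -nr | head -n 2
print(top)  # [('the', 3), ('and', 2)]
```

The `Broken pipe` messages in the shell output are harmless: `head` exits after printing ten lines while `sort` is still writing to the pipe.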


In [7]:
!sed -e 's/parchment/manuscript/g' $filename > temp.txt

In [8]:
!grep -i 'manuscript' temp.txt

  If the skin were manuscript, and the blows you gave were ink,
  Ham. Is not manuscript made of sheepskins?
    of the skin of an innocent lamb should be made manuscript? That
    manuscript, being scribbl'd o'er, should undo a man? Some say the
    Upon a manuscript, and against this fire
    But here's a manuscript with the seal of Caesar;  
    With inky blots and rotten manuscript bonds;
    Nor brass, nor stone, nor manuscript, bears not one,


## NumPy and list (speed test)

In [12]:
from numpy import arange
from timeit import Timer

size = 100000
timeits = 100

In [13]:
nd_array = arange(size)
print(type(nd_array))

<class 'numpy.ndarray'>


In [15]:
timer_numpy = Timer('nd_array.sum()', 'from __main__ import nd_array')

print("Time taken by numpy array %f seconds" %
      (timer_numpy.timeit(timeits)/timeits))

Time taken by numpy array 0.000126 seconds


In [16]:
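# note: in Python 3, range() returns a lazy range object, not a list;
# use list(range(size)) to time an actual list
py_list = range(size)
print(type(py_list))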

<class 'range'>


In [17]:
timer_list = Timer('sum(py_list)', 'from __main__ import py_list')

print("Time taken by python list %f seconds" %
     (timer_list.timeit(timeits)/timeits))

Time taken by python list 0.002382 seconds
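
Because `range(size)` is a lazy range object in Python 3, the timing above does not measure a real list. The sketch below shows one way the comparison could be set up against an actual list. It is an illustration under stated assumptions, not the notebook's original measurement: the `repeats` name and the explicit `int64` dtype are choices made here, and timings will vary by machine, so no expected timings are shown.

```python
from timeit import timeit

import numpy as np

size = 100_000
repeats = 100

# int64 is requested explicitly so the sum cannot overflow on any platform
nd_array = np.arange(size, dtype=np.int64)
py_list = list(range(size))   # a real list, unlike the lazy range(size)

# Average time per call over `repeats` runs
numpy_time = timeit(lambda: nd_array.sum(), number=repeats) / repeats
list_time = timeit(lambda: sum(py_list), number=repeats) / repeats

# Both approaches compute the same total
total_numpy = int(nd_array.sum())
total_list = sum(py_list)

print(f"numpy array: {numpy_time:.6f} s per call")
print(f"python list: {list_time:.6f} s per call")
```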
