# "Python for Data Analysis" Course from [O'Reilly](https://learning.oreilly.com/library/view/python-for-data/9781491957653/)
## Chapter 4. NumPy Basics: Arrays and Vectorized Computation [Chapter4](https://learning.oreilly.com/library/view/python-for-data/9781491957653/ch04.html)

## Preparations
- **References**
    * NA
- **Advantages**
    * ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
    * Mathematical functions for fast operations on entire arrays of data without having to write loops.
    * Tools for reading/writing array data to disk and working with memory-mapped files.
    * Linear algebra, random number generation, and Fourier transform capabilities.
    * A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
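The capabilities listed above can be illustrated in a few lines. This is a minimal sketch written for these notes rather than code from the book; it assumes NumPy 1.17 or newer (for `np.random.default_rng`), and the file name `example.npy` is a hypothetical placeholder written to a temporary directory.

```python
import os
import tempfile

import numpy as np

# Vectorized math: one expression applies to every element, no explicit loop
x = np.linspace(0.0, np.pi, 5)
y = np.sin(x) ** 2 + np.cos(x) ** 2          # all ones, up to rounding

# Reading/writing array data to disk, plus a memory-mapped view of the file
path = os.path.join(tempfile.mkdtemp(), "example.npy")  # hypothetical file name
np.save(path, np.arange(10))
loaded = np.load(path)                       # reads the whole array into memory
mapped = np.load(path, mmap_mode="r")        # read-only memory map; data stays on disk

# Linear algebra: solve the linear system A @ v = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
v = np.linalg.solve(A, b)                    # solution is [2., 3.]

# Random number generation with the Generator API
rng = np.random.default_rng(seed=0)
samples = rng.standard_normal(1000)

# Fourier transform: a pure cosine with 4 cycles over 64 samples
t = np.arange(64)
spectrum = np.abs(np.fft.rfft(np.cos(2 * np.pi * 4 * t / 64)))
peak_bin = int(np.argmax(spectrum))          # the energy sits in frequency bin 4
```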

```python
# NumPy-based algorithms are generally 10 to 100 times faster (or more)
# than their pure Python counterparts and use significantly less memory
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

%time for _ in range(10): my_arr2 = my_arr * 2
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]
```

```
CPU times: user 9.17 ms, sys: 5.92 ms, total: 15.1 ms
Wall time: 15.1 ms
CPU times: user 565 ms, sys: 121 ms, total: 686 ms
Wall time: 686 ms
```
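`%time` is an IPython magic, so the timing cell above only runs inside Jupyter or IPython. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. The absolute numbers depend on the machine and the NumPy build, so no timings are shown here.

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time 10 repetitions of each doubling operation, mirroring the %time loops
numpy_seconds = timeit.timeit(lambda: my_arr * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy array:        {numpy_seconds:.4f} s")
print(f"list comprehension: {list_seconds:.4f} s")
print(f"speed-up:           {list_seconds / numpy_seconds:.1f}x")
```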


## 4.1 The NumPy ndarray: A Multidimensional Array Object
- **Features**
    * Arrays let you perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements.
    * Every array has a **shape**, a tuple indicating the size of each dimension, and a **dtype**, an object describing the data type of the array.
    * Since NumPy is focused on numerical computing, the data type, if not specified, will in many cases be **float64** (floating point).
- **Create Array**
    * The easiest way to create an array is to use the **array** function.
    * It’s not safe to assume that **np.empty** will return an array of all zeros: it only allocates memory, so in some cases it may return uninitialized “garbage” values.
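The features above can be checked directly on a small array. A minimal sketch; the values in `data` are arbitrary examples:

```python
import numpy as np

data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

# Element-wise arithmetic uses the same syntax as scalar arithmetic
doubled = data * 2          # every element multiplied by 2
summed = data + data        # same values as data * 2

# Every array has a shape (tuple of dimension sizes) and a dtype
print(data.shape)           # (2, 3)
print(data.dtype)           # float64

# Creation functions such as zeros default to float64 when no dtype is given
print(np.zeros(3).dtype)    # float64

# np.empty only allocates memory; set the values explicitly before reading them
buf = np.empty(4)
buf.fill(0.0)
```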

|Function|Description|
|-----------|-----------|
|array|Convert input data (list, tuple, array, or other sequence type) to an ndarray, either inferring a dtype or using an explicitly specified one; copies the input data by default|
|asarray|Convert input to an ndarray, but do not copy if the input is already an ndarray|
|arange|Like the built-in range, but returns an ndarray instead of a list|
|ones, ones_like|Produce an array of all 1s with the given shape and dtype; ones_like takes another array and produces an array of ones with the same shape and dtype|
|zeros, zeros_like|Like ones and ones_like, but producing arrays of 0s instead|
|empty, empty_like|Create new arrays by allocating new memory but, unlike ones and zeros, do not populate them with any values|
|full, full_like|Produce an array of the given shape and dtype with all values set to the indicated “fill value”; full_like takes another array and produces a filled array of the same shape and dtype|
|eye, identity|Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere)|
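A few of the functions in the table, side by side. This is an illustrative sketch with arbitrary example values; note that `np.asarray` skips the copy only when the input is already an ndarray of the requested dtype.

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0])

# array copies by default; asarray returns the same object when no conversion is needed
copied = np.array(original)
same = np.asarray(original)
print(copied is original, same is original)       # False True

# *_like functions reuse the shape and dtype of an existing array
ints = np.arange(6).reshape(2, 3)
print(np.ones_like(ints))                         # 2 x 3 array of integer 1s
print(np.full((2, 2), 7.5))                       # every element is 7.5
print(np.full_like(ints, 9).dtype == ints.dtype)  # True

# eye and identity both build identity matrices; eye also accepts a column count
print(np.array_equal(np.eye(3), np.identity(3)))  # True
print(np.eye(2, 3))                               # 2 x 3, ones on the main diagonal
```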

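```python
import numpy as np

# 3 x 5 array of samples from the standard normal distribution
data = np.random.randn(3, 5)
# Arithmetic applies to every element, just as it would to a scalar
print(data[1], "\n", data[1] * 10)

print("shape:{0}\ndtype:{1}".format(data.shape, data.dtype))

# Create an array with np.array, specifying the dtype explicitly
data1 = np.array(([1, 2, 3, 4], [5, 6, 7, 8]), dtype=np.int32)
print(data1.shape, data1.ndim, data1.dtype)
# Create arrays with other functions;
# arange is an array-valued version of the built-in Python range function.
# np.empty leaves its memory uninitialized, so the printed values are arbitrary
print(np.zeros(5), "\n", np.empty((2, 3, 4)), "\n", np.ones((2, 3)), "\n", np.arange(5))

# Convert the dtype (astype returns a new array)
data2 = data1.astype(np.float64)
print(data2.dtype)
```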

```
[ 0.49762653 -0.28507136  0.35334354 -0.25492635  0.16372366] 
 [ 4.97626529 -2.8507136   3.53343539 -2.54926353  1.63723657]
shape:(3, 5)
dtype:float64
(2, 4) 2 int32
[0. 0. 0. 0. 0.] 
 [[[-2.31584178e+077 -2.31584178e+077  1.33397724e-322  0.00000000e+000]
  [ 2.12199579e-314  4.82337433e+228  6.14415221e-144  1.16097020e-028]
  [ 1.10684323e-047  2.73622032e-052  3.50367320e-033  3.97062373e+246]]

 [[ 1.16318408e-028  1.03141449e-071  2.14746379e+184  2.21368460e+160]
  [ 9.14385702e-043  5.04621361e+180  8.37170571e-144  2.41650078e+185]
  [ 3.59751658e+252  8.76749093e+252  5.01163205e+217  8.37170074e-144]]] 
 [[1. 1. 1.]
 [1. 1. 1.]] 
 [0 1 2 3 4]
float64
```


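`astype` has a few behaviors worth knowing beyond the int32-to-float64 conversion above: converting floats to integers truncates the fractional part, numeric strings can be parsed, and a new array is returned even when the dtype does not change. A minimal sketch with arbitrary example values:

```python
import numpy as np

# Float -> integer: the fractional part is truncated toward zero, not rounded
floats = np.array([3.7, -1.2, -2.6, 0.5])
print(floats.astype(np.int32).tolist())   # [3, -1, -2, 0]

# Strings that represent numbers can be parsed with astype
numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)
print(parsed.tolist())                    # [1.25, -9.6, 42.0]

# astype returns a new array even when the dtype is unchanged
ints = np.array([1, 2, 3])
ints_copy = ints.astype(ints.dtype)
ints_copy[0] = 100
print(ints.tolist())                      # [1, 2, 3]  (original untouched)
```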
```python
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats
import pandas as pd
import numpy as np
# from scipy.stats import uniform
# from scipy.stats import norm
```

```python
# 10,000 draws from a normal distribution with mean (loc) 1 and standard deviation (scale) 1
data_normal = scipy.stats.norm.rvs(size=10000, loc=1, scale=1)
# sns.distplot is deprecated since seaborn 0.11; histplot with kde=True
# draws the same histogram with a KDE curve on top
ax = sns.histplot(data_normal,
                  bins=100,
                  kde=True,
                  color='skyblue',
                  alpha=1)
ax.set(xlabel='Normal Distribution', ylabel='Frequency')
plt.show()
```
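The same kind of sample can also be drawn with NumPy alone, which ties the plot back to the array functions in this chapter. A minimal sketch, assuming NumPy 1.17 or newer for `np.random.default_rng` (the newer alternative to `np.random.randn`); `np.histogram` computes the bin counts that the plot draws as bars, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# 10,000 draws from a normal distribution with mean 1 and standard deviation 1,
# the same distribution as scipy.stats.norm.rvs(size=10000, loc=1, scale=1)
data_normal = rng.normal(loc=1.0, scale=1.0, size=10000)

# Sample statistics should land close to loc and scale
print(f"mean = {data_normal.mean():.3f}, std = {data_normal.std():.3f}")

# Bin counts and bin edges for a 100-bin histogram
counts, edges = np.histogram(data_normal, bins=100)
print(counts.sum(), len(edges))   # 10000 101
```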