---
title: Arrays - Numpy vs Python
tags: [jupyter]
keywords: numpy
summary: "Manipulating arrays in NumPy, which is faster than regular Python code."
mlType: dataFrame
infoType: numpy
sidebar: numpy_sidebar
permalink: __AutoGenThis__
notebookfilename:  __AutoGenThis__
---

This page is adapted from [Understanding Data Types](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html) in the [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html).

In [1]:
import numpy as np

# A Python List Is More Than Just a List

A Python list can hold items of different types, but this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information. In other words, each item is a complete Python object. In the special case that all items are of the same type, much of this information is redundant, and it can be much more efficient to store the data in a fixed-type array.

![](https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png)

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object such as a Python integer. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
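A rough way to see the difference is to compare how many bytes each representation uses. This sketch assumes CPython, where `sys.getsizeof` reports the list's pointer block and each integer object separately; exact numbers vary by Python version and platform, and CPython caches small integers, so summing per-item sizes slightly overcounts. The last lines show the flip side of a fixed type: mixed values are upcast to one common dtype.

```python
import sys

import numpy as np

values = list(range(1000))

# List: a block of pointers plus a separate Python int object per element.
# (Rough estimate: CPython caches small ints, so this slightly overcounts.)
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Array: one contiguous buffer of raw 8-byte integers.
arr = np.array(values, dtype=np.int64)
array_bytes = arr.nbytes

# Mixed types are not stored as-is: NumPy upcasts to a single dtype.
mixed = np.array([1, 2.5, 3])

print(list_bytes, array_bytes)
print(mixed.dtype)  # float64
```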

# Data Types

![](https://drive.google.com/uc?id=1XLumLUckrkyDuop3MUm2cm4YrBepspei)
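The dtypes in the table can be requested when creating an array, either as a string or as the NumPy type object. This small sketch shows that the dtype fixes how many bytes each element occupies (`itemsize`) and that `astype` returns a converted copy.

```python
import numpy as np

a = np.zeros(4, dtype='int16')     # string form of the dtype name
b = np.zeros(4, dtype=np.float32)  # equivalent NumPy type object

print(a.dtype, a.itemsize)  # int16 2
print(b.dtype, b.itemsize)  # float32 4

# astype converts to another dtype and returns a new array
c = b.astype(np.int64)
print(c.dtype, c.itemsize)  # int64 8
```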

# Creating Arrays

In [3]:
singleSize = 10
doubleSize = (3,3)

## Zeros

In [4]:
np.zeros(singleSize,dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [5]:
np.zeros(doubleSize,dtype=int)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

## Ones

In [6]:
np.ones(singleSize)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [7]:
np.ones(doubleSize)

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

## Full

In [9]:
value = 69
np.full(singleSize,value)

array([69, 69, 69, 69, 69, 69, 69, 69, 69, 69])

In [10]:
np.full(doubleSize,value)

array([[69, 69, 69],
       [69, 69, 69],
       [69, 69, 69]])
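Note the output dtypes above: `np.zeros` and `np.ones` default to `float64` unless `dtype` is passed, while `np.full` infers the dtype from the fill value. A short sketch checking this, plus the `*_like` variants, which copy the shape and dtype of an existing array:

```python
import numpy as np

print(np.zeros(3).dtype)             # float64 (the default)
print(np.ones(3, dtype=int).dtype)   # platform default integer type
print(np.full(3, 69).dtype)          # an integer dtype, inferred from 69
print(np.full(3, 2.5).dtype)         # float64, inferred from 2.5

template = np.arange(6).reshape(2, 3)
z = np.zeros_like(template)    # same shape (2, 3) and dtype as template
f = np.full_like(template, 7)  # same shape and dtype, filled with 7
```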

## Range

In [11]:
start = 0
fin = 10
stepSize = 2
np.arange(start,fin,stepSize)

array([0, 2, 4, 6, 8])

## Linspace

`np.linspace(start, fin, numDiv)` returns `numDiv` evenly spaced values from start to fin, including both end points.

In [12]:
numDiv = 5
np.linspace(start,fin,numDiv)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
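`np.arange` excludes the stop value and takes a step size, while `np.linspace` includes the stop value by default and takes a number of points. Two optional `linspace` parameters make the relationship explicit: `endpoint=False` drops the stop value, and `retstep=True` also returns the computed spacing.

```python
import numpy as np

start, fin = 0, 10

print(np.arange(start, fin, 2))                    # 0, 2, 4, 6, 8 (stop excluded)
print(np.linspace(start, fin, 5))                  # 0, 2.5, 5, 7.5, 10
print(np.linspace(start, fin, 5, endpoint=False))  # 0, 2, 4, 6, 8 as floats

points, step = np.linspace(start, fin, 5, retstep=True)
print(step)  # 2.5
```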

## Random

### Uniform between 0 (inclusive) and 1 (exclusive)

In [13]:
np.random.random(singleSize)

array([0.50625411, 0.30866522, 0.55544225, 0.69936198, 0.90111516,
       0.11845291, 0.30593971, 0.06636299, 0.73058708, 0.47755572])

In [14]:
np.random.random(doubleSize)

array([[0.12605935, 0.99662824, 0.52395118],
       [0.03408881, 0.13773552, 0.00420693],
       [0.08413476, 0.50318612, 0.67403025]])

### Normal distribution with mean 0 and std 1

In [15]:
meanDistribution = 0
std = 1
np.random.normal(meanDistribution,
                std,
                singleSize)

array([-0.10969447, -0.32679376,  0.23725352, -0.01473572,  0.01044184,
        2.25738311,  0.81501386, -1.80356716,  1.51986601,  0.28282042])

In [16]:
np.random.normal(meanDistribution,
                std,
                doubleSize)

array([[-0.04712068, -0.40976668, -0.37857325],
       [-0.07041747, -0.92932029,  0.09667112],
       [-0.62850281, -0.68910895, -0.91103534]])

### Integers between numA (inclusive) and numB (exclusive)

In [17]:
numA = 5
numB = 55
np.random.randint(numA,
                 numB,
                 singleSize)

array([28, 47, 43, 30,  8, 41,  9, 45, 10, 31])

In [18]:
np.random.randint(numA,
                 numB,
                 doubleSize)

array([[42, 54, 14],
       [36, 46, 38],
       [36,  8, 17]])
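The outputs above change every time the cell runs. For reproducible results, newer NumPy code (1.17+) typically creates a seeded `Generator` with `np.random.default_rng` and calls its methods; `integers` plays the role of `randint` and likewise excludes the upper bound. The seed `42` in this sketch is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

uniform = rng.random((3, 3))             # floats in [0, 1)
normal = rng.normal(0, 1, size=10)       # mean 0, standard deviation 1
ints = rng.integers(5, 55, size=(3, 3))  # integers in [5, 55)

# Re-seeding reproduces exactly the same draws
again = np.random.default_rng(42).random((3, 3))
print(np.array_equal(uniform, again))  # True
```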

## Identity

`np.eye(n)` returns the **n x n** identity matrix.

In [19]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
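`np.eye` also accepts a number of columns and a diagonal offset `k`, which is handy for building shifted-diagonal matrices; `np.identity(n)` is a square-only alternative.

```python
import numpy as np

print(np.eye(2, 3))               # 2x3, ones on the main diagonal
print(np.eye(3, k=1))             # ones on the diagonal just above the main one
print(np.identity(3, dtype=int))  # square identity with an integer dtype
```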

# Indexing

**Similar to MATLAB indexing, except that NumPy indices start at 0 and the stop value of a slice is excluded.**

In [35]:
xSingle = np.random.random(singleSize)
xSingle

array([0.76415044, 0.97321119, 0.45453019, 0.81417035, 0.35602034,
       0.5592721 , 0.41901664, 0.0773567 , 0.69083001, 0.9730589 ])

In [36]:
xDouble = np.random.random(doubleSize)
xDouble

array([[0.81917042, 0.76320116, 0.8469064 ],
       [0.93616956, 0.95275128, 0.51745702],
       [0.75130518, 0.57925229, 0.08022912]])

In [37]:
xSingle[5]

0.5592720965024434

In [38]:
xSingle[-1]

0.9730589040169998

## Subarrays x[start:stop:stepSize]

In [39]:
xSingle[1:5:2]

array([0.97321119, 0.81417035])

For a 2D array, give one index or slice per dimension: xDouble[**row**, **col**]

In [41]:
xDouble[1:3,1:]

array([[0.95275128, 0.51745702],
       [0.57925229, 0.08022912]])
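One important difference from Python lists: a NumPy slice is a *view* of the original data, not a copy, so writing into the slice changes the parent array. Call `.copy()` when an independent subarray is needed. A small sketch with a deterministic array instead of random values:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)

sub = grid[:2, :2]  # a view into grid
sub[0, 0] = 99      # this also modifies grid
print(grid[0, 0])   # 99

independent = grid[:2, :2].copy()
independent[0, 0] = -1  # grid is unchanged
print(grid[0, 0])       # still 99

print(grid[::-1])  # a negative step reverses the rows
```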

# Reshaping

In [43]:
np.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Reshape the 9 elements into a 3x3 grid (the total number of elements must stay the same):

In [42]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
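`reshape` only works when the total number of elements is unchanged. One dimension may be given as `-1`, in which case NumPy infers it from the array's size:

```python
import numpy as np

flat = np.arange(12)

print(flat.reshape(3, -1).shape)  # (3, 4): the -1 is inferred as 4
print(flat.reshape(-1, 6).shape)  # (2, 6)

try:
    flat.reshape(5, 5)  # 25 != 12 elements
except ValueError as err:
    print("ValueError:", err)
```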


## From 1D to a Row [1xn] or Column [nx1] Vector

In [45]:
someList = [5,9,9,1,2,3]
npList = np.array(someList)
npList

array([5, 9, 9, 1, 2, 3])

To turn this 1D array into a 2D row or column vector, index it with **np.newaxis** (an alias for `None` that inserts a new axis of length 1):

In [46]:
npList[np.newaxis, :]

array([[5, 9, 9, 1, 2, 3]])

In [47]:
npList[:, np.newaxis]

array([[5],
       [9],
       [9],
       [1],
       [2],
       [3]])
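The same row and column vectors can be produced with `reshape`, again using `-1` to let NumPy infer the length. Checking `.shape` makes the difference between the 1D array and the two 2D versions explicit:

```python
import numpy as np

npList = np.array([5, 9, 9, 1, 2, 3])

row = npList.reshape(1, -1)  # same result as npList[np.newaxis, :]
col = npList.reshape(-1, 1)  # same result as npList[:, np.newaxis]

print(npList.shape, row.shape, col.shape)  # (6,) (1, 6) (6, 1)
```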

# Concatenation

In [49]:
x = np.arange(1,5)
x

array([1, 2, 3, 4])

In [51]:
y = np.arange(6,10)
y

array([6, 7, 8, 9])

In [54]:
z = np.arange(11,15)
z

array([11, 12, 13, 14])

In [53]:
np.concatenate([x,y])

array([1, 2, 3, 4, 6, 7, 8, 9])

In [55]:
np.concatenate([x,y,z])

array([ 1,  2,  3,  4,  6,  7,  8,  9, 11, 12, 13, 14])

## Axis concatenation

The following examples use 2D arrays built from the 1D arrays above.

In [57]:
a = np.array([x,y])
a

array([[1, 2, 3, 4],
       [6, 7, 8, 9]])

In [58]:
b = np.array([z,z])
b

array([[11, 12, 13, 14],
       [11, 12, 13, 14]])

### Vertical (concatenate and vstack)

In [59]:
np.concatenate([a,b])

array([[ 1,  2,  3,  4],
       [ 6,  7,  8,  9],
       [11, 12, 13, 14],
       [11, 12, 13, 14]])

In [61]:
np.vstack([a,b])

array([[ 1,  2,  3,  4],
       [ 6,  7,  8,  9],
       [11, 12, 13, 14],
       [11, 12, 13, 14]])

### Horizontal (concatenate and hstack)

In [60]:
np.concatenate([a,b],axis=1)

array([[ 1,  2,  3,  4, 11, 12, 13, 14],
       [ 6,  7,  8,  9, 11, 12, 13, 14]])

In [62]:
np.hstack([a,b])

array([[ 1,  2,  3,  4, 11, 12, 13, 14],
       [ 6,  7,  8,  9, 11, 12, 13, 14]])
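For 1D inputs the stacking helpers differ: `np.hstack` joins the arrays end to end (like `np.concatenate`), `np.vstack` treats each 1D array as a row, and `np.column_stack` places each one as a column. A quick check using the same `x` and `y` values as above:

```python
import numpy as np

x = np.arange(1, 5)   # 1, 2, 3, 4
y = np.arange(6, 10)  # 6, 7, 8, 9

print(np.hstack([x, y]).shape)        # (8,): same as np.concatenate([x, y])
print(np.vstack([x, y]).shape)        # (2, 4): each 1D array becomes a row
print(np.column_stack([x, y]).shape)  # (4, 2): each 1D array becomes a column
```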

# Splitting

In [63]:
x = [1, 2, 3, 99, 99, 3, 2, 1]

# split after the 3rd and 5th element
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]
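Instead of a list of split points, `np.split` also accepts a single integer N and divides the array into N equal parts, raising a `ValueError` if that is not possible; `np.array_split` allows the parts to differ in length.

```python
import numpy as np

arr = np.arange(9)

a, b, c = np.split(arr, 3)  # three equal parts of length 3
print(a, b, c)

parts = np.array_split(np.arange(10), 3)
print([len(p) for p in parts])  # [4, 3, 3]

try:
    np.split(np.arange(10), 3)  # 10 elements cannot be split into 3 equal parts
except ValueError as err:
    print("ValueError:", err)
```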


## Vertical

In [64]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Split after the 2nd row

In [65]:
upper, lower = np.vsplit(grid, [2])

In [66]:
upper

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [67]:
lower

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Split after the 1st row

In [68]:
upper, lower = np.vsplit(grid, [1])

In [69]:
upper

array([[0, 1, 2, 3]])

In [70]:
lower

array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

## Horizontal

In [71]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


In [73]:
left, mid, right = np.hsplit(grid, [1,3])
print(left)
print(mid)
print(right)

[[ 0]
 [ 4]
 [ 8]
 [12]]
[[ 1  2]
 [ 5  6]
 [ 9 10]
 [13 14]]
[[ 3]
 [ 7]
 [11]
 [15]]
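`np.vsplit` and `np.hsplit` are shorthands for `np.split` along axis 0 (rows) and axis 1 (columns). A sketch confirming the equivalence on the same 4x4 grid:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

upper, lower = np.split(grid, [2], axis=0)         # same as np.vsplit(grid, [2])
left, mid, right = np.split(grid, [1, 3], axis=1)  # same as np.hsplit(grid, [1, 3])

print(upper.shape, lower.shape)            # (2, 4) (2, 4)
print(left.shape, mid.shape, right.shape)  # (4, 1) (4, 2) (4, 1)
```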


# Structured Arrays

Structured arrays store records with named fields, so each field can be accessed by name, similar to keys in a Python **dictionary**.

Structured arrays like the ones discussed here are good to know about for certain situations, especially if you're using NumPy arrays to map onto binary data formats in C, Fortran, or another language. For exploratory data analysis, however, **Pandas** DataFrames are usually the better tool.

In [2]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

## Initialization

In [3]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


The table below lists the format codes you can use when defining the fields of a structured array.

![](https://drive.google.com/uc?id=1qJsVr1Iq-CEm6_lTMNH-6rSXH6cRv38r)
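The same field layout can be written more compactly. NumPy also accepts a list of `(name, format)` tuples, or a comma-separated string of format codes, in which case the fields get the default names `f0`, `f1`, `f2`. A sketch of both forms:

```python
import numpy as np

# List of (name, format) tuples: equivalent to the dict form used above
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
print(dt.names)  # ('name', 'age', 'weight')

# Comma-separated format codes: field names default to f0, f1, f2
anon = np.dtype('U10,i4,f8')
print(anon.names)  # ('f0', 'f1', 'f2')

dict_form = np.dtype({'names': ('name', 'age', 'weight'),
                      'formats': ('U10', 'i4', 'f8')})
print(dt == dict_form)  # True
```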

## Assigning Values

In [4]:
data['name'] = name
data['age'] = age
data['weight'] = weight

In [5]:
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


## Indexing

Indexing by field name returns a whole column, similar to column indexing in Pandas:

In [6]:
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

Indexing by integer position returns a single record (row), similar to row indexing in Pandas:

In [7]:
data[0]

('Alice', 25, 55.)
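Fields can also be combined with Boolean masking, much like filtering rows in Pandas. A self-contained sketch that rebuilds the same `data` array and selects the names of everyone younger than 30:

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

young = data[data['age'] < 30]['name']  # mask the rows, then pick a field
print(young)  # ['Alice' 'Doug']

print(data[-1]['name'])       # Doug: a field of the last record
print(data['weight'].mean())  # 67.5
```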