# Introduction

- Purpose: learn how to effectively load, store, and manipulate in-memory data in Python

- Best to think of all data fundamentally as **arrays of numbers**

- Images can be thought of as 2D arrays: pixel brightness across the area

- Sound clips can be thought of as 1D arrays: intensity versus time

- Text can be converted to numbers in various ways, for example binary digits representing the frequency of certain words or pairs of words

- Regardless of the data, the first step is to convert it to arrays of numbers

- Efficient storage and manipulation of numerical arrays is fundamental to the process of doing data science
    - The NumPy and Pandas packages are specialized tools for handling such numerical arrays
- NumPy arrays are similar to Python's built-in `list` type
    - But NumPy arrays provide much more efficient storage and data operations as arrays grow larger
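
The contrast between a built-in `list` and a NumPy array can be sketched as follows. This is a minimal illustration, not a benchmark: the size `n = 1000`, the explicit `int64` dtype, and all variable names are assumptions chosen for the example.

```python
import numpy as np

n = 1000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list holds references to separate Python objects, so it may mix types.
mixed = [1, "two", 3.0]
mixed_types = [type(item).__name__ for item in mixed]

# A NumPy array stores raw values of a single dtype in one contiguous buffer.
bytes_per_element = np_arr.itemsize   # 8 bytes for int64
total_bytes = np_arr.nbytes           # n * 8 bytes of element data

# Element-wise arithmetic: an explicit loop for the list,
# a single vectorized expression for the array.
list_doubled = [2 * v for v in py_list]
arr_doubled = 2 * np_arr

print(mixed_types)
print(bytes_per_element, total_bytes)
print(arr_doubled[:5])
```

The fixed dtype is what lets NumPy store the data compactly and apply operations to the whole array at once, at the cost of the flexibility a list offers.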


In [4]:
import numpy
numpy.__version__

'1.26.4'

## Understanding Data Types in Python



In [3]:
import array

L = list(range(10))
# type code 'i' means every element is stored as a signed integer
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
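
As a small sketch of what the fixed type code implies, the example below tries to append a float to an integer `array.array`. The variable names are illustrative assumptions. The element size in bytes is platform dependent, so it is printed rather than stated.

```python
import array

A = array.array('i', range(10))

# Every element shares the same type code and occupies the same number of bytes.
type_code = A.typecode
print(type_code, A.itemsize)

# Unlike a list, the array rejects values that do not fit its type code.
try:
    A.append(1.5)
    rejected = False
except TypeError:
    rejected = True

print(rejected)
print(len(A))  # the failed append leaves the array unchanged
```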

In [12]:
import numpy as np

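# integer array:
np.array([1, 4, 2, 5, 3])

# mixing integers and floats: the integers are upcast to floating point
np.array([3.14, 1, 2, 3, 4])

# explicitly set the data type with the dtype keyword
np.array([1, 3, 7, 8], dtype='float32')

# nested sequences result in multi-dimensional arrays
# 2D array where each inner range is treated as a row
# (only this last expression is displayed as the cell output)
np.array([range(i, i + 3) for i in [2, 4, 6]])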

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])
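
Since the cell above displays only its last expression, the following sketch assigns each array to a name and inspects its `dtype` and shape. The variable names are assumptions introduced for illustration. The default integer dtype depends on the platform and NumPy version, so it is checked only loosely.

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])
mixed = np.array([3.14, 1, 2, 3, 4])
floats32 = np.array([1, 3, 7, 8], dtype='float32')
nested = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(ints.dtype)      # a platform-dependent integer type
print(mixed.dtype)     # integers were upcast to floating point
print(floats32.dtype)  # the dtype requested explicitly
print(nested.shape)    # three rows, one per inner range
```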

In [20]:
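# 3x5 integer array of 0s
print(np.zeros((3, 5), dtype='int'))
print("\n")
# length-10 floating-point array of 1s
print(np.ones(10, dtype=float))
# 5x2 array filled with 3.14
print(f"\n{np.full((5, 2), 3.14)}")
# similar to the built-in range() function:
# start, stop (exclusive), step
print(np.arange(5, 65, 3))
print("\n")
# 7 evenly spaced values from 0 to 1, both endpoints included
print(np.linspace(0, 1, 7))
print("\n")
# 2x3 array of uniformly distributed random values in [0, 1)
print(np.random.random((2,3)))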

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]


[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

[[3.14 3.14]
 [3.14 3.14]
 [3.14 3.14]
 [3.14 3.14]
 [3.14 3.14]]
[ 5  8 11 14 17 20 23 26 29 32 35 38 41 44 47 50 53 56 59 62]


[0.         0.16666667 0.33333333 0.5        0.66666667 0.83333333
 1.        ]


[[0.66718794 0.39318742 0.42840278]
 [0.51740212 0.36759477 0.01848693]]
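
The difference between `np.arange` and `np.linspace` in the output above comes down to how the stop value is treated. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

# arange: start, stop (exclusive), step, so 65 itself is never produced
stepped = np.arange(5, 65, 3)

# linspace: start, stop (inclusive), number of points;
# retstep=True also returns the spacing between neighbouring points
spaced, step = np.linspace(0, 1, 7, retstep=True)

print(len(stepped), stepped[-1])
print(spaced[0], spaced[-1], step)
```

`np.arange` is convenient when the step size is known, while `np.linspace` fits the case where the number of points is fixed.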


In [37]:
import numpy as np
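np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3,4))      # two-dimensional array
x3 = np.random.randint(10, size=(3,4,5))    # three-dimensional array
print(x2)
print(x2[2])  # row at index 2 (the third row)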


[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
[1 6 7 7]
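
The three arrays created above differ in their number of dimensions. The sketch below inspects them through the `ndim`, `shape`, and `size` attributes and shows two equivalent ways to reach a single element of `x2`. The helper name `describe` is a hypothetical name introduced here.

```python
import numpy as np

np.random.seed(0)
x1 = np.random.randint(10, size=6)
x2 = np.random.randint(10, size=(3, 4))
x3 = np.random.randint(10, size=(3, 4, 5))

def describe(arr):
    """Return (number of dimensions, shape, total element count)."""
    return arr.ndim, arr.shape, arr.size

for name, arr in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, describe(arr))

# x2[2] selects a whole row; a comma-separated index selects one element.
third_row = x2[2]
first_of_third_row = x2[2, 0]   # same element as x2[2][0]
print(third_row, first_of_third_row)
```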


In [42]:
# put the integers 1 through 9 into a 3x3 grid
grid = np.arange(1,10).reshape((3, 3))
grid

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
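
Reshaping only works when the new shape holds exactly as many elements as the original array. The following sketch illustrates that constraint and two common ways to turn a 1D array into a row or column. The names `x`, `row`, `col`, and `mismatch_error` are assumptions made for the example.

```python
import numpy as np

grid = np.arange(1, 10).reshape((3, 3))

x = np.array([1, 2, 3])
row = x.reshape((1, 3))     # row vector via reshape
col = x[:, np.newaxis]      # column vector via np.newaxis

# 9 elements cannot fill a 2x5 grid, so NumPy raises ValueError.
try:
    np.arange(1, 10).reshape((2, 5))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(grid.shape, row.shape, col.shape)
print(type(mismatch_error).__name__)
```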