# Array Creation

```python
import numpy as np
```

NumPy offers **six main ways** to create arrays. Which method you use depends on whether you’re starting with existing data, generating values, or loading from files.

---

## 1. Convert from Python structures
You can turn built-in Python containers like **lists** or **tuples** into NumPy arrays.

```python
import numpy as np

list_data = [1, 2, 3, 4]
tuple_data = (5, 6, 7, 8)

arr_from_list = np.array(list_data)
arr_from_tuple = np.array(tuple_data)

print(arr_from_list)   # [1 2 3 4]
print(arr_from_tuple)  # [5 6 7 8]
```

```
[1 2 3 4]
[5 6 7 8]
```


- A list of numbers creates a 1D array.
- A list of lists creates a 2D array.
- Further nested lists create higher-dimensional arrays.

In general, any array object in NumPy is called an `ndarray`.

```python
a1D = np.array([1, 2, 3, 4])
print(a1D)
a2D = np.array([[1, 2], [3, 4]])
print(a2D)
a3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(a3D)
```

```
[1 2 3 4]
[[1 2]
 [3 4]]
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
```
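
The nesting depth of the input determines the number of dimensions. One way to confirm this, sketched here with the same inputs as above, is to inspect the `ndim` and `shape` attributes of each array:

```python
import numpy as np

a1D = np.array([1, 2, 3, 4])
a2D = np.array([[1, 2], [3, 4]])
a3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for name, arr in [("a1D", a1D), ("a2D", a2D), ("a3D", a3D)]:
    # ndim is the number of axes; shape gives the length of each axis
    print(name, type(arr).__name__, arr.ndim, arr.shape)
# a1D ndarray 1 (4,)
# a2D ndarray 2 (2, 2)
# a3D ndarray 3 (2, 2, 2)
```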


### Data Types in NumPy Arrays

When you create an array using `numpy.array()`, it’s important to think about the **data type** (`dtype`) of the elements.
You can **explicitly specify** the `dtype` when creating an array.

Why does this matter?
- The `dtype` determines how much memory each element uses.
- It controls how elements are stored internally and how they interact with low-level C/C++ functions that NumPy is built on.
- It also affects whether certain values can “fit” in the array.
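
As a rough illustration of the points above, the sketch below queries the per-element size of a few dtypes and the value range of `uint8`. The particular dtypes are chosen only for illustration.

```python
import numpy as np

# itemsize reports how many bytes one element occupies
for dt in (np.uint8, np.int32, np.float32, np.float64):
    print(np.dtype(dt).name, np.dtype(dt).itemsize)
# uint8 1
# int32 4
# float32 4
# float64 8

# iinfo reports the smallest and largest value an integer dtype can hold
limits = np.iinfo(np.uint8)
print(limits.min, limits.max)  # 0 255
```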

---

#### Explicitly setting dtype
```python
import numpy as np

arr_int = np.array([1, 2, 3], dtype=np.int32)
arr_float = np.array([1, 2, 3], dtype=np.float64)

print(arr_int, arr_int.dtype)     # [1 2 3] int32
print(arr_float, arr_float.dtype) # [1. 2. 3.] float64
```

```
[1 2 3] int32
[1. 2. 3.] float64
```


#### When values don’t fit the dtype

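If a value does not fit the specified dtype, NumPy either raises an error or silently converts the value, depending on the kind of mismatch and the NumPy version. Out-of-range Python integers raise an `OverflowError` in NumPy 2.x (older versions wrapped them around instead), while floats cast to an integer dtype are truncated toward zero without any error.

```python
# Example 1: forcing out-of-range integers into dtype uint8 (0–255)
try:
    arr = np.array([100, 200, 300], dtype=np.uint8)
    print(arr)
except OverflowError as e:
    # NumPy 2.x raises here; older versions wrapped 300 around to 44
    print("OverflowError:", e)

# Example 2: forcing non-integers into an int dtype truncates toward zero
print(np.array([1.2, 2.5, 3.7], dtype=np.int32))  # [1 2 3]
```

```
OverflowError: Python integer 300 out of bounds for uint8
[1 2 3]
```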

#### Memory usage and precision

Choosing a smaller dtype saves memory but reduces precision (and, for integer dtypes, the range of representable values).
Let’s compare the memory footprint of arrays with different dtypes.

```python
# Create 1 million numbers
arr_float32 = np.ones(1_000_000, dtype=np.float32)
arr_float64 = np.ones(1_000_000, dtype=np.float64)

print("Float32 size in bytes:", arr_float32.nbytes)  # 4 MB
print("Float64 size in bytes:", arr_float64.nbytes)  # 8 MB
```

```
Float32 size in bytes: 4000000
Float64 size in bytes: 8000000
```


- float32 uses 4 bytes per element
- float64 uses 8 bytes per element

That’s double the memory for the same number of elements!

By controlling dtype, you balance memory efficiency and precision.
This choice becomes crucial when working with large datasets, images, or scientific computations where both performance and accuracy matter.
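
The precision side of this trade-off can be sketched with a single integer that `float64` stores exactly but `float32` cannot. The value 2**24 + 1 is picked for illustration because `float32` has a 24-bit significand.

```python
import numpy as np

value = 16_777_217  # 2**24 + 1

as_f32 = np.float32(value)
as_f64 = np.float64(value)

print(int(as_f32))  # 16777216 (rounded to the nearest representable float32)
print(int(as_f64))  # 16777217 (stored exactly)

# finfo summarizes the precision of a floating-point dtype
print(np.finfo(np.float32).precision)  # 6 (approximate decimal digits)
print(np.finfo(np.float64).precision)  # 15
```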

## 2. Use NumPy’s built-in functions

NumPy provides **40+ built-in functions** to create arrays. These are grouped under the "array creation routines".
Broadly, they can be split into three categories, based on the *dimension* of the arrays they produce:

- **1D arrays**
- **2D arrays**
- **General ndarrays** (multi-dimensional arrays of any shape)

---

### a. Creating 1D Arrays

The most common 1D creation functions are `numpy.arange()` and `numpy.linspace()`.

`arange` needs only a stop value (start defaults to 0 and step to 1), while `linspace` requires both a start and a stop value. The difference is how you control the spacing:

---

#### `np.arange(start, stop, step)`
Creates arrays with regularly spaced increments.

```python
print(np.arange(10))
# [0 1 2 3 4 5 6 7 8 9]

print(np.arange(2, 10, dtype=float))
# [2. 3. 4. 5. 6. 7. 8. 9.]

print(np.arange(2, 3, 0.1))
# [2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]
```

```
[0 1 2 3 4 5 6 7 8 9]
[2. 3. 4. 5. 6. 7. 8. 9.]
[2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9]
```


```python
print(np.arange(0, 10, 2))  # [0 2 4 6 8]
print(np.ones((2, 3)))      # 2x3 array of ones
print(np.zeros((3, 2)))     # 3x2 array of zeros
print(np.empty((2, 2)))     # 2x2 uninitialized array (contents are arbitrary)
```

```
[0 2 4 6 8]
[[1. 1. 1.]
 [1. 1. 1.]]
[[0. 0.]
 [0. 0.]
 [0. 0.]]
[[0. 0.]
 [0. 0.]]
```

#### Notes:
- Best practice: keep start, stop, and step as integers to avoid floating-point rounding issues.
- When using floats (like step = 0.1), the output may sometimes include the stop value due to round-off error.
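
The sketch below illustrates the second note. With a float step, the number of elements depends on a floating-point division, so the result may contain one more element than expected. Building the range from integers and scaling afterwards is one possible workaround; it is shown here as an illustration, not as the only option.

```python
import numpy as np

# Float step: the length is derived from (stop - start) / step, so
# rounding error can add one extra element that lands on (or near) stop.
risky = np.arange(1, 1.3, 0.1)
print(len(risky), risky)

# Integer arange, scaled afterwards: the element count is exact.
safe = np.arange(10, 13) / 10
print(len(safe), safe)  # 3 [1.  1.1 1.2]
```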


#### `np.linspace(start, stop, num)`

Creates arrays with a fixed number of elements equally spaced between start and stop.

```python
np.linspace(1., 4., 6)
# [1.  1.6 2.2 2.8 3.4 4. ]
```

```
array([1. , 1.6, 2.2, 2.8, 3.4, 4. ])
```

Key difference:
- `linspace` guarantees the number of elements and, by default, includes the end point (pass `endpoint=False` to exclude it).
- `arange` guarantees a fixed step size and normally excludes the stop value (float steps can break this, as noted above).
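
A short sketch of how `linspace` treats the end point, using its `endpoint` and `retstep` keyword arguments; the range 0 to 1 with 5 points is chosen only for illustration:

```python
import numpy as np

# Default: stop is included, so 5 points span 4 equal intervals
closed = np.linspace(0, 1, 5)
print(closed)  # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False: stop is excluded, matching arange-style half-open ranges
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)  # [0.  0.2 0.4 0.6 0.8]

# retstep=True also returns the spacing that linspace computed
points, step = np.linspace(0, 1, 5, retstep=True)
print(step)  # 0.25
```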

### b. Creating 2D Arrays

NumPy also provides functions for creating special 2D arrays, often used in linear algebra.

#### Identity Matrix: `np.eye(N, M=None)`

Creates an identity-like matrix (ones on the diagonal, zeros elsewhere). `N` is the number of rows; `M` is the number of columns and defaults to `N`.

```python
np.eye(3)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

np.eye(3, 5)
# [[1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]]
```

```
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.]])
```

#### Diagonal Matrices: `np.diag()`
- Given a 1D list → creates a square matrix with those values on the diagonal.
- Given a 2D array → extracts the diagonal values.

```python
np.diag([1, 2, 3])
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]

np.diag([1, 2, 3], 1)   # places values one step above main diagonal
# [[0 1 0 0]
#  [0 0 2 0]
#  [0 0 0 3]
#  [0 0 0 0]]

a = np.array([[1, 2], [3, 4]])
np.diag(a)
# [1 4]
```

```
array([1, 4])
```

#### Vandermonde Matrix: `np.vander(x, N)`

Generates a matrix with `N` columns in which each column is a power of the input vector, in decreasing order by default.
This is very useful in polynomial fitting and linear least squares models.

```python
np.vander(np.linspace(0, 2, 5), 2)
# [[0.  1.]
#  [0.5 1.]
#  [1.  1.]
#  [1.5 1.]
#  [2.  1.]]

np.vander([1, 2, 3, 4], 4)
# [[ 1  1  1  1]
#  [ 8  4  2  1]
#  [27  9  3  1]
#  [64 16  4  1]]
```

```
array([[ 1,  1,  1,  1],
       [ 8,  4,  2,  1],
       [27,  9,  3,  1],
       [64, 16,  4,  1]])
```
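
To make the polynomial-fitting use case concrete, here is a minimal sketch that recovers the coefficients of a quadratic by solving a least-squares problem on a Vandermonde matrix. The sample data is hypothetical and noise-free, and `np.linalg.lstsq` is just one of several solvers that could be used.

```python
import numpy as np

# Hypothetical sample data generated from y = 2x^2 + 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x**2 + 3 * x + 1

# Columns of V are x^2, x^1, x^0 (decreasing powers by default)
V = np.vander(x, 3)

# Solve V @ coeffs ≈ y in the least-squares sense
coeffs, residuals, rank, singular_values = np.linalg.lstsq(V, y, rcond=None)
print(np.round(coeffs, 6))  # [2. 3. 1.]
```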

### c. General ndarray Creation Functions

You can create arrays of any dimension using shape tuples.

**Zeros:** `np.zeros(shape)`

Creates arrays filled with zeros.

```python
np.zeros((2, 3))
# [[0. 0. 0.]
#  [0. 0. 0.]]

np.zeros((2, 3, 2))
# 3D array of zeros
```

```
array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])
```

**Ones:** `np.ones(shape)`

Creates arrays filled with ones.

```python
np.ones((2, 3))
# [[1. 1. 1.]
#  [1. 1. 1.]]
```

```
array([[1., 1., 1.],
       [1., 1., 1.]])
```

**Random values:** `default_rng().random(shape)`

Creates arrays filled with random floats in the half-open interval [0, 1).

```python
from numpy.random import default_rng
rng = default_rng(42)

print(rng.random((2,3)))
# [[0.77 0.44 0.86]
#  [0.70 0.09 0.98]]   (rounded to two decimals)
```

```
[[0.77395605 0.43887844 0.85859792]
 [0.69736803 0.09417735 0.97562235]]
```


Setting a seed (42 here) ensures reproducible results.

**Index Grid:** `np.indices()`

Generates index arrays, useful for evaluating functions on a grid.

```python
np.indices((3,3))
# [[[0 0 0]
#   [1 1 1]
#   [2 2 2]]
#
#  [[0 1 2]
#   [0 1 2]
#   [0 1 2]]]
```

```
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])
```
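
As a small sketch of the grid-evaluation use case, the two index arrays can be unpacked and combined in a single vectorized expression. The function f(i, j) = i + 10 * j is an arbitrary example.

```python
import numpy as np

# rows[i, j] == i and cols[i, j] == j
rows, cols = np.indices((3, 3))

# Evaluate f(i, j) = i + 10 * j at every grid position in one expression
values = rows + 10 * cols
print(values)
# [[ 0 10 20]
#  [ 1 11 21]
#  [ 2 12 22]]
```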

## 3. Build from existing arrays

You can create new arrays by copying, reshaping, stacking, or modifying existing ones.

**Important: In NumPy, basic slices create views (not independent copies).
If you modify the view, the original array changes too.**


**Example:** View vs Copy

```python
a = np.array([1, 2, 3, 4, 5, 6])
b = a[:2]   # view of first 2 elements
b += 1

print("a =", a)   # [2 3 3 4 5 6]
print("b =", b)   # [2 3]
```

```
a = [2 3 3 4 5 6]
b = [2 3]
```


To avoid this, explicitly use .copy():

```python
a = np.array([1, 2, 3, 4])
b = a[:2].copy()
b += 1

print("a =", a)   # [1 2 3 4]
print("b =", b)   # [2 3]
```

```
a = [1 2 3 4]
b = [2 3]
```
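
When it is unclear whether an operation returned a view or a copy, one way to check is `np.shares_memory` together with the `base` attribute. The sketch below also shows that indexing with a list of integers, unlike a basic slice, produces a copy.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

view = a[:2]         # basic slice: a view
copy = a[:2].copy()  # explicit copy
fancy = a[[0, 1]]    # integer-array ("fancy") indexing: always a copy

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False
print(np.shares_memory(a, fancy))  # False
print(view.base is a)              # True
```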


#### Joining Arrays

Use routines like np.vstack, np.hstack, or np.block.

```python
A = np.ones((2, 2))
B = np.eye(2, 2)
C = np.zeros((2, 2))
D = np.diag((-3, -4))

np.block([[A, B], [C, D]])
# [[ 1.  1.  1.  0.]
#  [ 1.  1.  0.  1.]
#  [ 0.  0. -3.  0.]
#  [ 0.  0.  0. -4.]]
```

```
array([[ 1.,  1.,  1.,  0.],
       [ 1.,  1.,  0.,  1.],
       [ 0.,  0., -3.,  0.],
       [ 0.,  0.,  0., -4.]])
```
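
For completeness, a minimal sketch of the two simpler joining routines mentioned above, applied to two 1D arrays (the names `x` and `y` are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(np.vstack([x, y]))  # stacks the inputs as rows
# [[1 2 3]
#  [4 5 6]]

print(np.hstack([x, y]))  # joins the inputs end to end
# [1 2 3 4 5 6]
```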

```python
arr = np.array([1, 2, 3])
replicated = np.tile(arr, 3)         # Repeat array 3 times
reshaped = np.arange(6).reshape(2,3) # Reshape 1D → 2D

print(replicated)  # [1 2 3 1 2 3 1 2 3]
print(reshaped)    # [[0 1 2]
                   #  [3 4 5]]
```

```
[1 2 3 1 2 3 1 2 3]
[[0 1 2]
 [3 4 5]]
```


## 4. Load from files

Arrays are often stored in external files. NumPy can read delimited text formats such as .csv and its own binary .npy format directly; other formats need a dedicated library.

**Common Binary Formats**
- HDF5 → use h5py
- FITS (astronomy) → use astropy
- Images (JPG, PNG, etc.) → use PIL or OpenCV

**Common ASCII Formats**

Delimited files like .csv or .tsv can be read using:

```python
import numpy as np

# Step 1: create some sample data
data = np.array([
    [0, 0],
    [1, 1],
    [2, 4],
    [3, 9]
])

# Step 2: save as CSV with a header row
np.savetxt("simple.csv", data, delimiter=",", header="x,y", comments="")

# Step 3: load it back (skip the header row)
loaded_data = np.loadtxt("simple.csv", delimiter=",", skiprows=1)

print("Loaded Data:\n", loaded_data)
```

```
Loaded Data:
 [[0. 0.]
 [1. 1.]
 [2. 4.]
 [3. 9.]]
```


```python
# Example: loading a CSV file
# arr = np.loadtxt("data.csv", delimiter=",")
# arr = np.genfromtxt("data.csv", delimiter=",")

# Example: saving and loading NumPy binary format
# (arr_from_list was created in section 1)
np.save("my_array.npy", arr_from_list)
loaded = np.load("my_array.npy")
print(loaded)
```

```
[1 2 3 4]
```
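
`np.genfromtxt` differs from `np.loadtxt` mainly in that it tolerates missing values. The sketch below reads hypothetical CSV text from an in-memory buffer (so no file is needed) and shows the empty field being read as `nan`.

```python
import io
import numpy as np

# Hypothetical CSV content with a missing value in the last row
csv_text = "x,y\n1,2\n3,4\n5,\n"

arr = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(arr)
# [[ 1.  2.]
#  [ 3.  4.]
#  [ 5. nan]]
```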


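## 5. Create from raw data

If your data is in raw binary form, you can use low-level functions like:
- `np.fromfile()` / `.tofile()` → read/write binary data directly from/to a file
- `np.frombuffer()` → interpret an in-memory bytes object as an array

These functions store no metadata, so you must supply the dtype yourself and mind the byte order (endianness).

For more complex formats, consider wrapping C/C++ libraries.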

```python
raw = b'abcdefgh'
arr_from_buffer = np.frombuffer(raw, dtype='S1')
print(arr_from_buffer)  # [b'a' b'b' b'c' b'd' b'e' b'f' b'g' b'h']
```

```
[b'a' b'b' b'c' b'd' b'e' b'f' b'g' b'h']
```
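
To show why byte order matters, the sketch below reads the same four bytes as big-endian and as little-endian 16-bit integers, then does a `tofile`/`fromfile` round trip with an explicit dtype. The byte string and the temporary file name are made up for illustration.

```python
import os
import tempfile
import numpy as np

raw = b"\x00\x01\x00\x02"  # four bytes, read as two 16-bit unsigned integers

big = np.frombuffer(raw, dtype=">u2")     # big-endian
little = np.frombuffer(raw, dtype="<u2")  # little-endian
print(big)     # [1 2]
print(little)  # [256 512]

# tofile writes only the raw bytes, so the dtype must be repeated on read
path = os.path.join(tempfile.mkdtemp(), "values.bin")
np.array([1, 2, 3], dtype="<i4").tofile(path)
restored = np.fromfile(path, dtype="<i4")
print(restored)  # [1 2 3]
```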


## 6. Use special libraries

NumPy has submodules like numpy.random that generate random data.

```python
# Legacy random functions; no seed is set, so the values differ on every run
rand_arr = np.random.rand(2, 3)       # Random floats in [0, 1)
rand_ints = np.random.randint(0, 10, (2, 3))  # Random integers 0–9

print(rand_arr)
print(rand_ints)
```

```
[[0.87430171 0.60189702 0.7986192 ]
 [0.80243201 0.28079829 0.48361783]]
[[8 1 1]
 [5 5 1]]
```


Many scientific libraries in Python use NumPy arrays as their standard format.
For example:
- SciPy for scientific routines
- pandas for tabular data
- OpenCV for images

**Note:** With these methods, you can create either:
- ndarrays (standard multi-dimensional arrays)
- Structured arrays (arrays with fields, like database tables, useful for mixed data types)
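
A minimal sketch of a structured array follows. The field names and records are hypothetical; each field has its own dtype and can be accessed by name.

```python
import numpy as np

# Hypothetical records: (name, age, weight in kg)
people = np.array(
    [("Alice", 25, 55.0), ("Bob", 35, 85.5)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f4")],
)

print(people["name"])        # ['Alice' 'Bob']
print(people["age"].mean())  # 30.0
print(people[0])             # the first record, with all three fields
```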