# Arrays with NumPy

```{tip}
**DOWNLOAD THE NOTEBOOK TO RUN LOCALLY**

Click the download button (![](../assets/img/site/dl-nb.png)) on the upper right to download the notebook and run it locally.
```

## NumPy

This notebook introduces **NumPy** (Numerical Python), the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

## Installation and Import

If NumPy is not already installed, install it first (for example, with `pip install numpy`). To use it, we then need to import it. The community standard is to import it as `np`.

In [1]:
import numpy as np

print("NumPy version:", np.__version__)

NumPy version: 2.4.0


## Creating Arrays

Unlike Python lists, NumPy arrays are homogeneous: all elements must be of the same type. This lets NumPy store the values compactly and operate on them much faster than on an equivalent list.
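As a small illustration of the single-type rule, the sketch below builds an array from a list that mixes integers and a float. The values and variable names are made up for this example.

```python
import numpy as np

# Hypothetical example values: a list that mixes integers and a float
mixed_list = [12, 15.5, 14]

mixed_array = np.array(mixed_list)

# NumPy converts every element to one common type (float64 here)
print(mixed_array)
print(mixed_array.dtype)
```

Because one element is a float, the integers are stored as floats too, so the array keeps a single `dtype`.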

### Creating arrays from lists
We can create an array using `np.array()`.

In [2]:
# Daily counts of surveys completed by a PSA enumerator
daily_surveys = [12, 15, 14, 10, 18]

# Convert to NumPy array
survey_array = np.array(daily_surveys)

print(survey_array)
print(type(survey_array))

[12 15 14 10 18]
<class 'numpy.ndarray'>


### Built-in array generators

NumPy offers functions to generate arrays automatically, which is useful for creating datasets for testing.

* `np.arange(start, stop, step)`: Like Python's `range`, but returns an array. The `stop` value is excluded.
* `np.linspace(start, stop, num)`: Creates `num` evenly spaced values from `start` to `stop`, with both endpoints included by default.
* `np.zeros((rows, cols))`: Creates an array filled with zeros.
* `np.ones((rows, cols))`: Creates an array filled with ones.

In [3]:
# Create an array of Region IDs (0 to 9)
regions = np.arange(0, 10)
print("Regions (arange):", regions)

# Create 5 evenly spaced data points between 0 and 100
percentages = np.linspace(0, 100, 5)
print("Percentages (linspace):", percentages)

# Create a 3x3 matrix of zeros (placeholder for missing data)
missing_data = np.zeros((3, 3))
print("\nZero Matrix:\n", missing_data)

Regions (arange): [0 1 2 3 4 5 6 7 8 9]
Percentages (linspace): [  0.  25.  50.  75. 100.]

Zero Matrix:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
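The cell above does not use the `step` argument of `np.arange` or the `np.ones` generator. A minimal sketch of both follows; the variable names and the "weights" interpretation are assumptions for illustration only.

```python
import numpy as np

# Even-numbered IDs from 0 up to (but not including) 10, using a step of 2
even_ids = np.arange(0, 10, 2)
print("Even IDs:", even_ids)

# A 2x3 array of ones, e.g. as hypothetical starting weights
weights = np.ones((2, 3))
print("Ones:\n", weights)
```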


## Array Attributes

Inspect the structure of your data using these attributes:

* `ndim`: Number of dimensions (axes).
* `shape`: The size of the array along each axis, as a tuple (for a 2D array: rows, columns).
* `size`: Total number of elements.
* `dtype`: The data type of the elements (e.g., `int64`, `float64`).

In [4]:
# 2D Array representing population counts in 2 Districts over 3 years
# Row = District, Column = Year
pop_data = np.array([
    [1500, 1600, 1700], 
    [2000, 2100, 2200]
])

print("Dimensions (ndim):", pop_data.ndim)
print("Shape (rows, cols):", pop_data.shape)
print("Total Elements (size):", pop_data.size)
print("Data Type (dtype):", pop_data.dtype)

Dimensions (ndim): 2
Shape (rows, cols): (2, 3)
Total Elements (size): 6
Data Type (dtype): int64
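The same attributes apply to arrays of any dimension. The sketch below, using made-up values, shows that a 1D array's `shape` is a one-element tuple and that `size` equals the product of the entries in `shape`.

```python
import numpy as np

# Hypothetical 1D array of five survey counts
counts = np.array([12, 15, 14, 10, 18])

print(counts.ndim)    # one axis
print(counts.shape)   # a one-element tuple
print(counts.size)

# size is the product of the entries in shape
table = np.zeros((2, 3))
print(table.size == table.shape[0] * table.shape[1])
```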


## Practice Exercises

The following exercises are designed to test your understanding of NumPy basics.

### Creating a Household Income Array

**Task:** Create a NumPy array from a list of household incomes provided below. Print the array and its data type.

In [5]:
# List of household incomes in PHP
incomes_list = [25000, 32000, 18000, 45000, 29000]

# Write your solution here

In [6]:
# Solution
income_array = np.array(incomes_list)
print("Income Array:", income_array)
print("Data Type:", income_array.dtype)

Income Array: [25000 32000 18000 45000 29000]
Data Type: int64


### Generating ID Codes

**Task:** Use `np.arange` to generate an array of ID codes from 100 up to, but not including, 110.

In [7]:
# Write your solution here

In [8]:
# Solution
id_codes = np.arange(100, 110)
print("ID Codes:", id_codes)

ID Codes: [100 101 102 103 104 105 106 107 108 109]


### Placeholder Data

**Task:** Create a 1D array of 5 zeros using `np.zeros` to serve as a placeholder for incoming census data.

In [9]:
# Write your solution here

In [10]:
# Solution
placeholders = np.zeros(5)
print("Placeholders:", placeholders)

Placeholders: [0. 0. 0. 0. 0.]


### Reshaping Data

**Task:** Create a 1D array containing 12 data points representing monthly inflation rates. Reshape this array into a 2D array with 4 rows (Quarters) and 3 columns (Months).

In [11]:
# Write your solution here

In [12]:
# Solution
monthly_inflation = np.arange(12) # Simulating data 0-11

quarterly_view = monthly_inflation.reshape(4, 3)
print("Quarterly View:\n", quarterly_view)

Quarterly View:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
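Two details of `reshape` are worth seeing in action: one dimension can be given as `-1` so that NumPy infers it, and the new shape must account for every element. The following is a minimal sketch with simulated data.

```python
import numpy as np

monthly = np.arange(12)  # simulated data, 0-11

# -1 asks NumPy to infer that dimension from the array's size
by_quarter = monthly.reshape(-1, 3)
print(by_quarter.shape)

# A shape that does not account for all 12 elements raises ValueError
try:
    monthly.reshape(5, 3)
except ValueError as err:
    print("Cannot reshape:", err)
```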


### Broadcasting Operations

**Task:** You have an array of population counts. You want to project a 10% growth for next year. Multiply the entire array by 1.10.

In [13]:
current_pop = np.array([1000, 2500, 3000, 4200])

# Write your solution here

In [14]:
# Solution
projected_pop = current_pop * 1.10
print("Projected Population:", projected_pop)

Projected Population: [1100. 2750. 3300. 4620.]
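Broadcasting also works between arrays of different shapes, not only between an array and a single number. The sketch below assumes a hypothetical growth factor per year and applies it to each row of a 2D array. The data and factors are invented for illustration.

```python
import numpy as np

# Hypothetical counts: 2 districts (rows) x 3 years (columns)
pop = np.array([
    [1500, 1600, 1700],
    [2000, 2100, 2200],
])

# One assumed growth factor per year (column)
growth = np.array([1.0, 1.5, 2.0])

# The shape (3,) array is stretched across both rows of the (2, 3) array
projected = pop * growth
print(projected)
```

Each column is multiplied by its own factor, without writing a loop over the rows.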


### Statistical Calculations

**Task:** Given a list of age data, calculate the mean (average) age and the standard deviation.

In [15]:
age_list = [23, 45, 18, 60, 34, 29, 51]

# Write your solution here

In [16]:
# Solution
ages = np.array(age_list)  # reuse the list defined in the previous cell

mean_age = np.mean(ages)
std_dev = np.std(ages)

print(f"Average Age: {mean_age:.2f}")
print(f"Standard Deviation: {std_dev:.2f}")

Average Age: 37.14
Standard Deviation: 14.24
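By default, `np.std` computes the population standard deviation (dividing by N). If the ages are treated as a sample, the `ddof=1` argument divides by N - 1 instead. A short sketch of the difference:

```python
import numpy as np

ages = np.array([23, 45, 18, 60, 34, 29, 51])

# Default: population standard deviation (divides by N)
pop_std = np.std(ages)

# ddof=1: sample standard deviation (divides by N - 1)
sample_std = np.std(ages, ddof=1)

print(f"Population std: {pop_std:.2f}")
print(f"Sample std: {sample_std:.2f}")
```

Which one is appropriate depends on whether the data covers the whole group of interest or only a sample of it.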


### Filtering and Sorting

**Task 1:** Using the `pop_dict` list defined below, filter the city populations to find only those cities with a population greater than 1,000,000.

In [17]:
# Write your solution here

In [18]:
pop_dict = [
    {"name": "Manila", "pop2025": 1600000},
    {"name": "Quezon City", "pop2025": 2900000},
    {"name": "Cebu City", "pop2025": 922300},
    {"name": "Davao City", "pop2025": 1832753},
    {"name": "Zamboanga City", "pop2025": 985595},
    {"name": "Antipolo", "pop2025": 785715},
    {"name": "Pasig City", "pop2025": 755200},
    {"name": "Taguig City", "pop2025": 804442},
    {"name": "Cagayan de Oro City", "pop2025": 675555},
    {"name": "Iloilo City", "pop2025": 457358}
]

In [19]:
# Solution
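# pop_dict is a list of dictionaries, so this creates a 1D array of dict objects
pop_array = np.array(pop_dict)

# Build a boolean mask (True where the population exceeds 1,000,000)
is_large = np.array([city["pop2025"] > 1000000 for city in pop_dict])

# Index with the mask to keep only the matching cities
filtered_pop = pop_array[is_large]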

print(filtered_pop)

[{'name': 'Manila', 'pop2025': 1600000}
 {'name': 'Quezon City', 'pop2025': 2900000}
 {'name': 'Davao City', 'pop2025': 1832753}]
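When the values are stored in a numeric array rather than in dictionaries, the mask can be built by comparing the array directly. The sketch below uses a few of the cities from the list above, split into two parallel arrays. The names `names`, `pops`, and `mask` are chosen for this example.

```python
import numpy as np

# A few of the cities from the list above, split into parallel arrays
names = np.array(["Manila", "Quezon City", "Cebu City", "Davao City"])
pops = np.array([1600000, 2900000, 922300, 1832753])

# Comparing an array with a number yields a boolean array (the mask)
mask = pops > 1000000
print(mask)

# Indexing with the mask keeps only the positions where it is True
print(names[mask])
print(pops[mask])
```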


**Task 2:** Find the top 5 cities by population. Return only the names of the cities.

In [20]:
# Write your solution here

In [21]:
# Solution
# Extract the populations and city names into two parallel NumPy arrays
pops = np.array([city['pop2025'] for city in pop_dict])
cities = np.array([city['name'] for city in pop_dict])

# argsort gives ascending order; reverse it and keep the first 5 indices
top_5_indices = np.argsort(pops)[::-1][:5]

print("The top 5 cities in terms of population are")
for idx in top_5_indices:
    print(cities[idx])

The top 5 cities in terms of population are
Quezon City
Davao City
Manila
Zamboanga City
Cebu City
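To see what `np.argsort` returns on its own, the sketch below runs it on a shortened version of the data. The four-city subset and the variable names are chosen only for this example.

```python
import numpy as np

names = np.array(["Manila", "Quezon City", "Cebu City", "Davao City"])
pops = np.array([1600000, 2900000, 922300, 1832753])

# argsort returns the indices that would sort pops in ascending order
order = np.argsort(pops)
print(order)

# Reverse for descending order, then keep the first two indices
top_2 = order[::-1][:2]
print(names[top_2])
```

The indices, not the sorted values, are what make it possible to reorder a second array such as `names` in the same way.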
