# Step 1: Install Necessary Libraries
First, ensure you have Python 3 and NumPy installed. You can install NumPy using pip:

In [1]:
pip install numpy


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3[0m[39;49m -> [0m[32;49m23.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install --upgrade pip

Collecting pip
  Downloading pip-23.3.1-py3-none-any.whl.metadata (3.5 kB)
Downloading pip-23.3.1-py3-none-any.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 30.6 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.3
    Uninstalling pip-23.3:
      Successfully uninstalled pip-23.3
Successfully installed pip-23.3.1
Note: you may need to restart the kernel to use updated packages.


# Step 2: Import Libraries

Import NumPy in your Python script:

In [4]:
import numpy as np

# Step 3: Define a Structured Array
Structured NumPy arrays allow you to define columns with different data types. This is useful for handling business data, such as sales, dates, and product categories.

In [5]:
# Define the data type for each column
data_type = [('Product', 'U10'), ('Quantity', 'i4'), ('Price', 'f4')]

# Create a structured array
data = np.array([
    ('Widget', 20, 6.99),
    ('Gadget', 15, 4.99),
    ('Doodad', 30, 3.50)
], dtype=data_type)


In NumPy's type codes, 'U10' is a Unicode text field that holds at most 10 characters, like a 10-character name. 'i4' is a whole number stored in 4 bytes (32 bits), suitable for values like 1, 2, or 100. 'f4' is a decimal (floating-point) number stored in 4 bytes (32 bits), which keeps roughly 7 significant digits, enough for a price such as 6.99; 'f8' uses 8 bytes (64 bits) when higher precision is needed. These codes tell NumPy how to store each column in memory, similar to choosing the right-sized containers for different ingredients in a recipe.

NB: If you want more detail on data types and their type codes, see this [link](https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html#:~:text=Here%20'U10'%20translates%20to%20%22,codes%20in%20the%20following%20section.) and your textbook ("Structured NumPy Arrays"), which cover the topic in detail.
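
To make the type codes concrete, here is a small sketch that asks NumPy how many bytes each code uses and compares the same price stored at two precisions. The names `sizes`, `price32`, and `price64` are illustrative only and do not appear elsewhere in this notebook.

```python
import numpy as np

# Bytes used per value by each type code.
# 'U10' reserves 4 bytes per character, so 10 characters take 40 bytes.
sizes = {code: np.dtype(code).itemsize for code in ('U10', 'i4', 'f4', 'f8')}
print(sizes)

# The same price stored as a 32-bit and as a 64-bit float
price32 = np.float32(6.99)
price64 = np.float64(6.99)

# Converting back to a Python float shows the rounding in the 32-bit value
print(float(price32) == 6.99)  # False: 'f4' cannot hold 6.99 exactly enough
print(float(price64) == 6.99)  # True: 'f8' matches Python's own float
```

This rounding is why results computed from an 'f4' column can show trailing digits such as 139.79999 instead of 139.8.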


In [6]:
data

array([('Widget', 20, 6.99), ('Gadget', 15, 4.99), ('Doodad', 30, 3.5 )],
      dtype=[('Product', '<U10'), ('Quantity', '<i4'), ('Price', '<f4')])

In [7]:
data[0]

('Widget', 20, 6.99)

In [8]:
data[0][2]

6.99

In [14]:
# Create a structured array
data1 = np.array([
    ('Widget', 20, 6.99),
    ('Gadget', 15, 4.99),
    ('Super Long Product Name', 30.3, 3.50)
], dtype=data_type)


In [15]:
data1[2]

('Super Long', 30, 3.5)

In NumPy, when a string is longer than the length given in the dtype ('U10' in this case), it is silently truncated to fit. Here 'Super Long Product Name' is cut to its first 10 characters, 'Super Long'. The numbers are affected too: 30.3 is stored in the 'i4' Quantity field as 30, so the fractional part is dropped.

The code runs without raising an error, but the stored values no longer match the input, leaving incorrect or incomplete data in your array.
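
One way to avoid the silent truncation is to size the text field from the data instead of hard-coding 10. The sketch below is a minimal illustration under that assumption; `rows`, `max_len`, `safe_type`, and `safe_data` are hypothetical names introduced here, not part of the notebook above.

```python
import numpy as np

# Hypothetical input rows, including one long product name
rows = [
    ('Widget', 20, 6.99),
    ('Gadget', 15, 4.99),
    ('Super Long Product Name', 30, 3.50),
]

# Find the longest product name and use it as the text field width
max_len = max(len(name) for name, _, _ in rows)
safe_type = [('Product', f'U{max_len}'), ('Quantity', 'i4'), ('Price', 'f4')]

# The full name now fits, so nothing is cut off
safe_data = np.array(rows, dtype=safe_type)
print(safe_data['Product'][2])
```

The trade-off is memory: every row reserves space for the longest name, so a single very long value widens the whole column.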

In [17]:
data

array([('Widget', 20, 6.99), ('Gadget', 15, 4.99), ('Doodad', 30, 3.5 )],
      dtype=[('Product', '<U10'), ('Quantity', '<i4'), ('Price', '<f4')])

In [18]:
# Calculate total sales for each product
total_sales = data['Quantity'] * data['Price']

total_sales[:, None]


array([[139.79999542],
       [ 74.84999657],
       [105.        ]])

In [22]:
# Try to add total sales as a new column (this raises an error)
data = np.append(data, total_sales[:, None], axis=1)
data

AxisError: axis 1 is out of bounds for array of dimension 1

A structured array is one-dimensional: each element is a whole record, so there is no axis 1 to append along, which is why the call above fails. Adding a new column isn't as straightforward as in pandas (another library we will see later) or with regular two-dimensional NumPy arrays, because each "column" in a structured array can have a different type. Instead, you need to create a new structured array whose data type includes the new field.

In [23]:
# Corrected method to add a new column
# Calculate total sales for each product
total_sales = data['Quantity'] * data['Price']

# Define a new data type that includes total sales
new_data_type = [('Product', 'U10'), ('Quantity', 'i4'), ('Price', 'f4'), ('Total Sales', 'f4')]

# Create a new array with the additional column
new_data = np.zeros(data.shape, dtype=new_data_type)

# Assign existing data to the new array
for field in data.dtype.names:
    new_data[field] = data[field]

# Add the total sales data
new_data['Total Sales'] = total_sales

# Print results
print("New Data with Total Sales:", new_data)

New Data with Total Sales: [('Widget', 20, 6.99, 139.79999) ('Gadget', 15, 4.99,  74.85   )
 ('Doodad', 30, 3.5 , 105.     )]
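
As an alternative to copying the fields by hand, NumPy includes a helper module, `numpy.lib.recfunctions`, whose `append_fields` function builds the widened array in one call. The sketch below rebuilds the same sample data so it can run on its own; `with_total` is an illustrative name, and the cast to 'f4' is a choice made here to match the 'Total Sales' type used above.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same sample data as earlier in this notebook
data_type = [('Product', 'U10'), ('Quantity', 'i4'), ('Price', 'f4')]
data = np.array([
    ('Widget', 20, 6.99),
    ('Gadget', 15, 4.99),
    ('Doodad', 30, 3.50)
], dtype=data_type)

# Compute total sales and store them as 32-bit floats
total_sales = (data['Quantity'] * data['Price']).astype('f4')

# usemask=False returns a plain structured array rather than a masked one
with_total = rfn.append_fields(data, 'Total Sales', total_sales, usemask=False)

print(with_total.dtype.names)
print(with_total['Total Sales'])
```

Like the manual approach, this returns a new array and leaves `data` unchanged.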


In [27]:
new_data['Total Sales']>103

array([ True, False,  True])

In [28]:
new_data[new_data['Total Sales']>103]

array([('Widget', 20, 6.99, 139.79999), ('Doodad', 30, 3.5 , 105.     )],
      dtype=[('Product', '<U10'), ('Quantity', '<i4'), ('Price', '<f4'), ('Total Sales', '<f4')])

In [29]:
new_data[new_data['Total Sales']>103]["Product"]

array(['Widget', 'Doodad'], dtype='<U10')