#### NumPy (Numerical Python) is a popular library for data analysis, machine learning, and scientific computing.

#### NumPy arrays offer several benefits over plain Python lists, such as:
* Lower memory consumption
* Faster operations
* Convenient APIs for a variety of mathematical functions
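
The memory and convenience claims can be checked with a rough sketch. The names `py_list` and `np_arr` are illustrative only, and the exact byte count for the list depends on the Python version, so treat the figures as an approximation rather than a benchmark.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores references to separate int objects, so count both
# the list container and every int object it points to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values contiguously: 1000 elements * 8 bytes.
array_bytes = np_arr.nbytes

print("list bytes: ", list_bytes)
print("array bytes:", array_bytes)

# One vectorized expression replaces an explicit Python loop.
doubled_list = [x * 2 for x in py_list]
doubled_arr = np_arr * 2
```

On a typical CPython build the list needs several times more memory than the array for the same values.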

# Basic Operations

In [1]:
import numpy as np

rev_q1 = np.array([10,12,9])
rev_q1.ndim

1

In [2]:
rev = np.array([[10, 12,9], [15,11,13]])
rev.ndim # dimension

2

In [3]:
# access a single element (row 1, column 0)
rev[1,0]

np.int64(15)

In [4]:
# change a value by assigning to its index
rev[1,1] = 14
rev

array([[10, 12,  9],
       [15, 14, 13]])

In [5]:
# data type
rev.dtype

dtype('int64')

In [6]:
# set the data type explicitly when creating the array
rev = np.array([[10, 12,9], [15,11,13]], dtype='float64')
rev.dtype

dtype('float64')

In [7]:
# number of bytes
rev.nbytes

48

In [8]:
# item size
rev.itemsize

8

In [9]:
# size
rev.size

6

In [10]:
# shape
rev.shape

(2, 3)

In [11]:
# sorting
np.sort(rev)

array([[ 9., 10., 12.],
       [11., 13., 15.]])

In [12]:
np.sort(rev, axis=None)

array([ 9., 10., 11., 12., 13., 15.])

In [13]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [14]:
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [15]:
np.arange(10,20)

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [16]:
# even numbers from 20 to 30 (the stop value 31 is excluded)
np.arange(20,31,2)

array([20, 22, 24, 26, 28, 30])

In [17]:
# linearly spaced elements
np.linspace(10,20,5)

array([10. , 12.5, 15. , 17.5, 20. ])

In [18]:
# flatten the array into one dimension
rev.flatten()

array([10., 12.,  9., 15., 11., 13.])

In [19]:
# reshape
rev.reshape(3,2)

array([[10., 12.],
       [ 9., 15.],
       [11., 13.]])
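
A reshape only succeeds when the new shape holds exactly the same number of elements. The sketch below uses the same values as `rev` under the illustrative name `demo`, and shows `-1`, which asks NumPy to infer one dimension from the total size.

```python
import numpy as np

demo = np.array([[10., 12., 9.],
                 [15., 11., 13.]])

# -1 lets NumPy work out the missing dimension (6 elements / 3 rows = 2 columns)
three_rows = demo.reshape(3, -1)
print(three_rows.shape)  # (3, 2)

# A shape whose element count differs from demo.size raises ValueError
try:
    demo.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```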

In [20]:
# minimum
rev.min()

np.float64(9.0)

In [21]:
# maximum
rev.max()

np.float64(15.0)

In [22]:
rev

array([[10., 12.,  9.],
       [15., 11., 13.]])

In [23]:
# sum of each row (adds across the columns)
rev.sum(axis=1)

array([31., 39.])

In [24]:
# sum of each column (adds across the rows)
rev.sum(axis=0)

array([25., 23., 22.])
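
The `axis` argument names the dimension that is collapsed, which is easy to mix up. A minimal sketch with the same values as `rev` (the name `demo` is only illustrative):

```python
import numpy as np

demo = np.array([[10., 12., 9.],
                 [15., 11., 13.]])

row_totals = demo.sum(axis=1)  # collapses the columns: one value per row
col_totals = demo.sum(axis=0)  # collapses the rows: one value per column
grand_total = demo.sum()       # no axis: sum of every element

print(row_totals)   # [31. 39.]
print(col_totals)   # [25. 23. 22.]
print(grand_total)  # 70.0
```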

In [25]:
for row in rev:
  print("row: ", row)

row:  [10. 12.  9.]
row:  [15. 11. 13.]


In [26]:
b = np.array([1,4,16])
np.sqrt(b) # square root

array([1., 2., 4.])

In [27]:
np.std(b) # standard deviation (population, ddof=0 by default)

np.float64(6.48074069840786)

# Matrix Operations

In [28]:
# Sales data for Quarter 1 (q1) and Quarter 2 (q2)
# Rows represent different products, columns represent different regions

q1 = np.array([
    [200,220,250],  # product A
    [150,180,210],  # product B
    [300,330,260]   # product C
])

q2 = np.array([
    [209,231,259],  # product A
    [155,192,222],  # product B
    [310,340,375]   # product C
])

In [29]:
# total sales
q1 + q2

array([[409, 451, 509],
       [305, 372, 432],
       [610, 670, 635]])

In [30]:
# sales growth
q2 - q1

array([[  9,  11,   9],
       [  5,  12,  12],
       [ 10,  10, 115]])

In [31]:
prices = np.array([
          [10, 12, 11], # Prices for Product A in different regions
          [18, 9, 10],  # Prices for Product B
          [15, 16, 17]  # Prices for Product C
])

q1_revenue = q1 * prices
q1_revenue

array([[2000, 2640, 2750],
       [2700, 1620, 2100],
       [4500, 5280, 4420]])

#### **np.dot(m1,m2)** computes the dot product of two arrays (matrix multiplication when both are 2-D), and **np.cross(m1,m2)** computes the cross product of two vectors.

In [32]:
# Defining two vectors
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Computing dot product
dot_product = np.dot(a, b)
print("Dot product:", dot_product)

# Computing cross product
cross_product = np.cross(a, b)
print("Cross product:", cross_product)

Dot product: 32
Cross product: [-3  6 -3]
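
For 2-D arrays, `np.dot` performs matrix multiplication, which is different from the element-wise `*` used to compute `q1_revenue`. The small matrices `m1` and `m2` below are made-up values, chosen only to make the difference visible.

```python
import numpy as np

m1 = np.array([[1, 2],
               [3, 4]])
m2 = np.array([[5, 6],
               [7, 8]])

elementwise = m1 * m2         # multiplies values at matching positions
matrix_prod = np.dot(m1, m2)  # row-by-column matrix multiplication

print(elementwise)
print(matrix_prod)

# The @ operator gives the same result as np.dot for 2-D arrays
print(np.array_equal(matrix_prod, m1 @ m2))  # True
```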


# Slicing, Stacking

* NumPy supports indexing and slicing operators similar to those of native Python lists.
* np.hstack and np.vstack stack NumPy arrays horizontally and vertically.
* np.hsplit and np.vsplit split a NumPy array horizontally or vertically.
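
Slicing a 2-D array takes one slice per dimension, separated by a comma. A short sketch, reusing the Quarter 1 sales figures under the illustrative name `sales`:

```python
import numpy as np

sales = np.array([
    [200, 220, 250],
    [150, 180, 210],
    [300, 330, 260],
])

first_row = sales[0]        # [200 220 250]
last_column = sales[:, -1]  # [250 210 260]
top_left = sales[:2, :2]    # first two rows, first two columns
every_other = sales[::2]    # rows 0 and 2

print(first_row)
print(last_column)
print(top_left)
print(every_other)
```

Basic slices are views onto the original data, so call `.copy()` on a slice before modifying it if the original array must stay unchanged.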

In [33]:
# Customer ID, Name
c = np.array([
      [101, 'Mira'],
      [102,'Abdul'],
      [103,'Andrea']
])

# Customer ID, Purchase Amount, Purchase Date
d = np.array([
      [101, 250.50, '2023-08-01'],
      [102, 150.00, '2023-08-02'],
      [103, 300.75, '2023-08-01']
])

In [34]:
# horizontal stack
np.hstack((c,d))

array([['101', 'Mira', '101', '250.5', '2023-08-01'],
       ['102', 'Abdul', '102', '150.0', '2023-08-02'],
       ['103', 'Andrea', '103', '300.75', '2023-08-01']], dtype='<U32')

In [35]:
# Customer ID, Name
c = np.array([
      [101, 'Mira'],
      [102,'Abdul'],
      [103,'Andrea']
])
e = np.array([
      [104, 'Venkat'],
      [105, 'John'],
      [106, 'Kathy']
])

In [36]:
# vertical stack
np.vstack((c,e))

array([['101', 'Mira'],
       ['102', 'Abdul'],
       ['103', 'Andrea'],
       ['104', 'Venkat'],
       ['105', 'John'],
       ['106', 'Kathy']], dtype='<U21')

In [37]:
transactions = np.array([
              [101, 'Mohan', 250.50, '2023-08-01'],
              [102, 'Bob', 150.00, '2023-08-02'],
              [103, 'Fatima', 300.75, '2023-08-01'],
              [104, 'David', 400.20, '2023-08-03'],
              [105, 'Aryan', 330.1, '2023-08-04']
])

# horizontal split: columns 0-2 go to a, the remaining column goes to b
a, b = np.hsplit(transactions, [3])

In [38]:
b

array([['2023-08-01'],
       ['2023-08-02'],
       ['2023-08-01'],
       ['2023-08-03'],
       ['2023-08-04']], dtype='<U32')

In [39]:
# vertical split: rows 0-3 go to x, the remaining row goes to y
x, y = np.vsplit(transactions, [4])

In [40]:
y

array([['105', 'Aryan', '330.1', '2023-08-04']], dtype='<U32')

In [41]:
monthly_sales = np.array([30,33,35,28,42])

result = monthly_sales < 32
result

array([ True, False, False,  True, False])

In [42]:
monthly_sales[result]

array([30, 28])
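
Boolean masks can also be combined, and `np.where` reports the positions that satisfy a condition. A minimal sketch on the same `monthly_sales` values (the thresholds are arbitrary):

```python
import numpy as np

monthly_sales = np.array([30, 33, 35, 28, 42])

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid_range = monthly_sales[(monthly_sales >= 30) & (monthly_sales < 40)]
print(mid_range)  # [30 33 35]

# np.where returns the indices where the condition is True
low_idx = np.where(monthly_sales < 32)[0]
print(low_idx)    # [0 3]
```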

In [43]:
# index of max value
np.argmax(monthly_sales)

np.int64(4)

In [44]:
transactions

array([['101', 'Mohan', '250.5', '2023-08-01'],
       ['102', 'Bob', '150.0', '2023-08-02'],
       ['103', 'Fatima', '300.75', '2023-08-01'],
       ['104', 'David', '400.2', '2023-08-03'],
       ['105', 'Aryan', '330.1', '2023-08-04']], dtype='<U32')

In [45]:
# find the row index of the maximum transaction amount (column 2 holds strings, so convert to float first)
t_index = np.argmax(transactions[:,2].astype(float))

In [46]:
transactions[t_index]

array(['104', 'David', '400.2', '2023-08-03'], dtype='<U32')

# Exercise

At **AtliQ**, a software service company, the **HR team** wants to understand how happy employees are with company policies and how long they've been with the company. This survey aims to help HR improve the work environment and keep employees satisfied.

**Data Structure:** Two NumPy arrays capture survey results and employee details:

**Employee Details:** Contains Employee ID, Department, and Number of Years with AtliQ.

**Survey Results:** Contains Employee ID and Happiness Score (scaled 1-10).

In [47]:
import numpy as np

# Employee Details: Employee ID, Department, Number of Years with AtliQ
employee_details = np.array([
    [101, 'Sales', 3],
    [102, 'HR', 5],
    [103, 'IT', 2],
    [104, 'Sales', 8],
    [105, 'IT', 6],
    [106, 'HR', 4],
    [107, 'IT', 7],
    [108, 'Sales', 1],
    [109, 'HR', 3]
])

# Survey Results: Employee ID, Happiness Score (1-10)
survey_results = np.array([
    [101, 8],
    [102, 10],
    [103, 9],
    [104, 6],
    [105, 7],
    [106, 8],
    [107, 9],
    [108, 5],
    [109, 7]
])

### Task 1: Merge Arrays

**Scenario:** To streamline analysis, combine the employee details with their survey results.

**Action:** Use ```np.hstack``` to horizontally stack the two arrays.

In [48]:
merged = np.hstack((employee_details, survey_results))
merged

array([['101', 'Sales', '3', '101', '8'],
       ['102', 'HR', '5', '102', '10'],
       ['103', 'IT', '2', '103', '9'],
       ['104', 'Sales', '8', '104', '6'],
       ['105', 'IT', '6', '105', '7'],
       ['106', 'HR', '4', '106', '8'],
       ['107', 'IT', '7', '107', '9'],
       ['108', 'Sales', '1', '108', '5'],
       ['109', 'HR', '3', '109', '7']], dtype='<U21')
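
The merged array repeats the employee ID in columns 0 and 3. One possible variation, sketched here on a three-row subset under the illustrative names `details`, `scores`, and `merged_no_dup`, checks that the IDs line up and then stacks only the score column. The tasks that follow still use `merged` as built above.

```python
import numpy as np

details = np.array([
    [101, 'Sales', 3],
    [102, 'HR', 5],
    [103, 'IT', 2],
])
scores = np.array([
    [101, 8],
    [102, 10],
    [103, 9],
])

# Row-wise stacking is only meaningful if both arrays list
# the same IDs in the same order.
ids_match = np.array_equal(details[:, 0].astype(int), scores[:, 0])
print(ids_match)  # True

# Keep only the score column so the employee ID is not repeated
merged_no_dup = np.hstack((details, scores[:, 1:]))
print(merged_no_dup)
```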

### Task 2: Print Scores

**Scenario:** HR wants a quick view of all happiness scores to gauge overall employee sentiment.

**Action:** Display all happiness scores from the merged array.

In [49]:
merged[:,4]

array(['8', '10', '9', '6', '7', '8', '9', '5', '7'], dtype='<U21')

### Task 3: Sort Scores

**Scenario:** HR needs to identify the range of happiness scores to plan specific interventions.

**Action:** Sort and print the happiness scores in ascending order.

In [50]:
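# the scores are stored as strings, so convert to int first;
# sorting the strings directly would place '10' before '5'
np.sort(merged[:,4].astype(int))

array([ 5,  6,  7,  7,  8,  8,  9,  9, 10])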

### Task 4: Employee Details

**Scenario:** For a meeting, HR needs a list of employees' IDs and departments without other details.

**Action:** Iterate through the array and print each employee's ID and department.

In [51]:
for row in merged:
    print(f'ID: {row[0]}, Department: {row[1]}')

ID: 101, Department: Sales
ID: 102, Department: HR
ID: 103, Department: IT
ID: 104, Department: Sales
ID: 105, Department: IT
ID: 106, Department: HR
ID: 107, Department: IT
ID: 108, Department: Sales
ID: 109, Department: HR


### Task 5: Happiness Scores

**Scenario:** HR wants to review individual happiness scores to follow up with specific departments or employees.

**Action:** Print the happiness score alongside each employee ID from the merged array.

In [52]:
for row in merged:
    print(f'ID: {row[0]}, Happiness Score: {row[4]}')

ID: 101, Happiness Score: 8
ID: 102, Happiness Score: 10
ID: 103, Happiness Score: 9
ID: 104, Happiness Score: 6
ID: 105, Happiness Score: 7
ID: 106, Happiness Score: 8
ID: 107, Happiness Score: 9
ID: 108, Happiness Score: 5
ID: 109, Happiness Score: 7


### Task 6: Convert Scores

**Scenario:** For statistical analysis, the HR team needs the happiness scores in a consistent format.

**Action:** Convert the happiness scores to float type using ```astype(float)```.

In [53]:
merged[:,4].astype(float)

array([ 8., 10.,  9.,  6.,  7.,  8.,  9.,  5.,  7.])

### Task 7: Average Score

**Scenario:** To measure the effectiveness of current policies, HR needs the average happiness score.

**Action:** Calculate and print the average happiness score of all employees.

In [54]:
np.mean(merged[:, 4].astype(float))

np.float64(7.666666666666667)

### Task 8: Unique Departments

**Scenario:** HR is reviewing which departments are represented in the survey to ensure all are included in future plans.

**Action:** Use ```np.unique``` to find and print all unique departments from the employee details.

In [55]:
# first printing departments from all the records
merged[:,1]

array(['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'IT', 'Sales', 'HR'],
      dtype='<U21')

In [56]:
# now printing unique departments
np.unique(merged[:,1])

array(['HR', 'IT', 'Sales'], dtype='<U21')

### Task 9: HR Happiness

**Scenario:** HR is conducting a self-assessment to understand how their own department perceives company policies.

**Action:** Calculate and print the average happiness score within the HR department.

In [57]:
# First printing all the records for HR department
merged[merged[:,1]=='HR']

array([['102', 'HR', '5', '102', '10'],
       ['106', 'HR', '4', '106', '8'],
       ['109', 'HR', '3', '109', '7']], dtype='<U21')

In [58]:
merged[merged[:,1]=='HR'][:,4].astype(float)

array([10.,  8.,  7.])

In [59]:
np.average(merged[merged[:,1]=='HR'][:,4].astype(float))

np.float64(8.333333333333334)
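
The same masking idea extends to every department at once. This is an optional sketch that goes slightly beyond the task; it copies the department and score columns into standalone arrays, and the name `dept_means` is illustrative.

```python
import numpy as np

departments = np.array(['Sales', 'HR', 'IT', 'Sales', 'IT', 'HR', 'IT', 'Sales', 'HR'])
scores = np.array([8, 10, 9, 6, 7, 8, 9, 5, 7], dtype=float)

# Average the scores selected by each department's boolean mask
dept_means = {}
for dept in np.unique(departments):
    dept_means[str(dept)] = float(scores[departments == dept].mean())

for dept, mean in dept_means.items():
    print(f'{dept}: {mean:.2f}')
```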