CSC461 Machine Learning
=======================

Addendum: Indexing
------------------------

### Getting the example data

In [1]:
import numpy as np
import pandas as pd # Remember, Pandas is built on top of NumPy (a DataFrame stores its data in NumPy arrays, but it is not a subclass of ndarray)
dataframe = pd.read_csv('~/data/pima/diabetes.csv') # Read in the dataset

In [2]:
# Off the shelf function for displaying dataframes side-by-side
from IPython.display import display

In [3]:
NUMBER_DATA_POINTS=6 # Cut our dataset down to 6 rows (easier to view)
example_df = dataframe.iloc[0:NUMBER_DATA_POINTS,:]
example_features = example_df.loc[:,['Age','BMI','Glucose']] # Cut our data down to 3 features
display(example_features)
print ("Shape of Features: {}, {} rows and {} columns".format(example_features.shape, example_features.shape[0], example_features.shape[1]))
example_features = example_features.to_numpy()
example_features_original = example_features.copy()

|   | Age | BMI | Glucose |
|---|-----|-----|---------|
| 0 | 50 | 33.6 | 148 |
| 1 | 31 | 26.6 | 85 |
| 2 | 32 | 23.3 | 183 |
| 3 | 21 | 28.1 | 89 |
| 4 | 33 | 43.1 | 137 |
| 5 | 30 | 25.6 | 116 |


Shape of Features: (6, 3), 6 rows and 3 columns
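The cell above mixes `.iloc` and `.loc`, which slice by different rules. As a minimal sketch, assuming only the six rows displayed above (typed in by hand so it runs without `diabetes.csv`; `df` and `features` are illustrative names), the code below checks that `.iloc` slices by position with an excluded end, `.loc` slices by label with an included end, and `to_numpy()` returns a plain `ndarray`.

```python
import numpy as np
import pandas as pd

# The six rows displayed above, typed in by hand so this runs without diabetes.csv
df = pd.DataFrame({
    "Age": [50, 31, 32, 21, 33, 30],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
    "Glucose": [148, 85, 183, 89, 137, 116],
})

# .iloc is position-based: the end of a slice is EXCLUDED (rows 0..4)
by_position = df.iloc[0:5, :]
# .loc is label-based: the end of a slice is INCLUDED (labels 0..5)
by_label = df.loc[0:5, ["Age", "Glucose"]]

print(by_position.shape)  # (5, 3)
print(by_label.shape)     # (6, 2)

# A DataFrame wraps NumPy data but is not itself an ndarray
features = df.to_numpy()
print(isinstance(df, np.ndarray))  # False
print(type(features))              # <class 'numpy.ndarray'>
print(features.dtype)              # float64
```

The `float64` dtype is why the integer columns print as `50.`, `148.` and so on in the cells below: a NumPy array holds a single dtype, so `Age` and `Glucose` are upcast to match `BMI`.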


### Indexing into Rows
The first value (index 0) of the **shape** is the number of rows. 
Likewise, the first position inside the square brackets of the n-dimensional array's **array index operator** selects the row (or rows) you are accessing.

In [4]:
print("Type of Features: {}".format(type(example_features)))
print("Shape of Features: {}".format(example_features.shape))
num_rows = example_features.shape[0]
for i in range(0,num_rows):
    row_data = example_features[i] # Only the i'th row; all of its columns are included implicitly
    print ("Row {}: {}".format(i,row_data))

Type of Features: <class 'numpy.ndarray'>
Shape of Features: (6, 3)
Row 0: [ 50.   33.6 148. ]
Row 1: [31.  26.6 85. ]
Row 2: [ 32.   23.3 183. ]
Row 3: [21.  28.1 89. ]
Row 4: [ 33.   43.1 137. ]
Row 5: [ 30.   25.6 116. ]
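The row position accepts more than a single integer, which is where "the row (or rows)" comes in. A short sketch on the same six rows (typed in by hand; the variable names are illustrative) showing that an integer index drops the row axis while a slice or a list of indices keeps it:

```python
import numpy as np

# Same 6x3 matrix (Age, BMI, Glucose) as above, typed in by hand
features = np.array([
    [50, 33.6, 148],
    [31, 26.6, 85],
    [32, 23.3, 183],
    [21, 28.1, 89],
    [33, 43.1, 137],
    [30, 25.6, 116],
])

single_row = features[0]        # integer index: the row axis is dropped
row_block = features[0:1]       # slice: the row axis is kept, even for one row
some_rows = features[1:4]       # rows 1, 2 and 3 (the end is excluded)
picked_rows = features[[0, 5]]  # a list of indices picks arbitrary rows
last_row = features[-1]         # negative indices count back from the end

print(single_row.shape)   # (3,)
print(row_block.shape)    # (1, 3)
print(some_rows.shape)    # (3, 3)
print(picked_rows.shape)  # (2, 3)
print(len(features))      # 6: len() counts rows, same as shape[0]
```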


### Indexing into Columns
The second value (index 1) of the **shape** is the number of columns. 
Likewise, the second position inside the square brackets of the **array index operator** (after the comma) selects the column (or columns) you are accessing.

<u>Note:</u> The ":" symbol in the row position of the array index operator is a slice meaning "all rows", so example_features[:,i] returns every item in column i. For now you can treat ":" as a placeholder for "all".

In [5]:
print("Type of Features: {}".format(type(example_features)))
print("Shape of Features: {}".format(example_features.shape))
num_cols = example_features.shape[1]
for i in range(0,num_cols):
    col_data = example_features[:,i] # All rows, only the i'th column
    print ("Column {}: {}".format(i, col_data))

Type of Features: <class 'numpy.ndarray'>
Shape of Features: (6, 3)
Column 0: [50. 31. 32. 21. 33. 30.]
Column 1: [33.6 26.6 23.3 28.1 43.1 25.6]
Column 2: [148.  85. 183.  89. 137. 116.]
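Column selection uses the same tools as row selection, placed after the comma. A short sketch on the same six rows (typed in by hand; the names are illustrative): a list index such as [0] keeps the column axis, giving shape (6, 1) instead of (6,), which can matter when a function expects a 2-D feature matrix.

```python
import numpy as np

# Same 6x3 matrix (Age, BMI, Glucose) as above, typed in by hand
features = np.array([
    [50, 33.6, 148],
    [31, 26.6, 85],
    [32, 23.3, 183],
    [21, 28.1, 89],
    [33, 43.1, 137],
    [30, 25.6, 116],
])

age = features[:, 0]                   # 1-D result, shape (6,)
age_2d = features[:, [0]]              # list index keeps the column axis: (6, 1)
age_and_glucose = features[:, [0, 2]]  # pick non-adjacent columns
bmi_and_glucose = features[:, 1:]      # slice: columns 1 and 2
glucose = features[:, -1]              # last column

print(age.shape, age_2d.shape, age_and_glucose.shape)  # (6,) (6, 1) (6, 2)

# A column is also a row of the transposed matrix
print(np.array_equal(age, features.T[0]))  # True
```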


### Indexing into Rows and Columns
The n-dimensional array's square bracket **array index operator** can also access a single cell: supply an integer index (rather than a special value like ":") in both the row and column positions.

In [6]:
print("Type of Features: {}".format(type(example_features)))
print("Shape of Features: {}".format(example_features.shape))
num_rows = example_features.shape[0]
num_columns = example_features.shape[1]
row_size = num_columns
for i in range(0,num_rows):
    for j in range(0,num_columns):
        cell_index = i*row_size + j # Position of the cell if the rows were laid end to end (row-major order)
        cell_data = example_features[i,j] # The item in row i, column j
        print ("Cell {}, [row {}, col {}]: {}".format(cell_index, i, j, cell_data))

Type of Features: <class 'numpy.ndarray'>
Shape of Features: (6, 3)
Cell 0, [row 0, col 0]: 50.0
Cell 1, [row 0, col 1]: 33.6
Cell 2, [row 0, col 2]: 148.0
Cell 3, [row 1, col 0]: 31.0
Cell 4, [row 1, col 1]: 26.6
Cell 5, [row 1, col 2]: 85.0
Cell 6, [row 2, col 0]: 32.0
Cell 7, [row 2, col 1]: 23.3
Cell 8, [row 2, col 2]: 183.0
Cell 9, [row 3, col 0]: 21.0
Cell 10, [row 3, col 1]: 28.1
Cell 11, [row 3, col 2]: 89.0
Cell 12, [row 4, col 0]: 33.0
Cell 13, [row 4, col 1]: 43.1
Cell 14, [row 4, col 2]: 137.0
Cell 15, [row 5, col 0]: 30.0
Cell 16, [row 5, col 1]: 25.6
Cell 17, [row 5, col 2]: 116.0
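The formula cell_index = i*row_size + j is row-major (C-order) flattening, which is NumPy's default layout. A sketch, assuming the same six rows typed in by hand, that cross-checks the hand-computed index against `np.ravel_multi_index`, `np.unravel_index` and `ravel()`:

```python
import numpy as np

# Same 6x3 matrix (Age, BMI, Glucose) as above, typed in by hand
features = np.array([
    [50, 33.6, 148],
    [31, 26.6, 85],
    [32, 23.3, 183],
    [21, 28.1, 89],
    [33, 43.1, 137],
    [30, 25.6, 116],
])
num_rows, num_columns = features.shape
row_size = num_columns

flat = features.ravel()  # 1-D, row-major (C order) sequence of all 18 cells

for i in range(num_rows):
    for j in range(num_columns):
        cell_index = i * row_size + j
        # The hand-computed index agrees with NumPy's own helpers
        assert cell_index == np.ravel_multi_index((i, j), features.shape)
        assert tuple(np.unravel_index(cell_index, features.shape)) == (i, j)
        assert flat[cell_index] == features[i, j]
        # Chained indexing reaches the same cell via an intermediate row
        assert features[i][j] == features[i, j]

row, col = np.unravel_index(7, features.shape)
print(row, col)  # 2 1  ("Cell 7" above is row 2, col 1)
```

Prefer features[i, j] over features[i][j]: both read the same value, but the chained form builds an intermediate row first.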


### Array Index Operator Assignment
In addition to **getting** values using the array index operator like we did above, you can also use this operator for **setting** values.

In [7]:
# Make a fresh copy since we will be modifying the data
example_features = example_features_original.copy()

print("Type of Features: {}".format(type(example_features)))
print("Shape of Features: {}".format(example_features.shape))
num_rows = example_features.shape[0]
num_columns = example_features.shape[1]
row_size = num_columns # Each row has a value per column
col_size = num_rows # Each column has a value per row

# Zero out a row
row_0_original = example_features[0] # Basic indexing returns a view, not a copy, so this is printed before the assignment
all_zeros = np.zeros(row_size)
print("*** Assigning to rows ***")
print("Original row 0: {}, shape: {}".format(row_0_original, row_0_original.shape))
print("All zeros: {}, shape: {}".format(all_zeros, all_zeros.shape))
example_features[0] = all_zeros # HERE IS THE ASSIGNMENT
row_0_modified = example_features[0]
print("Modified row 0: {}, shape: {}".format(row_0_modified, row_0_modified.shape))

# Zero out a column (row 0 was already zeroed above, so column 0 now starts with 0)
col_0_original = example_features[:,0] # Also a view into example_features
all_zeros = np.zeros(col_size)
print("*** Assigning to columns ***")
print("Original col 0: {}, shape: {}".format(col_0_original, col_0_original.shape))
print("All zeros: {}, shape: {}".format(all_zeros, all_zeros.shape))
example_features[:,0] = all_zeros # HERE IS THE ASSIGNMENT
col_0_modified = example_features[:,0]
print("Modified col 0: {}, shape: {}".format(col_0_modified, col_0_modified.shape))

# Zero out a cell
cell_1_1_original = example_features[1,1]
print("*** Assigning to cells ***")
print("Original row 1 col 1: {}".format(cell_1_1_original))
example_features[1,1] = 0 # HERE IS THE ASSIGNMENT
cell_1_1_modified = example_features[1,1]
print("Modified row 1 col 1: {}".format(cell_1_1_modified))

Type of Features: <class 'numpy.ndarray'>
Shape of Features: (6, 3)
*** Assigning to rows ***
Original row 0: [ 50.   33.6 148. ], shape: (3,)
All zeros: [0. 0. 0.], shape: (3,)
Modified row 0: [0. 0. 0.], shape: (3,)
*** Assigning to columns ***
Original col 0: [ 0. 31. 32. 21. 33. 30.], shape: (6,)
All zeros: [0. 0. 0. 0. 0. 0.], shape: (6,)
Modified col 0: [0. 0. 0. 0. 0. 0.], shape: (6,)
*** Assigning to cells ***
Original row 1 col 1: 26.6
Modified row 1 col 1: 0.0
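Two details explain the output above. First, the right-hand side does not have to be a full array: a scalar is broadcast to every selected cell. Second, basic indexing with integers and slices returns a *view* that shares memory with the original array, so row_0_original is not a snapshot. A sketch on the same six rows (typed in by hand; `zeroed`, `row_view` and `row_saved` are illustrative names):

```python
import numpy as np

# Same 6x3 matrix (Age, BMI, Glucose) as above, typed in by hand
original = np.array([
    [50, 33.6, 148],
    [31, 26.6, 85],
    [32, 23.3, 183],
    [21, 28.1, 89],
    [33, 43.1, 137],
    [30, 25.6, 116],
])

# 1) A scalar on the right-hand side is broadcast to every selected cell
zeroed = original.copy()
zeroed[0] = 0       # all of row 0
zeroed[:, 0] = 0    # all of column 0
zeroed[1, 1] = 0    # one cell

# 2) Integer/slice indexing returns a view that shares memory with the array
features = original.copy()
row_view = features[0]
print(np.shares_memory(row_view, features))  # True
features[0] = 0
print(row_view)  # [0. 0. 0.]  the "saved" row changed along with the array

# Call .copy() to keep the old values
features = original.copy()
row_saved = features[0].copy()
features[0] = 0
print(row_saved)  # still the original row 0 values (50, 33.6, 148)

# 3) The right-hand side must broadcast to the shape of the selection
caught = False
try:
    features[0] = np.zeros(6)  # row 0 has 3 cells, not 6
except ValueError as err:
    caught = True
    print("ValueError:", err)
```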
