Description
Hi there,
Thanks for this handy set of tools and the excellent article you posted on towardsdatascience.com.
I have just been playing with the nominal tools and found that correlation_ratio throws an error when I pass it my array.
KeyError: '[ 0 ... 76] not in index'
Digging into the function, it seems that the error is raised at this line:
cat_measures = measurements[np.argwhere(fcat == i).flatten()]
I have double-checked the array and tested the function with a range of data structures, and the same error is returned each time (the array consists of 8 continuous variables and 1 categorical). All indexes appear to be consistent and carry through the earlier steps of the function as you would expect.
I don't suppose you have any insights into why this might be occurring?
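One mechanism that could produce this kind of error, sketched on made-up data (the frame `df`, its column names, and the helper `raises_key_error` are hypothetical and not taken from my dataset): indexing a pandas DataFrame with square brackets and an integer array looks the integers up as column labels rather than row positions, so it raises a KeyError when the columns are named by strings. I have not confirmed that this is what happens inside correlation_ratio.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, standing in for the real DataFrame.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 6.0, 7.0, 8.0]})
positions = np.array([0, 2])  # row positions, like np.argwhere(...).flatten()


def raises_key_error(obj, key):
    """Return True if obj[key] raises a KeyError (illustrative helper)."""
    try:
        obj[key]
    except KeyError:
        return True
    return False


# df[positions] treats 0 and 2 as column labels, which do not exist here.
frame_fails = raises_key_error(df, positions)

# Explicit positional selection avoids the label lookup.
rows_by_position = df.iloc[positions]
array_rows = df.to_numpy()[positions]

print(frame_fails)
print(array_rows)
```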
Full example of test:
import numpy as np
import pandas as pd

df_dython = df_prep.drop(['Bedrock value'], axis=1)
cols = df_dython.columns[0:9]

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat) + 1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0, cat_num):
        # This is the line that raises the KeyError
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array, n_array)) / np.sum(n_array)
    numerator = np.sum(np.multiply(n_array, np.power(np.subtract(y_avg_array, y_total_avg), 2)))
    denominator = np.sum(np.power(np.subtract(measurements, y_total_avg), 2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = numerator / denominator
    return eta

correlation_ratio(df_dython['Bedrock'], df_dython[cols])
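A possible workaround, sketched under the assumption that the function is meant to receive one numeric column at a time (my assumption, not something the library documentation confirms): convert both inputs to plain NumPy arrays before indexing, and loop over the columns. The name `correlation_ratio_np` and the `toy` frame are hypothetical and only for illustration. The return value follows the function quoted above, so no square root is taken.

```python
import numpy as np
import pandas as pd


def correlation_ratio_np(categories, measurements):
    """Same computation as the function above, on NumPy arrays (hypothetical helper)."""
    fcat, _ = pd.factorize(np.asarray(categories))
    values = np.asarray(measurements, dtype=float)
    cat_num = np.max(fcat) + 1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(cat_num):
        # Boolean mask on a NumPy array is always positional.
        cat_measures = values[fcat == i]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(y_avg_array * n_array) / np.sum(n_array)
    numerator = np.sum(n_array * (y_avg_array - y_total_avg) ** 2)
    denominator = np.sum((values - y_total_avg) ** 2)
    return 0.0 if numerator == 0 else numerator / denominator


# Made-up data with a non-default index, to show that row labels no longer matter.
toy = pd.DataFrame(
    {
        "group": ["x", "x", "y", "y"],
        "v1": [1.0, 1.0, 3.0, 3.0],  # fully determined by the group
        "v2": [1.0, 2.0, 1.0, 2.0],  # same mean in both groups
    },
    index=[10, 20, 30, 40],
)

# One call per numeric column instead of passing the whole DataFrame.
etas = {col: correlation_ratio_np(toy["group"], toy[col]) for col in ["v1", "v2"]}
print(etas)
```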