correlation_ratio produces key error in flatten to cat_measures #3

@tpfd

Hi there,

Thanks for this handy set of tools and the excellent article you posted on towardsdatascience.com.

I have just been playing with the nominal tools and found that correlation_ratio throws an error when I pass it my array:

KeyError: '[ 0 ... 76] not in index'

Digging into the function, it seems that the error is raised at this line:
cat_measures = measurements[np.argwhere(fcat == i).flatten()]

I have double-checked the array and tested the function with a range of data structures, and the same error is returned. (The array consists of 8 continuous variables and 1 categorical variable.) All indexes appear to be the same and carry through the earlier steps of the function as you would expect.

I don't suppose you have any insights into why this might be occurring?

Full example of test:

df_dython = df_prep.drop(['Bedrock value'], axis = 1)
cols = df_dython.columns[0:9]

def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat)+1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0,cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array,n_array))/np.sum(n_array)
    numerator = np.sum(np.multiply(n_array,np.power(np.subtract(y_avg_array,y_total_avg),2)))
    denominator = np.sum(np.power(np.subtract(measurements,y_total_avg),2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = numerator/denominator
    return eta

correlation_ratio(df_dython['Bedrock'], df_dython[cols])
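
One possible explanation, offered as a sketch rather than a confirmed diagnosis: `df_dython[cols]` is a multi-column DataFrame, and indexing a DataFrame with an integer array (which is what `np.argwhere(...).flatten()` yields) looks up column labels rather than row positions, which may be what produces the "not in index" KeyError. The example below uses a small made-up DataFrame (the values and the column names `x1`, `x2` are illustrative only) and a hypothetical helper `correlation_ratio_1d` that converts one column at a time to a NumPy array before indexing. It keeps the same `numerator/denominator` definition as the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; values and column names are illustrative only.
df = pd.DataFrame(
    {
        "Bedrock": ["a", "b", "a", "b", "c", "c"],
        "x1": [1.0, 2.0, 1.5, 2.5, 9.0, 9.5],
        "x2": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
    }
)

# Positional indices, like those produced by np.argwhere(fcat == i).flatten().
positions = np.array([0, 2])

# Indexing a DataFrame with an integer array is a column-label lookup,
# so it fails when the columns are named "x1", "x2", ...
try:
    df[["x1", "x2"]][positions]
    raised = False
except KeyError:
    raised = True


def correlation_ratio_1d(categories, measurements):
    """Correlation ratio for one categorical and one continuous 1-D input."""
    fcat, _ = pd.factorize(categories)
    # Convert to a plain NumPy array so indexing is purely positional.
    values = np.asarray(measurements, dtype=float)
    if values.ndim != 1:
        raise ValueError("measurements must be one-dimensional")
    total_mean = values.mean()
    numerator = 0.0
    for i in range(fcat.max() + 1):
        group = values[fcat == i]  # boolean mask selects by position
        numerator += len(group) * (group.mean() - total_mean) ** 2
    denominator = ((values - total_mean) ** 2).sum()
    return 0.0 if numerator == 0 else numerator / denominator


# Apply the helper to each continuous column separately.
etas = {col: correlation_ratio_1d(df["Bedrock"], df[col]) for col in ["x1", "x2"]}
```

If this is indeed the cause, calling the function once per continuous column (or passing `df_dython[col].values`) should avoid the label lookup entirely.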
