How can I estimate the parameters when there are missing values? #25

Goodelang · 2020-03-16T09:58:37Z

How can I estimate the parameters when there are missing values? For example:
[[True, Null, True, False],
[Null, Null, Null, True]]
Will future versions support data structured as [[user, item, resp], [...], ...]?
Thanks

eribean · 2020-03-16T14:56:29Z

There isn't currently a way to estimate parameters with missing values. It is definitely in the pipeline of features to include. The main push right now is to bring up core functionality with respect to model estimation (polytomous, multi-dimensional, unfolding). I will begin to look at how I would include this and am open to suggestions. Thank you for letting me know.

The main interface is a numpy array and many formats can be converted into an array. I don't plan on supporting any types of alternative file formats unless there is a great demand. I specifically decided against pandas as it is too messy in my opinion.

Your list of lists can be converted into an array in many ways. Below is one way to do it; there are better alternatives as well.

```python
import numpy as np

# List of lists converted to a numpy array
# Each entry is [ID, Item#, Response]
alt_format = [[0, 0, 1], [0, 1, 0], [0, 2, 1],
              [5, 1, 0], [5, 0, 1], [5, 2, 0]]

# Make temporary array from list
unformatted_array = np.asarray(alt_format)

# Get the unique participant ids
participant_id = np.unique(unformatted_array[:, 0])

# Get the number of items
n_items = np.unique(unformatted_array[:, 1]).size

# Create placeholder for formatted data (items in rows, participant ids in columns)
formatted_array = np.zeros((n_items, participant_id.max()+1))
formatted_array[unformatted_array[:, 1], unformatted_array[:, 0]] = unformatted_array[:, 2]

# Keep only the columns that belong to valid participant ids
formatted_array = formatted_array[:, participant_id]

print(formatted_array)

# Output:
# [[1. 1.]
#  [0. 0.]
#  [1. 0.]]
```

eribean · 2020-03-18T13:14:17Z

For dichotomous parameter estimation, replacing null values with 0 in the kernel estimation ought to make optimization invariant to missing values. This will only work for joint and marginal likelihood and not conditional likelihood.

This will essentially ignore the missing values and does not try to impute them, treating them as MCAR (missing completely at random). This should be sufficient for now, and I will open two more issues to address imputation procedures as well as polytomous data.
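As a rough illustration of the idea above (not the actual girth implementation), here is a minimal sketch of a 2PL joint log-likelihood in which missing entries contribute nothing to the sum. The function name `twopl_log_likelihood` and the items-by-participants layout are assumptions made for this example.

```python
import numpy as np

def twopl_log_likelihood(responses, theta, discrimination, difficulty):
    """Joint 2PL log-likelihood that ignores missing (NaN) responses.

    responses: (n_items, n_participants) array holding 1.0, 0.0, or np.nan.
    This is a hypothetical helper, not part of girth.
    """
    # Probability of a correct response for every item / participant pair
    logits = discrimination[:, None] * (theta[None, :] - difficulty[:, None])
    prob = 1.0 / (1.0 + np.exp(-logits))

    observed = ~np.isnan(responses)
    # Replace NaN with 0 so the arithmetic below stays finite
    filled = np.where(observed, responses, 0.0)

    log_terms = filled * np.log(prob) + (1.0 - filled) * np.log(1.0 - prob)
    # Missing entries are zeroed out and drop out of the sum
    return np.sum(np.where(observed, log_terms, 0.0))

responses = np.array([[1.0, np.nan, 0.0],
                      [np.nan, 1.0, 1.0]])
theta = np.array([-1.0, 0.0, 1.0])
discrimination = np.array([1.0, 1.5])
difficulty = np.array([0.0, 0.5])

ll = twopl_log_likelihood(responses, theta, discrimination, difficulty)
print(ll)
```

Because the missing cells add exactly zero, the optimizer only ever sees the observed responses, which is what treating the data as MCAR amounts to here.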

In progress ...

Goodelang · 2020-03-18T13:34:30Z

It might work well to code a correct response as 1, an incorrect response as -1, and null as 0, and then define the log-likelihood as log1p((2*p-1)*resp). This comes from github/aimir, who does it only for JMLE. I am working on a pair-wise incremental parameter estimation method using online learning. Hope that helps.
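A small sketch of this signed coding, assuming `p` is the model probability of a correct response. The function name `signed_log_likelihood` is made up for this example. For an observed response the term equals the usual Bernoulli log-likelihood plus a constant log(2), and for a missing response it is exactly log1p(0) = 0.

```python
import numpy as np

def signed_log_likelihood(signed_resp, prob):
    """Sum of log1p((2*p - 1) * resp) with resp in {+1, -1, 0}.

    +1 = correct, -1 = incorrect, 0 = missing (hypothetical helper).
    """
    return np.sum(np.log1p((2.0 * prob - 1.0) * signed_resp))

prob = np.array([0.8, 0.3, 0.6])
signed_resp = np.array([1, -1, 0])  # correct, incorrect, missing

value = signed_log_likelihood(signed_resp, prob)

# Standard Bernoulli log-likelihood over the two observed responses
standard = np.log(0.8) + np.log(1.0 - 0.3)

# The difference is log(2) per observed response, a constant
# that does not change where the maximum is
print(value - standard, 2 * np.log(2))
```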


eribean · 2020-03-19T14:41:39Z

> I do something for pair-wise Incremental parameter estimation method by online learning.

Online IRT parameter estimation is an interesting problem. The separate parameter estimations I have for the unidimensional model would fit nicely into this paradigm: keep track of the ratio of true / false for each item and a running integral. This would make it easy to do constant updates, so I might try to put something together. I also now understand why you wanted the input format in the original question, [ID, item, response]: this is how an online update would occur.
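To make the bookkeeping part concrete, here is a minimal sketch of a running per-item tally fed one [ID, item, response] triple at a time. The class `RunningItemTally` is hypothetical and covers only the true / false ratio; the running integral mentioned above is not attempted here.

```python
from collections import defaultdict

class RunningItemTally:
    """Running count of correct and total responses per item (hypothetical)."""

    def __init__(self):
        self.correct = defaultdict(int)
        self.total = defaultdict(int)

    def update(self, participant_id, item, response):
        # participant_id is not used by the tally; it is kept so the
        # method signature mirrors the [ID, item, response] triple
        self.total[item] += 1
        self.correct[item] += int(bool(response))

    def proportion_correct(self, item):
        # NaN for an item that has not been seen yet
        seen = self.total.get(item, 0)
        if seen == 0:
            return float("nan")
        return self.correct.get(item, 0) / seen

tally = RunningItemTally()
stream = [(0, 0, 1), (0, 1, 0), (5, 0, 1), (5, 1, 0), (7, 0, 0)]
for triple in stream:
    tally.update(*triple)

print(tally.proportion_correct(0))  # 2 correct out of 3 responses
print(tally.proportion_correct(1))  # 0 correct out of 2 responses
```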

eribean · 2020-03-19T14:45:00Z

Refactored the mml / jml code base to account for missing values.

A missing value is represented with NaN, which is available in numpy as numpy.nan.

Writing unit tests now and will create a pull request shortly.
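For the two input shapes raised in the original question, a minimal sketch of getting to a NaN-coded float array might look like the following. It assumes missing entries are written as `None` in the nested list, and that in the [ID, item, response] format any pair that never appears is missing. The variable names are only for illustration.

```python
import numpy as np

# Nested list with None for missing entries; with dtype=float,
# None becomes nan and True / False become 1.0 / 0.0
nested = [[True, None, True, False],
          [None, None, None, True]]
wide = np.array(nested, dtype=float)

# [ID, item, response] triples; participant 5 never answered item 2
triples = np.array([[0, 0, 1], [0, 1, 0], [0, 2, 1],
                    [5, 1, 0], [5, 0, 1]])

# Map arbitrary ids onto consecutive column / row indices
participant_ids, col = np.unique(triples[:, 0], return_inverse=True)
item_ids, row = np.unique(triples[:, 1], return_inverse=True)

# Start from all-missing and fill in the observed responses
data = np.full((item_ids.size, participant_ids.size), np.nan)
data[row, col] = triples[:, 2]

print(wide)
print(data)
```

Starting from an all-NaN array rather than zeros keeps unobserved pairs distinct from incorrect responses.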

eribean · 2020-03-19T15:43:28Z

Unidimensional missing data example:

```python
import numpy as np

from girth import twopl_separate
from girth import create_synthetic_irt_dichotomous

# Create Synthetic Data
np.random.seed(42)
difficulty = np.linspace(-2, 2, 10)
discrimination = 0.5 + np.random.rand(10) * 2
theta = np.random.randn(400)

syn_data = create_synthetic_irt_dichotomous(difficulty, discrimination, theta)
# Cast to float so the array can hold nan
syn_data = syn_data.astype('float')

# Add nans (missing values) to roughly 12.5% of the entries
mask = np.random.rand(*syn_data.shape) < 0.125

syn_data_orig = syn_data.copy()
syn_data[mask] = np.nan

# Estimate parameters with and without missing values
a, b = twopl_separate(syn_data)
ao, bo = twopl_separate(syn_data_orig)

print("Discrimination Estimation")
print("RMSE: Missing | Full")
print(np.sqrt(np.square(a - discrimination).mean()).round(3), np.sqrt(np.square(ao - discrimination).mean()).round(3))
print("\n")
print("Difficulty Estimation")
print("RMSE: Missing | Full")
print(np.sqrt(np.square(b - difficulty).mean()).round(3), np.sqrt(np.square(bo - difficulty).mean()).round(3))
```

TejaswiniiB · 2022-07-25T06:51:27Z

Hi @eribean,
Would the feature you added handle missing values in all models listed under the Unidimensional models section, such as grm_mml, twopl_mml, etc.?

eribean added the enhancement New feature or request label Mar 16, 2020

eribean self-assigned this Mar 18, 2020

eribean linked a pull request Mar 19, 2020 that will close this issue

Missing values #28

Merged

eribean closed this as completed in #28 Mar 19, 2020
