How to adapt the transformation function to account for variable sequence length? #12
Comments
Hi @grzechowiak, Regarding your rolling window setup, TimeSHAP does not currently implement anything to accommodate that directly. The way to emulate your desired behavior is to divide a sequence into the respective sequences you want to explain. Considering your example, the sequence with ID 1 needs to be divided into 4 sequences to be individually explained: Row IDs: Regarding the issue with Finally, from what I can understand, TimeSHAP can work with your described use case, as TimeSHAP explains each sequence individually and can work with any sequence length. To help you with the
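The splitting described above can be sketched as follows. This is a minimal illustration, not TimeSHAP code: the helper name `split_into_windows`, the three mock features, and the choice to return each sub-sequence as a batch of one with shape `(1, window, features)` are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_into_windows(sequence: np.ndarray, window: int) -> list[np.ndarray]:
    """Split one (timesteps, features) sequence into rolling sub-sequences.

    Hypothetical helper: each returned array has shape (1, window, features),
    i.e. a batch holding a single sequence that can be explained on its own.
    """
    # sliding_window_view puts the window axis last: (n_windows, features, window)
    views = sliding_window_view(sequence, window_shape=window, axis=0)
    # Reorder to (n_windows, window, features)
    views = views.transpose(0, 2, 1)
    # Drop the final window: no row follows it, so there is nothing to predict
    views = views[:-1]
    # Copy so each sub-sequence is an independent, writable array
    return [w[np.newaxis, ...].copy() for w in views]


# Sequence ID 1 from the example: 10 rows, 3 mock features
sequence_1 = np.arange(30, dtype=float).reshape(10, 3)
sub_sequences = split_into_windows(sequence_1, window=6)
print(len(sub_sequences), sub_sequences[0].shape)  # 4 (1, 6, 3)
```

With a window of 6 and 10 rows, this yields the 4 sub-sequences mentioned above (rows 1-6, 2-7, 3-8 and 4-9), each of which could then be passed to the explanation step separately.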
Hi @JoaoPBSousa, I have created some mock data as well as a simple LSTM model, which is available on my GitHub here: link. There are two files: first Python notebook How can I adapt the transformation function / TimeSHAP in order to make it work on our data and a rolling window setup?
Hi @grzechowiak, I looked at your repo and the only thing I could find is that the In order to fix this issue there are two options depending on your use case:
Note: TimeSHAP (and SHAP) is designed to explain the difference between a baseline score and the score of the instance being explained. I noted that all the (mock) sequence scores are really low (max Hope this answer was helpful. If you have any further questions, don't hesitate to contact us.
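To make the note about baseline and instance scores concrete, here is a small sketch of the quantity being explained. Everything in it is an assumption for illustration: `toy_score` is a made-up stand-in for a model, the data is random, and the per-feature mean is used as a simple stand-in for an average event. Shapley-style attributions sum to the gap between the two scores, so when every score sits close to the baseline there is very little to attribute.

```python
import numpy as np


def toy_score(batch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a model: (n, timesteps, features) -> n scores in (0, 1)."""
    logits = batch.mean(axis=(1, 2)) - 3.0  # the offset keeps all scores low, as in the mock data
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
background = rng.normal(size=(100, 6, 2))  # mock training sequences
instance = rng.normal(size=(1, 6, 2))      # the sequence being explained

# Baseline: a sequence in which every event is the average event of the background data
average_event = background.mean(axis=(0, 1))                   # shape (n_features,)
baseline = np.tile(average_event, (1, instance.shape[1], 1))   # shape (1, 6, 2)

baseline_score = toy_score(baseline)[0]
instance_score = toy_score(instance)[0]
gap = instance_score - baseline_score  # the total that the attributions have to add up to

print(f"baseline={baseline_score:.4f} instance={instance_score:.4f} gap={gap:+.4f}")
```

If the printed gap is close to zero for every sequence, the individual event and feature contributions will be correspondingly small and hard to interpret.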
Closed this issue due to inactivity. If you have any further questions feel free to re-open the issue or create a new one.
I am trying to use TimeSHAP on my use case. As I understand it, in the AReM example the data is transformed with the `df_to_numpy` function so that a prediction is made for the last value of each sequence (see the screenshot below):
In the AReM tutorial data, the predictions are based on the whole sequence: all rows (Row IDs 1-10) are used for sequence ID 1 (light blue) and the prediction is made for Timestamp 10 (dark blue; Row ID 10). Next, the light orange rows (Row IDs 11-20) are used to predict the label marked in dark orange (Row ID 20).
In my use case, the model predicts on a rolling-window basis and I need predictions for every row (not only one per sequence). See the screenshot and explanation below.
Let's say my rolling window is 6: Row IDs 1-6 (light green) are used to predict Row ID 7 (dark green), then Row IDs 2-7 (light grey) are used to predict Row ID 8 (dark grey), etc. When a new sequence starts, we repeat the process, so we take Row IDs 11-16 and predict Row ID 17, etc. For my use case, it's important to evaluate the predictions for every Row ID, not only for the whole sequence.
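The rolling-window layout described above can be sketched like this. It is only an illustration of the setup, not the transformation function from the issue: the helper name `rolling_windows_by_sequence`, the two mock features, and the assumption that rows are already ordered by time within each sequence are all made up for the example.

```python
import numpy as np


def rolling_windows_by_sequence(features, sequence_ids, row_ids, window):
    """Build rolling windows that never cross a sequence boundary (hypothetical helper).

    Returns (windows, target_row_ids): windows has shape
    (n_windows, window, n_features) and target_row_ids[i] is the row
    that window i is used to predict.
    """
    windows, targets = [], []
    for seq_id in np.unique(sequence_ids):
        idx = np.flatnonzero(sequence_ids == seq_id)  # rows of this sequence, in order
        # A window starting at position s covers rows s..s+window-1 and predicts row s+window
        for start in range(len(idx) - window):
            windows.append(features[idx[start:start + window]])
            targets.append(row_ids[idx[start + window]])
    n_features = features.shape[1]
    stacked = np.stack(windows) if windows else np.empty((0, window, n_features))
    return stacked, np.array(targets)


# Mock data: two sequences of 10 rows each (Row IDs 1-20), 2 mock features
row_ids = np.arange(1, 21)
sequence_ids = np.repeat([1, 2], 10)
features = np.random.default_rng(0).normal(size=(20, 2))

windows, targets = rolling_windows_by_sequence(features, sequence_ids, row_ids, window=6)
print(windows.shape)     # (8, 6, 2)
print(targets.tolist())  # [7, 8, 9, 10, 17, 18, 19, 20]
```

Each sequence of 10 rows contributes 4 windows, and the first window of the second sequence starts at Row ID 11 rather than continuing from the first sequence.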
The problem I am facing is that when I run the function `get_avg_score_with_avg_event` on the data defined as in the picture above, I get the following error:
The way my data is transformed from 2D into 3D format is defined by the function below:
My question is whether it's possible to make TimeSHAP work with data that is transformed in the way described in my use case. When I use the transformation defined in your `df_to_numpy` function, I do not get an error; however, it is not adapted to my use case.