Skip to content

Conversation

@marcus1487
Copy link

@marcus1487 marcus1487 commented Oct 12, 2023

In current master when, this line is hit it raises an error reporting: AttributeError: module 'array' has no attribute 'typecode'. This is due to an apparent bug in the formatting of the intended error text. In order to report the valid error text this should be resolved. I believe the intent here is to report the typecode ordinal value (with the character provided earlier), but the intent may have been to report the value.typecode in this error text. I can revert that in this PR if that is indeed the intent.

@bioinformed bioinformed self-requested a review October 12, 2023 16:44
@bioinformed
Copy link
Member

Thanks! I think your fix is the appropriate one.

@marcus1487
Copy link
Author

Digging a bit deeper I think this is obfuscating a deeper bug in this function. It seems that when the valuetype is passed for an array type (B) here, that the typecode variable is now overloaded to indicate the type of the tag (int, string, array) as well as the type of the underlying array in this block. I think a second variable should be introduced for the array type so that the typecode can be reserved for the tag type and not the "sub-type" of the array. I can open a new PR for this or add this fix in here if you prefer.

@marcus1487
Copy link
Author

I am relatively certain that this is a bug now. It looks like the overloading of the typecode variable is intentional from the definition of DATATYPE2FORMAT so it seems that the intended behavior for this check is to ensure either the valuetype was not set (typecode == 0) or that it was set to array type (typecode == ord("B")). An error should be raised in the case that a value of type array.array was passed with a valuetype other than B. Then the array type should be determined. Happy to add this onto this PR to keep things simple.

@jmarshall
Copy link
Member

jmarshall commented Oct 12, 2023

I have been looking into this in the context of #1233. The correct fix to the error message is to change that to value.typecode and I have further improvements to the error message also drafted.

The function is unnecessarily incomprehensible because “typecode” is used to mean two different things: the Python array type code and the SAM/BAM aux type code. So the variables representing the SAM thing need to be renamed to auxtype or so.

Your last comment has been misled by this: B here means unsigned char (i.e. C in SAM-speak), not B-aux-array. So restricting the types of Python array passed in would not be correct.

@marcus1487
Copy link
Author

marcus1487 commented Oct 12, 2023

Your last comment has been misled by this: B here means unsigned char (i.e. C in SAM-speak), not B-aux-array. So restricting the types of Python array passed in would not be correct.

I think we are saying the same thing here (made confusing by the overloaded meaning put on the typecode variable). I was saying that since the typecode is overloaded when it enters this block it holds the value b'B' representing the aux-array type of the tag. Then after the check the meaning of the value is updated to the type of the aux array, but when the b'B' value is passed in this update to the variable does not happen.

In any case if you are working on this method and it will be fixed in the next release I will leave it to you. If you'd like for me to have a go at it I'd be more than happy to do so.

I guess google has not scraped issue #1233 as I did not get this hit when starting to look into this issue.

jmarshall added a commit to jmarshall/pysam that referenced this pull request Oct 11, 2024
There is no `array.typecode` class attribute, so the `array.array`
arm produced an AttributeError instead of the intended ValueError.
Perhaps `value.typecode` was intended; but this error only arises
when a `value_type` has been specified, so the array's typecode
has no influence on `typecode` anyway. So we simplify the message.

Also add a similar check in the scalar arm of this if-else to avoid
an invalid `typecode` here being reported as a raw KeyError.

Fixes pysam-developers#1233. Closes pysam-developers#1235 -- hat tip @marcus1487 for issue analysis.

[TODO] All this code is confusing and should be rewritten using
HTSlib's more recent aux field manipulation APIs.
@jmarshall
Copy link
Member

The correct fix to the error message is to change that to value.typecode and I have further improvements to the error message also drafted.

Revisiting this again now, I think your version is closer to the right answer. This error only arises when a non-None value_type has been specified, so value.typecode has no direct influence on the value of typecode anyway. So there's not much point printing it out. But the ordinal value of the user's specified value_type is not much use either, so I've just simplified the message.

However I also think it's intentional and reasonable that the value_type parameter is the SAM auxtype for scalar fields but the subtype for B-array fields. So I don't think there are bugs here — just a somewhat awkward API interface and a very confusing implementation. Long-term the thing to do is replace all this code by using HTSlib's more recent aux field manipulation APIs instead.

The “further improvements” I had planned were to raise a bespoke exception that would track the read's QNAME etc so that you could identify where problems occurred more easily. But that can be deferred to another time.

@jmarshall jmarshall closed this in 7f83184 Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants