Add region, allow empty regions, complete coordinate systems, GenomicPosition extends Position #34
I've a few minor points, but I'll address these.
}

default int distanceTo(GenomicRegion other) {
    if (contigId() != other.contigId()) {
    if (contigId() != other.contigId())
Add missing braces
@@ -174,7 +142,10 @@ default GenomicRegion toRegion(int upstream, int downstream) {
    } else if (upstream >= downstream) {
        throw new IllegalArgumentException("Cannot apply negative padding: " + upstream + ", " + downstream);
    }
    return DefaultGenomicRegion.of(contig(), strand(), CoordinateSystem.ZERO_BASED, position().shift(upstream).asPrecise(), position().shift(downstream).asPrecise());
Why not just keep this as it was, i.e. position.shift(upstream), position.shift(downstream)?
I thought that keeping the confidence interval might be unnecessary overhead.
@@ -131,7 +98,8 @@ default boolean isDownstreamOf(GenomicRegion other) {
     * @return the region
     */
    default GenomicRegion toRegion() {
        return DefaultGenomicRegion.of(contig(), strand(), CoordinateSystem.ZERO_BASED, position().asPrecise(), position().shift(1).asPrecise());
        return DefaultGenomicRegion.of(contig(), strand(), CoordinateSystem.ZERO_BASED,
                Position.of(pos() - 1), Position.of(pos()));
why not use the shift method as before?
}

default int distanceTo(Region region) {
    region = region.toOneBased();
don't re-assign this - use a new oneBased variable
default boolean overlapsWith(Region other) {
    other = other.withCoordinateSystem(coordinateSystem());
Don't re-assign input
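The two comments above about not re-assigning a parameter could be addressed roughly as follows. This is only a minimal sketch: SimpleRegion, its zeroBased flag, and its fields are hypothetical stand-ins for illustration, not the project's Region API.

```java
// Hypothetical stand-in for a region; not the project's actual Region interface.
final class SimpleRegion {
    final int start;
    final int end;
    // true: zero-based, half-open [start, end); false: one-based, fully closed [start, end]
    final boolean zeroBased;

    SimpleRegion(int start, int end, boolean zeroBased) {
        this.start = start;
        this.end = end;
        this.zeroBased = zeroBased;
    }

    // Converting zero-based half-open to one-based closed shifts only the start.
    SimpleRegion toOneBased() {
        return zeroBased ? new SimpleRegion(start + 1, end, false) : this;
    }

    // The parameter is declared final, so the compiler rejects any re-assignment.
    boolean overlapsWith(final SimpleRegion other) {
        // Normalised copies live in new local variables; the inputs stay untouched.
        SimpleRegion thisOneBased = this.toOneBased();
        SimpleRegion otherOneBased = other.toOneBased();
        return thisOneBased.start <= otherOneBased.end
                && otherOneBased.start <= thisOneBased.end;
    }

    public static void main(String[] args) {
        SimpleRegion a = new SimpleRegion(0, 10, true);   // bases 1..10
        SimpleRegion b = new SimpleRegion(10, 12, false); // bases 10..12
        System.out.println(a.overlapsWith(b));
    }
}
```

Keeping the normalised value in a separately named local makes it obvious at each use which coordinate system a variable is in.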
"contig=" + contig.id() +
", id='" + id +
", position=" + position +
", strand=" + strand + '\'' +
Keep original - it's not helpful having the super.toString()
@Override
public Strand strand() {
    return strand;
public static PartialBreakend of(Contig contig, String id, Strand strand, Position position) {
What is this doing now? Shouldn't this require a coordinate system?
I think Breakend (and as a result PartialBreakend as well) should not be a GenomicPosition anymore, but rather a GenomicRegion. Breakend requires a coordinate system, which a position alone does not have.
If Breakend is a GenomicRegion, then BreakendVariant delegates all its GenomicRegion-ed bits to the left breakend.
This would remove an important use case for GenomicPosition. I would still like to keep GenomicPosition, to be able to do things like creating a splice donor site from an exon:
GenomicRegion exon = ...;
GenomicRegion donor = exon.endGenomicPosition().withPadding(-3, 6);
But now I am starting to see that GenomicPosition needs to know whether it is open or closed. Otherwise, the code above would not generate the same results for exons in different coordinate systems.
I would like to try to re-define GenomicPosition as:
- extends Position, Stranded<GenomicPosition>
- has Contig and Endpoint
- knows how to "open/close" itself
- has natural order
- has methods distanceTo(), isUpstreamOf(), etc.
- knows how to convert itself into a region in a certain coordinate system
Do you think this sounds like a good thing?
Not sure I fully understand as I'm trying to refactor the code from this PR at the moment. Nothing major but I'm renaming CoordinateSystem.Endpoint to Boundary (probably ought to be BoundaryType) as the endpoint is really the start/end position on an interval.
Also noticed that the Boundary types were starting to leak outside of the CoordinateSystem e.g. in Region:
default int normalisedStart(Endpoint endpoint) {
return start() + coordinateSystem().startDelta(endpoint);
}
What does 'normalised' really mean here? It's actually the adjusted start for the provided coordinate system given the current one:
default int startWithCoordinateSystem(CoordinateSystem target) {
return start() + coordinateSystem().startDelta(target);
}
'but why are you hiding the Endpoint/Boundary?' you ask. Because of this:
normalisedStartPosition(requiredCoordinateSystem.startEndpoint()),
normalisedEndPosition(requiredCoordinateSystem.endEndpoint()));
requiredCoordinateSystem.startEndpoint() was always paired with normalisedStartPosition and requiredCoordinateSystem.endEndpoint() with normalisedEndPosition. The CoordinateSystem knows its boundary types, so delegate it to the class and don't pollute the rest of the codebase with unnecessary information and potential misuse:
return newRegionInstance(contig, strand, requiredCoordinateSystem,
startPositionWithCoordinateSystem(requiredCoordinateSystem),
endPositionWithCoordinateSystem(requiredCoordinateSystem));
It's all nice and clean like this, but why do I mention this now? Because you want to add Boundary (old Endpoint) to GenomicPosition.
This is a long-winded way of saying let's talk about this tomorrow! I'll hold off committing/pushing my changes for the moment until we've worked through things together.
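The delegation described in the comment above, where the coordinate system computes the delta to a target system from its own boundary types, might look something like the sketch below. The four constant names, the Boundary enum, endDelta, and the IntervalSketch class are assumptions made for illustration; only startDelta(target) and startWithCoordinateSystem(target) follow names quoted in the thread.

```java
// Hypothetical interval that hides boundary types from its callers.
final class IntervalSketch {
    final CoordinateSystem coordinateSystem;
    final int start;
    final int end;

    IntervalSketch(CoordinateSystem coordinateSystem, int start, int end) {
        this.coordinateSystem = coordinateSystem;
        this.start = start;
        this.end = end;
    }

    int startWithCoordinateSystem(CoordinateSystem target) {
        return start + coordinateSystem.startDelta(target);
    }

    int endWithCoordinateSystem(CoordinateSystem target) {
        return end + coordinateSystem.endDelta(target);
    }

    IntervalSketch withCoordinateSystem(CoordinateSystem target) {
        return new IntervalSketch(target,
                startWithCoordinateSystem(target),
                endWithCoordinateSystem(target));
    }

    public static void main(String[] args) {
        IntervalSketch closed = new IntervalSketch(CoordinateSystem.FULLY_CLOSED, 1, 10);
        IntervalSketch leftOpen = closed.withCoordinateSystem(CoordinateSystem.LEFT_OPEN);
        System.out.println(leftOpen.start + ", " + leftOpen.end);
    }
}

enum Boundary { OPEN, CLOSED }

// Assumed constant names; each system is defined purely by its two boundary types.
enum CoordinateSystem {
    FULLY_CLOSED(Boundary.CLOSED, Boundary.CLOSED),
    LEFT_OPEN(Boundary.OPEN, Boundary.CLOSED),
    RIGHT_OPEN(Boundary.CLOSED, Boundary.OPEN),
    FULLY_OPEN(Boundary.OPEN, Boundary.OPEN);

    private final Boundary startBoundary;
    private final Boundary endBoundary;

    CoordinateSystem(Boundary startBoundary, Boundary endBoundary) {
        this.startBoundary = startBoundary;
        this.endBoundary = endBoundary;
    }

    // Amount to add to a start in this system to express it in the target system.
    int startDelta(CoordinateSystem target) {
        if (startBoundary == target.startBoundary) {
            return 0;
        }
        // Opening a closed start moves it one left; closing an open start moves it one right.
        return startBoundary == Boundary.CLOSED ? -1 : 1;
    }

    // Amount to add to an end in this system to express it in the target system.
    int endDelta(CoordinateSystem target) {
        if (endBoundary == target.endBoundary) {
            return 0;
        }
        // Opening a closed end moves it one right; closing an open end moves it one left.
        return endBoundary == Boundary.CLOSED ? 1 : -1;
    }
}
```

With this shape, callers only ever pass a CoordinateSystem, so a start boundary can never be paired with an end coordinate by mistake.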
Hi @julesjacobsen this is today's work. I think that I completed all things we discussed today (including my notes below).
In addition to the tasks we agreed upon, I added BaseGenomicPosition, which is extended by PartialBreakend and DefaultGenomicPosition.
The tests where we call withStrand().withCoordinateSystem().withStrand().withCoordinateSystem() and we check that we got the same thing still remain to be added...
- implement Comparable in Region subclasses
- behavior of coordinate system conversion of empty regions
- update genomic position implementations
- most of the methods like contains or overlapsWith() should be defined in Region. The extended versions that also check for contig and strand should be defined in GenomicRegion
- remove endpoint from position, add CoordinateSystem to Region
- contig should not be a region, use length to calculate the magic number for strand flipping
- figure out the best place for getting the magic number for a contig that we need to flip strand of a region
- make sure we can flip strand of an empty region in any coordinate system
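One way to read the "magic number" items in the list above is as a per-coordinate-system constant M, derived from the contig length, such that a coordinate c maps to M - c on the opposite strand. The sketch below works that out under this assumption; FlipSketch, flipConstant, and the boolean boundary flags are hypothetical names, and the project may compute this differently.

```java
// Hypothetical helper; an illustration of the arithmetic, not the project's implementation.
final class FlipSketch {

    // For a fully closed system a base p maps to contigLength + 1 - p.
    // An open start lowers the constant by one, an open end raises it by one.
    static int flipConstant(int contigLength, boolean startOpen, boolean endOpen) {
        return contigLength + 1 + (endOpen ? 1 : 0) - (startOpen ? 1 : 0);
    }

    // Returns {start, end} on the opposite strand, in the same coordinate system.
    // Start and end swap roles because the region is read in the other direction.
    static int[] flip(int start, int end, int contigLength, boolean startOpen, boolean endOpen) {
        int m = flipConstant(contigLength, startOpen, endOpen);
        return new int[] {m - end, m - start};
    }

    // Number of bases covered; zero for an empty region in any of the four systems.
    static int length(int start, int end, boolean startOpen, boolean endOpen) {
        return end - start + 1 - (startOpen ? 1 : 0) - (endOpen ? 1 : 0);
    }

    public static void main(String[] args) {
        // Empty left-open region (5, 5] on a contig of length 100.
        int[] flipped = flip(5, 5, 100, true, false);
        System.out.println(flipped[0] + ", " + flipped[1]
                + " length=" + length(flipped[0], flipped[1], true, false));
    }
}
```

Because the same constant is subtracted from both coordinates, flipping twice returns the original region and the length is preserved, which is the property the round-trip tests mentioned above would check.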