Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add region, allow empty regions, complete coordinate systems, GenomicPosition extends Position #34

Merged
merged 1 commit into from
Jan 12, 2021

Conversation

ielis
Copy link
Collaborator

@ielis ielis commented Jan 12, 2021

Hi @julesjacobsen this is the today's work. I think that I completed all things we discussed today (including my notes below).

In addition to the tasks we agreed upon, I added BaseGenomicPosition, which is extended by PartialBreakend and DefaultGenomicPosition.

The tests where we call withStrand().withCoordinateSystem().withStrand().withCoordinateSystem() and we check that we got the same thing still remain to be added...


  • implement Comparable in Region subclasses

    • naturalOrder as in GenomicRegion
  • behavior of coordinate system conversion of empty regions

    • 1-based empty region is possible if we allow end < start
    • let's allow this to not have to throw an exception
  • update genomic position implementations

    • do we ever have an imprecise position that is not GenomicPosition as well? If not, then we can get rid of ImprecisePosition and create ImpreciseGenomicPosition
    • not really, we use composition and we're fine
  • most of the methods like contains or overlapsWith() should be defined in Region. The extended versions that
    also check for contig and strand should be defined in GenomicRegion

    • let's make 2 versions, one for object the other is a pure function
  • remove endpoint from position, add CoordinateSystem to Region

  • contig should not be a region, use length to calculate the magic number for strand flipping

    • function probably on a region
  • figure out the best place for getting the magic number for a contig that we need to flip strand of a region

  • make sure we can flip strand of an empty region in any coordinate system

Copy link
Contributor

@julesjacobsen julesjacobsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a few minor points, but I'll address these.

}


default int distanceTo(GenomicRegion other) {
if (contigId() != other.contigId()) {
if (contigId() != other.contigId())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add missing braces

@@ -174,7 +142,10 @@ default GenomicRegion toRegion(int upstream, int downstream) {
} else if (upstream >= downstream) {
throw new IllegalArgumentException("Cannot apply negative padding: " + upstream + ", " + downstream);
}
return DefaultGenomicRegion.of(contig(), strand(), CoordinateSystem.ZERO_BASED, position().shift(upstream).asPrecise(), position().shift(downstream).asPrecise());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just keep this as it was i.e. position.shift(upstream), position.shift(downstream) ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that keeping the confidence interval might be unnecessary overhead.

@@ -131,7 +98,8 @@ default boolean isDownstreamOf(GenomicRegion other) {
* @return the region
*/
default GenomicRegion toRegion() {
return DefaultGenomicRegion.of(contig(), strand(), CoordinateSystem.ZERO_BASED, position().asPrecise(), position().shift(1).asPrecise());
return DefaultGenomicRegion.of(contig(), strand(), CoordinateSystem.ZERO_BASED,
Position.of(pos() - 1), Position.of(pos()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the shift method as before?

}

default int distanceTo(Region region) {
region = region.toOneBased();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't re-assign this - use a new oneBased variable



default boolean overlapsWith(Region other) {
other = other.withCoordinateSystem(coordinateSystem());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't re-assign input

"contig=" + contig.id() +
", id='" + id +
", position=" + position +
", strand=" + strand + '\'' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep original - it's not helpful having the super.toString()

@Override
public Strand strand() {
return strand;
public static PartialBreakend of(Contig contig, String id, Strand strand, Position position) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this doing now? Shouldn't this require a coordinate system?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Breakend (and in result PartialBreakend as well) should not be a GenomicPosition anymore, but rather a GenomicRegion. Breakend requires a coordinate system, which a position alone does not have.
If breakend is a GenomicRegion, then BreakendVariant delegates all it's GenomicRegion-ed bits to the left breakend.

This would remove an important use case for GenomicPosition. I would still like to keep the GenomicPosition, to be able to do things like creating splice donor site from an exon:

GenomicRegion exon = ...;
GenomicRegion donor = exon.endGenomicPosition().withPadding(-3, 6);

But now I am starting to see that GenomicPosition needs to know whether it is open or closed. Otherwise, the code above would not generate the same results for exons in different coordinate system.

I would like to try to re-define the GenomicPosition as:

  • extends Position, Stranded<GenomicPosition>
  • has Contig and Endpoint
    • knows how to "open/close" itself
    • has natural order
    • has methods distanceTo(), isUpstreamOf(), etc.
    • knows how to convert itself into a region in a certain coordinate system

Do you think this sounds like a good thing?

Copy link
Contributor

@julesjacobsen julesjacobsen Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I fully understand as I'm trying to refactor the code from this PR at the moment. Nothing major but I'm renaming CoordinateSystem.Endpoint to Boundary (probably ought to be BoundaryType) as the endpoint is really the start/end position on an interval.

Also noticed that the Boundary types were starting to leak outside of the CoordinateSystem e.g. in Region:

 default int normalisedStart(Endpoint endpoint) {
    return start() + coordinateSystem().startDelta(endpoint);
}

what does 'normalised' really mean here? it's actually the adjusted start for the provided coordinate system given the current one:

default int startWithCoordinateSystem(CoordinateSystem target) {
    return start() + coordinateSystem().startDelta(target);
}

'but why are you hiding the Endpoint/Boundary?' you ask. Because of this:

        normalisedStartPosition(requiredCoordinateSystem.startEndpoint()),
        normalisedEndPosition(requiredCoordinateSystem.endEndpoint()));

requiredCoordinateSystem.startEndpoint() was always paired with normalisedStartPosition and requiredCoordinateSystem.endEndpoint()) with normalisedEndPosition. The CoordinateSystem knows its boundary types, so delegate it to the class and don't pollute the rest of the codebase with unnecessary information and potential misuse:

return newRegionInstance(contig, strand, requiredCoordinateSystem,
                startPositionWithCoordinateSystem(requiredCoordinateSystem),
                endPositionWithCoordinateSystem(requiredCoordinateSystem));

It's all nice and clean like this, but why do I mention this now? Because you want to add Boundary (old Endpoint) to GenomicPosition.

This is a long-winded way of saying lets talk about this tomorrow! I'll hold off comitting/pushing my changes for the moment until we've worked through things together.

@julesjacobsen julesjacobsen merged commit cd58fd1 into coordinate-systemed-region Jan 12, 2021
@ielis ielis deleted the allow_empty_regions branch January 14, 2021 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants