Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breakend creation #43

Closed
ielis opened this issue Jan 28, 2021 · 4 comments
Closed

Breakend creation #43

ielis opened this issue Jan 28, 2021 · 4 comments

Comments

@ielis
Copy link
Collaborator

ielis commented Jan 28, 2021

@julesjacobsen I think we might need to look at the ways how we can create Breakend instance. In particular, how someone can use the code in a wrong way.

At the moment, we can create Breakend using any coordinate system:

public class BreakendTest {

    private final Contig contig = TestContig.of(1, 5);

    @ParameterizedTest
    @CsvSource({
            "FULLY_CLOSED, 2,   2, 2",
            "   LEFT_OPEN, 2,   2, 2",
            "  RIGHT_OPEN, 2,   2, 2",
//            "  FULLY_OPEN, 2,   2, 2",  // fails
    })
    public void differentWaysOfBreakendCreation(CoordinateSystem coordinateSystem, int pos, int start, int end) {
        Breakend breakend = Breakend.of(contig, "id", Strand.POSITIVE, coordinateSystem, Position.of(pos));

        assertThat(breakend.start(), is(start));
        assertThat(breakend.end(), is(end));
    }
}

The 4th case fails and in result it is not legal to create Breakend using a FULLY_OPEN coordinate system.

I think we might need to revise the static constructor and allow to specify start and end positions.

@julesjacobsen
Copy link
Contributor

Damn breakends! Damn fully-open coordinate systems! Damn them all!

I see some options. Currently the breakend is implicitly fully-open :

  1. Keep the single Position input and 'expand' it to a single empty base for FULLY_OPEN case:
private DefaultBreakend(Contig contig, String id, Strand strand, CoordinateSystem coordinateSystem, Position position) {
    super(contig, strand, coordinateSystem, position, coordinateSystem == CoordinateSystem.FULLY_OPEN ? position.shift(1) : position);
    this.id = Objects.requireNonNull(id);
}

This is a bit odd as in some systems the length is 1 (fully open/closed), whereas in a half-open system the length is 0.

  1. Add an end to the constructor signature and then check that the length is zero/one
private DefaultBreakend(Contig contig, String id, Strand strand, CoordinateSystem coordinateSystem, Position position) {
    super(contig, strand, coordinateSystem, position, coordinateSystem == CoordinateSystem.FULLY_OPEN ? position.shift(1) : position);
    if (super.length() >= 0) {
        throw new IllegalStateException("Breakend must have length zero!");
    }
    this.id = Objects.requireNonNull(id);
}

We had decided at one point to define a breakend as being exclusively a single base, but this was due to the limitation of not being able to handle zero-length intervals at the time and also that we were focussing on VCF representation. Although its a bit odd, I think keeping the constructor as requiring a single position as it is meant to represent a broken end. This position can be expanded into a region where fully-open/closed systems represent a whole base and the half-open/closed systems a slice 5' or 3' of the base, or that they are always a single base i.e. the position is the position (ordinal) on the contig.

@julesjacobsen
Copy link
Contributor

But then these rules are somewhat arbitrary, perhaps they should only ever be a single base long as this is representable in all systems and will translate clearly between coordinate systems - changing the length depending on the type of CoordinateSystem has a pretty high wtf score.

I propose we keep the single position, and represent the breakend as either length 1 or 0 for all coordinate systems. Length 1 fits with the VCF view, but it might be easier to use length 0, in which case we need to double-check our current breakend tests. Coordinate systems which do not naturally represent a full/empty base will adjust the input to correct this. Will need to consider how Breakend.unresolved() represents things. Currently:

private UnresolvedBreakend() {
    super(Contig.unknown(), Strand.POSITIVE, CoordinateSystem.FULLY_CLOSED, Position.of(1), Position.of(0));
}

So maybe this answers my ramblings - they should be length zero.

@julesjacobsen
Copy link
Contributor

public class BreakendTest {

    @ParameterizedTest
    @CsvSource({
            "FULLY_CLOSED, 1,   1, 0",
            "FULLY_CLOSED, 5,   5, 4",

            "LEFT_OPEN,    0,   0, 0",
            "LEFT_OPEN,    5,   5, 5",

            "RIGHT_OPEN,   1,   1, 1",
            "RIGHT_OPEN,   6,   6, 6",

            "FULLY_OPEN,   1,   1, 2",
            "FULLY_OPEN,   5,   5, 6",
    })
    public void createWithCoordinatesSystems(CoordinateSystem coordinateSystem, int pos, int start, int end) {
        Contig ctg5 = TestContig.of(5, 5);
        Breakend breakend = Breakend.of(ctg5, "id", Strand.POSITIVE, coordinateSystem, Position.of(pos));
        assertThat(breakend.start(), equalTo(start));
        assertThat(breakend.end(), equalTo(end));
        assertThat(breakend.length(), equalTo(0));
    }
}

@ielis
Copy link
Collaborator Author

ielis commented Feb 3, 2021

This has been adressed

@ielis ielis closed this as completed Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants