Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs in IndexFactory and BinningIndexBuilder. #906

Merged
merged 1 commit into from
Aug 9, 2017

Conversation

cmnbroad
Copy link
Collaborator

@cmnbroad cmnbroad commented Jun 20, 2017

Description

Fixes #393 and a few other issues when creating indices through IndexFactory.create*Index methods that take a file:

  • Block gzipped inputs were wrapped with PositionalBufferedStreams, resulting in byte offsets being handed to the indexer rather than virtual file pointers.
  • The first feature was always indexed as if it were at offset 0, even if it was preceded by a header. Most of the time this worked, since the feature iterator thats used to read the indexed file would scan right through the header. But it failed when the header was large (i.e. hg38 large), since the scanner would be short-circuited by the linear index before the header was exhausted.
  • The BinningIndexBuilder assumed that 0 was never a legitimate feature location, but a file with no header at all (such as a header-less BED file) can have features at offset/virtual-file-pointer 0, even if gzipped.

Checklist

  • [ X] Code compiles correctly
  • [ X] New tests covering changes and new functionality
  • [ X] All tests passing
  • Extended the README / documentation, if necessary
  • Is not backward compatible (breaks binary or source compatibility)

@codecov-io
Copy link

codecov-io commented Jun 20, 2017

Codecov Report

Merging #906 into master will increase coverage by 0.196%.
The diff coverage is 80%.

@@              Coverage Diff               @@
##              master     #906       +/-   ##
==============================================
+ Coverage     65.564%   65.76%   +0.196%     
- Complexity      7374     7401       +27     
==============================================
  Files            528      529        +1     
  Lines          31981    31968       -13     
  Branches        5467     5463        -4     
==============================================
+ Hits           20968    21022       +54     
+ Misses          8868     8801       -67     
  Partials        2145     2145
Impacted Files Coverage Δ Complexity Δ
...sjdk/tribble/readers/PositionalBufferedStream.java 53.465% <ø> (ø) 24 <0> (ø) ⬇️
...n/java/htsjdk/tribble/index/tabix/TabixFormat.java 54.286% <ø> (ø) 6 <0> (ø) ⬇️
src/main/java/htsjdk/tribble/FeatureCodec.java 0% <ø> (ø) 0 <0> (ø) ⬇️
...va/htsjdk/tribble/TribbleIndexedFeatureReader.java 67.222% <ø> (-1.111%) 22 <0> (ø)
...n/java/htsjdk/tribble/readers/AsciiLineReader.java 84.444% <100%> (+42.339%) 17 <4> (+3) ⬆️
.../samtools/reference/FastaSequenceIndexCreator.java 61.818% <100%> (ø) 8 <7> (ø) ⬇️
...rc/main/java/htsjdk/tribble/AsciiFeatureCodec.java 100% <100%> (+16.667%) 9 <2> (+1) ⬆️
...main/java/htsjdk/samtools/BinningIndexBuilder.java 85.965% <66.667%> (+0.251%) 15 <0> (ø) ⬇️
...c/main/java/htsjdk/tribble/index/IndexFactory.java 75.817% <70.833%> (-2.85%) 26 <0> (ø)
src/main/java/htsjdk/tribble/bed/BEDCodec.java 80.233% <75%> (-0.78%) 25 <5> (+3)
... and 12 more

@cmnbroad cmnbroad mentioned this pull request Jun 20, 2017
5 tasks
@droazen droazen self-requested a review June 21, 2017 02:04
@droazen droazen self-assigned this Jun 21, 2017
Copy link
Member

@magicDGS magicDGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this @cmnbroad! I learned a lot from this fix. Some minor comments and some the stream problem with my suggestions.

Looking forward to get this in and retrieve the tribble/tabix indexing support in htsjdk and derived projects!

public AsciiLineReader(final InputStream is){
// NOTE: This will wrap the input stream in a PositionalBufferedStream even if its already a PositionalBufferedStream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this constructor could be safer by using a private method to wrap the input stream:

private PositionalBufferedStream wrapStream(final InputStream is) {
    if (is instanceof PositionalBufferedStream) {
        return (PositionalBufferedStream) is;
    } else if (is instanceof BlockCompressedInputStream) {
        throw new IllegalArgumentException("Cannot use AsciiLineReader with BlockCompressedInputStream; use BlockCompressedAsciiLineReader instead");
    } else {
        return new PositionalBufferedStream(is);
    }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it should be illegal to pass a BCS to AsciiLineReader (my comment that it violates the contract might be an overstatement). Perhaps we should just change the doc to be more explicit about the fact that it provides buffering, which affects the positions returned.

if (bufferedInputStream instanceof PositionalBufferedStream) {
pbs = (PositionalBufferedStream) bufferedInputStream;
public LocationAware makeIndexableSourceFromStream(final InputStream inputStream) {
if (inputStream instanceof BlockCompressedInputStream) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to deal with this in a new static method AsciiLineReader.from(InputStream is) and deprecate the constructor. This will be safer for the users of AsciiLineReader and will simplify the code here.

Copy link
Collaborator Author

@cmnbroad cmnbroad Jun 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like that would put an unnecessary constraint on AsciiLineReader (it should be legal to wrap an AsciiLineReader around a BCS). Having said that, its confusing because AsciiLineReader confers both a line orientation, as well as buffering, which is what messes up the positions. This would be a lot simpler if the buffering were separate.

Also, given how brittle all this code is, I like the idea of having this behavior and the comments explicitly included right here, where the explicit control over the buffering/positioning matters (this method is specifically to support indexing).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we had to duplicate this code in GATK anyway, I added a "from" method per this suggestion.

* An AsciiLineReader implementation that wraps a BlockCompressedInputStream and provides no additional buffering.
* Useful for cases where we need to preserve virtual file pointers in the underlying stream, such as during indexing.
*/
public class BlockCompressedAsciiLineReader extends AsciiLineReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the static method is done in AsciiLineReader, this could be a private inner class to avoid direct usages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now that we have that static method we should make this either a package protected class or an inner class of AsciiLineReader. I don't think we want to expose it unless we have a use case for that.

@@ -28,7 +28,11 @@

/**
* A wrapper around an {@code InputStream} which performs it's own buffering, and keeps track of the position.
*
*
* TODO: This class implements Positional, which in turn extends LocationAware, which requires preservation of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be fixed by throwing an IllegalArgumentException if the InputStream is a BlockCompressedInputStream as a temporary solution. It is much better than violate the contract and assume that it is working as expected...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be the right thing to do, but htsjdk is kind of painted into a corner at this point. There are several other code paths that would have to change, but worse, BinaryFeatureCodec (and thus BCF2Codec) use PositionalBufferedStream for a SOURCE type parameter, even though BCF can be block compressed. So that codec actually depends on being able to wrap a PositionalBufferedStream around a block compressed stream.

@@ -92,6 +88,7 @@ public void testWriteIndex(final File inputFile, final IndexFactory.IndexType ty
Assert.assertTrue(tempIndex.exists());
// load the generated index
final Index loadedIndex = IndexFactory.loadIndex(tempIndex.getAbsolutePath());
//TODO: This not very useful; the test can pass even if the generated index is unuseable for queries.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this test only for checking if writing the index does not blow up, and if the in-memory index and the one written have the same contents. For testing if it is usable for queries, it should be a different test (which actually does not require to write the index down, I guess).

}

@Test(dataProvider = "bedTabixIndexTestData")
public void testBedTabixIndex(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test!

@@ -102,6 +114,8 @@ public void testQueryProvidedItemsAmount() throws IOException {
) // create TabixIndex straight from plaintext VCF
.write(plainTextVcfIndexFile); // write it

//TODO: you can pass in a .tbi file as the index for a plain .vcf, but if you *don't* pass in the file name and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is it correct to use a .tbi index with a plain text file? As you noted in a previous comment, this is not allowed by tabix... I suggest to blow up in this case!

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did an initial review pass -- haven't looked at the tests yet, so a second pass will be needed.

} else {
pbs = new PositionalBufferedStream(bufferedInputStream);
return new AsciiLineReaderIterator(new AsciiLineReader(inputStream));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this case? Seems like you're not "indexable" if you reach this else clause.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly; it depends on the type of the inputStream. I left this in since it represents the previous behavior for this method. If inputStream isn't wrapping a block compressed stream, and reports it's position correctly, I think it would work. We could throw instead, but if we we're going to do that, we could deprecate this method and replace it method with two overloads with the correct stream argument types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should have a test case for this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there at least be a warning if we reach this? It seems very possible that you're going to try indexing stdin or something like that here.

* An AsciiLineReader implementation that wraps a BlockCompressedInputStream and provides no additional buffering.
* Useful for cases where we need to preserve virtual file pointers in the underlying stream, such as during indexing.
*/
public class BlockCompressedAsciiLineReader extends AsciiLineReader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class needs unit tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/**
* Read a single line of inpu, advnced the underlying stream only enough to read the line.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inpu -> input
advnced -> advanced

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

};

public String readLine(final PositionalBufferedStream stream) {
throw new IllegalArgumentException("A BlockCompressedAsciiLineReader class cannot be used to read from a PositionalBufferedStream");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnsupportedOperationException

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -50,6 +52,10 @@
*/
public BinningIndexBuilder(final int referenceSequence, final int sequenceLength) {
this.referenceSequence = referenceSequence;
// Initially set each window to -1 so we can distinguish between windows that have no overlapping
// features, and those who's lowest offset is 0, which is a valid (virtual file) offset for feature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who's -> whose

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -33,6 +34,7 @@
* Builder for a BinningIndexContent object.
*/
public class BinningIndexBuilder {
private static final int UNINITIALIZED_WINDOW = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a unit test for this class that tests with a record at position 0.

Copy link
Collaborator Author

@cmnbroad cmnbroad Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. There were previously no unit tests so now there are tests for a few basic cases.

@@ -243,6 +243,7 @@ private void readHeader() throws IOException {
try {
is = ParsingUtils.openInputStream(path, wrapper);
if (hasBlockCompressedExtension(new URI(URLEncoder.encode(path, "UTF-8")))) {
// TODO: TEST/FIX THIS!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you ensure that a github ticket is filed for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* feature query (otherwise the feature reader can terminate prematurely if the
* header is large).
* @param lineIterator
* @return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fill in @return

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Override
public Object readActualHeader(LineIterator reader) {
public Object readActualHeader(final LineIterator lineIterator) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit test for this method

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

final PositionalBufferedStream pbs;
if (bufferedInputStream instanceof PositionalBufferedStream) {
pbs = (PositionalBufferedStream) bufferedInputStream;
public LocationAware makeIndexableSourceFromStream(final InputStream inputStream) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file a ticket against LocationAware to either make all implementation obey the interface contract, or change the interface to reflect the actual behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Jul 24, 2017

This also PR fixes #943, which is a tracking ticket for the BinningIndexBuilder issue fixed as part of this PR.

@cmnbroad
Copy link
Collaborator Author

Round 1 done. Back to @droazen.

return new PositionalBufferedStream(fileStream);
} catch (final FileNotFoundException e) {
throw new TribbleException.FeatureFileDoesntExist("Unable to open the input file, most likely the file doesn't exist.", inputFile.getAbsolutePath());
} catch (final IOException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make sure we don't get npr in these exceptions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

final File inputVCF,
final Interval queryInterval) throws IOException {
// copy the original file and create the index for the copy
final File tmpVCF = new File(IOUtil.createTempDir("testCreateTabixIndexFromVCF", null), inputVCF.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup this temp dir afterwards

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Test(dataProvider = "bcfDataFactory")
public void testCreateLinearIndexFromBCF(final File inputBCF) throws IOException {
// copy the original file and create the index for the copy
final File tmpBCF = new File(IOUtil.createTempDir("testCreateIndexFromBCF", null), inputBCF.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same cleanup issue here I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

index.writeBasedOnFeatureFile(tmpBCF);

try(final VCFFileReader originalReader = new VCFFileReader(inputBCF,false);
final VCFFileReader tmpReader = new VCFFileReader(tmpBCF, new File(tmpBCF.getAbsolutePath()+ ".idx"),true)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoded file extension

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

final List<Interval> queryIntervals
) throws Exception {
// copy the input file and create an index for the copy
final File tmpBed = new File(IOUtil.createTempDir("testBedTabixIndex", null), inputBed.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temp dirs all over the place

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This looks good to me modulo the few minor issues. Merge when those are addressed.

final TabixIndex tabixIndexGz = IndexFactory.createTabixIndex(tmpVCF, new VCFCodec(), null);
tabixIndexGz.writeBasedOnFeatureFile(tmpVCF);

try(final VCFFileReader originalReader = new VCFFileReader(inputVCF,false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need space after try, and while below.

BEDCodec bedCodec = new BEDCodec();
try (final InputStream is = ParsingUtils.openInputStream(gzippedBedFile.getPath());
final BlockCompressedInputStream bcis = new BlockCompressedInputStream(is))
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open { should go on previous line, unless the line is too long

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multiline resource blocks I think this way is clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A matter of taste :)
One data point, oracle's Java documentation: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html

@DataProvider(name = "bedDataProvider")
public Object[][] getLinearIndexFactoryTypes(){
return new Object[][] {
new Object[] { new File(TestUtils.DATA_DIR, "bed/Unigene.sample.bed") },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary new Object[], and there shouldn't be a space around the contents of an array initializer, e.g.

public Object[][] getLinearIndexFactoryTypes(){
    return new Object[][] {
            {new File(TestUtils.DATA_DIR, "bed/Unigene.sample.bed")},
            {new File(TestUtils.DATA_DIR, "bed/Unigene.sample.bed.gz")}
     };
 }

final File discontinuousFile = new File(TestUtils.DATA_DIR + "bed/disconcontigs.bed");
final BEDCodec bedCodec = new BEDCodec();
@DataProvider(name = "bedDataProvider")
public Object[][] getLinearIndexFactoryTypes(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space before {

// for ( int mid = 0; mid <= end; mid += 1000000 ) {
// tests.add(new Object[]{0, mid, mid+1000000, end});
// }
List<Object[]> tests = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be shorter to write:

return new Object[][] {{1226943, 1226943, 1226943, 2000000}};


public class BinningIndexBuilderTest extends HtsjdkTest {

static int referenceSequenceIndex = 19;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be private, final and all uppercase, e.g. REFERENCE_SEQUENCE_INDEX

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Aug 3, 2017

@jrobinso We're planning to make some changes to the way indexing works in order to fix several bugs, and wanted to get out in front of any issues that might arise for IGV. Amongst other things, with this PR, when indexing we no longer close the input SOURCE/stream after reading the header (this affected indexing for at least one codec in GATK). Can you take a look and see if this will require any IGV changes ? Thx.

@jrobinso
Copy link
Contributor

jrobinso commented Aug 7, 2017

I don't see any IGV impact, and if there are I will deal with it in IGV.

public final String readLine(final PositionalBufferedStream stream) throws IOException{
// TODO: This should be deprecated and made private/merged with readLine, or made
// static, as it reads from the provided input stream rather than the wrapped input stream
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this before, but could you add a deprecation date?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and a javadoc comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put an @deprecated inside the javadoc, with the date, and added an @see to the "from" method. I can't figure out how to anchor to the OTHER readLine method though. Maybe I could if I swapped the order of the methods....

* TODO: This class implements LocationAware, which requires preservation of virtual file pointers on BGZF inputs.
* However, if the inputStream wrapped by this class is a BlockCompressedInputStream, it violates that contract by
* wrapping the stream and returning positional file offsets instead.
*/
public AsciiLineReader(final InputStream is){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we deprecate this constructor and eventually make it private? Is there a case outside of this class that we would want to call this constructor instead of from()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. We could do that. There are a bunch of internal uses of it, but thats easily fixed. But should we deprecate the other one too (the one that takes a PositionalBufferedInputStream), since "from" is the preferred method ? Otherwise its weird to leave the PBS constructor, but not have a BCS constructor (you'd have to use .from for that case). It would be most consistent to just have .from, but thats less BWC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that it would make sense to also deprecate the other one. What's BWC stand for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry...backward compatible. I'll deprecate the other one as well.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad A few last minute comments. Sorry!

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Aug 8, 2017

@lbergelson I may have added more questions than answers... See what you think.

@droazen
Copy link
Contributor

droazen commented Aug 9, 2017

@cmnbroad Before merging this, can we ensure that there are clear and complete instructions for codec authors on how to write a codec that is properly indexable? (eg., in a comment to FeatureCodec)

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Aug 9, 2017

@droazen Done. We should let @lbergelson have one more look at the previous set of changes I made when he gets to it.

@lbergelson
Copy link
Member

@cmnbroad Looks good to me. 👍 Merge when ready.

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Aug 9, 2017

Squashed and rebase once more. Will merge when tests pass.

protected AsciiLineReader() {};

/**
* TODO: This class implements LocationAware, which requires preservation of virtual file pointers on BGZF inputs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you want TODO here. This sounds more like a warning or a note to the user of the class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is a TODO, but you're right the TODO doesn't need to be in the javadoc, just the warning.

* However, if the inputStream wrapped by this class is a BlockCompressedInputStream, it violates that contract by
* wrapping the stream and returning positional file offsets instead.
*
* @deprecated 8/8/2017 see {@link #from}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the format you want here is use {@link #from(InputStream)}. Saying "see" to me sounds like @see which has a different usage in javadoc. You can provide the types of the parameters if you want to specify an overridden method.

Also it's not great form to use a date, since the date when the software released is not yet fixed, or easy for a user to determine given a .jar they've got lying around. If you want to include this information, use the version number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the "see"s to "use". AFACIT, we've been putting deprecation dates, I think mostly for our own benefit, since we have existing deprecated stuff but we don't know when it was deprecated. As soon as the maintainer groups defines the deprecation rules, we'll need that info...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've started adding the dates so we can get a rough sense of how long something has been deprecated. We could always look in the blame which is more accurate, but it's a pain if you're scanning through lots of things. We could provide the version number, but our deprecation removal policy (which doesn't really exist in any official form) is "has this been deprecated for a long time" rather than "has this been deprecated for N versions".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbergelson I left the date in - any input on that ? Phil is probably right that its of limited use to the user, but not sure if you guys have discussed the plan for deprecation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind - just saw your comment....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmnbroad Yeah, it's really not intended for the users except for them to say "oh, yeah, this is ancient and I shouldn't use it." A version number wouldn't be a bad idea, but then we have to be sure we guess what version the commit is going into, which is awkward. Checking the blame is the ultimate guide, but it's a pain to do in bulk.

/**
* Read a single line of input, advance the underlying stream only enough to read the line.
*/
public String readLine() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are overridden, they need an @Override annotation. Also there's no need to javadoc an overridden method, as javadoc will copy down the base class method doc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

private char[] lineBuffer;
private int lineTerminatorLength = -1;

protected AsciiLineReader() {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a great use of an empty constructor. If subclassed improperly a user ends up with a class that's in an invalid state. It will only work correctly if certain methods are overridden, which are not part of a documented API.

The proper solution is to create an abstract base class (or interface with default methods) that both AsciiLineReader and BlockCompressedAsciiLineReader inherit from. That makes it explicit which methods must be overridden and prevents having inherited fields in BlockCompressedAsciiLineReader that are invalid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed thats a better solution, but it its not source compatible. We recognize that we need to do something more comprehensive to fix the underlying problems, and there are a couple of issue tickets to represent that. Ideally we'll only break compatibility once for these issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants