nucleotide-count: poorly described and forces implementation #842

shaleh · 2017-09-06T06:45:46Z

This is all the README says about the interface:

Given a DNA string, compute how many times each nucleotide occurs in the string.

Running the tests we discover we have to create a DNA and a Histogram. DNA has to have Count and Counts both of which can error.

But the test is really exact in what it wants. DNA has to be a type synonym for string. Histogram is expected to be a synonym for a map. This means the developer cannot make any stab at efficiency. This is just define a map, loop over the string, stuff the map, return. Blah and ho-hum. The whole thing ends up being a bunch of Go boiler plate for what is essentially perl/ruby/python stuff a hash code.

The text was updated successfully, but these errors were encountered:

robphoenix · 2017-09-06T09:57:59Z

Running the tests we discover we have to create a DNA and a Histogram. DNA has to have Count and Counts both of which can error.

This should definitely be better documented in the README.

I agree, this is not a complex exercise. Currently it appears quite far along in the exercise evolution, and I can understand why the exercise could prove disappointing.

In the Exercism reboot it appears much earlier, being unlocked by the third exercise. As an exercise for someone newer to Go I think, with better documentation, it is suitable as-is. If it were to appear later in the track, we could look into changing Histogram to a struct perhaps?

type Histogram struct {
        A, C, G, T int
}

This might free up the possibilities for implementation, and perhaps even provide a basis for exploring concurrency, which is something we need more exercises to do.

Thanks @shaleh for highlighting this. Your audit as you go through the track is really helpful. 😃

tleen · 2017-09-06T12:11:03Z

he whole thing ends up being a bunch of Go boiler plate for what is essentially perl/ruby/python stuff a hash code.

There is value in highlighting where Go may not be a great fit? You would not want to do this in C either, but may be value in trying it in C and then Perl just to see why. I'm still uncomfortable using Go for JSON manipulation or any sort of string ops because it can be done faster (for me) in other languages.

That being said we have a whole bunch of similar exercises that are basic string munging and/or character mapping. One of the changes in Excercism v2 (nextercism) is that users will no longer be forced into completing the same exercise over and over under slightly different circumstances. Instead we will have a core group of exercises unlocking other exercises that are not necessary for moving forward in the track (#835). Hopefully this will limit a sort of boredom of repetitive exercises. Contribution to #830 is a good way to keep track of the redundancies so we can keep them out of the core track. @shaleh your feedback there and like (@robphoenix said) auditing the exercises are fantastically helpful thanks!

shaleh · 2017-09-06T15:06:08Z

With one small change to the tests the developer is freed to come up with different internals.

	{"ACT", 'G', 0},

Right now, DNA has to be a string synonym because "ACT" is used raw. If the line read:

	{DNA{"ACT"}, 'G', 0},

Poof. Suddenly the developer can cache results, use a smaller data structure, etc. Since this came later in the series I (incorrectly) assumed that we were expected to write something a little more interesting. I had actually coded this:

type DNA struct {
    nucleotides string
    counts [4]int
    populated bool
}

func idxForProtein(n byte) {
    switch n {
    case 'A': return 0
    case 'C': return 1
    case 'G': return 2
    case 'T': return 3
    }
    return -1
}

func (dna DNA) Count(n byte) (int, error) {
    if ! dna.populated {  // need this because New() constructor is not used
        if err := (&dna).populate(); err != nil {
            return -1, err
        }
    }

    idx := idxForProteins(n)
    if idx == -1 {
        return -1, DNAInvalidProtein(n)  // error omitted for brevity
    }
    return dna.counts(idx), nil
}

func (dna *DNA) populate() error {
    for _, n := range dna.nucleotides {
        idx := idxForProtein(n)
        if idx == -1 {
            return DNAInvalidProtein(n)
        }
        dna.counts[n]++
    }
    dna.populated = true
    return nil
}

Counts took my structure and made a map to satisfy the test. I agree a struct would probably be more reasonable in real code.

This demonstrates the advantage of Go or C over Python and Ruby. This code will be a good deal faster and use significantly less memory plus have the data stored in a way to utilize localization and caches instead of spread out in a linked list. If I know the counts will be under 2^16 I could even store the counts in a single int64. In a big piece of bioscience software this might mean the code could handle way, way more DNA structures at once.

I agree more go routines would be great in the exercises. This is not one of them. The fastest way to get the counts is to walk the list once, filling a data structure. Walking that list 4 times in 4 threads is not going to help.

junedev · 2021-10-23T18:51:29Z

This was already addressed in instructions.append.md and in the stub file.

robphoenix added Nextercism and removed Nextercism labels Sep 6, 2017

hilary added per-exercise and removed per-exercise labels Aug 28, 2018

github-actions bot removed the README/inadequate label Oct 22, 2021

junedev mentioned this issue Oct 23, 2021

Add stub functions for practice exercises #1893

Closed

junedev closed this as completed Oct 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nucleotide-count: poorly described and forces implementation #842

nucleotide-count: poorly described and forces implementation #842

shaleh commented Sep 6, 2017

robphoenix commented Sep 6, 2017 •

edited

Loading

tleen commented Sep 6, 2017

shaleh commented Sep 6, 2017 •

edited

Loading

junedev commented Oct 23, 2021

nucleotide-count: poorly described and forces implementation #842

nucleotide-count: poorly described and forces implementation #842

Comments

shaleh commented Sep 6, 2017

robphoenix commented Sep 6, 2017 • edited Loading

tleen commented Sep 6, 2017

shaleh commented Sep 6, 2017 • edited Loading

junedev commented Oct 23, 2021

robphoenix commented Sep 6, 2017 •

edited

Loading

shaleh commented Sep 6, 2017 •

edited

Loading