Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nucleotide-count: poorly described and forces implementation #842

Closed
shaleh opened this issue Sep 6, 2017 · 4 comments
Closed

nucleotide-count: poorly described and forces implementation #842

shaleh opened this issue Sep 6, 2017 · 4 comments

Comments

@shaleh
Copy link

shaleh commented Sep 6, 2017

This is all the README says about the interface:

Given a DNA string, compute how many times each nucleotide occurs in the string.

Running the tests we discover we have to create a DNA and a Histogram. DNA has to have Count and Counts both of which can error.

But the test is really exact in what it wants. DNA has to be a type synonym for string. Histogram is expected to be a synonym for a map. This means the developer cannot make any stab at efficiency. This is just define a map, loop over the string, stuff the map, return. Blah and ho-hum. The whole thing ends up being a bunch of Go boiler plate for what is essentially perl/ruby/python stuff a hash code.

@robphoenix
Copy link
Contributor

robphoenix commented Sep 6, 2017

Running the tests we discover we have to create a DNA and a Histogram. DNA has to have Count and Counts both of which can error.

This should definitely be better documented in the README.

I agree, this is not a complex exercise. Currently it appears quite far along in the exercise evolution, and I can understand why the exercise could prove disappointing.

In the Exercism reboot it appears much earlier, being unlocked by the third exercise. As an exercise for someone newer to Go I think, with better documentation, it is suitable as-is. If it were to appear later in the track, we could look into changing Histogram to a struct perhaps?

type Histogram struct {
        A, C, G, T int
}

This might free up the possibilities for implementation, and perhaps even provide a basis for exploring concurrency, which is something we need more exercises to do.

Thanks @shaleh for highlighting this. Your audit as you go through the track is really helpful. 😃

@tleen
Copy link
Member

tleen commented Sep 6, 2017

he whole thing ends up being a bunch of Go boiler plate for what is essentially perl/ruby/python stuff a hash code.

There is value in highlighting where Go may not be a great fit? You would not want to do this in C either, but may be value in trying it in C and then Perl just to see why. I'm still uncomfortable using Go for JSON manipulation or any sort of string ops because it can be done faster (for me) in other languages.

That being said we have a whole bunch of similar exercises that are basic string munging and/or character mapping. One of the changes in Excercism v2 (nextercism) is that users will no longer be forced into completing the same exercise over and over under slightly different circumstances. Instead we will have a core group of exercises unlocking other exercises that are not necessary for moving forward in the track (#835). Hopefully this will limit a sort of boredom of repetitive exercises. Contribution to #830 is a good way to keep track of the redundancies so we can keep them out of the core track. @shaleh your feedback there and like (@robphoenix said) auditing the exercises are fantastically helpful thanks!

@shaleh
Copy link
Author

shaleh commented Sep 6, 2017

With one small change to the tests the developer is freed to come up with different internals.

	{"ACT", 'G', 0},

Right now, DNA has to be a string synonym because "ACT" is used raw. If the line read:

	{DNA{"ACT"}, 'G', 0},

Poof. Suddenly the developer can cache results, use a smaller data structure, etc. Since this came later in the series I (incorrectly) assumed that we were expected to write something a little more interesting. I had actually coded this:

type DNA struct {
    nucleotides string
    counts [4]int
    populated bool
}

func idxForProtein(n byte) {
    switch n {
    case 'A': return 0
    case 'C': return 1
    case 'G': return 2
    case 'T': return 3
    }
    return -1
}

func (dna DNA) Count(n byte) (int, error) {
    if ! dna.populated {  // need this because New() constructor is not used
        if err := (&dna).populate(); err != nil {
            return -1, err
        }
    }

    idx := idxForProteins(n)
    if idx == -1 {
        return -1, DNAInvalidProtein(n)  // error omitted for brevity
    }
    return dna.counts(idx), nil
}

func (dna *DNA) populate() error {
    for _, n := range dna.nucleotides {
        idx := idxForProtein(n)
        if idx == -1 {
            return DNAInvalidProtein(n)
        }
        dna.counts[n]++
    }
    dna.populated = true
    return nil
}

Counts took my structure and made a map to satisfy the test. I agree a struct would probably be more reasonable in real code.

This demonstrates the advantage of Go or C over Python and Ruby. This code will be a good deal faster and use significantly less memory plus have the data stored in a way to utilize localization and caches instead of spread out in a linked list. If I know the counts will be under 2^16 I could even store the counts in a single int64. In a big piece of bioscience software this might mean the code could handle way, way more DNA structures at once.

I agree more go routines would be great in the exercises. This is not one of them. The fastest way to get the counts is to walk the list once, filling a data structure. Walking that list 4 times in 4 threads is not going to help.

@junedev
Copy link
Member

junedev commented Oct 23, 2021

This was already addressed in instructions.append.md and in the stub file.

@junedev junedev closed this as completed Oct 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants