
Shrink & clean up custom_set.json #232

Merged (1 commit) on May 16, 2016


@IanWhitney (Contributor)

Discussed here: exercism/discussions#10

I'm trying to improve custom_set in two ways:

  1. Reducing the test suite, to remove redundancy or uninteresting
    implementations.
  2. Changing test order to improve flow.

Reducing the test suite

The previous test suite contained 74 tests, which is a lot. I haven't
checked all the exercises, but it's the biggest test suite that I've
come across.

If all of those tests provided value (exposing corner cases, improving
implementations, etc.) then that's fine. But I found there to be a lot
of duplicate tests. With the subset/union/etc tests, a student's
implementation is usually done by the 2nd or 3rd test, so the remaining
tests didn't provide any additional value.

So I have removed the tests that seemed redundant.

The tests also expected methods like size, delete or is_empty. I
have removed those because

  • They aren't vital to the behavior of a Set
  • They are usually implemented as an alias
  • They aren't used by the set operations (diff/subset/etc)
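A minimal Ruby sketch of the "usually an alias" point, assuming a hypothetical array-backed CustomSet (the class shape and method names are illustrative, not the exercise's required API). Here size, empty? and delete mostly forward to the backing Array, while the interesting logic lives in the set operations themselves.

```ruby
# Hypothetical CustomSet backed by a plain Array (illustration only).
class CustomSet
  def initialize(elements = [])
    # uniq enforces "no duplicates" when the set is built
    @elements = elements.uniq
  end

  def contains?(element)
    @elements.include?(element)
  end

  # The three methods below just forward to the backing Array.
  def size
    @elements.size
  end

  def empty?
    @elements.empty?
  end

  # Returns a new set without the element; the original is unchanged.
  def delete(element)
    CustomSet.new(@elements - [element])
  end
end

set = CustomSet.new([1, 2, 2, 3])
puts set.size                    # prints 3
puts set.delete(2).contains?(2)  # prints false
```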

Changing test order

The previous test suite started with equal. I found that this requires
students to implement two things:

  • Creating a new element
  • Comparing two collections of elements

I have chosen to start the tests with contains, since that only
requires one set. And, helpfully, when the student implements add and
equal, they can leverage their already-existing contains function.
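The contains-first ordering can be sketched in Ruby, assuming a hypothetical array-backed CustomSet (names are illustrative). contains? needs only one set; add and equal then reuse it, so equal compares membership in both directions rather than the order of the backing Array.

```ruby
# Hypothetical array-backed CustomSet built in the test order above.
class CustomSet
  attr_reader :elements

  def initialize(elements = [])
    @elements = elements.uniq
  end

  # Step 1: contains? needs only one set.
  def contains?(element)
    @elements.include?(element)
  end

  # Step 2: add reuses contains? to avoid storing duplicates.
  def add(element)
    return self if contains?(element)
    CustomSet.new(@elements + [element])
  end

  # Step 3: equal reuses contains? in both directions,
  # so the order of the backing Array never matters.
  def equal(other)
    other.elements.all? { |e| contains?(e) } &&
      @elements.all? { |e| other.contains?(e) }
  end
end

grown = CustomSet.new([1, 2]).add(3)
puts grown.equal(CustomSet.new([3, 2, 1]))  # prints true
```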

@NobbZ (Member) commented Apr 20, 2016

I do not understand how size, is_empty and delete are not vital for a set.

I need the size very often when dealing with sets in my code, and I also need a fast check for emptiness that does not count a billion entries before telling me that the size is not 0.

Also when we are able to insert a single element, we should also be able to remove one. For some implementations a specialised delete is much faster than creating and subtracting a singleton set.
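NobbZ's performance point can be sketched in Ruby, assuming a hypothetical Hash-backed set (HashSet is an illustrative name, not a class from the thread or from Ruby's standard library). In CRuby a Hash keeps its own entry count, so size and empty? do not walk the elements, and removing one key is average constant time, whereas a generic difference against a singleton set visits every element to build a new set.

```ruby
# Hypothetical Hash-backed set: keys are the elements, values are unused.
class HashSet
  def initialize(elements = [])
    @table = {}
    elements.each { |e| @table[e] = true }
  end

  def size
    @table.size      # Hash tracks its entry count; no counting pass
  end

  def empty?
    @table.empty?    # constant-time emptiness check
  end

  def contains?(element)
    @table.key?(element)
  end

  # Specialised delete: removes a single key in place (average O(1)).
  def delete(element)
    @table.delete(element)
    self
  end

  # Generic difference: visits every element to build a new set (O(n)).
  def difference(other)
    HashSet.new(@table.keys.reject { |e| other.contains?(e) })
  end
end

s = HashSet.new([1, 2, 3])
puts s.difference(HashSet.new([2])).size  # prints 2 (new set built)
puts s.delete(2).size                     # prints 2 (removed in place)
```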

@IanWhitney (Contributor, PR author)

@NobbZ, I think it depends on what we want this exercise to teach. I think there are two options:

  1. We want it to show the student how to create a full-featured Set library.
  2. We want it to show a student how to handle the unique parts of Set logic (no duplicates, unions, etc.)

If it's 1, then you're absolutely right. We should do delete, size and probably a few more methods (superset?, merge?).

But if it's 2, then I think we can focus the API on teaching what we want the exercise to teach. And I didn't see how delete and is_empty did that.

@kytrinyx (Member)

I think that whether we lean towards 1 or 2 we should clarify in the README what the purpose is.

@IanWhitney (Contributor, PR author)

@exercism/track-maintainers, thoughts on what the purpose of this exercise is?

@masters3d (Contributor) commented Apr 22, 2016

One

@kytrinyx (Member)

I was actually thinking (2) more than (1).

@IanWhitney (Contributor, PR author)

@kytrinyx's vote puts it at 2-1 in favor of option (2). That's not a very high turnout, though. I'd like to hear from more people...

@ryanplusplus (Member)

I like (1), personally

@ryanplusplus (Member)

Although I don't think that the options are really mutually exclusive.

@Cohen-Carlisle (Member)

I'm leaning towards 2 because there are plenty of other places to learn about implementing data structures in the language of your choice. I wouldn't call that a primary purpose of exercism.

However, I agree with @ryanplusplus that the two aren't mutually exclusive.
If requiring a fully featured set results in more interesting code discussions, then I'm all for it.

Still, I feel like 1 might be putting too many details into the exercise and could result in more muddied code and/or discussions.

tl;dr: 2, but open to changing

@petertseng (Member)

Hmm. I do find when reviewing submissions of the more involved exercises (ones that require a lot of code) that sometimes it's hard to figure out how to go about it. There's a lot of code, where do I start reviewing? etc.

For that reason, it is good to keep exercises focused. No more than what they need to be.

... All right, admittedly the above doesn't take a stance on whether this exercise should focus on the full set library or just the interesting set operations. I'll pick just the set operations, but I could be convinced either way.

@IanWhitney (Contributor, PR author)

@ryanplusplus What tests would you see in the suite if we combined options 1 & 2?

@ryanplusplus (Member)

@IanWhitney basically my thought was that we'd strip the exercise down to the essential set operations (as in 1), but reorder those tests to try to minimize the amount of code that has to be written to pass each successive test (as in 2). I think that the example of starting with contains instead of equals is perfect and is exactly how I approach the problems when developing with TDD: I choose features strategically in order to minimize the code written at each step and keep my cycle time low.

@IanWhitney (Contributor, PR author)

I like that approach, @ryanplusplus. What would the right order be, do you think?

I think I would go with:

  • is_empty (gets a test passing very quickly)
  • contains (another one that gets a test passing quickly)
  • add (pretty much foundational for everything that comes next)
  • delete? (I'm still not sure we need it, but if we do it makes sense to do it after add)
  • equal (this could go other places, I think)
  • subset
  • disjoint
  • intersect
  • difference
  • union

I lean towards dropping symmetric_difference, since (in my experience) it's not used often. But I did also find it interesting to implement in Rust, so maybe it stays.
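For reference, symmetric_difference is the set of elements that are in exactly one of the two sets, which can be built from operations already in the list: (A − B) ∪ (B − A). A minimal Ruby sketch, assuming a hypothetical array-backed CustomSet (method names are illustrative):

```ruby
# Hypothetical array-backed CustomSet; only the pieces needed here.
class CustomSet
  attr_reader :elements

  def initialize(elements = [])
    @elements = elements.uniq
  end

  def contains?(element)
    @elements.include?(element)
  end

  # Elements of self that are not in other.
  def difference(other)
    CustomSet.new(@elements.reject { |e| other.contains?(e) })
  end

  # Every element of either set; uniq in initialize drops repeats.
  def union(other)
    CustomSet.new(@elements + other.elements)
  end

  # Elements in exactly one of the two sets: (A - B) union (B - A).
  def symmetric_difference(other)
    difference(other).union(other.difference(self))
  end
end

a = CustomSet.new([1, 2, 3])
b = CustomSet.new([3, 4])
p a.symmetric_difference(b).elements  # prints [1, 2, 4]
```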

Downside of this approach is that the test suite stays pretty big, right? Probably ~45 tests. Is having those extra methods worth it?

@ryanplusplus (Member)

@IanWhitney I like your suggested order, but would move equal to after subset and difference in order to encourage more creative implementations of equal.
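One example of the kind of "creative" equal that placing it after subset makes possible: two sets are equal exactly when each is a subset of the other. A minimal Ruby sketch, assuming a hypothetical array-backed CustomSet with an illustrative subset_of? method:

```ruby
# Hypothetical array-backed CustomSet; method names are illustrative.
class CustomSet
  def initialize(elements = [])
    @elements = elements.uniq
  end

  def contains?(element)
    @elements.include?(element)
  end

  # self is a subset of other when every element of self is in other.
  def subset_of?(other)
    @elements.all? { |e| other.contains?(e) }
  end

  # Two sets are equal exactly when each is a subset of the other,
  # so neither element order nor the backing structure matters.
  def equal(other)
    subset_of?(other) && other.subset_of?(self)
  end
end

puts CustomSet.new([1, 2]).equal(CustomSet.new([2, 1]))     # prints true
puts CustomSet.new([1, 2]).equal(CustomSet.new([1, 2, 3]))  # prints false
```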

@IanWhitney (Contributor, PR author)

Sure. I keep forgetting that equality can be done much better than I did in my terrible implementation.

@IanWhitney (Contributor, PR author)

Changes!

  • Added tests for empty
  • Moved tests for equal
  • Dropped tests for symmetric_difference because no one made a case for them

I'm going to let this sit until Monday. If things look good then, I'll merge.

Review comment (Member) on custom_set.json:

    "equal": {
      "description": ["Test two sets for equality."],
    "empty": {
      "description": "Returns true if the set contains any elements",

Logic inverted: empty should be true if the set contains no elements, and false if it contains any elements.
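The corrected semantics, as a tiny Ruby sketch assuming a hypothetical array-backed CustomSet (only empty? is shown):

```ruby
# Hypothetical array-backed CustomSet; only empty? is shown.
class CustomSet
  def initialize(elements = [])
    @elements = elements.uniq
  end

  # true when the set has no elements, false as soon as it has any
  def empty?
    @elements.empty?
  end
end

puts CustomSet.new.empty?       # prints true
puts CustomSet.new([1]).empty?  # prints false
```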

@IanWhitney (Contributor, PR author)

Updated to address @petertseng's comments. I'll rebase all this stuff before any merge.

Review comment (Member) on custom_set.json:

    "set2": [4],
    "expected": [1, 2, 3, 4]
    "description": "sets with the same elements are equal",
    "set1": [1,2],

Do we care very much about consistent spacing (a space after each comma)? If so, we should deal with this one and the other test a few lines below.

Member reply (by email):

On Sat 2016-05-14 15:08, Peter Tseng wrote:

      },
      {
        "description": "no elements in common",
        "set1": [1, 2, 3],
        "set2": [4],
        "expected": [1, 2, 3, 4]
        "description": "sets with the same elements are equal",
        "set1": [1,2],

> do we care very much about consistent spacing (space after comma)? If so, should deal with this and the other test a few lines below

Yes, always care about consistency. Any change in consistency communicates something. Sometimes it communicates that we weren't paying attention: someone looking at it will spend thought cycles trying to determine whether there is something special about the difference, and may eventually work out the truth (someone was not paying attention), or may attribute it to something else.

So yes, we should very much care about consistency.

@petertseng (Member)

I think that's all I have to say about what's here, and I can't think of any test that is missing or any test that shouldn't be here.

@petertseng (Member)

Maybe a test that [1, 2, 3] is not a subset of [1, 2], but that may have already been covered by the test that [1, 2, 3] is not a subset of [1, 2, 4], unsure.
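A minimal Ruby sketch of the two cases, assuming a hypothetical array-backed CustomSet with a membership-based subset_of? method (names are illustrative). With this kind of check both cases fail for the same reason (3 is missing from the second set), which is consistent with the hunch that one test may cover the other; the smaller-superset case would mainly matter for an implementation that takes some other shortcut, such as comparing sizes.

```ruby
# Hypothetical array-backed CustomSet; only what subset needs.
class CustomSet
  def initialize(elements = [])
    @elements = elements.uniq
  end

  def contains?(element)
    @elements.include?(element)
  end

  # self is a subset of other when every element of self is in other
  def subset_of?(other)
    @elements.all? { |e| other.contains?(e) }
  end
end

a = CustomSet.new([1, 2, 3])
puts a.subset_of?(CustomSet.new([1, 2]))     # prints false (3 missing, other smaller)
puts a.subset_of?(CustomSet.new([1, 2, 4]))  # prints false (3 missing, same size)
```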

@petertseng (Member)

Possible thing someone may say: does it seem "weird" to have equal come between difference and union? Maybe it's because we're going back and forth between operations that take two sets and return a bool and operations that take two sets and return a set:

  • subset (set, set) -> bool
  • disjoint (set, set) -> bool
  • intersect (set, set) -> set
  • difference (set, set) -> set
  • equal (set, set) -> bool
  • union (set, set) -> set

Seems weird to go back and forth like that, I guess?

I know there was a desire to have equal after difference, though, so it's a bit tough. Not sure if anyone will be bothered by it.
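To make the back-and-forth concrete, here is a minimal Ruby sketch of the six operations grouped by return type, assuming a hypothetical array-backed CustomSet (names such as subset_of?, disjoint? and intersection are illustrative; the thread calls the last one "intersect").

```ruby
# Hypothetical array-backed CustomSet; names are illustrative.
class CustomSet
  attr_reader :elements

  def initialize(elements = [])
    @elements = elements.uniq
  end

  def contains?(element)
    @elements.include?(element)
  end

  # (set, set) -> bool
  def subset_of?(other)
    @elements.all? { |e| other.contains?(e) }
  end

  def disjoint?(other)
    @elements.none? { |e| other.contains?(e) }
  end

  def equal(other)
    subset_of?(other) && other.subset_of?(self)
  end

  # (set, set) -> set
  def intersection(other)
    CustomSet.new(@elements.select { |e| other.contains?(e) })
  end

  def difference(other)
    CustomSet.new(@elements.reject { |e| other.contains?(e) })
  end

  def union(other)
    CustomSet.new(@elements + other.elements)
  end
end

a = CustomSet.new([1, 2, 3])
b = CustomSet.new([2, 3, 4])
p a.intersection(b).elements  # prints [2, 3]
p a.difference(b).elements    # prints [1]
p a.union(b).elements         # prints [1, 2, 3, 4]
p a.disjoint?(b)              # prints false
```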

@IanWhitney (Contributor, PR author)

(to be fair, I had the Rust PR ready to go before I even merged the test changes)

@ryanplusplus (Member)

Okay, I realize I'm late on this, but I think there are some issues with the new tests. Many of the tests that come before "equal" depend upon "equal" in the assertion (which kind of defeats the purpose of the reordering). I think this is probably my fault because I suggested putting "equal" after "subset" and "difference". Given this, I think we need to make a few more changes before we roll this out.

I think this can be fixed by moving "equal" up above all dependent tests. It can still stay below "subset", but can't be below "difference". Looking through the tests, I think we'd be okay as long as we move "subset" and "equal" below "add".

Thoughts?

@IanWhitney (Contributor, PR author)

I'm not sure what you mean. How do the earlier tests depend on equal?

@ryanplusplus (Member) commented May 21, 2016

Take an "add" test:

         {
            "description": "add to non-empty set",
            "set": [1, 2, 4],
            "element": 3,
            "expected": [1, 2, 3, 4]
         },

It's strongly implied that the expectation will be checked with "equal". I realize that you can check with repeated uses of "contains" (and I started by doing this), but eventually it becomes absurd. Additionally, it appears that your tests for Rust use "equal" prematurely (though I'm not a Rust guy, so maybe I'm misunderstanding something).
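A minimal Ruby sketch of checking the "add to non-empty set" case with contains alone, assuming a hypothetical array-backed CustomSet that has only contains? and add. It needs one check per expected element, and even then it cannot notice an unexpected extra element unless the test also checks some notion of size or equality.

```ruby
# Hypothetical array-backed set with only contains? and add.
class CustomSet
  def initialize(elements = [])
    @elements = elements.uniq
  end

  def contains?(element)
    @elements.include?(element)
  end

  def add(element)
    CustomSet.new(@elements + [element])
  end
end

result = CustomSet.new([1, 2, 4]).add(3)

# One membership check per expected element...
puts [1, 2, 3, 4].all? { |e| result.contains?(e) }  # prints true
# ...but nothing here would catch an extra element sneaking in.
```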

@IanWhitney (Contributor, PR author)

Maybe it's semantics, but is the equal method the same as == (or whatever your language uses)? In Rust I can say:

assert_eq!(Set::new(1), Set::new(1)) 

And that can work, even if I don't define the equal method, as long as Set implements PartialEq and Debug (for example via #[derive(PartialEq, Debug)]).

In Ruby I could do the same:

class CustomSet
  #...
  def ==(other)
    self.collection == other.collection
  end
  #...
end

assert_equal CustomSet.new(1), CustomSet.new(1)

Then, later on, I implement an actual equal method, and maybe I do it with a more set-like approach:

class CustomSet
  def equal(other)
    self.difference(other).empty? && other.difference(self).empty?
  end

  def ==(other)
    self.collection == other.collection
  end
end

Maybe this approach doesn't work in other languages, though. These two are the only ones I'm familiar with.

@ryanplusplus (Member) commented May 21, 2016

I think that for many (most?) of the language tracks, == will not be the same as equal and will only determine referential equality (i.e., two sets will not be equal because they are not the same instance). The only language I'm familiar with that will do a value comparison for composite data types is C. Additionally, I don't think that the example you gave in Ruby really illustrates your point, since you are defining == and are not getting it for free as you did in Rust (I am not a Ruby expert, but I'm almost certain you get referential equality by default). So for the Ruby track, you'd have to define both == and equals?, which is not productive, since == would be defined only for the purpose of making the tests happy (and needs to have the same semantics as equals? anyway; see below).
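On the Ruby detail: a class that does not define == inherits Object#==, which compares object identity, not contents. A minimal sketch, assuming a hypothetical PlainSet class that only stores its elements:

```ruby
# Hypothetical class that does NOT define ==, so it inherits Object#==,
# which is true only when both sides are the very same object.
class PlainSet
  def initialize(elements)
    @elements = elements
  end
end

a = PlainSet.new([1, 2])
b = PlainSet.new([1, 2])
puts a == b  # prints false: same contents, different instances
puts a == a  # prints true: same instance
```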

Even if a language does check value equality by default, a set can be represented multiple ways and you're not going to be able to use value equality as a reliable measure of set equality. For instance, say that your internal representation is a vector of all of the set elements. Because sets are not ordered, you might have a set with an internal representation [1, 2] which would not be considered == to another set with an internal representation [2, 1] although you would definitely describe those sets as "equal".
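The [1, 2] versus [2, 1] case can be sketched in Ruby, assuming a hypothetical ArraySet whose == compares the backing Array directly. Structural comparison depends on insertion order, while a membership-based comparison does not.

```ruby
# Hypothetical set whose == compares the internal Array directly.
class ArraySet
  attr_reader :elements

  def initialize(elements)
    @elements = elements.uniq
  end

  # Structural comparison: order of the backing Array matters.
  def ==(other)
    elements == other.elements
  end

  # Set comparison: same members, regardless of order.
  def same_members?(other)
    elements.all? { |e| other.elements.include?(e) } &&
      other.elements.all? { |e| elements.include?(e) }
  end
end

a = ArraySet.new([1, 2])
b = ArraySet.new([2, 1])
puts a == b              # prints false
puts a.same_members?(b)  # prints true
```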

@IanWhitney (Contributor, PR author)

This is bikeshedding in the extreme (which is fine by me, by the way; not complaining). In the cases above, neither language has an equal (or equals? for Ruby) method until we implement it. In both Rust and Ruby, you can compare two objects (or types, in Rust) by implementing either == or eq. So the student can compare two CustomSets for equality by implementing one of those methods. That will allow all tests that use assert_equal to pass.

When the Student reaches the test for implementing the equal method, how they do so is up to them.

I'm fully on board with your suggestion that this might be a problem in other languages. Maybe a maintainer of a language that has a problem with our current test implementation can contribute here? Or do we need a new issue, since I'm not sure how many people read comments on closed pull requests.

@ryanplusplus (Member) commented May 21, 2016

I think the fundamental problem is that regardless of what mechanism(s) a language provides, implementing == (or an equivalent) must be semantically identical to equal. There's no way around this since the tests rely upon both of these to determine set equality. The distinction you've drawn between == and equal does not make sense and I contend that there cannot be any (semantic) distinction.

Given this, the reordering doesn't make sense because if someone goes through the tests from top to bottom, they will need to implement set equality before getting to the equal tests. I'm happy to put together a PR that resolves this, but I first want to reach agreement on the problem because it's entirely possible I'm a raving lunatic and simply haven't figured it out yet.

@verdammelt (Member)

Pardon me for butting in, have not been following this closely, but I do think that this test has a bit of a problem. Since sets are unordered one is unlikely to be able to use a built in equality operator to check if the "given set plus an element" is equivalent to the "expected set".

In the Common Lisp track, the test's assertion would most likely use the equal equality operator, which checks for "structural equality": effectively, for a list, are the first elements equal, are the second elements equal, and so on, so the order matters. The equal function written by the submitter for their set data type could of course work differently (as the equal operator does with respect to pathnames: they must be "functionally equivalent", i.e. not compared as strings).

@IanWhitney (Contributor, PR author) commented May 21, 2016

I'm not familiar with Lisp, but since you say that the Student could implement their own equal function, it sounds like they could write one that doesn't rely on order.

Also, in the case of Lisp (or other languages with an existing equal method/function), the naming is just a suggestion. So if it causes less confusion to call the method is_equal, then go for that.

@ryanplusplus, maybe a longer Ruby example will make my idea clear.

class CustomSet
  def ==(other)
    collection == other.collection
  end

  def eq?(other)
    self == other
  end

  def initialize(element)
    self.collection << [element]
  end

  def collection
    @collection ||= []
  end
end

With that code, and that code only, this test passes:

set1 = CustomSet.new(1)
set2 = CustomSet.new(1)
assert_equal(set1, set2)

Because we implemented == (eq? just delegates to it), and == is what Ruby uses for equality comparisons between instances. But this test would fail:

set1 = CustomSet.new(1)
set2 = CustomSet.new(1)
assert set1.equal(set2)

#NoMethodError: undefined method `equal' for #<CustomSet:0x007fd7ad006db0 @collection=[[1]]>
#   from (irb):21
#   from /Users/Ian/.rubies/2.2.3/bin/irb:11:in `<main>'

Because there is no global equal method in Ruby. I have to write it. I can do it a simple way:

class CustomSet
  # everything else is unchanged
  def equal(other)
    self == other
  end
end

Or I could do it the wrong way:

class CustomSet
  # everything else is unchanged
  def equal(other)
    self != other
  end
end

Or I could do it using my set methods:

class CustomSet
  # everything else is unchanged
  def equal(other)
    self.difference(other).empty? && other.difference(self).empty?
  end
end

The behavior of my two correct implementations is exactly the same, sure. But (in Ruby, at least) I'm not required to define equal to get the previous tests to pass.
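The difference-based equal above relies on difference and empty? existing. A self-contained Ruby sketch that fills those in, assuming a hypothetical array-backed CustomSet whose constructor takes elements as separate arguments (so CustomSet.new(1) still works as in the examples above):

```ruby
# Hypothetical array-backed CustomSet completing the difference-based equal.
class CustomSet
  attr_reader :collection

  def initialize(*elements)
    @collection = elements.uniq
  end

  def contains?(element)
    collection.include?(element)
  end

  def empty?
    collection.empty?
  end

  # Elements of self that are not in other.
  def difference(other)
    CustomSet.new(*collection.reject { |e| other.contains?(e) })
  end

  # Equal when neither set has an element the other lacks.
  def equal(other)
    difference(other).empty? && other.difference(self).empty?
  end
end

puts CustomSet.new(1, 2).equal(CustomSet.new(2, 1))  # prints true
puts CustomSet.new(1, 2).equal(CustomSet.new(1))     # prints false
```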

If your language allows this sort of test ordering, then I think it's valuable to leave the definition of equal until later. Like you've said, the Student has to have already implemented some way of defining equality. So this late introduction of an equal method is weird. I think it's good that it's weird. It should, hopefully, encourage some Students to think about the correct implementation of equal based on the work that they've already done.

But if there's a language where this sort of approach is impossible, then move the equal test earlier. I'm not sure that requires a reordering of the test file. It might be handled fine by a comment at the top.

@ryanplusplus (Member) commented May 21, 2016

@IanWhitney believe me, I understand that you can implement both == and equal (or is_equal, or equals?, or equal_to, or...) independently. That's not what I'm trying to discuss. My concern with the existing test suite is that you must have implemented set equality (by some name) long before you get to the tests for set equality. It's a red herring to focus on the name given to the method instead of on the fact that you must implement set equality before the test suite addresses it. This is particularly relevant since one of the goals you gave for this update was to avoid a situation where two things needed to be implemented at once, as in the tests for equal (see "Changing test order" in the PR description).

To hopefully make this more clear, consider this add test (which comes before the tests for equal):

          {
             "description": "add to non-empty set",
             "set": [1, 2, 4],
             "element": 3,
             "expected": [1, 2, 3, 4]
          },

Even in your sample Ruby implementation this will fail for == because in your implementation of == the order of the collection matters. The case you provide where it does pass is one of a few special cases where structural and set equality just so happen to be identical because of implementation details. So even at this point you must have implemented set equality if you want these tests to pass.
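A minimal Ruby sketch of that failure, assuming a hypothetical set whose == compares the backing Array (as the == in the Ruby example above does) and whose add appends to the end:

```ruby
# Hypothetical set: == compares the backing Array; add appends.
class CustomSet
  attr_reader :collection

  def initialize(elements = [])
    @collection = elements.uniq
  end

  def add(element)
    CustomSet.new(collection + [element])
  end

  def ==(other)
    collection == other.collection
  end
end

result = CustomSet.new([1, 2, 4]).add(3)
expected = CustomSet.new([1, 2, 3, 4])
p result.collection     # prints [1, 2, 4, 3]
p result == expected    # prints false
```

Keeping the backing Array sorted would make this particular test pass for integers, but only because structural and set equality then coincide by construction, which is the point being made above.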

It is untrue to say that == can be implemented in a meaningfully different way than equal, so why not explicitly address set equality prior to this point in the test suite? This will mean that students no longer need to implement both add and set equality (by whatever name you choose) at the same time just to get add to pass. This is easily accomplished because the tests for equal do not rely upon anything except for the set constructor.

@IanWhitney (Contributor, PR author)

I think I need to punt on this one. It's not that I don't see the problem you're describing, it's that I don't see it as a problem.

The best approach here is to submit a PR, get feedback from people with opinions other than mine and then I'll update the Rust tests to do whichever.

@wobh commented May 21, 2016

Backing up @ryanplusplus and @verdammelt here.

For Common Lisp, this exercise could easily devolve into an exercise in how well a programmer can implement generic equality, since one could easily implement something with weird behaviors by relying on equal internally. For example, here's an ad hoc, informally-specified, bug-ridden, slow implementation of half of CDR 8 (Generic equality and comparison for Common Lisp):

  (defmethod equals ((obj1 t) (obj2 t)
                     &rest keys
                     &key recursive (case-sensitive t) &allow-other-keys)
    (declare (ignorable recursive))
    (when (equal (type-of obj1) (type-of obj2))
      (typecase obj1
        (number (= obj1 obj2))
        (cons   (tree-equal obj1 obj2 :test #'equals))
        (character (funcall (if case-sensitive #'char= #'char-equal) obj1 obj2))
        (string (funcall (if case-sensitive #'string= #'string-equal) obj1 obj2))
        (array (loop for i from 0 below (array-total-size obj1)
                  always (apply #'EQUALS
                                (row-major-aref obj1 i)
                                (row-major-aref obj2 i)
                                keys)))
        (hash-table (error "Not implemented.")) ; FIXME
        ;; structure not implemented either but default behavior probably okay
        (otherwise (equalp obj1 obj2)))))

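  (defmethod equals ((set1 set) (set2 set)
                     &rest keys
                     &key recursive &allow-other-keys)
    (declare (ignore keys recursive))
    ;; Equal only if neither set has an element the other lacks.
    (and (null (set-difference (elements-of set1) (elements-of set2) :test #'equals))
         (null (set-difference (elements-of set2) (elements-of set1) :test #'equals))))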

@ryanplusplus (Member)

@IanWhitney is right, a PR is the best way to further discuss these changes. Please see #257.

robphoenix pushed a commit to robphoenix/exercism-go that referenced this pull request Jan 17, 2017
updated tests for custom-set exercism#328

In the process of updating the tests, I ended up simplifying (hopefully)
the test cases. I removed all the extra methods and functions, including
String, which fundamentally changed the tests. I've also added a stub
with only the Set type in it, so that the exercise focuses on the set
operations rather than on defining a set type. This also means that the set
type is agreed up front, so there's no need to accommodate all
possible types that people could come up with for a set by defining it
with something like a String method.

Also relevant for reference:
exercism/problem-specifications#232
exercism/problem-specifications#257

10 participants