Allow the autograder to assign partial credit #974

Closed
jhamrick opened this issue Jun 2, 2018 · 17 comments · Fixed by #1090

@jhamrick (Member) commented Jun 2, 2018

I could have sworn there was already a discussion about this in an issue somewhere, but I wasn't able to find it...

I think this might be possible to do if a test cell returns a value between 0 and 1, which could then be interpreted by the autograder as the fraction of the available points earned on that test. If the cell doesn't return anything, it is treated as full credit (as it currently is), and if it raises an error, it is treated as zero credit.
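For illustration, a test cell under that scheme might look something like this minimal sketch (the add() function and the specific checks are hypothetical; only the final fractional expression matters under this proposal):

```python
# hypothetical test cell: each assert checks one aspect of the student's answer
passed = 0
total = 2

try:
    assert add(1, 2) == 3  # add() is the (hypothetical) student-defined function
    passed += 1
except Exception:
    pass

try:
    assert add(-1, 1) == 0
    passed += 1
except Exception:
    pass

# last expression: a value between 0 and 1, interpreted as the share of
# this cell's points that the student earned
passed / total
```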

@jhamrick added the enhancement label Jun 2, 2018
@jhamrick added this to the 0.6.0 milestone Jun 2, 2018
@ellisonbg (Contributor) commented Jul 18, 2018

Could also do this through a custom MIME type, but the simple float number would be a good starting point.
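For reference, a cell can already publish an arbitrary MIME bundle through IPython, so a minimal sketch of that route might look like the following (the application/x.nbgrader-score+json type name is invented for illustration; nothing in nbgrader currently recognises it):

```python
from IPython.display import display

# publish a raw MIME bundle; the MIME type name here is hypothetical and
# would only mean something once the autograder agrees to look for it
display(
    {"application/x.nbgrader-score+json": {"score": 0.75}},
    raw=True,
)
```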

@lwasser commented Mar 18, 2019

Hey @jhamrick, we are interested in potentially working on this partial credit item (pinging @kcranston). Would you be open to a PR with this functionality down the road? If so, can you tell us a bit about your preferred workflow for PRs?

@kcranston (Contributor) commented May 8, 2019

As I think about this more, I think it is less error-prone for the cell to return the number of points rather than the fraction of points, i.e. return a number greater than or equal to 0 rather than a number between 0 and 1. The reason is that returning the fraction requires the instructor to include the maximum number of points in two places: in the cell metadata and in the cell body (to be able to output points / max_points from the tests). This opens the possibility of a mismatch between the two max_points values, and it would be challenging to test for this case. If the cell returns a total number of points instead, we can simply validate that points <= max_points.
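As a sketch of the points-based variant (the checks, point values, and the student_answer() function are purely illustrative; max_points would come from the cell metadata):

```python
# hypothetical test cell worth 4 points in the cell metadata
points = 0

try:
    assert student_answer() is not None  # student_answer() is hypothetical
    points += 1
except Exception:
    pass

try:
    assert student_answer() == 42
    points += 3
except Exception:
    pass

# the autograder (or the validate step) can then check points <= max_points
points
```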

Alternatively, this suggestion on the Jupyter gitter channel to enable a JSON output would allow us to customize the output (returning points, error messages, tracebacks, etc.) and incorporate it into the feedback report.

Thoughts?

@jhamrick (Member, Author) commented May 12, 2019

Yeah, the possibility of the points being mismatched is, I think, why I was originally thinking of having the value be between 0 and 1, but I also see the annoyance of having to specify the maximum number of points twice.

I guess if the instructor is using the validate script to test their solutions, and we check that the number of points does not exceed the maximum, then it shouldn't be too much of a problem, so I would be fine with that.

I think having the cell return a JSON/structured output would indeed be much better than just a string output. If you want to try implementing it that way, that would be awesome! (However, I'm not too familiar with how defining custom output types works, so I am not sure how much guidance I can give if you have questions there.)

@lwasser commented May 13, 2019

I think supporting JSON output would be pretty great. It would potentially allow us to build out some additional functionality, including custom error messages.

@psychemedia commented Jun 11, 2019

One of the things I think could be useful would be the ability to set an assignment question weighted at, e.g., four marks and then have separate test cells following it that score those four marks. So, for example, the first test cell might award two marks, and the second and third test cells one mark each.

This adds complexity to writing the tests, but is what we do as manual markers.

E.g., "write a function to do X" (4 marks) might have 3 marks for the function and 1 discriminator mark for including a docstring. A manual marker may also look deeper into the function to allocate the 3 available marks across different criteria, a breakdown which it may or may not be possible to write autograded tests against.

@kcranston (Contributor) commented Jun 11, 2019

@psychemedia - unless I am misunderstanding your proposal, you can do this already in nbgrader. You can have as many test cells as you want for a given assignment question.

Ah, I think what you are asking for is that the points are assigned to the question, requiring some way of linking which test cells go with which question cells.

@psychemedia commented Jun 11, 2019

@kcranston Yes... I want to set one question, have N test cells, and have their marks aggregated back and awarded to the single question (i.e. the last question cell appearing above the test cells).

This might also allow me to test and generate feedback on different elements of the student answer.

@psychemedia commented Jun 11, 2019

Another reading (which could work for equally weighted components) would be to have a single test cell awarding N marks, with, e.g., N test statements in it, and then award 1 mark for each test successfully passed within the cell. But I'm not sure how you'd run that cell to completion if, e.g., the first step failed. It would also be clunky / hacky if you wanted unequally weighted tests, unless you duplicated the same test to double its weight, etc.

@jhamrick (Member, Author) commented Jun 16, 2019

With #1090 you should be able to do this, e.g.

```python
score = 0

try:
    # execute test 1 (placeholder; replace with a real assertion)
    assert True
except Exception:
    pass
else:
    score += 1

try:
    # execute test 2 (placeholder; replace with a real assertion)
    assert True
except Exception:
    pass
else:
    score += 3

# ... further tests ...

# the final expression is the cell's output, which is read as the partial score
score
```
@psychemedia commented Jul 22, 2019

That partial score recipe looks handy...

I started sketching out a (very crude) recipe for handling feedback; it's very literal at the moment, though!

For example, a chart scorer and feedbacker:

```python
# target chart is plotted from a Series, which is correctly defined as: ___plot_df
# chart to be marked should be assigned to: ax

from IPython.display import HTML, display
from plotchecker import BarPlotChecker

def ___html_display(msg):
    display(HTML(msg))

pc = BarPlotChecker(ax)

___handled = set()
___goals = {'title': {'hit': 'Good, you added a title to the chart.',
                      'miss': 'You need to add a title to the chart.'},
            'xticklabels': {'hit': 'Good, you correctly ordered the items on the x-axis.',
                            'miss': 'You need to correctly order the items on the x-axis by applying .sort_values(ascending=False) to the Series you are plotting from.'}}

___hit_msg = '<div class="alert alert-success">{}</div>'
___miss_msg = '<div class="alert alert-danger">{}</div>'

___score = 0

try:
    pc.assert_xticklabels_equal(___plot_df.sort_values(ascending=False).index.values.tolist())
except Exception:
    pass
else:
    ___handled.add('xticklabels')
    ___score += 1

try:
    pc.assert_title_exists()
except Exception:
    pass
else:
    ___handled.add('title')
    ___score += 1

# display a hit / miss feedback message for each goal
for ___goal in ___handled:
    ___html_display(___hit_msg.format(___goals[___goal]['hit']))
for ___goal in set(___goals.keys()).difference(___handled):
    ___html_display(___miss_msg.format(___goals[___goal]['miss']))

# the final expression is the partial score returned to the autograder
___score
```
@lwasser commented Jul 22, 2019

@psychemedia THIS IS AWESOME. Cc'ing @kcranston on this; she's implemented an example of how this could work in our autograding notebook here. Note that we've built on a lot of the awesome work @jhamrick did with plotchecker and created matplotcheck, which also handles spatial plots and images using a very similar approach.

We should think about eventually adding some wrappers for autograding into matplotcheck once we are all happy with the implementation. Very happy to get feedback from you on this!

@psychemedia commented Jul 22, 2019

@lwasser Would be happy to bounce ideas around on this; I'm just starting to explore feedback bits and bobs in notebooks, esp. seeing how nbgrader can be co-opted to provide feedback.

The partial-grade PR mentioned the possibility of using JSON objects to pass partial grade and feedback options, so that's part of the space I'm trying to explore...

I also want to try some (crappy!) spaCy-based sentence-similarity grading to get a feel for how short-answer marking might work, and whether a similar feedback model would apply there too.

I also wonder whether a bank of standard feedback sentences, eg for charts, would be possible, at least for a basic chart feedback engine.

I also can't help feeling that the components of a chart feedback engine (e.g. some of the internals of the plotgrader tool) may not be a million miles from chart accessibility tools such as longdesc generators, e.g. notes on generating accessible text descriptions.

@kcranston (Contributor) commented Jul 23, 2019

@psychemedia - thanks for this! It will be super helpful in getting the feedback into the HTML report (rather than simply hijacking stdout / stderr, which is what we are doing right now).

@kcranston (Contributor) commented Jul 23, 2019

In general, I think it would be more robust for partial grade cells (or all autograde cells?) to return JSON. Then we can be explicit about points and messages, rather than relying on the current partial-grade PR's approach, which is a somewhat unsatisfying set of if / else checks.
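Something like the sketch below is what I have in mind (the schema with points / max_points / messages is illustrative only, not something nbgrader defines):

```python
import json

# hypothetical structured result for this test cell; the schema is
# illustrative and would need to be agreed on by the autograder
result = {
    "points": 2,
    "max_points": 4,
    "messages": [
        "Function returns correct values.",
        "Missing docstring.",
    ],
}

# emit it as the cell's output; an autograder aware of this convention
# could parse the output as JSON instead of a bare float
json.dumps(result)
```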

@psychemedia commented Jul 23, 2019

If I iterate on this again, I think I'll try to package things up into a structure that can be iterated over, rather than having a long sequence of try / except blocks, one for each test.
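A minimal sketch of what that structure might look like, reusing the pc checker from the chart snippet above (the specific checks, weights, and the ___expected_labels stand-in are illustrative):

```python
from IPython.display import HTML, display

# hypothetical test table: each entry is (check, points, hit message, miss message),
# where check is a zero-argument callable that raises on failure
___tests = [
    (lambda: pc.assert_title_exists(), 1,
     'Good, you added a title to the chart.',
     'You need to add a title to the chart.'),
    (lambda: pc.assert_xticklabels_equal(___expected_labels), 1,  # ___expected_labels is a stand-in
     'Good, you correctly ordered the items on the x-axis.',
     'You need to correctly order the items on the x-axis.'),
]

___hit_msg = '<div class="alert alert-success">{}</div>'
___miss_msg = '<div class="alert alert-danger">{}</div>'

___score = 0
for ___check, ___points, ___hit, ___miss in ___tests:
    try:
        ___check()
    except Exception:
        display(HTML(___miss_msg.format(___miss)))
    else:
        ___score += ___points
        display(HTML(___hit_msg.format(___hit)))

___score
```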

I'm also mindful that many folks wouldn't want to have to build the test structure themselves, and would probably rather fill in a form that generates a structure that can then be used as part of the test automation.

Re: the possible returned items from the partial scorer, when I sketched out a standalone / Thebelab / Binder-powered assessment completion thing (here), it also struck me that it would be nice to be able to package student answers in a JSON file that could then be handled by an autograder.

@lwasser commented Aug 21, 2019

Thank you @jhamrick, we are so appreciative of this being merged :)
