Introduction Factors #40

Closed
wants to merge 25 commits into from


tomwright01
Contributor

This PR introduces the concept of factors. This involved the use of a new example dataset Sites-.csv with variables suitable for interpretation as factors.
There is also more information on the different ways of addressing/slicing a data frame; in particular, logical indexing is introduced because it is very common.
All changes are to the Rmd files (except Sites-*.csv), the other files were generated by make.
This is my first sizable merge so apologies if I'm doing this wrong.
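
For readers skimming this thread, here is a minimal sketch of what "factors" and "logical indexing" refer to. The data frame, its column names, and its values are all hypothetical; they are not taken from the dataset added in this PR.

```r
# Hypothetical data: names and values are made up for illustration only.
dat <- data.frame(
  ID            = c("Sub001", "Sub002", "Sub003", "Sub004"),
  Gender        = c("m", "f", "f", "m"),
  Group         = c("Control", "Treatment1", "Treatment1", "Control"),
  BloodPressure = c(132, 118, 125, 140),
  stringsAsFactors = FALSE
)

# Turn a character column into a factor and inspect its categories.
dat$Group <- factor(dat$Group)
levels(dat$Group)
table(dat$Group)

# Positional indexing (slicing): first two rows, two named columns.
dat[1:2, c("ID", "BloodPressure")]

# Logical indexing: build a TRUE/FALSE vector, then use it to pick rows.
high_bp <- dat$BloodPressure > 130
dat[high_bp, ]

# Conditions can be combined with & (and) or | (or).
control_males <- dat[dat$Group == "Control" & dat$Gender == "m", "ID"]
```

The `$` operator pulls out a single column by name, which is why it appears inside the logical conditions above.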

…o output/ directory to avoid collision with lesson 02
…on into gh-pages

Conflicts:
	00-first-timers.md
	01-starting-with-data.Rmd
	01-starting-with-data.html
	01-starting-with-data.md
	01-supp-read-write-csv.md
	02-func-R.Rmd
	02-func-R.html
	02-func-R.md
	03-loops-R.html
	03-loops-R.md
	03-supp-loops-in-depth.html
	03-supp-loops-in-depth.md
	04-cond-colors-R.html
	04-cond-colors-R.md
	04-cond.Rmd
	04-cond.html
	04-cond.md
	05-testing-R.md
	06-best-practices-R.md
	06-cmdline.html
	06-cmdline.md
	07-knitr-R.md
	08-making-packages-R.md
	fig/01-starting-with-data-plot-avg-inflammation-1.png
	fig/01-starting-with-data-plot-max-inflammation-1.png
	fig/01-starting-with-data-plot-min-inflammation-1.png
	fig/02-func-R-rescale-test-1.png
	fig/02-func-R-rescale-test-2.png
	fig/03-loops-R-loop-analyze-4.png
	fig/03-loops-R-loop-analyze-5.png
	fig/03-loops-R-loop-analyze-6.png
	fig/03-loops-R-loop-analyze-7.png
	fig/03-loops-R-loop-analyze-8.png
	fig/03-loops-R-loop-analyze-9.png
	guide.md
@jdblischak
Contributor

Thanks for this PR, @tomwright01. It is quite an ambitious PR. You both change the data set and add extra sections. It may be different enough that we want to just create a new lesson set. Could you please provide a more detailed description of what you changed/added and your motivation for doing so?

@gvwilson
Contributor

@jdblischak @tomwright01 What's the consensus - put this into one lesson, or have a separate lesson?

@jdblischak
Contributor

We still haven't decided.

Here's some more description from Tom on the r-discuss mailing list (post):

Basically I wanted to introduce factors. To do this I created a new
sample dataset with things like treatment group, gender and subject id.
I've also added more information on addressing data (indexing/slicing,
logical indexing and the $ operator). I used the datacarpentry material
on factors.

One key problem I found was converting the analyze() function that is
used extensively in the second half of the material. As it stood, this
function uses the apply function extensively on the entire dataset. I
found it clumsy to modify the functions to deal with just the numeric
values (e.g. dat[,6:9]). My approach was to keep the apply function in the
manipulating data section and to change the analyze() function to use a
different plotting function.
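
To illustrate the problem described above, here is a rough sketch of restricting `apply` to the numeric columns of a mixed-type data frame. The data frame and its column names are hypothetical, and selecting columns with `sapply(dat, is.numeric)` is only one possible alternative to a hard-coded range like `dat[,6:9]`; it is not necessarily what the PR does.

```r
# Hypothetical mixed-type data: names and values are made up for illustration.
dat <- data.frame(
  ID    = c("Sub001", "Sub002", "Sub003"),
  Group = factor(c("Control", "Treatment", "Control")),
  Week1 = c(1, 2, 3),
  Week2 = c(3, 6, 5)
)

# Find the numeric columns by type instead of by position.
numeric_cols <- sapply(dat, is.numeric)

# drop = FALSE keeps a data frame even if only one column matches.
measurements <- dat[, numeric_cols, drop = FALSE]

# MARGIN = 1 works across rows (per subject), MARGIN = 2 down columns.
row_means <- apply(measurements, 1, mean)
col_max   <- apply(measurements, 2, max)

# A factor makes per-group summaries straightforward.
group_means <- tapply(row_means, dat$Group, mean)
```

Calling `apply` on the whole data frame would first coerce everything to a character matrix, which is why the non-numeric columns have to be excluded somehow.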

@sritchie73
Contributor

I quite like the change, but would be interested to see what the feedback is from learners in the wild.

@tomwright01
Contributor Author

Hi Scott,
There wasn't that much feedback from the wild learners that related
specifically to the new lesson format, probably because people have little
to compare it with. I guess the positive message here is there wasn't any
specific negative feedback!

Feedback more directly related to the r-session included:
"R-Studio was very useful"
"Making essential skills the focus"
"The introduction to R was too slow"

Most feedback related to a capstone linking R and SQL.
Feedback here was about 25% positive "liked linking different tools" and
75% negative "too fast".
e.g.:
"showing how to manage SQL in R was the most useful and applicable"
"Last afternoon session should be lighter because people get tired"

Speaking as the instructor, I think the learners gained a pretty good grasp
of factors and data frames; however, I did not cover as much as I wanted. Flow
control and loops were not given the time I think they deserve. Also, the
lesson plan doesn't cover the different data types in as much detail as I
think it should. Next time I'd like to add more information on vectors,
matrices, data frames and lists.

Perhaps the people assisting me in this lesson could add more insight?


@sritchie73
Contributor

Hi Tom,

That closely mirrors the feedback on a recent workshop I ran (using the materials at resbaz.github.io/r-novice-gapminder). We've had mixed results covering all the different data structures/types:

  • In the most recent workshop most learners found it too slow (and most of them seemed really bored), but the instructor went through the material quite thoroughly so novices could keep up.
  • In the December workshop, I went very quickly through the same material, trying to make it an information dump. The more intermediate learners found it enlightening, but the novices had no hope of keeping up.

We've been considering skipping that material as a result, but leaving it as reference material for more advanced attendees (although personally I think Hadley Wickham's advanced R manual does them better justice).

@tomwright01
Contributor Author

Thanks for the warning, and the hint.
If I understand correctly, the novice-gapminder lesson is currently longer than a half-day session. In my opinion, a working introduction could be covered in just a few minutes. Personally, I think that the material on environments (stacks / scopes) is too verbose and time could be saved here (it's also not entirely accurate for R, as one of the learners pointed out).

@sritchie73
Contributor

Yeah, the materials I've written are designed to be run in a full day (there's still too much material). That being said, we've (cc: @DamienIrving ) run several workshops with the python-inflammation lessons, and they've taken us a full day to get through as well.

@naupaka
Member

naupaka commented Feb 24, 2015

My initial impression of this PR is that there is a ton of great material in here, including many of the improvements we have been talking about making for a long time, but that it also is too much material for a half-day session (perhaps even too much for a full day), and some of it is a bit too in-depth (IMO) for novices who have never seen R before.

What are the options for having this exist on its own vs. merged with the current versions? Or could we have a number of smaller pull requests that allow us to handle these additions/changes in chunks instead of all at once? What's the rule of thumb for the amount of content to be in or out of scope for novice SWC lessons?

I think we should keep a simple set of lessons for true novices, and then have all the other great material we want to teach as either intermediate lessons or as a different 'flavor' of novice lessons. I'm not sure what the best practices are with regard to the above thoughts and the new organizational structure of SWC lessons.

@tomwright01
Contributor Author

I agree with @naupaka that the true novice lesson should be simpler. Unfortunately, I don't think this PR can be split into smaller chunks, because changing the dataset touches a lot of the lessons.

After further experience with, and thought about, the r-novice and r-python lessons I think the scope of the lessons needs to be narrowed.
I feel the aim of the novice lesson should be to have a language agnostic lesson that covers variables, flow control and functions, printing stuff to screen, reading stuff from files, debugging, unit testing etc. I think the python-novice-inflammation lesson is nearly suitable as a template for this (except for the bit about integer division). Although I say language agnostic, obviously a language has to be used. Perhaps the lessons should be:

  • programming-novice-python
  • programming-novice-R
  • programming-novice-matlab

In this pull request I try to expand the scope of the lesson to also cover the strengths (vagaries) of R. As such it moves away from being language agnostic and should become a stand alone lesson. I suggest that we (I) try to restructure this lesson, removing the content that would be covered in the novice programming lesson. A novice workshop would then consist of:

  • programming-novice-
  • R-novice-

@jdblischak
Contributor

Based on this discussion, my vote is that these changes, with their new data set, would be best put in their own repository.

As for making the current r-novice-inflammation lessons easier, there was a recent PR merged into python-novice-inflammation that, among other things, moved the part about the call stack to a supplementary lesson. I had previously sent out a message to r-discuss to see if there was any interest in implementing these changes.

@tomwright01 tomwright01 mentioned this pull request Mar 9, 2015
@jdblischak
Contributor

Closing. See PRs #52 and #57.

@jdblischak jdblischak closed this Mar 23, 2015

5 participants