Simplify start of Data Structures episode #475

annakrystalli opened this issue Feb 17, 2019 · 0 comments



commented Feb 17, 2019

Hi! 👋

I recently taught some episodes from this lesson and wanted to share some feedback on the start of the Data Structures episode, which I found hard to navigate.

I appreciate, and have seen, that students want to get stuck in with recognisable data structures as soon as possible and therefore want to work with dataframes. So I can understand trying to entice them by showing a dataframe and getting them comfortable reading and writing one straight away.

However, I feel this exercise, and the subsequent editing of the file, might introduce too much in one go, at a point where a lot of what's going on won't be covered for some time. Admittedly, I should have handled the direction of questions better in the workshop, which I'm sure would have helped, but I found it hard to resolve some of the confusion generated at this point.

Let me try and explain where I think some of the material may be setting up subtle confusion traps:

  • Reading in columns of data that were created as text throws students straight into factors and the eternally confusing stringsAsFactors default of read.csv(), without any of this being discussed explicitly until much later.
  • Factors are seen again in cats$weight + cats$coat.
  • Indexing and coercion are also introduced in the first few lines of code.
  • I found the exercise to induce coercion in the dataframe challenging to both understand and implement. It starts: "A user has added details of another cat. This information is in the file data/feline-data_v2.csv."

    • It's not clear how this file is supposed to appear.
    • After some confused pondering, I decided I could just edit it live as a text file.
    • But this opened up the potential for typing errors corrupting the file, or for saving over the original rather than saving as data/feline-data_v2.csv.
    • Additionally, the coercion leads us back to talking about factors without having formally explained them. At this point I admit I gave in to the confused faces and tried to explain factors, and then returned to data types. This was largely ineffective and sadly produced more confusion. 😢
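To make the traps above concrete, here is a minimal sketch of what students run into. The CSV contents are an assumed stand-in for the lesson's data files, inlined so nothing needs to be edited on disk, and stringsAsFactors = TRUE is passed explicitly to mimic the read.csv() default in R versions before 4.0.0.

```r
# Assumed stand-in for data/feline-data.csv, inlined so no file is needed.
csv_text <- "coat,weight,likes_string
calico,2.1,1
black,5.0,0
tabby,3.2,1"

# stringsAsFactors = TRUE is set explicitly to mimic the pre-4.0.0 default.
cats <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(cats$coat)    # "factor", although the column looks like plain text
class(cats$weight)  # "numeric"

# Adding a numeric column to a factor column yields NAs (with a warning),
# which is hard to explain before factors have been introduced.
result <- suppressWarnings(cats$weight + cats$coat)
result

# Assumed stand-in for data/feline-data_v2.csv: one extra row whose weight
# is text, so the whole weight column is read as text and becomes a factor.
csv_text_v2 <- paste0(csv_text, "\ntabby,2.3 or 2.4,1")
cats_v2 <- read.csv(text = csv_text_v2, stringsAsFactors = TRUE)
class(cats_v2$weight)  # "factor"
```

Building the second file in code like this would also avoid the live text editing that risked corrupting or overwriting the original.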

Luckily, we were able to pull it back in the next section by switching to the Data Carpentry lesson, and soon a lot of what we had talked about started falling into place. Personally, I think the R for Reproducible Scientific Analysis lesson is great. I just think that particular episode could be simplified to run a bit more smoothly. Here are my suggestions:


Overall my suggestion is to reverse the focus from deconstructing a dataframe to building a dataframe.

  • Move the dataframe section to later, at least until after we've introduced:
    • vectors
    • coercion
    • factors
  • Start with data types. This would flow well from the earlier introduction to R.
  • Show coercion on vectors
    • start with simple coercions between character, numeric and logical on vectors
    • discuss stringsAsFactors and coercion to factors in dataframes
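As a rough sketch of that ordering, the same cat data could be built up step by step from vectors. The variable names and values are illustrative assumptions rather than text from the lesson.

```r
# 1. Data types on plain vectors
weight <- c(2.1, 5.0, 3.2)
coat <- c("calico", "black", "tabby")
likes_string <- c(TRUE, FALSE, TRUE)
typeof(weight)        # "double"
typeof(coat)          # "character"
typeof(likes_string)  # "logical"

# 2. Coercion on vectors: mixing types forces a single common type
mixed <- c(weight, "2.3 or 2.4")
typeof(mixed)                        # "character"
as.numeric(likes_string)             # 1 0 1
suppressWarnings(as.numeric(mixed))  # 2.1 5.0 3.2 NA

# 3. Factors: text stored as integer codes plus a set of levels
coat_factor <- factor(coat)
levels(coat_factor)      # "black" "calico" "tabby"
as.integer(coat_factor)  # 2 1 3

# 4. Only now assemble a dataframe and discuss stringsAsFactors
cats <- data.frame(coat = coat, weight = weight,
                   likes_string = likes_string,
                   stringsAsFactors = TRUE)
str(cats)
```

By the time the dataframe appears, every column type in the str() output has already been seen on its own.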

I'd like to reiterate that the particular session I taught could definitely have been improved by better handling of questions and by avoiding too much deviation from the materials. My suggestions here are meant to further reduce the opportunity for deviation.

Let me know what you think 😊
