Exploring Data Frames: order of objectives #356
I would like to propose rearranging the order of the material in Exploring Data Frames. I think finding the basic properties of the data frames should come before the rest of the objectives. It is important to review the properties of your data frame before any manipulations take place. Therefore, I think learning str(), head(), tail(), dim(), etc., should come at the beginning of the lesson, followed by the data manipulation (adding/removing rows, changing from factor to character, etc.).
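As a rough illustration of the inspection step proposed above, here is a minimal sketch in R. The data frame `df` and its values are hypothetical placeholders and do not come from the lesson's cats or gapminder data.

```r
# Hypothetical toy data frame; the values are placeholders, not real data.
df <- data.frame(
  id = 1:8,
  group = c("a", "b", "a", "b", "a", "b", "a", "b"),
  value = c(2.5, 3.1, 4.7, 1.2, 5.0, 2.2, 3.3, 4.1),
  stringsAsFactors = FALSE
)

str(df)       # compact overview: column names, types, first few values
dim(df)       # number of rows and number of columns
head(df, 3)   # first three rows
tail(df, 3)   # last three rows
```

Running these before any manipulation gives a quick check that each column has the expected type and that the data frame has the expected size.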
changed the title from "Exploring Data Frames: order of objectives/material within lesson" to "Exploring Data Frames: order of objectives" on Mar 14, 2018
I think it's a good point, but it would mean we'd jump from cats (at the end of the previous episode) to gapminder (for properties of data frames) and then back to cats (for data manipulation). Personally I'm not keen on switching between data-sets if it can be avoided, as I think it can cause confusion.
We couldn't use the cats data for properties of data-frames, since it's too small for
(though it would mean the first use of / loading gapminder would need to go in another episode - or perhaps load it at the end of this episode and use it for some properties of data-frames exercises? (and if necessary mention
Is there a reason why we couldn't switch out the cats data to be a subset of the gapminder data? 3 or 4 rows, 3 or 4 columns of data spanning a few countries and maybe two continents? Then we could have the proper ordering and no loss of cohesion. I have felt that the cats are a little bit random amidst the rest of it.
I agree that switching between data sets would be confusing (didn't think of that when I first proposed the change), but Naupaka brings up a good solution with using a subset of the gapminder data. It could definitely allow for more fluidity as we move from one episode to the next.
I am new to the lesson material, and have only seen it in action once, so I acknowledge that this might not be worth the effort of changing. However, I did notice a lot of people taking the course enjoy working with the gapminder data set more than cats, because it has more real-world application.
As mentioned above: when dropping the cats data in this section ('Exploring Data Frames'),
Still, it is nice for didactic reasons to have a small example data set entered manually and used to explain various aspects of/operations on data structures.
A nice choice might be:
[With population size in millions. Apart from rounding these data correspond to the gapminder data]
This minimalistic set of countries is interesting because of contrasting population
Later on, a vector with the continents could be added, country could be changed to character,
Data for the same variables, but in a different year could be loaded from a file
After this it would be natural to add a variable with years to both data frames, using
In the section ('Exploring Data Frames'), the step from the small data set to the gapminder
One could state that this type of demographic and economic data is in fact already available,
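The sequence of steps outlined in this comment could look roughly like the sketch below. This is only an illustration under stated assumptions: the country names, continents, years, and population figures are hypothetical placeholders (not the values proposed in the comment), the second data frame is built inline instead of being loaded from a file, and the choice of `$<-` and `rbind()` is an assumption, since the comment does not show which functions were intended.

```r
# Small hand-entered data frame. All names and numbers are placeholders.
small <- data.frame(
  country = c("CountryA", "CountryB", "CountryC"),
  pop = c(1.5, 20.0, 300.0),    # population in millions (made up)
  stringsAsFactors = TRUE       # force country to be a factor for the demo
)

# Add a vector with the continents as a new column.
small$continent <- c("ContinentX", "ContinentX", "ContinentY")

# Change country from factor to character.
small$country <- as.character(small$country)

# Same variables for a different year. In the lesson this would be loaded
# from a file; it is built inline here so the sketch is self-contained.
other <- data.frame(
  country = c("CountryA", "CountryB", "CountryC"),
  pop = c(1.7, 22.0, 310.0),    # made up
  continent = c("ContinentX", "ContinentX", "ContinentY"),
  stringsAsFactors = FALSE
)

# Add a year variable to both data frames (placeholder years).
small$year <- 2000
other$year <- 2010

# Stack the two data frames; rbind() matches columns by name.
combined <- rbind(small, other)
str(combined)
```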
@emielvanloon I like this approach. It seems much more conceptually coherent to work with the same dataset throughout if at all possible. And it seems like this change wouldn't require too much reworking of the existing content--it's just swapping out the data and changing the wording to reflect it.