New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-600: data-shell/data/animals.txt is truncated #722

Open
wants to merge 2 commits into
base: gh-pages
from

Conversation

Projects
None yet
5 participants
@agrimstrup

agrimstrup commented Mar 3, 2018

Added additional records to bring the animals.txt file up to 586 lines as mentioned in the text. Added ellipses to the file listing to indicate more data exists in the file than listed.

Before the change, checked the output of the example
[arne@localhost data]$ cat animals.txt | head -n 5 | tail -n 3 | sort -r
2012-11-06,rabbit
2012-11-06,deer
2012-11-05,raccoon

Added additional records and checked the distribution of values
[arne@localhost data]$ wc -l animals.txt
586 animals.txt

[arne@localhost data]$ cat animals.txt | sort | uniq -c
43 2012-11-05,bear
35 2012-11-05,deer
44 2012-11-05,fox
44 2012-11-05,rabbit
45 2012-11-05,raccoon
42 2012-11-06,bear
37 2012-11-06,deer
44 2012-11-06,fox
39 2012-11-06,rabbit
35 2012-11-06,raccoon
36 2012-11-07,bear
32 2012-11-07,deer
40 2012-11-07,fox
36 2012-11-07,rabbit
34 2012-11-07,raccoon

Tested the example pipeline to ensure no changes resulted
[arne@localhost data]$ cat animals.txt | head -n 5 | tail -n 3 | sort -r
2012-11-06,rabbit
2012-11-06,deer
2012-11-05,raccoon


agrimstrup agrimstrup
ISSUE-600: Added additional records to bring the animals.txt file up …
…to 586 lines as mentioned in the text. Added ellipses to the file listing to indicate more data exists in the file than listed.
@gcapes

This comment has been minimized.

Contributor

gcapes commented Mar 5, 2018

If this file is to be updated, the zip file for learners to download also needs updating.

@colinmorris

This comment has been minimized.

Contributor

colinmorris commented Mar 5, 2018

This looks good to me, though I'd like to get a +1 from another maintainer before merging this in case there's something I'm missing. @shwina , any thoughts?

@colinmorris colinmorris requested a review from shwina Mar 5, 2018

@agrimstrup

This comment has been minimized.

agrimstrup commented Mar 5, 2018

@gcapes I didn't see the zip file in the first pass. An updated version of the zip file has been added to the PR.

@gcapes

This comment has been minimized.

Contributor

gcapes commented Mar 6, 2018

Sorry to be late with my thoughts, but is there actually a need to have a larger data set? If the file works for the examples, then it seems fit for purpose.
The suggestion on #600 and #720 to indicate this is a subset of a larger data set seems like it might be a good light-touch fix?

@gdevenyi

This comment has been minimized.

Contributor

gdevenyi commented Mar 7, 2018

I agree that there's no need for a "full" dataset. The truncation can serve fine as an example of a subset.

@colinmorris

This comment has been minimized.

Contributor

colinmorris commented Mar 19, 2018

I'm personally fine with either fix (increasing the file size to match the existing lesson text, or changing the lesson text to match the existing file size). However the former has the advantage that we have a pull request that implements it (right here), and the latter is hypothetical.

Is there a downside to merging this? It seems like a strict improvement over the current state of the lesson.

@shwina

This comment has been minimized.

Member

shwina commented Apr 4, 2018

+1 from me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment