Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-600: data-shell/data/animals.txt is truncated #722

Closed
wants to merge 2 commits into from

Conversation

agrimstrup
Copy link

Added additional records to bring the animals.txt file up to 586 lines as mentioned in the text. Added ellipses to the file listing to indicate more data exists in the file than listed.

Before the change, checked the output of the example
[arne@localhost data]$ cat animals.txt | head -n 5 | tail -n 3 | sort -r
2012-11-06,rabbit
2012-11-06,deer
2012-11-05,raccoon

Added additional records and checked the distribution of values
[arne@localhost data]$ wc -l animals.txt
586 animals.txt

[arne@localhost data]$ cat animals.txt | sort | uniq -c
43 2012-11-05,bear
35 2012-11-05,deer
44 2012-11-05,fox
44 2012-11-05,rabbit
45 2012-11-05,raccoon
42 2012-11-06,bear
37 2012-11-06,deer
44 2012-11-06,fox
39 2012-11-06,rabbit
35 2012-11-06,raccoon
36 2012-11-07,bear
32 2012-11-07,deer
40 2012-11-07,fox
36 2012-11-07,rabbit
34 2012-11-07,raccoon

Tested the example pipeline to ensure no changes resulted
[arne@localhost data]$ cat animals.txt | head -n 5 | tail -n 3 | sort -r
2012-11-06,rabbit
2012-11-06,deer
2012-11-05,raccoon


…to 586 lines as mentioned in the text. Added ellipses to the file listing to indicate more data exists in the file than listed.
@gcapes
Copy link
Contributor

gcapes commented Mar 5, 2018

If this file is to be updated, the zip file for learners to download also needs updating.

@colinmorris
Copy link
Contributor

This looks good to me, though I'd like to get a +1 from another maintainer before merging this in case there's something I'm missing. @shwina , any thoughts?

@colinmorris colinmorris requested a review from shwina March 5, 2018 22:20
@agrimstrup
Copy link
Author

@gcapes I didn't see the zip file in the first pass. An updated version of the zip file has been added to the PR.

@gcapes
Copy link
Contributor

gcapes commented Mar 6, 2018

Sorry to be late with my thoughts, but is there actually a need to have a larger data set? If the file works for the examples, then it seems fit for purpose.
The suggestion on #600 and #720 to indicate this is a subset of a larger data set seems like it might be a good light-touch fix?

@gdevenyi
Copy link
Contributor

gdevenyi commented Mar 7, 2018

I agree that there's no need for a "full" dataset. The truncation can serve fine as an example of a subset.

@colinmorris
Copy link
Contributor

I'm personally fine with either fix (increasing the file size to match the existing lesson text, or changing the lesson text to match the existing file size). However the former has the advantage that we have a pull request that implements it (right here), and the latter is hypothetical.

Is there a downside to merging this? It seems like a strict improvement over the current state of the lesson.

@shwina
Copy link

shwina commented Apr 4, 2018

+1 from me

@gcapes
Copy link
Contributor

gcapes commented Jan 16, 2019

Thanks for the PR. This inconsistency looks to have been fixed now.

Thanks again for your contribution.

@gcapes gcapes closed this Jan 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants