Explain NULLs when teaching count #324

Open
wants to merge 2 commits into
base: gh-pages
from

Conversation


@chrisroat chrisroat commented Sep 14, 2020

This fixes a small oversight in the current explanation: NULL values in the counted column are not counted. It also teases the fuller explanation of NULL and aggregation that comes later.

Closes #305

@henrykironde
Contributor

henrykironde commented Sep 15, 2020

@r4space @remram44 any input on this?

@remram44
Contributor

remram44 commented Sep 15, 2020

"how many values there are" is still a bit ambiguous; it could be read as "how many distinct values", i.e. count(distinct reading).

- counts rows, not values
- be more specific about the NULL in the person column of the dataset
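
To illustrate the ambiguity raised above, here is a minimal sketch using Python's built-in `sqlite3` module. The table borrows the lesson's `Survey` column names, but the rows are made up for illustration and are not the lesson's data.

```python
import sqlite3

# In-memory database; the rows below are hypothetical, not the lesson's data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL)"
)
conn.executemany(
    "INSERT INTO Survey VALUES (?, ?, ?, ?)",
    [
        (1, "dyer", "rad", 9.82),
        (2, "dyer", "rad", 9.82),  # same reading as the row above
        (3, "lake", "sal", 0.05),
        (4, "lake", "rad", 1.46),
    ],
)

# count(reading) counts every row with a non-NULL reading;
# count(DISTINCT reading) collapses repeated readings first.
total, distinct = conn.execute(
    "SELECT count(reading), count(DISTINCT reading) FROM Survey"
).fetchone()
print(total, distinct)
```

With these rows the two counts differ because the reading 9.82 appears twice, which is why "how many values" can be read either way.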
@chrisroat
Author

chrisroat commented Sep 15, 2020

Good point -- changed it to rows. Also better explained (I think) the NULL in the person column.

@chrisroat
Author

chrisroat commented Oct 3, 2020

Any additional suggestions?

@chrisroat
Author

chrisroat commented Dec 25, 2020

Any additional suggestions?

@chrisroat
Author

chrisroat commented Dec 31, 2021

If this change is not desired, I can drop this pull request.

@PaulHancock

PaulHancock commented Jul 21, 2022

I would suggest noting that all of these aggregation functions ignore NULL values in their computations (otherwise avg and sum would return NULL whenever any input was NULL).
For the most part we get exactly what we intend when running these functions, with the exception of count.
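
A small sketch of that behaviour, assuming SQLite (run here through Python's `sqlite3` module). The `Readings` table and its values are hypothetical, chosen only to make the arithmetic easy to check.

```python
import sqlite3

# Hypothetical one-column table with a single NULL entry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Readings (reading REAL)")
conn.executemany(
    "INSERT INTO Readings VALUES (?)",
    [(1.0,), (3.0,), (None,)],  # None is stored as NULL
)

# The aggregates skip the NULL row: sum is 1.0 + 3.0, avg divides by 2, not 3.
total, average = conn.execute(
    "SELECT sum(reading), avg(reading) FROM Readings"
).fetchone()

# Ordinary arithmetic, by contrast, propagates NULL.
plain_sum = conn.execute("SELECT 1.0 + NULL").fetchone()[0]

print(total, average, plain_sum)
```

Note that avg divides by the number of non-NULL readings, so the NULL row affects neither the numerator nor the denominator.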

The count function counts the number of non-NULL entries when run on a column, but counts the number of rows when run as count(*) (even if the rows contain NULLs).

So count(colname) should be used when you want the number of non-NULL entries, whereas count(*) should be used to count the total number of rows.
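
The difference can be sketched as follows, again assuming SQLite via Python's `sqlite3` module. The column names follow the lesson's `Survey` table, but the three rows are invented for illustration, with one NULL in the person column.

```python
import sqlite3

# Hypothetical rows; the second one has no recorded person (NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL)"
)
conn.executemany(
    "INSERT INTO Survey VALUES (?, ?, ?, ?)",
    [
        (1, "dyer", "rad", 9.82),
        (2, None, "sal", 0.06),
        (3, "lake", "sal", 0.05),
    ],
)

# count(*) counts rows; count(person) counts only rows where person is not NULL.
all_rows, with_person = conn.execute(
    "SELECT count(*), count(person) FROM Survey"
).fetchone()
print(all_rows, with_person)
```

Counting a column that happens to contain no NULLs, such as reading in this sketch, would give the same answer as count(*), which is presumably why the original wording went unnoticed.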

The current text:

We used count(reading) here, but we could just as easily have counted quant or any other field in the table, or even used count(*), since the function doesn’t care about the values themselves, just how many values there are.

is therefore not correct and should be updated.

I think the changes in 4e1bcb5 and 7fff12b should be accepted.

Development

Successfully merging this pull request may close these issues.

Ep. 6: Explanation of count() could be improved.
4 participants