New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nods towards normalisation in hygene #181

Open
Denubis opened this Issue Jul 21, 2017 · 0 comments

Comments

Projects
None yet
1 participant
@Denubis
Contributor

Denubis commented Jul 21, 2017

I previously had this in a push request, but as we've been most firmly instructed to not add content, I'll add it here as an issue instead.

I think it should be here because otherwise the third rule is a negative prohibition that doesn't provide useful hints as to how. By including a link to wikipedia they'll be able to explore the terms on their own time and a paragraph on the topic, we introduce a bit more structure than "split up tables."

I'm not sure you should keep this (again, not a push request per se) but I think the topic should be re-framed from the current third point.

In lesson 8, I added (to the present paragraph, at first):

In fact,
we could use a single table that recorded all the information about each reading in each row,
just as a spreadsheet would.
The problem is that it's very hard to keep data organized this way consistent:
if we realize that the date of a particular visit to a particular site is wrong,
we have to change multiple records in the database.
What's worse,
we may have to guess which records to change,
since other sites may also have been visited on that date.

Because we can join tables, we can make individual tables about one and only one thing, so as to avoid problems. The most important thing in a database is that each "fact" about the things we are recording is recorded once and only once. If it isn't, then the database doesn't know which value is the "correct" value. By splitting up tables into topics that relate to one and only one thing, we reduce problems and increase flexibility. (Consider, some databases only allow us to record a single phone number. What happens when we want to also be called at our mobile number?)

There are a few rules of thumb that we can follow to keep a database consistent:

  • Every reading in each row should have at most one value. Otherwise we need to teach the database how to split up individual cells, which makes it slow and complex.
  • Don't make duplicate rows or columns. If we find ourselves saying something like "phone 1" or "phone 2" -- that's an excellent sign that we need a new table. Remember, we can refer to the primary key of the parent table with a join when we want to access this data.
  • The truth primary key, the full primary key, and nothing but the primary key should tell us about the other attributes. If I'm looking for where a site is, I shouldn't need to know about when I visited.

rgaiacs added a commit to rgaiacs/swc-sql-novice-survey that referenced this issue Apr 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment