# Modeling Entities Reading

### Introduction

In this lesson, we'll put together a model of our entities.  Let's go.

### Reviewing our Data

We can start by taking another look at our remaining Foursquare data.

```python
bookstore = {'categories': [{
   'name': 'Bookstore',
   'pluralName': 'Bookstores'}],
 'id': '513a18937e2793d197a900d5',
 'name': 'Barnes & Noble'}
```

Looking at the information above, we can identify two main entities: a venue, which is a specific store, and the categories that the venue relates to.  Each of these should be its own table.  After all, if we did not separate them, we would quickly see repetition in our data.

<img src="./venue-category.png" width="60%">

> Notice above that `bookstore` appears twice.

So instead we need to build a separate categories table. 

<img src="./categories-table.png" width="40%">
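
To make the split concrete, here is a minimal sketch of pulling one raw record apart into a venue row and its category rows. The `split_venue` helper and the column names `foursquare_id` and `plural_name` are our own assumptions for illustration, not something the Foursquare data dictates.

```python
# The Foursquare record from earlier in the lesson.
bookstore = {'categories': [{'name': 'Bookstore', 'pluralName': 'Bookstores'}],
             'id': '513a18937e2793d197a900d5',
             'name': 'Barnes & Noble'}

def split_venue(raw):
    """Return (venue_row, category_rows) built from one raw venue dict.

    Hypothetical helper: the output column names are assumptions.
    """
    venue_row = {'foursquare_id': raw['id'], 'name': raw['name']}
    category_rows = [{'name': category['name'],
                      'plural_name': category['pluralName']}
                     for category in raw['categories']]
    return venue_row, category_rows

venue_row, category_rows = split_venue(bookstore)
print(venue_row)
print(category_rows)
```

Each category now lives in its own row, so it only needs to be stored once, no matter how many venues use it.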

Now the relationship between venues and categories is a many-to-many relationship.  After all, a category like bookstore has many venues associated with it, and a venue may have many categories.

Because the relationship is many-to-many, we cannot place a foreign key on either the categories or the venues table.  Instead, we must create a third table, a join table called `venues_categories`.

<img src="./venue-categories.png" width="100%">

So above, we associate Barnes & Noble with its categories through the `venues_categories` table.  We can see that Barnes & Noble has the categories of both bookstore and coffee, while Borders only has the bookstore category.
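
One way to sketch the join table is with Python's built-in `sqlite3` module and an in-memory database. The column names `venue_id` and `category_id`, the integer ids, and the `categories_for` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
-- The join table holds one row per (venue, category) pairing.
-- Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.
CREATE TABLE venues_categories (
    venue_id INTEGER REFERENCES venues(id),
    category_id INTEGER REFERENCES categories(id)
);
INSERT INTO venues (id, name) VALUES (1, 'Barnes & Noble'), (2, 'Borders');
INSERT INTO categories (id, name) VALUES (1, 'Bookstore'), (2, 'Coffee');
INSERT INTO venues_categories (venue_id, category_id)
VALUES (1, 1), (1, 2), (2, 1);
""")

def categories_for(venue_name):
    """Return the category names linked to a venue, in alphabetical order."""
    rows = conn.execute("""
        SELECT categories.name
        FROM categories
        JOIN venues_categories ON venues_categories.category_id = categories.id
        JOIN venues ON venues.id = venues_categories.venue_id
        WHERE venues.name = ?
        ORDER BY categories.name
    """, (venue_name,)).fetchall()
    return [name for (name,) in rows]

print(categories_for('Barnes & Noble'))  # ['Bookstore', 'Coffee']
print(categories_for('Borders'))         # ['Bookstore']
```

Notice that neither `venues` nor `categories` carries a foreign key here. All of the association lives in `venues_categories`.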

### Connecting to our location data

Now we can connect our venue data to the location by adding a `location_id` column to venue.  

> So here we would say that a venue `has_one` location, and also a location `has_one` venue.  We only have to place the foreign key on one of the tables to make the association.  We place `location_id` on the venues table because we want to tie a venue to a particular location.

<img src="./foursquare-modeling.png" width="60%">
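
As a rough sketch of this one-to-one association, again using `sqlite3`: the `address` and `city` columns and the sample address are placeholders we made up, since the lesson does not list the location columns here. The `UNIQUE` constraint is one possible way to make sure that no two venues share a location.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
-- Hypothetical location columns, for illustration only.
CREATE TABLE locations (id INTEGER PRIMARY KEY, address TEXT, city TEXT);
CREATE TABLE venues (
    id INTEGER PRIMARY KEY,
    name TEXT,
    -- The foreign key lives on venues; UNIQUE keeps the relation one-to-one.
    location_id INTEGER UNIQUE REFERENCES locations(id)
);
INSERT INTO locations (id, address, city)
VALUES (1, '123 Example St', 'New York');
INSERT INTO venues (id, name, location_id)
VALUES (1, 'Barnes & Noble', 1);
""")

# Follow the foreign key from the venue to its location.
venue_with_location = conn.execute("""
    SELECT venues.name, locations.address, locations.city
    FROM venues
    JOIN locations ON locations.id = venues.location_id
""").fetchone()
print(venue_with_location)
```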

If a venue has one location, and a location has one venue, why not just combine the two tables?  Engineers may model this differently, but to me, when we have three or four columns that go together, it justifies grouping those columns in a separate table.  This keeps our tables smaller and easier to understand.

So the above is a good first pass at modeling our data.

### Bonus: Another table?

<img src="./venue-categories.png" width="100%">

Now if you look at the venue data above, can you see any other repetition?  Well, think about what happens if there are multiple Barnes & Noble stores.  We would probably like to reduce this repetition (and perhaps have a way to identify all of the Barnes & Noble venues).

We could fix this by creating a `chains` table.  

And thinking through the relationships, we would say that a chain has many venues and a venue has one chain.

> For those venues that do not have a chain, we can have a `chain_id` that points to a chain with the value n/a (not applicable).

After performing this modeling, our tables look like the following.

<img src="./venues-chains.png" width="60%">

> With the modeling above, we still may have the name under venues repeated multiple times.  But the important part is that we identified this repetition as indicating a missing entity in our data modeling: chain.
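
Here is a minimal sketch of the chains relationship in `sqlite3`. The `chain_id` column name, the `venue_ids_in_chain` helper, and the independent store "Corner Books" are hypothetical, added only to show a venue that points at the n/a chain.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE chains (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE venues (
    id INTEGER PRIMARY KEY,
    name TEXT,
    chain_id INTEGER REFERENCES chains(id)
);
-- Chain 2 is the n/a placeholder for venues without a chain.
INSERT INTO chains (id, name) VALUES (1, 'Barnes & Noble'), (2, 'n/a');
INSERT INTO venues (id, name, chain_id) VALUES
    (1, 'Barnes & Noble', 1),
    (2, 'Barnes & Noble', 1),
    (3, 'Corner Books', 2);  -- hypothetical independent store
""")

def venue_ids_in_chain(chain_name):
    """Return the ids of every venue that belongs to the named chain."""
    rows = conn.execute("""
        SELECT venues.id
        FROM venues
        JOIN chains ON chains.id = venues.chain_id
        WHERE chains.name = ?
        ORDER BY venues.id
    """, (chain_name,)).fetchall()
    return [venue_id for (venue_id,) in rows]

print(venue_ids_in_chain('Barnes & Noble'))  # [1, 2]
```

The venue name still repeats in `venues`, but we can now find every store in a chain by following `chain_id` rather than by matching on the name.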