# Choosing How to Represent Data

_Today we'll revisit how redundant storage of data can improve the efficiency of particular queries. We'll introduce a technique to ensure consistency among the redundant data. We'll then take advantage of our repetitive data to eliminate the need for some storage._

## A Meeting Calendar


The data we'll be working with is a calendar of meetings. Meetings have a title, a date, and a set of invited people. Our users want to add meetings to their calendar with an add_meeting function. Let's keep a dictionary keyed by the meeting dates, so that we can query for the meetings on a particular day.

In [1]:
class Meeting():
    def __init__(self, title, date, invitees):
        self.title = title
        self.date = date
        self.invitees = invitees 
        
        # store the other dates on which this meeting occurs
        self.recurrences = set() 
            
    ########### ignore this method ###########
    def pprint(self, num_people_to_show=5):
        invitees = ""
        personi = 0
        for person in self.invitees:
            if personi <= num_people_to_show:
                invitees = invitees + person + ", "
                personi = personi + 1
            else:
                invitees = invitees[:-2] + "....."
                break
        result = f"{self.title} with {invitees[:-2]}"
        
        # add recurrences to printout
        if len(self.recurrences) != 0 : 
            recurrs = "; recurring on "
            for date in self.recurrences:
                recurrs = recurrs + date + ", "
            recurrs = recurrs[:-2]
            result = result + recurrs
        
        print(result)
        
        
class Calendar():
    def __init__(self):
        self.dates = {}
        self.people = {}
        
        self._checkRep()

    def _isrecurring(self, new_meeting):
        """Returns the stored meeting that has the same title and
        invitees as new_meeting, or None if there is no such meeting.
        """
        
        for meetings in self.dates.values():
            for meeting in meetings:
                if (meeting.title == new_meeting.title
                        and meeting.invitees == new_meeting.invitees):
                    return meeting
        return None
        
    def add_meeting(self, meeting):
        
        recurring_meeting = self._isrecurring(meeting)
            
        if not recurring_meeting: 
            # Implementation before support for recurring meetings:
            if meeting.date not in self.dates:
                self.dates[meeting.date] = [meeting]
            else:
                self.dates[meeting.date].append(meeting)

            for invitee in meeting.invitees:
                if invitee not in self.people:
                    self.people[invitee] = [meeting]
                else:
                    self.people[invitee].append(meeting)
        else:
            # the recurring case
            recurring_meeting.recurrences.add(meeting.date)             
            
        # uncomment this line and _checkRep fails, because this meeting is only added to the people dictionary:
        # self.people.setdefault("Adam", []).append(Meeting("Meeting I forgot about!", "04-04-19", {"Adam"}))

        self._checkRep()

    def get_persons_meetings(self, person):
        if person in self.people:
            return self.people[person]
        else:
            raise LookupError("This person has no meetings.")
            
        # First solution with looping needed, when we didn't store the people dictionary:
        # results = []
        # for meetings in self.dates.values():
        #     for meeting in meetings:
        #         if person in meeting.invitees:
        #             results.append(meeting)
        # return results


    def _checkRep(self):
        """Ensures that the internal representations of the meeting data
        are consistent; checks that all meetings in the people dictionary
        are also in the date dictionary, and vice versa. If not, raises 
        an AssertionError. """
        
        for dated_meetings in self.dates.values():
            for meeting in dated_meetings:
                for person in meeting.invitees:
                    # .get() turns a missing key into a failed assertion rather than a KeyError
                    assert meeting in self.people.get(person, [])
                
        for personed_meetings in self.people.values():
            for meeting in personed_meetings:
                assert meeting in self.dates.get(meeting.date, [])
                

    ############## ignore this method ##############
    def pprint(self):
        for date, meetings in self.dates.items():
            print(date + ": ") 
            for meeting in meetings:
                meeting.pprint()
            print()
        self._checkRep()

## Add a Bunch of Meetings

Let's practice adding meetings to our calendar.

In [2]:
calendar = Calendar()

In [3]:
meeting_to_add = Meeting("Lecture 9", "04-01-19", {"Srini"})
calendar.add_meeting(meeting_to_add)

In [4]:
calendar.pprint()

04-01-19: 
Lecture 9 with Srini



Let's add a bunch more. (Don't get caught up in these details--lots of code just to add fun example meetings...)

In [5]:
instructors = {"Adam", "Srini"}
tas = {"Maryam", "Aron", "Jeff", "Zack", "Zach", "Oscar", "Elijah", "Kentaro", "Kevin", 
            "Yanni", "Heather", "Cavin", "Joseph", "Rami", "Valerie", "Subby", "Jisoo"}

tutorial_team = {"Valerie", "Kevin"}
mon_staff = {"Yanni", "Cavin", "Kentaro", "Subby"}
tues_staff = {"Subby", "Jisoo", "Valerie", "Kevin"}

In [6]:
calendar.add_meeting(Meeting("Tutorial Prep", "04-01-19", tutorial_team | instructors))
calendar.add_meeting(Meeting("Tutorial", "04-03-19", tutorial_team))

calendar.add_meeting(Meeting("Quiz Planning", "04-05-19", instructors))
calendar.add_meeting(Meeting("Staff Meeting", "04-08-19", tas | instructors))

calendar.add_meeting(Meeting("Monday OHs", "04-08-19", mon_staff))
calendar.add_meeting(Meeting("Tuesday OHs", "04-09-19", tues_staff))
calendar.add_meeting(Meeting("Monday OHs", "04-15-19", mon_staff))
calendar.add_meeting(Meeting("Tuesday OHs", "04-16-19", tues_staff))
calendar.add_meeting(Meeting("Monday OHs", "04-22-19", mon_staff))
calendar.add_meeting(Meeting("Tuesday OHs", "04-23-19", tues_staff))
calendar.add_meeting(Meeting("Monday OHs", "04-29-19", mon_staff))
calendar.add_meeting(Meeting("Tuesday OHs", "04-30-19", tues_staff))


In [11]:
calendar.pprint()

04-01-19: 
Lecture 9 with Srini
Tutorial Prep with Kevin, Adam, Valerie, Srini

04-03-19: 
Tutorial with Kevin, Valerie

04-05-19: 
Quiz Planning with Adam, Srini

04-08-19: 
Staff Meeting with Oscar, Maryam, Heather, Zack, Kentaro, Rami...
Monday OHs with Subby, Yanni, Kentaro, Cavin; recurring on 04-15-19, 04-29-19, 04-22-19

04-09-19: 
Tuesday OHs with Kevin, Subby, Valerie, Jisoo; recurring on 04-30-19, 04-16-19, 04-23-19



## Adam Wants to Know His Schedule

In [9]:
adams_meetings = calendar.get_persons_meetings("Adam")

for mtg in adams_meetings:
    mtg.pprint()

Tutorial Prep with Kevin, Adam, Valerie, Srini
Quiz Planning with Adam, Srini
Staff Meeting with Oscar, Maryam, Heather, Zack, Kentaro, Rami...


Check out the original get_persons_meetings solution (commented out at the bottom of the method). There's a lot of looping! Every query visits every meeting on every date, so on large calendars in which any individual is invited to only a small subset of meetings, most of that work is wasted. Let's trade memory use for greater performance on these queries by also storing another dictionary, keyed by people.

Now check out the new get_persons_meetings, which uses the people dictionary: it is a single dictionary lookup, which takes constant time on average no matter how large the calendar is. You might ask whether we just moved the inefficiency to the add_meeting function, though. Not necessarily: add_meeting runs just once for each meeting, but we may query for a particular person's meetings many times! This echoes the value of preprocessing, which we've seen throughout the class.

In [10]:
adams_meetings = calendar.get_persons_meetings("Adam")

for mtg in adams_meetings:
    mtg.pprint()

Tutorial Prep with Kevin, Adam, Valerie, Srini
Quiz Planning with Adam, Srini
Staff Meeting with Oscar, Maryam, Heather, Zack, Kentaro, Rami...


Now we can tell Adam his meetings much more quickly!
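
To see the trade in isolation, here is a minimal sketch that compares the two query strategies on a toy dataset. The tuples, the by_person index, and both function names are made up for illustration; they only stand in for the Meeting objects and the people dictionary used above.

```python
from collections import defaultdict

# Hypothetical toy data: (title, date, invitees) tuples stand in for Meeting objects.
meetings = [
    ("Tutorial Prep", "04-01-19", {"Adam", "Srini", "Valerie", "Kevin"}),
    ("Tutorial", "04-03-19", {"Valerie", "Kevin"}),
    ("Quiz Planning", "04-05-19", {"Adam", "Srini"}),
]


def meetings_by_scan(person):
    """Visit every meeting on every query: work grows with the calendar size."""
    return [title for title, _date, invitees in meetings if person in invitees]


# Preprocessing: pay once per meeting to build an index keyed by person.
by_person = defaultdict(list)
for title, _date, invitees in meetings:
    for invitee in invitees:
        by_person[invitee].append(title)


def meetings_by_index(person):
    """One dictionary lookup per query, however large the calendar is."""
    return by_person.get(person, [])


# Both strategies agree; only the cost per query differs.
assert meetings_by_scan("Adam") == meetings_by_index("Adam")
print(meetings_by_index("Adam"))
```

The index costs extra memory (each meeting is referenced once per invitee), which is exactly the price we agreed to pay for faster queries.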

## Ensuring Consistency

We now have two different representations of our meetings. How can we make sure that they remain internally consistent? Let's write a method to check the representation, _checkRep.

Note the use of the underscore at the beginning of the method name. This is a Python convention for methods which aren't meant to be called from outside of the class. We don't expect clients making Calendars to need to check their internal representation, so we keep this method private to the implementers.
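
As a small illustration of the convention, consider the sketch below. Tally is a hypothetical class invented for this example and is not part of the calendar code; it just shows a public method next to an underscore-prefixed helper.

```python
class Tally:
    """Hypothetical example class; not part of the calendar code."""

    def __init__(self):
        self._count = 0  # leading underscore: internal state
        self._check_rep()

    def increment(self):
        """Public method: part of the interface clients rely on."""
        self._count += 1
        self._check_rep()
        return self._count

    def _check_rep(self):
        """Internal helper: clients are not expected to call this."""
        assert self._count >= 0


tally = Tally()
print(tally.increment())

# Python does not enforce the convention. This call still works, but it
# signals that the caller is reaching into implementation details.
tally._check_rep()
```

The underscore is a signal to readers, not an access control mechanism, so it is up to clients to respect it.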

Try breaking the consistency between the two internal data structures yourself, and see that _checkRep raises an error. (Try uncommenting the line in add_meeting, for example.)
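
If you'd like to try this outside the notebook, here is a self-contained sketch of the same experiment. MiniCalendar is a hypothetical, stripped-down stand-in for the Calendar class above (meetings are plain tuples and there is no recurrence support), so treat it as an illustration rather than the lecture's implementation.

```python
class MiniCalendar:
    """Hypothetical, stripped-down stand-in for the Calendar class."""

    def __init__(self):
        self.dates = {}
        self.people = {}

    def add(self, title, date, invitees):
        meeting = (title, date, frozenset(invitees))
        self.dates.setdefault(date, []).append(meeting)
        for person in invitees:
            self.people.setdefault(person, []).append(meeting)
        self._check_rep()

    def _check_rep(self):
        """Assert that the two dictionaries describe the same meetings."""
        for meetings in self.dates.values():
            for meeting in meetings:
                for person in meeting[2]:
                    assert meeting in self.people.get(person, []), "missing from people"
        for meetings in self.people.values():
            for meeting in meetings:
                assert meeting in self.dates.get(meeting[1], []), "missing from dates"


cal = MiniCalendar()
cal.add("Quiz Planning", "04-05-19", {"Adam", "Srini"})

# Break the invariant by hand: this meeting only reaches the people dictionary.
forgotten = ("Meeting I forgot about!", "04-04-19", frozenset({"Adam"}))
cal.people["Adam"].append(forgotten)

try:
    cal._check_rep()
    caught = False
except AssertionError as err:
    caught = True
    print("Inconsistent representation:", err)
```

The check fails on the first meeting it finds that one dictionary knows about and the other doesn't, which is the fail-fast behavior we want.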

_Our implementation of checkRep, of course, is quite inefficient: it walks through every meeting in both dictionaries each time it runs. Perhaps we'd nix it after testing our Calendar, in favor of efficiency._
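
One possible way to "nix" the check without deleting it is to guard it with a switch. The sketch below assumes a class-level CHECK_REP flag and a CheckedStore class, both invented for this example; the checks_run counter exists only so we can observe whether the check ran.

```python
class CheckedStore:
    """Hypothetical class with two redundant representations of its data."""

    CHECK_REP = True  # hypothetical switch: turn off once testing is done

    def __init__(self):
        self.items = []      # every (key, value) pair, in insertion order
        self.index = {}      # redundant lookup table keyed by key
        self.checks_run = 0  # only here so the example can observe the check

    def add(self, key, value):
        self.items.append((key, value))
        self.index[key] = value
        if self.CHECK_REP:
            self._check_rep()

    def _check_rep(self):
        """Linear-time check: every stored key must appear in the index."""
        self.checks_run += 1
        for key, _value in self.items:
            assert key in self.index


store = CheckedStore()
store.add("a", 1)               # check runs
CheckedStore.CHECK_REP = False  # flip the switch
store.add("b", 2)               # check skipped
print(store.checks_run)
```

Separately, Python removes assert statements altogether when it is run with the -O flag, so assertion-based checks can also be disabled at launch without touching the code.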

## Reducing Redundancy

So far, we decided to introduce lots of redundancy by storing meetings twice: once keyed by the dates, and once keyed by the invitee. We ensured that these two representations were consistent using the _checkRep method. Now, let's try to win back a bit of storage space by not storing redundant recurring meeting information.

We could just add a method like add_recurring_meeting, but we already have Calendar users depending on the add_meeting function which we gave them in our specifications! So, within add_meeting, let's detect whether the meeting is a repeat.

Check out the new add_meeting function. In return for some implementation complexity, we now just store a set of the recurring dates of a meeting rather than the entire repeated Meeting objects!
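
The storage saving can be sketched in a few lines. Event and add_event below are hypothetical names for a simplified version of the idea (a plain list instead of the two dictionaries), so this is an illustration of the technique, not the Calendar code itself.

```python
class Event:
    """Hypothetical simplified meeting with a set of extra dates."""

    def __init__(self, title, date, invitees):
        self.title = title
        self.date = date
        self.invitees = invitees
        self.recurrences = set()  # other dates on which this event occurs


def add_event(events, new_event):
    """Store new_event, or fold it into an existing event with the
    same title and invitees by recording only its date."""
    for event in events:
        if event.title == new_event.title and event.invitees == new_event.invitees:
            if new_event.date != event.date:
                event.recurrences.add(new_event.date)
            return event
    events.append(new_event)
    return new_event


events = []
staff = {"Yanni", "Cavin", "Kentaro", "Subby"}
for date in ["04-08-19", "04-15-19", "04-22-19", "04-29-19"]:
    add_event(events, Event("Monday OHs", date, staff))

print(len(events))                    # one stored object for four occurrences
print(sorted(events[0].recurrences))  # the three later dates
```

Four occurrences cost one object plus three short date strings. One thing to keep in mind with this design is that the later dates live only inside the recurrences set, so a lookup keyed by one of those dates would need to consult that set as well.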

## The Takeaways!

- Redundant storage can sometimes improve performance
- Redundant storage introduces the risk of internal inconsistency and weird behavior
    - One solution is to use a checkRep function to fail fast whenever inconsistencies arise
- Sometimes data has inherent redundancies which can be exploited to reduce storage space
    - Here, we saw this was a tradeoff with implementation complexity
    
You'll see these ideas in lab 6 and beyond!