# Functions and relations in Clear Data

First, we import pandas and Clear Data.

In [3]:
# Because this is the development repo, we import in this ugly way, but
# if you've done pip install clear-data, these lines are not needed.
import os
import sys
sys.path.append( os.getcwd()+"/../src" )

# In your own code, do just this:
import pandas as pd
import clear_data

Next, we create a tiny example DataFrame to use in the examples below.
Imagine we have a large database of businesses in a city; this is a small
sample of what that big database might look like.

In [4]:
df = pd.DataFrame( {
    "Name"       : [ "Annie's Pretzels", "Nathan's Hot Dogs", "Joe's Garage" ],
    "Address"    : [    "1 Example St.",   "2 Imaginary Dr.",    "3 Foo Ln." ],
    "#Employees" : [                 12,                   3,              5 ]
} )
df

|   | Name | Address | #Employees |
|---|---|---|---|
| 0 | Annie's Pretzels | 1 Example St. | 12 |
| 1 | Nathan's Hot Dogs | 2 Imaginary Dr. | 3 |
| 2 | Joe's Garage | 3 Foo Ln. | 5 |


## Are all the business names unique?

Old way:

In [5]:
len( df.Name.unique() ) == len( df.Name )

True

Clear Data way:

In [6]:
df.Name.has_no_duplicates()

True
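For comparison, pandas itself answers this question with the `Series.is_unique` property, or by checking that `Series.duplicated()` flags nothing. The snippet below uses only those standard pandas calls; it is not a claim about how Clear Data implements `has_no_duplicates`.

```python
import pandas as pd

names = pd.Series(["Annie's Pretzels", "Nathan's Hot Dogs", "Joe's Garage"])

# pandas' built-in property: True when no value occurs more than once.
print(names.is_unique)  # True

# The same check spelled out: no entry is flagged as repeating an earlier one.
print(not names.duplicated().any())  # True

# Appending a second "Joe's Garage" makes the column fail the check.
with_repeat = pd.concat([names, pd.Series(["Joe's Garage"])], ignore_index=True)
print(with_repeat.is_unique)  # False
```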

## Can I use the business name column to look up the number of employees?

(In other words, is the table a function from business name to number of employees, or might there be multiple copies of the same business, perhaps at different addresses, with different numbers of employees?)

Old way, which few people bother with because of the hassle:

In [7]:
all( df.groupby( "Name" )["#Employees"].agg( lambda x: len(x.unique()) ) == 1 )

True

Clear Data way, much easier:

In [8]:
df.is_a_function( "Name", "#Employees" )

True
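The idea behind this check can be sketched in plain pandas: a table is a function from column A to column B when every distinct value of A appears with exactly one distinct value of B. The helper `is_function_of` below is hypothetical (it is not part of Clear Data or pandas) and simply repackages the groupby from the old way using `nunique()`; Clear Data's own implementation may differ.

```python
import pandas as pd

def is_function_of(df: pd.DataFrame, a: str, b: str) -> bool:
    """Hypothetical helper: True if each value in column `a` pairs with exactly one value in column `b`."""
    # For each distinct value of `a`, count the distinct `b` values it appears with.
    # dropna=False counts a missing `b` as a value, matching len(x.unique()) in the old way.
    counts = df.groupby(a)[b].nunique(dropna=False)
    return bool((counts == 1).all())

businesses = pd.DataFrame({
    "Name": ["Annie's Pretzels", "Nathan's Hot Dogs", "Joe's Garage"],
    "#Employees": [12, 3, 5],
})
print(is_function_of(businesses, "Name", "#Employees"))  # True

# A repeated name is fine as long as it always carries the same head count...
repeated = pd.DataFrame({"Name": ["Joe's Garage", "Joe's Garage"], "#Employees": [5, 5]})
print(is_function_of(repeated, "Name", "#Employees"))  # True

# ...but two branches with different head counts break the function property.
branches = pd.DataFrame({"Name": ["Joe's Garage", "Joe's Garage"], "#Employees": [5, 7]})
print(is_function_of(branches, "Name", "#Employees"))  # False
```

Note the difference from `has_no_duplicates`: a name may repeat, as long as every copy carries the same number of employees.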

## Since I can look up the number of employees for any business by name, how do I do so?

Old way:

(Most people would simply filter for the rows they want and read off the answer by hand, but that approach doesn't work in programmatic code, such as a function passed to `.apply()`.)

In [9]:
get_num_emp = lambda x: df[df.Name == x]["#Employees"].item()
get_num_emp( "Joe's Garage" )

5

Clear Data way:

In [10]:
get_num_emp = df.to_function( "Name", "#Employees" )
get_num_emp( "Joe's Garage" )

5
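One way such a lookup function could be built, assuming we want it to refuse tables that are not functions, is to check the function property and then close over a plain dictionary. The `make_lookup` helper below is hypothetical; in particular, the source does not say what `to_function` does with a name that is not in the table, so this sketch just lets the dictionary raise `KeyError`.

```python
from typing import Any, Callable

import pandas as pd

def make_lookup(df: pd.DataFrame, a: str, b: str) -> Callable[[Any], Any]:
    """Hypothetical helper: return a function that maps a value of column `a` to column `b`."""
    # Refuse to build a lookup if some key is paired with more than one value.
    if not (df.groupby(a)[b].nunique(dropna=False) == 1).all():
        raise ValueError(f"column {a!r} does not determine column {b!r}")
    # Build the mapping once, so each call is a dict lookup instead of a row filter.
    table = dict(zip(df[a], df[b]))
    # An unknown key raises KeyError from the dict.
    return lambda key: table[key]

businesses = pd.DataFrame({
    "Name": ["Annie's Pretzels", "Nathan's Hot Dogs", "Joe's Garage"],
    "#Employees": [12, 3, 5],
})
get_num_emp = make_lookup(businesses, "Name", "#Employees")
print(get_num_emp("Joe's Garage"))  # 5

# Because it is an ordinary function, it also works inside .apply().
print(pd.Series(["Joe's Garage", "Annie's Pretzels"]).apply(get_num_emp).tolist())  # [5, 12]
```

Building the dictionary once means each call is a hash lookup rather than a scan of the whole DataFrame, which matters when the function is applied to many values.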

## Can I represent that lookup operation as a dictionary?

Old way, short but arcane:

In [11]:
dict( zip( df["Name"], df["#Employees"] ) )

{"Annie's Pretzels": 12, "Nathan's Hot Dogs": 3, "Joe's Garage": 5}

Clear Data way:

In [12]:
df.to_dictionary( "Name", "#Employees" )

{"Annie's Pretzels": 12, "Nathan's Hot Dogs": 3, "Joe's Garage": 5}
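If the `dict( zip( ... ) )` idiom feels arcane, pandas also offers `set_index(...)` followed by `Series.to_dict()`. This is a plain-pandas alternative, not necessarily what `to_dictionary` does internally. It also shows why checking the function property first is worthwhile: both plain-pandas versions silently keep the last value when a key repeats.

```python
import pandas as pd

businesses = pd.DataFrame({
    "Name": ["Annie's Pretzels", "Nathan's Hot Dogs", "Joe's Garage"],
    "#Employees": [12, 3, 5],
})

# Make Name the index, pick the employee column, and turn that Series into a dict.
lookup = businesses.set_index("Name")["#Employees"].to_dict()
print(lookup == {"Annie's Pretzels": 12, "Nathan's Hot Dogs": 3, "Joe's Garage": 5})  # True

# Caution: with a repeated key, the conversion keeps only the last value (7 here), without warning.
branches = pd.DataFrame({"Name": ["Joe's Garage", "Joe's Garage"], "#Employees": [5, 7]})
print(branches.set_index("Name")["#Employees"].to_dict())
```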

It can also be useful to think of data tables (or some of their columns) as
relations/predicates.  For a clearer example of this, instead of the businesses
table above, consider a table of all the plays in some NFL games.  Here's a tiny
sample of what it might look like.

In [13]:
df = pd.DataFrame( {
    "Offense"    : [        "Denver",       "New England", "San Francisco" ],
    "Defense"    : [   "New England",            "Denver",     "Cleveland" ],
    "Date"       : [        "Nov 12",            "Nov 12",        "Nov 19" ],
    "Time"       : [          "8:03",              "8:27",          "7:30" ],
    "Play"       : [ "Up the middle", "Quarterback sneak",       "Reverse" ]
} )
df

|   | Offense | Defense | Date | Time | Play |
|---|---|---|---|---|---|
| 0 | Denver | New England | Nov 12 | 8:03 | Up the middle |
| 1 | New England | Denver | Nov 12 | 8:27 | Quarterback sneak |
| 2 | San Francisco | Cleveland | Nov 19 | 7:30 | Reverse |


## I want a function that tests whether a team has ever run a certain play.

Old way:

In [14]:
def team_ran_play ( team, play ):
    return len( df[(df["Offense"] == team) & (df["Play"] == play)] ) > 0
team_ran_play( "Cleveland", "Up the middle" )

False

Clear Data way:

In [15]:
team_ran_play = df.to_predicate( "Offense", "Play" )
team_ran_play( "Cleveland", "Up the middle" )

False
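Viewed as a relation, the table is just a set of tuples, and the predicate asks whether a given tuple is in that set. The `make_predicate` helper below is a hypothetical sketch of that idea (not Clear Data's code): it collects the chosen columns of each row into a Python set once and then tests membership.

```python
from typing import Any, Callable

import pandas as pd

def make_predicate(df: pd.DataFrame, *columns: str) -> Callable[..., bool]:
    """Hypothetical helper: return a predicate that is True when its arguments appear together in some row."""
    # Each row's values in the chosen columns become one plain tuple;
    # repeated combinations collapse into a single set entry.
    rows = set(df[list(columns)].itertuples(index=False, name=None))

    def predicate(*values: Any) -> bool:
        return tuple(values) in rows

    return predicate

plays = pd.DataFrame({
    "Offense": ["Denver", "New England", "San Francisco"],
    "Defense": ["New England", "Denver", "Cleveland"],
    "Play": ["Up the middle", "Quarterback sneak", "Reverse"],
})
team_ran_play = make_predicate(plays, "Offense", "Play")
print(team_ran_play("Cleveland", "Up the middle"))  # False
print(team_ran_play("Denver", "Up the middle"))  # True
```

The trade-off is that the set is a snapshot: unlike the old way, which re-filters `df` on every call, this predicate will not notice later changes to the DataFrame.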