# More on Intentions and Column Expressions

In [3]:
import pandas as pd
from dfply import *
import matplotlib.pyplot as plt
%matplotlib inline

In [4]:
artist_url = "https://github.com/MuseumofModernArt/collection/raw/master/Artists.csv"
artists = pd.read_csv(artist_url)
artists.head()

| | ConstituentID | DisplayName | ArtistBio | Nationality | Gender | BeginDate | EndDate | Wiki QID | ULAN |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Robert Arneson | American, 1930–1992 | American | Male | 1930 | 1992 | | |
| 1 | 2 | Doroteo Arnaiz | Spanish, born 1936 | Spanish | Male | 1936 | 0 | | |
| 2 | 3 | Bill Arnold | American, born 1941 | American | Male | 1941 | 0 | | |
| 3 | 4 | Charles Arnoldi | American, born 1946 | American | Male | 1946 | 0 | Q1063584 | 500027998.0 |
| 4 | 5 | Per Arnoldi | Danish, born 1941 | Danish | Male | 1941 | 0 | | |


In [5]:
artwork_url = "https://github.com/MuseumofModernArt/collection/raw/master/Artworks.csv"
artwork = pd.read_csv(artwork_url) # Big file, be patient
artwork.head()

| | Title | Artist | ConstituentID | ArtistBio | Nationality | BeginDate | EndDate | Gender | Date | Medium | ... | ThumbnailURL | Circumference (cm) | Depth (cm) | Diameter (cm) | Height (cm) | Length (cm) | Weight (kg) | Width (cm) | Seat Height (cm) | Duration (sec.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ferdinandsbrücke Project, Vienna, Austria (Ele... | Otto Wagner | 6210 | (Austrian, 1841–1918) | (Austrian) | (1841) | (1918) | (Male) | 1896 | Ink and cut-and-pasted painted pages on paper | ... | http://www.moma.org/media/W1siZiIsIjU5NDA1Il0s... | | | | 48.6 | | | 168.9 | | |
| 1 | City of Music, National Superior Conservatory ... | Christian de Portzamparc | 7470 | (French, born 1944) | (French) | (1944) | (0) | (Male) | 1987 | Paint and colored pencil on print | ... | http://www.moma.org/media/W1siZiIsIjk3Il0sWyJw... | | | | 40.6401 | | | 29.8451 | | |
| 2 | Villa near Vienna Project, Outside Vienna, Aus... | Emil Hoppe | 7605 | (Austrian, 1876–1957) | (Austrian) | (1876) | (1957) | (Male) | 1903 | Graphite, pen, color pencil, ink, and gouache ... | ... | http://www.moma.org/media/W1siZiIsIjk4Il0sWyJw... | | | | 34.3 | | | 31.8 | | |
| 3 | The Manhattan Transcripts Project, New York, N... | Bernard Tschumi | 7056 | (French and Swiss, born Switzerland 1944) | () | (1944) | (0) | (Male) | 1980 | Photographic reproduction with colored synthet... | ... | http://www.moma.org/media/W1siZiIsIjEyNCJdLFsi... | | | | 50.8 | | | 50.8 | | |
| 4 | Villa, project, outside Vienna, Austria, Exter... | Emil Hoppe | 7605 | (Austrian, 1876–1957) | (Austrian) | (1876) | (1957) | (Male) | 1903 | Graphite, color pencil, ink, and gouache on tr... | ... | http://www.moma.org/media/W1siZiIsIjEyNiJdLFsi... | | | | 38.4 | | | 19.1 | | |
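Because `Artworks.csv` is large, one option (not used in this notebook) is to parse only the columns you need with the `usecols` argument of `pd.read_csv`. The whole file is still downloaded, but fewer columns are parsed and kept in memory. The sketch below runs on a tiny made-up CSV held in memory so it works offline; the column choice is only an example.

```python
import io
import pandas as pd

# Tiny made-up stand-in for Artworks.csv (illustration only)
csv_text = """Title,Artist,Medium,Height (cm),Width (cm)
Piece A,Artist 1,Ink on paper,48.6,168.9
Piece B,Artist 2,Paint on print,34.3,31.8
"""

# Parse only the columns this notebook works with; the others are skipped
cols = ["Title", "Artist", "Height (cm)"]
small = pd.read_csv(io.StringIO(csv_text), usecols=cols)

# For the real file this would be, e.g.:
# artwork = pd.read_csv(artwork_url, usecols=cols)
print(small)
```

Note that `usecols` keeps the columns in the order they appear in the file, not the order of the list.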


## `X` is an `Intention`

<img src="img/dfply_X_intention_1.png" width = 800>

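Think of `X` as a placeholder that records an expression for later evaluation: `X.BeginDate.head()` stores the steps "get `BeginDate`, then call `head()`" without running them.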
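To make "recording an expression" concrete, here is a deliberately simplified, hypothetical `MiniIntention` class (an illustration, not dfply's actual source). Each attribute lookup (`__getattr__`) or call (`__call__`) wraps the stored function in a new one, so nothing touches the data until `evaluate` is called with a concrete object.

```python
import pandas as pd

class MiniIntention:
    """Hypothetical, simplified stand-in for dfply's Intention (illustration only)."""

    def __init__(self, function=lambda x: x):
        # The recorded expression: a function from "the data" to a result
        self.function = function

    def evaluate(self, context):
        # Run the recorded steps against a concrete object
        return self.function(context)

    def __getattr__(self, name):
        # Record an attribute lookup such as .BeginDate or .head
        return MiniIntention(lambda x: getattr(self.function(x), name))

    def __call__(self, *args, **kwargs):
        # Record a call such as .head(2)
        return MiniIntention(lambda x: self.function(x)(*args, **kwargs))


X_demo = MiniIntention()
expr_df = X_demo.BeginDate.head(2)          # recorded, not run
expr_str = X_demo.upper().replace("A", "@")  # also recorded, not run

df_demo = pd.DataFrame({"BeginDate": [1930, 1936, 1941]})
print(expr_df.evaluate(df_demo).tolist())   # [1930, 1936]
print(expr_str.evaluate("banana"))          # B@N@N@
```

The same recorded chain can be evaluated against any object that has the attributes it asks for, which is why one expression can be reused on several data frames.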

In [6]:
expr = X.BeginDate.head()
expr

<dfply.base.Intention at 0x7f1468f55910>

## Use `evaluate` to apply the expression

We can apply an expression *later* using `evaluate` with a dataframe.

In [7]:
expr.evaluate(artists)

0    1930
1    1936
2    1941
3    1946
4    1941
Name: BeginDate, dtype: int64

## Intention expressions are reusable!

In [8]:
# Reusable!
expr.evaluate(artwork)

0    (1841)
1    (1944)
2    (1876)
3    (1944)
4    (1876)
Name: BeginDate, dtype: object
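Reusing an expression only makes sense when the column means the same thing in both frames. In `artwork`, `BeginDate` holds strings such as `'(1841)'` (dtype `object`), so a numeric comparison that works on `artists` would not work there as-is. A hedged sketch of one way to clean such strings with plain pandas: the first three values mirror the output above, while the last one is a hypothetical multi-value entry included to show that `errors="coerce"` turns anything that does not parse as a single number into `NaN` (treating such entries as missing is an assumption you may want to revisit).

```python
import pandas as pd

# Strings in the style of artwork.BeginDate shown above; the last is hypothetical
begin = pd.Series(["(1841)", "(1944)", "(1876)", "(1944) (1950)"], name="BeginDate")

# Strip the surrounding parentheses, then convert; unparseable text becomes NaN
begin_num = pd.to_numeric(begin.str.strip("()"), errors="coerce")

print(begin_num.tolist())  # [1841.0, 1944.0, 1876.0, nan]
```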

## <font color="red"> Exercise 2.3.1 </font>
    
**Tasks:**

1. Create a column expression, saved as `my_expr`, that checks whether the values in the `Height (cm)` column are larger than 40. **Hint:** Because the column name contains a space and parentheses, you must use the `X['col name']` form rather than `X.col_name`.
2. Evaluate that column expression on the `artists` and `artwork` data frames, that is, run `my_expr.evaluate(df)` for each one.
3. Use the expression object in a filter on the `artwork` data set, e.g. `artwork >> filter_by(my_expr)`.
4. Now try to perform the filter with a pipe and no `X` expression, e.g. `artwork >> filter_by(artwork['Height (cm)'] > 40)`. Why does this still work? When might we run into trouble?
5. Write a paragraph that summarizes how this all works.

In [9]:
# Code for Task 1

heightTest = X['Height (cm)']
my_expr = heightTest > 40

In [10]:
# Code for Task 2
my_expr.evaluate(artwork)

0          True
1          True
2         False
3          True
4         False
          ...  
139932    False
139933    False
139934    False
139935    False
139936    False
Name: Height (cm), Length: 139937, dtype: bool
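Task 2 also asks for `my_expr.evaluate(artists)`, which is not shown above. `artists` has no `Height (cm)` column, so we would expect that call to raise a `KeyError`: a lazy expression is only checked against a frame at the moment it is evaluated. The plain-pandas sketch below mimics `my_expr` with an ordinary function (`height_over_40` is a hypothetical name) and two tiny made-up frames.

```python
import pandas as pd

def height_over_40(df):
    # Plain-function analogue of X['Height (cm)'] > 40
    return df["Height (cm)"] > 40

art_demo = pd.DataFrame({"Height (cm)": [48.6, 34.3]})
artists_demo = pd.DataFrame({"DisplayName": ["Robert Arneson", "Bill Arnold"]})

print(height_over_40(art_demo).tolist())  # [True, False]

raised = False
try:
    height_over_40(artists_demo)
except KeyError as err:
    # The frame has no such column, so the lookup fails only now, at evaluation time
    raised = True
    print("KeyError:", err)
```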

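In [11]:
# Code for Task 3 (Task 3 itself is `artwork >> filter_by(my_expr)`)
# This cell uses the Task 4 form without `artwork >>` in front, so filter_by
# only builds a dfply pipe object (output below); no rows are filtered
# until a data frame is piped in.
filter_by(artwork['Height (cm)'] > 40)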

<dfply.base.pipe at 0x7f1468902a60>
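The output above is a pipe object, not a filtered data frame: `filter_by(...)` on its own only builds a step that waits for data. A hypothetical, stripped-down `MiniPipe` (a sketch of the mechanism, not dfply's implementation) shows how `df >> step` can work: a pandas DataFrame does not implement `>>` itself, so Python falls back to the right operand's reflected method `__rrshift__`, which then runs the stored function on the frame.

```python
import pandas as pd

class MiniPipe:
    """Hypothetical pipe: stores a function and applies it when data arrives via >>."""

    def __init__(self, function):
        self.function = function

    def __rrshift__(self, df):
        # Called for `df >> pipe` because the left operand has no >> of its own
        return self.function(df)

def mini_filter_by(mask_fn):
    # Hypothetical helper: a pipe that keeps the rows where mask_fn(frame) is True
    return MiniPipe(lambda df: df[mask_fn(df)])

art_demo = pd.DataFrame({"Title": ["A", "B", "C"],
                         "Height (cm)": [48.6, 34.3, 50.8]})

step = mini_filter_by(lambda df: df["Height (cm)"] > 40)
print(type(step).__name__)                   # MiniPipe: nothing filtered yet
print((art_demo >> step)["Title"].tolist())  # ['A', 'C']
```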

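This still works because `artwork['Height (cm)'] > 40` is evaluated immediately: it produces a boolean Series with one entry per row of `artwork`, and `filter_by` uses it as a row mask. The catch is that the mask is tied to `artwork` exactly as it was when the Series was built. It cannot be reused on another data frame, and if an earlier step in the pipe has already filtered or reordered the rows, it no longer lines up with the data reaching `filter_by`, so we can get an error or, worse, silently select the wrong rows. An `X` expression avoids this because it is evaluated against whichever data frame arrives at that step.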
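To see when the eager `artwork['Height (cm)'] > 40` form can go wrong, the sketch below uses plain pandas and a tiny made-up frame. A mask built from the original rows is applied after an earlier step has sorted the frame and renumbered its index (standing in for any prior pipeline step). The index labels still match, so pandas raises no error, but the mask now points at the wrong rows. A function of the current frame, which is essentially what an `X` expression is, picks the right ones.

```python
import pandas as pd

# Made-up heights for illustration
art_demo = pd.DataFrame({"Title": ["A", "B", "C", "D"],
                         "Height (cm)": [48.6, 40.6, 34.3, 50.8]})

# Eager: computed once, from the ORIGINAL row order
eager_mask = art_demo["Height (cm)"] > 40

# An earlier pipeline step that reorders rows and renumbers the index 0..n-1
sorted_demo = art_demo.sort_values("Height (cm)").reset_index(drop=True)

# Lazy: a function of whatever frame reaches this step
def lazy_mask(df):
    return df["Height (cm)"] > 40

wrong = sorted_demo[eager_mask]              # labels line up, rows do not
right = sorted_demo[lazy_mask(sorted_demo)]

print(wrong["Title"].tolist())  # ['C', 'B', 'D']  (C is only 34.3 cm)
print(right["Title"].tolist())  # ['B', 'A', 'D']
```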

## Not just for data frames ... works for any object that supports the recorded operations

In [12]:
double, line = 2*X, 3*X + 5

In [15]:
double.evaluate(3), line.evaluate(6)

(6, 23)
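Arithmetic such as `2*X` and `3*X + 5` can be recorded the same way through Python's operator hooks. The hypothetical `MiniArithIntention` below is a simplified sketch, not dfply's code, and handles only `*` and `+`. The reflected methods (`__rmul__`, `__radd__`) are what make `2 * X_demo` work, because `int` does not know how to multiply itself by an unknown object. The same recorded expression then works for any value that supports those operators.

```python
import operator

class MiniArithIntention:
    """Hypothetical sketch: records * and + as a chain of functions (illustration only)."""

    def __init__(self, function=lambda x: x):
        self.function = function

    def evaluate(self, value):
        return self.function(value)

    def _record(self, op, other, reflected=False):
        # Build a new intention that applies `op` after the steps recorded so far
        if reflected:
            return MiniArithIntention(lambda x: op(other, self.function(x)))
        return MiniArithIntention(lambda x: op(self.function(x), other))

    def __mul__(self, other):
        return self._record(operator.mul, other)

    def __rmul__(self, other):   # handles 2 * X_demo
        return self._record(operator.mul, other, reflected=True)

    def __add__(self, other):
        return self._record(operator.add, other)

    def __radd__(self, other):   # handles 5 + X_demo
        return self._record(operator.add, other, reflected=True)


X_demo = MiniArithIntention()
double_demo, line_demo = 2 * X_demo, 3 * X_demo + 5

print(double_demo.evaluate(3), line_demo.evaluate(6))  # 6 23
print(line_demo.evaluate(1.5))                         # 9.5
```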

## We can make functions lazy too!

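Wrap (decorate) a function with `make_symbolic` so that calling it with an `Intention` such as `X` returns a new `Intention` to be evaluated later, while calling it with ordinary values still computes the result immediately.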
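A simplified, hypothetical version of the idea behind `make_symbolic` (dfply's real decorator is more general; `Deferred`, `make_symbolic_sketch`, and `log_sketch` are illustrative names): if any argument is a deferred expression, return a new deferred expression that evaluates the arguments first; otherwise call the function right away.

```python
import functools
import math

class Deferred:
    """Hypothetical minimal lazy expression: wraps a function of one input."""

    def __init__(self, function=lambda x: x):
        self.function = function

    def evaluate(self, value):
        return self.function(value)

def make_symbolic_sketch(func):
    @functools.wraps(func)
    def wrapper(*args):
        if not any(isinstance(a, Deferred) for a in args):
            return func(*args)  # eager path: plain values in, plain value out

        def run(value):
            # Evaluate each deferred argument against the same input, then call func
            real = [a.evaluate(value) if isinstance(a, Deferred) else a for a in args]
            return func(*real)

        return Deferred(run)  # lazy path: defer the call until evaluate()
    return wrapper

log_sketch = make_symbolic_sketch(math.log)
X_demo = Deferred()

print(log_sketch(8, 2))          # computed immediately
expr_log = log_sketch(X_demo, 2) # a Deferred; nothing computed yet
print(expr_log.evaluate(16))     # computed now
```

Keeping the eager path means the wrapped function still behaves like the original when no placeholder is involved, which matches the `log(8, 2)` call below.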

In [16]:
from math import log
log = make_symbolic(log)

In [17]:
log(8, 2) # Works as expected with numbers

3.0

## Passing in `X` now makes a `log` expression

In [18]:
expr = log(X, 2) # Passing in X makes it lazy/symbolic
expr

<dfply.base.Intention at 0x7f14689022e0>

In [19]:
expr.evaluate(16) # Evaluate later

4.0
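One caution before applying this to data frame columns: `math.log` works on single numbers only, so a symbolic `math.log` evaluated on a whole column would fail when it runs. A vectorized function such as `numpy.log2` operates element-wise on a pandas Series. The check below uses plain pandas and NumPy rather than dfply, with made-up heights. Wrapping the vectorized function, e.g. `log2 = make_symbolic(np.log2)` and then `log2(X['Height (cm)'])`, should give a column expression, but treat that line as an untested suggestion.

```python
import math
import numpy as np
import pandas as pd

heights = pd.Series([48.6, 34.3, 50.8], name="Height (cm)")

# math.log expects a single number; a multi-element Series cannot be converted to float
try:
    math.log(heights, 2)
    scalar_ok = True
except TypeError:
    scalar_ok = False

# numpy.log2 is applied element-wise and returns a Series with the same index
log_heights = np.log2(heights)

print(scalar_ok)                      # False
print(log_heights.round(3).tolist())
```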