## First, let's read in our CSVs

In [1]:
import pandas as pd

imdb = pd.read_csv('assets/imdb.csv')
trump = pd.read_csv('assets/trump.csv')

CParserError: Error tokenizing data. C error: Expected 1 fields in line 104, saw 3
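
The error above means the parser inferred a single column from the earlier rows and then hit a row that split into three fields, which usually points at unquoted commas inside a text value. Here is a minimal sketch of one way around it, assuming a one-column file; the `raw` string and the `tweets` name are made up for illustration and are not the real contents of either CSV.

```python
import io
import pandas as pd

# Hypothetical stand-in for a one-column CSV whose text contains unquoted commas.
raw = u"text\nfirst tweet\nsecond tweet, with, commas\n"

try:
    pd.read_csv(io.StringIO(raw))
except Exception as err:  # ParserError (CParserError in older pandas releases)
    print("read_csv failed: %s" % err)

# Workaround: treat every line as a single field instead of splitting on commas.
lines = raw.splitlines()
tweets = pd.DataFrame({lines[0]: lines[1:]})
print(tweets)
```

This only fits files that really hold one value per line. If the file has several columns, quoting the text fields in the CSV itself is the safer fix.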


### Take a look

In [20]:
# print "IMDB"
# print imdb.head()
# print
# print
# print "TRUMP"
# print trump.head()

## Now comes the tricky stuff

You're probably used to using CREATE and INSERT to create a table and then put data into it. This time we are going to use a SQL ORM (Object Relational Mapper), SQLAlchemy, to define our columns. Take a look at the class below. We name the table 'classimdb' and give each column the type of the values we are going to store in it.

In [21]:
from sqlalchemy import Column, Integer, String, Float, Date, Boolean
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Movies(Base):
    __tablename__ = 'classimdb'

    _id = Column(Integer, primary_key=True)
    title = Column(String)
    poster = Column(String)
    plot = Column(String)
    genre = Column(String)
    year = Column(Integer)
    imdbRating = Column(Float)
    created = Column(Date)
    imdbID = Column(String)
    awards = Column(String)
    actors = Column(String)
    country = Column(String)
    director = Column(String)
    gross = Column(String)
    imdbVotes = Column(Integer)
    language = Column(String)
    rated = Column(String)
    released = Column(String)
    response = Column(Boolean)
    runtime = Column(Integer)
    _type = Column(String)
    writer = Column(String)

    def __init__(self, title, poster, plot, genre, year, imdbRating, created, imdbID, awards, 
                 actors, country, director, gross, imdbVotes, language, rated, released, response, 
                 runtime, _type, writer):
        self.title = title
        self.poster = poster
        self.plot = plot 
        self.genre = genre 
        self.year = year 
        self.imdbRating = imdbRating 
        self.created = created 
        self.imdbID = imdbID 
        self.awards = awards 
        self.actors = actors 
        self.country = country 
        self.director = director 
        self.gross = gross 
        self.imdbVotes = imdbVotes
        self.language = language
        self.rated = rated
        self.released = released
        self.response = response
        self.runtime = runtime 
        self._type = _type 
        self.writer = writer
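
A side note on the long `__init__` above: it is optional. A class built on SQLAlchemy's declarative base already accepts its mapped columns as keyword arguments. The sketch below uses a cut-down, hypothetical `Film` class and `SketchBase` so it does not clash with `Movies`.

```python
from sqlalchemy import Column, Integer, String
try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

SketchBase = declarative_base()

class Film(SketchBase):  # hypothetical, cut-down stand-in for Movies
    __tablename__ = 'sketch_films'

    _id = Column(Integer, primary_key=True)
    title = Column(String)
    year = Column(Integer)

# No __init__ defined: the declarative base accepts mapped attributes as keywords.
row = {'title': 'La Haine', 'year': 1995}
film = Film(**row)
print("%s (%d)" % (film.title, film.year))
```

With keyword arguments, the order of the values no longer matters, which is handy when a row has more than twenty fields.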

### Let's make sure we made everything correctly

In [22]:
Movies.__table__ 

Table('classimdb', MetaData(bind=None), Column('_id', Integer(), table=<classimdb>, primary_key=True, nullable=False), Column('title', String(), table=<classimdb>), Column('poster', String(), table=<classimdb>), Column('plot', String(), table=<classimdb>), Column('genre', String(), table=<classimdb>), Column('year', Integer(), table=<classimdb>), Column('imdbRating', Float(), table=<classimdb>), Column('created', Date(), table=<classimdb>), Column('imdbID', String(), table=<classimdb>), Column('awards', String(), table=<classimdb>), Column('actors', String(), table=<classimdb>), Column('country', String(), table=<classimdb>), Column('director', String(), table=<classimdb>), Column('gross', String(), table=<classimdb>), Column('imdbVotes', Integer(), table=<classimdb>), Column('language', String(), table=<classimdb>), Column('rated', String(), table=<classimdb>), Column('released', String(), table=<classimdb>), Column('response', Boolean(), table=<classimdb>), Column('runtime', Integer(
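
If the `repr` above is hard to read, another way to check a table definition is to print the CREATE TABLE statement SQLAlchemy would emit for it. This is a small sketch on a hypothetical three-column table named `sketch_imdb`, rendered with SQLAlchemy's generic dialect rather than PostgreSQL.

```python
from sqlalchemy import Column, Integer, String, Float, MetaData, Table
from sqlalchemy.schema import CreateTable

# Hypothetical three-column table, built without the ORM for brevity.
sketch = Table(
    'sketch_imdb', MetaData(),
    Column('_id', Integer, primary_key=True),
    Column('title', String),
    Column('imdbRating', Float),
)

# str() renders the DDL with the generic dialect; nothing is sent to a database.
ddl = str(CreateTable(sketch))
print(ddl)
```

For the class in this notebook, the same call would be `CreateTable(Movies.__table__)`.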

### Everything looks good. Now let's create the table

In [23]:
from sqlalchemy import create_engine

engine = create_engine('<your remote address here>')

Base.metadata.create_all(engine)
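
"No errors" is a good sign, but we can also ask the database which tables it has. Here is a rough sketch that uses an in-memory SQLite database in place of the remote server, so it runs anywhere. The table name `sketch_imdb` and the URL in the comment are placeholders.

```python
from sqlalchemy import Column, Integer, String, MetaData, Table, create_engine, inspect

# In-memory SQLite stands in for the remote server so the sketch runs anywhere.
# A PostgreSQL URL generally looks like postgresql://user:password@host:5432/dbname
# (all placeholder values).
sketch_engine = create_engine('sqlite://')

sketch_meta = MetaData()
Table('sketch_imdb', sketch_meta,
      Column('_id', Integer, primary_key=True),
      Column('title', String))

# create_all only issues CREATE TABLE for tables that do not exist yet.
sketch_meta.create_all(sketch_engine)
table_names = inspect(sketch_engine).get_table_names()
print(table_names)
```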

### No errors. Things are looking good! 

Let's get ready to insert our values into our new table. We need to create our session first. Notice that we bind it to our engine, which is connected to our remote PostgreSQL server.

In [24]:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()

### Now let's put everything into our new table. This cell just prepares the insert for us

In [25]:
movie_arr = []
for idx,x in imdb.iterrows():
    movie = Movies(x['title'], x['poster'], x['plot'], x['genre'], x['year'], x['imdbRating'], x['created'],
                   x['imdbID'], x['awards'], x['actors'], x['country'], x['director'], x['gross'], 
                   x['imdbVotes'], x['language'], x['rated'], x['released'], x['response'], x['runtime'], 
                   x['type'], x['writer'])
    movie_arr.append(movie)
    
session.add_all(movie_arr)

### To save the changes to our database, we need to commit() them. This is a lot like the syntax we've seen with Spark and collect().

In [26]:
session.commit()
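
To see what commit() buys us, it helps to compare it with rollback(). The sketch below again uses an in-memory SQLite database and a hypothetical two-column `Film` class instead of the real server and `Movies`.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

SketchBase = declarative_base()

class Film(SketchBase):  # hypothetical stand-in for Movies
    __tablename__ = 'sketch_films'
    _id = Column(Integer, primary_key=True)
    title = Column(String)

sketch_engine = create_engine('sqlite://')
SketchBase.metadata.create_all(sketch_engine)
sketch_session = sessionmaker(bind=sketch_engine)()

sketch_session.add(Film(title='Discarded'))
sketch_session.rollback()  # the pending insert is thrown away
after_rollback = sketch_session.query(Film).count()

sketch_session.add(Film(title='La Haine'))
sketch_session.commit()  # the insert is made permanent
after_commit = sketch_session.query(Film).count()

print("%d row(s) after rollback, %d after commit" % (after_rollback, after_commit))
```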

### Now onto the good stuff. You can put your select statement into a dataframe like this

In [30]:
print pd.read_sql("SELECT * FROM classimdb;", engine)[:1]

   _id     title                                             poster  \
0    1  La Haine  https://images-na.ssl-images-amazon.com/images...   

                                                plot         genre  year  \
0  24 hours in the lives of three young men in th...  Crime, Drama  1995   

   imdbRating     created     imdbID                    awards  \
0         8.1  2016-11-13  tt0113247  8 wins & 13 nominations.   

         ...                   director     gross imdbVotes language  \
0        ...          Mathieu Kassovitz  309811.0    104913   French   

       rated    released response runtime  _type             writer  
0  NOT RATED  1996-02-23     True      98  movie  Mathieu Kassovitz  

[1 rows x 22 columns]
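
When a query depends on a value, it is safer to pass that value through `params` than to paste it into the SQL string. Here is a small sketch using Python's built-in `sqlite3` module and two toy rows in a hypothetical `sketch_imdb` table.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sketch_imdb (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO sketch_imdb VALUES (?, ?)",
                 [('La Haine', 1995), ('Metropolis', 1927)])

# SQLite's placeholder is ?; the PostgreSQL driver psycopg2 uses %s instead.
recent = pd.read_sql("SELECT title, year FROM sketch_imdb WHERE year > ?",
                     conn, params=(1990,))
print(recent)
```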


### Difficulty 1

### Print out all the drama movies

### Print out all the movies whose director has a 'C' or an 'S' in their last name. Ignore any movies with multiple directors.

### Difficulty 3

### Print out each director's longest movie.

### Print out the gross of each director's highest rated movie. Order by created desc.

### Difficulty 5

### Print out the movie title, votes, and director of each director's most voted-on movie. Order by votes desc.

### Print out the movie title, rating, and director of each director's highest rated movie if that director has directed more than one movie. Order by rating desc.

### Difficulty 7

### Print out the released date, title, and the season (winter, spring, summer, fall) in which each movie was released. Let's assume for simplicity's sake that seasons end at the end of a month instead of on the 21st.

In [None]:
# Scaffolding for the CASE expression: complete each BETWEEN range yourself.
case1 = "WHEN EXTRACT(month FROM released::DATE) BETWEEN"
case2 = "WHEN EXTRACT(month FROM released::DATE) BETWEEN"
case3 = "WHEN EXTRACT(month FROM released::DATE) BETWEEN"
case4 = "WHEN EXTRACT(month FROM released::DATE) BETWEEN"
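
If the CASE ... WHEN ... BETWEEN pattern is new to you, here is a sketch of it on a different problem: labelling movies by runtime. It runs against an in-memory SQLite table with made-up rows, so the date functions you need for the season question are not shown.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sketch_films (title TEXT, runtime INTEGER)")
conn.executemany("INSERT INTO sketch_films VALUES (?, ?)",
                 [('Short', 45), ('Feature', 98), ('Epic', 200)])

# Each WHEN is tested in order; ELSE catches whatever is left over.
query = """
SELECT title,
       CASE
           WHEN runtime BETWEEN 0 AND 59 THEN 'short'
           WHEN runtime BETWEEN 60 AND 149 THEN 'feature'
           ELSE 'epic'
       END AS length_class
FROM sketch_films
ORDER BY runtime;
"""
labels = conn.execute(query).fetchall()
print(labels)
```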

### Print out the released date, title, and the relation to today's date (disregard year). If the movie came out before today's month, print 'before' in the relationship column. If the movie came out after today's month, print 'after' in the relationship column. If the movie was released in the same month, but not on the same day, print 'close'. If the movie came out on today's day and month, print 'match'.

# Disclaimer:
#### <span style="color:red">Before you continue to Difficulty 9, you need to read trump.csv into your database. Repeat the steps you used to create the imdb table. Do your work below.</span>

### Difficulty => 9

### Print out all the times Trump has sent a tweet with a movie title in the text. The movie title must have a string length greater than 1.

### Print out a movie if it belongs to any of the following groups:
* Has the longest runtime
* Is created by the director who has the most imdbvotes
* Has been translated into more than 4 languages
* Has won 2 Oscars

### Make sure each movie is printed out at most one time!

## Good job on completing this assignment! You should now be ready for any SQL-based problem thrown at you in the future!

![alt text](http://i.giphy.com/AL0XsYU0pkFTq.gif "Congrats!")

#### If you haven't finished the assignment, scroll back up. This message isn't for you.