# Project: How Do Movie Metrics Affect Viewer Ratings on IMDb?

## Table of Contents:
* [Introduction](#1)
* [Wrangling](#2)
* [Exploratory Visuals](#3)
* [Explanatory Visuals](#4)
* [Conclusion](#5)

## Introduction:<a class="anchor" id="1"></a>
Have you ever browsed IMDb looking for good movies to watch, sorted by rating? Or looked up a movie you just watched, only to find that it has a shockingly low or high viewer rating? What can we say about why a movie is rated high or low on IMDb?

We dive into a dataset hosted on Kaggle (the original data has since been replaced with TMDB ratings following a DMCA takedown: https://www.kaggle.com/tmdb/tmdb-movie-metadata/home) to see what is going on behind these scores!

In [1]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from patsy import dmatrices
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

%matplotlib inline

In [3]:
# read in .csv
df_og = pd.read_csv('parking-violations-issued-fiscal-year-2018.csv', low_memory = False)

In [5]:
df_og.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5906123 entries, 0 to 5906122
Data columns (total 43 columns):
Summons Number                       int64
Plate ID                             object
Registration State                   object
Plate Type                           object
Issue Date                           object
Violation Code                       int64
Vehicle Body Type                    object
Vehicle Make                         object
Issuing Agency                       object
Street Code1                         int64
Street Code2                         int64
Street Code3                         int64
Vehicle Expiration Date              float64
Violation Location                   float64
Violation Precinct                   int64
Issuer Precinct                      int64
Issuer Code                          int64
Issuer Command                       object
Issuer Squad                         object
Violation Time                       object
Time First Ob

In [7]:
df_og.head()

| | Summons Number | Plate ID | Registration State | Plate Type | Issue Date | Violation Code | Vehicle Body Type | Vehicle Make | Issuing Agency | Street Code1 | ... | Vehicle Color | Unregistered Vehicle? | Vehicle Year | Meter Number | Feet From Curb | Violation Post Code | Violation Description | No Standing or Stopping Violation | Hydrant Violation | Double Parking Violation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1105232165 | GLS6001 | NY | PAS | 2018-07-03T00:00:00.000 | 14 | SDN | HONDA | X | 47130 | ... | BLUE | 0.0 | 2006 | - | 0 | | | | | |
| 1 | 1121274900 | HXM7361 | NY | PAS | 2018-06-28T00:00:00.000 | 46 | SDN | NISSA | X | 28990 | ... | GRY | 0.0 | 2017 | - | 0 | | | | | |
| 2 | 1130964875 | GTR7949 | NY | PAS | 2018-06-08T00:00:00.000 | 24 | SUBN | JEEP | X | 64 | ... | GREEN | 0.0 | 0 | - | 0 | | | | | |
| 3 | 1130964887 | HH1842 | NC | PAS | 2018-06-07T00:00:00.000 | 24 | P-U | FORD | X | 11310 | ... | WHITE | 0.0 | 0 | - | 0 | | | | | |
| 4 | 1131599342 | HDG7076 | NY | PAS | 2018-06-29T00:00:00.000 | 17 | SUBN | HYUND | X | 47130 | ... | GREEN | 0.0 | 2007 | - | 0 | | | | | |


In [10]:
df_og['Feet From Curb'].value_counts()

0     5756894
5       32034
6       19043
3       18046
4       17409
2       15795
7       14176
8       12046
1       11953
9        5605
10       2774
11        206
12        107
13         14
15         12
14          8
16          1
Name: Feet From Curb, dtype: int64

### Data Issues
* Column names: replace blank spaces with `_` and convert to lower case
* `Violation Code` is stored as int64; cast it to string
* 
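
The two fixes listed so far can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual cleaning step: it runs on a tiny stand-in frame (values copied from the `df_og.head()` output) instead of the full `df_og`, and the names `sample` and `df_clean` are assumptions introduced only for this example.

```python
import pandas as pd

# Tiny stand-in for df_og (hypothetical name: sample);
# the values are copied from the df_og.head() output.
sample = pd.DataFrame({
    "Summons Number": [1105232165, 1121274900],
    "Violation Code": [14, 46],
    "Feet From Curb": [0, 0],
})

# Work on a copy (hypothetical name: df_clean) so the original stays untouched.
df_clean = sample.copy()

# Lower-case every column name and replace blank spaces with underscores.
df_clean.columns = (
    df_clean.columns.str.lower().str.replace(" ", "_", regex=False)
)

# Violation codes are labels, not quantities, so store them as strings.
df_clean["violation_code"] = df_clean["violation_code"].astype(str)

print(df_clean.dtypes)
```

Working on a copy keeps the raw frame available for comparison, which may be useful while the list of data issues is still growing.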