# Halloween Viz Visualization

# Creating a Story
Jeffrey A. Shaffer has been counting the candies he gives out on Halloween evening since 2012. Each year he records the number of candies given out every half hour between `1800` and `2030`. Jeffrey would now like to get some insights from the data he has collected.

## Storyboarding
In this section, we jot down the key points of the story, post-it style.

1. `Issue`: Jeffrey A. Shaffer wants to better understand the data he is collecting.
2. `Why?`: He wants to know how many ghosts come trick-or-treating at his place on Halloween. Jeff does whatever he can to reduce his carbon footprint and help limit climate change. If he buys approximately the right number of candies for Halloween, he will use those resources efficiently instead of wasting them.
3. `Expectations`: Jeff expects to use a dashboard to understand the relationships between variables in the dataset.
4. `Additional`: Adding external features, such as the weather and the number of horror movies released, can help us understand the data better.

## Who is the Audience?
The audience is Jeffrey A. Shaffer and anyone else interested in optimising the number of candies.

## What do we want to convey to our Audience?
We want the audience to be able to better estimate the number of candies required for trick-or-treating in a particular year. Hence, we will create a dashboard that visualizes the key insights gathered from the data.

## Additional Feature
Jeff thinks that the number of horror movies released in a particular year plays a big role in determining how many candies will be given away on that Halloween night. Also, this dataset is simple and can be combined with other external datasets to provide better insight into the trick-or-treat figures.

We will use [Kaggle's IMDB horror movies dataset](https://www.kaggle.com/PromptCloudHQ/imdb-horror-movie-dataset) and [IMDB's Dataset](https://www.imdb.com/interfaces/) to build on the Halloween dataset. For each year, we will consider the number of movies/TV series released and their average IMDB rating.
> **The relationship between the number of _good_ horror movies/TV series released and the number of candies distributed in a particular year can be explored.**
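
One way to explore that relationship is to count the well-rated horror titles per year and join the counts to the yearly candy totals. The sketch below is a minimal illustration with pandas, not the notebook's actual pipeline: the column names (`year`, `genre`, `avg_vote`, `candies`), the 6.5 rating cut-off for "good", and the tiny in-memory tables are all assumptions made up for the example.

```python
import pandas as pd

# Placeholder rows for illustration only; these are NOT values from the real datasets.
# Column names (year, genre, avg_vote, candies) are assumptions.
movies = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2019],
    "genre": ["Horror", "Comedy", "Horror, Thriller", "Horror", "Drama"],
    "avg_vote": [7.1, 6.0, 5.2, 6.8, 8.0],
})
candies = pd.DataFrame({"year": [2018, 2019], "candies": [100, 120]})


def good_horror_per_year(movies, min_rating=6.5):
    """Count horror titles rated at or above min_rating, per release year."""
    is_horror = movies["genre"].str.contains("Horror", na=False)
    is_good = movies["avg_vote"] >= min_rating
    good = movies[is_horror & is_good]
    return (
        good.groupby("year")
        .agg(good_horror_count=("avg_vote", "size"),
             mean_rating=("avg_vote", "mean"))
        .reset_index()
    )


summary = good_horror_per_year(movies)
# Left join keeps every Halloween year, even one with no good horror titles.
combined = candies.merge(summary, on="year", how="left")
print(combined)
```

A left join is used so that a year without any qualifying title still appears in the result (with missing values) rather than silently dropping out of the comparison.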

Later, a simple regression model can be used to predict the demand for candies in 2021.
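
As a rough idea of what such a model could look like, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Python. The function names `fit_line` and `predict` are hypothetical, the choice of the good-horror-title count as the predictor is an assumption, and the numbers are placeholders chosen to lie on a straight line; they are not taken from Jeff's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Placeholder numbers for illustration only (not real observations).
good_horror_counts = [1, 2, 3, 4]
candy_totals = [110, 120, 130, 140]

slope, intercept = fit_line(good_horror_counts, candy_totals)
forecast = predict(slope, intercept, 5)
print(slope, intercept, forecast)
```

With only a handful of yearly observations, any such fit would be a rough guide at best, so the forecast should be read alongside the dashboard rather than on its own.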

_Note: Since 2020 was the year of the pandemic, we expect a sharp rise in the number of trick-or-treaters in 2021._


# Halloween Viz EDA
In this notebook, I am going to explore two datasets.

The first is the [Halloween Viz dataset](https://www.dataplusscience.com/HalloweenData.html), created by [Jeffrey A. Shaffer](https://twitter.com/HighVizAbility), a resident of Cincinnati. It records the number of candies given out on Halloween each year, broken down into the number of candies given out every half hour during trick-or-treat hours (6 to 8 p.m.). Jeffrey is an _"Author, Data Viz Professor, Tableau Zen Master, Data Mining Geek, Recovering Musician"_ (Source: Twitter).

The second is [Kaggle's IMDB Movies Dataset](https://www.kaggle.com/stefanoleone992/imdb-extensive-dataset).

# Load Libraries

In [1]:
```python
import pandas as pd
from pandas_profiling import ProfileReport
from tqdm import tqdm
import pathlib
import os
import datetime
import matplotlib.pyplot as plt
import seaborn as sns

# Notebook shell command: enable the ipywidgets extension.
!jupyter nbextension enable --py widgetsnbextension
```

```
Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK
```


# Load Data

In [5]:
```python
# Locations of the raw input files, relative to the notebook.
raw_data_dir = pathlib.Path("Data/Raw")
kaggle_dir = raw_data_dir / "IMDB_Kaggle"

# One entry per data source; "path" points at the file on disk.
data_dict = {
    "halloween": {
        "path": raw_data_dir / "HalloweenTableau2020.xlsx",
    },
    "kaggle_movies": {
        "path": kaggle_dir / "IMDb movies.csv",
    },
    "kaggle_ratings": {
        "path": kaggle_dir / "IMDb ratings.csv",
    },
}

# Fail early if any expected file is missing.
for source_name, data_source in data_dict.items():
    if not data_source["path"].exists():
        raise FileNotFoundError(
            f"Data source '{source_name}' not found. Path: {data_source['path']}"
        )
    print(f"File {data_source['path'].name} exists.")
```

```
File HalloweenTableau2020.xlsx exists.
File IMDb movies.csv exists.
File IMDb ratings.csv exists.
```
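
Once the files are confirmed to exist, a natural next step is to read each one into a DataFrame. The sketch below is one possible way to do that and is not taken from the original notebook: `load_table` and `load_all` are hypothetical helper names, and the code assumes each entry of a dictionary shaped like `data_dict` has a `"path"` key pointing at either a CSV or an Excel file.

```python
import pathlib

import pandas as pd


def load_table(path):
    """Read a CSV or Excel file into a DataFrame, chosen by file extension."""
    path = pathlib.Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # Reading .xlsx files needs an Excel engine such as openpyxl installed.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")


def load_all(sources):
    """Load every source in a {name: {"path": ...}} mapping into DataFrames."""
    return {name: load_table(source["path"]) for name, source in sources.items()}


# Example usage in the notebook (assumes data_dict is defined as above):
# frames = load_all(data_dict)
# frames["halloween"].head()
```

Dispatching on the file extension keeps the Excel and CSV sources behind one call, so later cells can refer to the frames by the same names used in `data_dict`.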
