
# Bob Ross paintings | Add labels for types of painting element and groups of categories
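This notebook is part of the data preparation for MakeoverMonday 2020 Week 50, whose topic is the elements that appear in Bob Ross's paintings. It labels each element with a type (for example `Sky` or `Frame`) and a broader category (for example `Landscape` or `Execution`). More info: __[FiveThirtyEight](https://fivethirtyeight.com/features/a-statistical-analysis-of-the-work-of-bob-ross/)__.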

Data sources:
* __[FiveThirtyEight on GitHub](https://github.com/fivethirtyeight/data/blob/master/bob-ross/elements-by-episode.csv)__ (original CSV)
* __[MakeoverMonday 2020 Week 50 on data.world](https://data.world/makeovermonday/2020w50)__ (the data in another format)

In [1]:
import pandas as pd

## Load datasets

In [2]:
# Long-format table: one row per element per episode, with an Included flag
df = pd.read_excel("elements-by-episode_vlitt.xlsx", sheet_name="elements-by-episode")

In [3]:
df.head()

Unnamed: 0,Element,Episode,Title,Included
0,Apple Frame,S01E01,A WALK IN THE WOODS,0
1,Aurora Borealis,S01E01,A WALK IN THE WOODS,0
2,Barn,S01E01,A WALK IN THE WOODS,0
3,Beach,S01E01,A WALK IN THE WOODS,0
4,Boat,S01E01,A WALK IN THE WOODS,0


In [4]:
# Lookup sheet: assigns each element a Type and a HasLocation flag
df_types = pd.read_excel("elements-by-episode_vlitt.xlsx", sheet_name="dict_elements")

In [5]:
df_types

Unnamed: 0,Element,Type,HasLocation
0,Apple Frame,Frame,0
1,Aurora Borealis,Sky,1
2,Barn,Human_built,1
3,Beach,Water,1
4,Boat,Human_presence,1
...,...,...,...
62,Waves,Water,1
63,Windmill,Human_built,1
64,Window Frame,Frame,0
65,Winter,Season_and_weather,0


In [6]:
# Lookup sheet: assigns each type of element a broader Category
df_categories = pd.read_excel("elements-by-episode_vlitt.xlsx", sheet_name="dict_categories")

## Create dictionaries for type and category of elements

In [7]:
# Element -> Type lookup
dict_types = dict(zip(df_types["Element"], df_types["Type"]))
dict_types

{'Apple Frame': 'Frame',
 'Aurora Borealis': 'Sky',
 'Barn': 'Human_built',
 'Beach': 'Water',
 'Boat': 'Human_presence',
 'Bridge': 'Human_built',
 'Building': 'Human_built',
 'Bushes': 'Vegetation',
 'Cabin': 'Human_built',
 'Cactus': 'Vegetation',
 'Circle Frame': 'Frame',
 'Cirrus': 'Sky',
 'Cliff': 'Topography',
 'Clouds': 'Sky',
 'Conifer': 'Vegetation',
 'Cumulus': 'Sky',
 'Deciduous': 'Vegetation',
 'Diane Andre': 'Painted_by_a_guest',
 'Dock': 'Human_built',
 'Double Oval Frame': 'Frame',
 'Farm': 'Human_built',
 'Fence': 'Human_built',
 'Fire': 'Human_presence',
 'Florida Frame': 'Frame',
 'Flowers': 'Vegetation',
 'FOG': 'Season_and_weather',
 'Framed': 'Frame',
 'Grass': 'Vegetation',
 'Guest': 'Painted_by_a_guest',
 'Half Circle Frame': 'Frame',
 'Half Oval Frame': 'Frame',
 'Hills': 'Topography',
 'Lake': 'Water',
 'Lakes': 'Water',
 'Lighthouse': 'Human_built',
 'Mill': 'Human_built',
 'Moon': 'Sky',
 'Mountain': 'Topography',
 'Mountains': 'Topography',
 'Night': 'Time_

In [8]:
# Type of element -> Category lookup
dict_categories = dict(zip(df_categories["Type of element"], df_categories["Category"]))
dict_categories

{'Person': 'Human',
 'Human_built': 'Human',
 'Human_presence': 'Human',
 'Painting_type': 'Human',
 'Rocks': 'Landscape',
 'Vegetation': 'Landscape',
 'Water': 'Landscape',
 'Sky': 'Landscape',
 'Topography': 'Landscape',
 'Season_and_weather': 'Time_and_atmosphere',
 'Time_of_day': 'Time_and_atmosphere',
 'Painted_by_a_guest': 'Execution',
 'Frame': 'Execution'}
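
The lookups in this section are all built with `dict(zip(keys, values))`. The sketch below illustrates that pattern on a toy table; `toy_types`, `toy_dict` and `toy_dict_alt` are hypothetical names, and the three rows are copied from the output above. It also shows an equivalent construction through `set_index`, plus a uniqueness check, since a repeated element name would silently keep only its last value.

```python
import pandas as pd

# Toy stand-in for the dict_elements sheet (hypothetical name, values from the output above).
toy_types = pd.DataFrame({
    "Element": ["Barn", "Beach", "Boat"],
    "Type": ["Human_built", "Water", "Human_presence"],
})

# zip pairs the two columns row by row; dict turns the pairs into a lookup.
toy_dict = dict(zip(toy_types["Element"], toy_types["Type"]))

# Equivalent construction: index the Type column by element name, then convert.
toy_dict_alt = toy_types.set_index("Element")["Type"].to_dict()

# A duplicated key would be overwritten without warning, so check uniqueness first.
keys_are_unique = toy_types["Element"].is_unique
```

Both constructions give the same dictionary here; which one to use is mostly a matter of taste.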

In [9]:
# Element -> HasLocation flag (0 or 1) lookup
dict_HasLocation = dict(zip(df_types["Element"], df_types["HasLocation"]))
dict_HasLocation

{'Apple Frame': 0,
 'Aurora Borealis': 1,
 'Barn': 1,
 'Beach': 1,
 'Boat': 1,
 'Bridge': 1,
 'Building': 1,
 'Bushes': 1,
 'Cabin': 1,
 'Cactus': 1,
 'Circle Frame': 0,
 'Cirrus': 0,
 'Cliff': 1,
 'Clouds': 0,
 'Conifer': 1,
 'Cumulus': 0,
 'Deciduous': 1,
 'Diane Andre': 0,
 'Dock': 1,
 'Double Oval Frame': 0,
 'Farm': 1,
 'Fence': 1,
 'Fire': 0,
 'Florida Frame': 0,
 'Flowers': 1,
 'FOG': 1,
 'Framed': 0,
 'Grass': 1,
 'Guest': 0,
 'Half Circle Frame': 0,
 'Half Oval Frame': 0,
 'Hills': 1,
 'Lake': 1,
 'Lakes': 1,
 'Lighthouse': 1,
 'Mill': 1,
 'Moon': 0,
 'Mountain': 1,
 'Mountains': 1,
 'Night': 0,
 'Ocean': 1,
 'Oval Frame': 0,
 'Palm Trees': 1,
 'Path': 1,
 'Person': 0,
 'Portrait': 0,
 'Rectangle 3D Frame': 0,
 'Rectangular Frame': 0,
 'River': 1,
 'Rocks': 1,
 'Seashell Frame': 0,
 'Snow': 1,
 'Snowy Mountain': 1,
 'Split Frame': 0,
 'Steve Ross': 0,
 'Structure': 1,
 'SUN': 0,
 'Tomb Frame': 0,
 'Tree': 1,
 'Trees': 1,
 'Triple Frame': 0,
 'Waterfall': 1,
 'Waves': 1,
 'Wind

## Add labels to the initial dataset

In [10]:
# Elements that are missing from dict_types become NaN
df["Type_of_element"] = df["Element"].map(dict_types)

In [11]:
df.head()

Unnamed: 0,Element,Episode,Title,Included,Type_of_element
0,Apple Frame,S01E01,A WALK IN THE WOODS,0,Frame
1,Aurora Borealis,S01E01,A WALK IN THE WOODS,0,Sky
2,Barn,S01E01,A WALK IN THE WOODS,0,Human_built
3,Beach,S01E01,A WALK IN THE WOODS,0,Water
4,Boat,S01E01,A WALK IN THE WOODS,0,Human_presence
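
`Series.map` with a dictionary returns `NaN` for any key it cannot find, and the match is case-sensitive (the dictionary above contains `'FOG'` and `'SUN'` in upper case). A quick check for unmapped elements after the mapping step might look like the sketch below. It runs on a toy frame; `toy`, `toy_types` and the element `"Unicorn"` are made up for illustration and are not in the dataset.

```python
import pandas as pd

# Toy data: "Unicorn" is a made-up element that has no entry in the lookup.
toy = pd.DataFrame({"Element": ["Barn", "Beach", "Unicorn", "Barn"]})
toy_types = {"Barn": "Human_built", "Beach": "Water"}

# Same mapping step as in the notebook, on the toy frame.
toy["Type_of_element"] = toy["Element"].map(toy_types)

# Collect every distinct element that ended up without a type.
unmapped = sorted(toy.loc[toy["Type_of_element"].isna(), "Element"].unique())
```

An empty `unmapped` list would indicate that every element received a label.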


In [12]:
# Second hop: Type_of_element -> Category
df["Category_of_element"] = df["Type_of_element"].map(dict_categories)

In [13]:
df.head()

Unnamed: 0,Element,Episode,Title,Included,Type_of_element,Category_of_element
0,Apple Frame,S01E01,A WALK IN THE WOODS,0,Frame,Execution
1,Aurora Borealis,S01E01,A WALK IN THE WOODS,0,Sky,Landscape
2,Barn,S01E01,A WALK IN THE WOODS,0,Human_built,Human
3,Beach,S01E01,A WALK IN THE WOODS,0,Water,Landscape
4,Boat,S01E01,A WALK IN THE WOODS,0,Human_presence,Human


In [14]:
# HasLocation is looked up by element name, like the type
df["HasLocation"] = df["Element"].map(dict_HasLocation)

In [15]:
df.head()

Unnamed: 0,Element,Episode,Title,Included,Type_of_element,Category_of_element,HasLocation
0,Apple Frame,S01E01,A WALK IN THE WOODS,0,Frame,Execution,0
1,Aurora Borealis,S01E01,A WALK IN THE WOODS,0,Sky,Landscape,1
2,Barn,S01E01,A WALK IN THE WOODS,0,Human_built,Human,1
3,Beach,S01E01,A WALK IN THE WOODS,0,Water,Landscape,1
4,Boat,S01E01,A WALK IN THE WOODS,0,Human_presence,Human,1
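
As an alternative to the dictionary lookups used in this notebook, the per-element columns could also be attached with a single left merge. This is a sketch of that swapped-in technique on toy data, not the approach the notebook takes; `toy`, `toy_lookup` and `merged` are hypothetical names. The `validate="many_to_one"` argument makes pandas raise an error if an element appears more than once in the lookup table.

```python
import pandas as pd

# Toy long-format data: one row per element per episode (values are illustrative).
toy = pd.DataFrame({
    "Element": ["Barn", "Beach", "Barn"],
    "Included": [0, 1, 1],
})

# Toy lookup table with one row per element.
toy_lookup = pd.DataFrame({
    "Element": ["Barn", "Beach"],
    "Type": ["Human_built", "Water"],
    "HasLocation": [1, 1],
})

# Left merge keeps every row of toy and adds Type and HasLocation in one step.
merged = toy.merge(toy_lookup, on="Element", how="left", validate="many_to_one")
```

The dictionary approach keeps each lookup explicit, while the merge adds several columns at once and can check the lookup for duplicates.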


## Export dataset

In [16]:
df.to_csv("categorized_elements_by_episode.csv")
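
By default `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. If that column is not wanted, passing `index=False` leaves it out. The sketch below shows the difference on a toy frame, writing to strings instead of files; `toy` and the other variable names are hypothetical.

```python
import io

import pandas as pd

toy = pd.DataFrame({"Element": ["Barn", "Beach"], "Included": [0, 1]})

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_with_index = toy.to_csv()
csv_without_index = toy.to_csv(index=False)

# Compare the header rows of the two variants.
header_with = csv_with_index.splitlines()[0]
header_without = csv_without_index.splitlines()[0]

# Reading the indexed variant back produces an extra "Unnamed: 0" column.
reloaded = pd.read_csv(io.StringIO(csv_with_index))
```

Whether to keep the index depends on how the exported file is consumed downstream.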