# Bob Ross Exploration

An exploration into the exciting world of Bob Ross' paintings and correlations between objects he chose to paint!

***

Start by downloading a CSV of all of Bob's episodes:
https://github.com/fivethirtyeight/data/blob/master/bob-ross/elements-by-episode.csv

For each episode, objects are tagged as present (1) or absent (0).

Save the CSV into the same folder as this Notebook, or read it straight from the raw GitHub URL as the cell below does.

Then, import pandas and get all the episode data into a DataFrame:

In [1]:
import pandas as pd

In [2]:
bob_ross = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bob-ross/elements-by-episode.csv')
bob_ross

Unnamed: 0,EPISODE,TITLE,APPLE_FRAME,AURORA_BOREALIS,BARN,BEACH,BOAT,BRIDGE,BUILDING,BUSHES,...,TOMB_FRAME,TREE,TREES,TRIPLE_FRAME,WATERFALL,WAVES,WINDMILL,WINDOW_FRAME,WINTER,WOOD_FRAMED
0,S01E01,"""A WALK IN THE WOODS""",0,0,0,0,0,0,0,1,...,0,1,1,0,0,0,0,0,0,0
1,S01E02,"""MT. MCKINLEY""",0,0,0,0,0,0,0,0,...,0,1,1,0,0,0,0,0,1,0
2,S01E03,"""EBONY SUNSET""",0,0,0,0,0,0,0,0,...,0,1,1,0,0,0,0,0,1,0
3,S01E04,"""WINTER MIST""",0,0,0,0,0,0,0,1,...,0,1,1,0,0,0,0,0,0,0
4,S01E05,"""QUIET STREAM""",0,0,0,0,0,0,0,0,...,0,1,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
398,S31E09,"""EVERGREEN VALLEY""",0,0,0,0,0,0,0,0,...,0,1,1,0,0,0,0,0,0,0
399,S31E10,"""BALMY BEACH""",0,0,0,1,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
400,S31E11,"""LAKE AT THE RIDGE""",0,0,0,0,0,0,0,0,...,0,1,1,0,0,0,0,0,0,0
401,S31E12,"""IN THE MIDST OF WINTER""",0,0,1,0,0,0,0,0,...,0,1,1,0,0,0,0,0,1,0


It's always helpful to use ```.info()``` on your DataFrame to check whether any columns are missing data before you start working with it. So do that now:

In [3]:
bob_ross.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 403 entries, 0 to 402
Data columns (total 69 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   EPISODE             403 non-null    object
 1   TITLE               403 non-null    object
 2   APPLE_FRAME         403 non-null    int64 
 3   AURORA_BOREALIS     403 non-null    int64 
 4   BARN                403 non-null    int64 
 5   BEACH               403 non-null    int64 
 6   BOAT                403 non-null    int64 
 7   BRIDGE              403 non-null    int64 
 8   BUILDING            403 non-null    int64 
 9   BUSHES              403 non-null    int64 
 10  CABIN               403 non-null    int64 
 11  CACTUS              403 non-null    int64 
 12  CIRCLE_FRAME        403 non-null    int64 
 13  CIRRUS              403 non-null    int64 
 14  CLIFF               403 non-null    int64 
 15  CLOUDS              403 non-null    int64 
 16  CONIFER             403 no
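
With this many columns the ```.info()``` listing gets long. A minimal sketch of a more compact check follows; the `demo` frame and its values are made up for illustration and are not taken from the CSV.

```python
import pandas as pd

# Hypothetical stand-in for bob_ross; values are illustrative only.
demo = pd.DataFrame({
    "EPISODE": ["S01E01", "S01E02", "S01E03"],
    "TREE": [1, 1, None],
    "SUN": [0, 0, 1],
})

# Count missing values per column and keep only the columns that have any.
missing = demo.isna().sum()
missing = missing[missing > 0]
print(missing)
```

On the real data, an empty result from the same two lines applied to `bob_ross` would mean no column is missing values.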

## Correlation 

Now we can go ahead and get a correlation matrix by simply calling ```.corr()``` on the DataFrame.

To see every row and column of the result, the first two lines of the next cell lift pandas' display limits.

In [4]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
bob_ross_corr = bob_ross.corr(numeric_only=True)
bob_ross_corr

The correlation matrix is itself a DataFrame, so it is saved as its own object, ```bob_ross_corr```. The ```numeric_only=True``` argument skips the text columns EPISODE and TITLE, which pandas 2.0 and later would otherwise refuse to correlate.

## Start the Investigation

Now that you have a DataFrame and a correlation matrix, try to use code to perform the following:

### Sunny Days

Output (as a DataFrame) the episode and title of every episode in which Bob painted the sun.

*Hint: use the SUN column where value == 1*

In [13]:
Sunny_Days = bob_ross.loc[bob_ross.SUN == 1, ['EPISODE', 'TITLE']]
Sunny_Days

Unnamed: 0,EPISODE,TITLE
2,S01E03,"""EBONY SUNSET"""
14,S02E02,"""WINTER SUN"""
15,S02E03,"""EBONY SEA"""
21,S02E09,"""BLACK & WHITE SEASCAPE"""
57,S05E06,"""OCEAN SUNRISE"""
80,S07E03,"""EVERGREENS AT SUNSET"""
93,S08E03,"""WARM WINTER DAY"""
104,S09E01,"""WINTER EVERGREENS"""
124,S10E08,"""GOLDEN SUNSET"""
128,S10E12,"""WINTER FROST"""


### Cones Please

What percentage of paintings included a conifer? Use code to calculate this. See if you can do it in one line of code.

It's okay to Google for ideas, but cite your source with a comment and full link to where you found it.

TypeError: 'module' object is not subscriptable
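
The `TypeError` above is what Python raises when a module (for example `pd`) is indexed with square brackets; the column lookup has to go through the DataFrame instead. A minimal sketch of one one-line approach follows, shown on a made-up stand-in frame (`demo` and its values are hypothetical). Because the column holds only 0s and 1s, its mean is the fraction of paintings that include a conifer.

```python
import pandas as pd

# Hypothetical stand-in for bob_ross; values are illustrative only.
demo = pd.DataFrame({"CONIFER": [1, 0, 1, 1]})

# The mean of a 0/1 column is the share of rows equal to 1; scale to a percentage.
conifer_pct = demo["CONIFER"].mean() * 100
print(f"{conifer_pct:.1f}% of paintings include a conifer")
```

The same expression with `bob_ross` in place of `demo` should answer the question for the real data.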

### Water

I want to know about episodes in which Bob might have painted water. Assume that any of the following objects would include water:

'BOAT', 'BEACH', 'OCEAN', 'LAKE', 'WATERFALL', 'WAVES', 'RIVER', 'DOCK'

Create a new column in the original DataFrame called "WATER" and set it to 1 if any of the above columns have 1, otherwise 0.

Hints: use a few code cells to do this in steps
- Turn my list of water columns into a list called ```water_cols```
- Output the DataFrame, but just the subset of water columns. You'll use this view to verify your work.
- Create a new column called WATER using this notation: ```df['WATER'] = ``` where df is the name of your DataFrame
- Now the tricky part. You want to set that new column to a boolean value based on whether the number 1 is in any of the water columns. You'll need to use ```.isin()``` and ```.any(axis='columns')```
- You can change the boolean values to ints using ```.astype(int)``` at the end of your expression

In [31]:
water_cols = ['BOAT', 'BEACH', 'OCEAN', 'LAKE', 'WATERFALL', 'WAVES', 'RIVER', 'DOCK']
bob_ross.loc[:, water_cols]

Unnamed: 0,BOAT,BEACH,OCEAN,LAKE,WATERFALL,WAVES,RIVER,DOCK
0,0,0,0,0,0,0,1,0
1,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0
3,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...
398,0,0,0,0,0,0,0,0
399,0,1,1,0,0,1,0,0
400,0,0,0,1,0,0,0,0
401,0,0,0,0,0,0,0,0


In [38]:
# Flag a row with 1 when any water column equals 1, otherwise 0
bob_ross['WATER'] = bob_ross[water_cols].isin([1]).any(axis='columns').astype(int)

In [None]:
bob_ross['WATER']
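
A minimal sketch of the ```.isin()``` / ```.any(axis='columns')``` pattern from the hints follows, run on a made-up frame so the result is easy to check by eye. The `demo` frame, its values, and the two-column `water_cols` list are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for bob_ross; values are illustrative only.
demo = pd.DataFrame({
    "LAKE":  [0, 1, 0],
    "RIVER": [0, 0, 1],
    "TREE":  [1, 1, 1],
})
water_cols = ["LAKE", "RIVER"]

# True where a cell equals 1, collapsed row-wise with any(), then cast to 0/1.
demo["WATER"] = demo[water_cols].isin([1]).any(axis="columns").astype(int)

# Cross-check: for 0/1 data the row-wise maximum gives the same flag.
assert (demo["WATER"] == demo[water_cols].max(axis="columns")).all()
print(demo)
```

TREE is deliberately left out of `water_cols`, so the first row ends up with WATER equal to 0 even though it contains a 1.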

### Super Bonus 🌶️

Can you find the highest and lowest correlation for any column? 

So, pick a column, like ROCKS. Other than ROCKS (which would have a correlation of 1.00 with itself) what are the most and least correlated objects?

Can you find that for every object?
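
A minimal sketch of one way to answer this for a single column follows. The `demo` frame, its values, and the helper name `corr_extremes` are hypothetical, and "least correlated" is read here as the most negative correlation.

```python
import pandas as pd

# Hypothetical stand-in for bob_ross; values are illustrative only.
demo = pd.DataFrame({
    "ROCKS": [1, 0, 1, 0, 1, 0],
    "RIVER": [1, 0, 1, 0, 0, 0],
    "SNOW":  [0, 1, 0, 1, 0, 1],
    "TREE":  [1, 1, 0, 1, 1, 0],
})
demo_corr = demo.corr()

def corr_extremes(corr, item):
    """Return (most correlated, least correlated) column names for `item`."""
    others = corr[item].drop(item)  # exclude the self-correlation of 1.00
    return others.idxmax(), others.idxmin()

print(corr_extremes(demo_corr, "ROCKS"))
```

If "least correlated" should instead mean closest to zero, applying `.abs()` to `others` before `idxmin()` would give that reading.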

### Super Super Bonus 🌶️🌶️

And the icing on the cake: get the least and most correlated item for every item in the correlation matrix.

*Hint: you will want to turn your code above into a function that takes an item (like "SNOW") and outputs the answer. Then, to iterate over the items, use ```.items()``` (named ```.iteritems()``` before pandas 2.0, where it was removed) like this:*

```for item in bob_ross_corr.items():```

The ```.items()``` method yields a tuple of (column name, column) on each pass, and you'll need to take the first element of the tuple and pass it to your function.
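
A minimal sketch of that loop follows, using `DataFrame.items()` on a made-up frame. The `demo` frame, its values, and the `results` and `summary` names are hypothetical, and "least correlated" is again read as the most negative correlation.

```python
import pandas as pd

# Hypothetical stand-in for bob_ross; values are illustrative only.
demo = pd.DataFrame({
    "ROCKS": [1, 0, 1, 0, 1, 0],
    "RIVER": [1, 0, 1, 0, 0, 0],
    "SNOW":  [0, 1, 0, 1, 0, 1],
    "TREE":  [1, 1, 0, 1, 1, 0],
})
demo_corr = demo.corr()

results = {}
# .items() yields (column name, Series) pairs, one per column of the matrix.
for name, column in demo_corr.items():
    others = column.drop(name)  # exclude the self-correlation of 1.00
    results[name] = (others.idxmax(), others.idxmin())

# Collect the answers into a small table, one row per item.
summary = pd.DataFrame.from_dict(results, orient="index", columns=["most", "least"])
print(summary)
```

Unpacking the tuple in the `for` statement is equivalent to taking its first element by index; either form passes the column name along.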